Semi-supervised Sentiment Analysis for Sentence Classification

University essay from Uppsala universitet/Institutionen för lingvistik och filologi

Abstract: In this work, we apply semi-supervised learning methods to Sentiment Analysis on a corpus of sentences, each to be labeled as happy, neutral, sad, or angry. Sentence-BERT is used to obtain high-dimensional embeddings for the sentences in the training and test sets, to which three classification methods are applied: the K-Nearest Neighbors classifier (KNN), Label Propagation, and Label Spreading. The latter two are graph-based methods that are expected to give better predictions than the supervised KNN baseline, because they can propagate the labels of known data to similar (and spatially close) unlabeled data. We experiment with multiple combinations of labeled and unlabeled data, various hyperparameters, and four distinct classes of data, and we perform both binary and fine-grained classification tasks. A custom Radial Basis Function kernel is created for this study, in which Euclidean distance is replaced with cosine similarity so that the kernel corresponds to the metric used in Sentence-BERT. For 2 of the 4 tasks, specifically 3-class and 2-class classification, the two graph-based algorithms outperform the chosen baseline, although the scores are not significantly higher. The supervised KNN classifier performs better on the second 3-class classification and on the 4-class classification, especially when using embeddings of lower dimensionality. We draw two conclusions from the results: first, that the dataset used is most likely not well suited to graph creation, and second, that larger volumes of labeled data should be used for further interpretation.
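The abstract does not give the exact form of the custom kernel, so the following is only a minimal sketch of one plausible reading: the squared Euclidean distance in the usual RBF kernel is swapped for the cosine distance, 1 - cos(u, v). The function names, the `gamma` value, and the toy two-dimensional vectors are assumptions made for illustration and are not taken from the essay. The propagation loop is a simplified hard-clamping label propagation, not the essay's actual Label Propagation or Label Spreading setup.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_rbf(u, v, gamma=20.0):
    """Assumed kernel form: exp(-gamma * (1 - cosine_similarity))."""
    return math.exp(-gamma * (1.0 - cosine_similarity(u, v)))


def propagate_labels(X, y, n_classes, gamma=20.0, n_iter=50):
    """Simplified label propagation; y uses -1 for unlabeled points."""
    n = len(X)
    # Dense affinity graph built with the cosine-based kernel.
    W = [[cosine_rbf(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    # Row-normalise the affinities into transition probabilities.
    T = [[w / sum(row) for w in row] for row in W]

    def one_hot(label):
        return [1.0 if label == c else 0.0 for c in range(n_classes)]

    # Unlabeled points (-1) start with an all-zero label distribution.
    F = [one_hot(label) for label in y]
    for _ in range(n_iter):
        F = [
            [sum(T[i][k] * F[k][c] for k in range(n)) for c in range(n_classes)]
            for i in range(n)
        ]
        # Clamp the known labels back to their original values.
        for i, label in enumerate(y):
            if label != -1:
                F[i] = one_hot(label)
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy stand-ins for sentence embeddings: two labeled and two unlabeled points.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, -1, 1, -1]
predictions = propagate_labels(X, y, n_classes=2)
```

In this toy example each unlabeled vector points in almost the same direction as one labeled vector, so it takes that neighbor's label. With real Sentence-BERT embeddings, the outcome would depend heavily on `gamma` and on how much labeled data is available.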
