Smirking or Smiling Smileys? : Evaluating the Use of Emoticons to Determine Sentimental Mood

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Elias Lousseief; Tobias Hindersson; [2015]


Abstract: Machine learning classifiers are commonly used for sentiment analysis. These classifiers learn from annotated training data to predict the sentiment of a text, for example whether it conveys a positive or a negative sentiment. In this thesis we compare two sources of training data for sentiment classification on Twitter: (i) a fixed quantity of hand-annotated tweets (about 2000) and (ii) an increasing quantity of tweets annotated automatically by an emoticon heuristic (from 2000 to 1.6 million tweets). The performance of these training sets is evaluated by training commonly used classifiers (Naive Bayes, Support Vector Machines and Maximum Entropy) on each and comparing their classification accuracy on a hand-annotated test set. The tests are run with varying n-gram models (unigrams, bigrams, and a combination of both) and with and without a stop word filter. We show that while the hand-annotated training set performs well when the training sets are of equal size, the automatically annotated training set exceeds its accuracy in all test setups but one when 1.6 million automatically annotated tweets are used for training.
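
As a rough illustration of the two preprocessing ideas the abstract mentions, the sketch below labels tweets with an emoticon heuristic and extracts unigram and bigram features with an optional stop word filter. It is not the thesis's implementation: the emoticon lists, the stop word list, the rule of discarding tweets with mixed or no emoticons, and the function names `label_by_emoticon` and `ngram_features` are all assumptions made for this example.

```python
# Hypothetical emoticon lists; the abstract does not say which emoticons were used.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(", "=(")

# Tiny illustrative stop word list (an assumption, not the thesis's list).
STOP_WORDS = frozenset({"the", "a", "an", "is", "to", "and"})


def label_by_emoticon(tweet):
    """Return (label, text with emoticons removed), or None if no clear label.

    A tweet is labeled 'pos' or 'neg' only when it contains emoticons of
    exactly one polarity; tweets with both or neither are skipped.
    """
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None
    text = tweet
    # Strip the emoticons so a classifier cannot simply memorize the label cue.
    for emoticon in POSITIVE + NEGATIVE:
        text = text.replace(emoticon, " ")
    return ("pos" if has_pos else "neg", " ".join(text.split()))


def ngram_features(text, use_unigrams=True, use_bigrams=False, stop_words=None):
    """Return a list of unigram and/or bigram features for one text."""
    tokens = text.lower().split()
    if stop_words:
        tokens = [t for t in tokens if t not in stop_words]
    features = []
    if use_unigrams:
        features.extend(tokens)
    if use_bigrams:
        features.extend(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return features


if __name__ == "__main__":
    labeled = label_by_emoticon("I love this phone :)")
    if labeled is not None:
        label, text = labeled
        print(label, ngram_features(text, use_bigrams=True, stop_words=STOP_WORDS))
```

The resulting feature lists could then be passed to any of the classifiers named in the abstract; which library or feature weighting the authors used is not stated here.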
