Combining Lexicon- and Learning-based Approaches for Improved Performance and Convenience in Sentiment Classification

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Abstract: Sentiment classification is the process of categorizing data into categories based on its polarity with a wide array of applications across several industries. This report examines a combination of two prominent approaches to sentiment classification using a lexicon of weighted words and machine learning respectively. These approaches are compared with the combined hybrid approach in order to give an account of their relative strengths and weaknesses. When run on a set of IMDb movie reviews the results indicate that the hybrid model performs better than the lexicon-based approach, in turn being outperformed by the learning-based approach. However, the gain in convenience brought on by eliminating the need for training data makes the hybrid model an appealing alternative to the other approaches with a slight trade-off in performance.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)