Sentiment Analysis on Stack Overflow with Respect to Document Type and Programming Language

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Li Ling; Simon Larsén; [2018]


Abstract: The sentiment expressed in software engineering (SE) texts has been shown to affect both the productivity and the quality of collaborative work. This is one reason why sentiment analysis of SE texts has gained attention in research in recent years. A large and open resource of SE texts is Stack Overflow (SO). SO is the largest question and answer (Q&A) website in the Stack Exchange network, and has been the subject of several sentiment analysis studies. It has recently been established that sentiment analyzers trained on social media perform poorly on SE texts, which could challenge the credibility of some of these studies. The Senti4SD sentiment polarity classifier was developed and trained on SO documents to address some of these issues. In this study, random samples of SO documents are drawn and then classified with Senti4SD. The classification into positive, negative and neutral sentiment is used to model the sentiment probability distributions of different document types on SO as a whole, as well as for the eight most popular programming languages. The results indicate that the sentiment of a document is correlated with both the document type and the associated programming language. Among the three sentiment classes, neutral sentiment dominates across all SO documents. However, the reliability of the results is reduced by concerns regarding the accuracy of Senti4SD, vaguely specified pre-processing steps and possibly varying classifier bias in different subdomains. In conclusion, further research on sentiment classifiers for SE is needed before any detailed comparative studies of this kind can yield reliable results.
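
The sample-then-classify procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `classify` callback and the grouping of documents are assumptions made for illustration, and a real run would plug in Senti4SD where the placeholder classifier appears. The sketch draws a random sample per group (for example per document type or per programming language) and estimates the probability of each sentiment class from the relative frequencies in that sample.

```python
import random
from collections import Counter

# The three sentiment classes named in the abstract.
CLASSES = ("positive", "negative", "neutral")


def estimate_distribution(labels):
    """Return the relative frequency of each sentiment class in `labels`."""
    total = len(labels)
    if total == 0:
        raise ValueError("cannot estimate a distribution from zero labels")
    counts = Counter(labels)
    return {c: counts.get(c, 0) / total for c in CLASSES}


def distributions_by_group(documents, classify, sample_size, seed=0):
    """Sample each group at random, classify the sample, estimate proportions.

    `documents` maps a group name (hypothetical, e.g. a document type) to a
    list of texts. `classify` is any callable returning one of CLASSES; it
    stands in for a real classifier and is an assumption of this sketch.
    """
    rng = random.Random(seed)
    result = {}
    for group, texts in documents.items():
        sample = rng.sample(texts, min(sample_size, len(texts)))
        result[group] = estimate_distribution([classify(t) for t in sample])
    return result


def placeholder_classifier(text):
    """Toy stand-in classifier, NOT Senti4SD: labels every text as neutral."""
    return "neutral"
```

Comparing the per-group dictionaries returned by `distributions_by_group` corresponds to the comparison across document types and programming languages; any bias in the classifier carries over directly into these estimates, which is the reliability concern the abstract raises.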
