Exploration of using Twitter data to predict Swedish political opinion polls with neural networks

University essay from Lunds universitet/Matematisk statistik

Abstract: This thesis explores the possibility of using deep learning techniques to mine opinions on Twitter, with the objective of predicting the distribution of political opinion in Sweden. Different methods of gathering and annotating training data are evaluated with the aim of achieving accurate and reliable predictions. The models are quite successful at predicting test data, achieving F1-scores in the range of 70 % to 85 %. Some party divisions are found to be more difficult to classify than others. It is hypothesized and validated that the context of a tweet can aid the classification process. In practice, this is done by exploiting the structure of tweet threads. When the models are applied to the general political discussion on Twitter, the predictions are subject to large variance. Different runs can yield wildly different results, so the predictions are deemed not reliable enough to use as input to a regression relating the predicted opinion distributions to the opinion polls. The underlying cause of the large variance is investigated, and the results suggest that the training data set is too small, or of too low quality, which causes the model to overfit and makes patterns hard to recognize. A lexicon-based classification is carried out as a supplement, but no significant relation can be established between the predicted opinion and the opinion polls. Furthermore, the thesis discusses whether the insufficient results might stem from the method itself: the Swedish political discussion might not be polarized enough to allow good classification, or the political discussion on Twitter might not be representative of general opinion at all.
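
For readers unfamiliar with the F1-score quoted above, the following is a minimal sketch of how a per-class and macro-averaged F1 can be computed from true and predicted labels. It is not taken from the thesis: the function names, the "left"/"right" labels, and the choice of macro averaging are assumptions made purely for illustration, and the abstract does not state which averaging was used.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels, for illustration only (not data from the thesis).
y_true = ["left", "left", "right", "right", "left", "right"]
y_pred = ["left", "right", "right", "right", "left", "left"]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro averaging weights every class equally, which can matter when some party divisions have far fewer annotated tweets than others; a support-weighted average would be an equally plausible choice.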
