CLASSIFYING TWITTER BOTS: A comparison of methods for classifying whether tweets are written by humans or bots

University essay from Umeå universitet/Institutionen för datavetenskap

Author: Simon Västerbo; [2020]

Abstract: The use of bots to influence public debate and to spread disinformation and spam creates a need for efficient methods for detecting bot usage. This study compares different machine learning methods on the task of classifying whether the author of a tweet is a bot or a human, using tweet-level features. The study looks at how well the methods generalize to unseen data. The methods included in the comparison are Random forest, AdaBoost and the Contextual LSTM model. To compare the models, the area under the receiver operating characteristic curve and average precision are used. Five datasets with tweets from bots are used, and one with tweets from humans. Two tests were used to evaluate performance. In the first test, all but one bot dataset were used during training, and the models were evaluated on the excluded set. In the second test, the models were trained on each dataset separately and evaluated on each dataset separately. In the first test, the differences in performance between the models were very small. The same was true for Random forest and AdaBoost in the second test. The Contextual LSTM model achieved low performance on some combinations of datasets in the second test. The small differences in performance between the models in the first test, and between Random forest and AdaBoost in the second test, make it hard to determine which model is best at the task. When the time required to train and test the models is taken into consideration, Random forest seems to be the most suitable for the task.
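
The first test and the two evaluation measures described in the abstract can be illustrated with a small sketch. It is not the code used in the essay. The function names, the dictionary of bot datasets, and the split of the human tweets into a training part and a test part are assumptions made for illustration, since the abstract does not say how the human data was divided. The average precision function also assumes that no two tweets receive the same score.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Features = List[float]            # hypothetical tweet-level feature vector
Sample = Tuple[Features, int]     # label: 1 = bot, 0 = human


def leave_one_bot_set_out(
    bot_sets: Dict[str, List[Features]],
    human_train: List[Features],
    human_test: List[Features],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield one fold per bot dataset: train on all other bot sets, test on the held-out one.

    Assumption: the human tweets are split in advance into human_train and human_test.
    """
    for held_out in bot_sets:
        train = [(x, 1) for name, xs in bot_sets.items() if name != held_out for x in xs]
        train += [(x, 0) for x in human_train]
        test = [(x, 1) for x in bot_sets[held_out]] + [(x, 0) for x in human_test]
        yield held_out, train, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen bot tweet gets a higher score than a randomly chosen human tweet
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at each rank where a bot tweet is found.
    Assumes there are no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive labels")
    return total / hits
```

In each fold, a classifier such as Random forest or AdaBoost would be fitted on the training samples, and its predicted bot scores for the test samples would be passed to `roc_auc` and `average_precision` together with the true labels.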
