Detecting Dissimilarity in Discourse on Social Media

University essay from Uppsala universitet/Matematiska institutionen

Author: Mattias Mineur; [2022]

Keywords: Natural Language Processing

Abstract: A great deal of human interaction takes place on social media, where groups and communities form both intentionally and unintentionally. These interactions generate a large quantity of text data. This project aims to detect dissimilarity in discourse between communities on social media using a distributed approach. A data set of tweets was used to test and evaluate the method. Tweets produced by two communities were extracted from the data set, and two Natural Language Processing techniques were used to vectorise the tweets of each community: LIWC, a dictionary based on knowledge acquired from professionals in linguistics and psychology, and BERT, an embedding model that uses machine learning to represent words and sentences as vectors of decimal numbers. These vectors were then used as representations of the text to measure the similarity of discourse between the communities. Both distance and similarity were measured. It was concluded that none of the combinations of measure and vectorisation method that were tried could detect a dissimilarity in discourse on social media for the sample data set.
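
To illustrate what comparing two communities through their text vectors can look like, here is a minimal sketch. The abstract does not say which distance or similarity measures were used, so cosine similarity and Euclidean distance are assumptions chosen for illustration. The function names and the toy three-dimensional vectors are hypothetical; they stand in for real LIWC or BERT output and are not taken from the essay.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors into one centroid."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical tweet vectors for two communities (illustrative values only).
community_a = [[0.2, 0.1, 0.7], [0.4, 0.1, 0.5]]
community_b = [[0.3, 0.2, 0.5], [0.1, 0.4, 0.5]]

centroid_a = mean_vector(community_a)
centroid_b = mean_vector(community_b)

print("cosine similarity:", cosine_similarity(centroid_a, centroid_b))
print("euclidean distance:", euclidean_distance(centroid_a, centroid_b))
```

Averaging each community into a single centroid is only one possible design; comparing the full distributions of tweet vectors is another, and the abstract does not state which the essay used.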
