Comparing and contrasting the dissemination cascades of different topics in a social network: What are the lifetimes of different topics and how do they spread

University essay from Linköpings universitet/Institutionen för datavetenskap

Abstract: The web has granted everyone the opportunity to freely share large amounts of data. Individuals, corporations, and communities have made the web an important tool in their arsenal. These entities spread information online, but not all of it is constructive. Some spread misinformation to protect themselves or to attack other entities or ideas on the web. Checking the integrity of all the information online is a complex problem, and an ethical solution would be equally complex. Multiple latent factors decide how a topic spreads, and finding these factors is non-trivial. In this thesis, the patterns of different topics are compared with each other and with the generalized patterns of fake, true, and mixed news, using Latent Dirichlet Allocation (LDA) topic models. We look at how the dissemination of topics can be compared through different metrics, and how these metrics can be calculated from networks related to the data. The analyzed data was collected using the Twitter API and news article scrapers. From this data, custom corpora were built by lemmatizing the text and filtering out unnecessary words and characters. The LDA models were trained on these corpora, making it possible to extract the latent topics of the articles. By plotting the articles according to their most dominant topic, graphs of popularity, size, and other distribution statistics could easily be drawn. From these graphs, the topics could be compared to each other and categorized as fake, true, or mixed news by looking at their patterns and novelty. However, this raised the question of whether it would be ethical to generalize topics in this way. Suppressing or censoring an article because it contains a lot of novel information might hide constructive novelties and violate freedom of speech. Finally, this thesis presents the means for further work, which could involve collecting one large, continuous dataset for a fair and accurate comparison between topics.
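The step of grouping articles by their most dominant topic can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes an LDA model has already produced a list of topic weights for each article, and the function names and example weights are hypothetical. Only the Python standard library is used.

```python
from collections import Counter


def dominant_topic(weights):
    """Return the index of the topic with the largest weight."""
    return max(range(len(weights)), key=lambda k: weights[k])


def group_by_dominant_topic(doc_topic_weights):
    """Count how many articles have each topic as their dominant one."""
    counts = Counter(dominant_topic(w) for w in doc_topic_weights)
    return dict(counts)


# Hypothetical per-article topic weights (one row per article, one
# column per topic), standing in for the output of a fitted LDA model.
doc_topic_weights = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]

topic_sizes = group_by_dominant_topic(doc_topic_weights)
print(topic_sizes)
```

The resulting counts per topic are the kind of size statistic that could then be plotted and compared across topics.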
