Modeling topics and semantic similarity with data from YouTube community discussions

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Jonas Olsson; [2021]


Abstract: Climate change is generally considered one of the most prominent issues of the 21st century, and resources from many fields of science are devoted to researching the topic and finding solutions. Computer science and social science are two fields that have combined over the last ten years to study the topic. Furthermore, various machine learning methods are becoming more prevalent in fields outside computer science and are often employed in social science. In this thesis, the junction of network science, natural language processing and climate change is explored. A bespoke dataset has been collected through the YouTube API, consisting of approximately 300,000 comment interactions made by over 150,000 users on the topic of climate change between 2015 and 2020. A directed multigraph, where nodes are users and edges are comment interactions, was constructed from the data. The topology of the network was further analysed with the popular Leiden community detection algorithm to find subsets of users in the network that have many edges within their own community but few edges to other communities. These subsets of users and the comments they generated are then used as the basis for finding differences in topics between the largest communities. This analysis is first based on an unsupervised approach via structural topic modeling. The results from this analysis showed that while the identified topics are highly related to climate change, it is difficult to point to useful differences in topics between communities. As the unsupervised technique was lackluster, a supervised approach was also used that looks at the semantic similarity between communities. The approach taken uses a predefined vocabulary of climate change terms as a reference, and the community comments are ranked against it. The results show that this approach can be useful to rank how well the comments made by a set of users stay on topic.
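
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy interactions, the hard-coded community assignment (standing in for the output of a community detection step such as Leiden, which is not run here), the reference vocabulary, and the choice of bag-of-words cosine similarity as the ranking score. The abstract does not state which similarity measure the thesis uses, so this should not be read as the author's method.

```python
from collections import Counter
import math
import re

# Hypothetical toy data: (commenting user, user replied to, comment text).
interactions = [
    ("alice", "bob", "Carbon emissions drive global warming"),
    ("bob", "alice", "Emissions and warming are linked to fossil fuels"),
    ("alice", "bob", "Renewable energy cuts carbon emissions"),
    ("carol", "dave", "Did anyone watch the football game last night"),
    ("dave", "carol", "The game was great and the weather was nice"),
]

# Hypothetical community labels, standing in for the result of a
# community detection algorithm (not computed in this sketch).
community_of = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}

# Hypothetical reference vocabulary of climate change terms.
climate_vocabulary = ["carbon", "emissions", "warming", "climate", "renewable", "fossil"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Directed multigraph stored as a multiset of (source, target) edges:
# repeated interactions between the same pair are counted, not merged.
edge_counts = Counter((src, dst) for src, dst, _ in interactions)

# Pool the term counts of all comments written by each community's members.
community_terms = {}
for src, _, text in interactions:
    community_terms.setdefault(community_of[src], Counter()).update(tokenize(text))

# Score each community against the reference vocabulary and rank them.
reference = Counter(climate_vocabulary)
scores = {c: cosine(terms, reference) for c, terms in community_terms.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setup, `ranking` lists community labels from most to least on-topic relative to the assumed vocabulary. A real analysis would replace the hard-coded labels with detected communities and would likely use a richer text representation than raw term counts.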
