TOPIC MODELING FOR ANALYSIS OF PUBLIC DISCOURSE - Enriching topic modeling with linguistic information to analyze Swedish housing policies

University essay from Göteborgs universitet/Institutionen för filosofi, lingvistik och vetenskapsteori

Abstract: This work investigates how topic modeling can be applied to analyze the public discourse on Swedish housing policies. The data used to represent this discourse comes from both the Swedish parliament, the Riksdag, and Swedish news texts. The housing shortage and the current housing crisis in Sweden make this a relevant area of study. Topic modeling is an unsupervised probabilistic method for finding topics in large collections of data. It is a popular method for examining public discourse; however, linguistic information is rarely included in its preprocessing steps. Therefore, this work also investigates what effect linguistically informed preprocessing has on topic modeling.

Three types of linguistic information are selected and investigated: part of speech, dependency relations and lemmatization. Based on these, filters are created for the data. The filters are applied to a test set (a subset of the original data), and a topic model is trained on each filtered version of this test set. The resulting topics from each model are evaluated both by humans and by the computational measures perplexity and semantic coherence, and the results from the respective evaluation methods are compared.

The semantic coherence measure named cv is found to have a higher correlation with human ratings than the npmi coherence. Perplexity is found not to correlate well with human ratings.

Filtering the data based on part of speech is found to improve topic quality the most. Non-lemmatized topics are found to be rated higher than lemmatized topics. Topics from the filters based on dependency relations are found to have low ratings.

Based on the human ratings, an optimal model for each data set is chosen. The selected topic models are applied to the data, and the results are used to exemplify how they can support analysis. Topic modeling is found to be a suitable method for the intended analysis.
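To illustrate what a part-of-speech filter with optional lemmatization might look like, here is a minimal sketch. It is not the essay's implementation: the `Token` type, the `filter_tokens` function, the Universal Dependencies style tags and the choice to keep only nouns and proper nouns are all assumptions made for this example, and the tokens are tagged by hand rather than by a real tagger.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """A pre-tagged token (hypothetical structure for this sketch)."""
    form: str   # surface form as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag (UD-style tags are an assumption)


def filter_tokens(tokens, keep_pos=frozenset({"NOUN", "PROPN"}), lemmatize=False):
    """Keep tokens whose POS tag is in keep_pos.

    Returns lowercased lemmas if lemmatize is True, otherwise
    lowercased surface forms.
    """
    return [
        (tok.lemma if lemmatize else tok.form).lower()
        for tok in tokens
        if tok.pos in keep_pos
    ]


# Hand-tagged example sentence: "Regeringen bygger nya bostäder"
# ("The government builds new homes").
sentence = [
    Token("Regeringen", "regering", "NOUN"),
    Token("bygger", "bygga", "VERB"),
    Token("nya", "ny", "ADJ"),
    Token("bostäder", "bostad", "NOUN"),
]

print(filter_tokens(sentence))                  # ['regeringen', 'bostäder']
print(filter_tokens(sentence, lemmatize=True))  # ['regering', 'bostad']
```

Each filtered token list would then serve as one document's input to a topic model, so that models trained on differently filtered versions of the same test set can be compared.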
