Large-scale Exploratory Text Visualisation

University essay from Linköpings universitet/Medie- och Informationsteknik; Linköpings universitet/Tekniska fakulteten

Abstract: The amount of available text data has increased rapidly in recent years, making it difficult for an everyday user to find relevant information. To address this, NLP and visualisation methods have been developed for extracting valuable information from text and presenting it to the user. The aim of this project is to implement a proof-of-concept visualisation prototype for exploring a large number of Swedish news articles with related metadata, and to investigate the temporal and relational aspects of the data. The project was divided into three major parts. In the first part, sketches of the visualisation were designed and evaluated through user tests. The second part consisted of designing and implementing an NLP pipeline using BERTopic, in which both Dynamic Topic Modeling (DTM) and Hierarchical Topic Modeling (HTM) were used. Some parameters of the pipeline, for instance the choice of a Swedish sentence transformer, were evaluated using evaluation metrics and visual inspection. The final part consisted of implementing and evaluating the visualisation prototype. The project resulted in a web-based visualisation presenting the NLP results in two different views: a top 10 topics view and a hierarchical view containing all topics. The prototype has various features, e.g., clicking and hovering for details-on-demand and options for altering the view. The prototype was then evaluated through an internal case study and user tests. The user tests involved two groups of participants: people working in the journalism field and people working close to the NLP field. Both groups saw more value in the top 10 topics view than in the hierarchical view. Furthermore, the overall quality of the top 10 topics view was considered higher than that of the hierarchical view. In the end, the result of this project is a proof-of-concept visualisation prototype presenting topics of Swedish news articles over time and in relation to each other.
Possible improvements include refining the hierarchical relations between the topics and reducing the run time of the topic model and the prototype. The prototype could also be extended with additional features, e.g., real-time data, a map, the full text of the articles and a search function.
