Recommender System Using Online Latent Dirichlet Allocation And Wikipedia

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Simon Leijon; [2022]


Abstract: With the vast amount of natural language data widely available today, there is an increased demand for processing, analyzing and exploring large corpora of texts efficiently. One way to explore these corpora is to build a recommender system based on the texts. The most common recommender system of this kind is Google. Two of the most important aspects of creating a recommender system are reducing the dimensionality of the texts that make up the system's database and using an appropriate method to measure similarity between the condensed representations of the texts. This paper proposes a recommender system that reduces dimensionality by means of Online Latent Dirichlet Allocation (Online LDA) and measures similarity using Jensen-Shannon (JS) distance. The final system clusters the entire Wikipedia database and, given an input Wikipedia article, recommends articles from all of Wikipedia. Results show that using Online LDA to reduce dimensionality, cluster and recommend Wikipedia articles works well. Furthermore, the results show that choosing an LDA model with the right number of topics is an important design choice for the performance of the system, while other hyperparameters, such as the Dirichlet parameters, may be less significant. One avenue for further improvement could be filtering articles based on their length or other heuristics.
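The abstract names JS distance as the similarity measure between the reduced (topic-distribution) representations of articles. Below is a minimal sketch in Python of how that measure could rank candidate articles. It assumes each article has already been mapped to a topic-probability vector by a trained LDA model; the Online LDA training step and the essay's clustering step are not shown. The names `js_distance`, `recommend` and the example titles and topic vectors are hypothetical illustrations, not taken from the essay.

```python
import math
from typing import Dict, List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p_i == 0 contribute nothing (the 0 * log 0 = 0 convention).
    Callers must ensure q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance: the square root of the JS divergence.

    Base-2 logarithms keep the result in [0, 1]: 0 for identical topic
    distributions, 1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    # Midpoint distribution M = (P + Q) / 2; M_i > 0 wherever P_i or Q_i > 0,
    # so both KL terms below are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(jsd, 0.0))


def recommend(
    query: str,
    topic_dists: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k articles closest to `query`, by ascending JS distance.

    `topic_dists` maps a (hypothetical) article title to its per-document
    topic distribution, e.g. as inferred by a trained LDA model.
    """
    target = topic_dists[query]
    scored = [
        (title, js_distance(target, dist))
        for title, dist in topic_dists.items()
        if title != query  # never recommend the input article itself
    ]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-topic distributions (made-up numbers for illustration).
    articles = {
        "Jaguar (animal)": [0.80, 0.15, 0.05],
        "Leopard": [0.75, 0.20, 0.05],
        "Jaguar Cars": [0.05, 0.15, 0.80],
        "Ferrari": [0.10, 0.10, 0.80],
    }
    # "Leopard" should rank first, since its topic mix is nearly identical.
    for title, dist in recommend("Jaguar (animal)", articles, k=2):
        print(f"{title}: {dist:.3f}")
```

This sketch compares the query against every article, so each query costs time linear in the number of articles; the clustering the essay applies to Wikipedia is not modeled here. If SciPy is available, `scipy.spatial.distance.jensenshannon(p, q, base=2)` computes the same distance; the pure-Python version simply avoids the dependency.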
