Online inference of topics : Implementation of the topic model Latent Dirichlet Allocation using an online variational bayes inference algorithm to sort news articles
Abstract: The client of the project has problems with complex queries and noisewhen querying their stream of ﬁve million news articles per day. Thisresults in much manual work when sorting and pruning the search result of their query. Instead of using direct text matching, the approachof the project was to use a topic model to describe articles in terms oftopics covered and to use this new information to sort the articles. An online version of the topic model Latent Dirichlet Allocationwas implemented using online variational Bayes inference to handlestreamed data. Using 100 dimensions, topics such as sports and politics emerged during training on a 1.7 million articles big simulatedstream. These topics were used to sort articles based on context. Theimplementation was found accurate enough to be useful for the client aswell as fast and stable enough to be a feasible solution to the problem.
AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)