Tracking Online Trend Locations using a Geo-Aware Topic Model

University essay from KTH/School of Computer Science and Communication (CSC)

Abstract: Various topic models have been applied successfully to automatically categorizing massive text corpora. Much work has also been done on applying machine learning and NLP methods to Internet media, such as Twitter, to survey online discussion. However, less attention has been paid to how the geographical locations discussed in online fora evolve over time, and even less to associating such location trends with topics. Can online discussions be tracked geographically over time? This thesis attempts to answer this question by evaluating a geo-aware Streaming Latent Dirichlet Allocation (SLDA) implementation that can recognize location terms in text. We show how the model can predict the time-dependent locations of the 2016 American primaries by automatically discovering election topics in several Twitter corpora and deducing locations over time.
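The abstract does not say how location terms are tied to topics over time. Purely as an illustration, the sketch below shows one simple way to turn topic-labelled, timestamped tweets into a location trend: count location mentions per (time window, topic) and report the most frequent one. It does not implement SLDA. It assumes each tweet already carries a topic ID from some upstream topic model, that time has been bucketed into integer windows, and that location terms are found by exact match against a small hypothetical gazetteer (`GAZETTEER`, with multi-word places pre-joined by underscores). The function name `location_trends` is likewise hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical gazetteer of location terms (an assumption for illustration).
# Multi-word places are assumed to be pre-joined with underscores during tokenization.
GAZETTEER: Set[str] = {"iowa", "new_hampshire", "nevada", "south_carolina"}


def location_trends(
    docs: Iterable[Tuple[int, int, List[str]]],
    gazetteer: Set[str],
) -> Dict[Tuple[int, int], str]:
    """Return the most frequent location term for each (time_window, topic_id).

    Each doc is (time_window, topic_id, tokens). The topic_id is assumed to
    come from an upstream topic model; no topic inference happens here.
    Keys with no location mentions at all are omitted from the result.
    """
    counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for window, topic, tokens in docs:
        for tok in tokens:
            if tok in gazetteer:  # exact-match location recognition
                counts[(window, topic)][tok] += 1
    # Counter.most_common orders equal counts by first occurrence,
    # so ties go to the location seen first in that bucket.
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


if __name__ == "__main__":
    tweets = [
        (0, 0, ["caucus", "tonight", "iowa"]),
        (0, 0, ["iowa", "votes"]),
        (1, 0, ["new_hampshire", "primary"]),
        (1, 1, ["debate", "nevada"]),
    ]
    print(location_trends(tweets, GAZETTEER))
    # {(0, 0): 'iowa', (1, 0): 'new_hampshire', (1, 1): 'nevada'}
```

Reading the output per topic across windows gives a crude location trend; a geo-aware topic model would instead learn topic and location assignments jointly rather than relying on a fixed lookup.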

