Interactive Topic Modeling for Source Code Analysis

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Patrik Ehrencrona Kjellin; [2017]

Abstract: Making sense of large data sets is becoming a central task in computer science. Topic models, capable of uncovering the semantic themes pervading large collections of documents, have seen a surge in popularity in recent years, with applications in a variety of domains. In this thesis, topic models are applied to source code repositories, specifically for the purpose of concept location: offering an overview of which features are contained within a system, the relationships between such features, and their locality within the system. Topic models are high-level statistical tools; their raw output is given in terms of probability distributions, suited neither for simple interpretation nor for deep analysis. Interpreting an inferred model in an intuitive manner requires significant post-processing and tools suited for such purposes. Additionally, topic models rarely produce perfectly sensible and coherent topics without some level of supervision, so some measure of human interaction is typically required for refining the output. Our objective is to simplify the process of topic modeling as it pertains to source code analysis by addressing the aforementioned issues. First, we implement existing methods of semi-supervised topic modeling, offering users tools for iteratively refining an inferred model. Second, we tightly integrate topic modeling with high-level visual representations of inferred models, capable of capturing the relationships between terms, documents and features related to a source code repository. We have implemented a fully working prototype of such a system. We put the tool in the hands of users and, through a survey, found that the system offers several perceived benefits from a user perspective, both in easily comprehending large-scale repositories and in facilitating the process of topic modeling.
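
To illustrate the kind of post-processing the abstract refers to, the following minimal Python sketch turns raw probability distributions into something a reader can scan: the most probable terms per topic and the dominant topic per source file. It is not the thesis's implementation. The function names (top_terms, dominant_topic), the topic labels, the file paths and all probabilities are hypothetical values invented for this example.

```python
# Toy illustration only: the topics, terms, file paths and probabilities
# below are invented and do not come from the thesis or any real repository.


def top_terms(topic_term_probs, k=3):
    """Return the k most probable terms for each topic, highest first."""
    summary = {}
    for topic, probs in topic_term_probs.items():
        # Sort by descending probability; break ties alphabetically.
        ranked = sorted(probs.items(), key=lambda item: (-item[1], item[0]))
        summary[topic] = [term for term, _ in ranked[:k]]
    return summary


def dominant_topic(doc_topic_probs):
    """Map each document (here, a source file) to its most probable topic."""
    return {
        doc: max(probs, key=probs.get)
        for doc, probs in doc_topic_probs.items()
    }


# Hypothetical topic-term distributions, P(term | topic).
TOPIC_TERMS = {
    "topic_0": {"socket": 0.40, "connect": 0.30, "packet": 0.20, "render": 0.10},
    "topic_1": {"render": 0.45, "pixel": 0.30, "shader": 0.20, "socket": 0.05},
}

# Hypothetical document-topic distributions, P(topic | file).
DOC_TOPICS = {
    "net/client.c": {"topic_0": 0.9, "topic_1": 0.1},
    "gfx/draw.c": {"topic_0": 0.2, "topic_1": 0.8},
}

if __name__ == "__main__":
    print(top_terms(TOPIC_TERMS))
    print(dominant_topic(DOC_TOPICS))
```

A term list such as socket, connect, packet suggests a networking feature, and the file-to-topic mapping hints at where that feature lives in the system. A real tool would obtain the two distributions from an inferred topic model rather than from hard-coded dictionaries.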
