Using Git Commit History for Change Prediction : An empirical study on the predictive potential of file-level logical coupling

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Author: Anders Hagward; [2015]

Keywords: git; scm; data mining; machine learning;

Abstract: In recent years, a new generation of distributed version control systems have taken the place of the aging centralized ones, with Git arguably being the most popular distributed system today. We investigate the potential of using Git commit history to predict files that are often changed together. Specifically, we look at the rename tracking heuristic found in Git, and the impact it has on prediction performance. By applying a data mining algorithm to five popular GitHub repositories we extract logical coupling – inter-file dependencies not necessarily detectable by static analysis – on which we base our change prediction. In addition, we examine if certain commits are better suited for change prediction than others; we define a bug fix commit as a commit that resolves one or more issues in the associated issue tracking system and compare their prediction performance. While our findings do not reveal any notable differences in prediction performance when disregarding rename information, they suggest that extracting coupling from, and predicting on, bug fix commits in particular could lead to predictions that are both more accurate and numerous.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)