Mining Git Repositories : An introduction to repository mining

University essay from Linnéuniversitetet/Institutionen för datavetenskap (DV)

Abstract: When performing an analysis of the evolution of software quality and software metrics,there is a need to get access to as many versions of the source code as possible. There isa lack of research on how data or source code can be extracted from the source controlmanagement system Git. This thesis explores different possibilities to resolve thisproblem. Lately, there has been a boom in usage of the version control system Git. Githubalone hosts about 6,100,000 projects. Some well known projects and organizations thatuse Git are Linux, WordPress, and Facebook. Even with these figures and clients, thereare very few tools able to perform data extraction from Git repositories. A pre-studyshowed that there is a lack of standardization on how to share mining results, and themethods used to obtain them. There are several tools available for older version control systems, such as concurrentversions system (CVS), but few for Git. The examined repository mining applicationsfor Git are either poorly documented; or were built to be very purpose-specific to theproject for which they were designed. This thesis compiles a list of general issues encountered when using repositorymining as a tool for data gathering. A selection of existing repository mining tools wereevaluated towards a set of prerequisite criteria. The end result of this evaluation is thecreation of a new repository mining tool called Doris. This tool also includes a smallcode metrics analysis library to show how it can be extended.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)