University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: This paper investigates the viability of using machine learning (ML) to find lines of code (LOC) that contribute to technical debt (TD), by trying to imitate the static code analysis tool SonarQube. Industry professionals chose the SonarQube rules, and different classifiers were then trained with the help of CCFlex (a tool for training classifiers on lines of code), using SonarQube as an oracle (a source of training sample data) that selects the faulty lines of code. The codebase consisted of a couple of proprietary software solutions provided by Diadrom (a Swedish software consultancy company), along with open source software such as ColourSharp [9]. The classifiers were then analyzed for accuracy by comparing them against the oracle (SonarQube). The results demonstrate that using machine learning algorithms to detect LOC contributing to technical debt is a promising path that should be researched further. Within our chosen training parameters, increasing the percentage of LOC marked by the oracle brought increasingly better recall [7] values, and these values increased more consistently than they did when only the amount of LOC used for training was increased. Furthermore, even though precision is generally low within our parameters (meaning that the number of false positives is high), our classifiers still predicted many of the actually faulty LOC. With all of the training parameters kept in mind, these results open the gates to further exploration of this topic.
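To make the recall and precision terms used in the abstract concrete, the following is a minimal sketch of how per-line predictions could be scored against an oracle. The function name `precision_recall` and the ten example labels are hypothetical and are not taken from the essay, CCFlex, or SonarQube; they only illustrate the standard definitions.

```python
def precision_recall(oracle, predicted):
    """Score per-line predictions against oracle labels.

    Both arguments are equal-length sequences of booleans, one entry per
    line of code, where True means "contributes to technical debt".
    Illustrative helper only; not part of CCFlex or SonarQube.
    """
    if len(oracle) != len(predicted):
        raise ValueError("oracle and predicted must have the same length")
    # True positives: lines flagged by both the oracle and the classifier.
    tp = sum(1 for o, p in zip(oracle, predicted) if o and p)
    # False positives: lines flagged only by the classifier.
    fp = sum(1 for o, p in zip(oracle, predicted) if not o and p)
    # False negatives: lines flagged only by the oracle.
    fn = sum(1 for o, p in zip(oracle, predicted) if o and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for ten lines of code (not data from the essay).
oracle = [True, True, False, False, False, True, False, False, False, False]
predicted = [True, True, True, True, False, True, True, False, False, True]

precision, recall = precision_recall(oracle, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented example the classifier flags every line the oracle flags (high recall) but also flags four lines the oracle does not (low precision), which is the kind of trade-off the abstract describes.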
