Investigating the Practicality of Just-in-time Defect Prediction with Semi-supervised Learning on Industrial Commit Data

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Arsalan Syed; [2019]


Abstract: Two of the challenges in Just-in-time (JIT) defect prediction are building high-performing models and obtaining large quantities of labelled data. Few studies have tested the effectiveness of software defect prediction models in practice. In this thesis, the performance of five notable classification algorithms is investigated when applied to JIT defect prediction. The utility of semi-supervised techniques such as the self-training algorithm is also explored. To test the viability of JIT defect prediction models in practice, a case study was set up at King, a game development company. Finally, to better understand how software developers at King identify and resolve bugs, a series of interviews was conducted. The investigation found that ensemble learning models such as XGBoost can outperform deep learning approaches such as Deeper. The self-training algorithm can be trained on a mix of labelled and unlabelled data and still achieve performance similar to purely supervised approaches. The case study found that although a JIT defect prediction model based on random forests could outperform a random model, there is still a large discrepancy between its cross-validation performance and its performance in practice. The interviews found that developers rely on inspecting builds, manual debugging and version control tools to identify bugs. They also found that risky code tends to depend heavily on other code, is difficult to comprehend and does not follow proper coding practices.
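To make the self-training idea mentioned in the abstract concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a nearest-centroid base learner, a distance-based confidence score, a threshold of 0.8, and two made-up commit features; all of these, and every function name, are hypothetical choices made only for illustration.

```python
from math import dist

def fit_centroids(X, y):
    """Stand-in base learner (an assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids

def predict_with_confidence(centroids, x):
    """Return (label, confidence) for one sample.

    Confidence is 1 - d_best / sum(all distances): higher means the sample
    is closer to the winning centroid relative to the other centroids.
    """
    distances = {label: dist(x, c) for label, c in centroids.items()}
    best = min(distances, key=distances.get)
    total = sum(distances.values())
    confidence = 1.0 - distances[best] / total if total > 0 else 1.0
    return best, confidence

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Self-training loop: repeatedly pseudo-label confident unlabelled samples."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X_lab, y_lab)
        scored = [(x, *predict_with_confidence(model, x)) for x in pool]
        confident = [(x, label) for x, label, conf in scored if conf >= threshold]
        if not confident:
            break  # nothing left that the model is sure about
        for x, label in confident:
            X_lab.append(x)      # pseudo-labelled sample joins the training set
            y_lab.append(label)
        pool = [x for x, _, conf in scored if conf < threshold]
    return fit_centroids(X_lab, y_lab), pool

# Hypothetical features per commit: (lines changed, files touched).
# Label 0 = clean commit, label 1 = defect-inducing commit.
labelled_X = [(0.0, 0.0), (10.0, 10.0)]
labelled_y = [0, 1]
unlabelled_X = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]

model, leftover = self_train(labelled_X, labelled_y, unlabelled_X)
```

In this toy run the two unlabelled commits near a centroid are pseudo-labelled and absorbed into the training set, while the ambiguous one stays in the unlabelled pool. Any classifier that exposes a confidence score could replace the nearest-centroid learner.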
