Applying machine learning algorithms to multi-label text classification on GitHub issues

University essay from Högskolan i Halmstad

Author: Daniel Artmann; [2020]

Keywords: ;

Abstract: This report compares five machine learning algorithms in their ability to categorize code repositories. The focus of expanding software projects tend to shift from developing new software to the maintenance of the projects. Maintainers can label code repositories to organize the project, but this requires manual labor and time. This report will evaluate how machine learning algorithms perform in automatically classifying code repositories. Automatic classification can aid the management process by reducing both manual labor and human errors. GitHub provides online hosting for both private and public code repositories. In these repositories, users can open issues and assign labels to them, to keep track of bugs, enhancement, or requests. GitHub was used as a source for all data as it contains millions of open-source repositories. The focus was on the most popular labels from GitHub - both default labels and those defined by users. This report investigated the algorithms linear regression (LR), convolutional neural network (CNN), recurrent neural network (RNN), random forest (RF), and k-nearest-neighbor (KNN) - in multi-label text classification. The mentioned algorithms were implemented, trained, and tested with the Keras and Scikit-learn libraries. The training sets contained around 38 thousand rows and the test set around 12 thousand rows. Cross-validation was used to measure the performance of each algorithm. The metrics used to obtain the results were precision, recall, and F1-score. The algorithms were empirically tested on a different number of output labels. In order to maximize the F1-score, different designs of the neural networks and different natural language processing (NLP) methods were evaluated. This was done to see if the algorithms could be used to efficiently organize code repositories. CNN displayed the best scores in all experiments, but LR, RNN, and RF also showed some good results. LR, CNN, and RNN the had the highest F1-scores while RF could achieve a particularly high precision. KNN performed much worse than all other algorithms. The highest F1-score of 46.48% was achieved when using a non-sequential CNN model that used text input with stem words. The highest precision of 89.17% was achieved by RF. It was concluded that LR, CNN, RNN, and RF were all viable in classifying labels in software-related texts, among those found in GitHub issues. KNN wasn't found to be a viable candidate for this purpose.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)