Software Issue Time Estimation With Natural Language Processing and Machine Learning

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Time estimation for software issues is crucial to project planning. For decades, developers and experts have tried to estimate the time requirements of issues as accurately as possible, yet the methods in use today are often time-consuming and complex. This thesis investigates whether the time estimation process can be performed with natural language processing and machine learning. Three different word embeddings were used to represent the free-text description: bag-of-words with tf-idf weighting, word2Vec, and fastText. The embeddings were then fed into two types of machine learning approaches, classification and regression. The classification was binary and can be formulated as "will the issue take more than three hours?". The goal of the regression problem was to predict the actual time that the issue would take to complete. The classification models' performance was measured with the F1-score, and the regression model was evaluated with the R2-score. The best F1-score for classification was 0.748, achieved with the word2Vec word embedding and an SVM classifier. The best regression result was achieved with the bag-of-words word embedding, which reached an R2-score of 0.380. Further evaluation of the results and a comparison with actual estimates made by the company show that humans perform only slightly better than the models on the binary classification defined above. The F1-score of the employees was 0.792, a difference of just 0.044 from the best F1-score achieved by the models. This thesis concludes that the models are not good enough to use in a professional setting. An F1-score of 0.748 could be useful in other settings, but the classification question in this problem is too broad to be used for a real project. The regression results are also too low to be of practical use.
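
To make the two evaluation metrics concrete, the following is a minimal sketch of how a binary F1-score and an R2-score can be computed from predicted and actual issue times. It assumes the standard textbook definitions of both metrics; the abstract does not state how the thesis computed them in detail. The function names and the toy data are hypothetical and are not taken from the thesis. Only the three-hour threshold comes from the abstract.

```python
# Illustrative only: standard definitions of F1 and R2 on invented data.

def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Threshold from the abstract's classification question.
THRESHOLD_HOURS = 3.0

# Hypothetical issue times in hours (NOT data from the thesis).
actual_hours = [1.0, 2.5, 4.0, 8.0, 0.5, 6.0]
predicted_hours = [1.5, 3.5, 3.0, 7.0, 1.0, 5.0]

# Classification view: does the issue take more than three hours?
actual_labels = [h > THRESHOLD_HOURS for h in actual_hours]
predicted_labels = [h > THRESHOLD_HOURS for h in predicted_hours]

f1 = f1_score(actual_labels, predicted_labels)
r2 = r2_score(actual_hours, predicted_hours)
print(f"F1 = {f1:.3f}, R2 = {r2:.3f}")
```

On this toy data the same predictions yield quite different scores under the two metrics, since F1 only considers which side of the threshold each issue falls on, whereas R2 measures how much of the variance in the actual times the predictions explain.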
