Automated Duplicate Bug Reports Detection - An Experiment at Axis Communication AB

University essay from

Abstract: Context. Bug tracking systems play an important role in software maintenance by allowing users to submit bug reports. However, a newly submitted report is often a duplicate: when several users report the same problem, the reports are called duplicate bug reports, and a bug tracking system can accumulate a considerable number of them. Automating duplicate bug report detection can increase the productivity of software maintenance, because each incoming bug report is compared directly with the existing reports to identify similar ones, so no human has to spend time reading, understanding, and searching. Although there has been considerable recent research on such solutions, there is still much room for improvement in accuracy and recall rate during duplicate detection. Moreover, very few tools have been evaluated in an industrial setting.
Objectives. In this study, we first aim to characterize automated duplicate bug report detection methods by exploring the categories of these methods, identifying the evaluation methods proposed for them, and specifying the performance differences between the categories. We then propose a method that leverages recent advances in semantic modeling, namely Doc2vec, and present its overall framework: preprocessing, training a semantic model, calculating and ranking similarity, and retrieving duplicate bug reports. Finally, we conduct an experiment to evaluate the performance of the proposed method and compare it with the selected best methods for duplicate bug report detection.
Methods. To classify and analyze the existing research on automated duplicate bug report detection, we conducted a systematic mapping study. To evaluate our proposed method, we conducted an experiment with an identified number of bug reports from the internal bug report database of Axis Communication AB.
Results. We classified automated duplicate bug report detection techniques into three categories: the Top-N recommendation and ranking approach, the binary classification approach, and the decision-making approach. We found that recall-rate@k is the most common evaluation metric and that the Top-N recommendation and ranking approach has the best performance among the identified approaches. The experimental results showed that the recall rate of our proposed approach is significantly higher than that of the combination of TF-IDF with Word2vec and the combination of TF-IDF with LSI. Our combination of Doc2vec and TF-IDF has a recall-rate@k of 18.66% to 42.88% for k = 1 to 10 on the TROUBLE data, which is an improvement of 1.63% to 9.42% over the state of the art.
Conclusions. In this thesis, we identified and classified 44 research papers on automated duplicate bug report detection by conducting a systematic mapping study. We provide an overview of the state of the art, identifying evaluation metrics, investigating the scientific evidence in the reported results, and identifying needs for future research. We implemented a bug tracking system with a duplicate bug report detection module that creates a list of Top-N related bug reports, each with a numerical similarity score. After conducting the experiment, we found that our proposed approach, the combination of Doc2vec and TF-IDF, produces the best recall rate.
Keywords: Similar
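The recall-rate@k metric named in the abstract can be illustrated with a minimal sketch. It assumes the definition commonly used in the duplicate detection literature: the fraction of duplicate reports for which at least one true duplicate appears among the top k recommended reports. The abstract itself does not spell out the formula, so this definition, the function name `recall_rate_at_k`, and the toy report IDs are assumptions for illustration, not the thesis implementation.

```python
from typing import Dict, List, Set


def recall_rate_at_k(
    ranked: Dict[str, List[str]],
    duplicates: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of duplicate reports with a true duplicate in their top-k list.

    ranked:     query report ID -> candidate report IDs, most similar first
    duplicates: query report ID -> IDs of its known duplicates (ground truth)
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Only reports that actually have a known duplicate count as queries.
    queries = [q for q in duplicates if duplicates[q]]
    if not queries:
        return 0.0
    hits = sum(
        1 for q in queries
        if duplicates[q] & set(ranked.get(q, [])[:k])
    )
    return hits / len(queries)


# Hypothetical toy data: three duplicate reports and their ranked candidates.
ranked = {
    "BUG-10": ["BUG-3", "BUG-7", "BUG-1"],
    "BUG-11": ["BUG-2", "BUG-5", "BUG-4"],
    "BUG-12": ["BUG-9", "BUG-8", "BUG-6"],
}
duplicates = {
    "BUG-10": {"BUG-3"},  # found at rank 1
    "BUG-11": {"BUG-4"},  # found at rank 3
    "BUG-12": {"BUG-0"},  # never retrieved
}

for k in (1, 3):
    print(f"recall-rate@{k} = {recall_rate_at_k(ranked, duplicates, k):.2f}")
```

On this toy data, one of the three queries is answered at k = 1 and two of three at k = 3, which is why the metric can only stay the same or grow as k increases.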
