Measuring Semantic Distances between Software Artifacts to Consolidate Issues from the Development and the Field

University essay from Lund University (Lunds universitet), Department of Computer Science (Institutionen för datavetenskap)

Abstract: Identifying and keeping track of structurally different representations of functionally overlapping issues is important for maintaining a well-kept issue management corpus and for enabling an efficient, organized response when developing software patches that repair these issues and defects. This is normally done through time-consuming manual reviews by teams assigned to the task. In this project we implement a tool based on information retrieval techniques that aims to help these teams make better and faster qualitative assessments. It does this by providing quantitative indications, in the form of similarity scores, of how closely an artifact relates to other artifacts within a given dataset. The approach is inspired by a paper with a similar goal, namely detecting duplicate issue reports. That study found that 60% of all reports marked as duplicates could be found with its implementation of the approach. Achieving similar results would contribute to better and more efficient review processes. We use informal interviews, a qualitative research method, to define the semantic distance metric to implement. In the evaluation we mainly use a qualitative method to assess the tool's accuracy, and we verify our findings with a quantitative method. We also investigate the scalability of the tool with quantitative methods. Owing to the limited scope of this thesis work, the tool in its current state will be of limited use in a live development environment. However, we conclude that the approach has development potential and could yield fruitful results in the field of issue management and maintenance if developed further.
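The abstract does not state which similarity measure the tool uses; the metric itself was defined through the interviews. As a rough illustration of how an information retrieval tool can turn issue text into similarity scores, the sketch below assumes a common choice, TF-IDF term weighting with cosine similarity, and ranks the other artifacts in a dataset against one query artifact. The tokenizer and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_similar`) are hypothetical and do not describe the essay's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # Build one sparse TF-IDF vector (term -> weight) per artifact.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of artifacts that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF so that a term present in every artifact keeps a non-zero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either vector is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_similar(query_index: int, docs: list[str]) -> list[tuple[int, float]]:
    # Score every other artifact against the query and sort by descending similarity.
    vecs = tfidf_vectors(docs)
    q = vecs[query_index]
    scores = [(i, cosine(q, v)) for i, v in enumerate(vecs) if i != query_index]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    issues = [
        "App crashes when saving a file with a long name",
        "Crash on save if the file name is very long",
        "Dark mode colors are wrong in the settings dialog",
    ]
    for index, score in rank_similar(0, issues):
        print(index, round(score, 3))
```

Here the second report shares the terms "file", "name" and "long" with the first and is ranked above the unrelated third report, which scores 0.0. A reviewer would still make the final judgment; the score only points at candidates worth inspecting. Note that plain term matching misses "crashes"/"crash" and "saving"/"save", which is one reason practical tools often add stemming or other normalization.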
