Machine Learning to Uncover Correlations Between Software Code Changes and Test Results

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Author: Negar Fazeli; [2017-12-05]


Abstract: Statistics show that many large software companies, particularly those dealing with large-scale legacy systems, ultimately face an ever-growing code base. As the product grows, it becomes increasingly difficult to adequately test new changes in the code and maintain quality at low cost without running a large number of test cases [1, 2, 3]. A common problem with such products is that thoroughly testing changes to the source code can become prohibitively time-consuming, while ad hoc testing of the product by designers and testers can miss bugs and errors that are detrimental to the quality of the end product. In this thesis we set out to address this problem and investigate the possibility of using machine learning to make testing procedures more economical. To this end, the goal of this thesis is to create a test execution model that uses supervised machine learning techniques to predict potential points of failure in a set of tests, thereby reducing the number of test cases that need to be executed in order to test changes in the code. Automatic testing and test selection have been thoroughly investigated before; the state-of-the-art algorithms proposed for this purpose, however, rely on detailed data including, for example, the amount of change made in each code module, the importance of the modules, and the structure of the tests. In this thesis we do not have access to such data. We therefore investigate whether well-established machine learning techniques can perform intelligent test selection using the data that is available, and whether satisfactory results can be achieved with the information this data provides.
If the results are not satisfactory, this can provide guidelines on how to modify the logging of changes made to the modules and the reporting of test results so as to better facilitate the use of available machine learning techniques. This work is a case study conducted at a large telecom company, more specifically on the Session Border Gateway (SBG) product, which is a node within an IP Multimedia Subsystem (IMS) solution. The model is trained on data extracted from the SBG code base and from nightly builds between October 1st 2014 and August 31st 2015. Having collected the necessary data, we design relevant features based on the available information and on interviews with the experts working on the product and its testing. We then use logistic regression and random forest algorithms to train the models, or predictors, for the test cases. One of the benefits of this work is increasing the quality and maintainability of the SBG software by creating faster feedback loops, resulting in cost savings and higher customer satisfaction [4]. We believe this research can be of interest to anyone in the design organization of a large software company.
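The general shape of the approach described above can be sketched in a few lines. This is an illustrative sketch only: the thesis' actual features come from the SBG code base and nightly-build logs, which are not reproduced here, so the sketch uses synthetic data with hypothetical features (number of files changed, lines changed, days since the test last failed) to show how a per-test-case pass/fail predictor could be trained with the two algorithms the abstract names.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000  # one row per nightly build

# Hypothetical per-build features for a single test case (not the thesis' real features).
X = np.column_stack([
    rng.integers(0, 50, n),    # files changed in the build
    rng.integers(0, 2000, n),  # lines changed in the build
    rng.integers(0, 30, n),    # days since this test last failed
])
# Synthetic label: pretend the test tends to fail when a large change
# lands while the test has failed recently.
y = ((X[:, 1] > 1200) & (X[:, 2] < 10)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out builds
print(scores)
```

In practice one such predictor would be trained per test case, and only the tests predicted likely to fail would be executed for a given change.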
