Application of Topic Models for Test Case Selection : A comparison of similarity-based selection techniques

University essay from Linköpings universitet/Programvara och system

Abstract: Regression testing is just as important for the quality assurance of a system, as it is time consuming. Several techniques exist with the purpose of lowering the execution times of test suites and provide faster feedback to the developers, examples are ones based on transition-models or string-distances. These techniques are called test case selection (TCS) techniques, and focuses on selecting subsets of the test suite deemed relevant for the modifications made to the system under test. This thesis project focused on evaluating the use of a topic model, latent dirichlet allocation, as a means to create a diverse selection of test cases for coverage of certain test characteristics. The model was tested on authentic data sets from two different companies, where the results were compared against prior work where TCS was performed using similarity-based techniques. Also, the model was tuned and evaluated, using an algorithm based on differential evolution, to increase the model’s stability in terms of inferred topics and topic diversity. The results indicate that the use of the model for test case selection purposes was not as efficient as the other similarity-based selection techniques studied in work prior to thist hesis. In fact, the results show that the selection generated using the model performs similar, in terms of coverage, to a randomly selected subset of the test suite. Tuning of the model does not improve these results, in fact the tuned model performs worse than the other methods in most cases. However, the tuning process results in the model being more stable in terms of inferred latent topics and topic diversity. The performance of the model is believed to be strongly dependent on the characteristics of the underlying data used to train the model, putting emphasis on word frequencies and the overall sizes of the training documents, and implying that this would affect the words’ relevance scoring to the better.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)