Evaluating the Trade-offs of Diversity-Based Test Prioritization: An Experiment

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: Different test prioritization techniques detect faults at earlier stages of test execution. To this end, diversity-based techniques (DBT) have been cost-effective by prioritizing the most dissimilar test cases to maintain effectiveness and coverage with lower resources at different stages of the software development life cycle, called levels of testing (LoT). Diversity is measured on static test specifications to convey how different test cases are from one another. However, there is little research on DBT applied to semantic similarities of words within tests. Moreover, diversity has been extensively studied within individual LoT (unit, integration and system), but the trade-offs of such techniques across different levels are not well understood.

Objective and Methodology: This paper aims to reveal relationships between DBT and the LoT, as well as to compare and evaluate the cost-effectiveness and coverage of different diversity measures, namely Jaccard's Index, Levenshtein, Normalized Compression Distance (NCD), and Semantic Similarity (SS). We perform an experiment on the test suites of 7 open-source projects on the unit level, 1 industrial project on the integration level, and 4 industry projects on the system level (where one project is used on both system and integration levels).

Results: Our results show that SS increases test coverage for system-level tests, and the differences in failure detection rate of each diversity measure increase as more prioritised tests execute. In terms of execution time, we report that Jaccard is the fastest, whereas Levenshtein is the slowest and, in some cases, simply infeasible to run. In contrast, Levenshtein detects more failures on integration level, and Jaccard more on system level.

Conclusion: Future work can be done on SS to be implemented on code artefacts, as well as including other DBT in the comparison. Suspected test suite properties that seem to affect DBT performance can be investigated in greater detail.
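
To make the diversity measures named in the abstract concrete, the following is a minimal Python sketch of three of them (Jaccard, Levenshtein and NCD) together with one possible way of ordering tests by dissimilarity. It is an illustration under stated assumptions, not the implementation used in the study: the whitespace tokenisation, the use of zlib as the compressor, the greedy max-min ordering in the hypothetical `prioritize` function, and the example test texts are all choices made here for illustration. Semantic Similarity is left out because it depends on a language model that the abstract does not specify.

```python
import zlib
from typing import Callable, Dict, List


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over whitespace-separated tokens (assumed tokenisation)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 0.0  # two empty specifications are treated as identical
    return 1.0 - len(sa & sb) / len(sa | sb)


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def ncd(a: str, b: str) -> float:
    """Normalized Compression Distance, with zlib as the (assumed) compressor."""
    ca = len(zlib.compress(a.encode("utf-8")))
    cb = len(zlib.compress(b.encode("utf-8")))
    cab = len(zlib.compress((a + b).encode("utf-8")))
    return (cab - min(ca, cb)) / max(ca, cb)


def prioritize(tests: Dict[str, str],
               distance: Callable[[str, str], float]) -> List[str]:
    """Greedy max-min ordering (an assumption, not the study's algorithm):
    repeatedly pick the test whose nearest already-selected test is farthest away."""
    names = list(tests)
    if not names:
        return []
    order = [names[0]]  # seed with the first test; other seeds are possible
    remaining = names[1:]
    while remaining:
        best = max(
            remaining,
            key=lambda n: min(distance(tests[n], tests[s]) for s in order),
        )
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical test specifications, invented for illustration only.
example_tests = {
    "t1": "open login page",
    "t2": "open login page submit",
    "t3": "export report to pdf",
}
example_order = prioritize(example_tests, jaccard_distance)
```

The quadratic cost of the edit-distance table over every pair of tests is consistent with the abstract's report that Levenshtein is the slowest measure, whereas the set operations behind Jaccard stay cheap.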

