Empirical Comparison Between Conventional and AI-based Automated Unit Test Generation Tools in Java

University essay from Linnéuniversitetet/Institutionen för datavetenskap och medieteknik (DM)

Abstract: Unit testing plays a crucial role in ensuring the quality and reliability of software systems. However, writing unit tests manually is often time-consuming. With recent advances in artificial intelligence (AI), new tools for automated unit test generation have emerged to address this issue. But how do these new AI tools compare to conventional automated unit test generation tools? To answer this question, we compared two state-of-the-art conventional unit test generation tools (EVOSUITE and RANDOOP) with the sole commercially available AI-based unit test generation tool for Java (DIFFBLUE COVER). We ran them on 10 sample classes from 3 real-life projects provided by the Defects4J dataset and evaluated their performance in terms of code coverage, mutation score, and fault detection. EVOSUITE achieved the highest code coverage, averaging 89%, while RANDOOP and DIFFBLUE COVER achieved similar results, averaging 63%. In terms of mutation score, DIFFBLUE COVER had the lowest average score of 40%, while EVOSUITE and RANDOOP scored 67% and 50%, respectively. For fault detection, EVOSUITE and RANDOOP detected more bugs (7 out of 10 and 5 out of 10, respectively) than DIFFBLUE COVER, which found only 4 out of 10. Although the AI-based tool did not lead on any of the three criteria, it still shows promise: it achieved adequate results, in some cases even surpassing the conventional tools, while generating a significantly smaller number of total assertions and more comprehensive tests. Nonetheless, the study acknowledges its limitations: only one AI-based tool was evaluated, and only a small number of projects from Defects4J were used.
