NLP-based Failure log Clustering to Enable Batch Log Processing in Industrial DevOps Setting

University essay from Mälardalens universitet/Akademin för innovation, design och teknik

Abstract: The rapid development, updating, and maintenance of industrial software systems have increased the need for testing software artifacts. As test data proliferates, some medium and large companies are forced to automate the test analysis process. The examination of test results can be automated by grouping them into subsets of comparable test outcomes and analyzing each subset as a batch. The first step is to identify a precise and reliable categorization mechanism based on structural similarities and error categories. Because neither the errors nor the number of subgroups is known in advance, the method must not require prior knowledge of the target subsets, which makes clustering an appropriate choice for separating test results. This work presents an approach for grouping test results and accelerating the test analysis process by applying multiple clustering algorithms (K-means, Agglomerative, DBSCAN, Fuzzy C-means, and Spectral) to test results from industrial contexts and comparing their execution time and effectiveness. One of the primary obstacles in this study is that the test results are unstructured text, which necessitates feature selection methods. Consequently, this study employs three distinct approaches to feature selection (TF-IDF, FastText, and BERT). The research was conducted as a series of experiments in a controlled and isolated environment, using test process results from Westermo Technologies AB as part of the AIDOaRT Project, in order to establish an acceptable way of clustering industrial test results. The thesis concludes that K-means and Agglomerative clustering yield the highest performance and evaluation scores, and that K-means is superior in execution time.
In addition, a focus group meeting was organized to examine the results qualitatively. From the perspective of the participating engineers and experts, clustering the results increases the speed of test analysis and decreases the review workload.
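
The general pipeline named in the abstract (turn log text into feature vectors, then cluster without labels) can be illustrated with a minimal sketch. This is not the thesis implementation. The log lines are invented for illustration, the tokenizer (letters only, so that varying numbers are ignored) is an assumption, and only one of the combinations mentioned above (TF-IDF with K-means) is shown. Farthest-point initialization is swapped in for random initialization so the result is deterministic.

```python
import math
import re
from collections import Counter

def tokenize(line):
    # Assumption: keep alphabetic tokens only, so port numbers or codes are ignored.
    return re.findall(r"[a-z]+", line.lower())

def tfidf_vectors(docs):
    # Build L2-normalized TF-IDF vectors with a smoothed IDF term.
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    vocab = sorted(df)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    # Deterministic farthest-point initialization (not the usual random start).
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical failure log lines, invented for this example.
logs = [
    "ERROR timeout waiting for port 3 link up",
    "ERROR timeout waiting for port 7 link up",
    "FAIL assertion expected 200 got 500 in http test",
    "FAIL assertion expected 200 got 404 in http test",
    "ERROR timeout waiting for port 1 link up",
]

labels = kmeans(tfidf_vectors(logs), k=2)
print(labels)  # [0, 0, 1, 1, 0]
```

Here the three timeout failures fall into one cluster and the two assertion failures into the other, so each group could be reviewed as a batch. Note that K-means needs the number of clusters up front, whereas an algorithm such as DBSCAN does not.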
