Anomaly Detection in Wait Reports and its Relation with Apache Cassandra Statistics

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

Abstract: Background: Apache Cassandra is a highly scalable distributed system that handles large amounts of data across several nodes (virtual machines) grouped together as Apache Cassandra clusters. When a node in such a cluster goes down, a tool or approach is needed that can identify the failed virtual machine by analyzing the data generated by each virtual machine in the cluster. Manual analysis of this data is tedious and strenuous.
Objectives: The objective of the thesis is to identify, build, and evaluate a solution that can detect and report the behaviour of an erroneous or failed virtual machine by analyzing the data generated by each virtual machine in an Apache Cassandra cluster. In the study, we analyzed two specific data sources from each virtual machine, the wait reports and the Apache Cassandra statistics, and proposed a tool named AnoDect to realize this objective. The tool was built using input gathered through interviews with the technical support team at Ericsson, who also evaluated it for reliability, usability, and usefulness in an industrial setting.
Methods: A case study methodology was piloted at Ericsson, and semi-structured interviews were conducted to identify the key features in the data and the functionalities AnoDect needs in order to help the CIL team (the technical support team at Ericsson) rectify the erroneous virtual machine in the cluster. As part of the case study evaluation, an experimental evaluation and a static user evaluation were conducted: the experimental evaluation was conducted to identify the best technique for AnoDect's anomaly detection in wait reports, and the static evaluation was conducted to assess AnoDect's reliability and usability once it is deployed for use.
Results: Feedback from the CIL team, collected through the questionnaire, indicates that the results provided by the tool are satisfactory in terms of usability and reliability.
