Log Frequency Analysis for Anomaly Detection in Cloud Environments

University essay from Blekinge Tekniska Högskola/Institutionen för programvaruteknik

Abstract:

Background: Log analysis has proven highly beneficial for monitoring system behaviour, detecting errors and anomalies, and predicting future trends in systems and applications. However, as these systems and applications continue to evolve, the amount of log data they generate is increasing rapidly, and so is the manual effort invested in log analysis for error detection and root cause analysis. While research on reducing this manual effort is ongoing, this thesis introduces a new approach to automated log analysis, based on the temporal patterns of logs in a particular system environment, which can help reduce manual effort to a great extent.

Objectives: The main objective of this research is to identify temporal patterns in logs using clustering algorithms, extract the outlier logs which do not adhere to any time pattern, and analyse them further to check whether these outlier logs are helpful in detecting errors and identifying their root causes.

Methods: Design Science Research was used to fulfil the objectives of the thesis, as the work required intermediary results and an iterative, responsive approach. The initial part of the thesis consisted of building an artifact that identifies temporal patterns in the logs of different log types using the DBSCAN clustering algorithm. After the patterns were identified and the outlier logs extracted, interviews were conducted in which system experts manually analysed the outlier logs, provided insights on them, and validated the log frequency analysis.

Results: Running the clustering algorithm on logs of different log types produces clusters that represent temporal patterns in most of the files. Some log files show no time patterns, which indicates that not all log types contain logs that adhere to a fixed time pattern. The interviews conducted with system experts on the outlier logs yield promising results, indicating that log frequency analysis is indeed helpful in reducing the manual effort involved in log analysis for error detection and root cause analysis.

Conclusions: The results of the thesis show that most of the logs in the given cloud environment adhere to time frequency patterns, and that analysing these patterns and their outliers leads to easier error detection and root cause analysis in that environment.
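
The idea of clustering logs by time pattern and extracting the outliers can be illustrated with a minimal sketch. The abstract does not say which features the thesis artifact clusters on or which parameters it uses, so everything below is an assumption made for illustration: the feature is the gap in seconds between consecutive log entries, the function names `dbscan_1d` and `outlier_logs` are hypothetical, and the values of `eps` and `min_samples` are arbitrary. The DBSCAN routine is a small pure-Python version for one-dimensional data, not the thesis implementation.

```python
from typing import List, Optional

NOISE = -1  # label for points that belong to no cluster


def dbscan_1d(values: List[float], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN on 1-D values; returns a cluster id per value, or NOISE."""
    n = len(values)
    labels: List[Optional[int]] = [None] * n

    def neighbours(i: int) -> List[int]:
        # All points within eps of values[i], including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nb)
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]


def outlier_logs(timestamps: List[float], eps: float = 2.0,
                 min_samples: int = 3) -> List[int]:
    """Indices of log entries whose gap to the previous entry fits no cluster.

    Assumption: timestamps are sorted and expressed in seconds.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    labels = dbscan_1d(gaps, eps, min_samples)
    # gaps[i] belongs to the entry at index i + 1
    return [i + 1 for i, lab in enumerate(labels) if lab == NOISE]


# A log type that fires roughly every 60 s, with one unexpected entry at t=317.
print(outlier_logs([0, 60, 120, 181, 240, 300, 317, 360, 420]))  # [6, 7]
```

In this example the gaps near 60 seconds form one cluster, while the gaps of 17 and 43 seconds around the unexpected entry fall outside it. Both the unexpected entry and the one that follows it are therefore reported, since each has an irregular gap to its predecessor. A log type whose gaps form no cluster at all would have every entry labelled as noise, which corresponds to the files the abstract describes as having no time pattern.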
