CONSTRUCTING AND VARYING DATA MODELS FOR UNSUPERVISED ANOMALY DETECTION ON LOG DATA
Data modelling and domain knowledge's impact on anomaly detection and explainability
Abstract: As the complexity of today's systems increases, manual system monitoring and log file analysis are no longer feasible, creating a growing need for automated anomaly detection systems. However, most current research in the domain tends to focus only on the technical details of the frameworks, the evaluation of the algorithms, and how these impact anomaly detection results. In contrast, this study emphasizes how one can approach understanding and modelling the data, and how this impacts anomaly detection performance. Given log data from an education platform application, the data is analysed to form a concept of what is normal with regard to educational course section behaviour. The data is then modelled to capture the dimensions of a course section, and a detection model is created, running a statically tuned k-nearest-neighbours algorithm as classifier, in order to emphasize the impact of the modelling rather than the algorithm. The results showed that single point anomalies could be detected successfully. However, the results were hard to interpret due to a lack of reasons and explainability. This study therefore presents a method of modifying a multidimensional data model to form a detection model with increased explainability. The original model is decomposed into smaller modules by utilizing explicit categorical domain knowledge of the available features. Each module represents a more specific aspect of the whole model, and the results show more explicit coverage of detected point anomalies and a higher degree of explainability of the detection output, in terms of both increased interpretability and increased comprehensibility.
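The abstract itself contains no code; as a minimal sketch of the kind of distance-based scoring a statically tuned k-nearest-neighbours detector performs, the snippet below scores each point by its mean distance to its k nearest neighbours, so that isolated points receive high scores. The two-feature course-section vectors and the choice of Euclidean distance are illustrative assumptions, not taken from the thesis:

```python
import math

def knn_anomaly_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest
    neighbours; higher scores indicate likelier point anomalies."""
    scores = []
    for i, p in enumerate(points):
        # distances from p to every other point, smallest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical course-section feature vectors (e.g. logins, submissions);
# the last section behaves very differently from the rest.
sections = [(10, 5), (11, 6), (9, 5), (10, 4), (50, 40)]
scores = knn_anomaly_scores(sections, k=3)
# the outlying section yields by far the largest score
```

A fixed threshold on these scores would then flag point anomalies; the thesis's modified approach would instead compute such scores per module, each built from one categorical group of features, so that a flagged section can be traced back to the specific aspect that deviated.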