An experimental analysis of Link Prediction methods over Microservices Knowledge Graphs

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Graphs are a powerful way to represent data. They can be seen as a collection of objects (nodes) and the relationships between them (edges or links). The value of this structure lies in the relationships between data points, which can provide even more information than the properties of the data themselves. An important type of graph is the knowledge graph, in which each node and edge has an associated type. Graph data is often incomplete, which can make it impossible to retrieve useful information from it. Link prediction, also known as knowledge graph completion, is the task of inferring whether edges or nodes are missing from a graph. Models of different types, including Machine Learning-based, Rule-based, and Neural Network-based models, have been developed to address this problem. The goal of this research is to understand how link prediction methods perform in a real use-case scenario. To that end, multiple models were compared on different accuracy metrics and production requirements using a microservice tracing dataset. The models were trained and tested on two knowledge graphs obtained from the data: one that takes temporal information into account and one that does not. Moreover, the models' predictions were evaluated both in the way that is usual in the literature and in a way that mimics a real use-case scenario. The comparison showed that overly complex models cannot be used when time, in the training and/or inference phase, is critical. The best model for traditional prediction was RotatE, which usually achieved double the score of the second-best model. In the use-case scenario, RotatE tied with QuatE, which required much more time for training and prediction. Both scored 20% to 40% better than the third-best model, depending on the case. Moreover, most of the models required less than a millisecond to predict a triplet; NodePiece was the fastest, beating ConvE by a 4% margin. In training time, NodePiece beat AnyBURL by 40%. In memory usage, NodePiece was again the best, requiring at least an order of magnitude less memory than most of the other models. RotatE was considered the best model overall because it had the best accuracy and above-average performance on the other requirements. Additionally, a simulation of the integration of RotatE with a dynamic sampling tracing tool was carried out, showing results similar to those previously obtained. Lastly, a thorough analysis of the results and suggestions for future work are presented.
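
To make the link prediction task concrete, the following is a minimal Python sketch of how a RotatE-style model could score and rank candidate links. It is an illustration only, not the code used in the essay: the service names, the "calls" relation, the embedding dimension, and the helper names rotate_score and rank_tails are all hypothetical, and the embeddings are random toy values rather than trained ones. RotatE represents a relation as a rotation in complex space, so a triplet (head, relation, tail) is considered plausible when rotating the head embedding lands close to the tail embedding.

```python
import cmath
import math
import random


def rotate_score(head, phases, tail):
    """RotatE-style score: negative distance between the rotated head and the tail.

    Each relation coordinate is a unit-modulus complex number exp(i * theta),
    so multiplying by it rotates the matching head coordinate.
    """
    return -sum(abs(h * cmath.exp(1j * theta) - t)
                for h, theta, t in zip(head, phases, tail))


def rank_tails(head, phases, candidates):
    """Return candidate tail names sorted from most to least plausible."""
    return sorted(candidates,
                  key=lambda name: rotate_score(head, phases, candidates[name]),
                  reverse=True)


# Toy, untrained embeddings (hypothetical service names, 2 complex dimensions).
rng = random.Random(0)
dim = 2
entities = {
    name: [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(dim)]
    for name in ("gateway", "auth", "billing")
}
calls = [math.pi / 2, math.pi]  # phases of a hypothetical "calls" relation

# Build "orders" as the exact rotation of "gateway", so the triplet
# (gateway, calls, orders) has distance zero and ranks first by construction.
entities["orders"] = [h * cmath.exp(1j * theta)
                      for h, theta in zip(entities["gateway"], calls)]

candidates = {name: emb for name, emb in entities.items() if name != "gateway"}
ranking = rank_tails(entities["gateway"], calls, candidates)
print(ranking)
```

In a trained model the embeddings and phases would be learned from the observed triplets, and the position of the true tail in such a ranking is what rank-based accuracy metrics are typically computed from.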
