Observability of Cloud Native Systems: An industrial case study of system comprehension with Prometheus & knowledge transfer
Abstract:

Background: Acquiring comprehension and observability of software systems is a vital and necessary activity for testing and maintenance; however, these tasks are time-consuming for engineers. At the same time, cloud computing requires microservices to make better use of cloud-native deployment, which also introduces a high degree of complexity. Further, codifying and distributing technical knowledge within the organization has been proven to be vital for both competitiveness and financial performance. However, doing so successfully has proven difficult, and the transition to virtual work and DevOps brings new potential challenges for software firms.

Objective: The objective of this study is to explore how system comprehension of a microservice architecture can be improved using performance metrics through an exploratory data analysis approach. To further enhance the practical business value, the thesis also explores the effects that the transition to virtual work and DevOps has had on knowledge sharing in software firms.

Method: A case study was conducted at Ericsson with performance data generated from testing a system deployed in Kubernetes. Data was extracted with Prometheus, and the performance behavior of four interacting pods was explored with correlation analysis and visualization tools. Furthermore, to explore the effects of virtual work and DevOps on intra-organizational sharing of technical knowledge, semi-structured interviews were cross-analyzed with the literature.

Results: Overall, the performance metrics were highly correlated, with deviations between test cases. We were also able to generate propositions regarding the performance behavior and to identify possible candidates for predictive modeling. Four new potential decisive factors driving the choice of activities and transfer mechanisms for knowledge transfer were identified: accessibility, dynamicity, established processes, and efficiency. The transition to virtual work showed five positive factors and three negative ones. Effects from DevOps were mostly connected to the frequency of sharing and the potential for automation.

Conclusions: Our findings suggest that correlation analysis, when used together with visualization tools, can improve system comprehension of cloud-native systems. While the method shows promise for analyzing individual services and for hypothesis creation, it also showed some drawbacks, which are covered in the discussion. The findings also indicate that performance metrics can be a rich source of knowledge and thus deserve further investigation. Findings also suggest that knowledge sharing is not only considered an important element by academia but is also deliberately practiced by industry agents. Regarding the transition to virtual work and DevOps, the results imply that both affect knowledge transfer, in combination and in isolation. However, the case study findings indicate that the transition to virtual work potentially exerts a larger influence. Interviewees expressed both positive and negative aspects of virtual knowledge sharing, while the positive influences of DevOps were accompanied by extensive challenges.
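The Method section describes extracting performance data with Prometheus and exploring how the metrics of interacting pods correlate. The sketch below illustrates that kind of pairwise correlation analysis under stated assumptions; it is not the study's actual pipeline. It assumes a range-query response in the Prometheus HTTP API's "matrix" format, series identified by a "pod" label, and samples aligned on the same timestamps. The pod names, values, and the 0.8 threshold are hypothetical and chosen only for illustration; they are not data from the case study.

```python
import math
from itertools import combinations

# Hypothetical, synthetic response shaped like a Prometheus range query
# (resultType "matrix"); pod names and values are invented for illustration.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"pod": "frontend"},
             "values": [[0, "1.0"], [15, "2.0"], [30, "3.0"], [45, "4.0"]]},
            {"metric": {"pod": "backend"},
             "values": [[0, "2.0"], [15, "4.1"], [30, "5.9"], [45, "8.0"]]},
            {"metric": {"pod": "cache"},
             "values": [[0, "5.0"], [15, "3.0"], [30, "4.0"], [45, "1.0"]]},
        ],
    },
}


def series_from_matrix(response):
    """Map each series' "pod" label to its list of float sample values.

    Assumes every series shares the same timestamps (no gaps)."""
    data = response["data"]
    if data["resultType"] != "matrix":
        raise ValueError("expected a range-query (matrix) result")
    out = {}
    for series in data["result"]:
        label = series["metric"].get("pod", str(series["metric"]))
        # Each sample is [unix_timestamp, "value as string"]; keep the value.
        out[label] = [float(value) for _ts, value in series["values"]]
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (sx * sy)


def correlation_pairs(series, threshold=0.8):
    """Return all (a, b, r) pairs sorted by |r| (strongest first) and the
    subset whose |r| meets the threshold; NaN results sort last."""
    pairs = [(a, b, pearson(series[a], series[b]))
             for a, b in combinations(sorted(series), 2)]
    pairs.sort(key=lambda p: -1.0 if math.isnan(p[2]) else abs(p[2]),
               reverse=True)
    strong = [p for p in pairs if abs(p[2]) >= threshold]
    return pairs, strong


if __name__ == "__main__":
    metrics = series_from_matrix(SAMPLE_RESPONSE)
    all_pairs, strong_pairs = correlation_pairs(metrics)
    for a, b, r in all_pairs:
        flag = "*" if (a, b, r) in strong_pairs else " "
        print(f"{flag} {a:>8} ~ {b:<8} r = {r:+.3f}")
```

With this synthetic input, "frontend" and "backend" rise together (r close to +1), while "cache" moves against both. Pearson's r captures only linear co-movement, so a strong pair is a starting point for a hypothesis about pod interaction rather than evidence of a causal dependency; that is one reason to pair it with visualization.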