Observability and Chaos Engineering for System Calls in Containerized Applications
Abstract: Chaos engineering tests the resilience of systems in production to see whether they perform as expected under changing conditions. As container usage becomes more common, applying the principles of chaos engineering to containers becomes important. In this thesis we investigate something that every containerized application uses: system calls. We perturb them with tuples (s, e, d), each consisting of a system call, an error code, and a delay. The perturbations cover 9 system calls, 7 error codes, and 3 possible delays. The targets of these perturbations are mainly containerized HTTP-based applications. Perturbation also requires observability, so monitoring was created for system call, HTTP, and resource-based metrics. For this purpose an application called ChaosOrca was developed, with support for both monitoring and system call perturbations on containers. For the nine system calls and four applications evaluated, we find that perturbations of the following system calls had an effect on at least one application: open, poll, read, readv, select, sendfile64, write, and writev. The only perturbation that resulted in a crash was the perturbation of the select system call with an error code. Furthermore, we find that the collected metrics are sufficient to reason about system behavior; a network-protocol-specific metric is useful but not always necessary.
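To make the perturbation tuple (s, e, d) concrete, the following is a minimal sketch of how such a perturbation space might be represented and enumerated. It is not ChaosOrca's implementation. The names `Perturbation` and `enumerate_perturbations` are hypothetical, the error codes and delays are placeholder values (the abstract does not list them), and combining every system call with every error code and delay is an assumption about how the space is built.

```python
import errno
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Perturbation:
    syscall: str    # s: name of the system call to perturb
    error: int      # e: errno value returned instead of the real result
    delay_ms: int   # d: delay added to the call, in milliseconds


# The eight system calls named in the abstract (the ninth is not named there).
SYSCALLS = ["open", "poll", "read", "readv", "select", "sendfile64", "write", "writev"]

# Assumed placeholder values, NOT the thesis's actual 7 error codes or 3 delays.
ERRORS = [errno.EACCES, errno.EIO]
DELAYS_MS = [0, 500]


def enumerate_perturbations(syscalls, errors, delays_ms):
    """Return every (s, e, d) combination as a list of Perturbation objects."""
    return [Perturbation(s, e, d) for s, e, d in product(syscalls, errors, delays_ms)]


perturbations = enumerate_perturbations(SYSCALLS, ERRORS, DELAYS_MS)
```

With these placeholder lists the sketch yields 8 × 2 × 2 = 32 tuples. If the full sets were combined exhaustively in the same way, 9 system calls, 7 error codes, and 3 delays would give 9 × 7 × 3 = 189 tuples, although the abstract does not state that every combination was evaluated.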