Cloud native chaos engineering for IoT systems

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: IoT (Internet of Things) systems implement event-driven architectures that are deployed on an ever-increasing scale as more and more devices (things) become connected to the internet. Consequently, IoT cloud platforms are becoming increasingly distributed and complex as they adapt to handle larger volumes of user requests and device data. The complexity of such systems makes it close to impossible to predict how they will handle the failures that inevitably occur once they are put into production. Chaos engineering, the practice of deliberately injecting faults in production, has been used successfully by many software companies to build confidence that their complex systems are reliable for end users. Nevertheless, its applications in the scope of IoT systems remain largely unexplored in research. Modern IoT cloud platforms are built cloud native with containerized microservices, container orchestration, and other cloud native technologies, much like any other distributed cloud computing system. We therefore investigate cloud native chaos engineering technology and its applications in IoT cloud platforms. We also introduce a framework for getting started with cloud native chaos engineering to verify and improve the resilience of IoT systems, and we evaluate it through a case study at a commercial home appliance manufacturer. The evaluation reveals previously unknown system behavior and leads to the discovery of potential resilience improvements for the case study IoT system. The evaluation also shows three ways to measure the resilience of IoT cloud platforms with respect to perturbations: (1) success rate of user requests, (2) system health, and (3) event traffic.
