Improving Availability of Stateful Serverless Functions in Apache Flink

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Serverless computing and Function-as-a-Service are rising in popularity due to their ease of use, provided scalability and cost-efficient billing model. One such platform is Apache Flink Stateful Functions. It allows application developers to run serverless functions with state that is persisted using the underlying stream processing engine Apache Flink. Stateful Functions use an embedded RocksDB state backend, where state is stored locally at each worker. One downside of this architecture is that state is lost if a worker fails. To recover, a recent snapshot of the state is fetched from a persistent file system. This can be a costly operation if the size of the state is large. In this thesis, we designed and developed a new decoupled state backend for Apache Flink Stateful Functions, with the goal of increasing availability while measuring potential performance trade-offs. It extends an existing decoupled state backend for Flink, FlinkNDB, to support the operations of Stateful Functions. FlinkNDB stores state in a separate highly available database, RonDB, instead of locally at the worker nodes. This allows for fast recovery as large state does not have to be transferred between nodes. Two new recovery methods were developed, eager and lazy recovery. The results show that lazy recovery can decrease recovery time by up to 60% compared to RocksDB when the state is large. Eager recovery did not provide any recovery time improvements. The measured performance was similar between RocksDB and FlinkNDB. Checkpointing times in FlinkNDB were however longer, which cause short periodic performance degradation. The evaluation of FlinkNDB suggests that decoupled state can be used to improve availability, but that there might be performance deficits included. The proposed solution could thus be a viable option for applications with high requirements of availability and lower performance requirements.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)