FlinkNDB : Guaranteed Data Streaming Using External State

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Apache Flink is a stream processing framework that provides a unified state management mechanism which, at its core, treats stream processing as a sequence of distributed transactions. Flink handles failures, re-scaling and reconfiguration seamlessly via a form of two-phase commit protocol that periodically commits all past side effects consistently into the state backends. This involves invoking and combining checkpoints and, when needed, redistributing the state to resume data pipelines. All the existing Flink state backend implementations, such as RocksDB, are embedded and coupled with the compute nodes. Therefore, recovery time is proportional to the amount of state that must be reconfigured, and it can range from a few seconds to hours. If application logic is compute-heavy and Flink's tasks are overloaded, scaling out the compute pipeline means scaling out storage together with the compute tasks, and vice versa, because the state backends are embedded. It also introduces delays due to expensive state re-shuffling and moving large state over the wire. This thesis proposes decoupling state storage from compute to improve Flink's scalability. It introduces the design and implementation of a new state backend, FlinkNDB, that decouples state storage from compute. Furthermore, we designed and implemented new techniques for snapshotting and failure recovery that reduce the recovery time to close to zero.
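
The contrast drawn in the abstract can be illustrated with a toy model. The sketch below is not Flink's API and not the FlinkNDB design; every class and method name in it (EmbeddedTask, ExternalStore, DecoupledTask, and the "entries moved" counter) is a hypothetical stand-in. It only assumes what the abstract states: with embedded state, recovery has to move state in proportion to its size, while with external state a replacement task can reattach to the store and discard uncommitted writes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EmbeddedTask:
    """Toy task whose keyed state lives inside the compute node."""
    state: dict = field(default_factory=dict)

    def process(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1

    def snapshot(self) -> dict:
        # A checkpoint copies the whole local state out of the node.
        return dict(self.state)

    @staticmethod
    def recover(checkpoint: dict) -> tuple[EmbeddedTask, int]:
        # Recovery copies every entry back into the replacement node,
        # so the cost grows with the size of the state.
        return EmbeddedTask(state=dict(checkpoint)), len(checkpoint)


class ExternalStore:
    """Hypothetical shared store standing in for an external database."""

    def __init__(self) -> None:
        self.committed: dict = {}  # writes covered by the last checkpoint
        self.pending: dict = {}    # writes since the last checkpoint

    def get(self, key: str) -> int:
        return self.pending.get(key, self.committed.get(key, 0))

    def put(self, key: str, value: int) -> None:
        self.pending[key] = value

    def commit(self) -> None:
        # Checkpoint: make all pending writes durable in one step.
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self) -> None:
        # Failure: drop everything written after the last checkpoint.
        self.pending.clear()


@dataclass
class DecoupledTask:
    """Toy task that keeps no state of its own, only a store handle."""
    store: ExternalStore

    def process(self, key: str) -> None:
        self.store.put(key, self.store.get(key) + 1)

    @staticmethod
    def recover(store: ExternalStore) -> tuple[DecoupledTask, int]:
        # The replacement task reattaches to the store; no state is moved.
        store.rollback()
        return DecoupledTask(store), 0
```

In this model, EmbeddedTask.recover reports a number of moved entries equal to the checkpoint size, whereas DecoupledTask.recover always reports zero, because the committed state never left the store. A real system would also have to account for network latency on every state access, which this sketch ignores.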
