Investigating programming language support for fault-tolerance

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Ismail Demirkoparan; [2023]

Keywords: Dataflow; Fault tolerance; Data streaming; Distributed systems; Checkpointing; Logging; Lineage; Lineage stash; Arc-Lang; Apache Flink;

Abstract: Dataflow systems have become the norm for developing data-intensive computing applications. These systems provide transparent scalability and fault tolerance. For fault tolerance, many dataflow-system adopt a snapshotting approach which persists the state of an operator once it has received a snapshot marker on all its input channels. This approach requires channels to be blocked for potentially prolonged durations until all other input channels have received their markers to guarantee that no events from the future make it into the operator’s present state snapshot. Alignment can for this reason have a severe performance impact. In particular, for black-box user-defined operators, the system has no knowledge about how events from different channels affect the operator’s state. Thus, the system must conservatively assume that all events affect the same state and align all channels. In this thesis, we argue that alignment between two channels is unnecessary if messages from those channels are not written to the same output channel. We propose a snapshotting approach for the fault tolerance and call it partial approach. The partial approach does not require alignment when an operator’s input channels are independent. Two input channels are independent if their events do not affect the same state and are never written to the same output channel. We propose the use of static code analysis to identify such dependencies. To enable this analysis, we translate operators into finite state machines that make the operator’s state explicit. As a proof of concept, we extend the implementation of Arc-Lang, an existing dataflow language, so that applications written in it transparently execute with fault tolerance. We evaluate our approach by comparing it to a baseline eager approach that always requires alignment between the input channels. The conducted experiments’ results show that the partial approach performs about 47 % better than the eager approach when the streaming sources are producing data at different velocities.

AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)

Investigating programming language support for fault-tolerance

Searchphrases right now

Popular searches

popular essays yesterday (2024-04-26)