Comparison of State Backends for Modern Stream Processing System

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Mikolaj Robakowski; [2021]


Abstract: Distributed stream processing is a very popular computing paradigm used in various modern computer systems. An important aspect of distributed stream processing systems is how they deal with computation state bigger than the system memory. This is often solved by using a state backend: a database, usually an embedded one, that manages the state on persistent storage. However, this makes the performance of the whole system dependent on the performance of the database under the given workload. Solutions based on log-structured merge-trees (LSM-trees) are commonly used in stream processing systems as one-size-fits-all state backends. We postulate that using different state backends for different workloads yields much better performance. In this work, we implement several state backends for Arcon, a modern stream processing runtime written in Rust and developed at KTH. The thesis goes over the design choices and implementation process of a state backend interface along with several concrete implementations. We experimentally evaluate the implementations against each other and show that under certain workloads some perform better than others. In particular, we show that under read-heavy workloads sled, an embedded Bw-tree-based database written in Rust, outperforms the commonly used LSM-based RocksDB.
