Visual Debugging of Dataflow Systems

University essay from KTH/School of Information and Communication Technology (ICT)

Author: Fanti Machmount Al Samisti; [2017]


Abstract: Big data processing has become an integral part of data analysis in both live streaming and batch environments. A plethora of tools have been developed to break a problem down into manageable tasks and to allocate both software and hardware resources in a distributed and fault-tolerant manner. Apache Spark is one of the best-known platforms for large-scale cluster computation. At SICS Swedish ICT, Spark runs on top of an in-house developed solution, Hops. HopsWorks provides a graphical user interface to the Hops platform that aims to simplify configuring a Hadoop environment and improving upon it. The user interface includes, among other capabilities, an array of tools for executing distributed applications such as Spark, TensorFlow, and Flink with a variety of input and output sources, e.g. Kafka and HDFS files. The tools currently available for monitoring and instrumenting a stack that includes the aforementioned technologies come from both the corporate and the open-source world. The former are usually part of a larger family of products built on proprietary code. The latter offer a wider variety of choices, but the most prominent ones either trade flexibility for a more generic approach or make it difficult for all but the most experienced users to gain meaningful insight. The contribution of this project is a visualization tool in the form of a web user interface, part of the Hops platform, for understanding, debugging, and ultimately optimizing the resource allocation and performance of dataflow applications. These processes build both on the abstraction provided by the dataflow programming paradigm and on systems concepts such as properties of the data (including its variability), computation, distribution, and other system-wide resources.
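To make the notion of a dataflow application concrete, the sketch below shows a minimal Spark job written in Scala, the kind of program such a visualization tool would instrument. The application name, input path, and word-count logic are illustrative assumptions, not taken from the thesis.

    import org.apache.spark.sql.SparkSession

    object WordCountSketch {
      def main(args: Array[String]): Unit = {
        // Build a Spark session; master and app name are illustrative assumptions.
        val spark = SparkSession.builder()
          .appName("dataflow-sketch")
          .master("local[*]")
          .getOrCreate()

        // Each transformation extends the dataflow graph; nothing executes yet.
        val counts = spark.sparkContext
          .textFile("hdfs:///tmp/input.txt")   // hypothetical HDFS input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        // The action triggers scheduling of the dataflow graph across the cluster,
        // which is where resource allocation and performance become visible.
        counts.take(10).foreach(println)

        spark.stop()
      }
    }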
