Serverless Streaming Graph Analytics with Flink Stateful Functions

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Serverless Function-as-a-Service (FaaS) platforms have become an emerging trend as the cloud-native ecosystem continues to mature. Streaming graph analytics is a well-known research area that demands well-designed computation paradigms and careful optimization of storage and queries. Processing graph streams on serverless platforms is a promising direction, for two reasons. First, serverless platforms treat the Function as a first-class citizen: users can use or extend Functions while caring only about the application layer, and obtain results without knowing the underlying architecture or environment. Second, distributed large-scale graph problems typically call for a message-passing actor model, and a serverless platform can assign one Function instance to each vertex, with its own context; the Functions then evolve their state by passing messages to one another. This style of processing is native to distributed stateful applications and maps naturally onto streaming graph analytics. A temporal graph is a graph that evolves over time. With timestamps on edges, users can retrieve historical graph states, or the graph state within any arbitrary event-time window, for further analysis. Handling temporal graph analytics problems on serverless platforms is the focus of this thesis. Flink Stateful Functions, a recently built API under the umbrella of Apache Flink, simplifies building distributed stateful applications with a runtime for serverless architectures; it fully supports modeling stateful entities with location transparency, concurrency, scaling, and resiliency. This makes it a powerful tool for temporal graph streaming analytics on a serverless platform. In this thesis project, a temporal graph processing library is built on top of Flink Stateful Functions. It supports efficient storage and querying tailored to temporal graph analytics problems.
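
The one-Function-per-vertex idea and the event-time window query described in the abstract can be illustrated with a small sketch. It is plain Python and does not use the Flink Stateful Functions SDK or the thesis library; the names `VertexFunction`, `Runtime`, `add_edge`, and `window_query` are hypothetical and chosen only for illustration. It assumes each vertex keeps its outgoing edges sorted by timestamp, so that a window query becomes two binary searches.

```python
from bisect import bisect_left, bisect_right


class VertexFunction:
    """Hypothetical per-vertex function holding only this vertex's state."""

    def __init__(self, vertex_id):
        self.vertex_id = vertex_id
        self.timestamps = []  # event times of outgoing edges, kept sorted
        self.neighbors = []   # neighbors[i] is the target of the edge at timestamps[i]

    def on_add_edge(self, dst, ts):
        # Insert at the sorted position so out-of-order edges are handled.
        i = bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.neighbors.insert(i, dst)

    def on_window_query(self, start, end):
        # Targets of edges whose timestamp lies in the closed window [start, end].
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return self.neighbors[lo:hi]


class Runtime:
    """Toy stand-in for a runtime that routes messages by vertex id."""

    def __init__(self):
        self.functions = {}

    def send(self, vertex_id, message):
        # Create the function instance lazily on the first message to this id.
        if vertex_id not in self.functions:
            self.functions[vertex_id] = VertexFunction(vertex_id)
        fn = self.functions[vertex_id]

        kind = message[0]
        if kind == "add_edge":
            _, dst, ts = message
            fn.on_add_edge(dst, ts)
            return None
        if kind == "window_query":
            _, start, end = message
            return fn.on_window_query(start, end)
        raise ValueError(f"unknown message kind: {kind}")


runtime = Runtime()
runtime.send("a", ("add_edge", "b", 10))
runtime.send("a", ("add_edge", "c", 30))
runtime.send("a", ("add_edge", "d", 20))  # arrives late, still placed in order
result = runtime.send("a", ("window_query", 15, 30))
```

In a real deployment the routing, state persistence, and scaling would be handled by the platform rather than by a dictionary in one process; the sketch only shows how per-vertex state and timestamped edges make window queries local to a vertex.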
