Streaming Graph Partitioning with Graph Convolutional Networks

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Michal Zwolak; [2020]

Keywords: ;

Abstract: In this work, we present a novel approach to the streaming graph partitioning problem which handles unbounded streams.Graph partitioning is a process of dividing a graph into groups of nodes or edges. Traditional, offline partitioning methods require a priori access to the entire graph and make multiple passes over the data in order to compute partitions. However, recently the demand for real-time analysis of graph data sparked the prospect of online partitioning. In such an approach, the graph arrives as a stream of nodes or edges which are assigned to partitions as they come and are never reassigned again. Additionally, in the case of modern systems, where graphs constantly grow, the streams are unbounded. The main goals of graph partitioning are preserving data locality, so related items belong to the same partitions, and load balance, so partitions have similar sizes.State-of-the-art streaming graph partitioning algorithms fulfil the two latter requirements. However, they make their partitioning decisions based on internal state, which grows as new items arrive. Thus, they are not capable of processing unbounded streams. At some point, the state will exceed the memory capacity of the machine the algorithm is running on. Moreover, modern stream data processors run in a distributed environment. In such a setting synchronisation of a shared state is an expensive operation.In the proposed approach, in addition to structural information about the graph, we utilise attributes associated with vertices such as user’s location, age, or previous actions. In order to do that, we employ a graph convolutional network (GCN), which is a novel method of graph representation learning. A GCN can embed both structural and feature-based characteristics of each vertex into a low-dimensional space. Secondly, we feed these representations into a neural network, which assigns incoming items to partitions. Such a method requires only the networks’ parameters’ values in order to make a partitioning decision. Thus, the size of the state remains constant regardless of the length of the stream.We present both unsupervised and supervised approaches to train the proposed framework. Moreover, we describe a method to apply the models to partition the streaming graph. We evaluate the performance of our novel method on three real-world graph datasets and compare it with the state-of-the-art HDRF algorithm as well as a simple, stateless hash-based approach. The experimental results show the generalisation capabilities of our models. Moreover, our methods can yield up to 16% lower replication factor than hash partitioning which corresponds to only 1% increase compared to HDRF. At thesame time, we reduce state requirements from linear to constant, which for the graph with 230k vertices and 5.7M edges translates to 125 times smaller size of the state and allows for processing unbounded streams. Nevertheless, the latency of our methods is about 20 times higher than HDRF.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)