Scalable Streaming Graph Partitioning

University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)

Abstract: Large-scale graph-structured datasets are growing at an increasing rate. Social network graphs are an example of these datasets. Processing large-scale graphstructured datasets are central to many applications ranging from telecommunication to biology and has led to the development of many parallel graph algorithms. Performance of parallel graph algorithms largely depends on how the underlying graph is partitioned. In this work, we focus on studying streaming vertex-cut graph partitioning algorithms where partitioners receive a graph as a stream of vertices and edges and assign partitions to them on their arrival once and for all. Some of these algorithms maintain a state during partitioning. In some cases, the size of the state is so huge that it cannot be kept in a single machine memory. In many real world scenarios, several instances of a streaming graph partitioning algorithm are run simultaneously to improve the system throughput. However, running several instances of a partitioner drops the partitioning quality considerably due to the incomplete information of partitioners. Even frequently sharing states and its combination with buffering mechanisms does not completely solves the problem because of the heavy communication overhead produced by partitioners. In this thesis, we propose an algorithm which tackles the problem of low scalability and performance of existing streaming graph partitioning algorithms by providing an efficient way of sharing states and its combination with windowing mechanism. We compare state-of-the-art streaming graph partitioning algorithms with our proposed solution concerning performance and efficiency. Our solution combines a batch processing method with a shared-state mechanism to achieve both an outstanding performance and a high partitioning quality. Shared state mechanism is used for sharing states of partitioners. We provide a robust implementation of our method in a PowerGraph framework. Furthermore, we empirically evaluate the impact of partitioning quality on how graph algorithms perform in a real cloud environment. The results show that our proposed method outperforms other algorithms in terms of partitioning quality and resource consumption and improves partitioning time considerably. On average our method improves partitioning time by 23%, decreases communication load by 15% and increase memory consumption by only 5% compared to the state-of-the-art streaming graph partitioning.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)