Evaluation and benchmarking of Tachyon as a memory-centric distributed storage system for Apache Hadoop

University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)

Author: Ioannis Kerkinos; [2016]

Abstract: Hadoop was developed as an open-source software framework that initially leveraged the MapReduce programming model to analyse and process large datasets efficiently. At the core of Hadoop is the Hadoop Distributed File System (HDFS), which serves as the default storage layer across the cluster. Hadoop can also be used with other types of storage, with or without HDFS, such as Amazon S3, Windows Azure Storage Blobs, GlusterFS and Tachyon. This thesis focuses on Tachyon, a distributed file system that claims to enable reliable data sharing at memory speed across cluster computing frameworks. We benchmark and evaluate the performance of HDFS with and without Tachyon. To do so, we use two workloads: the TestDFSIO benchmark, which simulates different MapReduce workloads, and an in-production Spark job from Spotify. Tachyon's different write types are also tested and evaluated. To see how cloud solutions compare, we perform the same evaluations of Tachyon over Google Cloud Storage.
