Multitenant PrestoDB as a service

University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)

Abstract: In recent years, there has been tremendous growth in both the volumes of data that is produced, stored, and queried by organizations. Organizations spend more money to investigate and obtain useful information or knowledge against terabytes and even petabytes of data. Large-scale data analysis is the key functionality provided by Big Data platforms. Previously, data platforms would get the information from unstructured data in the form of files, text, and videos. In recent times, the Hadoop stack has played a vital role in Big Data, becoming the defector open source software used to process and analyze Big Data. Hops is a Hadoop distribution developed by KTH and RISE SICS. Hops modifies the Hadoop stack by moving the meta-data for YARN and HDFS to NDB, an open-source in-memory distributed database. HopsWorks is the User Interface for Hops and provides support for multi-tenant users, as well as self-service, graphical access to frameworks such as Hadoop, Flink, Spark, Kafka, and Kibana. HopsWorks currently does not provide a SQL-on-Hadoop service, although work is ongoing for supporting Hive. Presto is one of the main SQL-on-Hadoop platform, but, currently, Presto does not provide multi-tenancy support for users. This thesis investigates providing multitenancy support to Presto with the help of HopsWorks, including both the security problem and the self-service UI requirements of HopsWorks. Presto is a distributed SQL query Engine which can run SQL queries against up to petabytes of data. As HopsWorks provides UI access to services, we decided to build our UI for Presto on an existing open-source UI for Presto, called Airpal, developed by Airbnb. This provided solution of the thesis divided into two functionalities. First one, maintain two separate Applications (HopsWorks and Airpal Applications) run by the help of two JVMs and maintain ProxyServlet to control traffic between them. Second one HopsWorks-Presto-service leverages HopsWorks access-control (Data owner and Data-scientist) and self-service security model. The evaluation of the thesis used qualitative approach by comparing HopsWorks-PrestoService with standalone PrestoDB and comparing HopsWorks-PrestoService with HopsWorks without Presto-Service.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)