Essays about: "SQL on Hadoop"

Showing results 1 - 5 of 8 essays containing the words SQL on Hadoop.

  1. Hudi on Hops : Incremental Processing and Fast Data Ingestion for Hops

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Netsanet Gebretsadkan Kidane; [2019]
    Keywords : Hudi; Hadoop; Hops; Upsert; SQL; Spark; Kafka;

    Abstract : In the era of big data, data floods in from numerous sources, and many companies use a variety of tools to load and process data from these sources into a data lake. A major challenge these companies face today is how to update an existing dataset without reading the entire dataset and overwriting it to accommodate the changes, which hurts performance.

  2. Hive, Spark, Presto for Interactive Queries on Big Data

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Nikita Gureev; [2018]
    Keywords : Hadoop; SQL; interactive analysis; Hive; Spark; Spark SQL; Presto; Big Data;

    Abstract : Traditional relational database systems cannot efficiently analyze data with large volume and varied formats, i.e. big data. Apache Hadoop is one of the first open-source tools to provide a distributed data storage system and resource manager.

  3. Scheduling workflows to optimize for execution time

    University essay from Uppsala universitet/Institutionen för informatik och media

    Author : Mathias Peters; [2018]
    Keywords : Hadoop; Hive; big data; workflows; scheduling; parallelization; automatic dependency identification; dependency graph; SQL; HiveQL;

    Abstract : Many functions in today’s society are immensely dependent on data. Data drives everything from business decisions to self-driving cars to intelligent home assistants like Amazon Echo and Google Home. To make good decisions based on data, of which exabytes are generated every day, somehow that data has to be processed.

  4. Multitenant PrestoDB as a service

    University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)

    Author : Aruna Kumari Yedurupak; [2017]
    Keywords : Hadoop; Presto; SQL; Multi-tenancy; Hops; HopsWorks; Airpal; Proxy servlet;

    Abstract : In recent years, there has been tremendous growth in the volume of data that is produced, stored, and queried by organizations. Organizations spend more money to investigate and extract useful information or knowledge from terabytes and even petabytes of data.

  5. Optimisation of Ad-hoc analysis of an OLAP cube using SparkSQL

    University essay from Uppsala universitet/Avdelningen för beräkningsvetenskap

    Author : Milja Aho; [2017]
    Keywords : ;

    Abstract : An Online Analytical Processing (OLAP) cube is a way to represent a multidimensional database. The multidimensional database often uses a star schema and populates it with the data from a relational database.