Essays about: "hadoop spark"

Showing results 1 - 5 of 20 essays containing the words hadoop spark.

  1. Spark on Kubernetes using HopsFS as a backing store : Measuring performance of Spark with HopsFS for storing and retrieving shuffle files while running on Kubernetes

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Shivam Saini; [2020]
    Keywords : Spark; Kubernetes; HopsFS; Data processing; Distributed and Parallel processing;

    Abstract : Data is a raw collection of facts and details, such as numbers, words, measurements or observations, that is not useful on its own. Data processing is a technique for extracting useful information from that data. Today, the world produces huge amounts of data that cannot be processed using traditional methods. [...]

  2. Machine Learning for Predictive Maintenance on Wind Turbines : Using SCADA Data and the Apache Hadoop Ecosystem

    University essay from Linköpings universitet/Institutionen för datavetenskap

    Author : John Eriksson; [2020]
    Keywords : Predictive maintenance; machine learning; hadoop; spark; mllib; apache; wind turbine; wind turbines; stacking; bagging; multilayer perceptron; decision tree; random forest;

    Abstract : This thesis explores how to implement a predictive maintenance system for wind turbines in Apache Spark using SCADA data. It evaluates how to balance and scale the data set, together with the effects of applying the algorithms available in Spark MLlib to the given problem. [...]

  3. Hudi on Hops : Incremental Processing and Fast Data Ingestion for Hops

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Netsanet Gebretsadkan Kidane; [2019]
    Keywords : Hudi; Hadoop; Hops; Upsert; SQL; Spark; Kafka;

    Abstract : In the era of big data, data floods in from numerous sources, and many companies use a variety of tools to load and process it in a data lake. A major challenge these companies face today is how to update data in an existing dataset without reading the entire dataset and overwriting it to accommodate the changes, which has a negative impact on performance. [...]

  4. S3-HopsFS: A Scalable Cloud-native Distributed File System

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Joel Stenkvist; [2019]

    Abstract : Data has been regarded as the new oil of the modern world. Data is generated everywhere, from how you shop online to where you travel. Companies rely on analyzing this data to make informed business decisions and improve their products and services. However, storing this massive amount of data can be very expensive. [...]

  5. Hive, Spark, Presto for Interactive Queries on Big Data

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Nikita Gureev; [2018]
    Keywords : Hadoop; SQL; interactive analysis; Hive; Spark; Spark SQL; Presto; Big Data;

    Abstract : Traditional relational database systems cannot be used efficiently to analyze data with large volume and varied formats, i.e. big data. Apache Hadoop is one of the first open-source tools that provides a distributed data storage system and resource manager. [...]