Essays about: "hadoop spark"
Showing result 1 - 5 of 20 essays containing the words hadoop spark.
-
1. Spark on Kubernetes using HopsFS as a backing store : Measuring performance of Spark with HopsFS for storing and retrieving shuffle files while running on Kubernetes
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: Data is a raw collection of facts and details, such as numbers, words, measurements or observations, that is not useful by itself. Data processing is a technique for extracting useful information from such data. Today, the world produces huge amounts of data that cannot be processed using traditional methods.
-
2. Machine Learning for Predictive Maintenance on Wind Turbines : Using SCADA Data and the Apache Hadoop Ecosystem
University essay from Linköpings universitet/Institutionen för datavetenskap
Abstract: This thesis explores how to implement a predictive maintenance system for wind turbines in Apache Spark using SCADA data. It evaluates how to balance and scale the data set, together with the effects of applying the algorithms available in Spark MLlib to the given problem.
-
3. Hudi on Hops : Incremental Processing and Fast Data Ingestion for Hops
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: In the era of big data, data flows in from numerous sources, and many companies use different types of tools to load and process data from these sources in a data lake. A major challenge these companies face today is how to update an existing dataset without reading the entire dataset and overwriting it to accommodate the changes, which hurts performance.
-
4. S3-HopsFS: A Scalable Cloud-native Distributed File System
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: Data has been called the new oil of today's world. Data is generated everywhere, from how you shop online to where you travel. Companies rely on analyzing this data to make informed business decisions and improve their products and services. However, storing this massive amount of data can be very expensive.
-
5. Hive, Spark, Presto for Interactive Queries on Big Data
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: Traditional relational database systems cannot be used efficiently to analyze data with large volume and different formats, i.e., big data. Apache Hadoop is one of the first open-source tools to provide a distributed data storage system and resource manager.