Essays about: "apache spark"

Showing results 1 - 5 of 47 essays containing the words apache spark.

  1. Big Data and Analytics with Driving Data : Implementation and Analysis of Data Pipeline and Data Processing Resources

    University essay from Uppsala universitet/Institutionen för informationsteknologi

    Author : Ivar Blohm; Erik Jarvis; [2023]

    Abstract : This thesis project was conducted in cooperation with Zenseact to investigate the possible use of Google BigQuery and its capabilities for storing and providing insights into large time-series data. An end-to-end data pipeline was built to move data from Zenseact's local servers and ingest it into BigQuery.

  2. Big Data Analytics Using Apache Flink for Cybercrime Forensics on X (formerly known as Twitter)

    University essay from Högskolan i Halmstad/Akademin för informationsteknologi

    Author : Manjunath Kakkepalya Puttaswamy; [2023]
    Keywords : Apache Flink; Apache Spark; Big Data; Twitter; X;

    Abstract : The exponential growth of social media usage has led to massive data sharing, posing challenges for traditional systems in managing and analyzing such vast amounts of data. This surge in data exchange has also resulted in an increase in cyber threats from individuals and criminal groups.

  3. Auto-Tuning Apache Spark Parameters for Processing Large Datasets

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Shidi Zhou; [2023]
    Keywords : Apache Spark; Cloud Environment; Spark Configuration Parameter; Resource Utilization; Ridge Regression; Elastic Net; Random Forest; Deep Neural Network; Bayesian Optimization; Particle Swarm Optimization;

    Abstract : Apache Spark is a popular open-source distributed processing framework that enables efficient processing of large amounts of data. Apache Spark has a large number of configuration parameters that are strongly related to performance. Selecting an optimal configuration for an Apache Spark application deployed in a cloud environment is a complex task.

  4. Resource-efficient and fast Point-in-Time joins for Apache Spark : Optimization of time travel operations for the creation of machine learning training datasets

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Axel Pettersson; [2022]
    Keywords : Apache Spark; Point-in-Time; ASOF; Join; Optimizations; Time travel;

    Abstract : Modern machine learning models are commonly trained on past data in order to make predictions about the future. When working with multiple structured and time-labeled datasets, it has become increasingly common to use a join operator called the Point-in-Time join, or PIT join, to construct these datasets.

  5. A performance study for autoscaling big data analytics containerized applications : Scalability of Apache Spark on Kubernetes

    University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap

    Author : Vinay Kumar Vennu; Sai Ram Yepuru; [2022]
    Keywords : Containers; Container Orchestration; Big data analytics; Autoscaling; Resource Management;

    Abstract : Container technologies are rapidly changing how distributed applications are executed and managed on cloud computing resources. As containers can be deployed on a large scale, there is a tremendous need for Container Orchestration tools like Kubernetes that are highly automatic in deployment, scaling, and management.