Essays about: "Dataprocessering"
Found 4 essays containing the word Dataprocessering.
-
1. A Comparative Study on Efficiency and Scalability of Integer and String Datasets in cuDF and pandas
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: This thesis presents a comparative analysis of cuDF and pandas, two Python data processing libraries, with a focus on performance, limitations, and scalability when handling integer and string datasets. The study aims to assess the efficiency and suitability of cuDF as a potential alternative to pandas in scenarios where high-performance data processing is required.
-
2. Highly Available Task Scheduling in Distinctly Branched Directed Acyclic Graphs
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: Big data processing frameworks that distribute and parallelize computation over datasets have become a staple of data engineering and data science pipelines. One of the better-known frameworks is Dask, a widely used distributed framework for parallelizing data processing jobs.
-
3. Scaling cloud-native Apache Spark on Kubernetes for workloads in external storages
University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)
Abstract: CERN Scalable Analytics Section currently offers shared YARN clusters to its users as monitoring, security and experiment operations. YARN clusters with data in HDFS are difficult to provision, complex to manage and resize. This imposes new data and operational challenges to satisfy future physics data processing requirements.
-
4. Integrating Pig and Stratosphere
University essay from KTH/Skolan för informations- och kommunikationsteknik (ICT)
Abstract: MapReduce is a widespread programming model for processing large amounts of data in parallel. PACT is a generalization of MapReduce, based on the concept of Parallelization Contracts (PACTs). Writing efficient applications in MapReduce or PACT requires strong programming skills and an in-depth understanding of the systems’ architectures.