S3DA: A Stream-based Solution for Scalable Data Analysis

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Preechakorn Torruangwatthana; [2017]

Abstract: Data processing frameworks based on cloud platforms are gaining significant attention as solutions to the challenges posed by the 3Vs (Velocity, Volume and Variety) of Big Data. Very large amounts of information are created continuously, giving rise to data streams. This places high demands on the stream processing system, which must be very efficient and cope with massive volumes and fluctuating velocity of data. Existing systems such as Apache Storm, Spark Streaming and Flink rely on messaging systems to handle unreliable data rates, at the cost of additional latency. In contrast, data streams arising from scientific applications are often characterized by huge tuple sizes, and their performance might suffer from the intermediate layer created by the messaging systems. The processing system should be scalable enough to absorb fluctuations in data velocity while maintaining quality of service with low latency and high throughput. It should also be flexible in its deployment so that it works well in fog-computing scenarios, where data is generated and handled close to the scientific infrastructure that produces it. In this thesis, we introduce a framework called HarmonicIO, designed for scientific applications. We show that an optimized data flow and real-time scaling (as seen in HarmonicIO) can reduce the cost per operation while maximizing throughput with low latency.
