Improving the performance of stream processing pipeline for vehicle data

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: The growing amount of position-dependent data (containing both geo position data (i.e. latitude, longitude) and also vehicle/driver-related information) collected from sensors on vehicles poses a challenge to computer programs to process the aggregate amount of data from many vehicles. While handling this growing amount of data, the computer programs that process this data need to exhibit low latency and high throughput – as otherwise the value of the results of this processing will be reduced. As a solution, big data and cloud computing technologies have been widely adopted by industry. This thesis examines a cloud-based processing pipeline that processes vehicle location data. The system receives real-time vehicle data and processes the data in a streaming fashion. The goal is to improve the performance of this streaming pipeline, mainly with respect to latency and cost. The work began by looking at the current solution using AWS Kinesis and AWS Lambda. A benchmarking environment was created and used to measure the current system’s performance. Additionally, a literature study was conducted to find a processing framework that best meets both industrial and academic requirements. After a comparison, Flink was chosen as the new framework. A new solution was designed to use Fink. Next the performance of the current solution and the new Flink solution were compared using the same benchmarking environment and. The conclusion is that the new Flink solution has 86.2% lower latency while supporting triple the throughput of the current system at almost same cost.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)