Comparison of lossy and lossless compression algorithms for time series data in the Internet of Vehicles

University essay from Linköpings universitet/Institutionen för datavetenskap

Abstract: As automotive development advances, connectivity features are continually added to vehicles, which together form an Internet of Vehicles. For numerous reasons, it is vital for vehicle manufacturers to collect telemetry from their fleets. However, the volume of generated data is too large to be feasibly transmitted to a server, due to the CPU and memory limitations of embedded hardware and the monetary cost of cellular network usage. The purpose of this thesis is thus to investigate how these issues can be alleviated by real-time compression of time series data before off-board transmission. A hybrid approach is proposed that performs quickly and effectively on a variety of time series exhibiting varying numerical data features, all while limiting the maximum reconstruction error to a user-specified absolute value. We first perform a literature review to identify state-of-the-art time series compression algorithms that run online and provide max-error guarantees. We then choose a subset of lossless and lossy algorithms, which are implemented and benchmarked with regard to their compression ratio, resource usage, and reconstruction error on time series that exhibit a variety of data features. Finally, we investigate whether running a lossy and a lossless algorithm in succession further increases the compression ratio. The literature review identifies a diverse range of compression algorithms. Out of these, Poor Man's Compression - MidRange (PMC-MR) and Swing filter are selected as lossy algorithms, and Run-length Binary Encoding (RLBE) and Gorilla are selected as lossless algorithms. The experiments yield positive results for the lossy algorithms, each of which excels on different data sets. They achieve compression ratios between 22.0% and 99.5%, depending on the data set, while limiting the max-error to 1%. In contrast, Gorilla achieves compression ratios between 66.6% and 83.7%, outperforming RLBE in nearly all aspects. Moreover, we conclude that losslessly compressing the output of a lossy algorithm strictly improves the compression ratio. When combining either PMC-MR or Swing filter with Gorilla, we achieve compression ratios between 83.1% and 99.6% across a variety of time series, with a maximum error for any given data point of 1%.
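
To illustrate what a max-error guarantee means for an online lossy algorithm, the following is a minimal sketch in the spirit of PMC-MR, not the thesis implementation. It assumes an absolute error bound and regularly sampled values, and it stores each segment as a (length, midrange) pair; the function names pmc_mr and reconstruct, the parameter max_error, and the segment encoding are hypothetical choices made for this example.

```python
from typing import Iterable, List, Tuple


def pmc_mr(values: Iterable[float], max_error: float) -> List[Tuple[int, float]]:
    """Piecewise-constant compression with an absolute error bound.

    A segment grows while (max - min) of its points stays within
    2 * max_error, so the midrange is within max_error of every point.
    Returns a list of (segment_length, midrange) pairs (assumed encoding).
    """
    segments: List[Tuple[int, float]] = []
    lo = hi = 0.0
    count = 0
    for v in values:
        if count == 0:
            # Start the first segment with the first value.
            lo = hi = v
            count = 1
            continue
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo > 2 * max_error:
            # Adding v would break the bound: close the current segment.
            segments.append((count, (lo + hi) / 2))
            lo = hi = v
            count = 1
        else:
            lo, hi = new_lo, new_hi
            count += 1
    if count:
        # Flush the last open segment.
        segments.append((count, (lo + hi) / 2))
    return segments


def reconstruct(segments: List[Tuple[int, float]]) -> List[float]:
    """Expand (length, midrange) pairs back into a value sequence."""
    out: List[float] = []
    for length, mid in segments:
        out.extend([mid] * length)
    return out


data = [1.0, 1.1, 0.9, 5.0, 5.2, 5.1]
segments = pmc_mr(data, max_error=0.2)
restored = reconstruct(segments)
```

Because the sketch processes one value at a time and keeps only the running minimum, maximum, and count, it needs constant memory per series, which is the property that makes this family of algorithms attractive on embedded hardware. The resulting segment values could then be passed to a lossless encoder, as in the combined approach described in the abstract.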
