Experimental Study on Machine Learning with Approximation to Data Streams

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Realtime transferring of data streams enables many data analytics and machine learning applications in the areas of e.g. massive IoT and industrial automation. Big data volume of those streams is a significant burden or overhead not only to the transportation network, but also to the corresponding application servers. Therefore, researchers and scientists focus on reducing the amount of data needed to be transferred via data compressions and approximations. Data compression techniques like lossy compression can significantly reduce data volume with the price of data information loss. Meanwhile, how to do data compression is highly dependent on the corresponding applications. However, when apply the decompressed data in some data analysis application like machine learning, the results may be affected due to the information loss. In this paper, the author did a study on the impact of data compression to the machine learning applications. In particular, from the experimental perspective, it shows the tradeoff among the approximation error bound, compression ratio and the prediction accuracy of multiple machine learning methods. The author believes that, with proper choice, data compression can dramatically reduce the amount of data transferred with limited impact on the machine learning applications.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)