TupleSearch: A scalable framework based on sketches to process and store streaming temporal data for real-time analytics

University essay from Mittuniversitetet/Avdelningen för informationssystem och -teknologi

Abstract: In many fields, there is a need for quick analysis of data. As the number of devices connected to the Internet grows, so does the amount of data generated. The traditional way of analyzing large amounts of data has been batch processing, in which data that has already been collected is processed. This process is time-consuming, which has led to another emerging trend: stream processing, in which data is processed and stored as it arrives. Because of the velocity, volume and variation of the data, stream processing is best carried out in main memory, and having to process and store data as it arrives makes it a considerable challenge. This thesis focuses on developing a framework for processing and storing streaming temporal data so that the data can be analyzed in real time. For this purpose, a server application was created that uses approximate in-memory data synopses, called sketches, to process and store the input data. In addition, a client web application was created to query and analyze the data. The results show that the framework can support simple aggregate queries with constant query time regardless of the volume of data. It can also process data 6.8 times faster than a traditional database system. Together, these results indicate that the system is scalable, although it comes with a trade-off between query error and memory. For a distribution of about 3,000,000 unique items, it was concluded that the framework can provide very accurate answers, with an error rate below 1.1%, for the trendiest data while using about 100 times less space than the actual size of the data set.
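The abstract does not say which sketches TupleSearch uses. To illustrate the general idea of a fixed-memory synopsis with constant-time queries and a query-error-versus-memory trade-off, the following is a minimal Python sketch of a Count-Min sketch, one widely used frequency sketch. The class and method names are hypothetical and not taken from the thesis. In the standard Count-Min parameterization, a width of ceil(e/ε) and a depth of ceil(ln(1/δ)) guarantee that an estimate exceeds the true count by at most ε·N (N being the total count) with probability at least 1 − δ. Shrinking ε improves accuracy but costs more counters.

```python
import hashlib
import math


class CountMinSketch:
    """Hypothetical illustration: approximate frequency counts in fixed memory."""

    def __init__(self, epsilon: float, delta: float) -> None:
        # Width bounds the additive error: estimate <= true + epsilon * total.
        self.width = math.ceil(math.e / epsilon)
        # Depth bounds the probability (at most delta) of exceeding that error.
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by salting the item with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Update cost is O(depth), independent of how many items have been seen.
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Query cost is also O(depth). Collisions only add to counters,
        # so the minimum over rows never underestimates the true count.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


# Usage: a skewed stream with two frequent items and many rare ones.
cms = CountMinSketch(epsilon=0.001, delta=0.01)
stream = ["a"] * 500 + ["b"] * 30 + [f"x{i}" for i in range(10_000)]
for word in stream:
    cms.add(word)

print(cms.width, cms.depth)      # 2719 5
print(cms.estimate("a"))         # at least 500, usually only slightly more
```

The sketch keeps 2719 × 5 counters no matter how long the stream grows, which mirrors the abstract's point that memory and query time stay fixed while accuracy depends on how much memory is allotted. Frequent ("trending") items are estimated with small relative error, whereas rare items suffer proportionally more from collisions.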
