Continuous Parallel Approximate Frequent Elements Queries on Data Streams

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: The frequent elements problem involves processing a stream of elements and finding all elements whose frequency exceeds a given fraction of the stream length. A relaxed version of this problem is the ε-approximate frequent elements problem, which allows some false positives. This thesis aims to solve this problem in a parallel context, where multiple threads work together to speed up computation. Previous research has produced algorithms that can process large streams of data very quickly; however, they divide the input stream equally among the threads in the system, which results in excessive memory usage. The algorithm presented in this thesis, the Delegation Space-Saving algorithm, logically assigns ownership of certain elements to certain threads. This decreases space consumption and increases accuracy. The Delegation Space-Saving algorithm was evaluated on throughput, accuracy, and memory consumption, using both synthetic data with varying skew and real-world network packet data from a backbone router. The Delegation Space-Saving algorithm can use nearly as little memory as the single-threaded version, while achieving several times higher query and update throughput and equivalent accuracy.
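To make the ownership idea concrete, here is a minimal sketch, assuming the classic single-threaded Space-Saving summary (k counters; an unmonitored item evicts the minimum counter and inherits its count) and assuming ownership is decided by hashing an item to one of the threads. The class names `SpaceSaving` and `DelegatedSpaceSaving`, the `hash(item) % num_threads` ownership rule, and the sequential simulation (no real threads or message passing) are all hypothetical illustrations, not the thesis's implementation.

```python
class SpaceSaving:
    """Single-threaded Space-Saving summary with at most `capacity` counters."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts = {}  # item -> estimated count (never below the true count)
        self.errors = {}  # item -> maximum overestimation of that estimate
        self.n = 0        # number of updates this summary has processed

    def update(self, item):
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count (O(k) scan here; a
            # stream-summary structure would make this O(1)).
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.errors.pop(victim)
            # The newcomer inherits the evicted count as its possible error.
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def query(self, phi):
        """Items whose estimated count exceeds phi * n (false positives possible)."""
        threshold = phi * self.n
        return {x for x, c in self.counts.items() if c > threshold}


class DelegatedSpaceSaving:
    """Hypothetical sequential simulation of delegation: every item has exactly
    one owner (assumed here to be hash(item) % num_threads), and all updates to
    that item are routed to the owner's private Space-Saving summary."""

    def __init__(self, num_threads, capacity_per_thread):
        self.shards = [SpaceSaving(capacity_per_thread) for _ in range(num_threads)]

    def owner(self, item):
        return hash(item) % len(self.shards)

    def update(self, item):
        # In a real parallel version, a non-owner thread would hand the item
        # off to its owner instead of counting it locally.
        self.shards[self.owner(item)].update(item)

    def query(self, phi):
        # Each item lives in one shard only, so no cross-shard merging of
        # counts is needed; compare against the global stream length.
        threshold = phi * sum(s.n for s in self.shards)
        result = set()
        for s in self.shards:
            result.update(x for x, c in s.counts.items() if c > threshold)
        return result


if __name__ == "__main__":
    stream = [1] * 50 + [2] * 30 + list(range(100, 120))
    delegated = DelegatedSpaceSaving(num_threads=4, capacity_per_thread=5)
    for x in stream:
        delegated.update(x)
    print(delegated.query(0.2))
```

Because all occurrences of an item reach a single summary, that summary's estimate never undercounts the item's global frequency, so with at least 1/φ counters per owner no truly frequent item is missed. With an equal split of the stream, by contrast, every thread may end up monitoring the same popular items, which is one way to see where the extra memory of that approach comes from.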
