Maintaining Stream Data Distribution Over Sliding Window

University essay from Mittuniversitetet/Avdelningen för informationssystem och -teknologi

Abstract: In modern applications, analyzing order statistics over the most recent part of a high-volume, high-velocity data stream is a major challenge. Several online quantile algorithms can maintain a sketch of the data in a sliding window and answer quantile or rank queries in a very short time. However, most of them use the GK algorithm as a subroutine, which is not known to be mergeable. In this paper, we propose another algorithm that maintains a sketch of the order statistics over sliding windows. For fixed-size windows, existing algorithms cannot maintain correctness while the sliding window is being updated. Our algorithm not only maintains correctness but also achieves performance similar to that of the optimal algorithm. While maintaining correctness, its insert time and query time are close to the best results, whereas the other algorithms cannot maintain correctness. In addition to the fixed-size window algorithm, we provide a time-based window algorithm in which the window size varies over time. Finally, we provide a window aggregation algorithm that helps extend our algorithm to distributed systems.
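
To make the problem setting concrete, the following is a minimal Python sketch of rank and quantile queries over a fixed-size sliding window. It is an exact baseline that stores every item in the window. It is not the algorithm proposed in the essay and not the GK algorithm. The class name `ExactSlidingWindowQuantiles` and its methods are hypothetical names chosen for illustration only.

```python
from bisect import bisect_left, bisect_right, insort
from collections import deque


class ExactSlidingWindowQuantiles:
    """Exact baseline (illustrative assumption, not the essay's sketch).

    Keeps every item of the last `window_size` arrivals, so memory is
    linear in the window size. A sketch-based algorithm would trade
    exactness for much smaller memory.
    """

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.arrivals = deque()  # items in arrival order, oldest first
        self.ordered = []        # the same items, kept in sorted order

    def insert(self, value):
        """Add a new item and expire the oldest one if the window is full."""
        self.arrivals.append(value)
        insort(self.ordered, value)
        if len(self.arrivals) > self.window_size:
            expired = self.arrivals.popleft()
            # Remove exactly one copy of the expired item.
            del self.ordered[bisect_left(self.ordered, expired)]

    def rank(self, value):
        """Number of items in the window that are <= value."""
        return bisect_right(self.ordered, value)

    def quantile(self, phi):
        """Item at relative position phi (0 <= phi <= 1) in sorted order."""
        if not self.ordered:
            raise ValueError("window is empty")
        if not 0.0 <= phi <= 1.0:
            raise ValueError("phi must be in [0, 1]")
        index = min(len(self.ordered) - 1, int(phi * len(self.ordered)))
        return self.ordered[index]


window = ExactSlidingWindowQuantiles(window_size=4)
for item in [5, 1, 9, 3, 7, 2]:
    window.insert(item)

# The window now holds the last four arrivals: 9, 3, 7, 2.
print(window.rank(3))        # 2
print(window.quantile(0.5))  # 7
```

Each insertion here costs time linear in the window size because of the sorted-list update, which is the kind of cost that sketch-based sliding-window algorithms aim to avoid.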
