Pattern Mining in Uncertain Tensors

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Aurélien Coussat; [2019]


Abstract: Data mining is the art of extracting information from data and creating useful knowledge. Itemset mining, or pattern mining, is an important subfield that consists in finding relevant patterns in datasets. We focus on two subproblems: high-utility itemset mining, where a numerical value called utility is associated with every tuple of the dataset, and patterns are extracted whose utilities sum up to a high-enough value; and skypattern mining, which is the extraction of patterns optimizing several measures at once, using the notion of Pareto domination. To tackle both of these challenges, we follow a generic approach based on the piecewise (anti-)monotonicity of measures. This mathematical property is used in multidupehack, an algorithm in which it has proved useful for pruning the search space. Our contributions are implemented as extensions of multidupehack in order to benefit from its powerful pruning strategy. This also allows the extraction of patterns in a broad context: many existing algorithms only handle datasets that are 0/1-matrices, while this work deals with uncertain tensors, i.e. n-dimensional datasets in which the values are numbers between 0 and 1. Experiments on real-life datasets show the efficiency of our approach and its ability to extract semantically highly relevant patterns. Comparative studies on reference datasets demonstrate its competitiveness with state-of-the-art algorithms: despite its greater versatility, it is often faster than its competitors.
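To make the two subproblems concrete, the following is a minimal Python sketch and not the essay's implementation. It assumes that a pattern is one subset of elements per dimension, that its utility is the sum of the utilities of the tuples it covers, and that every measure is to be maximized. The names `pattern_utility`, `dominates` and `skyline`, as well as the toy data, are hypothetical; the brute-force comparison does none of the pruning that multidupehack performs.

```python
from itertools import product


def pattern_utility(utilities, pattern):
    """Sum the utilities of every tuple covered by the pattern.

    `utilities` maps n-tuples to numbers (assumed representation);
    `pattern` holds one set of elements per dimension.
    """
    return sum(utilities.get(t, 0.0) for t in product(*pattern))


def dominates(a, b):
    """Pareto domination: a is at least as good as b on every measure
    and strictly better on at least one (all measures maximized)."""
    pairs = list(zip(a, b))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def skyline(scored):
    """Keep the patterns that no other pattern dominates (naive O(n^2))."""
    return {
        name: measures
        for name, measures in scored.items()
        if not any(dominates(other, measures) for other in scored.values())
    }


# Toy 2-dimensional dataset: a utility for each (row, column) tuple.
utilities = {("a", "x"): 3.0, ("a", "y"): 1.0, ("b", "x"): 2.0, ("b", "y"): 4.0}
total = pattern_utility(utilities, ({"a", "b"}, {"x"}))  # covers (a, x) and (b, x)

# Hypothetical patterns scored on two measures, e.g. (utility, area).
scored = {"P1": (5.0, 2), "P2": (4.0, 3), "P3": (4.0, 2)}
skypatterns = skyline(scored)  # P3 is dominated by both P1 and P2
```

In a high-utility setting, a pattern would be kept when `pattern_utility` reaches a user-chosen threshold; in a skypattern setting, only the entries returned by `skyline` would be reported.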
