A comparative analysis of database sanitization techniques for privacy-preserving association rule mining

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Association rule hiding (ARH) is the process of modifying a transaction database to prevent sensitive patterns (association rules) from discovery by data miners. An optimal ARH technique successfully hides all sensitive patterns while leaving all nonsensitive patterns public. However, in practice, many ARH algorithms cause some undesirable side effects, such as failing to hide sensitive rules or mistakenly hiding nonsensitive ones. Evaluating the utility of ARH algorithms therefore involves measuring the side effects they cause. There are a wide array of ARH techniques in use, with evolutionary algorithms in particular gaining popularity in recent years. However, previous research in the area has focused on incremental improvement of existing algorithms. No work was found that compares the performance of ARH algorithms without the incentive of promoting a newly suggested algorithm as superior. To fill this research gap, this project compares three ARH algorithms developed between 2019 and 2022—ABC4ARH, VIDPSO, and SA-MDP— using identical and unbiased parameters. The algorithms were run on three real databases and three synthetic ones of various sizes, in each case given four different sets of sensitive rules to hide. Their performance was measured in terms of side effects, runtime, and scalability (i.e., performance on increasing database size). It was found that the performance of the algorithms varied considerably depending on the characteristics of the input data, with no algorithm consistently outperforming others at the task of mitigating side effects. VIDPSO was the most efficient in terms of runtime, while ABC4ARH maintained the most robust performance as the database size increased. However, results matching the quality of those in the papers originally describing each algorithm could not be reproduced, showing a clear need for validating the reproducibility of research before the results can be trusted.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)