Utility of Differentially Private Synthetic Data Generation for High-Dimensional Databases

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Daan Knoors; [2018]


Abstract: When processing data that contains sensitive information, careful consideration must be given to privacy preservation to prevent the disclosure of confidential information. Privacy engineering enables one to extract valuable patterns safely, without compromising anyone’s privacy. Over the last decade, academics have actively sought stronger definitions and methodologies to achieve data privacy while preserving data utility. Differential privacy emerged and became the de facto standard for achieving data privacy, and numerous techniques based on this definition continue to be proposed. One method in particular focuses on the generation of private synthetic databases that mimic the statistical patterns and characteristics of a confidential data source in a privacy-preserving manner. The original data format and utility are preserved in a new database that can be shared and analyzed safely without the risk of privacy violation. However, while this privacy approach sounds promising, there has been little application beyond academic research. Hence, we investigate the potential of private synthetic data generation for real-world applicability. We propose a new utility evaluation framework that provides a unified approach by which various algorithms can be assessed and compared. This framework extends academic evaluation methods by incorporating a user-oriented perspective and varying industry requirements, while also examining performance on real-world use cases. Finally, we implement multiple general-purpose algorithms and evaluate them with our framework to determine the potential of private synthetic data generation beyond the academic domain.
