Machine Learning with Reconfigurable Privacy on Resource-Limited Edge Computing Devices

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Distributed computing allows effective data storage, processing and retrieval, but it poses security and privacy issues. Sensors are the cornerstone of IoT-based pipelines, since they continuously capture data before it is analyzed at central cloud resources. However, these sensor nodes are often constrained by limited resources. Ideally, all collected data features would be made private, but due to resource limitations this is not always possible. Making all the features private may overutilize resources, which in turn degrades the performance of the whole system. In this thesis, we design and implement a system that is capable of finding the optimal set of data features to make private, given the device's maximum resource constraints and the desired performance or accuracy of the system. Using generalization techniques for data anonymization, we create user-defined injective privacy encoder functions that make each feature of the dataset private. The user designates some data features as essential: these must be made private regardless of resource availability. All other data features that may pose a privacy threat are termed non-essential features. We propose Dynamic Iterative Greedy Search (DIGS), a greedy search algorithm that takes the resource consumption of each non-essential feature as input and returns the optimal set of non-essential features that can be made private given the available resources. The optimal set contains the features that consume the least resources. We evaluate our system on a Fitbit dataset of 1663 records with 17 data features, 4 of which are essential private features for a given classification application. Our results show that we can make 9 additional features private beyond the 4 essential ones. Furthermore, we save 26.21% memory compared to making all the features private.
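The abstract describes DIGS only at a high level. The following is a minimal Python sketch of a cost-ordered greedy selection in that spirit, not the thesis's actual implementation. It assumes a single scalar resource (for example, memory), per-feature privacy-encoding costs measured in advance, and a device budget that must first cover the essential features. The function name `greedy_private_features` and the feature names and costs used below are hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_private_features(
    costs: Dict[str, float],
    essential: List[str],
    budget: float,
) -> Tuple[List[str], float]:
    """Hypothetical greedy selection in the spirit of DIGS (assumption).

    costs:     resource cost (e.g. MB of memory) of making each feature private
    essential: features that must always be private
    budget:    maximum resource the device can spend on privacy encoding
    Returns the selected non-essential features and the total cost used.
    """
    # Essential features are always encoded, so their cost is paid first.
    used = sum(costs[f] for f in essential)
    if used > budget:
        raise ValueError("essential features alone exceed the resource budget")

    non_essential = [f for f in costs if f not in essential]

    # Cheapest features first, so that as many as possible fit in the budget.
    selected: List[str] = []
    for feature in sorted(non_essential, key=lambda f: costs[f]):
        if used + costs[feature] > budget:
            break  # every remaining feature costs at least as much
        selected.append(feature)
        used += costs[feature]
    return selected, used


if __name__ == "__main__":
    # Hypothetical per-feature costs; not measurements from the thesis.
    costs = {"steps": 4.0, "heart_rate": 6.0, "calories": 2.0,
             "sleep": 3.0, "distance": 5.0}
    print(greedy_private_features(costs, ["heart_rate"], budget=15.0))
```

With these hypothetical costs and a budget of 15 units, the sketch keeps `heart_rate` as essential and adds `calories`, `sleep`, and `steps`, while `distance` no longer fits. Sorting by cost lets the loop stop at the first feature that exceeds the remaining budget, because every later feature costs at least as much. A version that tracks several resources at once (memory and CPU, say) would need a different ordering and could not stop early in this way.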
We also test our method on a larger dataset generated with a Generative Adversarial Network (GAN). However, the chosen edge device, a Raspberry Pi, lacks the resources to handle the full scale of this large dataset. Our evaluations using one-eighth of the GAN dataset yield 3 extra private features with up to 62.74% memory savings compared to making all data features private. Maintaining privacy not only requires additional resources but also affects the performance of the designed applications. Nevertheless, we find that privacy encoding improves the accuracy of the classification model for our chosen classification application.
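The memory savings reported above are stated relative to making every data feature private. Assuming they are computed as a relative reduction in memory use (the abstract does not give the exact definition, so this is an assumption), they can be written as follows, where $M_{\text{all}}$ is the memory used when all features are private and $M_{\text{sel}}$ is the memory used when only the essential features plus the selected non-essential features are private:

$$
\text{Savings} = \frac{M_{\text{all}} - M_{\text{sel}}}{M_{\text{all}}} \times 100\%
$$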
