The Effect of 5-anonymity on a classifier based on neural network that is applied to the adult dataset

University essay from Högskolan i Skövde/Institutionen för informationsteknologi

Abstract: Privacy issues related to data being made public are relevant given the introduction of the GDPR. To limit the problems that arise when data becomes public, whether intentionally or through an event such as a security breach, datasets can be anonymized. This report investigates how applying 5-anonymity to the Adult dataset affects a neural-network-based classifier that predicts whether a person's income exceeds $50,000, measured by precision, recall and accuracy. The classifier was trained on three versions of the data: the non-anonymized data, the anonymized data, and the non-anonymized data with the attributes that were suppressed in the anonymized version removed. Average accuracy dropped from 0.82 to 0.76 and average precision from 0.58 to 0.50, while average recall increased from 0.82 to 0.87. The average values and distributions appear to support the estimate that, in this case, most of the performance impact of anonymization comes from the suppression of attributes.
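To make the terms in the abstract concrete, here is a minimal sketch under stated assumptions, not the essay's actual pipeline. It shows three things. First, what it means for a table to be 5-anonymous: every combination of quasi-identifier values must occur in at least 5 records. Second, how the "suppressed attributes removed" training variant can be formed by dropping columns. Third, how accuracy, precision and recall are computed for a binary income label. The function names `is_k_anonymous`, `drop_attributes` and `binary_metrics`, and the toy records, are hypothetical illustrations.

```python
from collections import Counter
from typing import Hashable, Sequence

def is_k_anonymous(rows: Sequence[dict], quasi_identifiers: Sequence[str], k: int = 5) -> bool:
    """Hypothetical helper: True if every quasi-identifier combination occurs in >= k rows."""
    # Group records into equivalence classes by their quasi-identifier values.
    counts = Counter(tuple(row[qi] for qi in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())

def drop_attributes(rows: Sequence[dict], attributes: Sequence[str]) -> list:
    """Hypothetical helper: remove the given attributes (e.g. the fully suppressed ones)."""
    removed = set(attributes)
    return [{key: value for key, value in row.items() if key not in removed} for row in rows]

def binary_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1):
    """Return (accuracy, precision, recall), treating `positive` as the '>50K' class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

if __name__ == "__main__":
    # Toy, generalized records (not from the Adult dataset); "*" marks a suppressed value.
    records = [{"age_band": "30-39", "sex": "*"}] * 5 + [{"age_band": "40-49", "sex": "*"}] * 5
    print(is_k_anonymous(records, ["age_band", "sex"], k=5))
    print(drop_attributes(records[:1], ["sex"]))
    print(binary_metrics([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))
```

Adding a single record with a new `age_band` value would create an equivalence class of size 1 and break 5-anonymity. This is why generalization or suppression of attributes is needed, and, per the abstract, suppression appears to account for most of the accuracy and precision loss.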
