Navigating Deep Classifiers: A Geometric Study Of Connections Between Adversarial Examples And Discriminative Features In Deep Neural Networks

University essay from KTH, School of Electrical Engineering and Computer Science (EECS)

Author: Johannes Rüther (2020)


Abstract: Although deep networks are powerful and effective in numerous applications, their high vulnerability to adversarial perturbations remains a critical limitation in domains such as security, personalized medicine, and autonomous systems. While sensitivity to adversarial perturbations is generally viewed as a bug of deep classifiers, recent research suggests that these perturbations are in fact a manifestation of non-robust features that deep classifiers exploit for predictive accuracy. In this work, we therefore systematically compute and analyze such perturbations to understand how they relate to the discriminative features that models use. Most of the insights in this work take a geometric perspective on classifiers, specifically the location of decision boundaries in the vicinity of samples. Perturbations that successfully flip classification decisions are conceived as directions along which samples can be moved to transition into other classification regions. In doing so, we reveal that navigating classification spaces is surprisingly simple: any sample can be moved into a target region within a small distance by following a single direction extracted from adversarial perturbations. Moreover, we show that for simple data sets such as MNIST, the discriminative features used by deep classifiers with standard training are indeed composed of elements found in adversarial examples. Finally, our results demonstrate that adversarial training fundamentally changes classifier geometry in the vicinity of samples, yielding more diverse and complex decision boundaries.
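The abstract does not spell out how the perturbation directions are computed; as a rough illustration of the idea of extracting a direction from an adversarial perturbation and following it until a sample crosses into a target classification region, the sketch below uses a targeted, FGSM-style gradient step in PyTorch. The names `model`, `x`, and `target_class`, as well as the step size and step count, are placeholders and not taken from the thesis.

```python
# Minimal sketch (not the thesis' exact method): extract a perturbation
# direction with a single targeted gradient step, then walk the sample along
# that fixed direction until it lands in the target classification region.
# `model`, `x` (a batch of shape [1, ...]) and `target_class` are assumed
# to be supplied by the caller.
import torch
import torch.nn.functional as F

def adversarial_direction(model, x, target_class):
    """One targeted gradient step; returns a unit-norm direction."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), torch.tensor([target_class]))
    loss.backward()
    # Descending the target-class loss w.r.t. the input moves the sample
    # toward the target region.
    direction = -x.grad.detach()
    return direction / direction.norm()

def move_into_region(model, x, target_class, step=0.01, max_steps=500):
    """Follow a single fixed direction until the prediction flips."""
    d = adversarial_direction(model, x, target_class)
    with torch.no_grad():
        for k in range(1, max_steps + 1):
            x_moved = x + k * step * d
            if model(x_moved).argmax(dim=1).item() == target_class:
                return x_moved, k * step   # perturbed sample, distance moved
    return None, None                      # target region not reached
```

The distance returned by `move_into_region` is one simple proxy for how close the target decision boundary lies to the sample, which is the kind of geometric quantity the abstract refers to.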
