Defending Against Universal Perturbations With Shared Adversarial Training

Chaithanya Kumar Mummadi, Thomas Brox, Jan Hendrik Metzen; The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 4928-4937

Abstract


Classifiers such as deep neural networks have been shown to be vulnerable to adversarial perturbations on problems with high-dimensional input spaces. While adversarial training improves the robustness of image classifiers against such adversarial perturbations, it leaves them sensitive to perturbations on a non-negligible fraction of the inputs. In this work, we show that adversarial training is more effective at preventing universal perturbations, where the same perturbation needs to fool a classifier on many inputs. Moreover, we investigate the trade-off between robustness against universal perturbations and performance on unperturbed data, and we propose an extension of adversarial training that handles this trade-off more gracefully. We present results for image classification and semantic segmentation to show that universal perturbations that fool a model hardened with adversarial training become clearly perceptible and exhibit patterns of the target scene.
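
To make the notion of a universal perturbation concrete, the following is a minimal sketch and not the method of the paper. It searches for a single perturbation shared by a whole batch of inputs, using signed gradient ascent on the mean loss and projecting onto an L-infinity ball. The toy linear softmax classifier, the function names (`mean_loss_and_grad`, `universal_perturbation`), and the values of `eps`, `step`, and `iters` are all illustrative assumptions.

```python
import numpy as np


def mean_loss_and_grad(W, b, x, y, delta):
    """Mean cross-entropy of a toy linear softmax model on x + delta,
    together with its gradient with respect to the shared delta."""
    n = x.shape[0]
    logits = (x + delta) @ W + b                      # shape (n, k)
    z = logits - logits.max(axis=1, keepdims=True)    # stabilise exp
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    loss = -log_p[np.arange(n), y].mean()
    p = np.exp(log_p)
    p[np.arange(n), y] -= 1.0                         # dLoss/dlogits = p - onehot
    grad = (p @ W.T).mean(axis=0)                     # averaged over the batch, shape (d,)
    return loss, grad


def universal_perturbation(W, b, x, y, eps=0.5, step=0.05, iters=50):
    """One perturbation for all rows of x (assumed hyperparameters)."""
    delta = np.zeros(x.shape[1])
    for _ in range(iters):
        _, grad = mean_loss_and_grad(W, b, x, y, delta)
        # Ascend the mean loss, then project back onto the L-infinity ball.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta
```

Because the gradient is averaged over the batch before the update, the resulting `delta` is input-agnostic: it is the same vector added to every input, in contrast to a per-input adversarial perturbation, which would be optimised separately for each row of `x`.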

Related Material


[bibtex]
@InProceedings{Mummadi_2019_ICCV,
author = {Mummadi, Chaithanya Kumar and Brox, Thomas and Metzen, Jan Hendrik},
title = {Defending Against Universal Perturbations With Shared Adversarial Training},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019},
pages = {4928-4937}
}