Learning Ordered Top-k Adversarial Attacks via Adversarial Distillation

Zekun Zhang, Tianfu Wu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020, pp. 776-777


Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, especially white-box targeted attacks. This paper studies how aggressive white-box targeted attacks can be, going beyond the widely used Top-1 attacks. We propose to learn ordered Top-k attacks (k ≥ 1), which force the Top-k predicted labels of an adversarial example to be k (randomly) selected and ordered target labels (the ground-truth label is excluded). Two methods are presented. First, we extend the vanilla Carlini-Wagner (C&W) method and use it as a strong baseline. Second, we present an Adversarial Distillation (AD) framework consisting of two components: (i) computing an adversarial probability distribution for a given set of ordered Top-k target labels, and (ii) learning adversarial examples by minimizing the Kullback-Leibler (KL) divergence between the adversarial distribution and the predicted distribution, together with a perturbation energy penalty. In computing adversarial distributions, we explore how to leverage label semantic similarities, leading to knowledge-oriented attacks. In experiments, we test Top-k (k = 1, 2, 5, 10) attacks on the ImageNet-1000 validation set using three representative DNNs trained on the clean ImageNet-1000 training set: ResNet-50, DenseNet-121, and AOGNet-12M. Overall, the proposed AD approach obtains the best results, by an especially large margin when the computation budget is limited. It consistently reduces the perturbation energy at the same attack success rate for all four values of k, and it improves the attack success rate by a large margin over the modified C&W method for k = 10.
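To make the two AD components concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a hypothetical adversarial distribution that places geometrically decreasing mass on the ordered target labels and spreads the remainder uniformly over all other labels; the abstract does not specify the actual construction (which may also use label semantic similarities). The perturbation energy is assumed to be the squared L2 norm, and the names `adversarial_distribution`, `ad_objective`, `target_mass`, `decay`, and `lam` are illustrative only.

```python
import numpy as np

def adversarial_distribution(num_classes, targets, target_mass=0.9, decay=0.5):
    """Hypothetical target distribution for an ordered Top-k attack.

    Assumption: the i-th target gets mass proportional to decay**i, so the
    ordering of the targets is encoded in the probabilities; the remaining
    mass is shared uniformly by the non-target labels.
    """
    k = len(targets)
    weights = decay ** np.arange(k)
    weights = target_mass * weights / weights.sum()
    p = np.full(num_classes, (1.0 - target_mass) / (num_classes - k))
    p[np.asarray(targets)] = weights
    return p

def softmax(logits):
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q):
    """KL(p || q) for strictly positive distributions p and q."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def ad_objective(logits, delta, p_adv, lam=1.0):
    """KL term plus an assumed squared-L2 perturbation energy penalty."""
    return kl_divergence(p_adv, softmax(logits)) + lam * float(np.sum(delta ** 2))

def is_ordered_topk_success(logits, targets):
    """True if the Top-k predicted labels equal the targets, in order."""
    k = len(targets)
    topk = np.argsort(-logits)[:k]
    return bool(np.array_equal(topk, np.asarray(targets)))
```

In an actual attack, `logits` would be the network output for the perturbed input `x + delta`, and `delta` would be updated by gradient descent on `ad_objective`; an automatic differentiation framework would be needed for that step, which this sketch omits.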

Related Material

author = {Zhang, Zekun and Wu, Tianfu},
title = {Learning Ordered Top-k Adversarial Attacks via Adversarial Distillation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020}