Adversarial Distillation Based on Slack Matching and Attribution Region Alignment

Shenglin Yin, Zhen Xiao, Mingxuan Song, Jieyi Long; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 24605-24614

Abstract


Adversarial distillation (AD) is a highly effective method for enhancing the robustness of small models. Contrary to expectations, a high-performing teacher model does not always result in a more robust student model, for two main reasons. First, when there are significant differences between the teacher's and the student's predictions, exact matching of predicted values via KL divergence interferes with training, leading to poor performance of existing methods. Second, matching based solely on outputs prevents the student model from fully understanding the teacher model's behavior. To address these challenges, this paper proposes a novel AD method named SmaraAD. During training, we help the student model better understand the teacher model's behavior by aligning the attribution regions the student model focuses on with those of the teacher model. Concurrently, we relax the exact-matching condition of KL divergence and replace it with a more flexible matching criterion, thereby enhancing the model's robustness. Extensive experiments substantiate the effectiveness of our method in improving the robustness of small models, outperforming previous SOTA methods.
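
The abstract names two components: a slack (relaxed) KL-matching criterion and attribution-region alignment. As a rough illustration of how such losses could look in PyTorch, the sketch below uses a hinge-style margin for the slack term and cosine alignment of Grad-CAM-style attribution maps. The function names, the margin `slack`, the temperature `tau`, the map representation, and the loss weights are all assumptions for illustration; the paper's actual formulation may differ.

    # Illustrative sketch only -- not the authors' implementation.
    import torch
    import torch.nn.functional as F

    def slack_matching_loss(student_logits, teacher_logits, tau=4.0, slack=0.1):
        """Relaxed KL matching: per-example KL below a slack margin is
        ignored, so the student is not forced to match the teacher exactly
        (assumed hinge-style form of the 'slack matching' idea)."""
        p_t = F.softmax(teacher_logits / tau, dim=1)
        log_p_s = F.log_softmax(student_logits / tau, dim=1)
        kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1)  # per example
        # Only KL distances above the margin contribute to the loss.
        return F.relu(kl - slack).mean() * tau * tau

    def attribution_alignment_loss(student_maps, teacher_maps):
        """Align normalized attribution maps of student and teacher, e.g.
        Grad-CAM maps of shape (B, H, W) (assumed representation)."""
        s = F.normalize(student_maps.flatten(1), dim=1)
        t = F.normalize(teacher_maps.flatten(1), dim=1)
        return (1.0 - (s * t).sum(dim=1)).mean()  # mean cosine distance

    # Example combination with a standard adversarial training loss
    # (weights and variable names assumed):
    #   s_logits, t_logits = student(x_adv), teacher(x_adv)
    #   loss = ce_loss + slack_matching_loss(s_logits, t_logits) \
    #        + 0.5 * attribution_alignment_loss(s_maps, t_maps)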

Related Material


@InProceedings{Yin_2024_CVPR,
    author    = {Yin, Shenglin and Xiao, Zhen and Song, Mingxuan and Long, Jieyi},
    title     = {Adversarial Distillation Based on Slack Matching and Attribution Region Alignment},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {24605-24614}
}