Towards Understanding and Improving Adversarial Robustness of Vision Transformers

Jain, Samyak; Dutta, Tanima

Samyak Jain, Tanima Dutta; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 24736-24745

Abstract

Recent literature has demonstrated that vision transformers (VITs) exhibit superior performance compared to convolutional neural networks (CNNs). The majority of recent research on adversarial robustness however has predominantly focused on CNNs. In this work we bridge this gap by analyzing the effectiveness of existing attacks on VITs. We demonstrate that due to the softmax computations in every attention block in VITs they are inherently vulnerable to floating point underflow errors. This can lead to a gradient masking effect resulting in suboptimal attack strength of well-known attacks like PGD Carlini and Wagner (CW) GAMA and Patch attacks. Motivated by this we propose Adaptive Attention Scaling (AAS) attack that can automatically find the optimal scaling factors of pre-softmax outputs using gradient-based optimization. We show that the proposed simple strategy can be incorporated with any existing adversarial attacks as well as adversarial training methods and achieved improved performance. On VIT-B16 we demonstrate an improved attack strength of upto 2.2% on CIFAR10 and upto 2.9% on CIFAR100 by incorporating the proposed AAS attack with state-of-the-art single attack methods like GAMA attack. Further we utilise the proposed AAS attack for every few epochs in existing adversarial training methods which is termed as Adaptive Attention Scaling Adversarial Training (AAS-AT). On incorporating AAS-AT with existing methods we outperform them on VITs over 1.3-3.5% on CIFAR10. We observe improved performance on ImageNet-100 as well.

Related Material

[pdf] [supp]

[bibtex]

@InProceedings{Jain_2024_CVPR, author = {Jain, Samyak and Dutta, Tanima}, title = {Towards Understanding and Improving Adversarial Robustness of Vision Transformers}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {24736-24745} }