Robustness With Query-Efficient Adversarial Attack Using Reinforcement Learning

Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Sahand Ghorbanpour, Vineet Gundecha, Antonio Guillen, Ricardo Luna, Avisek Naug; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 2330-2337


A measure of robustness against naturally occurring distortions is key to the safety, success, and trustworthiness of machine learning models on deployment. We propose an adversarial black-box attack that adds minimal Gaussian noise distortions to input images to make machine learning models misclassify them. We use a Reinforcement Learning (RL) agent as a smart hacker that explores the input images and adds minimal distortions to the most sensitive regions to induce misclassification. The agent's policy also removes noise introduced earlier that has less impact on the trained model at a given state. This novel approach is equivalent to doing a deep tree search to add noise without an exhaustive search, leading to faster and optimal convergence. This adversarial attack method also effectively measures the robustness of image classification models, because the misclassification-inducing minimum L2 distortion consists of Gaussian noise, which is similar to many naturally occurring distortions. Furthermore, the proposed black-box L2 adversarial attack tool beats state-of-the-art competitors in the average number of queries by a significant margin, with a 100% success rate and a very competitive L2 score, despite limiting distortions to Gaussian noise. For the ImageNet dataset, the average number of queries achieved by the proposed method for the ResNet-50, Inception-V3, and VGG-16 models is 42%, 32%, and 31% better, respectively, than that of the state-of-the-art "Square-Attack" approach while maintaining a competitive L2. Demo:
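The add-then-remove idea in the abstract can be illustrated with a minimal sketch. This is not the authors' demo, code, or RL method: the learned policy is replaced here by a simple greedy heuristic, and every name and value (`greedy_patch_noise_attack`, `toy_model`, `patch`, `sigma`, `candidates`, `max_queries`) is a hypothetical choice made for illustration. The sketch only shows the general shape of a query-counting black-box attack that adds Gaussian noise to patches, then restores patches whose noise turns out to be unnecessary, and reports the L2 distortion.

```python
import numpy as np


def l2_distortion(original, perturbed):
    """Euclidean (L2) norm of the perturbation between two images."""
    return float(np.linalg.norm((perturbed - original).ravel()))


def greedy_patch_noise_attack(query_fn, image, label, patch=2, sigma=0.3,
                              candidates=4, max_queries=500, seed=0):
    """Greedy stand-in for the RL agent (an assumption, not the paper's policy).

    query_fn is the black box: it maps an image to class probabilities.
    Returns (adversarial image, success flag, number of queries used).
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    adv = image.copy()
    touched = []  # top-left corners of patches that received noise
    queries = 0

    def query(x):
        nonlocal queries
        queries += 1
        return query_fn(x)

    probs = query(adv)

    # Addition phase: try a few random patches, keep the most damaging one.
    while int(np.argmax(probs)) == label and queries + candidates <= max_queries:
        best = None
        for _ in range(candidates):
            r = int(rng.integers(0, h - patch + 1))
            c = int(rng.integers(0, w - patch + 1))
            trial = adv.copy()
            region = trial[r:r + patch, c:c + patch]
            noise = rng.normal(0.0, sigma, size=region.shape)
            trial[r:r + patch, c:c + patch] = np.clip(region + noise, 0.0, 1.0)
            p = query(trial)
            if best is None or p[label] < best[0][label]:
                best = (p, trial, (r, c))
        # Accept the candidate only if it lowers the true-class probability.
        if best[0][label] < probs[label]:
            probs, adv = best[0], best[1]
            touched.append(best[2])

    success = int(np.argmax(probs)) != label

    # Removal phase: restore a noised patch if the image stays misclassified.
    if success:
        for r, c in touched:
            if queries >= max_queries:
                break
            trial = adv.copy()
            trial[r:r + patch, c:c + patch] = image[r:r + patch, c:c + patch]
            if int(np.argmax(query(trial))) != label:
                adv = trial

    return adv, success, queries


def toy_model(x):
    """Hypothetical two-class black box: class 1 if the mean pixel exceeds 0.5."""
    p1 = 1.0 / (1.0 + np.exp(-20.0 * (float(x.mean()) - 0.5)))
    return np.array([1.0 - p1, p1])


image = np.full((8, 8), 0.55)
label = int(np.argmax(toy_model(image)))
adv, success, queries = greedy_patch_noise_attack(toy_model, image, label)
print("success:", success, "queries:", queries,
      "L2:", l2_distortion(image, adv))
```

The removal phase is what keeps the L2 distortion low in this sketch: noise that is no longer needed for misclassification is undone at the cost of one extra query per patch. A learned policy, as described in the abstract, would choose where to add and remove noise rather than sampling patches at random.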

Related Material

@InProceedings{Sarkar_2023_CVPR,
    author    = {Sarkar, Soumyendu and Babu, Ashwin Ramesh and Mousavi, Sajad and Ghorbanpour, Sahand and Gundecha, Vineet and Guillen, Antonio and Luna, Ricardo and Naug, Avisek},
    title     = {Robustness With Query-Efficient Adversarial Attack Using Reinforcement Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {2330-2337}
}