A Unified, Resilient, and Explainable Adversarial Patch Detector
Abstract
Deep Neural Networks (DNNs), the backbone architecture of almost every computer vision task, are vulnerable to adversarial attacks, particularly physical out-of-distribution (OOD) adversarial patches. Existing defense models often struggle to interpret these attacks in ways that align with human visual perception. Our proposed AdvPatchXAI approach introduces a generalized, robust, and explainable defense algorithm designed to defend DNNs against physical adversarial threats. AdvPatchXAI employs a novel patch decorrelation loss that reduces feature redundancy and enhances the distinctiveness of patch representations, enabling better generalization across unseen adversarial scenarios. It learns prototypical parts in a self-supervised manner, enhancing interpretability and alignment with human vision. The model uses a sparse linear layer for classification, making the decision process globally interpretable through a set of learned prototypes and locally explainable by pinpointing the relevant prototypes within an image. Our comprehensive evaluation shows that AdvPatchXAI closes the "semantic" gap between latent space and pixel space and effectively handles unseen adversarial patches, even when perturbed with unseen corruptions, thereby significantly advancing DNN robustness in practical settings (code: https://github.com/tbvl22/Unified-resilient-and-Explainable-Adversarial-Patch-detector).
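The abstract names two mechanisms: a patch decorrelation loss that reduces redundancy between prototype features, and a sparse linear classification layer over learned prototypes. As a rough illustration only (the authors' exact formulation is in the linked repository), the PyTorch sketch below implements one plausible reading: the loss penalizes off-diagonal entries of the batch correlation matrix of prototype activations, and the head carries an L1 penalty so each class depends on only a few prototypes. All identifiers (`decorrelation_loss`, `SparsePrototypeHead`) and the penalty weights are hypothetical.

```python
# Hypothetical sketch of the two mechanisms described in the abstract;
# names and formulation are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def decorrelation_loss(z: torch.Tensor) -> torch.Tensor:
    """Penalize off-diagonal correlations between prototype activation channels.

    z: (batch, num_prototypes) activations. Lower cross-channel correlation
    means less redundant, more distinctive prototype features.
    """
    z = (z - z.mean(dim=0)) / (z.std(dim=0) + 1e-6)  # standardize each channel
    corr = (z.T @ z) / z.shape[0]                    # empirical correlation matrix
    off_diag = corr - torch.diag(torch.diagonal(corr))
    return (off_diag ** 2).sum() / z.shape[1]

class SparsePrototypeHead(nn.Module):
    """Linear classifier over prototype activations; an L1 penalty on its
    weights keeps class-prototype connections sparse, so each decision can
    be traced to a small set of prototypes."""
    def __init__(self, num_prototypes: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, proto_scores: torch.Tensor) -> torch.Tensor:
        return self.linear(proto_scores)

    def sparsity_penalty(self) -> torch.Tensor:
        return self.linear.weight.abs().mean()

# Hypothetical training objective combining the three terms
# (weights 0.1 and 1e-3 are placeholders, not values from the paper):
head = SparsePrototypeHead(num_prototypes=128, num_classes=2)
feats = torch.randn(32, 128)            # stand-in prototype activations
labels = torch.randint(0, 2, (32,))     # clean vs. adversarial-patch labels
logits = head(feats)
loss = (F.cross_entropy(logits, labels)
        + 0.1 * decorrelation_loss(feats)
        + 1e-3 * head.sparsity_penalty())
```

Under this reading, minimizing the decorrelation term pushes prototype channels toward distinct directions in feature space, while the L1 term zeroes out most class-prototype weights, which is what would make the classifier globally interpretable as a short list of prototypes per class.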
Related Material

[bibtex]
@InProceedings{Kumar_2025_CVPR,
    author    = {Kumar, Vishesh and Agarwal, Akshay},
    title     = {A Unified, Resilient, and Explainable Adversarial Patch Detector},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {30387-30397}
}