[pdf]
[supp]
[arXiv]
[bibtex]
@InProceedings{Yao_2025_WACV,
  author    = {Yao, Yuguang and Liu, Jiancheng and Gong, Yifan and Liu, Xiaoming and Wang, Yanzhi and Lin, Xue and Liu, Sijia},
  title     = {Can Adversarial Examples Be Parsed to Reveal Victim Model Information?},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {7049-7061}
}
Can Adversarial Examples Be Parsed to Reveal Victim Model Information?
Abstract
Numerous adversarial attack methods have been developed to generate imperceptible image perturbations that cause erroneous predictions in state-of-the-art machine learning (ML) models, particularly deep neural networks (DNNs). Despite extensive research on adversarial examples, limited effort has been made to explore the hidden characteristics carried by these perturbations. In this study, we investigate the feasibility of deducing information about the victim model (VM) from adversarial examples, specifically characteristics such as architecture type, kernel size, activation function, and weight sparsity. We approach this problem as a supervised learning task in which we attribute categories of VM characteristics to individual adversarial examples. To facilitate this, we assemble a dataset of adversarial attacks spanning seven attack types, generated from 135 victim models systematically varied across five architecture types, three kernel-size configurations, three activation functions, and three levels of weight sparsity. We demonstrate that a supervised model parsing network (MPN) can effectively extract concealed details of the VM from adversarial examples. We also validate the practicality of this approach by evaluating how various factors, such as different input formats and generalization to out-of-distribution cases, affect parsing performance. Furthermore, we highlight the connection between model parsing and attack transferability by showing how the MPN can uncover VM attributes in transfer attacks.
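To make the supervised formulation concrete, the sketch below shows a minimal multi-head parsing network of the kind the abstract describes: a shared CNN backbone over an adversarial input with one classification head per VM attribute (5 architecture types, 3 kernel sizes, 3 activations, 3 sparsity levels). This is an illustrative assumption, not the authors' implementation; the layer sizes, input shape (3x32x32), and training loop are hypothetical.

```python
# Minimal sketch (not the paper's code) of a multi-head model parsing network (MPN).
# Assumed input: an adversarial example or its perturbation, shaped 3x32x32.
import torch
import torch.nn as nn

class ModelParsingNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared feature extractor over the adversarial input.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One classification head per victim-model attribute (class counts from the abstract).
        self.heads = nn.ModuleDict({
            "architecture": nn.Linear(128, 5),
            "kernel_size": nn.Linear(128, 3),
            "activation": nn.Linear(128, 3),
            "sparsity": nn.Linear(128, 3),
        })

    def forward(self, x):
        feats = self.backbone(x)
        return {name: head(feats) for name, head in self.heads.items()}

# One training step: sum the cross-entropy losses over the attribute heads.
mpn = ModelParsingNetwork()
criterion = nn.CrossEntropyLoss()
x_adv = torch.randn(8, 3, 32, 32)  # placeholder batch of adversarial inputs
labels = {k: torch.randint(0, n, (8,)) for k, n in
          [("architecture", 5), ("kernel_size", 3), ("activation", 3), ("sparsity", 3)]}
logits = mpn(x_adv)
loss = sum(criterion(logits[k], labels[k]) for k in logits)
loss.backward()
```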