@InProceedings{Nie_2026_CVPR,
    author    = {Nie, Sen and Zhang, Jie and Yan, Jianxin and Shan, Shiguang and Chen, Xilin},
    title     = {V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {42257-42267}
}
V-Attack: Targeting Disentangled Value Features for Controllable Adversarial Attacks on LVLMs
Abstract
Adversarial attacks have evolved from simply disrupting predictions on conventional task-specific models to the more complex goal of manipulating image semantics in Large Vision-Language Models (LVLMs). However, existing methods struggle with controllability and cannot precisely manipulate the semantics of specific concepts in an image. We attribute this limitation to semantic entanglement in the patch-token representations that adversarial attacks typically operate on: global context aggregated by self-attention in the vision encoder dominates patch features, making them unreliable for precise local semantic manipulation. Our systematic investigation reveals a key insight: value features, computed within the transformer attention block, provide much more precise handles for manipulation. We show that these value features suppress global-context channels, allowing them to retain high-entropy, disentangled local semantic information. Building on this discovery, we propose V-Attack, a novel method for precise local semantic attacks. V-Attack targets value features and introduces two core components: (1) a Self-Value Enhancement module to refine the intrinsic semantic richness of value features, and (2) a Text-Guided Value Manipulation module that uses text prompts to locate the source concept and optimize it toward a target concept. By bypassing entangled patch features, V-Attack achieves highly effective semantic control. Extensive experiments across diverse LVLMs, including LLaVA, InternVL, DeepSeek-VL, and GPT-4o, show that V-Attack improves attack success rate by an average of 36% over state-of-the-art methods, exposing critical vulnerabilities in modern vision-language understanding.
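To make the distinction between value features and attention-mixed patch features concrete, the following is a minimal sketch of a single-head self-attention layer in plain Python. It is not the V-Attack implementation: the token matrix `X`, the weight matrices `Wq`, `Wk`, `Wv`, and the helper names are hypothetical toy values chosen for illustration. It only shows the general property the abstract relies on: a token's value feature is a per-token linear projection, while the attention output for that token is a weighted mix of every token's value.

```python
import math

def project(X, W):
    """Apply the linear map W (d_in x d_out) to every token row of X."""
    return [[sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]
            for x in X]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, Wq, Wk, Wv):
    """Return (value features, attention outputs) for single-head attention."""
    Q, K, V = project(X, Wq), project(X, Wk), project(X, Wv)
    scale = math.sqrt(len(Q[0]))
    outputs = []
    for q in Q:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in K])
        # Each output row is a weighted average over ALL tokens' values.
        outputs.append([sum(w * v[c] for w, v in zip(weights, V))
                        for c in range(len(V[0]))])
    return V, outputs

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Hypothetical toy inputs: 3 "patch tokens" with 2 channels each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[1.0, 0.0], [0.0, 1.0]]
Wv = [[0.5, -0.5], [0.25, 1.0]]

V, out = self_attention(X, Wq, Wk, Wv)

# Change only token 2 and recompute.
X_perturbed = [row[:] for row in X]
X_perturbed[2] = [2.0, 2.0]
V_p, out_p = self_attention(X_perturbed, Wq, Wk, Wv)

# Token 0's value feature depends only on token 0, so it does not move.
value_shift = max_abs_diff(V[0], V_p[0])
# Token 0's attention output mixes in token 2, so it does move.
output_shift = max_abs_diff(out[0], out_p[0])

print("value feature shift for token 0:", value_shift)
print("attention output shift for token 0:", output_shift)
```

In this toy setting the value feature of token 0 is unchanged when another token is edited, whereas its attention output shifts. Real vision encoders stack many such layers, so value features in deeper layers still receive context from earlier layers; the sketch illustrates only the single-layer locality of the value projection.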