Beyond [CLS] Token: Query-Driven Token-Level Forgery Purification for Generalizable Deepfake Detection

Wang, Changshuo; Wang, Jiangming; Zhang, Ke-Yue; Yao, Taiping; Ding, Shouhong; Wang, Shunli; Yi, Ran; Ma, Lizhuang

Changshuo Wang, Jiangming Wang, Ke-Yue Zhang, Taiping Yao, Shouhong Ding, Shunli Wang, Ran Yi, Lizhuang Ma; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 42922-42931

Abstract

We investigate state-of-the-art deepfake detectors that leverage ViT-based vision foundation models and find that the [CLS] token suffers from Pre-trained Information Bias (PIB): because its representation is dominated by the knowledge stored in the pre-trained model parameters, it focuses mainly on global semantics and struggles to emphasize subtle local forgery cues. One potential remedy is to incorporate token-level features to form a detection-specific token. To this end, we propose the Query-Driven Token-Level Forgery Purification (QTFP) framework, which better captures local forgery traces without losing useful pre-trained priors. Specifically, we introduce randomly initialized, learnable query tokens that are independent of the backbone and its prior knowledge and that aggregate multi-patch evidence into a global token for detection. To make the query tokens focus on meaningful regions, we propose a theoretically grounded fake-likelihood contrastive learning loss, which uses a weighting strategy to highlight significant fake regions while diminishing the impact of real-like patches. Using SNR theory, we verify that the designed weight is both reliable and informative. To further preserve useful authentic information, a real-attention alignment constraint is applied to the query tokens. These designs go beyond relying solely on the [CLS] token by jointly reorganizing real and fake information across all tokens, which enhances detector robustness. Extensive experiments on diverse datasets demonstrate the effectiveness of our method.
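The query-token aggregation described in the abstract can be illustrated with a generic attention-pooling sketch. This is not the authors' implementation: the abstract does not give the architecture, so the code assumes plain scaled dot-product cross-attention without learned projections, and the names `query_pool` and `make_random_queries` are hypothetical. It only shows how a small set of randomly initialized query vectors, kept separate from the backbone, could pool evidence from all patch tokens instead of relying on [CLS].

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_pool(queries, patch_tokens):
    """Pool patch tokens with query tokens via scaled dot-product attention.

    queries:      k vectors of length d (assumed learnable in a real model)
    patch_tokens: n vectors of length d (patch features from a ViT backbone)
    Returns (pooled, weights): k pooled vectors and the k x n attention map.
    """
    dim = len(patch_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    pooled, weights = [], []
    for q in queries:
        # Similarity of this query to every patch token.
        scores = [scale * sum(qi * xi for qi, xi in zip(q, x)) for x in patch_tokens]
        attn = softmax(scores)
        weights.append(attn)
        # Attention-weighted sum of patch tokens, one value per feature dimension.
        pooled.append(
            [sum(a * x[j] for a, x in zip(attn, patch_tokens)) for j in range(dim)]
        )
    return pooled, weights


def make_random_queries(num_queries, dim, seed=0):
    """Randomly initialize query tokens (hypothetical small-variance Gaussian init)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_queries)]


if __name__ == "__main__":
    patches = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.9, 0.4], [-0.8, 0.1, 1.1]]
    queries = make_random_queries(num_queries=2, dim=3)
    pooled, weights = query_pool(queries, patches)
    print(pooled)
    print(weights)
```

In a trained detector the query vectors would be optimized parameters and the pooled output would feed a classifier. The fake-likelihood contrastive loss and the real-attention alignment constraint named in the abstract are not modeled in this sketch.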

Related Material

[bibtex]

@InProceedings{Wang_2026_CVPR,
    author    = {Wang, Changshuo and Wang, Jiangming and Zhang, Ke-Yue and Yao, Taiping and Ding, Shouhong and Wang, Shunli and Yi, Ran and Ma, Lizhuang},
    title     = {Beyond [CLS] Token: Query-Driven Token-Level Forgery Purification for Generalizable Deepfake Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {42922-42931}
}