@InProceedings{Wang_2026_CVPR,
    author    = {Wang, Changshuo and Wang, Jiangming and Zhang, Ke-Yue and Yao, Taiping and Ding, Shouhong and Wang, Shunli and Yi, Ran and Ma, Lizhuang},
    title     = {Beyond [CLS] Token: Query-Driven Token-Level Forgery Purification for Generalizable Deepfake Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {42922-42931}
}
Beyond [CLS] Token: Query-Driven Token-Level Forgery Purification for Generalizable Deepfake Detection
Abstract
We investigate state-of-the-art deepfake detectors that leverage ViT-based vision foundation models and find that the [CLS] token suffers from Pre-trained Information Bias (PIB): because its knowledge is dominated by the pre-trained model parameters, it focuses mainly on global semantics and struggles to emphasize subtle local forgery cues. One way to overcome this limitation is to incorporate token-level features to form a detection-specific token. To this end, we propose the Query-Driven Token-Level Forgery Purification (QTFP) framework, which better captures local forgery traces without losing useful pre-trained priors. Specifically, we introduce randomly initialized, learnable query tokens that are independent of the backbone and its prior knowledge, and that aggregate multi-patch evidence into a global token for detection. To make the query tokens focus on meaningful regions, we propose a theoretically grounded fake-likelihood contrastive learning loss, which uses a weighting strategy to highlight significant fake regions while diminishing the impact of real-like patches. Using SNR theory, we verify that the designed weight is both reliable and informative. To further preserve useful authentic information, a real-attention alignment constraint is applied to the query tokens. These designs go beyond relying solely on the [CLS] token by jointly reorganizing real and fake information across all tokens, which enhances detector robustness. Extensive experiments on diverse datasets demonstrate the effectiveness of our method.
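As a rough illustration of the query-token idea described in the abstract, the sketch below pools patch tokens into a single global token with plain scaled dot-product cross-attention. It is not the authors' implementation: the function names (`softmax`, `query_pool`), the token shapes, the number of queries, and the mean over queries are all assumptions, and the fake-likelihood contrastive loss and real-attention alignment constraint are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_pool(patch_tokens, queries):
    """Hypothetical helper: each query attends over all patch tokens.

    patch_tokens: (num_patches, dim) features from a ViT backbone.
    queries:      (num_queries, dim) query tokens, independent of the backbone.
    Returns a (dim,) global token and the (num_queries, num_patches) attention map.
    """
    dim = patch_tokens.shape[-1]
    scores = queries @ patch_tokens.T / np.sqrt(dim)  # query-to-patch similarity
    attn = softmax(scores, axis=-1)                   # each row sums to 1
    pooled = attn @ patch_tokens                      # one pooled vector per query
    # Assumption: queries are averaged into one token; the paper may combine them differently.
    return pooled.mean(axis=0), attn

rng = np.random.default_rng(0)
patch_tokens = rng.normal(size=(196, 64))        # stand-in for 14x14 ViT patch features
queries = rng.normal(scale=0.02, size=(4, 64))   # randomly initialized; trained in practice
global_token, attn = query_pool(patch_tokens, queries)
```

In a trained detector the queries would be updated by the detection loss, so the attention map would concentrate on patches that carry forgery evidence rather than staying near-uniform as it does here.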