Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model

Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, Jun-Cheng Chen; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 22995-23005

Abstract

Recent deep generative models can synthesize facial images with remarkable photorealism, raising significant societal concerns about their potential misuse. Despite rapid progress in deepfake detection, developing an efficient and effective detector that generalizes to unseen forgery samples remains challenging. To address this challenge, we leverage the rich semantic priors of foundation models and propose a novel side-network-based decoder that extracts spatial and temporal cues from the CLIP image encoder for generalized video-based deepfake detection. In addition, we introduce Facial Component Guidance (FCG), which improves the generalizability of spatial learning by encouraging the model to focus on key facial regions. By leveraging the generic features of a vision-language foundation model, our approach achieves promising generalization on challenging deepfake datasets while also offering superior training-data efficiency, parameter efficiency, and robustness. The source code is available at: https://github.com/aiiu-lab/DFD-FCG.
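To make the abstract's architecture concrete, the PyTorch sketch below illustrates the general idea under stated assumptions: a frozen image encoder (a toy stand-in for CLIP's ViT) produces per-frame patch tokens, a small trainable side decoder pools them spatially and models temporal cues across frames, and an FCG-style loss pushes the spatial attention toward key facial regions. The names FrozenEncoderStub, SideDecoder, and fcg_loss, along with all shapes and hyperparameters, are hypothetical illustrations, not the authors' implementation; see the GitHub repository above for the real code.

import torch
import torch.nn as nn

class FrozenEncoderStub(nn.Module):
    """Toy stand-in for a frozen CLIP ViT: 16x16 patchify + linear projection."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(3 * 16 * 16, dim)
        for p in self.parameters():
            p.requires_grad = False  # the foundation model stays frozen

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, 224, 224) -> patch tokens (B, T, 196, dim)
        B, T = frames.shape[:2]
        patches = frames.unfold(3, 16, 16).unfold(4, 16, 16)
        patches = patches.permute(0, 1, 3, 4, 2, 5, 6).reshape(B, T, -1, 3 * 16 * 16)
        return self.proj(patches)

class SideDecoder(nn.Module):
    """Trainable side network: spatial attention pooling + temporal encoder."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.spatial_score = nn.Linear(dim, 1)  # scores each patch token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 1)           # real/fake logit

    def forward(self, tokens: torch.Tensor):
        # tokens: (B, T, N, dim)
        attn = self.spatial_score(tokens).softmax(dim=2)     # (B, T, N, 1)
        frame_feats = (attn * tokens).sum(dim=2)             # (B, T, dim)
        video_feat = self.temporal(frame_feats).mean(dim=1)  # (B, dim)
        return self.head(video_feat), attn.squeeze(-1)       # logit, attention maps

def fcg_loss(attn_maps: torch.Tensor, component_masks: torch.Tensor) -> torch.Tensor:
    """Hypothetical facial-component guidance term: reward attention mass
    falling on patches that cover key facial regions (eyes, nose, mouth),
    given binary per-patch masks, e.g. from a landmark detector.
    attn_maps and component_masks: both (B, T, N)."""
    on_face = (attn_maps * component_masks).sum(dim=-1)  # in [0, 1] per frame
    return (1.0 - on_face).mean()

# Usage: logits, maps = SideDecoder()(FrozenEncoderStub()(video_clip))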

Related Material

[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Han_2025_CVPR,
    author    = {Han, Yue-Hua and Huang, Tai-Ming and Hua, Kai-Lung and Chen, Jun-Cheng},
    title     = {Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {22995-23005}
}