FVBench: Benchmarking Deepfake Video Detection Capability of Large Multimodal Models

Jiarui Wang, Huiyu Duan, Juntong Wang, Xiongkuo Min; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 4425-4437

Abstract


As generative models rapidly evolve, the realism of AI-generated videos has reached new levels, making it significantly harder to verify video authenticity. Existing deepfake detection techniques generally rely on training datasets that cover few generation methods and little content diversity, which limits their ability to generalize to more realistic content, particularly content produced by the latest generative models. Recently, large multimodal models (LMMs) have demonstrated remarkable zero-shot performance across a variety of vision tasks, yet their ability to detect deepfake videos remains largely untested. To this end, we propose **FVBench**, a comprehensive deepfake video benchmark designed to advance video deepfake detection. It offers: (i) extensive content diversity, with over 120K videos covering real, AI-edited, and fully AI-generated categories; (ii) comprehensive model coverage, with fake videos generated or edited by 42 state-of-the-art video synthesis and editing models; and (iii) a deepfake video detection benchmark for LMMs that comprehensively evaluates their detection capabilities. The dataset and code are released at https://github.com/IntMeGroup/FVBench.

Related Material


@InProceedings{Wang_2026_CVPR,
    author    = {Wang, Jiarui and Duan, Huiyu and Wang, Juntong and Min, Xiongkuo},
    title     = {FVBench: Benchmarking Deepfake Video Detection Capability of Large Multimodal Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {4425-4437}
}