MVFNet: Multipurpose Video Forensics Network using Multiple Forms of Forensic Evidence

Tai D Nguyen, Matthew C Stamm; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 2207-2217

Abstract


While videos can be falsified in many different ways, most existing forensic networks are specialized to detect only a single manipulation type (e.g., deepfake, inpainting). This poses a significant issue, as the manipulation used to falsify a video is not known a priori. To address this problem, we propose MVFNet, a multipurpose video forensics network capable of detecting multiple types of manipulations, including inpainting, deepfakes, splicing, and editing. Our network does this by extracting and jointly analyzing a broad set of forensic feature modalities that capture both spatial and temporal anomalies in falsified videos. To reliably detect and localize fake content of all shapes and sizes, our network employs a novel Multi-Scale Hierarchical Transformer module to identify forensic inconsistencies across multiple spatial scales. Experimental results show that our network obtains state-of-the-art performance in general scenarios where multiple different manipulations are possible and rivals specialized detectors in targeted scenarios.

Related Material


[bibtex]
@InProceedings{Nguyen_2025_WACV,
    author    = {Nguyen, Tai D and Stamm, Matthew C},
    title     = {MVFNet: Multipurpose Video Forensics Network using Multiple Forms of Forensic Evidence},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {2207-2217}
}