On the Content Bias in Frechet Video Distance

Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar, Jun-Yan Zhu, Jia-Bin Huang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 7277-7288

Abstract


Frechet Video Distance (FVD) a prominent metric for evaluating video generation models is known to conflict with human perception occasionally. In this paper we aim to explore the extent of FVD's bias toward frame quality over temporal realism and identify its sources. We first quantify the FVD's sensitivity to the temporal axis by decoupling the frame and motion quality and find that the FVD only increases slightly with larger temporal corruption. We then analyze the generated videos and show that via careful sampling from a large set of generated videos that do not contain motions one can drastically decrease FVD without improving the temporal quality. Both studies suggest FVD's basis towards the quality of individual frames. We show that FVD with features extracted from the recent large-scale self-supervised video models is less biased toward image quality. Finally we revisit a few real-world examples to validate our hypothesis.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Ge_2024_CVPR, author = {Ge, Songwei and Mahapatra, Aniruddha and Parmar, Gaurav and Zhu, Jun-Yan and Huang, Jia-Bin}, title = {On the Content Bias in Frechet Video Distance}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {7277-7288} }