FPS-Bench: A Benchmark for High Frame-Rate Video Understanding

Rohan Choudhury, Jean-Sebastien Dandurand, Kai Qiu, Kshitij Madhav Bhat, Kartik Sharma, Liza Dahiya, Yizhou Zhao, Souraja Kundu, Chun-Hsien Lin, Kris M. Kitani, László A. Jeni; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 18598-18608

Abstract


Modern video-language models (VLMs) are typically trained on videos downsampled to low frames-per-second (FPS), and the most commonly used evaluation benchmarks are designed for low-FPS input as well. To address this shortcoming, we present FPS-Bench, a large video question-answering benchmark designed to evaluate VLMs' ability to understand video at high frame rates. We introduce a new metric, the minimum frames-per-second (minFPS), which measures the minimum frame rate required to answer a given question. While existing benchmarks require a minFPS below 1, we rigorously curate more than 1000 questions from a diverse set of video sources and manually verify the minFPS of each example, yielding a benchmark that requires watching videos at 7 FPS on average to solve. Our evaluation of several state-of-the-art VLMs shows that they are severely lacking, achieving a QA accuracy of 30% on the FPS-Bench multiple-choice task, while humans achieve 72% accuracy.
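To illustrate why a question can demand a higher frame rate, the following minimal sketch shows how uniform downsampling can skip a brief event entirely. It is not the authors' procedure for measuring minFPS (the abstract states that minFPS is verified manually). The function name `sample_indices`, the 30 FPS native rate, the 90-frame clip, and the event span are all hypothetical values chosen for illustration.

```python
import math


def sample_indices(num_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Return indices of the frames kept when a clip is uniformly
    downsampled from native_fps to target_fps (hypothetical helper)."""
    if num_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("num_frames and frame rates must be positive")
    # Sampling above the native rate cannot add information, so clamp.
    target_fps = min(target_fps, native_fps)
    # Number of frames to keep over the clip's duration.
    n_out = math.ceil(num_frames * target_fps / native_fps)
    # Take the native frame at or just before each target timestamp.
    return [min(int(i * native_fps / target_fps), num_frames - 1) for i in range(n_out)]


# Assumed example: a 3-second clip at 30 FPS with a 0.2 s event on frames 40..45.
event_frames = range(40, 46)
hits_at_1fps = [i for i in sample_indices(90, 30, 1) if i in event_frames]
hits_at_7fps = [i for i in sample_indices(90, 30, 7) if i in event_frames]
print(hits_at_1fps)  # no sampled frame falls inside the event
print(hits_at_7fps)  # at least one sampled frame shows the event
```

In this toy setting, a model that sees only the 1 FPS sample never observes the event, while the 7 FPS sample includes one frame of it.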

Related Material


@InProceedings{Choudhury_2026_CVPR,
    author    = {Choudhury, Rohan and Dandurand, Jean-Sebastien and Qiu, Kai and Bhat, Kshitij Madhav and Sharma, Kartik and Dahiya, Liza and Zhao, Yizhou and Kundu, Souraja and Lin, Chun-Hsien and Kitani, Kris M. and Jeni, L\'aszl\'o A.},
    title     = {FPS-Bench: A Benchmark for High Frame-Rate Video Understanding},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {18598-18608}
}