@InProceedings{Choudhury_2026_CVPR,
    author    = {Choudhury, Rohan and Dandurand, Jean-Sebastien and Qiu, Kai and Bhat, Kshitij Madhav and Sharma, Kartik and Dahiya, Liza and Zhao, Yizhou and Kundu, Souraja and Lin, Chun-Hsien and Kitani, Kris M. and Jeni, L\'aszl\'o A.},
    title     = {FPS-Bench: A Benchmark for High Frame-Rate Video Understanding},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {18598--18608}
}
FPS-Bench: A Benchmark for High Frame-Rate Video Understanding
Abstract
Modern video-language models (VLMs) are typically trained on videos downsampled to low frames-per-second (FPS), and the most commonly used evaluation benchmarks are designed for low-FPS input as well. To address this shortcoming, we present FPS-Bench, a large video question-answering benchmark designed to evaluate VLMs' ability to understand video at high frame rates. We introduce a new metric, the minimum frames-per-second (minFPS), which measures the minimum frame rate required to solve a given question. While existing benchmarks require <1 minFPS, we rigorously curate more than 1000 questions from a diverse set of video sources and manually verify minFPS for each example, leading to a benchmark that requires watching videos at 7 FPS on average to solve. Our evaluation of several state-of-the-art VLMs shows that they are severely lacking, achieving QA accuracy of 30% on the FPS-Bench multiple-choice task, while humans achieve 72% accuracy.

