@InProceedings{Wu_2025_CVPR,
    author    = {Wu, Weijia and Liu, Mingyu and Zhu, Zeyu and Xia, Xi and Feng, Haoen and Wang, Wen and Lin, Kevin Qinghong and Shen, Chunhua and Shou, Mike Zheng},
    title     = {MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {28984-28994}
}
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Abstract
Recent advancements in video generation models, such as Stable Video Diffusion, have shown promising results, but these works primarily focus on short videos, often limited to a single scene and lacking a rich storyline. These models struggle with generating long videos that involve multiple scenes, coherent narratives, and consistent characters. Furthermore, there is currently no publicly accessible dataset specifically designed for analyzing, evaluating, and training models for long video generation. In this paper, we present MovieBench: A Hierarchical Movie-Level Dataset for Long Video Generation, which addresses these challenges by providing unique contributions: (1) character consistency across scenes, (2) long videos with rich and coherent storylines, and (3) multi-scene narratives. MovieBench features three distinct levels of annotation: the movie level, which provides a broad overview of the film; the scene level, offering a mid-level understanding of the narrative; and the shot level, which emphasizes specific moments with detailed descriptions.