[bibtex]
@InProceedings{Ye_2024_CVPR,
  author    = {Ye, Zilyu and Liu, Jinxiu and Cao, JinJin and Chen, Zhiyang and Xuan, Ziwei and Zhou, Mingyuan and Liu, Qi and Qi, Guo-Jun},
  title     = {OpenStory: A Large-Scale Open-Domain Dataset for Subject-Driven Visual Storytelling},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2024},
  pages     = {7953-7962}
}
OpenStory: A Large-Scale Open-Domain Dataset for Subject-Driven Visual Storytelling
Abstract
Recently, the advancement and evolution of generative AI have been highly compelling. In this paper, we present OpenStory, a large-scale dataset tailored for training subject-focused story visualization models to generate coherent and contextually relevant visual narratives. Addressing the challenges of maintaining subject continuity across frames and capturing compelling narratives, we propose an innovative pipeline that automates the extraction of keyframes from open-domain videos. It employs vision-language models to generate descriptive captions, which are then refined by a large language model to ensure narrative flow and coherence. Furthermore, advanced subject masking techniques are applied to isolate and segment the primary subjects. Derived from diverse video sources, including YouTube and existing datasets, OpenStory offers a comprehensive open-domain resource that surpasses prior datasets confined to specific scenarios. With automated captioning instead of manual annotation, high-resolution imagery optimized for subject count per frame, and extensive frame sequences ensuring consistent subjects for temporal modeling, OpenStory establishes itself as an invaluable benchmark. It facilitates advances in subject-focused story visualization, enabling the training of models capable of comprehending and generating intricate multi-modal narratives from extensive visual and textual inputs.
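To make the pipeline's first stage concrete, the keyframe-extraction step described above can be sketched as threshold-based scene-change detection. This is a minimal, hypothetical simplification: the function name `extract_keyframes`, the `threshold` parameter, and the use of precomputed inter-frame difference scores are illustrative assumptions, not the paper's actual implementation, which operates on open-domain videos.

```python
def extract_keyframes(diff_scores, threshold=0.5):
    """Pick frame indices where the inter-frame difference exceeds a threshold.

    Toy stand-in for a keyframe-extraction stage: diff_scores[i] is assumed
    to be a scene-change score between frame i and frame i + 1 (real systems
    would derive it from pixel or feature distances).
    """
    keyframes = [0]  # always keep the first frame as an anchor
    for i, score in enumerate(diff_scores, start=1):
        if score > threshold:  # large change -> treat frame i as a new keyframe
            keyframes.append(i)
    return keyframes

# Example: difference scores between consecutive frames of a 6-frame clip
print(extract_keyframes([0.1, 0.7, 0.2, 0.9, 0.3]))  # [0, 2, 4]
```

Downstream stages would then caption each selected frame with a vision-language model, refine the captions with a large language model, and apply subject masking to the same frames.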