DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior

Huang, Junjia; Yang, Binbin; Yan, Pengxiang; Liu, Jiyang; Xia, Bin; Wang, Zhao; Wang, Yitong; Lin, Liang; Li, Guanbin

Junjia Huang, Binbin Yang, Pengxiang Yan, Jiyang Liu, Bin Xia, Zhao Wang, Yitong Wang, Liang Lin, Guanbin Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 36736-36746

Abstract

Storyboard synthesis plays a crucial role in visual storytelling, aiming to generate coherent shot sequences that visually narrate cinematic events with consistent characters, scenes, and transitions. However, existing approaches are mostly adapted from text-to-image diffusion models, which struggle to maintain long-range temporal coherence, consistent character identities, and narrative flow across multiple shots. In this paper, we introduce DreamShot, a video generative model based storyboard framework that fully exploits powerful video diffusion priors for controllable multi-shot synthesis. DreamShot supports both Text-to-Shot and Reference-to-Shot generation, as well as story continuation conditioned on previous frames, enabling flexible and context-aware storyboard generation. By leveraging the spatial-temporal consistency inherent in video generative models, DreamShot produces visually and semantically coherent sequences with improved narrative fidelity and character continuity. Furthermore, DreamShot incorporates a multi-reference role conditioning module that accepts multiple character reference images and enforces identity alignment via a Role-Attention Consistency Loss, explicitly constraining attention between reference and generated roles. Extensive experiments demonstrate that DreamShot achieves superior scene coherence, role consistency, and generation efficiency compared to state-of-the-art text-to-image storyboard models, establishing a new direction toward controllable video model-driven visual storytelling.

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Huang_2026_CVPR, author = {Huang, Junjia and Yang, Binbin and Yan, Pengxiang and Liu, Jiyang and Xia, Bin and Wang, Zhao and Wang, Yitong and Lin, Liang and Li, Guanbin}, title = {DreamShot: Personalized Storyboard Synthesis with Video Diffusion Prior}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2026}, pages = {36736-36746} }