360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model

Qian Wang, Weiqi Li, Chong Mou, Xinhua Cheng, Jian Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 6913-6923

Abstract


Panorama video has recently attracted growing interest in both research and applications, courtesy of its immersive experience. Because capturing 360-degree panoramic videos is expensive, generating desirable panorama videos from prompts is urgently required. Lately, emerging text-to-video (T2V) diffusion methods have demonstrated notable effectiveness in standard video generation. However, due to the significant gap in content and motion patterns between panoramic and standard videos, these methods struggle to yield satisfactory 360-degree panoramic videos. In this paper, we propose a pipeline named 360-Degree Video Diffusion model (360DVD) for generating 360-degree panoramic videos based on the given prompts and motion conditions. Specifically, we introduce a lightweight 360-Adapter accompanied by 360 Enhancement Techniques to transform pre-trained T2V models for panorama video generation. We further propose a new panorama dataset named WEB360, consisting of panoramic video-text pairs for training 360DVD, addressing the absence of captioned panoramic video datasets. Extensive experiments demonstrate the superiority and effectiveness of 360DVD for panorama video generation.

Related Material


[bibtex]
@InProceedings{Wang_2024_CVPR,
    author    = {Wang, Qian and Li, Weiqi and Mou, Chong and Cheng, Xinhua and Zhang, Jian},
    title     = {360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {6913-6923}
}