DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 7892-7901

Abstract


Choreographers determine what the dances look like while cameramen determine the final presentation of dances. Recently various methods and datasets have showcased the feasibility of dance synthesis. However camera movement synthesis with music and dance remains an unsolved challenging problem due to the scarcity of paired data. Thus we present DCM a new multi-modal 3D dataset which for the first time combines camera movement with dance motion and music audio. This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community covering 4 music genres. With this dataset we uncover that dance camera movement is multifaceted and human-centric and possesses multiple influencing factors making dance camera synthesis a more challenging task compared to camera or dance synthesis alone. To overcome these difficulties we propose DanceCamera3D a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy. For evaluation we devise new metrics measuring camera movement quality diversity and dancer fidelity. Utilizing these metrics we conduct extensive experiments on our DCM dataset providing both quantitative and qualitative evidence showcasing the effectiveness of our DanceCamera3D model. Code and video demos are available at https://github.com/ Carmenw1203/DanceCamera3D-Official.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Wang_2024_CVPR, author = {Wang, Zixuan and Jia, Jia and Sun, Shikun and Wu, Haozhe and Han, Rong and Li, Zhenyu and Tang, Di and Zhou, Jiaqing and Luo, Jiebo}, title = {DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {7892-7901} }