Rethinking Sampling for Music-Driven Long-Term Dance Generation

Tuong-Vy Truong-Thuy, Gia-Cat Bui-Le, Hai-Dang Nguyen, Trung-Nghia Le; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 2667-2683

Abstract


Generating dance sequences that synchronize with music while maintaining naturalness and realism is a challenging task. Existing methods often suffer from freezing phenomena or abrupt transitions. In this work, we introduce DanceFusion, a conditional diffusion model designed to address the complexities of creating long-term dance sequences. Our method employs a past- and future-conditioned diffusion model, leveraging the attention mechanism to learn the dependencies among music, past motions, and future motions. We also propose a novel sampling method that completes the transitional motion between two dance segments by treating the previous and upcoming motions as conditions. Additionally, we address abruptness in dance sequences by incorporating inpainting strategies into part of the sampling process, thereby improving the smoothness and naturalness of the generated motion. Experimental results demonstrate that DanceFusion outperforms state-of-the-art methods in generating high-quality and diverse dance motions. User study results further validate the effectiveness of our approach in generating long dance sequences, with participants consistently rating DanceFusion higher across all key metrics. Code and model are available at https://github.com/trgvy23/DanceFusion.
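To illustrate the inpainting-style conditioning described in the abstract, the sketch below shows a generic DDPM-style sampling loop in which frames taken from the previous and upcoming dance segments are re-imposed on the partially denoised sample at each step, so the model only has to synthesize the transitional frames in between. This is not the authors' released implementation; the model interface (a network that takes the noisy motion, the timestep, and music features), the frame layout, and the noise schedule tensor are assumptions made for illustration only.

    import torch

    @torch.no_grad()
    def sample_transition(model, music_feats, past_motion, future_motion,
                          seq_len, motion_dim, alphas_cumprod, device="cpu"):
        """Hypothetical DDPM-style sampler that inpaints a transition between
        a known past segment and a known future segment.

        past_motion:   (B, P, motion_dim) frames from the previous dance segment
        future_motion: (B, F, motion_dim) frames from the upcoming dance segment
        The middle seq_len - P - F frames are generated by the model.
        """
        B, P, _ = past_motion.shape
        F = future_motion.shape[1]
        T = alphas_cumprod.shape[0]  # number of diffusion steps

        # Known frames (past/future) and a mask marking which frames are fixed.
        known = torch.zeros(B, seq_len, motion_dim, device=device)
        mask = torch.zeros(B, seq_len, 1, device=device)
        known[:, :P] = past_motion
        known[:, seq_len - F:] = future_motion
        mask[:, :P] = 1.0
        mask[:, seq_len - F:] = 1.0

        x = torch.randn(B, seq_len, motion_dim, device=device)  # start from noise
        for t in reversed(range(T)):
            t_batch = torch.full((B,), t, device=device, dtype=torch.long)

            # Inpainting step: noise the known frames to the current level t and
            # overwrite the corresponding frames of the current sample with them.
            a_t = alphas_cumprod[t]
            known_t = a_t.sqrt() * known + (1 - a_t).sqrt() * torch.randn_like(known)
            x = mask * known_t + (1 - mask) * x

            # One reverse-diffusion step conditioned on the music features
            # (the model is assumed to return the sample at step t-1).
            x = model(x, t_batch, music=music_feats)

        # Re-impose the clean known frames on the final output.
        return mask * known + (1 - mask) * x

In this sketch the conditioning is purely a sampling-time constraint: the network itself is unchanged, and only the masked overwrite of the past and future frames ties the generated transition to the surrounding segments.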

Related Material


@InProceedings{Truong-Thuy_2024_ACCV,
    author    = {Truong-Thuy, Tuong-Vy and Bui-Le, Gia-Cat and Nguyen, Hai-Dang and Le, Trung-Nghia},
    title     = {Rethinking Sampling for Music-Driven Long-Term Dance Generation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {2667-2683}
}