M2C: Concise Music Representation for 3D Dance Generation

Matthew Marchellus, In Kyu Park; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 3126-3135

Abstract


Generating 3D dance motions that are synchronized with music is a difficult task, as it involves modeling the complex interplay between musical rhythms and human body movements. Most existing approaches focus on improving the dance generation network, often overlooking the music feature processing stage, which plays a crucial role in dance motion generation. In this paper, we propose music codes, a better latent representation of music features using discrete variables. We present a comprehensive analysis of the music features and propose a different normalization procedure to address the scale imbalance within them. We also introduce the Music-to-Codes (M2C) network, a VQ-VAE-inspired music code extractor that replaces existing music feature processors. To evaluate the effectiveness of our approach, we combine M2C with Stochastic Motion GPT (SM-GPT), our modification of a recent state-of-the-art dance generation method. Our extensive evaluation and ablation study demonstrate that our dance generation pipeline (using M2C and SM-GPT) significantly improves dance generation results, both qualitatively and quantitatively, across all evaluation metrics. Our work opens up new possibilities for exploring the relationship between music and dance, contributing to more effective and efficient music-conditioned 3D dance generation.
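To illustrate the kind of discretization a VQ-VAE-style music code extractor performs, the sketch below shows a generic nearest-neighbor vector-quantization step. This is a minimal, hypothetical example (the shapes, sizes, and the `quantize` helper are assumptions for illustration), not the M2C architecture itself.

```python
import numpy as np

def quantize(features, codebook):
    """Map each continuous music feature vector to its nearest codebook entry.

    features: (T, D) array of music features for T time steps.
    codebook: (K, D) array of K learned discrete code embeddings.
    Returns (codes, quantized): integer indices (T,) and their embeddings (T, D).
    """
    # Squared Euclidean distance between every feature and every codebook entry,
    # computed via broadcasting: result has shape (T, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)   # discrete "music codes"
    quantized = codebook[codes]    # embeddings passed on to the decoder
    return codes, quantized

# Toy usage: two 2-D features snapped to a 2-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
features = np.array([[0.1, 0.1], [0.9, 0.8]])
codes, quantized = quantize(features, codebook)  # codes: [0, 1]
```

In a full VQ-VAE the codebook is learned jointly with an encoder and decoder; here it is fixed only to show how continuous features become discrete indices.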

Related Material


@InProceedings{Marchellus_2023_ICCV,
    author    = {Marchellus, Matthew and Park, In Kyu},
    title     = {M2C: Concise Music Representation for 3D Dance Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3126-3135}
}