Dance Style Transfer With Cross-Modal Transformer

Wenjie Yin, Hang Yin, Kim Baraka, Danica Kragic, Mårten Björkman; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 5058-5067

Abstract


We present CycleDance, a dance style transfer system to transform an existing motion clip in one dance style to a motion clip in another dance style while attempting to preserve motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for music context. We adopt sequence length-based curriculum learning to stabilize training. Our approach captures rich and long-term intra-relations between motion frames, which is a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study including 30 participants with 5 or more years of dance experience. The results demonstrate that CycleDance generates realistic movements with the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation.
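The abstract names sequence length-based curriculum learning but gives no schedule. As a rough illustration only, the sketch below assumes a stepwise schedule in which the training clip length grows by a fixed number of frames every few epochs until it reaches a cap. The function names (`curriculum_length`, `crop_clip`) and all numeric defaults are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a sequence-length curriculum; the schedule shape and
# every default value here are assumptions, not the authors' settings.

def curriculum_length(epoch, start_len=32, max_len=256, grow_every=10, step=32):
    """Return the training clip length (in frames) to use at a given epoch.

    The length starts at `start_len`, grows by `step` frames every
    `grow_every` epochs, and never exceeds `max_len`.
    """
    return min(max_len, start_len + (epoch // grow_every) * step)


def crop_clip(frames, length, offset=0):
    """Take a contiguous window of at most `length` frames from a motion clip."""
    return frames[offset:offset + length]


# Clip lengths the trainer would see at a few selected epochs.
schedule = [curriculum_length(e) for e in (0, 9, 10, 35, 200)]
```

Under these assumptions, early epochs train on short clips, which are usually easier for adversarial training, and later epochs expose the model to longer clips where long-range relations between frames matter.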

Related Material


@InProceedings{Yin_2023_WACV,
    author    = {Yin, Wenjie and Yin, Hang and Baraka, Kim and Kragic, Danica and Bj\"orkman, M\r{a}rten},
    title     = {Dance Style Transfer With Cross-Modal Transformer},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {5058-5067}
}