Audio to Body Dynamics

Eli Shlizerman, Lucio Dery, Hayden Schoen, Ira Kemelmacher-Shlizerman; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7574-7583

Abstract


We present a method that gets as input an audio of violin or piano playing, and outputs a video of skeleton predictions which are further used to animate an avatar. The key idea is to create an animation of an avatar that moves their hands similarly to how a pianist or violinist would do, just from audio. Notably, it's not clear if body movement can be predicted from music at all and our aim in this work is to explore this possibility. In this paper, we present the first result that shows that natural body dynamics can be predicted. We built an LSTM network that is trained on violin and piano recital videos uploaded to the Internet. The predicted points are applied onto a rigged avatar to create the animation.

Related Material


[pdf] [arXiv] [video]
[bibtex]
@InProceedings{Shlizerman_2018_CVPR,
author = {Shlizerman, Eli and Dery, Lucio and Schoen, Hayden and Kemelmacher-Shlizerman, Ira},
title = {Audio to Body Dynamics},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}