Pose Transformers (POTR): Human Motion Prediction With Non-Autoregressive Transformers

Angel Martínez-González, Michael Villamizar, Jean-Marc Odobez; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 2276-2284

Abstract


We propose to leverage Transformer architectures for non-autoregressive human motion prediction. Our approach decodes elements in parallel from a query sequence, instead of conditioning on previous predictions as in state-of-the-art RNN-based approaches. In this way our approach is less computationally intensive and potentially avoids error accumulation in the long-term elements of the sequence. In that context, our contributions are fourfold: (i) we frame human motion prediction as a sequence-to-sequence problem and propose a non-autoregressive Transformer to infer the sequences of poses in parallel; (ii) we propose to decode sequences of 3D poses from a query sequence generated in advance with elements from the input sequence; (iii) we propose to perform skeleton-based activity classification from the encoder memory, in the hope that identifying the activity can improve predictions; (iv) we show that despite its simplicity, our approach achieves competitive results on two public datasets, although surprisingly more so for short-term predictions than for long-term ones.
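To make the decoding scheme described above concrete, the PyTorch-style sketch below shows one way a non-autoregressive encoder-decoder could consume a query sequence built from the input and emit all future poses in a single parallel pass, with an activity classifier on the encoder memory. The dimensions, the query construction (repeating the last observed pose), the residual pose head, and the pooling for classification are illustrative assumptions, not the authors' released implementation; positional encodings are omitted for brevity.

import torch
import torch.nn as nn

class NonAutoregressivePoseTransformer(nn.Module):
    """Minimal sketch of non-autoregressive seq2seq pose prediction.

    All dimensions and the query construction are assumptions made
    for illustration; positional encodings are omitted for brevity.
    """

    def __init__(self, pose_dim=63, d_model=128, n_heads=4,
                 n_layers=2, target_len=25, n_classes=15):
        super().__init__()
        self.target_len = target_len
        self.embed = nn.Linear(pose_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.pose_head = nn.Linear(d_model, pose_dim)
        self.class_head = nn.Linear(d_model, n_classes)

    def forward(self, src_poses):
        # src_poses: (batch, input_len, pose_dim) observed pose sequence.
        memory = self.encoder(self.embed(src_poses))

        # Query sequence generated in advance from the input (here: the
        # last observed pose repeated target_len times), so every future
        # frame is decoded in one parallel pass, with no conditioning on
        # previously predicted frames.
        query = src_poses[:, -1:, :].repeat(1, self.target_len, 1)
        decoded = self.decoder(self.embed(query), memory)

        # Predict offsets from the query poses; classify the activity
        # from the pooled encoder memory.
        future_poses = query + self.pose_head(decoded)
        activity_logits = self.class_head(memory.mean(dim=1))
        return future_poses, activity_logits

# Example: 50 observed frames of a 63-D pose, predicting 25 future frames.
model = NonAutoregressivePoseTransformer()
obs = torch.randn(8, 50, 63)
pred, logits = model(obs)  # pred: (8, 25, 63), logits: (8, 15)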

Related Material


[pdf]
[bibtex]
@InProceedings{Martinez-Gonzalez_2021_ICCV,
    author    = {Mart{\'\i}nez-Gonz{\'a}lez, Angel and Villamizar, Michael and Odobez, Jean-Marc},
    title     = {Pose Transformers (POTR): Human Motion Prediction With Non-Autoregressive Transformers},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2021},
    pages     = {2276-2284}
}