Motion Adaptive Pose Estimation From Compressed Videos

Zhipeng Fan, Jun Liu, Yao Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 11719-11728

Abstract


Human pose estimation from videos has many real-world applications. Existing methods focus on applying models with a uniform computation profile on fully decoded frames, ignoring the freely available motion signals and motion-compensation residuals from the compressed stream. A novel model, called Motion Adaptive Pose Net, is proposed to exploit the compressed streams to efficiently decode pose sequences from videos. The model incorporates a Motion Compensated ConvLSTM to propagate the spatially aligned features, along with an adaptive gate that dynamically determines, based solely on the residual errors, whether the computationally expensive features should be extracted from fully decoded frames to compensate the motion-warped features. Leveraging the informative yet readily available signals from compressed streams, we propagate the latent features through our Motion Adaptive Pose Net efficiently. Our model outperforms the state-of-the-art models in pose-estimation accuracy on two widely used datasets with only around half of the computational complexity.
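The gating idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses a learned adaptive gate and a Motion Compensated ConvLSTM, whereas the sketch below assumes a fixed threshold on the mean absolute residual and a simple average to fuse features. All names (`warp`, `residual_energy`, `propagate`, `threshold`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Feature = List[List[float]]            # toy 2-D feature map
Motion = List[List[Tuple[int, int]]]   # per-cell integer motion vector (dy, dx)


def warp(feature: Feature, motion: Motion) -> Feature:
    """Fetch each cell from the location its motion vector points to.

    Source coordinates are clamped to the map borders.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = motion[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            out[y][x] = feature[sy][sx]
    return out


def residual_energy(residual: Feature) -> float:
    """Mean absolute value of the motion-compensation residual."""
    total = sum(abs(v) for row in residual for v in row)
    return total / (len(residual) * len(residual[0]))


def propagate(
    prev_feature: Feature,
    motion: Motion,
    residual: Feature,
    extract_features: Callable[[], Feature],
    threshold: float = 0.1,  # assumption: fixed threshold instead of a learned gate
) -> Tuple[Feature, bool]:
    """Warp the previous features; run the expensive extractor only if needed.

    Returns the propagated features and whether the extractor was called.
    """
    warped = warp(prev_feature, motion)
    if residual_energy(residual) > threshold:
        fresh = extract_features()  # expensive path: needs the fully decoded frame
        # Assumption: average the two maps; the paper fuses them with a ConvLSTM.
        fused = [[0.5 * (a + b) for a, b in zip(ra, rb)]
                 for ra, rb in zip(warped, fresh)]
        return fused, True
    return warped, False  # cheap path: motion-warped features only
```

In this sketch, frames whose residual is small reuse the warped features and skip the extractor entirely, which is where the computational savings would come from.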

Related Material


[bibtex]
@InProceedings{Fan_2021_ICCV,
    author    = {Fan, Zhipeng and Liu, Jun and Wang, Yao},
    title     = {Motion Adaptive Pose Estimation From Compressed Videos},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {11719-11728}
}