Shift Equivariant Pose Network

Pengxiao Wang, Tzu-Heng Lin, Chunyu Wang, Yizhou Wang; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 192-201

Abstract


Human pose estimation has advanced greatly in recent years. However, even the best-performing models are not shift equivariant. In particular, a small change in the input image often results in drastic alterations in the output, which is problematic, especially in video applications. The prevalence of top-down approaches, which typically rely on a non-equivariant object detector in the first stage, exacerbates this issue. In this paper, we first demonstrate that the biased keypoint representation and the non-equivariant network components are the two main obstacles to shift equivariant pose estimation. To address these limitations, we propose an unbiased decoding method and redesign the necessary network components (e.g., APS-ResBlock, SSP). Extensive experiments show that our method not only produces much more stable results under shifted input but also achieves better metrics, with the ability to tolerate inaccurate detector output from the first stage. To our knowledge, this is the first work to address the problem of shift equivariance in the field of pose estimation. Our method can be easily applied to existing CNN-based pose estimation networks.
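To make the notion of shift equivariance concrete, the following is a minimal 1D sketch and not the paper's implementation. It assumes that "APS" refers to adaptive polyphase sampling (the abstract does not expand the acronym), uses circular shifts, and introduces the hypothetical helper names `downsample`, `aps_downsample`, and `is_circular_shift` purely for illustration. It shows that plain stride-2 subsampling can produce an unrelated output when the input shifts by one sample, whereas selecting the polyphase component with the largest norm yields outputs that differ only by a shift.

```python
import numpy as np


def downsample(x, stride=2):
    """Plain strided subsampling: always keeps phase 0."""
    return x[::stride]


def aps_downsample(x, stride=2):
    """Illustrative adaptive polyphase sampling (assumed meaning of "APS").

    Splits x into its `stride` polyphase components and keeps the one
    with the largest L2 norm, so the choice follows the signal content
    rather than a fixed sampling grid.
    """
    components = [x[p::stride] for p in range(stride)]
    norms = [np.linalg.norm(c) for c in components]
    return components[int(np.argmax(norms))]


def is_circular_shift(a, b):
    """True if b equals a circularly shifted by some number of samples."""
    return any(np.array_equal(np.roll(a, k), b) for k in range(len(a)))


# Toy signal whose energy sits entirely on the odd samples.
x = np.array([0.0, 1.0, 0.0, 5.0, 0.0, 2.0, 0.0, 3.0])
x_shifted = np.roll(x, 1)  # circular shift by one sample

# Plain subsampling: outputs for x and x_shifted are unrelated.
naive_ok = is_circular_shift(downsample(x), downsample(x_shifted))

# Polyphase selection: outputs differ only by a circular shift.
aps_ok = is_circular_shift(aps_downsample(x), aps_downsample(x_shifted))

print("plain stride-2 equivariant up to shift:", naive_ok)
print("polyphase selection equivariant up to shift:", aps_ok)
```

The toy signal is chosen so that the two polyphase components have clearly different norms; handling ties, non-circular boundaries, and 2D feature maps is outside the scope of this sketch.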

Related Material


[bibtex]
@InProceedings{Wang_2025_WACV,
    author    = {Wang, Pengxiao and Lin, Tzu-Heng and Wang, Chunyu and Wang, Yizhou},
    title     = {Shift Equivariant Pose Network},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {192-201}
}