Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input

Trung-Hieu Hoang, Mona Zehni, Huy Phan, Duc Minh Vo, Minh N. Do; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 113-123

Abstract


Despite the promising performance of current 3D human pose estimation techniques, understanding and enhancing their robustness on challenging in-the-wild videos remain an open problem. In this work, we focus on building robust 2D-to-3D pose lifters. To this end, we develop two benchmark datasets, namely Human3.6M-C and HumanEva-I-C, to examine the resilience of video-based 3D pose lifters to a wide range of common video corruptions, including temporary occlusion, motion blur, and pixel-level noise. We demonstrate the poor generalization of state-of-the-art 3D pose lifters in the presence of corruption and establish two techniques to tackle this issue. First, we introduce Temporal Additive Gaussian Noise (TAGN) as a simple yet effective 2D input pose data augmentation. Additionally, to incorporate the confidence scores output by the 2D pose detectors, we design a confidence-aware convolution (CA-Conv) block. Extensively tested on corrupted videos, the proposed strategies consistently boost the robustness of 3D pose lifters and serve as new baselines for future research.
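
To give a rough sense of what an additive Gaussian noise augmentation on 2D input poses could look like, here is a minimal sketch. The abstract does not specify how TAGN is parameterized or how its temporal component works, so this is not the authors' method: it simply adds zero-mean Gaussian noise independently to every joint coordinate in every frame. The function name `add_gaussian_noise_to_sequence`, the `sigma` value, and the list-of-frames data layout are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

# One frame of a 2D pose: an (x, y) pair per joint (assumed layout).
Pose2D = List[Tuple[float, float]]


def add_gaussian_noise_to_sequence(
    poses: List[Pose2D],
    sigma: float = 0.02,
    rng: Optional[random.Random] = None,
) -> List[Pose2D]:
    """Return a noisy copy of a 2D pose sequence.

    Zero-mean Gaussian noise with standard deviation `sigma` is added
    independently to each coordinate of each joint in each frame.
    This is a generic augmentation, not the paper's exact TAGN scheme.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    rng = rng or random.Random()
    return [
        [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in frame]
        for frame in poses
    ]


# Toy sequence: 3 frames, 2 joints, coordinates assumed normalized.
clean = [[(0.1, 0.2), (0.3, 0.4)] for _ in range(3)]
noisy = add_gaussian_noise_to_sequence(clean, sigma=0.02, rng=random.Random(0))
```

In a training loop, such a function would typically be applied to each input sequence before it is passed to the 2D-to-3D lifter, leaving the 3D targets unchanged.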

Related Material


[bibtex]
@InProceedings{Hoang_2024_CVPR,
    author    = {Hoang, Trung-Hieu and Zehni, Mona and Phan, Huy and Vo, Duc Minh and Do, Minh N.},
    title     = {Improving the Robustness of 3D Human Pose Estimation: A Benchmark and Learning from Noisy Input},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {113-123}
}