Visibility guided Self-Supervised Occlusion-Resilient Human Pose Estimation

Arindam Dutta, Sarosij Bose, Rohit Kundu, Calvin-Khang Ta, Saketh Bachu, Konstantinos Karydis, Amit K. Roy-Chowdhury; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026, pp. 1054-1063

Abstract


Occlusion remains a significant challenge for existing human pose estimation algorithms, often resulting in inaccurate and anatomically implausible predictions. Although recent occlusion-robust methods report strong performance, they typically rely heavily on supervised learning and privileged information, such as multiview data or temporal sequences. Furthermore, these models often fail under domain changes. Domain-adaptive human pose estimation seeks to mitigate this issue; however, when occlusions are present in the target domain, a common occurrence in real-world applications, the performance of these algorithms deteriorates significantly. To address these challenges, we propose VisOR, a novel Visibility guided Self-Supervised algorithm for Occlusion-Resilient Human Pose Estimation. VisOR achieves robustness to both domain shifts and occlusions by integrating contextual reasoning with iterative pseudo-label refinement. It mitigates overfitting to noisy pseudo-labels from occluded regions via a visibility-driven curriculum learning strategy, which progressively exposes the model to increasingly occluded training samples. Additionally, VisOR is regularized by a learned human pose prior that maintains anatomical plausibility throughout the adaptation process. Recognizing the scarcity of human pose datasets with realistic occlusions, we introduce BOW: Blended Occlusions in-the-Wild, a rigorously constructed, context-aware synthetic benchmark designed to evaluate the occlusion resilience of human pose estimation algorithms. BOW offers a diverse range of context-aware occlusions across both indoor and outdoor environments, simulating real-world conditions. Through extensive experiments, we demonstrate that VisOR outperforms current state-of-the-art methods by 7% on challenging occluded human pose estimation benchmarks and establishes baseline results on BOW against existing algorithms.
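The visibility-driven curriculum described in the abstract (progressively admitting more-occluded samples as training advances) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `curriculum_schedule`, the per-sample visibility ratio, the linear growth schedule, and the `start_frac` parameter are all assumptions for exposition.

```python
import numpy as np

def curriculum_schedule(visibilities, epoch, total_epochs, start_frac=0.5):
    """Select training-sample indices for one epoch of a hypothetical
    visibility-driven curriculum: start with the most-visible samples and
    linearly admit more-occluded ones as training progresses.

    visibilities: per-sample joint-visibility ratios in [0, 1]
                  (fraction of unoccluded keypoints per image).
    Returns the indices of the samples admitted at this epoch.
    """
    v = np.asarray(visibilities, dtype=float)
    order = np.argsort(-v)  # most visible first
    # Grow the admitted fraction linearly from start_frac to 1.0.
    progress = min(epoch / max(total_epochs - 1, 1), 1.0)
    frac = start_frac + (1.0 - start_frac) * progress
    n_admit = max(1, int(round(frac * len(v))))
    return order[:n_admit]
```

With `start_frac=0.5`, the first epoch trains on only the most-visible half of the data, and the final epoch on the full set, which matches the abstract's description of gradually introducing occluded samples.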

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Dutta_2026_WACV,
  author    = {Dutta, Arindam and Bose, Sarosij and Kundu, Rohit and Ta, Calvin-Khang and Bachu, Saketh and Karydis, Konstantinos and Roy-Chowdhury, Amit K.},
  title     = {Visibility guided Self-Supervised Occlusion-Resilient Human Pose Estimation},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {March},
  year      = {2026},
  pages     = {1054-1063}
}