DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach

Dayi Tan, Hansheng Chen, Wei Tian, Lu Xiong; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2230-2239

Abstract


This paper presents the DiffusionRegPose a novel approach to multi-person pose estimation that converts a one-stage end-to-end keypoint regression model into a diffusion-based sampling process. Existing one-stage deterministic regression methods though efficient are often prone to missed or false detections in crowded or occluded scenes due to their inability to reason pose ambiguity. To address these challenges we handle ambiguous poses in a generative fashion i.e. sampling from the image-conditioned pose distributions characterized by a diffusion probabilistic model. Specifically with initial pose tokens extracted from the image noisy pose candidates are progressively refined by interacting with the initial tokens via attention layers. Extensive evaluations on the COCO and CrowdPose datasets show that DiffusionRegPose clearly improves the pose accuracy in crowded scenarios as evidenced by a notable 4.0 AP increase in the AP_H metric on the CrowdPose dataset. This demonstrates the model's potential for robust and precise human pose estimation in real-world applications.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Tan_2024_CVPR, author = {Tan, Dayi and Chen, Hansheng and Tian, Wei and Xiong, Lu}, title = {DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {2230-2239} }