Towards Unified Human Parsing and Pose Estimation

Jian Dong, Qiang Chen, Xiaohui Shen, Jianchao Yang, Shuicheng Yan; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 843-850


We study the problem of human body configuration analysis, more specifically, human parsing and human pose estimation. These two tasks, i.e. identifying the semantic regions and body joints respectively over the human body image, are intrinsically highly correlated. However, previous works generally solve these two problems separately or iteratively. In this work, we propose a unified framework for simultaneous human parsing and pose estimation based on semantic parts. By utilizing Parselets and Mixture of Joint-Group Templates as the representations for these semantic parts, we seamlessly formulate the human parsing and pose estimation problem jointly within a unified framework via a tailored And-Or graph. A novel Grid Layout Feature is then designed to effectively capture the spatial co-occurrence/occlusion information between/within the Parselets and MJGTs. Thus the mutually complementary nature of these two tasks can be harnessed to boost the performance of each other.The resultant unified model can be solved using the structure learning framework in a principled way. Comprehensive evaluations on two benchmark datasets for both tasks demonstrate the effectiveness of the proposed framework when compared with the state-of-the-art methods.

Related Material

author = {Dong, Jian and Chen, Qiang and Shen, Xiaohui and Yang, Jianchao and Yan, Shuicheng},
title = {Towards Unified Human Parsing and Pose Estimation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2014}