Cascaded Pyramid Network for Multi-Person Pose Estimation

Chen, Yilun; Wang, Zhicheng; Peng, Yuxiang; Zhang, Zhiqiang; Yu, Gang; Sun, Jian

Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7103-7112

Abstract

The topic of multi-person pose estimation has beenlargely improved recently, especially with the developmentof convolutional neural network. However, there still exista lot of challenging cases, such as occluded keypoints, in-visible keypoints and complex background, which cannot bewell addressed. In this paper, we present a novel networkstructure called Cascaded Pyramid Network (CPN) whichtargets to relieve the problem from these “hard” keypoints.More specifically, our algorithm includes two stages: Glob-alNet and RefineNet. GlobalNet is a feature pyramid net-work which can successfully localize the “simple” key-points like eyes and hands but may fail to precisely rec-ognize the occluded or invisible keypoints. Our RefineNettries explicitly handling the “hard” keypoints by integrat-ing all levels of feature representations from the Global-Net together with an online hard keypoint mining loss. Ingeneral, to address the multi-person pose estimation prob-lem, a top-down pipeline is adopted to first generate a setof human bounding boxes based on a detector, followed byour CPN for keypoint localization in each human boundingbox. Based on the proposed algorithm, we achieve state-of-art results on the COCO keypoint benchmark, with averageprecision at 73.0 on the COCO test-dev dataset and 72.1 onthe COCO test-challenge dataset, which is a 19% relativeimprovement compared with 60.5 from the COCO 2016 key-point challenge. Code and the detection results for personused will be publicly available for further research.

Related Material

[pdf] [arXiv]

[bibtex]

@InProceedings{Chen_2018_CVPR,
author = {Chen, Yilun and Wang, Zhicheng and Peng, Yuxiang and Zhang, Zhiqiang and Yu, Gang and Sun, Jian},
title = {Cascaded Pyramid Network for Multi-Person Pose Estimation},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}