Integrating Grammar and Segmentation for Human Pose Estimation

Brandon Rothrock, Seyoung Park, Song-Chun Zhu; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 3214-3221

Abstract


In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast of a part region from its surroundings, and improve resilience to background clutter. The resulting integrated framework is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present experimental evaluation of our model on two popular datasets, and show performance improvements over the state-of-art on both benchmarks.

Related Material


[pdf]
[bibtex]
@InProceedings{Rothrock_2013_CVPR,
author = {Rothrock, Brandon and Park, Seyoung and Zhu, Song-Chun},
title = {Integrating Grammar and Segmentation for Human Pose Estimation},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2013}
}