End-To-End People Detection in Crowded Scenes

Russell Stewart, Mykhaylo Andriluka, Andrew Y. Ng; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2325-2333

Abstract


Current people detectors operate either by scanning an image in a sliding window fashion or by classifying a discrete set of proposals. We propose a model that is based on decoding an image into a set of people detections. Our system takes an image as input and directly outputs a set of distinct detection hypotheses. Because we generate predictions jointly, common post-processing steps such as non-maximum suppression are unnecessary. We use a recurrent LSTM layer for sequence generation and train our model end-to-end with a new loss function that operates on sets of detections. We demonstrate the effectiveness of our approach on the challenging task of detecting people in crowded scenes.
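The abstract names a loss "that operates on sets of detections" without defining it. The following is a minimal sketch of one common way to build such a loss. It is not the paper's exact formulation. It assumes a one-to-one matching between predicted and ground-truth boxes, an L1 localization term, and a log-loss on confidences. All names (`best_matching`, `set_loss`, `loc_weight`) and the box layout are hypothetical. For brevity, the matching is found by exhaustive search, which is only feasible for tiny sets. A Hungarian-algorithm solver would be the usual choice at realistic sizes.

```python
import itertools
import math
from typing import Sequence, Tuple

# Hypothetical box layout: (x_center, y_center, width, height).
Box = Tuple[float, float, float, float]


def l1(a: Box, b: Box) -> float:
    """L1 distance between two boxes, coordinate by coordinate."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_matching(preds: Sequence[Box], gts: Sequence[Box]) -> Tuple[int, ...]:
    """For each ground-truth box, pick a distinct prediction index so that the
    total L1 localization cost is minimal (exact, brute-force search)."""
    if len(preds) < len(gts):
        raise ValueError("need at least as many predictions as ground-truth boxes")
    best, best_cost = (), math.inf
    # Enumerate every injective assignment ground truth -> prediction.
    for perm in itertools.permutations(range(len(preds)), len(gts)):
        cost = sum(l1(preds[p], g) for p, g in zip(perm, gts))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best


def set_loss(preds: Sequence[Box], confs: Sequence[float],
             gts: Sequence[Box], loc_weight: float = 1.0) -> float:
    """Hypothetical set loss. Matched predictions pay localization plus
    -log(conf). Unmatched predictions pay -log(1 - conf), which pushes
    duplicates toward low confidence. Confidences must lie in (0, 1)."""
    match = best_matching(preds, gts)
    matched = set(match)
    loss = 0.0
    for p, g in zip(match, gts):
        loss += loc_weight * l1(preds[p], g) - math.log(confs[p])
    for i, c in enumerate(confs):
        if i not in matched:
            loss -= math.log(1.0 - c)
    return loss
```

For example, consider three predictions `[(0, 0, 1, 1), (5, 5, 1, 1), (9, 9, 1, 1)]` with confidences `[0.9, 0.8, 0.1]` and two ground-truth boxes `[(5, 5, 1, 1), (0, 0, 1, 1)]`. The matching is `(1, 0)`, and the loss is `-log(0.8) - log(0.9) - log(0.9)`. Because every prediction is scored against the whole set at once, a second box on an already-matched person is unmatched and is penalized for high confidence. This illustrates why a jointly trained set predictor need not rely on non-maximum suppression to remove duplicates.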

Related Material


@InProceedings{Stewart_2016_CVPR,
author = {Stewart, Russell and Andriluka, Mykhaylo and Ng, Andrew Y.},
title = {End-To-End People Detection in Crowded Scenes},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}