Multi-stage Contextual Deep Learning for Pedestrian Detection

Xingyu Zeng, Wanli Ouyang, Xiaogang Wang; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013, pp. 121-128


Cascaded classifiers have been widely used in pedestrian detection and achieved great success. These classifiers are trained sequentially without joint optimization. In this paper, we propose a new deep model that can jointly train multi-stage classifiers through several stages of backpropagation. It keeps the score map output by a classifier within a local region and uses it as contextual information to support the decision at the next stage. Through a specific design of the training strategy, this deep architecture is able to simulate the cascaded classifiers by mining hard samples to train the network stage-by-stage. Each classifier handles samples at a different difficulty level. Unsupervised pre-training and specifically designed stage-wise supervised training are used to regularize the optimization problem. Both theoretical analysis and experimental results show that the training strategy helps to avoid overfitting. Experimental results on three datasets (Caltech, ETH and TUD-Brussels) show that our approach outperforms the state-of-the-art approaches.
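The idea of passing each stage's score forward as context, and of mining hard samples for the next stage, can be illustrated with a minimal sketch. This is not the authors' model: the function names (`multi_stage_scores`, `mine_hard_samples`), the linear-plus-sigmoid stages, the use of scalar scores instead of local score maps, and the `margin` threshold are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def multi_stage_scores(
    features: Sequence[Sequence[float]],
    feature_weights: Sequence[Sequence[float]],
    context_weights: Sequence[Sequence[float]],
) -> List[float]:
    """Score one detection window with a chain of stages (hypothetical).

    features[t]        : feature vector seen by stage t
    feature_weights[t] : weights applied to features[t]
    context_weights[t] : weights applied to the scores of stages 0..t-1,
                         standing in for the contextual score map
    """
    scores: List[float] = []
    for t, (x, w) in enumerate(zip(features, feature_weights)):
        # Evidence from this stage's own features.
        activation = sum(wi * xi for wi, xi in zip(w, x))
        # Contextual evidence: scores produced by the earlier stages.
        activation += sum(c * s for c, s in zip(context_weights[t], scores))
        scores.append(sigmoid(activation))
    return scores


def mine_hard_samples(
    scores: Sequence[float], labels: Sequence[int], margin: float = 0.25
) -> List[int]:
    """Return indices of samples whose score is far from the label.

    These are the samples an assumed next stage would be trained on.
    """
    return [i for i, (s, y) in enumerate(zip(scores, labels)) if abs(y - s) > margin]
```

In this sketch, a second stage with zero feature weights still moves away from 0.5 when its context weight is nonzero, which shows how an earlier decision can support a later one. Joint training through backpropagation, unsupervised pre-training, and the stage-wise supervised schedule described in the abstract are not modeled here.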

Related Material

author = {Zeng, Xingyu and Ouyang, Wanli and Wang, Xiaogang},
title = {Multi-stage Contextual Deep Learning for Pedestrian Detection},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}