Mimicking Very Efficient Network for Object Detection

Quanquan Li, Shengying Jin, Junjie Yan; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6356-6364

Abstract


Current CNN based object detectors need to be initialized from pre-trained ImageNet classification models, and this pre-training is usually time-consuming. In this paper, we present a fully convolutional feature mimic framework to train very efficient CNN based detectors that do not need ImageNet pre-training and achieve performance competitive with large and slow models. During training, we add supervision from high-level features of the large network to help the small network learn better object representations. More specifically, we apply the mimic method to features sampled from the entire feature map and use a transform layer to map features from the small network to the same dimension as those of the large network. When training the small network, we optimize the similarity between features sampled from the same regions on the feature maps of both networks. Extensive experiments are conducted on pedestrian and common object detection tasks using VGG, Inception and ResNet. On both Caltech and Pascal VOC, we show that the modified 2.5x-accelerated Inception network achieves performance competitive with the full Inception network. Our faster model runs at 80 FPS on a 1000x1500 input with only a minor degradation of performance on Caltech.
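The abstract describes a transform layer that projects the small network's features to the large network's dimensionality, with a similarity loss on features sampled from the same regions of both feature maps. Below is a minimal sketch of that idea, assuming a PyTorch-style implementation; it is not the authors' code, and the 1x1 convolution transform, L2 similarity term, binary sampling mask, and loss weighting are all assumptions made for illustration.

```python
# Hedged sketch (assumed PyTorch, not the authors' implementation) of a feature
# mimic loss: a 1x1 conv "transform layer" maps the small (student) network's
# feature map to the large (teacher) network's channel dimension, and an L2 loss
# is applied only at sampled locations shared by both feature maps.
import torch
import torch.nn as nn


class FeatureMimicLoss(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # Transform layer: projects student features to the teacher's dimension.
        self.transform = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat, sample_mask):
        """
        student_feat: (N, Cs, H, W) feature map of the small network
        teacher_feat: (N, Ct, H, W) feature map of the large network
        sample_mask:  (N, 1, H, W) binary mask marking the sampled regions
                      (e.g. locations around object proposals); an assumed
                      way to restrict the loss to the same sampled regions.
        """
        projected = self.transform(student_feat)              # (N, Ct, H, W)
        sq_diff = (projected - teacher_feat) ** 2 * sample_mask
        # Normalize by the number of sampled spatial locations.
        num_sampled = sample_mask.sum().clamp(min=1.0)
        return sq_diff.sum() / num_sampled


# Usage sketch: combine with the ordinary detection loss when training the
# small network, keeping the teacher's features fixed.
# mimic = FeatureMimicLoss(student_channels=256, teacher_channels=512)
# loss = detection_loss + mimic_weight * mimic(s_feat, t_feat.detach(), mask)
```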

Related Material


[bibtex]
@InProceedings{Li_2017_CVPR,
author = {Li, Quanquan and Jin, Shengying and Yan, Junjie},
title = {Mimicking Very Efficient Network for Object Detection},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
year = {2017}
}