PreDet: Large-Scale Weakly Supervised Pre-Training for Detection

Vignesh Ramanathan, Rui Wang, Dhruv Mahajan; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 2865-2875


State-of-the-art object detection approaches typically rely on pre-trained classification models to achieve better performance and faster convergence. We hypothesize that classification pre-training strives to achieve translation invariance and consequently ignores the localization aspect of the problem. We propose a new large-scale pre-training strategy for detection in a setting where noisy class labels are available for all images, but bounding boxes are not. In this setting, we augment standard classification pre-training with a new detection-specific pretext task. Motivated by self-supervised approaches based on noise-contrastive learning, we design a task that forces bounding boxes with high overlap across different views of an image to have similar representations, compared to non-overlapping boxes. We redesign Faster R-CNN modules to perform this task efficiently. Our experimental results show significant improvements over existing weakly supervised and self-supervised pre-training approaches in both detection accuracy and fine-tuning speed.
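The box-level pretext task can be illustrated with a minimal sketch of a noise-contrastive (InfoNCE-style) loss over boxes from two views of one image. This is not the authors' implementation. The function names, the IoU threshold `pos_iou`, the temperature `tau`, and the choice to ignore partially overlapping boxes are all hypothetical. The sketch also assumes that box coordinates from both views have already been mapped into a shared image frame, and that each box already has a feature vector (for example, from an RoI-pooling head, which is not modeled here).

```python
import math
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) in a shared image coordinate frame (assumption)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def box_contrastive_loss(boxes_a, feats_a, boxes_b, feats_b,
                         pos_iou: float = 0.5, tau: float = 0.1) -> float:
    """Hypothetical box-level InfoNCE loss.

    For each box in view A, boxes in view B with IoU >= pos_iou are positives
    and boxes with IoU == 0 (non-overlapping) are negatives; partially
    overlapping boxes are ignored. Returns the mean loss over view-A boxes
    that have at least one positive, or 0.0 if none do.
    """
    za = [normalize(f) for f in feats_a]
    zb = [normalize(f) for f in feats_b]
    losses = []
    for box_i, z_i in zip(boxes_a, za):
        pos, neg = [], []
        for box_j, z_j in zip(boxes_b, zb):
            overlap = iou(box_i, box_j)
            sim = sum(x * y for x, y in zip(z_i, z_j)) / tau  # cosine similarity / temperature
            if overlap >= pos_iou:
                pos.append(sim)
            elif overlap == 0.0:
                neg.append(sim)
        if not pos:
            continue
        # -log( sum_pos exp(s) / sum_all exp(s) ), computed with log-sum-exp for stability
        logits = pos + neg
        m = max(logits)
        log_all = m + math.log(sum(math.exp(s - m) for s in logits))
        log_pos = m + math.log(sum(math.exp(s - m) for s in pos))
        losses.append(log_all - log_pos)
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]  # same two boxes in both views
    feats_a = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # overlapping boxes share features
    swapped = [[0.0, 1.0], [1.0, 0.0]]   # overlapping boxes disagree
    print(box_contrastive_loss(boxes, feats_a, boxes, aligned))  # close to 0
    print(box_contrastive_loss(boxes, feats_a, boxes, swapped))  # close to 10
```

When overlapping boxes have matching features the loss is near zero, and swapping the features makes it large, which is the pressure the pretext task is meant to apply. Plain Python lists keep the sketch dependency-free; a real training pipeline would vectorize this in a tensor library.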

@InProceedings{Ramanathan_2021_ICCV,
    author    = {Ramanathan, Vignesh and Wang, Rui and Mahajan, Dhruv},
    title     = {PreDet: Large-Scale Weakly Supervised Pre-Training for Detection},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {2865-2875}
}