Augmenting Strong Supervision Using Web Data for Fine-Grained Categorization

Zhe Xu, Shaoli Huang, Ya Zhang, Dacheng Tao; The IEEE International Conference on Computer Vision (ICCV), 2015, pp. 2524-2532


We propose a new method for fine-grained object recognition that employs part-level annotations and deep convolutional neural networks (CNNs) in a unified framework. Although both schemes have been widely used to boost recognition performance, due to the difficulty in acquiring detailed part annotations, strongly supervised fine-grained datasets are usually too small to keep pace with the rapid evolution of CNN architectures. In this paper, we solve this problem by exploiting inexhaustible web data. The proposed method improves classification accuracy in two ways: more discriminative CNN feature representations are generated using a training set augmented by collecting a large number of part patches from weakly supervised web images; and more robust object classifiers are learned using a multi-instance learning algorithm jointly on the strong and weak datasets. Despite its simplicity, the proposed method delivers a remarkable performance improvement on the CUB200-2011 dataset compared to baseline part-based R-CNN methods, and achieves the highest accuracy on this dataset even in the absence of test image annotations.

Related Material

@InProceedings{Xu_2015_ICCV,
author = {Xu, Zhe and Huang, Shaoli and Zhang, Ya and Tao, Dacheng},
title = {Augmenting Strong Supervision Using Web Data for Fine-Grained Categorization},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2015}
}