Learning Deep Representation With Large-Scale Attributes

Wanli Ouyang, Hongyang Li, Xingyu Zeng, Xiaogang Wang; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1895-1903


Learning strong feature representations from large-scale supervision has achieved remarkable success in computer vision with the emergence of deep learning techniques, driven by big visual data with rich annotations. This paper contributes a large-scale object attribute database (available at www.ee.cuhk.edu.hk/~xgwang/ImageNetAttribute.html) that contains rich attribute annotations (over 300 attributes) for ~180k samples and 494 object classes. Based on the ImageNet object detection dataset, it annotates rotation, viewpoint, object part location, part occlusion, part existence, common attributes, and class-specific attributes. We then use this dataset to train deep representations and extensively evaluate how useful these attributes are for the general object detection task. To make better use of the attribute annotations, a deep learning scheme is proposed that models the relationships among attributes and hierarchically clusters them into semantically meaningful mixture types. Experimental results show that the attributes are helpful in learning better features and improve object detection accuracy by 2.6% in mAP on the ILSVRC 2014 object detection dataset and 2.4% in mAP on the PASCAL VOC 2007 object detection dataset. This improvement generalizes well across datasets.
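The abstract mentions hierarchically clustering attribute annotations into semantically meaningful mixture types. The paper's actual clustering procedure is not given here; the following is only a minimal illustrative sketch of the general idea, assuming binary attribute vectors, Hamming distance, and plain average-linkage agglomerative clustering (the distance measure, linkage, and `cluster_attributes` helper are all assumptions, not the authors' method).

```python
# Illustrative sketch (NOT the authors' code): group samples described by
# binary attribute vectors into k "mixture types" via average-linkage
# agglomerative clustering over Hamming distance.

def hamming(a, b):
    # Number of attribute positions where the two vectors disagree.
    return sum(x != y for x, y in zip(a, b))

def cluster_attributes(vectors, k):
    # Start with every sample in its own cluster.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest cluster pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise Hamming distance between the two clusters.
                d = sum(hamming(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair and continue until k clusters remain.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy example: 6 samples, 4 binary attributes each.
samples = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
]
print(cluster_attributes(samples, 2))
```

Each resulting cluster can be read as one mixture type: a set of samples whose attribute patterns are mutually similar, which is the intuition behind using such groupings as auxiliary supervision targets.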

Related Material

@InProceedings{Ouyang_2015_ICCV,
  author    = {Ouyang, Wanli and Li, Hongyang and Zeng, Xingyu and Wang, Xiaogang},
  title     = {Learning Deep Representation With Large-Scale Attributes},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  month     = {December},
  year      = {2015}
}