Jointly Discriminating and Frequent Visual Representation Mining

Qiannan Wang, Ying Zhou, Zhaoyan Zhu, Xuefeng Liang, Yu Gu; Proceedings of the Asian Conference on Computer Vision (ACCV), 2020


Discovering the visual representation of an image category is challenging, because the representation should not only be discriminating but also appear frequently in the images of that category. Previous studies have proposed many solutions, but they all optimize discrimination and frequency separately, which makes the solutions sub-optimal. To address this issue, we propose a method, named JDFR, that discovers jointly discriminating and frequent visual representations. To ensure discrimination, JDFR employs a classification task with cross-entropy loss. To achieve frequency, JDFR uses triplet loss to optimize within-class and between-class distances, then mines frequent visual representations in feature space. Moreover, we propose an attention module to locate the representative region in the image. Extensive experiments on four benchmark datasets (i.e., CIFAR10, CIFAR100-20, VOC2012-10 and Travel) show that the discovered visual representations have better discrimination and frequency than those mined by five state-of-the-art methods, with average improvements of 7.51% in accuracy and 1.88% in frequency.
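The abstract pairs a cross-entropy classification loss with a triplet loss but does not say how the two terms are combined. The following is a minimal sketch of one plausible combination, not the authors' implementation: the weighted sum, the `weight` and `margin` parameters, the Euclidean distance, and all function names are assumptions introduced for illustration.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull same-class features together and
    push different-class features at least `margin` further away."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def joint_loss(logits, label, anchor, positive, negative, margin=1.0, weight=1.0):
    """Assumed combination: classification term plus a weighted triplet term."""
    return cross_entropy(logits, label) + weight * triplet_loss(
        anchor, positive, negative, margin
    )


if __name__ == "__main__":
    # Toy values: 3-class logits and 2-D features (hypothetical data).
    loss = joint_loss(
        logits=[2.0, 0.5, -1.0],
        label=0,
        anchor=[0.0, 0.0],
        positive=[0.1, 0.0],
        negative=[0.5, 0.5],
    )
    print(f"joint loss: {loss:.4f}")
```

In this sketch the cross-entropy term stands in for the discrimination objective and the triplet term for the within-class and between-class distance objective; the subsequent mining of frequent representations in feature space and the attention module are not covered.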

Related Material

@InProceedings{Wang_2020_ACCV,
    author    = {Wang, Qiannan and Zhou, Ying and Zhu, Zhaoyan and Liang, Xuefeng and Gu, Yu},
    title     = {Jointly Discriminating and Frequent Visual Representation Mining},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {November},
    year      = {2020}
}