Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition

Heliang Zheng, Jianlong Fu, Tao Mei, Jiebo Luo; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5209-5217

Abstract


Recognizing fine-grained categories (e.g., bird species) relies heavily on discriminative part localization and part-based fine-grained feature learning. Existing approaches predominantly solve these challenges independently, neglecting the fact that part localization (e.g., head of a bird) and fine-grained feature learning (e.g., head shape) are mutually correlated. In this paper, we propose a novel part learning approach based on a multi-attention convolutional neural network (MA-CNN), where part generation and feature learning can reinforce each other. MA-CNN consists of convolution, channel grouping and part classification sub-networks. The channel grouping network takes as input feature channels from convolutional layers, and generates multiple parts by clustering, weighting and pooling from spatially-correlated channels. The part classification network further classifies an image by each individual part, through which more discriminative fine-grained features can be learned. Two losses are proposed to guide the multi-task learning of channel grouping and part classification, encouraging MA-CNN to generate more discriminative parts from feature channels and to learn better fine-grained features from parts in a mutually reinforced way. MA-CNN does not need bounding box/part annotation and can be trained end-to-end. We incorporate the learned parts from MA-CNN with part-CNN for recognition, and show the best performance on three challenging published fine-grained datasets, namely CUB-Birds, FGVC-Aircraft and Stanford-Cars.
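The clustering, weighting and pooling steps of the channel grouping network can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single feature tensor of shape (channels, height, width), groups channels with a plain k-means on each channel's peak-response location, uses hard 0/1 channel weights in place of learned ones, and pools by a spatial sum. The function names `group_channels` and `part_features` are hypothetical.

```python
import numpy as np


def group_channels(features, num_parts, num_iters=10, seed=0):
    """Cluster channels by the (row, col) location of their peak response.

    Assumption: k-means on peak locations stands in for the learned grouping.
    """
    c, h, w = features.shape
    flat = features.reshape(c, -1)
    # Peak (row, col) coordinate of every channel, shape (c, 2).
    peaks = np.stack(np.unravel_index(flat.argmax(axis=1), (h, w)), axis=1)
    peaks = peaks.astype(float)

    rng = np.random.default_rng(seed)
    centers = peaks[rng.choice(c, size=num_parts, replace=False)].copy()
    for _ in range(num_iters):
        # Assign each channel to its nearest cluster center.
        dists = np.linalg.norm(peaks[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each non-empty cluster center to the mean of its members.
        for k in range(num_parts):
            if np.any(labels == k):
                centers[k] = peaks[labels == k].mean(axis=0)
    return labels


def part_features(features, labels, num_parts):
    """Weight channels per group, build an attention map, and pool."""
    c, h, w = features.shape
    parts = np.zeros((num_parts, c))
    attentions = np.zeros((num_parts, h, w))
    for k in range(num_parts):
        # Hard 0/1 channel weights (assumption: a trained model learns these).
        weights = (labels == k).astype(float)
        summed = np.tensordot(weights, features, axes=1)  # shape (h, w)
        attention = 1.0 / (1.0 + np.exp(-summed))         # sigmoid
        attentions[k] = attention
        # Attention-weighted spatial pooling gives one vector per part.
        parts[k] = (features * attention).sum(axis=(1, 2))
    return parts, attentions


if __name__ == "__main__":
    feats = np.random.default_rng(1).random((16, 7, 7))
    labels = group_channels(feats, num_parts=4)
    parts, attentions = part_features(feats, labels, num_parts=4)
    print(parts.shape, attentions.shape)
```

In this sketch each part vector would then be passed to its own classifier; the grouping and classification losses mentioned in the abstract are not modeled here.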

Related Material


[bibtex]
@InProceedings{Zheng_2017_ICCV,
author = {Zheng, Heliang and Fu, Jianlong and Mei, Tao and Luo, Jiebo},
title = {Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017},
pages = {5209--5217}
}