Visual Attention in Multi-Label Image Classification

Yan Luo, Ming Jiang, Qi Zhao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019


One of the most significant challenges in multi-label image classification is learning representative features that capture the rich semantic information in a cluttered scene. As an information bottleneck, the visual attention mechanism allows humans to selectively process the most important visual input, enabling rapid and accurate scene understanding. In this work, we study the correlation between visual attention and multi-label image classification, and exploit an extra attention pathway to improve multi-label image classification performance. Specifically, we propose a dual-stream neural network that consists of two sub-networks: one is a conventional classification model and the other is a saliency prediction model trained with human fixations. The two sub-networks that compute these features are first trained separately and then fine-tuned jointly using a multiple cross-entropy loss. Experimental results show that the additional saliency sub-network improves multi-label image classification performance on the MS COCO dataset, and that the improvement is consistent across various levels of scene clutter.
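The dual-stream idea in the abstract can be illustrated with a minimal, forward-only sketch. It rests on several assumptions that the abstract does not state: the two streams are fused late by concatenating their pooled feature vectors, a single linear layer produces one logit per label, and the multi-label loss is a per-label binary cross-entropy averaged over labels. The function names (`fuse_features`, `predict_probs`, `multilabel_bce`) and the toy numbers are hypothetical and only show the data flow, not the authors' implementation.

```python
import math
from typing import List, Sequence


def fuse_features(cls_feat: Sequence[float], sal_feat: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: concatenate the classification-stream and
    saliency-stream feature vectors (assumed operator, not taken from the paper)."""
    return list(cls_feat) + list(sal_feat)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probs(feat: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  biases: Sequence[float]) -> List[float]:
    """Linear layer followed by one independent sigmoid per label,
    so several labels can be present in the same image."""
    return [sigmoid(sum(w * x for w, x in zip(row, feat)) + b)
            for row, b in zip(weights, biases)]


def multilabel_bce(probs: Sequence[float], targets: Sequence[int],
                   eps: float = 1e-12) -> float:
    """Binary cross-entropy computed per label and averaged over labels
    (an assumed reading of the 'multiple cross-entropy loss')."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Toy example with hypothetical 2-D classification features and 1-D saliency features.
cls_feat = [0.5, -1.0]
sal_feat = [2.0]
feat = fuse_features(cls_feat, sal_feat)  # [0.5, -1.0, 2.0]
weights = [[1.0, 0.0, 1.0],    # label 0, e.g. "person" (hypothetical)
           [0.0, 1.0, -1.0]]   # label 1, e.g. "dog" (hypothetical)
biases = [0.0, 0.0]
probs = predict_probs(feat, weights, biases)  # logits 2.5 and -3.0
loss = multilabel_bce(probs, [1, 0])          # ground truth: person present, dog absent
print(probs, loss)
```

Independent sigmoids are used instead of a softmax because labels are not mutually exclusive: each label's probability is computed and penalized separately. In the setting the abstract describes, gradients of such a loss would flow back into both sub-networks during joint fine-tuning; this sketch only computes the forward pass.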

Related Material

author = {Luo, Yan and Jiang, Ming and Zhao, Qi},
title = {Visual Attention in Multi-Label Image Classification},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}