Visual Saliency Based on Multiscale Deep Features

Guanbin Li, Yizhou Yu; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5455-5463


Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this paper, we discover that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks(CNN), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for extracting features at three different scales. Our learned saliency model is capable of achieving state-of-the-art performance on all public benchmarks. We then propose a refinement method to enhance the spatial coherence of our saliency results. Finally, we point out that aggregating multiple saliency maps computed for different levels of image segmentation can further boost the performance, yielding saliency maps better than those generated from a single region decomposition. To promote further research and evaluation of visual saliency models, we also construct a large database of 4447 challenging images and their pixelwise saliency annotation. Experimental results demonstrate that our proposed method significantly outperforms all existing saliency estimation techniques, improving the F-Measure by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset, and lowering the mean absolute error by 5.7% and 35.1% respectively on the same two datasets.

Related Material

author = {Li, Guanbin and Yu, Yizhou},
title = {Visual Saliency Based on Multiscale Deep Features},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2015}