Fisher Encoded Convolutional Bag-Of-Windows for Efficient Image Retrieval and Social Image Tagging

Tiberio Uricchio, Marco Bertini, Lorenzo Seidenari, Alberto Del Bimbo; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2015, pp. 9-15

Abstract


In this paper we present an efficient and accurate method to aggregate a set of Deep Convolutional Neural Network (CNN) responses extracted from a set of image windows. CNN features are usually computed on the whole frame or with a dense multi-scale approach. There is evidence that using multiple windows yields a better image representation; nonetheless, it is still not clear how windows should be sampled and how CNN responses should be aggregated. Instead of sampling the image densely in scale and space, we show that selecting a few hundred windows is enough to obtain an effective image signature. We show how to use Fisher Vectors and PCA to obtain a short and highly descriptive signature that can be used effectively for image retrieval. We test our method on two relevant computer vision tasks: image retrieval and image tagging. We report state-of-the-art results for both tasks on three standard datasets.
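
As a rough illustration of the aggregation step the abstract describes, the sketch below encodes a set of per-window descriptors into a single Fisher Vector. It is not the authors' implementation: it uses the standard mean-gradient Fisher Vector with power and L2 normalization, assumes the diagonal GMM has already been fitted, and leaves out CNN feature extraction and PCA. The function name fisher_vector, the toy descriptors, and the GMM parameters are all hypothetical.

```python
import math


def fisher_vector(descriptors, weights, means, variances):
    """Mean-gradient Fisher Vector of a descriptor set under a diagonal GMM.

    descriptors: N vectors of length D (stand-ins for CNN window responses).
    weights, means, variances: parameters of a K-component diagonal GMM,
    assumed to be fitted beforehand (hypothetical values in the example).
    Returns a list of K*D floats, power- and L2-normalized.
    """
    n, k, d = len(descriptors), len(weights), len(means[0])
    fv = [0.0] * (k * d)
    for x in descriptors:
        # Log of the weighted Gaussian density for each component.
        log_p = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            log_p.append(ll)
        # Soft assignments (posteriors) via a numerically stable softmax.
        top = max(log_p)
        exp_p = [math.exp(lp - top) for lp in log_p]
        total = sum(exp_p)
        gamma = [e / total for e in exp_p]
        # Accumulate posterior-weighted, variance-normalized deviations.
        for j in range(k):
            for i in range(d):
                dev = (x[i] - means[j][i]) / math.sqrt(variances[j][i])
                fv[j * d + i] += gamma[j] * dev
    # Scale each component block by 1 / (N * sqrt(w_k)).
    for j in range(k):
        scale = 1.0 / (n * math.sqrt(weights[j]))
        for i in range(d):
            fv[j * d + i] *= scale
    # Power normalization (signed square root), then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv


# Toy example: four 2-D "window descriptors" and a 2-component GMM.
windows = [[0.9, 0.1], [1.1, -0.2], [-1.0, 0.3], [-0.8, -0.1]]
gmm_weights = [0.5, 0.5]
gmm_means = [[1.0, 0.0], [-1.0, 0.0]]
gmm_variances = [[1.0, 1.0], [1.0, 1.0]]

signature = fisher_vector(windows, gmm_weights, gmm_means, gmm_variances)
print(len(signature))
```

The signature has K*D entries regardless of how many windows are sampled, which is what makes the representation usable as a fixed-length descriptor for retrieval. A practical version would use a numerical library and real CNN responses in place of the toy lists.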

Related Material


[bibtex]
@InProceedings{Uricchio_2015_ICCV_Workshops,
author = {Uricchio, Tiberio and Bertini, Marco and Seidenari, Lorenzo and Del Bimbo, Alberto},
title = {Fisher Encoded Convolutional Bag-Of-Windows for Efficient Image Retrieval and Social Image Tagging},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {December},
year = {2015}
}