Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling

Hosnieh Sattar, Andreas Bulling, Mario Fritz; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2740-2748

Abstract


Predicting the target of visual search from human gaze data is a challenging problem. In contrast to previous work that focused on predicting specific instances of search targets, we propose the first approach to predict a target's category and attributes. However, state-of-the-art models for categorical recognition require large amounts of training data, which are prohibitively expensive to collect for gaze data. We thus propose a novel Gaze Pooling Layer that integrates gaze information and CNN-based features through an attention mechanism, incorporating both spatial and temporal aspects of gaze behaviour. We show that our approach can leverage pre-trained CNN architectures, thus eliminating the need for expensive joint data collection of image and gaze data. We demonstrate the effectiveness of our method on a new 14-participant dataset and indicate directions for future research in the gaze-based prediction of mental states.
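
The abstract does not specify how the Gaze Pooling Layer is computed. One plausible reading of "integrating gaze information and CNN-based features by an attention mechanism" is a gaze-weighted spatial pooling, sketched below as an illustration only and not as the authors' implementation. The function names `fixation_density_map` and `gaze_pool`, the Gaussian smoothing with `sigma`, and the use of fixation duration as the temporal weight are all assumptions made for this sketch. The feature maps stand in for the output of a pre-trained CNN.

```python
import math


def fixation_density_map(fixations, height, width, sigma=1.0):
    """Build a normalized H x W attention map from (row, col, duration) fixations.

    Assumption: each fixation contributes a Gaussian bump (spatial aspect)
    scaled by its duration (temporal aspect).
    """
    fdm = [[0.0] * width for _ in range(height)]
    for row, col, duration in fixations:
        for i in range(height):
            for j in range(width):
                dist_sq = (i - row) ** 2 + (j - col) ** 2
                fdm[i][j] += duration * math.exp(-dist_sq / (2.0 * sigma ** 2))
    total = sum(sum(r) for r in fdm)
    if total == 0.0:
        # No fixations: fall back to uniform weights (plain average pooling).
        return [[1.0 / (height * width)] * width for _ in range(height)]
    return [[v / total for v in r] for r in fdm]


def gaze_pool(feature_maps, fdm):
    """Gaze-weighted spatial pooling: returns one scalar per feature channel."""
    return [
        sum(w * f for f_row, w_row in zip(fmap, fdm) for f, w in zip(f_row, w_row))
        for fmap in feature_maps
    ]


# Toy stand-in for CNN features: 2 channels on a 2 x 2 spatial grid.
features = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 responds at the top-left cell
    [[0.0, 0.0], [0.0, 1.0]],  # channel 1 responds at the bottom-right cell
]
# Hypothetical fixations: mostly on the top-left cell, briefly on the bottom-right.
fixations = [(0, 0, 0.4), (0, 0, 0.3), (1, 1, 0.1)]

fdm = fixation_density_map(fixations, height=2, width=2, sigma=0.5)
pooled = gaze_pool(features, fdm)
```

In this toy setting the pooled response of channel 0 exceeds that of channel 1, because more fixation time fell on the region where channel 0 is active. Since the pooling only reweights existing feature maps, a sketch like this needs no gaze-specific training of the CNN, which is consistent with the abstract's point about reusing pre-trained architectures.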

Related Material


@InProceedings{Sattar_2017_ICCV,
author = {Sattar, Hosnieh and Bulling, Andreas and Fritz, Mario},
title = {Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}