Learning to Predict Gaze in Egocentric Video

Yin Li, Alireza Fathi, James M. Rehg; The IEEE International Conference on Computer Vision (ICCV), 2013, pp. 3216-3223


We present a model for gaze prediction in egocentric video by leveraging the implicit cues that exist in the camera wearer's behaviors. Specifically, we compute the camera wearer's head motion and hand location from the video and combine them to estimate where the eyes look. We further model the dynamic behavior of the gaze, in particular fixations, as latent variables to improve the gaze prediction. Our gaze prediction results outperform the state-of-the-art algorithms by a large margin on publicly available egocentric vision datasets. In addition, we demonstrate a significant performance boost in recognizing daily actions and segmenting foreground objects when our gaze predictions are plugged into state-of-the-art methods.
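The paper's model is learned from data and its exact features are not given in this abstract. Purely as a hypothetical illustration of the cue-fusion idea (combining a head-motion cue with a hand-location cue to pick a gaze point), the sketch below fuses two invented Gaussian prior maps with assumed weights `w_head` and `w_hand` and takes the maximum of the combined map; none of these names or values come from the paper.

```python
import numpy as np

def gaussian_map(h, w, center, sigma):
    """A 2D Gaussian prior over image coordinates, peaked at `center` (row, col)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def predict_gaze(head_motion_prior, hand_prior, w_head=0.5, w_hand=0.5):
    """Hypothetical fusion: weighted sum of the two cue maps, argmax as the gaze point."""
    combined = w_head * head_motion_prior + w_hand * hand_prior
    return np.unravel_index(np.argmax(combined), combined.shape)

# Toy example: the head-motion cue peaks upper-left, the hand cue lower-right.
H, W = 120, 160
head = gaussian_map(H, W, center=(40, 50), sigma=20.0)
hand = gaussian_map(H, W, center=(80, 110), sigma=20.0)
gaze = predict_gaze(head, hand, w_head=0.7, w_hand=0.3)
print(gaze)  # with these weights the fused maximum sits at the head-cue peak
```

With the head cue weighted more heavily, the fused maximum lands at the head-motion peak; more balanced weights would pull the prediction toward the hand. This is only a static sketch and omits the paper's latent fixation dynamics entirely.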

Related Material

@InProceedings{Li_2013_ICCV,
author = {Li, Yin and Fathi, Alireza and Rehg, James M.},
title = {Learning to Predict Gaze in Egocentric Video},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}
}