Following Gaze in Video

Adria Recasens, Carl Vondrick, Aditya Khosla, Antonio Torralba; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1435-1443

The gaze of people in videos is an important signal for understanding people and their actions. In this paper, we present an approach for following gaze in video by predicting where a person in the video is looking, even when the object being looked at appears in a different frame. We collect VideoGaze, a new dataset that we use as a benchmark to both train and evaluate models. Given one frame with a person in it, our model estimates a density for the gaze location in every frame and the probability that the person is looking in that particular frame. A key aspect of our approach is an end-to-end model that jointly estimates saliency, gaze pose, and geometric relationships between views while using only gaze as supervision. Visualizations suggest that the model learns to solve these intermediate tasks internally, without additional supervision. Experiments show that our approach follows gaze in video better than existing approaches, enabling a richer understanding of human activities in video.
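The abstract describes two outputs per target frame: a gaze-location density and the probability that the person is looking in that frame. The sketch below shows one simple way such outputs could be combined into a single prediction. It weights each frame's density by that frame's probability and then picks the highest-scoring (frame, row, column) cell. This is an illustrative assumption, not the authors' published architecture. The function names `normalize`, `combine`, and `most_likely_target` are hypothetical, and densities are modeled as small plain-Python grids.

```python
from typing import List, Tuple

# Hypothetical representation: a gaze density over one frame as a 2D grid of non-negative scores.
Grid = List[List[float]]


def normalize(grid: Grid) -> Grid:
    """Rescale a non-negative grid so its cells sum to 1 (a discrete density)."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        raise ValueError("grid must have positive mass")
    return [[v / total for v in row] for row in grid]


def combine(frame_probs: List[float], densities: List[Grid]) -> List[Grid]:
    """Assumed combination rule: scale each frame's gaze density by the
    probability that the person is looking somewhere in that frame."""
    if len(frame_probs) != len(densities):
        raise ValueError("need one probability per frame")
    out: List[Grid] = []
    for p, grid in zip(frame_probs, densities):
        if not 0.0 <= p <= 1.0:
            raise ValueError("frame probabilities must lie in [0, 1]")
        dens = normalize(grid)
        out.append([[p * v for v in row] for row in dens])
    return out


def most_likely_target(scores: List[Grid]) -> Tuple[int, int, int]:
    """Return (frame, row, col) of the highest combined score across all frames."""
    best = None
    for t, grid in enumerate(scores):
        for r, row in enumerate(grid):
            for c, v in enumerate(row):
                if best is None or v > best[0]:
                    best = (v, t, r, c)
    if best is None:
        raise ValueError("no scores given")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Toy input: frame 0 is unlikely to contain the gaze target, frame 1 is likely.
    scores = combine([0.1, 0.9], [[[1, 0], [0, 0]], [[0, 1], [1, 2]]])
    print(most_likely_target(scores))  # (1, 1, 1)
```

With this rule, a sharply peaked density in a frame the person is unlikely to be looking at (frame 0 above) can still lose to a flatter density in a frame with a high looking probability. That is the behavior a per-frame probability is meant to provide.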

Related Material

author = {Recasens, Adria and Vondrick, Carl and Khosla, Aditya and Torralba, Antonio},
title = {Following Gaze in Video},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}