Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos

Yifei Huang, Minjie Cai, Hiroshi Kera, Ryo Yonetani, Keita Higuchi, Yoichi Sato; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2313-2321

Abstract


This work aims to develop a computer-vision technique for understanding objects jointly attended by a group of people during social interactions. As a key tool to discover such objects of joint attention, we rely on a collection of wearable eye-tracking cameras that provide a first-person video of interaction scenes and points-of-gaze data of interacting parties. Technically, we propose a hierarchical conditional random field-based model that can 1) localize events of joint attention temporally and 2) segment objects of joint attention spatially. We show that by alternating these two procedures, objects of joint attention can be discovered reliably even from cluttered scenes and noisy points-of-gaze data. Experimental results demonstrate that our approach outperforms several state-of-the-art methods for co-segmentation and joint attention discovery.

Related Material


[bibtex]
@InProceedings{Huang_2017_ICCV,
author = {Huang, Yifei and Cai, Minjie and Kera, Hiroshi and Yonetani, Ryo and Higuchi, Keita and Sato, Yoichi},
title = {Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}