Seeing the Unseen: Predicting the First-Person Camera Wearer's Location and Pose in Third-Person Scenes

Yangming Wen, Krishna Kumar Singh, Markham Anderson, Wei-Pang Jan, Yong Jae Lee; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 3446-3455

Abstract


Our goal is to predict the camera wearer's location and pose in his/her environment from what is captured by his/her first-person wearable camera. Toward this goal, we first collect a new dataset in which the camera wearer performs various activities (e.g., opening a fridge, reading a book) in different scenes, recorded with time-synchronized first-person and stationary third-person cameras. We then propose a novel deep network architecture that takes as input the first-person video frames and an empty third-person scene image (without the camera wearer) and predicts the location and pose of the camera wearer. We compare our approach with several intuitive baselines and show promising initial results on this novel, challenging problem.
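The dataset pairs time-synchronized first-person and stationary third-person cameras. The sketch below shows one way such streams could be paired frame by frame: each first-person frame is matched to the nearest third-person frame in time. It assumes per-frame timestamps in seconds on a shared clock. The function `align_frames` and its `max_gap` tolerance are hypothetical illustrations and are not taken from the authors' code or data format.

```python
from bisect import bisect_left
from typing import List, Optional


def align_frames(
    ego_times: List[float],
    exo_times: List[float],
    max_gap: float = 0.05,
) -> List[Optional[int]]:
    """Hypothetical helper: for each first-person (ego) timestamp, return the
    index of the nearest third-person (exo) timestamp, or None when the
    closest exo frame is more than `max_gap` seconds away.

    Assumes both streams share one clock and `exo_times` is sorted ascending.
    """
    matches: List[Optional[int]] = []
    for t in ego_times:
        # Position where t would be inserted to keep exo_times sorted.
        i = bisect_left(exo_times, t)
        # The nearest exo frame is either just before t or at/after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(exo_times)]
        if not candidates:
            matches.append(None)  # no third-person frames at all
            continue
        best = min(candidates, key=lambda j: abs(exo_times[j] - t))
        # Reject matches that are too far apart in time.
        matches.append(best if abs(exo_times[best] - t) <= max_gap else None)
    return matches


if __name__ == "__main__":
    ego = [0.0, 0.033, 0.067, 0.1, 0.5]  # e.g. a faster first-person stream
    exo = [0.0, 0.1, 0.2]                # e.g. a slower third-person stream
    print(align_frames(ego, exo))
```

Binary search keeps each lookup logarithmic in the number of third-person frames. The `max_gap` check drops first-person frames with no third-person frame close enough in time. Under this assumed setup, a matched third-person frame could then supply the wearer's location and pose for the paired first-person frame.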

Related Material


@InProceedings{Wen_2021_ICCV,
    author    = {Wen, Yangming and Singh, Krishna Kumar and Anderson, Markham and Jan, Wei-Pang and Lee, Yong Jae},
    title     = {Seeing the Unseen: Predicting the First-Person Camera Wearer's Location and Pose in Third-Person Scenes},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2021},
    pages     = {3446-3455}
}