Learning Action Maps of Large Environments via First-Person Vision

Nicholas Rhinehart, Kris M. Kitani; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 580-588

Abstract


When people observe and interact with physical spaces, they are able to associate functionality with regions in the environment. Our goal is to automate the functional understanding of large spaces by leveraging activity demonstrations recorded from an egocentric viewpoint. The method we describe enables functionality estimation both in large scenes where people have been observed behaving and in novel scenes where no behavior observations are available. Our method learns and predicts "Action Maps", which encode the ability of a user to perform activities at various locations. By using an egocentric camera to observe demonstrations, our method scales with the size of the scene, avoids the need to mount multiple static surveillance cameras, and is well suited to observing activities up close. We demonstrate that by capturing appearance-based attributes of the environment and associating these attributes with activity demonstrations, our mathematical framework enables the prediction of Action Maps in new environments. Additionally, we take a preliminary look at the breadth of applicability of Action Maps through a proof-of-concept application in which they are used together with activity detections to perform localization.
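
To make the Action Map idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes an Action Map can be represented as a grid of per-location, per-activity scores accumulated from localized egocentric demonstrations. The grid size, activity labels, and demonstration format below are illustrative assumptions.

import numpy as np

# Discretized scene locations (rows, cols) and hypothetical activity labels.
H, W = 20, 30
activities = ["sit", "type", "read"]
action_map = np.zeros((H, W, len(activities)))

# Hypothetical demonstrations: (row, col, activity) triples such as an
# egocentric pipeline might produce by localizing the wearer per activity.
demos = [(4, 7, "sit"), (4, 8, "type"), (12, 20, "read")]

# Accumulate evidence for each observed (location, activity) pair.
for r, c, act in demos:
    action_map[r, c, activities.index(act)] += 1.0

# Normalize per activity so scores are comparable across the grid.
totals = action_map.sum(axis=(0, 1), keepdims=True)
scores = np.divide(action_map, totals,
                   out=np.zeros_like(action_map), where=totals > 0)

# Query the affordance score for "sit" at location (4, 7).
print(scores[4, 7, activities.index("sit")])

The paper's contribution lies in going beyond such direct accumulation: by associating appearance-based attributes of the environment with demonstrations, scores can be predicted for locations, and even entire scenes, where no demonstrations were observed.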

Related Material


BibTeX
@InProceedings{Rhinehart_2016_CVPR,
  author    = {Rhinehart, Nicholas and Kitani, Kris M.},
  title     = {Learning Action Maps of Large Environments via First-Person Vision},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2016},
  pages     = {580-588}
}