Visual-GPS: Ego-Downward and Ambient Video Based Person Location Association

Liang Yang, Hao Jiang, Zhouyuan Huo, Jizhong Xiao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

Abstract


In a crowded and cluttered environment, identifying a particular person is a challenging problem. Current identification approaches cannot handle dynamic environments. In this paper, we tackle the problem of identifying and tracking a person of interest in a crowded environment using egocentric and third-person view videos. We propose a novel method (Visual-GPS) to identify, track, and localize the person who is capturing the egocentric video, using joint analysis of imagery from both videos. The output of our method is the bounding box of the target person detected in each frame of the third-person view, together with the person's 3D metric trajectory. At first glance, the views of the two cameras are quite different. This paper offers insight into how they are correlated. Our proposed method uses several different cues. In addition to RGB images, we take advantage of both body motion and action features to correlate the two views. We track and localize the person by finding the most "correlated" individual in the third-person view. Furthermore, the target person's 3D trajectory is recovered based on the mapping between 2D and 3D body joints. Our experiments confirm the effectiveness of the ETVIT network and show an 18.32% improvement in detection accuracy over the baseline methods.
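
As an informal illustration of the "most correlated individual" idea, the sketch below picks, among several people tracked in the third-person view, the one whose motion signal best matches the motion signal from the egocentric camera. It is not the method of the paper: the use of Pearson correlation on a one-dimensional motion-magnitude signal, the function names `pearson` and `most_correlated_person`, and the toy data are all assumptions made for this example, standing in for the learned body-motion and action features the abstract refers to.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (zero variance).
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def most_correlated_person(ego_signal, candidate_signals):
    """Return (person_id, score) for the candidate best matching the ego signal.

    ego_signal: per-frame motion magnitude from the egocentric video
                (hypothetical feature, assumed for this sketch).
    candidate_signals: dict mapping a person id to that person's per-frame
                motion magnitude in the third-person view.
    """
    if not candidate_signals:
        raise ValueError("no candidates to compare against")
    scores = {pid: pearson(ego_signal, sig) for pid, sig in candidate_signals.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data: five frames of motion magnitude per person (made up for illustration).
ego = [0.1, 0.9, 0.2, 0.8, 0.1]
candidates = {
    "person_a": [0.5, 0.5, 0.6, 0.4, 0.5],  # nearly static
    "person_b": [0.2, 1.0, 0.3, 0.7, 0.2],  # moves in step with the ego camera
    "person_c": [0.9, 0.1, 0.8, 0.2, 0.9],  # moves out of phase
}
best_id, best_score = most_correlated_person(ego, candidates)
print(best_id)
```

On this toy input the selected candidate is `person_b`, whose signal rises and falls together with the egocentric one. A real system would presumably compare richer, learned features over many frames rather than a single hand-made scalar per frame.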

Related Material


[bibtex]
@InProceedings{Yang_2019_CVPR_Workshops,
author = {Yang, Liang and Jiang, Hao and Huo, Zhouyuan and Xiao, Jizhong},
title = {Visual-GPS: Ego-Downward and Ambient Video Based Person Location Association},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}