Modeling 4D Human-Object Interactions for Event and Object Recognition

Ping Wei, Yibiao Zhao, Nanning Zheng, Song-Chun Zhu; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013, pp. 3272-3279

Abstract


Recognizing events and objects in video sequences are two challenging tasks due to complex temporal structures and large appearance variations. In this paper, we propose a 4D human-object interaction model in which the two tasks jointly boost each other. The human-object interaction is defined in 4D space: i) the co-occurrence and geometric constraints of human pose and object in 3D space; ii) the transition of sub-events and the coherence of objects in the 1D temporal dimension. We represent the structure of events, sub-events, and objects in a hierarchical graph. For an input RGB-depth video, we design a dynamic programming beam search algorithm that simultaneously i) segments the video, ii) recognizes the events, and iii) detects the objects. For evaluation, we built a large-scale multiview 3D event dataset containing 3,815 video sequences and 383,036 RGBD frames captured by Kinect cameras. Experimental results on this dataset show the effectiveness of our method.
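The paper's dynamic programming beam search operates over its hierarchical event grammar; as a minimal illustration of the underlying beam-search idea only (not the authors' algorithm), the sketch below labels each frame with a hypothetical per-frame score table, keeps the top-k partial hypotheses at each step, and groups consecutive equal labels into temporal segments. All label names and scores here are invented for illustration.

```python
def beam_search_labels(frame_scores, beam_width=3):
    """Label each frame by beam search.

    frame_scores: list (one entry per frame) of dicts {label: score}.
    Returns (labels, total_score) for the best hypothesis found,
    keeping only `beam_width` partial hypotheses per frame.
    """
    beams = [([], 0.0)]  # list of (label sequence, accumulated score)
    for scores in frame_scores:
        candidates = []
        for labels, total in beams:
            for label, s in scores.items():
                candidates.append((labels + [label], total + s))
        # Prune: retain only the highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

def to_segments(labels):
    """Group consecutive identical labels into (label, start, end) segments."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments

if __name__ == "__main__":
    # Hypothetical per-frame scores for two sub-event labels.
    frame_scores = [
        {"reach": 0.9, "drink": 0.1},
        {"reach": 0.2, "drink": 0.8},
        {"reach": 0.1, "drink": 0.9},
    ]
    labels, score = beam_search_labels(frame_scores, beam_width=2)
    print(labels)               # best label per frame
    print(to_segments(labels))  # temporal segments (label, start, end)
```

In the full model, each hypothesis would also carry detected object locations and be scored against the pose-object co-occurrence and geometric constraints, rather than independent per-frame scores as assumed here.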

Related Material


[pdf]
[bibtex]
@InProceedings{Wei_2013_ICCV,
author = {Wei, Ping and Zhao, Yibiao and Zheng, Nanning and Zhu, Song-Chun},
title = {Modeling 4D Human-Object Interactions for Event and Object Recognition},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}
}