HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment
Abstract
Humans naturally interact both with other people and with multiple surrounding objects, engaging in various social activities. However, recent advances in modeling human-object interactions mostly focus on perceiving isolated individuals and objects due to fundamental data scarcity. In this paper, we introduce HOI-M^3, a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects. Notably, it provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs, covering 199 sequences and 181M frames of diverse humans and objects engaged in rich activities. With the unique HOI-M^3 dataset, we introduce two novel data-driven tasks with strong companion baselines: monocular capture and unstructured generation of multiple human-object interactions. Extensive experiments demonstrate that our dataset is challenging and merits further research on multiple human-object interaction and behavior analysis. Our HOI-M^3 dataset, corresponding code, and pre-trained models will be disseminated to the community for future research.
Related Material

[bibtex]
@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Juze and Zhang, Jingyan and Song, Zining and Shi, Zhanhe and Zhao, Chengfeng and Shi, Ye and Yu, Jingyi and Xu, Lan and Wang, Jingya},
    title     = {HOI-M{\textasciicircum}3: Capture Multiple Humans and Objects Interaction within Contextual Environment},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {516-526}
}