HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment

Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 516-526

Abstract


Humans naturally interact with both others and the surrounding multiple objects engaging in various social activities. However recent advances in modeling human-object interactions mostly focus on perceiving isolated individuals and objects due to fundamental data scarcity. In this paper we introduce HOI-M^3 a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects. Notably it provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs covering 199 sequences and 181M frames of diverse humans and objects under rich activities. With the unique HOI-M^3 dataset we introduce two novel data-driven tasks with companion strong baselines: monocular capture and unstructured generation of multiple human-object interactions. Extensive experiments demonstrate that our dataset is challenging and worthy of further research about multiple human-object interactions and behavior analysis. Our HOI-M^3 dataset corresponding codes and pre-trained models will be disseminated to the community for future research.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Zhang_2024_CVPR, author = {Zhang, Juze and Zhang, Jingyan and Song, Zining and Shi, Zhanhe and Zhao, Chengfeng and Shi, Ye and Yu, Jingyi and Xu, Lan and Wang, Jingya}, title = {HOI-M{\textasciicircum}3: Capture Multiple Humans and Objects Interaction within Contextual Environment}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {516-526} }