GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning

Guangyan Chen, Te Cui, Meiling Wang, Chengcai Yang, Mengxiao Hu, Haoyang Lu, Yao Mu, Zicai Peng, Tianxing Zhou, Xinran Jiang, Yi Yang, Yufeng Yue; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 1756-1768

Abstract


Learning from demonstration is a powerful method for robotic skill acquisition. However, the significant expense of collecting action-labeled robot data presents a major bottleneck. Video data, a rich source of diverse behavioral and physical knowledge, emerges as a promising alternative. In this paper, we present GraphMimic, a novel paradigm that leverages video data via graph-to-graphs generative modeling, pre-training models to generate future graphs conditioned on the graph within a video frame. Specifically, GraphMimic abstracts video frames into object and visual action vertices and constructs graphs as state representations. The graph generative modeling network then models the internal structures and spatial relationships within the constructed graphs to generate future graphs. The generated graphs serve as conditions for the control policy, which maps them to robot actions. This concise approach captures important spatial relations and improves the accuracy of future graph generation, enabling the acquisition of robust policies from limited action-labeled data. Furthermore, the transferable graph representations facilitate effective learning of manipulation skills from cross-embodiment videos. Our experiments show that GraphMimic achieves superior performance using only 20% of the action-labeled data. Moreover, our method outperforms the state-of-the-art method by over 17% and 23% in simulation and real-world experiments, respectively, and delivers improvements of over 33% in cross-embodiment transfer experiments.
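The graph-to-graphs pipeline described above (frame → graph of object/action vertices → generated future graphs → policy action) can be illustrated with a minimal sketch. This is a toy conceptual illustration, not the authors' implementation: the vertex features, the distance-based edge rule, the constant-drift "generator", and the mean-displacement "policy" are all hypothetical stand-ins for the learned components.

```python
# Conceptual sketch of a graph-to-graphs pipeline, with toy stand-ins
# for the learned generative model and control policy (hypothetical).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Graph:
    # Object / visual-action vertices, reduced to toy 2-D positions.
    vertices: List[Tuple[float, float]]
    # Edges encode spatial relations between vertex pairs.
    edges: List[Tuple[int, int]] = field(default_factory=list)

def build_graph(positions, radius=1.0):
    """Abstract a frame into a graph: connect vertices closer than `radius`."""
    n = len(positions)
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if ((positions[i][0] - positions[j][0]) ** 2
            + (positions[i][1] - positions[j][1]) ** 2) ** 0.5 < radius
    ]
    return Graph(list(positions), edges)

def generate_future_graphs(g, horizon=3, drift=(0.1, 0.0)):
    """Toy stand-in for the generative model: roll each vertex forward."""
    futures, current = [], g
    for _ in range(horizon):
        moved = [(x + drift[0], y + drift[1]) for x, y in current.vertices]
        current = build_graph(moved)
        futures.append(current)
    return futures

def policy(current, future):
    """Toy policy conditioned on a generated graph: mean vertex displacement."""
    n = len(current.vertices)
    dx = sum(f[0] - c[0] for c, f in zip(current.vertices, future.vertices)) / n
    dy = sum(f[1] - c[1] for c, f in zip(current.vertices, future.vertices)) / n
    return (dx, dy)

g0 = build_graph([(0.0, 0.0), (0.5, 0.0), (2.0, 2.0)])
futures = generate_future_graphs(g0, horizon=3)
action = policy(g0, futures[0])
```

In the paper, the future-graph generator is a learned network and the policy maps generated graphs to real robot actions; here both are replaced by trivial geometric rules purely to show the data flow.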

Related Material


[pdf]
[bibtex]
@InProceedings{Chen_2025_CVPR,
    author    = {Chen, Guangyan and Cui, Te and Wang, Meiling and Yang, Chengcai and Hu, Mengxiao and Lu, Haoyang and Mu, Yao and Peng, Zicai and Zhou, Tianxing and Jiang, Xinran and Yang, Yi and Yue, Yufeng},
    title     = {GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {1756-1768}
}