GRS: Generating Robotic Simulation Tasks from Real-World Images

Alex Zook, Fan-Yun Sun, Josef Spjut, Valts Blukis, Stan Birchfield, Jonathan Tremblay; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 594-603

Abstract


We introduce GRS (Generating Robotic Simulation tasks), a system addressing real-to-sim for robotic simulations. GRS creates digital twin simulations from single RGB-D observations with solvable tasks for virtual agent training. Using vision-language models (VLMs), our pipeline operates in three stages: 1) scene comprehension with SAM2 for segmentation and object description, 2) matching objects with simulation-ready assets, and 3) generating appropriate tasks. We ensure simulation-task alignment through generated test suites and introduce a router that iteratively refines both simulation and test code. Experiments demonstrate our system's effectiveness in object correspondence and task environment generation through our novel router mechanism.
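The refinement loop described in the abstract might look roughly like the following sketch. It is not the authors' implementation: the names `Artifacts`, `TestReport`, `refine`, and the four callables are hypothetical, and the toy stubs stand in for the VLM calls and simulator runs a real system would use. It only illustrates a router choosing, after each failed test run, whether to revise the simulation code or the test code.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Artifacts:
    """Hypothetical container for the generated simulation and test code."""
    sim_code: str
    test_code: str


@dataclass(frozen=True)
class TestReport:
    """Hypothetical result of running the generated test suite."""
    passed: bool
    log: str


def refine(
    artifacts: Artifacts,
    run_tests: Callable[[Artifacts], TestReport],
    route: Callable[[Artifacts, TestReport], str],
    fix_sim: Callable[[Artifacts, TestReport], str],
    fix_tests: Callable[[Artifacts, TestReport], str],
    max_iters: int = 5,
) -> Tuple[Artifacts, bool]:
    """Run tests; on failure let the router pick which artifact to revise."""
    for _ in range(max_iters):
        report = run_tests(artifacts)
        if report.passed:
            return artifacts, True
        target = route(artifacts, report)
        if target == "simulation":
            artifacts = Artifacts(fix_sim(artifacts, report), artifacts.test_code)
        elif target == "tests":
            artifacts = Artifacts(artifacts.sim_code, fix_tests(artifacts, report))
        else:
            raise ValueError(f"unknown routing target: {target!r}")
    # Budget exhausted: report whether the last revision happens to pass.
    return artifacts, run_tests(artifacts).passed


# --- Toy stubs (assumptions for illustration only) ---
def toy_run_tests(a: Artifacts) -> TestReport:
    issues = []
    if "bad_assert" in a.test_code:
        issues.append("bad_assert in tests")
    if "bug" in a.sim_code:
        issues.append("bug in simulation")
    return TestReport(passed=not issues, log="; ".join(issues))


def toy_route(a: Artifacts, r: TestReport) -> str:
    # A real router would presumably query a VLM; here we inspect the log.
    return "tests" if "bad_assert" in r.log else "simulation"


def toy_fix_sim(a: Artifacts, r: TestReport) -> str:
    return a.sim_code.replace(" # bug", "")


def toy_fix_tests(a: Artifacts, r: TestReport) -> str:
    return a.test_code.replace(" # bad_assert", "")


start = Artifacts("spawn(cube) # bug", "assert on_table(cube) # bad_assert")
final, ok = refine(start, toy_run_tests, toy_route, toy_fix_sim, toy_fix_tests)
```

In this toy run the router first sends the failure to the test code, then to the simulation code, and the loop stops once the suite passes. The iteration cap is an assumed safeguard, not something the abstract specifies.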

Related Material


[bibtex]
@InProceedings{Zook_2025_CVPR,
    author    = {Zook, Alex and Sun, Fan-Yun and Spjut, Josef and Blukis, Valts and Birchfield, Stan and Tremblay, Jonathan},
    title     = {GRS: Generating Robotic Simulation Tasks from Real-World Images},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {594-603}
}