Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents

Janika Deborah Gajo, Gerarld Paul Merales, Jerome Escarcha, Brenden Ashley Molina, Gian Nartea, Emmanuel G. Maminta, Juan Carlos Roldan, Rowel O. Atienza; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 2390-2399

Abstract


We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Addressing a gap in retail-specific simulation environments for embodied agent training, Sari Sandbox features over 250 interactive grocery items across three store configurations, controlled via an API. It supports both human interaction through virtual reality (VR) and an embodied agent powered by a vision language model (VLM). We also introduce SariBench, a dataset of annotated human demonstrations across varied task difficulties. Our sandbox enables embodied agents to navigate, inspect, and manipulate retail items, and provides baselines for comparison with human performance. We conclude with benchmarks, performance analysis, and recommendations for enhancing realism and scalability. The source code is available at https://github.com/upeee/sari-sandbox-env.

Related Material


@InProceedings{Gajo_2025_ICCV,
    author    = {Gajo, Janika Deborah and Merales, Gerarld Paul and Escarcha, Jerome and Molina, Brenden Ashley and Nartea, Gian and Maminta, Emmanuel G. and Roldan, Juan Carlos and Atienza, Rowel O.},
    title     = {Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2390-2399}
}