SANPO: A Scene Understanding Accessibility and Human Navigation Dataset
Abstract
Vision is essential for human navigation. The World Health Organization (WHO) estimates that 43.3 million people were blind in 2020, and this number is projected to reach 61 million by 2050. Modern scene understanding models could empower these people by assisting them with navigation, obstacle avoidance, and visual recognition. To build such systems, the research community needs high-quality datasets for both training and evaluation. While datasets for autonomous vehicles are abundant, there is a critical gap in datasets tailored for outdoor human navigation. This gap poses a major obstacle to the development of computer-vision-based assistive technologies. To overcome this obstacle, we present SANPO, a large-scale egocentric video dataset designed for dense prediction in outdoor human navigation environments. SANPO contains 701 stereo videos of 30+ seconds each, captured in diverse real-world outdoor environments across four geographic locations in the USA. Every frame has a high-resolution depth map, and 112K frames are annotated with temporally consistent dense video panoptic segmentation labels. The dataset also includes 1,961 high-quality synthetic videos with pixel-accurate depth and panoptic segmentation annotations, complementing the noisier real-world annotations with high-precision synthetic ones. SANPO is publicly available and is already being used by applications such as Project Guideline to train mobile models that help low-vision users run independently.
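
To make the annotation structure concrete, below is a minimal Python sketch of how a single SANPO-style sample could be represented. The field names, array shapes, and the panoptic label divisor are illustrative assumptions (a common encoding convention), not the dataset's actual schema; consult the official dataset documentation for the real format.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class SanpoFrame:
        """Hypothetical container for one annotated SANPO-style frame.

        Field names and shapes are illustrative assumptions, not the
        dataset's actual schema.
        """
        left_rgb: np.ndarray   # (H, W, 3) uint8, left stereo image
        right_rgb: np.ndarray  # (H, W, 3) uint8, right stereo image
        depth: np.ndarray      # (H, W) float32, per-pixel depth
        panoptic: np.ndarray   # (H, W) int32, combined panoptic label map

    # Assumed label divisor: many panoptic encodings pack labels as
    # panoptic_id = semantic_id * divisor + instance_id.
    LABEL_DIVISOR = 1000

    def decode_panoptic(panoptic: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Split a combined panoptic label map into semantic and instance ids."""
        return panoptic // LABEL_DIVISOR, panoptic % LABEL_DIVISOR

    # Usage with dummy arrays standing in for a real decoded video frame:
    h, w = 480, 640
    frame = SanpoFrame(
        left_rgb=np.zeros((h, w, 3), np.uint8),
        right_rgb=np.zeros((h, w, 3), np.uint8),
        depth=np.ones((h, w), np.float32),
        panoptic=np.zeros((h, w), np.int32),
    )
    semantic, instance = decode_panoptic(frame.panoptic)

The per-frame pairing of stereo RGB, depth, and a temporally consistent panoptic map is what distinguishes this dataset from single-task segmentation or depth benchmarks.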
Related Material
[pdf] [supp] [arXiv] [bibtex]
@InProceedings{Waghmare_2025_WACV,
    author    = {Waghmare, Sagar M. and Wilber, Kimberly and Hawkey, Dave and Yang, Xuan and Wilson, Matthew and Debats, Stephanie and Nuengsigkapian, Cattalyya and Sharma, Astuti and Pandikow, Lars and Wang, Huisheng and Adam, Hartwig and Sirotenko, Mikhail},
    title     = {SANPO: A Scene Understanding Accessibility and Human Navigation Dataset},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {7855-7864}
}