Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding

Jay Bhanushali, Manivannan Muniyandi, Praneeth Chakravarthula; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1290-1300

Abstract


We present a cross-domain inference technique that learns from synthetic data to estimate depth and normals for in-the-wild omnidirectional 3D scenes encountered in real-world, uncontrolled settings. To this end, we introduce UBotNet, an architecture that combines UNet and Bottleneck Transformer elements to predict consistent scene normals and depth. We also introduce the OmniHorizon synthetic dataset, containing 24,335 omnidirectional images that represent a wide variety of outdoor environments, including buildings, streets, and diverse vegetation. This dataset is generated from expansive, lifelike virtual spaces and encompasses dynamic scene elements such as changing lighting conditions, different times of day, pedestrians, and vehicles. Our experiments show that UBotNet achieves significantly improved accuracy in depth estimation and normal estimation compared to existing models. Lastly, we validate cross-domain synthetic-to-real depth and normal estimation on real outdoor images using UBotNet trained solely on our synthetic OmniHorizon dataset, demonstrating the potential of both the synthetic dataset and the proposed network for real-world scene understanding applications.

Related Material


[bibtex]
@InProceedings{Bhanushali_2024_CVPR,
    author    = {Bhanushali, Jay and Muniyandi, Manivannan and Chakravarthula, Praneeth},
    title     = {Cross-Domain Synthetic-to-Real In-the-Wild Depth and Normal Estimation for 3D Scene Understanding},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1290-1300}
}