WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects Under Occlusion

Khiem Vuong, N Dinesh Reddy, Robert Tamburo, Srinivasa G. Narasimhan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 9514-9524

Abstract


Current methods for 2D and 3D object understanding struggle with severe occlusions in busy urban environments, partly due to the lack of large-scale labeled ground-truth annotations for learning occlusion. In this work, we introduce a novel framework for automatically generating a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery. By leveraging off-the-shelf 2D (bounding box, segmentation, keypoint) and 3D (pose, shape) predictions as pseudo-groundtruth, unoccluded 3D objects are identified automatically and composited into the background in a clip-art style, ensuring realistic appearances and physically accurate occlusion configurations. The resulting clip-art images with pseudo-groundtruth enable efficient training of object reconstruction methods that are robust to occlusions. Our method demonstrates significant improvements in both 2D and 3D reconstruction, particularly in scenarios with heavily occluded objects such as vehicles and people in urban scenes.
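The clip-art compositing idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several stated assumptions:

- The camera is static, so each cut-out object is pasted back at the pixel location where it was observed.
- An object-free background plate is already available.
- Each unoccluded object comes with a binary mask of its full extent.
- Depth order is approximated by the lowest image row of each mask. This is a simple ground-plane heuristic assumed here, not necessarily the ordering used in the paper.

The names `ObjectLayer`, `bottom_row`, and `composite_clip_art` are hypothetical. Layers are pasted far to near (painter's algorithm). The surviving visible masks, together with the full masks, act as occlusion pseudo-groundtruth.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class ObjectLayer:
    """Hypothetical container for one unoccluded object cut out of a time-lapse frame.

    Assumes a static camera: the object is pasted back at the same pixel
    location it was observed at, with no geometric warping.
    """
    rgb: np.ndarray   # H x W x 3 frame the object was cut from
    mask: np.ndarray  # H x W bool, True on the object's pixels (full, unoccluded extent)


def bottom_row(mask: np.ndarray) -> int:
    """Lowest image row covered by the mask, or -1 if the mask is empty."""
    rows = np.nonzero(mask.any(axis=1))[0]
    return int(rows.max()) if rows.size else -1


def composite_clip_art(background, layers):
    """Paste layers far-to-near onto a clean background (painter's algorithm).

    Assumed depth heuristic: for a camera looking down at a ground plane, an
    object whose footprint is lower in the image (larger bottom_row) is closer.
    Returns the composite, each layer's visible mask, and its occluded fraction.
    """
    image = background.copy()
    owner = np.full(background.shape[:2], -1, dtype=int)  # nearest layer index per pixel
    for i in sorted(range(len(layers)), key=lambda k: bottom_row(layers[k].mask)):
        m = layers[i].mask
        image[m] = layers[i].rgb[m]  # nearer layers overwrite farther ones
        owner[m] = i
    visible = [owner == i for i in range(len(layers))]
    occluded = [
        1.0 - v.sum() / max(int(layer.mask.sum()), 1)
        for v, layer in zip(visible, layers)
    ]
    return image, visible, occluded


# Usage: two overlapping square "objects" on a 4 x 4 black background.
H, W = 4, 4
bg = np.zeros((H, W, 3))
far = np.zeros((H, W), dtype=bool)
far[0:3, 0:3] = True   # footprint ends at row 2, so it is treated as farther
near = np.zeros((H, W), dtype=bool)
near[1:4, 1:4] = True  # footprint ends at row 3, so it is treated as nearer
layers = [
    ObjectLayer(np.full((H, W, 3), 1.0), far),
    ObjectLayer(np.full((H, W, 3), 2.0), near),
]
img, vis, occ = composite_clip_art(bg, layers)
print(vis[0].sum(), vis[1].sum(), round(float(occ[0]), 3))
```

Because every pasted object was originally unoccluded, its full mask is known exactly. The occluded fraction of each composited object can therefore be computed rather than annotated, which is what makes this style of data generation attractive as a source of labels.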

Related Material


BibTeX
@InProceedings{Vuong_2024_CVPR,
    author    = {Vuong, Khiem and Reddy, N Dinesh and Tamburo, Robert and Narasimhan, Srinivasa G.},
    title     = {WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects Under Occlusion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {9514-9524}
}