ODAM: Object Detection, Association, and Mapping Using Posed RGB Video

Kejie Li, Daniel DeTone, Yu Fan (Steven) Chen, Minh Vo, Ian Reid, Hamid Rezatofighi, Chris Sweeney, Julian Straub, Richard Newcombe; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 5998-6008

Abstract


Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep-learning-based front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN). Based on these frame-to-model associations, our back-end optimizes object bounding volumes, represented as super-quadrics, under multi-view geometry constraints and an object scale prior. We validate the proposed system on ScanNet, where we show a significant improvement over existing RGB-only methods.
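
To make the super-quadric representation mentioned in the abstract concrete, the following is a minimal sketch of the standard super-quadric inside-outside function in an object-centred frame. The abstract does not give ODAM's exact parameterization, so the function name, the argument names (`scale`, `eps1`, `eps2`), and the choice of this textbook form are assumptions made for illustration only.

```python
import numpy as np


def superquadric_inside_outside(points, scale, eps1=1.0, eps2=1.0):
    """Evaluate the standard super-quadric inside-outside function.

    Hypothetical helper, not taken from the ODAM code base.
    points: array-like of shape (..., 3), expressed in the object's own frame.
    scale:  the three semi-axis lengths (a1, a2, a3).
    eps1, eps2: shape exponents; 1.0 gives an ellipsoid, values near 0
                give an increasingly box-like volume.
    Returns F, where F < 1 is inside, F == 1 on the surface, F > 1 outside.
    """
    # Absolute values keep the fractional powers real for negative coordinates.
    p = np.abs(np.asarray(points, dtype=float)) / np.asarray(scale, dtype=float)
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    xy_term = (x ** (2.0 / eps2) + y ** (2.0 / eps2)) ** (eps2 / eps1)
    return xy_term + z ** (2.0 / eps1)


if __name__ == "__main__":
    corner = [0.9, 0.9, 0.9]
    unit = [1.0, 1.0, 1.0]
    # The same near-corner point is tested against a sphere and a box-like shape.
    print(superquadric_inside_outside(corner, unit, eps1=1.0, eps2=1.0))
    print(superquadric_inside_outside(corner, unit, eps1=0.1, eps2=0.1))
```

With both exponents set to 1 the volume is an ellipsoid, and a point near the corner of the enclosing cube lies outside it. With small exponents the volume approaches a box and the same point lies inside, which is why a single parametric family can cover both rounded and box-like object extents.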

Related Material


[bibtex]
@InProceedings{Li_2021_ICCV,
    author    = {Li, Kejie and DeTone, Daniel and Chen, Yu Fan (Steven) and Vo, Minh and Reid, Ian and Rezatofighi, Hamid and Sweeney, Chris and Straub, Julian and Newcombe, Richard},
    title     = {ODAM: Object Detection, Association, and Mapping Using Posed RGB Video},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {5998-6008}
}