Geometry Uncertainty Projection Network for Monocular 3D Object Detection

Lu, Yan; Ma, Xinzhu; Yang, Lei; Zhang, Tianzhu; Liu, Yating; Chu, Qi; Yan, Junjie; Ouyang, Wanli

Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Junjie Yan, Wanli Ouyang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 3111-3121

Abstract

Monocular 3D object detection has received increasing attention due to the wide application in autonomous driving. Existing works mainly focus on introducing geometry projection to predict depth priors for each object. Despite their impressive progress, these methods neglect the geometry leverage effect of the projection process, which leads to uncontrollable inferences and damage the training efficiency. In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to handle these problems, which can guide the model to learn more reliable depth outputs. The overall framework combines the uncertainty inference and the hierarchical task learning to reduce the negative effects of the geometry leverage. Specifically, an Uncertainty Geometry Projection module is proposed to obtain the geometry guided uncertainty of the inferred depth, which can not only benefit the geometry learning but also provide more reliable depth inferences to reduce the uncontrollableness caused by the geometry leverage. Besides, to reduce the instability in the training process caused by the geometry leverage effect, we propose a Hierarchical Task Learning strategy to control the overall optimization process. This learning algorithm can monitor the situation of each task through a well designed learning situation indicator and adaptively assign the proper loss weights for different tasks according to their learning situation and the hierarchical structure, which can significantly improve the stability and the efficiency of the training process. Extensive experiments demonstrate the effectiveness of the proposed method.The overall model can infer more reliable depth and location information than existing methods, which achieves the state-of-the-art performance on the KITTI benchmark.

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Lu_2021_ICCV, author = {Lu, Yan and Ma, Xinzhu and Yang, Lei and Zhang, Tianzhu and Liu, Yating and Chu, Qi and Yan, Junjie and Ouyang, Wanli}, title = {Geometry Uncertainty Projection Network for Monocular 3D Object Detection}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2021}, pages = {3111-3121} }