Multi-Level Fusion Based 3D Object Detection From Monocular Images

Bin Xu, Zhenzhong Chen; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2345-2353

Abstract


In this paper, we present an end-to-end deep learning based framework for 3D object detection from a single monocular image. A deep convolutional neural network is introduced for simultaneous 2D and 3D object detection. First, 2D region proposals are generated through a region proposal network. Then the shared features are learned within the proposals to predict the class probability, 2D bounding box, orientation, dimension, and 3D location. We adopt a stand-alone module to predict the disparity and extract features from the computed point cloud. Thus features from the original image and the point cloud will be fused in different levels for accurate 3D localization. The estimated disparity is also used for front view feature encoding to enhance the input image,regarded as an input-fusionprocess. The proposed algorithm can directly output both 2D and 3D object detection results in an end-to-end fashion with only a single RGB image as the input. The experimental results on the challenging KITTI benchmark demonstrate that our algorithm significantly outperforms the state-of-the-art methods with only monocular images.

Related Material


[pdf]
[bibtex]
@InProceedings{Xu_2018_CVPR,
author = {Xu, Bin and Chen, Zhenzhong},
title = {Multi-Level Fusion Based 3D Object Detection From Monocular Images},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}
}