Monocular Depth Estimation Using Multi Scale Neural Network and Feature Fusion

Abhinav Sagar; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2022, pp. 656-662

Abstract


Depth estimation from monocular images is a challenging problem in computer vision. In this paper, we tackle this problem with a novel network architecture based on multi-scale feature fusion. Our network uses two different blocks: the first applies convolutions with different filter sizes and merges the resulting feature maps; the second uses dilated convolutions in place of fully connected layers, reducing computation and increasing the receptive field. We present a new loss function for training the network that combines a depth regression term, an SSIM loss term, and a multinomial logistic loss term. We train and test our network on the Make3D, NYU Depth V2, and KITTI datasets using standard evaluation metrics for depth estimation, namely RMSE and SILog. Our network outperforms previous state-of-the-art methods with fewer parameters.
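
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the paper, so the definitions below are assumptions: RMSE is the usual root mean squared error, and SILog follows the common scale-invariant log form, the square root of the variance of the per-pixel log-depth difference. The function names `rmse` and `silog` are hypothetical, and depths are passed as flat lists of positive values.

```python
import math


def rmse(pred, target):
    """Root mean squared error between paired depth values."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared) / len(squared))


def silog(pred, target):
    """Scale-invariant log error (assumed form, not taken from the paper).

    With d_i = log(pred_i) - log(target_i), returns
    sqrt(mean(d^2) - mean(d)^2). Some benchmarks multiply this by 100.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    mean_sq = sum(x * x for x in d) / len(d)
    mean = sum(d) / len(d)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))
```

Under this definition, a prediction that is wrong only by a global scale factor (for example, every depth doubled) has a large RMSE but a SILog of approximately zero, which is why the two metrics are typically reported together.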

Related Material


@InProceedings{Sagar_2022_WACV,
    author    = {Sagar, Abhinav},
    title     = {Monocular Depth Estimation Using Multi Scale Neural Network and Feature Fusion},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2022},
    pages     = {656-662}
}