VRT-Net: Real-Time Scene Parsing via Variable Resolution Transform

Jogendra Nath Kundu, Gaurav Singh Rajput, Venkatesh Babu RADHAKRISHNAN; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 2049-2056

Abstract


Urban scene parsing is a basic requirement for autonomous navigation systems, especially self-driving vehicles. Most available approaches employ generic image parsing architectures designed for segmenting object-focused scenes captured in indoor setups. However, images captured by car-mounted cameras exhibit strong perspective effects, causing a significant scale disparity between near and distant objects. Recognizing this, we formalize a unique Variable Resolution Transform (VRT) technique motivated by foveal magnification in the human eye. We then design a Fovea Estimation Network (FEN), which is trained to estimate the single most suitable fixation location, along with the associated magnification factor, for a given input image. The proposed framework is designed to be used as a wrapper over available real-time scene parsing models, and it demonstrates a superior trade-off between speed and quality compared to the prior state of the art.
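The abstract does not state the functional form of the transform, so the following is only an illustrative sketch of a foveal-style warp, not the paper's VRT. It assumes a separable mapping with a Gaussian sampling density around the fixation point and nearest-neighbor resampling. The function names, the `sigma` parameter, and the normalized `(row, col)` fixation convention are all hypothetical choices made for this example.

```python
import numpy as np


def foveal_axis_map(n_out, center, magnification, sigma=0.2, n_samples=1024):
    """Return n_out source coordinates in [0, 1] for one image axis.

    Assumed design (not from the paper): sampling density is 1 far from the
    fixation and `magnification` at the fixation, with a Gaussian falloff.
    """
    xs = np.linspace(0.0, 1.0, n_samples)
    density = 1.0 + (magnification - 1.0) * np.exp(
        -0.5 * ((xs - center) / sigma) ** 2
    )
    # Forward map: normalized cumulative density, strictly increasing 0 -> 1.
    cdf = np.cumsum(density)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])
    # Inverse map: source position for each uniformly spaced output position.
    u = np.linspace(0.0, 1.0, n_out)
    return np.interp(u, cdf, xs)


def variable_resolution_warp(image, fixation, magnification, out_shape=None):
    """Resample `image` so the region around `fixation` gets more pixels.

    `fixation` is a hypothetical (row, col) pair in normalized [0, 1] units.
    Uses nearest-neighbor sampling to keep the sketch dependency-free.
    """
    h, w = image.shape[:2]
    out_h, out_w = out_shape if out_shape is not None else (h, w)
    cy, cx = fixation
    ys = foveal_axis_map(out_h, cy, magnification)
    xs = foveal_axis_map(out_w, cx, magnification)
    rows = np.rint(ys * (h - 1)).astype(int)
    cols = np.rint(xs * (w - 1)).astype(int)
    return image[np.ix_(rows, cols)]
```

In a wrapper setup like the one the abstract describes, a fixation and magnification predicted per image would be passed to such a warp before the scene parsing model runs, and the predicted label map would then be mapped back to the original geometry with the inverse of the same coordinate map. That inverse step is omitted here.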

Related Material


[bibtex]
@InProceedings{Kundu_2020_WACV,
author = {Kundu, Jogendra Nath and Rajput, Gaurav Singh and RADHAKRISHNAN, Venkatesh Babu},
title = {VRT-Net: Real-Time Scene Parsing via Variable Resolution Transform},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}
}