Improving Bird's Eye View Semantic Segmentation by Task Decomposition

Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 15512-15521

Abstract


Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline directly predicting the BEV segmentation map from monocular RGB inputs. However the challenge arises when the RGB inputs and BEV targets from distinct perspectives making the direct point-to-point predicting hard to optimize. In this paper we decompose the original BEV segmentation task into two stages namely BEV map reconstruction and RGB-BEV feature alignment. In the first stage we train a BEV autoencoder to reconstruct the BEV segmentation maps given corrupted noisy latent representation which urges the decoder to learn fundamental knowledge of typical BEV patterns. The second stage involves mapping RGB input images into the BEV latent space of the first stage directly optimizing the correlations between the two views at the feature level. Our approach simplifies the complexity of combining perception and generation into distinct steps equipping the model to handle intricate and challenging scenes effectively. Besides we propose to transform the BEV segmentation map from the Cartesian to the polar coordinate system to establish the column-wise correspondence between RGB images and BEV maps. Moreover our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation and saves computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.

Related Material


[pdf]
[bibtex]
@InProceedings{Zhao_2024_CVPR, author = {Zhao, Tianhao and Chen, Yongcan and Wu, Yu and Liu, Tianyang and Du, Bo and Xiao, Peilun and Qiu, Shi and Yang, Hongda and Li, Guozhen and Yang, Yi and Lin, Yutian}, title = {Improving Bird's Eye View Semantic Segmentation by Task Decomposition}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {15512-15521} }