Scene Representation in Bird's-Eye View From Surrounding Cameras With Transformers

Yun Zhao, Yu Zhang, Zhan Gong, Hong Zhu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4511-4519

Abstract


Scene representation in the bird's-eye-view (BEV) coordinate frame provides a succinct and effective way for autonomous vehicles and robots to understand their surroundings. In this work, we present an end-to-end architecture that generates the BEV representation from surrounding cameras. We propose a transformer-based encoder-decoder structure that translates image features from the different cameras into the BEV frame, exploiting both the context within each image and the relationships between images from different views. We perform multiple semantic segmentation tasks using the BEV features. Experimental results show that our model outperforms a competitive baseline, demonstrating the effectiveness and efficiency of our method.
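The core idea of translating multi-camera image features into a BEV grid can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head with no learned projections, a hypothetical rig of 6 cameras, random stand-in features, and one query vector per BEV cell. It covers only the cross-view step in which BEV queries attend over tokens from all cameras; the encoder that models context within each image is omitted. All names (`cross_attention`, `build_bev_features`, `NUM_CAMERAS`, and so on) are illustrative.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens):
    """Scaled dot-product attention: each query attends over all tokens.

    Simplification (assumption): keys and values are the tokens themselves,
    with no learned projection matrices and a single head.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ti for qi, ti in zip(q, t)) for t in tokens]
        weights = softmax(scores)
        # Weighted sum of token vectors, one output vector per query.
        outputs.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


def build_bev_features(camera_features, bev_queries):
    """Flatten tokens from every camera so each BEV cell can draw on all views."""
    tokens = [tok for cam in camera_features for tok in cam]
    return cross_attention(bev_queries, tokens)


# Hypothetical sizes, chosen only to keep the example small.
NUM_CAMERAS, TOKENS_PER_CAMERA, DIM = 6, 4, 8
BEV_H, BEV_W = 3, 3

random.seed(0)
camera_features = [
    [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(TOKENS_PER_CAMERA)]
    for _ in range(NUM_CAMERAS)
]
bev_queries = [
    [random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(BEV_H * BEV_W)
]

bev_features = build_bev_features(camera_features, bev_queries)
print(len(bev_features), len(bev_features[0]))
```

In a full model, a segmentation head would then operate on the resulting BEV feature grid; that part is outside this sketch.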

Related Material


@InProceedings{Zhao_2022_CVPR,
    author    = {Zhao, Yun and Zhang, Yu and Gong, Zhan and Zhu, Hong},
    title     = {Scene Representation in Bird's-Eye View From Surrounding Cameras With Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {4511-4519}
}