@InProceedings{Zhao_2022_CVPR,
    author    = {Zhao, Yun and Zhang, Yu and Gong, Zhan and Zhu, Hong},
    title     = {Scene Representation in Bird's-Eye View From Surrounding Cameras With Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {4511-4519}
}
Scene Representation in Bird's-Eye View From Surrounding Cameras With Transformers
Abstract
Scene representation in the bird's-eye-view (BEV) coordinate frame provides a succinct and effective way to understand surrounding environments for autonomous vehicles and robotics. In this work, we present an end-to-end architecture to generate the BEV representation from surrounding cameras. To generate the BEV representation, we propose a transformer-based encoder-decoder structure to translate the image features from different cameras into the BEV frame, which takes advantage of the context information in the individual image and the relationship between images in different views. We perform multiple semantic segmentation tasks using the BEV features. Experimental results show that our model outperforms the competitive baseline, which demonstrates the effectiveness and efficiency of our method.
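The abstract does not spell out the layers of the encoder-decoder, so the following is only an illustrative sketch of one common way such a view translation can be realized: a set of BEV grid queries cross-attends to image tokens pooled from all cameras. Everything in it is an assumption for illustration and not the authors' implementation: the single-head attention, the names `bev_cross_attention`, `cam_feats`, and `bev_queries`, the random weights, and all sizes.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bev_cross_attention(cam_feats, bev_queries, w_q, w_k, w_v):
    """Single-head cross-attention from BEV queries to multi-camera tokens.

    cam_feats:   (n_cams, n_tokens, d) image features, one token set per camera
    bev_queries: (n_bev, d) one query vector per BEV grid cell (hypothetical)
    Returns the BEV features (n_bev, d) and the attention weights
    (n_bev, n_cams * n_tokens).
    """
    n_cams, n_tokens, d = cam_feats.shape
    # Flatten all views into one sequence so each BEV cell can attend
    # to tokens from every camera at once.
    memory = cam_feats.reshape(n_cams * n_tokens, d)
    q = bev_queries @ w_q
    k = memory @ w_k
    v = memory @ w_v
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return attn @ v, attn


# Toy sizes, chosen only for the example.
rng = np.random.default_rng(0)
n_cams, n_tokens, d = 6, 20, 16
bev_h, bev_w = 8, 8

cam_feats = rng.normal(size=(n_cams, n_tokens, d))
bev_queries = rng.normal(size=(bev_h * bev_w, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

bev_feats, attn = bev_cross_attention(cam_feats, bev_queries, w_q, w_k, w_v)
bev_map = bev_feats.reshape(bev_h, bev_w, d)  # feature map laid out on the BEV grid
```

In a full model, a feature map like `bev_map` would feed the segmentation heads; here it only shows how features from several views can end up in a single BEV-shaped tensor.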