TransFusion: Multi-Modal Fusion Network for Semantic Segmentation

Abhisek Maiti, Sander Oude Elberink, George Vosselman; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 6537-6547

Abstract


The complementary properties of 2D color images and 3D point clouds can potentially improve semantic segmentation compared to using uni-modal data. Multi-modal data fusion is challenging, however, due to the heterogeneity and differing dimensionality of the data, the difficulty of aligning the modalities to a common reference frame, and the presence of modality-specific bias. To address this, we propose a new model, TransFusion, for semantic segmentation that fuses images directly with point clouds, without the need for lossy pre-processing of the point clouds. TransFusion outperforms the baseline FCN model that uses images with depth maps: compared to the baseline, our method improves mIoU by 4% on the Vaihingen dataset and 2% on the Potsdam dataset. We demonstrate that the proposed model adequately learns spatial and structural information, resulting in better inference.
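The abstract does not specify TransFusion's fusion mechanism, but the idea of letting 3D points gather context directly from image features (rather than from a lossy depth-map rendering) can be illustrated with a minimal single-head cross-attention sketch in NumPy. All shapes, projection matrices, and the residual connection below are hypothetical stand-ins for learned components, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fusion(point_feats, image_feats, d_k=32, seed=0):
    """Fuse per-point features with image features via single-head
    cross-attention: each 3D point queries the image tokens.

    point_feats: (N, C) features for N 3D points
    image_feats: (M, C) features for M image tokens (flattened pixels/patches)
    Returns fused per-point features of shape (N, C).
    """
    rng = np.random.default_rng(seed)
    C = point_feats.shape[1]
    # Random projections stand in for learned weights (illustration only)
    Wq = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wk = rng.standard_normal((C, d_k)) / np.sqrt(C)
    Wv = rng.standard_normal((C, C)) / np.sqrt(C)

    Q = point_feats @ Wq                     # (N, d_k)
    K = image_feats @ Wk                     # (M, d_k)
    V = image_feats @ Wv                     # (M, C)
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (N, M) point-to-pixel weights
    fused = attn @ V                         # (N, C) image context per point
    return point_feats + fused               # residual keeps point identity

# Toy example: 5 points, 16 image tokens, 8-dim features
points = np.random.default_rng(1).standard_normal((5, 8))
pixels = np.random.default_rng(2).standard_normal((16, 8))
out = cross_attention_fusion(points, pixels)
print(out.shape)  # (5, 8)
```

Because the points attend to image tokens directly, no intermediate depth map is needed; alignment between the modalities is learned through the attention weights rather than imposed by a fixed projection.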

Related Material


[bibtex]
@InProceedings{Maiti_2023_CVPR,
  author    = {Maiti, Abhisek and Elberink, Sander Oude and Vosselman, George},
  title     = {TransFusion: Multi-Modal Fusion Network for Semantic Segmentation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2023},
  pages     = {6537-6547}
}