Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation

Yunlong Liu, Osamu Yoshie, Hiroshi Watanabe; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 1245-1264

Abstract


Advances in deep learning have renewed research interest in the challenge of semantic segmentation in computer vision. This research investigates multi-modal semantic segmentation using RGB-D images as input, combining the two modalities of RGB and depth. For cross-modal calibration and fusion, it presents a novel FFCA module, which improves segmentation results by acquiring complementary information from the two modalities. The module is plug-and-play and can be combined with existing neural networks. To validate it, a multi-modal semantic segmentation network named FFCANet was designed, with a dual-branch encoder structure and a global context module, built on the classic combination of a ResNet backbone and DeepLabV3+. Compared with the baseline, the proposed model substantially improves accuracy on the semantic segmentation task.
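The abstract describes cross-modal calibration and fusion of RGB and depth features. The paper's exact FFCA design is not given here, so the following is only a minimal dependency-free sketch of one plausible interpretation: each modality's feature maps are globally average-pooled into channel descriptors, each modality's descriptors produce attention weights that recalibrate the *other* modality's channels, and the recalibrated features are summed. All function names and the specific weighting scheme are hypothetical illustrations, not the authors' implementation.

```python
import math

def global_avg_pool(feat):
    # feat: C x H x W feature maps as nested lists; returns one scalar per channel
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse_rgbd(rgb_feat, depth_feat):
    """Hypothetical cross-modal channel-attention fusion (not the paper's FFCA)."""
    rgb_desc = global_avg_pool(rgb_feat)
    depth_desc = global_avg_pool(depth_feat)
    # Cross-modal recalibration: each modality's weights come from the other modality
    rgb_w = [sigmoid(d) for d in depth_desc]    # depth gates RGB channels
    depth_w = [sigmoid(r) for r in rgb_desc]    # RGB gates depth channels
    # Weighted per-channel sum of the two recalibrated feature maps
    fused = []
    for c in range(len(rgb_feat)):
        h, w = len(rgb_feat[c]), len(rgb_feat[c][0])
        fused.append([[rgb_w[c] * rgb_feat[c][i][j] + depth_w[c] * depth_feat[c][i][j]
                       for j in range(w)] for i in range(h)])
    return fused
```

Because the fused output has the same channel, height, and width dimensions as either input branch, a module of this shape can be dropped between encoder stages of a dual-branch network without changing the rest of the architecture, which is consistent with the plug-and-play claim in the abstract.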

Related Material


[bibtex]
@InProceedings{Liu_2022_ACCV,
  author    = {Liu, Yunlong and Yoshie, Osamu and Watanabe, Hiroshi},
  title     = {Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2022},
  pages     = {1245-1264}
}