Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation
Advances in deep learning have renewed research interest in semantic segmentation, a long-standing difficult problem in computer vision. This research investigates multi-modal semantic segmentation on RGB-D images, which combine two modalities: RGB and depth. For cross-modal calibration and fusion, it presents a novel FFCA module, which improves segmentation results by acquiring complementary information from the different modalities. The module is plug-and-play and can be used with existing neural networks. To test its validity, a multi-modal semantic segmentation network named FFCANet has been designed, with a dual-branch encoder structure and a global context module, built on the classic combination of ResNet and DeepLabV3+. Compared with the baseline, the proposed model substantially improves the accuracy of the semantic segmentation task.
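To make the idea of cross-modal calibration and fusion concrete, the following is a minimal sketch of one generic way to fuse RGB and depth feature maps with channel attention. It is not the FFCA module itself, whose internal design is not specified here. The function names (`fuse_rgbd`, `channel_means`, and the helpers) are hypothetical, and the gates are computed directly from pooled activations, whereas a real module would presumably use learned layers.

```python
import math


def channel_means(feat):
    """Global average pooling: one scalar per channel of a C x H x W nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def scale(feat, gates):
    """Multiply every value in channel c by gates[c]."""
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]


def add(a, b):
    """Element-wise sum of two feature maps with identical shape."""
    return [
        [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def fuse_rgbd(rgb_feat, depth_feat):
    """Hypothetical cross-modal channel-attention fusion (an assumption, not FFCA).

    Each modality is summarised per channel, turned into a gate in (0, 1),
    and used to inject its gated features into the other branch.
    """
    if len(rgb_feat) != len(depth_feat):
        raise ValueError("both modalities must have the same number of channels")

    # Per-channel gates derived from each modality's own pooled descriptor.
    rgb_gate = [sigmoid(m) for m in channel_means(rgb_feat)]
    depth_gate = [sigmoid(m) for m in channel_means(depth_feat)]

    # Cross-modal calibration: each branch keeps its own features and
    # receives the other modality's gated features as a residual.
    rgb_cal = add(rgb_feat, scale(depth_feat, depth_gate))
    depth_cal = add(depth_feat, scale(rgb_feat, rgb_gate))

    # Fusion: a simple element-wise sum of the two calibrated branches.
    fused = add(rgb_cal, depth_cal)
    return rgb_cal, depth_cal, fused


# Toy input: one channel, 2 x 2 spatial size per modality.
rgb = [[[0.0, 0.0], [0.0, 0.0]]]
depth = [[[2.0, 2.0], [2.0, 2.0]]]
rgb_cal, depth_cal, fused = fuse_rgbd(rgb, depth)
```

Because the function takes two feature maps of the same shape and returns maps of that shape, a block of this kind could in principle be placed between matching stages of a dual-branch encoder, which is one way to read the plug-and-play property described above.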