RGB-D Co-attention Network for Semantic Segmentation

Hao Zhou, Lu Qi, Zhaoliang Wan, Hai Huang, Xu Yang; Proceedings of the Asian Conference on Computer Vision (ACCV), 2020


Incorporating depth (D) information into RGB images has proven effective and robust for semantic segmentation. However, fusing the two remains a challenge because they carry different kinds of information: RGB represents color, whereas D represents depth. In this paper, we propose a Co-attention Network (CANet) to capture the fine-grained interplay between RGB and D features. The key part of CANet is its co-attention fusion part, which comprises three modules. First, the position and channel co-attention fusion modules adaptively fuse color and depth features along the spatial and channel dimensions, respectively. Finally, a final fusion module integrates the outputs of the two co-attention fusion modules to form a more representative feature. Our extensive experiments validate the effectiveness of CANet in fusing RGB and D features, achieving state-of-the-art performance on two challenging RGB-D semantic segmentation datasets: NYUDv2 and SUN-RGBD.
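The three-module fusion described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the layers, so the dot-product affinities, the residual additions, the averaging used as the final fusion, and all function names below are assumptions chosen only to show how attention over positions differs from attention over channels. Features are plain nested lists of shape C x N (channels by flattened spatial positions).

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # a is n x k, b is k x m; returns n x m.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def position_co_attention(rgb, depth):
    # Assumed form: each RGB position attends over all depth positions.
    # affinity[i][j] = softmax over j of <rgb[:, i], depth[:, j]>  (N x N)
    affinity = [softmax(r) for r in matmul(transpose(rgb), depth)]
    # attended[c][i] = sum over j of affinity[i][j] * depth[c][j]  (C x N)
    attended = matmul(depth, transpose(affinity))
    return add(rgb, attended)  # residual connection (assumption)

def channel_co_attention(rgb, depth):
    # Assumed form: each RGB channel attends over all depth channels.
    # affinity[c][d] = softmax over d of <rgb[c, :], depth[d, :]>  (C x C)
    affinity = [softmax(r) for r in matmul(rgb, transpose(depth))]
    attended = matmul(affinity, depth)  # C x N
    return add(rgb, attended)  # residual connection (assumption)

def canet_fusion_sketch(rgb, depth):
    # Stand-in for the final fusion module: a plain element-wise average
    # of the two co-attention outputs (assumption, not a learned module).
    pos = position_co_attention(rgb, depth)
    chn = channel_co_attention(rgb, depth)
    return [[(p + c) / 2 for p, c in zip(rp, rc)] for rp, rc in zip(pos, chn)]
```

For example, `canet_fusion_sketch(rgb, depth)` with two C x N inputs returns a fused C x N feature. In a real network the affinities would be computed from learned projections of the features, and the final fusion would itself be trainable.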

Related Material

@InProceedings{Zhou_2020_ACCV,
    author    = {Zhou, Hao and Qi, Lu and Wan, Zhaoliang and Huang, Hai and Yang, Xu},
    title     = {RGB-D Co-attention Network for Semantic Segmentation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {November},
    year      = {2020}
}