Scale Adaptive Fusion Network for RGB-D Salient Object Detection
RGB-D Salient Object Detection (SOD) is a fundamental problem in computer vision and relies heavily on multi-modal interaction between RGB and depth information. However, most existing approaches adopt the same fusion module to integrate RGB and depth features at multiple scales of the network, without distinguishing the unique attributes of different layers, e.g., the geometric information in the shallower scales, the structural features in the middle scales, and the semantic cues in the deeper scales. In this work, we propose a Scale Adaptive Fusion Network (SAFNet) for RGB-D SOD which employs scale-adaptive modules to fuse the RGB-D features. Specifically, for the shallow scale, we adopt an early fusion strategy that maps the 2D RGB-D images to a 3D point cloud and learns a unified representation of the geometric information in the 3D space. For the middle scale, we model the structural features of the two modalities by exploiting spatial contrast information from the depth space. For the deep scale, we design a depth-aware channel-wise attention module to enhance the semantic representation of the two modalities. Extensive experiments demonstrate the superiority of the scale-adaptive fusion strategy adopted by our method. The proposed SAFNet achieves favourable performance against state-of-the-art algorithms on six large-scale benchmarks.
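The abstract does not specify the exact formulation of the depth-aware channel-wise attention module, but the general idea of depth-guided channel attention can be sketched as follows: statistics pooled from the depth branch produce per-channel weights that rescale the RGB features. This is a minimal NumPy sketch under that assumption; the function name, the use of plain global average pooling, and the absence of a learned projection are all illustrative choices, not the paper's actual design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def depth_aware_channel_attention(rgb_feat, depth_feat):
    """Reweight RGB feature channels with weights derived from the depth branch.

    rgb_feat, depth_feat: arrays of shape (C, H, W) from the same network stage.
    Returns an array of shape (C, H, W).
    """
    # Global average pooling over the spatial dimensions of the depth
    # features -> one scalar descriptor per channel, shape (C,).
    pooled = depth_feat.mean(axis=(1, 2))
    # Squash descriptors to (0, 1) channel weights. In a trained network a
    # small learned MLP would typically sit between pooling and the gate.
    weights = sigmoid(pooled)
    # Broadcast multiply: each RGB channel is scaled by its depth-derived weight.
    return rgb_feat * weights[:, None, None]
```

A symmetric call with the roles of the two inputs swapped would enhance the depth features in the same way, so the same sketch covers attention in both directions across the two modalities.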