MMSS: Multi-Modal Sharable and Specific Feature Learning for RGB-D Object Recognition

Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham; The IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1125-1133

Abstract


Most feature-learning methods for RGB-D object recognition either learn features from the color and depth modalities separately, or simply treat RGB-D as undifferentiated four-channel data; neither approach adequately exploits the relationship between the modalities. Motivated by the intuition that different modalities should contain not only some modal-specific patterns but also some shared common patterns, we propose a multi-modal feature learning framework for RGB-D object recognition. We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by the features of the different modalities. In this way, we obtain features reflecting both the shared properties and the modal-specific properties of the different modalities. Information from the multi-modal learning framework is back-propagated to the early CNN layers. Experimental results show that our proposed multi-modal feature learning method outperforms state-of-the-art approaches on two widely used RGB-D object benchmark datasets.
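The idea of a common part shared across modalities, alongside modal-specific parts, can be illustrated with a small toy sketch. This is not the formulation used in the paper, which the abstract does not spell out. The function names, the fixed split point `n_shared`, the squared-difference penalty, and the averaging-based fusion are all assumptions made purely for illustration. In the actual method these parts are learned by multi-modal layers on top of the CNNs.

```python
from typing import List, Tuple


def split_shared_specific(feature: List[float], n_shared: int) -> Tuple[List[float], List[float]]:
    """Split a feature vector into a (hypothetical) shared part and a specific part.

    Assumption: the first n_shared entries are the shared part.
    """
    if not 0 <= n_shared <= len(feature):
        raise ValueError("n_shared must be between 0 and len(feature)")
    return feature[:n_shared], feature[n_shared:]


def shared_penalty(color: List[float], depth: List[float], n_shared: int) -> float:
    """Squared difference between the shared parts of the two modalities.

    Assumption: a penalty of this kind is one simple way to push the shared
    parts of the color and depth features toward each other during training.
    """
    color_shared, _ = split_shared_specific(color, n_shared)
    depth_shared, _ = split_shared_specific(depth, n_shared)
    return sum((c - d) ** 2 for c, d in zip(color_shared, depth_shared))


def fuse(color: List[float], depth: List[float], n_shared: int) -> List[float]:
    """Build a fused feature: averaged shared part, then both specific parts."""
    color_shared, color_specific = split_shared_specific(color, n_shared)
    depth_shared, depth_specific = split_shared_specific(depth, n_shared)
    shared = [(c + d) / 2 for c, d in zip(color_shared, depth_shared)]
    return shared + color_specific + depth_specific


if __name__ == "__main__":
    # Toy feature vectors standing in for the outputs of the color and depth CNNs.
    color_feature = [1.0, 2.0, 3.0, 4.0]
    depth_feature = [1.0, 4.0, 5.0, 6.0]
    print(shared_penalty(color_feature, depth_feature, n_shared=2))
    print(fuse(color_feature, depth_feature, n_shared=2))
```

In this toy setup, a penalty of zero would mean the two modalities agree exactly on the shared part, while the specific parts remain free to differ.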

Related Material


@InProceedings{Wang_2015_ICCV,
author = {Wang, Anran and Cai, Jianfei and Lu, Jiwen and Cham, Tat-Jen},
title = {MMSS: Multi-Modal Sharable and Specific Feature Learning for RGB-D Object Recognition},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2015}
}