Multi-Task Learning Using Multi-Modal Encoder-Decoder Networks With Shared Skip Connections

Ryohei Kuga, Asako Kanezaki, Masaki Samejima, Yusuke Sugano, Yasuyuki Matsushita; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2017, pp. 403-411

Abstract


Multi-task learning is a promising approach for efficiently and effectively addressing multiple mutually related recognition tasks. Many scene understanding tasks, such as semantic segmentation and depth prediction, can be framed as cross-modal encoding/decoding, and hence most prior work has used multi-modal datasets for multi-task learning. However, the inter-modal commonalities, such as those across image, depth, and semantic labels, have not been fully exploited. We propose multi-modal encoder-decoder networks to harness the multi-modal nature of multi-task scene recognition. In addition to the shared latent representation among encoder-decoder pairs, our model also has shared skip connections from different encoders. By combining these two representation sharing mechanisms, the proposed method efficiently learns a shared feature representation. Experimental validation shows the advantage of our method over baseline encoder-decoder networks and multi-modal auto-encoders.
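
To make the architectural idea concrete, below is a minimal Python/PyTorch sketch (not the authors' code) of one way to realize per-modality encoders and decoders with a shared latent representation and shared skip connections. The two modalities ("rgb", "depth"), the layer sizes, and the use of simple feature averaging to implement the sharing are illustrative assumptions, not details taken from the paper.

# Minimal sketch, assuming PyTorch and two example modalities (RGB, depth).
# Each modality has its own encoder and decoder; encoders map into a shared
# latent space, and encoder skip features are shared across modalities
# (here by averaging, an assumption for illustration) before being fed to
# every decoder.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Downsampling block: halves spatial resolution.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU())

def deconv_block(in_ch, out_ch):
    # Upsampling block: doubles spatial resolution.
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1), nn.ReLU())

class Encoder(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.enc1 = conv_block(in_ch, 32)   # produces skip level 1
        self.enc2 = conv_block(32, 64)      # produces skip level 2
        self.enc3 = conv_block(64, 128)     # produces the latent code
    def forward(self, x):
        s1 = self.enc1(x)
        s2 = self.enc2(s1)
        z = self.enc3(s2)
        return z, [s1, s2]

class Decoder(nn.Module):
    def __init__(self, out_ch):
        super().__init__()
        self.dec3 = deconv_block(128, 64)
        self.dec2 = deconv_block(64 + 64, 32)   # concatenates shared skip level 2
        self.dec1 = nn.ConvTranspose2d(32 + 32, out_ch, 4, stride=2, padding=1)
    def forward(self, z, skips):
        s1, s2 = skips
        h = self.dec3(z)
        h = self.dec2(torch.cat([h, s2], dim=1))
        return self.dec1(torch.cat([h, s1], dim=1))

class MultiModalEncDec(nn.Module):
    def __init__(self, modal_channels):  # e.g. {"rgb": 3, "depth": 1}
        super().__init__()
        self.encoders = nn.ModuleDict({m: Encoder(c) for m, c in modal_channels.items()})
        self.decoders = nn.ModuleDict({m: Decoder(c) for m, c in modal_channels.items()})
    def forward(self, inputs):
        # Encode every available modality.
        latents, skip_sets = [], []
        for m, x in inputs.items():
            z, skips = self.encoders[m](x)
            latents.append(z)
            skip_sets.append(skips)
        # Share representations: average latent codes and skip features across encoders.
        z_shared = torch.stack(latents).mean(dim=0)
        shared_skips = [torch.stack(level).mean(dim=0) for level in zip(*skip_sets)]
        # Every decoder predicts its own modality from the shared representations.
        return {m: dec(z_shared, shared_skips) for m, dec in self.decoders.items()}

# Example usage with random tensors (spatial size divisible by 8).
model = MultiModalEncDec({"rgb": 3, "depth": 1})
outputs = model({"rgb": torch.randn(2, 3, 64, 64), "depth": torch.randn(2, 1, 64, 64)})

In a multi-task setting, each decoder's output would be compared against its modality's ground truth (e.g. a segmentation or depth loss), so that all encoders and decoders are trained jointly through the shared latent code and shared skip features.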

Related Material


[pdf]
[bibtex]
@InProceedings{Kuga_2017_ICCV,
author = {Kuga, Ryohei and Kanezaki, Asako and Samejima, Masaki and Sugano, Yusuke and Matsushita, Yasuyuki},
title = {Multi-Task Learning Using Multi-Modal Encoder-Decoder Networks With Shared Skip Connections},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}