Sequential Learning for Cross-Modal Retrieval

Ge Song, Xiaoyang Tan; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019


Cross-modal retrieval has attracted increasing attention with the rapid growth of multimodal data, but its learning paradigm in changing environments has received less study. Inspired by recent findings on the cognitive mechanisms by which the human brain acquires knowledge, we propose a new sequential learning method for cross-modal retrieval. In this method, a single unified model captures the common knowledge shared across modalities, but it is learned sequentially, so that it adapts to the evolving distribution of the different modalities and requires no laborious alignment of the multimodal data before learning. Furthermore, we propose a novel meta-learning-based method to overcome the catastrophic forgetting encountered in sequential learning. Extensive experiments on three popular multimodal datasets show that our method achieves state-of-the-art cross-modal retrieval performance without any modality alignment.
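The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a toy linear "unified model" shared by all modalities, data that arrives one modality (stage) at a time, and, as a swapped-in technique, a Reptile-style first-order meta-update combined with a small replay memory to mitigate catastrophic forgetting. All names (`sequential_train`, `reptile_update`, `make_stage`, etc.), the toy data, and the hyperparameters are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical toy setup: each sample is (feature vector, regression target).
Sample = Tuple[List[float], float]


def predict(w: List[float], x: List[float]) -> float:
    """Linear 'unified model' shared by all modalities (toy stand-in)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_step(w: List[float], x: List[float], y: float, lr: float) -> List[float]:
    """One SGD step on the squared loss 0.5 * (w.x - y)^2."""
    err = predict(w, x) - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]


def reptile_update(meta_w: List[float], adapted_w: List[float], epsilon: float) -> List[float]:
    """Reptile-style outer step: move meta weights a fraction epsilon toward adapted ones."""
    return [m + epsilon * (a - m) for m, a in zip(meta_w, adapted_w)]


def sequential_train(stages: List[List[Sample]], dim: int, inner_steps: int = 20,
                     lr: float = 0.05, epsilon: float = 0.5,
                     memory_size: int = 8, seed: int = 0) -> List[float]:
    """Learn one shared model from stages that arrive one after another.

    Each stage (e.g. one modality's data) gets a few inner SGD steps on its own
    samples mixed with replayed samples from earlier stages, followed by a
    Reptile-style meta-update. Hypothetical; not the paper's algorithm.
    """
    rng = random.Random(seed)
    meta_w = [0.0] * dim
    memory: List[Sample] = []
    for stage in stages:
        adapted = list(meta_w)
        pool = stage + memory  # current-stage data plus replay memory
        for _ in range(inner_steps):
            x, y = rng.choice(pool)
            adapted = sgd_step(adapted, x, y, lr)
        meta_w = reptile_update(meta_w, adapted, epsilon)
        # Keep a few random samples of this stage for replay in later stages.
        memory.extend(rng.sample(stage, min(len(stage), memory_size)))
    return meta_w


def mse(w: List[float], data: List[Sample]) -> float:
    """Mean squared error of the shared model on one stage's data."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(1)
    true_w = [1.0, -1.0, 2.0, 0.5]

    def make_stage(active: Tuple[int, int], n: int = 50) -> List[Sample]:
        # Each toy "modality" only populates two of the four feature dimensions.
        data = []
        for _ in range(n):
            x = [0.0] * 4
            for i in active:
                x[i] = rng.uniform(-1.0, 1.0)
            data.append((x, predict(true_w, x)))
        return data

    modality_a, modality_b = make_stage((0, 1)), make_stage((2, 3))
    w = sequential_train([modality_a, modality_b], dim=4, inner_steps=200)
    print("MSE on modality A:", mse(w, modality_a))
    print("MSE on modality B:", mse(w, modality_b))
```

Evaluating the error on the earlier modality after training on the later one is the simplest way to see whether forgetting occurs; the replay memory and the interpolating meta-update are two common, generic ways to limit it.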


author = {Song, Ge and Tan, Xiaoyang},
title = {Sequential Learning for Cross-Modal Retrieval},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2019}