Towards Cycle-Consistent Models for Text and Image Retrieval

Marcella Cornia, Lorenzo Baraldi, Hamed R. Tavakoli, Rita Cucchiara; Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018

Abstract


Cross-modal retrieval has recently become a research hotspot, thanks to the development of deeply-learnable architectures. Such architectures generally learn a joint multi-modal embedding space into which text and images can be projected and compared. Here we investigate a different approach and reformulate the problem of cross-modal retrieval as that of learning a translation between the textual and visual domains. In particular, we propose an end-to-end trainable model that translates text into image features and vice versa, and regularizes this mapping with a cycle-consistency criterion. Preliminary experimental evaluations show promising results with respect to ordinary visual-semantic models.
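
The cycle-consistency idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract describes an end-to-end trainable model, whereas the code below assumes two fixed linear maps and hand-picked 2-D features. All names (`translate`, `cycle_loss`, `rank_images`, `W_t2i`, `W_i2t`) are hypothetical and chosen only for illustration. The sketch shows a round-trip reconstruction penalty in both directions, and retrieval done by translating a text query into image-feature space instead of comparing in a joint embedding space.

```python
import math

def translate(x, W):
    """Linear map of a feature vector x (length n) with an n x m weight matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def cycle_loss(text, image, W_t2i, W_i2t):
    """Squared reconstruction error after a round trip in each direction."""
    text_back = translate(translate(text, W_t2i), W_i2t)    # text -> image -> text
    image_back = translate(translate(image, W_i2t), W_t2i)  # image -> text -> image
    err_t = sum((a - b) ** 2 for a, b in zip(text_back, text))
    err_i = sum((a - b) ** 2 for a, b in zip(image_back, image))
    return err_t + err_i

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

def rank_images(text, image_feats, W_t2i):
    """Translate the text query into image-feature space, then rank by cosine similarity."""
    query = translate(text, W_t2i)
    scores = [cosine(query, img) for img in image_feats]
    return sorted(range(len(image_feats)), key=lambda k: -scores[k])

# Toy assumption: a 90-degree rotation and its inverse, so the cycle is exact.
W_t2i = [[0.0, 1.0], [-1.0, 0.0]]
W_i2t = [[0.0, -1.0], [1.0, 0.0]]
text = [1.0, 2.0]
images = [[2.0, -1.0], [-2.0, 1.0], [1.0, 2.0]]

loss = cycle_loss(text, images[1], W_t2i, W_i2t)
ranking = rank_images(text, images, W_t2i)
print(loss, ranking)
```

In a trained model the two mappings would presumably be learned networks, and a penalty of this kind would be added to the retrieval objective as a regularizer; here the maps are exact inverses, so the penalty vanishes and the matching image ranks first.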

Related Material


[bibtex]
@InProceedings{Cornia_2018_ECCV_Workshops,
author = {Cornia, Marcella and Baraldi, Lorenzo and Tavakoli, Hamed R. and Cucchiara, Rita},
title = {Towards Cycle-Consistent Models for Text and Image Retrieval},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV) Workshops},
month = {September},
year = {2018}
}