Key-Word-Aware Network for Referring Expression Image Segmentation

Hengcan Shi, Hongliang Li, Fanman Meng, Qingbo Wu; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 38-54


Referring expression image segmentation aims to segment the object referred to by a natural language query expression. Existing works usually handle this task without considering the specific properties of visual and textual information: they directly feed a foreground/background classifier with cascaded image and text features, which are extracted from each image region and from the whole query, respectively. On the one hand, they ignore that each word in a query expression contributes differently to identifying the desired object, which calls for differential treatment when extracting text features. On the other hand, the relationships among different image regions are not considered either, even though they are important for eliminating undesired foreground objects in accordance with the specific query. To address these issues, we propose a key-word-aware network, which contains a query attention model and a key-word-aware visual context model. When extracting text features, the query attention model assigns higher weights to the words that are more important for identifying the object. Meanwhile, the key-word-aware visual context model describes the relationships among different image regions according to the corresponding query. Our proposed method outperforms state-of-the-art methods on two referring expression image segmentation datasets.
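
The word-weighting idea behind the query attention model can be illustrated with a minimal sketch. This is not the network described in the paper: the dot-product scoring, the `scoring_vector` parameter, the function names, and the toy feature values are all assumptions chosen for illustration. It only shows how per-word weights, normalized with a softmax, can pool word features into a single query feature so that more relevant words dominate.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_query(word_features, scoring_vector):
    """Pool per-word features into one query feature using attention weights.

    word_features: list of equal-length float lists, one per word.
    scoring_vector: hypothetical learned vector (an assumption of this
        sketch); a word's relevance score is its dot product with it.
    Returns (weights, pooled_feature).
    """
    scores = [
        sum(f * v for f, v in zip(feat, scoring_vector))
        for feat in word_features
    ]
    weights = softmax(scores)
    dim = len(word_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, word_features))
        for d in range(dim)
    ]
    return weights, pooled


# Toy query with made-up 2-D word features (illustrative values only).
words = ["the", "man", "on", "left"]
features = [[0.1, 0.0], [1.0, 0.2], [0.0, 0.1], [0.3, 1.0]]
weights, query_feature = attend_query(features, [2.0, 2.0])
```

In this toy setup, "man" and "left" receive larger weights than "the" and "on", so the pooled query feature is driven mostly by the words that distinguish the target object.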

Related Material

author = {Shi, Hengcan and Li, Hongliang and Meng, Fanman and Wu, Qingbo},
title = {Key-Word-Aware Network for Referring Expression Image Segmentation},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}