Leveraging Style and Content Features for Text Conditioned Image Retrieval
Image search is a fundamental task playing a significant role in the success of a wide variety of frameworks and applications. However, with the increasing size of product catalogues and the number of attributes per product, it has become difficult for users to express their needs effectively. Therefore, we focus on the problem of Image Retrieval with Text Feedback, which involves retrieving modified images according to natural language feedback provided by users. In this work, we hypothesise that since an image can be delineated by its content and style features, modifications to the image can also take place in these two subspaces respectively. Hence, we decompose an input image into its corresponding style and content features, apply the modification expressed by the text feedback individually in both the style and content spaces, and finally fuse the results for retrieval. Our experiments show that our approach outperforms TIRG, a recent state-of-the-art method for this task, which applies the text modification through a single joint vector rather than over the style and content spaces separately.
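The pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes style is summarised by channel statistics (mean and standard deviation, as in AdaIN-style normalisation) with content as the normalised residual, and models the text-conditioned modification as a simple learned residual update in each space; the function names and projection matrices (`W_style`, `W_content`) are hypothetical.

```python
import numpy as np

def split_style_content(feat):
    # Style as channel-wise mean/std (an illustrative assumption);
    # content as the statistics-normalised residual.
    mu = feat.mean(axis=-1, keepdims=True)
    sigma = feat.std(axis=-1, keepdims=True) + 1e-8
    style = np.concatenate([mu, sigma], axis=-1)   # (B, 2)
    content = (feat - mu) / sigma                  # (B, D)
    return style, content

def modify(space_feat, text_feat, W):
    # Hypothetical text-conditioned modification: a bounded
    # residual update projected from the text embedding.
    return space_feat + np.tanh(text_feat @ W)

def compose(img_feat, text_feat, W_style, W_content):
    # Decompose, modify each subspace with the text feedback,
    # then fuse and L2-normalise for retrieval by cosine similarity.
    style, content = split_style_content(img_feat)
    style_mod = modify(style, text_feat, W_style)
    content_mod = modify(content, text_feat, W_content)
    fused = np.concatenate([style_mod, content_mod], axis=-1)
    return fused / np.linalg.norm(fused, axis=-1, keepdims=True)
```

At retrieval time, the fused query embedding would be compared against candidate image embeddings (e.g. by dot product over L2-normalised vectors) to rank the catalogue.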