Leveraging Style and Content Features for Text Conditioned Image Retrieval

Pranit Chawla, Surgan Jandial, Pinkesh Badjatiya, Ayush Chopra, Mausoom Sarkar, Balaji Krishnamurthy; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 3978-3982

Abstract


Image search is a fundamental task that plays a significant role in the success of a wide variety of frameworks and applications. However, as product catalogues grow and the number of attributes per product increases, it has become difficult for users to express their needs effectively. We therefore focus on the problem of Image Retrieval with Text Feedback, which involves retrieving images that reflect the modifications a user describes in natural language feedback on a reference image. In this work, we hypothesise that since an image can be delineated by its content and style features, modifications to the image can also take place in these two subspaces. Hence, we decompose an input image into its corresponding style and content features, apply the modification described by the text feedback individually in the style and content spaces, and finally fuse the results for retrieval. Our experiments show that our approach outperforms TIRG, a recent state-of-the-art method for this task that uses a single vector rather than applying the text modification separately over the style and content spaces.
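
The pipeline described in the abstract (decompose, modify in each space, fuse, retrieve) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature extractors, the modification operator, or the fusion step. The sketch assumes, purely for illustration, that content is the channel-wise spatial mean of a feature map, that style is the channel-wise spatial standard deviation, that the text feedback has already been encoded into one additive offset per space, that fusion is concatenation, and that retrieval ranks by cosine similarity. All function names are hypothetical.

```python
import math

def content_features(fmap):
    # Assumed content statistic: mean activation of each channel.
    # fmap is a list of channels, each a list of spatial activations.
    return [sum(ch) / len(ch) for ch in fmap]

def style_features(fmap):
    # Assumed style statistic: standard deviation of each channel.
    feats = []
    for ch in fmap:
        mu = sum(ch) / len(ch)
        feats.append(math.sqrt(sum((x - mu) ** 2 for x in ch) / len(ch)))
    return feats

def modify(features, text_delta):
    # Hypothetical text-conditioned edit: add a text-derived offset.
    return [f + d for f, d in zip(features, text_delta)]

def compose_query(fmap, content_delta, style_delta):
    # Modify each space separately, then fuse by concatenation (assumed).
    content = modify(content_features(fmap), content_delta)
    style = modify(style_features(fmap), style_delta)
    return content + style

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, gallery):
    # gallery maps an image name to its fused embedding.
    # Returns names ordered from most to least similar to the query.
    return sorted(gallery, key=lambda name: cosine(query, gallery[name]), reverse=True)

# Toy example: 2 channels, 2 spatial positions each.
fmap = [[1.0, 3.0], [2.0, 2.0]]
query = compose_query(fmap, content_delta=[1.0, 0.0], style_delta=[0.0, 1.0])
gallery = {"a": [3.0, 2.0, 1.0, 1.0], "b": [-1.0, 0.0, 0.0, 2.0]}
ranking = retrieve(query, gallery)
```

In a real system the offsets would come from a learned text encoder and the modification would likely be a learned function rather than a plain addition; the sketch only shows how keeping two separate spaces differs from editing a single image vector.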

Related Material


@InProceedings{Chawla_2021_CVPR,
    author    = {Chawla, Pranit and Jandial, Surgan and Badjatiya, Pinkesh and Chopra, Ayush and Sarkar, Mausoom and Krishnamurthy, Balaji},
    title     = {Leveraging Style and Content Features for Text Conditioned Image Retrieval},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {3978-3982}
}