Learning Attribute-Driven Disentangled Representations for Interactive Fashion Retrieval

Yuxin Hou, Eleonora Vig, Michael Donoser, Loris Bazzani; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 12147-12157

Abstract


Interactive retrieval for online fashion shopping provides the ability to change image retrieval results according to user feedback. One common problem in interactive retrieval is that a specific user interaction (e.g., changing the color of a T-shirt) causes other aspects to change inadvertently (e.g., the results have a sleeve type different from that of the query). This is a consequence of existing methods learning visual representations that are entangled in the embedding space, which limits the controllability of the retrieved results. We propose to leverage the semantics of visual attributes to train convolutional networks that learn attribute-specific subspaces, one per attribute type, yielding disentangled representations. Operations such as swapping out a particular attribute value for another affect only the attribute at hand and leave the others untouched. We show that our model can be tailored to different retrieval tasks while maintaining its disentanglement property. We achieve state-of-the-art performance on three interactive fashion retrieval tasks: attribute manipulation retrieval, conditional similarity retrieval, and outfit complementary item retrieval. We will make code and models publicly available.
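
To make the core idea concrete, below is a minimal PyTorch-style sketch of attribute-specific subspaces and attribute swapping. All names, dimensions, and the prototype-based manipulation are illustrative assumptions for exposition, not the paper's actual implementation: a backbone feature is projected by one head per attribute type, and editing one attribute replaces only that sub-embedding.

    import torch
    import torch.nn as nn

    class DisentangledEmbedding(nn.Module):
        """Illustrative encoder: one projection head per attribute type,
        so each attribute lives in its own subspace of the embedding."""

        def __init__(self, backbone_dim=2048, subspace_dim=64,
                     attribute_types=("color", "sleeve", "neckline")):
            super().__init__()
            # Hypothetical attribute types; the real model would use the
            # attribute vocabulary of the dataset at hand.
            self.heads = nn.ModuleDict({
                attr: nn.Linear(backbone_dim, subspace_dim)
                for attr in attribute_types
            })

        def forward(self, backbone_feat):
            # One sub-embedding per attribute; concatenating them gives
            # the full disentangled representation.
            return {attr: head(backbone_feat)
                    for attr, head in self.heads.items()}

    def manipulate(embedding, attr, target_prototype):
        """Swap the sub-embedding of one attribute for a prototype of the
        target attribute value, leaving all other subspaces untouched."""
        edited = dict(embedding)
        edited[attr] = target_prototype
        return edited

    # Usage sketch: change only the color of a query item.
    feat = torch.randn(1, 2048)             # pooled backbone feature
    model = DisentangledEmbedding()
    emb = model(feat)                       # {"color": ..., "sleeve": ...}
    red = torch.randn(1, 64)                # stand-in prototype for "red"
    query = manipulate(emb, "color", red)   # other subspaces are unchanged

The edited embedding can then be used as the retrieval query, which is what makes the manipulation controllable: nearest neighbors differ from the original results only along the swapped attribute's subspace.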

Related Material


@InProceedings{Hou_2021_ICCV,
    author    = {Hou, Yuxin and Vig, Eleonora and Donoser, Michael and Bazzani, Loris},
    title     = {Learning Attribute-Driven Disentangled Representations for Interactive Fashion Retrieval},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {12147-12157}
}