OutfitTransformer: Learning Outfit Representations for Fashion Recommendation

Sarkar, Rohan; Bodla, Navaneeth; Vasileva, Mariya I.; Lin, Yen-Liang; Beniwal, Anurag; Lu, Alan; Medioni, Gerard

Rohan Sarkar, Navaneeth Bodla, Mariya I. Vasileva, Yen-Liang Lin, Anurag Beniwal, Alan Lu, Gerard Medioni; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3601-3609

Abstract

Learning an effective outfit-level representation is critical for predicting the compatibility of items in an outfit and for retrieving complementary items for a partial outfit. We present OutfitTransformer, a framework that uses task-specific tokens and the self-attention mechanism to learn outfit-level representations that encode the compatibility relations among all items in an outfit, addressing both compatibility prediction and complementary item retrieval. For compatibility prediction, we design an outfit token to capture a global outfit representation and train the framework using a classification loss. For complementary item retrieval, we design a target item token that additionally takes the target item specification (in the form of a category or text description) into consideration. We train the framework with a proposed set-wise outfit ranking loss to generate a target item embedding, given an outfit and a target item specification as inputs. The generated target item embedding is then used to retrieve compatible items that match the rest of the outfit. Additionally, we adopt a pre-training approach and a curriculum learning strategy to improve retrieval performance. Experiments show that our approach outperforms state-of-the-art methods on compatibility prediction, fill-in-the-blank, and complementary item retrieval tasks.
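
The outfit-token idea for compatibility prediction can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single self-attention layer with one head, random weights in place of trained ones, precomputed item embeddings, and a sigmoid readout scored with binary cross-entropy as one possible classification loss. All function and variable names (`compatibility_score`, `outfit_token`, `w_out`, and so on) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over all tokens.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v


def compatibility_score(item_embeddings, outfit_token, w_q, w_k, w_v, w_out):
    # Prepend the (assumed learnable) outfit token to the item embeddings,
    # let it attend to every item, then read the score off its output slot.
    tokens = np.vstack([outfit_token[None, :], item_embeddings])
    attended = self_attention(tokens, w_q, w_k, w_v)
    outfit_repr = attended[0]            # global outfit representation
    logit = outfit_repr @ w_out          # assumed linear classification head
    return 1.0 / (1.0 + np.exp(-logit))  # probability the outfit is compatible


def binary_cross_entropy(p, label, eps=1e-12):
    # One common classification loss; used here only for illustration.
    return -(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))


rng = np.random.default_rng(0)
dim = 8
items = rng.normal(size=(4, dim))        # stand-in embeddings for 4 items
outfit_token = rng.normal(size=dim)
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))
w_out = rng.normal(size=dim) * 0.1

score = compatibility_score(items, outfit_token, w_q, w_k, w_v, w_out)
loss = binary_cross_entropy(score, 1.0)  # label 1.0 = compatible outfit

# Same outfit with its items listed in reverse order.
score_reversed = compatibility_score(items[::-1], outfit_token, w_q, w_k, w_v, w_out)
```

Because this sketch adds no positional encoding, the outfit-token readout does not depend on the order in which items are listed, so `score` and `score_reversed` agree up to floating-point error.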

Related Material

@InProceedings{Sarkar_2023_WACV,
    author    = {Sarkar, Rohan and Bodla, Navaneeth and Vasileva, Mariya I. and Lin, Yen-Liang and Beniwal, Anurag and Lu, Alan and Medioni, Gerard},
    title     = {OutfitTransformer: Learning Outfit Representations for Fashion Recommendation},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {3601-3609}
}