Localized Triplet Loss for Fine-Grained Fashion Image Retrieval

Antonio D'Innocente, Nikhil Garg, Yuan Zhang, Loris Bazzani, Michael Donoser; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 3910-3915

Abstract


Fashion retrieval methods aim at learning a clothing-specific embedding space where images are ranked based on their global visual similarity with a given query. However, because of aggregation operations, global embeddings struggle to capture localized, fine-grained similarities between images. Our work addresses this problem by learning localized representations for fashion retrieval based on local interest points of prominent visual features specified by a user. We introduce a localized triplet loss function that compares samples based on corresponding patterns. We incorporate random local perturbation of the interest point as a key regularization technique to enforce local invariance of the visual representations. Due to the absence of existing fashion datasets for training localized representations, we introduce FashionLocalTriplets, a new high-quality dataset annotated by fashion specialists that contains triplets of women's dresses and interest points. The proposed model outperforms state-of-the-art global representations on FashionLocalTriplets.
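The abstract describes two core ideas: a triplet loss computed on local embeddings extracted at user-specified interest points, and random perturbation (jitter) of those points as a regularizer. The paper's exact architecture and loss details are not given here, so the following is only a minimal numpy sketch of the general idea; the function names, the squared-Euclidean distance, the margin value, and the jitter scheme are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def local_embedding(feature_map, point, jitter=0, rng=None):
    """Extract an L2-normalized local descriptor at an interest point.

    feature_map: (H, W, C) array, e.g. a CNN feature map.
    point: (row, col) interest point in feature-map coordinates.
    jitter: max random offset applied to the point during training
            (the random local perturbation mentioned in the abstract).
    """
    h, w, _ = feature_map.shape
    r, c = point
    if jitter and rng is not None:
        # Perturb the interest point to encourage local invariance.
        r += int(rng.integers(-jitter, jitter + 1))
        c += int(rng.integers(-jitter, jitter + 1))
    r = int(np.clip(r, 0, h - 1))
    c = int(np.clip(c, 0, w - 1))
    v = feature_map[r, c]
    return v / (np.linalg.norm(v) + 1e-8)

def localized_triplet_loss(fa, fp, fn, pa, pp, pn,
                           margin=0.2, jitter=1, rng=None):
    """Triplet loss on local descriptors at corresponding interest points.

    (fa, pa), (fp, pp), (fn, pn): anchor, positive, and negative
    feature maps with their interest points. The loss pulls the
    anchor and positive patterns together and pushes the negative
    pattern away by at least `margin`.
    """
    a = local_embedding(fa, pa, jitter, rng)
    p = local_embedding(fp, pp, jitter, rng)
    n = local_embedding(fn, pn, jitter, rng)
    d_ap = np.sum((a - p) ** 2)  # anchor-positive distance
    d_an = np.sum((a - n) ** 2)  # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)
```

In a training setup, the feature maps would come from a shared backbone and the loss would be backpropagated through it; the sketch above only illustrates how comparing descriptors at corresponding points, rather than pooled global embeddings, focuses the loss on the user-specified pattern.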

Related Material


[pdf]
[bibtex]
@InProceedings{D'Innocente_2021_CVPR,
  author    = {D'Innocente, Antonio and Garg, Nikhil and Zhang, Yuan and Bazzani, Loris and Donoser, Michael},
  title     = {Localized Triplet Loss for Fine-Grained Fashion Image Retrieval},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2021},
  pages     = {3910-3915}
}