Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding From Fashion Images

Wei-Lin Hsiao, Kristen Grauman; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 4203-4212

Abstract


What defines a visual style? Fashion styles emerge organically from how people assemble outfits of clothing, making them difficult to pin down with a computational model. Low-level visual similarity can be too specific to detect stylistically similar images, while manually crafted style categories can be too abstract to capture subtle style differences. We propose an unsupervised approach to learn a style-coherent representation. Our method leverages probabilistic polylingual topic models based on visual attributes to discover a set of latent style factors. Given a collection of unlabeled fashion images, our approach mines for the latent styles, then summarizes outfits by how they mix those styles. Our approach can organize galleries of outfits by style without requiring any style labels. Experiments on over 100K images demonstrate its promise for retrieving, mixing, and summarizing fashion images by their style.
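As a rough illustration of the "mine latent styles, then describe each outfit as a mixture of them" idea, the sketch below fits a plain single-vocabulary LDA topic model with collapsed Gibbs sampling over per-image attribute lists. This is a swapped-in simplification, not the polylingual topic model the paper uses, and the attribute names, the function name `fit_style_topics`, the number of styles, and the hyperparameters `alpha` and `beta` are all assumptions made only for this example.

```python
import random


def fit_style_topics(outfits, num_styles, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Plain LDA via collapsed Gibbs sampling (a stand-in, not the paper's model).

    outfits: list of lists of attribute strings, one list per image.
    Returns (theta, phi, vocab): theta[d] is the style mixture of outfit d,
    phi[k] is the attribute distribution of latent style k.
    """
    rng = random.Random(seed)
    vocab = sorted({a for outfit in outfits for a in outfit})
    index = {a: i for i, a in enumerate(vocab)}
    V, K, D = len(vocab), num_styles, len(outfits)

    doc_style = [[0] * K for _ in range(D)]   # attributes in outfit d assigned to style k
    style_attr = [[0] * V for _ in range(K)]  # times attribute w is assigned to style k
    style_total = [0] * K                     # total attributes assigned to style k
    assign = []                               # current style of every attribute occurrence

    # Random initial assignment of each attribute occurrence to a style.
    for d, outfit in enumerate(outfits):
        z_d = []
        for a in outfit:
            k, w = rng.randrange(K), index[a]
            z_d.append(k)
            doc_style[d][k] += 1
            style_attr[k][w] += 1
            style_total[k] += 1
        assign.append(z_d)

    for _ in range(iters):
        for d, outfit in enumerate(outfits):
            for i, a in enumerate(outfit):
                w, k = index[a], assign[d][i]
                # Remove the current assignment from the counts.
                doc_style[d][k] -= 1
                style_attr[k][w] -= 1
                style_total[k] -= 1
                # Resample a style from the collapsed conditional distribution.
                weights = [
                    (doc_style[d][j] + alpha)
                    * (style_attr[j][w] + beta) / (style_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_style[d][k] += 1
                style_attr[k][w] += 1
                style_total[k] += 1

    theta = [
        [(doc_style[d][k] + alpha) / (len(outfits[d]) + K * alpha) for k in range(K)]
        for d in range(D)
    ]
    phi = [
        [(style_attr[k][w] + beta) / (style_total[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


# Hypothetical toy data: each outfit is a bag of predicted visual attributes.
outfits = [
    ["floral", "maxi-dress", "sandals", "straw-hat"],
    ["floral", "sundress", "sandals"],
    ["blazer", "trousers", "loafers", "button-down"],
    ["blazer", "pencil-skirt", "button-down"],
]
theta, phi, vocab = fit_style_topics(outfits, num_styles=2)
for outfit, mixture in zip(outfits, theta):
    print([round(p, 2) for p in mixture], outfit)
```

Each row of `theta` is a distribution over the latent styles, so outfits could be compared or retrieved by the distance between their mixtures rather than by raw pixel similarity. Which outfits end up sharing a style depends on the data, the seed, and the hyperparameters.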

Related Material


[bibtex]
@InProceedings{Hsiao_2017_ICCV,
author = {Hsiao, Wei-Lin and Grauman, Kristen},
title = {Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding From Fashion Images},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017},
pages = {4203--4212}
}