Fashion-Specific Ambiguous Expression Interpretation With Partial Visual-Semantic Embedding

Ryotaro Shimizu, Takuma Nakamura, Masayuki Goto; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 3497-3502


A novel technology called the fashion intelligence system has been proposed to quantify ambiguous expressions unique to fashion, such as "casual," "adult-casual," and "office-casual," and to help users understand fashion. However, the visual-semantic embedding (VSE) model on which that system is based does not handle images composed of multiple parts, such as hair, tops, pants, skirts, and shoes. We propose partial VSE, which enables sensitive learning for each part of a fashion outfit. With a single model, this supports five types of practical functionality, notably image-retrieval tasks in which only the specified parts are changed and image-reordering tasks that focus on the specified parts. We demonstrate the effectiveness of the proposed model through multiple distinctive qualitative and quantitative evaluation experiments.
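To make the idea of part-focused image reordering concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each image already has one embedding vector per part and that an expression such as "casual" has an embedding in the same space. The array layout, the `PARTS` list, the `reorder_by_part` helper, and the use of cosine similarity are all illustrative assumptions; the abstract does not specify how partial VSE computes or compares embeddings.

```python
import numpy as np

# Part names taken from the abstract; their order here is an arbitrary assumption.
PARTS = ["hair", "tops", "pants", "skirts", "shoes"]


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length, guarding against division by zero."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, 1e-12)


def reorder_by_part(part_embeddings, word_embedding, part):
    """Rank images by how well ONE part matches a word (hypothetical helper).

    part_embeddings: array of shape (n_images, n_parts, dim)
    word_embedding:  array of shape (dim,)
    Returns image indices sorted from most to least similar.
    """
    idx = PARTS.index(part)
    part_vecs = l2_normalize(part_embeddings[:, idx, :])  # (n_images, dim)
    word = l2_normalize(word_embedding)                   # (dim,)
    scores = part_vecs @ word                             # cosine similarity
    return np.argsort(-scores, kind="stable")


# Toy data standing in for learned embeddings (random, for illustration only).
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, len(PARTS), 8))
casual = rng.normal(size=8)
order = reorder_by_part(emb, casual, "tops")
```

Because only the selected part's vectors enter the score, changing `part` from `"tops"` to `"shoes"` can produce a different ranking of the same images, which is the behavior the abstract describes as reordering that focuses on the specified parts.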

Related Material

@InProceedings{Shimizu_2023_CVPR,
    author    = {Shimizu, Ryotaro and Nakamura, Takuma and Goto, Masayuki},
    title     = {Fashion-Specific Ambiguous Expression Interpretation With Partial Visual-Semantic Embedding},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {3497-3502}
}