- [pdf] [supp]
Fashion-Specific Ambiguous Expression Interpretation With Partial Visual-Semantic Embedding
A novel technology named fashion intelligence system has been proposed to quantify ambiguous expressions unique to fashion, such as "casual," "adult-casual," and "office-casual," and to support users' understanding of fashion. However, the existing visual-semantic embedding (VSE) model, which is the basis of its system, does not support situations in which images are composed of multiple parts such as hair, tops, pants, skirts, and shoes. We propose partial VSE, which enables sensitive learning for each part of the fashion outfits. This enables five types of practical functionalities, particularly image-retrieval tasks in which changes are made only to the specified parts and image-reordering tasks that focus on the specified parts by the single model. Based on both the multiple unique qualitative and quantitative evaluation experiments, we show the effectiveness of the proposed model.