Fine-Grained Visual Attribute Extraction From Fashion Wear
Automatically extracting visual attributes from e-commerce data has widespread applications in cataloguing, catalogue qualification and enrichment, visual search, etc. Here, we address the task of visual attribute extraction on a highly challenging real-world fashion dataset from the catalogue of Flipkart (an Indian e-commerce platform), collected from seller-uploaded product images. This data not only spans widely varying categories (e.g., shirt, sari, shoes), but also has both coarse-grained (e.g., occasion, top type, sari type) and fine-grained (e.g., neck type, print type) attributes. The training examples available for different attributes are highly imbalanced, making the task even more challenging. To this end, we propose an end-to-end framework that integrates multi-task learning with a transformer as an attention module, in addition to handling the data imbalance. The proposed architecture supports multiple attributes across various product categories in a scalable manner. Extensive experiments on the in-house dataset show the effectiveness of the proposed framework, which improves performance on the fine-grained attributes by 13% over the baseline across the attributes.
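To make the combination of multi-task learning and imbalance handling concrete, the following is a minimal sketch and not the framework proposed here. The abstract does not state how the imbalance is handled, so inverse-frequency class weighting is assumed as one common choice, and the transformer attention module is left out entirely. All function names, attribute names, and label values are hypothetical.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rare classes count more.

    Assumption: one common re-weighting scheme, not necessarily the paper's.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def weighted_cross_entropy(probs, target, weights):
    """Cross-entropy for one example, scaled by the weight of its true class."""
    return -weights[target] * math.log(probs[target])

def multi_task_loss(predictions, targets, weights_per_attribute):
    """Average the weighted loss over the attributes labelled for a product.

    Attributes that do not apply to the product's category (target is None)
    are skipped, so one model can serve several product categories.
    """
    losses = [
        weighted_cross_entropy(predictions[attr], target, weights_per_attribute[attr])
        for attr, target in targets.items()
        if target is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical imbalanced attribute: 8 "round" necks versus 2 "v-neck".
neck_labels = ["round"] * 8 + ["v-neck"] * 2
weights = {"neck_type": inverse_frequency_weights(neck_labels)}

# A shirt has a neck type but no sari type, so only one head contributes.
predictions = {"neck_type": {"round": 0.5, "v-neck": 0.5}}
targets = {"neck_type": "v-neck", "sari_type": None}
loss = multi_task_loss(predictions, targets, weights)
```

In this sketch the rare class "v-neck" receives weight 2.5 against 0.625 for "round", so mistakes on the rare class contribute four times as much to the loss. Masking inapplicable attributes is one simple way a single network could cover several product categories.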