Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images

Zhi-Qi Cheng, Xiao Wu, Yang Liu, Xian-Sheng Hua; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4048-4056

Abstract


In recent years, both online retail and video hosting services have grown exponentially. In this paper, a novel deep neural network, called AsymNet, is proposed to explore a new cross-domain task, Video2Shop, which aims to match clothes appearing in videos to exactly the same items in online shops. For the image side, well-established methods are used to detect and extract features for clothing patches of arbitrary sizes. For the video side, deep visual features are extracted from detected object regions in each frame and fed into a Long Short-Term Memory (LSTM) framework for sequence modeling, which captures the temporal dynamics in videos. To conduct exact matching between videos and online shopping images, the LSTM hidden states for videos and the features extracted from static images are jointly modeled by a similarity network with a reconfigurable deep tree structure. Moreover, an approximate training method is proposed to improve training efficiency. Extensive experiments conducted on a large cross-domain dataset demonstrate the effectiveness and efficiency of the proposed AsymNet, which outperforms state-of-the-art methods.
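
The asymmetric matching idea in the abstract can be sketched in code. Below is a minimal PyTorch sketch, assuming per-frame CNN features for detected clothing regions are already extracted; the class name AsymNetSketch, all layer sizes, and the plain fusion MLP (standing in for the paper's reconfigurable deep tree structure) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class AsymNetSketch(nn.Module):
    """Illustrative video-to-image matching network.

    The video branch runs an LSTM over per-frame CNN features; the
    image branch is a single feature vector. A small MLP scores the
    similarity between every LSTM hidden state and the image feature,
    and the scores are pooled over time. All sizes are hypothetical;
    the paper's reconfigurable deep tree structure is replaced here
    by a plain fusion MLP for brevity.
    """

    def __init__(self, feat_dim=4096, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(feat_dim, hidden_dim)
        self.fusion = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, frame_feats, image_feat):
        # frame_feats: (batch, n_frames, feat_dim) CNN features of
        # detected clothing regions in each video frame.
        # image_feat:  (batch, feat_dim) CNN feature of the shop image.
        hidden, _ = self.lstm(frame_feats)              # (B, T, H)
        img = self.img_proj(image_feat).unsqueeze(1)    # (B, 1, H)
        img = img.expand(-1, hidden.size(1), -1)        # (B, T, H)
        pair = torch.cat([hidden, img], dim=-1)         # (B, T, 2H)
        scores = self.fusion(pair).squeeze(-1)          # (B, T)
        return torch.sigmoid(scores.mean(dim=1))        # (B,) match prob

# Usage: score 2 videos (8 frames each) against 2 shop images.
net = AsymNetSketch()
frames = torch.randn(2, 8, 4096)
images = torch.randn(2, 4096)
print(net(frames, images))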

Related Material


[bibtex]
@InProceedings{Cheng_2017_CVPR,
author = {Cheng, Zhi-Qi and Wu, Xiao and Liu, Yang and Hua, Xian-Sheng},
title = {Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {July},
year = {2017},
pages = {4048-4056}
}