Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images

Aron Yu, Kristen Grauman; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5570-5579

Abstract


Distinguishing subtle differences in attributes is valuable, yet learning to make visual comparisons remains nontrivial. Not only is the number of possible comparisons quadratic in the number of training images, but also access to images adequately spanning the space of fine-grained visual differences is limited. We propose to overcome this sparsity-of-supervision problem via synthetically generated images. Building on a state-of-the-art image generation engine, we sample pairs of training images exhibiting slight modifications of individual attributes. Augmenting real training image pairs with these examples, we then train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images. Our results on datasets of faces and fashion images show the great promise of bootstrapping imperfect image generators to counteract sample sparsity for learning to rank.
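
To make the setup concrete, below is a minimal sketch of the idea the abstract describes: a pairwise attribute ranker trained on sparse real comparison pairs augmented with dense, synthetically generated "semantic jitter" pairs. It uses an illustrative linear scoring function trained with a margin-based hinge loss in NumPy; the function names, feature dimensions, and training details are assumptions for illustration, not the authors' implementation.

# Sketch only: linear pairwise ranker with margin hinge loss (assumed setup,
# not the paper's exact model or training procedure).
import numpy as np

def train_pairwise_ranker(pairs, dim, lr=0.01, margin=1.0, epochs=50, seed=0):
    """Learn a linear scoring function w so that w.x_strong > w.x_weak.

    pairs: list of (x_strong, x_weak) feature tuples, where x_strong is the
           image with the stronger attribute in the pair.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=dim)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x_strong, x_weak in pairs:
            # Hinge loss on the score difference: penalize pairs where the
            # "stronger" image does not outscore the "weaker" one by `margin`.
            if (w @ x_strong - w @ x_weak) < margin:
                w += lr * (x_strong - x_weak)  # gradient step on the active hinge
    return w

# --- Toy usage (all data below is random, for illustration only) ---
dim = 128
rng = np.random.default_rng(1)

# Sparse real supervision: a handful of labeled comparison pairs.
real_pairs = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(20)]

# "Semantic jitter": pairs rendered by an image generator so the two images
# differ only slightly in the target attribute. Here faked as small
# perturbations along an attribute direction unknown to the learner.
attr_dir = rng.normal(size=dim)
synthetic_pairs = []
for _ in range(200):
    base = rng.normal(size=dim)
    synthetic_pairs.append((base + 0.1 * attr_dir, base))  # slightly "more" of the attribute

# Augment the real pairs with the dense synthetic pairs and train the ranker.
w = train_pairwise_ranker(real_pairs + synthetic_pairs, dim)

# At test time, rank a novel pair of real images by their scores.
x_a, x_b = rng.normal(size=dim), rng.normal(size=dim)
print("image A stronger" if w @ x_a > w @ x_b else "image B stronger")

The design point this sketch is meant to convey is that the synthetic pairs simply enter the training set alongside the real ones; the ranking objective itself is unchanged, so any comparison-based attribute model could be densified the same way.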

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Yu_2017_ICCV,
author = {Yu, Aron and Grauman, Kristen},
title = {Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}