Learning Universal Semantic Correspondences with No Supervision and Automatic Data Curation
We study the problem of learning semantic image correspondences without manual supervision. Previous works on this problem rely on manually curated image pairs and learn benchmark-specific correspondences. Instead, we present a new method that learns universal correspondences once, from a large image dataset, and without using any manual curation. Despite their generality and despite using less supervision, our universal correspondences still outperform prior works, both unsupervised and weakly supervised, on most benchmarks. Our approach starts from local features extracted by an unsupervised vision transformer, which achieve good semantic but poor geometric matching accuracy. It then learns a Transformer Adapter that improves the geometric accuracy of the features, as well as their compatibility between pairs of different images. The method combines semantic similarity with geometric stability obtained via cycle consistency and supervision via synthetic transformations. We also use these features to select pairs of matching images for training the unsupervised correspondences.
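To make the notion of cycle consistency concrete, here is a minimal sketch of one simple instance of it: a match from image A to image B is kept only if matching back from B returns to the starting feature. This is an illustration under stated assumptions and not the training objective described above. The function names, the choice of cosine similarity, and the toy 2-D feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, feats):
    """Index of the feature in `feats` most similar to `query`."""
    return max(range(len(feats)), key=lambda j: cosine(query, feats[j]))


def cycle_consistent_matches(feats_a, feats_b):
    """Keep a match (i, j) only if the cycle A -> B -> A returns to index i."""
    matches = []
    for i, feat in enumerate(feats_a):
        j = nearest(feat, feats_b)          # forward match: A -> B
        if nearest(feats_b[j], feats_a) == i:  # backward match: B -> A
            matches.append((i, j))
    return matches


# Toy local features (hypothetical values) for two images.
feats_a = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
feats_b = [[0.9, 0.1], [0.1, 0.9]]

matches = cycle_consistent_matches(feats_a, feats_b)
print(matches)
```

In this toy case the third feature of image A maps forward to the first feature of image B, but the backward match lands on a different feature of A, so the pair is discarded as geometrically unstable.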