Ensemble Deep Manifold Similarity Learning Using Hard Proxies

Nicolas Aziere, Sinisa Todorovic; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 7299-7307


This paper is about learning deep representations of images such that images belonging to the same class have more similar representations than those belonging to different classes. For this goal, prior work typically uses the triplet or N-pair loss, specified in terms of either l2-distances or dot-products between deep features. However, such formulations seem poorly suited to the highly non-Euclidean deep feature space. Our first contribution is in specifying the N-pair loss in terms of manifold similarities between deep features. We introduce a new time- and memory-efficient method for estimating the manifold similarities by using a closed-form convergence solution of the Random Walk algorithm. Our efficiency comes, in part, from following the recent work that randomly partitions the deep feature space, and expresses image distances via representatives of the resulting subspaces, a.k.a. proxies. Our second contribution is aimed at reducing overfitting by estimating hard proxies that are as close to one another as possible, but remain in their respective subspaces. Our evaluation demonstrates that we outperform the state of the art in both image retrieval and clustering on the benchmark CUB-200-2011, Cars196, and Stanford Online Products datasets.

Related Material

author = {Aziere, Nicolas and Todorovic, Sinisa},
title = {Ensemble Deep Manifold Similarity Learning Using Hard Proxies},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}