Scalable Exemplar-based Subspace Clustering on Class-Imbalanced Data

Chong You, Chi Li, Daniel P. Robinson, Rene Vidal; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 67-83


Subspace clustering methods based on expressing each data point as a linear combination of a few other data points (e.g., sparse subspace clustering) have become a popular tool for unsupervised learning due to their empirical success and theoretical guarantees. However, their performance can be affected by imbalanced data distributions and large-scale datasets. This paper presents an exemplar-based subspace clustering method to tackle the problem of imbalanced and large-scale datasets. The proposed method searches for a subset of the data that best represents all data points as measured by the $\ell_1$-norm of the representation coefficients. To solve our model efficiently, we introduce a farthest first search algorithm that iteratively selects the least well-represented point as an exemplar. When the data come from a union of subspaces, we prove that the computed subset contains enough exemplars from each subspace for expressing all data points even if the data are imbalanced. Our experiments demonstrate that the proposed method outperforms state-of-the-art subspace clustering methods on two large-scale image datasets that are imbalanced. We also demonstrate the effectiveness of our method on unsupervised data subset selection for a face image classification task.
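
The farthest first search described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only says that representation quality is measured by the $\ell_1$-norm of the coefficients, so the lasso-style cost used here (the $\ell_1$-norm plus a squared-residual penalty weighted by `lam`), the ISTA solver, the default `lam=50.0`, and the names `representation_cost` and `farthest_first_search` are all assumptions made for illustration. Points are assumed to be nonzero, and the sketch recomputes every cost at each step, so it is not scalable.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def representation_cost(x, exemplars, lam=50.0, iters=200):
    """Approximate min_c ||c||_1 + (lam/2) * ||x - sum_j c[j]*exemplars[j]||^2.

    Solved with ISTA (proximal gradient). The form of the cost and the
    value of lam are assumptions of this sketch, not taken from the abstract.
    """
    k, dim = len(exemplars), len(x)
    # Step size 1/L, where L = lam * ||E||_F^2 upper-bounds the Lipschitz
    # constant of the gradient of the quadratic term.
    step = 1.0 / (lam * sum(v * v for e in exemplars for v in e))
    c = [0.0] * k

    def residual(coef):
        return [x[d] - sum(coef[j] * exemplars[j][d] for j in range(k))
                for d in range(dim)]

    for _ in range(iters):
        r = residual(c)
        grad = [-lam * sum(exemplars[j][d] * r[d] for d in range(dim))
                for j in range(k)]
        c = [soft_threshold(c[j] - step * grad[j], step) for j in range(k)]

    r = residual(c)
    return sum(abs(v) for v in c) + 0.5 * lam * sum(v * v for v in r)


def farthest_first_search(points, k, first=0, lam=50.0):
    """Greedily pick up to k exemplar indices, starting from index `first`.

    Each step adds the not-yet-selected point whose representation cost
    with respect to the current exemplars is largest, i.e. the least
    well-represented point.
    """
    selected = [first]
    while len(selected) < min(k, len(points)):
        exemplars = [points[i] for i in selected]
        candidates = [i for i in range(len(points)) if i not in selected]
        worst = max(candidates,
                    key=lambda i: representation_cost(points[i], exemplars, lam))
        selected.append(worst)
    return selected
```

As a toy imbalanced case, take five unit-norm points on the line spanned by (1, 0) and a single point (0.6, 0.8) on a second line. Starting from index 0, `farthest_first_search(points, 2)` returns `[0, 5]`: the lone point on the minority line is the least well represented by the first exemplar, so it is picked next, which mirrors the claim that exemplars are drawn from every subspace even when the data are imbalanced.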

Related Material

author = {You, Chong and Li, Chi and Robinson, Daniel P. and Vidal, Rene},
title = {Scalable Exemplar-based Subspace Clustering on Class-Imbalanced Data},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}