Spectral Metric for Dataset Complexity Assessment

Frederic Branchaud-Charron, Andrew Achkar, Pierre-Marc Jodoin; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 3215-3224


In this paper, we propose a new measure to gauge the complexity of image classification problems. Given an annotated image dataset, our method computes a complexity measure called the cumulative spectral gradient (CSG), which strongly correlates with the test accuracy of convolutional neural networks (CNNs). The CSG measure is derived from the probabilistic divergence between classes in a spectral clustering framework. We show that this metric correlates with the overall separability of the dataset and thus its inherent complexity. As will be shown, our metric can be used for dataset reduction, to assess which classes are more difficult to disentangle, and to approximate the accuracy one could expect to get with a CNN. Results obtained on 11 datasets and three CNN models reveal that our method is more accurate and faster than previous complexity measures.
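
To make the idea of a spectral complexity score concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a symmetric class-by-class similarity matrix is already available (how that matrix is estimated from the images is not described in the abstract), builds an unnormalized graph Laplacian from it, and sums the running maximum of the normalized gaps between consecutive eigenvalues. The function name `spectral_complexity`, the gap normalization, and the toy matrices are all assumptions made for illustration.

```python
import numpy as np


def spectral_complexity(similarity):
    """Toy spectral score from a K x K class-similarity matrix.

    Assumption: larger off-diagonal entries mean two classes overlap more.
    This is an illustrative formulation, not the paper's exact definition.
    """
    s = np.asarray(similarity, dtype=float)
    w = (s + s.T) / 2.0            # enforce symmetry
    np.fill_diagonal(w, 0.0)       # ignore self-similarity
    degree = np.diag(w.sum(axis=1))
    laplacian = degree - w         # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(laplacian)  # eigenvalues in ascending order
    k = len(eigvals)
    gaps = np.diff(eigvals)        # K - 1 consecutive eigenvalue gaps
    weights = k - np.arange(k - 1) # K, K-1, ..., 2 (assumed normalization)
    normalized = gaps / weights
    # Sum of the running maximum of the normalized gaps.
    return float(np.sum(np.maximum.accumulate(normalized)))


# Hypothetical 3-class examples: no overlap versus uniform overlap.
separated = np.zeros((3, 3))
overlapping = np.full((3, 3), 0.5)

easy_score = spectral_complexity(separated)
hard_score = spectral_complexity(overlapping)
```

Under these assumptions, a dataset whose classes do not overlap gives a score near zero, and the score grows as the between-class similarity increases, which matches the qualitative behavior the abstract attributes to a separability-based measure.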

Related Material

author = {Branchaud-Charron, Frederic and Achkar, Andrew and Jodoin, Pierre-Marc},
title = {Spectral Metric for Dataset Complexity Assessment},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}