Large-scale Dataset Pruning with Dynamic Uncertainty

Muyang He, Shuo Yang, Tiejun Huang, Bo Zhao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7713-7722

Abstract


The state of the art in many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As a result, the increasing computational cost is becoming unaffordable. In this paper, we investigate how to prune large-scale datasets and thus produce an informative subset for training sophisticated deep models with a negligible performance drop. We propose a simple yet effective dataset pruning method by exploring both prediction uncertainty and training dynamics. We study dataset pruning by measuring the variation of predictions during the whole training process on large-scale datasets, i.e., ImageNet-1K and ImageNet-21K, and advanced models, i.e., Swin Transformer and ConvNeXt. Extensive experimental results indicate that our method outperforms the state of the art and achieves a 25% lossless pruning ratio on both ImageNet-1K and ImageNet-21K.
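As a rough illustration of "measuring the variation of predictions during the whole training process", the sketch below scores each sample by how much its predicted probability fluctuates across epochs and then drops the lowest-scoring 25%. The abstract does not give the scoring formula, so several choices here are assumptions made only for illustration: tracking the true-class probability, using the standard deviation inside a sliding window of epochs, averaging over windows, and keeping the samples with the highest variation. The function names and the window size are hypothetical.

```python
from statistics import pstdev


def dynamic_uncertainty(probs_per_epoch, window=3):
    """Average sliding-window standard deviation of one sample's predictions.

    probs_per_epoch: true-class probabilities recorded once per epoch
    (assumed input format). window: epochs per window (assumed value).
    """
    if window < 2 or window > len(probs_per_epoch):
        raise ValueError("window must be in [2, number of recorded epochs]")
    # Variation of the prediction inside each window of consecutive epochs.
    stds = [
        pstdev(probs_per_epoch[i:i + window])
        for i in range(len(probs_per_epoch) - window + 1)
    ]
    # Average over all windows to cover the whole training process.
    return sum(stds) / len(stds)


def prune_by_dynamic_uncertainty(history, prune_ratio=0.25, window=3):
    """Return the sorted ids of the samples kept after pruning.

    history maps a sample id to its per-epoch probabilities. Keeping the
    highest-scoring samples is an assumption of this sketch.
    """
    scores = {sid: dynamic_uncertainty(p, window) for sid, p in history.items()}
    n_keep = len(scores) - int(len(scores) * prune_ratio)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:n_keep])


# Toy prediction history over four epochs (made-up numbers).
history = {
    "a": [0.9, 0.9, 0.9, 0.9],  # stable prediction, zero variation
    "b": [0.1, 0.8, 0.2, 0.9],
    "c": [0.5, 0.6, 0.4, 0.7],
    "d": [0.2, 0.5, 0.3, 0.6],
}
kept = prune_by_dynamic_uncertainty(history, prune_ratio=0.25)
print(kept)
```

With the toy history, sample "a" never changes its prediction, so it receives a score of zero and is the one removed at a 25% pruning ratio. In practice the per-epoch probabilities would be logged while training a model on the full dataset before selecting the subset.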

Related Material


[bibtex]
@InProceedings{He_2024_CVPR,
    author    = {He, Muyang and Yang, Shuo and Huang, Tiejun and Zhao, Bo},
    title     = {Large-scale Dataset Pruning with Dynamic Uncertainty},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7713-7722}
}