Label Augmented Dataset Distillation

Seoungyoon Kang, Youngsun Lim, Hyunjung Shim; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 1457-1466

Abstract


Traditional dataset distillation primarily focuses on image representation while often overlooking the important role of labels. In this study, we introduce Label-Augmented Dataset Distillation (LADD), a new dataset distillation framework that enhances distillation with label augmentations. LADD sub-samples each synthetic image, generating additional dense labels that capture rich semantics. These dense labels require only a 2.5% increase in storage (on ImageNet subsets) yet provide strong learning signals with significant performance benefits. Our label-generation strategy complements existing dataset distillation methods and significantly improves their training efficiency and performance. Experimental results demonstrate that LADD outperforms existing methods in both computational overhead and accuracy. Applied to three high-performance dataset distillation algorithms, LADD achieves an average accuracy gain of 14.9%. Furthermore, the effectiveness of our method is demonstrated across various datasets, distillation hyperparameters, and algorithms. Finally, our method improves the cross-architecture robustness of the distilled dataset, which is important in practical application scenarios.
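
The abstract describes sub-sampling each synthetic image and attaching dense labels to the resulting crops. Below is a minimal sketch of that idea, assuming a pre-trained classifier as the labeler and a 2x2 overlapping crop grid; the teacher network, crop geometry, and function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the label-augmentation idea described in the abstract,
# NOT the paper's reference implementation. The teacher network, crop grid,
# and storage format are assumptions made for illustration.
import torch
import torch.nn.functional as F
import torchvision.models as models

def dense_labels_for_image(image, teacher, grid=2, crop_frac=0.75):
    """Sub-sample one synthetic image into overlapping crops and record
    the teacher's soft predictions for each crop as dense labels."""
    _, h, w = image.shape
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    labels = []
    for i in range(grid):
        for j in range(grid):
            top = i * (h - ch) // max(grid - 1, 1)
            left = j * (w - cw) // max(grid - 1, 1)
            crop = image[:, top:top + ch, left:left + cw]
            # Resize the crop back to the teacher's input resolution.
            crop = F.interpolate(crop.unsqueeze(0), size=(h, w),
                                 mode="bilinear", align_corners=False)
            with torch.no_grad():
                logits = teacher(crop)
            labels.append(F.softmax(logits, dim=1).squeeze(0))
    # One soft-label vector per crop: (grid * grid, num_classes).
    return torch.stack(labels)

# Usage: a pre-trained classifier stands in for the labeler, and a random
# tensor stands in for a distilled image.
teacher = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
synthetic_image = torch.rand(3, 224, 224)  # placeholder distilled image
soft_labels = dense_labels_for_image(synthetic_image, teacher)
```

Storing a handful of soft-label vectors per image is what keeps the storage overhead small (the reported 2.5%) relative to distilling additional images, since each vector is only num_classes floats rather than a full image tensor.
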

Related Material


BibTeX:
@InProceedings{Kang_2025_WACV,
  author    = {Kang, Seoungyoon and Lim, Youngsun and Shim, Hyunjung},
  title     = {Label Augmented Dataset Distillation},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {1457-1466}
}