Dataset Distillation via the Wasserstein Metric

Haoyang Liu, Yijiang Li, Tiancheng Xing, Peiran Wang, Vibhu Dalal, Luwei Li, Jingrui He, Haohan Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 1205-1215

Abstract


Dataset Distillation (DD) aims to generate a compact synthetic dataset that enables models to achieve performance comparable to training on the full original dataset, significantly reducing computational costs. Drawing from optimal transport theory, we introduce WMDD (Wasserstein Metric-based Dataset Distillation), a straightforward yet powerful method that employs the Wasserstein metric to enhance distribution matching. We compute the Wasserstein barycenter of features from a pretrained classifier to capture essential characteristics of the original data distribution. By optimizing synthetic data to align with this barycenter in feature space, and by leveraging per-class BatchNorm statistics to preserve intra-class variations, WMDD maintains the efficiency of distribution-matching approaches while achieving state-of-the-art results across various high-resolution datasets. Our extensive experiments demonstrate WMDD's effectiveness and adaptability, highlighting its potential for advancing machine learning applications at scale. Code is available at https://github.com/Liu-Hy/WMDD, and a project page at https://liu-hy.github.io/WMDD/.
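
For a concrete picture of the pipeline the abstract describes, the sketch below shows its two ingredients in code: a per-class Wasserstein barycenter of real features, and a loss that pulls synthetic images toward it. Recall that the Wasserstein barycenter of measures mu_1, ..., mu_K is the measure nu minimizing sum_k lambda_k * W_2^2(nu, mu_k); with a single empirical measure per class, computing a fixed-size free-support barycenter amounts to optimally quantizing that class's features. This is a minimal sketch under stated assumptions, not the released implementation (see the GitHub link above): the POT call, the hypothetical model.features helper, and the exact losses are stand-ins.

    # Illustrative sketch only: names, losses, and hyperparameters are
    # assumptions, not the released WMDD code. Requires PyTorch and the
    # POT library (pip install pot).

    import numpy as np
    import ot  # POT: Python Optimal Transport
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def class_barycenter(feats: torch.Tensor, ipc: int, n_iter: int = 50) -> torch.Tensor:
        # feats: (N, D) penultimate-layer features of one class's real images,
        # extracted with the pretrained classifier. Returns (ipc, D) barycenter
        # support points, one per synthetic image of that class.
        X = feats.cpu().double().numpy()
        w = np.full(len(X), 1.0 / len(X))  # uniform empirical measure
        init = X[np.random.choice(len(X), ipc, replace=False)]
        bary = ot.lp.free_support_barycenter([X], [w], init, numItermax=n_iter)
        return torch.from_numpy(bary).float()

    def bn_penalty(feats: torch.Tensor, class_mean: torch.Tensor,
                   class_var: torch.Tensor) -> torch.Tensor:
        # Per-class statistics regularizer (sketch): match the synthetic batch's
        # feature mean/variance to the statistics recorded for this class.
        return (F.mse_loss(feats.mean(0), class_mean)
                + F.mse_loss(feats.var(0, unbiased=False), class_var))

    def wmdd_loss(model, syn_images: torch.Tensor, bary: torch.Tensor,
                  class_mean: torch.Tensor, class_var: torch.Tensor,
                  lam: float = 1.0) -> torch.Tensor:
        # model.features is a hypothetical helper returning penultimate
        # activations; each synthetic image is paired with one barycenter
        # support point.
        feats = model.features(syn_images)  # (ipc, D)
        match = F.mse_loss(feats, bary.to(feats.device))
        return match + lam * bn_penalty(feats, class_mean, class_var)

In use, one would extract penultimate-layer features for every class with the pretrained classifier, compute each class's barycenter once, and then run gradient descent on the synthetic images under this combined loss. Note that the paper's BatchNorm regularizer is per-class and presumably applied at the network's BatchNorm layers; the sketch simplifies it to statistics of the penultimate features.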

Related Material


@InProceedings{Liu_2025_ICCV,
    author    = {Liu, Haoyang and Li, Yijiang and Xing, Tiancheng and Wang, Peiran and Dalal, Vibhu and Li, Luwei and He, Jingrui and Wang, Haohan},
    title     = {Dataset Distillation via the Wasserstein Metric},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {1205-1215}
}