Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning

Xin Zhang, Jiawei Du, Yunsong Li, Weiying Xie, Joey Tianyi Zhou; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 26223-26232

Abstract


Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting in poor generalization across various pruning and cross-architecture scenarios. Recent studies have addressed this issue by expanding the scope of training dynamics considered, including factors such as forgetting events and probability changes, typically using an averaging approach. However, these works struggle to integrate a broader range of training dynamics without overlooking well-generalized samples, which may not be sufficiently highlighted by averaging. In this study, we propose a novel dataset pruning method, termed Temporal Dual-Depth Scoring (TDDS), to tackle this problem. TDDS utilizes a dual-depth strategy to balance incorporating extensive training dynamics against identifying representative samples for dataset pruning. In the first depth, we estimate the series of each sample's individual contributions spanning the training progress, ensuring comprehensive integration of training dynamics. In the second depth, we focus on the variability of the sample-wise contributions identified in the first depth to highlight well-generalized samples. Extensive experiments conducted on CIFAR and ImageNet datasets verify the superiority of TDDS over previous SOTA methods. Specifically, on CIFAR-100, our method achieves 54.51% accuracy with only 10% of the training data, surpassing baseline methods by more than 12.69%. Our code is available at https://github.com/zhangxin-xd/Dataset-Pruning-TDDS.
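
The dual-depth idea described in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration and is not the authors' implementation: the per-epoch "contribution" is approximated by the absolute change in a sample's true-class probability between consecutive epochs, "variability" is taken as the population standard deviation of that series, and samples with higher variability are the ones kept. The names `dual_depth_scores`, `prune`, and `prob_history` are hypothetical.

```python
from statistics import pstdev


def dual_depth_scores(prob_history):
    """Score each sample from a hypothetical record of training dynamics.

    prob_history[t][i] is assumed to hold the true-class probability of
    sample i at epoch t (an illustrative stand-in, not the paper's quantity).
    """
    num_epochs = len(prob_history)
    num_samples = len(prob_history[0])
    scores = []
    for i in range(num_samples):
        # Depth 1 (assumed proxy): a series of per-epoch contributions,
        # here the absolute probability change between consecutive epochs.
        contributions = [
            abs(prob_history[t + 1][i] - prob_history[t][i])
            for t in range(num_epochs - 1)
        ]
        # Depth 2: variability of that series rather than its average.
        scores.append(pstdev(contributions))
    return scores


def prune(scores, keep_fraction):
    """Return sorted indices of the kept samples (assumption: keep high scores)."""
    keep = max(1, round(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


# Toy history: 4 epochs, 3 samples (made-up numbers for illustration only).
history = [
    [0.1, 0.5, 0.2],
    [0.9, 0.5, 0.3],
    [0.2, 0.5, 0.4],
    [0.8, 0.5, 0.5],
]
scores = dual_depth_scores(history)
kept = prune(scores, keep_fraction=1 / 3)
```

In this toy history, sample 1 never changes and sample 2 changes by a constant amount each epoch, so both have near-zero variability, while sample 0 has step sizes that differ from epoch to epoch and therefore receives the highest score. An averaging criterion would rank the samples by mean change instead, which is the contrast the abstract draws.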

Related Material


@InProceedings{Zhang_2024_CVPR,
    author    = {Zhang, Xin and Du, Jiawei and Li, Yunsong and Xie, Weiying and Zhou, Joey Tianyi},
    title     = {Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {26223-26232}
}