RD-DPP: Rate-Distortion Theory Meets Determinantal Point Process to Diversify Learning Data Samples

Chen, Xiwen; Li, Huayu; Qiu, Peijie; Zhu, Wenhui; Amin, Rahul; Razi, Abolfazl

Xiwen Chen, Huayu Li, Peijie Qiu, Wenhui Zhu, Rahul Amin, Abolfazl Razi; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 6911-6920

Abstract

Selecting representative samples plays an indispensable role in many machine learning and computer vision applications under limited resources (e.g., limited communication bandwidth and computational power). The Determinantal Point Process (DPP) is a widely used method for selecting the most diverse representative samples that can summarize a dataset. However, its adaptability to different tasks remains an open challenge, because DPP is difficult to tune for a specific task. In contrast, Rate-Distortion (RD) theory provides a way to measure task-specific diversity. Optimizing RD for a data selection problem is itself difficult, however, because the quantity to be optimized is the index set of the selected samples. To tackle these challenges, we first draw an inherent relationship between DPP and RD theory. Our theoretical derivation paves the way to take advantage of both RD and DPP for task-specific data selection. To this end, we propose a novel method for task-specific data selection for multi-level classification tasks, named RD-DPP. Empirical studies on seven different datasets using five benchmark models demonstrate the effectiveness of the proposed RD-DPP method. Our method also outperforms recent strong competing methods while exhibiting high generalizability to a variety of learning tasks. The source code is available at https://github.com/xiwenc1/RD-DPP
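To make the idea of determinant-based diverse selection concrete, the following is a minimal sketch of greedy log-determinant subset selection, a common approximation to DPP MAP inference. It is not the RD-DPP method from the paper (see the repository above for the authors' code). The function name `greedy_logdet_selection`, the kernel form `I + alpha * X X^T`, and the parameter `alpha` are assumptions chosen for illustration.

```python
import numpy as np


def greedy_logdet_selection(features, k, alpha=1.0):
    """Greedily pick k row indices that maximize log det(I + alpha * X_S X_S^T).

    features: (n, d) array, one sample per row.
    k:        number of samples to select.
    alpha:    hypothetical scaling of the Gram matrix (illustrative only).
    """
    n = features.shape[0]
    selected = []
    for _ in range(k):
        best_idx, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # Candidate subset: already-selected rows plus row i.
            cand = features[selected + [i]]
            gram = np.eye(len(cand)) + alpha * (cand @ cand.T)
            # slogdet is numerically safer than log(det(...)).
            _, logdet = np.linalg.slogdet(gram)
            if logdet > best_score:
                best_idx, best_score = i, logdet
        selected.append(best_idx)
    return selected


# Toy data: rows 0 and 1 are duplicates, row 2 is orthogonal to both.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
picked = greedy_logdet_selection(X, k=2)
```

On this toy input the second pick skips the duplicate row in favor of the orthogonal one, which is the diversity-seeking behavior the abstract attributes to DPP. The sketch recomputes each determinant from scratch for clarity; practical implementations typically use incremental updates.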

Related Material

[bibtex]

@InProceedings{Chen_2025_WACV,
    author    = {Chen, Xiwen and Li, Huayu and Qiu, Peijie and Zhu, Wenhui and Amin, Rahul and Razi, Abolfazl},
    title     = {RD-DPP: Rate-Distortion Theory Meets Determinantal Point Process to Diversify Learning Data Samples},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {6911--6920}
}