Data Valuation and Detections in Federated Learning

Wenqian Li, Shuran Fu, Fengrui Zhang, Yan Pang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 12027-12036

Abstract


Federated Learning (FL) enables collaborative model training while preserving the privacy of raw data. A challenge in this framework is the fair and efficient valuation of data which is crucial for incentivizing clients to contribute high-quality data in the FL task. In scenarios involving numerous data clients within FL it is often the case that only a subset of clients and datasets are pertinent to a specific learning task while others might have either a negative or negligible impact on the model training process. This paper introduces a novel privacy-preserving method for evaluating client contributions and selecting relevant datasets without a pre-specified training algorithm in an FL task. Our proposed approach \texttt FedBary utilizes Wasserstein distance within the federated context offering a new solution for data valuation in the FL framework. This method ensures transparent data valuation and efficient computation of the Wasserstein barycenter and reduces the dependence on validation datasets. Through extensive empirical experiments and theoretical analyses we demonstrate the advantages of this data valuation method as a promising avenue for FL research.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Li_2024_CVPR, author = {Li, Wenqian and Fu, Shuran and Zhang, Fengrui and Pang, Yan}, title = {Data Valuation and Detections in Federated Learning}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {12027-12036} }