Communication-Efficient Federated Data Augmentation on Non-IID Data
Federated learning (FL) is an attractive distributed machine learning framework because it preserves data privacy. In practice, however, FL must cope with non-independent and identically distributed (Non-IID) data across devices. This work focuses on mitigating the impact of Non-IID datasets in wireless communication settings. To this end, we propose a generative-model-based federated data augmentation strategy (FedDA) that preserves privacy and is communication-efficient. In FedDA, a Conditional Variational AutoEncoder (CVAE) is adopted to generate samples of the classes missing from each device's Non-IID dataset. A knowledge distillation mechanism realizes federated training: clients share knowledge rather than model parameters or gradients. This knowledge is constructed from hidden-layer features, which reduces communication overhead and protects raw-data privacy. Furthermore, to generate cross-class samples that are easy to classify, the CVAE's latent variables are constrained and an attention mechanism is introduced. Extensive experiments are conducted on Fashion-MNIST and CIFAR-10 under different data distributions. The results show that FedDA improves model accuracy by up to 8% while reducing communication overhead by up to 2x, compared with classic baselines under highly Non-IID data.
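To make the augmentation idea concrete, the sketch below illustrates how a conditional VAE can synthesize samples for a class that is absent from a client's local data: the decoder is conditioned on a one-hot class label, so sampling a latent vector from the prior and decoding it with the desired label yields a synthetic sample of that class. This is a minimal NumPy illustration with toy linear maps standing in for trained networks; the dimensions, weight initialization, and function names are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 10    # e.g. Fashion-MNIST / CIFAR-10 label count
LATENT_DIM = 8    # illustrative latent size (assumption)
FEATURE_DIM = 32  # illustrative flattened-input size (assumption)

# Toy linear weights standing in for trained encoder/decoder networks.
W_mu = 0.1 * rng.normal(size=(FEATURE_DIM + N_CLASSES, LATENT_DIM))
W_logvar = 0.1 * rng.normal(size=(FEATURE_DIM + N_CLASSES, LATENT_DIM))
W_dec = 0.1 * rng.normal(size=(LATENT_DIM + N_CLASSES, FEATURE_DIM))

def one_hot(y):
    v = np.zeros(N_CLASSES)
    v[y] = 1.0
    return v

def encode(x, y):
    """Conditional encoder q(z | x, y): returns mean and log-variance."""
    h = np.concatenate([x, one_hot(y)])
    return h @ W_mu, h @ W_logvar

def reparameterize(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, y):
    """Conditional decoder p(x | z, y): label choice steers the class."""
    return np.concatenate([z, one_hot(y)]) @ W_dec

def generate(y, n):
    """Synthesize n samples of class y by sampling z from the prior N(0, I)."""
    return np.stack([decode(rng.normal(size=LATENT_DIM), y) for _ in range(n)])

# A client missing class 3 locally can synthesize samples for it.
synthetic = generate(3, 5)
print(synthetic.shape)  # (5, 32)
```

In the full method, the encoder/decoder would be neural networks trained with the usual reconstruction plus KL objective; the point of the sketch is only the conditioning pathway that lets a client fill in missing classes.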