Does Federated Dropout Actually Work?
Model sizes in Federated Learning are limited by network bandwidth and on-device memory constraints. The success of increasing model sizes in other machine learning domains motivates the development of methods for training large-scale models in Federated Learning. To this end, Caldas et al. draw inspiration from dropout and propose Federated Dropout: an algorithm in which clients train randomly selected subsets of a larger server model. Despite the promising empirical results and the many other works that build on it, we argue in this paper that the metrics used to measure the performance of Federated Dropout and its variants are misleading. We propose and perform new experiments which suggest that Federated Dropout is actually detrimental to scaling efforts. We show how a simple ensembling technique outperforms Federated Dropout and other baselines, and we perform ablations suggesting that the best-performing variants of Federated Dropout approximate ensembling. The simplicity of ensembling allows for easy, practical implementations. Furthermore, ensembling naturally leverages the parallelizable nature of Federated Learning: several models can be trained independently because clients are plentiful and server-side compute is not the bottleneck. Ensembling's strong performance against our baselines suggests that Federated Learning models may be more easily scaled than previously thought with more sophisticated ensembling strategies, e.g., via boosting.