- [pdf] [supp] [arXiv]
DivAug: Plug-In Automated Data Augmentation With Explicit Diversity Maximization
Human-designed data augmentation strategies havebeen replaced by automatically learned augmentation pol-icy in the past two years. Specifically, recent works haveexperimentally shown that the superior performance of theautomated methods stems from increasing the diversity ofaugmented data. However, two factors regard-ing the diversity of augmented data are still missing: 1)the explicit definition (and thus measurement) of diversityand 2) the quantifiable relationship between diversity andits regularization effects. To fill this gap, we propose a di-versity measure called "Variance Diversity" and theoreti-cally show that the regularization effect of data augmenta-tion is promised by Variance Diversity. We confirm in exper-iments that the relative gain from automated data augmen-tation in test accuracy of a given model is highly correlatedto Variance Diversity. To improve the search process ofautomated augmentation, an unsupervised sampling-basedframework,DivAug, is designed to directly optimize Vari-ance Diversity and hence strengthen the regularization ef-fect. Without requiring a separate search process, the per-formance gain from DivAug is comparable with state-of-the-art method with better efficiency. Moreover, under thesemi-supervised setting, our framework can further improvethe performance of semi-supervised learning algorithmsbased on RandAugment, making it highly applicable to real-world problems, where labeled data is scarce. The code is available at https://github.com/warai-0toko/DivAug.