Assessing the Impact of Diversity on the Resilience of Deep Learning Ensembles: A Comparative Study on Model Architecture, Output, Activation, and Attribution
We investigate the relationship between diversity metrics, accuracy, and resilience to natural image corruptions in ensembles of Deep Learning (DL) image classifiers. We evaluate existing diversity dimensions, namely model architecture, model predictions, and neuron activations, as well as a novel diversity dimension: input attribution. Using ResNet50 as a comparison baseline, we evaluate the resilience of multiple individual DL model architectures to dataset distribution shifts corresponding to natural image corruptions. We compare ensembles of diverse model architectures, trained either independently or through a Neural Architecture Search technique, and evaluate how prediction-based and attribution-based diversity correlate with the final ensemble accuracy. Finally, we evaluate a set of diversity enforcement heuristics for training based on negative correlation learning (NCL) and compare how effectively they achieve independent failure behavior. Our key observations are: 1) model architecture is more important for individual resilience than model size or model accuracy, but architecture-diverse ensembles are typically not more resilient; 2) attribution-based diversity is less negatively correlated with ensemble accuracy than prediction-based diversity; 3) a loss function that balances individual and ensemble accuracy creates ensembles that are more resilient to natural image corruptions; 4) architecture diversity produces more diversity than NCL in all explored diversity metrics: predictions, attributions, and activations.
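To make "prediction-based diversity" and "independent failure behavior" concrete, the following is a minimal sketch, not the metrics or code used in the study. It assumes mean pairwise disagreement as one common prediction-based diversity measure and plurality voting as the ensemble rule; the function names and the toy predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def pairwise_disagreement(preds_a, preds_b):
    """Fraction of samples on which two models predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def ensemble_disagreement(all_preds):
    """Mean pairwise disagreement over every pair of ensemble members."""
    pairs = list(combinations(all_preds, 2))
    return sum(pairwise_disagreement(a, b) for a, b in pairs) / len(pairs)


def accuracy(preds, labels):
    """Fraction of samples predicted correctly by a single model."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def majority_vote_accuracy(all_preds, labels):
    """Accuracy of a plurality vote across ensemble members.

    Ties resolve to the tied label that appears first in member order.
    """
    correct = 0
    for i, label in enumerate(labels):
        votes = Counter(preds[i] for preds in all_preds)
        winner = votes.most_common(1)[0][0]
        correct += winner == label
    return correct / len(labels)


# Toy data (made up): three models that each err on different samples.
labels = [0, 1, 2, 1, 0, 2]
model_a = [0, 1, 2, 1, 0, 0]
model_b = [0, 1, 1, 1, 0, 2]
model_c = [0, 2, 2, 1, 1, 2]
members = [model_a, model_b, model_c]

diversity = ensemble_disagreement(members)
ensemble_acc = majority_vote_accuracy(members, labels)
individual_accs = [accuracy(m, labels) for m in members]
```

In this toy case no two members fail on the same sample, so the vote corrects every individual error even though each member is imperfect. Members that fail together would show lower disagreement and leave those errors uncorrected.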