Measuring Calibration in Deep Learning

Jeremy Nixon, Michael W. Dusenberry, Linchuan Zhang, Ghassen Jerfel, Dustin Tran; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 38-41


The reliability of a machine learning model's confidence in its predictions is critical for high-risk applications. Calibration--the idea that a model's predicted probabilities of outcomes reflect true probabilities of those outcomes--formalizes this notion. Current calibration metrics fail to consider all of the predictions made by machine learning models, and are inefficient in their estimation of the calibration error. We design the Adaptive Calibration Error (ACE) metric to resolve these pathologies and show that it outperforms other metrics, especially in settings where predictions beyond the maximum prediction that is chosen as the output class matter.
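As a rough illustration of the idea behind ACE, the sketch below computes a calibration error with adaptive, equal-mass bins: for each class, predictions are sorted by confidence and split into ranges containing an equal number of examples, so every prediction (not just the argmax) contributes to the estimate. The function name, the use of per-class one-vs-rest accuracy, and the choice of 15 ranges are assumptions for this sketch, not the paper's reference implementation.

```python
import numpy as np

def adaptive_calibration_error(probs, labels, num_ranges=15):
    """Sketch of an adaptive-binned calibration error (assumed form).

    probs: (N, K) array of predicted class probabilities.
    labels: (N,) array of integer class labels.
    For each class, the N confidences are sorted and split into
    `num_ranges` equal-mass bins; the error is the mean absolute gap
    between per-bin accuracy and per-bin confidence, averaged over
    classes and bins.
    """
    n, num_classes = probs.shape
    total = 0.0
    for c in range(num_classes):
        conf = probs[:, c]                      # confidence for class c
        correct = (labels == c).astype(float)   # one-vs-rest accuracy target
        order = np.argsort(conf)
        # Equal-mass binning: each bin holds roughly n / num_ranges predictions,
        # rather than spanning a fixed confidence interval.
        for bin_idx in np.array_split(order, num_ranges):
            if len(bin_idx) == 0:
                continue
            total += abs(correct[bin_idx].mean() - conf[bin_idx].mean())
    return total / (num_classes * num_ranges)
```

Because the bins track the empirical distribution of confidences, sparsely populated high-confidence regions do not dominate or vanish from the estimate, which is the pathology of fixed-width binning that the abstract alludes to.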

Related Material

@InProceedings{Nixon_2019_CVPR_Workshops,
  author = {Nixon, Jeremy and Dusenberry, Michael W. and Zhang, Linchuan and Jerfel, Ghassen and Tran, Dustin},
  title = {Measuring Calibration in Deep Learning},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month = {June},
  year = {2019}
}