Calibration Transfer via Knowledge Distillation

Ramya Hebbalaguppe, Mayank Baranwal, Kartik Anand, Chetan Arora; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 513-530

Abstract


Modern deep neural networks often suffer from miscalibration, leading to overly confident errors that undermine their reliability. Although Knowledge Distillation (KD) is known to improve student classifier accuracy, its impact on model calibration remains unclear. It is generally assumed that well-calibrated teachers produce well-calibrated students. However, previous findings indicate that teachers calibrated with label smoothing (LS) result in less accurate students. This paper explores the theoretical foundations of KD, revealing that such prior results are artifacts of specific calibration methods rather than of KD itself. Our study shows that calibrated teachers can effectively transfer calibration to their students, but not all training regimes are equally effective. Notably, teachers calibrated using dynamic label smoothing methods yield better-calibrated student classifiers through KD. We also show that calibration transfer can be induced from lower-capacity teachers to larger-capacity students. The proposed KD-based calibration framework, named KD(C), leads to state-of-the-art calibration results. More specifically, on CIFAR100 using a WRN-40-1 feature extractor, we report an ECE of 0.98 compared to 7.61, 7.00, and 2.1 by the current SOTA calibration techniques Adafocal, ACLS, and CPC, respectively, and 11.16 by the baseline NLL loss (lower ECE is better).
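
The abstract does not spell out the KD(C) training objective, but the standard KD setup it builds on, and the ECE metric it reports, can be sketched as follows. This is a minimal sketch: the temperature, loss weight alpha, and function names are illustrative assumptions, not the paper's exact recipe; the teacher is assumed to have been calibrated beforehand (e.g., with a dynamic label smoothing method).

```python
# Minimal sketch: distill from a calibrated teacher, then measure ECE.
# Hyperparameters (temperature, alpha, n_bins) are assumptions for illustration.
import torch
import torch.nn.functional as F


def kd_calibration_loss(student_logits, teacher_logits, targets,
                        temperature=4.0, alpha=0.9):
    """Standard KD objective: KL divergence to the (calibrated) teacher's
    softened distribution plus cross-entropy to the ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1.0 - alpha) * ce


@torch.no_grad()
def expected_calibration_error(logits, targets, n_bins=15):
    """ECE: average |accuracy - confidence| over equal-width confidence bins,
    weighted by the fraction of samples falling in each bin (lower is better)."""
    probs = F.softmax(logits, dim=1)
    confidences, predictions = probs.max(dim=1)
    accuracies = predictions.eq(targets).float()
    ece = torch.zeros(1, device=logits.device)
    bin_edges = torch.linspace(0, 1, n_bins + 1, device=logits.device)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        prop = in_bin.float().mean()
        if prop > 0:
            ece += (accuracies[in_bin].mean() - confidences[in_bin].mean()).abs() * prop
    return ece.item()
```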

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Hebbalaguppe_2024_ACCV,
    author    = {Hebbalaguppe, Ramya and Baranwal, Mayank and Anand, Kartik and Arora, Chetan},
    title     = {Calibration Transfer via Knowledge Distillation},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {513-530}
}