Evidential Knowledge Distillation

Liangyu Xiang, Junyu Gao, Changsheng Xu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 2814-2824

Abstract


Existing logit-based knowledge distillation methods typically represent network knowledge as a single deterministic categorical distribution, which eliminates the inherent uncertainty in network predictions and thereby limits the effective transfer of knowledge. To address this limitation, we introduce distribution-based probabilistic modeling as a more comprehensive representation of network knowledge. Specifically, we regard the categorical distribution as a random variable and leverage deep neural networks to predict its distribution, representing it as an evidential second-order distribution. Building on this second-order modeling, we propose Evidential Knowledge Distillation (EKD), which distills both the expectation of the teacher distribution and the distribution itself into the student. The expectation captures the macroscopic characteristics of the distribution, while the distribution itself conveys microscopic information about the classification boundaries. Additionally, we theoretically show that EKD's distillation objective provides an upper bound on the student's expected risk when treating the teacher's predictions as ground truth. Extensive experiments on several standard benchmarks across various teacher-student network pairs highlight the effectiveness and superior performance of EKD. Our code is available at https://github.com/lyxiang-casia/EKD.
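
The authors' implementation is available at the repository linked above. As a rough, non-authoritative illustration of the idea sketched in the abstract, the PyTorch snippet below maps teacher and student logits to Dirichlet concentration parameters (a common evidential parameterization, assumed here rather than taken from the paper) and combines a KL term between the expected categorical distributions with a KL term between the second-order Dirichlet distributions themselves. All function and variable names are hypothetical.

import torch
import torch.nn.functional as F

def ekd_loss_sketch(student_logits, teacher_logits, eps=1e-8):
    # Map logits to Dirichlet concentrations via softplus evidence
    # (an assumption; the paper may use a different link function).
    alpha_s = F.softplus(student_logits) + 1.0
    alpha_t = F.softplus(teacher_logits) + 1.0

    # (i) Expectation term: KL between the expected categorical
    # distributions, capturing the macroscopic characteristics.
    p_s = alpha_s / alpha_s.sum(dim=-1, keepdim=True)
    p_t = alpha_t / alpha_t.sum(dim=-1, keepdim=True)
    kl_expectation = F.kl_div((p_s + eps).log(), p_t, reduction="batchmean")

    # (ii) Distribution term: KL between the second-order Dirichlet
    # distributions, conveying microscopic boundary information.
    dir_s = torch.distributions.Dirichlet(alpha_s)
    dir_t = torch.distributions.Dirichlet(alpha_t)
    kl_distribution = torch.distributions.kl_divergence(dir_t, dir_s).mean()

    return kl_expectation + kl_distribution

For example, with student_logits and teacher_logits of shape (batch, num_classes), the function returns a scalar loss that can be added to the usual cross-entropy term; the relative weighting of the two KL terms is a design choice not specified in the abstract.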

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Xiang_2025_ICCV,
    author    = {Xiang, Liangyu and Gao, Junyu and Xu, Changsheng},
    title     = {Evidential Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {2814-2824}
}