Aligning Logits Generatively for Principled Black-Box Knowledge Distillation

Jing Ma, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 23148-23157

Abstract


Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server. B2KD faces challenges such as limited Internet exchange and an edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization direction, from logits to cell boundary, different from direct logits alignment. With its guidance, we propose a new method, Mapping-Emulation KD (MEKD), that distills a black-box cumbersome model into a lightweight one. Our method does not differentiate between treating soft or hard responses, and consists of: 1) deprivatization: emulating the inverse mapping of the teacher function with a generator, and 2) distillation: aligning the low-dimensional logits of the teacher and student models by reducing the distance of high-dimensional image points. For different teacher-student pairs, our method yields inspiring distillation performance on various benchmarks and outperforms previous state-of-the-art approaches.
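
The two-step workflow described above can be illustrated with a minimal sketch. Everything below is an assumption-laden illustration rather than the authors' implementation: teacher_api, Generator, and Student are hypothetical stand-ins for a black-box cloud endpoint, the inverse-mapping generator, and the lightweight edge model, and plain MSE losses replace the objectives actually used in MEKD.

# Minimal sketch of the two-step workflow in the abstract, under assumptions:
# the teacher is reachable only through teacher_api (a stand-in for a cloud
# endpoint returning logits), and Generator/Student are toy architectures
# chosen for illustration, not the networks used in MEKD.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, IMG_DIM = 10, 3 * 32 * 32


def teacher_api(images: torch.Tensor) -> torch.Tensor:
    """Stand-in for the black-box teacher: only its logits are observable."""
    torch.manual_seed(0)                      # fixed random "teacher" weights
    w = torch.randn(IMG_DIM, NUM_CLASSES)
    return images.flatten(1) @ w


class Generator(nn.Module):
    """Maps low-dimensional logits back to image space (emulated inverse mapping)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NUM_CLASSES, 256), nn.ReLU(),
                                 nn.Linear(256, IMG_DIM))

    def forward(self, logits):
        return self.net(logits)


class Student(nn.Module):
    """Lightweight student producing logits from images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(IMG_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, NUM_CLASSES))

    def forward(self, images):
        return self.net(images.flatten(1))


images = torch.rand(64, 3, 32, 32)            # transfer images available at the edge
with torch.no_grad():
    t_logits = teacher_api(images)            # one round of cloud queries

# Step 1 -- deprivatization: fit G so that G(teacher logits) reconstructs the images.
G = Generator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
for _ in range(200):
    opt_g.zero_grad()
    recon_loss = F.mse_loss(G(t_logits), images.flatten(1))
    recon_loss.backward()
    opt_g.step()

# Step 2 -- distillation: align student logits with teacher logits indirectly,
# by shrinking the distance between their high-dimensional generated images.
student = Student()
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
G.requires_grad_(False)                       # generator is frozen here
for _ in range(200):
    opt_s.zero_grad()
    s_logits = student(images)
    align_loss = F.mse_loss(G(s_logits), G(t_logits))
    align_loss.backward()
    opt_s.step()

print("final image-space alignment loss:", align_loss.item())

The point the sketch preserves is that the student's logits are never matched to the teacher's directly: the gradient reaches the student only through the frozen generator, i.e., through distances measured between high-dimensional image points.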

Related Material


[pdf] [supp] [arXiv]
@InProceedings{Ma_2024_CVPR,
    author    = {Ma, Jing and Xiang, Xiang and Wang, Ke and Wu, Yuchuan and Li, Yongbin},
    title     = {Aligning Logits Generatively for Principled Black-Box Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {23148-23157}
}