Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets
Abstract
We propose a novel teacher-student framework to distill knowledge from multiple teachers trained on distinct datasets. Each teacher is first trained from scratch on its own dataset. Then, the teachers are combined into a joint architecture, which fuses the features of all teachers at multiple representation levels. The joint teacher architecture is fine-tuned on samples from all datasets, thus gathering useful generic information from all data samples. Finally, we employ a multi-level feature distillation procedure to transfer the knowledge to a student model for each of the considered datasets. We conduct image classification experiments on seven benchmarks and action recognition experiments on three benchmarks. To illustrate the power of our feature distillation procedure, the student architectures are chosen to be identical to those of the individual teachers. To demonstrate the flexibility of our approach, we combine teachers with distinct architectures. We show that our novel Multi-Level Feature Distillation (MLFD) approach can significantly surpass equivalent architectures that are either trained on individual datasets or jointly trained on all datasets at once. Furthermore, we confirm that each step of the proposed training procedure is well motivated by a comprehensive ablation study. We publicly release our code at https://github.com/AdrianIordache/MLFD.
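As a rough illustration of the final distillation step described in the abstract, the sketch below shows a multi-level feature distillation loss in PyTorch: the student's intermediate features are projected per level and pulled toward the frozen joint teacher's fused features with an MSE term, added to the usual cross-entropy on the labels. This is a minimal sketch under assumptions of my own; the class name, feature shapes, and the exact matching objective are hypothetical and do not reproduce the authors' implementation (see the official code at https://github.com/AdrianIordache/MLFD).

# Minimal multi-level feature distillation sketch (assumed PyTorch setup).
# Names and shapes are hypothetical; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelDistillationLoss(nn.Module):
    """Cross-entropy on labels plus an MSE term that matches the student's
    intermediate features to the frozen joint teacher's fused features at
    each selected representation level."""

    def __init__(self, student_dims, teacher_dims, alpha=1.0):
        super().__init__()
        # One linear projection per level to align student and teacher feature widths.
        self.projections = nn.ModuleList(
            [nn.Linear(s_dim, t_dim) for s_dim, t_dim in zip(student_dims, teacher_dims)]
        )
        self.alpha = alpha  # weight of the feature-distillation term

    def forward(self, logits, labels, student_feats, teacher_feats):
        # student_feats / teacher_feats: lists of (batch, dim) tensors, one per level,
        # e.g. globally pooled activations taken after selected network stages.
        ce = F.cross_entropy(logits, labels)
        distill = sum(
            F.mse_loss(proj(s), t.detach())  # teacher features stay fixed
            for proj, s, t in zip(self.projections, student_feats, teacher_feats)
        )
        return ce + self.alpha * distill


if __name__ == "__main__":
    # Toy usage with random tensors, two feature levels, and 10 classes.
    criterion = MultiLevelDistillationLoss(student_dims=[256, 512], teacher_dims=[768, 1536])
    logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    student_feats = [torch.randn(8, 256), torch.randn(8, 512)]
    teacher_feats = [torch.randn(8, 768), torch.randn(8, 1536)]
    loss = criterion(logits, labels, student_feats, teacher_feats)
    print(loss.item())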
Related Material
[pdf] [supp] [arXiv] [bibtex]
@InProceedings{Iordache_2025_WACV,
  author    = {Iordache, Adrian and Alexe, Bogdan and Ionescu, Radu Tudor},
  title     = {Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {7133-7142}
}