Cross-Architecture Distillation Made Simple with Redundancy Suppression
Abstract
We describe a simple method for cross-architecture knowledge distillation, where the knowledge transfer is cast as a redundant-information suppression problem. Existing methods introduce sophisticated modules, architecture-tailored designs, and excessive parameters, which impair their efficiency and applicability. We propose to extract the architecture-agnostic knowledge in heterogeneous representations by reducing the redundant, architecture-exclusive information. To this end, we present a simple redundancy suppression distillation (RSD) loss, which comprises cross-architecture invariance maximisation and feature decorrelation objectives. To prevent the student from entirely losing its architecture-specific capabilities, we further design a lightweight module that decouples the RSD objective from the student's internal representations. Our method dispenses with the architecture-specific designs and complex operations of the pioneering OFA method. It outperforms OFA on the CIFAR-100 and ImageNet-1k benchmarks with only a fraction of OFA's parameter overhead, which highlights its potential as a simple and strong baseline for the cross-architecture distillation community.
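For intuition, the following is a minimal PyTorch sketch of what a redundancy-suppression loss of this kind might look like, assuming a cross-correlation formulation in the spirit of Barlow Twins: the diagonal of a student-teacher correlation matrix is pushed towards 1 (cross-architecture invariance maximisation) and the off-diagonal entries towards 0 (feature decorrelation). The function name rsd_loss_sketch, the weighting lambda_offdiag, and the assumption that a light projector has already matched the two feature dimensions are illustrative choices, not the paper's actual implementation.

import torch

def rsd_loss_sketch(student_feats, teacher_feats, lambda_offdiag=0.005):
    # Assumes both inputs have shape (batch, dim), e.g. after a small
    # projector on the student side; this mirrors the abstract's
    # description, not the paper's exact formulation.
    n = student_feats.shape[0]
    # Standardise each feature dimension across the batch.
    zs = (student_feats - student_feats.mean(0)) / (student_feats.std(0) + 1e-6)
    zt = (teacher_feats - teacher_feats.mean(0)) / (teacher_feats.std(0) + 1e-6)
    # Cross-correlation matrix between student and teacher dimensions.
    c = (zs.T @ zt) / n
    # Invariance term: diagonal entries should approach 1.
    invariance = (1.0 - c.diagonal()).pow(2).sum()
    # Decorrelation term: off-diagonal entries should approach 0.
    decorrelation = (c - torch.diag(c.diagonal())).pow(2).sum()
    return invariance + lambda_offdiag * decorrelation

# Example usage: a batch of 64 samples with 128-d projected features.
s = torch.randn(64, 128, requires_grad=True)  # student features
t = torch.randn(64, 128)                      # teacher features, no gradient
loss = rsd_loss_sketch(s, t.detach())
loss.backward()

A single scalar weight on the off-diagonal term keeps the two objectives on comparable scales; the paper's lightweight decoupling module would sit between the student backbone and this loss, so the objective shapes the projected features rather than the student's internal representations directly.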
Related Material
[pdf] [supp] [arXiv]

@InProceedings{Zhang_2025_ICCV,
    author    = {Zhang, Weijia and Liu, Yuehao and Ran, Wu and Ma, Chao},
    title     = {Cross-Architecture Distillation Made Simple with Redundancy Suppression},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {23256-23266}
}