C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation

Fushuo Huo, Wenchao Xu, Jingcai Guo, Haozhao Wang, Song Guo; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16006-16015

Abstract


Existing Knowledge Distillation (KD) methods typically focus on transferring knowledge from a large-capacity teacher to a low-capacity student model, achieving substantial success in unimodal knowledge transfer. However, existing methods can hardly be extended to Cross-Modal Knowledge Distillation (CMKD), where knowledge is transferred from a teacher modality to a different student modality, with inference only on the distilled student modality. We empirically reveal that the modality gap, i.e., modality imbalance and soft label misalignment, renders traditional KD ineffective in CMKD. As a solution, we propose a novel Customized Crossmodal Knowledge Distillation (C^2KD). Specifically, to alleviate the modality gap, the pre-trained teacher performs bidirectional distillation with the student to provide customized knowledge. The On-the-Fly Selection Distillation (OFSD) strategy is applied to selectively filter out samples with misaligned soft labels, where we distill cross-modal knowledge from non-target classes to avoid the modality imbalance issue. To further provide receptive cross-modal knowledge, a proxy student and a proxy teacher, inheriting unimodal and cross-modal knowledge, are formulated to progressively transfer cross-modal knowledge through bidirectional distillation. Experimental results on audio-visual, image-text, and RGB-depth datasets demonstrate that our method can effectively transfer knowledge across modalities, achieving superior performance against traditional KD by a large margin.
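The selection-and-distillation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how soft-label misalignment is measured, so the pairwise rank-agreement score, the `threshold` value, the temperature, and all function names below are assumptions chosen for illustration. The sketch covers only the filtering step and the non-target-class distillation term, and it omits the bidirectional updates and the proxy student and teacher.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def non_target_distribution(logits, target, temperature=1.0):
    """Softmax over every class except the ground-truth class."""
    rest = [x for i, x in enumerate(logits) if i != target]
    return softmax(rest, temperature)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def rank_agreement(a, b):
    """Assumed alignment score in [-1, 1]: concordant minus discordant
    class pairs, divided by the number of pairs."""
    score, pairs = 0, 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            sign = (a[i] - a[j]) * (b[i] - b[j])
            if sign > 0:
                score += 1
            elif sign < 0:
                score -= 1
            pairs += 1
    return score / pairs if pairs else 0.0


def selective_non_target_kd(teacher_logits, student_logits, targets,
                            threshold=0.0, temperature=2.0):
    """Average non-target KL over samples whose teacher and student
    rankings agree at least `threshold`; returns (loss, samples_kept)."""
    losses = []
    for t, s, y in zip(teacher_logits, student_logits, targets):
        if rank_agreement(t, s) < threshold:
            continue  # treated as misaligned soft labels and skipped
        p_teacher = non_target_distribution(t, y, temperature)
        p_student = non_target_distribution(s, y, temperature)
        losses.append(kl_divergence(p_teacher, p_student))
    kept = len(losses)
    return (sum(losses) / kept if kept else 0.0), kept


if __name__ == "__main__":
    teacher = [[2.0, 1.0, 0.5], [3.0, 1.0, 0.0]]
    student = [[1.5, 1.2, 0.3], [0.0, 1.0, 3.0]]
    loss, kept = selective_non_target_kd(teacher, student, targets=[0, 0])
    print(f"kept {kept} of {len(teacher)} samples, loss = {loss:.4f}")
```

In this toy batch the second sample has fully reversed class rankings between the two modalities, so it is skipped, and the loss is computed from the first sample alone. Excluding the ground-truth class before the softmax is one simple way to restrict distillation to non-target classes; other formulations are possible.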

Related Material


@InProceedings{Huo_2024_CVPR,
    author    = {Huo, Fushuo and Xu, Wenchao and Guo, Jingcai and Wang, Haozhao and Guo, Song},
    title     = {C2KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {16006-16015}
}