MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 27370-27380

Abstract


Vision-language pre-trained models have achieved impressive performance on various downstream tasks. However, their large model sizes hinder their utilization on platforms with limited computational resources. We find that directly using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance. Recent efforts for VLP compression either adopt uni-modal compression metrics, resulting in limited performance, or involve costly mask-search processes with learnable masks. In this paper, we first propose the Module-wise Pruning Error (MoPE) metric, which accurately assesses the importance of CLIP modules by the performance decline on cross-modal tasks. Using the MoPE metric, we introduce a unified pruning framework applicable to both the pre-training and task-specific fine-tuning compression stages. For pre-training, MoPE-CLIP effectively leverages knowledge from the teacher model, significantly reducing pre-training costs while maintaining strong zero-shot capabilities. For fine-tuning, consecutive pruning from width to depth yields highly competitive task-specific models. Extensive experiments in both stages demonstrate the effectiveness of the MoPE metric, and MoPE-CLIP outperforms previous state-of-the-art VLP compression methods.
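
The core idea of the MoPE metric, as described in the abstract, can be illustrated with a minimal Python/PyTorch sketch: score each candidate module by how much a cross-modal performance measure declines when that module is ablated. This is only an illustration of the concept, not the authors' implementation; ablating a module by zeroing its output with a forward hook, and the toy stand-ins for the model and evaluation below, are assumptions for the sake of a runnable example. In the paper, the evaluation would be a cross-modal task such as zero-shot image-text retrieval.

    import torch
    import torch.nn as nn

    def mope_score(model, module, evaluate):
        """Sketch of the MoPE idea: a module's importance is the drop in
        cross-modal performance when its output is ablated (higher score
        means the module matters more). Ablation-by-hook is an assumption,
        not necessarily the paper's procedure."""
        base = evaluate(model)
        # Temporarily replace the module's output with zeros.
        handle = module.register_forward_hook(lambda m, i, o: torch.zeros_like(o))
        pruned = evaluate(model)
        handle.remove()
        return base - pruned

    # Toy usage: a stand-in model and a synthetic "performance" measure
    # (negative MSE, so higher is better). Real usage would plug in a CLIP
    # model and a cross-modal evaluation.
    model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
    data, target = torch.randn(32, 8), torch.randn(32, 1)
    evaluate = lambda m: -nn.functional.mse_loss(m(data), target).item()
    for name, layer in model.named_children():
        print(name, mope_score(model, layer, evaluate))

Ranking modules by such scores would let a structured-pruning procedure remove the lowest-scoring ones first, which is the role the MoPE metric plays in the pruning framework described above.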

Related Material


[bibtex]
@InProceedings{Lin_2024_CVPR,
    author    = {Lin, Haokun and Bai, Haoli and Liu, Zhili and Hou, Lu and Sun, Muyi and Song, Linqi and Wei, Ying and Sun, Zhenan},
    title     = {MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {27370-27380}
}