Knowledge Distillation for Efficient Instance Semantic Segmentation with Transformers

Maohui Li, Michael Halstead, Chris Mccool; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 5432-5439

Abstract


Instance-based semantic segmentation provides detailed per-pixel scene understanding that is crucial for both computer vision and robotics applications. However, state-of-the-art approaches such as Mask2Former are computationally expensive, and reducing this computational burden while maintaining high accuracy remains challenging. Knowledge distillation has been regarded as a potential way to compress neural networks, but to date limited work has explored how to apply it to distill information from the output queries of a model such as Mask2Former. In this paper, we match the output queries of the student and teacher models to enable a query-based knowledge distillation scheme. We independently match the teacher and the student to the ground truth and use these matches to define the teacher-to-student relationship for knowledge distillation. Using this approach, we show that it is possible to perform knowledge distillation where the student models can have fewer queries and the backbone can be changed from a Transformer architecture to a DCNN architecture. Experiments on two challenging agricultural datasets, sweet pepper (BUP20) and sugar beet (SB20), and on Cityscapes demonstrate the efficacy of our approach. Across the three datasets, the student models obtain an average absolute improvement in AP of 1.8 and 1.9 for the ResNet-50 and Swin-Tiny backbones, respectively. To the best of our knowledge, this is the first work to propose knowledge distillation schemes for instance semantic segmentation with transformer-based models.
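The matching idea described in the abstract can be illustrated with a small sketch. It assumes each model exposes a cost matrix between its output queries and the ground-truth instances; the matrices, the function names `match_to_ground_truth` and `teacher_student_pairs`, and the brute-force assignment are all hypothetical stand-ins chosen for illustration. The abstract does not specify the cost function, the matching algorithm, or the distillation loss, so this is not the authors' implementation.

```python
from itertools import permutations


def match_to_ground_truth(cost):
    """Return {gt_index: query_index} minimising the total matching cost.

    cost[q][g] is an assumed cost of assigning query q to ground-truth
    instance g. Brute force over assignments; only suitable for toy sizes.
    """
    num_queries = len(cost)
    num_gt = len(cost[0]) if cost else 0
    if num_gt > num_queries:
        raise ValueError("need at least one query per ground-truth instance")
    best_total, best_assign = float("inf"), ()
    for queries in permutations(range(num_queries), num_gt):
        total = sum(cost[q][g] for g, q in enumerate(queries))
        if total < best_total:
            best_total, best_assign = total, queries
    return {g: q for g, q in enumerate(best_assign)}


def teacher_student_pairs(teacher_cost, student_cost):
    """Pair teacher and student queries that match the same ground truth."""
    teacher_match = match_to_ground_truth(teacher_cost)
    student_match = match_to_ground_truth(student_cost)
    return sorted((teacher_match[g], student_match[g]) for g in teacher_match)


# Toy example: the teacher has 4 queries, the student has 3, and the image
# contains 2 ground-truth instances. All cost values are made up.
teacher_cost = [[0.9, 0.8], [0.1, 0.7], [0.6, 0.2], [0.5, 0.9]]
student_cost = [[0.3, 0.4], [0.8, 0.1], [0.2, 0.9]]
pairs = teacher_student_pairs(teacher_cost, student_cost)
print(pairs)  # list of (teacher_query, student_query) index pairs
```

Because both models are matched to the ground truth independently, the pairing does not require the teacher and student to have the same number of queries. A distillation loss could then be applied to each paired teacher and student query; unmatched queries are simply left out of this sketch.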

Related Material


@InProceedings{Li_2024_CVPR,
    author    = {Li, Maohui and Halstead, Michael and Mccool, Chris},
    title     = {Knowledge Distillation for Efficient Instance Semantic Segmentation with Transformers},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {5432-5439}
}