Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection

Liangqi Li, Jiaxu Miao, Dahu Shi, Wenming Tan, Ye Ren, Yi Yang, Shiliang Pu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 6501-6510

Abstract


Current methods for open-vocabulary object detection (OVOD) rely on a pre-trained vision-language model (VLM) to acquire the recognition ability. In this paper, we propose a simple yet effective framework to Distill the Knowledge from the VLM to a DETR-like detector, termed DK-DETR. Specifically, we present two ingenious distillation schemes named semantic knowledge distillation (SKD) and relational knowledge distillation (RKD). To utilize the rich knowledge from the VLM systematically, SKD transfers the semantic knowledge explicitly, while RKD exploits implicit relationship information between objects. Furthermore, a distillation branch including a group of auxiliary queries is added to the detector to mitigate the negative effect on base categories. Equipped with SKD and RKD on the distillation branch, DK-DETR improves the detection performance of novel categories significantly and avoids disturbing the detection of base categories. Extensive experiments on LVIS and COCO datasets show that DK-DETR surpasses existing OVOD methods under the setting that the base-category supervision is solely available. The code and models are available at https://github.com/hikvision-research/opera.

Related Material


[pdf]
[bibtex]
@InProceedings{Li_2023_ICCV, author = {Li, Liangqi and Miao, Jiaxu and Shi, Dahu and Tan, Wenming and Ren, Ye and Yang, Yi and Pu, Shiliang}, title = {Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {6501-6510} }