DA-DETR: Domain Adaptive Detection Transformer With Information Fusion

Jingyi Zhang, Jiaxing Huang, Zhipeng Luo, Gongjie Zhang, Xiaoqin Zhang, Shijian Lu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 23787-23798

Abstract

The recent detection transformer (DETR) simplifies the object detection pipeline by removing hand-crafted designs and hyperparameters as employed in conventional two-stage object detectors. However, how to leverage the simple yet effective DETR architecture in domain adaptive object detection is largely neglected. Inspired by the unique DETR attention mechanisms, we design DA-DETR, a domain adaptive object detection transformer that introduces information fusion for effective transfer from a labeled source domain to an unlabeled target domain. DA-DETR introduces a novel CNN-Transformer Blender (CTBlender) that fuses the CNN features and Transformer features ingeniously for effective feature alignment and knowledge transfer across domains. Specifically, CTBlender employs the Transformer features to modulate the CNN features across multiple scales where the high-level semantic information and the low-level spatial information are fused for accurate object identification and localization. Extensive experiments show that DA-DETR achieves superior detection performance consistently across multiple widely adopted domain adaptation benchmarks.

Related Material

@InProceedings{Zhang_2023_CVPR,
    author    = {Zhang, Jingyi and Huang, Jiaxing and Luo, Zhipeng and Zhang, Gongjie and Zhang, Xiaoqin and Lu, Shijian},
    title     = {DA-DETR: Domain Adaptive Detection Transformer With Information Fusion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {23787-23798}
}