Detection Transformer with Stable Matching

Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 6491-6500

Abstract


This paper is concerned with the matching stability problem across different decoder layers in DEtection TRansformers (DETR). We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR. To address this problem, we show that the most important design is to use and only use positional metrics (like IOU) to supervise classification scores of positive examples. Under the principle, we propose two simple yet effective modifications by integrating positional metrics to DETR's classification loss and matching cost, named position-supervised loss and position-modulated cost. We verify our methods on several DETR variants. Our methods show consistent improvements over baselines. By integrating our methods with DINO, we achieve 50.4 and 51.5 AP on the COCO detection benchmark using ResNet-50 backbones under 1x (12 epochs) and 2x (24 epochs) training settings, achieving a new record under the same setting. Our code will be made available.
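The abstract describes the two modifications only at a high level. The minimal sketch below illustrates the stated principle under explicit assumptions: a positive query's classification target is its IoU with the matched ground-truth box (rather than a constant 1), and the classification part of the matching cost is modulated by IoU. All names (`box_iou`, `bce`, `position_supervised_loss`, `position_modulated_cost`, `match`) and the hyperparameters `gamma` and `alpha` are hypothetical. The focal-style weights, the geometric blend `score**(1 - alpha) * iou**alpha`, and the brute-force search standing in for the Hungarian algorithm are illustrative assumptions, not the paper's exact equations.

```python
import math
from itertools import permutations

# ASSUMPTION: illustrative sketch only; functional forms are not taken from the paper.


def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def bce(p, t, eps=1e-7):
    """Binary cross-entropy of probability p against a soft target t in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def position_supervised_loss(scores, ious, is_positive, gamma=2.0):
    """Hypothetical position-supervised loss.

    Positives are supervised ONLY by their IoU (no constant target of 1);
    negatives are pushed toward 0. The focal-style factors |t - p|**gamma
    and p**gamma are an assumption, not stated in the abstract.
    """
    total = 0.0
    for p, s, pos in zip(scores, ious, is_positive):
        if pos:
            total += abs(s - p) ** gamma * bce(p, s)  # target = IoU
        else:
            total += p ** gamma * bce(p, 0.0)  # background target = 0
    return total / max(1, sum(is_positive))


def position_modulated_cost(score, iou, alpha=0.5):
    """Hypothetical position-modulated classification cost (lower = better).

    ASSUMPTION: score and IoU are blended geometrically with exponent alpha.
    """
    return -(score ** (1.0 - alpha)) * (iou ** alpha)


def match(pred_scores, pred_boxes, gt_boxes, alpha=0.5):
    """Brute-force one-to-one matching, a stand-in for the Hungarian algorithm.

    Returns a tuple `assign` in which prediction assign[j] is matched to
    ground-truth box j (only box-independent terms are omitted for brevity).
    """
    best, best_cost = None, math.inf
    for assign in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(
            position_modulated_cost(
                pred_scores[i], box_iou(pred_boxes[i], gt_boxes[j]), alpha
            )
            for j, i in enumerate(assign)
        )
        if cost < best_cost:
            best, best_cost = assign, cost
    return best
```

For example, with one ground-truth box (0, 0, 10, 10), a well-localized prediction (0, 0, 10, 10) with score 0.6 and a non-overlapping prediction (20, 20, 30, 30) with score 0.9, `match` assigns the ground truth to the first prediction, whereas a cost based on the score alone would pick the second. In the loss, a positive whose score already equals its IoU contributes zero, so a confident but poorly localized positive is penalized instead of rewarded.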

Related Material

@InProceedings{Liu_2023_ICCV,
    author    = {Liu, Shilong and Ren, Tianhe and Chen, Jiayu and Zeng, Zhaoyang and Zhang, Hao and Li, Feng and Li, Hongyang and Huang, Jun and Su, Hang and Zhu, Jun and Zhang, Lei},
    title     = {Detection Transformer with Stable Matching},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {6491-6500}
}