StageInteractor: Query-based Object Detector with Cross-stage Interaction

Yao Teng, Haisong Liu, Sheng Guo, Limin Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 6577-6588

Abstract


Previous object detectors make predictions based on dense grid points or numerous preset anchors. Most of these detectors are trained with one-to-many label assignment strategies. On the contrary, recent query-based object detectors are based a sparse set of learnable queries refined by a series of decoder layers. The one-to-one label assignment is independently applied on each layer for deep supervision during training. Despite the great success of query-based object detection, however, this vanilla one-to-one label assignment strategy requires the detectors to have strong fine-grained discrimination and modeling capacity. In this paper, we propose a new query-based object detector with cross-stage interaction, coined as StageInteractor. During the forward pass, we come up with an efficient way to improve this modeling ability by reusing dynamic operators with lightweight adapters. As for the label assignment, a cross-stage label assigner is designed to improve the one-to-one label assignment. With this assigner, the training target class labels are gathered across stages and then reallocated to proper predictions at each decoder layer. On MS COCO benchmark, our model improves the baseline counterpart by 2.2 AP, and achieves a 44.8 AP with ResNet-50 as backbone, 100 queries and 12 training epochs. With longer training time and 300 queries, StageInteractor achieves 51.3 AP and 52.7 AP with ResNeXt-101-DCN and Swin-S, respectively. The code and models are made available at https://github.com/MCG-NJU/StageInteractor.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Teng_2023_ICCV, author = {Teng, Yao and Liu, Haisong and Guo, Sheng and Wang, Limin}, title = {StageInteractor: Query-based Object Detector with Cross-stage Interaction}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {6577-6588} }