RecursiveDet: End-to-End Region-Based Recursive Object Detection

Jing Zhao, Li Sun, Qingli Li; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 6307-6316

Abstract


End-to-end region-based object detectors like Sparse R-CNN usually have multiple cascade bounding box decoding stages, which refine the current predictions according to their previous results. Model parameters within each stage are independent, evolving a huge cost. In this paper, we find the general setting of decoding stages is actually redundant. By simply sharing parameters and making a recursive decoder, the detector already obtains a significant improvement. The recursive decoder can be further enhanced by positional encoding (PE) of the proposal box, which makes it aware of the exact locations and sizes of input bounding boxes, thus becoming adaptive to proposals from different stages during the recursion. Moreover, we also design centerness-based PE to distinguish the RoI feature element and dynamic convolution kernels at different positions within the bounding box. To validate the effectiveness of the proposed method, we conduct intensive ablations and build the full model on three recent mainstream region-based detectors. The RecusiveDet is able to achieve obvious performance boosts with even fewer model parameters and slightly increased computation cost.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Zhao_2023_ICCV, author = {Zhao, Jing and Sun, Li and Li, Qingli}, title = {RecursiveDet: End-to-End Region-Based Recursive Object Detection}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {6307-6316} }