Deep Equilibrium Object Detection

Shuai Wang, Yao Teng, Limin Wang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 6296-6306

Abstract


Query-based object detectors directly decode image features into object instances with a set of learnable queries. These query vectors are progressively refined into stable, meaningful representations through a sequence of decoder layers and are then used to directly predict object locations and categories with simple FFN heads. In this paper, we present a new query-based object detector (DEQDet) built on a deep equilibrium decoder. Our DEQ decoder models query vector refinement as the fixed-point solving of an implicit layer, which is equivalent to applying infinitely many refinement steps. To be more specific to object decoding, we use a two-step unrolled equilibrium equation to explicitly capture the query vector refinement. Accordingly, we are able to incorporate refinement awareness into DEQ training with inexact gradient back-propagation (RAG). In addition, to stabilize the training of DEQDet and improve its generalization ability, we devise a deep supervision scheme on the optimization path of DEQ with refinement-aware perturbation (RAP). Our experiments demonstrate that DEQDet converges faster, consumes less memory, and achieves better results than its baseline counterpart (AdaMixer). In particular, DEQDet with a ResNet50 backbone and 300 queries achieves 49.5 mAP and 33.0 APs on the MS COCO benchmark under the 2x training scheme (24 epochs).
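As a rough illustration of the fixed-point view of refinement described in the abstract, the following minimal sketch iterates a toy refinement layer until the query vector stops changing. Everything in it is an assumption made for illustration: `refine`, `solve_fixed_point`, the weight `w`, and the three-element vectors are hypothetical stand-ins, not the DEQDet decoder, and plain forward iteration is used here rather than whatever solver the paper employs.

```python
import math


def refine(z, x, w=0.5):
    """One toy refinement step (hypothetical stand-in for a decoder layer).

    z is the current query vector and x the (fixed) input features.
    Because |w| < 1 and tanh is 1-Lipschitz, this map is a contraction,
    so repeated application converges to a unique fixed point.
    """
    return [math.tanh(w * zi + xi) for zi, xi in zip(z, x)]


def solve_fixed_point(x, z0, tol=1e-10, max_iter=1000):
    """Apply `refine` repeatedly until successive iterates differ by < tol."""
    z = list(z0)
    for step in range(1, max_iter + 1):
        z_next = refine(z, x)
        residual = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if residual < tol:
            return z, step
    return z, max_iter


# Toy input features and an all-zero initial query (assumed values).
x = [0.3, -1.2, 0.8]
z_star, steps = solve_fixed_point(x, z0=[0.0, 0.0, 0.0])

# At the equilibrium, one more refinement step leaves z (almost) unchanged,
# and so does a two-step unrolled application refine(refine(z)).
one_step_gap = max(abs(a - b) for a, b in zip(refine(z_star, x), z_star))
two_step_gap = max(
    abs(a - b) for a, b in zip(refine(refine(z_star, x), x), z_star)
)
print(f"converged in {steps} steps")
```

In this toy setting, the equilibrium `z_star` satisfies both the one-step equation z = f(z, x) and the two-step unrolled equation z = f(f(z, x), x), which is the sense in which solving for the fixed point stands in for applying refinement indefinitely.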

Related Material


@InProceedings{Wang_2023_ICCV,
    author    = {Wang, Shuai and Teng, Yao and Wang, Limin},
    title     = {Deep Equilibrium Object Detection},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {6296-6306}
}