D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers

Jianfeng He, Yuan Gao, Tianzhu Zhang, Zhe Zhang, Feng Wu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 2904-2914

Abstract


Establishing pixel-level matches between image pairs is vital for a variety of computer vision applications. However, achieving robust image matching remains challenging because CNN extracted descriptors usually lack discriminative ability in texture-less regions and keypoint detectors are only good at identifying keypoints with a specific level of structure. To deal with these issues, a novel image matching method is proposed by Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-based Transformers (D2Former), including a contextual feature descriptor learning (CFDL) module and a hierarchical keypoint detector learning (HKDL) module. The proposed D2Former enjoys several merits. First, the proposed CFDL module can model long-range contexts efficiently and effectively with the aid of designed descriptor agents. Second, the HKDL module can generate keypoint detectors in a hierarchical way, which is helpful for detecting keypoints with diverse levels of structures. Extensive experimental results on four challenging benchmarks show that our proposed method significantly outperforms state-of-the-art image matching methods.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{He_2023_CVPR, author = {He, Jianfeng and Gao, Yuan and Zhang, Tianzhu and Zhang, Zhe and Wu, Feng}, title = {D2Former: Jointly Learning Hierarchical Detectors and Contextual Descriptors via Agent-Based Transformers}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {2904-2914} }