RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction

Künzel, Johannes; Hilsmann, Anna; Eisert, Peter

Johannes Künzel, Anna Hilsmann, Peter Eisert; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 4868-4877

Abstract

We introduce RIPE, an innovative reinforcement learning-based framework for weakly supervised training of a keypoint extractor that excels at both detection and description. In contrast to conventional training regimes that depend heavily on artificial transformations, pre-generated models, or 3D data, RIPE requires only a binary label indicating whether a pair of images depicts the same scene. This minimal supervision significantly expands the pool of usable training data, enabling the creation of a highly generalized and robust keypoint extractor. RIPE describes keypoints using the encoder's intermediate layers, following a hyper-column approach that integrates information from different scales. Additionally, we propose an auxiliary loss to enhance the discriminative capability of the learned descriptors. Comprehensive evaluations on standard benchmarks demonstrate that RIPE simplifies data preparation while achieving performance competitive with state-of-the-art techniques, marking a significant advancement in robust keypoint extraction and description. Code and data will be made available for research purposes.
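The hyper-column idea mentioned in the abstract can be illustrated with a small, generic sketch. This is not RIPE's implementation. The number of layers, their channel sizes, the use of bilinear sampling, the half-pixel coordinate mapping, and the function names `bilinear_sample` and `hypercolumn_descriptors` are all hypothetical choices made for illustration. The sketch samples each intermediate feature map at the keypoint locations, concatenates the samples across scales, and L2-normalizes the result so that descriptors can be compared with a dot product.

```python
import numpy as np


def bilinear_sample(feature_map: np.ndarray, xy: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (C, H, W) feature map at continuous (x, y) positions.

    xy has shape (N, 2) in this map's own pixel coordinates; returns (N, C).
    Positions outside the map are clamped to the border.
    """
    _, h, w = feature_map.shape
    x = np.clip(xy[:, 0], 0, w - 1)
    y = np.clip(xy[:, 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    # Gather the four neighbours; each has shape (C, N)
    f00 = feature_map[:, y0, x0]
    f01 = feature_map[:, y0, x1]
    f10 = feature_map[:, y1, x0]
    f11 = feature_map[:, y1, x1]
    top = f00 * (1 - wx) + f01 * wx
    bottom = f10 * (1 - wx) + f11 * wx
    return (top * (1 - wy) + bottom * wy).T


def hypercolumn_descriptors(feature_maps, keypoints_xy, image_hw):
    """Hypothetical hyper-column: concatenate per-layer samples at each keypoint."""
    img_h, img_w = image_hw
    parts = []
    for fmap in feature_maps:
        _, h, w = fmap.shape
        # Map image pixel centres onto this layer's coarser grid (half-pixel convention)
        scale = np.array([w / img_w, h / img_h])
        xy = (keypoints_xy + 0.5) * scale - 0.5
        parts.append(bilinear_sample(fmap, xy))
    desc = np.concatenate(parts, axis=1)
    # L2-normalize so descriptors can be matched by cosine similarity
    return desc / np.maximum(np.linalg.norm(desc, axis=1, keepdims=True), 1e-12)


# Usage with random stand-ins for three encoder layers of a 64x64 image
rng = np.random.default_rng(0)
image_hw = (64, 64)
feature_maps = [
    rng.standard_normal((16, 32, 32)),  # stride 2
    rng.standard_normal((32, 16, 16)),  # stride 4
    rng.standard_normal((64, 8, 8)),    # stride 8
]
keypoints = np.array([[10.0, 20.0], [40.5, 5.25], [63.0, 63.0]])
desc = hypercolumn_descriptors(feature_maps, keypoints, image_hw)
print(desc.shape)  # (3, 112): 16 + 32 + 64 channels per keypoint
```

Sampling each layer only at the keypoint positions, rather than upsampling every layer to full image resolution first, keeps memory proportional to the number of keypoints; fine layers contribute precise local detail while coarse layers contribute wider context.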

Related Material


BibTeX

@InProceedings{Kunzel_2025_ICCV,
    author    = {K\"unzel, Johannes and Hilsmann, Anna and Eisert, Peter},
    title     = {RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {4868-4877}
}