Learning to Rank Proposals for Object Detection

Zhiyu Tan, Xuecheng Nie, Qi Qian, Nan Li, Hao Li; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 8273-8281

Abstract


Non-Maximum Suppression (NMS) is an essential step of modern object detection models for removing duplicated candidates. The efficacy of NMS heavily affects the final detection results. Prior works exploit suppression criterions relying on either the objectiveness derived from classification or the overlapness produced by regression, both of which are heuristically designed and fail to explicitly link with the suppression rank. To address this issue, in this paper, we propose a novel Learning-to-Rank (LTR) model to produce the suppression rank via a learning procedure, thus facilitating the candidate generation and lifting the detection performance. In particular, we define a ranking score based on IoU to indicate the ranks of candidates during the NMS step, where candidates with high ranking score will be reserved and the ones with low ranking score will be eliminated. We design a lightweight network to predict the ranking score. We introduce a ranking loss to supervise the generation of these ranking scores, which encourages candidates with IoU to the ground-truth to rank higher. To facilitate the training procedure, we design a novel sampling strategy via dividing candidates into different levels and select hard pairs to adopt in the training. During the inference phase, this module can be exploited as a plugin to the current object detector. The training and inference of the overall framework is end-to-end. Comprehensive experiments on benchmarks PASCAL VOC and MS COCO demonstrate the generality and effectiveness of our model for facilitating existing object detectors to state-of-the-art accuracy.

Related Material


[pdf]
[bibtex]
@InProceedings{Tan_2019_ICCV,
author = {Tan, Zhiyu and Nie, Xuecheng and Qian, Qi and Li, Nan and Li, Hao},
title = {Learning to Rank Proposals for Object Detection},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}