Towards Discriminative and Transferable One-Stage Few-Shot Object Detectors
Recent object detection models have proved valuable for many robotics and manufacturing tasks, but they require large amounts of annotated data for each new object class they are trained on. Few-shot object detection (FSOD) aims to address this problem by learning novel classes from only a few annotated samples. While two-stage FSOD models have achieved competitive results, one-stage FSOD models, which are typically faster, underperform in comparison. We find that the large performance gap between two-stage and one-stage FSODs is mainly due to the weak discriminability of one-stage detectors, which is explained by a small post-fusion receptive field and a small number of foreground samples in the loss function. To address these limitations, we propose a new one-stage FSOD framework, Few-shot RetinaNet (FSRN). Specifically, we propose: (1) a multi-way support training strategy to augment the number of foreground samples for dense meta-detectors during training, (2) an early multi-level feature fusion providing a wide receptive field that covers the whole anchor area, and (3) two augmentation techniques on query and source images to enhance transferability. Extensive experiments demonstrate that the proposed approach addresses the limitations of previous methods and boosts both discriminability and transferability. FSRN is two times faster than two-stage FSODs while remaining competitive in accuracy, and it triples the state-of-the-art performance of one-stage meta-detectors on the competitive 10-shot MS-COCO benchmark. On the PASCAL VOC benchmark, the proposed approach consistently outperforms one-stage meta-detectors and many two-stage FSODs.