Learning Rare Category Classifiers on a Tight Labeling Budget

Ravi Teja Mullapudi, Fait Poms, William R. Mark, Deva Ramanan, Kayvon Fatahalian; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 8423-8432


Many real-world ML deployments face the challenge of training a rare category model with a small labeling budget. In these settings, there is often access to large amounts of unlabeled data, so it is attractive to consider semi-supervised or active learning approaches to reduce human labeling effort. However, prior approaches make two assumptions that often do not hold in practice: (a) one has access to a modest amount of labeled data to bootstrap learning, and (b) every image belongs to a common category of interest. In this paper, we consider the scenario where we start with as few as five labeled positives of a rare category and a large amount of unlabeled data, 99.9% of which is negatives. We propose an active semi-supervised method for building accurate models in this challenging setting. Our method leverages two key ideas: (a) utilize human and machine effort where they are most effective: human labels are used to identify "needle-in-a-haystack" positives, while machine-generated pseudo-labels are used to identify negatives; (b) adapt recently proposed representation learning techniques for handling extremely imbalanced human-labeled data to iteratively train models with noisy machine-labeled data. We compare our approach with prior active learning and semi-supervised approaches, demonstrating significant improvements in accuracy per unit labeling effort, particularly on a tight labeling budget.
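
The division of effort in idea (a) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `split_effort`, its parameters, and both selection rules (send the highest-scoring unlabeled items to the human, pseudo-label the lowest-scoring ones as negatives) are assumptions made purely for illustration. The abstract states only that human labels target positives and machine pseudo-labels target negatives.

```python
from typing import List, Sequence, Set, Tuple


def split_effort(
    scores: Sequence[float],
    human_labeled: Set[int],
    human_budget: int,
    num_pseudo_negatives: int,
) -> Tuple[List[int], List[int]]:
    """Choose items for human review and items to pseudo-label as negative.

    scores[i] is the current model's estimated probability that item i is a
    positive. Both selection rules are illustrative assumptions, not the
    paper's exact procedure.
    """
    unlabeled = [i for i in range(len(scores)) if i not in human_labeled]

    # Human effort: review the highest-scoring unlabeled items, since these
    # are the most likely "needle-in-a-haystack" positives.
    by_score_desc = sorted(unlabeled, key=lambda i: scores[i], reverse=True)
    to_human = by_score_desc[:human_budget]

    # Machine effort: treat the lowest-scoring remaining items as negatives.
    # When almost all data is negative these labels are mostly right, but
    # they are still noisy and should be treated as such during training.
    remaining = by_score_desc[human_budget:]
    pseudo_negatives = remaining[::-1][:num_pseudo_negatives]
    return to_human, pseudo_negatives


if __name__ == "__main__":
    # Hypothetical scores for six images; image 0 already has a human label.
    scores = [0.9, 0.1, 0.8, 0.05, 0.3, 0.7]
    to_human, pseudo_negatives = split_effort(scores, {0}, 2, 2)
    print("ask human about:", to_human)
    print("pseudo-label as negative:", pseudo_negatives)
```

In an iterative setting, one would retrain the model on the new human labels plus the pseudo-negatives, rescore the unlabeled pool, and repeat until the labeling budget is spent.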

Related Material

@InProceedings{Mullapudi_2021_ICCV,
    author    = {Mullapudi, Ravi Teja and Poms, Fait and Mark, William R. and Ramanan, Deva and Fatahalian, Kayvon},
    title     = {Learning Rare Category Classifiers on a Tight Labeling Budget},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {8423-8432}
}