Active Learning for Imbalanced Datasets

Umang Aggarwal, Adrian Popescu, Celine Hudelot; The IEEE Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1428-1437


Active learning increases the effectiveness of labeling when only subsets of unlabeled datasets can be processed manually. To our knowledge, existing algorithms are designed under the assumption that datasets are balanced. However, many real-life datasets are imbalanced, and we propose two adaptations of active learning to tackle this imbalance. First, we modify acquisition functions to select samples by taking advantage of a deep model pretrained on a source domain. Second, we introduce a balancing step in the acquisition process to reduce the imbalance of the labeled subset. We evaluate existing active learning methods, along with the modifications introduced here, on four imbalanced datasets. Results show that our adaptations are useful as long as knowledge from the source domain is transferable to the target domains.
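The idea of a balancing step in the acquisition process can be illustrated with a small sketch. This is not the authors' algorithm. It assumes entropy-based uncertainty sampling as the base acquisition function. It also assumes the model's predicted labels stand in for the unknown true labels of unlabeled samples. The names `entropy` and `balanced_acquisition` are hypothetical. At each pick, the sketch targets the class with the fewest (estimated) labeled samples. It then takes the most uncertain sample predicted to belong to that class, and falls back to the whole pool when no such sample exists.

```python
import math
from typing import Dict, List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy of a probability vector (higher = more uncertain)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def balanced_acquisition(
    probs: List[Sequence[float]],
    labeled_counts: Dict[int, int],
    budget: int,
) -> List[int]:
    """Hypothetical balanced acquisition: pick `budget` unlabeled indices.

    probs[i] is the model's class-probability vector for unlabeled sample i.
    labeled_counts[c] is the number of labeled samples of class c so far;
    every class should appear as a key (use 0 for classes with no labels).
    Assumption: predicted labels (argmax) are a proxy for true labels.
    """
    counts = dict(labeled_counts)
    remaining = set(range(len(probs)))
    predicted = {i: max(range(len(p)), key=p.__getitem__) for i, p in enumerate(probs)}
    selected: List[int] = []
    while remaining and len(selected) < budget:
        # Balancing: target the class with the fewest (estimated) labels so far.
        target = min(counts, key=lambda c: (counts[c], c))
        candidates = [i for i in remaining if predicted[i] == target]
        pool = candidates or list(remaining)  # fall back to all unlabeled samples
        # Within the pool, plain uncertainty sampling; ties go to the lowest index.
        best = max(pool, key=lambda i: (entropy(probs[i]), -i))
        selected.append(best)
        remaining.remove(best)
        # Update the estimated class counts with the sample's predicted label.
        counts[predicted[best]] = counts.get(predicted[best], 0) + 1
    return selected


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.45, 0.55]]
    print(balanced_acquisition(probs, {0: 10, 1: 2}, 2))  # [3, 2]
```

Consider the example, with class 1 as the minority. Pure uncertainty sampling would pick samples 1 and 3, because they have the highest entropy. The balanced variant picks samples 3 and 2 instead, since both are predicted as class 1. The abstract's other adaptation, a deep model pretrained on a source domain, would enter this sketch only through `probs`. For example, these could come from a classifier trained on features extracted by that network. That wiring is an assumption and is not shown here.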


author = {Aggarwal, Umang and Popescu, Adrian and Hudelot, Celine},
title = {Active Learning for Imbalanced Datasets},
booktitle = {The IEEE Winter Conference on Applications of Computer Vision (WACV)},
month = {March},
year = {2020}