Classifier Guided Cluster Density Reduction for Dataset Selection

Cheng Chang, Keyu Long, Zijian Li, Himanshu Rai; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 7338-7347

Abstract


In this paper, we address the challenge of selecting an optimal dataset from an annotated source pool to enhance performance on a target dataset derived from a different source. This is important in scenarios where on-the-fly dataset annotation is hard to afford, and it is also the theme of the second Visual Data Understanding (VDU) Challenge. Our solution, the Classifier Guided Cluster Density Reduction (CCDR) framework, operates in two stages. Initially, we employ a filtering technique to identify images that align with the target dataset's distribution. Subsequently, we implement a graph-based cluster density reduction method steered by a classifier that approximates the distance between the target distribution and the source distribution. This classifier is trained to distinguish between images that resemble the target dataset and those that do not, facilitating the pruning process shown in Figure 1. Our approach maintains a balance between selecting pertinent images that match the target distribution and eliminating redundant ones that do not contribute to the enhancement of the detection model. We show that our method outperforms various baselines in object detection tasks, particularly in optimizing the training set distribution on the region100 dataset. We have released our code here: https://github.com/himsR/DataCVChallenge-2024/tree/main
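The two stages described above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names, the cosine-similarity filter, the `keep_ratio`, `sim_threshold`, and `budget` parameters, and the greedy pruning rule are all assumptions chosen for illustration, and the classifier is replaced by a stand-in score (mean similarity to the target features) rather than a trained model.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so that dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def filter_by_target_similarity(src, tgt, keep_ratio=0.5):
    # Stage 1 (assumed form): keep the source items whose best cosine
    # similarity to any target item is highest.
    sim = l2_normalize(src) @ l2_normalize(tgt).T
    best = sim.max(axis=1)
    k = max(1, int(len(src) * keep_ratio))
    return np.sort(np.argsort(-best)[:k])

def density_reduce(feats, scores, budget, sim_threshold=0.9):
    # Stage 2 (assumed form): greedy pruning on a similarity graph.
    # Nodes are items; an edge links two items whose cosine similarity
    # is at least sim_threshold. While more than `budget` items remain,
    # drop one item from the densest neighbourhood, choosing the one
    # with the lowest score.
    f = l2_normalize(feats)
    adj = (f @ f.T) >= sim_threshold
    np.fill_diagonal(adj, False)
    alive = np.ones(len(feats), dtype=bool)
    while alive.sum() > budget:
        degree = (adj & alive[None, :]).sum(axis=1)  # count surviving neighbours
        degree[~alive] = -1                          # ignore removed items
        candidates = np.flatnonzero(degree == degree.max())
        victim = candidates[np.argmin(scores[candidates])]
        alive[victim] = False
    return np.flatnonzero(alive)

# Toy data: hypothetical 8-dimensional image embeddings.
rng = np.random.default_rng(0)
tgt = rng.normal(loc=1.0, size=(20, 8))
src = np.vstack([rng.normal(loc=1.0, size=(30, 8)),
                 rng.normal(loc=-1.0, size=(30, 8))])

kept = filter_by_target_similarity(src, tgt, keep_ratio=0.5)
# Stand-in for the classifier output: mean cosine similarity to the target set.
scores = (l2_normalize(src[kept]) @ l2_normalize(tgt).T).mean(axis=1)
selected = kept[density_reduce(src[kept], scores, budget=10, sim_threshold=0.5)]
print(len(kept), len(selected))
```

In this sketch the score only breaks ties among the most densely connected items, so redundancy drives which region is pruned and target resemblance decides which item in that region survives. Once no edges remain, the loop simply drops the lowest-scoring items until the budget is met.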

Related Material


[bibtex]
@InProceedings{Chang_2024_CVPR,
    author    = {Chang, Cheng and Long, Keyu and Li, Zijian and Rai, Himanshu},
    title     = {Classifier Guided Cluster Density Reduction for Dataset Selection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {7338-7347}
}