Referring Expression Counting

Siyang Dai, Jun Liu, Ngai-Man Cheung; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16985-16995

Abstract


Existing counting tasks are limited to the class level and do not account for fine-grained details within a class. In real applications, counting target objects often requires in-context or referring human input. Take urban analysis as an example: fine-grained information, such as traffic flow in different directions or pedestrians and vehicles waiting or moving at different sides of a junction, is more beneficial. Current settings of both class-specific and class-agnostic counting treat objects of the same class indiscriminately, which limits their use in real applications. To this end, we propose a new task named Referring Expression Counting (REC), which aims to count objects with different attributes within the same class. To evaluate the REC task, we create a novel dataset named REC-8K, which contains 8011 images and 17122 referring expressions. Experiments on REC-8K show that our proposed method achieves state-of-the-art performance compared with several text-based counting methods and an open-set object detection model. We also outperform prior models on the class-agnostic counting (CAC) benchmark [36] in the zero-shot setting and perform on par with few-shot methods. Code and dataset are available at https://github.com/sydai/referring-expression-counting.
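To make the task setup concrete, below is a minimal sketch of what a REC-8K-style annotation and the standard counting evaluation might look like. The record fields and helper names here are illustrative assumptions rather than the dataset's actual schema, and while MAE and RMSE are the conventional metrics in counting benchmarks, the paper's exact evaluation protocol may differ.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class RECSample:
    """Hypothetical REC-8K-style annotation: one referring
    expression per record, tied to an image and a ground-truth count."""
    image_path: str
    referring_expression: str  # e.g. "cars moving to the left"
    count: int                 # number of objects matching the expression

def evaluate_counts(gt: List[int], pred: List[int]) -> dict:
    """Conventional counting metrics: Mean Absolute Error and
    Root Mean Squared Error between predicted and true counts."""
    assert len(gt) == len(pred)
    errors = [p - g for g, p in zip(gt, pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"MAE": mae, "RMSE": rmse}

if __name__ == "__main__":
    # Two referring expressions over the same image, as in the
    # urban-analysis example from the abstract (hypothetical values).
    samples = [
        RECSample("junction.jpg", "pedestrians waiting to cross", 7),
        RECSample("junction.jpg", "vehicles moving to the right", 12),
    ]
    gt = [s.count for s in samples]
    pred = [6, 13]  # counts produced by a hypothetical REC model
    print(evaluate_counts(gt, pred))  # {'MAE': 1.0, 'RMSE': 1.0}
```

Note how the same image appears under multiple referring expressions with different ground-truth counts; this is what distinguishes REC from class-level counting, where one image/class pair maps to a single count.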

Related Material


BibTeX:
@InProceedings{Dai_2024_CVPR,
    author    = {Dai, Siyang and Liu, Jun and Cheung, Ngai-Man},
    title     = {Referring Expression Counting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {16985-16995}
}