Organizing Unstructured Image Collections using Natural Language

Mingxuan Liu, Zhun Zhong, Jun Li, Gianni Franchi, Subhankar Roy, Elisa Ricci; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026, pp. 8907-8918

Abstract


In this work, we introduce and study the novel task of Open-ended Semantic Multiple Clustering (OpenSMC). Given a large, unstructured image collection, the goal is to automatically discover several, diverse semantic clustering criteria (e.g., Activity or Location) from the images, and subsequently organize them according to the discovered criteria, without requiring any human input. Our framework, X-Cluster: eXploratory Clustering, treats text as a reasoning proxy: it concurrently scans the entire image collection, proposes candidate criteria in natural language, and groups images into meaningful clusters per criterion. This radically differs from previous works, which either assume predefined clustering criteria or fixed cluster counts. To evaluate X-Cluster, we create two new benchmarks, COCO-4C and Food-4C, each annotated with four distinct grouping criteria and corresponding cluster labels. Experiments show that X-Cluster can effectively reveal meaningful partitions on several datasets. Finally, we use X-Cluster to achieve various real-world applications, including uncovering hidden biases in text-to-image (T2I) generative models and analyzing image virality on social media. Project page: https://oatmealliu.github.io/xcluster.html

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Liu_2026_CVPR, author = {Liu, Mingxuan and Zhong, Zhun and Li, Jun and Franchi, Gianni and Roy, Subhankar and Ricci, Elisa}, title = {Organizing Unstructured Image Collections using Natural Language}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings}, month = {June}, year = {2026}, pages = {8907-8918} }