CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering

Shen, Shuai; Li, Wanhua; Wang, Xiaobing; Zhang, Dafeng; Jin, Zhezhu; Zhou, Jie; Lu, Jiwen

Shuai Shen, Wanhua Li, Xiaobing Wang, Dafeng Zhang, Zhezhu Jin, Jie Zhou, Jiwen Lu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 20786-20795

Abstract

One of the most important yet rarely studied challenges for supervised face clustering is the large intra-class variance caused by different face attributes such as age, pose, and expression. Images of the same identity but with different face attributes tend to be split into different sub-clusters. We propose, for the first time, an attribute hallucination framework named CLIP-Cluster to address this issue. It first hallucinates multiple representations for different attributes with the powerful CLIP model and then pools them by learning neighbor-adaptive attention. Specifically, CLIP-Cluster introduces a text-driven attribute hallucination module, which allows one to use natural language as the interface to hallucinate novel attributes for a given face image based on the well-aligned image-language CLIP space. Furthermore, we develop a neighbor-aware proxy generator that fuses the features describing various attributes into a proxy feature, building a bridge among different sub-clusters and reducing the intra-class variance. The proxy feature is generated by adaptively attending to the hallucinated visual features and the source feature based on the local neighbor information. A graph built with the proxy representations is then used for subsequent clustering operations. Extensive experiments show that our approach outperforms state-of-the-art face clustering methods with high inference efficiency.
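The pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the attention formulation, so a fixed cosine-similarity softmax stands in for the learned neighbor-adaptive attention, and the hallucinated features are assumed to be given rather than produced by CLIP. The names `proxy_feature`, `source`, `hallucinated`, and `neighbors` are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def proxy_feature(source, hallucinated, neighbors):
    """Fuse a source feature and its hallucinated variants into one proxy.

    Assumption: attention weights are a softmax over cosine similarities
    between each candidate feature and the mean neighbor feature. The paper
    learns this attention; the fixed rule here is only a stand-in.
    """
    # Candidates: the source feature followed by the m hallucinated features.
    candidates = np.vstack([source[None, :], hallucinated])   # (1 + m, d)
    # Summarize the local neighborhood as a single unit-length context vector.
    context = l2_normalize(neighbors.mean(axis=0))            # (d,)
    # Cosine similarity of each candidate to the neighborhood context.
    scores = l2_normalize(candidates) @ context               # (1 + m,)
    # Numerically stable softmax over the candidates.
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()
    # Weighted sum of the candidates, renormalized to unit length.
    proxy = l2_normalize(weights @ candidates)
    return proxy, weights

# Toy data: random vectors in place of real face embeddings.
rng = np.random.default_rng(0)
d, m, k = 8, 3, 5
source = l2_normalize(rng.normal(size=d))
hallucinated = l2_normalize(rng.normal(size=(m, d)))
neighbors = l2_normalize(rng.normal(size=(k, d)))

proxy, weights = proxy_feature(source, hallucinated, neighbors)
```

In this sketch, candidates that agree with the local neighborhood receive larger weights, so the proxy is pulled toward the attribute variants that its neighbors exhibit. A clustering graph would then be built over such proxy vectors instead of the raw features.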

Related Material

@InProceedings{Shen_2023_ICCV,
    author    = {Shen, Shuai and Li, Wanhua and Wang, Xiaobing and Zhang, Dafeng and Jin, Zhezhu and Zhou, Jie and Lu, Jiwen},
    title     = {CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {20786-20795}
}