CLIPSym: Delving into Symmetry Detection with CLIP

Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 21003-21013

Abstract


Symmetry is one of the most fundamental geometric cues in computer vision, and detecting it remains an open challenge. Given recent advances in vision-language models such as CLIP, we investigate whether a pre-trained CLIP model can aid symmetry detection by leveraging the additional symmetry cues found in natural image descriptions. We propose CLIPSym, which combines CLIP's image and language encoders with a rotation-equivariant decoder, a hybrid of Transformer and G-Convolution layers, to detect rotation and reflection symmetries. To fully utilize CLIP's language encoder, we develop a novel prompting technique called Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based prompts to better integrate semantic cues for symmetry detection. Empirically, we show that CLIPSym outperforms the current state of the art on three standard symmetry detection datasets (DENDI, SDRW, and LDRS). Finally, we conduct detailed ablations verifying the benefits of CLIP's pre-training, the proposed equivariant decoder, and the SAPG technique.
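The abstract names G-Convolution (group-equivariant convolution) as a building block of the rotation-equivariant decoder. The sketch below is not the CLIPSym decoder; it is a minimal, generic illustration of the property such layers provide, assuming the discrete rotation group C4 (multiples of 90 degrees) and square inputs. A "lifting" layer correlates the image with all four rotated copies of one filter. Rotating the input then rotates every output map and cyclically shifts the rotation axis by one, which the helper checks exactly on integer data. All function names here are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def rot90(a: Grid) -> Grid:
    """Rotate a square grid 90 degrees counterclockwise (numpy.rot90 convention)."""
    n = len(a)
    return [[a[j][n - 1 - i] for j in range(n)] for i in range(n)]


def correlate_valid(image: Grid, kernel: Grid) -> Grid:
    """Plain 2D cross-correlation with no padding ('valid' output size)."""
    n, k = len(image), len(kernel)
    m = n - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(m)
        ]
        for i in range(m)
    ]


def c4_lifting_conv(image: Grid, kernel: Grid) -> List[Grid]:
    """Lift an image to C4: output map r uses the filter rotated r times by 90 degrees."""
    outputs = []
    rotated = kernel
    for _ in range(4):
        outputs.append(correlate_valid(image, rotated))
        rotated = rot90(rotated)
    return outputs


def is_c4_equivariant(image: Grid, kernel: Grid) -> bool:
    """Check: rotating the input rotates each map and shifts the rotation index by one."""
    out = c4_lifting_conv(image, kernel)
    out_rot = c4_lifting_conv(rot90(image), kernel)
    return all(out_rot[r] == rot90(out[(r - 1) % 4]) for r in range(4))


if __name__ == "__main__":
    image = [[(3 * i + 5 * j) % 7 for j in range(5)] for i in range(5)]
    kernel = [[1, 2, 0], [0, -1, 3], [2, 0, 1]]
    print(is_c4_equivariant(image, kernel))
```

Because the rotation only permutes the four output channels instead of producing unrelated features, later layers can reason about orientation consistently; an ordinary single-filter convolution offers no such guarantee.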

Related Material


BibTeX
@InProceedings{Yang_2025_ICCV,
    author    = {Yang, Tinghan and Rahman, Md Ashiqur and Yeh, Raymond A.},
    title     = {CLIPSym: Delving into Symmetry Detection with CLIP},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {21003-21013}
}