Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

Shashanka Venkataramanan, Valentinos Pariza, Mohammadreza Salehi, Lukas Knobel, Elias Ramzi, Spyros Gidaris, Andrei Bursuc, Yuki M Asano; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 10533-10544

Abstract


We present Franca (pronounced Fran-ka, "free one"), the first fully open-source (data, code, weights) vision foundation model that matches and in many cases surpasses the performance of state-of-the-art proprietary models such as DINOv2, CLIP, and SigLIPv2. Our approach is grounded in a transparent training pipeline inspired by Web-SSL and uses publicly available data: ImageNet-21K and a subset of ReLAION-2B. Beyond the model release, we tackle critical limitations of clustering-based self-supervised learning methods. Existing approaches assign image features to large codebooks via clustering algorithms such as Sinkhorn-Knopp, but they often overlook the inherent ambiguity in cluster semantics. To address this, we introduce a multi-head clustering projector based on nested Matryoshka representations. This design progressively refines features into increasingly fine-grained clusters without increasing the model size, producing higher-quality dense representations. Additionally, we propose a novel positional disentanglement strategy that explicitly removes positional biases from dense representations. This leads to consistent gains on several downstream benchmarks, demonstrating the utility of cleaner feature spaces. Our contributions establish a new standard for transparent, high-performance vision models and open a path toward more reproducible and generalizable foundation models for the broader AI community.
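
The multi-head clustering idea described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the slice widths (16, 32, 64), the codebook sizes (4, 8, 16), the temperature `eps`, and the function names `sinkhorn_knopp` and `matryoshka_assignments` are all assumptions chosen for illustration. The sketch takes nested prefixes of one feature vector, scores each prefix against its own set of prototypes, and balances the resulting assignments with a few Sinkhorn-Knopp iterations.

```python
import numpy as np


def sinkhorn_knopp(scores, n_iters=3, eps=0.05):
    """Turn a (batch, clusters) score matrix into soft, balanced assignments."""
    # Subtract the max before exponentiating for numerical stability.
    q = np.exp((scores - scores.max()) / eps).T  # shape: (clusters, batch)
    q /= q.sum()
    n_clusters, batch = q.shape
    for _ in range(n_iters):
        # Row step: give every cluster the same total mass.
        q /= q.sum(axis=1, keepdims=True)
        q /= n_clusters
        # Column step: give every sample the same total mass.
        q /= q.sum(axis=0, keepdims=True)
        q /= batch
    return (q * batch).T  # shape: (batch, clusters); each row sums to 1


def matryoshka_assignments(features, heads):
    """Cluster nested prefixes of `features`, one head per prefix width.

    `heads` maps a prefix width to a (clusters, width) prototype matrix.
    Both the widths and the codebook sizes are illustrative assumptions.
    """
    out = {}
    for dim, prototypes in heads.items():
        z = features[:, :dim]  # nested (Matryoshka) slice of the feature
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
        p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
        out[dim] = sinkhorn_knopp(z @ p.T)  # cosine scores -> assignments
    return out


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 64))  # 8 hypothetical feature vectors
heads = {
    16: rng.normal(size=(4, 16)),   # coarse head: short prefix, few clusters
    32: rng.normal(size=(8, 32)),
    64: rng.normal(size=(16, 64)),  # fine head: full feature, many clusters
}
assignments = matryoshka_assignments(features, heads)
```

In this sketch the heads all read from the same feature vector, so coarser and finer clusterings come from prefixes of one representation instead of from separate encoders. How the actual model sizes its heads and codebooks is specified in the paper, not here.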

Related Material


@InProceedings{Venkataramanan_2026_CVPR,
    author    = {Venkataramanan, Shashanka and Pariza, Valentinos and Salehi, Mohammadreza and Knobel, Lukas and Ramzi, Elias and Gidaris, Spyros and Bursuc, Andrei and Asano, Yuki M},
    title     = {Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {10533-10544}
}