Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images

Sun, Xiangyu; Jiang, Haoyi; Liu, Liu; Nam, Seungtae; Kang, Gyeongjin; Wang, Xinjie; Sui, Wei; Su, Zhizhong; Liu, Wenyu; Wang, Xinggang; Park, Eunbyung

Xiangyu Sun, Haoyi Jiang, Liu Liu, Seungtae Nam, Gyeongjin Kang, Xinjie Wang, Wei Sui, Zhizhong Su, Wenyu Liu, Xinggang Wang, Eunbyung Park; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 33280-33290

Abstract

Reconstructing and semantically interpreting 3D scenes from sparse 2D views remains a fundamental challenge in computer vision. Conventional methods often decouple semantic understanding from reconstruction or necessitate costly per-scene optimization, thereby restricting their scalability and generalizability. In this paper, we introduce Uni3R, a novel feed-forward framework that jointly reconstructs a unified 3D scene representation enriched with open-vocabulary semantics, directly from unposed multi-view images. Our approach leverages a Cross-View Transformer to robustly integrate information across arbitrary multi-view inputs and then regress a set of 3D Gaussian primitives endowed with semantic feature fields. This unified representation facilitates high-fidelity novel view synthesis, open-vocabulary 3D semantic segmentation, and depth prediction, all within a single feed-forward pass. Extensive experiments demonstrate that Uni3R sets a new state of the art across multiple benchmarks, including in-domain datasets such as RE10K and ScanNet, as well as the out-of-domain dataset Mip-NeRF360. This work represents a step toward a new paradigm of generalizable and unified 3D scene reconstruction and understanding.

Related Material

@InProceedings{Sun_2026_CVPR,
    author    = {Sun, Xiangyu and Jiang, Haoyi and Liu, Liu and Nam, Seungtae and Kang, Gyeongjin and Wang, Xinjie and Sui, Wei and Su, Zhizhong and Liu, Wenyu and Wang, Xinggang and Park, Eunbyung},
    title     = {Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {33280-33290}
}