AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion
Abstract
Accurate camera calibration is a fundamental task for 3D perception, especially in real-world, in-the-wild environments where complex optical distortions are common. Existing methods often rely on pre-rectified images or calibration patterns, which limits their applicability and flexibility. In this work, we introduce AlignDiff, a novel framework that addresses these challenges by jointly modeling camera intrinsic and extrinsic parameters using a generic ray camera model. Unlike previous approaches, AlignDiff shifts the focus from semantic to geometric features, enabling more accurate modeling of local distortions. AlignDiff is a diffusion model conditioned on geometric priors, allowing the simultaneous estimation of camera distortions and scene geometry. To enhance distortion prediction, we incorporate edge-aware attention, which focuses the model on geometric features around image edges rather than on semantic content. Furthermore, to improve generalizability to real-world captures, we incorporate a large database of ray-traced lenses containing over three thousand samples, characterizing the distortion inherent in a diverse variety of lens forms. Our experiments demonstrate that the proposed method reduces the angular error of estimated ray bundles by 8.2 degrees and improves overall calibration accuracy, outperforming existing approaches on challenging real-world datasets.
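The abstract measures calibration quality as the angular error of estimated ray bundles under a generic ray camera model, where a camera is described by one ray direction per pixel rather than by a closed-form distortion function. The snippet below is a minimal, hypothetical sketch (not the authors' code) of that representation and metric: it builds a per-pixel bundle of unit ray directions, here derived from an ideal pinhole model purely for illustration, and computes the mean per-ray angle in degrees between an estimated and a reference bundle. All function and parameter names are assumptions made for this example.

```python
# Minimal sketch (illustrative only): a camera as a per-pixel ray bundle and
# the mean angular error, in degrees, between two such bundles.
import numpy as np

def pinhole_ray_bundle(h, w, fx, fy, cx, cy):
    """Unit ray directions of shape (h, w, 3) for an ideal pinhole camera.

    A generic ray camera model would instead store one (possibly
    distortion-bent) direction per pixel directly; the pinhole helper
    just stands in for that here.
    """
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    dirs = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(u)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

def mean_angular_error_deg(rays_pred, rays_gt):
    """Mean per-pixel angle (degrees) between two unit ray bundles."""
    cos = np.clip(np.sum(rays_pred * rays_gt, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

if __name__ == "__main__":
    gt = pinhole_ray_bundle(480, 640, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    est = pinhole_ray_bundle(480, 640, fx=520.0, fy=480.0, cx=310.0, cy=250.0)
    print(f"mean angular error: {mean_angular_error_deg(est, gt):.2f} deg")
```

The per-pixel formulation is what lets a single representation cover both simple pinhole optics and the heavily distorted lens forms in the ray-traced lens database mentioned above.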
Related Material

[pdf] [supp] [arXiv] [bibtex]

@InProceedings{Xie_2025_ICCV,
  author    = {Xie, Liuyue and Guo, Jiancong and Cakmakci, Ozan and Araujo, Andre and Jeni, L\'aszl\'o A. and Jia, Zhiheng},
  title     = {AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {26901-26911}
}