@InProceedings{Lan_2026_CVPR,
    author    = {Lan, Hongbo and An, Zhenlin and Li, Haoyu and Singh, Vaibhav and Shangguan, Longfei},
    title     = {Efficient Structure-Guided 3D Physical Property Reasoning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {8391-8400}
}
Efficient Structure-Guided 3D Physical Property Reasoning
Abstract
Inferring an object's physical properties, such as material type and surface hardness, from visual observations is essential for augmented reality, robotic perception, and embodied intelligence. However, existing approaches to physical property reasoning, such as NeRF2Physics, are computationally expensive and error-prone because they interpolate sparse, noisy CLIP features across dense 3D scenes. This creates a fundamental trade-off between high semantic resolution and high reasoning efficiency, and it makes the system sensitive to oblique or low-quality viewpoints. We introduce a lightweight, structure-guided framework that achieves fine-grained semantic consistency for physical property reasoning at orders-of-magnitude lower computational cost. Our key insight is that 3D structural priors offer a stronger cue to an object's semantic organization, which allows us to avoid dense interpolation in physical property reasoning. We project 2D DINO embeddings into 3D for coarse component segmentation, perform adaptive sparse sampling of representative CLIP source points, and apply view-quality-aware patch selection with probability-weighted aggregation. These designs eliminate dense interpolation, suppress noisy viewpoints, and drastically cut the number of CLIP queries. Extensive experiments on the ABO dataset demonstrate that our method reduces end-to-end runtime from hundreds of seconds to mere seconds per scene while improving semantic accuracy, spatial coherence, and downstream physical-property inference.
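The abstract names the sparse-sampling and view-selection steps without giving their exact form. The following is a minimal, dependency-free sketch of one plausible reading, under stated assumptions, and not the authors' implementation. It assumes that DINO-based segmentation has already assigned a component label to every 3D point. It also makes three hypothetical choices. First, "adaptive sparse sampling" is taken to mean farthest point sampling inside each component, with a budget proportional to the component's size. Second, "view quality" is taken to be the cosine between the surface normal and the direction to the camera. Third, "probability-weighted aggregation" is taken to mean a quality-weighted average of per-view CLIP class probabilities over the top-k views, followed by an expected value of a scalar property over candidate materials. All function names (`farthest_point_sample`, `sample_sources`, `view_quality`, `aggregate`, `expected_property`) and parameters (`ratio`, `top_k`) are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical sketch: names and scoring choices are assumptions, not the paper's code.

def farthest_point_sample(points, k):
    """Greedy farthest point sampling; returns indices of up to k well-spread points."""
    if k >= len(points):
        return list(range(len(points)))
    chosen = [0]  # deterministic start at the first point
    dist = [math.dist(p, points[0]) for p in points]  # distance to nearest chosen point
    while len(chosen) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        if dist[nxt] == 0.0:
            break  # every remaining point coincides with an already chosen one
        chosen.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return chosen

def sample_sources(points, labels, ratio=0.05, min_per_part=1):
    """Assumed adaptive sampling: each component's budget scales with its point count."""
    parts = defaultdict(list)
    for idx, lab in enumerate(labels):
        parts[lab].append(idx)
    sources = {}
    for lab, idxs in parts.items():
        k = max(min_per_part, round(ratio * len(idxs)))
        local = farthest_point_sample([points[i] for i in idxs], k)
        sources[lab] = [idxs[j] for j in local]  # map back to global indices
    return sources

def view_quality(point, normal, cam_center):
    """Assumed score: cosine of normal vs. direction to camera, clamped to 0 for back-facing views."""
    d = [c - p for c, p in zip(cam_center, point)]
    cos = sum(a * b for a, b in zip(d, normal)) / (math.hypot(*d) * math.hypot(*normal))
    return max(0.0, cos)

def aggregate(view_probs, qualities, top_k=3):
    """Keep the top_k views by quality, then average their class probabilities weighted by quality."""
    ranked = sorted(zip(qualities, view_probs), key=lambda t: t[0], reverse=True)[:top_k]
    total = sum(q for q, _ in ranked)
    if total == 0:
        raise ValueError("no selected view faces the point")
    n_classes = len(ranked[0][1])
    return [sum(q * probs[c] for q, probs in ranked) / total for c in range(n_classes)]

def expected_property(probs, values):
    """Probability-weighted estimate of a scalar property (e.g. hardness) over candidate materials."""
    return sum(p * v for p, v in zip(probs, values))

if __name__ == "__main__":
    pts = [(float(i), 0.0, 0.0) for i in range(10)]
    labels = [0] * 5 + [1] * 5
    print(sample_sources(pts, labels, ratio=0.4))  # {0: [0, 4], 1: [5, 9]}
    q = [view_quality((0, 0, 0), (0, 0, 1), cam) for cam in [(0, 0, 5), (0, 0, -5), (5, 0, 5)]]
    probs = aggregate([[0.8, 0.2], [0.2, 0.8], [0.6, 0.4]], q, top_k=2)
    print(probs, expected_property(probs, [2.0, 6.0]))
```

In this sketch, the back-facing camera receives a quality of 0 and is dropped by the top-k selection, so its contradictory prediction never reaches the average. This is one simple way to realize "suppressing noisy viewpoints", and only a few source points per component need CLIP queries.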