SplatTouch: Explicit 3D Representation Binding Vision and Touch

Antonio Luigi Stefani, Niccolò Bisagno, Nicola Conci, Francesco De Natale; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 118-127

Abstract


Compared with vision, a touch sample captures information about only a small area of an object, without context, which makes it difficult to build a fully touchable 3D scene. Recent work in touch perception has used generative models to estimate tactile signals for novel samples from depth and RGB images extracted from implicit 3D scene representations. While these local contextual cues can provide sufficient information for tactile estimation, they limit accurate 3D touch localization in the scene. In this work, we introduce a novel explicit representation for multimodal 3D scene modeling that integrates both vision and touch. Our approach combines Gaussian Splatting (GS) for 3D scene representation with a diffusion-based generative model that infers missing tactile information from sparse samples, and a contrastive approach for 3D touch localization. Unlike NeRF-based implicit methods, Gaussian Splatting enables the computation of an absolute 3D reference frame via Normalized Object Coordinate Space (NOCS) maps, facilitating structured, 3D-aware tactile generation. This framework improves both tactile sample prompting and 3D tactile localization, overcoming the local constraints of prior implicit approaches. We demonstrate the effectiveness of our method in generating novel touch samples and localizing tactile interactions in 3D. Our results show that explicitly incorporating tactile information into Gaussian Splatting improves multimodal scene understanding, a step toward integrating touch into immersive virtual environments.
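To make the idea of an absolute 3D reference frame concrete, the following is a minimal sketch of how explicit 3D points (for example, Gaussian centers) could be mapped into a normalized coordinate space and encoded as colors. It is not the authors' implementation: the function names `to_nocs` and `nocs_to_rgb`, the choice of scaling by the largest bounding-box extent, and the toy points are all assumptions made for illustration.

```python
import numpy as np


def to_nocs(points: np.ndarray) -> np.ndarray:
    """Map Nx3 points into the unit cube [0, 1]^3 (hypothetical helper).

    Assumption: the bounding box is centered at (0.5, 0.5, 0.5) and scaled
    uniformly by its largest extent, so the aspect ratio is preserved.
    """
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    center = (lo + hi) / 2.0
    scale = float((hi - lo).max())
    if scale == 0.0:
        # Degenerate case: all points coincide, so place them at the cube center.
        return np.full_like(points, 0.5, dtype=float)
    return (points - center) / scale + 0.5


def nocs_to_rgb(nocs: np.ndarray) -> np.ndarray:
    """Encode normalized coordinates as 8-bit RGB, one color per 3D position."""
    return np.clip(np.rint(nocs * 255), 0, 255).astype(np.uint8)


# Toy stand-ins for the centers of three Gaussians in world coordinates.
centers = np.array([
    [0.0, 0.0, 0.0],
    [2.0, 1.0, 0.5],
    [4.0, 2.0, 1.0],
])

nocs = to_nocs(centers)
colors = nocs_to_rgb(nocs)
```

Because each point carries a fixed normalized coordinate, rendering these colors from any viewpoint would yield an image in which every pixel identifies a position in the same object-level frame, which is the property a localization step can exploit.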

Related Material


@InProceedings{Stefani_2025_CVPR,
    author    = {Stefani, Antonio Luigi and Bisagno, Niccol\`o and Conci, Nicola and De Natale, Francesco},
    title     = {SplatTouch: Explicit 3D Representation Binding Vision and Touch},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {118-127}
}