Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 26340-26353

Abstract


The ability to associate touch with other modalities has huge implications for humans and computational systems. However, multimodal learning with touch remains challenging due to the expensive data collection process and non-standardized sensor outputs. We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound. We achieve this by aligning our UniTouch embeddings to pretrained image embeddings already associated with a variety of other modalities. We further propose learnable sensor-specific tokens, allowing the model to learn from a set of heterogeneous tactile sensors all at the same time. UniTouch is capable of conducting various touch sensing tasks in the zero-shot setting, from robot grasping prediction to touch image question answering. To the best of our knowledge, UniTouch is the first to demonstrate such capabilities.
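
The abstract describes two key ideas: aligning tactile embeddings to a frozen, pretrained image embedding space, and learnable sensor-specific tokens that let heterogeneous sensors share one encoder. The sketch below illustrates how such an alignment objective might look in PyTorch; the module names, backbone, dimensions, and loss details are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): align tactile embeddings to a
# frozen pretrained image embedding space, with one learnable token per sensor
# type so heterogeneous tactile sensors can be trained together.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TactileEncoder(nn.Module):
    def __init__(self, embed_dim=512, num_sensors=4, token_dim=768):
        super().__init__()
        # Shared transformer backbone over tactile image patches (assumed choice).
        self.patch_embed = nn.Conv2d(3, token_dim, kernel_size=16, stride=16)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=token_dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        # One learnable token per tactile sensor type, prepended to the sequence.
        self.sensor_tokens = nn.Parameter(torch.randn(num_sensors, 1, token_dim) * 0.02)
        self.proj = nn.Linear(token_dim, embed_dim)

    def forward(self, tactile_images, sensor_ids):
        # tactile_images: (B, 3, H, W); sensor_ids: (B,) integer sensor index.
        x = self.patch_embed(tactile_images).flatten(2).transpose(1, 2)  # (B, N, D)
        tok = self.sensor_tokens[sensor_ids]                             # (B, 1, D)
        x = self.backbone(torch.cat([tok, x], dim=1))
        return F.normalize(self.proj(x[:, 0]), dim=-1)                   # (B, embed_dim)


def alignment_loss(touch_emb, image_emb, temperature=0.07):
    """InfoNCE-style loss pulling each touch embedding toward the frozen,
    pretrained image embedding of its paired visual observation."""
    logits = touch_emb @ image_emb.t() / temperature
    targets = torch.arange(touch_emb.size(0), device=touch_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    encoder = TactileEncoder()
    touch = torch.randn(8, 3, 224, 224)                   # batch of tactile images
    sensor_ids = torch.randint(0, 4, (8,))                # which sensor produced each image
    image_emb = F.normalize(torch.randn(8, 512), dim=-1)  # stand-in for frozen image-encoder outputs
    print(alignment_loss(encoder(touch, sensor_ids), image_emb).item())
```

Because the image embedding space is already tied to language, sound, and other modalities, aligning touch to it transfers those associations to the tactile encoder without paired touch data for each modality, which is what enables the zero-shot behavior described above.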

Related Material


@InProceedings{Yang_2024_CVPR,
    author    = {Yang, Fengyu and Feng, Chao and Chen, Ziyang and Park, Hyoungseob and Wang, Daniel and Dou, Yiming and Zeng, Ziyao and Chen, Xien and Gangopadhyay, Rit and Owens, Andrew and Wong, Alex},
    title     = {Binding Touch to Everything: Learning Unified Multimodal Tactile Representations},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {26340-26353}
}