Connecting NeRFs, Images, and Text

Francesco Ballerini, Pierluigi Zama Ramirez, Roberto Mirabella, Samuele Salti, Luigi Di Stefano; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 866-876


Neural Radiance Fields (NeRFs) have emerged as a standard framework for representing 3D scenes and objects, introducing a novel data type for information exchange and storage. Concurrently, significant progress has been made in multimodal representation learning for text and image data. This paper explores a novel research direction that aims to connect the NeRF modality with other modalities, similar to established methodologies for images and text. To this end, we propose a simple framework that exploits pre-trained models for NeRF representations alongside multimodal models for text and image processing. Our framework learns a bidirectional mapping between NeRF embeddings and those obtained from corresponding images and text. This mapping unlocks several novel and useful applications, including NeRF zero-shot classification and NeRF retrieval from images or text.
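As a rough illustration of the zero-shot classification use case, the sketch below maps a NeRF embedding into a text embedding space and picks the label whose text embedding is closest by cosine similarity. It is not the paper's implementation: the function names, the toy vectors, and the plain linear map standing in for the learned mapping are all assumptions made for this example, and the use of cosine similarity is likewise an assumption. In practice the embeddings would come from the pre-trained NeRF and multimodal models.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def apply_mapping(weights, x):
    """Hypothetical stand-in for the learned mapping: a matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def zero_shot_classify(nerf_embedding, weights, text_embeddings):
    """Map a NeRF embedding into the text space and return the closest label."""
    mapped = apply_mapping(weights, nerf_embedding)
    return max(text_embeddings, key=lambda label: cosine(mapped, text_embeddings[label]))


# Toy values only (assumptions): a 3-d "NeRF space" mapped to a 2-d "text space".
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
text_embeddings = {"chair": [1.0, 0.1], "car": [0.1, 1.0]}
nerf_embedding = [0.9, 0.2, 0.5]

predicted = zero_shot_classify(nerf_embedding, weights, text_embeddings)
print(predicted)
```

Retrieval could follow the same pattern in the other direction: map an image or text embedding into the NeRF space and rank a gallery of NeRF embeddings by similarity to it.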

Related Material

@InProceedings{Ballerini_2024_CVPR,
    author    = {Ballerini, Francesco and Ramirez, Pierluigi Zama and Mirabella, Roberto and Salti, Samuele and Di Stefano, Luigi},
    title     = {Connecting NeRFs, Images, and Text},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {866-876}
}