@InProceedings{Choi_2024_CVPR,
  author    = {Choi, Jiho and Hwang, Gyutae and Lee, Sang Jun},
  title     = {DiCo-NeRF: Difference of Cosine Similarity for Neural Rendering of Fisheye Driving Scenes},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2024},
  pages     = {7850-7858}
}
DiCo-NeRF: Difference of Cosine Similarity for Neural Rendering of Fisheye Driving Scenes
Abstract
Neural radiance fields (NeRFs) have emerged in the field of autonomous driving, where they improve perception of complex 3D environments through the reconstruction of geometry and appearance. Moving objects and the sky in outdoor environments are challenging for NeRF optimization. Previous work addresses these challenges through preprocessing such as masking; however, the masking process requires additional ground-truth data and a segmentation network. We propose DiCo-NeRF, an approach for driving scenes that leverages differences between the cosine similarity maps of a vision-language-aligned model. DiCo-NeRF measures the correlation between rendered patches and pre-defined text, and adjusts the loss of challenging patches such as moving objects and the sky. Our neural radiance field uses embedding vectors from a pre-trained CLIP model to obtain the cosine similarity maps. We introduce SimLoss, a loss function that regulates the color field of the NeRF based on the quantified distribution differences between ground-truth and rendered similarity maps. Unlike previous NeRF models for driving datasets, our approach does not require additional model inputs such as sensor data. Experimental results demonstrate that incorporating language semantic cues improves performance on the novel view synthesis task, particularly in complex driving environments. We conducted experiments on fisheye driving scenes from the KITTI-360 dataset and from real-world data. Our code is available at https://github.com/ziiho08/DiCoNeRF.
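The abstract describes SimLoss as penalizing distribution differences between the cosine similarity maps of ground-truth and rendered patches against a text embedding. A minimal sketch of that idea follows; the paper's exact formulation and weighting are not given in the abstract, so the squared-difference penalty, the function names (`cosine_sim`, `sim_loss`), and the embedding shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_sim(patch_embs, text_emb):
    """Cosine similarity of each patch embedding against one text embedding.

    patch_embs: (N, D) array of image-patch embeddings (e.g., from CLIP).
    text_emb:   (D,) array for a pre-defined text prompt.
    Returns an (N,) similarity map.
    """
    p = patch_embs / np.linalg.norm(patch_embs, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return p @ t

def sim_loss(gt_patch_embs, rendered_patch_embs, text_emb):
    """Hypothetical SimLoss sketch: penalize the difference between the
    ground-truth and rendered similarity maps (here, mean squared error)."""
    sim_gt = cosine_sim(gt_patch_embs, text_emb)
    sim_rend = cosine_sim(rendered_patch_embs, text_emb)
    return float(np.mean((sim_gt - sim_rend) ** 2))
```

Under this sketch, patches whose rendered appearance diverges semantically from the ground truth (e.g., a moving object rendered as a blur) produce a large similarity gap and thus a larger penalty, which is the qualitative behavior the abstract attributes to SimLoss.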