[bibtex]
@InProceedings{Hatami_2025_WACV,
  author    = {Hatami, Parisa and Shoman, Maged and Sartipi, Mina},
  title     = {Open-World Hazard Detection and Captioning for Autonomous Driving with a Unified Multimodal Pipeline},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {February},
  year      = {2025},
  pages     = {686-694}
}
Open-World Hazard Detection and Captioning for Autonomous Driving with a Unified Multimodal Pipeline
Abstract
Autonomous driving systems must operate reliably in dynamic environments where uncommon or unanticipated hazards frequently arise. Traditional perception models often rely on closed-set recognition, restricting them to predefined object categories and limiting their ability to address novel or rare obstacles. To tackle these open-world challenges, we propose an integrated pipeline that merges image enhancement, optical flow, depth estimation, semantic segmentation, and vision-language models. We evaluate our method on the COOOL dataset of 200 annotated dashcam videos containing both standard and previously unseen hazards. By applying depth filtering and road segmentation, our system focuses on objects along the drivable surface. In parallel, optical flow analysis captures driver reactions, adding a temporal dimension that supports hazard assessment. A vision-language module then generates concise, semantically relevant captions for novel objects, enhancing interpretability. Experimental results show that our unified pipeline consistently improves hazard detection and captioning performance in open-world scenarios, underscoring the need for flexible perception strategies that advance autonomous driving safety.
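To make the two filtering ideas in the abstract concrete, below is a minimal sketch of the depth-plus-road filtering and the optical-flow reaction signal. It assumes dense optical flow via OpenCV's Farneback method and that per-frame depth maps and drivable-area masks are produced upstream (e.g., by a monocular depth network and a road segmentation model); the function names, bounding-box format, and thresholds (max_depth, min_overlap) are illustrative assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def mean_flow_magnitude(prev_gray, curr_gray):
    # Farneback dense optical flow between consecutive dashcam frames.
    # The mean flow magnitude is a crude temporal proxy for abrupt
    # ego-motion changes (e.g., braking or swerving in reaction to a hazard).
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return float(mag.mean())

def filter_hazard_candidates(boxes, depth_map, road_mask,
                             max_depth=30.0, min_overlap=0.3):
    # Keep only candidate objects that (a) overlap the drivable surface
    # per the binary road_mask and (b) fall within a depth threshold,
    # mirroring the depth-filtering + road-segmentation step described
    # in the abstract. `boxes` is a list of (x1, y1, x2, y2) pixel boxes;
    # `depth_map` holds per-pixel depth estimates. Thresholds are
    # placeholder values for illustration only.
    kept = []
    for (x1, y1, x2, y2) in boxes:
        box_depth = depth_map[y1:y2, x1:x2]
        box_road = road_mask[y1:y2, x1:x2]
        if box_depth.size == 0:
            continue
        on_road = box_road.mean() >= min_overlap
        near = np.median(box_depth) <= max_depth
        if on_road and near:
            kept.append((x1, y1, x2, y2))
    return kept

if __name__ == "__main__":
    # Toy demonstration with synthetic inputs in place of real
    # dashcam frames, depth estimates, and segmentation output.
    h, w = 240, 320
    rng = np.random.default_rng(0)
    prev_gray = rng.integers(0, 255, (h, w), dtype=np.uint8)
    curr_gray = np.roll(prev_gray, 2, axis=1)  # simulate lateral motion
    depth_map = np.full((h, w), 20.0)          # everything ~20 m away
    road_mask = np.zeros((h, w), dtype=np.uint8)
    road_mask[120:, :] = 1                     # lower half is "road"
    boxes = [(50, 150, 90, 200), (50, 10, 90, 60)]  # on-road vs. off-road
    print("mean flow:", mean_flow_magnitude(prev_gray, curr_gray))
    print("kept boxes:", filter_hazard_candidates(boxes, depth_map, road_mask))
```

In this toy run only the first box survives, since the second lies outside the drivable area; in the full pipeline, surviving candidates would then be passed to the vision-language module for captioning.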