Multi-Input Fusion for Practical Pedestrian Intention Prediction

Ankur Singh, Upendra Suddamalla; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 2304-2311

Abstract


Pedestrians are the most vulnerable road users and are at a high risk of fatal accidents. Accurately detecting pedestrians and analyzing their intention to cross the road are critical for autonomous vehicles and advanced driver-assistance systems (ADAS) to navigate public roads safely. Fast and precise estimation of pedestrian intention helps a vehicle adopt safe driving behavior. Visual pose and motion are two important cues that have previously been used to determine pedestrian intention. However, motion patterns can yield erroneous results over short video sequences. In this work, we propose an intention prediction network that uses the image regions inside pedestrian bounding boxes, pedestrian pose, and bounding box coordinates, and that exploits global scene context along with the pedestrian's local surroundings. The network implicitly learns pedestrians' motion cues and location information to distinguish a crossing pedestrian from a non-crossing one. We experiment with different combinations of input features and propose several models that are efficient in terms of both accuracy and inference speed. Our best-performing model achieves around 85% accuracy on the JAAD dataset.
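The abstract describes fusing several input streams into a single crossing/non-crossing decision but does not give the architecture, so the following is only a minimal sketch of concatenation-based fusion under stated assumptions. It assumes each stream (bounding box coordinates, pose keypoints, a local appearance embedding, a global scene embedding) has already been reduced to a fixed-length vector by some backbone, and it uses a plain logistic head in place of whatever classifier the authors trained. The function names, feature sizes, and weights are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating concatenation-based multi-input fusion;
# not the authors' code. Each input stream is assumed to be a precomputed
# fixed-length feature vector (e.g., produced by a separate backbone).


def normalize_bbox(box: Tuple[float, float, float, float],
                   img_w: float, img_h: float) -> List[float]:
    """Scale (x1, y1, x2, y2) pixel coordinates to the [0, 1] range."""
    x1, y1, x2, y2 = box
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]


def fuse_features(*streams: Sequence[float]) -> List[float]:
    """Fuse input streams by concatenating them into one flat vector."""
    fused: List[float] = []
    for stream in streams:
        fused.extend(float(v) for v in stream)
    return fused


def crossing_probability(features: Sequence[float],
                         weights: Sequence[float], bias: float) -> float:
    """Logistic head: sigmoid(w . x + b), read as P(pedestrian crosses)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    # Numerically stable sigmoid: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    bbox = normalize_bbox((100, 200, 150, 300), img_w=1920, img_h=1080)
    pose = [0.50, 0.40, 0.52, 0.45]   # hypothetical flattened keypoints
    local_ctx = [0.10, 0.20]          # hypothetical local-crop embedding
    global_ctx = [0.30]               # hypothetical scene-level embedding
    x = fuse_features(bbox, pose, local_ctx, global_ctx)
    w = [0.0] * len(x)                # untrained placeholder weights
    print(len(x), crossing_probability(x, w, bias=0.0))
```

With the all-zero placeholder weights the head returns 0.5 for the 11-dimensional fused vector; in practice the weights would be learned. Concatenation is the simplest fusion choice because it lets a downstream layer weigh every stream jointly, which makes it easy to drop or add streams when comparing input combinations.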

Related Material


@InProceedings{Singh_2021_ICCV,
    author    = {Singh, Ankur and Suddamalla, Upendra},
    title     = {Multi-Input Fusion for Practical Pedestrian Intention Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2021},
    pages     = {2304-2311}
}