Sparse Instance Conditioned Multimodal Trajectory Prediction

Yonghao Dong, Le Wang, Sanping Zhou, Gang Hua; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 9763-9772

Abstract


Pedestrian trajectory prediction is critical in many vision tasks but challenging due to the multimodality of the future trajectory. Most existing methods predict multimodal trajectories conditioned on goals (future endpoints) or instances (all future points). However, goal-conditioned methods ignore the intermediate process, and instance-conditioned methods ignore the stochasticity of pedestrian motions. In this paper, we propose a simple yet effective Sparse Instance Conditioned Network (SICNet), which strikes a balance between goal-conditioned and instance-conditioned methods. Specifically, SICNet learns comprehensive sparse instances, i.e., representative points of the future trajectory, through a mask generated by a long short-term memory encoder, and uses a memory mechanism to store and retrieve such sparse instances. Hence, SICNet can decode the observed trajectory into the future prediction conditioned on the stored sparse instance. Moreover, we design a memory refinement module that refines the sparse instances retrieved from the memory to reduce memory recall errors. Extensive experiments on the ETH-UCY and SDD datasets show that our method outperforms existing state-of-the-art methods. In addition, ablation studies demonstrate the superiority of our method compared with goal-conditioned and instance-conditioned approaches.
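To make the distinction between goal, instance, and sparse-instance conditioning concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a hand-written binary mask in place of the mask that SICNet learns with an LSTM encoder, and a nearest-neighbour lookup in place of the learned memory addressing. The names `select_sparse_instance` and `SparseInstanceMemory` are hypothetical, and the refinement module and decoder are omitted.

```python
from math import dist
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def select_sparse_instance(future: Trajectory, mask: List[int]) -> Trajectory:
    """Keep only the future points whose mask entry is nonzero.

    A mask with a single 1 at the last step corresponds to goal conditioning,
    an all-ones mask to instance conditioning, and anything in between to a
    sparse instance. In the paper the mask is learned; here it is given.
    """
    if len(future) != len(mask):
        raise ValueError("mask must have one entry per future point")
    return [point for point, keep in zip(future, mask) if keep]


class SparseInstanceMemory:
    """Toy key-value memory: observed trajectory -> sparse instance.

    Retrieval returns the value whose key is closest to the query, using the
    sum of point-wise Euclidean distances (an assumption for illustration).
    """

    def __init__(self) -> None:
        self.keys: List[Trajectory] = []
        self.values: List[Trajectory] = []

    def store(self, observed: Trajectory, sparse_instance: Trajectory) -> None:
        self.keys.append(observed)
        self.values.append(sparse_instance)

    def retrieve(self, observed: Trajectory) -> Trajectory:
        if not self.keys:
            raise LookupError("memory is empty")

        def distance(key: Trajectory) -> float:
            # Assumes all observed trajectories have the same length.
            return sum(dist(a, b) for a, b in zip(observed, key))

        best = min(range(len(self.keys)), key=lambda i: distance(self.keys[i]))
        return self.values[best]


# Example: four future steps, keeping the second and the final point.
future = [(1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (4.0, 2.0)]
sparse = select_sparse_instance(future, [0, 1, 0, 1])

memory = SparseInstanceMemory()
memory.store([(0.0, 0.0), (1.0, 0.0)], sparse)
recalled = memory.retrieve([(0.1, 0.0), (0.9, 0.1)])
```

In this toy setting the recalled sparse instance would then serve as the condition for a decoder that maps the observed trajectory to a full future prediction.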

Related Material


[bibtex]
@InProceedings{Dong_2023_ICCV,
    author    = {Dong, Yonghao and Wang, Le and Zhou, Sanping and Hua, Gang},
    title     = {Sparse Instance Conditioned Multimodal Trajectory Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {9763-9772}
}