Trajectory Unified Transformer for Pedestrian Trajectory Prediction

Liushuai Shi, Le Wang, Sanping Zhou, Gang Hua; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 9675-9684

Abstract


Pedestrian trajectory prediction is an essential link in understanding human behavior. Recent works achieve state-of-the-art performance by relying on hand-designed post-processing, e.g., clustering. However, this post-processing suffers from expensive inference time and neglects the probability of each predicted trajectory, which can disturb downstream safety decisions. In this paper, we present the Trajectory Unified TRansformer, called TUTR, which unifies the trajectory prediction components, social interaction and multimodal trajectory prediction, into a transformer encoder-decoder architecture to effectively remove the need for post-processing. Specifically, TUTR parses the relationships across various motion modes with an explicit global prediction and an implicit mode-level transformer encoder. Then, TUTR attends to the social interactions with neighbors via a social-level transformer decoder. Finally, a dual prediction head forecasts diverse trajectories and their corresponding probabilities in parallel, without post-processing. TUTR achieves state-of-the-art accuracy and roughly 10x-40x faster inference compared with previous well-tuned state-of-the-art methods that rely on post-processing.
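
The abstract describes a pipeline with a mode-level transformer encoder, a social-level transformer decoder, and a dual prediction head. The following is a minimal sketch of how such a pipeline could be wired in PyTorch; all module names, tensor shapes, and hyperparameters (e.g., num_modes, d_model, obs_len, pred_len) are assumptions for illustration and do not reproduce the authors' implementation.

```python
import torch
import torch.nn as nn

class TUTRSketch(nn.Module):
    """Illustrative sketch of a TUTR-style pipeline (not the authors' code).

    Mode-level encoder: relates learned motion-mode tokens conditioned on the
    target pedestrian's observed trajectory.
    Social-level decoder: lets the mode tokens attend to encoded neighbor
    trajectories to model social interaction.
    Dual prediction: one head outputs a future trajectory per mode, the other
    a probability per mode, so no clustering post-processing is needed.
    """
    def __init__(self, obs_len=8, pred_len=12, d_model=128, num_modes=20):
        super().__init__()
        self.pred_len = pred_len
        self.obs_embed = nn.Linear(obs_len * 2, d_model)          # embed observed (x, y) steps
        self.mode_tokens = nn.Parameter(torch.randn(num_modes, d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.mode_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.social_decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        self.traj_head = nn.Linear(d_model, pred_len * 2)         # trajectory per mode
        self.prob_head = nn.Linear(d_model, 1)                    # probability per mode

    def forward(self, obs, neighbors):
        # obs: (B, obs_len, 2); neighbors: (B, N, obs_len, 2), padded to N neighbors
        B = obs.shape[0]
        tgt = self.obs_embed(obs.flatten(1))                      # (B, d_model)
        # Condition each mode token on the target's motion, then relate modes to each other.
        modes = self.mode_tokens.unsqueeze(0).expand(B, -1, -1) + tgt.unsqueeze(1)
        modes = self.mode_encoder(modes)                          # (B, num_modes, d_model)
        # Encode neighbors and let mode tokens attend to them (social interaction).
        nbr = self.obs_embed(neighbors.flatten(2))                # (B, N, d_model)
        social = self.social_decoder(modes, nbr)                  # (B, num_modes, d_model)
        trajs = self.traj_head(social).view(B, -1, self.pred_len, 2)
        probs = self.prob_head(social).squeeze(-1).softmax(-1)    # (B, num_modes)
        return trajs, probs
```

Under these assumptions, a single forward pass returns num_modes candidate futures together with their probabilities, which mirrors the parallel, post-processing-free prediction the abstract highlights.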

Related Material


@InProceedings{Shi_2023_ICCV,
    author    = {Shi, Liushuai and Wang, Le and Zhou, Sanping and Hua, Gang},
    title     = {Trajectory Unified Transformer for Pedestrian Trajectory Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {9675-9684}
}