-
[pdf]
[supp]
[bibtex]@InProceedings{Wang_2023_ICCV, author = {Wang, Jingbo and Yuan, Ye and Luo, Zhengyi and Xie, Kevin and Lin, Dahua and Iqbal, Umar and Fidler, Sanja and Khamis, Sameh}, title = {Learning Human Dynamics in Autonomous Driving Scenarios}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {20796-20806} }
Learning Human Dynamics in Autonomous Driving Scenarios
Abstract
Simulation has emerged as an indispensable tool for scaling and accelerating the development of self-driving systems. A critical aspect of this is simulating realistic and diverse human behavior and intent. In this work, we propose a holistic framework for learning physically plausible human dynamics from real driving scenarios, narrowing the gap between real and simulated human behavior in safety-critical applications. We show that state-of-the-art methods underperform in driving scenarios where video data is recorded from moving vehicles, and humans are frequently partially or fully occluded. Furthermore, existing methods often disregard the global scene where humans are situated, resulting in various motion artifacts like foot sliding, floating, or ground penetration. Therefore, the primary technical challenge of this work is to infer physically plausible human dynamics for the occluded body parts on uneven terrain, based on visible motions. To address this challenge, we propose an approach that incorporates physics with a reinforcement learning-based motion controller to learn human dynamics for driving scenarios. Our framework can simulate physically plausible human dynamics that accurately match observed human motions and infill motions for occluded body parts, while improving the physical plausibility of the entire motion sequence. We evaluate our method on the challenging driving scenarios in the Waymo Open Dataset. Experiments on the challenging Waymo Open Dataset show that our method outperforms state-of-the-art motion capture approaches significantly in recovering high-quality, physically plausible, and scene-aware human dynamics.
Related Material