End-To-End Learning of Driving Models From Large-Scale Video Datasets

Huazhe Xu, Yang Gao, Fisher Yu, Trevor Darrell; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2174-2182

Abstract


Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm. We provide a novel large-scale dataset of crowd-sourced driving behavior suitable for training our model, and report results predicting the driver action on held-out sequences across diverse conditions.
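
To make "predict a distribution over future vehicle egomotion" concrete, here is a minimal sketch in plain Python. It is not the FCN-LSTM from the paper: a single linear layer stands in for the network, and the discrete action set, the function name `predict_action_distribution`, and the feature and weight values are all assumptions made for illustration only.

```python
import math

# Hypothetical discrete action set; the abstract does not list the actions.
ACTIONS = ["straight", "stop", "left", "right"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_action_distribution(visual_features, previous_state, weights):
    """Score each action from the current observation and the previous state.

    A linear layer is an illustrative stand-in for the learned model:
    `weights` holds one row of coefficients per entry in ACTIONS.
    """
    inputs = list(visual_features) + list(previous_state)
    logits = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return dict(zip(ACTIONS, softmax(logits)))


# Toy values (assumed): two visual features and one previous-state feature.
toy_weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
]
distribution = predict_action_distribution([2.0, 0.0], [0.0], toy_weights)
print(distribution)
```

The output is a probability for every action rather than a single chosen command, which is the sense in which the model predicts a distribution instead of one deterministic action.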

Related Material


@InProceedings{Xu_2017_CVPR,
  author    = {Xu, Huazhe and Gao, Yang and Yu, Fisher and Darrell, Trevor},
  title     = {End-To-End Learning of Driving Models From Large-Scale Video Datasets},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {July},
  year      = {2017},
  pages     = {2174--2182}
}