MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting Through Multi-View Fusion of LiDAR Data
In this work, we propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data. Most existing methods operate in a single view, projecting the data into either range view (RV) or bird's eye view (BEV). In contrast, our method leverages the complementary strengths of both views: we use RV and BEV jointly for spatio-temporal feature learning in a temporal fusion network and for multi-scale feature learning in the backbone network. We further introduce a sequential fusion approach that effectively combines the two views within the temporal fusion network. We demonstrate the benefits of this multi-view approach for detection and motion forecasting on two large-scale self-driving datasets, achieving state-of-the-art results. Finally, we show that MVFuseNet scales to increased operating ranges while maintaining real-time performance.
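To make the two projections concrete, here is a minimal sketch of how a LiDAR point cloud can be rasterized into the BEV and RV representations the abstract refers to. The grid sizes, resolutions, and elevation limits below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), res=0.5):
    """Rasterize LiDAR points (N, 3) into a BEV occupancy grid.

    The grid is a top-down discretization of the x-y plane; each cell
    is marked occupied if any point falls inside it. Ranges and
    resolution are illustrative defaults.
    """
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    grid = np.zeros((ny, nx), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid[iy[valid], ix[valid]] = 1.0
    return grid

def to_rv(points, n_az=512, n_el=32, el_range=(-0.44, 0.18)):
    """Project LiDAR points into a range-view image (elevation x azimuth).

    Each cell stores the range of the closest return mapping to it,
    mimicking the sensor's native spherical sampling. The elevation
    limits (in radians) are assumed, not sensor-specific values.
    """
    r = np.linalg.norm(points, axis=1)
    az = np.arctan2(points[:, 1], points[:, 0])           # azimuth in [-pi, pi]
    el = np.arcsin(points[:, 2] / np.maximum(r, 1e-6))    # elevation angle
    iaz = ((az + np.pi) / (2.0 * np.pi) * n_az).astype(int) % n_az
    iel = ((el - el_range[0]) / (el_range[1] - el_range[0]) * n_el).astype(int)
    img = np.full((n_el, n_az), np.inf, dtype=np.float32)
    for i in np.flatnonzero((iel >= 0) & (iel < n_el)):
        img[iel[i], iaz[i]] = min(img[iel[i], iaz[i]], r[i])  # keep closest return
    img[np.isinf(img)] = 0.0
    return img
```

The trade-off motivating the fusion: RV preserves the sensor's dense native sampling (no empty cells at range), while BEV is metrically uniform, so object sizes do not vary with distance; MVFuseNet's contribution is learning features in both.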