EFI-Net: Video Frame Interpolation From Fusion of Events and Frames
Event cameras are sensors whose pixels respond independently and asynchronously to changes in scene illumination. Compared to conventional cameras, event cameras offer several advantages: low latency, high temporal resolution, high dynamic range, low power consumption, and sparse data output. However, existing event cameras suffer from comparatively low spatial resolution and are sensitive to noise. Recently, it has been shown that an intensity frame stream can be reconstructed from an event stream. These reconstructions preserve the high temporal rate of the event stream, but tend to exhibit significant artifacts and low image quality due to the shortcomings of event cameras. In this work we demonstrate that it is possible to combine the best of both worlds by fusing a color frame stream at low temporal resolution and high spatial resolution with an event stream at high temporal resolution and low spatial resolution, generating a video stream with both high temporal and high spatial resolution while preserving the original color information. We utilize a novel event frame interpolation network (EFI-Net), a multi-phase convolutional neural network that fuses the frame and event streams. EFI-Net is trained using only simulated data and generalizes exceptionally well to real-world experimental data. We show that our method is able to interpolate frames where traditional video interpolation approaches fail, while also outperforming event-only reconstructions. We further contribute a new dataset containing event camera data synchronized with high-speed video. This work opens the door to a new application for event cameras, enabling high-fidelity fusion with frame-based image streams for the generation of high-quality, high-speed video.
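The event-camera principle described above (each pixel fires an event whenever its log intensity changes by more than a contrast threshold) can be sketched with a toy simulator. This is an illustrative assumption of the standard contrast-threshold event model, not the simulator or training pipeline used for EFI-Net; the function name and threshold value are hypothetical.

```python
import numpy as np

def simulate_events(frame0, frame1, threshold=0.2, eps=1e-6):
    """Toy contrast-threshold event model (an assumed sketch, not EFI-Net's
    simulator): emit an (x, y, polarity) event for every time the per-pixel
    log-intensity change between two frames crosses the threshold.

    frame0, frame1: 2-D arrays of intensities in [0, 1].
    Returns an (N, 3) int array: column x, row y, polarity (+1 or -1).
    """
    dlog = np.log(frame1 + eps) - np.log(frame0 + eps)
    # Number of threshold crossings per pixel; the sign gives the polarity.
    n = np.trunc(dlog / threshold).astype(int)
    ys, xs = np.nonzero(n)
    events = []
    for x, y in zip(xs, ys):
        pol = 1 if n[y, x] > 0 else -1
        events.extend([(x, y, pol)] * abs(n[y, x]))
    return np.array(events, dtype=int).reshape(-1, 3)

# Example: one pixel brightens enough to cross the threshold twice.
f0 = np.full((2, 2), 0.10)
f1 = f0.copy()
f1[0, 1] = 0.16  # log(0.16 / 0.10) ≈ 0.47 → two +1 events at threshold 0.2
evts = simulate_events(f0, f1)
```

Because the model is purely a function of intensity change, simulated training data of this kind can be generated from any high-frame-rate video, which is what makes training on simulation alone plausible.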