E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential Equations
Event cameras are bio-inspired sensors that measure per-pixel intensity changes asynchronously. They operate with low latency and high dynamic range (HDR), and output a stream of events encoding the location, time, and sign of brightness changes. In recent years, many researchers in event-based vision have attempted to reconstruct HDR videos from events. However, these works do not achieve sufficiently high video quality: they suffer from unrealistic artifacts caused by the irregular, discontinuous nature of event data and by deterministic modeling of what is inherently a continuous-time stochastic process. In this study, we overcome these difficulties with a new model, E2V-SDE, a neural continuous-time state-space model consisting of a latent stochastic differential equation and a conditional distribution over observations. Based on the learned dynamics, our model can rapidly reconstruct HDR video at arbitrary time steps and make realistic predictions at unseen time steps, i.e., interpolation and extrapolation, which is not possible in previous works. In addition, we successfully adopt a variety of image composition techniques for video reconstruction from event data, further improving image clarity and temporal consistency between adjacent frames. Through extensive experiments on simulated and real-scene datasets, we verify that our model outperforms state-of-the-art approaches under various video reconstruction settings. In terms of image quality, the LPIPS score improves by up to 12%, and the reconstruction speed is 87% higher than that of ET-Net.
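The continuous-time idea behind the abstract can be sketched with a toy latent SDE: a latent state z evolves as dz = f(z)dt + g(z)dW, and a decoder maps z to a frame at any requested timestamp. The drift, diffusion, and decoder below are hypothetical stand-ins (random linear maps) for the paper's trained networks; this is a minimal illustration of why the model can emit frames at irregular times, not E2V-SDE itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned networks (the paper trains these on events).
W_f = rng.standard_normal((8, 8)) * 0.1   # drift weights
W_d = rng.standard_normal((16, 8)) * 0.1  # decoder weights

def drift(z):
    # f(z): deterministic part of the latent dynamics
    return np.tanh(W_f @ z)

def diffusion(z):
    # g(z): noise scale (diagonal noise for simplicity)
    return 0.05 * np.ones_like(z)

def decode(z):
    # Conditional observation model: latent state -> a tiny 4x4 "frame"
    return np.tanh(W_d @ z).reshape(4, 4)

def sample_frames(z0, ts):
    """Euler-Maruyama integration of dz = f(z)dt + g(z)dW,
    decoding one frame at every requested timestamp in ts."""
    frames, z, t = [decode(z0)], z0.copy(), ts[0]
    for t_next in ts[1:]:
        dt = t_next - t
        z = z + drift(z) * dt + diffusion(z) * np.sqrt(dt) * rng.standard_normal(z.shape)
        frames.append(decode(z))
        t = t_next
    return frames

# Frames at arbitrary, irregularly spaced timestamps -- the continuous-time property.
ts = [0.0, 0.01, 0.05, 0.2]
frames = sample_frames(np.zeros(8), ts)
print(len(frames), frames[0].shape)  # 4 (4, 4)
```

Because the dynamics are defined in continuous time, the timestamp list `ts` can be densified between observed frames (interpolation) or extended beyond them (extrapolation) without retraining.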