Gated Spatio-Temporal Attention-Guided Video Deblurring
Video deblurring remains a challenging task due to the complexity of spatially and temporally varying blur. Most of the existing works depend on implicit or explicit alignment for temporal information fusion, which either increases the computational cost or results in suboptimal performance due to misalignment. In this work, we investigate two key factors responsible for deblurring quality: how to fuse spatio-temporal information and from where to collect it. We propose a factorized gated spatio-temporal attention module to perform non-local operations across space and time to fully utilize the available information without depending on alignment. First, we perform spatial aggregation followed by a temporal aggregation step. Next, we adaptively distribute the global spatio-temporal information to each pixel. It shows superior performance compared to existing non-local fusion techniques while being considerably more efficient. To complement the attention module, we propose a reinforcement learning-based framework for selecting keyframes from the neighborhood with the most complementary and useful information. Moreover, our adaptive approach can increase or decrease the frame usage at inference time, depending on the user's need. Extensive experiments on multiple datasets demonstrate the superiority of our method.