Deep Fusion of Appearance and Frame Differencing for Motion Segmentation
Motion segmentation detects and localizes class-agnostic motion in videos. The motion is assumed to be relative to a stationary background and usually originates from objects such as vehicles or humans. When the camera itself moves, frame differencing approaches, which do not have to model the stationary background over minutes, hours, or even days, are more promising than background subtraction methods. In this paper, we propose a Deep Convolutional Neural Network (DCNN) for multi-modal motion segmentation: the current image contributes appearance information to distinguish between relevant and irrelevant motion, and frame differencing captures the temporal information, that is, the scene's motion independent of the camera motion. We fuse this information to obtain an effective and efficient approach for robust motion segmentation. We demonstrate its effectiveness on the multi-spectral CDNet-2014 dataset, which we re-labeled for motion segmentation. In particular, we show that our approach detects tiny moving objects significantly better than methods based on optical flow.
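To make the two input modalities concrete, the following is a minimal sketch of frame differencing combined with the current image. It is not the network proposed here. It assumes that the two grayscale frames are already aligned (camera motion compensated) and that a simple channel stack stands in for the learned fusion. The function names `frame_difference` and `build_fused_input` are hypothetical.

```python
import numpy as np


def frame_difference(prev_frame, curr_frame):
    """Absolute per-pixel difference of two aligned grayscale frames.

    Assumption: both frames have the same shape and are already
    compensated for camera motion.
    """
    prev_f = prev_frame.astype(np.float32)
    curr_f = curr_frame.astype(np.float32)
    return np.abs(curr_f - prev_f)


def build_fused_input(prev_frame, curr_frame):
    """Stack appearance (current frame) and temporal cue (difference).

    Returns an array of shape (2, H, W). Channel stacking is an
    illustrative stand-in for the learned fusion, not the paper's design.
    """
    diff = frame_difference(prev_frame, curr_frame)
    return np.stack([curr_frame.astype(np.float32), diff], axis=0)


# Toy example: a single bright pixel appears between two frames.
prev_frame = np.zeros((4, 4), dtype=np.float32)
curr_frame = np.zeros((4, 4), dtype=np.float32)
curr_frame[1, 2] = 1.0

fused = build_fused_input(prev_frame, curr_frame)
```

In this toy case, channel 0 holds the current image and channel 1 is non-zero only at the pixel that changed, which is the kind of temporal cue a segmentation network could combine with appearance.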