TransBlast: Self-Supervised Learning Using Augmented Subspace With Transformer for Background/Foreground Separation
Background/foreground separation is a fundamental and challenging task in many computer vision applications. The F-measure of state-of-the-art models is limited by two factors: the predicted output (i.e., the foreground object) lacks fine details, and labeled data is scarce. In this paper, we propose a background/foreground separation model based on a transformer, which has a higher learning capacity than convolutional neural networks. The model is trained using self-supervised learning to make the most of the limited labeled data and to learn a strong object representation that is invariant to changes. The proposed method, dubbed TransBlast, reformulates background/foreground separation as a self-supervised learning problem using an augmented subspace loss function. This loss function consists of two components: 1) a cross-entropy loss and 2) a subspace term that depends on Singular Value Decomposition (SVD). The proposed model is evaluated on three benchmarks, namely CDNet, DAVIS, and SegTrackV2. TransBlast outperforms state-of-the-art background/foreground separation models in terms of F-measure.
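To make the two-component structure of the loss concrete, the following is a minimal NumPy sketch of how a cross-entropy term and an SVD-based subspace term could be combined. The abstract does not specify the exact form of the subspace term, so everything beyond "cross-entropy plus an SVD-dependent term" is an assumption made for illustration: the function names, the `rank` and `weight` parameters, and the choice to penalize the part of the predicted mask that falls outside the span of the top singular vectors of the target mask are all hypothetical and are not taken from the paper.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy between a predicted mask and a target mask."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))


def subspace_term(pred, target, rank=2):
    """Hypothetical subspace penalty (an assumption, not the paper's definition).

    Takes the SVD of the target mask, keeps the first `rank` left singular
    vectors as a basis, and measures how much of the prediction lies
    outside the subspace spanned by that basis.
    """
    u, _, _ = np.linalg.svd(target, full_matrices=False)
    basis = u[:, :rank]                       # orthonormal columns
    projected = basis @ (basis.T @ pred)      # projection onto the subspace
    return float(np.mean((pred - projected) ** 2))


def augmented_subspace_loss(pred, target, rank=2, weight=0.5):
    """Cross-entropy plus a weighted subspace penalty (illustrative combination)."""
    return (binary_cross_entropy(pred, target)
            + weight * subspace_term(pred, target, rank))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = np.zeros((8, 8))
    target[2:5, 3:7] = 1.0                    # a rectangular foreground object
    noisy = rng.uniform(0.0, 1.0, size=(8, 8))
    print("perfect prediction:", augmented_subspace_loss(target, target, rank=1))
    print("random prediction: ", augmented_subspace_loss(noisy, target, rank=1))
```

In this sketch a prediction that matches the target incurs a near-zero loss on both terms, while a random prediction is penalized by both. A trainable version would need a differentiable SVD from an automatic-differentiation framework rather than plain NumPy.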