An Unsupervised Temporal Consistency (TC) Loss To Improve the Performance of Semantic Segmentation Networks

Varghese, Serin; Gujamagadi, Sharat; Klingner, Marvin; Kapoor, Nikhil; Bar, Andreas; Schneider, Jan David; Maag, Kira; Schlicht, Peter; Huger, Fabian; Fingscheidt, Tim

Serin Varghese, Sharat Gujamagadi, Marvin Klingner, Nikhil Kapoor, Andreas Bar, Jan David Schneider, Kira Maag, Peter Schlicht, Fabian Huger, Tim Fingscheidt; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 12-20

Abstract

Deep neural networks (DNNs) for highly automated driving are often trained on a large and diverse dataset, and evaluation metrics are usually reported on a per-frame basis. However, when evaluated on video sequences, the predictions are often unstable between consecutive frames. As such unstable predictions over time can lead to severe safety consequences, there is a growing need to understand, evaluate, and improve the temporal consistency of DNNs. In this paper, we explore this temporal characteristic and propose a novel unsupervised temporal consistency (TC) loss that penalizes unstable semantic segmentation predictions. This loss function is used in a two-stage training scheme to jointly optimize both the accuracy of the semantic segmentation predictions and their temporal consistency on video sequences. We demonstrate that our training strategy improves the temporal consistency of two state-of-the-art semantic segmentation networks on two different road-scene datasets. We report an absolute 4.25% improvement in the mean temporal consistency (mTC) of the HRNetV2 network and an absolute 2.78% improvement for the DeepLabv3+ network, both evaluated on the Cityscapes dataset, with only a slight decrease in accuracy. When evaluating on the same video sequences using the synthetic dataset Sim KI-A, we show absolute improvements in both accuracy (2.19% mIoU) and temporal consistency (0.21% mTC) for the DeepLabv3+ network. We confirm similar improvements for the HRNetV2 network.
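
The abstract does not state how temporal consistency is measured or how the TC loss is formulated, so the following is only an illustrative sketch and not the authors' definition. It assumes that the prediction for the previous frame has already been aligned to the current frame (for example by motion compensation), that agreement between two label maps can be scored as a mean IoU, and that a simple squared-difference penalty on class probabilities can stand in for a consistency term added to a supervised loss. The function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def mean_iou_between(labels_a, labels_b, num_classes):
    """Mean IoU between two label maps, used here as a consistency score.

    Assumption: labels_a is the previous-frame prediction already aligned
    to the current frame, labels_b is the current-frame prediction.
    """
    ious = []
    for c in range(num_classes):
        a = labels_a == c
        b = labels_b == c
        union = np.logical_or(a, b).sum()
        if union == 0:
            continue  # class absent in both maps, skip it
        ious.append(np.logical_and(a, b).sum() / union)
    # If no class is present at all, treat the maps as fully consistent.
    return float(np.mean(ious)) if ious else 1.0


def consistency_penalty(probs_prev_aligned, probs_curr):
    """Hypothetical unsupervised penalty on unstable predictions.

    Both inputs are class-probability maps of shape (H, W, C).
    No ground-truth labels are needed, only two consecutive predictions.
    """
    return float(np.mean((probs_prev_aligned - probs_curr) ** 2))


def combined_loss(supervised_loss, probs_prev_aligned, probs_curr, weight=0.5):
    """Supervised loss plus a weighted consistency term (illustrative)."""
    return supervised_loss + weight * consistency_penalty(
        probs_prev_aligned, probs_curr
    )


if __name__ == "__main__":
    prev = np.array([[0, 0], [1, 1]])
    curr = np.array([[0, 1], [1, 1]])
    print("consistency score:", mean_iou_between(prev, curr, num_classes=2))
```

In a two-stage scheme such as the one described above, a term like `consistency_penalty` would plausibly be switched on only in the second stage, after the network has first been trained for per-frame accuracy; how the two stages are actually configured is left to the paper itself.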

Related Material

@InProceedings{Varghese_2021_CVPR,
    author    = {Varghese, Serin and Gujamagadi, Sharat and Klingner, Marvin and Kapoor, Nikhil and Bar, Andreas and Schneider, Jan David and Maag, Kira and Schlicht, Peter and Huger, Fabian and Fingscheidt, Tim},
    title     = {An Unsupervised Temporal Consistency (TC) Loss To Improve the Performance of Semantic Segmentation Networks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {12-20}
}