Continual Learning of Unsupervised Monocular Depth From Videos

Chawla, Hemang; Varma, Arnav; Arani, Elahe; Zonooz, Bahram

Hemang Chawla, Arnav Varma, Elahe Arani, Bahram Zonooz; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 8419-8429

Abstract

Spatial scene understanding, including monocular depth estimation, is an important problem with various applications such as robotics and autonomous driving. While improvements in unsupervised monocular depth estimation have potentially allowed models to be trained on diverse crowdsourced videos, this remains under-explored because most methods follow the standard training protocol, in which models are retrained from scratch on all data whenever new data is collected. Instead, continual training of models on sequentially collected data would significantly reduce computational and memory costs. Nevertheless, naive continual training leads to catastrophic forgetting, where model performance deteriorates on older domains as the model learns newer domains, highlighting the trade-off between model stability and plasticity. While several techniques have been proposed to address this issue in image classification, the high-dimensional and spatiotemporally correlated outputs of depth estimation make it a distinct challenge. To the best of our knowledge, no existing framework or method addresses continual learning in depth estimation. Thus, we introduce a framework that captures the challenges of continual unsupervised depth estimation (CUDE) and define the necessary metrics for evaluating model performance. We propose a rehearsal-based dual-memory method, MonoDepthCL, which utilizes spatiotemporal consistency for continual learning in depth estimation, even when the camera intrinsics are unknown.
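To make the term "rehearsal-based dual-memory" more concrete, the following is a minimal sketch of how such a scheme is often built in the general continual-learning literature. It is not the MonoDepthCL implementation, and the abstract does not specify these details. The names `ReservoirBuffer` and `ema_update` are hypothetical, and both the reservoir-sampling buffer and the exponential-moving-average weight copy are assumptions chosen for illustration.

```python
import random


class ReservoirBuffer:
    """Fixed-size rehearsal memory filled by reservoir sampling.

    Assumption: each stored sample stands in for a short video snippet
    from a previously seen domain. Every sample in the stream ends up in
    the buffer with equal probability, whatever the stream length.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Offer one new sample from the data stream to the buffer."""
        self.num_seen += 1
        if len(self.samples) < self.capacity:
            # Buffer not yet full: always store.
            self.samples.append(sample)
        else:
            # Buffer full: overwrite a random slot with
            # probability capacity / num_seen.
            j = self.rng.randrange(self.num_seen)
            if j < self.capacity:
                self.samples[j] = sample

    def sample(self, batch_size):
        """Draw stored samples to replay alongside the current batch."""
        k = min(batch_size, len(self.samples))
        return self.rng.sample(self.samples, k)


def ema_update(stable, working, decay=0.99):
    """Move slow 'stable' weights toward the fast 'working' weights.

    Assumption: the second memory is a slowly updated copy of the model
    weights, represented here as a flat list of floats.
    """
    return [decay * s + (1.0 - decay) * w for s, w in zip(stable, working)]
```

In a training loop built on this sketch, each step would mix the current batch with `buffer.sample(...)`, update the working weights on the combined batch, and then call `ema_update` to refresh the stable copy. The replayed samples are what counteract catastrophic forgetting of older domains.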

Related Material


@InProceedings{Chawla_2024_WACV,
    author    = {Chawla, Hemang and Varma, Arnav and Arani, Elahe and Zonooz, Bahram},
    title     = {Continual Learning of Unsupervised Monocular Depth From Videos},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {8419-8429}
}