Analyzing Results of Depth Estimation Models With Monocular Criteria
Monocular depth estimation is an essential but ill-posed computer vision task. While human visual perception of depth relies on several monocular depth cues, such as occlusion, relative height, familiar object size, and linear perspective, deep learning models have to learn these cues implicitly from labeled training data to determine depth. In this paper, we investigate whether monocular depth criteria from human vision are violated for certain image instances given a model's predictions. We treat depth estimation as a ranking problem, i.e., for a given pair of points, we estimate which point is nearer to the camera. In particular, we model four monocular depth criteria to automatically select a subset of point pairs and infer their depth relation. Our experiments show that the implemented depth criteria achieve performance comparable to deep learning models. This allows investigating the plausibility of a model's predictions by finding image instances where the prediction is incorrect according to modeled human visual perception.
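The ranking formulation above can be made concrete with a small sketch. The function names, the ratio threshold `tau`, and the pair-sampling interface are illustrative assumptions, not the paper's implementation; the ratio test is a common choice in ordinal depth evaluation:

```python
import numpy as np

def depth_order(depth_map, p1, p2, tau=0.02):
    """Ordinal relation of two pixels in a depth map:
    -1 if p1 is nearer, +1 if p1 is farther, 0 if roughly
    equidistant (depth ratio within 1 + tau).
    Note: tau and the ratio test are illustrative assumptions."""
    d1, d2 = depth_map[p1], depth_map[p2]
    if d1 / d2 > 1 + tau:
        return 1   # p1 farther than p2
    if d2 / d1 > 1 + tau:
        return -1  # p1 nearer than p2
    return 0       # approximately equal depth

def ranking_accuracy(pred, gt, pairs, tau=0.02):
    """Fraction of point pairs whose predicted depth order
    agrees with the ground-truth (or criterion-inferred) order."""
    hits = sum(
        depth_order(pred, a, b, tau) == depth_order(gt, a, b, tau)
        for a, b in pairs
    )
    return hits / len(pairs)
```

Under this framing, a modeled monocular criterion supplies the reference ordering for its selected pairs, and disagreement with the model's prediction flags potentially implausible instances.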