Soft Cross Entropy Loss and Bottleneck Tri-Cost Volume for Efficient Stereo Depth Prediction
Real-time, robust, and accurate stereo depth-prediction algorithms deliver cutting-edge performance in applications ranging from autonomous driving to augmented reality. Many state-of-the-art approaches achieve subpixel error and subsecond runtimes on commodity hardware, but improving even these remains an area of active research. We contribute two generic techniques that improve the accuracy and efficiency of stereo-based depth prediction. First, we propose encoding the ground-truth disparity as a discrete distribution that can be trained via cross-entropy loss. Specifically, we use the minimum-variance, unbiased 'Soft' encoding, in which two adjacent bins are weighted so that the expected value equals the ground truth. We demonstrate that training with cross-entropy loss using this encoding decreases the error rate by 10% on synthetic and LIDAR datasets compared with the more popular regression losses such as Huber and MAE. Second, we propose a bottleneck tri-cost volume composed of the sum of absolute differences of the features together with two reference channels. Replacing the standard 64-channel concatenation popular in state-of-the-art networks with this 3-channel cost volume maintains metric performance and can reduce runtime by over 22% on PSM-Net architectures.
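To make the 'Soft' encoding concrete, the following is a minimal per-pixel sketch in plain Python, not the authors' implementation. It assumes that the disparity bins sit at the integer values 0 to num_bins - 1 and that the network emits one logit per bin. The abstract specifies neither detail, and the function names are illustrative.

```python
import math


def soft_encode(disparity, num_bins):
    """Spread a scalar disparity over its two adjacent bins.

    Assumption: bins sit at integer disparities 0 .. num_bins - 1
    (requires num_bins >= 2). Out-of-range values are clamped.
    """
    d = min(max(disparity, 0.0), num_bins - 1.0)
    lo = min(int(math.floor(d)), num_bins - 2)  # keep lo + 1 in range
    frac = d - lo
    target = [0.0] * num_bins
    target[lo] = 1.0 - frac   # weight on the lower bin
    target[lo + 1] = frac     # weight on the upper bin
    return target


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_cross_entropy(logits, target):
    """Cross entropy between a soft target and softmax(logits)."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs) if t > 0.0)


def expected_disparity(probs):
    """Expected bin index under a discrete distribution."""
    return sum(i * p for i, p in enumerate(probs))


if __name__ == "__main__":
    target = soft_encode(2.3, num_bins=8)
    print(target)                      # only bins 2 and 3 are nonzero
    print(expected_disparity(target))  # recovers the encoded disparity
    print(soft_cross_entropy([0.0] * 8, target))
```

Because the two weights are 1 - frac and frac, the expected value of the target distribution equals the clamped ground-truth disparity, which is the unbiasedness property the abstract describes. In a real network the same loss would be applied per pixel over the disparity dimension.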