Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching

Peng Xu, Zhiyu Xiang, Chengyu Qiao, Jingyun Fu, Tianyu Pu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 5135-5144

Abstract


Despite the great success of deep learning in stereo matching, recovering accurate disparity maps is still challenging. Currently, L1 and cross-entropy are the two most widely used losses for stereo network training. Compared with the former, the latter usually performs better thanks to its probability modeling and direct supervision of the cost volume. However, how to accurately model the stereo ground truth for cross-entropy loss remains largely under-explored. Existing works simply assume that the ground-truth distributions are uni-modal, which ignores the fact that most edge pixels can be multi-modal. In this paper, a novel adaptive multi-modal cross-entropy loss (ADL) is proposed to guide the networks to learn different distribution patterns for each pixel. Moreover, we optimize the disparity estimator to further alleviate the bleeding or misalignment artifacts in inference. Extensive experimental results show that our method is generic and can help classic stereo networks regain state-of-the-art performance. In particular, GANet with our method ranks 1st on both the KITTI 2015 and 2012 benchmarks among published methods. Meanwhile, excellent synthetic-to-realistic generalization performance can be achieved by simply replacing the traditional loss with ours. Code is available at https://github.com/xxxupeng/ADL.
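
The abstract does not spell out either loss. For orientation only, below is a minimal PyTorch sketch of the conventional uni-modal soft cross-entropy baseline that the paper argues against; the function name, the Laplacian-shaped target, and the sigma bandwidth are illustrative assumptions, not the authors' ADL.

import torch
import torch.nn.functional as F

def unimodal_cross_entropy_loss(cost_volume, gt_disparity, max_disp=192, sigma=1.0):
    """Hypothetical uni-modal soft cross-entropy over a stereo cost volume.

    cost_volume:  (B, D, H, W) raw matching scores, with D = max_disp.
    gt_disparity: (B, H, W) ground-truth disparities in pixels.
    """
    # Predicted per-pixel disparity distribution from the cost volume.
    log_prob = F.log_softmax(cost_volume, dim=1)

    # Laplacian-shaped target distribution centered at the ground truth
    # (the common uni-modal assumption the paper identifies as limiting).
    disp_levels = torch.arange(max_disp, device=gt_disparity.device, dtype=torch.float32)
    disp_levels = disp_levels.view(1, max_disp, 1, 1)
    target = torch.exp(-torch.abs(disp_levels - gt_disparity.unsqueeze(1)) / sigma)
    target = target / target.sum(dim=1, keepdim=True)

    # Cross-entropy on valid pixels only (sparse ground truth, e.g. KITTI).
    valid = (gt_disparity > 0) & (gt_disparity < max_disp)
    ce = -(target * log_prob).sum(dim=1)
    return ce[valid].mean()

Per the abstract, ADL replaces this single-mode target with adaptively modeled multi-modal distributions, so that edge pixels, which often mix foreground and background disparities, receive matching supervision; the actual implementation is in the linked repository.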

Related Material


[pdf] [arXiv]
@InProceedings{Xu_2024_CVPR,
    author    = {Xu, Peng and Xiang, Zhiyu and Qiao, Chengyu and Fu, Jingyun and Pu, Tianyu},
    title     = {Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {5135-5144}
}