ES3Net: Accurate and Efficient Edge-Based Self-Supervised Stereo Matching Network
Efficient and accurate depth estimation is crucial for real-world embedded vision applications such as autonomous driving, 3D reconstruction, and drone navigation. Stereo matching is considered more accurate than monocular depth estimation because a reference image is available, but its computational cost makes deployment on edge devices challenging. Moreover, ground-truth depths for supervised training of stereo matching networks are difficult to acquire. To address these challenges, we propose the Edge-based Self-Supervised Stereo matching Network (ES3Net), which efficiently estimates accurate depths without requiring ground-truth depths for training. We introduce dual disparity to transform an efficient supervised stereo matching network into a self-supervised learning framework. Comprehensive experimental results demonstrate that ES3Net achieves accuracy comparable to stereo methods while outperforming monocular methods in inference time, approaching state-of-the-art performance. More specifically, compared to monocular methods, our method improves RMSElog by over 40% while having 1500 times fewer parameters and running four times faster on an NVIDIA Jetson TX2. The efficient and reliable depth estimation that ES3Net provides on edge devices lays a good foundation for safe drone navigation.