240FPS Stereo Vision from Monocular Mixed Spikes

Yeliduosi Xiaokaiti, Yakun Chang, Yang Bai, Zhaojun Huang, Peiqi Duan, Boxin Shi; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 26688-26697

Abstract


Stereo vision is fundamental for enabling machines to perceive and interact with the world. While monocular stereo methods offer hardware compactness, they struggle with generalization due to their reliance on data-driven priors. Binocular and multi-view systems improve accuracy but incur higher hardware complexity and data inefficiency. In this paper, we introduce a monocular solution for high-framerate stereo vision via temporal optical modulation. The modulation directs light from two views onto a single sensor in a mixed manner, while periodically attenuating one view at 60 Hz. To capture the temporal variations introduced by this modulation, we employ a high-speed spike camera that records the mixed scene as temporally dense spikes. The high temporal resolution of these spikes enables the construction of a linear system for efficient binocular video decoupling. Building on this, we introduce a two-stage decoding methodology for achieving high-quality stereo vision: an efficient least-squares-based baseline reconstruction followed by a deep learning refinement module. Experimental results demonstrate that our approach achieves 240FPS binocular video reconstruction with superior accuracy compared to monocular systems, while maintaining hardware compactness and data efficiency. Code is available at https://github.com/yongqiye00/MonoSpikeStereo.
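To make the idea of a linear system for view decoupling concrete, here is a minimal sketch of a per-pixel least-squares separation. It is not the authors' implementation. It assumes a hypothetical mixing model in which each observed intensity is view A plus an attenuated copy of view B, that the attenuation factor at each time step is known, that the scene is static over the short window, and that intensities have already been estimated from the spike stream. The function name `decouple_views` and all parameters are illustrative.

```python
import numpy as np


def decouple_views(mixed, attenuation):
    """Separate two mixed views by per-pixel least squares.

    Assumed (hypothetical) model: mixed[t] = view_a + attenuation[t] * view_b.

    mixed:       (T, H, W) mixed intensity estimates over T time steps.
    attenuation: (T,) known attenuation factors applied to view B.
    Returns (view_a, view_b), each of shape (H, W).
    """
    mixed = np.asarray(mixed, dtype=np.float64)
    attenuation = np.asarray(attenuation, dtype=np.float64)
    num_steps, height, width = mixed.shape

    # Design matrix: one row per time step, columns for view A and view B.
    design = np.stack([np.ones(num_steps), attenuation], axis=1)  # (T, 2)

    # Treat every pixel as an independent right-hand side.
    observations = mixed.reshape(num_steps, height * width)  # (T, H*W)

    # Solve design @ x = observations for all pixels at once.
    solution, *_ = np.linalg.lstsq(design, observations, rcond=None)

    view_a = solution[0].reshape(height, width)
    view_b = solution[1].reshape(height, width)
    return view_a, view_b
```

The system is only solvable when the attenuation takes at least two distinct values within the window, which is the role the periodic modulation plays in this sketch. In practice a reconstruction like this would serve only as a baseline; the abstract describes a learned refinement stage on top of the least-squares result.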

Related Material


@InProceedings{Xiaokaiti_2026_CVPR,
    author    = {Xiaokaiti, Yeliduosi and Chang, Yakun and Bai, Yang and Huang, Zhaojun and Duan, Peiqi and Shi, Boxin},
    title     = {240FPS Stereo Vision from Monocular Mixed Spikes},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {26688-26697}
}