@InProceedings{Zhang_2026_CVPR,
    author    = {Zhang, Jiansong and Yang, Xiaying and Luo, Xiaoling and Shen, Linlin},
    title     = {EchoVDiff: Cardiac-Cycle Echocardiography Video Generation from Arbitrary Single Frame},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {9040-9050}
}
EchoVDiff: Cardiac-Cycle Echocardiography Video Generation from Arbitrary Single Frame
Abstract
Reconstructing a physiologically plausible cardiac video from a single image remains a fundamental challenge in generative modeling, owing to the complex and nonlinear periodic dynamics of echocardiography. Previous image-to-video (I2V) approaches primarily focus on temporal continuity, yet often struggle to capture the intrinsic periodicity of cardiac motion, leading to limited temporal coherence and semantic consistency. We present EchoVDiff, a novel phase-aware diffusion model that reconstructs a full cardiac cycle from any single frame. Instead of direct pixel synthesis, EchoVDiff integrates physiological priors into a diffusion paradigm, learning interpretable mappings between cardiac phase, anatomy, and motion. By jointly modeling temporal rhythm and spatial semantics within a disentangled latent space, it achieves controllable and physiologically consistent generation. Extensive experiments on EchoNet-Dynamic and EchoNet-Pediatric demonstrate that EchoVDiff consistently surpasses state-of-the-art methods in both fidelity and temporal coherence. Remarkably, it enables accurate reconstruction of complete cardiac cycles from arbitrary phases, marking the first demonstration of single-frame-driven echocardiographic video generation.
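The abstract does not say how EchoVDiff represents cardiac phase. The following is only an illustration of why periodicity matters, not the authors' implementation. It assumes phase is a fraction of one cardiac cycle in [0, 1) and that the phase of the input frame is already known. Given that starting phase, one full cycle of target phases can be enumerated by wrapping around modulo 1. Each phase can then be embedded on the unit circle with (sin, cos), so the end of a cycle lies next to its start. The function names `cycle_phases` and `phase_embedding` are hypothetical.

```python
import math


def cycle_phases(start_phase: float, num_frames: int) -> list[float]:
    """Hypothetical helper: target phases for one full cycle beginning at start_phase.

    Phase is assumed to be a fraction of a cardiac cycle in [0, 1); values wrap modulo 1,
    so a clip can start at any point in the cycle (e.g. mid-systole) and still cover it fully.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return [(start_phase + k / num_frames) % 1.0 for k in range(num_frames)]


def phase_embedding(phase: float) -> tuple[float, float]:
    """Hypothetical helper: map a phase onto the unit circle.

    Phases 0.99 and 0.01 are numerically far apart but physiologically adjacent;
    after this mapping they are also close in embedding space.
    """
    angle = 2.0 * math.pi * phase
    return (math.sin(angle), math.cos(angle))


# Usage: a 4-frame cycle starting three quarters of the way through the cycle.
print(cycle_phases(0.75, 4))  # [0.75, 0.0, 0.25, 0.5]
```

A plain scalar phase would place the last frame of a cycle as far as possible from the first frame. The circular embedding removes that discontinuity, which is one simple way to encode the intrinsic periodicity the abstract refers to.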