Longitudinal Multimodal Modeling for Alzheimer's Disease with Pre-trained Brain Latent Diffusion and Mixture-of-Experts Fusion

Zeqing Li, Linlin Gao, Liming Dong, Hao Huang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026, pp. 4343-4350

Abstract


Integrating longitudinal structural magnetic resonance imaging (sMRI) with dynamic clinical tabular records significantly improves tracking of the complex progression of Alzheimer's disease (AD). However, joint modeling is hindered by the optimization bottlenecks of high-dimensional sMRI sequences, irregular temporal sampling, tabular heterogeneity, and cross-modal dominance. We propose a novel multimodal framework that uses a pre-trained Brain Latent Diffusion Model (LDM) and Mixture-of-Experts (MoE) fusion to integrate these modalities. Specifically, our longitudinal Brain LDM incorporates a SpatioTemporal Transformer into a pre-trained diffusion architecture, using its 3D generative priors to extract continuous anatomical trajectories from irregularly sampled sMRI. Concurrently, a Longitudinal Tabular Transformer explicitly encodes heterogeneous tabular dynamics. To prevent modality dominance, a dynamic MoE router adaptively balances domain-specific and cross-modal shared representations. Experiments on the ADNI dataset demonstrate that our method achieves state-of-the-art performance, outperforming competitive methods by a substantial margin. The source code is publicly available at https://anonymous.4open.science/r/TLDM-2BC0.
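The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`moe_fuse`, `init_params`), the choice of three experts (imaging-specific, tabular-specific, shared), the toy `tanh` linear experts, and the softmax router are all assumptions made for illustration. The sketch only shows how per-sample gate weights could blend domain-specific and shared representations so that no single modality is hard-wired to dominate.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_tanh(weights, x):
    """Toy expert: tanh(W x), with W given as a list of rows (assumption)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; stands in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(d_in, d_out, seed=0):
    """Hypothetical parameters for three experts and one router."""
    rng = random.Random(seed)
    return {
        "imaging": random_matrix(d_out, d_in, rng),     # imaging-specific expert
        "tabular": random_matrix(d_out, d_in, rng),     # tabular-specific expert
        "shared": random_matrix(d_out, 2 * d_in, rng),  # cross-modal shared expert
        "router": random_matrix(3, 2 * d_in, rng),      # one logit per expert
    }


def moe_fuse(img_feat, tab_feat, params):
    """Blend expert outputs with input-dependent softmax gate weights."""
    joint = img_feat + tab_feat  # list concatenation of both modalities
    expert_outputs = [
        linear_tanh(params["imaging"], img_feat),
        linear_tanh(params["tabular"], tab_feat),
        linear_tanh(params["shared"], joint),
    ]
    # The router sees both modalities and scores each expert.
    logits = [sum(w * v for w, v in zip(row, joint)) for row in params["router"]]
    gates = softmax(logits)
    d_out = len(expert_outputs[0])
    fused = [
        sum(g * out[i] for g, out in zip(gates, expert_outputs))
        for i in range(d_out)
    ]
    return fused, gates


if __name__ == "__main__":
    params = init_params(d_in=4, d_out=3)
    fused, gates = moe_fuse([0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.5], params)
    print("gate weights:", gates)
    print("fused representation:", fused)
```

Because the gate weights are a softmax, they are non-negative and sum to one, so the fused vector is a convex combination of the expert outputs. A real system would learn the router jointly with the encoders and might add a load-balancing term; neither is shown here.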

Related Material


@InProceedings{Li_2026_CVPR,
    author    = {Li, Zeqing and Gao, Linlin and Dong, Liming and Huang, Hao},
    title     = {Longitudinal Multimodal Modeling for Alzheimer's Disease with Pre-trained Brain Latent Diffusion and Mixture-of-Experts Fusion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {4343-4350}
}