@InProceedings{Yang_2025_WACV,
  author    = {Yang, Zhuoyi and Shen, Liyue},
  title     = {TempA-VLP: Temporal-Aware Vision-Language Pretraining for Longitudinal Exploration in Chest X-ray Image},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {4625-4634}
}
TempA-VLP: Temporal-Aware Vision-Language Pretraining for Longitudinal Exploration in Chest X-ray Image
Abstract
Longitudinal medical image processing is a significant task for understanding the dynamic changes of disease: by taking and comparing image series over time, it provides insights into how conditions evolve and enables more accurate diagnosis and treatment planning. While recent advancements in biomedical Vision-Language Pre-training (VLP) have enabled label-efficient representation learning with paired medical images and reports, existing methods primarily pair a single image with its corresponding textual report, limiting their ability to capture temporal relationships. To address this limitation, it is essential to learn temporal-aware cross-modal representations from sequential medical images and text reports that highlight the temporal changes occurring between examinations. Specifically, we introduce TempA-VLP, a temporal-aware vision-language pre-training framework with a cross-exam encoder that integrates information from both prior and current examinations. This approach enables the model to capture dynamic representations that reflect disease progression over time, which allows us to (i) achieve state-of-the-art performance in disease progression classification, (ii) localize dynamic progression regions across consecutive examinations, as demonstrated in our new task of dynamic phrase grounding on the Chest-Imagenome Gold dataset, and (iii) highlight progression-localized regions often relevant to lesion areas, which in turn improves disease classification on a single image.