OTCXR: Rethinking Self-Supervised Alignment using Optimal Transport for Chest X-ray Analysis

Vandan Gorade, Azad Singh, Deepak Mishra; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 7143-7152

Abstract


Self-supervised learning (SSL) has emerged as a promising technique for analyzing medical modalities such as X-rays due to its ability to learn without annotations. However, conventional SSL methods face challenges in achieving semantic alignment and capturing subtle details, which limits their ability to accurately represent the underlying anatomical structures and pathological features. To address these limitations, we propose OTCXR, a novel SSL framework that leverages optimal transport (OT) to learn dense semantic invariance. By integrating OT with our Cross-Viewpoint Semantics Infusion Module (CV-SIM), OTCXR enhances the model's ability to capture not only local spatial features but also global contextual dependencies across different viewpoints. This approach improves the effectiveness of SSL in the context of chest radiographs. Furthermore, OTCXR incorporates variance and covariance regularizations within the OT framework to prioritize clinically relevant information while suppressing less informative features. This ensures that the learned representations are comprehensive and discriminative, which is particularly beneficial for tasks such as thoracic disease diagnosis. We validate OTCXR's efficacy through comprehensive experiments on three publicly available chest X-ray datasets. Our empirical results demonstrate the superiority of OTCXR over state-of-the-art methods across all evaluated tasks, confirming its capability to learn semantically rich representations.
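
The abstract does not spell out the loss, so the following is only a minimal sketch of the two ingredients it names: an OT-based alignment between dense features from two views, and variance and covariance regularization. It assumes entropy-regularized OT solved with generic Sinkhorn iterations, uniform marginals, a cosine-distance cost, and VICReg-style variance and covariance terms. All function names, parameters, and default values here are hypothetical, and CV-SIM is not modeled.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan between two uniform distributions.

    Generic Sinkhorn iterations; an assumption, not the paper's solver.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass over view-1 locations
    b = np.full(m, 1.0 / m)          # uniform mass over view-2 locations
    K = np.exp(-cost / eps)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def ot_alignment_loss(f1, f2, eps=0.1):
    """Transport cost between two sets of dense features (rows = locations)."""
    f1 = f1 / np.linalg.norm(f1, axis=1, keepdims=True)
    f2 = f2 / np.linalg.norm(f2, axis=1, keepdims=True)
    cost = 1.0 - f1 @ f2.T           # cosine distance, assumed cost function
    plan = sinkhorn_plan(cost, eps)
    return float((plan * cost).sum())


def variance_covariance_penalty(z, gamma=1.0, eps=1e-4):
    """VICReg-style regularizers (assumed form) on features z of shape (N, D)."""
    z = z - z.mean(axis=0)
    std = np.sqrt(z.var(axis=0) + eps)
    # Hinge on the per-dimension standard deviation: discourages collapse.
    var_loss = float(np.mean(np.maximum(0.0, gamma - std)))
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - np.diag(np.diag(cov))
    # Squared off-diagonal covariance: discourages redundant dimensions.
    cov_loss = float((off_diag ** 2).sum() / z.shape[1])
    return var_loss, cov_loss


# Usage on random stand-in features (not real chest X-ray embeddings).
rng = np.random.default_rng(0)
view1 = rng.normal(size=(6, 8))      # 6 spatial locations, 8 channels
view2 = rng.normal(size=(5, 8))      # 5 spatial locations, 8 channels
align = ot_alignment_loss(view1, view2)
var_loss, cov_loss = variance_covariance_penalty(view1)
```

In a training setup these terms would typically be combined with weighting coefficients; the abstract gives neither the weights nor the exact way the regularizers enter the OT objective, so that combination is left out here.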

Related Material


[bibtex]
@InProceedings{Gorade_2025_WACV,
    author    = {Gorade, Vandan and Singh, Azad and Mishra, Deepak},
    title     = {OTCXR: Rethinking Self-Supervised Alignment using Optimal Transport for Chest X-ray Analysis},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {7143-7152}
}