CheXFusion: Effective Fusion of Multi-View Features Using Transformers for Long-Tailed Chest X-Ray Classification

Dongkyun Kim; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 2702-2710

Abstract


Medical image classification poses unique challenges due to the long-tailed distribution of diseases, the co-occurrence of diagnostic findings, and the multiple views available for each study or patient. This paper introduces our solution to the ICCV CVAMD 2023 Shared Task on CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays. Our approach introduces CheXFusion, a transformer-based fusion module incorporating multi-view images. The fusion module, guided by self-attention and cross-attention mechanisms, efficiently aggregates multi-view features while considering label co-occurrence. Furthermore, we explore data balancing and self-training methods to optimize the model's performance. Our solution achieves state-of-the-art results with 0.372 mAP on the MIMIC-CXR test set, securing 1st place in the competition. Our success in the task underscores the significance of considering multi-view settings, class imbalance, and label co-occurrence in medical image classification. Public code is available at https://github.com/dongkyuk/CXR-LT-public-solution.
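The abstract describes a transformer fusion module in which self-attention models label co-occurrence and cross-attention aggregates features from multiple views. The sketch below is a hypothetical illustration of that idea, not the authors' implementation (see their public repository for the real code): learnable per-label query tokens self-attend to each other and cross-attend to the concatenated token features of all views via a standard transformer decoder. The class name `MultiViewFusion` and all hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    """Hypothetical sketch of a CheXFusion-style module: learnable label
    queries self-attend (capturing label co-occurrence) and cross-attend
    to concatenated multi-view image features (aggregating views)."""

    def __init__(self, num_labels: int, dim: int, num_heads: int = 8,
                 num_layers: int = 2):
        super().__init__()
        # One learnable query token per diagnostic label.
        self.label_queries = nn.Parameter(torch.randn(num_labels, dim))
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers)
        self.classifier = nn.Linear(dim, 1)

    def forward(self, view_features: list[torch.Tensor]) -> torch.Tensor:
        # view_features: one (batch, tokens, dim) tensor per X-ray view,
        # e.g. from a shared CNN/ViT backbone applied to each view.
        memory = torch.cat(view_features, dim=1)          # (B, sum_tokens, dim)
        batch = memory.size(0)
        queries = self.label_queries.unsqueeze(0).expand(batch, -1, -1)
        # Decoder: self-attention among label queries + cross-attention
        # from queries to the pooled multi-view feature tokens.
        decoded = self.decoder(queries, memory)           # (B, num_labels, dim)
        return self.classifier(decoded).squeeze(-1)       # (B, num_labels) logits
```

Keeping one query per label lets the self-attention layers learn which findings tend to appear together, while cross-attention lets each label draw evidence from whichever view shows it best.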

Related Material


[bibtex]
@InProceedings{Kim_2023_ICCV,
  author    = {Kim, Dongkyun},
  title     = {CheXFusion: Effective Fusion of Multi-View Features Using Transformers for Long-Tailed Chest X-Ray Classification},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2023},
  pages     = {2702-2710}
}