Towards Hierarchical Regional Transformer-Based Multiple Instance Learning

Josef Cersovsky, Sadegh Mohammadi, Dagmar Kainmueller, Johannes Hoehne; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 3952-3960

Abstract


The classification of gigapixel histopathology images with deep multiple instance learning models has become a critical task in digital pathology and precision medicine. In this work, we propose a Transformer-based multiple instance learning method that replaces the traditional learned attention mechanism with a regional, Vision Transformer-inspired self-attention mechanism. We additionally propose a method that fuses regional patch information to derive slide-level predictions. We then show how this regional aggregation can be stacked to hierarchically process features at different distance levels. To increase predictive accuracy, especially for datasets with small, local morphological features, we also propose a method that focuses image processing on high-attention regions during inference. Our approach significantly improves performance over the baseline on two histopathology datasets and points towards promising directions for further research.
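To make the idea of regional, hierarchically stacked self-attention more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: patch features are laid out on an H x W grid, attention is single-head and restricted to non-overlapping `region` x `region` windows, random projection matrices stand in for learned weights, and each region is fused into one token by mean pooling before the next level. The helper names `regional_self_attention`, `pool_regions` and `hierarchical_forward` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def regional_self_attention(grid, region, wq, wk, wv):
    """Single-head self-attention restricted to non-overlapping windows.

    grid: (H, W, D) patch features; H and W must be divisible by `region`.
    wq, wk: (D, Dk) query/key projections; wv: (D, D) value projection.
    Returns an (H, W, D) array where each patch only attended to patches
    in its own region.
    """
    h, w, d = grid.shape
    assert h % region == 0 and w % region == 0
    hr, wr = h // region, w // region
    # (H, W, D) -> (H/r, W/r, r*r, D): one sequence of tokens per region.
    windows = grid.reshape(hr, region, wr, region, d).transpose(0, 2, 1, 3, 4)
    windows = windows.reshape(hr, wr, region * region, d)
    q, k, v = windows @ wq, windows @ wk, windows @ wv
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v  # (H/r, W/r, r*r, D)
    # Undo the window partition to recover the (H, W, D) layout.
    out = out.reshape(hr, wr, region, region, d).transpose(0, 2, 1, 3, 4)
    return out.reshape(h, w, d)


def pool_regions(grid, region):
    # Fuse each region into a single token by averaging (assumed fusion rule).
    h, w, d = grid.shape
    return grid.reshape(h // region, region, w // region, region, d).mean(axis=(1, 3))


def hierarchical_forward(grid, region, levels, rng):
    # Stack regional attention + pooling so each level sees a coarser grid,
    # then average the remaining tokens into a slide-level embedding.
    for _ in range(levels):
        d = grid.shape[-1]
        # Random projections stand in for learned weights in this sketch.
        wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        grid = grid + regional_self_attention(grid, region, wq, wk, wv)  # residual
        grid = pool_regions(grid, region)
    return grid.mean(axis=(0, 1))


rng = np.random.default_rng(0)
patches = rng.standard_normal((8, 8, 16))  # toy 8 x 8 grid of 16-dim features
slide_embedding = hierarchical_forward(patches, region=2, levels=2, rng=rng)
print(slide_embedding.shape)
```

With `region=2`, the first level attends within 2 x 2 windows and pools the 8 x 8 grid to 4 x 4, and the second level pools it to 2 x 2 before the final average, so the printed shape is `(16,)`. Each stacked level therefore mixes information over a larger spatial distance, which is the intuition behind hierarchical regional aggregation. A slide-level classifier head would then operate on this embedding.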

Related Material

@InProceedings{Cersovsky_2023_ICCV,
    author    = {Cersovsky, Josef and Mohammadi, Sadegh and Kainmueller, Dagmar and Hoehne, Johannes},
    title     = {Towards Hierarchical Regional Transformer-Based Multiple Instance Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3952-3960}
}