@InProceedings{Sun_2026_CVPR,
    author    = {Sun, Youhan and Rao, Jiahua and Du, Kangrui and Xie, Jiancong and Yang, Yuedong},
    title     = {Predicting Spatial Transcriptomics from Histology Images via High-Order Multi-Cell Interaction Modeling},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {19781-19790}
}
Predicting Spatial Transcriptomics from Histology Images via High-Order Multi-Cell Interaction Modeling
Abstract
Spatial transcriptomics (ST) links gene expression to tissue architecture and enables predicting spatial expression from H&E-stained whole-slide images (WSIs). However, existing spot- or slide-level predictors focus on single-spot features or pairwise relations, failing to capture high-order, many-to-many cross-cell interactions. As a result, they miss synergistic and antagonistic effects among multiple neighboring cells. Here, we introduce MCToGene, a scalable and accurate framework that explicitly models multi-cell interactions via many-body attention with hierarchical coupling to predict spatial gene expression. MCToGene employs a many-body attention module to encode high-order, many-to-many cross-cell dependencies, enabling context-aware microenvironment modeling. To mitigate the combinatorial burden of many-body modeling, we design a hierarchical interaction module that couples pairwise and many-body representations for feature aggregation and prediction, preserving many-body expressiveness while controlling computation and memory. On HEST-1k and STImage-1K4M, MCToGene surpasses state-of-the-art baselines with a 7.85% relative improvement. Ablations confirm that explicit high-order, many-to-many modeling drives these gains, and visualizations demonstrate that multi-cell interactions are essential for biologically coherent spatial predictions.