TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation

Chen, Yinda; Shi, Haoyuan; Liu, Xiaoyu; Shi, Te; Zhang, Ruobing; Liu, Dong; Xiong, Zhiwei; Wu, Feng

Yinda Chen, Haoyuan Shi, Xiaoyu Liu, Te Shi, Ruobing Zhang, Dong Liu, Zhiwei Xiong, Feng Wu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 13604-13613

Abstract

Neuron segmentation from electron microscopy (EM) volumes is crucial for understanding brain circuits, yet the complex neuronal structures in high-resolution EM images present significant challenges. EM data exhibits unique characteristics including high noise levels, anisotropic voxel dimensions, and ultra-long spatial dependencies that make traditional vision models inadequate. Inspired by autoregressive pretraining in language models, we propose TokenUnify, a hierarchical predictive coding framework that captures multi-scale dependencies through three complementary learning objectives. TokenUnify integrates random token prediction, next-token prediction, and next-all token prediction to create a comprehensive representational space with emergent properties. From an information-theoretic perspective, these three tasks are complementary and provide optimal coverage of visual data structure, with our approach reducing autoregressive error accumulation from $O(K)$ to $O(\sqrt{K})$ for sequences of length $K$. We also introduce a large-scale EM dataset with 1.2 billion annotated voxels, offering ideal long-sequence visual data with spatial continuity. Leveraging the Mamba architecture's linear-time sequence modeling capabilities, TokenUnify achieves a 44% performance improvement on downstream neuron segmentation and outperforms MAE by 25%. Our approach demonstrates superior scaling properties as model size increases, effectively bridging the gap between pretraining strategies for language and vision models. Code is available at https://github.com/ydchen0806/TokenUnify.
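
The three objectives named in the abstract differ mainly in which tokens serve as prediction targets. The sketch below illustrates that difference on a toy token sequence. It is not taken from the TokenUnify code base: the function names, the `mask_ratio` parameter, and the dictionary layout are assumptions made for illustration, and the abstract does not specify how the authors construct targets or combine the losses.

```python
import random


def random_token_targets(tokens, mask_ratio=0.5, seed=0):
    """Random token prediction (assumed form): hide a random subset of
    positions and use the hidden tokens as reconstruction targets."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_ratio))
    masked = sorted(rng.sample(range(len(tokens)), n_mask))
    return {i: tokens[i] for i in masked}


def next_token_targets(tokens):
    """Next-token prediction: position i is paired with token i + 1."""
    return {i: tokens[i + 1] for i in range(len(tokens) - 1)}


def next_all_targets(tokens):
    """Next-all token prediction (assumed form): position i is paired
    with every token that follows it in the sequence."""
    return {i: tokens[i + 1:] for i in range(len(tokens) - 1)}


# Toy sequence standing in for a flattened run of EM patch tokens.
tokens = ["t0", "t1", "t2", "t3"]

print(random_token_targets(tokens))
print(next_token_targets(tokens))
print(next_all_targets(tokens))
```

Under these assumptions, next-token prediction supplies one target per position, while next-all prediction supplies a target set that shrinks toward the end of the sequence; random token prediction is independent of ordering.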

Related Material

@InProceedings{Chen_2025_ICCV,
    author    = {Chen, Yinda and Shi, Haoyuan and Liu, Xiaoyu and Shi, Te and Zhang, Ruobing and Liu, Dong and Xiong, Zhiwei and Wu, Feng},
    title     = {TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {13604-13613}
}