OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

Hao, Jing; Liang, Yuci; Lin, Lizhuo; Fan, Yuxuan; Zhou, Wenkai; Guo, Kaixin; Ye, Zanting; Sun, Yanpeng; Zhang, Xinyu; Yang, Yanqi; Li, Qiankun; Tang, Hao; Tsoi, James Kit-Hon; Shen, Linlin; Hung, Kuo Feng

Jing Hao, Yuci Liang, Lizhuo Lin, Yuxuan Fan, Wenkai Zhou, Kaixin Guo, Zanting Ye, Yanpeng Sun, Xinyu Zhang, Yanqi Yang, Qiankun Li, Hao Tang, James Kit-Hon Tsoi, Linlin Shen, Kuo Feng Hung; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 38509-38519

Abstract

Multimodal Large Language Models (MLLMs) have exhibited immense potential across numerous medical specialties, yet dentistry remains underexplored, in part due to limited domain-specific data, scarce dental expert annotations, insufficient modality-specific modeling, and challenges in reliability. In this paper, we present OralGPT-Omni, the first dental-specialized MLLM designed for comprehensive and trustworthy analysis across diverse dental imaging modalities and clinical tasks. To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset that mirrors dental radiologists' decision-making processes. This reasoning supervision, combined with our proposed four-stage training paradigm, substantially strengthens the model's capacity for dental image understanding and analysis. In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis. It comprises 2,809 open-ended question-answer pairs spanning five modalities and five tasks, offering a comprehensive evaluation suite for MLLMs in digital dentistry. OralGPT-Omni achieves an overall score of 51.84 on the MMOral-Uni benchmark and 45.31 on the MMOral-OPG benchmark, substantially outperforming GPT-5 on both. Our work promotes intelligent dentistry and paves the way for future advances in dental image analysis. All code, benchmarks, and models will be made publicly available.

Related Material


@InProceedings{Hao_2026_CVPR,
    author    = {Hao, Jing and Liang, Yuci and Lin, Lizhuo and Fan, Yuxuan and Zhou, Wenkai and Guo, Kaixin and Ye, Zanting and Sun, Yanpeng and Zhang, Xinyu and Yang, Yanqi and Li, Qiankun and Tang, Hao and Tsoi, James Kit-Hon and Shen, Linlin and Hung, Kuo Feng},
    title     = {OralGPT-Omni: A Versatile Dental Multimodal Large Language Model},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {38509-38519}
}