nnMobileNet: Rethinking CNN for Retinopathy Research

Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2285-2294

Abstract


Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViTs) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability: their ability to improve as more parameters are added. As a result, ViT-based models tend to outshine traditional CNNs in RD applications, albeit at the cost of increased data and computational demands. ViTs also differ from CNNs in their approach to processing images, working with patches rather than local regions, which can complicate the precise localization of small, variably presented lesions in RD. In our study, we revisited and updated the architecture of a CNN model, specifically MobileNet, to enhance its utility in RD diagnostics. We found that an optimized MobileNet, through selective modifications, can surpass ViT-based models on various RD benchmarks, including diabetic retinopathy grading, detection of multiple fundus diseases, and classification of diabetic macular edema. The code is available at https://github.com/Retinal-Research/NN-MOBILENET.
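
As a concrete illustration of the kind of adaptation the abstract describes, the sketch below repurposes a stock MobileNetV3 backbone for five-class diabetic retinopathy grading in PyTorch. This is a minimal sketch, not the authors' nnMobileNet recipe: the torchvision backbone, the dropout rate, and the five-grade output head are illustrative assumptions; the actual modifications are in the linked repository.

# Minimal sketch (assumptions noted above), not the authors' exact method.
import torch
import torch.nn as nn
from torchvision import models

NUM_DR_GRADES = 5  # assumption: standard 0-4 diabetic retinopathy severity scale

# Stock torchvision backbone used purely for illustration.
model = models.mobilenet_v3_large(weights=None)

# Swap the final classification layer for the retinal task and add dropout,
# a common regularization tweak when fine-tuning compact CNNs on small
# medical datasets (the 0.3 rate here is a hypothetical choice).
in_features = model.classifier[3].in_features
model.classifier[3] = nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(in_features, NUM_DR_GRADES),
)

# Sanity check with a fundus-image-sized batch.
x = torch.randn(2, 3, 224, 224)
logits = model(x)
print(logits.shape)  # torch.Size([2, 5])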

Related Material


BibTeX:
@InProceedings{Zhu_2024_CVPR,
    author    = {Zhu, Wenhui and Qiu, Peijie and Chen, Xiwen and Li, Xin and Lepore, Natasha and Dumitrascu, Oana M. and Wang, Yalin},
    title     = {nnMobileNet: Rethinking CNN for Retinopathy Research},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2285-2294}
}