Scaling Graph Convolutions for Mobile Vision

William Avery, Mustafa Munir, Radu Marculescu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 5857-5865


To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling at most 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Besides image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 AP^box and 0.7 AP^mask, and MobileViGv2-B outperforms MobileViG-B by 1.0 AP^box and 0.7 AP^mask. For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% mIoU and MobileViGv2-B achieves 44.3% mIoU.

Related Material

@InProceedings{Avery_2024_CVPR,
    author    = {Avery, William and Munir, Mustafa and Marculescu, Radu},
    title     = {Scaling Graph Convolutions for Mobile Vision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {5857-5865}
}