GR-Gauge: Cost-efficient Training Configuration By Gauging the Gradient Redundancy

Wang, Guanjie; Chen, Chen

Guanjie Wang, Chen Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 12934-12943

Abstract

The recent success of artificial intelligence motivates many non-professional users to train their own models. These users often resort to cloud training services, seeking a sufficiently accurate model at a modest cost, for which properly setting the learning rate and batch size is crucial. While various Hyper-parameter Optimization (HPO) methods have been proposed for this purpose, they largely rely on heavy-weight validation signals, which makes them inefficient in overall cost. We find that the model training process can be viewed as a two-dimensional voting process, with gradients from different iterations and from different samples as the two dimensions; moreover, attaining cost-efficient training amounts to keeping the gradient redundancy within a proper range, and that range is similar across diverse models. Based on this insight, we introduce GR-Gauge, a general method that gauges the gradient redundancy to guide HPO decisions such as configuration search and trial termination. Extensive experiments demonstrate that GR-Gauge helps attain near-optimal accuracy in much less time than existing methods.

Related Material

@InProceedings{Wang_2026_CVPR,
    author    = {Wang, Guanjie and Chen, Chen},
    title     = {GR-Gauge: Cost-efficient Training Configuration By Gauging the Gradient Redundancy},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {12934-12943}
}