Scaling Test-Time Compute Can Outperform Larger Architectures in Computer Vision

Erfan Darzi, Dylan Nguyen, George Cheng; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2025, pp. 3367-3375

Abstract

Deep neural networks face a fundamental trade-off between computational efficiency and accuracy. This paper introduces a network depth optimization method that enables flexible inference under adjustable computational budgets while potentially improving training dynamics. Our approach partitions each residual stage into core and gated sub-paths and employs depth-aware training so that a single network can operate at varying depths. We present a theoretical analysis of the method through three key results: (1) an explicit regularization theorem quantifying how our training approach may penalize discrepancies between network configurations, (2) a statistical convergence theorem suggesting tighter generalization bounds based on effective network depth, and (3) a gradient dynamics theorem characterizing the noise properties induced by our training procedure. Empirically, our method shows improvements over conventional approaches on standard benchmarks, achieving favorable accuracy-efficiency trade-offs with a single trained model. The Gated Depth architecture provides a framework for deploying deep networks across diverse computational environments.
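
The central architectural idea, a residual stage whose trailing blocks can be switched off at inference, is concrete enough to sketch. Below is a minimal PyTorch illustration, not the authors' implementation: the class names (BasicBlock, GatedStage), the core/gated split sizes, the keep probability p_keep, and the random-sampling rule used to stand in for depth-aware training are all assumptions made for exposition.

    # Minimal sketch (assumed, not the paper's code): a residual stage split
    # into "core" blocks that always run and "gated" blocks that can be
    # skipped at test time to meet a compute budget.
    import torch
    import torch.nn as nn

    class BasicBlock(nn.Module):
        """Standard pre-activation-free residual block (illustrative)."""
        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(x + self.body(x))

    class GatedStage(nn.Module):
        """A stage with n_core always-on blocks and n_gated skippable blocks."""
        def __init__(self, channels, n_core=2, n_gated=2):
            super().__init__()
            self.core = nn.ModuleList([BasicBlock(channels) for _ in range(n_core)])
            self.gated = nn.ModuleList([BasicBlock(channels) for _ in range(n_gated)])

        def forward(self, x, n_active=None, p_keep=0.5):
            for blk in self.core:
                x = blk(x)
            if n_active is None:
                # Training mode: randomly sample which gated blocks run, a
                # stand-in for the paper's depth-aware training procedure.
                keep = torch.rand(len(self.gated)) < p_keep
            else:
                # Inference mode: run only the first n_active gated blocks.
                keep = [i < n_active for i in range(len(self.gated))]
            for blk, k in zip(self.gated, keep):
                if k:
                    x = blk(x)
            return x

    if __name__ == "__main__":
        stage = GatedStage(channels=64, n_core=2, n_gated=2)
        x = torch.randn(1, 64, 32, 32)
        y_full = stage(x, n_active=2)  # full depth: highest accuracy
        y_fast = stage(x, n_active=0)  # core path only: lowest latency

Under these assumptions, sweeping n_active from 0 up to the number of gated blocks at deployment yields the accuracy-efficiency trade-off the abstract describes, using a single set of trained weights.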

Related Material

[pdf]
[bibtex]
@InProceedings{Darzi_2025_CVPR,
    author    = {Darzi, Erfan and Nguyen, Dylan and Cheng, George},
    title     = {Scaling Test-Time Compute Can Outperform Larger Architectures in Computer Vision},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2025},
    pages     = {3367-3375}
}