The Myth of the Pyramid

Ramon Izquierdo-Cordova, Walterio Mayol-Cuevas; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 311-321

Abstract


A deep-rooted strategy for building convolutional neural networks in computer vision is to increase the number of filters every time the feature map resolution is decreased. The notion behind this pyramidal design is that a higher number of filters increases the expressivity of the network, compensating for the losses caused by lower resolutions. This paper challenges the practice by testing a set of varied filter distributions, named filter templates, on popular CNN architectures (VGG, ResNet, MobileNet and MnasNet). The experimental results show that the superiority of the pyramidal design holds on the ImageNet dataset but fails for other datasets such as MNIST, CIFAR and Tiny-ImageNet, and for other tasks such as audio classification. CNN models with different filter distributions deliver higher accuracy with reduced resource consumption, suggesting that the pyramidal design has been optimised for ImageNet and that each model-dataset pair benefits from tuning the number and distribution of filters. To further illustrate the benefits of exploring other distributions, this paper shows that the best-performing model from the NASBench101 dataset can increase its accuracy over the original pyramidal design, with parameter reductions of up to 68 per cent, by using templates. Overall, our experiments point to new opportunities for model designers to find more efficient models.
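
As a rough illustration of why the distribution of filters matters for resource consumption, the sketch below counts the parameters of a plain stack of 3x3 convolutions under three filter distributions. The layer widths, the names `pyramid`, `uniform` and `reverse`, and the helper `conv_params` are assumptions made for this example only; they are not the filter templates defined in the paper, and the sketch says nothing about accuracy.

```python
def conv_params(filters, in_channels=3, kernel=3):
    """Count weights and biases in a plain stack of kernel x kernel conv layers.

    `filters` lists the number of output channels of each layer, in order.
    """
    total = 0
    prev = in_channels
    for f in filters:
        # Each layer has kernel*kernel*prev weights per output filter, plus one bias.
        total += kernel * kernel * prev * f + f
        prev = f
    return total


# Hypothetical distributions over four layers (illustrative values only).
pyramid = [64, 128, 256, 512]    # filters double at each stage
uniform = [128, 128, 128, 128]   # same width at every stage
reverse = pyramid[::-1]          # widest layer first

counts = {
    "pyramid": conv_params(pyramid),
    "uniform": conv_params(uniform),
    "reverse": conv_params(reverse),
}

for name, n in counts.items():
    print(f"{name:8s} {n:>10,d} parameters")
```

In this toy setting the uniform stack needs fewer than a third of the parameters of the pyramidal one. Whether such a redistribution also preserves or improves accuracy is an empirical question for each model-dataset pair, which is the question the paper investigates.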

Related Material


[bibtex]
@InProceedings{Izquierdo-Cordova_2024_CVPR,
    author    = {Izquierdo-Cordova, Ramon and Mayol-Cuevas, Walterio},
    title     = {The Myth of the Pyramid},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {311-321}
}