-
[pdf]
[supp]
[arXiv]
[bibtex]@InProceedings{Chen_2024_CVPR, author = {Chen, Sichen and Zhang, Yingyi and Huang, Siming and Yi, Ran and Fan, Ke and Zhang, Ruixin and Chen, Peixian and Wang, Jun and Ding, Shouhong and Ma, Lizhuang}, title = {SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {1082-1090} }
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Abstract
Recently transformer-based methods have achieved state-of-the-art prediction quality on human pose estimation(HPE). Nonetheless most of these top-performing transformer-based models are too computation-consuming and storage-demanding to deploy on edge computing platforms. Those transformer-based models that require fewer resources are prone to under-fitting due to their smaller scale and thus perform notably worse than their larger counterparts. Given this conundrum we introduce SDPose a new self-distillation method for improving the performance of small transformer-based models. To mitigate the problem of under-fitting we design a transformer module named Multi-Cycled Transformer(MCT) based on multiple-cycled forwards to more fully exploit the potential of small model parameters. Further in order to prevent the additional inference compute-consuming brought by MCT we introduce a self-distillation scheme extracting the knowledge from the MCT module to a naive forward model. Specifically on the MSCOCO validation dataset SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs. Furthermore SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset with 6.2M parameters and 4.7 GFLOPs achieving a new state-of-the-art among predominant tiny neural network methods.
Related Material