Co-Compressing and Unifying Deep CNN Models for Efficient Human Face and Speaker Recognition

Wan, Timmy S. T.; Lee, Jia-Hong; Chan, Yi-Ming; Chen, Chu-Song

Timmy S. T. Wan, Jia-Hong Lee, Yi-Ming Chan, Chu-Song Chen; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019

Abstract

Deep CNN models have become state-of-the-art techniques in many applications, e.g., face recognition, speaker recognition, and image classification. Although many studies address the speedup or compression of individual models, very few focus on co-compressing and unifying models from different modalities. In this work, to join and compress face and speaker recognition models, a shared-codebook approach is adopted to reduce the redundancy of the combined model. Although the input modalities of these two CNN models are quite different, the shared codebook can support both the sound and the image CNN models for speaker and face recognition. Experiments show promising results for unifying and co-compressing heterogeneous models for efficient inference.
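
To illustrate the general idea of a shared codebook, the following is a minimal sketch and not the authors' method: it assumes scalar weights and plain 1-D k-means, and all function names (`build_shared_codebook`, `encode`, `decode`) and the toy weight lists are hypothetical. The weights of two models are pooled, a single codebook is fitted to the pool, and each model then stores only indices into that codebook.

```python
import random


def build_shared_codebook(weights_a, weights_b, k=4, iters=20, seed=0):
    """Fit one codebook to the pooled weights of two models (1-D k-means)."""
    pooled = list(weights_a) + list(weights_b)
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen pooled weights.
    centroids = rng.sample(pooled, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for w in pooled:
            nearest = min(range(k), key=lambda j: abs(w - centroids[j]))
            buckets[nearest].append(w)
        # Move each centroid to its bucket mean; keep it if the bucket is empty.
        centroids = [
            sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)
        ]
    return centroids


def encode(weights, codebook):
    """Replace each weight by the index of its nearest codeword."""
    return [
        min(range(len(codebook)), key=lambda j: abs(w - codebook[j]))
        for w in weights
    ]


def decode(indices, codebook):
    """Reconstruct approximate weights from codebook indices."""
    return [codebook[i] for i in indices]


# Toy stand-ins for the weights of a face model and a speaker model.
face_weights = [0.10, 0.11, -0.50, -0.49, 0.90]
speaker_weights = [0.12, -0.52, 0.88, 0.91]

codebook = build_shared_codebook(face_weights, speaker_weights, k=3)
face_codes = encode(face_weights, codebook)
speaker_codes = encode(speaker_weights, codebook)
face_approx = decode(face_codes, codebook)
```

In this sketch both models share the `k` codewords, so the combined storage is one codebook plus a small integer index per weight, rather than two full sets of floating-point weights.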

Related Material

[bibtex]

@InProceedings{Wan_2019_CVPR_Workshops,
author = {Wan, Timmy S. T. and Lee, Jia-Hong and Chan, Yi-Ming and Chen, Chu-Song},
title = {Co-Compressing and Unifying Deep CNN Models for Efficient Human Face and Speaker Recognition},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}