Class Specialized Knowledge Distillation
Knowledge Distillation (KD) is a compression framework that transfers distilled knowledge from a teacher to a smaller student model. KD approaches conventionally address problem domains where the teacher and student networks share the same set of classes for classification. We provide a knowledge distillation solution tailored to the class specialization setting, where the user requires a compact and performant network specializing in a subset of the classes used to train the teacher model. To this end, we introduce a novel knowledge distillation framework, Class Specialized Knowledge Distillation (CSKD), which combines two loss functions, Renormalized Knowledge Distillation (RKD) and Intra-Class Variance (ICV), to render a computationally efficient, specialized student network. We report results on several popular architectural benchmarks and tasks. In particular, CSKD consistently demonstrates significant performance improvements over teacher models for highly restrictive specialization tasks (e.g., instances where the number of subclasses or the dataset size is relatively small), in addition to outperforming other state-of-the-art knowledge distillation approaches for class specialization tasks.
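To make the class specialization setting concrete, the sketch below illustrates one plausible reading of the "renormalized" teacher target: the teacher's softmax distribution is restricted to the specialized class subset and rescaled to sum to one, then used as the soft target for a student whose output layer covers only that subset. All function names, the temperature value, and the class indices here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def renormalized_teacher_targets(teacher_logits, subset, T=4.0):
    """Restrict the teacher's (temperature-softened) class distribution
    to the specialized subset and renormalize it to sum to 1."""
    p = softmax(teacher_logits / T)            # distribution over all teacher classes
    p_sub = p[:, subset]                       # keep only the specialized classes
    return p_sub / p_sub.sum(axis=-1, keepdims=True)

def kd_loss(student_logits_sub, teacher_targets, T=4.0):
    """Mean KL divergence between renormalized teacher targets and the
    student's distribution over the subset (standard soft-target KD)."""
    q = softmax(student_logits_sub / T)
    kl = np.sum(teacher_targets * (np.log(teacher_targets + 1e-12)
                                   - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())

# Hypothetical example: teacher trained on 10 classes, student
# specializes in 3 of them (indices chosen arbitrarily).
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 10))      # batch of 4, 10 teacher classes
student_logits = rng.normal(size=(4, 3))       # student head over the 3-class subset
subset = [2, 5, 7]

targets = renormalized_teacher_targets(teacher_logits, subset)
loss = kd_loss(student_logits, targets)
```

In a full training objective this term would be combined with the ICV loss and a standard cross-entropy term on the subset labels; the weighting between them is not specified here.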