Action Representing by Constrained Conditional Mutual Information
Contrastive learning achieves remarkable performance in representation learning by constructing the InfoNCE loss function, which enables learned representations to capture invariance under data transformations without labels. Contrastive learning has also been employed in self-supervised learning for action recognition. However, this kind of method fails to incorporate assumptions, based on human knowledge, about the prior distribution of representations into the training process. To solve this problem, this paper proposes a self-supervised learning framework that realizes different self-supervised learning methods by choosing different assumptions about the prior distribution of representations, while still learning the invariance under data transformation as contrastive learning does. The framework minimizes the CCMI (Constrained Conditional Mutual Information) loss function, which represents the conditional mutual information between augmented views of the same input sample and the output representations of the encoder, subject to a constraint on the prior distribution of representations. Theoretical analysis proves that traditional contrastive learning with InfoNCE is a special case of this framework in which no human-knowledge constraint is imposed. A Gaussian Mixture Model on the unit hypersphere is chosen as the representation prior distribution, yielding a self-supervised method called CoMInG. Compared with existing methods, representations learned by this method achieve significantly improved performance on the downstream task of action recognition.
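The abstract contrasts the proposed CCMI objective with the standard InfoNCE loss. As background, the sketch below is a minimal, generic NumPy implementation of InfoNCE over two batches of augmented-view embeddings; it is not the paper's CCMI loss, and the batch size, dimension, and temperature are illustrative assumptions.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Generic InfoNCE loss between two batches of embeddings.

    z1, z2: (N, D) arrays; row i of z1 and row i of z2 are embeddings of
    two augmentations of the same sample (the positive pair), and all
    other rows in the batch serve as negatives. tau is the temperature.
    """
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                      # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy with the positive pairs on the diagonal
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z)                         # views agree: low loss
loss_random = info_nce(z, rng.normal(size=(8, 16)))   # unrelated views: high loss
```

Minimizing this loss pulls the two views of each sample together while pushing apart views of different samples, which is the invariance-learning behavior the abstract describes; the proposed framework additionally constrains the distribution of the resulting representations.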