Self-Supervised Distilled Learning for Multi-Modal Misinformation Identification

Michael Mu, Sreyasee Das Bhattacharjee, Junsong Yuan; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 2819-2828


Rapid dissemination of misinformation is a major societal problem receiving increasing attention. Unlike Deepfakes, Out-of-Context misinformation, in which the unaltered uni-modal contents (e.g., image, text) of a multi-modal news sample are combined in an out-of-context manner to deceive, requires limited technical expertise to create. It is therefore a more prevalent means of misleading readers. Most existing approaches extract features from each uni-modal component, concatenate them, and train a model for the misinformation classification task. In this paper, we design a self-supervised feature representation learning strategy that pursues two multi-task objectives: (1) a task-agnostic objective, which evaluates the intra- and inter-mode representational consistencies for improved alignment across related models; (2) a task-specific objective, which estimates category-specific multi-modal knowledge to enable the classifier to derive more discriminative predictive distributions. To compensate for the dearth of annotated data representing the varied types of misinformation, the proposed Self-Supervised Distilled Learner (SSDL) utilizes a Teacher network to weakly guide a Student network to mimic the teacher's decision pattern. The two-phase learning of SSDL proceeds as follows: the Student model is first pretrained using a contrastive self-supervised task-agnostic objective combined, in parallel, with a supervised task-specific adjustment; the Student model is then fine-tuned via self-supervised knowledge distillation blended with the supervised objective of decision alignment. In addition to consistent outperformance of existing baselines, which demonstrates the feasibility of our approach, the explainability capacity of the proposed SSDL helps users visualize the reasoning behind a specific prediction made by the model.
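The fine-tuning phase described above blends a distillation term (the Student mimicking the Teacher's softened predictive distribution) with a supervised classification term. The following is a minimal sketch of such a blended objective; the temperature `temperature` and mixing weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature yields a softer distribution.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(probs, label):
    # Negative log-likelihood of the ground-truth class index.
    return -math.log(probs[label])

def blended_finetune_loss(student_logits, teacher_logits, label,
                          temperature=2.0, alpha=0.5):
    """Hypothetical blend of a distillation term (Student matches the
    Teacher's softened predictions) and a supervised cross-entropy term,
    in the spirit of SSDL's fine-tuning phase; weights are assumptions."""
    student_soft = softmax(student_logits, temperature)
    teacher_soft = softmax(teacher_logits, temperature)
    student_hard = softmax(student_logits)
    distill = kl_divergence(teacher_soft, student_soft)
    supervised = cross_entropy(student_hard, label)
    return alpha * distill + (1 - alpha) * supervised
```

When the Student's logits already match the Teacher's, the distillation term vanishes and only the supervised term drives learning; otherwise both terms pull the Student toward the Teacher's decision pattern and the ground-truth label.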

Related Material

@InProceedings{Mu_2023_WACV,
    author    = {Mu, Michael and Das Bhattacharjee, Sreyasee and Yuan, Junsong},
    title     = {Self-Supervised Distilled Learning for Multi-Modal Misinformation Identification},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {2819-2828}
}