@InProceedings{Tah_2026_CVPR,
    author    = {Tah, Shreyas Kumar and Singh, Anshul and Katari, Prajeet and Agarwala, Aditya and Biswas, Shwetabh and Gupta, Lucky and Baneerjee, Siddhartha and AGJ, Faheema and Ashika, Ashika and Biswas, Soma},
    title     = {HybridNet: Efficient Multimodal Fake News Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2026},
    pages     = {11004-11012}
}
HybridNet: Efficient Multimodal Fake News Detection
Abstract
Detecting multimodal fake news, where authentic images are paired with misleading text, remains a significant challenge. Vision-language models (VLMs) perform impressively on this task, but their results are not easily interpretable. Fine-tuning large multimodal LLMs (MLLMs) can provide explanations, but it relies heavily on large, fully annotated datasets with reasoning to achieve high accuracy, which constrains its real-world scalability. In this work, we propose HybridNet, which treats frozen open-source MLLMs as reasoning extractors rather than classifiers, using three-stage consistency checking. We distill these signals into a lightweight Reasoning-Aware Classifier that weighs MLLM observations against VLM features, enabling it to move beyond binary predictions and produce interpretable explanations alongside its final outputs. In addition, HybridNet employs an active learning strategy to combine MLLM interpretability with supervised robustness, achieving competitive accuracy using less than 50% labeled data and offering a scalable and interpretable solution for multimodal misinformation detection.

