Multi-Explainable TemporalNet: An Interpretable Multimodal Approach using Temporal Convolutional Network for User-level Depression Detection

Anas Zafar, Danyal Aftab, Rizwan Qureshi, Yaofeng Wang, Hong Yan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 2258-2265


Multimodal depression detection from internet-based data, such as posts on social media platforms, is an important problem for the research community, which aims to predict human mental states in order to support the wellbeing of society. Recently, attention-based networks have gained significant popularity for depression detection. However, existing multimodal methods rely primarily on images and text and assume no correlation between temporal aspects, such as the relative time of different posts or tweets, which is a crucial factor in deriving depression-related behavior patterns. Moreover, they lack model interpretability, which limits understanding of how different features contribute to the model's final prediction. In this paper, we propose Multi-Explainable TemporalNet (METN), a Temporal Convolutional Network (TCN) based multimodal transformer network with relative timestamp embeddings. We leverage pretrained foundation models for text and image embeddings, and attention maps for model interpretability. We perform extensive experiments and ablation studies to validate the performance of METN on the user-level depression detection task. Our model achieves state-of-the-art results on various benchmarks, including a 0.945 F1 score on a multimodal Twitter dataset and a 0.913 F1 score on a multimodal Reddit dataset. We further demonstrate that our model improves the accuracy of identifying depression in individuals who publicly post messages on social media platforms, while also improving interpretability. Code and models are available at
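
The abstract names two temporal ingredients, relative timestamps and a temporal convolutional network, without giving their details. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation: the function names `relative_hours` and `causal_dilated_conv1d`, the choice of hours as the time unit, and the toy kernel are all assumptions made for illustration. It shows how post times might be expressed relative to a user's first post, and how a causal dilated convolution (the basic operation in a TCN) only looks at current and earlier positions in a sequence.

```python
from datetime import datetime
from typing import List, Sequence


def relative_hours(timestamps: Sequence[datetime]) -> List[float]:
    """Hours elapsed since the user's earliest post (hypothetical helper).

    Posts are sorted chronologically first, so the earliest post maps to 0.0.
    """
    ordered = sorted(timestamps)
    first = ordered[0]
    return [(t - first).total_seconds() / 3600.0 for t in ordered]


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Causal dilated 1D convolution with implicit zero left-padding.

    y[t] = sum_k kernel[k] * x[t - k * dilation]
    Only the current and earlier positions contribute to y[t].
    """
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # positions before the sequence start count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


# Toy data: three posts by one user, given out of order on purpose.
posts = [
    datetime(2024, 1, 1, 12, 0),
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 2, 0, 0),
]
hours = relative_hours(posts)

# Toy per-post feature values, averaged with the value two steps earlier.
features = [1.0, 2.0, 3.0, 4.0]
smoothed = causal_dilated_conv1d(features, kernel=[0.5, 0.5], dilation=2)
```

In a real model the scalar features would be replaced by embedding vectors, the kernel weights would be learned, and the relative times would typically be mapped to learned embeddings; the abstract does not specify how METN does any of this.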

Related Material

@InProceedings{Zafar_2024_CVPR,
    author    = {Zafar, Anas and Aftab, Danyal and Qureshi, Rizwan and Wang, Yaofeng and Yan, Hong},
    title     = {Multi-Explainable TemporalNet: An Interpretable Multimodal Approach using Temporal Convolutional Network for User-level Depression Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {2258-2265}
}