Confusion Mixup Regularized Multimodal Fusion Network for Continual Egocentric Activity Recognition

Hanxin Wang, Shuchang Zhou, Qingbo Wu, Hongliang Li, Fanman Meng, Linfeng Xu, Heqian Qiu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 3560-3569

Abstract


Continual egocentric activity recognition, an emerging and challenging task, aims to understand diverse first-person activities from the multimodal data that a wearable device captures in streaming environments. Existing continual learning methods ignore the dynamic change in the correlation among multiple modalities and struggle to learn discriminative representations for the sequentially isolated activity classes from different stages. In this paper, we propose a Confusion Mixup Regularized Multimodal Fusion Network (CMR-MFN) to address these issues. First, CMR-MFN is built on a ternary-modality-input dynamic expansion architecture, which progressively grows additional branches for recognizing in-stage classes. Each input has a frozen modality-specific backbone to avoid forgetting caused by parameter shifts. Second, CMR-MFN captures the dynamics of multimodal inputs via learnable self-attention layers. We augment unknown classes by linearly mixing up samples from two known classes and assigning a biased weight to one of them, which makes the unknown-class samples confusable with the known class that has the higher weight. By learning from the current and augmented training data together, we regularize the multimodal fusion representation to distinguish the in-stage classes from their confusing unknown-class samples, which implicitly pushes samples of out-stage classes away from those of in-stage classes when they are similar to each other. Experiments on the latest UESTC-MMEA-CL database show that the proposed method significantly outperforms state-of-the-art methods for multimodal-data-based continual egocentric activity recognition. Our code is available at https://github.com/Hanna-W/CMR-MFN.
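The mixup step described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the authors' repository: the function name `confusion_mixup`, the weight `lam`, the use of a single extra label index for the unknown class, and the choice to mix flat feature vectors (rather than raw multimodal inputs) are all assumptions made for illustration.

```python
import numpy as np


def confusion_mixup(features, labels, num_known, lam=0.8, rng=None):
    """Synthesize 'unknown class' samples from pairs of known-class samples.

    Hypothetical helper, not the authors' code. Each output row is
    lam * x_a + (1 - lam) * x_b with lam > 0.5, so it stays close to
    (is confusable with) the class of x_a while being labelled unknown.
    """
    if not 0.5 < lam < 1.0:
        raise ValueError("lam must be biased toward one class: 0.5 < lam < 1")
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Pair every sample with a randomly chosen partner.
    perm = rng.permutation(len(labels))
    # Keep only pairs drawn from two different known classes.
    keep = labels != labels[perm]

    # Biased linear mix: the first sample of each pair dominates.
    mixed = lam * features[keep] + (1.0 - lam) * features[perm][keep]
    # Assumption: all synthesized samples share one extra label index.
    unknown_labels = np.full(int(keep.sum()), num_known, dtype=labels.dtype)
    # Class each mixed sample is biased toward (returned for inspection).
    dominant_labels = labels[keep]
    return mixed, unknown_labels, dominant_labels
```

Under these assumptions, the returned `mixed` and `unknown_labels` would be concatenated with the current stage's training batch, so that the classifier learns to separate each in-stage class from nearby synthetic samples. How the mixed samples enter the loss in CMR-MFN is described in the paper itself, not in this sketch.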

Related Material


[bibtex]
@InProceedings{Wang_2023_ICCV,
    author    = {Wang, Hanxin and Zhou, Shuchang and Wu, Qingbo and Li, Hongliang and Meng, Fanman and Xu, Linfeng and Qiu, Heqian},
    title     = {Confusion Mixup Regularized Multimodal Fusion Network for Continual Egocentric Activity Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3560-3569}
}