FedFSLAR: A Federated Learning Framework for Few-Shot Action Recognition

Tu, Nguyen Anh; Abu, Assanali; Aikyn, Nartay; Makhanov, Nursultan; Lee, Min-Ho; Le-Huy, Khiem; Wong, Kok-Seng

Nguyen Anh Tu, Assanali Abu, Nartay Aikyn, Nursultan Makhanov, Min-Ho Lee, Khiem Le-Huy, Kok-Seng Wong; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2024, pp. 270-279

Abstract

In recent years, Federated Learning (FL) has emerged as a promising solution for many computer vision applications because it handles data privacy and communication overhead effectively. However, when FL is applied to advanced and computationally heavy tasks such as video-based action recognition, clients can suffer from a lack of annotated data and from model bias, which degrades learning performance. Few-Shot Learning (FSL), in which the learned model adapts to unseen classes from a limited number of labeled examples, is therefore essential. Nonetheless, FSL has rarely been explored for vision tasks under FL settings. In this paper, we develop a Federated Few-Shot Learning framework, FedFSLAR, that collaboratively learns a classification model across multiple FL clients to recognize unseen actions from a few labeled video samples. Prior works in few-shot action recognition mostly use 2D-CNNs as feature backbones, which capture the temporal correlation between video frames poorly. To overcome this limitation and obtain more robust representations, we integrate spatiotemporal feature backbones based on 3D-CNNs into a meta-learning paradigm, namely ProtoNet. We conduct extensive experiments under practical FL settings, e.g., non-IID data, to evaluate various 3D-CNN models alongside representative FL algorithms, namely FedAvg and FedProx. Experimental results on benchmark datasets validate the effectiveness of the FedFSLAR framework. Notably, our findings indicate that combining feature backbones pre-trained on external data with the FL setting can substantially benefit FSL. Our framework offers a viable path toward further progress in FL and FSL for action recognition.
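The two building blocks named in the abstract, prototype-based classification (ProtoNet) and server-side weight averaging (FedAvg), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`prototypes`, `classify`, `fedavg`) are hypothetical, the short vectors stand in for embeddings that a 3D-CNN backbone would produce, and model parameters are assumed to be flattened into a single list per client.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def prototypes(support: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """ProtoNet step: each class prototype is the mean of its support embeddings."""
    return {label: mean_vector(embeddings) for label, embeddings in support.items()}


def classify(query: Vector, protos: Dict[str, Vector]) -> str:
    """Assign the query to the class with the nearest prototype (squared Euclidean)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(protos, key=lambda label: sq_dist(query, protos[label]))


def fedavg(client_weights: Sequence[Vector], client_sizes: Sequence[int]) -> Vector:
    """FedAvg aggregation: average client parameters weighted by local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Toy 2-way 2-shot episode; the 2-D vectors are placeholder embeddings.
    support = {
        "jump": [[1.0, 0.0], [0.8, 0.2]],
        "run": [[0.0, 1.0], [0.2, 0.8]],
    }
    print(classify([0.9, 0.1], prototypes(support)))   # jump
    # Two clients holding 1 and 3 local samples, respectively.
    print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))    # [2.5, 3.5]
```

In a federated round, each client would train its backbone on local few-shot episodes and the server would apply an aggregation like `fedavg` to the returned parameters. FedProx uses the same aggregation but adds a proximal term to each client's local objective, which this sketch omits.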

Related Material

@InProceedings{Tu_2024_WACV,
    author    = {Tu, Nguyen Anh and Abu, Assanali and Aikyn, Nartay and Makhanov, Nursultan and Lee, Min-Ho and Le-Huy, Khiem and Wong, Kok-Seng},
    title     = {FedFSLAR: A Federated Learning Framework for Few-Shot Action Recognition},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2024},
    pages     = {270-279}
}