ST2ST: Self-Supervised Test-time Adaptation for Video Action Recognition

Masud An-Nur Islam Fahim, Mohammed Innat, Jani Boutellier; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 1057-1066

Abstract


The performance of trained deep neural network (DNN) models relies on the assumption that the test data has largely the same feature distribution as the training data. In deployed video recognition systems, however, the feature distribution of acquired samples can become shifted due to environmental conditions (rain, lighting variations) or technological factors such as lossy data compression. To improve action recognition performance under feature distribution shifts, we propose a simple test-time self-distillation strategy in which the DNN model goes through an intra-video logit minimization phase. As a result, the model can update its predictions for the given input. The proposed approach is agnostic to the neural network type (CNN, transformer) and applies to various action recognition models. In contrast to many test-time adaptation studies, the proposed approach does not require access to the training data. The performance of the proposed method is evaluated with multiple state-of-the-art action recognition models and the widely used benchmark datasets Kinetics-400 and Something-Something V2.
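For illustration only, the sketch below shows what a test-time adaptation loop of this general shape could look like in PyTorch: the model is briefly updated on clips drawn from a single test video before predicting, with no access to training data. The abstract does not specify the exact objective or update rule, so the entropy-style loss over clip logits, the `adapt_and_predict` function, and the choice to adapt all parameters are assumptions standing in for the paper's intra-video logit minimization phase, not the authors' exact method.

```python
import torch
import torch.nn.functional as F

def adapt_and_predict(model, clips, steps=1, lr=1e-4):
    """Adapt `model` on clips from one test video, then predict its action class.

    `clips` has shape (num_clips, C, T, H, W): several temporal crops of the
    same video. The loss (mean softmax entropy over clips) is an assumed
    stand-in for the paper's intra-video logit minimization objective.
    """
    model.train()  # enable test-time updates
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for _ in range(steps):
        logits = model(clips)                  # (num_clips, num_classes)
        probs = F.softmax(logits, dim=-1)
        # Push the model toward confident, mutually consistent
        # predictions across clips of the same video.
        loss = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        # Average adapted clip logits into one video-level prediction.
        return model(clips).mean(dim=0).argmax().item()
```

In this sketch the adaptation is episodic: each video would start from the original pretrained weights, matching the source-free setting the abstract describes.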

Related Material


[pdf]
[bibtex]
@InProceedings{Fahim_2024_CVPR,
    author    = {Fahim, Masud An-Nur Islam and Innat, Mohammed and Boutellier, Jani},
    title     = {ST2ST: Self-Supervised Test-time Adaptation for Video Action Recognition},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {1057-1066}
}