VLMAH: Visual-Linguistic Modeling of Action History for Effective Action Anticipation

Victoria Manousaki, Konstantinos Bacharidis, Konstantinos Papoutsakis, Antonis Argyros; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 1917-1927

Abstract


Although existing methods for action anticipation have considerably improved the predictability of future events in videos, the way they exploit information about past actions is constrained by time duration and encoding complexity. This paper addresses the task of action anticipation by taking into account the history of all executed actions throughout long, procedural activities. We propose a novel approach, termed Visual-Linguistic Modeling of Action History (VLMAH), that fuses the immediate past, in the form of visual features, with the distant past, represented by a cost-effective form of linguistic constructs (semantic labels of nouns, verbs, or actions). Our approach generates accurate near-future action predictions during procedural activities by leveraging information on both the long- and short-term past. Extensive experimental evaluation was conducted on three challenging video datasets containing procedural activities, namely Meccano, Assembly-101, and 50Salads. The obtained results validate the importance of incorporating long-term action history for action anticipation and demonstrate significant improvements in state-of-the-art Top-1 accuracy.
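To make the described fusion concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of combining short-term visual features with a linguistic encoding of the long-term action history for next-action prediction. All module names, dimensions, and the concatenation-plus-MLP fusion strategy are illustrative assumptions.

```python
# Illustrative sketch only: fuses pooled visual features of the recent past
# with an embedding of the semantic labels of all previously executed actions.
import torch
import torch.nn as nn

class VisualLinguisticAnticipator(nn.Module):
    def __init__(self, num_actions, visual_dim=1024, embed_dim=128, hidden_dim=256):
        super().__init__()
        # Embeds the semantic labels (verbs/nouns/actions) of past actions.
        self.label_embed = nn.Embedding(num_actions, embed_dim)
        # Summarizes the arbitrarily long action-history sequence.
        self.history_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Projects the short-term visual features of the immediate past.
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        # Fuses both streams and scores the anticipated next action.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, visual_feat, history_labels):
        # visual_feat: (B, visual_dim) pooled features of the immediate past
        # history_labels: (B, T) label indices of all previously executed actions
        _, h = self.history_encoder(self.label_embed(history_labels))
        fused = torch.cat([self.visual_proj(visual_feat), h[-1]], dim=-1)
        return self.classifier(fused)  # (B, num_actions) next-action logits

# Example usage with random inputs.
model = VisualLinguisticAnticipator(num_actions=100)
logits = model(torch.randn(2, 1024), torch.randint(0, 100, (2, 12)))
print(logits.shape)  # torch.Size([2, 100])
```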

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Manousaki_2023_ICCV,
    author    = {Manousaki, Victoria and Bacharidis, Konstantinos and Papoutsakis, Konstantinos and Argyros, Antonis},
    title     = {VLMAH: Visual-Linguistic Modeling of Action History for Effective Action Anticipation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {1917-1927}
}