@InProceedings{Zhang_2022_ACCV,
  author    = {Zhang, Chuhan and Gupta, Ankush and Zisserman, Andrew},
  title     = {Is an Object-Centric Video Representation Beneficial for Transfer?},
  booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
  month     = {December},
  year      = {2022},
  pages     = {1976-1994}
}
Is an Object-Centric Video Representation Beneficial for Transfer?
Abstract
The objective of this work is to learn an object-centric video representation, with
the aim of improving transferability to novel tasks, i.e., tasks different
from the pre-training task of action classification. To this end, we introduce a new object-centric video recognition model based on a transformer architecture.
The model learns a set of object-centric summary vectors for the video, and
uses these vectors to fuse the visual and spatio-temporal trajectory
'modalities' of the video clip. We also introduce a novel trajectory contrast
loss to further enhance objectness in these summary vectors.
With experiments on four datasets -- SomethingSomething-V2, SomethingElse, Action Genome and EpicKitchens -- we show that the object-centric model outperforms prior video representations (both object-agnostic and object-aware) when: (1) classifying actions on unseen objects and in unseen environments; (2) low-shot learning of novel classes; (3) linear probing on other downstream tasks; as well as (4) standard action classification.