ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded

Adrian Bulat, Enrique Sanchez, Brais Martinez, Georgios Tzimiropoulos; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 13523-13533

Abstract


This paper sets out to solve the following problem: How can we turn a generative video captioning model into an open-world video/action classification model? Video captioning models can naturally produce open-ended free-form descriptions of a given video which, however, might not be discriminative enough for video/action recognition. Unfortunately, when fine-tuned to auto-regress the class names directly, video captioning models overfit the base classes losing their open-world zero-shot capabilities. To alleviate base class overfitting, in this work, we propose to use reinforcement learning to enforce the output of the video captioning model to be more class-level discriminative. Specifically, we propose ReGen, a novel reinforcement learning based framework with a three-fold objective and reward functions: (1) a class-level discrimination reward that enforces the generated caption to be correctly classified into the corresponding action class, (2) a CLIP reward that encourages the generated caption to continue to be descriptive of the input video (i.e. video-specific), and (3) a grammar reward that preserves the grammatical correctness of the caption. We show that ReGen can train a model to produce captions that are: discriminative, video-specific and grammatically correct. Importantly, when evaluated on standard benchmarks for zero- and few-shot action classification, ReGen significantly outperforms the previous state-of-the-art.

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Bulat_2023_ICCV, author = {Bulat, Adrian and Sanchez, Enrique and Martinez, Brais and Tzimiropoulos, Georgios}, title = {ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {13523-13533} }