Robust Visual Reinforcement Learning by Prompt Tuning

Tung Tran, Khoat Than, Danilo Vargas; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 1133-1147

Abstract


Training an agent solely on observational data from a single environment so that it performs well, zero-shot, in unseen contexts is a significant challenge in Reinforcement Learning. Because the environmental signal is limited to pixel-based inputs, a generalized visual encoder is crucial for the agent's robustness. While pre-trained image encoders offer a straightforward and effective way to obtain universal representations, off-the-shelf models cannot be retrained end-to-end, which prevents them from acquiring essential in-domain knowledge. This paper explores the potential of Visual Prompt Tuning to construct a more resilient image encoder for the agent. Extensive empirical evaluations are conducted on multiple benchmarks derived from the DeepMind Control Suite. The findings indicate notable improvements in both episode rewards and sample efficiency.
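The abstract's central idea is to adapt a frozen, pre-trained image encoder with a small set of learnable prompt tokens instead of retraining it end-to-end. Below is a minimal, illustrative sketch of shallow Visual Prompt Tuning in PyTorch; the class name (PromptedEncoder), the toy transformer backbone, and all hyperparameters are assumptions for illustration only and do not reproduce the authors' actual architecture or training setup.

# Minimal sketch of shallow Visual Prompt Tuning around a frozen encoder.
# The backbone below is a stand-in; in practice its weights would come
# from an off-the-shelf pre-trained model. All names and sizes here are
# illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Frozen transformer encoder with learnable prompt tokens prepended."""

    def __init__(self, embed_dim=192, depth=4, num_heads=3,
                 num_patches=64, num_prompts=8):
        super().__init__()
        # Patch embedding + transformer stand in for a pre-trained backbone.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=8, stride=8)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

        # Freeze everything created so far: only the prompts (and any policy
        # head outside this module) receive gradients during RL training.
        for p in self.parameters():
            p.requires_grad = False

        # Learnable prompt tokens, created after the freeze so they stay trainable.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, pixels):                      # pixels: (B, 3, 64, 64)
        x = self.patch_embed(pixels)                # (B, D, 8, 8)
        x = x.flatten(2).transpose(1, 2)            # (B, 64, D)
        x = x + self.pos_embed
        prompts = self.prompts.expand(x.size(0), -1, -1)
        x = torch.cat([prompts, x], dim=1)          # prepend prompt tokens
        x = self.encoder(x)
        return x.mean(dim=1)                        # pooled state representation


if __name__ == "__main__":
    enc = PromptedEncoder()
    obs = torch.randn(2, 3, 64, 64)                 # batch of pixel observations
    feat = enc(obs)
    print(feat.shape)                               # torch.Size([2, 192])
    # Only the prompt parameters are trainable.
    print([n for n, p in enc.named_parameters() if p.requires_grad])

In a full agent, the pooled representation would feed the policy and value heads, and only the prompt parameters (plus those heads) would be updated, which is what keeps the pre-trained encoder's general-purpose features intact.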

Related Material


[bibtex]
@InProceedings{Tran_2024_ACCV,
    author    = {Tran, Tung and Than, Khoat and Vargas, Danilo},
    title     = {Robust Visual Reinforcement Learning by Prompt Tuning},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {1133-1147}
}