Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning

Pierre Marza, Laetitia Matignon, Olivier Simonin, Christian Wolf; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 17847-17856

Abstract


Successfully addressing a wide variety of tasks is a core ability of autonomous agents, requiring flexibly adapting the underlying decision-making strategies and, as we argue in this work, also adapting the perception modules. An analogous argument would be the human visual system, which uses top-down signals to focus attention determined by the current task. Similarly, we adapt pre-trained large vision models conditioned on specific downstream tasks in the context of multi-task policy learning. We introduce task-conditioned adapters that do not require finetuning any pre-trained weights, combined with a single policy trained with behavior cloning and capable of addressing multiple tasks. We condition the visual adapters on task embeddings, which can be selected at inference if the task is known, or alternatively inferred from a set of example demonstrations. To this end, we propose a new optimization-based estimator. We evaluate the method on a wide variety of tasks from the CortexBench benchmark and show that, compared to existing work, these tasks can be addressed with a single policy. In particular, we demonstrate that adapting visual features is a key design choice and that the method generalizes to unseen tasks given a few demonstrations.
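To make the two core ideas of the abstract concrete, below is a minimal PyTorch sketch of (1) an adapter conditioned on a task embedding inserted into a frozen vision backbone, and (2) an optimization-based estimator that recovers a task embedding from a few demonstrations. The FiLM-style conditioning, the bottleneck shape, the mean-squared behavior-cloning loss, and all names (`TaskConditionedAdapter`, `infer_task_embedding`, `policy`) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class TaskConditionedAdapter(nn.Module):
    """Bottleneck adapter whose modulation is predicted from a task embedding.

    Sketch of a module inserted between frozen blocks of a pre-trained
    vision backbone; only the adapter (and task embeddings) are trained,
    the pre-trained weights stay fixed.
    """

    def __init__(self, feat_dim: int, task_dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(feat_dim, bottleneck)
        self.up = nn.Linear(bottleneck, feat_dim)
        # Predict a per-channel scale and shift from the task embedding
        # (FiLM-style conditioning; an assumption for this sketch).
        self.film = nn.Linear(task_dim, 2 * bottleneck)
        self.act = nn.GELU()

    def forward(self, feats: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        h = self.act(self.down(feats))
        scale, shift = self.film(task_emb).chunk(2, dim=-1)
        h = h * (1 + scale) + shift
        # Residual connection: the frozen features remain usable as-is.
        return feats + self.up(h)


def infer_task_embedding(policy, demos, task_dim: int, steps: int = 100):
    """Estimate a task embedding from example demonstrations by gradient descent.

    `policy(obs, emb)` is assumed to run the adapted backbone plus policy
    head; policy and adapter weights stay frozen, and only the embedding
    is optimized against a behavior-cloning loss on (observation, action)
    pairs from the demonstrations.
    """
    emb = torch.zeros(task_dim, requires_grad=True)
    opt = torch.optim.Adam([emb], lr=1e-2)
    for _ in range(steps):
        loss = sum(
            nn.functional.mse_loss(policy(obs, emb), act) for obs, act in demos
        ) / len(demos)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return emb.detach()
```

At inference, a known task would directly index a learned embedding table, while an unseen task would fall back to `infer_task_embedding` on a handful of demonstrations; both paths reuse the same frozen backbone and single multi-task policy.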

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Marza_2024_CVPR,
    author    = {Marza, Pierre and Matignon, Laetitia and Simonin, Olivier and Wolf, Christian},
    title     = {Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {17847-17856}
}