Symbolic Representation for Any-to-Any Generative Tasks
Abstract
We propose a symbolic generative task description language and a corresponding inference engine that can represent arbitrary multimodal tasks as structured symbolic flows. Unlike conventional generative models, which rely on large-scale training and implicit neural representations to learn cross-modal mappings--often with high computational costs and limited flexibility--our framework introduces an explicit symbolic representation composed of three core primitives: functions, parameters, and topological logic. Using a pre-trained language model, our inference engine maps natural language instructions directly to symbolic workflows in a training-free manner. Our framework successfully performs over 12 diverse multimodal generative tasks, demonstrating strong performance and flexibility without requiring task-specific tuning. Experiments show that our method not only matches or outperforms existing state-of-the-art unified models in content quality but also offers greater efficiency, editability, and interruptibility. We believe symbolic task representations provide a cost-effective and extensible foundation for advancing the capabilities of generative AI.
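To make the abstract's framing concrete, the following minimal Python sketch shows what a symbolic workflow built from the three primitives (functions, parameters, and topological logic) might look like. The node schema, tool registry, and function names are illustrative assumptions, not the paper's actual task description language or inference engine API.

# Illustrative sketch only: the schema and names below are assumptions,
# not the paper's actual symbolic task description language.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class Node:
    """One symbolic step: a function name, its parameters, and its upstream dependencies."""
    function: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    inputs: List[str] = field(default_factory=list)  # topological logic: ids of upstream nodes

# Toy registry standing in for pre-trained generative tools.
REGISTRY: Dict[str, Callable[..., Any]] = {
    "text_to_image": lambda prompt, **kw: f"<image generated from '{prompt}'>",
    "image_to_video": lambda image, frames=16, **kw: f"<{frames}-frame video from {image}>",
}

def run_workflow(workflow: Dict[str, Node]) -> Dict[str, Any]:
    """Execute nodes in dependency order, feeding upstream outputs to downstream functions."""
    results: Dict[str, Any] = {}
    pending = dict(workflow)
    while pending:
        for node_id, node in list(pending.items()):
            if all(dep in results for dep in node.inputs):
                upstream = [results[dep] for dep in node.inputs]
                fn = REGISTRY[node.function]
                results[node_id] = fn(*upstream, **node.parameters) if upstream else fn(**node.parameters)
                del pending[node_id]
    return results

# Hypothetical flow an inference engine might emit for "turn this caption into a short clip":
workflow = {
    "n1": Node("text_to_image", {"prompt": "a red panda on a skateboard"}),
    "n2": Node("image_to_video", {"frames": 8}, inputs=["n1"]),
}
print(run_workflow(workflow)["n2"])

Because the workflow is an explicit data structure rather than weights in a model, it can be inspected, edited node by node, or interrupted between steps, which is the kind of editability and interruptibility the abstract claims.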
Related Material

[pdf] [supp] [arXiv] [bibtex]

@InProceedings{Chen_2025_CVPR,
  author    = {Chen, Jiaqi and Zhu, Xiaoye and Wang, Yue and Liu, Tianyang and Chen, Xinhui and Chen, Ying and Leong, Chak Tou and Ke, Yifei and Liu, Joseph and Yuan, Yiwen and McAuley, Julian and Li, Li-jia},
  title     = {Symbolic Representation for Any-to-Any Generative Tasks},
  booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
  month     = {June},
  year      = {2025},
  pages     = {27816-27826}
}