Beyond Sequential Tools: A Unified VLM Agent System for Photographic Post-Processing via Dynamic Multi-Expert Fusion

Honglin Xiong, Chenjie Zhu, Jianbiao Ding, Zixuan Ni, Wei Li, Zhenpeng Mi, Qian Wang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 41521-41530

Abstract


Real-world image restoration is challenged by complex, coupled degradations. Existing "all-in-one" models often lack generalization, while agentic systems suffer from inefficient sequential tool invocation. We propose a VLM-guided one-shot framework for universal photographic post-processing. Our system employs a Vision-Language Model (VLM) as an orchestrator to perform nuanced intent understanding and degradation analysis, dynamically allocating weights to a suite of specialized expert LoRA modules. To ensure superior composability, these experts adapt only Key (K) and Value (V) matrices and are simultaneously merged into a pretrained diffusion backbone for synergistic, single-pass restoration. Furthermore, we introduce a lightweight branch trained via Direct Preference Optimization (DPO) to ensure perceptually optimal weight allocation. Our method achieves state-of-the-art performance across diverse synthetic and real-world datasets. Crucially, it demonstrates remarkable zero-shot generalization on authentic real-world data without additional fine-tuning.
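The single-pass fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not give the merging formula, so this assumes a plain weighted sum of low-rank updates, W' = W + sum_i alpha_i * B_i A_i, applied only to the Key and Value projections. The function name `merge_kv_experts`, the dictionary keys, the dimensions, and the example weights are all hypothetical; the weights stand in for whatever the VLM orchestrator would output.

```python
import numpy as np


def merge_kv_experts(W_k, W_v, experts, weights):
    """Merge all expert LoRA updates into the K and V projections at once.

    Assumption (not from the paper): each expert contributes a low-rank
    update B @ A, scaled by its allocated weight and summed linearly.
    """
    W_k_merged = W_k.copy()
    W_v_merged = W_v.copy()
    for expert, alpha in zip(experts, weights):
        W_k_merged += alpha * (expert["B_k"] @ expert["A_k"])
        W_v_merged += alpha * (expert["B_v"] @ expert["A_v"])
    return W_k_merged, W_v_merged


# Toy setup: hidden size 8, LoRA rank 2, three hypothetical experts.
rng = np.random.default_rng(0)
d, r = 8, 2
W_k = rng.standard_normal((d, d))
W_v = rng.standard_normal((d, d))
experts = [
    {
        "A_k": rng.standard_normal((r, d)),
        "B_k": rng.standard_normal((d, r)),
        "A_v": rng.standard_normal((r, d)),
        "B_v": rng.standard_normal((d, r)),
    }
    for _ in range(3)
]

# Stand-in for the per-expert weights allocated by the VLM orchestrator.
weights = np.array([0.6, 0.3, 0.1])

W_k_m, W_v_m = merge_kv_experts(W_k, W_v, experts, weights)
```

Because every expert is folded into the same two matrices before inference, the backbone runs once regardless of how many experts are active, which is the contrast with sequential tool invocation that the abstract draws.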

Related Material


@InProceedings{Xiong_2026_CVPR,
    author    = {Xiong, Honglin and Zhu, Chenjie and Ding, Jianbiao and Ni, Zixuan and Li, Wei and Mi, Zhenpeng and Wang, Qian},
    title     = {Beyond Sequential Tools: A Unified VLM Agent System for Photographic Post-Processing via Dynamic Multi-Expert Fusion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {41521-41530}
}