@InProceedings{Chen_2026_CVPR,
    author    = {Chen, Xinwang and Li, Xiuxing and Li, Qing and Zhuang, Ziyue and Wu, Yutong and Li, Ziyu and Wang, Zhuo and Li, Kai and Hao, Jianye and Wu, Xia},
    title     = {Human-like Abstract Visual Reasoning via Understanding and Solving Reasoning Loop},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {41235-41244}
}
Human-like Abstract Visual Reasoning via Understanding and Solving Reasoning Loop
Abstract
Abstract visual reasoning benchmarks such as ARC-AGI evaluate the ability to infer generalizable transformation rules from a few graphical demonstrations, a capability on which current deep learning models severely underperform: mainstream LLMs achieve only 15.8% (DeepSeek-R1) and 34.5% (o3-mini-high) accuracy. The core reason lies in their static processing of task examples. Unlike humans, who iteratively refine their understanding of the examples while solving a problem, these models lack a mechanism for dynamically aligning understanding with solving. We address this gap with the Understanding and Solving Reasoning Loop (USRL) framework. The architecture comprises two explicitly interacting modules: an Understanding Module (UM) that encodes and refines rule representations of the examples, and a Solving Module (SM) that generates a draft solution informed by these evolving rule representations. Through recurrent interaction, the model iteratively aligns its draft solution with its understanding of the task examples. Furthermore, we introduce an adaptive halting mechanism that autonomously terminates the reasoning loop based on the consistency between the generated draft solution and the examples. With 7M parameters, our model achieves 47.2% accuracy on ARC-AGI-1.
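The control flow described in the abstract (understand, draft, check consistency, halt) can be illustrated with a deliberately simplified toy. This is a minimal sketch under strong assumptions, not the authors' architecture: the neural Understanding and Solving Modules are replaced by a colour lookup table, the task is restricted to per-cell colour remapping, and every name here (`understand`, `solve`, `is_consistent`, `reasoning_loop`, `max_steps`) is hypothetical.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, output grid)
Rule = Dict[int, int]        # toy "rule representation": colour -> colour


def understand(rule: Rule, example: Example) -> Rule:
    """Toy stand-in for the Understanding Module: refine the rule with one example."""
    refined = dict(rule)
    inp, out = example
    for row_in, row_out in zip(inp, out):
        for a, b in zip(row_in, row_out):
            refined[a] = b
    return refined


def solve(rule: Rule, test_input: Grid) -> Grid:
    """Toy stand-in for the Solving Module: draft a solution from the current rule.

    Colours the rule does not cover yet are left unchanged.
    """
    return [[rule.get(c, c) for c in row] for row in test_input]


def is_consistent(test_input: Grid, draft: Grid, examples: List[Example]) -> bool:
    """Halting check: every (input colour, draft colour) pair must occur in some example."""
    supported = {
        (a, b)
        for inp, out in examples
        for row_in, row_out in zip(inp, out)
        for a, b in zip(row_in, row_out)
    }
    return all(
        (a, b) in supported
        for row_in, row_draft in zip(test_input, draft)
        for a, b in zip(row_in, row_draft)
    )


def reasoning_loop(
    examples: List[Example], test_input: Grid, max_steps: int = 8
) -> Tuple[Grid, int]:
    """Alternate understanding and solving until the draft agrees with the examples.

    Assumes `examples` is non-empty. Returns the draft and the number of
    refinement steps that were taken before halting.
    """
    rule: Rule = {}
    draft = solve(rule, test_input)
    for step in range(max_steps):
        if is_consistent(test_input, draft, examples):
            return draft, step  # adaptive halt: draft is consistent with the examples
        rule = understand(rule, examples[step % len(examples)])
        draft = solve(rule, test_input)
    return draft, max_steps  # step budget exhausted


if __name__ == "__main__":
    demo_examples = [([[1, 2]], [[3, 4]]), ([[5, 1]], [[6, 3]])]
    print(reasoning_loop(demo_examples, [[2, 5], [1, 1]]))
```

In this toy, the number of loop iterations depends on the task: the loop stops as soon as the draft is supported by the examples rather than after a fixed number of steps, which is the behaviour the adaptive halting mechanism is described as providing.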

