Less is More: Empowering GUI Agent with Context-Aware Simplification

Chen, Gongwei; Zhou, Xurui; Shao, Rui; Lyu, Yibo; Zhou, Kaiwen; Wang, Shuai; Li, Wentao; Li, Yinchuan; Qi, Zhongang; Nie, Liqiang

Gongwei Chen, Xurui Zhou, Rui Shao, Yibo Lyu, Kaiwen Zhou, Shuai Wang, Wentao Li, Yinchuan Li, Zhongang Qi, Liqiang Nie; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 5901-5911

Abstract

The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agents and summarize two findings: **1) the high density and loose relations of element context** indicate that many unrelated elements exist and exert a negative influence; **2) the high redundancy of history context** reveals that history modeling in current GUI agents is inefficient. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI agent, termed **SimpAgent**. To mitigate potential interference from numerous unrelated elements, we introduce a **masking-based element pruning** method that circumvents intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a **consistency-guided history compression** module, which enhances implicit LLM-based compression through explicit guidance, achieving an optimal balance between performance and efficiency. With these components, SimpAgent reduces FLOPs by 27% and achieves superior GUI navigation performance. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.
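The abstract does not spell out how masking-based element pruning is implemented, so the following is only a minimal sketch of the general idea under stated assumptions: a screenshot is represented as a 2D grid of pixel values, the target element's bounding box is known, and one randomly placed rectangle that does not overlap the target is blanked out. The function names (`boxes_overlap`, `mask_unrelated_region`), the single-rectangle policy, and the fill value are hypothetical and are not taken from the paper.

```python
import random


def boxes_overlap(a, b):
    """Return True if two (left, top, right, bottom) boxes intersect.

    Right and bottom edges are exclusive.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def mask_unrelated_region(image, target_box, mask_size, rng, fill=0, max_tries=100):
    """Blank out one rectangle that does not touch the target element.

    Hypothetical illustration only; the paper's actual procedure may differ.
    image:      list of rows, each a list of pixel values
    target_box: (left, top, right, bottom) of the element to keep visible
    mask_size:  (mask_height, mask_width) of the rectangle to blank out
    rng:        a random.Random instance, passed in for reproducibility
    Returns (masked_copy, mask_box), with mask_box None if no placement was found.
    """
    height, width = len(image), len(image[0])
    mask_h, mask_w = mask_size
    masked = [row[:] for row in image]  # never modify the caller's image

    for _ in range(max_tries):
        top = rng.randrange(0, height - mask_h + 1)
        left = rng.randrange(0, width - mask_w + 1)
        box = (left, top, left + mask_w, top + mask_h)
        if boxes_overlap(box, target_box):
            continue  # this placement would hide the target; try another
        for y in range(top, top + mask_h):
            for x in range(left, left + mask_w):
                masked[y][x] = fill
        return masked, box

    return masked, None
```

In this sketch the target element always stays visible while part of the unrelated context is removed, which is one simple way to avoid modeling relations between elements explicitly.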

Related Material


@InProceedings{Chen_2025_ICCV,
    author    = {Chen, Gongwei and Zhou, Xurui and Shao, Rui and Lyu, Yibo and Zhou, Kaiwen and Wang, Shuai and Li, Wentao and Li, Yinchuan and Qi, Zhongang and Nie, Liqiang},
    title     = {Less is More: Empowering GUI Agent with Context-Aware Simplification},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {5901-5911}
}