AssistGUI: Task-Oriented PC Graphical User Interface Automation

Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 13289-13298

Abstract


Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Models (LLMs) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks. We carefully collected a set of 100 tasks from nine widely used software applications, such as After Effects and MS Word, each accompanied by the necessary project files for better evaluation. Moreover, we propose a multi-agent collaboration framework, which incorporates four agents to perform task decomposition, GUI parsing, action generation, and reflection. Our experimental results reveal that our multi-agent collaboration mechanism outperforms existing methods. Nevertheless, substantial room for improvement remains, with the best model attaining only a 46% success rate on our benchmark. We conclude with a thorough analysis of the current methods' limitations, setting the stage for future breakthroughs in this domain.

Related Material


[bibtex]
@InProceedings{Gao_2024_CVPR,
    author    = {Gao, Difei and Ji, Lei and Bai, Zechen and Ouyang, Mingyu and Li, Peiran and Mao, Dongxing and Wu, Qinchen and Zhang, Weichen and Wang, Peiyi and Guo, Xiangwu and Wang, Hengxu and Zhou, Luowei and Shou, Mike Zheng},
    title     = {AssistGUI: Task-Oriented PC Graphical User Interface Automation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {13289-13298}
}