Unsupervised Graphic Layout Grouping With Transformers

Jialiang Zhu, Danqing Huang, Chunyu Wang, Mingxi Cheng, Ji Li, Han Hu, Xin Geng, Baining Guo; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 1031-1040

Abstract


Graphic design conveys messages through the combination of text, images, and other visual elements. Unstructured designs such as overloaded social media graphics may fail to communicate their intended messages effectively. To address this issue, layout grouping offers a solution by organizing design elements into perceptual groups. While most methods rely on heuristic Gestalt principles, they often lack the context modeling ability needed to handle complex layouts. In this work, we reformulate the layout grouping task as a set prediction problem. Our model uses Transformers to learn a set of group tokens at various hierarchies, enabling it to reason about the membership of the elements more effectively. The self-attention mechanism in Transformers boosts its context modeling ability, allowing it to handle complex layouts more accurately. To reduce annotation costs, we also propose an unsupervised learning strategy that pre-trains on noisy pseudo-labels induced by a novel heuristic algorithm. This approach then bootstraps to self-refine the noisy labels, further improving the accuracy of our model. Our extensive experiments demonstrate the effectiveness of our method, which outperforms existing state-of-the-art approaches in terms of accuracy and efficiency.
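The core idea of grouping via learned group tokens can be illustrated with a minimal sketch. This is not the paper's implementation: it is a toy, hypothetical example in plain Python in which each "group token" scores design elements with scaled dot-product attention, and every element is assigned to the group token that weights it most strongly. All vector dimensions and names here are assumptions for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def group_elements(elements, group_tokens):
    """Toy group assignment (NOT the paper's model).

    elements:     list of N d-dimensional element embeddings
    group_tokens: list of K d-dimensional learned group-token embeddings
    Returns a list of N group indices in [0, K).
    """
    d = len(elements[0])
    assignments = []
    for e in elements:
        # Scaled dot-product score of each group token against this element.
        scores = [dot(g, e) / math.sqrt(d) for g in group_tokens]
        weights = softmax(scores)
        # Hard assignment: the group token attending most strongly wins.
        assignments.append(weights.index(max(weights)))
    return assignments

# Two toy "groups": elements near [1, 0] vs. elements near [0, 1].
elements = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
group_tokens = [[1.0, 0.0], [0.0, 1.0]]
print(group_elements(elements, group_tokens))  # [0, 0, 1, 1]
```

In the full model, the group-token embeddings would be learned end-to-end and interact with elements through multiple Transformer attention layers rather than a single dot product.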

Related Material


[bibtex]
@InProceedings{Zhu_2024_WACV,
  author    = {Zhu, Jialiang and Huang, Danqing and Wang, Chunyu and Cheng, Mingxi and Li, Ji and Hu, Han and Geng, Xin and Guo, Baining},
  title     = {Unsupervised Graphic Layout Grouping With Transformers},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  month     = {January},
  year      = {2024},
  pages     = {1031-1040}
}