ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention

Jiawei Wang, Changjian Li; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 3679-3688

Abstract


Sketch semantic segmentation is a well-explored and pivotal problem in computer vision involving the assignment of predefined part labels to individual strokes. This paper presents ContextSeg - a simple yet highly effective approach to tackling this problem with two stages. In the first stage to better encode the shape and positional information of strokes we propose to predict an extra dense distance field in an autoencoder network to reinforce structural information learning. In the second stage we treat an entire stroke as a single entity and label a group of strokes within the same semantic part using an autoregressive Transformer with the default attention mechanism. By group-based labeling our method can fully leverage the context information when making decisions for the remaining groups of strokes. Our method achieves the best segmentation accuracy compared with state-of-the-art approaches on two representative datasets and has been extensively evaluated demonstrating its superior performance. Additionally we offer insights into solving part imbalance in training data and the preliminary experiment on cross-category training which can inspire future research in this field.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Wang_2024_CVPR, author = {Wang, Jiawei and Li, Changjian}, title = {ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2024}, pages = {3679-3688} }