Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction

Jinzhi Zheng, Heng Fan, Libo Zhang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 5957-5966

Abstract


Segmentation-based scene text detection algorithms, which are accurate to the pixel level, can detect scene text of arbitrary shape and have received widespread attention. On the one hand, due to the complexity and diversity of scene text, convolution with a fixed kernel size has limitations in extracting the visual features of scene text. On the other hand, most existing segmentation-based algorithms segment only the center of the text, losing information such as the edges and directions of the text, which limits detection accuracy. Some improved algorithms use iterative corrections or introduce multiple additional sources of information to improve text detection accuracy, but at the expense of efficiency. To address these issues, this paper proposes a simple and effective scene text detection method, Kernel Adaptive Convolution, which uses a Kernel Adaptive Convolution Module to detect scene text by predicting a distance map. Specifically, we first design an extensible kernel adaptive convolution module (KACM) that adaptively extracts visual features from multiple convolutions with different kernel sizes. Second, our method predicts the text distance map under the supervision of a priori information (including a direction map and a foreground segmentation map) and completes text detection from the predicted distance map. Experiments on four publicly available datasets demonstrate the effectiveness of our algorithm: on Total-Text and TD500, both its accuracy and efficiency outperform state-of-the-art algorithms, while on ArT and CTW1500 it improves efficiency with competitive accuracy.
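As a rough illustration of extracting features from multiple convolutions with different kernel sizes in an adaptive manner, the sketch below runs several branches over one feature map and fuses them with softmax weights. It is not the authors' KACM: the abstract does not specify the module's internals, so the learned convolutions are replaced here by fixed mean (box) filters, and the gating rule that scores each branch by its mean response is a hypothetical stand-in. All function names (`box_filter`, `softmax`, `adaptive_fuse`) are assumptions made for this example.

```python
import math


def box_filter(image, k):
    """Mean filter with an odd k x k window and zero padding.

    Stand-in for a learned k x k convolution (assumption for illustration).
    """
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    # Pixels outside the image contribute zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def adaptive_fuse(image, kernel_sizes=(1, 3, 5), scores=None):
    """Fuse one branch per kernel size using softmax weights.

    If no scores are given, each branch is scored by its mean response.
    This gating rule is hypothetical, not taken from the paper.
    """
    branches = [box_filter(image, k) for k in kernel_sizes]
    h, w = len(image), len(image[0])
    if scores is None:
        scores = [sum(map(sum, b)) / (h * w) for b in branches]
    weights = softmax(scores)
    fused = [
        [sum(wt * b[y][x] for wt, b in zip(weights, branches)) for x in range(w)]
        for y in range(h)
    ]
    return fused, weights


if __name__ == "__main__":
    img = [[1.0] * 5 for _ in range(5)]
    fused, weights = adaptive_fuse(img)
    print("branch weights:", weights)
    print("fused center value:", fused[2][2])
```

In a trained network the branch scores would come from learned parameters conditioned on the input rather than from a fixed statistic; the sketch only shows the fusion pattern, where the weights sum to one so the output stays on the same scale as the individual branches.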

Related Material


[bibtex]
@InProceedings{Zheng_2024_CVPR,
    author    = {Zheng, Jinzhi and Fan, Heng and Zhang, Libo},
    title     = {Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {5957-5966}
}