Multi-Branch Network with Ensemble Learning for Text Removal in the Wild
Scene text removal (STR) aims to replace text regions with visually realistic backgrounds. Because scene text is diverse and backgrounds are intricate, earlier STR approaches may fail to remove scene text completely. We observe that different networks produce different text removal results. We therefore present a novel STR approach built on a multi-branch network that erases the text entirely while maintaining the integrity of the background. The main branch preserves high-resolution texture information, while two sub-branches learn multi-scale semantic features. These complementary erasure networks are integrated through two ensemble learning fusion mechanisms: a feature-level fusion and an image-level fusion. Additionally, we propose a patch attention module that perceives text location and generates text attention features. Our method outperforms state-of-the-art approaches on both real-world and synthetic datasets, improving PSNR by 1.78 dB on the SCUT-EnsText dataset and by 4.45 dB on the SCUT-Syn dataset.
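The reported gains are stated in PSNR, so a short reference computation may help in reading them. The following is a minimal sketch of the standard PSNR definition, not the evaluation code used in the paper. The function name `psnr`, the flat-sequence input format, and the default peak value of 255 (8-bit images) are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    Both images are given as flat sequences of pixel values of equal
    length (an assumption of this sketch). max_value is the largest
    possible pixel value, 255 for 8-bit images.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("images must be non-empty and of equal size")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of d dB on a single image at a fixed peak value corresponds to dividing the MSE by 10^(d/10). A 1.78 dB gain therefore means roughly a one-third reduction in MSE.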