Edge-Aware Transformer for Scene Text Segmentation

Source: ZHUAN ZHI

This article presents the Edge-Aware Transformer (EAFormer) for more accurate text segmentation, especially at the text edges.

Scene text segmentation aims to extract text from scene images at the pixel level, and is often used to help generative models edit or remove text. Existing text segmentation methods tend to add various forms of text-related supervision to improve performance. However, most of them overlook text edges, which matter a great deal for downstream applications. To address this, the Edge-Aware Transformer (EAFormer) is designed to segment text more accurately, particularly at text edges.

Method

Text Edge Extractor

First, we design a text edge extractor that detects edges and filters out those belonging to non-text regions. What remains are the boundaries of the text itself, which provide useful information for the subsequent segmentation.
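As a rough illustration of the two steps above (detect edges, then discard those outside text regions), here is a minimal sketch in plain Python. The article does not specify the edge detector or how text regions are located, so the finite-difference detector, the `threshold` value, and the `text_boxes` input are all assumptions made for illustration, not EAFormer's actual implementation.

```python
def gradient_edges(gray, threshold):
    """Mark pixels whose local intensity change exceeds `threshold`.

    A simple finite-difference detector standing in for a real edge
    detector; `gray` is a 2D list of intensities.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences, clamped at the image border.
            dx = gray[y][min(x + 1, w - 1)] - gray[y][x]
            dy = gray[min(y + 1, h - 1)][x] - gray[y][x]
            if abs(dx) + abs(dy) > threshold:
                edges[y][x] = 1
    return edges


def keep_text_edges(edges, text_boxes):
    """Zero out edge pixels that fall outside every text box.

    `text_boxes` holds hypothetical (x0, y0, x1, y1) boxes, end-exclusive,
    such as a text detector might produce.
    """
    h, w = len(edges), len(edges[0])
    filtered = [[0] * w for _ in range(h)]
    for x0, y0, x1, y1 in text_boxes:
        for y in range(max(y0, 0), min(y1, h)):
            for x in range(max(x0, 0), min(x1, w)):
                filtered[y][x] = edges[y][x]
    return filtered


# Toy 4x6 image: a bright "text" stroke on the left,
# and a bright background object on the right.
image = [
    [0, 9, 0, 0, 0, 9],
    [0, 9, 0, 0, 0, 9],
    [0, 9, 0, 0, 0, 9],
    [0, 0, 0, 0, 0, 0],
]
all_edges = gradient_edges(image, threshold=5)
text_edges = keep_text_edges(all_edges, text_boxes=[(0, 0, 3, 4)])
```

In this toy case `all_edges` contains responses around both bright columns, while `text_edges` keeps only those inside the assumed text box on the left.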

Edge-Guided Encoder

Next, we propose an edge-guided encoder that makes the model focus more on text edges. By incorporating the extracted edge information, the encoder captures text regions more accurately, which improves segmentation precision.
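One simple way to picture "incorporating edge information" is to amplify feature responses wherever the edge map is active. The sketch below does exactly that in plain Python. The article does not describe the fusion rule used inside the encoder, so the multiplicative reweighting and the `alpha` gain here are hypothetical choices, shown only to make the idea concrete.

```python
def edge_guided_fusion(features, edge_map, alpha=1.0):
    """Amplify feature responses at edge pixels.

    features: list of C channels, each an HxW 2D list of floats.
    edge_map: HxW 2D list with values in [0, 1].
    alpha:    hypothetical gain controlling how strongly edges are emphasised.
    """
    fused = []
    for channel in features:
        fused.append([
            [value * (1.0 + alpha * edge) for value, edge in zip(f_row, e_row)]
            for f_row, e_row in zip(channel, edge_map)
        ])
    return fused


features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.5, 0.5], [0.5, 0.5]],  # channel 1
]
edge_map = [[1, 0], [0, 1]]
fused = edge_guided_fusion(features, edge_map, alpha=0.5)
```

Features at non-edge pixels pass through unchanged, while those at edge pixels are scaled by `1 + alpha`, so later layers see a stronger signal along text boundaries.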

MLP-Based Decoder

Finally, we use a multi-layer perceptron (MLP)-based decoder to predict text masks. The decoder converts the encoded features into per-pixel text masks, which form the final segmentation output.
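To make the idea of an MLP-based decoder concrete, the following plain-Python sketch upsamples multi-scale features to a common resolution, stacks them per pixel, and runs a small two-layer MLP to get a mask. The layer sizes, the nearest-neighbour upsampling, the hand-picked weights, and the `threshold` are all assumptions for illustration; the article does not give the decoder's actual configuration, and a real model would use a tensor library with learned weights.

```python
import math


def upsample_nearest(channel, factor):
    """Nearest-neighbour upsampling of one HxW channel by an integer factor."""
    out = []
    for row in channel:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def linear(vector, weights, bias):
    """Fully connected layer; each row of `weights` yields one output."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def mlp_decode(scales, w1, b1, w2, b2, threshold=0.5):
    """Predict a binary text mask from multi-scale features.

    scales: list of (channels, factor) pairs, where `factor` upsamples
            that scale to the target resolution.
    """
    # Bring every scale to the same resolution and stack the channels.
    stacked = [upsample_nearest(ch, factor)
               for channels, factor in scales for ch in channels]
    h, w = len(stacked[0]), len(stacked[0][0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            pixel = [ch[y][x] for ch in stacked]
            hidden = [max(0.0, v) for v in linear(pixel, w1, b1)]  # ReLU
            logit = linear(hidden, w2, b2)[0]
            prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            mask[y][x] = 1 if prob > threshold else 0
    return mask


# Toy input: one fine-scale channel (2x2) and one coarse channel (1x1).
fine = [[[1.0, -1.0], [-1.0, 1.0]]]
coarse = [[[0.5]]]
# Hand-picked weights, purely illustrative: 2 inputs -> 2 hidden -> 1 logit.
w1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[2.0, -1.0]]
b2 = [-0.5]
mask = mlp_decode([(fine, 1), (coarse, 2)], w1, b1, w2, b2)
```

Because the same MLP is applied independently at every pixel, the decoder stays lightweight; the spatial context has to come from the encoder features it receives.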

Experiments

We conducted extensive experiments on commonly used benchmark datasets to validate the effectiveness of EAFormer. The results show that the proposed method outperforms existing methods, especially on the segmentation of text edges. Because the annotations of several benchmark datasets (such as COCO_TS and MLT_S) are not accurate enough to evaluate our method fairly, we re-annotated these datasets. We observed that our method achieves a larger performance improvement when trained with the more accurate annotations. Code and datasets can be found at: https://hyangyu.github.io/EAFormer/
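The article does not say which metrics were used in these experiments. Foreground intersection-over-union (IoU) and F-score are common choices for comparing a predicted text mask with its annotation, so the sketch below assumes those two; the function name `mask_metrics` and the toy masks are hypothetical.

```python
def mask_metrics(pred, target):
    """Return (foreground IoU, F-score) for two binary masks of equal shape."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1  # text pixel correctly predicted
            elif p:
                fp += 1  # predicted text, annotated background
            elif t:
                fn += 1  # annotated text, missed by the prediction
    union = tp + fp + fn
    # Two empty masks agree perfectly, so score them as 1.0.
    iou = tp / union if union else 1.0
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 1.0
    return iou, f_score


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
iou, f_score = mask_metrics(pred, target)
```

Both metrics are computed against the annotation, which is why inaccurate annotations can penalize a prediction that is in fact closer to the true text boundary.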

Conclusion

This article proposes EAFormer, a new scene text segmentation method that improves segmentation accuracy, especially at text edges, by introducing an edge-aware mechanism. The experimental results validate the effectiveness of our method, particularly on the more accurately re-annotated datasets. Future work will focus on further optimizing the model structure and extending it to more practical application scenarios.
