Unlocking the World of OCR: Comprehensive Toolkit and Datasets

Do you often use WeChat to extract text from images? Photo-based question search, photo translation, document information extraction, and logistics information recognition are other everyday applications, and all of them rely on OCR technology.

As deep learning continues to advance, OCR algorithms and applications keep multiplying, and the demand for related data is growing with them.

This article introduces several open-source OCR toolkits and datasets to help developers working on text recognition.

Surya

Surya is a multilingual document OCR toolkit that accurately detects text lines. It currently supports over 90 languages, and table and chart detection are planned as upcoming features.

Open Source Address: https://github.com/VikParuchuri/surya

EasyOCR

EasyOCR is an OCR library written in Python that recognizes text in images and returns it as text. It supports over 80 languages and the common writing scripts.

Open Source Address: https://github.com/JaidedAI/EasyOCR
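
As a rough illustration of the workflow, here is a minimal sketch in Python. It assumes the `easyocr` package is installed and relies on `easyocr.Reader(...)` and `reader.readtext(...)`, which return a list of (bounding box, text, confidence) entries. The helper names `read_image` and `keep_confident`, the 0.5 threshold, and the sample values are hypothetical and only for demonstration.

```python
from typing import List, Sequence, Tuple

# One OCR result: (four [x, y] corner points, recognized text, confidence in [0, 1]).
OcrResult = Tuple[Sequence[Sequence[float]], str, float]


def keep_confident(results: List[OcrResult], min_conf: float = 0.5) -> List[str]:
    """Return the recognized strings whose confidence is at least min_conf."""
    return [text for _bbox, text, conf in results if conf >= min_conf]


def read_image(path: str, languages: Sequence[str] = ("ch_sim", "en")) -> List[OcrResult]:
    """Run EasyOCR on one image file.

    Requires `pip install easyocr`; model weights are downloaded on first use.
    """
    import easyocr  # imported lazily so keep_confident works without the package

    reader = easyocr.Reader(list(languages))
    return reader.readtext(path)


# Hand-written sample in the same shape as readtext() output (values are made up).
sample = [
    ([[0, 0], [40, 0], [40, 12], [0, 12]], "hello", 0.93),
    ([[0, 14], [40, 14], [40, 26], [0, 26]], "w0rld", 0.21),
]
print(keep_confident(sample))
```

With a real image, something like `keep_confident(read_image("photo.jpg"))` would return only the strings the model is reasonably sure about; the file name here is a placeholder.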

MMOCR

MMOCR is an open-source toolbox built on PyTorch and MMDetection. It focuses on text detection, text recognition, and downstream tasks such as key information extraction.

Open Source Address: https://github.com/open-mmlab/mmocr

PaddleOCR

PaddleOCR is an OCR toolkit based on PaddlePaddle. It features an ultra-lightweight Chinese OCR model of only 8.6M and supports recognition of mixed Chinese, English, and digits, as well as vertical text and long text. It also provides a range of training algorithms for text detection and recognition.

Open Source Address: https://github.com/PaddlePaddle/PaddleOCR

CnOCR

CnOCR is an OCR toolkit for Python 3 that recognizes Simplified Chinese, Traditional Chinese (partially), English, and digits, and it also supports vertical text. The toolkit ships with over 20 pre-trained models covering different application needs and is ready to use right after installation.

Open Source Address: https://github.com/breezedeus/CnOCR

COCO-Text V2.0

The COCO-Text dataset includes 63,686 images and 239,506 text instances. The instances cover handwritten and printed text, legible and illegible text, and English and non-English text.

Download Address: https://bgshih.github.io/cocotext/
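
The attribute pairs above (handwritten or printed, clear or unclear, English or non-English) are typically used to pick a subset of instances before training or evaluation. The sketch below shows that kind of filtering on a few made-up records. The field names (`class`, `legibility`, `language`, `utf8_string`) and their values are assumptions modeled on the annotation file, so check them against the JSON you actually download.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Made-up records shaped like COCO-Text annotations.
# Field names and values are assumptions; verify against the real file.
SAMPLE_ANNS: List[Dict] = [
    {"image_id": 1, "utf8_string": "STOP", "class": "machine printed",
     "legibility": "legible", "language": "english"},
    {"image_id": 1, "utf8_string": "", "class": "handwritten",
     "legibility": "illegible", "language": "na"},
    {"image_id": 2, "utf8_string": "出口", "class": "machine printed",
     "legibility": "legible", "language": "not english"},
]


def select_instances(anns: Iterable[Dict], wanted: Dict[str, str]) -> List[Dict]:
    """Keep annotations whose fields match every key/value pair in `wanted`."""
    return [a for a in anns if all(a.get(k) == v for k, v in wanted.items())]


def count_by(anns: Iterable[Dict], field: str) -> Counter:
    """Count annotations per value of one attribute field."""
    return Counter(a.get(field) for a in anns)


legible_english = select_instances(
    SAMPLE_ANNS, {"legibility": "legible", "language": "english"}
)
print([a["utf8_string"] for a in legible_english])
print(count_by(SAMPLE_ANNS, "class"))
```

Passing the filter as a dictionary avoids the clash between the `class` field and the Python keyword of the same name.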

SynthText in the Wild dataset

This synthetic dataset contains about 800,000 images and roughly 8 million synthetic word instances. Each text instance is annotated with its text string and with word-level and character-level bounding boxes.

Download Address: https://www.robots.ox.ac.uk/~vgg/data/scenetext/

Uber Text dataset

The Uber Text dataset contains street-level images collected from vehicle-mounted sensors, with ground truth annotated by a team of image analysts.

Features include:

Street images with their text area polygons and corresponding text descriptions
Includes 9 categories such as business names, street names, and street numbers
Contains over 110,000 images
Each image averages 4.84 text instances

Download Address: https://s3-us-west-2.amazonaws.com/uber-common-public/ubertext/index.html

Chinese Text Dataset in the Wild (CTW)

CTW is a large dataset of Chinese text in natural images, jointly released by Tsinghua University and Tencent. It contains 32,285 images, 1,018,402 Chinese characters, 3,850 character categories, and 6 attributes.

Download Address: https://ctwdataset.github.io/

MSRA Text Detection 500 Database (MSRA-TD500)

The MSRA-TD500 dataset contains 500 natural images taken with a pocket camera from indoor (offices and malls) and outdoor (streets) scenes. Indoor images mainly feature signs, doorplates, and warning signs, while outdoor images primarily showcase guide signs and billboards against complex backgrounds.

Download Address: http://www.iapr-tc11.org/mediawiki/index.php/MSRA_Text_Detection_500_Database_%28MSRA-TD500%29

Qudong Cloud provides high-performance computing resources that can process massive amounts of data quickly, giving text recognition algorithms solid support. It also offers thousands of datasets, including the text-related DocRED, which developers can load with a single click to speed up the development and testing of text recognition algorithms.
