DA/T 77-2019 Optical Character Recognition Standards for Digital Copies of Paper Archives

Preface: This standard was drafted in accordance with the rules given in GB/T 1.1-2009. It was proposed by, and is under the jurisdiction of, the National Archives Administration. Drafting organizations: National Archives Administration, Qingdao Archives. Main drafters: Liu Yun, Ding Desheng, Yang Laiqing, Zou Jie. 1 Scope: This standard specifies the …

Open Source OCR Engine – 55,000 Stars!

Tesseract Open Source OCR Engine (main repository). GitHub address: https://github.com/tesseract-ocr/tesseract; official website: tesseract-ocr.github.io/. Tesseract is an open-source Optical Character Recognition (OCR) engine that can recognize and extract text from image files. It was developed by Ray Smith at Hewlett-Packard's Bristol Labs between 1985 and 1995. In 2005, HP open-sourced Tesseract, and it has …
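For readers who want to try the engine, here is a minimal sketch of calling the Tesseract command-line tool from Python. It assumes the `tesseract` binary and the needed language data (for example `eng.traineddata`) are installed and on `PATH`. The helper names `build_tesseract_cmd` and `ocr_image` are hypothetical. Passing `stdout` as the output base makes the CLI print the recognized text instead of writing a `.txt` file.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path: str, lang: str = "eng") -> list[str]:
    """Build a tesseract CLI call that prints the recognized text to stdout.

    "stdout" as the output base tells tesseract not to create a .txt file;
    -l selects the language model(s), e.g. "eng" or "eng+chi_sim".
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on a single image and return the extracted text.

    Assumes the tesseract executable and the requested traineddata files
    are installed; raises if the binary cannot be found or OCR fails.
    """
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, lang),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `ocr_image("scan.png", "eng+chi_sim")` would recognize mixed English and Simplified Chinese text, provided `chi_sim.traineddata` is installed. Calling the CLI directly keeps the sketch free of third-party packages. Wrappers such as pytesseract also work by invoking the same binary.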

Exploring the Infinite Possibilities of OCR Technology

Hello, everyone! I'm Daodao Jun. As technology develops, the demand for text recognition keeps growing. Traditionally, getting text out of an image meant typing it in by hand, which is time-consuming, labor-intensive, and error-prone. OCR technology changes that: it lets us extract text from images quickly and accurately. Today, Daodao Jun is here …

Open Source Offline OCR Software Umi-OCR

Most people are already familiar with OCR. There are many OCR apps for both phones and computers, and I have recommended quite a few of them before. On the desktop, plenty of programs support OCR, including well-known ones such as Adobe Acrobat DC and …

Surya: An OCR Framework Better Than EasyOCR

Project introduction: Surya is a document OCR toolkit with the following features: OCR in over 90 languages, which the project reports outperforms cloud services in benchmark tests; line-level text detection for any language; layout analysis (detection of tables, images, headings, etc.); and reading order detection. It is suitable for a range of documents (see usage and benchmarks for more …
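To make "reading order detection" concrete, below is a naive geometric heuristic. It is not Surya's method. It takes detected text-line boxes, groups boxes that overlap vertically into rows, and reads rows top-to-bottom and boxes left-to-right. The `Box` type, the function name and the `line_tol` threshold are hypothetical. A rule like this breaks on multi-column pages, sidebars and tables, and those are the cases where a dedicated reading-order component earns its keep.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A detected text line: (x0, y0) top-left, (x1, y1) bottom-right, in pixels."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def naive_reading_order(boxes: list[Box], line_tol: float = 0.5) -> list[Box]:
    """Order boxes top-to-bottom, then left-to-right within each row.

    A box joins an existing row when its vertical overlap with the row's first
    box is at least `line_tol` times the height of the shorter of the two.
    """
    rows: list[list[Box]] = []
    # Visiting boxes by top edge means rows are created in top-to-bottom order.
    for box in sorted(boxes, key=lambda b: b.y0):
        for row in rows:
            ref = row[0]
            overlap = min(ref.y1, box.y1) - max(ref.y0, box.y0)
            shorter = min(ref.y1 - ref.y0, box.y1 - box.y0)
            if shorter > 0 and overlap >= line_tol * shorter:
                row.append(box)
                break
        else:
            rows.append([box])  # no matching row: start a new one
    ordered: list[Box] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b.x0))  # left-to-right in a row
    return ordered


if __name__ == "__main__":
    demo = [
        Box("world", 60, 10, 110, 30),
        Box("Hello", 0, 12, 50, 30),
        Box("Next line", 0, 50, 90, 70),
    ]
    print([b.text for b in naive_reading_order(demo)])  # ['Hello', 'world', 'Next line']
```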

PaddleOCR v2: 7% Improvement in Accuracy, 220% Speed Boost

1. Introduction: Engineers in the OCR field have almost certainly heard of the PaddleOCR project, whose flagship PP-OCR algorithm has been widely adopted by developers in China and abroad. In just half a year, the project's total star count has exceeded …

Efficient Open Source OCR Tool: Introduction and Usage of Surya-OCR

Background: Optical Character Recognition (OCR) is a fundamental technology in many enterprise applications. In this article, we take a closer look at Surya-OCR, a solution that has recently become popular. Text detection and extraction are crucial in many business use cases. For example: In …

Next-Gen RAG Engine Based on OCR and Document Parsing

Introduction: It is an open-source RAG (Retrieval-Augmented Generation) engine built on deep document understanding. It provides a streamlined RAG workflow for businesses and individuals of any scale, using large language models (LLMs) to handle users' data in diverse, complex formats and to deliver reliable question answering backed by well-founded citations. Its main features include: 1. Deep Document Understanding: Capable …
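The "Q&A with well-founded citations" idea comes down to two steps. First, retrieve the document chunks most relevant to a question. Then hand them to the LLM with source tags it can cite. The sketch below shows only that skeleton, under loose assumptions. It swaps in plain keyword-overlap scoring where a real engine would use embeddings or BM25, and it assumes the chunks have already been extracted by OCR and document parsing. The names `retrieve` and `build_prompt` and the `[id]` citation format are hypothetical and not taken from the engine described above.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, chunks: dict[str, str], k: int = 2) -> list[tuple[str, int]]:
    """Rank chunks by how many query-term occurrences they contain.

    Returns up to k (chunk_id, score) pairs with a non-zero score, best first;
    ties are broken by chunk_id so the order is deterministic.
    """
    q_terms = set(tokenize(query))
    scored = []
    for chunk_id, text in chunks.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in q_terms)
        if score > 0:
            scored.append((chunk_id, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def build_prompt(query: str, chunks: dict[str, str], k: int = 2) -> str:
    """Assemble an LLM prompt whose context lines carry [chunk_id] citation tags."""
    hits = retrieve(query, chunks, k)
    context = "\n".join(f"[{cid}] {chunks[cid]}" for cid, _ in hits)
    return (
        "Answer using only the context below and cite sources by their [id].\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

Because each context line keeps its chunk id, an answer such as "The total is 500 yuan [p1]" can be traced back to the exact passage (and, in a full system, to the page region OCR read it from).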

How Far Is AI from Practical Automation?

The name "artificial intelligence" suggests two characteristics: automation and intelligence. Judged on these two criteria, today's AI is not yet intelligent enough, and so it cannot truly deliver automation in practice. For example, a private enterprise has to process bank reconciliation statements: each bank formats its statements differently, and there are a great many of them, which creates a significant workload! …

Implementing OCR Recognition Using Halcon

I previously worked with OpenCV, but my company now has an OCR project, which I implemented using Halcon. There are plenty of OCR tutorials online, but the sheer volume can be overwhelming. Below is a practical implementation based on those materials and the current project. First, we need to create a sample …
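Creating samples is the first step of any sample-based OCR classifier. You collect labelled character images, train on them, then classify new glyphs by comparing them against what was learned. Halcon provides its own operators for this, and they are not reproduced here. As a library-free illustration of the same idea, the sketch below swaps in a k-nearest-neighbour classifier over tiny 3x3 bitmaps written as strings (`#` = ink, `.` = background). The sample data and the names `hamming`, `classify` and `SAMPLES` are hypothetical.

```python
from collections import Counter

# Hypothetical training samples: 3x3 glyphs flattened row by row, with labels.
SAMPLES: list[tuple[str, str]] = [
    (".#." ".#." ".#.", "1"),
    ("###" "#.#" "###", "0"),
    ("###" "..#" "..#", "7"),
]


def hamming(a: str, b: str) -> int:
    """Count the pixels that differ between two equal-size bitmaps."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same size")
    return sum(pa != pb for pa, pb in zip(a, b))


def classify(glyph: str, samples: list[tuple[str, str]], k: int = 1) -> str:
    """Label a glyph by majority vote among its k nearest training samples."""
    nearest = sorted(samples, key=lambda s: hamming(glyph, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    noisy_one = ".#." ".#." ".##"  # a "1" with one stray pixel
    print(classify(noisy_one, SAMPLES))  # 1
```

The quality of the result depends almost entirely on the samples. More, and more varied, examples per character (different fonts, noise, stroke widths) matter more than the choice of classifier, which is why sample creation comes first.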