Translator: Mr. Geek
This article is translated from Ivan Ozhiganov's article "Deep Dive Into OCR for Receipt Recognition", published on DZone. All copyrights, images, code, and data belong to the original author; the translation has been lightly adapted for localization.
Optical Character Recognition (OCR) technology is currently widely used in handwriting recognition, printed text recognition, and text image recognition. From document recognition, bank card, and ID recognition to advertisements and posters, the invention of OCR technology has greatly simplified the way we process data.
At the same time, the rapid development of machine learning and convolutional neural networks (CNNs) has brought a huge leap in text recognition. In this study, we will use CNNs to recognize paper receipts from retail stores. For demonstration purposes, we test only on Russian receipts.
Our goal is to develop a client that captures the relevant documents and a server side that recognizes and parses the data. Are you ready? Let's see how to do it together!
Preprocessing
First, we receive the image data and rotate it so the receipt is aligned both horizontally and vertically. Next, we use detection algorithms to locate the receipt in the image, and finally we binarize it for easier recognition.
Rotate Image to Recognize Receipts
We have three solutions to recognize receipts, which we will test below.
1. High-threshold adaptive binarization
2. Convolutional neural networks (CNN)
3. Haar feature classifier
Adaptive Binarization Technology
First, we see that the image contains the complete receipt, and the receipt has some contrast with the background. To recognize the data reliably, we first rotate the image so the receipt is vertically aligned.
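As an illustration of the alignment step, here is a minimal deskew sketch. It is not the authors' code: it assumes the receipt is the dominant foreground shape and estimates its orientation from the principal axis of the foreground pixels.

```python
import numpy as np
from scipy import ndimage

def deskew(binary):
    """Rotate a binary image so the receipt's long axis is vertical.

    Illustrative only: estimates orientation from the principal axis
    (PCA) of the foreground pixels, not the authors' exact method.
    """
    ys, xs = np.nonzero(binary)
    coords = np.stack([xs - xs.mean(), ys - ys.mean()])
    cov = coords @ coords.T / coords.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    vx, vy = eigvecs[:, np.argmax(eigvals)]   # dominant direction
    angle = np.degrees(np.arctan2(vx, vy))    # deviation from vertical
    return ndimage.rotate(binary.astype(float), -angle, order=0) > 0.5
```

In practice one would deskew the original grayscale image with the same angle; the binary mask is used only to estimate it.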
We use the adaptive threshold functions from OpenCV and scikit-image to process the receipt. Both retain white pixels in high-gradient areas and black pixels in low-gradient areas, giving a high-contrast sample image from which we can crop out the receipt.
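A minimal sketch of this high-threshold idea, assuming a grayscale NumPy image and using a local-mean filter as a stand-in for the exact OpenCV/scikit-image calls:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def receipt_mask(gray, block=25, offset=10):
    """High-threshold adaptive binarization, then a crop box.

    Sketch only: a pixel stays white if it is well above its local
    mean, so uniform background and uniform paper both go black and
    the receipt's bright edges outline the crop region.
    """
    local_mean = uniform_filter(gray.astype(float), size=block)
    edges = gray > local_mean + offset        # white only near strong edges
    ys, xs = np.nonzero(edges)
    return edges, (xs.min(), ys.min(), xs.max(), ys.max())
```

The returned box bounds the high-gradient pixels and can be used to crop the receipt out of the photo.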
Using Convolutional Neural Networks (CNN)
Initially, we decided to use a CNN to locate the receipt, as in our previous object-detection projects, training it to detect the receipt's corner key points. Although this solution worked, it performed worse than high-threshold detection and cropping. Because the CNN could only find the corner coordinates, and those corners varied widely, the model was not very accurate. For details, see the CNN test results below.
Using Haar Feature Classifier to Recognize Receipts
As a third option, we tried a Haar feature classifier. However, after a week of training the classifier and tuning its parameters, we obtained no positive results; even the CNN performed far better than Haar.
Binarization
Finally, we applied adaptive thresholding with OpenCV for binarization and obtained a clean image.
Text Detection
Next, we will introduce several different text detection components.
Detecting Text via Connected Components
First, we use OpenCV's findContours function to find the connected components of the text. Most of these components are individual characters, but some are noise left over from binarization; we filter the noise out by thresholding on component size.
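A sketch of the component filtering, using scipy.ndimage.label in place of OpenCV's findContours; the area thresholds are illustrative and data-dependent:

```python
import numpy as np
from scipy import ndimage

def character_components(binary, min_area=8, max_area=500):
    """Find connected components and keep only character-sized ones.

    Returns bounding boxes as (x, y, w, h); specks below min_area and
    blobs above max_area are treated as binarization noise.
    """
    labels, n = ndimage.label(binary)
    boxes = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        h = sl[0].stop - sl[0].start
        w = sl[1].stop - sl[1].start
        area = int((labels[sl] == i).sum())
        if min_area <= area <= max_area:      # drop noise, keep characters
            boxes.append((sl[1].start, sl[0].start, w, h))
    return boxes
```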
Then we run a merging algorithm that joins components belonging to a single character, such as Й, whose breve is a separate component, and then assembles characters into words: for each character we search for its nearest neighbours and choose the best pairing among several candidate letters.
Next, the words are assembled into lines: words whose vertical extents overlap are placed on the same line.
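The merging steps above can be sketched as a greedy chaining of character boxes; the gap threshold and the (x, y, w, h) box format are assumptions for illustration:

```python
def group_words(char_boxes, max_gap=4):
    """Greedily chain character boxes into words, left to right.

    A character joins an existing word when its horizontal gap to the
    word's last character is small and their vertical extents overlap;
    otherwise it starts a new word.
    """
    words = []
    for box in sorted(char_boxes):            # scan left to right
        x, y, w, h = box
        for word in words:
            wx, wy, ww, wh = word[-1]
            same_line = not (y + h <= wy or wy + wh <= y)
            if same_line and 0 <= x - (wx + ww) <= max_gap:
                word.append(box)
                break
        else:
            words.append([box])
    return words
```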
Of course, the drawback of this solution is that it cannot recognize noisy text.
Using Grids to Detect Text
We found that almost all receipts have text of the same width, so we managed to draw a grid on the receipts and used the grid to segment each character:
The grid immediately simplified the difficulty of receipt recognition. The neural network can accurately recognize characters within each grid. This solves the problem of noisy text. Ultimately, we can accurately count the number of texts.
We used the following algorithm to recognize the grid.
First, we apply the connected-components algorithm to the binarized image.
Then, taking the bottom-left corners of the connected components as anchor points, we fitted the grid to them using a two-dimensional periodic function.
The main idea behind correcting grid distortion is to use the peaks of this function to find the nonlinear geometric distortion; in other words, we search for the distortion parameters that maximize the function.
We used the RectBivariateSpline function from SciPy's interpolate module to parameterize the geometric distortion and optimized it with SciPy's optimization routines. The results are as follows:
In summary, this method proved slow and unstable, so we decided not to use it.
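For intuition only, the grid's horizontal period can be estimated far more cheaply than with the full spline fit. This stand-in (not the authors' method) takes the first strong peak of the autocorrelation of the column-sum profile as the column spacing:

```python
import numpy as np

def grid_pitch(binary):
    """Estimate the horizontal pitch of a character grid.

    Projects the binary image onto the x-axis, removes the mean, and
    returns the lag of the first positive local maximum of the
    profile's autocorrelation, i.e. the dominant column spacing.
    """
    profile = binary.sum(axis=0).astype(float)
    profile -= profile.mean()
    ac = np.correlate(profile, profile, mode="full")[len(profile) - 1:]
    for lag in range(1, len(ac) - 1):
        if ac[lag] >= ac[lag - 1] and ac[lag] > ac[lag + 1] and ac[lag] > 0:
            return lag
    return None
```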
Optical Character Recognition
We recognize the text found via connected components character by character, and we also recognize complete words.
Recognizing Text Found Through Connected Components
For text recognition, we use a convolutional neural network trained on fonts taken from receipts. The network outputs a probability for each candidate character; taking the most probable of the initial options already gave about 99% accuracy. We improved accuracy further with dictionary checks that eliminate confusions between similar characters, such as the errors caused by "З" and "Э".
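The dictionary step can be sketched as follows; the confusable-character table and the lexicon are illustrative assumptions, not the authors' data:

```python
from itertools import product

# Visually confusable Cyrillic/digit pairs (illustrative subset).
CONFUSABLE = {"З": "ЗЭ", "Э": "ЗЭ", "О": "О0", "0": "О0"}

def dictionary_correct(word, lexicon):
    """Resolve ambiguous characters by checking candidates in a lexicon.

    Tries every combination of confusable substitutions and returns
    the first candidate that is a known word; otherwise the input
    is returned unchanged.
    """
    if word in lexicon:
        return word
    options = [CONFUSABLE.get(ch, ch) for ch in word]
    for candidate in product(*options):
        cand = "".join(candidate)
        if cand in lexicon:
            return cand
    return word
```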
However, when it comes to noisy text, this method performs poorly.
Recognizing Complete Words
When the text is too noisy, individual-letter recognition fails and we need to recognize complete words instead. We used the following two methods to solve this problem:

- LSTM networks
- Image non-uniform segmentation techniques
LSTM Networks
For a deeper understanding, you can read articles on using neural networks to recognize text in sequences and on building language-independent OCR with neural networks. For recognition, we used the OCRopus library.
We used monospaced fonts to generate training samples.
After training, we tested our neural network using other data, and of course, the test results were very positive. Here are the data we obtained:
The trained neural network performed excellently on simple examples. We also tried it on harder cases where the grid approach was unsuitable.
We extracted relevant training samples and trained them through the neural network.
To avoid overfitting, we repeatedly stopped training, corrected the results, and fed the corrected data back in as new training samples. Finally, we obtained the following results:
The new network recognizes complex words well but performs worse on simple text.
We believe accuracy can be improved further by refining the network's recognition of individual characters.
Image Non-Uniform Segmentation Techniques
Since the receipt font is monospaced, we decided to segment words character by character. To do that, we first need to estimate the width of a single letter. Using an estimation function, we obtain the following image, from which we select the most common patterns to pick out specific letter widths.
Dividing a word's width by its number of letters gives an approximate character width, and hence a rough classification:
The best distinctions are:
The accuracy of this segmentation scheme is very high:
Of course, there are also situations where recognition is not very good:
After segmentation, we use CNN for recognition processing.
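The equal-width slicing described above can be sketched as follows, assuming a NumPy word image and a previously estimated character width:

```python
import numpy as np

def split_word(word_img, char_width):
    """Cut a word image into equal-width character slices.

    Assumes a monospaced font: the letter count is the word width
    divided by the estimated character width, rounded to the nearest
    integer, and the image is sliced at evenly spaced boundaries.
    """
    width = word_img.shape[1]
    n_chars = max(1, round(width / char_width))
    bounds = np.linspace(0, width, n_chars + 1).round().astype(int)
    return [word_img[:, a:b] for a, b in zip(bounds[:-1], bounds[1:])]
```

Each slice is then passed to the character CNN for recognition.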
Extracting Meaning from Receipts
We use regular expressions to extract purchase information from the receipts. All receipts share one trait: prices are written in the XX.XX format, so the purchase lines are easy to extract. The personal tax number is ten digits long and is also easily found with a regular expression. Similarly, NAME/SURNAME and other fields can be matched the same way.
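A minimal sketch of the extraction, using a made-up receipt string and illustrative patterns:

```python
import re

# Invented sample text for illustration; real receipts vary.
RECEIPT = """МАГАЗИН N5
ХЛЕБ            34.50
МОЛОКО          59.90
ИНН 1234567890
ИТОГ            94.40"""

# Prices follow the XX.XX pattern; the tax number is exactly 10 digits.
price_re = re.compile(r"\b\d+\.\d{2}\b")
inn_re = re.compile(r"\b\d{10}\b")

prices = [float(p) for p in price_re.findall(RECEIPT)]
inn = inn_re.search(RECEIPT).group()
```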
Conclusion
- No matter which method you choose, LSTM or a more complex alternative, there is no single wrong choice: some methods are harder to apply and others are quite simple, depending on the recognition task.
- We will continue to optimize this project. Currently, the system performs best in the absence of noise.
Original link: https://dzone.com/articles/using-ocr-for-receipt-recognition
Original author: Ivan Ozhiganov