Super Simple Implementation of OCR Text Recognition Application

1. Introduction

Recently, I ran into a small problem: I needed to extract text printed on a number of images. Several of the images contained hundreds of characters each, so typing them out by hand seemed wasteful. However, the software I found either produced so many typos that correcting the output was more trouble than typing the text directly, or produced output that looked poor. Writing a recognizer myself was not realistic either: from data collection to model training to usable output, it would likely take at least half a year, and the result might have no fewer typos than the existing tools… So I decided to check whether any cloud computing platforms offer “affordable” APIs that let individual developers achieve similar functionality.

In fact, the common cloud computing platforms do offer such features, including the three major domestic players known as BAT (Baidu, Alibaba, and Tencent). This time we will take Baidu Cloud Platform as an example, using Python 3.6.

2. Library and ID Configuration Steps

First, open the official website of Baidu Cloud (just search for it on Baidu). From the homepage, go to Products – Artificial Intelligence and find the text recognition option.

Then select “Use Now” on the pop-up page.

Here, we need a Baidu account to use Baidu Cloud Platform’s features. If you have used a Baidu forum before, you can log in with that account. After logging in and entering the “Text Recognition” interface, we can see:

Choose to create an application, select the features you need, and give the application a name and category.

For example, I created an application called Text Recognition, and I can enter the application management interface to see:

Here we need the three pieces of information highlighted in red: AppID, API Key, and Secret Key. Baidu Cloud Platform uses them to identify and authenticate the requests we send.
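Since these three values identify your application, you may prefer not to paste them directly into a script. The following is only a minimal sketch of one way to do that, reading them from environment variables. The variable names and the load_credentials helper are hypothetical and are not part of Baidu's library.

```python
import os

# Hypothetical variable names; pick whatever suits your own setup.
REQUIRED_VARS = ("BAIDU_OCR_APP_ID", "BAIDU_OCR_API_KEY", "BAIDU_OCR_SECRET_KEY")

def load_credentials(env=os.environ):
    """Return (app_id, api_key, secret_key) read from the given mapping."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)
```

The returned tuple can be unpacked into three strings and used wherever the script in section 3 expects app_id, api_key, and secret_key.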

Next, we install the third-party library baidu-aip in Python 3.6:

pip install baidu-aip

Thus, we have completed the environment configuration for “Text Recognition”.

3. Script Writing

Writing the corresponding tool script is very simple. First, let’s look at our test set: two images containing text, placed under data/ImageForOcr in the script’s working directory.

Now let’s write our Python script, starting with importing two libraries:

from aip import AipOcr
import os

aip is the module provided by the baidu-aip package we just installed.

Next, let’s write a function to read images:

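dir = os.path.join('data', 'ImageForOcr')  # folder that holds the test images

def read_image(path):
    # Join the folder and the file name, then return the image's raw bytes
    full_path = os.path.join(dir, path)
    print(full_path)
    with open(full_path, 'rb') as f:
        image = f.read()
    return image

The path parameter is the file name of the image we want to read. Next, we write the main program: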

api_key='**************'
app_id='**********'
secret_key='********************'
client=AipOcr(app_id,api_key,secret_key)
fs=os.listdir(dir)
with open('output.txt', 'w', encoding='utf-8') as file:
    for image in fs:
        i = read_image(image)
        inf = client.basicGeneral(i)  # General Text Recognition
        # Each entry of words_result holds one recognized line of text
        for response in inf.get('words_result', []):
            file.write(response['words'])
        print(inf)

The os module from Python’s standard library provides the listdir function, which returns a list of the names of all files and folders in the specified directory. Here, we use it to iterate over all images in the data/ImageForOcr directory, read each one, and call Baidu Cloud’s API for recognition.
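Because listdir returns every entry, including subfolders and non-image files, it can help to filter the list before sending anything to the API. The following is a minimal sketch of that idea; the list_images helper and the set of extensions are assumptions for illustration and not part of the original script.

```python
import os

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp'}  # assumed set of formats

def list_images(folder):
    """Return the sorted names of files in folder that look like images."""
    names = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        # Skip subfolders and files with other extensions
        if os.path.isfile(full_path) and ext in IMAGE_EXTENSIONS:
            names.append(name)
    return sorted(names)
```

Sorting the names also makes the order of the text written to the output file predictable from run to run.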

The values of api_key, app_id, and secret_key are the three pieces of information we obtained when creating the application earlier, all given as strings. AipOcr is a class provided by aip; calling it with these three values creates a client object, and basicGeneral is the method of that object that performs text recognition. It corresponds to the “General Text Recognition” feature of Baidu Cloud: it takes the image content as a parameter and returns the recognized text. The service replies in JSON, which the library hands back to Python as a dict, with content like the following:
{
    'log_id': 9021892210976551911,
    'words_result_num': 14,
    'words_result':[
        {'words':"xxxxxxxxxxx"},
        {'words':"xxxxxxxxxxx"},
        ...
    ]
}
However, a request can also fail, in which case the returned dict contains error fields such as error_code and error_msg instead of the recognition result.
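To handle both cases in one place, the response can be checked before its text is used. The sketch below assumes that a failed call returns a dict containing error_code and error_msg keys; extract_text is a hypothetical helper, not a function of the aip library.

```python
def extract_text(result):
    """Join the recognized lines of an OCR response dict, or raise on error."""
    # Assumption: failed requests carry 'error_code' (and usually 'error_msg')
    if 'error_code' in result:
        raise RuntimeError('OCR failed: {} ({})'.format(
            result.get('error_msg', 'unknown error'), result['error_code']))
    return '\n'.join(item['words'] for item in result.get('words_result', []))
```

Unlike the script above, this version separates the recognized lines with newlines, which tends to make the output file easier to read.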

(For more details, refer to the official technical documentation from Baidu Cloud)

For our two images, the console output is as follows:

Furthermore, we created a file named output.txt in the script working directory, which records the content we obtained from the recognized images, with almost no typos:

Thus, the script is complete. Those who need the source code can visit our GitHub (though such a short script may not need it much).

Some readers may wonder whether the layout of the text in the original image can be preserved after recognition. This can be done by changing basicGeneral to general, whose result also includes the position and size of each piece of text. This method corresponds to the “General Text Recognition (with Position Information)” feature.
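As a rough illustration of how position information could be used, the sketch below groups recognized items into rows by their vertical position and orders each row from left to right. It assumes each entry of words_result carries a location dict with left and top pixel values; please check the official documentation for the exact fields. The words_to_rows helper and the tolerance value are hypothetical.

```python
def words_to_rows(words_result, tolerance=10):
    """Group recognized items into rows by 'top', each row ordered by 'left'."""
    items = sorted(words_result, key=lambda w: w['location']['top'])
    rows = []  # list of (top of first item in row, [items])
    for item in items:
        top = item['location']['top']
        # Items whose top is within `tolerance` pixels share a row
        if rows and abs(top - rows[-1][0]) <= tolerance:
            rows[-1][1].append(item)
        else:
            rows.append((top, [item]))
    lines = []
    for _, row in rows:
        row.sort(key=lambda w: w['location']['left'])
        lines.append(' '.join(w['words'] for w in row))
    return lines
```

This only approximates the layout: it restores line breaks and reading order, not fonts or exact spacing.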

Additionally, the API calls provided by Baidu Cloud are not free (as expected). However, at least at the time of writing, most of them come with a free quota. The text recognition API described in this article, for example, allows 500 free calls per day for the general version, subject to a QPS (queries per second) limit. For a personal tool, this is entirely sufficient.
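When processing many images in a loop, one simple way to stay under a QPS limit is to space the requests out on the client side. The following is a minimal sketch; the Throttle class is hypothetical, and the limit you pass in should be whatever your account's quota page actually shows.

```python
import time

class Throttle:
    """Space out calls so that at most max_per_second happen each second."""

    def __init__(self, max_per_second):
        self.min_interval = 1.0 / max_per_second
        self.last_call = None

    def wait(self):
        # Sleep just long enough to keep min_interval between two calls
        now = time.monotonic()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                time.sleep(remaining)
        self.last_call = time.monotonic()
```

Calling wait() immediately before each recognition request in the main loop would keep the request rate at or below the chosen value.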

4. Conclusion

So, is that all there is to this article? … Yes~ Similar functionality can also be achieved with the APIs of Aliyun, Tencent Cloud, or Google Cloud, and for more in-depth usage their official technical documentation will certainly be much more detailed. The main aim here is to introduce everyone (including myself) to some of the AI platforms that have emerged in recent years.

We can also see that, as the field develops, the stages of artificial intelligence work are gradually separating: designing optimization algorithms with the tools of applied mathematics, training models with existing algorithms, and applying excellent models that have already been trained. Some jokingly call these the three professions of artificial intelligence: master, alchemist, and package-caller (a humorous label, of course). In this way, the artificial intelligence industry is gradually taking shape.
