Implementing OCR for Digit Recognition from 0 to 9 Using OpenCV


This article uses OpenCV to recognize the digits 0 to 9 and implement a simple OCR function: features are extracted through contour analysis, and digits are recognized by matching on L1 distance. With interference removed, recognition accuracy can exceed 98%. The algorithm has two parts. The first part extracts features that are scale-invariant and tolerate slight illumination changes and deformation. The second part matches on the feature data, comparing similarity to recognize the ten digits from 0 to 9.

Detailed Explanation of the First Part:

The first part of the algorithm extracts 42 features for each digit: 40 are used for matching, and the remaining two serve as auxiliary checks. For example, the aspect ratios of 0 and 1 differ significantly. The main feature-extraction steps are:

1. Image denoising and binarization
2. Contour discovery and ROI segmentation
3. Horizontal and vertical projection, yielding 20 normalized features
4. 5×4 grid segmentation, yielding 20 normalized features
5. Aspect ratio and blank ratio, for a total of 42 features
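The two auxiliary features from step 5 can be sketched as follows. This is a minimal illustration, not the article's original code: the `BinaryRoi` type, the `AuxFeatures` struct, and the `computeAuxFeatures` function are hypothetical names, and "blank ratio" is assumed here to mean the fraction of background pixels inside the ROI.

```cpp
#include <cassert>
#include <vector>

// Binary ROI: roi[y][x] != 0 marks a foreground (digit) pixel.
using BinaryRoi = std::vector<std::vector<unsigned char>>;

struct AuxFeatures {
    double aspectRatio;  // ROI width divided by ROI height
    double blankRatio;   // assumed definition: background pixels / all pixels
};

AuxFeatures computeAuxFeatures(const BinaryRoi& roi) {
    assert(!roi.empty() && !roi[0].empty());
    const double height = static_cast<double>(roi.size());
    const double width = static_cast<double>(roi[0].size());

    // Count foreground pixels over the whole ROI.
    double foreground = 0.0;
    for (const auto& row : roi)
        for (unsigned char px : row)
            if (px != 0) foreground += 1.0;

    return {width / height, 1.0 - foreground / (width * height)};
}
```

A narrow digit such as 1 gives a small aspect ratio, while 0 gives a larger one, which is the kind of auxiliary check the text mentions.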

Preprocessing applies Gaussian blur for denoising, then global thresholding for binarization. Contour discovery then extracts the rectangular ROI of each digit, and steps 3 to 5 are run on each ROI to extract its features.

The horizontal and vertical projections of the ROI are each divided into 10 bins. Because the bin length comes from a floating-point division, it may not be an integer, so a pixel that straddles a bin boundary is split between the neighboring bins in proportion, using weights. The foreground pixels in each bin are then counted.
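The weighted binning described above can be sketched as follows. The names `binProfile`, `normalizeToUnitSum`, and `projectionFeatures` are hypothetical, and the normalization (dividing each projection by its total so the bins sum to 1) is an assumption, since the article does not say which normalization it uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using BinaryRoi = std::vector<std::vector<unsigned char>>;

// Redistribute a per-pixel profile into `bins` equal-width bins.
// Pixel i covers [i, i+1); bin b covers [b*step, (b+1)*step), where
// step = n / bins need not be an integer. A pixel that straddles a bin
// boundary contributes to each bin in proportion to the overlap length.
std::vector<double> binProfile(const std::vector<double>& profile, std::size_t bins) {
    assert(!profile.empty() && bins > 0);
    const double step = static_cast<double>(profile.size()) / static_cast<double>(bins);
    std::vector<double> out(bins, 0.0);
    for (std::size_t b = 0; b < bins; ++b) {
        const double lo = b * step, hi = (b + 1) * step;
        for (std::size_t i = 0; i < profile.size(); ++i) {
            const double overlap = std::min(hi, i + 1.0) - std::max(lo, static_cast<double>(i));
            if (overlap > 0.0) out[b] += overlap * profile[i];
        }
    }
    return out;
}

// Assumed normalization: scale so that the values sum to 1.
void normalizeToUnitSum(std::vector<double>& v) {
    double total = 0.0;
    for (double x : v) total += x;
    if (total > 0.0)
        for (double& x : v) x /= total;
}

// 10 bins over the column counts followed by 10 bins over the row counts.
std::vector<double> projectionFeatures(const BinaryRoi& roi, std::size_t bins = 10) {
    assert(!roi.empty() && !roi[0].empty());
    std::vector<double> cols(roi[0].size(), 0.0), rows(roi.size(), 0.0);
    for (std::size_t y = 0; y < roi.size(); ++y)
        for (std::size_t x = 0; x < roi[y].size(); ++x)
            if (roi[y][x] != 0) { cols[x] += 1.0; rows[y] += 1.0; }

    std::vector<double> features = binProfile(cols, bins);
    std::vector<double> rowBins = binProfile(rows, bins);
    normalizeToUnitSum(features);
    normalizeToUnitSum(rowBins);
    features.insert(features.end(), rowBins.begin(), rowBins.end());
    return features;
}
```

For example, a 5-pixel profile split into 2 bins has a bin length of 2.5, so the middle pixel contributes half of its value to each bin.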

Similarly, the digit ROI is divided into a 5×4 grid, and the number of foreground pixels in each cell is counted, again with proportional weighting, yielding 20 normalized features.
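A sketch of the grid features, under the same caveats: `gridFeatures` and `overlap1d` are hypothetical names, the grid is assumed to be 5 rows by 4 columns, and the cells are assumed to be normalized by the total foreground count. Each foreground pixel is shared among the cells it overlaps, weighted by the overlap area.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using BinaryRoi = std::vector<std::vector<unsigned char>>;

// Overlap length between pixel [i, i+1) and cell [lo, hi), clamped at 0.
double overlap1d(std::size_t i, double lo, double hi) {
    return std::max(0.0, std::min(hi, i + 1.0) - std::max(lo, static_cast<double>(i)));
}

// Returns gridRows * gridCols values in row-major order.
// Assumptions: 5 rows x 4 columns, normalized by the foreground pixel count.
std::vector<double> gridFeatures(const BinaryRoi& roi,
                                 std::size_t gridRows = 5, std::size_t gridCols = 4) {
    assert(!roi.empty() && !roi[0].empty() && gridRows > 0 && gridCols > 0);
    const double cellH = static_cast<double>(roi.size()) / static_cast<double>(gridRows);
    const double cellW = static_cast<double>(roi[0].size()) / static_cast<double>(gridCols);

    std::vector<double> cells(gridRows * gridCols, 0.0);
    double total = 0.0;
    for (std::size_t y = 0; y < roi.size(); ++y) {
        for (std::size_t x = 0; x < roi[y].size(); ++x) {
            if (roi[y][x] == 0) continue;
            total += 1.0;
            for (std::size_t r = 0; r < gridRows; ++r) {
                const double wy = overlap1d(y, r * cellH, (r + 1) * cellH);
                if (wy == 0.0) continue;
                for (std::size_t c = 0; c < gridCols; ++c) {
                    const double wx = overlap1d(x, c * cellW, (c + 1) * cellW);
                    cells[r * gridCols + c] += wy * wx;  // overlap area as weight
                }
            }
        }
    }
    if (total > 0.0)
        for (double& v : cells) v /= total;
    return cells;
}
```

Because every cell is expressed as a fraction of the total foreground, the same shape drawn at a different size yields approximately the same 20 values.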

The resulting 40 features are scale-invariant and tolerate slight deformation.

Detailed Explanation of the Second Part:
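As summarized at the start of the article, the second part matches the extracted features by L1 distance. A minimal sketch of that idea follows, assuming one stored feature vector per training digit and a nearest-template decision; the names `DigitTemplate`, `l1Distance`, and `classifyDigit` are hypothetical, and the auxiliary aspect-ratio and blank-ratio checks are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One training sample: the digit label and its feature vector
// (40 values in the article; any equal length works here).
struct DigitTemplate {
    int digit;
    std::vector<double> features;
};

// L1 (Manhattan) distance: sum of absolute differences per component.
double l1Distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += std::fabs(a[i] - b[i]);
    return sum;
}

// Returns the label of the template closest to `features`,
// or -1 when there are no templates.
int classifyDigit(const std::vector<double>& features,
                  const std::vector<DigitTemplate>& templates) {
    int best = -1;
    double bestDistance = std::numeric_limits<double>::infinity();
    for (const auto& t : templates) {
        const double d = l1Distance(features, t.features);
        if (d < bestDistance) {
            bestDistance = d;
            best = t.digit;
        }
    }
    return best;
}
```

In practice a maximum acceptable distance could also be applied so that inputs far from every template are rejected rather than forced into one of the ten classes.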

Running Screenshot:

[Figures: training data, input data, recognition results]

Observational Conclusion

The training data and the input data differ in font and size, yet matching on the extracted features recognizes every digit, which demonstrates the scale invariance and resistance to local interference of this recognition algorithm.

Execution Code

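#include <opencv2/opencv.hpp>
#include <cstdio>

using namespace cv;

// Defined elsewhere in the project; their bodies are not shown in this article.
void train_data();
void test_data();

int main(int argc, char** argv) {
    // Load the input image and stop early if it cannot be read.
    Mat src = imread("D:/vcprojects/images/td1.png");
    if (src.empty()) {
        printf("could not load image...\n");
        return -1;
    }
    namedWindow("input image", WINDOW_AUTOSIZE);
    imshow("input image", src);

    // Training
    train_data();

    // Testing
    test_data();

    waitKey(0);
    return 0;
}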
