I used to work with OpenCV, but my company now has an OCR project, and I implemented it with Halcon. There is plenty of material online about training OCR classifiers, but it can be overwhelming. Below is a practical implementation based on that material and on the current project.
First, we need to build a sample set of characters that maps character images to their text. The samples are cropped from real images. At this stage you can also normalize the character images to a common size; I did not do so here.
Image Extraction Code:
read_image (Image, 'E:/image/test.png')
* Convert the image to grayscale
rgb1_to_gray (Image, GrayImage)
* Enhance the contrast
scale_image (GrayImage, ImageScaled, 3.4, -323)
* Binarize the image
threshold (ImageScaled, Regions, 85, 255)
* Split into connected regions for later filtering
connection (Regions, ConnectedRegions)
* Filter by area, width and height to keep only the characters.
* Note: the second call overwrites the result of the first, so keep only the parameter set that fits your images.
select_shape (ConnectedRegions, SelectedRegions, ['area','width','height'], 'and', [187.62,24.2,29.46], [1031.79,50.47,58.84])
select_shape (ConnectedRegions, SelectedRegions, ['area','width','height'], 'and', [250,18.33,0], [2000,36.48,83.33])
* Combine the selected regions into one region and compute its angle in order to align the characters
union1 (SelectedRegions, RegionUnion)
* Dilate so that unconnected parts of one character merge into a single blob
dilation_circle (RegionUnion, RegionClosing, 6.5)
* Replace the union by its smallest enclosing rectangle of arbitrary orientation
shape_trans (RegionUnion, RegionTrans, 'rectangle2')
* Compute the orientation of that rectangle
orientation_region (RegionTrans, Phi)
* Compute the center of the region
area_center (RegionTrans, Area, Row, Column)
* Convert the angle to a rigid transformation matrix (rotation about the center)
vector_angle_to_rigid (Row, Column, Phi-(3.1415/2), Row, Column, rad(0), HomMat2D)
* Rotate the regions and the image according to the transformation matrix
affine_trans_region (RegionTrans, RegionAffineTrans, HomMat2D, 'nearest_neighbor')
affine_trans_region (SelectedRegions, SelectRegionAffineTrans, HomMat2D, 'nearest_neighbor')
affine_trans_image (Image, ImageAffineTrans, HomMat2D, 'constant', 'false')
* Group the parts that belong to the same character: one dilated blob per character, intersected with the character regions
connection (RegionClosing, ConnectedRegionClosing)
intersection (ConnectedRegionClosing, SelectRegionAffineTrans, RegionIntersection)
count_obj (RegionIntersection, Number)
count_obj (ConnectedRegionClosing, Number1)
* Iterate through each character region
for Index := 1 to Number by 1
    * Select the current region
    select_obj (RegionIntersection, SingleRegion, Index)
    * Restrict the transformed image to this region
    reduce_domain (ImageAffineTrans, SingleRegion, PartImage)
    * Crop to the smallest axis-parallel bounding rectangle (removes excess background)
    crop_domain (PartImage, CroppedImage)
    * Generate the file name, for example E:/Save/part_01.png
    Filename := 'E:/Save/' + 'part_' + (Index$'02d') + '.png'
    * Save the image in PNG format
    write_image (CroppedImage, 'png', 0, Filename)
endfor
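If you do want uniformly sized samples, one option is to rescale each cropped character before saving it. The sketch below would replace the write_image call inside the loop above; the target size of 32 x 40 pixels and the names TargetWidth, TargetHeight and CroppedScaled are illustrative assumptions, not values from the project.

```hdevelop
* Hypothetical target size; choose values that suit your characters
TargetWidth := 32
TargetHeight := 40
* Rescale the cropped character to the common size
zoom_image_size (CroppedImage, CroppedScaled, TargetWidth, TargetHeight, 'constant')
* Save the rescaled sample instead of the raw crop
write_image (CroppedScaled, 'png', 0, Filename)
```

Note that create_ocr_class_mlp zooms each character to the pattern size given by its first two parameters anyway, which is probably why unscaled samples also work.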
Note:
The code above handles a single image. In practice you usually batch-process all images in a folder. List the files with list_image_files:
list_image_files (FolderPath, ['png'], [], ImageFiles)
Then iterate over ImageFiles with a for loop.
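A minimal sketch of such a batch loop is shown here. It assumes the source images are in E:/image/ (the folder used in the example above); FileIndex and Prefix are names introduced only for this illustration.

```hdevelop
* Folder with the source images (assumed from the example path above)
FolderPath := 'E:/image/'
list_image_files (FolderPath, ['png'], [], ImageFiles)
for FileIndex := 0 to |ImageFiles| - 1 by 1
    read_image (Image, ImageFiles[FileIndex])
    * Per-image prefix so that samples from different images do not overwrite each other
    Prefix := 'E:/Save/img_' + (FileIndex$'03d') + '_part_'
    * Run the extraction steps shown above on Image here and
    * build each sample file name as Prefix + (Index$'02d') + '.png'
endfor
```

Including the image index in the file name matters because the single-image code always writes part_01.png, part_02.png and so on.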
Here are some of my output results (character sample set):

Next, we establish the correspondence between the character images and their text. The code below uses each file name (without the extension) as the label, so every sample in E:/Save/ must first be named after the character it shows, for example 1.png. The specific code is as follows:
* Define the folder path
FolderPath := 'E:/Save/'
* List all PNG files
list_image_files (FolderPath, ['png'], [], ImageFiles)
* Continue only if at least one file was found
if (|ImageFiles| > 0)
    * Iterate through the files
    for Index := 0 to |ImageFiles| - 1 by 1
        FileName := ImageFiles[Index]
        * The label is the part of the path between the last '/' and the last '.'
        last := strrchr(FileName, '/')
        dotPos := strrchr(FileName, '.')
        tuple_substr (FileName, last+1, dotPos-1, Substring)
        read_image (Image1, FileName)
        rgb1_to_gray (Image1, GrayImage1)
        threshold (GrayImage1, Region, 128, 255)
        * Append the character region and its label to the training file
        append_ocr_trainf (Region, GrayImage1, Substring, 'Myself')
    endfor
endif
Running this produces a .trf training file, which stores the one-to-one mapping between each character image and its label.
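Before training, it can be worth checking what actually ended up in the training file. The following sketch reads it back with read_ocr_trainf_names and read_ocr_trainf; StoredNames and NumSamples are names introduced here for illustration.

```hdevelop
* Class names and number of samples per class stored in the training file
read_ocr_trainf_names ('Myself', CharacterNames, CharacterCount)
* Read the stored character samples together with their labels
read_ocr_trainf (Characters, 'Myself', StoredNames)
* Total number of samples in the file
count_obj (Characters, NumSamples)
```

If CharacterNames contains entries such as part_01 instead of single characters, the sample files were not renamed before the training file was generated.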

Next, we start creating the OCR model based on the .trf file generated above.
* Read the class names from the generated trf file
read_ocr_trainf_names ('Myself', CharacterNames, CharacterCount)
* Create the OCR model
create_ocr_class_mlp (8, 10, 'constant', 'default', CharacterNames, 80, 'none', 10, 42, OCRHandle)
* Train the OCR model
trainf_ocr_class_mlp (OCRHandle, 'Myself.trf', 200, 1, 0.01, Error, ErrorLog)
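Training does not have to be repeated on every run. As a sketch, the trained classifier can be written to disk once and loaded in the recognition script; the file name Myself.omc is a hypothetical choice made for this example.

```hdevelop
* Save the trained classifier ('Myself.omc' is a hypothetical file name)
write_ocr_class_mlp (OCRHandle, 'Myself.omc')
* Later, in the recognition script, load it instead of retraining
read_ocr_class_mlp ('Myself.omc', OCRHandle)
```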
At this point, the model is created. Now we start processing the actual images.
The workflow is: read the image -> preprocess it -> segment the characters -> recognize each character -> output the results. When segmenting, watch out for characters made of several unconnected parts: the code below dilates the regions and uses the intersection operator so that parts belonging to the same character end up in one region.
The specific code is as follows:
read_image (Image, 'E:/image/input.png')
* Convert the image to grayscale
rgb1_to_gray (Image, GrayImage)
* Open a window of the same size as the image
get_image_size (GrayImage, Width, Height)
dev_open_window (0, 0, Width, Height, 'black', WindowHandle)
dev_display (Image)
* Enhance the contrast
scale_image (GrayImage, ImageScaled, 3.4, -323)
* Binarize the image
threshold (ImageScaled, Regions, 85, 255)
* Split into connected regions for later filtering
connection (Regions, ConnectedRegions)
* Filter by area, width and height to keep only the characters
select_shape (ConnectedRegions, SelectedRegions, ['area','width','height'], 'and', [187.62,24.2,29.46], [1031.79,50.47,58.84])
* Alternative parameter set (disabled):
* select_shape (ConnectedRegions, SelectedRegions, ['area','width','height'], 'and', [250,18.33,0], [2000,36.48,83.33])
* Combine the selected regions into one region and compute its angle in order to align the characters
union1 (SelectedRegions, RegionUnion)
* Dilate so that unconnected parts of one character merge into a single blob
dilation_circle (RegionUnion, RegionClosing, 6.5)
* Replace the union by its smallest enclosing rectangle of arbitrary orientation
shape_trans (RegionUnion, RegionTrans, 'rectangle2')
* Compute the orientation of that rectangle
orientation_region (RegionTrans, Phi)
* Compute the center of the region
area_center (RegionTrans, Area, Row, Column)
* Convert the angle to a rigid transformation matrix (rotation about the center)
vector_angle_to_rigid (Row, Column, Phi-(3.1415/2), Row, Column, rad(0), HomMat2D)
* Rotate the regions and the image according to the transformation matrix
affine_trans_region (RegionTrans, RegionAffineTrans, HomMat2D, 'nearest_neighbor')
affine_trans_region (SelectedRegions, SelectRegionAffineTrans, HomMat2D, 'nearest_neighbor')
affine_trans_image (Image, ImageAffineTrans, HomMat2D, 'constant', 'false')
* Group the parts that belong to the same character
connection (RegionClosing, ConnectedRegionClosing)
intersection (ConnectedRegionClosing, SelectRegionAffineTrans, RegionIntersection)
count_obj (RegionIntersection, Number)
count_obj (ConnectedRegionClosing, Number1)
* Iterate through each character region
for Index := 1 to Number by 1
    * Select the current region
    select_obj (RegionIntersection, SingleRegion, Index)
    * Classify the character.
    * Note: SingleRegion is built from the rotated regions, while GrayImage is the unrotated image.
    * If the rotation is not negligible, classify on a grayscale version of ImageAffineTrans instead.
    do_ocr_multi_class_mlp (SingleRegion, GrayImage, OCRHandle, Class, Confidence)
    * Display the result at the lower right corner of the character
    smallest_rectangle1 (SingleRegion, Row1, Column1, Row2, Column2)
    disp_message (WindowHandle, Class[0], 'window', Row2, Column2, 'black', 'true')
endfor
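The loop above displays each character separately. If the result is needed as one string, a possible approach is to sort the regions into reading order and classify them in a single call, as sketched below. The confidence threshold MinConfidence and the '?' placeholder are assumptions made for this illustration, not part of the project code.

```hdevelop
* Sort the character regions in reading order (left to right, line by line)
sort_region (RegionIntersection, SortedRegions, 'character', 'true', 'row')
* Classify all regions in one call; Class and Confidence hold one entry per region
do_ocr_multi_class_mlp (SortedRegions, GrayImage, OCRHandle, Class, Confidence)
* Hypothetical confidence threshold for flagging doubtful characters
MinConfidence := 0.8
Text := ''
for I := 0 to |Class| - 1 by 1
    if (Confidence[I] >= MinConfidence)
        Text := Text + Class[I]
    else
        * Mark low-confidence characters instead of trusting them
        Text := Text + '?'
    endif
endfor
```

Sorting first matters because the order of the regions returned by connection is not guaranteed to match the reading order.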
Results:


Complete Code:
* Steps 1 and 2 are commented out in this listing; uncomment them to regenerate the samples and the training file.
* ---------- Step 1: extract the character samples (commented out) ----------
* read_image (Image, 'E:/image/0100蓝膜相机.png')
* Convert the image to grayscale
* rgb1_to_gray (Image, GrayImage)
* Enhance the contrast
* scale_image (GrayImage, ImageScaled, 3.4, -323)
* Binarize the image
* threshold (ImageScaled, Regions, 85, 255)
* Split into connected regions for later filtering
* connection (Regions, ConnectedRegions)
* Filter by area, width and height to keep only the characters
* select_shape (ConnectedRegions, SelectedRegions, ['area','width','height'], 'and', [187.62,24.2,29.46], [1031.79,50.47,58.84])
* select_shape (ConnectedRegions, SelectedRegions, ['area','width','height'], 'and', [250,18.33,0], [2000,36.48,83.33])
* Combine the selected regions into one region and compute its angle in order to align the characters
* union1 (SelectedRegions, RegionUnion)
* dilation_circle (RegionUnion, RegionClosing, 6.5)
* Replace the union by its smallest enclosing rectangle of arbitrary orientation
* shape_trans (RegionUnion, RegionTrans, 'rectangle2')
* Compute the orientation of that rectangle
* orientation_region (RegionTrans, Phi)
* Compute the center of the region
* area_center (RegionTrans, Area, Row, Column)
* Convert the angle to a rigid transformation matrix
* vector_angle_to_rigid (Row, Column, Phi-(3.1415/2), Row, Column, rad(0), HomMat2D)
* Rotate the regions and the image according to the transformation matrix
* affine_trans_region (RegionTrans, RegionAffineTrans, HomMat2D, 'nearest_neighbor')
* affine_trans_region (SelectedRegions, SelectRegionAffineTrans, HomMat2D, 'nearest_neighbor')
* affine_trans_image (Image, ImageAffineTrans, HomMat2D, 'constant', 'false')
* connection (RegionClosing, ConnectedRegionClosing)
* intersection (ConnectedRegionClosing, SelectRegionAffineTrans, RegionIntersection)
* count_obj (RegionIntersection, Number)
* count_obj (ConnectedRegionClosing, Number1)
* Iterate through each character region
* for Index := 1 to Number by 1
*     Select the current region
*     select_obj (RegionIntersection, SingleRegion, Index)
*     Restrict the transformed image to this region
*     reduce_domain (ImageAffineTrans, SingleRegion, PartImage)
*     Crop to the smallest bounding rectangle (removes excess background)
*     crop_domain (PartImage, CroppedImage)
*     Generate the file name, for example E:/Save/part_01.png
*     Filename := 'E:/Save/' + 'part_' + (Index$'02d') + '.png'
*     Save the image in PNG format
*     write_image (CroppedImage, 'png', 0, Filename)
* endfor
* ---------- Step 2: generate the training file (commented out) ----------
* Define the folder path
* FolderPath := 'E:/Save/'
* List all PNG files
* list_image_files (FolderPath, ['png'], [], ImageFiles)
* Continue only if at least one file was found
* if (|ImageFiles| > 0)
*     Iterate through the files; the file name without extension is the label
*     for Index := 0 to |ImageFiles| - 1 by 1
*         FileName := ImageFiles[Index]
*         last := strrchr(FileName, '/')
*         dotPos := strrchr(FileName, '.')
*         tuple_substr (FileName, last+1, dotPos-1, Substring)
*         read_image (Image1, FileName)
*         rgb1_to_gray (Image1, GrayImage1)
*         threshold (GrayImage1, Region, 128, 255)
*         append_ocr_trainf (Region, GrayImage1, Substring, 'Myself')
*     endfor
* endif
* Close the window
dev_close_window ()
* ---------- Step 3: create and train the OCR model ----------
* Read the class names from the generated trf file
read_ocr_trainf_names ('Myself', CharacterNames, CharacterCount)
* Create the OCR model
create_ocr_class_mlp (8, 10, 'constant', 'default', CharacterNames, 80, 'none', 10, 42, OCRHandle)
* Train the OCR model
trainf_ocr_class_mlp (OCRHandle, 'Myself.trf', 200, 1, 0.01, Error, ErrorLog)
* Quick check on a single sample image
read_image (Image2, 'E:/Save/1.png')
rgb1_to_gray (Image2, GrayImage2)
threshold (GrayImage2, Region1, 128, 255)
* Start recognition
do_ocr_multi_class_mlp (Region1, GrayImage2, OCRHandle, Class, Confidence)
* ---------- Step 4: recognize the characters in the actual image ----------
read_image (Image, 'E:/image/0100蓝膜相机.png')
* Convert the image to grayscale
rgb1_to_gray (Image, GrayImage)
* Open a window of the same size as the image
get_image_size (GrayImage, Width, Height)
dev_open_window (0, 0, Width, Height, 'black', WindowHandle)
dev_display (Image)
* Enhance the contrast
scale_image (GrayImage, ImageScaled, 3.4, -323)
* Binarize the image
threshold (ImageScaled, Regions, 85, 255)
* Split into connected regions for later filtering
connection (Regions, ConnectedRegions)
* Filter by area, width and height to keep only the characters
select_shape (ConnectedRegions, SelectedRegions, ['area','width','height'], 'and', [187.62,24.2,29.46], [1031.79,50.47,58.84])
* Alternative parameter set (disabled):
* select_shape (ConnectedRegions, SelectedRegions, ['area','width','height'], 'and', [250,18.33,0], [2000,36.48,83.33])
* Combine the selected regions into one region and compute its angle in order to align the characters
union1 (SelectedRegions, RegionUnion)
dilation_circle (RegionUnion, RegionClosing, 6.5)
* Replace the union by its smallest enclosing rectangle of arbitrary orientation
shape_trans (RegionUnion, RegionTrans, 'rectangle2')
* Compute the orientation of that rectangle
orientation_region (RegionTrans, Phi)
* Compute the center of the region
area_center (RegionTrans, Area, Row, Column)
* Convert the angle to a rigid transformation matrix
vector_angle_to_rigid (Row, Column, Phi-(3.1415/2), Row, Column, rad(0), HomMat2D)
* Rotate the regions and the image according to the transformation matrix
affine_trans_region (RegionTrans, RegionAffineTrans, HomMat2D, 'nearest_neighbor')
affine_trans_region (SelectedRegions, SelectRegionAffineTrans, HomMat2D, 'nearest_neighbor')
affine_trans_image (Image, ImageAffineTrans, HomMat2D, 'constant', 'false')
* Group the parts that belong to the same character
connection (RegionClosing, ConnectedRegionClosing)
intersection (ConnectedRegionClosing, SelectRegionAffineTrans, RegionIntersection)
count_obj (RegionIntersection, Number)
count_obj (ConnectedRegionClosing, Number1)
* Iterate through each character region
for Index := 1 to Number by 1
    * Select the current region
    select_obj (RegionIntersection, SingleRegion, Index)
    * Classify the character
    do_ocr_multi_class_mlp (SingleRegion, GrayImage, OCRHandle, Class, Confidence)
    * Display the result at the lower right corner of the character
    smallest_rectangle1 (SingleRegion, Row1, Column1, Row2, Column2)
    disp_message (WindowHandle, Class[0], 'window', Row2, Column2, 'black', 'true')
endfor