Using OpenCV and Tesseract for OCR on Image ROIs

In this article, we will apply OCR to a selected region of an image using OpenCV and Tesseract. By the end, we will be able to automatically correct the orientation of an input image, select a region of interest with the mouse, and run OCR on that region.

This article uses Python 3.x and assumes that Pytesseract and OpenCV are installed. Pytesseract is a Python wrapper library for the Tesseract OCR engine. If the Tesseract engine is not yet installed, download and install it from https://github.com/UB-Mannheim/tesseract/wiki, set the TESSDATA_PREFIX environment variable, and add the Tesseract install directory to the PATH.
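
If the Tesseract executable is not on the system PATH, Pytesseract can also be pointed at it explicitly. The snippet below is a minimal sketch; the path shown is the default install location of the UB-Mannheim build on Windows and may differ on your machine.

# Minimal setup check -- the executable path below is an assumption based on the
# default UB-Mannheim installer location on Windows; adjust it for your system.
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
print(pytesseract.get_tesseract_version())  # prints the engine version if everything is wired up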

Diving into the code, let’s start by importing the required libraries:

# Importing necessary libraries
import numpy as np
import cv2
import math
from scipy import ndimage
import pytesseract

Now, let’s read the image file into Python using OpenCV’s imread() method.

IMAGE_FILE_LOCATION = "test_image.jpg"  # Photo by Amanda Jones on Unsplash
input_img = cv2.imread(IMAGE_FILE_LOCATION)  # image read
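
One thing worth guarding against: imread() does not raise an error for a missing or unreadable file, it simply returns None. A quick check (not part of the original script) makes that failure obvious:

# cv2.imread() silently returns None if the file cannot be read
if input_img is None:
    raise FileNotFoundError("Could not read image: " + IMAGE_FILE_LOCATION)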

Before extracting the region of interest, let's check the image's orientation. Documents and photos are often slightly rotated, and an incorrect orientation leads to poor OCR results, so we first correct the orientation of the input image.

Here, we apply two algorithms to estimate the orientation of the input image: the Canny algorithm (for edge detection) and the probabilistic Hough transform, HoughLinesP (for line detection).

We then measure the angle of each detected line and take the median of these angles as an estimate of the image's orientation. Rotating the image by this median angle brings it into the correct orientation for the following steps.

Don’t worry, OpenCV can accomplish this task with just a few lines of code!

#####################################################################################################
# ORIENTATION CORRECTION / ADJUSTMENT
def orientation_correction(img, save_image=False):
    # Grayscale conversion for the Canny algorithm
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Canny algorithm for edge detection (developed by John F. Canny, not Kennedy!! :))
    img_edges = cv2.Canny(img_gray, 100, 100, apertureSize=3)
    # Using HoughLinesP to detect lines
    lines = cv2.HoughLinesP(img_edges, 1, math.pi / 180.0, 100, minLineLength=100, maxLineGap=5)
    # Finding the angle of each detected line
    angles = []
    for line in lines:
        x1, y1, x2, y2 = line[0]
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        angles.append(angle)
    # Getting the median angle
    median_angle = np.median(angles)
    # Rotating the image by this median angle
    img_rotated = ndimage.rotate(img, median_angle)
    if save_image:
        cv2.imwrite('orientation_corrected.jpg', img_rotated)
    return img_rotated
#####################################################################################################

img_rotated = orientation_correction(input_img)

Image after orientation adjustment

The next step is to extract the region of interest from the image.

To do this, we first set up a mouse event listener (a callback function) that lets the user select the region of interest. It handles two events: one for when the left mouse button is pressed and one for when it is released.

We store the starting coordinates when the left mouse button is pressed and the ending coordinates when it is released. When the Enter key is pressed, we extract the area between these starting and ending coordinates; pressing “c” clears the current selection.

#####################################################################################################
# REGION OF INTEREST (ROI) SELECTION
# Initializing the list for storing the coordinates
coordinates = []

# Defining the event listener (callback function)
def shape_selection(event, x, y, flags, param):
    # Making coordinates global
    global coordinates
    # Storing the (x1, y1) coordinates when the left mouse button is pressed
    if event == cv2.EVENT_LBUTTONDOWN:
        coordinates = [(x, y)]
    # Storing the (x2, y2) coordinates when the left mouse button is released
    elif event == cv2.EVENT_LBUTTONUP:
        coordinates.append((x, y))
        # Drawing a rectangle around the region of interest (ROI)
        cv2.rectangle(image, coordinates[0], coordinates[1], (0, 0, 255), 2)
        cv2.imshow("image", image)

# Load the image, clone it, and set up the mouse callback function
image = img_rotated
image_copy = image.copy()
cv2.namedWindow("image")
cv2.setMouseCallback("image", shape_selection)

# Keep looping until the Enter key is pressed
while True:
    # Display the image and wait for a keypress
    cv2.imshow("image", image)
    key = cv2.waitKey(1) & 0xFF
    if key == 13:  # If Enter is pressed, stop selecting and apply OCR
        break
    if key == ord("c"):  # Clear the selection when 'c' is pressed
        image = image_copy.copy()
        coordinates = []

# Extracting the selected region once two corner points have been recorded
if len(coordinates) == 2:
    image_roi = image_copy[coordinates[0][1]:coordinates[1][1],
                           coordinates[0][0]:coordinates[1][0]]
    cv2.imshow("Selected Region of Interest - Press any key to proceed", image_roi)
    cv2.waitKey(0)

# Closing all open windows
cv2.destroyAllWindows()
#####################################################################################################
Bounding box of the region of interest
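
One practical refinement worth noting (not part of the original script): the slice above assumes the rectangle was dragged from top-left to bottom-right, so dragging the other way yields an empty ROI. Sorting the recorded corner coordinates first makes the selection direction-independent:

# Hypothetical refinement: order the recorded corners so the ROI slice works
# no matter which direction the rectangle was dragged in.
(x1, y1), (x2, y2) = coordinates
x1, x2 = sorted((x1, x2))
y1, y2 = sorted((y1, y2))
image_roi = image_copy[y1:y2, x1:x2]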

Now, let’s apply Optical Character Recognition (OCR) using Pytesseract on the ROI. (Google Vision or Azure Vision can also be used instead of the Tesseract engine).

#####################################################################
# OPTICAL CHARACTER RECOGNITION (OCR) ON ROI
text = pytesseract.image_to_string(image_roi)
print("The text in the selected region is as follows:")
print(text)

The output is:

"Isn't it a greenhouse?"

Computer vision and optical character recognition can solve many problems in fields such as law (digitizing old court judgments) and finance (extracting key information from loan agreements and land registration records).

GitHub code link:

https://github.com/ChayanBansal/roi_selection_and_ocr_with_orientation_correction

