In this article, we will apply OCR on the selected region of an image using OpenCV. By the end of this article, we will be able to apply automatic orientation correction to the input image, select the region of interest, and apply OCR to the selected area.
This article is based on Python 3.x and assumes that Pytesseract and OpenCV are installed. Pytesseract is a Python wrapper library that uses the Tesseract engine for OCR. If the Tesseract engine is not installed, download and install it from https://github.com/UB-Mannheim/tesseract/wiki, set the TESSDATA_PREFIX environment variable, and add the Tesseract install directory to the PATH.
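If Tesseract is installed somewhere that is not on the PATH, Pytesseract can instead be pointed directly at the executable through its tesseract_cmd attribute. A minimal sketch, assuming the default Windows install location (the path below is an assumption; adjust it for your machine):
import pytesseract
# Assumed default Windows install path -- change to wherever tesseract.exe lives
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"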
Diving into the code, let’s start by importing the required libraries:
# Importing necessary libraries
import numpy as np
import cv2
import math
from scipy import ndimage
import pytesseract
Now, let’s read the image file into Python using OpenCV’s imread() method.
IMAGE_FILE_LOCATION = "test_image.jpg" # Photo by Amanda Jones on Unsplash
input_img = cv2.imread(IMAGE_FILE_LOCATION) # image read
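One caveat worth noting: imread() does not raise an exception when the file is missing or unreadable; it silently returns None. A small guard (our addition, not part of the original snippet) fails fast with a clear message:
if input_img is None:
    raise FileNotFoundError("Could not read image: " + IMAGE_FILE_LOCATION)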
Before extracting the region of interest, let's check the image's orientation: documents and photos are often rotated, which leads to poor OCR results. So we first correct the orientation of the input image.
Here, we apply two algorithms to detect the orientation of the input image: the Canny algorithm (for edge detection) and the probabilistic Hough transform, HoughLinesP (for line detection).
We then measure the angles of the lines and take the median of these angles to estimate the orientation angle. We will rotate the image by this median angle to convert it to the correct orientation for further steps.
Don’t worry, OpenCV can accomplish this task with just a few lines of code!
###################################################################################################### ORIENTATION CORRECTION/ADJUSTMENT
def orientation_correction(img, save_image=False):
    # Grayscale conversion for the Canny algorithm
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Canny algorithm for edge detection (developed by John F. Canny, not Kennedy!! :)
    img_edges = cv2.Canny(img_gray, 100, 100, apertureSize=3)
    # Using HoughLinesP to detect lines
    lines = cv2.HoughLinesP(img_edges, 1, math.pi / 180.0, 100, minLineLength=100, maxLineGap=5)
    if lines is None:  # no lines detected, nothing to correct
        return img
    # Finding the angle of each detected line
    angles = []
    for line in lines:
        x1, y1, x2, y2 = line[0]  # each line is returned as [[x1, y1, x2, y2]]
        angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)))
    # Getting the median angle
    median_angle = np.median(angles)
    # Rotating the image by this median angle
    img_rotated = ndimage.rotate(img, median_angle)
    if save_image:
        cv2.imwrite('orientation_corrected.jpg', img_rotated)
    return img_rotated
#####################################################################################################
img_rotated = orientation_correction(input_img)
The next step is to extract the region of interest from the image.
To do this, we first set up a mouse event listener so the user can select the region of interest, handling two events: left mouse button pressed and left mouse button released.
We store the starting coordinates when the left button is pressed and the ending coordinates when it is released. When the Enter key is pressed, we extract the area between these starting and ending coordinates; pressing "c" clears the current selection.
###################################################################################################### REGION OF INTEREST (ROI) SELECTION
# Initializing the list for storing the coordinates
coordinates = []

# Defining the event listener (callback function)
def shape_selection(event, x, y, flags, param):
    # Making coordinates global
    global coordinates
    # Storing the (x1, y1) coordinates when the left mouse button is pressed
    if event == cv2.EVENT_LBUTTONDOWN:
        coordinates = [(x, y)]
    # Storing the (x2, y2) coordinates when the left mouse button is released
    elif event == cv2.EVENT_LBUTTONUP:
        coordinates.append((x, y))
        # Drawing a rectangle around the region of interest (ROI)
        cv2.rectangle(image, coordinates[0], coordinates[1], (0, 0, 255), 2)
        cv2.imshow("image", image)

# Load the image, clone it, and set up the mouse callback function
image = img_rotated
image_copy = image.copy()
cv2.namedWindow("image")
cv2.setMouseCallback("image", shape_selection)

# Keep looping until the 'enter' key is pressed
while True:
    # Display the image and wait for a keypress
    cv2.imshow("image", image)
    key = cv2.waitKey(1) & 0xFF
    if key == 13:  # If 'enter' is pressed, apply OCR
        break
    if key == ord("c"):  # Clear the selection when 'c' is pressed
        image = image_copy.copy()
        coordinates = []

if len(coordinates) == 2:
    # Sorting the endpoints so the crop works no matter which direction was dragged
    (x1, y1), (x2, y2) = coordinates
    image_roi = image_copy[min(y1, y2):max(y1, y2), min(x1, x2):max(x1, x2)]
    cv2.imshow("Selected Region of Interest - Press any key to proceed", image_roi)
    cv2.waitKey(0)

# Closing all open windows
cv2.destroyAllWindows()
#####################################################################################################

(Figure: bounding box drawn around the selected region of interest)
Now, let's apply Optical Character Recognition (OCR) to the ROI using Pytesseract. (A cloud service such as Google Cloud Vision or Azure Computer Vision could be used in place of the Tesseract engine; a sketch of that variant appears after the output below.)
###################################################################################################### OPTICAL CHARACTER RECOGNITION (OCR) ON ROI
text = pytesseract.image_to_string(image_roi)
print("The text in the selected region is as follows:")
print(text)
The output is:
"Isn't it a greenhouse?"
Computer vision and optical character recognition can solve many problems in fields such as law (digitizing old court judgments), finance (extracting important information from loan agreements, land registration), etc.
GitHub code link:
https://github.com/ChayanBansal/roi_selection_and_ocr_with_orientation_correction