Introduction to Basic Tasks in Computer Vision

If you plan to work with images or videos, computer vision is worth considering. Computer vision (CV) is a branch of artificial intelligence (AI) that enables computers to extract meaningful information from images, videos, and other visual inputs and to act on it. Applications include autonomous vehicles, automated traffic management, surveillance, and image-based quality inspection.

What is OpenCV?

OpenCV is an open-source library aimed primarily at computer vision. It provides the tools needed for most CV tasks. “Open” stands for open source, and “CV” stands for computer vision.

What Will I Learn?

This article covers what you need to get started with computer vision using the OpenCV library: reading and displaying images, drawing on them, merging and transforming them, preprocessing, and feature detection. All the code and data are here: https://www.kaggle.com/sonukiller99/open-cv-for-beginners

Reading and Displaying Images

First, let’s understand how to read and display an image, which is the foundational knowledge of CV.

Reading an image:

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img=cv.imread('../input/images-for-computer-vision/tiger1.jpg') # cv2 is imported as cv above

‘img’ holds the image as a NumPy array. Let’s print its type and shape:

print(type(img))
print(img.shape)

The shape of the NumPy array is (667, 1200, 3), where:

667 is the image height, 1200 is the image width, and 3 is the number of channels.

The image has three color channels. Matplotlib expects them in RGB order, but OpenCV loads images in BGR order by default, so we must convert from BGR to RGB before displaying the image.
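
To make the channel order concrete, here is a minimal NumPy-only sketch. It assumes a tiny synthetic 1×2 image instead of the tiger photo, and it relies on the fact that converting BGR to RGB amounts to reversing the order of the last axis. The names `bgr` and `rgb` are illustrative.

```python
import numpy as np

# A synthetic 1x2 "image" in BGR order: one pure-blue pixel and one pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis turns BGR into RGB
rgb = bgr[:, :, ::-1]

print(bgr.shape)   # (height, width, channels)
print(rgb[0, 0])   # the blue pixel, now written as (R, G, B)
print(rgb[0, 1])   # the red pixel, now written as (R, G, B)
```

If the conversion is skipped, Matplotlib interprets the blue channel as red and vice versa, which is why unconverted images look oddly tinted.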

Displaying the image:

# Converting image from BGR to RGB for displaying
img_convert=cv.cvtColor(img, cv.COLOR_BGR2RGB)
plt.imshow(img_convert)

Drawing on Images

We can draw lines, shapes, and text on images.

# Rectangle
color=(240,150,240) # Color of the rectangle, in BGR order
cv.rectangle(img, (100,100),(300,300),color,thickness=10, lineType=8) ## For a filled rectangle, use thickness = -1
## (100,100) are the (x,y) coordinates of the top-left corner and (300,300) are the (x,y) coordinates of the bottom-right corner

# Circle
color=(150,255,50) # Each channel value must lie in the range 0-255
cv.circle(img, (650,350),100, color,thickness=10) ## For a filled circle, use thickness = -1
## (650,350) are the (x,y) coordinates of the center of the circle and 100 is the radius

# Text
color=(50,200,100)
font=cv.FONT_HERSHEY_SCRIPT_COMPLEX
cv.putText(img, 'Save Tigers',(200,150), font, 5, color,thickness=5, lineType=cv.LINE_AA)
## (200,150) are the (x,y) coordinates of the bottom-left corner of the text and 5 is the font scale

# Converting BGR to RGB
img_convert=cv.cvtColor(img, cv.COLOR_BGR2RGB)
plt.imshow(img_convert)
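
One detail that is easy to trip over: the drawing functions take points as (x, y), while the underlying NumPy array is indexed as [row, column], that is, [y, x]. The sketch below illustrates this on a hypothetical blank canvas using only NumPy slicing. It is a rough stand-in for a filled rectangle, not OpenCV's actual drawing routine, and the canvas size and corner points are made up for the example.

```python
import numpy as np

canvas = np.zeros((400, 600, 3), dtype=np.uint8)  # height=400, width=600, all black

top_left = (100, 50)       # (x, y)
bottom_right = (300, 150)  # (x, y)
color = (240, 150, 240)    # BGR

# Rows are selected by y, columns by x
x1, y1 = top_left
x2, y2 = bottom_right
canvas[y1:y2 + 1, x1:x2 + 1] = color

print(canvas[50, 100])   # pixel at x=100, y=50: inside the rectangle
print(canvas[100, 50])   # pixel at x=50, y=100: outside the rectangle
```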

Merging Images

We can also use OpenCV to merge two or more images. An image is just an array of numbers, so you can add, subtract, multiply, and divide images element by element to produce a new image. One thing to note is that the images must have the same size.

# For plotting multiple images at once
def myplot(images,titles):
    fig, axs=plt.subplots(1,len(images),sharey=True)
    fig.set_figwidth(15)
    for img,ax,title in zip(images,axs,titles):
        if img.shape[-1]==3:
            img=cv.cvtColor(img, cv.COLOR_BGR2RGB) # OpenCV stores images as BGR, so convert them to RGB for Matplotlib
        else:
            img=cv.cvtColor(img, cv.COLOR_GRAY2BGR) # Replicate the single gray channel so the image displays as gray
        ax.imshow(img)
        ax.set_title(title)

img1 = cv.imread('../input/images-for-computer-vision/tiger1.jpg')
img2 = cv.imread('../input/images-for-computer-vision/horse.jpg')

# Resizing img1 to the size of img2; cv.resize expects the size as (width, height)
img1_resize = cv.resize(img1, (img2.shape[1], img2.shape[0]))

# Adding, Subtracting, Multiplying and Dividing Images
img_add = cv.add(img1_resize, img2)
img_subtract = cv.subtract(img1_resize, img2)
img_multiply = cv.multiply(img1_resize, img2)
img_divide = cv.divide(img1_resize, img2)

# Blending Images
img_blend = cv.addWeighted(img1_resize, 0.3, img2, 0.7, 0) ## 30% tiger and 70% horse
myplot([img1_resize, img2], ['Tiger','Horse'])
myplot([img_add, img_subtract, img_multiply, img_divide, img_blend], ['Addition', 'Subtraction', 'Multiplication', 'Division', 'Blending'])

The multiplication image is almost entirely white and the subtraction image is largely black, because white corresponds to 255 and black to 0. OpenCV arithmetic on 8-bit images saturates: multiplying two pixel values usually gives a result above 255, which is clipped to 255 (white), whereas subtraction often gives a negative result, which is clipped to 0 (black).
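
OpenCV's arithmetic functions saturate 8-bit results: values above 255 are clipped to 255 and values below 0 are clipped to 0. That behavior can be sketched in plain NumPy. This is an illustration under assumptions, not OpenCV's implementation: the helper `saturate` and the sample pixel values are made up for the example. The last line shows why the `cv.add` family is preferable to the `+` operator on `uint8` arrays, which wraps around instead of clipping.

```python
import numpy as np

def saturate(values):
    """Round and clip to the 0-255 range, then store as uint8 (hypothetical helper)."""
    return np.clip(np.rint(values), 0, 255).astype(np.uint8)

a = np.array([200, 50, 10], dtype=np.uint8)
b = np.array([100, 60, 20], dtype=np.uint8)

# Work in a wider type first so the intermediate result cannot overflow
a_wide = a.astype(np.float64)
b_wide = b.astype(np.float64)

added = saturate(a_wide + b_wide)                # 200 + 100 = 300 is clipped to 255
subtracted = saturate(a_wide - b_wide)           # 50 - 60 = -10 is clipped to 0
multiplied = saturate(a_wide * b_wide)           # large products are clipped to 255
blended = saturate(0.3 * a_wide + 0.7 * b_wide)  # a weighted average stays in range

wrapped = a + b  # plain uint8 addition wraps around modulo 256

print(added, subtracted, multiplied, blended, wrapped)
```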

Image Transformations

Image transformations include translating, rotating, scaling, cropping, flipping, and shearing images.

img=cv.imread('../input/images-for-computer-vision/tiger1.jpg')

height, width, _=img.shape # NumPy shape is (rows, columns, channels), i.e. (height, width, channels)

# Translating
M_translate=np.float32([[1,0,200],[0,1,100]]) # 200 => translation along the x-axis, 100 => translation along the y-axis
img_translate=cv.warpAffine(img,M_translate,(width,height)) # The output size is given as (width, height)

# Rotating
center=(width/2,height/2) # (x, y) coordinates of the image center
M_rotate=cv.getRotationMatrix2D(center, angle=90, scale=1) # A positive angle rotates counter-clockwise
img_rotate=cv.warpAffine(img,M_rotate,(width,height))

# Scaling
scale_percent = 50
scaled_width = int(width * scale_percent / 100)
scaled_height = int(height * scale_percent / 100)
dim = (scaled_width, scaled_height)
img_scale = cv.resize(img, dim, interpolation = cv.INTER_AREA)

# Flipping
img_flip=cv.flip(img,1) # 0: flip around the x-axis (upside down), 1: flip around the y-axis (mirror), -1: flip around both axes

# Shearing (an affine warp defined by three pairs of corresponding points)
srcTri = np.array( [[0, 0], [img.shape[1] - 1, 0], [0, img.shape[0] - 1]] ).astype(np.float32)
dstTri = np.array( [[0, img.shape[1]*0.33], [img.shape[1]*0.85, img.shape[0]*0.25], [img.shape[1]*0.15, img.shape[0]*0.7]] ).astype(np.float32)
warp_mat = cv.getAffineTransform(srcTri, dstTri)
img_warp = cv.warpAffine(img, warp_mat, (width, height))


myplot([img, img_translate, img_rotate, img_scale, img_flip, img_warp],
       ['Original Image', 'Translated Image', 'Rotated Image', 'Scaled Image', 'Flipped Image', 'Sheared Image'])
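
To see what the 2×3 matrices passed to `cv.warpAffine` mean, the following NumPy-only sketch applies them to individual points rather than to a whole image. The rotation matrix is built by hand from the formula given in the OpenCV documentation for `getRotationMatrix2D`. The helper names `apply_affine` and `rotation_matrix` and the sample points are assumptions for this example.

```python
import numpy as np

def apply_affine(M, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return M @ np.array([x, y, 1.0])

def rotation_matrix(center, angle_deg, scale):
    """2x3 rotation matrix about `center`, following the documented formula."""
    cx, cy = center
    alpha = scale * np.cos(np.radians(angle_deg))
    beta = scale * np.sin(np.radians(angle_deg))
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])

# Translation: every point moves 200 px along x and 100 px along y
M_translate = np.float32([[1, 0, 200], [0, 1, 100]])
moved = apply_affine(M_translate, (10, 20))

# Rotation by 90 degrees about the center of a 1200x667 image
center = (600.0, 333.5)
M_rotate = rotation_matrix(center, 90, 1)
fixed = apply_affine(M_rotate, center)                    # the center does not move
right_of_center = apply_affine(M_rotate, (700.0, 333.5))  # ends up above the center

print(moved, fixed, right_of_center)
```

A point 100 px to the right of the center lands 100 px above it (smaller y), which is what a counter-clockwise rotation looks like in image coordinates, where y grows downward.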

Image Preprocessing

Thresholding: In binary thresholding, pixel values at or below the threshold become 0 (black), and pixel values above the threshold become 255 (white).

I set the threshold to 150, but you can choose any other number.

# For visualising the filters
import plotly.graph_objects as go
from plotly.subplots import make_subplots
def plot_3d(img1, img2, titles):
    fig = make_subplots(rows=1, cols=2,
                    specs=[[{'is_3d': True}, {'is_3d': True}]],
                    subplot_titles=[titles[0], titles[1]],
                    )
    x, y=np.mgrid[0:img1.shape[0], 0:img1.shape[1]]
    fig.add_trace(go.Surface(x=x, y=y, z=img1[:,:,0]), row=1, col=1)
    fig.add_trace(go.Surface(x=x, y=y, z=img2[:,:,0]), row=1, col=2)
    fig.update_traces(contours_z=dict(show=True, usecolormap=True,
                                  highlightcolor="limegreen", project_z=True))
    fig.show()

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')

# Pixel values at or below the threshold become 0 and values above it become 255

_,img_threshold=cv.threshold(img,150,255,cv.THRESH_BINARY)

plot_3d(img, img_threshold, ['Original Image', 'Threshold Image=150'])

After applying the threshold, every pixel value above 150 becomes 255 and every other value becomes 0.
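
The rule behind `cv.THRESH_BINARY` is simple enough to write out in NumPy. The sketch below is an illustrative re-implementation on a hand-made row of pixel values, not the OpenCV code; `threshold_binary` is a hypothetical helper name.

```python
import numpy as np

def threshold_binary(img, thresh, maxval=255):
    """Binary threshold: maxval where img > thresh, 0 elsewhere (illustrative helper)."""
    return np.where(img > thresh, maxval, 0).astype(np.uint8)

row = np.array([0, 149, 150, 151, 255], dtype=np.uint8)
print(threshold_binary(row, 150))
```

Note that the comparison is strict, so a pixel whose value equals the threshold is set to 0.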

Filtering: Image filtering alters the appearance of an image by changing the pixel values. Each type of filter changes the pixel values according to its respective mathematical formula. I won’t go into detail about the math here, but I will demonstrate how each filter works by visualizing them in 3D.

If you are interested in the math behind filters, you can check: https://docs.opencv.org/4.5.3/d4/d86/group__imgproc__filter.html

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')

# Gaussian Filter
ksize=(11,11) # Kernel width and height; both must be odd numbers
img_gaussian=cv.GaussianBlur(img, ksize,0) # sigma=0 lets OpenCV derive it from the kernel size
plot_3d(img, img_gaussian, ['Original Image','Gaussian Image'])

# Median Filter
ksize=11
img_medianblur=cv.medianBlur(img,ksize)
plot_3d(img, img_medianblur, ['Original Image','Median blur'])

# Bilateral Filter
img_bilateralblur=cv.bilateralFilter(img,d=5, sigmaColor=50, sigmaSpace=5)
myplot([img, img_bilateralblur],['Original Image', 'Bilateral blur Image'])
plot_3d(img, img_bilateralblur, ['Original Image','Bilateral blur'])

Gaussian Filter: Blurs the image with a Gaussian-weighted average of neighboring pixels, which removes fine detail and noise. For more details, you can read: https://homepages.inf.ed.ac.uk/rbf/HIPR2/gsmooth.htm

Median Filter: A nonlinear filter that replaces each pixel with the median of its neighborhood. It is effective against impulse noise, also known as salt-and-pepper noise.

Bilateral Filter: Smooths the image to reduce noise while preserving edges.

In simple terms, these filters reduce or remove noise, that is, unwanted random variations in brightness or color. This is called smoothing.
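
A small numeric example shows why a median handles salt-and-pepper noise better than an average. The sketch assumes a hypothetical 3×3 neighborhood of constant brightness with a single corrupted pixel, and uses a plain mean as a simplified stand-in for the weighted average of a Gaussian filter.

```python
import numpy as np

# A flat 3x3 neighborhood of value 100 with one "salt" pixel in the middle
patch = np.full((3, 3), 100, dtype=np.uint8)
patch[1, 1] = 255

mean_value = patch.mean()        # the outlier drags the average upward
median_value = np.median(patch)  # the outlier has no effect on the median

print(round(mean_value, 2), median_value)
```

The averaging filter spreads part of the outlier into the result, while the median returns the original background value.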

Feature Detection

Feature detection computes abstractions of image information and decides, at each point in the image, whether a feature of a given type is present there. For example, in an image of a face the features are the eyes, nose, lips, ears, and so on, and we try to recognize them.

Let’s first try to detect the edges of the image.

Edge Detection

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')
img_canny1=cv.Canny(img,50, 200)
# Smoothing the img before feeding it to canny
filter_img=cv.GaussianBlur(img, (7,7), 0)
img_canny2=cv.Canny(filter_img,50, 200)
myplot([img, img_canny1, img_canny2],
       ['Original Image', 'Canny Edge Detector(Without Smoothing)', 'Canny Edge Detector(With Smoothing)'])

Here we use the Canny edge detector, which is an edge detection operator that uses a multi-stage algorithm to detect various edges in an image. It was developed by John F. Canny in 1986. I won’t go into detail about how Canny works, but the key point here is that it is used to extract edges.

To learn more about how it works, you can check: https://towardsdatascience.com/canny-edge-detection-step-by-step-in-python-computer-vision-b49c3a2d8123

Before using the Canny edge detection method to detect edges, we smooth the image to remove noise. As you can see from the image, we get clear edges after smoothing.
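
The effect of smoothing can be seen on a single row of pixels. The sketch below is not the Canny algorithm: it only thresholds the difference between neighboring samples, which loosely mirrors Canny's gradient stage, and it uses a 3-tap moving average as a stand-in for Gaussian smoothing. The row values, the threshold of 50, and the helper `edge_positions` are assumptions for illustration.

```python
import numpy as np

# One image row: a dark region with a single noisy pixel, then a bright region
row = np.array([10, 10, 70, 10, 10, 10, 200, 200, 200, 200], dtype=np.float64)

def edge_positions(signal, threshold):
    """Indices where the jump to the next sample exceeds the threshold."""
    return np.flatnonzero(np.abs(np.diff(signal)) > threshold)

# 3-tap moving average as a simple stand-in for Gaussian smoothing
smoothed = np.convolve(row, np.ones(3) / 3, mode="valid")

print(edge_positions(row, 50))       # includes responses caused by the noisy pixel
print(edge_positions(smoothed, 50))  # responses only around the real step
```

In the smoothed row the noisy pixel no longer crosses the threshold, and the real edge shows up as a short run of adjacent responses. Canny's later stages are designed to thin such runs down to a single-pixel edge.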

Contours

img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')
img_copy=img.copy()
img_gray=cv.cvtColor(img,cv.COLOR_BGR2GRAY)
_,img_binary=cv.threshold(img_gray,50,200,cv.THRESH_BINARY)
# Eroding and then dilating for smoother contours
kernel=np.ones((10,10),np.uint8) # The structuring element must be an array, not a tuple
img_binary_erode=cv.erode(img_binary,kernel, iterations=5)
img_binary_dilate=cv.dilate(img_binary_erode,kernel, iterations=5) # Dilate the eroded image; use a smaller kernel or fewer iterations if small shapes vanish
contours,hierarchy=cv.findContours(img_binary_dilate,cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
cv.drawContours(img, contours,-1,(0,0,255),3) # Draws all contours (index -1) in red on the original image, modifying it in place
myplot([img_copy, img], ['Original Image', 'Contours in the Image'])

Erosion: Removes pixels from the boundaries of objects in the image, so objects shrink and small specks disappear.

Dilation: Adds pixels to the boundaries of objects in the image, the opposite of erosion.
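
Both operations can be sketched in NumPy on a tiny binary image. This is a simplified illustration, not OpenCV's implementation: it assumes a fixed 3×3 square structuring element, treats pixels outside the image as 0, and uses a hypothetical helper named `morph`.

```python
import numpy as np

def morph(binary, op):
    """3x3 erosion (op=np.min) or dilation (op=np.max) of a 0/1 image.

    Simplified sketch: pixels outside the image are treated as 0.
    """
    padded = np.pad(binary, 1, mode="constant", constant_values=0)
    out = np.zeros_like(binary)
    for r in range(binary.shape[0]):
        for c in range(binary.shape[1]):
            out[r, c] = op(padded[r:r + 3, c:c + 3])
    return out

square = np.zeros((7, 7), dtype=np.uint8)
square[2:5, 2:5] = 1  # a 3x3 white square on a black background

eroded = morph(square, np.min)   # only the center pixel survives
dilated = morph(square, np.max)  # the square grows to 5x5
opened = morph(eroded, np.max)   # erosion followed by dilation restores the square

print(eroded.sum(), dilated.sum(), np.array_equal(opened, square))
```

Erosion followed by dilation is known as opening: objects large enough to survive the erosion are restored to roughly their original size, while smaller specks are removed for good.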

Hulls

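img=cv.imread('../input/images-for-computer-vision/simple_shapes.png')
img_gray=cv.cvtColor(img,cv.COLOR_BGR2GRAY) # Thresholding and contour finding work on a single-channel image
_,threshold=cv.threshold(img_gray,50,255,cv.THRESH_BINARY)
contours,hierarchy=cv.findContours(threshold,cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
hulls=[cv.convexHull(c) for c in contours] # Smallest convex polygon enclosing each contour
img_hull=cv.drawContours(img, hulls,-1,(0,0,255),2) # Draws the hulls in red on the color image
plt.imshow(cv.cvtColor(img_hull, cv.COLOR_BGR2RGB)) # Convert BGR to RGB for Matplotlib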
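
The convex hull of a contour is the smallest convex polygon that encloses all of its points, like a rubber band stretched around the shape. To show the idea without OpenCV, here is a pure-Python sketch using Andrew's monotone chain algorithm. It is swapped in for illustration and is not necessarily what `cv.convexHull` does internally. The function names and the L-shaped sample outline are assumptions.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; zero when the three points are collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in order around the hull (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop the last point while it would make the chain bend inward
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]

# An L-shaped outline: the inner corner (2, 2) is not part of the hull
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(convex_hull(l_shape))
```

The concave corner of the L is dropped, so the hull has five vertices instead of six.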

Conclusion

We have seen how to read and display images, draw shapes and text on them, blend two images, apply transformations such as rotation, scaling, and translation, filter images with Gaussian, median, and bilateral blur, and detect features using Canny edge detection, contours, and convex hulls.
