Computer Vision: Essential Image Data Technologies


Introduction

Since its inception, computer vision has been rapidly and widely adopted across many fields. A familiar example is the face recognition we use every day through smartphone cameras. In autonomous driving, computer vision helps cars recognize traffic lights, road signs, and pedestrians; in manufacturing, it helps industrial robots monitor and guide manual operations.

The main goal of computer vision is to enable computers to see and recognize the world as well as, or even better than, humans. Computer vision systems are commonly written in programming languages such as C++, Python, and MATLAB, and computer vision is a key enabling technology for augmented reality (AR). Mainstream tools and models include OpenCV, TensorFlow, Keras, and YOLO, usually running on GPUs for acceleration. Computer vision is a complex, interdisciplinary field that draws on digital signal processing, neuroscience, image processing, pattern recognition, machine learning (ML), robotics, and artificial intelligence (AI).

In this article, I focus specifically on the workflow of computer vision.

What is Computer Vision

In short, computer vision is the field of technology that enables computers to understand and label the contents of images.

For example, please see the image below:

It would be hard to explain what a dress or a pair of shoes is to someone who has never seen clothing, because they have nothing to relate the idea to. A computer is in a similar position: without relevant input data, it cannot understand what the image above shows.

Therefore, we need to collect a large number of images of clothes, shoes, and bags, label them, and feed them to the computer to “tell” it what each image contains. Through repeated training, the computer learns to recognize which item is a dress, which is a shoe, and which is a bag.

Main Applications of Computer Vision

Computer vision is now applied in a great many fields; here are five representative applications:

Object and Behavior Recognition

Autonomous Vehicles

Medical Image Analysis and Diagnosis

Image Labeling

Face Recognition

Workflow of Computer Vision

The computer vision workflow is the sequence of steps that most computer vision applications go through: acquiring images and data, processing the data, performing analysis and recognition, and finally executing an action:

Workflow of Computer Vision
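To make the four stages concrete, here is a minimal Python sketch of the workflow. All function names are hypothetical, the image is synthesized with NumPy rather than captured from a camera, and the "analysis" stage is a toy brightness rule standing in for a real recognition model.

```python
import numpy as np

# Hypothetical stage functions illustrating acquire -> process -> analyze -> act.

def acquire_image(height=64, width=64, seed=0):
    """Stage 1: acquire an image. Here a random RGB image stands in for a camera frame."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)

def preprocess(image):
    """Stage 2: process the data by scaling pixel values from 0-255 to 0.0-1.0."""
    return image.astype(np.float32) / 255.0

def analyze(image):
    """Stage 3: analyze/recognize. Toy rule: is the scene mostly bright?"""
    return "bright" if image.mean() > 0.5 else "dark"

def act(label):
    """Stage 4: execute an action based on the recognition result."""
    return "turn lights off" if label == "bright" else "turn lights on"

def run_pipeline(image):
    return act(analyze(preprocess(image)))

frame = acquire_image()
print(run_pipeline(frame))
```

Keeping each stage as a separate function mirrors the workflow diagram: a real system would swap in a camera read, a richer preprocessing step, and a trained model without changing the overall structure.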

Face recognition, for example, largely follows this workflow:

Workflow of Face Recognition

As the figure shows, once images are acquired, most computer vision applications begin with data preprocessing, a step that is also key to machine learning in general.

Data Normalization

Preprocessing images means normalizing the input image data so that the later stages of the workflow receive consistent inputs. For example, suppose we build a simple classifier to distinguish red roses from other flowers:

The algorithm counts the red pixels in a given image; if there are enough of them (more than 300), the image is classified as a red rose. (In this example, we use only color features.)
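A minimal NumPy sketch of this red-pixel classifier follows. The 300-pixel threshold comes from the example above. The definition of a "red" pixel (R ≥ 150, G < 100, B < 100) is an illustrative assumption, and so is the RGB channel order (OpenCV's `cv2.imread` returns BGR, so the channels would need reordering). The helper names are hypothetical.

```python
import numpy as np

RED_PIXEL_THRESHOLD = 300  # more than 300 red pixels -> "red rose"

def red_mask(image):
    """Boolean mask of 'red' pixels in an RGB uint8 image.
    The cutoffs below are assumed values chosen for illustration."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return (r >= 150) & (g < 100) & (b < 100)

def count_red_pixels(image):
    return int(red_mask(image).sum())

def classify(image):
    """Color-only rule: enough red pixels means the flower is a red rose."""
    return "red rose" if count_red_pixels(image) > RED_PIXEL_THRESHOLD else "other"

# Synthetic 50x50 image: a 20x20 red patch (400 pixels) on a green background.
img = np.zeros((50, 50, 3), dtype=np.uint8)
img[...] = (30, 160, 40)
img[15:35, 15:35] = (200, 20, 30)
print(count_red_pixels(img), classify(img))  # 400 red rose
```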

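Note that the size and cropping of the input image affect the algorithm's output. The same rose photographed at a higher resolution, or cropped more tightly, contains more red pixels, so a fixed threshold of 300 can give different answers for the same flower. This is why data preprocessing is so important!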
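The sketch below shows why size matters and how normalization helps. It draws the same synthetic rose at two resolutions, then resizes both to a fixed size before counting. The 50x50 target size and the "red" cutoffs are assumptions, and `resize_nearest` is a hypothetical NumPy stand-in for a library resize such as OpenCV's `cv2.resize`.

```python
import numpy as np

TARGET_SIZE = (50, 50)  # assumed fixed (height, width) for every input

def resize_nearest(image, size):
    """Nearest-neighbour resize using NumPy indexing only."""
    h, w = image.shape[:2]
    out_h, out_w = size
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows[:, None], cols]

def count_red_pixels(image):
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    return int(((r >= 150) & (g < 100) & (b < 100)).sum())

def make_rose(scale):
    """Synthetic 'rose': a red square on green, drawn at an integer scale factor."""
    img = np.zeros((50 * scale, 50 * scale, 3), dtype=np.uint8)
    img[...] = (30, 160, 40)
    img[15 * scale:35 * scale, 15 * scale:35 * scale] = (200, 20, 30)
    return img

small, large = make_rose(1), make_rose(2)
print(count_red_pixels(small), count_red_pixels(large))  # 400 1600
print(count_red_pixels(resize_nearest(small, TARGET_SIZE)),
      count_red_pixels(resize_nearest(large, TARGET_SIZE)))  # 400 400
```

Without resizing, the raw count grows with image area (four times as many red pixels at twice the resolution). After normalization, both versions of the flower produce the same count, so a fixed threshold behaves consistently.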

Images as Data

Each pixel in an image is a numerical value that we can change. For example, we can multiply every pixel by a scalar to change the image's brightness, or shift every pixel to the right to translate (move) the image, among many other operations.
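Here is a small NumPy sketch of two such pixel operations: scaling brightness by a scalar, and a horizontal shift of the pixel grid. The function names are hypothetical, and filling the vacated columns with black is an assumed choice.

```python
import numpy as np

def adjust_brightness(image, factor):
    """Multiply every pixel by a scalar, then clip back to the valid 0-255 range.
    Converting to float first avoids uint8 wrap-around (e.g. 200 * 2 -> 144)."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

def shift_right(image, pixels):
    """Move the image `pixels` columns to the right, filling the gap with black."""
    pixels = min(max(pixels, 0), image.shape[1])
    shifted = np.zeros_like(image)
    shifted[:, pixels:] = image[:, :image.shape[1] - pixels]
    return shifted

img = np.array([[10, 100, 200]], dtype=np.uint8)  # a 1x3 grayscale image
print(adjust_brightness(img, 1.5))  # [[ 15 150 255]]
print(shift_right(img, 1))          # [[  0  10 100]]
```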

Treating an image as a grid of numbers is the basis of many image processing techniques: most color and shape transformations are implemented as mathematical operations applied to the image pixel by pixel.
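Converting a color image to grayscale is a typical example of a per-pixel color operation: every output pixel is the same weighted sum of that pixel's red, green, and blue values. The sketch below uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), one common choice among several.

```python
import numpy as np

# ITU-R BT.601 luma weights for R, G, B
WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(image):
    """Convert an RGB uint8 image of shape (H, W, 3) to grayscale of shape (H, W).
    The same weighted sum is applied independently at every pixel."""
    gray = image.astype(np.float64) @ WEIGHTS
    return np.round(gray).astype(np.uint8)

# One row of four pixels: pure red, pure green, pure blue, white.
img = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(to_grayscale(img))  # [[ 76 150  29 255]]
```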

Training Neural Networks

To train a neural network, we provide a set of labeled images and compare the network's predicted labels (or predicted measurements) with the true labels to measure the model's accuracy. A neural network learns from the errors it makes: at each iteration it adjusts its weights to reduce the difference between its predictions and the ground truth.
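The loop below is a minimal sketch of this predict, compare, and correct cycle. To keep it short it trains a single sigmoid neuron (logistic regression) with gradient descent, not a convolutional network. The toy one-feature dataset (for example, the fraction of red pixels, with label 1 meaning "red rose"), the learning rate, and the iteration count are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Append a constant-1 column so the bias is learned as an ordinary weight."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train(X, y, alpha=0.5, iterations=1000):
    """Gradient descent on binary cross-entropy for one sigmoid neuron."""
    Xb = add_bias(X)
    W = np.zeros(Xb.shape[1])
    losses = []
    eps = 1e-12  # avoids log(0)
    for k in range(iterations):  # k = iteration index
        y_hat = sigmoid(Xb @ W)  # prediction
        J = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
        losses.append(J)  # error between prediction and ground truth
        grad = Xb.T @ (y_hat - y) / len(y)  # dJ/dW for sigmoid + cross-entropy
        W -= alpha * grad  # correct the weights
    return W, losses

def predict(W, X):
    return (sigmoid(add_bias(X) @ W) >= 0.5).astype(int)

X = np.array([[0.05], [0.10], [0.20], [0.60], [0.70], [0.90]])
y = np.array([0, 0, 0, 1, 1, 1])
W, losses = train(X, y)
print(predict(W, X))            # [0 0 0 1 1 1]
print(losses[-1] < losses[0])   # True
```

A CNN follows the same pattern. The forward pass uses convolutional layers instead of one weighted sum, and the gradients are computed by backpropagation through every layer.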

Gradient descent is a mathematical method for reducing a neural network's error, and convolutional neural networks (CNNs) are a special type of neural network widely used in computer vision applications; both are covered in detail in our previous articles.

Figure legend: X = input; a = activation function; W = weights of the convolutional neural network; J = loss function; α (alpha) = learning rate; y = ground truth; ŷ = prediction; k = number of iterations.
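The figure itself is not reproduced here. Using the legend's symbols, the standard gradient descent iteration can be written as follows. This is a generic formulation; the exact form in the original figure may differ, and the loss $\mathcal{L}$ is left unspecified.

```latex
\begin{aligned}
\hat{y} &= a(W X) && \text{prediction from input } X \\
J(W) &= \mathcal{L}(y, \hat{y}) && \text{loss comparing ground truth and prediction} \\
W_{k+1} &= W_k - \alpha \, \nabla_W J(W_k) && \text{weight update at iteration } k
\end{aligned}
```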

Source: Shuyixueyuan

References

https://www.analyticsvidhya.com/blog/2020/11/computer-vision-a-key-concept-to-solve-many-problems-related-to-image-data/
