Computer Vision: Understanding the World Through Machines

AI Popular Science Series, Issue 15

In the digital age, the amount of information carried by images and videos is rapidly overtaking that conveyed by structured data, which consists mainly of text and numbers. As a result, the gap between what computer vision can process and the fast-growing volume of visual information has become increasingly apparent. In this issue, we explore the concept of computer vision, its basic tasks, its operating principles, and its application scenarios.

1. Concept

Computer vision grew out of the goal of matching or exceeding human visual perception. It aims to simulate the human visual system in order to gain a high-level understanding of digital images and videos, and it draws on pattern recognition, image processing, image analysis, and machine vision. In short, computer vision is about understanding the information hidden in images.

In the definition above, the terms “image processing,” “machine vision,” and “computer vision” are easy to confuse. They can be clearly distinguished with the help of the following diagram.

Image processing takes an image as input and produces an image as output. For example, adding filter effects or watermarks in Photoshop, or compressing an image, changes only some of the image's visual attributes.
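The "image in, image out" idea can be sketched in a few lines of Python. This is only an illustration: the function name brighten, the tiny grayscale image, and the pixel values are made up for this example, and the image is represented as a plain list of rows rather than a real image file.

```python
def brighten(image, amount):
    """Return a new image with every pixel raised by `amount`, clipped to 0..255."""
    return [[min(255, max(0, pixel + amount)) for pixel in row] for row in image]


# A tiny 2x3 grayscale image (0 = black, 255 = white); the values are made up.
image = [
    [10, 100, 200],
    [250, 0, 128],
]

brighter = brighten(image, 40)
print(brighter)  # [[50, 140, 240], [255, 40, 168]]
```

The input and the output are both images of the same size. Nothing about the content of the picture has been "understood"; only the brightness has changed.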

Machine vision and computer vision, by contrast, can both recognize and segment target content in images. The difference is that computer vision is an interdisciplinary field concerned with automatically extracting, analyzing, and understanding useful information from digital images and videos, whereas machine vision focuses on industrial applications, such as optically inspecting target objects and sensing their position, size, shape, and color.

2. Basic Tasks of Computer Vision

After understanding the concept of computer vision, are you curious about its amazing capabilities? In simple terms, computer vision can perform three basic tasks: classification, detection, and segmentation.

Classification means determining the category that a given image belongs to; it covers tasks such as image classification and facial recognition. For example, a phone's photo album can automatically sort pictures into categories such as architecture, flowers, performances, and cars.

Detection requires the computer to find the location of every target object in the image and to identify the category of each one, so it combines “localization” and “classification.” For instance, a smart vending machine uses object detection to determine when customers take or return products, and which products they are.

Segmentation includes semantic segmentation and instance segmentation. Semantic segmentation draws the boundary between the target objects and the background, while instance segmentation goes one step further and separates individual objects of the same kind from one another. As shown in the figure: semantic segmentation first separates people from the background, and instance segmentation then labels each individual, thereby determining the boundaries between people.
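The difference between the two kinds of output can be illustrated with a toy example. In the sketch below, the semantic mask only marks "person" (1) versus "background" (0), while the instance labels give each person a separate number. Grouping connected pixels is used here purely as a stand-in: real instance segmentation relies on trained models and can separate people who touch or overlap, which this simple grouping cannot. The function name label_instances and the mask values are assumptions made for the illustration.

```python
from collections import deque


def label_instances(mask):
    """Give each 4-connected region of 1s in `mask` its own positive integer id."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and labels[r][c] == 0:
                # Found an unlabeled "person" pixel: start a new instance.
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels, next_id


# Semantic mask: 1 = person, 0 = background (made-up data, two separate people).
semantic_mask = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

instance_labels, count = label_instances(semantic_mask)
for row in instance_labels:
    print(row)
# [1, 1, 0, 0, 2]
# [1, 1, 0, 0, 2]
# [0, 0, 0, 0, 2]
print(count)  # 2
```

The semantic mask answers "which pixels are people?", while the instance labels also answer "which person does each pixel belong to?".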

3. Principles of Computer Vision

Before understanding the working principles of computer vision, we first explore the composition of human vision and its underlying principles.

The human visual system consists of the eyes and the visual nervous system. Humans selectively pick up information from the environment through their eyes, and neurons transmit the visual signals to the brain, which completes the recognition, classification, and semantic analysis of targets.

Computer vision is an extension of human vision, and its principles are similar: after the target to be detected is converted into image signals, the image processing system begins to work, using multi-layer neural networks to gradually extract features from low-level to high-level, completing the recognition, classification, and semantic analysis of the overall target.
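What a "low-level feature" looks like can be sketched with a single hand-written filter. The example below slides a small 3x3 filter over an image and produces a strong response wherever a dark region meets a bright one, that is, at a vertical edge. This is a simplified illustration, not a neural network: in a real multi-layer network the filter values are learned from data and many such layers are stacked to build up higher-level features. The function name apply_filter and all numbers are made up for the example.

```python
def apply_filter(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map.

    This is the sliding-window multiply-and-sum used in convolutional layers
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Made-up 3x6 image: dark on the left (0), bright on the right (9).
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# A filter that responds to left-to-right increases in brightness.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = apply_filter(image, vertical_edge)
print(response)  # [[0, 27, 27, 0]]
```

The response is zero over the flat dark and bright areas and large only around the boundary between them, so the filter has extracted one simple feature: "there is a vertical edge here."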

4. Advantages and Disadvantages of Computer Vision

From these principles we can see that, in certain situations, computer vision recognizes things more accurately than human vision does. For example, when inspecting machines for configuration problems, a computer-based detector can check many machines with the same accuracy as long as its parameter settings stay the same, whereas human inspectors cannot guarantee consistent accuracy because of eye fatigue and subjectivity.

However, computer vision faces two challenges. The first is that features are difficult to extract. For example, to recognize a cat in an image, a computer vision system must learn features such as fur color, eye color, and ear shape from a large number of images before it can make a judgment. Yet the same cat can look very different under different angles, lighting, and poses, which makes accurate recognition considerably harder.

The second challenge is the massive amount of data that has to be processed. For instance, in a 1000 × 2000 pixel color photo, each pixel is described by three color values (RGB), so a single photo contains 6,000,000 values to process. The increasingly popular 4K videos require even more processing and storage.
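The arithmetic behind these figures can be checked directly. The sketch below counts raw, uncompressed color values. The 4K numbers rest on assumptions that are not in the text above: a frame size of 3840 × 2160 pixels and a rate of 30 frames per second.

```python
def raw_values(width, height, channels=3):
    """Number of color values in one uncompressed image (3 channels = RGB)."""
    return width * height * channels


photo = raw_values(1000, 2000)
print(f"{photo:,}")  # 6,000,000

# Assumption: a 4K frame is 3840 x 2160 pixels.
frame_4k = raw_values(3840, 2160)
print(f"{frame_4k:,}")  # 24,883,200

# Assumption: 30 frames per second, before any video compression.
per_second = frame_4k * 30
print(f"{per_second:,}")  # 746,496,000
```

Under these assumptions, one second of uncompressed 4K video holds more than a hundred times as many values as the single photo in the example.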

5. Applications of Computer Vision

Computer vision is widely used in scenarios such as facial detection, facial comparison, and emotion recognition.

In facial detection, computers accurately locate the key points of facial features, which enables face unlock and face payment. In facial comparison, computers calculate the similarity between two faces, which can help find missing children and identify suspects. In emotion recognition, computers read facial expressions to determine emotions such as joy and sadness, and they can assess whether a driver is fatigued based on how often the driver blinks.
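"Calculating the similarity between two faces" can be sketched as follows, assuming that each face has already been converted into a list of numbers (often called a feature vector) by some face recognition model. The article does not say which similarity measure is used; cosine similarity is one common choice and is used here as an example. The vectors, the names, and the threshold of 0.8 are all made up for the illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors; real ones usually have hundreds of numbers.
face_a = [0.9, 0.1, 0.4]   # photo 1 of person A
face_b = [0.8, 0.2, 0.5]   # photo 2 of person A
face_c = [-0.3, 0.9, -0.2]  # photo of a different person

THRESHOLD = 0.8  # assumed cut-off; real systems tune this value carefully

for name, other in (("face_b", face_b), ("face_c", face_c)):
    score = cosine_similarity(face_a, other)
    verdict = "same person" if score >= THRESHOLD else "different person"
    print(f"face_a vs {name}: {score:.2f} -> {verdict}")
```

The higher the score, the more alike the two faces are judged to be. Where to place the threshold is a trade-off between missing true matches and raising false alarms.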

Thank you for reading. We hope this popular science article helps you learn more about artificial intelligence. See you next time!

Tip: Teachers registered on the Guangzhou Artificial Intelligence Teaching Platform are welcome to scan the code to log in to the Teacher Growth Space to watch explanatory videos and learn more about computer vision.

Contributed by: Resource Service Department

Edited by: Library Promotion Team

Reviewed by: Team Leader of Library Promotion Team
