How AI Understands Images Through Technology

Welcome to the special winter vacation column “High-Tech Lessons for Kids” launched by Popular Science China!

Artificial intelligence, one of today’s most cutting-edge technologies, is changing our lives at an astonishing speed. From smart voice assistants to self-driving cars, from AI painting to machine learning, it opens up a future full of possibilities. This column uses videos and text to explain, in a way children can easily understand, how artificial intelligence works, how it is applied, and how profoundly it affects society.

Let’s embark on this AI journey together!

In daily life, AI image recognition is everywhere.

See an unfamiliar plant? Take a photo, and you’ll have the answer in no time. Self-driving cars seem to have eyes of their own, telling at a glance where the road is and where the trees are. Facial recognition even lets us pay with our faces.

All of this relies on one technology: convolutional neural networks. This technology acts like the eyes of AI.

To understand how AI’s eyes work, we first need to look at how animal eyes work.

From Cat Eyes to AI Eyes: Insights from Visual Neurons

In the 1950s and 1960s, David Hubel and Torsten Wiesel studied the vision of cats. They found that when an image entered a cat’s field of vision, different visual neurons in the cat’s brain were activated by different stimuli.

To make this easier to understand, let’s look at an example. In a given image, some neurons are particularly interested in the edges of objects and focus on processing that information, while other neurons are more sensitive to large areas of color and are better at processing those. These nerve cells work together to help animals recognize all kinds of complex images.

Edgar Degas, “At the Races in the Countryside” (1869)

This research earned Hubel and Wiesel the Nobel Prize in Physiology or Medicine in 1981 and inspired a very important algorithm in the field of artificial intelligence: the convolutional neural network.

In the 1980s, Japanese scientist Kunihiko Fukushima designed a model called the Neocognitron to recognize handwritten Japanese characters. It had different “layers” that extracted different information and then combined that information to decide which character it was looking at.

This inspired the French scientist Yann LeCun, who designed one of the earliest convolutional neural networks and built the LeNet model on it. Many banks at the time used this model to recognize handwritten characters.

Let’s look at a simple example to see how convolutional neural networks work.

Convolutional Neural Networks: The Unsung Heroes of Image Recognition

Compared to traditional neural networks, convolutional neural networks add two extra steps when recognizing images: convolution and pooling.

Convolution is performed by something called a convolution kernel. In the eyes of a computer, an image is actually a matrix made up of pixels. The convolution kernel does not look at each pixel on its own. Instead, it processes a small area all at once, such as 3×3 or 5×5 pixels. Because it takes neighboring pixels into account together, it is better at extracting higher-level features.

You can imagine the convolution kernel as an observer looking at an image through a telescope with a fixed field of view, processing and recording whatever they see as they move across the picture.

We can also set up different observers with different focuses to extract different kinds of information from the image. For example, some observers focus on color, some focus on the contours of objects, and some specialize in picking out specific shapes. Finally, all of this information is combined to help the neural network make better judgments.

Convolutional neural networks have another important step: pooling (also known as subsampling). Images are often very large matrices, and pooling compresses the information in a small area into a single value. For instance, we can split a 16×16 matrix into 2×2 blocks and keep only the strongest value (for example, the deepest color) from each block, which shrinks it to an 8×8 matrix. If we pool once more in the same way, the 8×8 matrix becomes a 4×4 matrix. The image changes somewhat after pooling, but its basic features are still preserved.
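The two steps described above can be tried out in a few lines of Python. This is only a minimal sketch, not how any real image-recognition library is written: the function names `convolve2d` and `max_pool`, the tiny black-and-white test image, and the `vertical_edge` kernel are all made up for illustration, and pooling here is assumed to mean "keep the largest value in each 2×2 block" (max pooling).

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, step of 1).

    Like most neural-network code, this multiplies the kernel with each
    patch directly, without flipping the kernel first.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            # Combine the whole k x k neighborhood into one number.
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def max_pool(matrix, size=2):
    """Keep only the largest value in each size x size block."""
    pooled = []
    for r in range(0, len(matrix) - size + 1, size):
        row = []
        for c in range(0, len(matrix[0]) - size + 1, size):
            block = [matrix[r + i][c + j]
                     for i in range(size) for j in range(size)]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Illustrative 16x16 image: left half dark (0), right half bright (1).
image = [[0] * 8 + [1] * 8 for _ in range(16)]

# Hypothetical 3x3 "observer" that reacts to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

edges = convolve2d(image, vertical_edge)  # 14x14 map, nonzero only at the edge
pooled_once = max_pool(image)             # 16x16 -> 8x8
pooled_twice = max_pool(pooled_once)      # 8x8 -> 4x4

print(edges[0])
print(pooled_twice)
```

The edge map is zero everywhere except where the dark half meets the bright half, which is exactly what an "observer" focused on contours should report. After two rounds of pooling the image is only 4×4, yet it still shows a dark left side and a bright right side.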

Convolution and pooling allow convolutional neural networks to extract image information very effectively, greatly improving how efficiently they learn from and process images.

Of course, convolutional neural networks also use the same backpropagation algorithm as traditional neural networks. Working backward from known results, they keep adjusting the parameters in the network so that their judgments become more and more accurate.

So, how is AI reshaping entire industries? In the upcoming episodes, we will explore this together.
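To give a feel for the "adjusting parameters based on known results" idea behind backpropagation mentioned above, here is a deliberately tiny sketch. It is an assumption-laden toy: it has a single parameter instead of the millions in a real network, and the function name `train`, the example data, and the `learning_rate` value are all invented for illustration. Real backpropagation applies the same kind of correction through many layers at once.

```python
def train(examples, learning_rate=0.05, epochs=200):
    """Fit one weight so that weight * x matches the known answers."""
    weight = 0.0  # start from a poor guess
    for _ in range(epochs):
        for x, target in examples:
            prediction = weight * x       # forward: make a guess
            error = prediction - target   # compare with the known result
            gradient = 2 * error * x      # slope of error**2 w.r.t. weight
            weight -= learning_rate * gradient  # nudge against the error
    return weight


# Known results: the correct answer is always twice the input.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

learned_weight = train(examples)
print(learned_weight)
```

Each pass shrinks the gap between the guess and the known answer, so the weight settles very close to 2, the value that makes every example come out right.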

Planning and Production

This article is a work of the Popular Science China – Creation Cultivation Program

Produced by: Science Popularization Department of the China Association for Science and Technology

Supervised by: China Science and Technology Publishing House Co., Ltd., Beijing Zhongke Xinghe Cultural Media Co., Ltd.

Author: Beijing Yunyuj Cultural Communication Co., Ltd.

Reviewed by: Qin Zengchang, Associate Professor, School of Automation Science and Electrical Engineering, Beihang University

Planning: Fu Sijia

Editor: Fu Sijia

The cover image and images in this article are from copyright image libraries.

Reproducing may lead to copyright disputes. For original text and images, please reply “Reprint” in the background.
