Professor Zhang Changshui from Tsinghua University: Image Recognition and Machine Learning

On June 6, 2016, the “Tsinghua Artificial Intelligence Forum”, co-hosted by Tsinghua Straits Research Institute, Beijing Tsinghua Industrial Development Research Institute, and Data Science Research Institute, was held at Tsinghua University. The forum brought together leading artificial intelligence experts from Tsinghua and alumni representatives from industry to exchange ideas and assess the development of artificial intelligence. On June 8, Professor Zhang Changshui from the Department of Automation at Tsinghua University delivered a speech at Tsinghua University. The content of the speech is summarized as follows:

Tsinghua University Department of Automation

Professor Zhang Changshui: Image Recognition and Machine Learning


Image recognition is a core topic in the field of artificial intelligence, and from a research perspective, machine learning is also a research direction within artificial intelligence.

The Concept and Application of Image Recognition

Image recognition refers to the technology that uses computers to process, analyze, and understand images in order to identify various patterns and objects. This research can be applied in many fields, such as autonomous vehicles. If a car has a driver-assistance system with a camera, it can recognize the elements of the scene, including lane lines, traffic signs, and obstacles, making driving easier and more convenient.

In addition, some cameras detect the location of a person’s face when the user half-presses the shutter, allowing the camera to focus on it and make the image clearer.

Challenges in Image Recognition

One challenge is the variability in viewpoints. The same object can look very different in a picture depending on the angle from which it is photographed.

Another challenge is the scale issue. Objects appear larger when close and smaller when far away, which poses certain difficulties for image recognition.

The variation of light and shadow has always been a major concern in computer vision; the same person can look completely different under different lighting conditions.

Complex backgrounds are another challenge: finding a person with a cane or someone wearing a hat against a cluttered background is very difficult.

Moreover, occlusion is a significant difficulty in computer vision. For instance, in a crowded image, determining the gender of a person who is occluded among the crowd is still a challenge for computers.

Deformation is another issue, as non-rigid bodies can deform during motion.

The Development History of Image Recognition

Research on object recognition in the vision field has a history of several decades, but until a few years ago, only a handful of influential image recognition products existed, such as OCR, fingerprint recognition, and face detection.

Image recognition initially started with single object recognition. Our objective world is complex and diverse, so how can it be recognized? Scientific research generally starts with simple problems, for example, the recognition of building blocks, which have a limited number of standardized shapes.

From the late 1980s to the 1990s, machine learning developed rapidly, yielding remarkable research results, including support vector machines, AdaBoost, and computational learning theory. These advances significantly propelled the development of machine learning and recognition.

After 2002, the Chinese-American scientist Li Fei-Fei and her team began to approach image recognition from a new perspective. They aimed to design a unified framework for image recognition capable of identifying thousands of objects. The team also hoped to apply outstanding results from the field of machine learning to image recognition, borrowing the “bag of words” method from text analysis. In natural language processing, one task is text classification, which employs the “bag of words” method: a document is represented by the words it contains and how often they occur, ignoring word order. By analogy, to recognize a face, one only needs to check whether the image contains a nose, eyes, a mouth, and a chin; if all these elements are present, the image is taken to show a face. You might think this is simple.
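To make the text-classification side of the “bag of words” idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used by any team mentioned in the speech: the two categories, the reference sentences, and the overlap score are all invented for this example.

```python
from collections import Counter

def bag_of_words(text):
    """Return word counts, ignoring word order, case, and punctuation."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    return Counter(w for w in words if w)

def overlap(a, b):
    """Number of word occurrences shared by two bags."""
    return sum((a & b).values())

# Hypothetical toy reference documents, one per category (invented data).
references = {
    "sports": bag_of_words("the team won the match and the fans cheered"),
    "finance": bag_of_words("the bank raised the interest rate on loans"),
}

def classify(text):
    """Pick the category whose reference bag shares the most words."""
    bag = bag_of_words(text)
    return max(references, key=lambda label: overlap(bag, references[label]))

print(classify("The fans cheered when the team scored"))  # prints "sports"
```

A real classifier would learn from many documents per category; the point here is only that the representation discards word order and keeps counts.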

How can the “bag of words” method be applied to image recognition? When recognizing an image, the high-frequency “words” within it can be used to identify it. In practice, these “words” are not as intuitive as a nose or an eye; they are small, very low-level image patches, typically sized 3×3, 5×5, or 7×7.
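The patch-based “words” described above can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline from the speech: the three-entry codebook and the toy image are hypothetical, and in practice a codebook is usually learned from many patches, for example by clustering, rather than written by hand.

```python
from collections import Counter

def extract_patches(image, size=3):
    """Slide a size x size window over a 2-D grayscale image (list of rows)."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            patch = tuple(image[r + i][c + j]
                          for i in range(size) for j in range(size))
            patches.append(patch)
    return patches

def nearest_word(patch, codebook):
    """Index of the codebook entry closest to the patch (squared distance)."""
    def dist(word):
        return sum((p - w) ** 2 for p, w in zip(patch, word))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))

def visual_word_histogram(image, codebook, size=3):
    """Count how often each visual word occurs in the image."""
    counts = Counter(nearest_word(p, codebook)
                     for p in extract_patches(image, size))
    return [counts[k] for k in range(len(codebook))]

# Hypothetical hand-written codebook of 3x3 "words" (row-major order):
codebook = [
    (0,) * 9,            # word 0: flat dark patch
    (255,) * 9,          # word 1: flat bright patch
    (0, 128, 255) * 3,   # word 2: dark-to-bright ramp (a vertical edge)
]

# Toy 4x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 255, 255] for _ in range(4)]

print(visual_word_histogram(image, codebook))  # prints [2, 0, 4]
```

The resulting histogram plays the same role as word counts in a text document: it can be passed to a classifier without keeping track of where in the image each patch occurred.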

In 2006, Hinton published an article in Science introducing deep learning methods. Someone later suggested that Hinton test the method on object recognition problems, and in 2012 his team took first place in a competition with a recognition rate of 85%.

Challenges and Future Research Directions

Although image recognition has made great strides, it still faces certain challenges. For instance, when performing image recognition, people usually need to annotate data and then train the machine with these images. Annotating data can be a tedious task that requires significant time and financial resources. Each object in the database must be enclosed in a box and assigned a category label.
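As a rough picture of what one annotated image looks like, the following Python sketch stores each object as a bounding box plus a category label and checks the boxes for obvious mistakes. The class names, field names, file name, and coordinates are all hypothetical and chosen for illustration; real datasets define their own annotation formats.

```python
from dataclasses import dataclass, field

@dataclass
class BoxAnnotation:
    """One labeled object: a category plus a pixel bounding box."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

@dataclass
class AnnotatedImage:
    """An image together with every boxed object an annotator marked."""
    filename: str
    width: int
    height: int
    objects: list = field(default_factory=list)

    def problems(self):
        """Return labels of boxes that are empty or leave the image."""
        issues = []
        for obj in self.objects:
            inside = (0 <= obj.x_min and 0 <= obj.y_min
                      and obj.x_max <= self.width
                      and obj.y_max <= self.height)
            non_empty = obj.x_min < obj.x_max and obj.y_min < obj.y_max
            if not (inside and non_empty):
                issues.append(obj.label)
        return issues

# Hypothetical record; the file name and coordinates are invented.
sample = AnnotatedImage(
    filename="park.jpg", width=640, height=480,
    objects=[
        BoxAnnotation("bird", 100, 50, 180, 120),
        BoxAnnotation("tree", 300, 0, 700, 480),  # x_max exceeds image width
    ],
)

print(sample.problems())  # prints ['tree']
```

Even this small record hints at the cost: every object in every image needs a box drawn by hand and a label chosen, and mistakes such as the out-of-range box above have to be caught and corrected.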

Currently, many problems remain unresolved. Existing technologies only analyze images to identify which parts are birds and which are trees, but they do not provide a deeper understanding of the image. The algorithm does not comprehend the relationships between these objects, which is crucial for human understanding of an image.

Although this talk has focused on image recognition, many of the underlying methods are closely tied to machine learning. Transforming research results into products that make life smarter requires collaboration between computer vision scientists and practitioners.

(This article is based on Professor Zhang Changshui’s speech at the “Tsinghua Artificial Intelligence Forum” and has not been reviewed by the speaker.)
