According to Amazon’s official statement, Amazon Go is a product of technological innovation: a cashier-less convenience store that combines computer vision, deep learning algorithms, radio-frequency identification (RFID), image analysis, and sensor fusion, on principles similar to those of self-driving cars.
Now, let’s introduce the relationship between computer vision, image processing, pattern recognition, and machine learning.
Computer vision depends on image processing; image processing in turn relies on the effective use of pattern recognition; and pattern recognition is an important branch of artificial intelligence that is closely linked with machine learning. Taken together, the application of computer vision serves machine learning, and every link in the chain is indispensable and reinforces the others.
Computer Vision
Computer vision: the use of computers to simulate the human visual mechanism in order to acquire and process information. In practice, it means using cameras and computers in place of human eyes to recognize, track, and measure targets, and then processing the resulting images further so that they are better suited to human observation or to detection by instruments.
Computer vision studies related theories and technologies, attempting to establish artificial intelligence systems capable of extracting ‘information’ from images or multi-dimensional data. The challenge of computer vision is to develop visual capabilities for computers and robots that are comparable to human levels.
Machine vision requires image signals, texture and color modeling, geometric processing and reasoning, as well as object modeling. A capable visual system should tightly integrate all these processes.
Image Processing
Image processing: a technology that uses computers to analyze images in order to achieve a desired result.
Image processing generally refers to digital image processing. A digital image is a large two-dimensional array obtained through sampling and digitization using devices such as digital cameras and scanners, with the elements of this array called pixels, whose values are integers referred to as grayscale values.
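To make the idea of sampling and digitization concrete, here is a minimal sketch in plain Python. It assumes a hypothetical continuous "scene" function that returns brightness in the range 0.0 to 1.0; the function names and the 256-level quantization are illustrative choices, not something the text prescribes.

```python
def sample_and_quantize(scene, width, height, levels=256):
    """Sample a continuous scene on a width x height grid and
    quantize each sample to an integer grayscale value."""
    image = []
    for row in range(height):
        y = (row + 0.5) / height          # vertical centre of the pixel
        line = []
        for col in range(width):
            x = (col + 0.5) / width       # horizontal centre of the pixel
            brightness = scene(x, y)      # continuous value in [0.0, 1.0]
            gray = min(levels - 1, int(brightness * levels))
            line.append(gray)
        image.append(line)
    return image


def horizontal_ramp(x, y):
    """Hypothetical scene: brightness grows from left to right."""
    return x


# A tiny 4 x 2 "digital image": a two-dimensional array of integer pixels.
image = sample_and_quantize(horizontal_ramp, width=4, height=2)
print(image)  # [[32, 96, 160, 224], [32, 96, 160, 224]]
```

Each inner list is one row of pixels, and each integer is a grayscale value. A real camera or scanner does the same thing in hardware at far higher resolution.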
The main contents of image processing technology fall into three parts: image compression; enhancement and restoration; and matching, description, and recognition. Common operations include image digitization, image encoding, image enhancement, image restoration, image segmentation, and image analysis.
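As one small illustration of image enhancement, the sketch below applies linear contrast stretching to a grayscale image stored as a list of rows. This is only one of many enhancement techniques, chosen here for brevity; the function name and the sample values are assumptions made for the example.

```python
def stretch_contrast(image, out_min=0, out_max=255):
    """Linearly rescale grayscale values so they span [out_min, out_max]."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch.
        return [row[:] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (v - lo) * scale) for v in row] for row in image]


# A low-contrast image whose values occupy only the range 100..150.
dull = [[100, 110], [120, 150]]
enhanced = stretch_contrast(dull)
print(enhanced)  # [[0, 51], [102, 255]]
```

After stretching, the darkest pixel maps to 0 and the brightest to 255, so the differences between pixels are easier to see.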
Pattern Recognition
Pattern recognition is the process of processing and analyzing the various forms of information (numerical, textual, and logical) that represent things or phenomena, in order to describe, identify, classify, and explain them. It is an important component of information science and artificial intelligence.
Pattern recognition is often called pattern classification. In terms of the kind of problem handled and the method used to solve it, pattern recognition can be divided into supervised classification and unsupervised classification. Patterns themselves can be divided into abstract and concrete forms. Abstract patterns, such as consciousness, thoughts, and arguments, belong to the study of conceptual recognition, which is another branch of artificial intelligence. The pattern recognition discussed here mainly concerns recognizing and classifying concrete patterns such as speech waveforms, seismic waves, electrocardiograms, electroencephalograms, images, photographs, texts, symbols, and biosensors.
Pattern recognition research focuses on two aspects. One is how living beings (including humans) perceive objects, which belongs to cognitive science. The other is how to implement the theory and methods of pattern recognition on computers for a given task.
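The distinction between supervised and unsupervised classification mentioned above can be sketched on a toy one-dimensional feature. The example assumes a nearest-mean classifier for the supervised case and a two-cluster k-means for the unsupervised case. Both are simple stand-ins chosen for illustration, and the labels and data are made up.

```python
def train_nearest_mean(samples):
    """Supervised: learn one mean per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for feature, label in samples:
        sums[label] = sums.get(label, 0.0) + feature
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def classify(means, feature):
    """Assign the label whose class mean is closest to the feature."""
    return min(means, key=lambda label: abs(feature - means[label]))


def two_means(features, iterations=10):
    """Unsupervised: split unlabeled features into two clusters."""
    centers = [min(features), max(features)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for f in features:
            nearest = 0 if abs(f - centers[0]) <= abs(f - centers[1]) else 1
            groups[nearest].append(f)
        # Keep the old centre if a cluster happens to be empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Supervised: the labels are given with the training data.
labeled = [(1.0, "dark"), (2.0, "dark"), (8.0, "bright"), (9.0, "bright")]
means = train_nearest_mean(labeled)
print(classify(means, 3.0))  # dark

# Unsupervised: only the features are given; the groups are discovered.
centers, groups = two_means([1.0, 2.0, 8.0, 9.0])
print(centers)  # [1.5, 8.5]
```

In the supervised case the class names come from the training data. In the unsupervised case the algorithm only finds the groups; naming them is left to a person.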
Put differently, pattern recognition uses computers to identify and classify a set of events or processes. These may be concrete objects such as text, sound, and images, or abstract objects such as states and degrees. To distinguish such objects from purely numerical information, they are referred to as pattern information.
Pattern recognition is related to statistics, psychology, linguistics, computer science, biology, control theory, etc. It intersects with research in artificial intelligence and image processing.
Machine Learning
Machine learning is the study of how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. It is the core of artificial intelligence and the fundamental way to give computers intelligence. Its applications span every field of artificial intelligence, and it relies mainly on induction and synthesis rather than deduction.
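The idea of improving performance through induction from examples can be illustrated with a very small sketch. It assumes a one-parameter model y = w * x fitted by gradient descent on mean squared error. The data, learning rate, and step count are arbitrary illustrative choices, and this is only one of many learning methods.

```python
def mean_squared_error(w, xs, ys):
    """Average squared gap between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learn_slope(xs, ys, learning_rate=0.05, steps=20):
    """Induce the slope w from examples by repeated small corrections."""
    w = 0.0
    losses = [mean_squared_error(w, xs, ys)]
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
        losses.append(mean_squared_error(w, xs, ys))
    return w, losses


# Examples generated by the rule y = 2 * x, which the learner is not told.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w, losses = learn_slope(xs, ys)
print(round(w, 3))             # close to 2.0
print(losses[0] > losses[-1])  # True: the error shrank with experience
```

The program is never given the rule itself. It generalizes from specific examples, and its error falls as it processes them, which is the sense of "improving performance" used above.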
Machine learning holds a very important position in artificial intelligence research. A system without the ability to learn can hardly be considered truly intelligent, yet earlier intelligent systems generally lacked this ability. As artificial intelligence has developed, this limitation has become increasingly prominent. Against this background, machine learning has gradually become one of the cores of artificial intelligence research. Its applications have spread across all branches of artificial intelligence, such as expert systems, automated reasoning, natural language understanding, pattern recognition, computer vision, and intelligent robotics.
Machine learning research builds on what physiology, cognitive science, and related fields tell us about human learning mechanisms. Its goals are to establish computational or cognitive models of the human learning process, to develop learning theories and methods, to study general learning algorithms and analyze them theoretically, and to build task-oriented learning systems for specific applications. These research goals influence and promote one another.
The purpose of human research on computers is to improve social productivity levels, enhance quality of life, and rescue people from monotonous, complex, or even dangerous work. Today’s computers have far exceeded humans in computational speed; however, in many aspects, especially those related to human intelligent activities such as visual functions, auditory functions, olfactory functions, and natural language understanding, they are still inferior to humans.
This situation fails to meet the requirements of some advanced applications. For example, we hope computers can detect suspicious situations on the road early and alert drivers in time to avoid accidents, and we hope they can assist us with autonomous driving. Current technology is still insufficient for such applications, which will require more research results in artificial intelligence and more practical experience in system implementation.
Artificial Intelligence
Artificial intelligence refers to technologies designed by humans to simulate or reproduce certain intelligent behaviors on computers. It is generally believed that human intelligent activities fall into two categories: perceptual behaviors and cognitive activities. Research that simulates perceptual behaviors includes "computer hearing", which covers speech recognition, speaker identification, and other functions related to human hearing, and "computer vision", which covers the perception of shape, distance, and speed related to human vision. Research that simulates cognitive activities includes "computer thinking", which covers symbolic reasoning, fuzzy reasoning, theorem proving, and other functions related to human thought.
Computer vision grew out of image processing and pattern recognition, and one of its research topics is how to recover the three-dimensional world from two-dimensional projection images. Its theoretical methods are based mainly on geometry, probability, motion analysis, and three-dimensional reconstruction, with foundations that include projective geometry, rigid-body mechanics, probability theory, stochastic processes, image processing, and artificial intelligence.
The fundamental goals of computer vision include the following:
(1) Calculate the distance from the observation point to the target object based on one or more two-dimensional projection images;
(2) Calculate the motion parameters of the target object based on one or more two-dimensional projection images;
(3) Calculate the surface physical properties of the target object based on one or more two-dimensional projection images;
(4) Recover projection images of larger spatial areas based on multiple two-dimensional projection images.
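Goal (1) can be illustrated with the textbook case of two parallel pinhole cameras. For a point seen in both images, depth equals focal length times baseline divided by disparity. The sketch below assumes an idealized, rectified stereo pair with no lens distortion. The camera parameters and function names are hypothetical values chosen for the example.

```python
def project_x(focal_length_px, point_x_m, point_z_m, camera_x_m=0.0):
    """Pinhole projection of a 3-D point onto the image x-axis (in pixels)."""
    return focal_length_px * (point_x_m - camera_x_m) / point_z_m


def depth_from_disparity(focal_length_px, baseline_m, x_left_px, x_right_px):
    """Recover distance from the horizontal shift between two images."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity


FOCAL_PX = 700.0   # assumed focal length, in pixels
BASELINE_M = 0.1   # assumed distance between the two cameras, in metres

# A point 2 m in front of the cameras, 0.5 m to the right of the left camera.
x_left = project_x(FOCAL_PX, 0.5, 2.0, camera_x_m=0.0)
x_right = project_x(FOCAL_PX, 0.5, 2.0, camera_x_m=BASELINE_M)

# Using only the two image coordinates, recover the distance.
z = depth_from_disparity(FOCAL_PX, BASELINE_M, x_left, x_right)
print(round(z, 6))  # 2.0
```

The depth coordinate is thrown away by each projection and recovered from the pair of images. Real systems also have to find which pixels in the two images correspond to each other, which is usually the harder part.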
The ultimate goal of computer vision is to achieve the understanding of the three-dimensional world using computers, that is, to realize certain functions of the human visual system.
Within computer vision, applications such as medical image analysis and optical character recognition demand a high level of pattern recognition. The fields also borrow from one another: the preprocessing and feature extraction stages of pattern recognition apply image processing techniques, and image analysis within image processing applies pattern recognition techniques. In most practical applications of computer vision, the computer is set up in advance to solve one specific task, but methods based on machine learning are becoming increasingly popular. As machine learning research develops further, "general-purpose" computer vision applications may become a reality.
A major question in artificial intelligence is how to give systems "planning" and "decision-making" capabilities so that they can carry out specific actions, for example moving a robot through a particular environment. This question is closely related to computer vision: here the computer vision system serves as the perceiver that provides information for decision-making. Other research directions include pattern recognition and machine learning, which belong to artificial intelligence but also have important connections with computer vision. For these reasons, computer vision is often regarded as a branch of artificial intelligence and computer science.
To achieve the goals of computer vision, two technical approaches can be considered. The first is the bionic method, which starts from an analysis of human vision. It uses the best reference that nature provides, the human visual system, to establish a computational model of the visual process, and then implements that model on a computer system. The second is the engineering method, which breaks free from the constraints of the human visual system and uses any feasible and practical technical means to achieve visual functions. In practice, this method treats the human visual system as a black box and cares only about what output the visual system produces for a given input.
Both methods are usable in theory, but they share a difficulty: the output that the human visual system produces for a given input cannot be measured directly. Moreover, because human intelligent activity results from the combined action of many functional systems, even when an input-output pair is obtained, it is hard to confirm that the output is a response to the current visual stimulus alone rather than to that stimulus combined with past states.
It is not difficult to understand that the research in computer vision has dual significance.
First, it is to meet the needs of artificial intelligence applications, that is, the need to implement an artificial visual system with computers. These results can be installed on computers and various machines, enabling computers and robots to have the ability to “see”.
Second, the research outcomes of visual computational models also provide significant reference for further understanding and studying the mechanisms of the human visual system itself, and even the mechanisms of the human brain.