Vision

(Physiological Term)

Vision is a physiological term. Light acts on the visual organs and excites their sensory cells; the visual nervous system then processes this information, and the result is vision. Through vision, humans and animals perceive the size, brightness, color, and motion or stillness of external objects, obtaining a wide range of information crucial for survival; at least 80% of the information received from the outside world is obtained through vision. Vision is therefore the most important sense for humans and many animals. It refers to the sensation produced when the images of objects stimulate the retina.

Vision is the subjective sensation that arises when the peripheral sense organs of the visual system (the eyes) receive electromagnetic stimuli within a certain wavelength range from the environment, and the central nervous system encodes, processes, and analyzes them.

The human eye can be divided into two parts: the retina, which contains the photoreceptor cells (rods and cones), and the refractive system (cornea, aqueous humor, lens, and vitreous body). The adequate stimulus is electromagnetic radiation with wavelengths of roughly 370 to 740 nanometers, i.e., visible light, within which about 150 colors can be distinguished. Light in this range passes through the refractive system and forms an image on the retina; the resulting signal travels along the optic nerve to the visual center of the brain, which distinguishes the color and brightness of the observed objects. This is how one can clearly see the contours, shapes, sizes, colors, distances, and surface details of luminous or reflective objects within the field of view. Notably, experiments on visual illusions suggest that what people see is also influenced by what they want to see.

The process of vision formation: Light → Cornea → Pupil → Lens (refracts light) → Vitreous body (supports and fixes the eyeball) → Retina (forms the image) → Optic nerve (conducts visual information) → Visual center of the brain (forms vision).

Computer Vision

Computer vision is a field that studies how to make machines “see.” More specifically, it uses cameras and computers in place of the human eye to recognize, track, and measure targets, and further processes the images into forms better suited to human observation or to transmission to inspection instruments. As a scientific discipline, computer vision studies the related theories and technologies, aiming to build artificial intelligence systems that can extract “information” from images or multidimensional data. Here, “information” is meant in Shannon's sense: information that can help make a “decision.” Since perception can be viewed as extracting information from sensory signals, computer vision can also be seen as the study of how to make artificial systems “perceive” from images or multidimensional data.

Computer vision uses computers and related devices to simulate biological vision. Its main task is to process captured images or videos to obtain three-dimensional information about the corresponding scene, as humans and many other organisms do every day.

Computer vision studies how to use cameras and computers to acquire the data and information we need about the objects being photographed. Figuratively, it equips computers with eyes (cameras) and a brain (algorithms) so that they can perceive their environment. The Chinese idiom “seeing once is better than hearing a hundred times” and the Western saying “a picture is worth a thousand words” both express how important vision is to humans. Computer vision is a challenging and important research field in both engineering and science. It is an interdisciplinary subject that has attracted researchers from many disciplines, including computer science, signal processing, physics, applied mathematics and statistics, neurophysiology, and cognitive science.

Principles of Computer Vision

Computer vision uses various imaging systems in place of the visual organs as input sensors, and computers in place of the brain to carry out processing and interpretation. The ultimate goal of computer vision research is to enable computers to observe and understand the world through vision as humans do, with the ability to adapt autonomously to their environment. Reaching this goal will take long-term effort. Until then, the intermediate goal is to build vision systems that can accomplish specific tasks using a certain degree of visual sensing and feedback-based intelligence. For example, visual navigation for autonomous vehicles is an important application area of computer vision, yet such systems still cannot recognize and understand arbitrary environments and navigate autonomously as humans do. The nearer-term research goal is therefore vision-assisted driving systems that can follow the road on highways and avoid colliding with the vehicle ahead.

Note that although computers take the place of the human brain in computer vision systems, this does not mean they must process visual information the way human vision does; computer vision can and should process visual information according to the characteristics of computer systems. Still, the human visual system is the most powerful and complete visual system known so far, and research on human visual processing mechanisms inspires and guides computer vision research. Studying the mechanisms of human vision with computational information-processing methods, and establishing computational theories of human vision, is therefore also an important and interesting research area, called computational vision. Computational vision can be considered a research field within computer vision.

Many disciplines have research goals similar or related to those of computer vision, including image processing, pattern recognition or image recognition, scene analysis, and image understanding. Computer vision encompasses image processing and pattern recognition, and additionally covers spatial shape description, geometric modeling, and recognition processes. Achieving image understanding is the ultimate goal of computer vision.

Current Research Status

The most striking characteristics of the computer vision field are its diversity and incompleteness. Although its precursors date back further, computer vision did not receive formal attention and development until the late 1970s, when computers became powerful enough to handle large-scale data such as images. These developments, however, often arose from the needs of different fields, so what constitutes a “computer vision problem” has never been formally defined, and naturally there is no established recipe for solving such problems.

Nevertheless, methods have been developed for solving certain specific computer vision tasks. Unfortunately, these methods usually apply only to a narrow class of targets (such as faces, fingerprints, or text) and are difficult to generalize to other situations.

These methods are often applied as part of a large system that solves a complex problem (e.g., medical image processing, or quality control and measurement in industrial manufacturing). In most practical applications, the computer is preprogrammed to solve a specific task, but methods based on machine learning are becoming increasingly common. As machine learning research advances further, “general-purpose” computer vision applications may become a reality.

One of the main problems studied in artificial intelligence is how to give a system “planning” and “decision-making” capabilities so that it can carry out specific technical actions (e.g., moving a robot through a particular environment). This problem is closely related to computer vision: here the computer vision system acts as a sensor that supplies information for decision-making. Other related research directions include pattern recognition and machine learning (which also belong to artificial intelligence but are closely connected with computer vision). For these reasons, computer vision is often regarded as a branch of artificial intelligence and computer science.

Physics is another field with important connections to computer vision.

Computer vision depends on understanding the electromagnetic waves, mainly visible and infrared light, reflected from object surfaces, and this understanding rests on optics and solid-state physics. Some advanced image sensing systems even draw on quantum mechanics to analyze images of the real world. Conversely, many measurement problems in physics, such as measuring fluid motion, can be solved with computer vision. Computer vision can thus also be regarded as an extension of physics.

Another important related field is neurobiology, particularly the study of biological visual systems.

Throughout the 20th century, extensive research was conducted on the eyes, neurons, and brain structures that process visual stimuli in various animals, yielding descriptions (still somewhat rough) of how “natural” visual systems work. This in turn gave rise to a subfield of computer vision that attempts to build artificial systems that mimic biological vision at different levels of complexity. In addition, some machine-learning-based methods in computer vision draw on biological mechanisms.

Another field related to computer vision is signal processing. Many methods for processing one-variable signals, typically time-varying signals, extend naturally to the two-variable or multivariable signals handled in computer vision. Because image data have unique properties, however, many methods developed in computer vision have no counterpart in one-variable signal processing. The main features of these methods are their nonlinearity and the multidimensionality of image information; together, these make computer vision a distinctive research direction within signal processing.
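To illustrate how a one-dimensional method can carry over to images, the following Python sketch applies a 1D smoothing kernel along every row and then along every column, which is the separable form of 2D filtering. The kernel `[0.25, 0.5, 0.25]`, the edge-replication policy, and all function names are illustrative assumptions, not methods prescribed by the text.

```python
from typing import List

Signal = List[float]
Image = List[List[float]]


def filter_1d(signal: Signal, kernel: Signal) -> Signal:
    """Slide a symmetric, odd-length kernel over a 1D signal.

    Border samples are replicated so the output has the same length.
    """
    r = len(kernel) // 2
    n = len(signal)
    out: Signal = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # clamp the index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def filter_2d_separable(image: Image, kernel: Signal) -> Image:
    """Extend the 1D filter to 2D: filter every row, then every column."""
    rows = [filter_1d(row, kernel) for row in image]
    cols = [filter_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


SMOOTH = [0.25, 0.5, 0.25]  # illustrative binomial smoothing kernel

impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
blurred = filter_2d_separable(impulse, SMOOTH)
```

Filtering an image containing a single bright pixel spreads it into a small blob whose values still sum to 1, the 2D analogue of smoothing an impulse in a time series. This linear example covers only the “natural extension” case; it does not represent the nonlinear, inherently multidimensional methods the paragraph also mentions.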

In addition to the fields mentioned above, many research topics can also be regarded as purely mathematical problems. For example, many problems in computer vision are theoretically based on statistics, optimization theory, and geometry.

The main issue in computer vision today is how to implement existing methods in various combinations of hardware and software, or how to modify them to reach acceptable execution speed without sacrificing too much accuracy.

Machine Vision

Machine vision is a rapidly developing branch of artificial intelligence. Simply put, machine vision uses machines in place of human eyes to make measurements and judgments. A machine vision system uses an image capture device (based on either a CMOS or a CCD sensor) to convert the target into an image signal, which is sent to a dedicated image processing system to obtain the target's morphological information. This information is converted into digital signals according to the pixel distribution, brightness, color, and so on; the image system then performs various computations on these signals to extract the target's features, and finally controls on-site equipment according to the result of the judgment.
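The processing chain described above (image signal, digitization, feature extraction, judgment, equipment control) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the sensor output is modeled as intensities in [0, 1], the only features are the area and centroid of bright pixels, and the threshold of 128, the area tolerance, and the `ACCEPT`/`REJECT` commands are hypothetical placeholders for whatever a real inspection station would use.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]
Centroid = Optional[Tuple[float, float]]


def digitize(intensities: List[List[float]], levels: int = 256) -> Gray:
    """Quantize sensor intensities in [0.0, 1.0] to integer gray levels."""
    return [[min(levels - 1, int(v * levels)) for v in row] for row in intensities]


def extract_features(gray: Gray, threshold: int) -> Tuple[int, Centroid]:
    """Area (pixel count) and centroid (row, col) of pixels brighter than threshold."""
    pts = [(r, c) for r, row in enumerate(gray) for c, v in enumerate(row) if v > threshold]
    if not pts:
        return 0, None
    area = len(pts)
    centroid = (sum(r for r, _ in pts) / area, sum(c for _, c in pts) / area)
    return area, centroid


def decide(area: int, min_area: int, max_area: int) -> str:
    """Turn the feature into a (hypothetical) command for on-site equipment."""
    return "ACCEPT" if min_area <= area <= max_area else "REJECT"


def inspect(intensities: List[List[float]]) -> Tuple[str, Centroid]:
    gray = digitize(intensities)
    area, centroid = extract_features(gray, threshold=128)  # assumed threshold
    return decide(area, min_area=3, max_area=6), centroid  # assumed tolerance


frame = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.9, 0.1],
    [0.1, 0.9, 0.9, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
result = inspect(frame)
```

In a real system the returned command would drive an actuator (for example a reject gate), and the centroid could be passed on to a robot controller; both interfaces are outside this sketch.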


The most fundamental characteristic of machine vision systems is that they increase production flexibility and the level of automation. In hazardous working environments unsuitable for manual work, or where human vision cannot meet the requirements, machine vision is often used in place of human vision. In large-scale, repetitive industrial production, manual visual inspection of product quality is inefficient and imprecise, whereas machine vision inspection can greatly improve production efficiency and the level of automation. Moreover, machine vision makes it easy to integrate information, which makes it a foundational technology for computer-integrated manufacturing.

Machine vision is an integrated technology that draws on image processing, mechanical engineering, control, lighting, optical imaging, sensors, analog and digital video technology, and computer hardware and software (image enhancement and analysis algorithms, image capture cards, I/O cards, etc.). A typical machine vision application system includes image capture, a lighting system, an image digitization module, a digital image processing module, an intelligent decision module, and a mechanical control and execution module.


A typical industrial machine vision system includes: lighting, lenses (fixed-focus, zoom, telecentric, and microscope lenses), cameras (CCD or CMOS), an image processing unit (or image capture card), image processing software, a monitor, and a communication/input-output unit.

The applications of machine vision mainly include detection and robot vision:

⒈ Detection: This divides into high-precision quantitative inspection (such as classifying cells in micrographs, or measuring the dimensions and positions of mechanical parts) and qualitative or semi-quantitative inspection that does not use measuring gauges (such as product appearance inspection, identification and positioning of components on assembly lines, defect detection, and assembly completeness checks).

⒉ Robot vision: Guides robots through operations and movements over a wide range, such as picking parts from a jumbled pile of workpieces discharged from a hopper and placing them on a conveyor belt or other equipment (the bin-picking problem). Operations and movements within a small range additionally require tactile sensing.
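For the high-precision quantitative case in item 1, such as measuring the dimensions of mechanical parts, a minimal sketch is to count the pixels a part occupies along one image row and convert them to millimetres with a calibration factor. It assumes a backlit part that appears dark against a bright background; the threshold, the 0.05 mm/pixel calibration factor, and the tolerance check are hypothetical values, and sub-pixel edge localization, which practical gauging systems often need, is omitted.

```python
from typing import Sequence


def measure_width_mm(scanline: Sequence[int], threshold: int, mm_per_pixel: float) -> float:
    """Width of a dark, backlit part along one image row, in millimetres."""
    inside = [i for i, v in enumerate(scanline) if v < threshold]
    if not inside:
        return 0.0
    pixels = inside[-1] - inside[0] + 1  # first to last dark pixel, inclusive
    return pixels * mm_per_pixel


def within_tolerance(measured: float, nominal: float, tol: float) -> bool:
    """Pass/fail decision used in dimensional inspection."""
    return abs(measured - nominal) <= tol


# Hypothetical calibration: 0.05 mm per pixel (e.g. from imaging a gauge of known size).
row = [250] * 3 + [20] * 40 + [250] * 3
width = measure_width_mm(row, threshold=128, mm_per_pixel=0.05)
```

Here the part spans 40 pixels, i.e. about 2.0 mm under the assumed calibration.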

Because machine vision systems can rapidly acquire large amounts of information, are easy to automate, and integrate readily with design information and process-control information, they are widely used in modern automated production for condition monitoring, finished-product inspection, and quality control. Machine vision technology is nonetheless complex, and its greatest difficulty is that the mechanisms of human vision are still not well understood. People can describe, through introspection, how they solve many kinds of problems, which allows a computer to simulate the process. Yet although every normal person is a “visual expert,” no one can describe their own visual process through introspection. Building a machine vision system is therefore a very challenging task.

Similarities and Differences Between Computer Vision and Machine Vision

Computer vision, image processing, image analysis, robot vision, and machine vision are closely related disciplines. If you open textbooks with these titles, you will find substantial overlap in techniques and application areas. This suggests that their foundational theories are largely the same, and even raises the question of whether they are simply different names for the same discipline.

However, research institutions, academic journals, conferences, and companies often classify themselves under one of these fields, so various criteria have been proposed to distinguish them. The following is one way of drawing the distinctions, though it is not entirely precise.

The main object of study in computer vision is the mapping of three-dimensional scenes onto one or more images, for example reconstructing a three-dimensional scene from images. Computer vision research is largely concerned with the content of images.
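The mapping from a three-dimensional scene to an image is commonly modeled with an ideal pinhole camera. The sketch below assumes points already expressed in camera coordinates, no lens distortion, and hypothetical intrinsics (focal length `f` and principal point `cx`, `cy`, all in pixels).

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(points: List[Point3D], f: float, cx: float, cy: float) -> List[Pixel]:
    """Ideal pinhole projection of camera-frame points (no lens distortion).

    u = f * X / Z + cx,  v = f * Y / Z + cy
    """
    pixels: List[Pixel] = []
    for x, y, z in points:
        if z <= 0:
            raise ValueError("point must be in front of the camera (Z > 0)")
        pixels.append((f * x / z + cx, f * y / z + cy))
    return pixels


# Assumed intrinsics: focal length 800 px, principal point (320, 240).
near, far = project([(1.0, 0.5, 2.0), (2.0, 1.0, 4.0)], f=800.0, cx=320.0, cy=240.0)
```

Both example points land on the same pixel, (720, 440), because one is simply twice as far along the same viewing ray. Depth is lost in the projection, which is why recovering three-dimensional structure from images generally needs more than one view or additional prior knowledge.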

Image processing and image analysis focus mainly on two-dimensional images and on transforming them, especially through pixel-level operations such as contrast enhancement, edge extraction, noise reduction, and geometric transformations like rotation. This indicates that neither image processing nor image analysis is concerned with the specific content of the images.
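The pixel-level operations listed above can be written without any knowledge of what the image depicts, which is the point the paragraph makes. The Python sketch below shows three of them on small integer gray-level images: linear contrast stretching, a 90-degree rotation, and a central-difference horizontal gradient as a very simple edge response. These are common textbook forms chosen for illustration; the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def stretch_contrast(img: Image, out_max: int = 255) -> Image:
    """Linearly map the darkest pixel to 0 and the brightest to out_max."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in img]
    return [[(v - lo) * out_max // (hi - lo) for v in row] for row in img]


def rotate90_clockwise(img: Image) -> Image:
    """Geometric transform: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def horizontal_gradient(img: Image) -> Image:
    """Central difference I[x+1] - I[x-1]; border columns are set to 0."""
    w = len(img[0])
    return [[0 if c in (0, w - 1) else row[c + 1] - row[c - 1] for c in range(w)] for row in img]
```

None of these functions inspects what the pixels represent; they would behave identically on a photo of a face or of a circuit board.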

Machine vision mainly refers to vision research in industrial settings, such as vision for autonomous robots and vision for inspection and measurement. In this field, software and hardware, image sensing, and control theory are therefore often tightly integrated with image processing to achieve efficient robot control or various real-time operations.

Pattern recognition uses various methods, mainly statistical ones, to extract information from signals. A major direction in this field is extracting information from image data.
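As an example of a basic statistical pattern-recognition method, the nearest-mean (minimum-distance) classifier below estimates the mean feature vector of each class and assigns a new vector to the class with the closest mean. The two-dimensional features and the class names are hypothetical; real systems typically use richer features and classifiers.

```python
from math import dist  # Python 3.8+
from typing import Dict, List, Sequence

FeatureVector = Sequence[float]


def fit_class_means(samples: Dict[str, List[FeatureVector]]) -> Dict[str, List[float]]:
    """Estimate the mean feature vector of each class from labelled samples."""
    means: Dict[str, List[float]] = {}
    for label, vectors in samples.items():
        n = len(vectors)
        dims = len(vectors[0])
        means[label] = [sum(v[d] for v in vectors) / n for d in range(dims)]
    return means


def classify(x: FeatureVector, means: Dict[str, List[float]]) -> str:
    """Assign x to the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: dist(x, means[label]))


# Hypothetical 2D features (e.g. elongation, area) for two part types.
training = {
    "bolt": [[1.0, 1.0], [1.0, 3.0]],
    "nut": [[5.0, 5.0], [7.0, 5.0]],
}
means = fit_class_means(training)
```

With these samples the class means are `[1.0, 2.0]` and `[6.0, 5.0]`, so a new vector `[2.0, 2.0]` is labelled “bolt”.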

Another related field is imaging. Its initial focus was on producing images, but it sometimes also involves image analysis and processing; medical imaging, for example, includes a great deal of image analysis.

Given all these fields, a plausible scenario is that you work in a computer vision lab, do image processing, end up solving a machine vision problem, and then publish your results at a pattern recognition conference.

Main Existing Problems

Almost every specific application of computer vision must solve the same series of problems. These classic problems include:

Recognition

A classic problem shared by computer vision, image processing, and machine vision is determining whether a set of image data contains a specific object, image feature, or motion state. Machines can usually solve this automatically for specific cases, but so far no single method can handle the general case of recognizing arbitrary objects in arbitrary environments. Existing techniques perform well only on specific targets, such as simple geometric shapes, faces, printed or handwritten documents, or vehicles, and even then only in constrained environments with specified lighting, background, and target pose.

In different contexts, the general notion of recognition has evolved into several slightly different concepts:

Recognition (in the narrow sense): identifying one or more predefined or learned objects or object classes, usually also providing their two-dimensional positions or three-dimensional poses.

Identification: recognizing an individual instance of an object, for example a particular person's face or a particular fingerprint.

Detection: finding specific content in images, for example finding abnormal cells or tissues in medical images, or detecting passing vehicles with traffic monitoring equipment. Detection often uses simple image processing to locate special regions in images, which serve as the starting point for subsequent, more complex operations.
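The last item describes locating special regions with simple image processing as a starting point for later steps. One common way to do this is thresholding followed by connected-component labeling; the sketch below returns a bounding box for each bright 4-connected region. The threshold, the `min_size` filter, and the box format are illustrative assumptions.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def detect_regions(img: List[List[int]], threshold: int, min_size: int = 1) -> List[Box]:
    """Threshold the image and return bounding boxes of 4-connected bright regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if img[r][c] <= threshold or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:  # drop specks smaller than min_size
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes


image = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 0, 0, 0, 9],
    [0, 0, 0, 0, 0],
]
```

With `threshold=5` the image yields two candidate regions; raising `min_size` to 2 discards the single-pixel speck, leaving one box for the later, more complex stages.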

Several specific application directions for recognition:

Content-based image retrieval: finding all images in a large collection that contain specified content. The content can be specified in various ways, such as a red, roughly circular pattern, or a bicycle. Searching for the latter is clearly harder than searching for the former: the former is a low-level visual feature, while the latter is an abstract concept (a high-level visual feature), namely “bicycle,” and the appearance of a bicycle is not fixed.

Pose estimation: estimating the position and orientation of an object relative to the camera, for example the pose and position of a robotic arm.

Optical character recognition: recognizing printed or handwritten text in images, usually converting it into an easily editable document format.
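For the low-level kind of query in content-based image retrieval (“something red”), a classic approach is to compare color histograms. The sketch below builds a coarse joint RGB histogram for each image and ranks a toy collection by histogram intersection with the query. The images are hypothetical flat pixel lists, and the choice of 4 bins per channel is arbitrary.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def color_histogram(pixels: List[RGB], bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel."""
    step = 256 // bins
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + b // step] += 1
    return [count / len(pixels) for count in hist]


def intersection(h1: List[float], h2: List[float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query: List[RGB], collection: Dict[str, List[RGB]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs, most similar first."""
    q = color_histogram(query)
    scores = [(name, intersection(q, color_histogram(px))) for name, px in collection.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy "images" given as flat pixel lists (hypothetical data).
collection = {
    "sunset": [(240, 20, 20)] * 8 + [(20, 20, 240)] * 2,
    "sea": [(20, 20, 240)] * 10,
    "forest": [(20, 200, 20)] * 10,
}
results = rank_by_color([(250, 10, 10)] * 10, collection)
```

The “sunset” image ranks first with a score of 0.8 because 80% of its pixels fall into the same red bin as the query. A histogram says nothing about shape, so it cannot express a high-level concept such as “bicycle”, which is why that kind of query is much harder.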

Motion

Estimating motion from an image sequence takes several forms, for example:

Egomotion: determining the three-dimensional rigid motion of the camera.

Tracking: following the movement of objects through the image sequence.
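Tracking moving objects can be illustrated with the simplest form of template matching: in each new frame, search a small window around the object's previous position for the patch that minimizes the sum of squared differences (SSD) with a stored template. This sketch assumes integer gray-level frames, a fixed template, and a hypothetical search `radius`; it does not handle changes in appearance, scale, or occlusion.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def ssd(frame: Frame, template: Frame, top: int, left: int) -> int:
    """Sum of squared differences between the template and a frame patch."""
    return sum(
        (frame[top + i][left + j] - template[i][j]) ** 2
        for i in range(len(template))
        for j in range(len(template[0]))
    )


def track(frame: Frame, template: Frame, prev: Tuple[int, int], radius: int) -> Tuple[int, int]:
    """Search a window around the previous top-left position for the best SSD match."""
    th, tw = len(template), len(template[0])
    h, w = len(frame), len(frame[0])
    best_score: Optional[int] = None
    best_pos = prev
    for top in range(max(0, prev[0] - radius), min(h - th, prev[0] + radius) + 1):
        for left in range(max(0, prev[1] - radius), min(w - tw, prev[1] + radius) + 1):
            score = ssd(frame, template, top, left)
            if best_score is None or score < best_score:
                best_score, best_pos = score, (top, left)
    return best_pos


# Next frame: a 2x2 bright object that has moved to top-left corner (3, 2).
next_frame = [[0] * 6 for _ in range(6)]
for y in (3, 4):
    for x in (2, 3):
        next_frame[y][x] = 9
template = [[9, 9], [9, 9]]
```

Calling `track(next_frame, template, prev=(2, 1), radius=2)` finds the object at its new position `(3, 2)`; repeating this frame after frame yields a trajectory.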

Scene Reconstruction

Given two or more images of a scene, or a video, scene reconstruction seeks to build a computer model (a three-dimensional model) of that scene. In the simplest case the model is a set of points in three-dimensional space; in more complex cases a complete three-dimensional surface model is built.
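For the simplest case, a set of three-dimensional points, a common setup is a rectified stereo pair, where depth follows directly from disparity. The sketch assumes the pixel correspondences have already been found and uses hypothetical calibration values (focal length in pixels, baseline in metres, principal point); the matching step itself, usually the hard part in practice, is not shown.

```python
from typing import List, Tuple

Pixel = Tuple[float, float]
Point3D = Tuple[float, float, float]


def triangulate_rectified(
    matches: List[Tuple[Pixel, Pixel]], f: float, baseline: float, cx: float, cy: float
) -> List[Point3D]:
    """Recover 3D points (left-camera frame) from matched pixels in a rectified stereo pair.

    Depth follows from the disparity d = u_left - u_right:  Z = f * B / d,
    then X = (u - cx) * Z / f and Y = (v - cy) * Z / f.
    """
    points: List[Point3D] = []
    for (ul, vl), (ur, _vr) in matches:
        d = ul - ur
        if d <= 0:  # zero or negative disparity: no valid finite depth
            continue
        z = f * baseline / d
        points.append(((ul - cx) * z / f, (vl - cy) * z / f, z))
    return points


# Assumed calibration: f = 800 px, baseline 0.1 m, principal point (320, 240).
cloud = triangulate_rectified(
    [((720.0, 440.0), (680.0, 440.0)), ((330.0, 240.0), (330.0, 240.0))],
    f=800.0, baseline=0.1, cx=320.0, cy=240.0,
)
```

The first match has a disparity of 40 pixels and is placed at roughly (1.0, 0.5, 2.0) metres; the second has zero disparity and is skipped. The result is exactly the “set of points in three-dimensional space” the paragraph mentions; fitting a surface to such points is the more complex case.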

Image Restoration

The goal of image restoration is to remove noise and other degradations from images, such as instrument noise and blur.
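The text does not prescribe a restoration method; as one standard example for impulse (“salt-and-pepper”) instrument noise, the sketch below applies a 3x3 median filter. Border pixels are copied unchanged for brevity. Removing blur is a different problem, usually handled by deconvolution, and is not covered here.

```python
from statistics import median
from typing import List

Image = List[List[int]]


def median_filter_3x3(img: Image) -> Image:
    """Replace each interior pixel by the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged to keep the sketch short.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)  # 9 values, so the median is one of them
    return out


noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255  # isolated "salt" pixel
clean = median_filter_3x3(noisy)
```

Unlike a mean filter, which would smear the outlier into its neighbours, the median discards it entirely, so the isolated 255 disappears and every pixel returns to 10.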

[This article is dedicated to David Marr, the pioneer of computer vision, a graduate of Cambridge University]

What Is Vision? Computer Vision? Machine Vision?