
Definition: A technology that simulates biological vision using computers and related devices. It processes captured images or videos to achieve a multidimensional understanding of the corresponding scenes.
Discipline: Computer Science Technology_Artificial Intelligence_Pattern Recognition
Related Terms: Pattern Recognition, Artificial Intelligence, Image Processing
Computer vision, also known as machine vision, is a discipline that teaches computers how to “see” the world. Figuratively speaking, it is like giving computers eyes (cameras) and a brain (algorithms) to perceive their environment.
Specifically, computer vision simulates biological vision with computers and related devices: imaging devices take the place of the visual organs as the input, and the computer takes the place of the brain for processing and interpretation. The ultimate research goal of computer vision is to enable computers to observe and understand the world through vision as humans do, and to adapt autonomously to their environment.
Note that although the computer stands in for the human brain in a computer vision system, it need not process visual information the way human vision does; it can use methods suited to the characteristics of the computer system. Nevertheless, the human visual system is the most powerful and complete visual system known to date, and research on how it processes visual information provides inspiration and guidance for the study of computer vision.
Computer vision itself includes many different application fields. Below are several basic and popular applications.
Image Classification: Given a set of labeled images, predicts the category of each image in a new test set and can report a confidence level for each prediction.
Object Recognition and Detection: Given an input image, the algorithm automatically identifies common objects in the image and outputs their categories and locations. In short, it draws a box around each object in the original image, as in face recognition.
Visual Question Answering: The user asks a question about an input image, and the algorithm automatically answers it based on the content of the image and of the question.
3D Reconstruction: Captures two-dimensional images of the objects in a scene with cameras, analyzes and processes these images, and then infers the three-dimensional information of the objects in the real environment using computer vision techniques.
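As a rough illustration of the image classification entry above, the sketch below predicts the category of a test image from a small labeled set and reports a confidence value. It is a toy under stated assumptions, not how production systems work: the images are hypothetical 2x2 grayscale grids flattened to four intensities, the classifier is a simple nearest-centroid rule, and the function names (`centroid`, `train`, `classify`) are invented for this example. The confidence is a softmax over negative squared distances, which is only one of several possible choices.

```python
import math


def centroid(images):
    """Average a list of equally sized flattened images pixel by pixel."""
    n = len(images)
    return [sum(pixels) / n for pixels in zip(*images)]


def train(labeled_images):
    """Map each label to the centroid of its training images."""
    by_label = {}
    for image, label in labeled_images:
        by_label.setdefault(label, []).append(image)
    return {label: centroid(imgs) for label, imgs in by_label.items()}


def classify(model, image):
    """Return (predicted_label, confidence) for one flattened image."""
    # Score each category by the negative squared distance to its centroid,
    # so that closer centroids get higher scores.
    scores = {
        label: -sum((a - b) ** 2 for a, b in zip(image, center))
        for label, center in model.items()
    }
    # Softmax turns the scores into values that sum to 1. Subtracting the
    # maximum score first keeps math.exp numerically stable.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


# Hypothetical labeled set: toy 2x2 images with intensities in [0, 1].
training_set = [
    ([0.9, 0.8, 0.9, 1.0], "bright"),
    ([1.0, 0.9, 0.8, 0.9], "bright"),
    ([0.1, 0.0, 0.2, 0.1], "dark"),
    ([0.0, 0.1, 0.1, 0.2], "dark"),
]

model = train(training_set)
label, confidence = classify(model, [0.8, 0.9, 1.0, 0.9])
print(label, round(confidence, 3))
```

Real classifiers typically learn features from large labeled datasets rather than comparing raw pixels, but the input and output have the same shape: labeled images go in, and a category with a confidence value comes out.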
Computer vision is now inseparable from our daily lives. At airports, we see face recognition systems of various kinds; in hospitals, cameras can help identify a patient’s diseased areas; in schools, computer vision can be used to recognize teachers’ handwritten notes. However, there is still a significant gap between computer vision and human understanding of image information. Humans can generate many contextually relevant guesses from a single image, but computers cannot yet do this, so computer vision still has a lot of room for development.
(Further reading author: Professor Yang Xin, School of Computer Science, Dalian University of Technology)
Source: National Committee for the Standardization of Scientific and Technical Terms