Comprehensive Guide to Computer Vision: Concepts, Principles, and Applications


Source: tryolabs | Contributor: Mo Wang
This is a beginner’s guide to computer vision, introducing the concepts, principles, and use cases of computer vision.
Machines that simulate the human visual system are no longer a fantasy. Since the first academic papers appeared in the 1960s, computer vision has come a long way, and modern systems can even be embedded in mobile applications.
Today, due to its wide application and enormous potential, computer vision has become one of the hottest subfields of artificial intelligence and machine learning. Its goal is to replicate the powerful capabilities of human vision.
But what exactly is computer vision? What is its current application status across different industries? What are some well-known commercial use cases? What are typical tasks in computer vision?
This article will introduce the basic concepts and real-world applications of computer vision. For anyone who has heard of computer vision but is unsure what it is and how it can be applied, this article is a convenient way to understand this complex topic.
You can read through this article or jump directly to a specific section.
Table of Contents
  • What is Computer Vision?
      • What Problems Does Computer Vision Solve?
      • Differentiating Computer Vision from Related Fields
  • Industry Applications
      • Retail
      • Manufacturing
      • Healthcare
      • Autonomous Driving
      • Insurance
      • Agriculture
      • Security
  • Typical Computer Vision Tasks
      • Image Classification
      • Localization
      • Object Detection
      • Object Recognition
      • Instance Segmentation
      • Object Tracking
  • How Computer Vision Works
      • General Strategies
      • Existing Datasets
      • Training Object Detection Models
  • Commercial Use Cases
      • Visual Search Engines
      • Facebook Face Recognition
      • Amazon Go
      • Tesla Autopilot
      • Microsoft InnerEye
  • Current Status of Computer Vision in Small Companies
  • How to Implement Computer Vision Projects
What is Computer Vision?
What Problems Does Computer Vision Solve?
Humans can understand and describe scenes in images. For example, in the image below, humans can do more than just detect that there are four people, a street, and several cars in the foreground.
The cover of the Beatles’ album “Abbey Road”. (Image source: https://liveforlivemusic.com/news/beatles-abbey-road/)
In addition to this basic information, humans can also see that the people in the foreground are walking, one of whom is barefoot, and we even know who they are. We can rationally infer that the people in the image are not in danger of being hit by a car, and that the white Volkswagen is not parked properly. Humans can also describe the clothing of the people in the image, not just the color but also the material and texture.
This is also the skill that computer vision systems need. Simply put, the main problem that computer vision solves is:
Given a two-dimensional image, a computer vision system must identify the objects in the image and their features, such as shape, texture, color, size, spatial arrangement, etc., in order to describe the image as completely as possible.
Differentiating Computer Vision from Related Fields
Computer vision shares common ground with related fields such as image processing and machine vision, but its tasks go well beyond theirs. Next, let’s explore the differences between these fields.
  • Image Processing

Image processing applies transformations to raw images, usually to enhance them or to prepare them as input for a specific task, while the goal of computer vision is to describe and interpret the image. For example, typical image processing operations such as denoising, contrast adjustment, or rotation can be performed at the pixel level without a comprehensive understanding of the image as a whole.
  • Machine Vision

Machine vision is a subset of computer vision used to perform certain (production line) actions. In the chemical industry, machine vision systems can inspect containers on production lines (for cleanliness, emptiness, or damage) or check whether finished products are properly packaged, thus aiding product manufacturing.
  • Computer Vision

Computer vision can solve more complex problems, such as face recognition, detailed image analysis (which can assist in visual search, such as Google Images), or biometric methods.
Industry Applications
Humans can not only understand the scenes in images but also, with some training, interpret calligraphy, impressionist paintings, abstract art, and two-dimensional ultrasound images of fetuses.
From this perspective, the field of computer vision is particularly complex, with a wealth of practical applications.
From e-commerce to traditional industries, companies of all types and sizes can now leverage the powerful capabilities of computer vision, thanks to innovations driven by artificial intelligence and machine learning (more specifically, computer vision).
Next, let’s look at the industry applications that have been most impacted by computer vision in recent years.
Retail
In recent years, the application of computer vision in retail has become one of the most significant technological trends. Below, we will introduce some common use cases. If you want a more detailed understanding of the potential applications of computer vision in retail, please refer to: https://tryolabs.com/resources/retail-innovations-machine-learning/.
Behavior Tracking
Physical retail stores utilize computer vision algorithms and cameras to understand customers and their behavior.
Computer vision algorithms can recognize faces and determine characteristics such as gender or age range. Additionally, retail stores can use computer vision technology to track customers’ movements within the store, analyze their paths, detect walking patterns, and count how many times the store’s display is noticed by passersby.
With the addition of gaze direction detection, retail stores can answer this important question: where to place products in the store to enhance customer experience and maximize sales.
Computer vision is also a powerful tool for developing theft prevention mechanisms. Face recognition algorithms can be used to identify known shoplifters or detect when a customer places an item in their backpack.
Inventory Management
Computer vision has two main applications in inventory management.
Through security camera image analysis, computer vision algorithms can generate highly accurate estimates of the remaining products in the store. For store managers, this is invaluable information that can help them quickly detect unusual product demands and respond early.
Another common application is analyzing shelf space utilization and identifying suboptimal configurations. Besides discovering wasted space, such algorithms can also provide better product placement suggestions.
Manufacturing
The main issues on production lines are machine breakdowns and defective products, which cause production delays and lost profits.
Computer vision algorithms have proven to be an effective way to implement predictive maintenance: by analyzing visual information (for instance, from cameras mounted on robots), they detect potential machinery issues before they occur. A system that can predict whether a packaging or automotive assembly robot is about to fail is a significant contribution.
Computer vision can also reduce defect rates, since the system can detect defects in components across the production line, allowing manufacturers to react in real time and take corrective measures. If a defect is minor, production can continue and the affected products are flagged or routed down a dedicated path; sometimes, however, the line must be stopped. As an added benefit, such systems can be trained for each use case, categorizing defects by type and severity.
Healthcare
In the healthcare sector, there is a vast number of existing applications for computer vision.
Undoubtedly, medical image analysis is the most well-known example, significantly enhancing the medical diagnostic process. Such systems analyze MRI images, CT scans, and X-rays to identify abnormalities such as tumors or search for symptoms of neurological diseases.
In many cases, image analysis techniques extract features from the images to train classifiers that can detect anomalies. However, some specific applications require more refined image processing. For example, when analyzing colonoscopy images, segmentation of the images is necessary to identify polyps and prevent colorectal cancer.
3D rendered CT scan image of the thorax with segmentation. (Image source: https://en.wikipedia.org/wiki/Image_segmentation)
The above image shows the segmentation results needed to observe thoracic elements. The system segments and colors each significant part: pulmonary artery (blue), pulmonary vein (red), mediastinum (yellow), and diaphragm (purple).
A large number of such applications are already in use, such as estimating postpartum bleeding, quantifying coronary artery calcification, and measuring blood flow within the body without MRI.
However, medical images are not the only application of computer vision in the healthcare sector. For instance, computer vision technology provides indoor navigation assistance for visually impaired individuals. These systems can locate pedestrians and surrounding objects in floor plans to provide a real-time visual experience. Gaze tracking and eye analysis can be used to detect early cognitive impairments, such as autism in children or reading disabilities, which are highly correlated with abnormal gaze behaviors.
Autonomous Driving
Have you ever wondered how autonomous vehicles “see” the road? Computer vision plays a core role in this, helping self-driving cars perceive and understand their surroundings to operate appropriately.
One of the most exciting challenges in computer vision is image and video object detection. This includes locating and classifying various objects to differentiate whether an object is a traffic light, car, or pedestrian, as shown in the image below:
Object detection in autonomous vehicles. (Image source: https://cdn-images-1.medium.com/max/1600/1*q1uVc-MU-tC-WwFp2yXJow.gif)
Such technologies, along with analyzing data from sensors and/or radar, enable cars to “see.”
Image object detection is a complex and powerful task, as discussed previously. See: https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/.
Another article explores this topic from the perspective of human-image interaction. See: https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/.
Insurance
Computer vision has a significant impact on the insurance industry, especially in claims processing.
Computer vision applications can guide customers visually through the claims documentation process. They can analyze images in real time and send them to the appropriate insurance brokers. At the same time, they can estimate and adjust repair costs, determine whether the damage falls within the policy’s coverage, and even detect potential insurance fraud. All of this greatly shortens the claims process and provides a better experience for customers.
From a preventive perspective, computer vision is extremely useful in avoiding accidents. Numerous computer vision applications designed to prevent collisions are integrated into industrial machinery, vehicles, and drones. This marks a new era in risk management that could transform the entire insurance industry.
Agriculture
Computer vision has a significant impact on agriculture, particularly in precision agriculture.
In the global economic activity of food production, there are a series of valuable computer vision applications. Food production faces recurring problems that were previously monitored by humans. Now, computer vision algorithms can detect or reasonably predict pests and diseases. Such early diagnoses help farmers take appropriate measures quickly, reducing losses and ensuring production quality.
Another long-standing challenge is weeding, as weeds develop resistance to herbicides and can cause severe losses for farmers. Now, robots equipped with computer vision technology can monitor entire fields and spray herbicides precisely. This greatly reduces the amount of herbicide used, which benefits both the environment and farmers’ production costs.
Soil quality is also a major factor in agriculture. Some computer vision applications can identify potential defects and nutrient deficiencies in soil from photos taken with a smartphone. After analysis, these applications provide soil restoration techniques and possible solutions for detected soil issues.
Computer vision can also be used for classification. Some algorithms classify fruits, vegetables, and even flowers by recognizing key characteristics (such as size, quality, weight, color, texture, etc.). These algorithms can also detect defects and estimate which agricultural products have longer shelf lives and which should be sold in local markets. This greatly extends the shelf life of agricultural products and reduces the time required before they hit the market.
Security
Similar to retail, businesses with high security requirements (such as banks or casinos) can benefit from computer vision applications that analyze images captured by security cameras to identify customers.
On another level, computer vision is a powerful tool in homeland security tasks. It can be used to improve cargo inspections at ports or monitor sensitive locations such as embassies, power plants, hospitals, railways, and sports arenas. Here, computer vision can not only analyze and classify images but also provide detailed and meaningful descriptions of scenes, offering key factors for real-time decision-making.
Generally, computer vision is widely applied in defense tasks, such as reconnaissance of enemy terrain, automatic identification of enemy forces in images, automated vehicle and machine movements, and search and rescue operations.
Typical Computer Vision Tasks
How can we replicate the human visual system? What tasks does that involve?
Computer vision is based on a multitude of different tasks combined to achieve highly complex applications. The most common tasks in computer vision are image and video recognition, which involves determining the different objects contained in an image.
Image Classification
The most well-known task in computer vision is likely image classification, which classifies a given image. Let’s look at a simple binary classification example: we want to classify an image based on whether it contains a tourist attraction. Suppose we have built a classifier for this task and provide an image (see below).
The Eiffel Tower (Image source: https://cdn.pariscityvision.com/media/wysiwyg/tour-eiffel.jpg)
The classifier thinks the above image belongs to the category of images containing tourist attractions. However, this does not mean that the classifier recognizes the Eiffel Tower; it may have simply seen a photo of the tower before and was told that the image contains a tourist attraction.
Postcard of Paris tourist attractions. (Image source: http://toyworldgroup.com/image/cache/catalog/Ecuda%20Puzzles/Postcard%20Form%20Paris%20/14840-500×500.jpg)
A more powerful version of this classifier can handle more than two categories. For example, the classifier classifies the image into specific types of tourist attractions, such as the Eiffel Tower, Arc de Triomphe, Sacré-Cœur, etc. In such scenarios, each image input may have multiple answers, just like the postcard above.
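The idea of classifying an image from extracted features can be sketched with a toy nearest-neighbor classifier. The landmark labels and two-dimensional "feature vectors" below are illustrative stand-ins for what a real model would extract from pixels:

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Toy "feature vectors" standing in for features a real model would
# extract from images; the landmark names and numbers are illustrative.
TRAINING_SET = [
    ((0.9, 0.1), "Eiffel Tower"),
    ((0.8, 0.2), "Eiffel Tower"),
    ((0.2, 0.9), "Arc de Triomphe"),
    ((0.1, 0.8), "Arc de Triomphe"),
]

def classify(features):
    """Nearest-neighbor classifier: return the label of the closest
    training example."""
    _, label = min(TRAINING_SET, key=lambda ex: dist(ex[0], features))
    return label

print(classify((0.85, 0.15)))  # prints "Eiffel Tower"
```

Real classifiers learn far richer features, but the principle is the same: map an image to a point in feature space and decide which category that point belongs to.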
Localization
Now, suppose we not only want to know the name of the tourist attraction in the image but are also interested in its location within the image. The goal of localization is to find the location of a single object in the image. For example, the location of the Eiffel Tower in the image below is marked.
The Eiffel Tower marked with a red bounding box. (Image source: https://cdn.pariscityvision.com/media/wysiwyg/tour-eiffel.jpg)
The standard way to perform localization is to define a bounding box that encloses the object in the image.
Localization is a very useful task. For example, it can perform automatic object cropping on a large number of images. By combining localization with classification tasks, one can quickly build a well-known tourist attraction (cropped) image dataset.
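A bounding box is usually written as its corner coordinates, and the standard way to score how well a predicted box matches a reference box is Intersection over Union (IoU). A minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # 0 if boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

An IoU of 1 means a perfect match and 0 means no overlap; benchmarks commonly count a localization as correct above a threshold such as 0.5.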
Object Detection
Imagine combining localization and classification, then repeating that for every object of interest in an image: that is object detection. In this scenario, the number of objects in the image is unknown in advance, so the goal of object detection is to find and classify all the objects the image contains.
Results of object detection (Image source: http://research.ibm.com/artificial-intelligence/computer-vision/images/cv-research-areas-object-detection.jpg)
In this dense image, we can see that the computer vision system has identified many different objects: cars, people, bicycles, and even signboards with text.
This task is challenging even for humans. Some objects are only partially visible because part of them is outside the image or they overlap with one another. Moreover, the size differences among similar objects can be significant.
A direct application of object detection is counting, which is widely used in real life, from counting the types of harvested fruits to counting the number of people at public gatherings or football matches, and much more.
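Given detector output, counting reduces to tallying detections by class above a confidence threshold. A sketch using a hypothetical output format of (label, confidence, bounding box) tuples:

```python
from collections import Counter

# Hypothetical detector output: (class label, confidence, bounding box).
detections = [
    ("person", 0.98, (12, 40, 55, 180)),
    ("car", 0.91, (60, 40, 200, 120)),
    ("person", 0.87, (210, 35, 250, 170)),
    ("bicycle", 0.76, (140, 90, 190, 160)),
]

CONFIDENCE_THRESHOLD = 0.8  # discard low-confidence detections

counts = Counter(label for label, score, _ in detections
                 if score >= CONFIDENCE_THRESHOLD)
print(counts)  # Counter({'person': 2, 'car': 1})
```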
Object Recognition
Object recognition is slightly different from object detection, although they use similar techniques. Given a specific object, the goal of object recognition is to find instances of that object in the image. This is not classification but rather determining whether that object appears in the image, and if so, performing localization. Searching for images containing a company’s logo is one example. Another example is monitoring real-time images captured by security cameras to recognize a person’s face.
Instance Segmentation
We can think of instance segmentation as the next step after object detection. It involves not only identifying objects in an image but also creating a mask for each detected object as accurately as possible.
Results of instance segmentation.
From the image above, you can see that the instance segmentation algorithm created masks for the four Beatles members and some cars (though the result is not complete, especially for Lennon).
The cost of performing such tasks manually is high, while instance segmentation technology simplifies their implementation. In France, the law prohibits media from exposing children’s images without explicit consent from guardians. Using instance segmentation technology, one can blur children’s faces in television or movie scenes.
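A mask can be represented as a per-pixel boolean grid over the image. As a toy illustration of the anonymization use case, the sketch below blanks out the masked pixels of a tiny grayscale "image" (a real system would apply an actual blur rather than a constant fill):

```python
def blank_masked(image, mask, fill=0):
    """Return a copy of a grayscale image (list of pixel rows) with
    every pixel covered by the mask replaced by `fill` — a crude
    stand-in for blurring a detected face."""
    return [
        [fill if mask[y][x] else pixel for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask  = [[False, True, True],
         [False, True, True],
         [False, False, False]]  # instance occupies the top-right corner

print(blank_masked(image, mask))
# [[10, 0, 0], [40, 0, 0], [70, 80, 90]]
```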
Object Tracking
Object tracking aims to track moving objects over time, using consecutive video frames as input. This function is essential for robots; for example, goalie robots need to perform various tasks from tracking the ball to blocking it. Object tracking is also crucial for autonomous vehicles, enabling advanced spatial reasoning and path planning. Similarly, object tracking is useful in multi-object tracking systems, including systems designed to understand user behavior (such as computer vision systems in retail stores) and systems that monitor football or basketball players in games.
One relatively straightforward way to perform object tracking is to execute object detection on each image in the video sequence and compare each object instance to determine their movement trajectories. The drawback of this method is that performing object detection for each image usually incurs high costs. An alternative approach requires capturing the tracked object only once (usually the first time the object appears) and then identifying its movement trajectory in subsequent images without explicitly recognizing the object. Finally, the object tracking method does not necessarily detect the object; it can simply observe the movement trajectory of the target without knowing what the tracked object is.
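The first approach described above — detect in every frame, then match detections across frames — can be sketched as a greedy nearest-centroid tracker. The `max_jump` threshold for deciding when to start a new track is an illustrative parameter:

```python
from math import dist

def track(prev_objects, detections, max_jump=50.0):
    """Greedy nearest-centroid tracker: match each detection in the new
    frame to the closest previously tracked object, or start a new track.
    `prev_objects` maps track id -> centroid; returns the updated map."""
    tracks = dict(prev_objects)
    next_id = max(tracks, default=-1) + 1
    unmatched = set(tracks)  # previous tracks not yet matched this frame
    for centroid in detections:
        candidates = [(dist(tracks[i], centroid), i) for i in unmatched]
        if candidates and min(candidates)[0] <= max_jump:
            _, best = min(candidates)
            tracks[best] = centroid  # continue an existing trajectory
            unmatched.discard(best)
        else:
            tracks[next_id] = centroid  # a new object entered the scene
            next_id += 1
    return tracks

frame1 = track({}, [(10, 10), (100, 100)])     # two new tracks appear
frame2 = track(frame1, [(14, 12), (103, 98)])  # both move slightly
print(frame2)  # {0: (14, 12), 1: (103, 98)}
```

Production trackers add motion models, appearance features, and handling for occlusion, but the core data association step looks much like this.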
How Computer Vision Works
As mentioned earlier, the goal of computer vision is to mimic the workings of the human visual system. How do algorithms achieve this goal? This article will introduce some of the most important concepts.
General Strategies
Deep learning methods and techniques have profoundly changed computer vision and other artificial intelligence fields. For many tasks, using deep learning methods has become standard practice. In particular, convolutional neural networks (CNNs) outperform the best results achievable with traditional computer vision techniques.
The following four steps outline the general approach to building a computer vision model using CNNs:
  1. Create a dataset of labeled images or use an existing dataset. Labels can be image categories (for classification tasks), bounding boxes and category pairs (for object detection problems), or pixel-level segmentation for each object of interest in the image (for instance segmentation problems).

  2. Extract features relevant to the task from each image, which is the focus of modeling. For example, features used to recognize faces are significantly different from those used to identify tourist attractions or human organs.

  3. Train a deep learning model based on the features. Training means inputting many images into the machine learning model and allowing the model to learn how to solve the task based on the features.

  4. Evaluate the model using images different from those used for training, thereby testing the accuracy of the trained model.

This strategy is quite basic but works well. Such methods are called supervised machine learning, requiring a dataset that includes the phenomena the model is to learn.
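The four steps can be sketched end to end with deliberately tiny stand-ins: random numbers for images, the identity function for feature extraction, and a one-threshold "model" in place of a CNN:

```python
import random

# Step 1 — a labeled dataset. Each "image" is reduced to a single toy
# feature value; label 1 means "contains the object of interest".
random.seed(0)
dataset = [(random.gauss(0.3, 0.1), 0) for _ in range(100)] + \
          [(random.gauss(0.7, 0.1), 1) for _ in range(100)]
random.shuffle(dataset)
train_split, test_split = dataset[:160], dataset[160:]

# Step 2 — feature extraction is the identity here; a real system would
# compute descriptors or let a CNN learn them.

# Step 3 — "train" the simplest possible model: a decision stump that
# picks the threshold best separating the classes on the training split.
def accuracy(split, threshold):
    return sum((feat > threshold) == label for feat, label in split) / len(split)

best_threshold = max((t / 100 for t in range(101)),
                     key=lambda t: accuracy(train_split, t))

# Step 4 — evaluate on examples the model has never seen.
print(f"test accuracy: {accuracy(test_split, best_threshold):.2f}")
```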
Existing Datasets
Building datasets is often costly, but they are crucial for developing computer vision applications. Fortunately, there are several ready-to-use datasets available. The largest and most famous is ImageNet, which contains 14 million manually labeled images, about 1 million of which also carry bounding box annotations.
ImageNet images with bounding boxes (Image source: http://www.image-net.org/bbox_fig/kit_fox.JPG)
ImageNet images with object attribute annotations (Image source: http://www.image-net.org/attribute_fig/pullfigure.jpg)
Another well-known dataset is the Microsoft Common Objects in Context (COCO) dataset, which contains 328,000 images, 91 object categories (these categories are easily recognizable, even by a 4-year-old child), and 2.5 million annotated instances.
Examples of annotated images in the COCO dataset. (Image source: https://arxiv.org/abs/1405.0312)
Although the number of available datasets in this field is not particularly large, there are still some suitable for different tasks, such as the CelebFaces Attributes Dataset (CelebA dataset, which contains over 200,000 celebrity images), the Indoor Scene Recognition dataset (which includes 15,620 indoor scene images), and the Plant Image Analysis dataset (which includes 1 million plant images belonging to 11 different categories).
Training Object Detection Models
  • Viola–Jones Method

There are many methods to solve the object detection problem. For many years, the most popular was the method proposed by Paul Viola and Michael Jones in their paper “Robust Real-time Object Detection.”
Although this method can detect a wide variety of object categories, it was initially inspired by the goal of detecting faces. This method is fast, straightforward, and is the algorithm used in many smart cameras that can perform real-time face detection with minimal processing power.
The core of this method is a cascade of many simple binary classifiers trained on Haar features. Haar features represent edges and lines and are cheap to compute.
Haar features (Image source: https://docs.opencv.org/3.4.3/haar_features.jpg)
Although relatively basic, these features can capture important elements in the specific case of face detection, such as the nose, mouth, or eyebrow spacing. This supervised method requires many positive and negative samples.
Detecting the face of the Mona Lisa.
This article does not discuss the algorithm’s details. However, the above image demonstrates the process of the algorithm detecting the face of the Mona Lisa.
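Part of what makes Haar features so cheap is the integral image (summed-area table), which turns any rectangle sum into an O(1) lookup. A minimal sketch of a two-rectangle vertical-edge feature on a toy patch:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w×h rectangle at (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_vertical_edge(ii, x, y, w, h):
    """Two-rectangle Haar feature: left half minus right half."""
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)

# A 4×4 toy patch: bright on the left, dark on the right — a vertical edge.
patch = [[9, 9, 1, 1]] * 4
ii = integral_image(patch)
print(haar_vertical_edge(ii, 0, 0, 4, 4))  # (4×2×9) − (4×2×1) = 72 − 8 = 64
```

A large response from such a feature signals an edge at that location; the Viola–Jones cascade combines thousands of these weak signals into a robust detector.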
  • CNN-based Methods

Deep learning has revolutionized machine learning, especially in computer vision. Currently, deep learning-based methods have become the cutting-edge technology for many computer vision tasks.
Among them, R-CNN is easy to understand, and its authors proposed a three-stage process:
  1. Use region proposal methods to extract possible objects.

  2. Use CNN to identify features in each region.

  3. Use support vector machines (SVM) to classify each region.

R-CNN architecture (Image source: https://arxiv.org/abs/1311.2524)
The region proposal method was initially introduced in the paper “Selective Search for Object Recognition,” although the R-CNN algorithm is agnostic to which region proposal method is used. Step 1 is critical, as it reduces the number of candidate regions and thus lowers computational costs.
The features extracted here are less intuitive than Haar features. In short, a CNN extracts a 4096-dimensional feature vector from each candidate region. Given the nature of CNNs, the input must have fixed dimensions; this is one of the weaknesses of CNNs, and many later methods address it. In R-CNN, the trained CNN architecture requires fixed inputs of 227 × 227 pixels, and since candidate regions vary in size, the authors warped each region to those dimensions.
Examples of distorted images that meet CNN input dimensional requirements.
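The warping step can be illustrated with a bare-bones nearest-neighbor resize that, like R-CNN’s warping, ignores the region’s aspect ratio (the 227 × 227 target appears only in the usage example):

```python
def warp_to_fixed_size(region, out_h, out_w):
    """Nearest-neighbor resize of a region (list of pixel rows) to a
    fixed output size, ignoring aspect ratio — a minimal stand-in for
    the warping R-CNN applies to fit its fixed CNN input."""
    in_h, in_w = len(region), len(region[0])
    return [
        [region[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

region = [[1, 2],
          [3, 4]]
print(warp_to_fixed_size(region, 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]

warped = warp_to_fixed_size(region, 227, 227)  # R-CNN's input size
print(len(warped), len(warped[0]))  # 227 227
```

Real implementations use interpolated resizing from an image library, but the effect is the same: every candidate region, whatever its shape, is stretched to the dimensions the network expects.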
Although this method has achieved good results, there are some challenges during the training process, and this method has ultimately been surpassed by others. Some of these methods are discussed in-depth in this article: https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/.
Commercial Use Cases
Computer vision applications are being deployed by more and more companies to address business problems or enhance product performance. They may have already become part of people’s daily lives without them even noticing. Here are some common use cases.
Visual Search Engines
With the launch of Google Images in 2001, visual search technology became available to the public. Visual search engines retrieve images based on specific content criteria. The most common use case is searching by keywords, but sometimes we provide a source image and ask the engine to find similar images. In some cases, more detailed search criteria can be specified, such as images of beaches taken in summer that contain at least 10 people.
There are now many visual search engines, some of which can be used directly as websites, some require API calls, and some are mobile applications.
The most famous visual search websites are undoubtedly Google Images, Bing, and Yahoo. The first two accept multiple keywords or a single image as search input; searching with an image is known as “reverse image search.” Yahoo only supports keyword searches, but its results are still quite good, as shown in the image below.
Yahoo Image Search.
There are also some visual search websites worth noting, such as TinEye, which only supports reverse image search, and Picsearch, which only supports text searches but has a vast coverage.
In mobile applications, visual search technology is gradually becoming a standard feature, and there is considerable variation between such applications.
These implementations include Google Goggles (later replaced by Google Lens), which can extract detailed information from images. For example, it can provide breed information from a photo of a cat or information about artworks in a museum.
In the e-commerce market, Pinterest developed Pinterest Lens. If you need new outfit ideas for existing clothing, you can take a photo of that clothing, and Pinterest Lens will return styling suggestions, including items you can purchase. In recent years, visual search for online shopping has become one of the fastest-growing trends.
Finally, a higher-level case of visual search is the visual question-answering system, see: https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/.
Facebook Face Recognition
Although face detection technology has been widely used in cameras for autofocus purposes since the mid-2000s, the field of face recognition has produced many more impressive achievements in recent years. The most common (and controversial) application is likely recognizing individuals in images or videos. This is often used in security systems, but it also appears in social media: applying face-based filters, powering face search, and even preventing voters from voting more than once in elections. Face recognition can also be applied in more complex scenarios, such as recognizing emotions from facial expressions.
One use case that has sparked both interest and concern is Facebook’s face recognition system. Among the development team’s main goals are notifying users when a stranger posts an image featuring their face (see the example below) and describing to visually impaired users the people appearing in an image or video.
Facebook Face Recognition. (Image source: https://cdn0.tnwcdn.com/wp-content/blogs.dir/1/files/2017/12/Facebook-Tagging-796×428.jpg)
Despite the concerning aspects, this technology is beneficial in many scenarios, such as combating online harassment.
Amazon Go
Tired of waiting in line at supermarkets and grocery stores? Amazon Go stores offer a different experience. With the help of computer vision, there are no lines and no checkout counters.
The idea is simple: customers enter the store, select the items they want, and leave without waiting in line to check out.
How is this achieved? Thanks to Amazon’s “Just Walk Out” technology. Customers must download a mobile app that Amazon uses to identify them. When they want to enter an Amazon Go store, the app provides a QR code. At the store entrance, turnstiles read this QR code as customers pass through. An interesting feature is that companions can enter the store with a customer without installing the app themselves.
Customers can move freely within the store, which is where computer vision comes into play. The store is equipped with a series of sensors, including cameras, motion sensors, and weight sensors on products. These devices collect behavioral information on each person and detect in real time which items customers take from the shelves. Customers can take an item, change their mind, and put it back. If an item is instead handed to another customer, the system still charges the first customer who picked it up. In this way, the system builds a virtual shopping cart containing every item a customer has picked up and maintains it in real time, making the shopping process very smooth.
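Amazon has not published the internals of Just Walk Out, but the virtual-cart bookkeeping described above can be sketched as a toy event-replay: sensor fusion emits pick and put-back events, an item is billed to whoever first took it off the shelf, and only returning it to the shelf removes it from that cart. The event format and names here are illustrative assumptions.

```python
from collections import defaultdict

def build_carts(events):
    """Replay (action, customer, item) events into per-customer virtual carts.

    An item is billed to the first customer who took it off the shelf,
    even if it is later handed to someone else; only returning it to the
    shelf ("putback") removes it from that customer's cart.
    """
    owner = {}                 # item -> customer who first picked it up
    carts = defaultdict(list)  # customer -> items they will be charged for
    for action, customer, item in events:
        if action == "pick" and item not in owner:
            owner[item] = customer
            carts[customer].append(item)
        elif action == "putback" and item in owner:
            carts[owner[item]].remove(item)
            del owner[item]
    return dict(carts)

events = [("pick", "ann", "milk"),
          ("pick", "ann", "bread"),
          ("pick", "ben", "milk"),      # handed over: ann is still charged
          ("putback", "ann", "bread")]  # returned to the shelf
print(build_carts(events))  # ann's cart keeps only the milk
```

The hard part in the real system is, of course, producing reliable events from cameras and weight sensors; once the events exist, the billing logic itself is this simple.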
When customers finish shopping, they can simply walk out of the store. As they pass through the turnstiles, the system will not require customers to scan items or QR codes but will record the transaction amount and send a confirmation notification to the customer.
Amazon Go is a case of computer vision positively impacting the real world and human daily life.
Tesla Autopilot
Making cars drive themselves is not just a distant dream. Tesla’s Autopilot technology provides very convenient autonomous driving features. This is not a fully autonomous driving system but rather a driving assistant that can drive cars on specific roads. This is a key point that Tesla emphasizes: the responsibility for controlling the car remains with the driver at all times.
Autonomous driving is achieved through object detection and tracking technologies.
To make Autopilot work, Tesla cars are heavily equipped: eight surround cameras provide 360-degree visibility at a range of up to 250 meters, ultrasonic sensors detect nearby objects, and radar gathers information about the surrounding environment. With these inputs, a Tesla can adjust its speed to traffic conditions, brake in time when encountering obstacles, keep or change lanes, make turns, and park smoothly.
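Tesla's perception stack is proprietary, but the generic detect-then-track idea it relies on can be illustrated with a minimal sketch: each frame, a detector emits bounding boxes, and each box is matched to the existing track it overlaps most (by intersection-over-union, IoU); unmatched boxes start new tracks. The greedy matching and the threshold value below are simplifying assumptions, not Tesla's algorithm.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def update_tracks(tracks, detections, min_iou=0.3):
    """Greedily match detections to tracks by IoU; unmatched ones start new tracks."""
    next_id = max(tracks, default=0) + 1
    updated = {}
    for det in detections:
        best_id, best = None, min_iou
        for tid, box in tracks.items():
            if tid not in updated and iou(box, det) > best:
                best_id, best = tid, iou(box, det)
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        updated[best_id] = det
    return updated

tracks = {1: (0, 0, 10, 10)}               # one car already being tracked
tracks = update_tracks(tracks, [(1, 1, 11, 11),    # same car, moved slightly
                                (50, 50, 60, 60)]) # a newly detected car
print(tracks)  # track 1 updated; a new track created for the second box
```

Real trackers add motion models (e.g. Kalman filters) and handle missed detections, but this matching step is the core of turning per-frame detections into persistent object identities.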
Tesla’s Autopilot technology is another exciting example of how computer vision positively impacts human daily activities.
Microsoft InnerEye
In the healthcare sector, Microsoft’s InnerEye is a valuable tool that helps radiologists, oncologists, and surgeons process radiographic images. Its primary purpose is to accurately identify tumors in 3D radiological images.
3D images of cancerous tumors.
Based on computer vision and machine learning technologies, InnerEye outputs highly detailed 3D models of tumors. The above screenshot shows the complete 3D segmentation of a brain tumor created by InnerEye. In Microsoft’s demo video, experts can be seen controlling the InnerEye tool to guide it through its tasks; InnerEye works like an assistant.
In radiotherapy, InnerEye results make it possible to target tumors directly with radiation without harming important organs.
These results also help radiologists better understand image sequences, judging whether the disease is progressing, stabilizing, or responding well to treatment based on changes in tumor size. Thus, medical images become an important means of tracking and measuring progress.
Finally, InnerEye can be used for planning precise surgeries.
Current Status of Computer Vision in Small Companies
While the implementation of computer vision in large companies is often discussed, this does not mean that all companies must be of the scale of Google or Amazon to benefit from this machine learning technology. Companies of any size can leverage data and computer vision technology to become more efficient and make better decisions.
Let’s look at some real-world cases from small companies:
Tryolabs helped a small risk management company in San Francisco build and implement a computer vision system to scale the processing of rooftop inspection images.
Before using computer vision technology, the company’s experts manually analyzed drone-captured photos to detect damage in rooftop construction. Although the analysis was accurate, the service could not be effectively scaled due to time consumption and limited human resources.
To address this issue, we built a deep learning system capable of understanding images and automatically identifying rooftop issues (such as pooling water, loose cables, and rust). To do this, we developed a deep neural network that analyzes input rooftop images, detects problems, and makes the detection results available to external tools through an API.
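The internals of that system are not public, but the general shape of "detector behind an API" can be sketched as follows. The stub `detect_issues` stands in for the trained neural network, and the issue labels, handler name, and response fields are all hypothetical; only the pattern (run model, filter by confidence, serialize to JSON for external tools) is the point.

```python
import json

# Stub standing in for the trained rooftop-defect detector; a real system
# would run a deep neural network here. The issue names are illustrative.
def detect_issues(image):
    """Return a list of (label, confidence, bounding_box) findings."""
    return [("pooling_water", 0.91, (120, 80, 340, 260)),
            ("rust", 0.77, (400, 410, 450, 470))]

def handle_inspection(image, min_confidence=0.5):
    """API-style handler: run the detector and serialize results as JSON."""
    findings = [{"label": label, "confidence": conf,
                 "box": {"x1": b[0], "y1": b[1], "x2": b[2], "y2": b[3]}}
                for label, conf, b in detect_issues(image)
                if conf >= min_confidence]
    return json.dumps({"issue_count": len(findings), "issues": findings})

response = json.loads(handle_inspection(image=None))
print(response["issue_count"])  # two findings cleared the confidence threshold
```

Wrapping the model this way is what lets non-ML tools (claims software, dashboards) consume the results without knowing anything about the network itself.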
As a result, the company’s order volume and revenue increased.
How to Implement Computer Vision Projects
As with all innovations worth pursuing within an organization, you should choose a strategic approach to implement computer vision projects.
Successfully leveraging computer vision technology for innovation depends on overall business strategy, resources, and data.
The following questions can help you build a strategic roadmap for computer vision projects.
1. Should the computer vision solution reduce costs or increase revenue?
A successful computer vision project should either reduce costs or increase revenue (or both), and you should define the project’s goals. Only in this way can it make a significant impact on the organization and its development.
2. How to measure the project’s success?
Each computer vision project is different, and you need to define success metrics specific to that project. Once the metrics are set, you should ensure they are recognized by business personnel and data scientists.
3. Can information access be guaranteed?
When starting a computer vision project, data scientists should be able to easily access data. They need to collaborate with important colleagues from different departments (such as IT). These colleagues should provide support with their business knowledge, as internal bureaucracy can become a major constraint.
4. Is the data collected by the organization appropriate?
Computer vision algorithms are not magic. They require data to function, and the quality of the input data determines their performance. There are various methods and sources available to collect appropriate data, depending on your goals. Regardless, the more input data you have, the more likely the computer vision model will perform well. If you have doubts about the quantity and quality of the data, you can ask data scientists to help evaluate the dataset’s quality and find the best ways to obtain third-party data if necessary.
5. Is the data collected by the organization in the correct format?
In addition to having the appropriate amount and type of data, you also need to ensure the data is in the correct format. Suppose you train an object detection algorithm on thousands of perfect smartphone photos (high resolution, white background), only to discover that it fails in practice because the actual use case is detecting smartphones held by people under varied lighting, contrast, and background conditions. Your previous data collection effort may be rendered useless, and you will need to start over. Furthermore, you should understand that if the data is biased, the algorithm will learn that bias.
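Some of these problems can be caught before training with a quick dataset audit. The sketch below checks two of the issues described above, class imbalance (a possible source of bias) and inconsistent image sizes, over made-up metadata; the function name, skew threshold, and sample data are illustrative assumptions.

```python
from collections import Counter

def audit_dataset(labels, sizes, max_skew=3.0):
    """Flag class imbalance and mixed image shapes before training starts."""
    counts = Counter(labels)
    most, least = max(counts.values()), min(counts.values())
    warnings = []
    if most / least > max_skew:
        warnings.append(f"class imbalance: {dict(counts)}")
    if len(set(sizes)) > 1:
        warnings.append(f"mixed image sizes: {sorted(set(sizes))}")
    return warnings

# Hypothetical metadata: 80 "phone" photos vs 10 "no_phone", two resolutions.
labels = ["phone"] * 80 + ["no_phone"] * 10
sizes = [(1920, 1080)] * 85 + [(640, 480)] * 5
for warning in audit_dataset(labels, sizes):
    print("WARNING:", warning)
```

A few lines like these, run before any GPU time is spent, are often the cheapest way to discover that the collected data does not match the deployment scenario.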
For more information on how to launch successful computer vision projects, see the blog: https://tryolabs.com/blog/2019/02/13/11-questions-to-ask-before-starting-a-successful-machine-learning-project/.
I hope this article helps readers understand the concepts, working principles, and real-world applications of computer vision.
Original link:
https://tryolabs.com/resources/introductory-guide-computer-vision/