Fundamental Concepts, Principles, and Applications of Computer Vision

This is an introductory guide to computer vision, detailing concepts, principles, and use cases.

The idea of machines simulating the human visual system is nothing new. Since the first academic papers appeared in the 1960s, computer vision has come a long way, and modern systems can now be integrated into mobile applications.

Today, due to its widespread applications and immense potential, computer vision has become one of the hottest subfields of artificial intelligence and machine learning. Its goal is to replicate the powerful capabilities of human vision.

But what exactly is computer vision? What is its current application status across different industries? What are some well-known commercial use cases? What are typical computer vision tasks?

This article introduces the fundamental concepts and real-world applications of computer vision. It is intended for anyone who has heard of computer vision but is unsure what it is or how it can be applied.

You can read through this article or jump directly to a specific section.

Table of Contents

  • What is Computer Vision?

      • What Problems Does Computer Vision Solve?

      • Differentiating Computer Vision from Related Fields

  • Industry Applications

      • Retail

      • Manufacturing

      • Healthcare

      • Autonomous Driving

      • Insurance

      • Agriculture

      • Security

  • Typical Computer Vision Tasks

      • Image Classification

      • Localization

      • Object Detection

      • Object Recognition

      • Instance Segmentation

      • Object Tracking

  • How Computer Vision Works

      • General Strategies

      • Existing Datasets

      • Training Object Detection Models

  • Commercial Use Cases

      • Visual Search Engines

      • Facebook Face Recognition

      • Amazon Go

      • Tesla Autopilot

      • Microsoft InnerEye

  • Current Applications of Computer Vision in Small Companies

  • How to Implement Computer Vision Projects

What is Computer Vision?

What Problems Does Computer Vision Solve?

Humans can understand and describe scenes in images. For example, in the image below, humans can do more than just detect that there are four people, a street, and several cars in the foreground.

The cover of The Beatles’ album “Abbey Road”. (Image source: https://liveforlivemusic.com/news/beatles-abbey-road/)

In addition to this basic information, humans can also see that the people in the foreground are walking, that one of them is barefoot, and we even know who they are. We can reasonably infer that they are not in danger of being hit, since the cars appear to be stationary (the white Volkswagen is, in fact, parked rather badly). Humans can also describe what the individuals are wearing, and not just the color of their clothes but also the material and texture.

These are exactly the skills a computer vision system needs. Simply put, the main problem computer vision solves is this:

Given a two-dimensional image, the computer vision system must identify the objects in the image and their characteristics, such as shape, texture, color, size, spatial arrangement, etc., in order to describe the image as completely as possible.

Differentiating Computer Vision from Related Fields

The tasks tackled by computer vision go well beyond those of related fields such as image processing and machine vision, although they share some common ground. Let's look at the differences between these fields.

  • Image Processing

Image processing aims to manipulate raw images to apply some transformation. Its goal is usually to enhance the image or use it as input for a specific task, whereas the goal of computer vision is to describe and interpret the image. For instance, typical image processing components like noise reduction, contrast adjustment, or rotation can be performed at the pixel level without needing a comprehensive understanding of the overall image.
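
To make the distinction concrete, here is a minimal sketch of such pixel-level transformations using OpenCV (the file names are placeholders); none of these operations requires any understanding of what the image actually depicts:

```python
import cv2

# Load an image (the path is a placeholder for illustration).
img = cv2.imread("photo.jpg")

# Noise reduction: Gaussian blur averages each pixel with its neighbors.
denoised = cv2.GaussianBlur(img, (5, 5), 0)

# Contrast adjustment: scale pixel intensities (alpha) and shift them (beta).
contrast = cv2.convertScaleAbs(img, alpha=1.5, beta=10)

# Rotation: build a 2D rotation matrix around the image center and apply it.
h, w = img.shape[:2]
matrix = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0)
rotated = cv2.warpAffine(img, matrix, (w, h))

# None of these steps "understands" the scene; they only transform pixels.
cv2.imwrite("processed.jpg", rotated)
```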

  • Machine Vision

Machine vision is a specific case of computer vision used to perform certain (production line) actions. In the chemical industry, machine vision systems can inspect containers on the production line (for cleanliness, emptiness, and damage) or check if finished products are properly packaged, thereby assisting in product manufacturing.

  • Computer Vision

Computer vision can solve more complex problems, such as face recognition, detailed image analysis (which can assist in visual search, like Google Images), or biometric methods.

Industry Applications

Humans can not only understand scenes in images but, with some training, can also interpret calligraphy, Impressionist paintings, abstract art, and two-dimensional ultrasound images of fetuses.

From this perspective, the field of computer vision is particularly complex, with a plethora of practical applications.

From e-commerce to traditional industries, companies of all types and sizes can now leverage the powerful capabilities of computer vision, thanks to innovations driven by artificial intelligence and machine learning (specifically computer vision).

Let’s take a look at the industry applications most affected by computer vision in recent years.

Retail

In recent years, the application of computer vision in retail has become one of the most important technological trends. Below are some common use cases. For a more detailed understanding of the potential applications of computer vision in retail, please refer to: https://tryolabs.com/resources/retail-innovations-machine-learning/.

Behavior Tracking

Physical retail stores use computer vision algorithms and cameras to understand customers and their behaviors.

Computer vision algorithms can recognize faces and determine characteristics of individuals, such as gender or age range. Additionally, retail stores can use computer vision to track customers' movements within the store, analyze their paths, detect walking patterns, and count how many passers-by the store attracts.

By adding gaze direction detection, retail stores can answer this important question: where to place products in the store to enhance consumer experience and maximize sales.

Computer vision is also a powerful tool for developing theft prevention mechanisms. Face recognition algorithms can be used to identify known shoplifters or detect when a customer places items in their backpack.

Inventory Management

Computer vision has two main applications in inventory management.

Through security camera image analysis, computer vision algorithms can generate very accurate estimates of remaining products in the store. For store managers, this is invaluable information that can help them quickly detect unusual product demand and respond early.

Another common application is analyzing shelf space utilization to identify suboptimal configurations. Besides discovering wasted space, such algorithms can also provide better product placement solutions.

Manufacturing

The primary issues on production lines are machine interruptions or defective products, which can lead to production delays and profit losses.

Computer vision algorithms have proven to be an effective way to implement predictive maintenance. These algorithms analyze visual information (for example, from cameras mounted on robots) to detect potential machine problems in advance. A system that can predict whether a packaging or automotive assembly robot is about to fail is a significant contribution.

Computer vision can also be used to reduce defect rates, since such a system can detect defects in components across the entire production line. This allows manufacturers to respond in real time and take corrective action. A defect may be minor enough for production to continue, in which case the product is flagged or routed down a specific production path; at other times the line must be stopped. For additional benefit, such systems can be trained per use case to classify defects by type and severity.

Healthcare

In the healthcare industry, there are a vast number of existing applications of computer vision.

Without a doubt, medical image analysis is the most well-known example, significantly enhancing medical diagnostic processes. Such systems analyze MRI images, CT scans, and X-ray images to identify abnormalities like tumors or search for symptoms of neurological diseases.

In many cases, image analysis techniques extract features from images to train classifiers capable of detecting anomalies. However, some specific applications require more refined image processing. For example, analyzing colonoscopy images requires segmentation to identify polyps and prevent colorectal cancer.

3D rendered CT scan images of the thorax. (Image source: https://en.wikipedia.org/wiki/Image_segmentation)

The above image shows the segmentation results required to observe thoracic elements. The system segments each important part and colors them: pulmonary arteries (blue), pulmonary veins (red), mediastinum (yellow), and diaphragm (purple).

Currently, many such applications are in use, such as estimating postpartum hemorrhage, quantifying coronary artery calcification, and measuring blood flow in the body without MRI.

However, medical images are not the only use of computer vision in the healthcare industry. For instance, computer vision technology provides indoor navigation assistance for visually impaired individuals. These systems can locate pedestrians and surrounding objects within floor plans to provide real-time visual experiences. Gaze tracking and eye analysis can be used to detect early cognitive impairments, such as autism in children or dyslexia, which are highly correlated with abnormal gaze behavior.

Autonomous Driving

Have you ever wondered how autonomous vehicles “see” the road? Computer vision plays a core role in this, helping autonomous vehicles perceive and understand their surroundings to operate appropriately.

One of the most exciting challenges in computer vision is object detection in images and videos. This includes locating and classifying different numbers of objects to distinguish whether an object is a traffic light, a car, or a pedestrian, as shown in the image below:

Object detection for autonomous vehicles. (Image source: https://cdn-images-1.medium.com/max/1600/1*q1uVc-MU-tC-WwFp2yXJow.gif)

This technology, combined with data analysis from sources such as sensors and/or radar, enables vehicles to “see”.

Image object detection is a complex and powerful task, which we have previously discussed; see: https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/.

Another article explores this topic from the perspective of human-image interaction; see: https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/.

Insurance

The impact of computer vision in the insurance industry is significant, especially in claims processing.

Computer vision applications can visually guide customers through the claims documentation process. They can analyze images in real time and forward them to the appropriate insurance agents. At the same time, they can estimate and adjust repair costs, determine whether the damage is covered, and even detect insurance fraud. All of this greatly shortens the claims process and gives customers a better experience.

From a preventive perspective, computer vision is immensely useful in avoiding accidents. Numerous computer vision applications designed to prevent collisions are integrated into industrial machinery, automobiles, and drones. This marks a new era of risk management that could transform the entire insurance industry.

Agriculture

Computer vision has a profound impact on agriculture, particularly precision agriculture.

Food production is a global economic activity, and within it there is a whole range of valuable computer vision applications. Food production faces recurring problems that used to be monitored by humans. Now, computer vision algorithms can detect, or reasonably predict, pest infestations. Such early diagnosis helps farmers take appropriate measures quickly, minimizing losses and safeguarding production quality.

Another long-standing challenge is weeding, as weeds develop resistance to herbicides, potentially causing significant losses for farmers. Now, robots equipped with computer vision technology can monitor entire fields and precisely spray herbicides. This greatly conserves the amount of pesticide used, providing substantial benefits for both the environment and production costs.

Soil quality is also a major factor in agriculture. Some computer vision applications can identify potential defects and nutrient deficiencies in soil from photos taken with mobile phones. After analysis, these applications can provide soil restoration techniques and possible solutions for detected soil issues.

Computer vision can also be used for classification. Some algorithms classify fruits, vegetables, and even flowers by recognizing key characteristics (such as size, quality, weight, color, texture, etc.). These algorithms can also detect defects and estimate which agricultural products have a longer shelf life and which should be sold at local markets. This significantly extends the shelf life of agricultural products and reduces the time required before they hit the market.

Security

Similar to retail, businesses with high security requirements (such as banks or casinos) can benefit from computer vision applications that analyze images captured by security cameras to identify customers.

On another level, computer vision is a powerful tool in homeland security tasks. It can be used to improve cargo inspections at ports or monitor sensitive locations such as embassies, power plants, hospitals, railways, and stadiums. Here, computer vision can not only analyze and classify images but also provide detailed and meaningful descriptions of scenes, offering key factors for real-time decision-making.

More broadly, computer vision is widely used in defense tasks, such as reconnaissance of enemy terrain, automatic identification of enemies in images, automation of vehicle and machine movement, and search and rescue operations.

Typical Computer Vision Tasks

How can the human visual system be replicated so closely?

Computer vision is based on a multitude of different tasks, combined to achieve highly complex applications. The most common tasks in computer vision are image and video recognition, which involve determining the different objects contained in an image.

Image Classification

The most well-known task in computer vision is likely image classification, which categorizes a given image. Let’s look at a simple binary classification example: we want to classify an image based on whether it contains a tourist attraction. Suppose we have built a classifier for this task and provided it with an image (see below).

The Eiffel Tower (Image source: https://cdn.pariscityvision.com/media/wysiwyg/tour-eiffel.jpg)

The classifier believes that the above image belongs to the category of images containing tourist attractions. However, this does not mean that the classifier recognizes the Eiffel Tower; it may have simply seen a photo of the tower before and was told that the image contains a tourist attraction.

A postcard of Paris tourist attractions. (Image source: http://toyworldgroup.com/image/cache/catalog/Ecuda%20Puzzles/Postcard%20Form%20Paris%20/14840-500×500.jpg)

A more powerful version of the classifier can handle more than two categories. For instance, the classifier categorizes images into specific types of tourist attractions, such as the Eiffel Tower, the Arc de Triomphe, Sacré-Cœur Basilica, etc. In such scenarios, each image input may have multiple answers, just like the postcard above.
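
As a rough illustration of what using such a classifier looks like in code, here is a minimal sketch with a pretrained ResNet from torchvision (assuming torchvision ≥ 0.13; the model, its 1,000 ImageNet categories, and the file name are stand-ins, since a real attraction classifier would be trained on its own labels):

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained ImageNet classifier, used purely as an illustration.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("eiffel_tower.jpg").convert("RGB")  # placeholder path
batch = preprocess(image).unsqueeze(0)                 # add a batch dimension

with torch.no_grad():
    logits = model(batch)
probabilities = torch.softmax(logits, dim=1)
top_prob, top_class = probabilities.max(dim=1)
print(f"Predicted class index: {top_class.item()} (p={top_prob.item():.2f})")
```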

Localization

Now, suppose we not only want to know the name of the tourist attraction appearing in the image but are also interested in its location within the image. The goal of localization is to find the position of a single object within the image. For example, in the image below, the location of the Eiffel Tower is marked.

The Eiffel Tower marked with a red bounding box. (Image source: https://cdn.pariscityvision.com/media/wysiwyg/tour-eiffel.jpg)

The standard way to execute localization is to define a bounding box that surrounds the object in the image.

Localization is a very useful task. For example, it enables automatic object cropping over a large number of images. Combined with classification, it could be used to quickly build a dataset of cropped images of famous tourist attractions, as sketched below.
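
A minimal sketch of that cropping step, assuming a localization model has already returned a bounding box (the coordinates and file names below are made up for illustration):

```python
from PIL import Image

# Bounding box returned by a localization model, as (left, top, right, bottom).
# The coordinates are made-up values purely for illustration.
box = (120, 40, 480, 620)

image = Image.open("eiffel_tower.jpg")   # placeholder path
cropped = image.crop(box)                # PIL crops with a (left, top, right, bottom) tuple
cropped.save("eiffel_tower_cropped.jpg")
```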

Object Detection

Now imagine performing localization and classification at the same time, and repeating the process for every object of interest in the image: that is object detection. In this scenario, the number of objects in the image is unknown in advance. The goal of object detection is therefore to find and classify all the objects in the image.

Object detection results (Image source: http://research.ibm.com/artificial-intelligence/computer-vision/images/cv-research-areas-object-detection.jpg)

In this dense image, we can see that the computer vision system has identified many different objects: cars, people, bicycles, and even signs containing text.

This problem is even challenging for humans. Some objects are only partially visible because they are partly outside the image or overlapping with each other. Additionally, the size differences of similar objects can be significant.

A direct application of object detection is counting, which is widely used in real life, from counting the types of fruits harvested to counting the number of people at public gatherings or events like soccer matches.
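
As an illustration, here is a minimal sketch that combines detection and counting using a pretrained Faster R-CNN from torchvision (assuming torchvision ≥ 0.13; the image path and the 0.8 confidence threshold are arbitrary choices):

```python
import torch
from torchvision import transforms
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    fasterrcnn_resnet50_fpn,
)
from PIL import Image

# Pretrained detector and the list of COCO category names it was trained on.
weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights)
model.eval()
categories = weights.meta["categories"]

image = Image.open("street.jpg").convert("RGB")   # placeholder path
tensor = transforms.ToTensor()(image)

with torch.no_grad():
    detections = model([tensor])[0]               # boxes, labels, scores for one image

# Count confident "person" detections as a simple people counter.
people = sum(
    1
    for label, score in zip(detections["labels"], detections["scores"])
    if float(score) > 0.8 and categories[int(label)] == "person"
)
print(f"People detected: {people}")
```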

Object Recognition

Object recognition is slightly different from object detection, although they use similar techniques. Given a specific object, the goal of object recognition is to find instances of that object in the image. This is not classification but determining whether the object appears in the image and, if so, performing localization. Searching for images containing a company’s logo is one example. Another example is monitoring real-time images captured by security cameras to identify a person’s face.

Instance Segmentation

We can view instance segmentation as the next step of object detection. It not only involves identifying objects from an image but also requires creating a mask for each detected object as accurately as possible.

Instance segmentation results.

You can see from the above image that the instance segmentation algorithm has created masks for the four Beatles members and some cars (although the result is not complete, especially for Lennon).

The cost of performing such tasks manually is high, while instance segmentation technology simplifies the implementation of such tasks. In France, laws prohibit the media from exposing children’s images without explicit consent from guardians. With instance segmentation technology, it is possible to blur children’s faces in television or film.
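
A minimal sketch of that idea, using a pretrained Mask R-CNN from torchvision to blur everything under each confident mask (assuming torchvision ≥ 0.13; the image path and the 0.7 threshold are arbitrary, and a real anonymization pipeline would restrict itself to faces):

```python
import cv2
import numpy as np
import torch
from torchvision import transforms
from torchvision.models.detection import (
    MaskRCNN_ResNet50_FPN_Weights,
    maskrcnn_resnet50_fpn,
)
from PIL import Image

weights = MaskRCNN_ResNet50_FPN_Weights.DEFAULT
model = maskrcnn_resnet50_fpn(weights=weights)
model.eval()

image = Image.open("crowd.jpg").convert("RGB")    # placeholder path
tensor = transforms.ToTensor()(image)

with torch.no_grad():
    output = model([tensor])[0]

frame = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
blurred = cv2.GaussianBlur(frame, (51, 51), 0)

# Replace the pixels under each confident mask with their blurred version.
for mask, score in zip(output["masks"], output["scores"]):
    if float(score) < 0.7:                        # arbitrary confidence threshold
        continue
    binary = (mask[0] > 0.5).numpy()              # soft mask -> boolean mask
    frame[binary] = blurred[binary]

cv2.imwrite("anonymized.jpg", frame)
```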

Object Tracking

Object tracking aims to track moving objects over time, using continuous video frames as input. This functionality is necessary for robots; for example, goalkeeper robots need to perform various tasks, from tracking the ball to blocking it. Object tracking is equally important for autonomous vehicles, enabling advanced spatial reasoning and path planning. Similarly, object tracking is useful in multi-person tracking systems, including systems that monitor user behavior (such as computer vision systems in retail stores) and those that track soccer or basketball players in games.

A relatively straightforward way to perform object tracking is to execute object detection on each image in a video sequence and compare each object instance to determine their movement trajectories. The drawback of this method is that performing object detection for every image is often costly. An alternative approach requires capturing the tracked object only once (usually the first time the object appears) and then identifying its movement trajectory in subsequent images without explicitly recognizing the object. Finally, object tracking methods may not necessarily detect the object; they can merely observe the movement trajectory of the target without knowing what the tracked object is.
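
Here is a minimal sketch of the detect-then-associate approach described above: a detector (passed in as a placeholder `detect` function) runs on every frame, and detections are greedily linked across frames by bounding-box overlap (IoU):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def track(frames, detect, iou_threshold=0.5):
    """Greedy frame-to-frame association; detect(frame) returns a list of boxes."""
    tracks = {}        # track id -> last known box
    next_id = 0
    history = []
    for frame in frames:
        assignments = {}
        for box in detect(frame):
            # Match the detection to the existing track it overlaps the most.
            best_id, best_iou = None, iou_threshold
            for track_id, last_box in tracks.items():
                overlap = iou(box, last_box)
                if overlap > best_iou:
                    best_id, best_iou = track_id, overlap
            if best_id is None:                    # no match: start a new track
                best_id, next_id = next_id, next_id + 1
            assignments[best_id] = box
        tracks = assignments
        history.append(dict(tracks))
    return history
```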

How Computer Vision Works

As previously mentioned, the goal of computer vision is to mimic how the human visual system works. How do algorithms achieve this goal? This article will introduce some of the most important concepts.

General Strategies

Deep learning methods and techniques have profoundly changed computer vision and other fields of artificial intelligence; for many tasks, using deep learning methods has become standard practice. In particular, the performance of convolutional neural networks (CNNs) has surpassed the best results achievable with traditional computer vision techniques.

The following four steps outline the general approach to building computer vision models using CNNs:

  1. Create a dataset containing labeled images or use an existing dataset. Labels can be image categories (for classification tasks), bounding boxes and category pairs (for object detection problems), or pixel-level segmentation for each object of interest in the image (for instance segmentation problems).

  2. Extract features relevant to the task from each image, which is the focus of modeling. For example, the features used to recognize faces differ significantly from those used to identify tourist attractions or human organs.

  3. Train a deep learning model based on the features. Training means inputting many images into the machine learning model, which learns how to solve the task based on the features.

  4. Evaluate the model using images different from those used for training to test the accuracy of the trained model.

This strategy is very basic but effective. Such methods are called supervised machine learning, which requires a dataset containing phenomena for the model to learn.
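
Putting the four steps together, here is a minimal PyTorch sketch for a classification task (the `data/train` and `data/val` folder layout, the choice of ResNet-18, and all hyperparameters are placeholder assumptions):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# 1. Dataset: one folder per class, e.g. data/train/eiffel_tower/*.jpg (placeholder layout).
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("data/train", transform=transform)
val_set = datasets.ImageFolder("data/val", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

# 2-3. Features and training: a pretrained CNN backbone learns the features;
#      only the final layer is replaced to match our number of classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):                             # number of epochs chosen arbitrarily
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# 4. Evaluation on images the model has never seen.
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.numel()
print(f"Validation accuracy: {correct / total:.2%}")
```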

Existing Datasets

Building datasets is often costly, but they are crucial for developing computer vision applications. Fortunately, there are some ready-made datasets available. Among them, the largest and most famous is ImageNet, which contains 14 million manually labeled images. This dataset includes 1 million images with bounding box annotations.

ImageNet images with bounding boxes (Image source: http://www.image-net.org/bbox_fig/kit_fox.JPG)

ImageNet images with object attribute annotations (Image source: http://www.image-net.org/attribute_fig/pullfigure.jpg)

Another well-known dataset is the Microsoft Common Objects in Context (COCO) dataset, which contains 328,000 images, 91 object categories (these categories are easily recognizable, even by a 4-year-old), and 2.5 million annotated instances.

Annotated image examples from the COCO dataset. (Image source: https://arxiv.org/abs/1405.0312)

Although the number of available datasets in this field is not particularly large, there are still some suitable for different tasks, such as the CelebFaces Attributes Dataset (CelebA dataset, which contains over 200,000 celebrity images), Indoor Scene Recognition Dataset (containing 15,620 indoor scene images), and Plant Image Analysis Dataset (which includes 1 million plant images belonging to 11 different categories).
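
As an example of working with an existing dataset, here is a minimal sketch that loads COCO annotations through torchvision (it assumes the COCO 2017 validation images and annotation file have already been downloaded to the local paths shown, and that pycocotools is installed):

```python
from torchvision import transforms
from torchvision.datasets import CocoDetection

# Paths are placeholders for a local copy of the COCO 2017 validation split.
dataset = CocoDetection(
    root="coco/val2017",
    annFile="coco/annotations/instances_val2017.json",
    transform=transforms.ToTensor(),
)

image, annotations = dataset[0]
print(f"{len(dataset)} images; first image has {len(annotations)} annotated instances")
for ann in annotations[:3]:
    print(ann["category_id"], ann["bbox"])   # COCO boxes are [x, y, width, height]
```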

Training Object Detection Models

  • Viola–Jones Method

There are many ways to tackle the object detection problem. For many years, the approach proposed by Paul Viola and Michael Jones in the paper "Robust Real-time Object Detection" was the most popular one.

Although it can be trained to detect a wide range of object categories, the method was originally motivated by face detection. It is fast and simple enough to run in point-and-shoot cameras, performing real-time face detection with very little processing power.

The core feature of this method is training a large number of binary classifiers based on Haar features. Haar features represent edges and lines and are computationally simple.

Haar features (Image source: https://docs.opencv.org/3.4.3/haar_features.jpg)

Although quite basic, these features can capture important elements like the nose, mouth, or inter-eye distance in the specific case of face detection. This supervised method requires many positive and negative samples.

Detecting Mona Lisa’s face.

This article will not discuss algorithm details. However, the above image demonstrates the process of the algorithm detecting Mona Lisa’s face.
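
For readers who want to try it, OpenCV ships pretrained Haar cascades, and a minimal face detection sketch looks like this (the image path is a placeholder, and the scaleFactor/minNeighbors values are typical defaults rather than tuned settings):

```python
import cv2

# Load the pretrained frontal-face Haar cascade shipped with the opencv-python package.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("mona_lisa.jpg")               # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # Haar cascades operate on grayscale

# scaleFactor and minNeighbors control the sliding-window search and filtering.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_detected.jpg", image)
```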

  • Methods Based on CNN

Deep learning has revolutionized machine learning, especially in computer vision. Currently, methods based on deep learning have become the cutting-edge technology for many computer vision tasks.

Among them, R-CNN is easy to understand, with its authors proposing a three-stage process:

  1. Use region proposal methods to extract possible objects.

  2. Use CNN to identify features in each region.

  3. Use support vector machines (SVM) to classify each region.

R-CNN architecture (Image source: https://arxiv.org/abs/1311.2524)

The region proposal method was originally introduced in the paper "Selective Search for Object Recognition", although the R-CNN algorithm is agnostic to which region proposal method is used. Step 1 is crucial because it reduces the number of candidate regions to evaluate, lowering the computational cost.

The features extracted here are not as intuitive as Haar features. In summary, CNNs can extract 4096-dimensional feature vectors from each region proposal. Given the nature of CNNs, the input should have the same dimensions. This is also one of the weaknesses of CNNs, and many methods have addressed this issue. Returning to the R-CNN method, the trained CNN architecture requires input to be fixed at 227 × 227 pixels. Since the sizes of candidate regions can vary, the authors of R-CNN distorted the images to meet the dimensional requirements.

Examples of distorted images that meet CNN input dimensional requirements.
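
A minimal sketch of that warping step (the fixed 227 × 227 size follows the R-CNN paper; the helper function name and box format are illustrative assumptions):

```python
import cv2

def warp_proposal(image, box, size=227):
    """Crop a region proposal (x1, y1, x2, y2) and warp it to a fixed square input."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    # cv2.resize stretches the crop regardless of its aspect ratio,
    # which is exactly the "distortion" R-CNN applies before the CNN.
    return cv2.resize(crop, (size, size))
```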

Although the method has achieved good results, there are some difficulties in the training process, and it has ultimately been surpassed by other methods. Some of these methods are discussed in depth in this article: https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/.

Commercial Use Cases

Computer vision applications are being deployed by an increasing number of companies to address business problems or enhance product performance. They may have already become a part of daily life without you noticing. Here are some common use cases.

Visual Search Engines

In 2001, the emergence of Google Images made visual search technology accessible to the public. Visual search engines can retrieve images based on specific content criteria. A common use case is searching by keywords, but sometimes we provide a source image and ask the engine to find similar images. In some cases, more detailed search criteria can be specified, such as images of beaches, taken in summer, containing at least 10 people.
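
One common way to build the "find images similar to this one" part, sketched minimally below, is to embed every image with a pretrained CNN and rank the library by cosine similarity to the query embedding (the file names are placeholders, and this is just one possible approach, not how any particular search engine is implemented):

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Use a pretrained CNN as a feature extractor by dropping its classification head.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    """Return a feature vector for the image at `path`."""
    image = Image.open(path).convert("RGB")
    with torch.no_grad():
        return backbone(preprocess(image).unsqueeze(0))[0]

# Placeholder paths for a small image library and a query image.
library = {p: embed(p) for p in ["beach1.jpg", "beach2.jpg", "city1.jpg"]}
query = embed("query.jpg")

ranked = sorted(
    library,
    key=lambda p: float(torch.cosine_similarity(query, library[p], dim=0)),
    reverse=True,
)
print("Most similar images:", ranked)
```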

There are now many visual search engines, some of which can be used directly as websites, some require API calls, and others are mobile applications.

The most famous visual search websites are undoubtedly Google Images, Bing, and Yahoo. The first two can use multiple keywords or a single image as search input, with the image input also known as “reverse image search”. Yahoo only supports keyword searches, but the search results are still quite good, as shown in the image below.

Yahoo image search.

There are also several noteworthy visual search websites, such as TinEye, which only supports reverse image searches, and Picsearch, which only supports text searches but has a very extensive coverage.

Among mobile applications, where visual search is gradually becoming a standard feature, there is considerable variation between implementations.

Such implementations include Google Goggles (later replaced by Google Lens), which can extract details from images. For example, it can provide breed information from a photo of a cat or provide information about artworks in museums.

In the e-commerce market, Pinterest has developed Pinterest Lens. If you need new outfit ideas for existing clothing, you can take a photo of that clothing, and Pinterest Lens will return outfit suggestions, including items you can purchase. In recent years, visual search for online shopping has become one of the fastest-growing trends.

Finally, a more advanced case of visual search is the visual question-answering system, see: https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/.

Facebook Face Recognition

Although face detection for autofocus purposes has been common in cameras since the mid-2000s, the field of face recognition has produced many more impressive results in recent years. The most common (and controversial) application is probably identifying people in images or videos. It is often used in security systems, but it also appears on social media: in apps that apply filters to faces, in searching for people by their face, and even in preventing voters from voting more than once in elections.

A use case that has sparked both interest and concern is Facebook's face recognition system. One of the development team's stated goals is to notify users when strangers post images containing their faces (see the example below) and to tell visually impaired users which people appear in an image or video.

Facebook face recognition. (Image source: https://cdn0.tnwcdn.com/wp-content/blogs.dir/1/files/2017/12/Facebook-Tagging-796×428.jpg)

Besides the concerning aspects, this technology is beneficial in many scenarios, such as combating online harassment.

Amazon Go

Are you tired of waiting in line at supermarkets and grocery stores? Amazon Go stores offer a different experience: with the help of computer vision, there are no lines and no checkout.

The concept is simple: customers enter the store, select the items they want, and leave the store without waiting in line to check out.

How is this achieved? Thanks to Amazon’s “Just Walk Out” technology. Customers must download a mobile app that helps Amazon identify their identity. When they want to enter an Amazon Go store, the app provides a QR code. There are turnstiles at the store entrance for customers to enter and exit the store, which read the customer’s QR code when they enter. An interesting feature is that others can accompany the customer into the store, and the companions do not need to install the app.

Customers can move around the store freely, and this is where computer vision comes into play. The store is equipped with an array of sensors, including cameras, motion sensors, and weight sensors, which collect information about each person's behavior and detect in real time which items customers take from the shelves. A customer can pick up an item, change their mind, and put it back. If an item is picked up and handed to someone else who walks out with it, the system still charges the customer who originally took it off the shelf. In this way, the system builds a virtual shopping cart containing every item picked up and keeps it updated in real time, making the shopping process very smooth for customers.

When customers finish shopping, they can walk out of the store. As they pass through the turnstiles, the system does not require customers to scan products or QR codes but records the transaction amount and sends a confirmation notification to the customer.

Amazon Go is an example of how computer vision positively impacts the real world and human daily life.

Tesla Autopilot

Making cars drive themselves is no longer a distant dream. Tesla's Autopilot provides very convenient assisted-driving features. It is not a fully autonomous driving system but rather a driving assistant that can steer the car on certain routes, and Tesla is emphatic about this point: the driver remains responsible for controlling the car at all times.

Autonomous driving is achieved through object detection and tracking technologies.

To make Autopilot work, Tesla cars are heavily equipped: eight surround cameras provide 360-degree visibility at ranges of up to 250 meters, ultrasonic sensors detect nearby objects, and a radar processes information about the surrounding environment. This allows a Tesla to adjust its speed to traffic conditions, brake in time when it encounters obstacles, keep or change lanes, take turns, and park smoothly.

Tesla Autopilot technology is another exciting example of how computer vision positively impacts human daily activities.

Microsoft InnerEye

In the healthcare industry, Microsoft's InnerEye is a valuable tool that helps radiologists, oncologists, and surgeons work with radiological images. Its primary purpose is to accurately identify and delineate tumors in 3D radiological images.

3D images of cancerous tumors.

Based on computer vision and machine learning, InnerEye produces highly detailed 3D models of tumors. The screenshot above shows the complete 3D segmentation of a brain tumor created by InnerEye. In the accompanying demo, an expert controls the InnerEye tool and guides it through the task, with InnerEye acting as an assistant.

In radiotherapy, InnerEye’s results make it possible to target tumors for radiation without harming important organs.

These results also help radiologists better understand image sequences, judging whether the disease is progressing, stable, or responding well to treatment based on changes in tumor size. In this way, medical images become an important means of tracking and measuring.

Finally, InnerEye can be used for planning precise surgeries.

Current Applications of Computer Vision in Small Companies

While the implementation of computer vision in large companies is often discussed, this does not mean that all companies must be of the scale of Google or Amazon to benefit from this machine learning technology. Companies of any size can leverage data and computer vision technology to become more efficient and make better decisions.

Let’s look at some real cases from small companies:

Tryolabs helped a small risk management company located in San Francisco build and implement a computer vision system for scaling the processing of rooftop inspection images.

Before using computer vision technology, company experts manually analyzed photos taken by drones to detect damages in rooftop construction. Although the analysis results were accurate, the service could not be effectively scaled due to time consumption and limited human resources.

To solve this problem, we built a deep learning system capable of understanding images and automatically identifying rooftop issues (such as water accumulation, loose cables, and rust). To this end, we developed a deep neural network capable of detecting problems based on rooftop images, analyzing input images, and making detection results available to external tools via an API.

As a result, both the volume of orders and revenue for the company increased.

How to Implement Computer Vision Projects

As with all innovations worth pursuing within organizations, you should choose a strategic approach to implement computer vision projects.

Successful innovation using computer vision technology depends on overall business strategy, resources, and data.

The following questions can help you build a strategic roadmap for your computer vision project.

1. Should the computer vision solution reduce costs or increase revenue?

Successful computer vision projects either reduce costs or increase revenue (or both); you should define the objectives of the project. Only then can it have a significant impact on the organization and its development.

2. How will project success be measured?

Each computer vision project is different, and you need to define success metrics specific to that project. Once the metrics are set, you should ensure they are recognized by business personnel and data scientists.

3. Can information acquisition be guaranteed?

When starting a computer vision project, data scientists should have easy access to data. They need to collaborate with key colleagues from different departments (such as IT). These colleagues should provide support with their business knowledge, as internal bureaucracies can become significant constraints.

4. Is the data collected by the organization appropriate?

Computer vision algorithms are not magic. They require data to operate, and the quality of input data determines their performance. There are various methods and sources for collecting appropriate data, depending on your goals. In any case, the more input data you have, the more likely the computer vision model will perform well. If you have concerns about the quantity and quality of the data, you can ask data scientists to help assess the quality of the dataset and, if necessary, find optimal ways to obtain third-party data.

5. Is the data collected by the organization in an appropriate format?

In addition to having the right amount and type of data, you also need to ensure the data format is appropriate. Suppose you use thousands of perfect mobile photos (high resolution, white background) to train an object detection algorithm. Then you find that the algorithm cannot run because the actual use case is detecting people holding mobile phones under different lighting/contrast/background conditions, rather than detecting the phones themselves. In this case, your previous data collection efforts would be essentially wasted, and you would need to start over. Moreover, you should understand that if the data is biased, the algorithm will learn that bias.

For more information on how to initiate successful computer vision projects, see the blog: https://tryolabs.com/blog/2019/02/13/11-questions-to-ask-before-starting-a-successful-machine-learning-project/.

We hope this article helps readers understand the concepts, workings, and real-world applications of computer vision.
