The idea that machines can simulate the human visual system is no longer a fantasy. Since the first academic papers appeared in the 1960s, computer vision has come a long way, and modern systems have matured to the point where they can be integrated into mobile applications.
Today, due to its wide applications and immense potential, computer vision has become one of the hottest subfields of artificial intelligence and machine learning. Its goal is to replicate the powerful capabilities of human vision.
But what exactly is computer vision? What is its current application status in different industries? What are some well-known commercial use cases? What are typical computer vision tasks?
This article will introduce the basic concepts and real-world applications of computer vision. It offers anyone who has heard of computer vision, but is unsure of what it is and how it is applied, a convenient way to get a handle on this complex field.
You can read through this article or jump directly to a specific section.
- Retail
- Manufacturing
- Healthcare
- Autonomous Driving
- Insurance
- Agriculture
- Security
- Image Classification
- Localization
- Object Detection
- Object Recognition
- Instance Segmentation
- Object Tracking
What Problems Does Computer Vision Solve?
Humans can understand and describe the scenes in images. For example, in the following image, humans can do more than just detect that there are four people, a street, and several cars in the foreground of the image.
The cover of The Beatles’ album “Abbey Road”. (Source: https://liveforlivemusic.com/news/beatles-abbey-road/)
In addition to this basic information, humans can also see that the people in the foreground are walking, one of them is barefoot, and we even know who they are. We can rationally infer that the people in the image are not in danger of being hit by a car, and the white Volkswagen is not parked properly. Humans can also describe the clothing of the people in the image, not just the color of their clothes but also the material and texture.
These are precisely the skills that computer vision systems need. Simply put, the main problem computer vision solves is:
Given a two-dimensional image, a computer vision system must identify the objects in the image and their features, such as shape, texture, color, size, spatial arrangement, etc., in order to describe the image as completely as possible.
Differentiating Computer Vision from Related Fields
Although there is some overlap, the tasks computer vision takes on go beyond those of related fields such as image processing and machine vision. Next, let's explore the differences between these fields.
Image processing aims to manipulate raw images to apply some transformation. Its goal is usually to improve the image or to use it as input for a specific task, while the goal of computer vision is to describe and interpret the image. For example, typical image processing components like denoising, contrast enhancement, or rotation can be executed at the pixel level without a comprehensive understanding of the entire image.
Machine vision is a specific application of computer vision used to perform certain (production line) actions. In the chemical industry, machine vision systems can inspect containers on the production line (for cleanliness, emptiness, and damage) or verify whether finished products are properly packaged, thus aiding in product manufacturing.
Computer vision can solve more complex problems, such as facial recognition, detailed image analysis (which can assist in visual search, like Google Images), or biometric methods.
Humans can not only understand the scenes in images, but with a little training, they can also interpret calligraphy, impressionist paintings, abstract art, and two-dimensional ultrasound images of fetuses.
From this perspective, the field of computer vision is especially complex, with a wide range of practical applications.
From e-commerce to traditional industries, companies of all types and sizes can now leverage the powerful capabilities of computer vision, thanks to innovations driven by artificial intelligence and machine learning (more specifically, computer vision).
Let’s take a look at the industry applications that have been most affected by computer vision in recent years.
In recent years, the application of computer vision in retail has become one of the most important technological trends. Below are some common use cases. For a more detailed understanding of potential applications of computer vision in retail, please refer to: https://tryolabs.com/resources/retail-innovations-machine-learning/.
Physical retail stores utilize computer vision algorithms and cameras to understand customer behavior.
Computer vision algorithms can recognize faces and determine characteristics such as gender or age range. Additionally, retail stores can use computer vision technology to track customer movement within the store, analyze their paths, detect walking patterns, and count how often the store catches the attention of passersby.

By adding gaze direction detection, retail stores can answer this important question: where should products be placed in the store to enhance consumer experience and maximize sales?
Computer vision is also a powerful tool for developing anti-theft mechanisms. Facial recognition algorithms can be used to identify known shoplifters or detect if a customer is placing items into their backpack.
Computer vision has two main applications in inventory management.
By analyzing images from security cameras, computer vision algorithms can generate very accurate estimates of remaining products in the store. This is valuable information for store managers, helping them to quickly notice unusual demands for goods and respond promptly.
Another common application is to analyze shelf space utilization and identify suboptimal configurations. Besides discovering wasted space, such algorithms can also provide better product placement solutions.
The main problems on production lines are machine interruptions or defective products, which can lead to production delays and profit loss.
Computer vision algorithms have proven to be an effective way to implement predictive maintenance. By analyzing visual information (e.g., from cameras mounted on robots), algorithms can detect potential machine problems early. A system that can predict whether a packaging or car-assembly robot is about to fail is a significant contribution.
The same approach can be used to reduce defect rates, as systems can detect defects in components across the entire production line, allowing manufacturers to respond in real time and take corrective action. If a defect is minor, production can continue, with the affected products flagged or routed down a specific production path; sometimes, however, stopping the line is necessary. For further benefit, such systems can be trained for each use case to classify defects by type and severity.
In the healthcare sector, there are numerous existing applications of computer vision.
Undoubtedly, medical image analysis is the most well-known example, significantly enhancing the medical diagnosis process. Such systems analyze MRI images, CT scans, and X-ray images to identify abnormalities like tumors or search for symptoms of neurological diseases.
In many cases, image analysis techniques extract features from images to train classifiers capable of detecting anomalies. However, some specific applications require more refined image processing. For example, analyzing colonoscopy images requires segmentation to identify polyps and prevent colorectal cancer.
3D rendered CT scan image of the thorax. (Source: https://en.wikipedia.org/wiki/Image_segmentation)
The above image shows the segmentation results needed to observe elements in the thorax. The system segments each important part and colors them: pulmonary arteries (blue), pulmonary veins (red), mediastinum (yellow), and diaphragm (purple).
A large number of such applications are already in use, such as estimating postpartum hemorrhage, quantifying coronary artery calcification, and measuring blood flow in the body without MRI.
However, medical imaging is not the only healthcare area where computer vision is applied. For example, computer vision technology provides indoor navigation assistance for visually impaired people: such systems can locate pedestrians and surrounding objects within a floor plan and give the user real-time feedback about the scene. Gaze tracking and eye analysis can also be used to detect early cognitive conditions, such as autism or dyslexia in children, which are highly correlated with abnormal gaze behavior.
Have you ever thought about how autonomous vehicles “see” the road? Computer vision plays a central role in this, helping autonomous vehicles perceive and understand their surroundings to operate appropriately.
One of the most exciting challenges in computer vision is image and video object detection. This includes locating and classifying different numbers of objects to differentiate whether an object is a traffic light, a car, or a pedestrian, as shown in the image below:

Object detection for autonomous vehicles. (Source: https://cdn-images-1.medium.com/max/1600/1*q1uVc-MU-tC-WwFp2yXJow.gif)
Such technologies, combined with the analysis of data from sensors and/or radar, allow cars to “see.”
Image object detection is a complex and powerful task, which we have previously discussed; see: https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/.
Another article explores this topic from the perspective of human-image interaction; see: https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/.
Computer vision has a significant impact on the insurance industry, especially in claims processing.
Computer vision applications can visually guide clients through claims documentation, analyzing images in real time and routing them to the right insurance brokers. At the same time, they can estimate and adjust repair costs, determine whether coverage applies, and even detect insurance fraud. All of this greatly shortens the claims process and gives clients a better experience.
From a preventive perspective, computer vision is extremely useful in avoiding accidents. Numerous computer vision applications are available to prevent collisions, integrated into industrial machinery, cars, and drones. This marks a new era of risk management that could transform the entire insurance industry.
Computer vision has a significant impact on agriculture, particularly precision agriculture.
Food production, a global economic activity, offers several valuable applications for computer vision. It faces recurring issues that were previously monitored by humans; now, computer vision algorithms can detect, or reasonably predict, pests and diseases. Such early diagnosis helps farmers take appropriate measures quickly, reducing losses and ensuring production quality.
Another long-standing challenge is weed control, as weeds develop resistance to herbicides and can cause significant losses for farmers. Now, robots equipped with computer vision can monitor an entire field and spray herbicide precisely where it is needed. This greatly reduces the volume of herbicide used, benefiting both the environment and production costs.
Soil quality is another major factor in agriculture. Some computer vision applications can identify potential defects and nutrient deficiencies in soil from photos taken with mobile phones. After analysis, these applications can offer soil restoration techniques and possible solutions for the detected soil issues.
Computer vision can also be used for sorting. Some algorithms classify fruits, vegetables, and even flowers by recognizing their main characteristics (such as size, quality, weight, color, and texture). These algorithms can also detect defects, estimating which products will keep longer and which should be sold in local markets. This helps produce stay sellable longer and reach the market sooner.
Similar to retail, businesses with high security requirements (such as banks or casinos) can benefit from computer vision applications that analyze images captured by security cameras to identify customers.
On another level, computer vision is a powerful tool in homeland security tasks. It can improve port cargo inspections or monitor sensitive locations, such as embassies, power plants, hospitals, railways, and stadiums. Here, computer vision can not only analyze and classify images but also provide detailed and meaningful descriptions of scenes, offering key factors for real-time decision-making.
Computer vision is also widely applied in defense tasks, such as reconnaissance of enemy terrain, automatic identification of enemies in images, automation of vehicle and machine movement, and search and rescue operations.
Typical Computer Vision Tasks
How can we replicate the human visual system? This is achieved through a variety of tasks in computer vision, combined to create highly complex applications. The most common tasks in computer vision include image and video recognition, which involve determining the different objects contained in an image.
The most well-known task in computer vision is likely image classification, which categorizes a given image. Let’s look at a simple binary classification example: we want to classify an image based on whether it contains a tourist attraction. Suppose we build a classifier for this task and provide an image (see below).
The Eiffel Tower. (Source: https://cdn.pariscityvision.com/media/wysiwyg/tour-eiffel.jpg)
The classifier determines that the above image belongs to the category of images containing tourist attractions. However, this does not mean that the classifier recognized the Eiffel Tower; it may have only seen a photo of the tower before and was told that the image contained a tourist attraction.
Postcard of Paris tourist attractions. (Source: http://toyworldgroup.com/image/cache/catalog/Ecuda%20Puzzles/Postcard%20Form%20Paris%20/14840-500×500.jpg)
A more powerful version of this classifier can handle more than two categories. For example, the classifier can categorize images into specific types of tourist attractions, such as the Eiffel Tower, Arc de Triomphe, Sacré-Cœur, etc. In such scenarios, each image input may have multiple answers, just like the postcard above.
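To make this concrete, below is a minimal sketch of such a binary "tourist attraction" classifier built on a CNN pretrained on ImageNet. It assumes PyTorch and torchvision (not otherwise mentioned in this article), and the file name and fine-tuning step are hypothetical; treat it as an illustration of the idea rather than a recipe.

```python
# A minimal sketch of a binary "tourist attraction" classifier built on a
# pretrained CNN. Class names and file paths are illustrative only.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Reuse a network pretrained on ImageNet and replace its final layer
# with a two-class head: "attraction" vs. "no attraction".
model = models.resnet18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, 2)
model.eval()  # after fine-tuning on a labeled dataset (not shown here)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("eiffel_tower.jpg").convert("RGB")  # hypothetical file
with torch.no_grad():
    logits = model(preprocess(image).unsqueeze(0))
label = ["no attraction", "attraction"][logits.argmax(dim=1).item()]
print(label)
```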
Now, let’s assume we not only want to know the name of the tourist attraction appearing in the image but are also interested in its location within the image. The goal of localization is to find the location of a single object in the image. For example, the location of the Eiffel Tower in the image below is marked.
Eiffel Tower marked with a red bounding box. (Source: https://cdn.pariscityvision.com/media/wysiwyg/tour-eiffel.jpg)
The standard way to perform localization is to define a bounding box that encloses the object in the image.
Localization is a very useful task. For example, it can automatically crop objects from a large number of images. By combining localization with classification tasks, we can quickly build a dataset of famous tourist attraction (cropped) images.
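As a small illustration of that cropping step, the snippet below (assuming Pillow; the file name and box coordinates are made up) cuts a localized object out of an image:

```python
# A small sketch of using a predicted bounding box to crop an object,
# as in the dataset-building example above. Coordinates are illustrative.
from PIL import Image

image = Image.open("eiffel_tower.jpg")   # hypothetical input
box = (140, 20, 420, 580)                # (left, top, right, bottom) from a localizer
cropped = image.crop(box)                # PIL crops with exactly this tuple format
cropped.save("eiffel_tower_cropped.jpg")
```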
Now imagine an operation that combines localization and classification and is executed repeatedly on every object of interest in an image; this is object detection. Here, the number of objects in the image is not known in advance, so the goal of object detection is to find the objects in the image and classify them.
Object detection results. (Source: http://research.ibm.com/artificial-intelligence/computer-vision/images/cv-research-areas-object-detection.jpg)
In this dense image, we can see that the computer vision system identifies a large number of different objects: cars, people, bicycles, and even signs containing text.
This task is challenging even for humans. Some objects are only partially visible because they are partially out of the image or overlapping with each other. Additionally, there is a vast size difference among similar objects.
A direct application of object detection is counting, which is widely used in real life, from counting the types of fruits harvested to counting the number of people at public gatherings or football matches.
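As a hedged sketch of how detection and counting might look in practice, the snippet below runs a torchvision model pretrained on COCO over a single image; the file name and score threshold are illustrative choices, not part of any particular system described above.

```python
# A sketch of off-the-shelf object detection with a model pretrained on COCO.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("street_scene.jpg").convert("RGB")  # hypothetical file
tensor = transforms.ToTensor()(image)

with torch.no_grad():
    output = model([tensor])[0]  # dict with "boxes", "labels", "scores"

# Counting is a direct application: keep confident detections and tally them.
confident = output["scores"] > 0.8
print("objects detected:", int(confident.sum()))
```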
Object Recognition
Object recognition is slightly different from object detection, although they use similar techniques. Given a specific object, the goal of object recognition is to find instances of that object in the image. This is not classification; rather, it is determining whether the object appears in the image, and if so, performing localization. Searching for images containing a company logo is one example. Another example is monitoring real-time images captured by security cameras to recognize a person’s face.
We can think of instance segmentation as the next step after object detection. It involves not only identifying objects from an image but also creating a mask for each detected object as accurately as possible.
Instance segmentation results.
You can see from the above image that the instance segmentation algorithm creates masks for the four members of The Beatles and some cars (though the result is incomplete, especially for Lennon).
The cost of manually performing such tasks is high, while instance segmentation technology simplifies the implementation of these tasks. In France, the law prohibits media from exposing children’s images without explicit consent from guardians. By using instance segmentation technology, children’s faces in TV shows or movies can be blurred.
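As a rough sketch of how masks enable that kind of blurring, the snippet below uses a torchvision Mask R-CNN model pretrained on COCO to blur every detected person; the file names and threshold are illustrative.

```python
# A sketch of using instance segmentation masks to blur people in a frame,
# in the spirit of the anonymization example above.
import torch
from torchvision import models, transforms
from PIL import Image, ImageFilter

model = models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("crosswalk.jpg").convert("RGB")  # hypothetical file
tensor = transforms.ToTensor()(image)
with torch.no_grad():
    out = model([tensor])[0]

blurred = image.filter(ImageFilter.GaussianBlur(radius=12))
result = image.copy()
for mask, label, score in zip(out["masks"], out["labels"], out["scores"]):
    if label.item() == 1 and score > 0.8:             # COCO class 1 is "person"
        m = Image.fromarray((mask[0] > 0.5).numpy())  # per-pixel boolean mask
        result.paste(blurred, mask=m)                 # blur only masked pixels
result.save("crosswalk_anonymized.jpg")
```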
Object tracking aims to track moving objects over time, using continuous video frames as input. This function is essential for robots; for example, goalkeeper robots need to perform various tasks from tracking the ball to blocking it. Object tracking is also crucial for autonomous vehicles, enabling advanced spatial reasoning and path planning. Similarly, object tracking is useful in multi-person tracking systems, including systems for understanding user behavior (like computer vision systems in retail stores) and monitoring football or basketball players in games.
A relatively straightforward way to perform object tracking is to execute object detection on each image in a video sequence and compare each object instance to determine their movement trajectories. The drawback of this method is that performing object detection for each image can be costly. An alternative approach only requires capturing the tracked object once (usually the first time the object appears) and then identifying its movement trajectory in subsequent images without explicitly recognizing the object. Finally, the object tracking method may not necessarily detect the object; it can simply observe the movement trajectory of the target without knowing what the tracked object is.
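The first, detection-based strategy can be sketched in a few lines. The snippet below is a deliberately minimal illustration of linking detections across frames by bounding-box overlap (IoU); real trackers add motion models and re-identification on top of this idea.

```python
# A minimal sketch of tracking-by-detection via IoU matching.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def link_tracks(prev_tracks, detections, next_id, threshold=0.3):
    """Greedily assign each detection to the previous track it overlaps most."""
    tracks = {}
    for det in detections:
        best_id, best_iou = None, threshold
        for track_id, box in prev_tracks.items():
            if track_id in tracks:            # track already claimed this frame
                continue
            score = iou(box, det)
            if score > best_iou:
                best_id, best_iou = track_id, score
        if best_id is None:                   # no match: start a new track
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = det
    return tracks, next_id

# Toy usage: the same object keeps its ID across two frames.
tracks, next_id = link_tracks({}, [(10, 10, 50, 80)], next_id=0)
tracks, next_id = link_tracks(tracks, [(14, 12, 54, 82)], next_id)
print(tracks)  # {0: (14, 12, 54, 82)}
```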
How Computer Vision Works
As mentioned earlier, the goal of computer vision is to mimic how the human visual system works. How do algorithms achieve this goal? This article will introduce some of the most important concepts.
Deep learning methods and techniques have profoundly changed computer vision and other artificial intelligence fields. For many tasks, using deep learning methods has become standard practice. In particular, convolutional neural networks (CNNs) outperform the best results achievable with traditional computer vision techniques.
The following four steps outline a general approach to building computer vision models using CNNs (a minimal code sketch follows the list):

1. Create a dataset of labeled images or use an existing one. Labels can be image categories (for classification tasks), bounding-box/category pairs (for object detection problems), or a pixel-level segmentation of each object of interest (for instance segmentation problems).

2. Extract features relevant to the task from each image; this is the heart of the modeling work. For example, the features used to recognize faces differ significantly from those used to identify tourist attractions or human organs.

3. Train a deep learning model on the extracted features. Training means feeding many images to the machine learning model, which learns to solve the task from their features.

4. Evaluate the model on images different from those used for training, to test the accuracy of the trained model.
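Here is the promised sketch of these four steps, written with PyTorch and torchvision (an assumption; the approach itself is framework-agnostic). Directory names, the choice of ResNet-18, and the hyperparameters are placeholders.

```python
# A compact sketch of the four steps above. Paths and settings are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

# Step 1: a labeled dataset (one sub-folder per class, hypothetical paths).
train_set = datasets.ImageFolder("data/train", transform=tf)
test_set  = datasets.ImageFolder("data/test",  transform=tf)

# Steps 2-3: a pretrained CNN extracts features; we train a new head on them.
model = models.resnet18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in DataLoader(train_set, batch_size=32, shuffle=True):
    optimizer.zero_grad()
    loss_fn(model(images), labels).backward()
    optimizer.step()

# Step 4: evaluate on images the model has never seen.
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in DataLoader(test_set, batch_size=32):
        correct += (model(images).argmax(1) == labels).sum().item()
        total += labels.numel()
print(f"accuracy: {correct / total:.2%}")
```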
This strategy is very basic but effective. Such methods are called supervised machine learning, which requires a dataset containing phenomena the model needs to learn.
Building datasets is often costly, but they are crucial for developing computer vision applications. Fortunately, there are some ready-made datasets available. The largest and most famous is ImageNet, which contains 14 million manually labeled images. This dataset includes 1 million images with bounding box annotations.

ImageNet images with bounding boxes (Source: http://www.image-net.org/bbox_fig/kit_fox.JPG)
ImageNet images with object attribute annotations (Source: http://www.image-net.org/attribute_fig/pullfigure.jpg)
Another well-known dataset is the Microsoft Common Objects in Context (COCO) dataset, which contains 328,000 images, 91 object categories (these categories are easily recognizable, even by a 4-year-old), and 2.5 million annotated instances.
Annotated image examples from the COCO dataset. (Source: https://arxiv.org/abs/1405.0312)
Beyond these large general-purpose datasets, there are also datasets suited to more specific tasks, such as the CelebFaces Attributes Dataset (CelebA, a facial-attribute dataset with more than 200,000 celebrity images), the Indoor Scene Recognition dataset (15,620 indoor scene images), and the Plant Image Analysis dataset (1 million plant images across 11 different categories).
Training Object Detection Models
There are many methods to solve the object detection problem. For many years, the method proposed by Paul Viola and Michael Jones in their paper "Robust Real-time Object Detection" was the dominant approach.
Although it can be trained to detect a wide range of object categories, the method was originally motivated by face detection. It is fast and straightforward enough to run in point-and-shoot cameras, which perform real-time face detection with very little processing power.
At its core, the method combines many simple binary classifiers trained on Haar features. Haar features represent edges and lines and are very cheap to compute.
Haar features (Source: https://docs.opencv.org/3.4.3/haar_features.jpg)
Although relatively basic, in the specific case of face detection these features can capture important elements such as the nose, the mouth, or the distance between the eyebrows. The method is supervised and requires many positive samples (images with faces) and negative samples (images without faces).

Detecting the face of the Mona Lisa.
This article will not discuss the algorithm’s details. However, the above image demonstrates the process of the algorithm detecting the face of the Mona Lisa.
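For readers who want to try it, OpenCV ships a pretrained Haar cascade that implements this detector. The sketch below assumes the opencv-python package; the input file name is made up.

```python
# Running OpenCV's pretrained Haar cascade, the classic implementation of
# the Viola-Jones face detector discussed above.
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("mona_lisa.jpg")               # hypothetical file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # cascades work on grayscale
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                        # one box per detected face
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("mona_lisa_faces.jpg", image)
```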
Deep learning has transformed machine learning, especially in computer vision. Currently, deep learning-based methods have become cutting-edge technologies for many computer vision tasks.
Among them, R-CNN is easy to understand, and its authors proposed a three-stage process:
1. Extract possible objects using a region proposal method.

2. Use a CNN to extract features from each region.

3. Classify each region with support vector machines (SVMs).
R-CNN architecture (Source: https://arxiv.org/abs/1311.2524)
The region proposal method originally used was the one proposed in the paper "Selective Search for Object Recognition", although the R-CNN algorithm is agnostic to which region proposal method is adopted. This step is crucial because it reduces the number of candidate objects to examine and thus lowers the computational cost.
The features extracted here are less intuitive than Haar features. To summarize, a CNN is used to extract a 4096-dimensional feature vector from each region candidate. Because of its fully connected layers, the CNN requires all inputs to have the same dimensions; this is one of the weaknesses of CNNs, and many methods have since addressed it. In R-CNN, the trained CNN architecture expects inputs fixed at 227 × 227 pixels, and since candidate regions come in all sizes, the authors warp (distort) them to fit these dimensions.
Example of a distorted image meeting CNN input dimensional requirements.
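The warping itself is simple to illustrate. The snippet below (assuming OpenCV; the file name and proposal boxes are invented) resizes each proposal to 227 × 227 regardless of its aspect ratio, which is exactly the distortion shown above:

```python
# A sketch of the warping step: each region proposal, whatever its shape,
# is resized (distorted) to the fixed input size the CNN expects.
import cv2

image = cv2.imread("street_scene.jpg")                 # hypothetical file
proposals = [(30, 40, 200, 120), (300, 60, 90, 260)]   # (x, y, w, h) from e.g. selective search

warped_regions = []
for (x, y, w, h) in proposals:
    crop = image[y:y + h, x:x + w]
    # Anisotropic resize to 227x227, ignoring aspect ratio, as in R-CNN.
    warped_regions.append(cv2.resize(crop, (227, 227)))
# Each warped region would then go through the CNN for a 4096-d feature vector.
```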
Although this method achieved good results, training was difficult, and it was ultimately surpassed by other methods. Some of these are discussed in depth in this article: https://tryolabs.com/blog/2017/08/30/object-detection-an-overview-in-the-age-of-deep-learning/.
Computer vision applications are increasingly being deployed by companies to address business issues or enhance product performance. They may have already become part of daily life without you even noticing. Here are some common use cases.
When Google Images appeared in 2001, visual search technology became available to the general public. Visual search engines retrieve images that satisfy specific content criteria. The common case is searching by keywords, but sometimes we supply a source image and ask the engine to find similar images. In some cases, more detailed criteria can be specified, such as images of beaches, taken in summer, containing at least 10 people.
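One common way to implement the "find similar images" case is to embed every image with a CNN and rank a catalog by cosine similarity to the query. The sketch below assumes PyTorch and torchvision, and the catalog file names are made up; production engines use far larger indexes and approximate nearest-neighbor search.

```python
# A sketch of reverse image search: CNN embeddings + cosine similarity.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Drop the classification head so the network outputs a feature vector.
backbone = models.resnet18(weights="DEFAULT")
backbone.fc = nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def embed(path):
    with torch.no_grad():
        return backbone(preprocess(Image.open(path).convert("RGB")).unsqueeze(0))[0]

catalog = ["beach1.jpg", "tower.jpg", "beach2.jpg"]   # hypothetical index
catalog_vecs = torch.stack([embed(p) for p in catalog])
query_vec = embed("query_beach.jpg")                  # hypothetical query image

scores = torch.nn.functional.cosine_similarity(catalog_vecs, query_vec.unsqueeze(0))
print("most similar:", catalog[scores.argmax().item()])
```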
Now there are many visual search engines, some of which can be used directly as websites, while others require API calls or are mobile applications.
The most famous visual search websites are undoubtedly Google Images, Bing, and Yahoo. The first two websites allow multiple keywords or a single image as search input, with image input also known as “reverse image search.” Yahoo only supports keyword searches, but the search results are also quite good, as shown in the image below.
Yahoo Image Search.
There are also several visual search websites worth noting, such as TinEye, which only supports reverse image search, and Picsearch, which only supports text searches but has a vast coverage.
In mobile applications, visual search is gradually becoming a standard feature, though implementations differ significantly from app to app.
These implementations include Google Goggles (later replaced by Google Lens), which can obtain detailed information from images. For example, it can provide breed information from a photo of a cat or information about artworks in a museum.
In the e-commerce market, Pinterest developed Pinterest Lens. If you need new outfit ideas for existing clothes, you can take a picture of the clothing item, and Pinterest Lens will return outfit suggestions, including items you can purchase to match. In recent years, visual search for online shopping has become one of the fastest-growing trends.
Finally, a more advanced case of visual search is the visual question-answering system; see: https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/.
Facebook Face Recognition
Although face detection has been built into cameras for autofocus since the mid-2000s, the field of face recognition has produced far more impressive results in recent years. The most common (and controversial) application is probably recognizing individuals in images or videos. It is often used in security systems, but it also appears on social media and elsewhere: applying filters to faces, enabling search by face, and even preventing voters from voting more than once in an election.
One use case that has sparked both interest and concern is Facebook's face recognition system. One of the development team's main goals was to alert users when a photo containing their face is posted by someone else (see the example below); another was to tell visually impaired users who appears in an image or video.
Facebook Face Recognition. (Source: https://cdn0.tnwcdn.com/wp-content/blogs.dir/1/files/2017/12/Facebook-Tagging-796×428.jpg)
Aside from the concerning aspects, this technology is beneficial in many scenarios, such as combating online harassment.
Tired of waiting in line at supermarkets and grocery stores? Amazon Go stores offer a different experience. With the help of computer vision, there are no lines or checkout counters here.
The concept is simple: customers enter the store, select the desired products, and leave without waiting in line to check out.
How is this achieved? Thanks to Amazon’s “Just Walk Out” technology. Customers must download a mobile app that helps Amazon recognize their identity. When they want to enter an Amazon Go store, the app provides a QR code. There are turnstiles at the store entrance for customers to enter and exit, and the turnstiles read the customer’s QR code when they enter the store. An interesting feature is that other people can accompany the customer into the store, and the companions do not need to install the app.
Customers can then move freely within the store, and this is where computer vision comes in. The store is equipped with an array of sensors, including cameras, motion sensors, and weight sensors on the shelves. Together, these devices track each person's behavior and detect, in real time, which products customers take off the shelves. A customer can take an item and, after changing their mind, put it back. If an item is instead handed to another customer, the system still charges whoever first took it off the shelf. In this way, the system builds a virtual shopping cart of everything each customer has picked up and keeps it updated in real time, making the shopping process very smooth.
When customers finish shopping, they can simply walk out of the store. As they pass through the turnstiles, the system does not require customers to scan their items or QR codes but records the transaction amount and sends a confirmation notification to the customers.
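To clarify the charging behavior described above, here is a toy model of such a virtual cart driven by pick-up and put-back events. It is purely illustrative and not a description of Amazon's actual system.

```python
# A toy model of the "virtual shopping cart" behavior described above:
# sensor events attribute each item to the shopper who first picked it up.
# Purely illustrative; not Amazon's implementation.
from collections import defaultdict

carts = defaultdict(list)   # shopper id -> items to be charged
holder = {}                 # item id -> shopper who first picked it up

def on_pickup(shopper, item):
    if item not in holder:  # first pickup wins, even if handed over later
        holder[item] = shopper
        carts[shopper].append(item)

def on_return_to_shelf(item):
    shopper = holder.pop(item, None)
    if shopper is not None:
        carts[shopper].remove(item)

on_pickup("alice", "granola-01")
on_pickup("bob", "granola-01")    # handed to Bob: Alice would still be charged
on_return_to_shelf("granola-01")  # put back on the shelf: nobody is charged
print(dict(carts))                # {'alice': []}
```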
Amazon Go is a case of computer vision positively impacting the real world and human daily life.
Making cars drive themselves is not just a distant dream. Tesla’s Autopilot technology provides convenient autonomous driving features. This is not a fully autonomous driving system but an assistant that can drive cars on specific roads. This is a key point emphasized by Tesla: in all cases, the responsibility for controlling the vehicle lies with the driver.
Autonomous driving is achieved through object detection and tracking technologies.
For Autopilot to work, Tesla vehicles come heavily equipped: eight cameras provide 360-degree visibility at up to 250 meters, ultrasonic sensors detect nearby objects, and a radar processes information about the surrounding environment. With all this, Tesla cars can adjust their speed to traffic conditions, brake in time for obstacles, keep or change lanes, turn, and park smoothly.
Tesla’s Autopilot technology is another exciting example of computer vision positively impacting human daily activities.
Microsoft InnerEye
In the healthcare sector, Microsoft’s InnerEye is a valuable tool that helps radiologists, oncologists, and surgeons process radiographic images. Its primary purpose is to accurately identify tumors from 3D images of malignant tumors.

3D images of malignant tumors.
Based on computer vision and machine learning, InnerEye produces highly detailed 3D models of tumors. The screenshot above shows a complete 3D segmentation of a brain tumor created by InnerEye. In demonstrations, experts can be seen controlling the InnerEye tool and guiding it through the task; InnerEye operates like an assistant.
In radiation therapy, InnerEye results enable targeted radiation to tumors without harming important organs.
These results also help radiologists better understand sequences of images, assessing whether a disease has progressed, stabilized, or responded well to treatment based on changes in tumor size. Thus, medical imaging becomes an important means of tracking and measuring.
Finally, InnerEye can be used for planning precise surgeries.
The Current Application Status of Computer Vision in Small Companies
While the implementation of computer vision in large companies is often discussed, this does not mean that all companies must be of the scale of Google or Amazon to benefit from this machine learning technology. Companies of any size can leverage data and computer vision technology to become more efficient and make better decisions.
Let’s look at some real cases from small companies:
Tryolabs helped a small risk management company in San Francisco build and implement a computer vision system to scale processing of rooftop inspection images.
Before using computer vision technology, company experts manually analyzed photos taken by drones to detect damage in rooftop construction. Although the analysis results were accurate, the service could not be effectively scaled due to time consumption and limited human resources.
To solve this problem, we built a deep learning system capable of understanding images and automatically identifying rooftop issues (such as water accumulation, loose cables, and rust). To achieve this, we developed a deep neural network that detects problems based on rooftop images, analyzes input images, and provides detection results through an API for external tools.
As a result, the company saw growth in both order volume and revenue.
How to Implement Computer Vision Projects
As with all innovations worth pursuing within an organization, you should choose a strategic approach to implement computer vision projects.
Successful innovations using computer vision technology depend on overall business strategy, resources, and data.
The following questions can help you build a strategic roadmap for computer vision projects.
1. Should the computer vision solution reduce costs or increase revenue?
Successful computer vision projects either reduce costs or increase revenue (or both); you should define the goals of the project. Only then can it have a significant impact on the organization and its development.
2. How will the success of the project be measured?
Each computer vision project is different, and you need to define success metrics specific to the project. Once the metrics are set, you should ensure they are recognized by business personnel and data scientists.
3. Can access to the data be guaranteed?
When starting a computer vision project, data scientists should have easy access to data. They need to collaborate with important colleagues from different departments (such as IT). These colleagues should provide support with their business knowledge, as internal bureaucracy can become a major constraint.
4. Is the data collected by the organization appropriate?
Computer vision algorithms are not magic. They need data to operate, and the quality of the input data determines their performance. There are various methods and sources for collecting appropriate data, depending on your goals. Regardless, the more input data you have, the more likely the computer vision model will perform well. If you have concerns about the quantity and quality of the data, you can ask data scientists to assess the dataset quality and, if necessary, find optimal ways to acquire third-party data.
5. Is the data collected by the organization in the appropriate format?
In addition to having the right amount and type of data, you also need to ensure the data is in the correct format. Suppose you are using thousands of perfect mobile photos (high resolution, with a white background) to train an object detection algorithm. Then you find out that the algorithm cannot run because the actual use case is to detect people holding mobile phones under different lighting/contrast/background conditions, rather than detecting the phones themselves. In this case, your previous data collection efforts would be essentially wasted, and you would need to start over. Moreover, you should be aware that if there is bias in the data, the algorithm will learn that bias.
Hopefully, this article will help readers understand the concepts, workings, and real-world applications of computer vision.
Source: Math-AI