A Review of Computer Vision Development in the Last Decade and Future Directions

Computer vision is poised to make significant progress over the next decade. In this article, we look back at the trends and breakthroughs in computer vision from 2010 to 2020, and at where the field is likely to head next.

01. A Brief History of Computer Vision

Throughout the 1980s, 1990s, and 2000s, computer vision was a very hard problem. Even in controlled laboratory settings, good results were difficult to achieve. In that era, the features that machine learning systems learned from were designed by hand, through a process called feature engineering.

So what is feature engineering? It means relying on the intuition of “experts” to design special-purpose features that respond to specific patterns in images, such as edges, corners, or gradient orientations. Over the years, the field accumulated many such methods, each with its own abbreviation: HOG, SIFT, ORB, SURF, and more. Unfortunately, solving a real-world problem meant spending a great deal of time combining and tuning these techniques to get good results. A pipeline built this way could detect lane markings on a road, but the same pipeline could not be reused to recognize and distinguish faces. A general-purpose system remained an unattainable dream.
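To make “hand-designed features” concrete, here is a minimal sketch of the core idea behind HOG: compute image gradients, then summarize them as a histogram of gradient orientations weighted by gradient magnitude. It is a simplified illustration that assumes a single cell, unsigned orientations, 9 bins, and hard bin assignment; the full HOG descriptor adds a grid of cells, overlapping block normalization, and interpolation between bins. The function name `orientation_histogram` is hypothetical.

```python
import numpy as np

def orientation_histogram(image: np.ndarray, n_bins: int = 9) -> np.ndarray:
    """Simplified single-cell, HOG-style descriptor (illustrative only)."""
    img = image.astype(float)
    # Central-difference gradients along x (columns) and y (rows); borders stay 0.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as commonly used for HOG.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # Hard-assign each pixel to an orientation bin, weighted by its gradient magnitude.
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    # L2-normalize so the descriptor is less sensitive to overall contrast.
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical edge: intensity jumps from 0 to 1 halfway across the columns.
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0
print(np.argmax(orientation_histogram(edge)))  # 0: the gradient points along x (0 degrees)
```

Every choice here (gradient operator, bin count, normalization) is a human design decision, which is exactly the kind of engineering the neural-network era replaced with learned features.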

02. Beyond Feature Engineering

In the early 2010s, computer vision went through the biggest revolution the field had seen since the invention of the computer. In 2012, a convolutional neural network called AlexNet won the ImageNet Large Scale Visual Recognition Challenge, beating the runner-up’s top-5 error rate by more than 10 percentage points. The world was stunned. Most remarkably, the model used no hand-designed features. Instead, it relied on a general-purpose learning system: a neural network. A key enabler of AlexNet’s success was the use of GPUs (graphics processing units) to train the model much faster: AlexNet was trained for about six days on two consumer-grade GPUs. For comparison, training OpenAI’s GPT-3, released in 2020, has been estimated to require about 355 GPU-years of compute at a cost of roughly $4.6 million. Since AlexNet, the evidence has kept accumulating: larger datasets, larger models, and longer training lead to better performance.
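A quick back-of-the-envelope calculation puts the two training budgets on the same scale. It assumes the 355-year figure means 355 years of compute on a single GPU (GPU-years) and that AlexNet’s six days ran on both GPUs in parallel; both inputs are rough published estimates, not exact accounting.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap days

# AlexNet: about six days of training on two GPUs running in parallel.
alexnet_gpu_hours = 6 * 24 * 2

# GPT-3: an estimated 355 GPU-years of compute for about $4.6 million.
gpt3_gpu_hours = 355 * HOURS_PER_YEAR
gpt3_cost_usd = 4_600_000

ratio = gpt3_gpu_hours / alexnet_gpu_hours
cost_per_gpu_hour = gpt3_cost_usd / gpt3_gpu_hours

print(f"AlexNet GPU-hours:  {alexnet_gpu_hours:,}")    # 288
print(f"GPT-3 GPU-hours:    {gpt3_gpu_hours:,}")       # 3,109,800
print(f"Ratio:              {ratio:,.0f}x")            # 10,798x
print(f"Implied $/GPU-hour: {cost_per_gpu_hour:.2f}")  # 1.48
```

Measured in GPU-hours alone, the estimate for GPT-3 is roughly ten thousand times AlexNet’s budget, and GPU-hours understate the gap, since the modern data-center GPUs behind the GPT-3 estimate are far faster than a 2012 consumer card.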

More recently, the emergence of transformers has brought new breakthroughs in vision. The transformer is an encoder-decoder deep learning architecture that had already been popular for some time in natural language processing (NLP). The DETR paper from Facebook AI Research caused a stir by showing that a transformer could match strong, highly tuned object detectors. Transformer-based detectors such as DETR are simpler to implement than popular computer vision pipelines such as Mask R-CNN, because they do away with hand-designed components such as non-maximum suppression and anchor generation, and they represent a step toward further automating computer vision. The less time we spend developing and tuning algorithms, the more complex the tasks we can take on.
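The operation at the heart of every transformer is scaled dot-product attention: each element of a sequence (a word in NLP, or an image feature vector in a vision model) computes a weighted average over all elements, with weights given by query-key similarity. Below is a minimal single-head NumPy sketch; the random projection matrices stand in for learned weights, and the sizes (16 feature vectors of 32 channels, projected to 8 dimensions) are arbitrary assumptions.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """q: (n, d_k), k: (m, d_k), v: (m, d_v) -> output (n, d_v), weights (n, m)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)       # similarity of each query to each key
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
# Hypothetical input: a 4x4 CNN feature map with 32 channels, flattened to 16 vectors.
features = rng.standard_normal((16, 32))
# Random projections stand in for learned query/key/value weight matrices.
w_q, w_k, w_v = (rng.standard_normal((32, 8)) for _ in range(3))
out, attn = scaled_dot_product_attention(features @ w_q, features @ w_k, features @ w_v)
print(out.shape, attn.shape)  # (16, 8) (16, 16)
```

A real model stacks many such layers with multiple heads, positional encodings, and feed-forward blocks. In DETR specifically, a CNN backbone produces the image feature map, and the decoder uses a fixed set of learned “object queries” that attend to those features, each query producing at most one detected object.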

Over the next decade, these developments will have a huge impact on computer vision, and there is much debate about whether the intelligence behind smart agents (IoT cameras, Alexa, and Google Home devices) should live in the cloud or run directly on the devices themselves.

03. The Role of Data and Synthetic Data in Computer Vision

We have discussed algorithms and hardware. Now we turn to the most important piece of the AI puzzle: data.

Historical trends show two things: first, algorithms are becoming increasingly general; second, the need for hand-designed components is shrinking. As a result, the performance of computer vision systems depends more and more on the data used to train them. It is no surprise that the tech giants have been amassing huge datasets.

However, large datasets do not solve every AI problem. Whether scraped from the Internet or meticulously staged and captured indoors, these datasets are not the best choice for training more general, autonomous algorithms. The errors contained in this “real data” inevitably seep into the computer vision algorithms trained on it. Furthermore, real data is not ready to use for training as-is: it needs to be cleaned, labeled, annotated, and repaired.

We therefore find ourselves on the cusp of a new era of technological change, as significant as the introduction of neural networks and transformers. Data is the biggest barrier to progress in computer vision, and we believe the solution is data synthesis. In brief, synthetic data is data created by computers (like the CGI seen in video games and movies). Because we fully control this virtual world, pixel-perfect labels come for free (think of metadata such as which pixels in an image belong to a face), including labels that may be impossible to annotate by hand in real-world datasets.
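Why the labels come for free is easy to see in a toy example: because the program decides where each object is drawn, it knows the exact mask without any human annotation. The sketch below renders a noisy image containing one disk (a stand-in for an object such as a face) and returns the ground-truth mask alongside it. It is a 2D toy under stated assumptions; real synthetic-data pipelines use 3D rendering engines, and `render_disk_scene` is a hypothetical name.

```python
import numpy as np

def render_disk_scene(size: int, rng: np.random.Generator):
    """Render a toy synthetic image together with its exact per-pixel label mask."""
    # Randomize scene parameters: this is the "full control" over the virtual world.
    radius = rng.uniform(0.1, 0.3) * size
    cx, cy = rng.uniform(radius, size - radius, size=2)
    yy, xx = np.mgrid[0:size, 0:size]
    # The mask is known exactly because we chose where the object goes.
    mask = (xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2
    # Object brightness, background brightness, and sensor noise are also randomized.
    image = np.where(mask, rng.uniform(0.6, 1.0), rng.uniform(0.0, 0.4))
    image = np.clip(image + rng.normal(0.0, 0.05, size=(size, size)), 0.0, 1.0)
    return image.astype(np.float32), mask

rng = np.random.default_rng(42)
dataset = [render_disk_scene(64, rng) for _ in range(100)]
image, mask = dataset[0]
print(image.shape, mask.dtype, int(mask.sum()))
```

Generating a hundred or a million labeled images costs only compute, and varying the randomized parameters is how such a generator can be steered toward the cases a model finds hard.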

Synthetic data is still in its early days. Much like hand-engineered features before the 2010s, each synthetic dataset today is designed using human intuition. However, startups (including us!) are building systems that will generate an endless stream of synthetic data designed by learning systems themselves.

The emergence of automated synthetic data generation will transform computer vision. A decade from now, computer vision algorithms will improve continuously through a process called lifelong learning: the model identifies its own weaknesses, generates new synthetic data targeting them, and then trains on that data. In the best case, this loop is fully automated and runs on GPU clusters somewhere in the cloud.
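The lifelong-learning loop (find a weakness, generate targeted synthetic data, retrain) can be sketched with a toy setup. Everything here is a hypothetical assumption chosen for illustration: the “world” is three classes of 2D points, the “generator” samples from known clusters, the “model” is a 1-nearest-neighbour classifier, and the weakness is simply the class with the lowest validation accuracy. The initial “real” data is deliberately biased (class 2 was only captured in one of its two modes), so the loop has something to fix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true world": each class is a set of 2D cluster centres.
CLASS_MODES = {
    0: [(0.0, 0.0)],
    1: [(5.0, 0.0)],
    2: [(0.0, 5.0), (5.0, 5.0)],  # class 2 appears in two different situations
}

def sample(label, n, modes=None):
    """Toy stand-in for a synthetic-data generator: draw n labelled points."""
    centres = np.array(modes if modes is not None else CLASS_MODES[label])
    picks = centres[rng.integers(len(centres), size=n)]
    return picks + rng.normal(0.0, 1.0, size=(n, 2)), np.full(n, label)

def predict(train_x, train_y, x):
    """1-nearest-neighbour classifier: the 'model' in this toy."""
    d2 = ((x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    return train_y[np.argmin(d2, axis=1)]

def per_class_accuracy(train_x, train_y, val_x, val_y):
    pred = predict(train_x, train_y, val_x)
    return {c: float((pred[val_y == c] == c).mean()) for c in CLASS_MODES}

def stack(parts):
    return np.vstack([p[0] for p in parts]), np.concatenate([p[1] for p in parts])

# Biased "real" training data: class 2 was only ever captured in its first mode.
train_x, train_y = stack([sample(0, 50), sample(1, 50),
                          sample(2, 50, modes=CLASS_MODES[2][:1])])
# Validation data covers the whole world.
val_x, val_y = stack([sample(c, 200) for c in CLASS_MODES])

history = []
for round_ in range(3):
    acc = per_class_accuracy(train_x, train_y, val_x, val_y)
    weakest = min(acc, key=acc.get)          # 1. identify the weakest class
    history.append((weakest, acc[weakest]))
    new_x, new_y = sample(weakest, 100)      # 2. generate targeted synthetic data
    train_x = np.vstack([train_x, new_x])    # 3. "retrain": 1-NN just stores the data
    train_y = np.concatenate([train_y, new_y])
    print(f"round {round_}: weakest class {weakest}, accuracy {acc[weakest]:.2f}")

final_acc = per_class_accuracy(train_x, train_y, val_x, val_y)
print({c: round(a, 2) for c, a in final_acc.items()})
```

Only the control flow carries over to a real system; there, the generator would be a rendering pipeline, the model a neural network, and “weakness” would be measured on much richer failure categories than per-class accuracy.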

This is what we can expect as we enter the 2020s: progress will be driven by data, and more specifically by data synthesis, which will both improve existing computer vision systems and make more complex tasks possible.