The author believes that deep learning is just a tool for computer vision, not a panacea. Do not use it just because it is popular. Traditional computer vision techniques can still shine, and understanding them can save you a lot of time and trouble; moreover, mastering traditional computer vision can indeed help you perform better in deep learning. This is because you can better understand the inner workings of deep learning and execute preprocessing steps to improve deep learning results.
This article is inspired by a common question in forums:
Has deep learning replaced traditional computer vision?
Or in other words:
Since deep learning seems so effective, is it still necessary to learn traditional computer vision techniques?
This is a good question. Deep learning has indeed brought revolutionary breakthroughs to the fields of computer vision and artificial intelligence. Many problems that once seemed difficult can now be solved better by machines than by humans. Image classification is the best proof of this. Indeed, as previously mentioned, deep learning is responsible for bringing computer vision into the industry landscape.
However, deep learning is still just a tool for computer vision and is clearly not a cure-all. Therefore, this article will elaborate on this. That is, I will explain why traditional computer vision techniques are still very useful and worth learning and passing on.
The article is divided into the following sections/arguments:
- Deep learning requires large amounts of data
- Deep learning can sometimes be overkill
- Traditional computer vision will enhance your deep learning skills
Before getting into the main content, I think it is necessary to explain what "traditional computer vision" is, what deep learning is, and why deep learning is revolutionary.
Before the advent of deep learning, if you had a task like image classification, you would perform a process called “feature extraction.” The so-called “features” are the “interesting,” descriptive, or informative small parts of an image. You would apply a combination of what I refer to in this article as “traditional computer vision techniques” to find these features, including edge detection, corner detection, object detection, and so on.
When using these techniques related to feature extraction and image classification, you would extract as many features as possible from images of a class of objects (e.g., chairs, horses, etc.) and consider them as the “definition” of that class of objects (known as a “bag of words”). Next, you would search for these “definitions” in other images. If a significant portion of the features in the bag of words exists in another image, then that image is classified as containing that specific object (like a chair, horse, etc.).
The difficulty of this feature extraction method for image classification lies in the fact that you must choose which features to look for in each image. As you start to increase the number of categories you are trying to distinguish, say beyond 10 or 20, this becomes very cumbersome and even difficult to implement. Do you look for corners? Edges? Or texture information? Different categories of objects are best described using different types of features. If you choose to use many features, you will have to deal with a massive number of parameters, and you will also need to fine-tune them yourself.
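The bag-of-words matching idea described above can be sketched in a few lines. This is an illustrative NumPy toy, not a real pipeline: the descriptors below are invented, and a real system would extract them with a detector such as SIFT or ORB and build a proper vocabulary.

```python
import numpy as np

def matches_class(image_features, class_bag, dist_thresh=0.5, min_fraction=0.6):
    """Toy 'bag of words' check: the image is assigned to the class when a
    large enough fraction of the class's feature descriptors have a close
    match (small Euclidean distance) among the image's descriptors."""
    hits = 0
    for f in class_bag:
        # distance from this class feature to every feature found in the image
        dists = np.linalg.norm(image_features - f, axis=1)
        if dists.min() < dist_thresh:
            hits += 1
    return hits / len(class_bag) >= min_fraction

# Tiny synthetic 2-D descriptors (real ones would be 128-D SIFT vectors, etc.)
chair_bag = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
image_a = np.array([[0.1, 0.0], [1.0, 0.9], [2.1, 0.1], [5.0, 5.0]])  # contains the "chair"
image_b = np.array([[5.0, 5.0], [6.0, 6.0]])                          # does not

print(matches_class(image_a, chair_bag))  # True
print(matches_class(image_b, chair_bag))  # False
```

The two thresholds are exactly the parameters the article complains about: with many classes, someone has to choose and tune them all by hand.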
Deep learning introduced the concept of “end-to-end learning,” which (in short) allows machines to learn to find features in each specific category of objects, i.e., the most descriptive and prominent features. In other words, it lets neural networks discover potential patterns in various types of images.
Therefore, with end-to-end learning, you no longer need to manually decide which traditional computer vision techniques to use to describe features. The machine does all of this for you. As Wired magazine wrote:
For example, if you want to teach a [deep] neural network to recognize a cat, you do not have to tell it to look for whiskers, ears, fur, or eyes. You just need to show it thousands of images of cats, and it will naturally solve the problem. If it always misidentifies foxes as cats, you don’t have to rewrite the code. You just continue to train it.
The following image illustrates the difference between feature extraction (using traditional computer vision) and end-to-end learning:
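To make "end-to-end learning" concrete, here is a deliberately tiny illustration (all data and parameters are synthetic, invented for this sketch): a logistic-regression "network" trained by gradient descent directly on raw pixels, with no hand-picked features. A real CNN is far larger, but the principle is the same: the weights, not the engineer, decide what is descriptive.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: class 1 images are bright in the top half,
# class 0 images are bright in the bottom half.
def make_image(label):
    img = rng.random((4, 4)) * 0.2
    if label:
        img[:2] += 0.8
    else:
        img[2:] += 0.8
    return img.ravel()

X = np.stack([make_image(i % 2) for i in range(200)])
y = np.array([i % 2 for i in range(200)], dtype=float)

w = np.zeros(16)
b = 0.0
for _ in range(500):                      # plain gradient descent
    p = 1 / (1 + np.exp(-(X @ w + b)))    # sigmoid output
    grad = p - y
    w -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean()

pred = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(float)
print((pred == y).mean())  # training accuracy; high, since the classes are separable
```

Nobody told the model "look at the top half": the learned weights end up positive for top-half pixels and negative for bottom-half pixels on their own.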
That’s the background introduction. Now let’s discuss why traditional computer vision is still essential and why learning it is still beneficial.
First of all, deep learning requires data, lots and lots of data. The well-known image classification models mentioned earlier are trained based on massive datasets. The top three training datasets are:
- ImageNet – about 1.4 million images (the ILSVRC subset), 1,000 object categories;
- COCO – about 330,000 images with 2.5 million labeled object instances, 91 object categories;
- PASCAL VOC – about 11,500 images (VOC2012), 20 object categories.
However, if you do not have that much data, a trained model is likely to perform poorly outside your training set, because the machine has no insight into the problem and cannot generalize to data it has never seen. Moreover, it is too difficult to look inside the trained model and adjust it manually: a deep learning model has millions of parameters, each of which is tuned during training. In a sense, a deep learning model is a black box.
Traditional computer vision is completely transparent, allowing you to better assess whether your solution is still effective outside of the training environment. Your in-depth understanding of the problem can be incorporated into your algorithm. And if anything goes wrong, you can more easily figure out what needs to be adjusted and where to adjust it.
This is probably my favorite reason to support traditional computer vision techniques.
Training a deep neural network takes a long time. You need specialized hardware (like high-performance GPUs) to train the latest and most advanced image classification models. Want to train on your decent laptop? Go take a week off, and when you return, the training might still not be finished.
Furthermore, what if your training model performs poorly? You have to go back to square one and redo everything with different training parameters. This process can repeat hundreds of times.
But sometimes all of this is completely unnecessary, because traditional computer vision techniques can solve some problems more efficiently than deep learning, and with less code. For example, in a project I once participated in, we checked whether each can on a conveyor belt contained a red spoon. You could train a deep neural network to detect the spoon through the lengthy process described earlier, or you could write a simple algorithm that thresholds the color red (marking any pixel within a certain range of red as white and all other pixels as black) and then counts the white pixels. Simple, and done in an hour!
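A minimal sketch of that red-thresholding idea, using NumPy on a synthetic frame. The threshold values and pixel count below are illustrative choices, not the values from the project:

```python
import numpy as np

def contains_red_spoon(rgb_image, min_red_pixels=50):
    """Threshold 'red' pixels: strong red channel, weak green and blue.
    Pixels passing the test play the role of the 'white' mask pixels."""
    r = rgb_image[..., 0].astype(int)
    g = rgb_image[..., 1].astype(int)
    b = rgb_image[..., 2].astype(int)
    mask = (r > 150) & (g < 100) & (b < 100)
    return int(mask.sum()) >= min_red_pixels

# Synthetic 100x100 frame: gray background with a 10x10 red patch (the "spoon")
frame = np.full((100, 100, 3), 128, dtype=np.uint8)
frame[40:50, 40:50] = [200, 30, 30]

print(contains_red_spoon(frame))  # True
```

In a real deployment you would tune the thresholds against actual camera frames (lighting matters), but the entire "model" is still three comparisons and a sum.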
Mastering traditional computer vision techniques can save you a lot of time and reduce unnecessary hassle.
Understanding traditional computer vision can actually help you perform better in deep learning.
For example, the most commonly used neural network in the field of computer vision is the convolutional neural network. But what is convolution? Convolution is actually a widely used image processing technique (e.g., Sobel edge detection). Understanding this can help you understand what happens inside neural networks, allowing you to design and fine-tune them to better solve your problems.
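To see the connection, here is a hand-rolled sketch of the operation a convolutional layer performs, applied with the classic Sobel kernel. (Strictly speaking, deep learning "convolutions" are cross-correlations, which is what this computes; the kernel is fixed here, whereas a CNN learns it.)

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' sliding-window cross-correlation - the same operation
    a CNN layer applies, here with a fixed hand-designed kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Sobel kernel: responds to horizontal intensity changes (vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Image: dark left half, bright right half -> one vertical edge
img = np.zeros((5, 6))
img[:, 3:] = 1.0

edges = conv2d(img, sobel_x)
print(edges)  # nonzero only in the columns straddling the edge
```

A convolutional layer stacks many such kernels and learns their values by backpropagation instead of taking them from a textbook.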
Another thing is preprocessing. The data you input to the model often needs to be processed to prepare for the upcoming training. These preprocessing steps are mainly accomplished through traditional computer vision techniques. For example, if you do not have enough training data, you can perform a process called data augmentation. Data augmentation refers to randomly rotating, moving, cropping, etc., the images in your training dataset to create “new” images. By performing these computer vision operations, you can greatly increase your training data volume.
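A minimal sketch of such augmentation with NumPy; the specific transforms and the 3/4 crop size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Create one randomly transformed copy of a square image:
    optional horizontal flip, random 90-degree rotation, and a
    random crop padded back to the original size."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                      # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))   # rotate 0/90/180/270 degrees
    h, w = out.shape[:2]
    ch, cw = (3 * h) // 4, (3 * w) // 4         # crop to 3/4 size
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = out[top:top + ch, left:left + cw]
    padded = np.zeros_like(out)                 # pad back to original shape
    padded[:ch, :cw] = crop
    return padded

base = np.arange(64, dtype=np.uint8).reshape(8, 8)
augmented = [augment(base) for _ in range(10)]  # 10 "new" images from one
print(len(augmented), augmented[0].shape)
```

Libraries such as OpenCV or torchvision provide richer versions of these transforms (interpolated rotations, color jitter, and so on), but they are all classical image operations underneath.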
This article discusses why deep learning has not replaced traditional computer vision techniques and why the latter are still worth learning and passing on. First, the article focuses on the issue that deep learning often requires a large amount of data to perform well; when that data is not available, traditional computer vision can serve as an alternative. Second, deep learning can sometimes be overkill for specific tasks, where standard computer vision can solve the problem more efficiently and with less code. Third, mastering traditional computer vision can help you perform better in deep learning, because you can better understand the internal workings of your models and carry out preprocessing steps that improve their results.
In summary, deep learning is just a tool for computer vision, not a cure-all. Do not use it just because it is popular. Traditional computer vision techniques can still shine, and understanding them can save you a lot of time and trouble.