How to Quickly Improve Yourself in Computer Vision?


Introduction: This article comes from a friend of 52CV, who recommends not only a learning path but also practical projects and some classic tasks; it is well worth a read for anyone finding their way into the field.


Author: I Am Not Good at This
Link: https://www.zhihu.com/question/337889115/answer/770797118
Source: Zhihu, reproduced with the author's permission; no secondary reproduction allowed.

First, basic knowledge of machine learning is essential: before 2012, the ImageNet visual challenge was won by traditional methods combining handcrafted features with machine learning classifiers. Here are some recommended materials, though they are fairly standard choices.

For example: Professor Zhou Zhihua's "Machine Learning", Professor Li Hang's "Statistical Learning Methods", "Machine Learning in Action", Professor Andrew Ng's CS229, and Professor Li Hongyi's machine learning videos (available on Bilibili). In addition, basic knowledge of image processing is also indispensable, such as Gonzalez's "Digital Image Processing".
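As a toy illustration of the "handcrafted features + machine learning classifier" recipe mentioned above, the sketch below pairs a simple intensity-histogram feature with a nearest-centroid classifier. This is only a minimal stand-in: real pre-2012 pipelines used features such as SIFT or HOG with classifiers such as SVMs, and all names and parameters here are invented for illustration.

```python
import numpy as np

def histogram_feature(image, bins=16):
    """Handcrafted feature: a normalized intensity histogram of a grayscale image."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

class NearestCentroidClassifier:
    """Toy stand-in for the learning stage (real pipelines typically used SVMs)."""
    def fit(self, features, labels):
        self.labels_ = sorted(set(labels))
        self.centroids_ = {
            c: np.mean([f for f, l in zip(features, labels) if l == c], axis=0)
            for c in self.labels_
        }
        return self

    def predict(self, feature):
        # Assign the label whose class centroid is nearest in feature space.
        return min(self.labels_,
                   key=lambda c: np.linalg.norm(feature - self.centroids_[c]))
```

The point is the two-stage structure: a fixed, hand-designed feature extractor followed by a separately trained classifier, which is exactly what end-to-end deep learning later replaced.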

Next comes deep learning. Since 2012, deep learning methods have dominated computer vision. Classic networks you must know include LeNet, AlexNet, VGG, GoogLeNet, ResNet, DenseNet, SENet, etc., as well as lightweight networks such as ShuffleNet and the MobileNet series. I will also post related paper interpretations in my column later.
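To make one of these architectures concrete: the defining idea of ResNet is the residual (skip) connection, y = x + F(x), which lets gradients bypass the transform F and makes very deep networks trainable. A minimal numpy sketch, where the toy two-layer transform F and the shapes are purely illustrative, not the real architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(x + F(x)): the identity shortcut is added back after
    the transform F, the core idea of ResNet."""
    f = relu(x @ w1) @ w2   # toy two-layer transform F
    return relu(x + f)      # skip connection: add the input back
```

A useful sanity check: if F is the zero transform, the block reduces to (the activation of) the identity, which is why stacking many residual blocks does not degrade an already-good shallower network.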

Moving to more fine-grained problems, there are several classic tasks. The networks above are classification networks: given an image, they output whether it is, say, a cat or a dog. But many problems are more complex. If an image contains both a cat and a dog, what should the network output? This is where classic tasks such as detection and segmentation come in.

Detection tasks output the location and confidence of each target, where the location is given by bounding boxes around the objects. The development of object detection went roughly as follows (my knowledge is limited, so there may be errors; constructive criticism is welcome): candidate boxes are densely generated over the image, features are extracted from each, and a machine learning classifier scores them. This raises a question: how are candidate boxes generated? Methods include sliding windows and selective search, and they have kept evolving.
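The dense candidate-box generation described above can be sketched as a simple sliding window; the window size and stride here are illustrative parameters, and real pipelines slide windows at multiple scales:

```python
def sliding_window_boxes(img_w, img_h, win_w, win_h, stride):
    """Densely generate candidate boxes (x1, y1, x2, y2) by sliding a
    fixed-size window over the image: the simplest proposal strategy."""
    boxes = []
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            boxes.append((x, y, x + win_w, y + win_h))
    return boxes
```

Even this toy version shows the cost problem: the number of boxes grows quickly with image size and the number of scales, which motivated smarter proposal methods like selective search and, later, learned region proposal networks.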

In 2014 came a well-known paper, RCNN, which applied deep learning to object detection but used the CNN only as a feature extractor. I won't elaborate further here. Since then, the RCNN series (RCNN, Fast RCNN, Faster RCNN, Mask RCNN), the YOLO series (YOLO v1, YOLO v2, YOLO v3), and SSD have emerged. Since last year, some anchor-free works have appeared: at ECCV 2018, CornerNet proposed turning the object detection problem into a keypoint detection problem, and a series of keypoint-based detectors such as CenterNet and ExtremeNet followed.
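Across all of these detector families, Intersection-over-Union (IoU) is the standard measure for comparing a predicted box with a ground-truth box, used both for training-time matching and for non-maximum suppression. A small self-contained sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

IoU is 1.0 for identical boxes, 0.0 for disjoint ones, and detection benchmarks typically count a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.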

Recently, Google has been applying NAS to object detection, as well as using reinforcement learning to search for data augmentation strategies (I plan to replicate the augmentation strategy from that paper soon).

Segmentation tasks are pixel-level classification problems, with both traditional and deep learning methods. A classic deep learning approach to segmentation is FCN, whose output size matches the input size (size here meaning width and height).
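The FCN property that the output's spatial size matches the input's can be sketched as a tiny encoder-decoder. This is only a shape-level illustration: average pooling stands in for the convolutional encoder, and nearest-neighbor upsampling stands in for the learned transposed convolutions of a real FCN.

```python
import numpy as np

def downsample2x(x):
    """2x2 average pooling (stand-in for the conv/pool encoder)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbor upsampling (real FCNs learn transposed convolutions)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def fcn_like(x):
    """Encoder then decoder: the output has the same height/width as the
    input, so every pixel receives a prediction, the core property of FCN."""
    return upsample2x(downsample2x(x))
```

Because every pixel gets an output, the network can be trained with a per-pixel classification loss, which is what makes dense segmentation possible.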

Object tracking shares some similarities with detection but also differs in important ways. Detecting every frame looks a lot like tracking, but object detection usually detects known categories, whereas tracking can follow whatever target is given in the first frame. There are traditional methods, correlation-filter methods, and deep learning methods, such as the Siamese network series of papers. A further derivative direction is Re-ID.
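Siamese trackers embed the first-frame template and the current search region with the same network and locate the target at the peak of a cross-correlation response map. The sketch below performs the correlation step directly on raw pixels, purely for illustration; real Siamese trackers correlate learned deep features.

```python
import numpy as np

def correlate_template(search, template):
    """Slide the template over the search region; the peak of the
    response map gives the predicted target location (top-left corner)."""
    th, tw = template.shape
    sh, sw = search.shape
    response = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(response.shape[0]):
        for x in range(response.shape[1]):
            patch = search[y:y + th, x:x + tw]
            response[y, x] = np.sum(patch * template)
    return np.unravel_index(np.argmax(response), response.shape)
```

This also shows why tracking is class-agnostic: the "model" of the target is just whatever template was cropped from the first frame, not a pre-trained category.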

In addition, there are many other computer vision tasks, such as super-resolution and 3D reconstruction (which also touches on computer graphics). Due to space limitations, I won't introduce the development of each one; you can find papers to study on your own, paying attention not only to recent articles but also to work from the last century. Ideally, after reading dozens or hundreds of papers and going through the code, write a review of your own.

There are good courses and books on computer vision and deep learning, such as Professor Li Mu's "Dive into Deep Learning", which comes with lectures and Jupyter notebooks and is well worth exploring. There are also the "Deep Learning" book, "Computer Vision: Algorithms and Applications", "Computer Vision: A Modern Approach", and "Computer Vision: Models, Learning, and Inference".

For courses, check out CS231n, a classic computer vision course, and CS224d; although the latter is an NLP course, its coverage of RNNs and related topics is very helpful for temporal modeling, so it is worth attention.

Recently, it is also worth following the GNN paper lists, which apply graph-based ideas to some computer vision problems, as well as tasks combining CV and NLP, such as VQA and image captioning, which have been around for a long time. One paper you can refer to: "Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods".

In addition, coding ability is also very necessary.

Whether you work with traditional digital image processing, graphics, or deep learning methods, a certain level of coding ability is required. (I have recently been looking at some graphics algorithms and plan to implement them one by one.) You should regularly read other people's code and learn from it. Recently, I replicated HairNet, a paper from ECCV 2018; the code is much more engineering-oriented than my earlier work, which makes it easier to run on other machines.

Traditional digital image processing is often done in MATLAB, while graphics often uses C++. Most deep learning code is written in Python on top of frameworks such as PyTorch, MXNet, TensorFlow, Caffe, and Darknet, all of which still see some use. I recommend the first two: PyTorch is currently the more mainstream framework, and MXNet can be learned alongside Li Mu's book, which helps with understanding the underlying code. It was from that code that I learned how a dataloader is implemented.
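On the dataloader point: the core of what a dataloader does (shuffle indices each epoch, then yield fixed-size batches) can be sketched in a few lines of pure Python. This is a bare-bones illustration; the real `torch.utils.data.DataLoader` adds worker processes, samplers, pinned memory, and collate functions on top.

```python
import random

class MiniDataLoader:
    """Minimal sketch of a dataloader: shuffle indices once per epoch,
    then yield batches of items from the underlying dataset."""
    def __init__(self, dataset, batch_size, shuffle=True, seed=None):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(indices)  # a fresh permutation each epoch
        for start in range(0, len(indices), self.batch_size):
            yield [self.dataset[i] for i in indices[start:start + self.batch_size]]
```

Usage is just `for batch in MiniDataLoader(data, batch_size=4): ...`; the last batch may be smaller than `batch_size`, mirroring the default behavior of real loaders.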

As for the word "quickly", I don't know how to deliver on it; it varies from person to person. As the sayings go, the clumsy bird flies first, and diligence makes up for lack of talent. Steady learning pays off, and improvement often happens without your noticing, just as I unconsciously drifted from the NLP and KG circles into CV and CG (funny face.jpg).

Some recent experiences have given me insights. Traditional digital image processing methods are still essential; mathematically fancy as they may look, they often work well in practice and also offer interpretability. When entering computer vision, or any other discipline or field, it is better to stay grounded than to build castles in the air.
