How to Quickly Improve Yourself in Computer Vision?


Author: I am not good at this, compiled by I Love Computer Vision

Link: https://www.zhihu.com/question/337889115/answer/770797118

First, a foundation in machine learning is essential. Traditional methods pair hand-crafted features with machine learning classifiers, and approaches of this kind won the ImageNet challenge before 2012. Here are some recommended materials, though they are fairly well known.

Examples include Professor Zhou Zhihua’s “Machine Learning”, Professor Li Hang’s “Statistical Learning Methods”, “Machine Learning in Action”, Professor Andrew Ng’s CS229, and Professor Li Hongyi’s machine learning videos (available on Bilibili). Basic knowledge of image processing is also indispensable; Gonzalez’s “Digital Image Processing” is a good start.
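To make “hand-crafted features + machine learning classifier” concrete, here is a toy sketch under simplifying assumptions. The feature is a single HOG-like histogram of gradient orientations. Real HOG uses many cells and block normalisation. The classifier is a nearest-centroid rule, chosen for brevity, whereas pre-2012 systems typically used stronger classifiers such as SVMs. All function names are illustrative.

```python
import math

def orientation_histogram(img, bins=4):
    """Toy HOG-style descriptor: one magnitude-weighted histogram of
    unsigned gradient orientations (0-180 degrees) over the whole image."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]  # normalise so the bins sum to 1

def fit_centroids(features, labels):
    """'Training': average the feature vectors of each class."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, f):
    """Pick the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], f)))

def vertical_edge(col, n=6):
    return [[0 if x < col else 1 for x in range(n)] for _ in range(n)]

def horizontal_edge(row, n=6):
    return [[0 if y < row else 1 for _ in range(n)] for y in range(n)]

train_imgs = [vertical_edge(3), horizontal_edge(3)]
centroids = fit_centroids([orientation_histogram(im) for im in train_imgs],
                          ["vertical", "horizontal"])
print(predict(centroids, orientation_histogram(vertical_edge(2))))    # vertical
print(predict(centroids, orientation_histogram(horizontal_edge(4))))  # horizontal
```

Swapping in a different descriptor or classifier keeps the same two-stage structure: features are designed by hand, and only the classifier is learned.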

Next comes deep learning. Since 2012, deep learning methods have dominated computer vision. You should know the classic networks, such as LeNet, AlexNet, VGG, GoogLeNet, ResNet, DenseNet, and SENet, as well as lightweight networks such as the ShuffleNet and MobileNet series. I will also post interpretations of these papers in my column later.

Beyond classification, there are several other classic tasks. The networks above are mainly classification networks. For example, they output whether an image shows a cat or a dog. Many problems are more complex, though: what should the network output when an image contains both a cat and a dog? Questions like this gave rise to classic tasks such as detection and segmentation.
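A classification network usually ends with a softmax, which turns one score per class into a single probability distribution for the whole image. The minimal sketch below uses made-up class names and scores. It shows why that output cannot say “there is a cat and a dog”: the probabilities always sum to 1, so making one class more likely makes the other less likely.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

classes = ["cat", "dog"]
logits = [2.0, 1.0]  # hypothetical network scores for one image
probs = softmax(logits)
print({c: round(p, 3) for c, p in zip(classes, probs)})  # {'cat': 0.731, 'dog': 0.269}
```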

A detection task outputs each object’s location and probability, where the location is a bounding box around the object. Object detection has developed roughly as follows (my knowledge is limited and there may be errors, so criticism and corrections are welcome). Early methods densely generate candidate boxes in an image, extract features from each box, and classify them with a machine learning classifier. This raises the question of how to generate candidate boxes, and methods such as sliding windows and selective search have been developed over time.
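The traditional pipeline just described can be sketched as follows: densely generated candidate boxes, each scored by a classifier. This is a toy illustration under stated assumptions. The score function `mean_brightness` is a hypothetical stand-in for “extract features + apply a trained classifier”, and boxes are `(x, y, w, h)` in pixels.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Densely enumerate square candidate boxes (x, y, w, h) at one scale."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)

def mean_brightness(img, box):
    """Hypothetical stand-in for 'extract features + classify': mean intensity."""
    x, y, w, h = box
    total = sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))
    return total / (w * h)

def detect(img, window_sizes, stride, score_fn, threshold):
    img_h, img_w = len(img), len(img[0])
    detections = []
    for win in window_sizes:  # object size is unknown, so scan several scales
        for box in sliding_windows(img_w, img_h, win, stride):
            score = score_fn(img, box)
            if score >= threshold:
                detections.append((box, score))
    return sorted(detections, key=lambda d: d[1], reverse=True)

# 8x8 image with a bright 3x3 "object" whose top-left corner is at (x=4, y=2)
img = [[1 if 4 <= c <= 6 and 2 <= r <= 4 else 0 for c in range(8)] for r in range(8)]
dets = detect(img, window_sizes=[3], stride=1, score_fn=mean_brightness, threshold=0.5)
print(dets[0])  # ((4, 2, 3, 3), 1.0)
```

The example scans one window size only. With a naive score like mean brightness, smaller windows placed inside the object would tie at 1.0, which is one reason real detectors need trained classifiers rather than hand-set scores.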

In 2014, a very famous paper, R-CNN, applied deep learning to object detection, although it used the CNN only as a feature extractor. I won’t go into detail here. After it came the R-CNN series (R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN), the YOLO series (YOLO v1, YOLO v2, YOLO v3), and SSD. Since last year, anchor-free methods have also appeared. One example is CornerNet, which recast object detection as a keypoint detection problem and led to a series of keypoint-based detectors such as CenterNet and ExtremeNet.
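Keypoint-based detectors turn detection into predicting keypoint heatmaps. The sketch below is a simplified decoding step, loosely in the spirit of center-point methods such as CenterNet. A cell counts as a detection if it is a local maximum (among its 8 neighbours) of the center heatmap and exceeds a threshold. The width and height predicted at that cell then turn it into a box. The heatmap, the `sizes` grid and the `stride=4` output stride are assumed inputs for illustration. Real models also predict sub-pixel offsets and one heatmap per class, and CornerNet pairs corners with embeddings; all of this is omitted here.

```python
def find_peaks(heatmap, threshold):
    """Return (score, x, y) for cells >= all 8 neighbours and >= threshold."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbourhood = [heatmap[j][i]
                             for j in range(max(0, y - 1), min(h, y + 2))
                             for i in range(max(0, x - 1), min(w, x + 2))]
            if v >= max(neighbourhood):  # the neighbourhood includes the cell itself
                peaks.append((v, x, y))
    return sorted(peaks, reverse=True)

def decode_boxes(heatmap, sizes, threshold=0.3, stride=4):
    """sizes[y][x] = (w, h) predicted at each cell, in input pixels.
    stride maps heatmap cells back to input-image coordinates."""
    boxes = []
    for score, x, y in find_peaks(heatmap, threshold):
        bw, bh = sizes[y][x]
        cx, cy = x * stride, y * stride
        boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2, score))
    return boxes

heat = [[0.2] * 5 for _ in range(5)]  # hypothetical 5x5 center heatmap
heat[1][1], heat[3][3] = 0.9, 0.6     # two object centers
sizes = [[(8, 8)] * 5 for _ in range(5)]
boxes = decode_boxes(heat, sizes)
print(boxes)  # [(0.0, 0.0, 8.0, 8.0, 0.9), (8.0, 8.0, 16.0, 16.0, 0.6)]
```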

Recently, Google has been applying neural architecture search (NAS) to object detection and has used reinforcement learning to search for data augmentation strategies. I plan to replicate the augmentation strategy from that paper soon.
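In the augmentation-search work, the learned result is a policy: a list of sub-policies, each a short sequence of (operation, probability, magnitude) triples. For every training image, one sub-policy is picked at random. The sketch below only shows how such a policy is applied to an image and its boxes. It omits the reinforcement-learning search, and the two operations and all numbers in `POLICY` are hypothetical, not the policy from Google’s paper.

```python
import random

def flip_lr(img, boxes, magnitude):
    """Horizontal flip; boxes are (x1, y1, x2, y2) in pixel-edge coordinates."""
    w = len(img[0])
    img = [row[::-1] for row in img]
    boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    return img, boxes

def brightness(img, boxes, magnitude):
    """Add an offset of 10 * magnitude (an assumed mapping), clipped to [0, 255]."""
    delta = magnitude * 10
    return [[min(255, max(0, p + delta)) for p in row] for row in img], boxes

OPS = {"FlipLR": flip_lr, "Brightness": brightness}

# Hypothetical policy: each sub-policy is a list of (op name, probability, magnitude).
POLICY = [
    [("FlipLR", 0.5, 0), ("Brightness", 0.6, 3)],
    [("Brightness", 0.8, 5), ("FlipLR", 0.2, 0)],
]

def apply_policy(img, boxes, policy, rng=random):
    sub_policy = rng.choice(policy)  # one random sub-policy per image
    for name, prob, magnitude in sub_policy:
        if rng.random() < prob:  # each op fires with its own probability
            img, boxes = OPS[name](img, boxes, magnitude)
    return img, boxes

img, boxes = apply_policy([[10, 20, 30], [40, 50, 60]], [(0, 0, 2, 1)],
                          POLICY, rng=random.Random(42))
print(img, boxes)
```

Geometric operations such as flips must update the bounding boxes too. This is what makes augmentation for detection different from augmentation for classification.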

Segmentation is a pixel-level classification problem, and it too has both traditional and deep learning methods. A classic deep learning method is FCN (Fully Convolutional Network), whose output has the same size as its input (here, size means width and height).
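Two small pieces illustrate “pixel-level classification with output the same width and height as the input”. First, given one score map per class, each pixel’s label is the argmax over classes. Second, the size arithmetic shows how a network that downsamples by 32 can return to the input size with a transposed convolution. The numbers are illustrative assumptions: a 224 input, five stride-2 stages, and kernel 64 / stride 32 / padding 16. The original FCN handles padding and cropping differently.

```python
def segment(score_maps):
    """score_maps[c][y][x] is the score of class c at pixel (y, x).
    Returns an H x W label map: the per-pixel argmax over classes."""
    num_classes = len(score_maps)
    h, w = len(score_maps[0]), len(score_maps[0][0])
    return [[max(range(num_classes), key=lambda c: score_maps[c][y][x])
             for x in range(w)]
            for y in range(h)]

def conv_out(n, k, s, p=0):
    """Output length of a convolution/pooling layer (dilation 1)."""
    return (n + 2 * p - k) // s + 1

def transposed_conv_out(n, k, s, p=0):
    """Output length of a transposed convolution (dilation 1, no output padding)."""
    return (n - 1) * s - 2 * p + k

size = 224
for _ in range(5):
    size = conv_out(size, k=2, s=2)  # five stride-2 stages: 224 -> 7
print(size, transposed_conv_out(size, k=64, s=32, p=16))  # 7 224
```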

Object tracking is similar to detection in some ways but differs in others. Running a detector on every frame can give results similar to tracking. However, detection usually targets known categories, while tracking follows whatever object is specified in the first frame. Tracking has traditional methods, correlation filter methods, and deep learning methods such as the Siamese network series of papers. Re-ID (re-identification) tasks grew out of this line of work.
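Siamese (“twin”) network trackers are built around matching. The target patch from the first frame is compared with every position in a search region of a later frame, and the highest response gives the new location. The sketch below shows that idea with plain template matching by normalised cross-correlation on raw pixels. Real Siamese trackers instead correlate features produced by a learned CNN, so treat this as an assumption-laden illustration rather than any specific tracker.

```python
import math

def ncc(patch, template):
    """Normalised cross-correlation of two equally sized 2-D patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat patches get a neutral score

def track(search, template):
    """Slide the template over the search region; return (score, x, y) of the best match."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    best = None
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            patch = [row[x:x + tw] for row in search[y:y + th]]
            score = ncc(patch, template)
            if best is None or score > best[0]:
                best = (score, x, y)
    return best

template = [[1, 5], [5, 1]]  # hypothetical target appearance from frame 1
search = [[0] * 6 for _ in range(6)]  # next frame: target moved to (x=3, y=2)
for dy, row in enumerate(template):
    for dx, v in enumerate(row):
        search[2 + dy][3 + dx] = v
print(track(search, template))  # (1.0, 3, 2)
```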

There are many other computer vision tasks as well, such as super-resolution and 3D reconstruction (which also involves computer graphics). Due to space limitations, I won’t introduce the development of each one. Find papers and study them on your own, including not only recent work but also classics from the last century. Ideally, after reading dozens or hundreds of papers and reviewing their code, write a survey of your own.

There are also courses and books on computer vision and deep learning. Professor Li Mu’s “Dive into Deep Learning” comes with video lessons and Jupyter notebooks; it is excellent, and I wish I had found it earlier. Others include the “Deep Learning” book, “Computer Vision: Algorithms and Applications”, “Computer Vision: A Modern Approach”, and “Computer Vision: Models, Learning, and Inference”.

For courses, check out CS231n, a classic computer vision course, and CS224d. Although CS224d is an NLP course, its coverage of RNNs is very helpful for time-series modeling, so it is worth a look as well.

GNN paper lists are also worth following lately. Some use graph-based ideas to tackle computer vision problems, and others address tasks that combine CV and NLP, such as VQA and image captioning. A useful reference is the survey “Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods”.

Coding ability is also essential.

Whatever you work on, a certain level of coding ability is required. That applies to traditional digital image processing, deep learning, or graphics (I have recently been studying some graphics algorithms and plan to implement them one by one). It helps to regularly read code written by others and learn from it. I recently replicated an ECCV 2018 paper, HairNet. The code is much better engineered than my earlier code, which makes it easier to run on other devices.

Traditional digital image processing often uses MATLAB, and graphics often uses C++. Deep learning code is mainly written in Python with frameworks such as PyTorch, MXNet, TensorFlow, Caffe, and Darknet, each used to varying degrees. I recommend the first two. PyTorch is currently a mainstream framework. MXNet can be learned alongside Professor Li Mu’s book, which is very helpful for understanding the underlying code; reading that code is how I learned how a dataloader is implemented.
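A data loader essentially does three things. It orders the sample indices, shuffling them each epoch if requested. It slices them into batches. It then collates the samples of each batch. The class below is a hypothetical, single-process sketch of that idea. It is not MXNet’s or PyTorch’s implementation, which add multi-process workers, custom samplers and batchify/collate functions.

```python
import random

class SimpleDataLoader:
    """Hypothetical minimal loader: index ordering + batching + collation."""

    def __init__(self, dataset, batch_size, shuffle=False, drop_last=False, seed=None):
        self.dataset = dataset  # anything with __len__ and __getitem__ returning (x, y)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.drop_last = drop_last
        self.rng = random.Random(seed)

    def __len__(self):
        n = len(self.dataset)
        return n // self.batch_size if self.drop_last else -(-n // self.batch_size)

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(indices)  # a new order every time iteration starts (each epoch)
        for start in range(0, len(indices), self.batch_size):
            batch_idx = indices[start:start + self.batch_size]
            if self.drop_last and len(batch_idx) < self.batch_size:
                break  # skip the incomplete final batch
            samples = [self.dataset[i] for i in batch_idx]
            xs, ys = zip(*samples)  # collate: list of (x, y) pairs -> (xs, ys)
            yield list(xs), list(ys)

dataset = [(i, i % 2) for i in range(10)]  # toy (input, label) pairs
for xs, ys in SimpleDataLoader(dataset, batch_size=4, shuffle=True, seed=0):
    print(xs, ys)
```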

The word “quickly” is hard for me to define, because it varies from person to person. As the sayings go, the slow bird sets off first, and diligence makes up for clumsiness. Study well and the rewards will come. Improvement happens without your noticing, just as I drifted from the NLP and KG (knowledge graph) circle into CV and CG (funny face.jpg).

My recent experience has convinced me that traditional digital image processing methods are still essential. They can be mathematically sophisticated yet still work well in practice, and they offer advantages such as interpretability. When entering computer vision, or any other discipline or field, stay grounded and build a solid foundation rather than castles in the air.
