(Welcome to follow the “I Love Computer Vision” public account, a valuable and in-depth account~)

Many people have asked how to get started with CV. This article is a re-release of an older piece, translated from a foreign blog. It is quite long and somewhat dated, but the principles still hold, and it is well worth reading!

This article discusses the literature, experts, research groups, blogs in this field in detail from the perspective of a beginner who has just started researching computer vision, emphasizing how to start research, how to choose a direction, how to read papers, implement code, debug code, etc. It also elaborates on how to learn machine learning for research in computer vision. It is a valuable reference for new PhDs, scholars, and developers who want to delve deeper into this field.

Due to the limitations of WeChat public accounts, many hyperlinks in the original text cannot be clicked. Click to read the original text to view the complete article.

Top Conferences and Journals

First-tier top conferences: CVPR, ECCV, ICCV, NIPS, IJCAI
High-reputation second-tier top conference: BMVC
Famous second-tier top conferences: ICIP, ACCV, ICPR, SIGGRAPH
Top journals: PAMI, IJCV
Famous journals: CVIU, IVC
Top conferences listed by Microsoft Academic Research
Ranks from Core
Ranks from Arnetminer
Source lists conference papers from recent years
Journal lists impact factors
Journal scores from EigenFactor

Top Expert Authors

Microsoft Academic authors list
Google Scholar List
HOG feature author Navneet Dalal
Jitendra Malik.
Gary Bradski, founder of OpenCV
David Lowe, inventor of SIFT features
List of vision people (but not necessarily top authors)
Computer Vision: Algorithms and Applications by Richard Szeliski

Top Research Groups

Check them here
Check others here
CMU: Robotics everywhere.
LEAR
ImageLab Group
Machine Vision Laboratory at UWE
ALCOR
Centre for Image Processing and Analysis (CIPA)
ImageMetry
VISILAB
GRIMA – Machine Intelligence Group
Vision and Sensing Research Group – University of Canberra
CAVE – Computer Vision Laboratory at Columbia University
Computational Biomedicine Laboratory (CBL), University of Houston
Vision Lab – University of Antwerp.

Visual Geometry Group, Oxford UK (Andrew Zisserman’s group)
LEAR, Grenoble, France (Cordelia Schmid’s group)
WILLOW, Paris France (Jean Ponce’s group)
CVLAB EPFL, Lausanne Switzerland (Pascal Fua’s group)
Computer vision group ETH, Zurich Switzerland (Luc Van Gool’s group)
UCB (Malik, Darrell, Efros)
UMD (Davis, Chellappa, Jacobs, Aloimonos, Doermann)
UIUC (Forsyth, Hoiem, Ahuja, Lazebnik)
UCSD (Kriegman)
UT-Austin (Aggarwal, Grauman)
Stanford (Fei-Fei Li, Savarese)
USC (Nevatia, Medioni)
Brown (Felzenszwalb, Hays, Sudderth)
NYU (Rob Fergus)
UC-Irvine (Ramanan, Fowlkes)
UNC (Tamara Berg, Alex Berg, Jan-Michael Frahm)
Columbia (Belhumeur, Shree Nayar, Shih-Fu Chang)
Laboratory for Computational Intelligence, University of British Columbia, Vancouver (David Lowe’s group)
Computer Science Department, University of Toronto, Toronto (of deep learning fame: Hinton, Srivastava, Salakhutdinov)
Centre for Vision Research, York University, Toronto

Blogs

Tomasz Malisiewicz blog
The Serious Computer Vision Blog
Research blog of Roman Shapovalov
Computer Vision Talks
Steves Computer Vision Blog
The Computer Vision
Computer Vision Blog
Andy’s Computer Vision and Machine Learning Blog
Computer Vision Models
solem’s vision blog
uncannyvision blog
Blogs on Computer Vision, Machine Vision and Image Processing
All About Computer Vision
Open Computer Vision

CV Industry Labs and Startups

Microsoft and Google
IBM Research
NEC Labs America
Acute3D (Sophia Antipolis, France) was founded in 2011.
Bubbli
ShoppTag
Oculusai
Videosurf (video search)
Willow garage (robotics)
Sportvision (sports broadcast)
Intelli-vision (surveillance)
Gauss Surgical
Adobe’s Advanced Technology Labs
Dolby

How to Start Research

I like to divide computer vision problems into two types:
Some research directions involve learning-based (artificial intelligence) methods, such as image classification, OCR, and video tracking.

Most of the papers you will see are in this direction.
Learning means we have a lot of data (e.g., ImageNet, with 1 million images and their labels) and learn a pattern from it (e.g., how to classify characters in images).
For this type of direction, you must learn a lot about machine learning.
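To make "learning a pattern from labeled data" concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. The two-number feature vectors, the labels, and the function names train_centroids and predict are all made up for illustration; real vision systems use far richer features and models.

```python
from collections import defaultdict


def train_centroids(samples):
    """Average the feature vectors of each label (the 'learning' step)."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids,
               key=lambda label: squared_distance(centroids[label], features))


# Hypothetical toy data: (mean brightness, edge density) -> label.
training = [
    ((0.9, 0.1), "sky"),
    ((0.8, 0.2), "sky"),
    ((0.2, 0.9), "text"),
    ((0.3, 0.8), "text"),
]
centroids = train_centroids(training)
print(predict(centroids, (0.85, 0.15)))
```

The point is only the shape of the workflow: labeled examples go in, a model comes out, and the model is then applied to unseen inputs.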

Other research directions involve algorithms that do not require learning, such as 3D reconstruction, optical flow computation, and panoramic stitching. (52CV comment: In fact, many algorithms for 3D reconstruction and optical flow estimation are now learning-based. You can search for keywords on this site to find related material.)

Using Textbooks and Courses

A direct approach is to start with books.
Don’t get stuck in books. Remember, your goal is to start research. Try to understand the basics and do some coding. Keep an eye on the recent work that interests you most.
Try out different vision research problems and see which excites you most.
Then move on to the next stage: “Start from papers”.

Starting from Papers

Start with papers from top conferences and journals. Lower-tier venues may contain unreliable results and waste your time.

CVPR keeps a list of important conferences and many papers.
Use the papers to learn which research tracks are available. Wiki can also help.
Use Google Scholar to find review (survey) papers on specific problems. Reviews can save a lot of time.
Consider papers from the last 3 years. Assuming we are in 2014, consider 2011, then 2012, then 2013. Don’t start from 2014.
Collect papers whose titles look relevant. Check whether source code is available for them, and try to start from papers that have source code.

Starting will be difficult, as you will encounter many terms and tools you don’t know. Be patient. Google them, and ask questions on forums like Quora or Stack Overflow.
Try to find a specific research direction (e.g., 3D reconstruction, point clouds, scene understanding, object recognition, large-scale image data, multi-object tracking, image descriptor theory, etc.). Check the wiki or conference paper listings for what interests you.
Use conferences or a Google Scholar search to find papers in a given direction.
Follow researchers whose work is authoritative. Pay attention to highly cited papers.
Prefer to start from research work that has running software; it saves you time.
To learn the engineering side, choose a simple and elegant paper and implement it. Reproduce the paper’s results. Many questions will arise along the way, and you will often have to make assumptions because not everything is stated in the paper. Many implementation details are also left out, such as how to implement a step efficiently. You will learn about issues such as performance and experimentation. Good candidates include Viola-Jones face detection, Christoph Lampert’s Efficient Subwindow Search, and Brian Fulkerson’s superpixel neighborhoods. Implementing a paper that comes with complete code is a very good idea, because you can check your implementation against it.
For your own research work, try to use existing open-source code instead of starting everything from scratch; don’t reinvent the wheel!
If a paper has no publicly available code, try contacting the authors to ask for it.
If a paper is still hard to understand after several attempts, switch to another paper, or change direction. (This applies while you are still looking for a research direction.)
A collection of best-paper award winners may also be useful.
Graduate seminar courses are built around papers.

Starting from Code

Starting from code means beginning to understand your research problem through available code, and only then moving on to papers.
Find an open-source library, such as OpenCV, and try it out.

There are many good books about OpenCV.
There are also many videos on Youtube:
https://www.youtube.com/playlist?v=MfnEtFAWooQ&list=PLo1wvPF7fMxQ_SXibg1azwBfmTFn02B9O
https://www.youtube.com/playlist?v=xEnPZ78queI&list=PLDqunwM5dbtIbEuXv1rB7OFBoRzEF8GH6

Learn Matlab and use it to write initial solution prototypes (as it often allows for faster prototyping).
Helpful: Join OpenCV yahoo group and read comments & messages.
Choose an interesting toy project and implement it.

Machine Learning

Machine learning is the core set of algorithms for learning from data.
For computer vision, especially as a beginner, you don’t need to learn too much machine learning at first. You can simply use the algorithms as black boxes.

By the way, this is a challenging field. Becoming an expert takes a lot of time.

If you want to go far in this field, you need to pay attention to more details.
At the beginning, you only need to learn some basics plus the recently used algorithms.
Every 4-5 years, a different set of algorithms becomes popular in the literature.

For example, 3 years ago (before 2012, 52CV comment), SVM was very popular.
Nowadays (2014, 2015), deep learning often performs the best.

Build a foundation in the field:

Complete Andrew Ng’s machine learning course on Coursera.

Understand what the recently used algorithms are.

Try to read more about these algorithms.
Try to do some coding. Search for popular tools and use them.
For example, libsvm for SVM and Caffe for CNN.
To find out which algorithms are current, either ask some professionals,
or download top conference papers on your problem from the last 2-3 years. Browse them to see which learning algorithms they use.
Overall, a few algorithms should keep recurring. Pay more attention to them.
Then

Now, you can go back to the previous papers/books and continue reading, and when it comes to ML, you will find the topics easier.
Go deeper

Please refer to Andrew Ng’s Stanford machine learning course.
Other videos and books online
Please refer to Dr. Yaser Abu-Mostafa’s “Learning From Data” videos.
Learn from Dr. Waleed’s CS395: Pattern Recognition.
Textbook: Pattern Recognition and Machine Learning
Want to know more about how learning happens?
Learn more about the algorithms and the math behind them.

Some Recommended Papers

It’s hard to say which papers are the good ones. It may be better to define your problem first and then ask for references.
Top publications in vision
What are the must-read papers in the field of computer vision? What should students read to conduct research in this field?
Very useful university courses

CS395T: Visual Recognition, Fall 2012
CMPT888: Human Activity Recognition Summer 2010
CMPT882: Recognition Problems in Computer Vision, Summer 2009

Gaining Experience

While working toward a PhD, you usually learn to deal with all of the following issues.
How do you efficiently and reliably solve all the problems in research? To understand all these problems, you basically have to be a member of a research group for a few years. If you are in a lab focused on object detection, you will have many students around you working on the same problem, and talking with them late into the night is the only way I know to gain expertise: communicate more.
How do you debug code and effectively tune parameters? The best practice is to look at the excellent code of more advanced students. Before you start debugging machine learning algorithms, you should be generally familiar with debugging. Debugging machine learning algorithms is not like debugging quicksort. If you fix all the errors, your algorithm may still not work, possibly due to other issues such as lack of data, low model complexity, etc. Frankly, debugging vision/learning algorithms is more like art than science. Tuning parameters of algorithms or software libraries you didn’t write is not easy. You should learn how to properly use validation data, understand how to run the full training/evaluation process, and be prepared for cross-validation.
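The advice on validation data and cross-validation can be sketched as follows. This is a minimal k-fold illustration in plain Python, not a replacement for a library implementation; the function names and the evaluate callback are hypothetical, and a real callback would train a model on the training indices and score it on the validation indices.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every sample lands in exactly one validation fold.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def cross_validate(n_samples, k, evaluate):
    """Average a user-supplied score over the k train/validation splits."""
    scores = [evaluate(train, val) for train, val in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)


# Hypothetical scoring callback used only to exercise the plumbing.
def dummy_evaluate(train, val):
    return len(val) / (len(train) + len(val))


print(cross_validate(10, 5, dummy_evaluate))
```

To tune a parameter, one would typically run cross_validate once per candidate value and keep the value with the best average score, holding back a separate test set for the final evaluation.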
How do you handle large-scale problems on a personal computer? (For image/video analysis, the data may far exceed your memory; how do you handle it?) Generally speaking, you won’t solve a large problem on one PC. One of the most valuable skills I learned in graduate school is how to run parallel computations on a cluster. Universities and labs without clusters find it hard to compete with those that have medium to large clusters. This is also one of the reasons many professors join organizations like Google and Facebook: they have the data and computing resources that let senior researchers tackle larger and larger problems. If you cannot access a large cluster, I would recommend applying for an internship at a place like Google. You will learn a lot there (at least I did). Although you cannot take home any code you wrote, you will learn many lessons that will shape your life as a student. If you have to work on one machine, you will have to cut the dataset into smaller chunks and load the chunks into memory one at a time.
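The single-machine fallback of cutting the data into chunks can be sketched like this. It is only an illustration, assuming the statistic of interest can be accumulated chunk by chunk (here a mean); iter_chunks and streaming_mean are made-up names, and in practice the input would be a lazy reader over files or video frames rather than a range.

```python
def iter_chunks(items, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # leftover items that did not fill a whole chunk
        yield chunk


def streaming_mean(values, chunk_size=1000):
    """Compute the mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else 0.0


print(streaming_mean(range(1, 101), chunk_size=7))
```

Because only running totals are kept between chunks, memory use depends on chunk_size rather than on the size of the dataset.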

Materials

Online Videos and Talks

Online courses: Discrete reasoning and artificial vision learning
UCF computer vision video lectures: videos
EGGN 512 – Computer Vision video
VideoLectures includes many computer vision lectures.
Some conferences host their technical talks online: ICML 2011, for example, hosts most (or all) of its talks as video, while others, like CVPR 2011, offer only selected videos. This is a good way to learn about a lot of recent work without having to read the papers.
For CVPR 2010, they host many talk videos. They also have many machine learning summer school videos.
Wired, IEEE Spectrum, TechCrunch, TED, BigThink, Sixty Symbols, GISCIA, http://www.youtube.com/user/GoogleTechTalks

Courses

Introduction to Computer Vision (Stanford University; Professor Fei-Fei Li) a fairly standard CV course.
Computer Vision (UIUC; Professor Forsyth) a fairly standard CV course.
Learning-based methods in vision (CMU; Professor Alexei Efros) I learned a lot about texture recognition and some state-of-the-art methods using fancy ML techniques.
Fundamental Object Recognition and Scene Understanding (MIT; Professor Antonio Torralba) This is an ongoing course focusing on higher-level vision. The first lecture looks promising, but I’m not sure what the others will be like.
Machine Vision MIT course
Computer Vision MIT course progress

Computer Vision

Computer Vision: Models, Learning, and Inference – This is a great (free!) preprint that leans heavily towards machine learning. Each section gives background on the models or machine learning tools involved, as well as the inference methods. It starts with an in-depth overview of the necessary probability and machine learning concepts. I have only just started reading this book, but it is already very useful for getting overviews of parts such as shape models.
Computer Vision: Algorithms and Applications – Richard Szeliski. A survey book. This is a more traditional textbook that is referenced in many current CV courses, such as the aforementioned one by Fei-Fei Li and my school’s current CV course (JHU).
Multiple View Geometry in Computer Vision – Richard Hartley and Andrew Zisserman
Computer Vision: A Modern Approach – David Forsyth and Jean Ponce
Visual Object Recognition (Synthesis Lectures on Artificial Intelligence and Machine Learning) – Kristen Grauman and Bastian Leibe
Introductory Techniques for 3-D Computer Vision – Trucco and Verri
Digital Image Processing, 3rd Edition – Gonzalez and Woods
Practical Algorithms for Image Analysis
http://www.computervisiononline.com/books

Computer Vision and Image Processing Coding

Programming Computer Vision with Python – Jan Erik Solem
Learning OpenCV – Gary Bradski and Adrian Kaehler
Fundamentals of Digital Image Processing: Examples in Matlab – Chris Solomon and Toby Breckon

Human Vision

Vision: A Computational Investigation into the Human Representation and Processing of Visual Information – David Marr
Steps towards a theory of visual information: Active perception, signal-symbol transformation, and the interaction between sensing and control – Stefano Soatto
Basic Vision: An Introduction to Visual Perception – Robert Snowden, Peter Thompson, and Tom Troscianko

Others

CVPapers lists recent computer vision papers from vision conferences.
Summer school on visual recognition and machine learning, Grenoble, 2012
I would also take some machine learning courses and some courses on signal processing, time-frequency analysis, and wavelet analysis.

Exciting Applications

Never-Ending Image Learning (NEIL)

This is a computer program that runs 24/7, browsing the internet to extract visual information from internet data. It is supported by Google and the Department of Defense’s Office of Naval Research.
It currently identifies object-object, object-property, scene-object, and scene-property relations.

Face Detection
Tennis Tracking
Body Pose Estimation with Depth Cameras
3D Scanning Technology showcased by Microsoft, Heads Turn
Color change displays human blood flow
Reconstruct entire cities in 3D using only public Flickr photos
Autonomous objects, such as self-driving cars
Predator Object Tracking
Kinect Fusion – Building real-time 3D models from moving Kinect
Veebot, a robot that collects blood samples
Harp: Detecting laser interruptions to play notes (simple, powerful). Piano.
Google Photo Search
Physical Security
PTAM is an important application in AR
Google Glass
Google Street View: Capturing the world at street level
Word Lens: A language translation application based on augmented reality camera. The mobile camera can recognize text in one language and display it translated into another language. The best thing I found about this application is that the translation is done in real-time without being connected to the internet!
CarSafe: This application uses computer vision and machine learning algorithms to monitor and detect whether the driver is tired or distracted while using two independent cameras to track road conditions. This paper provides some details and results: CarSafe: A Driver Safety Application that uses dual cameras on smartphones to detect dangerous driving behaviors.
iOnRoad: This is a mobile driving assistance system application that uses Qualcomm FastCV mobile-optimized computer vision library. It uses the native camera and sensors of the smartphone to perform various functions. The application has advanced features such as forward collision warning, lane departure warning, head monitoring, and car locator.
Jumio: A real-time credit card scanning and verification application for online and mobile checkouts. They also provide identity verification for passports and licenses in many countries.

Exciting Algorithms

HOG features + linear SVM are very useful for object detection.

Part-based HOG + SVM
Exemplar-based HOG + SVM

RANSAC (RANdom SAmple Consensus) – Simple/Powerful/Robust

High-dimensional data exists within low-dimensional structures.
Optimal Randomized RANSAC
Matching with PROSAC – Progressive Sample Consensus
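To show why RANSAC is described as simple, powerful, and robust, here is a minimal sketch that fits a 2D line in the presence of outliers. It is the basic textbook loop, not the optimized variants listed above; the function names, the toy data, and the parameter values (iterations, threshold, seed) are assumptions chosen for illustration, and the residual is the vertical distance rather than the perpendicular one.

```python
import random


def fit_line(p, q):
    """Return (slope, intercept) of the line through two points, or None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, iterations=200, threshold=0.5, seed=0):
    """Fit y = slope * x + intercept while ignoring outliers.

    Repeatedly fits a line to a random pair of points and keeps the
    candidate supported by the largest number of inliers.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        slope, intercept = model
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Toy data: ten points on y = 2x + 1 plus three gross outliers.
points = [(x, 2 * x + 1) for x in range(10)] + [(2, 30), (5, -20), (8, 50)]
model, inliers = ransac_line(points)
print(model, len(inliers))
```

A common follow-up step, omitted here, is to refit the model to all inliers with least squares once the best candidate has been found.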

Hough Transform Algorithm
Approximate Nearest Neighbor Algorithm based on KD Trees
Markov Random Fields
2D image stitching, image mining, 3D reconstruction of texture objects with SIFT algorithm
SURF
Viola-Jones: Face detection
Shape Context
Deformable Part Models
Simultaneous Localization and Mapping

Others

Job Opportunities

CVPR job postings
http://www.computervisiononline.com/jobs
Join LinkedIn and check image processing or computer vision interest groups.
Adobe’s Advanced Technology Labs http://www.adobe.com/technology/ …

Datasets

Click here
Dataset summary
Tracking videos
There are too many online to list; search Google.

Software

My list
http://www.computervisiononline.com/software
http://www.computer-vision-software.com/blog/

Deadlines

Events
Calendar

Useful Websites

Google Scholar

Top publications
Google Scholar can tell you more about researchers.
Google Scholar can tell you more about papers.

Microsoft Academic Research

You can check the top-ranked key figures in a field.
You can find the top conferences and journals in a field.
You can look at people’s citation counts to gauge the quality of their work. If someone has 100 papers and 100 citations, each paper has been cited about once on average. If instead those 100 papers have 10,000 citations, each paper has been cited about 100 times on average. The second body of work carries more weight.
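The citation comparison above comes down to average citations per paper. A minimal sketch, assuming the example means 100 papers in both cases (the function name is made up for illustration):

```python
def citations_per_paper(total_citations, paper_count):
    """Average number of citations each paper has received."""
    if paper_count <= 0:
        raise ValueError("paper_count must be positive")
    return total_citations / paper_count


print(citations_per_paper(100, 100))    # 1.0
print(citations_per_paper(10000, 100))  # 100.0
```

An average hides how citations are distributed across papers, so it is best read alongside the individual paper counts that Google Scholar or Microsoft Academic show.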

http://www.scopus.com/
http://wokinfo.com/products_tools/analytical/jcr/
http://www.computervisiononline.com
http://www.computervisioncentral.com/
http://computervision.wikia.com

Ad-hocs

ICCV Marr Award
Computer Vision and Business Applications
ImageNet Challenge
PASCAL Challenge
Imageworld publishes global events and academic job openings in computer vision, image analysis, and medical image analysis.
Robotics competitions
What are some computer vision tasks that Deep Learning still cannot solve?
Awesome Computer Vision
Awesome Deep Vision
Emails Digest in Vision

Links

What mathematical knowledge do you need to understand computer vision?

If you want to buy a book that explains computer vision systematically, I recommend “Computer Vision Tutorial (2nd Edition)” by Professor Zhang Yujin of Tsinghua University, published in 2017.

Professor Zhang is an authority in the field of CV. He has worked in computer vision for many years, has published numerous papers in China and internationally as well as more than 10 monographs and textbooks (you probably know his book “Image Engineering”), and has significant influence in the field.

Original source:

https://sites.google.com/site/mostafasibrahim/research/articles/how-to-start

