Comprehensive Guide to Starting Research in Computer Vision

(Welcome to follow the “I Love Computer Vision” public account, a valuable and in-depth account~)
Many people have asked how to get started with CV. This article is a re-release of an older piece, translated from a foreign blog. It is long and was written some time ago, but the principles still hold, and it is well worth reading!
Written from the perspective of a beginner who has just started researching computer vision, this article covers the field’s literature, experts, research groups, and blogs in detail, with emphasis on how to start research, choose a direction, read papers, and implement and debug code. It also explains how to learn machine learning for computer vision research. It is a valuable reference for new PhD students, scholars, and developers who want to go deeper into this field.

Due to the limitations of WeChat public accounts, many hyperlinks in the original text cannot be clicked. See the original source listed at the end of this article for the complete version with working links.

Top Conferences and Journals

  • First-tier top conferences: CVPR, ECCV, ICCV, NIPS, IJCAI

  • High-reputation second-tier top conference: BMVC

  • Famous second-tier top conferences: ICIP, ACCV, ICPR, SIGGRAPH

  • Top journals: PAMI, IJCV

  • Famous journals: CVIU, IVC

  • Top conferences listed by Microsoft Academic Research

  • Ranks from Core

  • Ranks from Arnetminer

  • Source lists conference papers from recent years

  • Journal lists impact factors

  • Journal scores from EigenFactor

Top Expert Authors

  • Microsoft Academic authors list

  • Google Scholar List

  • HOG feature author Navneet Dalal

  • Jitendra Malik.

  • Gary Bradski, founder of OpenCV

  • David Lowe, inventor of SIFT features

  • List of vision people (but not necessarily top authors)

  • Computer Vision: Algorithms and Applications by Richard Szeliski

Top Research Groups

  • Check them here

  • Check others here

  • CMU: Robotics everywhere.

  • LEAR

  • ImageLab Group

  • Machine Vision Laboratory at UWE

  • ALCOR

  • Centre for Image Processing and Analysis (CIPA)

  • ImageMetry

  • VISILAB

  • GRIMA – Machine Intelligence Group

  • Vision and Sensing Research Group – University of Canberra

  • CAVE – Computer Vision Laboratory at Columbia University

  • Computational Biomedicine Laboratory (CBL), University of Houston

  • Vision Lab – University of Antwerp.

  • Visual Geometry Group, Oxford UK (Andrew Zisserman’s group)

  • LEAR, Grenoble, France (Cordelia Schmid’s group)

  • WILLOW, Paris France (Jean Ponce’s group)

  • CVLAB EPFL, Lausanne Switzerland (Pascal Fua’s group)

  • Computer vision group ETH, Zurich Switzerland (Luc Van Gool’s group)

  • UCB (Malik, Darrell, Efros)

  • UMD (Davis, Chellappa, Jacobs, Aloimonos, Doermann)

  • UIUC (Forsyth, Hoiem, Ahuja, Lazebnik)

  • UCSD (Kriegman)

  • UT-Austin (Aggarwal, Grauman)

  • Stanford (Fei-Fei Li, Savarese)

  • USC (Nevatia, Medioni)

  • Brown (Felzenszwalb, Hays, Sudderth)

  • NYU (Rob Fergus)

  • UC-Irvine (Ramanan, Fowlkes)

  • UNC (Tamara Berg, Alex Berg, Jan-Michael Frahm)

  • Columbia (Belhumeur, Shree Nayar, Shih-Fu Chang)

  • Laboratory for Computational Intelligence, University of British Columbia, Vancouver (David Lowe’s group)

  • Computer Science Department, University of Toronto, Toronto (Hinton, Srivastava, Salakhutdinov, of deep learning fame)

  • Centre for Vision Research, York University, Toronto

Blogs

  • Tomasz Malisiewicz blog

  • The Serious Computer Vision Blog

  • Research blog of Roman Shapovalov

  • Computer Vision Talks

  • Steves Computer Vision Blog

  • The Computer Vision

  • Computer Vision Blog

  • Andy’s Computer Vision and Machine Learning Blog

  • Computer Vision Models

  • solem’s vision blog

  • uncannyvision blog

  • Blogs on Computer Vision, Machine Vision and Image Processing

  • All About Computer Vision

  • Open Computer Vision

CV Industry Labs and Startups

  • Microsoft and Google

  • IBM Research

  • NEC Labs America

  • Acute3D (Sophia Antipolis, France) was founded in 2011.

  • Bubbli

  • ShoppTag

  • Oculusai

  • Videosurf (video search)

  • Willow garage (robotics)

  • Sportvision (sports broadcast)

  • Intelli-vision (surveillance)

  • Gauss Surgical

  • Adobe’s Advanced Technology Labs

  • Dolby

How to Start Research

  • I like to divide computer vision problems into two types

  • Some research directions involve artificial intelligence-based learning methods, such as image classification, OCR, video tracking, etc.

    • Most of the papers you can see are in this direction.

    • Learning means we have a lot of data (e.g., ImageNet, with 1 million images and their labels) and learn patterns from it (e.g., classifying characters in images).

    • For this type of direction, you must learn a lot about machine learning.

  • Other research directions involve algorithms that do not require learning, such as 3D reconstruction, optical flow computation, panoramic stitching (52CV comment: In fact, many algorithms for 3D reconstruction and optical flow estimation are now based on learning. You can search for keywords on this site to obtain related information.)

Using Textbooks and Courses

  • A direct approach is to start with books.

  • Don’t get stuck in books. Remember, you want to start research. Try to understand the basics and do some coding. Keep an eye on the recent work that interests you most.

  • Try out different vision research problems and see which excites you most.

  • Then you want to move on to the next stage: “Start from papers”

Starting from Papers

  • Start with papers from top conferences and journals. Papers from lower-tier venues may report unreliable results and waste your time.

    • CVPR keeps a list of important conferences and many papers.

    • Use these papers to learn which tracks are available. The wiki can also help.

    • Use Google Scholar to find reviews on specific issues. Reviews can save a lot of time.

    • Consider papers from the last 3 years. Assuming we are in 2014, consider 2011, then 2012, then 2013. Don’t start from 2014.

    • Collect papers whose titles look relevant. Search for each one to see whether source code is available, and try to start from the papers that have it.

  • Starting will be difficult, as you will encounter many terms and tools you don’t know. Be patient. Google them, and ask questions on forums like Quora or Stack Overflow.

  • Try to find a specific research direction (e.g., 3D reconstruction, point clouds, scene understanding, object recognition, large image data, multi-object tracking, image descriptor theory, etc.). Check wiki or conference paper directories for what interests you.

  • Use conferences to learn about papers in a certain direction or use Google Scholar search.

  • Follow researchers whose work is more authoritative. Pay attention to highly cited literature.

  • Prefer to start from research work that has running software, saving you time.

  • To learn the engineering side, choose a simple and elegant paper, implement it, and reproduce its results. Many questions will arise along the way, and you will often have to make assumptions because papers do not mention everything. Many implementation details are also left out, such as how to implement a step efficiently. You will learn about issues such as performance and experimental setup. Good candidates include Viola-Jones face detection, Christoph Lampert’s Efficient Subwindow Search, and Brian Fulkerson’s superpixel neighborhoods. Implementing a paper that comes with complete code is a very good idea, because you can check your own implementation against it.

  • For your own research work, try to use existing open-source code instead of starting everything from scratch; don’t reinvent the wheel!

  • If the paper doesn’t have publicly available code, you can try to contact the authors to see if you can get the code.

  • If understanding a paper after several attempts is still difficult, switch to another paper. Or change direction. (This is when you are looking for research directions)

  • A collection of best-paper award winners may also be useful.

  • Graduate seminar courses are built around reading papers.

Starting from Code

  • Going from code to paper means using available code as your entry point for understanding the problem you are researching.

  • Find an open-source library and then try it, such as OpenCV.

    • There are many good books about OpenCV.

    • There are also many videos on Youtube:

    • https://www.youtube.com/playlist?v=MfnEtFAWooQ&list=PLo1wvPF7fMxQ_SXibg1azwBfmTFn02B9O

    • https://www.youtube.com/playlist?v=xEnPZ78queI&list=PLDqunwM5dbtIbEuXv1rB7OFBoRzEF8GH6

  • Learn Matlab and use it to write initial solution prototypes (as it often allows for faster prototyping).

  • Helpful: Join OpenCV yahoo group and read comments & messages.

  • Choose an interesting toy project and implement it.

Machine Learning

  • Machine learning provides the core algorithms for learning from data.

  • For computer vision, especially as a beginner, you don’t need to learn much machine learning at first. You can use the algorithms as black boxes.

    • By the way, this is a challenging field. To become an expert, you need to spend a lot of time.

  • If you want to grow in this field, you need to pay attention to more of the details.

  • At the beginning, you only need to learn some basics + recently used algorithms.

  • Every 4-5 years, some algorithms become popular in the literature.

    • For example, 3 years ago (before 2012, 52CV comment), SVM was very popular.

    • Nowadays (2014, 2015), deep learning often performs the best.

  • Build a foundation in the field:

    • Complete Andrew Ng’s machine learning course on Coursera.

  • Understand what the recently used algorithms are.

    • Either ask some professionals,

    • or download top conference papers on your problem from the last 2-3 years. Browse them and note which learning algorithms they use.

    • Overall, a few algorithms should keep recurring. Pay more attention to those.

  • Then:

    • Try to read more about these algorithms.

    • Try to do some coding. Search for popular tools and use them, for example libsvm for SVM or Caffe for CNNs.

  • Now, you can go back to the previous papers/books and continue reading, and when it comes to ML, you will find the topics easier.

  • Go deeper

    • Please refer to Andrew Ng’s Stanford machine learning course.

    • Other videos and books online

    • Please refer to Professor Yaser Abu-Mostafa’s “Learning From Data” videos.

    • Learn from Dr. Waleed’s CS395: Pattern Recognition.

    • Textbook: Pattern Recognition and Machine Learning

    • Want to know more about how learning happens?

    • Learn more about the algorithms and the math behind them.

Some Recommended Papers

  • It’s hard to say in general what a good paper is. It may be better to define your problem first and then ask for references.

  • Top publications in vision

  • What are the must-read papers in the field of computer vision? What should students read to conduct research in this field?

  • Very useful university courses

    • CS395T: Visual Recognition, Fall 2012

    • CMPT888: Human Activity Recognition Summer 2010

    • CMPT882: Recognition Problems in Computer Vision, Summer 2009

Gaining Experience

  • When obtaining a PhD, you usually learn to deal with all these issues.

  • How do you efficiently and reliably solve the problems that come up in research? To understand them, you basically have to be a member of a research group for a few years. If you are in a lab focused on object detection, many students around you will be working on the same problem, and talking with labmates late into the night is the only way I know to gain this expertise: communicate more.

  • How do you debug code and effectively tune parameters? The best practice is to look at the excellent code of more advanced students. Before you start debugging machine learning algorithms, you should be generally familiar with debugging. Debugging machine learning algorithms is not like debugging quicksort. If you fix all the errors, your algorithm may still not work, possibly due to other issues such as lack of data, low model complexity, etc. Frankly, debugging vision/learning algorithms is more like art than science. Tuning parameters of algorithms or software libraries you didn’t write is not easy. You should learn how to properly use validation data, understand how to run the full training/evaluation process, and be prepared for cross-validation.

  • How do you handle large-scale problems on a personal computer? (For image/video analysis, the data may far exceed your memory. How do you handle that?) Generally speaking, you won’t solve a large problem on one PC. One of the most valuable skills I learned in graduate school is how to run parallel computations on a cluster. Universities and labs without clusters find it hard to compete with those that have medium to large clusters. This is also one of the reasons many professors join organizations like Google and Facebook: they have the data and computing resources that allow senior researchers to tackle larger and larger problems. If you cannot access a large cluster, I would recommend applying for an internship at a place like Google. You will learn a lot there (at least I did). While you cannot take home any code you wrote, you will learn many lessons that will shape your life as a student. If you have to work on one machine, you will have to cut the dataset into smaller chunks and load them into memory one at a time.
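
The single-machine advice in the last point above can be sketched in a few lines of Python. This is only an illustration, not the original author's code: `iter_chunks` and `streaming_mean` are hypothetical helper names, and the generator of fake pixel values stands in for a dataset that would normally be read from disk.

```python
from typing import Iterable, Iterator, List


def iter_chunks(items: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size items from any iterable."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk: List[float] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter chunk
        yield chunk


def streaming_mean(items: Iterable[float], chunk_size: int = 1024) -> float:
    """Compute a mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(items, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


# Hypothetical stand-in for a dataset too large to load at once:
# the generator produces values lazily instead of storing them all.
pixel_values = (float(i % 256) for i in range(100_000))
print(streaming_mean(pixel_values, chunk_size=4096))
```

The same pattern applies to any statistic that can be accumulated chunk by chunk; algorithms that need random access to the whole dataset require a different strategy.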

Materials

Online Videos and Talks

  • Online course: Discrete Inference and Learning in Artificial Vision

  • UCF computer vision video lectures: videos

  • EGGN 512 – Computer Vision video

  • VideoLectures hosts many computer vision lectures.

  • Some conferences, such as ICML 2011, host most (or all) of their technical talks as videos. Others, like CVPR 2011, only have selected videos. This is a good way to learn about a lot of recent work without relying on reading the papers.

  • For CVPR 2010, many talk videos are hosted as well. Many machine learning summer school videos are also available.

  • Wired, IEEE Spectrum, TechCrunch, TED, BigThink, Sixty Symbols, GISCIA, http://www.youtube.com/user/GoogleTechTalks

Courses

  • Introduction to Computer Vision (Stanford University; Professor Fei-Fei Li) a fairly standard CV course.

  • Computer Vision (UIUC; Professor Forsyth) a fairly standard CV course.

  • Learning-based methods in vision (CMU; Professor Alexei Efros) I learned a lot about texture recognition and some state-of-the-art methods using fancy ML techniques.

  • Grounding Object Recognition and Scene Understanding (MIT; Professor Antonio Torralba) This is an ongoing course focusing on higher-level vision. The first lecture looks promising, but I’m not sure what the others will be like.

  • Machine Vision MIT course

  • Advances in Computer Vision MIT course

Computer Vision

  • Computer Vision: Models, Learning, and Inference – This is a great (free!) preprint that leans heavily towards machine learning. Each section provides background on the models or machine learning tools involved, as well as inference methods. It opens with an in-depth overview of the necessary probability and machine learning concepts. I just started reading this book, but it’s very useful for getting overviews of parts like model and shape models.

  • Computer Vision: Algorithms and Applications – Richard Szeliski. A survey book. This is a more traditional textbook that is referenced in many current CV courses, such as the aforementioned one by Fei-Fei Li and my school’s current CV course (JHU).

  • Multiple View Geometry in Computer Vision – Richard Hartley and Andrew Zisserman

  • Computer Vision: A Modern Approach – David Forsyth and Jean Ponce

  • Visual Object Recognition (Synthesis Lectures on Artificial Intelligence and Machine Learning) – Kristen Grauman and Bastian Leibe

  • Introductory Techniques for 3-D Computer Vision by Trucco and Verri

  • Digital Image Processing 3rd Edition by Gonzalez and Woods

  • Practical Algorithms for Image Analysis

  • http://www.computervisiononline.com/books

Computer Vision and Image Processing Coding

  • Programming Computer Vision with Python – Jan Erik Solem

  • Learning OpenCV – Gary Bradski and Adrian Kaehler

  • Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab – Chris Solomon and Toby Breckon

Human Vision

  • Vision: A Computational Investigation into the Human Representation and Processing of Visual Information – David Marr

  • Steps Towards a Theory of Visual Information: Active Perception, Signal-to-Symbol Conversion and the Interplay Between Sensing and Control – Stefano Soatto

  • Basic Vision: An Introduction to Visual Perception – Robert Snowden, Peter Thompson, and Tom Troscianko

Others

  • CVPapers: recent computer vision papers from vision conferences.

  • Summer school on visual recognition and machine learning, Grenoble, 2012

  • I would also take some machine learning courses and some courses on signal processing, time-frequency analysis, and wavelet analysis.

Exciting Applications

  • Never-Ending Image Learning (NEIL)

    • This is a computer program that runs 24/7, browsing the internet to extract visual information from internet data. It is supported by Google and the Department of Defense’s Office of Naval Research.

    • It currently identifies object-object relations, object-property relations, scene-object relations, and scene-property relations.

  • Face Detection

  • Tennis Tracking

  • Body Pose Estimation with Depth Cameras

  • 3D Scanning Technology showcased by Microsoft, Heads Turn

  • Color change displays human blood flow

  • Reconstruct entire cities in 3D using only public Flickr photos

  • Autonomous objects, such as self-driving cars

  • Predator Object Tracking

  • Kinect Fusion – Building real-time 3D models from moving Kinect

  • Veebot, a robot that collects blood samples

  • Harp: Detecting laser interruptions to play notes (simple, powerful). Piano.

  • Google Photo Search

  • Physical Security

  • PTAM is an important application in AR

  • Google Glass

  • Google Street View: Capturing the world at street level

  • Word Lens: An augmented-reality translation app. The phone camera recognizes text in one language and displays it translated into another. The best thing I found about this application is that the translation is done in real time without an internet connection!

  • CarSafe: This application uses computer vision and machine learning algorithms to monitor and detect whether the driver is tired or distracted while using two independent cameras to track road conditions. This paper provides some details and results: CarSafe: A Driver Safety Application that uses dual cameras on smartphones to detect dangerous driving behaviors.

  • iOnRoad: This is a mobile driving assistance application that uses Qualcomm’s FastCV mobile-optimized computer vision library. It uses the smartphone’s native camera and sensors to perform various functions. The application has advanced features such as forward collision warning, lane departure warning, headway monitoring, and a car locator.

  • Jumio: A real-time credit card scanning and verification application for online and mobile checkouts. They also provide identity verification for passports and licenses in many countries.

Exciting Algorithms

  • HOG features + linear SVM are very useful for object detection.

    • Part-based HOG + SVM

    • Exemplar-based HOG + SVM

  • RANSAC (RANdom SAmple Consensus) – Simple/Powerful/Robust

    • High-dimensional data exists within low-dimensional structures.

    • Optimal Randomized RANSAC

    • Matching with PROSAC – Progressive Sample Consensus

  • Hough Transform Algorithm

  • Approximate Nearest Neighbor Algorithm based on KD Trees

  • Markov Random Fields

  • 2D image stitching, image mining, and 3D reconstruction of textured objects with the SIFT algorithm

  • SURF

  • Viola-Jones: Face detection

  • Shape Context

  • Deformable Part Models

  • Simultaneous Localization and Mapping
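
To make the RANSAC entry in the list above concrete, here is a minimal sketch that fits a 2D line to points contaminated by outliers. It is an illustration under stated assumptions, not a reference implementation: the function names `fit_line` and `ransac_line`, the vertical-distance residual, the threshold, the fixed iteration count, and the synthetic data are all choices made for this example.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Line = Tuple[float, float]  # (slope, intercept)


def fit_line(p: Point, q: Point) -> Optional[Line]:
    """Return the line through p and q, or None if it is vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(
    points: List[Point],
    n_iters: int = 200,
    threshold: float = 0.5,
    seed: int = 0,
) -> Tuple[Optional[Line], List[Point]]:
    """Keep the 2-point line hypothesis that has the most inliers."""
    rng = random.Random(seed)
    best_model: Optional[Line] = None
    best_inliers: List[Point] = []
    for _ in range(n_iters):
        p, q = rng.sample(points, 2)  # minimal sample for a line
        model = fit_line(p, q)
        if model is None:
            continue
        slope, intercept = model
        # Assumption: vertical distance is used as the residual for simplicity.
        inliers = [
            (x, y) for x, y in points
            if abs(y - (slope * x + intercept)) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic data: 20 points on y = 2x + 1 plus 5 gross outliers.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
data += [(3.0, 40.0), (7.0, -20.0), (12.0, 90.0), (15.0, -35.0), (18.0, 70.0)]
model, inliers = ransac_line(data)
print(model, len(inliers))
```

Practical implementations usually refit the model on all inliers at the end and derive the number of iterations from the expected outlier ratio; both steps are omitted here to keep the sketch short.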

Others

Job Opportunities

  • CVPR job postings

  • http://www.computervisiononline.com/jobs

  • Join LinkedIn and check image processing or computer vision interest groups.

  • Adobe’s Advanced Technology Labs http://www.adobe.com/technology/ …

Datasets

  • Click here

  • Dataset summary

  • Tracking videos

  • There are too many online to list here; search Google.

Software

  • My list

  • http://www.computervisiononline.com/software

  • http://www.computer-vision-software.com/blog/

Deadlines

  • Events

  • Calendar

Useful Websites

  • Google Scholar

    • Top publications

    • Google Scholar can tell you more about researchers.

    • Google Scholar can tell you more about papers.

  • Microsoft Academic Research

    • You can check the top-ranked researchers in a field.

    • You can get the top conferences and journals in a field.

    • You can use citation counts to gauge the quality of someone’s work. If someone has 100 papers and 100 citations, each paper is cited about once on average. If the citations total 10,000 instead, each paper is cited about 100 times on average. The second profile indicates more influential work.

  • http://www.scopus.com/

  • http://wokinfo.com/products_tools/analytical/jcr/

  • http://www.computervisiononline.com

  • http://www.computervisioncentral.com/

  • http://computervision.wikia.com
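
The citation heuristic mentioned under Microsoft Academic Research above comes down to average citations per paper. A minimal sketch follows, assuming the example there describes two researchers with 100 papers each; the function name `citations_per_paper` is made up for illustration.

```python
def citations_per_paper(total_citations: int, n_papers: int) -> float:
    """Average number of citations each paper received."""
    if n_papers <= 0:
        raise ValueError("n_papers must be positive")
    return total_citations / n_papers


# Assumed profiles: both researchers have 100 papers.
print(citations_per_paper(100, 100))     # 1.0
print(citations_per_paper(10_000, 100))  # 100.0
```

An average hides how citations are distributed, so it is best read alongside the citation counts of individual papers.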

Ad-hocs

  • ICCV Marr Award

  • Computer Vision and Business Applications

  • ImageNet Challenge

  • PASCAL Challenge

  • Imageworld, for announcements of worldwide events and academic job openings in computer vision, image analysis, and medical image analysis.

  • Robotics competitions

  • What are some computer vision tasks that Deep Learning still cannot solve?

  • Awesome Computer Vision

  • Awesome Deep Vision

  • Emails Digest in Vision

Links

  • What mathematical knowledge do you need to understand computer vision?

If you want to buy a book that systematically explains computer vision, I recommend the “Computer Vision Tutorial (2nd Edition)” published in 2017 by Professor Zhang Yujin from Tsinghua University.

Professor Zhang is an authority in CV. He has worked in computer vision for many years, has published numerous papers in China and internationally along with more than 10 monographs and textbooks (you probably know his book “Image Engineering”), and has significant influence in the field.

Original source:

https://sites.google.com/site/mostafasibrahim/research/articles/how-to-start
