Beginner’s Guide to Computer Vision


Exploring Through the Eyes of Machines

  • “If we want machines to think, we need to teach them to see.” – Fei-Fei Li, Director of the Stanford AI Lab and Stanford Vision Lab

Enabling computers and other machines, such as phones, to see their surroundings is called computer vision. Research on machines that mimic the human eye dates back to the 1950s, and the field has come a long way since then. Computer vision has already reached our phones through e-commerce and camera apps. Just imagine: once machines can see their surroundings as accurately as human eyes, what could they not do? The human eye is a complex structure that can make sense of intricate surroundings. Likewise, enabling machines to see, and giving them enough capability to understand and then classify what they see, remains a daunting task. Reaching accuracy close to that of the human eye requires performing millions of calculations in the blink of an eye. This involves not only converting images into pixels but also trying to understand the content of images through those pixels. First, you will learn how to extract information from these pixels and understand what they represent.

So how do machines see, the way human eyes do?

A. Representing Colors Digitally: In computer science, each color is represented by a numeric code, commonly written as a hexadecimal value (for example, #FF0000 for pure red). Through this encoding, machines know which colors make up an image's pixels. As humans, we are genetically equipped to distinguish different hues.
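
As a minimal sketch, assuming the common 24-bit RGB scheme in which a code such as `#FFA500` packs three 8-bit channels (red, green, blue) into six hexadecimal digits, converting between a hex code and the numbers a program actually computes with might look like this. The helper names `hex_to_rgb` and `rgb_to_hex` are hypothetical, not part of any library.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert a '#RRGGBB' hex color string into an (R, G, B) tuple of 0-255 ints."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    # Each pair of hex digits encodes one 8-bit channel: RR, GG, BB.
    r, g, b = (int(code[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Inverse of hex_to_rgb: pack three 0-255 channels into '#RRGGBB'."""
    return f"#{r:02X}{g:02X}{b:02X}"


# A tiny 2x2 "image" written as color codes, decoded into per-pixel channel values.
image = [
    ["#FF0000", "#00FF00"],
    ["#0000FF", "#FFFFFF"],
]
pixels = [[hex_to_rgb(p) for p in row] for row in image]
print(pixels)  # [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
```

In practice, image libraries store the decoded channel values directly (typically as 0-255 integers per channel); the hex form is mainly a compact way to write a single color down.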

B. Image Segmentation: This lets a computer recognize groups of similar colors and then segment the image, separating the foreground from the background. Color-gradient techniques are used to find the edges of different objects.
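
A toy illustration of both ideas, assuming a grayscale image stored as nested Python lists (brightness instead of full color, to keep it short): a single global threshold separates foreground from background, and a central-difference intensity gradient flags edges where brightness changes sharply. The thresholds (100 and 50) and the helper names are arbitrary choices for this sketch; practical pipelines use more robust methods such as Otsu thresholding or the Canny edge detector.

```python
import math

Image = list[list[float]]


def threshold_segment(img: Image, t: float) -> list[list[int]]:
    """Label each pixel 1 (foreground) if brighter than t, else 0 (background)."""
    return [[1 if v > t else 0 for v in row] for row in img]


def gradient_magnitude(img: Image) -> Image:
    """Central-difference gradient magnitude; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0  # horizontal brightness change
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0  # vertical brightness change
            mag[y][x] = math.hypot(gx, gy)
    return mag


def edge_map(img: Image, t: float) -> list[list[int]]:
    """Mark pixels whose gradient magnitude exceeds t as edge pixels."""
    return [[1 if m > t else 0 for m in row] for row in gradient_magnitude(img)]


# Dark background (10) with a bright 2x2 object (200) in the middle.
img = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 10, 200, 200, 10, 10],
    [10, 10, 200, 200, 10, 10],
    [10, 10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
]
mask = threshold_segment(img, 100)
edges = edge_map(img, 50)
for row in edges:
    print(row)
```

The mask marks only the four bright pixels as foreground, while the edge map lights up a band around the object, where brightness jumps between the two regions.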

C. Finding Corners: After segmentation, the algorithm identifies certain features in the image known as corners. Put simply, it searches for lines that meet at an angle and enclose a part of the image that has a single color shade. Corners (also called features) are building blocks that help extract more detailed information from the image.
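
The description above is informal; one standard way to make it precise is the Harris corner detector (swapped in here as an illustration, since the article does not name a specific method). It measures how strongly brightness changes in two directions at once: flat regions score about zero, straight edges score negative, and corners score positive. This pure-Python sketch assumes an unweighted 3x3 window and k = 0.05; OpenCV provides a production version as `cv2.cornerHarris`.

```python
Image = list[list[float]]


def gradients(img: Image) -> tuple[Image, Image]:
    """Central-difference derivatives Ix, Iy; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return ix, iy


def harris_response(img: Image, k: float = 0.05, radius: int = 1) -> Image:
    """Harris response R = det(M) - k * trace(M)^2 at every pixel.

    M is the structure tensor: sums of Ix^2, Iy^2 and Ix*Iy over a
    (2*radius+1)^2 box window (unweighted, a simplification).
    """
    ix, iy = gradients(img)
    h, w = len(img), len(img[0])
    resp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sxx = syy = sxy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        gx, gy = ix[yy][xx], iy[yy][xx]
                        sxx += gx * gx
                        syy += gy * gy
                        sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


def strongest(resp: Image, n: int) -> list[tuple[int, int]]:
    """Coordinates (y, x) of the n largest responses."""
    coords = [(y, x) for y in range(len(resp)) for x in range(len(resp[0]))]
    return sorted(coords, key=lambda p: resp[p[0]][p[1]], reverse=True)[:n]


# 12x12 dark image containing a bright 6x6 square (rows and columns 3..8).
img = [[1.0 if 3 <= y <= 8 and 3 <= x <= 8 else 0.0 for x in range(12)] for y in range(12)]
resp = harris_response(img)
print(sorted(strongest(resp, 4)))  # [(3, 3), (3, 8), (8, 3), (8, 8)]
```

The four strongest responses land exactly on the square's corners, while points along its sides (for example `resp[3][5]`) come out negative and the flat interior scores zero.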

D. Finding Textures: Determining the texture within an image is another important factor for correctly identifying it. The differences in texture between two objects make it easier for machines to classify them correctly.
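
One classic way to quantify texture (not named in the article, so treat it as an illustrative choice) is Haralick's gray-level co-occurrence matrix (GLCM): count how often pairs of gray levels appear next to each other, then summarize those counts. The sketch below computes the "contrast" statistic for a one-pixel horizontal offset; the two tiny patches are made-up examples.

```python
from collections import Counter

Image = list[list[int]]


def glcm(img: Image, dx: int = 1, dy: int = 0) -> dict[tuple[int, int], float]:
    """Normalized gray-level co-occurrence matrix for the pixel offset (dx, dy).

    Entry (i, j) is the fraction of pixel pairs in which a pixel of level i
    has a neighbour of level j at the given offset.
    """
    h, w = len(img), len(img[0])
    counts = Counter()
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                counts[(img[y][x], img[yy][xx])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def contrast(img: Image) -> float:
    """GLCM contrast: sum of p(i, j) * (i - j)^2; high for busy textures."""
    return sum(p * (i - j) ** 2 for (i, j), p in glcm(img).items())


smooth = [[5] * 4 for _ in range(4)]                                # uniform patch
checker = [[((x + y) % 2) * 9 for x in range(4)] for y in range(4)]  # fine checkerboard
print(contrast(smooth), contrast(checker))  # 0.0 81.0
```

Two objects of the same average brightness can still differ sharply in this statistic, which is why texture helps a classifier tell them apart.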

E. Making Guesses: After performing the above steps, the machine needs to make predictions that come close to the correct answer and match the image against the images in its database.
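
As a minimal sketch of "matching against a database", assume each image is summarized by a coarse intensity histogram and the stored image with the closest histogram (by L1 distance) wins. This nearest-neighbor scheme, the helper names, and the sample database are all hypothetical; real systems usually compare far richer features.

```python
Image = list[list[int]]


def histogram(img: Image, bins: int = 4, levels: int = 256) -> list[float]:
    """Normalized intensity histogram with `bins` equal-width buckets (values 0..levels-1)."""
    counts = [0] * bins
    for row in img:
        for v in row:
            counts[v * bins // levels] += 1
    total = sum(counts)
    return [c / total for c in counts]


def l1_distance(a: list[float], b: list[float]) -> float:
    """Sum of absolute differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(query: Image, database: dict[str, Image]) -> str:
    """Return the label of the stored image whose histogram is closest to the query's."""
    q = histogram(query)
    return min(database, key=lambda label: l1_distance(q, histogram(database[label])))


database = {
    "night_sky": [[10, 20], [15, 30]],
    "snow_field": [[240, 250], [230, 245]],
    "checkerboard": [[0, 255], [255, 0]],
}
query = [[235, 250], [220, 255]]
print(best_match(query, database))  # snow_field
```

The "guess" is simply the label of the nearest stored example; more mixed or ambiguous queries sit at similar distances from several entries, which is one reason images with mixed objects remain hard.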

F. Finally, Look at the Big Picture! In the end, the machine sees a larger, clearer picture and checks whether it has identified the image correctly according to the algorithm's instructions. Accuracy has improved greatly over the past few years, but machines still make mistakes when asked to process images that contain a mix of objects.

Universities with Computer Vision Research Groups:

Universities in the USA:

Carnegie Mellon University (Robotics Institute); University of California, Los Angeles; University of North Carolina at Chapel Hill; University of Washington; University of California, Berkeley; Stanford University; Massachusetts Institute of Technology; Cornell University; University of Pennsylvania; University of California, Irvine; Columbia University; University of Illinois at Urbana-Champaign; University of Southern California; University of Michigan; Princeton University; University of Rochester; University of Texas at Austin; University of Maryland, College Park; Brown University; University of Central Florida; New York University; Michigan State University; University of Massachusetts Amherst; Northwestern University; University of California, San Diego

Universities in Canada:

University of Alberta, University of Toronto, University of British Columbia, Simon Fraser University

Universities in Europe:

INRIA France, University of Oxford, ETH Zurich, Max Planck Institute in Germany, University of Edinburgh, University of Surrey, University of Freiburg, KTH Sweden, Dresden University of Technology, Darmstadt University of Technology, EPFL Switzerland, KU Leuven, Barcelona Computer Vision Center, IDIAP Switzerland, Imperial College London, Heidelberg University, University of Manchester, University of Bonn, RWTH Aachen University, University of Amsterdam, Technical University of Munich, Czech Technical University, University of Cambridge, Graz University of Technology, IST Austria, Queen Mary University of London, University of Zurich, Delft University of Technology, University of Leeds, University of Bern, Lund University, University of Trento, University of Florence, University of Stuttgart, Saarland University, CentraleSupélec, Télécom Paris, University of Oulu, Karlsruhe Institute of Technology

If you are new to computer vision, below is a list of foundational topics you should understand.

A. Beginner Level Mathematics:

  • Linear Algebra

    • https://www.khanacademy.org/math/linear-algebra

  • Singular Value Decomposition

    • https://www.youtube.com/watch?v=sJV0QyHoRio

  • Introductory Pattern Recognition

    • https://urlify.cn/ZjUFVr

  • Principal Component Analysis

    • https://www.youtube.com/watch?v=H0HjNuNvFVI

  • Kalman Filtering

    • https://www.youtube.com/watch?v=d0D3VwBh5UQ

  • Fourier Transform

    • https://urlify.cn/MnmeE3

  • Wavelets

    • https://www.youtube.com/watch?v=4fQAlD5wZKA

Image Processing:

  • Duke University offers an online course on Coursera

    • https://www.coursera.org/learn/image-processing

  • Gonzalez and Woods’ Digital Image Processing

    • http://www.imageprocessingplace.com/

B. Advanced Level:

  • Linear Discriminant Analysis

    • https://www.youtube.com/watch?v=aSyQqHY4Vqc

  • Probability, Bayes’ Rule, Maximum Likelihood, MAP

    • https://urlify.cn/jyMN7

  • Mixture Models and the Expectation-Maximization Algorithm

    • https://www.youtube.com/watch?v=Q1oqJSgp_Dk

  • Introductory Statistical Learning

    • https://www.coursera.org/specializations/statistics

  • Support Vector Machines

    • https://www.youtube.com/watch?v=_PwhiWxHK8o

  • Genetic Algorithms

    • https://www.youtube.com/watch?v=kHyNqSnzP8Y

  • Hidden Markov Models

    • https://www.youtube.com/watch?v=D_RIe5bd3hk

  • Bayesian Networks

    • https://www.coursera.org/learn/probabilistic-graphical-models

To gain hands-on experience with these theories and techniques (especially the algorithms), start learning OpenCV with a computer vision focus:

  • Learn OpenCV: Computer Vision with OpenCV Library (https://urlify.cn/a6jEve)

  • Tombone’s Computer Vision Blog (http://www.computervisionblog.com/)

Tip: When programming in C, C++, or Python, use the OpenCV library for practical computer vision tasks. When programming in MATLAB, use the Computer Vision System Toolbox (https://urlify.cn/2q2YZz). If you program in other languages, you will need other open-source libraries.

You should also know the key research papers in the field, which you can learn from:

  • SIFT: A classic descriptor for general vision

    • https://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

  • HOG: A well-known descriptor particularly suitable for human detection

  • Viola-Jones: A great face detector

    • https://urlify.cn/NFjQZj

  • Shape Contexts

    • https://urlify.cn/7jMNZ3

  • Deformable Part Models

    • https://urlify.cn/MrYnU3

Essential Reading List:

Beginner Level:

  1. Computer Vision: Algorithms and Applications by Richard Szeliski

    http://szeliski.org/Book/

  2. Computer Vision: A Modern Approach by David A. Forsyth and Jean Ponce

    https://urlify.cn/vequUv

  3. Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman

    https://urlify.cn/QzM7b2

Advanced Level – Towards Deep Learning

  1. Neural Networks and Deep Learning, an online book by Michael Nielsen; a great, gentle introduction

    http://neuralnetworksanddeeplearning.com/

  2. Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

    http://www.deeplearningbook.org/

What happens when machines can sense your emotions? Watch the video: https://youtu.be/QFk3e5PcK7s

TED Talks to Watch:

  • Fei-Fei Li: How we teach computers to understand pictures

    • https://urlify.cn/yiYzQ3

  • Blaise Agüera y Arcas: How PhotoSynth connects the world’s images

    • https://urlify.cn/Izuyau

  • Chieko Asakawa: How new technologies help the blind explore the world

    • https://urlify.cn/zIFzMz

  • Jennifer Healey: If cars could talk, accidents could be avoided

    • https://urlify.cn/qQnIVz

  • Golan Levin: Art that looks back at you

    • https://www.ted.com/talks/golan_levin_art_that_looks_back_at_you

  • Paul Debevec: Animating a photo-real digital face

    • https://www.ted.com/talks/paul_debevec_animating_a_photo_real_digital_face

  • Golan Levin: Software as art

    • https://www.ted.com/talks/golan_levin_software_as_art

Online Courses:

Beginner Level:

  • Udacity: Introduction to Computer Vision

    • https://www.udacity.com/course/introduction-to-computer-vision--ud810

  • Stanford’s CS231n: Convolutional Neural Networks for Visual Recognition

    • http://cs231n.stanford.edu/

  • University of Central Florida – Video Lectures by Professor Mubarak Shah

    • https://www.crcv.ucf.edu/videos/lectures/2014.php

Using the concepts and algorithms from the resources above, you can tackle some tasks and complete a project on your own.

Advanced Level – Towards Deep Learning

  • Geoff Hinton’s Neural Networks lectures on Coursera

    • https://www.coursera.org/learn/neural-networks

  • Stanford Course: Deep Learning for Natural Language Processing

    • http://cs224d.stanford.edu/

  • Stanford University Course: Convolutional Neural Networks for Visual Recognition

    • http://cs231n.stanford.edu/

Lecture Courses:

  • Deep Learning in Computer Vision (Professor Sanja Fidler)

    • http://www.cs.utoronto.ca/~fidler/teaching/2015/CSC2523.html

  • Advanced Computer Vision (Professor James Hays)

    • http://www.cc.gatech.edu/~hays/7476/

Global Projects:

a. Microsoft computer scientists and researchers are working to “solve” cancer

  • https://news.microsoft.com/stories/computingcancer/

b. Project Tokyo: AI-based application prototypes that enhance awareness of social, physical, and textual environments for people who are blind or visually impaired

  • https://www.microsoft.com/en-us/research/project/project-tokyo/

c. Teaching machines to predict the future

  • http://news.mit.edu/2016/teaching-machines-to-predict-the-future-0621

Figure: the leftmost column shows the frame before the action begins, with the algorithm’s prediction below it; the right column shows the next frame of the video.

Another way to familiarize yourself with ongoing research in computer vision is to follow authors and read their papers at top conferences such as CVPR, ICCV, ECCV, and BMVC.

Conversations with Experts

The following are excerpts from conversations with two experts who are passionate about computer vision.

Conversation with Professor Devi Parikh | Visiting Researcher at Facebook AI Research | Assistant Professor at Georgia Tech (formerly Virginia Tech)

Computer vision is a subfield of artificial intelligence that aims to build intelligent computers capable of replicating the vision of the human brain. Machine learning is the general term for teaching machines to learn, whereas computer vision deals specifically with visual data. Machine learning relies more heavily on statistical tools, while computer vision uses both statistical and non-statistical tools. For example, machine learning tools are used less often for 3D reconstruction than for tasks such as image classification and object recognition. Many computer vision tasks have their own requirements, and we develop specific machine learning tools for them. For any student who wants to start in this field, I recommend browsing researchers’ web pages and choosing a research question that interests you. Those pages usually describe the cutting-edge questions people are working on, along with the standard datasets available. Pick a research question, a dataset, and a library you want to use, and then get started. When taking on master’s or Ph.D. students, I typically look for people who are responsible, motivated, and determined. To strengthen your grasp of the basics, try reading research papers to understand the cutting-edge AI questions that researchers around the world are studying.

Conversation with Richa Agrawal | University of Pennsylvania Alumna | Computer Vision Research Engineer at Whodat

I graduated from MNIT Jaipur, where I got involved with the robotics research group; we collaborated on several projects and won a national competition at IIT Roorkee. That experience inspired me greatly. After completing my bachelor’s degree, I started working at Yahoo, but I realized it was not what I wanted to do, so I went to the University of Pennsylvania for my master’s degree. During my master’s, I explored different research areas through various courses and ultimately chose computer vision as my main research direction. After graduating, I worked at a startup in the US and then wanted to find computer vision opportunities in India. At Whodat (a Bangalore-based computer vision startup), we research and process images using augmented reality and visualization techniques. For example, when you buy furniture for your home, you pick items in a store based on your home’s layout, but once the furniture is delivered, it often turns out to be too big or too small. Currently, no technology solves this problem. We are trying to build a solution that lets you visualize a store’s furniture in your own home, so you can make better decisions and buy with ease. During my studies, there were many times when I couldn’t give my best and felt frustrated, until a friend gave me some advice. He told me, “Only a few people (less than 0.1%) get to do this (study abroad for a master’s degree and do research in fields like computer vision), and you are one of them. If you double your efforts, you can do things that others cannot.”

My advice for students starting out is to talk to peers at other colleges and take part in competitions and hackathons. It is important to find what interests you rather than working somewhere you do not enjoy. Computer vision, for example, is a vast field in India with plenty of room to grow. All it takes is a camera, and cameras have already spread to smaller cities. So the future of computer vision is definitely bright.

Original link: https://medium.com/readers-writers-digest/beginners-guide-to-co24b720
