Understanding Machine Learning: Concepts, Schools, and Algorithms

Source: Machine Learning Grocery Store Deep Learning Go

This article is about 3,500 words; estimated reading time 7 minutes.
This article introduces the basic concepts, schools, and common algorithms of machine learning.

1. Overview of Machine Learning
1. What is Machine Learning?
Machines learn by analyzing large amounts of data. For example, a system can learn to recognize cats or faces without being explicitly programmed: trained on many images, it generalizes and identifies the target in new ones.
2. The Relationship Between Machine Learning and Artificial Intelligence
Machine learning is a field focused on finding patterns in data and using these patterns to make predictions. It is a part of artificial intelligence and intersects with knowledge discovery and data mining.

3. How Machine Learning Works
① Select Data: Divide your data into three groups: training data, validation data, and test data.
② Model Data: Use training data to build a model using relevant features.
③ Validate Model: Use your validation data to evaluate the model and tune its settings.
④ Test Model: Use your test data to check the performance of the validated model.
⑤ Use Model: Use the fully trained model to make predictions on new data.
⑥ Tune Model: Improve the performance of the algorithm using more data, different features, or adjusted parameters.
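The six steps can be sketched with scikit-learn on a built-in dataset (a minimal illustration, not a full pipeline; the 60/20/20 split and the model choice are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Select data: split into training (60%), validation (20%), and test (20%).
X, y = load_iris(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 2. Model data: build a model from the training set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 3. Validate model: score it on the validation set (used while tuning).
val_acc = accuracy_score(y_val, model.predict(X_val))

# 4. Test model: final check on held-out test data.
test_acc = accuracy_score(y_test, model.predict(X_test))

# 5. Use model: predict on a new, unseen sample.
prediction = model.predict([[5.0, 3.5, 1.3, 0.3]])

# 6. Tune model: adjust data, features, or parameters and repeat from step 2.
```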

4. The Position of Machine Learning
① Traditional Programming: Software engineers write programs to solve problems. Data exists first → to solve a problem, software engineers write a process to tell the machine how to act → the computer follows this process and produces results;
② Statistics: Analysts compare the relationships between variables;
③ Machine Learning: Data scientists use training datasets to teach computers how to act, and then the system performs the task. Large data exists first → machines learn to classify using training datasets, adjusting specific algorithms to achieve target classifications → the computer can learn to recognize relationships, trends, and patterns in the data;
④ Intelligent Applications: Intelligent applications use results obtained from artificial intelligence, such as the diagram showing a precision agriculture application based on data collected by drones.

5. Practical Applications of Machine Learning
Machine learning has many application scenarios. Here are a few examples; how would you use it?
  • Rapid 3D mapping and modeling: To build a railway bridge, PwC’s data scientists and domain experts applied machine learning to data collected by drones. The combination enabled precise monitoring and rapid feedback, and the project succeeded.
  • Enhanced analysis to reduce risks: To detect internal trading, PwC combined machine learning with other analytical techniques to develop a more comprehensive user profile and gained deeper insights into complex suspicious behaviors.
  • Predicting the best-performing targets: PwC used machine learning and other analytical methods to assess the potential of different racehorses in the Melbourne Cup.
2. The Evolution of Machine Learning
For decades, the various “tribes” of artificial intelligence researchers have competed for dominance. Is it now time for these tribes to unite? They may have to, because collaboration and algorithm fusion are the only ways to achieve true Artificial General Intelligence (AGI). Here is the evolutionary path of machine learning methods and what the future may look like.
1. Five Major Schools
① Symbolism: Uses symbols, rules, and logic to represent knowledge and perform logical reasoning, with favored algorithms being: rules and decision trees.
② Bayesian: Obtains probabilities to perform probabilistic reasoning, with favored algorithms being: naive Bayes or Markov models.
③ Connectionism: Uses probability matrices and weighted neurons to dynamically identify and generalize patterns, with favored algorithms being: neural networks.
④ Evolutionism: Generates variations and selects the optimal ones for specific goals, with favored algorithms being: genetic algorithms.
⑤ Analogizers: Optimize a function subject to constraints (going as high as possible without straying off the path), with favored algorithms being: support vector machines.

2. Stages of Evolution
1980s
  • Dominant School: Symbolism
  • Architecture: Servers or mainframes
  • Dominant Theory: Knowledge Engineering
  • Basic Decision Logic: Decision Support Systems, limited practicality
1990s to 2000
  • Dominant School: Bayesian
  • Architecture: Small server clusters
  • Dominant Theory: Probability Theory
  • Classification: Scalable comparisons or contrasts that are good enough for many tasks
Early to Mid 2010s
  • Dominant School: Connectionism
  • Architecture: Large server farms
  • Dominant Theory: Neuroscience and Probability
  • Recognition: More accurate image and sound recognition, translation, sentiment analysis, etc.

3. These Schools are Expected to Collaborate and Integrate Their Methods
Late 2010s
  • Dominant School: Connectionism + Symbolism
  • Architecture: Many clouds
  • Dominant Theory: Memory Neural Networks, Large-scale Integration, Knowledge-based Reasoning
  • Simple Q&A: Narrow, domain-specific knowledge sharing
2020s+
  • Dominant School: Connectionism + Symbolism + Bayesian + …
  • Architecture: Cloud Computing and Fog Computing
  • Dominant Theory: Perception with networks, reasoning and working with rules
  • Simple Perception, Reasoning, and Action: Limited automation or human-machine interaction
2040s+
  • Dominant School: Algorithm Fusion
  • Architecture: Ubiquitous Servers
  • Dominant Theory: Meta-learning of the best combinations
  • Perception and Response: Actions or responses based on knowledge or experience gained through multiple learning methods
3. Algorithms of Machine Learning

Which machine learning algorithm should you use? This largely depends on the nature and quantity of available data and your training goals in each specific use case. Do not use the most complex algorithm unless its results are worth the expensive costs and resources. Here are some of the most common algorithms, sorted by ease of use.
1. Decision Tree: A decision tree works through a stepwise sequence of responses, using hierarchical variables or decision nodes to, for example, classify a given user as creditworthy or not creditworthy.
  • Advantages: Good at assessing a series of different features, qualities, and characteristics of people, places, and things.
  • Example Scenarios: Rule-based credit assessment, race result prediction.
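A minimal scikit-learn sketch of rule-based credit assessment; the income and debt figures below are made up purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data: [annual income (k), existing debt (k)] -> 1 = creditworthy, 0 = not.
X = [[80, 5], [60, 10], [90, 2], [20, 40], [30, 35], [25, 50]]
y = [1, 1, 1, 0, 0, 0]

# A shallow tree learns simple threshold rules from the data.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[70, 8], [22, 45]]))  # -> [1 0]
```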

2. Support Vector Machine: Based on hyperplanes, support vector machines can classify data groups.
  • Advantages: Support vector machines excel at binary classification, separating one class of data from another, whether or not the relationship between the variables is linear.
  • Example Scenarios: News classification, handwriting recognition.
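A minimal scikit-learn sketch: a linear SVC fits a separating hyperplane between two made-up point clusters:

```python
from sklearn.svm import SVC

# Toy 2-D points: two linearly separable groups.
X = [[0, 0], [1, 1], [1, 0], [8, 8], [9, 9], [8, 9]]
y = [0, 0, 0, 1, 1, 1]

# The linear kernel fits a hyperplane (here, a line) between the two classes.
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[0.5, 0.5], [8.5, 8.5]]))  # -> [0 1]
```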
3. Regression: Regression maps the relationship between a dependent variable and one or more independent variables, and can be used, for example, to distinguish spam from non-spam emails.
  • Advantages: Regression can identify continuous relationships between variables, even if the relationship is not very obvious.
  • Example Scenarios: Road traffic flow analysis, email filtering.
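A minimal scikit-learn sketch of fitting a continuous relationship; the traffic numbers below are invented and exactly linear so the fit is easy to check:

```python
from sklearn.linear_model import LinearRegression

# Toy data: hours of rain vs. average traffic speed (km/h), illustrative only.
X = [[0], [1], [2], [3], [4]]
y = [60, 55, 50, 45, 40]  # exactly linear: speed = 60 - 5 * hours

# Fit a line and extrapolate to a new value of the independent variable.
reg = LinearRegression().fit(X, y)
print(round(reg.predict([[5]])[0], 1))  # -> 35.0
```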

4. Naive Bayes Classification: The naive Bayes classifier calculates the branch probabilities of possible conditions. Each feature is “naive”, that is, conditionally independent, so no feature affects the others. For example, in a jar containing five balls, two yellow and three red, what is the probability of drawing two yellow balls in a row? Drawing without replacement, the probability is 2/5 × 1/4 = 1/10. The naive Bayes classifier can calculate joint conditional probabilities over multiple features in the same way.
  • Advantages: The Naive Bayes method can quickly classify relevant objects with significant features in small datasets.
  • Example Scenarios: Sentiment analysis, consumer classification.
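The ball-drawing probability can be checked directly. Assuming the jar holds two yellow and three red balls (the composition consistent with the stated 1/10):

```python
from fractions import Fraction

yellow, total = 2, 5  # assumed composition: 2 yellow, 3 red

# P(first yellow) * P(second yellow | first yellow), drawing without replacement.
p = Fraction(yellow, total) * Fraction(yellow - 1, total - 1)
print(p)  # -> 1/10
```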
5. Hidden Markov Model: A visible Markov process is completely deterministic: a given state is reliably followed by another specific state, as in a traffic light. In contrast, a hidden Markov model infers hidden states from visible data, and by analyzing those hidden states it can estimate likely future observation patterns. For example, the probability of high or low pressure (the hidden state) can be used to predict the probabilities of sunny, rainy, and cloudy days.
  • Advantages: Allows for variability in data, suitable for recognition and prediction operations.
  • Example Scenarios: Facial expression analysis, weather forecasting.
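A minimal sketch of the weather example, with made-up transition and emission probabilities, uses the forward algorithm to score observation sequences (observations: 0 = sunny, 1 = rainy, 2 = cloudy):

```python
import numpy as np

# Hidden states: 0 = high pressure, 1 = low pressure. All probabilities below
# are illustrative, not real weather statistics.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],    # high -> high/low
                  [0.4, 0.6]])   # low  -> high/low
emit = np.array([[0.6, 0.1, 0.3],   # high pressure: sunny/rainy/cloudy
                 [0.1, 0.6, 0.3]])  # low pressure:  sunny/rainy/cloudy

def sequence_probability(obs):
    """Forward algorithm: P(observation sequence) under the HMM."""
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]  # propagate, then weight by emission
    return alpha.sum()

# Under this model, two sunny days are more likely than two rainy days.
print(sequence_probability([0, 0]) > sequence_probability([1, 1]))  # -> True
```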

6. Random Forest: The random forest algorithm improves on the precision of a single decision tree by using many trees, each built from a randomly selected subset of the data. For example, at the gene-expression level, a random forest can examine a large number of genes associated with breast cancer recurrence and estimate the risk of recurrence.
  • Advantages: The random forest method has been shown to be useful for large-scale datasets and items with many sometimes irrelevant features.
  • Example Scenarios: User churn analysis, risk assessment.
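A minimal scikit-learn sketch on a synthetic dataset with many features, most of them irrelevant noise, which is the setting where random forests tend to shine:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic dataset: 20 features, only 5 of them actually informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each of the 100 trees sees a bootstrap sample and random feature subsets;
# their majority vote is usually more accurate than any single tree.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = forest.score(X_te, y_te)
print(acc)  # held-out accuracy, typically around 0.9 here
```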
7. Recurrent Neural Network: In any neural network, each neuron transforms many inputs into a single output through one or more hidden layers. A recurrent neural network (RNN) additionally feeds its hidden values forward from one step of a sequence to the next, so the network learns across the sequence. In other words, an RNN has a form of memory: earlier inputs influence how later inputs are processed.
  • Advantages: RNNs have predictive capabilities when there is a large amount of ordered information.
  • Example Scenarios: Image classification and captioning, political sentiment analysis.
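A minimal numpy sketch of a single recurrent cell (toy sizes, random untrained weights, for illustration only) shows how the hidden state carries information forward:

```python
import numpy as np

rng = np.random.default_rng(0)

# A single recurrent cell: the hidden state h carries memory across steps.
W_x = rng.normal(size=(4, 3))  # input -> hidden weights
W_h = rng.normal(size=(4, 4))  # hidden -> hidden (the recurrent connection)

def rnn_step(h, x):
    return np.tanh(W_x @ x + W_h @ h)

h = np.zeros(4)
for x in [np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), np.array([0, 0, 1.0])]:
    h = rnn_step(h, x)  # each step folds a new input into the running state

# The final state depends on the whole sequence, not just the last input:
h_last_only = rnn_step(np.zeros(4), np.array([0, 0, 1.0]))
print(np.allclose(h, h_last_only))  # False here: earlier inputs left a trace
```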

8. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) Networks: Early forms of RNNs suffered from information loss and could retain only a small amount of early information. More recent LSTMs and GRUs have both long-term and short-term memory. In other words, these newer RNNs have better memory control: they can retain earlier values, or reset them when necessary, across many sequential steps, avoiding the "vanishing gradient" problem in which values passed from step to step gradually decay. LSTMs and GRUs control this memory through structures called "gates", which pass or reset values at the appropriate time.
  • Advantages: LSTMs and GRUs have the same advantages as other recurrent neural networks, but due to their better memory capabilities, they are used more frequently.
  • Example Scenarios: Natural language processing, translation.
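A minimal single-cell sketch (scalar weights chosen only for illustration, not a trained model) shows how the sigmoid gates let the cell state hold on to an early input:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One LSTM cell step, scalar toy version. The gates are sigmoids in (0, 1)
# deciding how much old memory to keep, how much new input to write,
# and how much memory to expose as output.
def lstm_step(c, h, x, w):
    f = sigmoid(w["f"] * x + w["uf"] * h)   # forget gate: keep old memory?
    i = sigmoid(w["i"] * x + w["ui"] * h)   # input gate: write new memory?
    o = sigmoid(w["o"] * x + w["uo"] * h)   # output gate: expose memory?
    c_new = f * c + i * np.tanh(w["c"] * x + w["uc"] * h)
    h_new = o * np.tanh(c_new)
    return c_new, h_new

w = dict(f=1.0, uf=0.5, i=1.0, ui=0.5, o=1.0, uo=0.5, c=1.0, uc=0.5)
c, h = 0.0, 0.0
for x in [1.0, 0.0, 0.0]:          # one informative input, then silence
    c, h = lstm_step(c, h, x, w)
print(c != 0.0)  # -> True: the cell state retained the early input
```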
9. Convolutional Neural Network: A convolution applies a small set of shared weights (a filter) across local regions of the input; successive layers combine these filtered features, and the final layers use them to label the output.
  • Advantages: Convolutional neural networks are very useful when there are very large datasets, many features, and complex classification tasks.
  • Example Scenarios: Image recognition, text-to-speech, drug discovery.
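A minimal sketch of the convolution operation itself (pure numpy, no training) shows shared filter weights sliding over an image:

```python
import numpy as np

# 2-D convolution (valid mode, no padding): slide a small filter over the
# image and take weighted sums; the same filter weights are used everywhere.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector applied to an image with a dark/bright boundary.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[-1.0, 1.0]])  # responds where brightness jumps
print(conv2d(image, kernel))      # each row reads [0. 1. 0.]: the 1 marks the edge
```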
Editor: Wang Jing