1. Overview of Machine Learning

1) What is Machine Learning

Artificial Intelligence (Artificial intelligence) is a new technical science that studies and develops theories, methods, technologies, and application systems to simulate, extend, and enhance human intelligence. It is a broad and vague concept, and the ultimate goal of artificial intelligence is to enable computers to simulate human thinking and behavior.

Artificial intelligence began to emerge around the 1950s, but its development was slow due to limitations in data and hardware devices.

Illustration of 72 Basic Machine Learning Concepts

Machine Learning (Machine learning) is a subset of artificial intelligence and a means to achieve artificial intelligence, but it is not the only way. It is a discipline that specifically studies how computers can simulate or implement human learning behaviors to acquire new knowledge or skills, reorganizing existing knowledge structures to continuously improve their performance. It began to flourish around the 1980s, giving birth to a large number of machine learning models related to mathematical statistics.

Deep Learning (Deep learning) is a subset of machine learning inspired by the human brain, consisting of artificial neural networks (ANN) that mimic similar structures present in the human brain. In deep learning, learning is conducted through a deep, multi-layered “network” of interconnected “neurons”. The term “depth” usually refers to the number of hidden layers in the neural network. It experienced explosive growth after 2012 and is widely used in many scenarios.

Let’s take a look at how famous scholars abroad define machine learning:

Machine learning studies how computers can simulate human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve themselves.

From a practical perspective, machine learning, supported by big data, enables machines to perform deep statistical analysis on data through various algorithms for “self-learning,” allowing artificial intelligence systems to gain inductive reasoning and decision-making capabilities.

Through the classic spam filtering application, we can further understand the principles of machine learning and what T, E, P in the definition refer to:

2) Three Elements of Machine Learning

The three elements of machine learning include data, model, and algorithm. The relationship between these three elements can be represented by the following diagram:

(1) Data

Data-driven: Data-driven refers to our decision-making supported by objective quantitative data through active data collection and analysis. In contrast, experience-driven refers to what we commonly call “guessing”.

Illustration of 72 Basic Machine Learning Concepts

(2) Model & Algorithm

Model: In the context of AI data-driven, a model refers to a hypothesis function that makes decisions Y based on data X, which can take different forms, such as computational or rule-based.

Algorithm: Refers to the specific computational methods of learning models. Statistical learning is based on training datasets, selecting the optimal model from the hypothesis space according to learning strategies, and finally considering what computational methods to use to solve the optimal model. It is usually an optimization problem.

Illustration of 72 Basic Machine Learning Concepts

3) Development History of Machine Learning

The term artificial intelligence first appeared in 1956 to explore effective solutions to certain problems. In 1960, the U.S. Department of Defense trained computers to mimic human reasoning processes using the concept of “neural networks”.

Before 2010, tech giants like Google and Microsoft improved machine learning algorithms, elevating query accuracy to new heights. Later, with the increase in data volume, advanced algorithms, and improvements in computing and storage capacity, machine learning further developed.

4) Core Technologies of Machine Learning

Classification: Applications that train models with classified data and accurately classify and predict new samples based on the model.

Clustering: Identifying similarities and differences in massive data and aggregating them into multiple categories based on maximum commonalities.

Outlier Detection: Analyzing the distribution patterns of data points to identify outliers that differ significantly from normal data.

Regression: Finding the best fitting parameters for the model based on training known attribute value data and predicting the output value of new samples based on the model.

5) Basic Workflow of Machine Learning

The machine learning workflow includes several steps: data preprocessing (Processing), model learning (Learning), model evaluation (Evaluation), and new sample prediction (Prediction).

Illustration of 72 Basic Machine Learning Concepts

Data Preprocessing: Input (raw data + labels) → processing (feature processing + scaling, feature selection, dimensionality reduction, sampling) → output (test set + training set).

Model Learning: Model selection, cross-validation, result evaluation, hyperparameter selection.

Model Evaluation: Understanding the model’s score on the test dataset.

New Sample Prediction: Predicting the test set.

6) Application Scenarios of Machine Learning

As a data-driven method, machine learning has been widely applied in data mining, computer vision, natural language processing, biometric recognition, search engines, medical diagnosis, credit card fraud detection, securities market analysis, DNA sequencing, speech and handwriting recognition, and robotics.

Illustration of 72 Basic Machine Learning Concepts

Intelligent Healthcare: Intelligent prosthetics, exoskeletons, healthcare robots, surgical robots, intelligent health management, etc.

Facial Recognition: Access control systems, attendance systems, facial recognition anti-theft doors, electronic passports, and ID cards, and can also use facial recognition systems and networks to capture fugitives nationwide.

Robotics Control: Industrial robots, robotic arms, multi-legged robots, vacuum robots, drones, etc.

2. Basic Terminology of Machine Learning

Illustration of 72 Basic Machine Learning Concepts

Supervised Learning (Supervised Learning): The training set has labeled information, and the learning methods include classification and regression.

Unsupervised Learning (Unsupervised Learning): The training set has no labeled information, and the learning methods include clustering and dimensionality reduction.

Reinforcement Learning (Reinforcement Learning): A learning method with delayed and sparse feedback labels.

Illustration of 72 Basic Machine Learning Concepts

Example/Sample: A piece of data in the dataset above.

Attribute/Feature: “Color”, “Root”, etc.

Attribute Space/Sample Space/Input Space X: The space formed by all attributes.

Feature Vector: A coordinate vector corresponding to each point in the space.

Label: Information about the result of the example, such as (“Color=Green, Root=Curled, Sound=Blurry”, Good Melon), where “Good Melon” is called the label.

Classification: If the prediction is for discrete values, such as “Good Melon”, “Bad Melon”, this type of learning task is called classification.

Hypothesis: The learned model corresponds to a certain underlying regularity about the data.

Truth: The underlying regularity itself.

Learning Process: Aimed at finding or approximating the truth.

Generalization Ability: The ability of the learned model to apply to new samples. Generally speaking, the larger the training sample, the more likely it is to obtain a model with strong generalization ability through learning.

3. Classification of Machine Learning Algorithms

1) Problem Scenarios Based on Machine Learning Algorithms

Machine learning has developed into a multidisciplinary field over the past 30 years, involving probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and many other disciplines. The theory of machine learning mainly designs and analyzes algorithms that allow computers to automatically “learn”.

Machine learning algorithms automatically analyze patterns from data and use those patterns to predict unknown data.

The theory of machine learning focuses on effective and feasible learning algorithms. Many inference problems are difficult to follow programmatically, so some machine learning research develops easily manageable approximate algorithms.

Illustration of 72 Basic Machine Learning Concepts

The main categories of machine learning include: supervised learning, unsupervised learning, and reinforcement learning.

Illustration of 72 Basic Machine Learning Concepts

Supervised Learning: A function is learned from the given training dataset, allowing predictions based on this function when new data arrives. The training set for supervised learning requires both input and output, or features and targets. The targets in the training set are labeled by humans. Common supervised learning algorithms include regression analysis and statistical classification.

Unsupervised Learning: Unlike supervised learning, the training set has no manually labeled results. Common unsupervised learning algorithms include Generative Adversarial Networks (GAN) and clustering.

Reinforcement Learning: Learning how to take actions by observing. Each action affects the environment, and the learning object makes judgments based on the feedback observed from the surrounding environment.

2) Classification Problems

Classification problems are a very important component of machine learning. The goal is to determine which known sample class a new sample belongs to based on certain features of known samples. Classification problems can be subdivided as follows:

Binary Classification Problem: Indicates that there are two categories in the classification task for which the new sample belongs to a known sample class.

Multiclass Classification (Multiclass classification) problem: Indicates that there are multiple categories in the classification task.

Multilabel Classification (Multilabel classification) problem: Assigning a series of target labels to each sample.

3) Regression Problems

Illustration of 72 Basic Machine Learning Concepts

4) Clustering Problems

Learn more about machine learning clustering algorithms: Clustering Algorithms.

Illustration of 72 Basic Machine Learning Concepts

5) Dimensionality Reduction Problems

Learn more about machine learning dimensionality reduction algorithms: PCA Dimensionality Reduction Algorithm.

4. Model Evaluation and Selection in Machine Learning

1) Machine Learning and Data Fitting

The most typical supervised learning in machine learning consists of classification and regression problems. In classification problems, we learn a “decision boundary” to distinguish data; in regression problems, we learn to fit a curve to the sample distribution.

2) Training Set and Dataset

Using housing price estimation as an example, let’s discuss the concepts involved.

Training Set (Training Set): Helps train the model; simply put, it involves using the data from the training set to determine the parameters of the fitting curve.

Test Set (Test Set): Used to test the accuracy of the trained model.

However, the test set does not guarantee the correctness of the model; it only indicates that similar data will yield similar results using this model. During model training, all parameters are adjusted and fitted based on the existing training set data, which may lead to overfitting, meaning that the parameters fit accurately for the training set data but may perform poorly when predicting new data.

3) Empirical Error

Learning on the data in the training set. The error of the model on the training set is called “empirical error”. However, empirical error is not always better when smaller, as we hope to achieve good prediction results on new, unseen data.

4) Overfitting

Overfitting refers to a model that performs well on the training set but performs poorly on cross-validation and test sets, indicating that the model’s predictive performance on unknown samples is generally poor, with low generalization ability.

Illustration of 72 Basic Machine Learning Concepts

How to prevent overfitting?? Common methods include Early Stopping, Data Augmentation, Regularization, Dropout, etc.

Regularization: Refers to adding a regularization term at the end of the objective function, typically L1 regularization and L2 regularization. L1 regularization is based on the L1 norm, adding the L1 norm of the parameters as a term in the objective function, which is the sum of the absolute values of the parameters.

Data Augmentation: Refers to obtaining more data that meets the requirements, which is either independent and identically distributed with the existing data or approximately independent and identically distributed. Common methods include collecting more data from the source, duplicating existing data with random noise, resampling, estimating data distribution parameters based on the current dataset, and using that distribution to generate more data.

DropOut: Achieved by modifying the structure of the neural network itself.

5) Bias

Bias usually refers to the degree of deviation in model fitting. Given numerous training sets, the expected model to fit is the average model. Bias is the difference between the true model and the average model.

A simple model is a set of straight lines, and the average model obtained after averaging is a straight dashed line, which differs significantly from the true model curve (the gray shadow area is large). Therefore, simple models usually have high bias.

Illustration of 72 Basic Machine Learning Concepts

A complex model is a set of wavy lines, and after averaging, the maximum and minimum values will cancel each other out, resulting in a small difference from the true model curve, so complex models usually have low bias (the yellow curve and green dashed line are almost overlapping).

Illustration of 72 Basic Machine Learning Concepts

6) Variance

Variance usually refers to the stability (simplicity) of the model. Simple models have corresponding functions that are identical, all being horizontal lines, and the average model’s function is also a horizontal line, so simple models have very low variance and are not sensitive to data fluctuations.

Illustration of 72 Basic Machine Learning Concepts

Complex models have corresponding functions that are highly variable and have no rules, while the average model’s function is a smooth curve, resulting in high variance for complex models and high sensitivity to data fluctuations.

Illustration of 72 Basic Machine Learning Concepts

7) Balancing Bias and Variance

8) Performance Metrics

Performance metrics are numerical evaluation standards for measuring the generalization ability of models, reflecting the current problem (task requirements). Using different performance metrics may lead to different evaluation results.

(1) Regression Problems

The judgment of the “goodness” of a model not only depends on the algorithm and data but also on the current task requirements. Common performance metrics for regression problems include: Mean Absolute Error, Mean Square Error, Root Mean Square Error, R-squared, etc.

Illustration of 72 Basic Machine Learning Concepts

Mean Absolute Error (Mean Absolute Error, MAE), also known as Mean Absolute Deviation, is the average of the absolute deviations of all label values from the regression model’s predicted values.

Mean Absolute Percentage Error (Mean Absolute Percentage Error, MAPE) is an improvement of MAE that considers the proportion of absolute error relative to the true value.

Mean Square Error (Mean Square Error, MSE) is the average of the squared deviations of all label values from the regression model’s predicted values, compared to Mean Absolute Error.

Root Mean Square Error (Root-Mean-Square Error, RMSE), also known as Standard Error, is calculated based on Mean Square Error by taking the square root. RMSE is used to measure the deviation between observed values and true values.

R-squared, the coefficient of determination, reflects the proportion of total variation in the dependent variable that can be explained by the independent variables in the current regression model. The closer the proportion is to 1, the better the current regression model explains the data and describes the true distribution of the data.

(2) Classification Problems

Common performance metrics for classification problems include Error Rate, Accuracy, Precision, Recall, F1, ROC Curve, AUC Curve, and R-squared, etc.

Error Rate: The proportion of misclassified samples to the total number of samples.

Accuracy: The proportion of correctly classified samples to the total number of samples.

Precision (also known as accuracy), refers to the proportion of truly correct cases among the results returned after retrieval compared to the total number of results deemed correct.

Recall (also known as sensitivity), refers to the proportion of truly correct cases among the total number of truly correct cases (both retrieved and not retrieved).

F1 is a metric that considers both precision and recall, defined based on the harmonic mean of precision and recall: F1 metric’s general form – Fβ, allows us to express different preferences for precision and recall.

ROC Curve (Receiver Operating Characteristic Curve) is a comprehensive consideration of the quality of probability prediction ranking, reflecting the “expected generalization performance” of the learner under different tasks. The vertical axis of the ROC curve is the “True Positive Rate” (TPR), and the horizontal axis is the “False Positive Rate” (FPR).

AUC (Area Under ROC Curve) is the area under the ROC curve, representing the quality of sample prediction ranking.

From a higher perspective, understanding AUC: still taking the identification of abnormal users as an example, a high AUC value means that the model can identify as many abnormal users as possible while maintaining a low false positive rate for normal users (not misclassifying a large number of normal users as abnormal in order to identify abnormal users).

9) Evaluation Methods

How can we reliably evaluate when we don’t have unknown samples? The key is to obtain reliable “test set data” (Test Set), meaning that the test set (used for evaluation) should be mutually exclusive from the training set (used for model learning).

Common evaluation methods include: Hold-out, Cross Validation, Bootstrap.

Hold-out is one of the most common evaluation methods in machine learning, where a validation sample set is reserved from the training data; this part of the data is not used for training but for model evaluation.

Another common evaluation method in machine learning is Cross Validation. K-fold cross-validation averages the results of k different group trainings to reduce variance, making the model’s performance less sensitive to data partitioning and utilizing data more fully, resulting in more stable model evaluation results.

Bootstrap is a non-parametric method used to estimate overall values with small samples, widely applied in evolutionary and ecological research.

Bootstrap generates a large number of pseudo-samples through sampling with replacement, and by calculating these pseudo-samples, it obtains the distribution of statistics to estimate the overall distribution of data.

10) Model Tuning and Selection Criteria

We hope to find a model that has good expressive power for the current problem and a lower model complexity:

A model with good expressiveness can effectively learn the patterns and rules in the training data;
A model with low complexity has a smaller variance, is less prone to overfitting, and has better generalization performance.

11) How to Choose the Optimal Model

(1) Validation Set Evaluation Selection

Split the data into training and validation sets.
For the prepared candidate hyperparameters, train the model on the training set and evaluate it on the validation set.

(2) Grid Search/Random Search Cross Validation

Generate candidate hyperparameter sets through grid search/random search.
For each set of hyperparameters, use cross-validation to evaluate the performance.
Select the best-performing hyperparameters.

(3) Bayesian Optimization

Hyperparameter tuning based on Bayesian optimization.

(Source: “Accompanying Data” WeChat Official Account)

Click to Download Subscription Order Click Online Fill Order

Subscription Steps

Traditional Subscription:① Download Subscription Order→② Fill in Subscription Order→③ Transfer Payment→④ Email Subscription Order and Payment Receipt to[email protected]→⑤ Phone Confirmation, CompleteSubscription.

Online Subscription:① Fill in Subscription Order→② Phone Confirmation, Complete Subscription.

Contact Information

Zhang Xiuhua: 020-87592072

Su Si: 020-87515169

Chen Feiyi: 020-87592003

Email: [email protected]

1. Overview of Machine Learning

1) What is Machine Learning

2) Three Elements of Machine Learning

(1) Data

(2) Model & Algorithm

3) Development History of Machine Learning

4) Core Technologies of Machine Learning

5) Basic Workflow of Machine Learning

6) Application Scenarios of Machine Learning

2. Basic Terminology of Machine Learning

3. Classification of Machine Learning Algorithms

1) Problem Scenarios Based on Machine Learning Algorithms

2) Classification Problems

4) Clustering Problems

5) Dimensionality Reduction Problems

4. Model Evaluation and Selection in Machine Learning

1) Machine Learning and Data Fitting

2) Training Set and Dataset

3) Empirical Error

4) Overfitting

5) Bias

8) Performance Metrics

(1) Regression Problems

(2) Classification Problems

9) Evaluation Methods

10) Model Tuning and Selection Criteria

11) How to Choose the Optimal Model

(1) Validation Set Evaluation Selection

(2) Grid Search/Random Search Cross Validation

(3) Bayesian Optimization

Leave a Comment Cancel reply