Machine Learning and Deep Learning Algorithm Processes
I finally got into the AI research program, but I still don't know the difference between machine learning and deep learning; it feels like everything is deep learning these days. I even heard that a senior has spent 10 months tuning parameters to prepare the T9 model with 200 billion parameters. I want to tune parameters for T10 and try to win Best Paper.

It is true that papers on traditional machine learning now make up a relatively small share of research output, and some people complain that deep learning is mostly systems engineering with little mathematical substance.
It is undeniable, however, that deep learning is extremely useful: it greatly simplifies the overall algorithm design and learning pipeline of traditional machine learning, and more importantly, it has pushed accuracy on some general-domain tasks to levels that traditional machine learning algorithms could not reach.
Deep learning has been particularly popular in recent years, much like big data was five years ago. Fundamentally, though, deep learning is still a branch of machine learning. In this article, we therefore compare the algorithm workflows of machine learning and deep learning.

1. Machine Learning Algorithm Process
At its core, machine learning is the study of data, i.e., data science (which may sound a bit dry). The main steps of a machine learning algorithm workflow are listed below:
- Dataset Preparation
- Exploratory Data Analysis
- Data Preprocessing
- Data Splitting
- Machine Learning Algorithm Modeling
- Selecting the Machine Learning Task
- Finally, evaluating how well the machine learning algorithm performs on actual data

1.1 Dataset
First, we need to study the data. The dataset is the starting point for building machine learning models. In simple terms, a dataset is essentially an M×N matrix, where M represents columns (features) and N represents rows (samples).
The columns can be split into X and Y: X refers to the features, also called independent or input variables, while Y refers to the class label, also called the dependent or output variable.
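As a minimal sketch (the DataFrame and its column names below are invented for illustration), separating a tabular dataset into X and Y with pandas might look like this:

```python
import pandas as pd

# Hypothetical dataset: each row is a sample, each column a feature,
# plus one "label" column used as the target Y.
df = pd.DataFrame({
    "feature_1": [5.1, 4.9, 6.3, 5.8],
    "feature_2": [3.5, 3.0, 3.3, 2.7],
    "label":     [0, 0, 1, 1],
})

X = df.drop(columns=["label"])  # input variables (features)
Y = df["label"]                 # output variable (class label)

print(X.shape, Y.shape)  # (4, 2) (4,)
```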

1.2 Data Analysis
Exploratory data analysis (EDA) is conducted to gain a preliminary understanding of the data. Its main tasks include cleaning the data, describing it (descriptive statistics, charts), examining its distribution, comparing relationships between variables, developing intuition about the data, and summarizing it.
In simple terms, exploratory data analysis methods are about understanding the data, analyzing it, and clarifying its distribution. It focuses on the actual distribution of the data, emphasizing visualization so that analysts can easily see the underlying patterns in the data, which can inspire them to find suitable models for the data.
In a typical machine learning algorithm process and data science project, the first thing I do is to “focus on the data” to better understand it. The three main EDA methods I typically use include:
- Descriptive statistics: mean, median, mode, standard deviation.
- Data visualization: heatmaps (to identify correlations between features), box plots (to visualize group differences), scatter plots (to visualize correlations between features), principal component analysis (to visualize the cluster structure of the dataset), etc.
- Data reshaping: pivoting, grouping, filtering the data, etc.
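As a rough illustration of these three EDA steps with pandas (reusing the invented toy DataFrame from the earlier sketch):

```python
import pandas as pd

# The same hypothetical DataFrame as in the dataset example.
df = pd.DataFrame({
    "feature_1": [5.1, 4.9, 6.3, 5.8],
    "feature_2": [3.5, 3.0, 3.3, 2.7],
    "label":     [0, 0, 1, 1],
})

# 1. Descriptive statistics: mean, std, quartiles for every column.
print(df.describe())

# 2. Data visualization in its simplest numeric form: the correlation
#    matrix that a heatmap would normally display.
print(df.corr())

# 3. Data reshaping: group samples by label and summarize each group.
print(df.groupby("label").mean())
```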

1.3 Data Preprocessing
Data preprocessing refers to cleaning, organizing, and otherwise wrangling the data. It involves various checks and corrections: filling in missing values, fixing spelling errors, normalizing/standardizing values so they are comparable, transforming the data (e.g., logarithmic transformation), and so on.
For example, resizing images to a uniform size or resolution.
The quality of the data will significantly impact the quality of the machine learning algorithm model. Therefore, to achieve the best quality for the machine learning model, a large part of the traditional machine learning algorithm process involves analyzing and processing the data.
Generally speaking, data preprocessing can easily take up 80% of the time in a machine learning project process, while the actual model building phase and subsequent model analysis only account for about 20% of the remaining time.
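As a minimal sketch of two typical preprocessing steps (missing-value imputation and standardization) with scikit-learn; the small array is invented for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with a missing value (np.nan).
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 400.0],
])

# Fill missing values with the column mean.
X_imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Standardize each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_imputed)

print(X_scaled)
```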
1.4 Data Splitting
Training Set & Test Set
In the development process of machine learning models, we hope that the trained model will perform well on new, unseen data. To simulate new, unseen data, we split the available data, dividing the processed dataset into two parts: the training set and the test set.
The first part is a larger subset of data used as the training set (e.g., 80% of the original data); the second part is usually a smaller subset used as the test set (the remaining 20% of the data).
Next, we use the training set to build a predictive model and then apply this trained model to the test set (i.e., as new, unseen data) for prediction. The model's performance on the test set is used to select the best model, and hyperparameter optimization can also be carried out at this stage.
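A minimal sketch of an 80/20 split with scikit-learn (the built-in iris dataset is used only to keep the example self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load a toy dataset so the example is self-contained.
X, y = load_iris(return_X_y=True)

# 80% of the samples go to the training set, 20% to the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```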

Training Set & Validation Set & Test Set
Another common data splitting method is to divide the data into three parts:
- Training Set
- Validation Set
- Test Set
The training set is used to fit the predictive model, while the validation set is used to evaluate it during development; this allows model tuning (e.g., hyperparameter optimization) and selection of the best-performing model based on the validation results.
The validation set is handled much like the training data in that the model sees it during development, but it is a separate sample set held out during training, used for adjusting the model's hyperparameters and for a preliminary assessment of its capability. The test set, by contrast, takes no part in building or tuning the model at all and is reserved for the final evaluation. Typically, training and validation proceed simultaneously, with the validation set used to check the model's performance as training progresses.
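One common way to obtain such a three-way split (the 60/20/20 ratio used here is only an example) is to call train_test_split twice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve out 20% of the samples as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Then split the remaining 80% into training (60%) and validation (20%).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42  # 0.25 * 0.8 = 0.2
)

print(X_train.shape, X_val.shape, X_test.shape)
```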

Cross-Validation
Data is the most valuable asset in the machine learning process. To use the available data more efficiently, N-fold cross-validation is often applied: the dataset is divided into N folds, and in each round one fold is held out as the evaluation data while the remaining N-1 folds are used to train the model. Repeating this over all folds and averaging the results gives a more robust estimate of model performance.
This cross-validation method is widely used in machine learning processes, but it is used less frequently in deep learning.
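A minimal 5-fold cross-validation sketch with scikit-learn (the random forest classifier is just an example choice):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold serves once as the held-out data.
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5)

print(scores)         # accuracy on each of the 5 folds
print(scores.mean())  # averaged estimate of model performance
```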

1.5 Machine Learning Algorithm Modeling
Now we come to the most interesting part. The data screening and processing steps are tedious, but we can finally use the carefully prepared data for modeling. Based on the type of the target variable (usually referred to as the Y variable), we build either a classification or a regression model.
Machine Learning Algorithms
Machine learning algorithms can generally be classified into one of the following three types:
Supervised Learning
This is a machine learning task that establishes a mathematical (mapping) relationship between the input variables X and the output variable Y. Such (X, Y) pairs constitute the labeled data used to build a model that learns to predict the output from the input.
Unsupervised Learning
This is a machine learning task that only uses input X variables. The X variables are unlabeled data, and the learning algorithm uses the inherent structure of the data for modeling.
Reinforcement Learning
This is a machine learning task that decides the next course of action through trial and error learning, striving to maximize the reward.
Parameter Tuning
This is the work that the legendary "parameter-tuning expert" mainly does. Hyperparameters are essentially parameters of the machine learning algorithm itself that directly affect the learning process and the prediction performance. Since no universal hyperparameter setting works for all datasets, hyperparameter optimization is necessary.
For example, when using Random Forest, two hyperparameters are commonly optimized: mtry and ntree (called max_features and n_estimators in scikit-learn). mtry (max_features) is the number of variables randomly sampled as split candidates at each node, while ntree (n_estimators) is the number of trees to grow.
Another mainstream machine learning algorithm from ten years ago is Support Vector Machine (SVM). The hyperparameters to optimize are the C parameter and the gamma parameter for the Radial Basis Function (RBF) kernel. The C parameter is a penalty term that limits overfitting, while the gamma parameter controls the width of the RBF kernel.
Tuning aims to find a good set of hyperparameter values. Often the point is not to chase the single optimal value, but to understand the algorithm's principles and find parameters that suit the data and the model.
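A minimal grid-search sketch over the two Random Forest hyperparameters mentioned above, using their scikit-learn names (the grid values are arbitrary examples):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Grid over the scikit-learn equivalents of mtry and ntree.
param_grid = {
    "max_features": [1, 2, "sqrt"],  # variables considered at each split
    "n_estimators": [50, 100, 200],  # number of trees to grow
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,  # 5-fold cross-validation for each combination
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```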
Feature Selection
Feature selection is the process of selecting a subset of features from the initial large set of features. Besides achieving high-accuracy models, obtaining actionable insights is a crucial aspect of building machine learning models; thus, it is essential to select important feature subsets from a large number of features.
The task of feature selection can itself form a new research field, where considerable efforts are made to design novel algorithms and methods. Among various available feature selection algorithms, some classical methods are based on simulated annealing and genetic algorithms. Additionally, there are many methods based on evolutionary algorithms (like particle swarm optimization, ant colony optimization, etc.) and stochastic methods (like Monte Carlo).
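The metaheuristic methods above are beyond a short example, but as a minimal sketch of the general idea, here is a simple filter-style selection with scikit-learn (SelectKBest is a common baseline, not one of the methods named above):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 4 features

# Keep the 2 features with the highest ANOVA F-scores against the label.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)     # (150, 4) -> (150, 2)
print(selector.get_support(indices=True))  # indices of the selected features
```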

1.6 Machine Learning Tasks
In supervised learning, two common machine learning tasks include classification and regression.
Classification
A trained classification model takes a set of variables as input and predicts the output class label. The following diagram illustrates three classes represented by different colors and labels. Each small colored sphere represents a data sample. The three classes of data samples are displayed in two dimensions; this visualization can be created by performing PCA analysis and displaying the first two principal components (PC) or by simply choosing a scatter plot of two variables.

Performance Metrics
How can we know if the trained machine learning model performs well or poorly? By using performance evaluation metrics. Some common metrics for evaluating classification performance include Accuracy (AC), Sensitivity (SN), Specificity (SP), and Matthews Correlation Coefficient (MCC).
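As a minimal sketch, these four metrics can be computed from a model's predictions with scikit-learn (the label arrays below are made up):

```python
from sklearn.metrics import accuracy_score, confusion_matrix, matthews_corrcoef

# Hypothetical true and predicted binary labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

ac = accuracy_score(y_true, y_pred)    # accuracy (AC)
sn = tp / (tp + fn)                    # sensitivity (SN), a.k.a. recall
sp = tn / (tn + fp)                    # specificity (SP)
mcc = matthews_corrcoef(y_true, y_pred)

print(ac, sn, sp, mcc)
```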
Regression
The simplest regression model can be well summarized by the following simple equation: Y = f(X). Here, Y corresponds to the quantitative output variable, X refers to the input variable, and f denotes the mapping function that computes the output value as a function of input features (derived from the machine learning model).
The essence of the regression example formula is that if X is known, Y can be inferred. Once Y is computed (predicted), a popular visualization method is to create a simple scatter plot comparing actual values with predicted values, as shown in the following diagram.

Evaluating the performance of regression models assesses how accurately the fitted model can predict input data values. Common metrics for evaluating regression model performance include the coefficient of determination (R²). Additionally, Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) are also commonly used metrics for measuring residuals or prediction errors.
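A minimal regression sketch that fits Y = f(X) and computes R², MSE, and RMSE with scikit-learn (the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic data: y is roughly 3x + 2 with a little noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(0, 1, size=100)

model = LinearRegression().fit(X, y)  # learn f: X -> Y
y_pred = model.predict(X)

r2 = r2_score(y, y_pred)              # coefficient of determination
mse = mean_squared_error(y, y_pred)   # mean squared error
rmse = np.sqrt(mse)                   # root mean squared error

print(r2, mse, rmse)
```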
2. Deep Learning Algorithm Process
Deep learning is a paradigm within machine learning, so the overall process is quite similar. Deep learning streamlines the data analysis stage and shortens the modeling process, unifying the previously diverse machine learning algorithms under neural networks.
Before deep learning was widely adopted, the machine learning workflow required a great deal of time to collect and screen data, and to try out various feature extraction algorithms or combinations of features for classification and regression.

Below is the main process of deep learning algorithms:
- Dataset Preparation
- Data Preprocessing
- Data Splitting
- Defining the Neural Network Model
- Training the Network
Deep learning does not require us to extract features manually; instead, it automatically performs high-dimensional abstract learning on the data through neural networks, reducing the complexity of feature engineering and saving a lot of time in this regard.
At the same time, however, because deep learning introduces deeper and more complex network structures, the tuning work becomes heavier: defining the neural network architecture, choosing the loss function, choosing the optimizer, and then repeatedly adjusting the model's parameters, as sketched below.
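A minimal sketch of these steps in PyTorch (the architecture, loss function, and optimizer chosen here are arbitrary examples, not a recommendation from the article):

```python
import torch
from torch import nn

# Synthetic data: 100 samples, 4 features, binary labels.
X = torch.randn(100, 4)
y = torch.randint(0, 2, (100,))

# 1. Define the neural network model structure.
model = nn.Sequential(
    nn.Linear(4, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

# 2. Choose the loss function and 3. the optimizer.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# 4. Train the network, repeatedly adjusting the model parameters.
for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(loss.item())
```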
Author: ZOMI
Link: https://zhuanlan.zhihu.com/p/455602945