
- I finally got accepted into an artificial intelligence research program! But I don't know what the differences are between machine learning and deep learning; it feels like everything is deep learning.
- Wow, I heard that a senior has been tuning parameters for 10 months to prepare a T9 model with 200 billion parameters. I want to tune parameters for T10 and aim for Best Paper.
Today, research papers on traditional machine learning indeed make up only a small proportion of the field. Some people even complain that deep learning is just a systems engineering exercise with no mathematical depth.
However, it is undeniable that deep learning is incredibly useful! It greatly simplifies the overall algorithm analysis and learning pipeline of traditional machine learning, and, more importantly, it has pushed accuracy and precision on some common domain tasks to levels that traditional machine learning algorithms could never reach.
Deep learning has become particularly popular in recent years, much like big data did five years ago. However, deep learning primarily falls within the field of machine learning. In this article, we will discuss the differences in algorithm processes between machine learning and deep learning.

In fact, machine learning research is all about data science (which sounds a bit dull). Here are the main processes of machine learning algorithms:
(1) Dataset preparation
(2) Exploratory analysis of the data
(3) Data preprocessing
(4) Data splitting
(5) Building machine learning algorithm models
(6) Selecting machine learning tasks
(7) Finally, evaluating how well the machine learning algorithms perform on actual data

1.1 Dataset
First, we need to look at the data itself. The dataset is the starting point for building a machine learning model. In simple terms, a dataset is essentially an N×M matrix, where N is the number of rows (samples) and M is the number of columns (features).
Columns can be broken down into X and Y: X refers to the features, independent variables, or input variables, while Y refers to the class labels, dependent variables, or output variables.
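As a minimal sketch, assuming a tabular dataset stored in a CSV file with a label column named "label" (both the file name and the column name are hypothetical), the X/Y split might look like this in Python with pandas:

```python
import pandas as pd

# Hypothetical file name; any tabular dataset with a label column works the same way
df = pd.read_csv("dataset.csv")

# X: the feature columns (independent variables / inputs)
# y: the class label column (dependent variable / output), assumed here to be named "label"
X = df.drop(columns=["label"])
y = df["label"]

print(X.shape)  # (number of samples N, number of features M)
print(y.shape)  # (number of samples N,)
```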

1.2 Exploratory Data Analysis
Exploratory data analysis (EDA) is about getting a first feel for the data before modeling. Common visualizations include heatmaps (to identify internal correlations between features), box plots (to visualize group differences), scatter plots (to visualize correlations between features), and principal component analysis (to visualize the clustering structure of the dataset). Typical data manipulations at this stage include pivoting, grouping, and filtering the data.
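A rough sketch of these exploratory plots, assuming the df/X/y from the loading example above, all-numeric features, and placeholder column names ("feature_1", "feature_2", "label"), using matplotlib and scikit-learn:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# df, X, y are assumed to come from the data-loading sketch above

# Heatmap of pairwise feature correlations
plt.imshow(X.corr(), cmap="coolwarm")
plt.colorbar()
plt.title("Feature correlation heatmap")
plt.show()

# Box plot: distribution of one feature, grouped by class label
df.boxplot(column="feature_1", by="label")
plt.show()

# Scatter plot of two features, colored by (numeric) class label
plt.scatter(df["feature_1"], df["feature_2"], c=y)
plt.show()

# PCA projection to two dimensions to visualize clustering structure
pcs = PCA(n_components=2).fit_transform(X)
plt.scatter(pcs[:, 0], pcs[:, 1], c=y)
plt.title("PCA projection")
plt.show()
```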
1.3 Data Preprocessing
Data preprocessing is essentially about cleaning, organizing, and transforming the data. It refers to the various checks and corrections applied to the data to address issues such as missing values and spelling errors, to normalize/standardize values so that they are comparable, to transform data (e.g., logarithmic transformation), and so on.
For example, resizing images to a uniform size or resolution.
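A hedged illustration of a few typical cleaning steps with pandas, NumPy, and scikit-learn; the column name "income" and the chosen strategies are only examples, not a fixed recipe:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Fill missing numeric values with each column's median (one of several possible strategies)
X = X.fillna(X.median(numeric_only=True))

# Log-transform a heavily skewed feature ("income" is a placeholder column name)
X["income_log"] = np.log1p(X["income"])

# Standardize all features to zero mean and unit variance so they become comparable
X_scaled = StandardScaler().fit_transform(X)
```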
The quality of the data significantly impacts the quality of the machine learning algorithm model. Therefore, to achieve the best quality of machine learning models, a large portion of the work in traditional machine learning algorithm processes is actually focused on analyzing and processing the data.
Generally speaking, data preprocessing can easily take up 80% of the time in a machine learning project process, while the actual model building phase and subsequent model analysis only account for the remaining 20%.
1.4 Data Splitting
In the development process of machine learning models, we hope that the trained model performs well on new, unseen data. To simulate new, unseen data, we split the available data into two parts: the training set and the test set.
The first part is a larger subset of data used as the training set (e.g., 80% of the original data); the second part is usually a smaller subset used as the test set (the remaining 20% of the data).
Next, we use the training set to build a predictive model and then apply this trained model to the test set (i.e., as new, unseen data) to make predictions. The model's performance on the test set is used to compare candidate models, and hyperparameter optimization can also be performed at this stage to obtain the best model.
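For example, an 80/20 split with scikit-learn might look like the sketch below; the ratio and the random forest model are illustrative choices, not requirements:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# 80% training data, 20% test data; random_state is fixed only for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier()
model.fit(X_train, y_train)          # build the model on the training set only
print(model.score(X_test, y_test))   # evaluate on the unseen test set
```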

Another common method of data splitting is to divide the data into three parts:
(1) Training Set
(2) Validation Set
(3) Test Set
The training set is used to build the predictive model, while the validation set is used to evaluate it: predictions are made on the validation set, the model is tuned (e.g., hyperparameter optimization), and the best-performing model is selected based on the validation results.
The validation set is handled much like the training data in that it is used during model development: it is a sample set held out from training that serves to adjust the model's hyperparameters and to make a preliminary assessment of the model's capability, and validation is typically performed alongside training. The test set, by contrast, does not participate in building or tuning the model at all; it is kept aside for the final evaluation.
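One simple way to obtain such a three-way split with scikit-learn is to apply train_test_split twice; the 60/20/20 proportions below are only an example:

```python
from sklearn.model_selection import train_test_split

# First hold out the test set (20% of the data), untouched until the final evaluation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Then split the remainder into training (60% of all data) and validation (20% of all data)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
```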

In fact, data is the most valuable asset in the machine learning process. To make more economical use of the existing data, N-fold cross-validation is commonly used: the dataset is divided into N parts, and in each round one part is held out as test data while the remaining parts are used as training data to build the model. Repeating this over all N folds validates the machine learning process on every part of the data.
This cross-validation method is widely used in the machine learning process but is less common in deep learning.
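A minimal sketch of N-fold cross-validation with scikit-learn, assuming N = 5 and using a random forest as a stand-in classifier:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation: each fold takes a turn as the held-out test data
scores = cross_val_score(RandomForestClassifier(), X, y, cv=5)
print(scores.mean(), scores.std())
```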
Machine learning algorithms can generally be divided into one of the following three types:
(1) Supervised Learning
This is a machine learning task that establishes a mathematical (mapping) relationship between input X and output Y variables. Such (X, Y) pairs constitute the labeled data used to build the model to learn how to predict output from input.
(2) Unsupervised Learning
This is a machine learning task that only utilizes input X variables. The X variables are unlabeled data, and the learning algorithm uses the inherent structure of the data during modeling.
(3) Reinforcement Learning
This is a machine learning task that determines the next course of action through trial and error learning, striving to maximize reward returns.
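As a rough illustration of the first two types (reinforcement learning is omitted for brevity), supervised learning fits on (X, y) pairs while unsupervised learning sees only X; the specific models and the number of clusters below are arbitrary choices:

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Supervised learning: fit a mapping from inputs X_train to labels y_train
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Unsupervised learning: only the inputs are used; structure is found without labels
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X_train)
```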
Parameter Tuning
This is the main job of the legendary "parameter tuning expert". Hyperparameters are essentially parameters of the machine learning algorithm itself that directly affect the learning process and predictive performance. Since there is no universal hyperparameter setting that works well on all datasets, hyperparameter optimization is necessary.
For example, in random forests, two hyperparameters are commonly optimized: mtry and ntree. mtry (max_features in scikit-learn) is the number of features randomly sampled as split candidates at each node, while ntree (n_estimators) is the number of trees to grow.
Another machine learning algorithm that was still very mainstream ten years ago is the support vector machine (SVM). For an SVM with a radial basis function (RBF) kernel, the hyperparameters that need to be optimized are the C parameter and the gamma parameter. C is the penalty for misclassification and controls the trade-off between fitting the training data and keeping the decision boundary simple (and thus helps limit overfitting), while gamma controls the width of the RBF kernel.
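A hedged grid-search sketch for both examples with scikit-learn; the parameter grids are purely illustrative, and in scikit-learn mtry and ntree correspond to max_features and n_estimators:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Random forest: search over ntree (n_estimators) and mtry (max_features)
rf_search = GridSearchCV(
    RandomForestClassifier(),
    {"n_estimators": [100, 300, 500], "max_features": ["sqrt", 0.5]},
    cv=5,
)
rf_search.fit(X_train, y_train)
print(rf_search.best_params_)

# RBF-kernel SVM: search over C (misclassification penalty) and gamma (kernel width)
svm_search = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
svm_search.fit(X_train, y_train)
print(svm_search.best_params_)
```

Because GridSearchCV uses cross-validation internally on the training data, the test set can still remain untouched for the final evaluation.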
Tuning usually aims at finding the set of hyperparameter values that works best for the problem at hand. It is rarely about chasing a single "optimal" value; the "parameter tuning expert" is mostly a running joke. What is truly needed is to understand the algorithm's principles and to find parameters that suit the data and the model.
Feature Selection
Feature selection is the process of choosing a subset of features from a large number of initial features. Besides achieving high-accuracy models, one of the most important aspects of building machine learning models is obtaining actionable insights. To achieve this goal, it is crucial to select important subsets of features from a large pool.
The task of feature selection can itself constitute a new research area, where substantial efforts are made to design novel algorithms and methods. Among the many available feature selection algorithms, some classic methods are based on simulated annealing and genetic algorithms. In addition, there are many methods based on evolutionary algorithms (such as particle swarm optimization, ant colony optimization, etc.) and random methods (such as Monte Carlo).
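The methods mentioned above go well beyond a short snippet, but as a much simpler stand-in, recursive feature elimination (RFE) in scikit-learn gives a feel for what selecting a feature subset looks like in practice; the number of features to keep is an arbitrary choice here:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Recursively drop the least important features until 10 remain (10 is arbitrary)
selector = RFE(RandomForestClassifier(), n_features_to_select=10)
selector.fit(X_train, y_train)

print(selector.support_)   # boolean mask of the selected features
print(selector.ranking_)   # ranking of all features (1 = selected)
```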
Deep learning is essentially a paradigm within machine learning, so the two share largely the same overall process. However, deep learning streamlines the data analysis stage and shortens the modeling process, unifying the previously diverse machine learning algorithms under neural networks.
Before deep learning was widely adopted, the machine learning workflow required a great deal of time for collecting data, filtering data, and trying out various feature-extraction algorithms or combinations of features for classification and regression.
Here are the main processes of deep learning algorithms:

