What Are the Differences Between Machine Learning and Deep Learning?

Deep learning has become particularly popular in recent years, much as big data did five years ago. Deep learning is, however, a subfield of machine learning. This article discusses how the algorithm workflows of machine learning and deep learning differ.
The Algorithm Processes of Machine Learning and Deep Learning
  • I finally got accepted into an artificial intelligence research program! I don’t know what the differences are between machine learning and deep learning; it feels like everything is deep learning.

  • Wow, I heard that a senior has been tuning parameters for 10 months to prepare for the T9 model with 200 billion parameters. I want to tune parameters for T10 to aim for Best Paper.


Papers on traditional machine learning now make up only a small share of published research. Some people complain that deep learning is merely a systems engineering exercise with no mathematical depth.

However, deep learning is undeniably useful. It greatly simplifies the analysis and learning workflow of traditional machine learning and, more importantly, it has set new accuracy records on common tasks in several domains, reaching levels that traditional machine learning algorithms could not.

01
The Algorithm Process of Machine Learning

In practice, machine learning research is largely data science (which sounds a bit dull). The main steps of a machine learning workflow are:

(1) Dataset preparation

(2) Exploratory analysis of the data

(3) Data preprocessing

(4) Data splitting

(5) Building machine learning algorithm models

(6) Selecting machine learning tasks

(7) Finally, evaluating how well the machine learning algorithms perform on actual data


1.1 Dataset

First, consider the data. The dataset is the starting point for building machine learning models. In simple terms, a dataset is essentially an M×N matrix, where M is the number of rows (samples) and N is the number of columns (features).

Columns can be broken down into X and Y, where X can refer to features, independent variables, or input variables. Y can refer to class labels, dependent variables, and output variables.
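As a minimal sketch of this layout, the plain-Python snippet below stores a tiny dataset as a list of rows and separates it into X and Y. The column names and values are made up purely for illustration.

```python
# Toy dataset: each row is one sample; the last column is the label Y.
# Column names and values are invented for illustration.
columns = ["height_cm", "weight_kg", "label"]
rows = [
    [170.0, 65.0, "A"],
    [182.0, 80.0, "B"],
    [158.0, 52.0, "A"],
    [175.0, 90.0, "B"],
]

# X holds the feature columns, Y holds the label column.
X = [row[:-1] for row in rows]
Y = [row[-1] for row in rows]

n_samples = len(X)
n_features = len(X[0])
print(f"{n_samples} samples x {n_features} features")
```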

1.2 Data Analysis
Exploratory data analysis (EDA) is carried out to gain a preliminary understanding of the data. Its main tasks include cleaning the data, describing the data (descriptive statistics, charts), checking the data distribution, comparing relationships between variables, developing intuition about the data, and summarizing the data.
The goal of exploratory data analysis is to understand the data and clarify how it is distributed. It focuses on the true distribution of the data and emphasizes visualization, so that analysts can see the patterns hidden in the data and use those insights to find suitable models.
In a typical machine learning workflow or data science project, the first thing I do is “focus on the data” to understand it better. The three main EDA methods I typically use are:
Descriptive Statistics
Mean, median, mode, standard deviation.
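These four statistics can be computed with Python's built-in `statistics` module. The sketch below uses a made-up feature column; note that `statistics.stdev` returns the sample standard deviation (dividing by n - 1).

```python
import statistics

# Made-up values of a single feature, for illustration only.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequent value
stdev = statistics.stdev(values)    # sample standard deviation (n - 1)

print(mean, median, mode, round(stdev, 3))
```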
Data Visualization

Heatmap (to identify internal correlations of features), box plot (to visualize group differences), scatter plot (to visualize correlations between features), principal component analysis (to visualize the clustering distribution presented in the dataset), etc.
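A correlation heatmap is a colored rendering of a pairwise correlation matrix. The sketch below computes that matrix in plain Python for three made-up feature columns; the `pearson` helper is written here for illustration, and the plotting step is left to whichever charting library is in use.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Made-up feature columns, for illustration only.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with f1
    "f3": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as f1 rises
}

# Pairwise correlation matrix: the numbers a heatmap would color.
names = list(features)
corr = {a: {b: pearson(features[a], features[b]) for b in names} for a in names}

for a in names:
    print(a, [round(corr[a][b], 2) for b in names])
```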

Data Reshaping

Pivoting, grouping, filtering the data, etc.
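Filtering and grouping can be sketched in a few lines of plain Python. The records, the threshold, and the choice of the mean as the aggregate are all assumptions made for this example.

```python
from collections import defaultdict

# Made-up records, for illustration only.
records = [
    {"group": "a", "value": 1.0},
    {"group": "b", "value": 4.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 6.0},
    {"group": "c", "value": 10.0},
]

# Filtering: keep only records whose value is below a threshold.
filtered = [r for r in records if r["value"] < 8.0]

# Grouping: collect values per group, then aggregate with the mean.
by_group = defaultdict(list)
for r in filtered:
    by_group[r["group"]].append(r["value"])

group_means = {g: sum(v) / len(v) for g, v in by_group.items()}
print(group_means)
```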


1.3 Data Preprocessing

Data preprocessing is essentially about cleaning, organizing, or otherwise processing the data. It covers the checks and corrections applied to the data, such as handling missing values, fixing spelling errors, normalizing or standardizing values so that they are comparable, and transforming data (e.g., logarithmic transformation).

For example, resizing images to a uniform size or resolution.
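For tabular data, three of the steps named above (filling missing values, standardizing, and log-transforming) might look like the sketch below. The helper names are hypothetical, mean imputation is only one of several possible strategies, and `standardize` assumes the feature is not constant.

```python
import math

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def log_transform(values):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]

raw = [1.0, None, 3.0, 5.0]    # made-up feature with one missing value
filled = impute_mean(raw)      # missing entry becomes the observed mean
scaled = standardize(filled)
logged = log_transform(filled)
print(filled, [round(s, 3) for s in scaled])
```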

The quality of the data significantly impacts the quality of the machine learning algorithm model. Therefore, to achieve the best quality of machine learning models, a large portion of the work in traditional machine learning algorithm processes is actually focused on analyzing and processing the data.

As a rule of thumb, data preprocessing can easily take up around 80% of the time in a machine learning project, while building the model and analyzing it afterwards account for the remaining 20%.

1.4 Data Splitting
Training Set & Test Set

In the development process of machine learning models, we hope that the trained model performs well on new, unseen data. To simulate new, unseen data, we split the available data into two parts: the training set and the test set.

The first part is a larger subset of data used as the training set (e.g., 80% of the original data); the second part is usually a smaller subset used as the test set (the remaining 20% of the data).

Next, we use the training set to build a predictive model, then apply this trained model to the test set (i.e., as new, unseen data) for predictions. The performance of the model on the test set is used to select the best model, and hyperparameter optimization can also be performed to obtain the best model.
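A minimal sketch of the 80/20 split described above, using only the standard library. `split_train_test` is a hypothetical helper written for this example; the fixed seed is an assumption that makes the shuffle reproducible.

```python
import random

def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and cut off the last part as the test set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    split = len(shuffled) - n_test
    return shuffled[:split], shuffled[split:]

data = list(range(10))   # stand-in for 10 samples
train, test = split_train_test(data)
print(len(train), len(test))
```

Shuffling before cutting matters when the original data is ordered (for example, sorted by class), since an unshuffled split would then give unrepresentative subsets.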

Training Set & Validation Set & Test Set

Another common method of data splitting is to divide the data into three parts:

(1) Training Set

(2) Validation Set

(3) Test Set

The training set is used to build the predictive model, while the validation set is used for evaluation, allowing for predictions and model tuning (e.g., hyperparameter optimization), and selecting the best-performing model based on the validation set results.

The validation set is used in much the same way as the test set. Note, however, that the validation set does not take part in fitting the model; it is a separate set of samples held out during training to tune the model’s hyperparameters and to give a preliminary evaluation of the model’s capabilities. Validation typically runs alongside training, with the validation set used to check the model’s preliminary effectiveness, while the test set is kept for the final evaluation.
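The three-way split can be sketched the same way. The 80/10/10 proportions and the helper name `split_three_ways` are assumptions for this example; the text does not prescribe specific fractions.

```python
import random

def split_three_ways(samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle a copy of the samples and cut it into train/validation/test."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    n_test = int(round(len(shuffled) * test_fraction))
    n_train = len(shuffled) - n_val - n_test
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

data = list(range(20))   # stand-in for 20 samples
train, val, test = split_three_ways(data)
print(len(train), len(val), len(test))
```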

Cross-Validation

Data is the most valuable asset in the machine learning process. To make more economical use of the available data, N-fold cross-validation is commonly used: the dataset is divided into N parts (folds), one fold is held out as test data, and the remaining folds are used as training data to build the model. The procedure is repeated N times so that each fold serves as the test data exactly once, and the results are combined to validate the model.

This cross-validation method is widely used in the machine learning process but is less common in deep learning.
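The fold bookkeeping behind N-fold cross-validation can be sketched as an index generator. `k_fold_indices` is a hypothetical helper; it does not shuffle, so in practice the samples would usually be shuffled first.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // n_folds] * n_folds
    for i in range(n_samples % n_folds):
        fold_sizes[i] += 1   # spread the remainder over the first folds
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_samples=10, n_folds=3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

A model would be trained on `train_idx` and scored on `test_idx` in each iteration, with the per-fold scores then averaged.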


1.5 Building Machine Learning Algorithm Models
This is the most interesting part! The data filtering and processing steps are often tedious, but now we can use the carefully prepared data to build models. Based on the type of data for the target variable (commonly referred to as the Y variable), we can establish either a classification or regression model.
Machine Learning Algorithms

Machine learning algorithms generally fall into one of the following three types:

(1) Supervised Learning

This is a machine learning task that establishes a mathematical (mapping) relationship between input X and output Y variables. Such (X, Y) pairs constitute the labeled data used to build the model to learn how to predict output from input.

(2) Unsupervised Learning

This is a machine learning task that only utilizes input X variables. The X variables are unlabeled data, and the learning algorithm uses the inherent structure of the data during modeling.

(3) Reinforcement Learning

This is a machine learning task that determines the next course of action through trial and error learning, striving to maximize reward returns.

Parameter Tuning

The legendary parameter tuning expert primarily does this work. Hyperparameters are essentially parameters of the machine learning algorithm that directly affect the learning process and predictive performance. Since there is no universal hyperparameter setting that can be generally applied to all datasets, hyperparameter optimization is necessary.

For example, two hyperparameters are commonly optimized in random forests: mtry and ntree. mtry (max_features in scikit-learn) is the number of variables randomly sampled as candidates at each split, while ntree (n_estimators in scikit-learn) is the number of trees to grow.

Another machine learning algorithm that was still very mainstream ten years ago is the support vector machine (SVM). With the radial basis function (RBF) kernel, the hyperparameters to optimize are C and gamma. C sets the penalty on training errors and so controls the trade-off between fitting the training data and overfitting, while gamma controls the width of the RBF kernel.
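To make the role of gamma concrete, the sketch below implements the standard RBF kernel, exp(-gamma * ||x - z||^2), in plain Python and evaluates it for a few gamma values. The points and the gamma values are made up for illustration.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (0.0, 0.0), (1.0, 1.0)   # squared distance between them is 2
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x, z, gamma))
```

A larger gamma makes the similarity fall off faster with distance, which corresponds to a narrower kernel and a more flexible decision boundary.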

Tuning usually aims to find a good set of hyperparameter values. It is often not about chasing a single optimal value for each hyperparameter; the “tuning expert” label is mostly a joke. What really matters is understanding how the algorithm works and finding parameters that suit the data and the model.
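One common way to search for such values is a grid search scored on held-out validation data. To stay self-contained, the sketch below swaps in a tiny k-nearest-neighbors classifier (not the random forest or SVM discussed above) and tunes two of its hyperparameters, `k` and the distance order `p`. The data and the grid are made up.

```python
from itertools import product

def knn_predict(train, query, k, p):
    """Majority vote among the k nearest training points (Minkowski order p)."""
    def dist(a, b):
        return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def accuracy(train, val, k, p):
    """Fraction of validation samples that are classified correctly."""
    hits = sum(knn_predict(train, x, k, p) == y for x, y in val)
    return hits / len(val)

# Made-up two-class data: 2 samples of class 0, 4 samples of class 1.
train = [((0.0, 0.0), 0), ((0.0, 1.0), 0),
         ((3.0, 3.0), 1), ((3.0, 4.0), 1), ((4.0, 3.0), 1), ((4.0, 4.0), 1)]
val = [((0.5, 0.5), 0), ((1.0, 1.0), 0), ((3.5, 3.5), 1), ((2.5, 3.0), 1)]

# Try every combination in the grid and record its validation accuracy.
grid = {"k": [1, 3, 5], "p": [1, 2]}
results = {}
for k, p in product(grid["k"], grid["p"]):
    results[(k, p)] = accuracy(train, val, k, p)

best_params = max(results, key=results.get)
print(best_params, results[best_params])
```

With `k=5` the vote is dominated by the larger class, so accuracy drops; this is the kind of data-dependent effect that makes a universal setting impossible.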

Feature Selection

Feature selection is the process of choosing a subset of features from a large number of initial features. Besides achieving high-accuracy models, one of the most important aspects of building machine learning models is obtaining actionable insights. To achieve this goal, it is crucial to select important subsets of features from a large pool.

Feature selection is a research area in its own right, where substantial effort goes into designing novel algorithms and methods. Among the many available feature selection algorithms, some classic methods are based on simulated annealing and genetic algorithms. There are also many methods based on evolutionary algorithms (such as particle swarm optimization and ant colony optimization) and stochastic methods (such as Monte Carlo).
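The search-based methods above are too involved for a short example, so the sketch below shows a much simpler filter approach instead: rank each feature by the absolute value of its Pearson correlation with the target and keep the top few. The helper names and data are made up, and this is not one of the algorithms named above.

```python
import math

def abs_correlation(a, b):
    """Absolute Pearson correlation between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                     * sum((y - mean_b) ** 2 for y in b))
    return abs(cov / norm)

def select_top_features(features, target, n_keep):
    """Score every feature against the target and keep the n_keep best."""
    scores = {name: abs_correlation(col, target) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep], scores

# Made-up feature columns and target, for illustration only.
features = {
    "strong": [1.0, 2.0, 3.0, 4.0, 5.0],
    "inverse": [5.0, 4.0, 3.0, 2.0, 0.0],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.0]

selected, scores = select_top_features(features, target, n_keep=2)
print(selected)
```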


1.6 Machine Learning Tasks
In supervised learning, the two most common machine learning tasks are classification and regression.
Classification
A trained classification model takes a set of variables as input and predicts the output class label. The following diagram shows three classes indicated by different colors and labels. Each small colored sphere represents a data sample. The three classes of samples are displayed in two dimensions; such a visualization can be created by running PCA and plotting the first two principal components (PCs), or simply by drawing a scatter plot of two variables.


Performance Metrics
How do we know whether the trained machine learning model performs well or poorly? By using performance evaluation metrics. Some common metrics for evaluating classification performance include accuracy (AC), sensitivity (SN), specificity (SP), and Matthews correlation coefficient (MCC).
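For a binary classifier, all four metrics follow from the confusion-matrix counts. The sketch below uses the standard definitions; the labels are made up, and the function assumes both classes appear in `y_true` (otherwise sensitivity or specificity would divide by zero).

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)   # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)   # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)   # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)   # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Made-up labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```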
Regression
The simplest regression model can be summarized by the equation Y = f(X). Here, Y is the quantitative output variable, X is the input variable, and f is the mapping function (derived from the machine learning model) that computes the output value from the input features.
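As one concrete instance of f, the sketch below fits a straight line by ordinary least squares in plain Python. The choice of a linear model, the helper name `fit_line`, and the data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)

def f(x):
    """The learned mapping from input X to predicted output Y."""
    return slope * x + intercept

print(slope, intercept, f(4.0))
```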


The essence of the regression example formula above is that if X is known, Y can be derived. Once Y is computed (predicted), a popular visualization method is to create a simple scatter plot comparing actual values to predicted values, as shown below.
Evaluating a regression model means assessing how accurately the fitted model predicts the output values for the input data. A common metric is the coefficient of determination (R²). Mean squared error (MSE) and root mean squared error (RMSE) are also commonly used to measure residuals or prediction errors.
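These three metrics can be computed directly from actual and predicted values. The sketch below uses the standard definitions, with made-up numbers.

```python
import math

def regression_metrics(y_true, y_pred):
    """Coefficient of determination, MSE and RMSE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    mse = ss_res / n
    return {"r2": 1.0 - ss_res / ss_tot, "mse": mse, "rmse": math.sqrt(mse)}

# Made-up actual and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
scores = regression_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in scores.items()})
```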
02
The Algorithm Process of Deep Learning

Deep learning is essentially a paradigm within machine learning, so the main steps are quite similar. Deep learning streamlines the data analysis stage and shortens the modeling process, replacing the previously diverse machine learning algorithms with a single family of models: neural networks.

Before deep learning was widely used, a machine learning workflow required a lot of time to collect data, filter data, and try out various feature extraction algorithms or combinations of features for classification and regression.


Here are the main steps of a deep learning workflow:

(1) Dataset preparation
(2) Data preprocessing
(3) Data splitting
(4) Defining neural network models
(5) Training the network
Deep learning does not require us to extract features ourselves; instead, the neural network automatically learns high-level abstract representations of the data, which saves a lot of time on feature engineering.
At the same time, because it introduces deeper and more complex network structures, parameter tuning becomes more burdensome. It now includes defining the structure of the neural network, choosing the loss function, choosing the optimizer, and finally iterating to adjust the model parameters.
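The four decisions listed above (network structure, loss function, optimizer, training iterations) can be seen in miniature in the plain-Python sketch below. It is a toy, not a realistic setup: the one-hidden-layer architecture, the mean squared error loss, plain gradient descent, the learning rate, and the data are all assumptions chosen to keep the example short. Real projects would normally use a deep learning framework.

```python
import math
import random

rng = random.Random(0)
HIDDEN = 4

# 1. Network structure: 1 input -> HIDDEN tanh units -> 1 linear output.
w1 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]   # input-to-hidden weights
b1 = [0.0] * HIDDEN                                # hidden biases
w2 = [rng.uniform(-1, 1) for _ in range(HIDDEN)]   # hidden-to-output weights
b2 = 0.0                                           # output bias

def forward(x):
    """Return the hidden activations and the network output for input x."""
    hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(HIDDEN)]
    output = sum(w2[j] * hidden[j] for j in range(HIDDEN)) + b2
    return hidden, output

# 2. Loss function: mean squared error over the dataset.
def mse(data):
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

# Made-up training data: y = x^2 on nine points in [-1, 1].
data = [(i / 4.0, (i / 4.0) ** 2) for i in range(-4, 5)]

# 3. Optimizer: full-batch gradient descent with a fixed learning rate.
LEARNING_RATE = 0.05
initial_loss = mse(data)

# 4. Training loop: forward pass, backpropagation, parameter update.
for epoch in range(500):
    g_w1 = [0.0] * HIDDEN
    g_b1 = [0.0] * HIDDEN
    g_w2 = [0.0] * HIDDEN
    g_b2 = 0.0
    for x, y in data:
        hidden, output = forward(x)
        d_out = 2.0 * (output - y) / len(data)     # gradient of loss w.r.t. output
        for j in range(HIDDEN):
            g_w2[j] += d_out * hidden[j]
            d_pre = d_out * w2[j] * (1.0 - hidden[j] ** 2)   # back through tanh
            g_w1[j] += d_pre * x
            g_b1[j] += d_pre
        g_b2 += d_out
    for j in range(HIDDEN):
        w1[j] -= LEARNING_RATE * g_w1[j]
        b1[j] -= LEARNING_RATE * g_b1[j]
        w2[j] -= LEARNING_RATE * g_w2[j]
    b2 -= LEARNING_RATE * g_b2

final_loss = mse(data)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Note that no features were hand-crafted here: the hidden layer learns its own intermediate representation of x, while the number of hidden units, the learning rate, and the number of epochs become the new knobs to tune.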