10 Essential Algorithms in Machine Learning

Source: Medium

Author: garvitanand2

Compiled by: Machine Heart

Contributors: Geek AI, Lu

This article introduces the 10 most commonly used machine learning algorithms, including linear regression, logistic regression, linear discriminant analysis, naive Bayes, KNN, random forests, etc.

1. Linear Regression

In the fields of statistics and machine learning, linear regression may be one of the most well-known and easily understood algorithms.

Predictive modeling primarily focuses on minimizing model error or making the most accurate predictions at the expense of interpretability. We will borrow and reuse algorithms from many other fields (including statistics) to achieve these goals.

The linear regression model is represented as an equation that finds specific weights (i.e., coefficients B) for the input variables, thereby describing a line that best fits the relationship between the input variable (x) and the output variable (y).

[Figure: Linear Regression]

For example: y = B0 + B1 * x

We will predict y under the condition of a given input value x, and the goal of the linear regression learning algorithm is to find the values of coefficients B0 and B1.

We can use different techniques to learn the linear regression model from data, such as the linear algebra solution of ordinary least squares and gradient descent optimization.
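
As a rough illustration of the ordinary least squares approach mentioned above, the sketch below fits B0 and B1 for the single-variable case; the function name and sample numbers are made up for the example, not taken from the article.

    def fit_simple_linear_regression(x, y):
        # Ordinary least squares for y = B0 + B1 * x.
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        # B1 is the covariance of x and y divided by the variance of x.
        b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
              / sum((xi - mean_x) ** 2 for xi in x))
        b0 = mean_y - b1 * mean_x
        return b0, b1

    x = [1, 2, 3, 4, 5]
    y = [1.2, 1.9, 3.1, 4.2, 4.8]
    b0, b1 = fit_simple_linear_regression(x, y)
    print(b0 + b1 * 6)   # prediction for a new input x = 6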

Linear regression has a history of over 200 years and has been extensively studied. When using such techniques, there are some good rules of thumb: we can remove very similar (correlated) variables and try to eliminate noise in the data as much as possible. Linear regression is a simple technique that operates quickly and is a classic algorithm suitable for beginners to try.

2. Logistic Regression

Logistic regression is another technique borrowed from the field of statistics for machine learning. It is the preferred method for binary classification problems.

Like linear regression, the goal of logistic regression is also to find the weight coefficients for each input variable. However, unlike linear regression, the output prediction of logistic regression is transformed through a nonlinear function called the “logistic function”.

The shape of the logistic function looks like a large “S”, and it converts any value into the range 0-1. This is very useful because we can apply a rule to the output of the logistic function to snap values to 0 or 1 (for example, with a threshold of 0.5: if the function value is at least 0.5, output class 1, otherwise class 0) and thereby predict a class value.
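
A minimal sketch of this transformation, assuming the coefficients B0 and B1 have already been learned (the values below are placeholders, not from the article):

    import math

    def logistic(z):
        # The S-shaped logistic (sigmoid) function maps any real value into (0, 1).
        return 1.0 / (1.0 + math.exp(-z))

    def predict_class(x, b0, b1, threshold=0.5):
        # The linear combination is squashed into a value in (0, 1),
        # then snapped to a class label with a 0.5 threshold.
        p = logistic(b0 + b1 * x)
        return (1 if p >= threshold else 0), p

    label, p = predict_class(x=2.5, b0=-4.0, b1=2.0)
    print(label, p)   # class label and the value in (0, 1) behind it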

[Figure: Logistic Regression]

Due to the way the model learns, the predictions of logistic regression can also be used as the probability that a given data instance belongs to class 0 or class 1. This is very useful for problems where you need more rationale behind a prediction.

Similar to linear regression, logistic regression performs better when irrelevant attributes and those that are very similar (correlated) to each other are removed. The model learns quickly and is very effective for binary classification problems.

3. Linear Discriminant Analysis

Logistic regression is a traditional classification algorithm limited to binary classification problems. If you have more than two classes, then linear discriminant analysis (LDA) is the preferred linear classification technique.

LDA’s representation is very straightforward. It contains the statistical properties of the data calculated for each class. For a single input variable, these properties include:

  • The mean of each class.

  • The variance calculated across all classes.

[Figure: Linear Discriminant Analysis]

The prediction result is obtained by calculating the discriminant values for each class and predicting the class with the maximum discriminant value. This technique assumes that the data follows a Gaussian distribution (bell curve), so it is best to remove outliers from the data beforehand. LDA is a simple and effective classification predictive modeling method.
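
A sketch of how such a discriminant value can be computed for a single input variable, using the standard LDA discriminant with a pooled variance; the numbers below are illustrative, not from the article:

    import math

    def discriminant(x, class_mean, pooled_var, class_prior):
        # Discriminant value for one class under the Gaussian assumption.
        return (x * class_mean / pooled_var
                - class_mean ** 2 / (2 * pooled_var)
                + math.log(class_prior))

    def lda_predict(x, means, pooled_var, priors):
        # Predict the class whose discriminant value is largest.
        return max(means, key=lambda c: discriminant(x, means[c], pooled_var, priors[c]))

    means = {0: 1.0, 1: 3.0}       # per-class means
    priors = {0: 0.5, 1: 0.5}      # class prior probabilities
    print(lda_predict(2.4, means, pooled_var=0.6, priors=priors))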

4. Classification and Regression Trees

Decision trees are an important class of machine learning predictive modeling algorithms.

Decision trees can be represented as a binary tree, the same binary tree familiar from algorithm design and data structures; there is nothing special about it. Each node represents an input variable (x) and a split point based on that variable (assuming the variable is numeric).

[Figure: Decision Tree]

The leaf nodes of the decision tree contain an output variable (y) used for making predictions. The prediction result is obtained by traversing the various branching paths of the tree until reaching a leaf node and outputting the class value of that leaf node.
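
A minimal sketch of that traversal, with a hand-built tree whose structure and values are invented for illustration:

    # Internal nodes hold an input-variable index and a split value;
    # leaves hold the class value to output.
    tree = {
        "index": 0, "value": 2.5,
        "left":  {"index": 1, "value": 1.0, "left": "A", "right": "B"},
        "right": "C",
    }

    def predict(node, row):
        # Follow the branches until a leaf (a plain class value) is reached.
        if not isinstance(node, dict):
            return node
        branch = node["left"] if row[node["index"]] < node["value"] else node["right"]
        return predict(branch, row)

    print(predict(tree, [1.2, 0.4]))   # -> "A"
    print(predict(tree, [3.1, 0.4]))   # -> "C"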

Decision trees learn quickly and make predictions quickly. They are often accurate in a wide range of problems and do not require any special preprocessing of the data.

5. Naive Bayes

Naive Bayes is a simple yet powerful predictive modeling algorithm.

The model consists of two kinds of probabilities that can be calculated directly from the training data: 1) the prior probability of each class; 2) the conditional probability of each input value given each class. Once these probabilities are calculated, Bayes’ theorem can be used with the probability model to make predictions for new data. When your data is real-valued, it is typically assumed that the data follows a Gaussian distribution (bell curve), making it easy to estimate these probabilities.
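
A minimal Gaussian naive Bayes sketch along these lines; the tiny dataset and function names are illustrative, not from the article:

    import math

    def gaussian_pdf(x, mean, var):
        # Probability density of x under a Gaussian with the given mean and variance.
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def fit(rows, labels):
        # For each class, estimate the prior P(class) and a per-feature mean and variance.
        model = {}
        for c in set(labels):
            subset = [r for r, l in zip(rows, labels) if l == c]
            stats = []
            for col in zip(*subset):
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / max(len(col) - 1, 1)
                stats.append((mean, var))
            model[c] = (len(subset) / len(rows), stats)
        return model

    def predict(model, row):
        # Bayes' theorem with the "naive" independence assumption:
        # score(class) = P(class) * product over features of P(x_i | class).
        best, best_score = None, -1.0
        for c, (prior, stats) in model.items():
            score = prior
            for x, (mean, var) in zip(row, stats):
                score *= gaussian_pdf(x, mean, var)
            if score > best_score:
                best, best_score = c, score
        return best

    rows = [[1.0, 2.1], [1.2, 1.9], [3.1, 4.0], [2.9, 4.2]]
    labels = [0, 0, 1, 1]
    print(predict(fit(rows, labels), [3.0, 3.8]))   # -> 1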

[Figure: Bayes’ Theorem]

Naive Bayes is called “naive” because it assumes that each input variable is independent of the others. This is a strong assumption that does not hold for real data. However, the algorithm is very effective for a large number of complex problems.

6. K-Nearest Neighbors

The K-Nearest Neighbors (KNN) algorithm is very simple yet effective. The KNN model representation is simply the entire training dataset. Simple, right?

The prediction result for a new data point is obtained by searching the entire training set for the K most similar instances (neighbors) to that data point and summarizing the output variables of those K instances. For regression problems, the prediction result may be the mean of the output variables; for classification problems, the prediction result may be the mode (or most common) class value.

The key is how to determine the similarity between data instances. If your data features are on the same scale (e.g., all in inches), the simplest measurement technique is to use Euclidean distance, which you can directly calculate based on the differences between the input variables.
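
A minimal sketch of KNN classification with Euclidean distance; the tiny dataset is made up for illustration:

    import math
    from collections import Counter

    def euclidean(a, b):
        # Straight-line distance between two feature vectors on the same scale.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_predict(train_rows, train_labels, new_row, k=3):
        # Find the k training instances closest to new_row and return the most
        # common class among them (use the mean of the labels for regression).
        neighbors = sorted(zip(train_rows, train_labels),
                           key=lambda pair: euclidean(pair[0], new_row))[:k]
        votes = Counter(label for _, label in neighbors)
        return votes.most_common(1)[0][0]

    rows = [[1.0, 1.1], [1.2, 0.9], [3.0, 3.2], [3.1, 2.9]]
    labels = ["A", "A", "B", "B"]
    print(knn_predict(rows, labels, [2.8, 3.0]))   # -> "B"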

[Figure: K-Nearest Neighbors]

KNN may require a lot of memory or space to store all data, but it only performs calculations (or learning) in real-time when a prediction is needed. Over time, you can also update and manage training instances to ensure prediction accuracy.

Using distance or proximity measures may fail in very high dimensions (with many input variables), which can negatively affect the algorithm’s performance on your problem. This is known as the curse of dimensionality. It tells us to only use those input variables that are most relevant to the predicted output variable.

7. Learning Vector Quantization

A downside of the KNN algorithm is that you need to handle the entire training dataset. The Learning Vector Quantization (LVQ) algorithm allows you to select the number of training instances you need and learn exactly those instances.

[Figure: Learning Vector Quantization]

The representation of LVQ is a set of codebook vectors. They are randomly selected at the beginning and are iteratively refined through multiple rounds of the learning algorithm to best summarize the training dataset. With learning, the codebook vectors can be used to make predictions like KNN. By calculating the distance between each codebook vector and the new data instance, the most similar neighbor (the best matching codebook vector) can be found. The class value (classification) or real value (regression) of the best matching unit is then returned as the prediction result. If the data is rescaled to the same range (e.g., between 0 and 1), the best prediction results can be obtained.
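
A rough sketch of that training loop; the learning rate, epoch count, and dataset below are illustrative choices, not values from the article:

    import random

    def train_lvq(rows, labels, n_codebooks, lrate=0.3, epochs=20):
        # Start from randomly chosen training instances as codebook vectors, then
        # pull the best-matching vector toward instances of its own class and push
        # it away from instances of other classes.
        chosen = random.sample(range(len(rows)), n_codebooks)
        codebooks = [(list(rows[i]), labels[i]) for i in chosen]
        for epoch in range(epochs):
            rate = lrate * (1.0 - epoch / epochs)   # decaying learning rate
            for row, label in zip(rows, labels):
                vec, bmu_label = min(
                    codebooks,
                    key=lambda cb: sum((a - b) ** 2 for a, b in zip(cb[0], row)))
                for i in range(len(vec)):
                    step = rate * (row[i] - vec[i])
                    vec[i] += step if bmu_label == label else -step
        return codebooks

    def lvq_predict(codebooks, row):
        # 1-nearest-neighbour search over the learned codebook vectors.
        _, label = min(codebooks,
                       key=lambda cb: sum((a - b) ** 2 for a, b in zip(cb[0], row)))
        return label

    rows = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
    labels = [0, 0, 1, 1]
    books = train_lvq(rows, labels, n_codebooks=4)
    print(lvq_predict(books, [0.85, 0.9]))   # -> 1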

If you find that KNN can provide good predictions on your dataset, you might want to try the LVQ technique, which can reduce memory space requirements and does not need to store the entire training dataset like KNN.

8. Support Vector Machines

Support Vector Machines (SVM) may be one of the most popular and discussed machine learning algorithms today.

A hyperplane is a “line” that divides the input variable space. Support Vector Machines will select a hyperplane that best separates points in the input variable space by class (class 0 or class 1). In two-dimensional space, you can think of it as a line, assuming all input points can be perfectly separated by this line. The SVM learning algorithm aims to find the coefficients that achieve the best class separation through the hyperplane.

[Figure: Support Vector Machine]

The distance between the hyperplane and the nearest data points is called the margin. The best hyperplane separating the two classes is the one with the maximum margin. Only the points that matter for defining the hyperplane and constructing the classifier are called support vectors, since they support or define the hyperplane. In practice, an optimization algorithm is used to find the coefficient values that maximize the margin.
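
As one possible sketch (not the article's formulation), a linear SVM can be trained in the primal with sub-gradient descent on the regularised hinge loss; the learning rate, regularisation strength, and toy data are illustrative:

    def train_linear_svm(rows, labels, lrate=0.01, lam=0.01, epochs=200):
        # Sub-gradient descent on the regularised hinge loss; labels must be -1 or +1.
        w = [0.0] * len(rows[0])
        b = 0.0
        for _ in range(epochs):
            for x, y in zip(rows, labels):
                margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
                if margin < 1:
                    # Misclassified or inside the margin: move toward the correct side.
                    w = [wi + lrate * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                    b += lrate * y
                else:
                    # Correctly classified with margin: only apply regularisation.
                    w = [wi - lrate * lam * wi for wi in w]
        return w, b

    rows = [[1.0, 1.0], [1.5, 0.8], [3.0, 3.2], [3.5, 2.8]]
    labels = [-1, -1, 1, 1]
    w, b = train_linear_svm(rows, labels)
    point = [3.1, 3.0]
    print(1 if sum(wi * xi for wi, xi in zip(w, point)) + b >= 0 else -1)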

Support Vector Machines may be one of the most powerful classifiers currently available for direct use, worth trying on your own dataset.

9. Bagging and Random Forests

The random forest is one of the most popular and powerful machine learning algorithms; it is an ensemble machine learning algorithm.

Bootstrapping is a powerful statistical method for estimating a quantity (such as the mean) from data samples. You take many samples of the data with replacement, compute the mean of each sample, and then average those means to obtain a better estimate of the true mean.

Bagging uses the same approach, but for estimating entire statistical models rather than a single quantity, most commonly decision trees. Bagging takes multiple samples of the training data and builds a model for each sample. When you need to predict new data, each model produces a prediction, and Bagging averages the predictions from all models to better estimate the true output value.
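
A minimal sketch of the bootstrap idea, with the sample mean as the quantity being estimated; Bagging follows the same pattern but fits a model such as a decision tree to each bootstrap sample and averages the models' predictions. The data here is made up:

    import random

    def bootstrap_sample(data):
        # Draw a sample of the same size as the data, with replacement.
        return [random.choice(data) for _ in data]

    def bootstrap_mean(data, n_samples=100):
        # Average the means of many bootstrap samples to estimate the true mean;
        # Bagging swaps "mean of the sample" for "model trained on the sample"
        # and averages the models' predictions instead.
        means = [sum(s) / len(s)
                 for s in (bootstrap_sample(data) for _ in range(n_samples))]
        return sum(means) / len(means)

    random.seed(0)
    data = [2.1, 3.5, 4.0, 5.2, 3.8, 4.4]
    print(bootstrap_mean(data))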

[Figure: Random Forest]

Random forests are a tweak on this approach: when building each decision tree, rather than choosing the optimal split point, randomness is introduced and suboptimal splits are accepted.

As a result, the models built for each data sample differ more from one another than they otherwise would, yet each remains accurate in its own way. Combining their predictions gives a better estimate of the true output value.

If you have obtained good results with high-variance algorithms (such as decision trees), you can often achieve better results by performing Bagging on that algorithm.

10. Boosting and AdaBoost

Boosting is an ensemble technique that attempts to create a strong classifier from a large number of weak classifiers. To implement Boosting, you first build a model from the training data, then create a second model that attempts to correct the errors of the first. Models are added until the training set is predicted perfectly or a maximum number of models has been reached.

AdaBoost is the first truly successful Boosting algorithm developed for binary classification problems. It is the best starting point for understanding Boosting. Current Boosting methods are built on the foundation of AdaBoost, the most famous of which is the gradient boosting machine.

[Figure: AdaBoost]

AdaBoost uses shallow decision trees. After the first tree is created, its performance on each training instance is used to determine how much weight the next tree should give to each training instance. Weights for hard-to-predict training instances increase, while weights for easy-to-predict instances decrease. Models are created one after another, each updating the training instance weights and thereby affecting the learning of the next tree in the sequence. After all the trees are built, predictions can be made on new data, with each tree's prediction weighted by how accurate that tree was on the training data.
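
A compact sketch of this weighting scheme, using one-dimensional decision stumps as the weak learners; the thresholds, labels, and round count are illustrative choices, not from the article:

    import math

    def stump_predict(x, threshold, polarity):
        # A one-dimensional decision stump: +polarity on one side of the threshold,
        # -polarity on the other.
        return polarity if x >= threshold else -polarity

    def adaboost(xs, ys, thresholds, rounds=3):
        # Labels must be -1 or +1. Each round picks the stump with the lowest
        # weighted error, then re-weights the training instances.
        n = len(xs)
        weights = [1.0 / n] * n
        ensemble = []
        for _ in range(rounds):
            best = None
            for t in thresholds:
                for polarity in (1, -1):
                    err = sum(w for x, y, w in zip(xs, ys, weights)
                              if stump_predict(x, t, polarity) != y)
                    if best is None or err < best[0]:
                        best = (err, t, polarity)
            err, t, polarity = best
            err = min(max(err, 1e-10), 1 - 1e-10)
            alpha = 0.5 * math.log((1 - err) / err)   # the stump's say in the final vote
            ensemble.append((alpha, t, polarity))
            # Hard-to-predict instances gain weight, easy ones lose weight.
            weights = [w * math.exp(-alpha * y * stump_predict(x, t, polarity))
                       for x, y, w in zip(xs, ys, weights)]
            total = sum(weights)
            weights = [w / total for w in weights]
        return ensemble

    def ensemble_predict(ensemble, x):
        # Weighted vote of all stumps.
        score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
        return 1 if score >= 0 else -1

    xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
    ys = [-1, -1, -1, 1, 1, 1]
    model = adaboost(xs, ys, thresholds=[1.5, 2.5, 4.5, 6.5, 7.5])
    print([ensemble_predict(model, x) for x in xs])   # matches ys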

Since the algorithm invests so much effort into correcting errors, it is very important to remove outliers from the data during the data cleaning process.

Original link: https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Fblog.goodaudience.com%2Ftop-10-machine-learning-algorithms-2a9a3e1bdaff

This article was compiled by Machine Heart. Please contact this public account for authorization to reproduce it.
