Top 10 Popular AI Algorithms Explained Simply

What is artificial intelligence? Many people have heard of it, but few can explain it clearly.

In fact, artificial intelligence has been part of our lives for a long time.

  • For example, the email services we use every day, where spam filtering relies on artificial intelligence;

  • For example, the fingerprint recognition or facial recognition built into every smartphone, which is also achieved with artificial intelligence;

  • For example, the non-contact temperature detectors widely used during the pandemic, which also rely on artificial intelligence.

However, for many people, artificial intelligence still feels like a "profound" technology, yet even the most profound technology starts from basic principles. Ten of the most popular algorithms in artificial intelligence rest on simple ideas: they were discovered and applied long ago, some may already be familiar from high school, and they appear constantly in daily life. This article introduces these ten algorithms in the simplest possible language, so that anyone interested in artificial intelligence, or wanting to get started with it, can gain a more intuitive understanding.

1. Linear Regression

Linear regression is probably the most popular machine learning algorithm. It aims to find the straight line that fits the data points in a scatter plot as closely as possible: it represents the relationship between the independent variable (the x values) and the numerical result (the y values) by fitting a line equation to the data. That line can then be used to predict future values.

The most commonly used technique for this is the least squares method, which finds the best-fitting line by minimizing the vertical distances from the data points to the line; the total error is the sum of the squared vertical distances over all data points. Fitting the model means minimizing this squared error.

For example, simple linear regression has one independent variable (x-axis) and one dependent variable (y-axis): predicting next year's housing price increase, or a new product's sales in the next quarter. It does not sound difficult, but the challenge of linear regression is not producing a prediction at all; it is making that prediction more accurate. For that potentially very subtle number, many engineers have exhausted their youth and their hair. A minimal code sketch of linear regression appears after section 2, alongside one for logistic regression.

2. Logistic Regression

Logistic regression is similar to linear regression, but its result can take only one of two values. Where linear regression predicts an open-ended number, logistic regression is more like answering a yes-or-no question. The y value of the logistic function ranges from 0 to 1 and is interpreted as a probability. The logistic function typically has an S shape that divides the chart into two regions, which makes it well suited to classification tasks.

For example, a logistic regression curve can describe the relationship between study time and the probability of passing an exam, and can therefore be used to predict whether someone will pass. E-commerce and food delivery platforms often use logistic regression to predict users' purchasing preferences across categories.
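To make the idea concrete, here is the linear regression sketch promised above: a minimal example assuming scikit-learn and NumPy are installed. The experience/salary numbers are invented toy data, not a real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: years of experience (x) against salary in thousands (y).
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([30, 35, 42, 48, 55])

# LinearRegression fits the line by ordinary least squares,
# i.e. it minimizes the sum of squared vertical distances.
model = LinearRegression()
model.fit(x, y)

print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("prediction for x = 6:", model.predict([[6]])[0])
```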
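And a matching sketch for logistic regression under the same assumptions; the study hours and pass/fail labels below are made up to mirror the exam example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: hours studied (x) and whether the exam was passed (1) or not (0).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

# The S-shaped logistic function turns the input into a probability
# between 0 and 1, which is then thresholded into a yes/no answer.
print("P(pass | 2.2 hours):", clf.predict_proba([[2.2]])[0, 1])
print("predicted class:", clf.predict([[2.2]])[0])
```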
3. Decision Trees

If linear and logistic regression finish the job in a single round, a decision tree is a multi-step process. It is also used for regression and classification, but the scenarios are usually more complex and concrete. For example, a teacher faces a class of students: which of them are good students? Simply declaring that anyone scoring 90 is a good student seems too crude, and we cannot rely on scores alone. For students scoring below 90, we can weigh other aspects separately, such as homework, attendance, and participation.

In a diagram of a decision tree, each branching circle is called a node. At each node we ask a question about the data based on the available features, and the left and right branches represent the possible answers. The final nodes (the leaf nodes) correspond to a predicted value. The importance of each feature is determined in a top-down fashion: the higher the node, the more important its attribute. In the teacher's case, the teacher considers attendance more important than homework, so the attendance node sits higher, and the score node higher still. A small code sketch of a decision tree appears after section 4.

4. Naive Bayes

Naive Bayes is based on Bayes' theorem, which relates two conditional probabilities: it measures the probability of each class given the value of x. The algorithm is used for classification problems with a binary "yes/no" outcome. Bayes' theorem can be written as P(A|B) = P(B|A) × P(A) / P(B).

The naive Bayes classifier is a popular statistical technique, and a classic application is spam filtering. Of course, I would bet a hotpot that 80% of readers did not follow the paragraph above. (The 80% is a guess, but that kind of experience-based intuition is itself a Bayesian calculation.) In non-technical terms, Bayes' theorem lets us go from the probability of B given A to the probability of A given B. For example, suppose that if a kitten likes you, there is an a% chance it will roll over in front of you; how likely is it that the kitten likes you, given that it rolls over in front of you? Answering from that alone is like shooting in the dark, so we bring in other evidence: if the kitten likes you, there is a b% chance it will cuddle and a c% chance it will purr. Bayes' theorem then lets us combine the probabilities of rolling over, cuddling, and purring to estimate how likely it is that the kitten likes us.

Cat: Stop calculating, I don’t like you
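Below is the decision tree sketch mentioned in section 3, again assuming scikit-learn; the student features (score, attendance rate, homework) and the "good student" labels are invented to echo the teacher example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: [score, attendance_rate, homework_done] for each student,
# and a label saying whether the teacher considers them a "good student".
X = [
    [95, 0.90, 1],
    [88, 1.00, 1],
    [70, 0.60, 0],
    [85, 0.95, 1],
    [60, 0.50, 0],
    [78, 0.70, 0],
]
y = [1, 1, 0, 1, 0, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned questions at each node, top-down.
print(export_text(tree, feature_names=["score", "attendance", "homework"]))
print("new student:", tree.predict([[82, 0.85, 1]]))
```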
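And a minimal naive Bayes sketch for the classic spam-filtering use case, assuming scikit-learn; the four short messages and their labels are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labelled corpus: 1 = spam, 0 = not spam.
texts = [
    "win a free prize now",
    "limited offer, claim your free gift",
    "meeting rescheduled to monday",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# Bag-of-words counts feed Bayes' theorem: P(spam | words) is computed
# from P(words | spam) and the overall spam rate.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

new_mail = vectorizer.transform(["claim your free prize"])
print(clf.predict_proba(new_mail))  # [P(not spam), P(spam)]
```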

5. Support Vector Machines

A support vector machine (SVM) is a supervised algorithm used for classification. It tries to draw two boundary lines between the classes of data points with as wide a gap between them as possible. To do this, it plots the data items as points in n-dimensional space, where n is the number of input features, and then finds the optimal boundary, called a hyperplane, that best separates the possible outputs by class label. The distance between the hyperplane and the nearest point of each class is called the margin; the optimal hyperplane is the one with the largest margin, so that the nearest data points of the two classes are as far from the boundary as possible.

In other words, the problem an SVM solves is how to split a pile of data cleanly into classes. Its main applications include character recognition, facial recognition, text classification, and other recognition tasks. Code sketches for sections 5 to 7 appear together after section 7.

6. K-Nearest Neighbors (KNN)

K-nearest neighbors (KNN) is very simple. It classifies an object by searching the entire training set for the K most similar instances, its K neighbors, and assigning the output those neighbors share. Choosing K is crucial: too small a value picks up a lot of noise and gives inaccurate results, while too large a value becomes impractical. KNN is most commonly used for classification but can also be applied to regression. The similarity between instances is measured with a distance such as Euclidean, Manhattan, or Minkowski distance; Euclidean distance is the ordinary straight-line distance between two points, i.e. the square root of the sum of the squared differences of their coordinates.

KNN is theoretically simple and easy to implement, and can be used for text classification, pattern recognition, cluster analysis, and more.

7. K-Means

K-Means is a clustering algorithm that partitions a dataset; for example, it can group users by purchase history. It finds K clusters in the data. K-Means is an unsupervised learning method, so we only need the training data X and the number of clusters K we want to identify. The algorithm picks a centroid for each of the K clusters, then iteratively assigns every data point to the cluster with the nearest centroid and updates the centroids, repeating until the centroids stop changing.

In real life, K-Means plays an important role in fraud detection and is widely used in automotive, healthcare, and insurance fraud detection.
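Here are the sketches noted in section 5. First, a linear SVM on synthetic 2-D data, assuming scikit-learn; make_blobs simply generates two artificial clusters for illustration.

```python
from sklearn import svm
from sklearn.datasets import make_blobs

# Two well-separated clusters of 2-D points.
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A linear SVM looks for the hyperplane (here: a line) with the widest
# possible margin between the two classes.
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("prediction for a new point:", clf.predict([[6, -6]]))
```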
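Next, a KNN sketch under the same assumption; the weight and ear-length numbers for "cats" and "dogs" are invented toy values.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: [weight (g), ear length (cm)] for cats (0) and dogs (1).
X = np.array([[3000, 6], [3500, 7], [4000, 6],
              [9000, 12], [11000, 13], [8000, 11]])
y = np.array([0, 0, 0, 1, 1, 1])

# k = 3: a new animal is labelled by majority vote of its 3 nearest
# neighbours, using plain Euclidean distance by default.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print("predicted class:", knn.predict([[5000, 8]]))
print("distances and indices of the 3 neighbours:", knn.kneighbors([[5000, 8]]))
```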
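And a K-Means sketch that groups users by an invented purchase history, assuming scikit-learn; K = 3 is chosen by hand here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy purchase history: [orders per month, average order value].
X = np.array([
    [2, 15], [3, 12], [1, 18],      # occasional, small orders
    [20, 30], [18, 35], [22, 28],   # frequent, mid-sized orders
    [5, 200], [4, 220], [6, 180],   # rare, very large orders
])

# Ask for K = 3 groups; the algorithm alternates between assigning points
# to the nearest centroid and moving each centroid to the mean of its group.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)

print("labels:   ", kmeans.labels_)
print("centroids:", kmeans.cluster_centers_)
```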
8. Random Forest

Random forest is a very popular ensemble machine learning algorithm. Its basic idea is that the opinion of many is more accurate than the opinion of one. A random forest uses an ensemble of decision trees (see section 3).

(a) During training, each decision tree is built from a bootstrap sample of the training set.
(b) During classification, the decision for an input instance is made by majority vote over the trees.

Random forests have a broad range of applications, from marketing to health insurance: they can be used for marketing simulations, modeling customer acquisition, retention, and churn, and predicting disease risk and patient susceptibility. Code sketches for sections 8 to 10 appear together after section 10.

9. Dimensionality Reduction

Because of the sheer amount of data we can capture today, machine learning problems have become more complex, which means training can be extremely slow and finding a good solution difficult. This problem is often called the "curse of dimensionality."

Dimensionality reduction tries to address it by combining specific features into higher-level ones without losing the most important information. Principal component analysis (PCA) is the most popular dimensionality reduction technique: it compresses the dataset onto a lower-dimensional line or hyperplane/subspace while preserving as much of the significant structure of the original data as possible. Intuitively, it can be pictured as approximating all the data points with a single straight line.

10. Artificial Neural Networks (ANN)

Artificial neural networks (ANN) can handle large and complex machine learning tasks. A neural network is essentially a set of interconnected layers of weighted nodes called neurons, and multiple hidden layers can be inserted between the input layer and the output layer; deep learning builds on networks with many such layers.

The working principle of an artificial neural network loosely resembles the structure of the brain. A group of neurons starts with random weights, which determine how the neurons process the input data. By training on input data, the network learns the relationship between inputs and outputs; during the training phase, the system has access to the correct answers. If the network cannot recognize an input accurately, the weights are adjusted. After enough training, it recognizes the correct patterns consistently. In a diagram of such a network, each circular node represents an artificial neuron, and each arrow a connection from the output of one neuron to the input of another. Image recognition is a well-known application of neural networks.
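The sketches promised in section 8 follow. First, a random forest on scikit-learn's built-in breast cancer dataset, used here only because it ships with the library; this is a toy illustration, not a real medical workflow.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Built-in dataset: predict whether a tumour is malignant or benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees, each trained on a bootstrap sample of the data;
# the forest's answer is the majority vote of its trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
```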
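Next, a PCA sketch assuming scikit-learn and NumPy; the 10-feature matrix is synthetic, generated from two hidden factors so that compressing it to two components loses little information.

```python
import numpy as np
from sklearn.decomposition import PCA

# 200 samples with 10 correlated features built from 2 underlying factors.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(200, 10))

# Compress the 10 features down to 2 principal components while keeping
# as much of the original variance as possible.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("reduced shape:", X_reduced.shape)            # (200, 2)
print("variance kept:", pca.explained_variance_ratio_)
```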
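Finally, a small feed-forward neural network sketch using scikit-learn's MLPClassifier on the built-in handwritten digits dataset; the two hidden layer sizes (64 and 32) are arbitrary choices for illustration.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Small built-in image dataset: 8x8 pictures of handwritten digits.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A feed-forward network with two hidden layers; during training the
# weights are adjusted whenever the output does not match the true label.
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
net.fit(X_train, y_train)

print("test accuracy:", net.score(X_test, y_test))
```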

Now you have a basic introduction to the most popular artificial intelligence algorithms and some sense of how they are used in practice.

Image source: Turing Artificial Intelligence
