Reflections on 7 Major Classification Algorithms, Prompted by a Piece of Image Recognition Code

Recently, while preparing some materials on machine vision, I implemented a small image recognition example.

You can try implementing it yourself.

```python
# Import the module
import ddddocr

# Instantiate the OCR engine
p = ddddocr.DdddOcr()

# Open the image to be recognized in binary mode
with open('picture/img_1.png', 'rb') as file:
    # Read the raw image bytes
    img = file.read()

# Recognize the characters in the image
res = p.classification(img)
print(res)
```

While writing it, I happened to use a method named `classification`, which prompted me to organize notes on the most common classification algorithms in machine learning, in the hope of helping those learning related content.

A classification task determines which predefined target class an object belongs to. When the target attribute is discrete, the task is classification; when it is continuous, it is regression.

Common classification algorithms include decision trees, rule-based classifiers, nearest neighbor classifiers, naive Bayes classifiers, Bayesian belief networks, artificial neural networks, and support vector machines.

Below are the main characteristics of each algorithm.

(1) Decision Trees

(1) Decision tree induction is a non-parametric method for building classification models: it requires no prior assumptions and does not assume that the class and the other attributes follow any particular probability distribution.

(2) Finding an optimal decision tree is an NP-complete problem, so many decision tree algorithms employ heuristics to guide their search through the hypothesis space.

(3) Even with a large training set, the cost of constructing a decision tree is relatively low.

(4) Decision trees are relatively easy to interpret.

(5) Decision tree algorithms are fairly robust to noise, especially when methods for avoiding overfitting, such as pruning, are employed.

(6) Redundant attributes do not adversely affect the accuracy of decision trees.

(7) Since most decision tree algorithms adopt a top-down recursive partitioning strategy, the number of records becomes smaller as one moves down the tree. A leaf node may end up with too few records to make a statistically significant judgment about the class it represents, a problem known as data fragmentation.

(8) A subtree may be replicated many times within a decision tree, which makes the tree more complex than necessary and harder to interpret.
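
To make these points concrete, here is a minimal sketch of training and inspecting a decision tree. It assumes scikit-learn and its built-in Iris dataset, neither of which appears in the original example:

```python
# A minimal decision tree sketch (scikit-learn and Iris are assumptions, not from the post)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small benchmark dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a tree; limiting max_depth counters overfitting and data fragmentation
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))  # accuracy on held-out data
print(export_text(clf))           # the fitted tree is easy to read as if-then rules
```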

(2) Rule-Based Classification Algorithms

(1) The expressive power of a rule set is almost equivalent to that of a decision tree, since a decision tree can be represented by a set of mutually exclusive and exhaustive rules. Both rule-based classifiers and decision tree classifiers create rectilinear (axis-parallel) partitions of the attribute space and assign a class to each partition.

(2) Rule-based classifiers are often used to produce descriptive models that are easier to interpret, while their performance remains comparable to that of decision tree classifiers.

(3) The class-based rule ordering methods used by many rule-based classifiers (such as RIPPER) are very suitable for handling datasets with imbalanced class distributions.
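
Common Python libraries do not ship RIPPER directly, so here is a hand-written sketch of how an ordered rule list classifies a record; the rules, thresholds, and attribute names below are invented for illustration:

```python
# A hand-rolled ordered rule list (all rules and feature names are illustrative assumptions)
def classify(record):
    # Rules are tried in order; the first rule whose condition fires assigns the class
    if record["petal_length"] < 2.5:
        return "setosa"
    if record["petal_width"] < 1.8:
        return "versicolor"
    # Default rule: fires when no earlier rule matches
    return "virginica"

print(classify({"petal_length": 1.4, "petal_width": 0.2}))  # -> setosa
print(classify({"petal_length": 5.0, "petal_width": 2.3}))  # -> virginica
```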

(3) Nearest Neighbor Classifiers

(1) Nearest neighbor classifiers are a type of instance-based learning technique that uses specific training instances for prediction without maintaining abstractions derived from the data.

(2) Nearest neighbor classifiers are lazy learners: they do not build a model up front, but classifying a test sample is expensive because its similarity to every training sample must be computed individually. Eager learners, by contrast, invest considerable computation in building a model; once the model is built, classifying a test sample is very fast.

(3) Nearest neighbor classifiers are highly sensitive to noise. This is because nearest neighbor classifiers predict based on local information, while decision trees and rule-based classifiers fit global models across the entire input space.

(4) Nearest neighbor classifiers can generate decision boundaries of any shape, providing a more flexible model representation compared to the usually limited linear decision boundaries of decision trees and rule-based classifiers.

(5) Unless appropriate proximity measures and data preprocessing are employed, nearest neighbor classification may yield incorrect predictions.
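
A minimal k-nearest-neighbor sketch, again assuming scikit-learn and the Iris dataset; note the scaling step, since the classifier depends entirely on the proximity measure:

```python
# A minimal k-NN sketch (scikit-learn and Iris are assumptions, not from the post)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature scaling matters: distances dominate the prediction, per point (5)
scaler = StandardScaler().fit(X_train)

# A larger k averages over more neighbors and dampens noise, per point (3)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(X_train), y_train)
print(knn.score(scaler.transform(X_test), y_test))
```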

(4) Naive Bayes Classifiers

(1) Naive Bayes classifiers are robust to isolated noise points, because such points are averaged out when conditional probabilities are estimated from the data. They can also handle missing attribute values by ignoring them during both model building and classification.

(2) This classifier is robust against irrelevant attributes.

(3) Correlated attributes can degrade the performance of naive Bayes classifiers, because the conditional independence assumption no longer holds for them.
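
A minimal naive Bayes sketch under the same assumptions (scikit-learn, Iris); GaussianNB models each attribute as an independent Gaussian within each class:

```python
# A minimal naive Bayes sketch (scikit-learn and Iris are assumptions, not from the post)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each class-conditional P(x_i | y) is estimated as an independent Gaussian
nb = GaussianNB()
nb.fit(X_train, y_train)
print(nb.score(X_test, y_test))
print(nb.predict_proba(X_test[:1]))  # posterior class probabilities for one sample
```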

(5) Bayesian Belief Networks (BBN)

(1) BBN provides a method for capturing prior knowledge in specific domains using graphical models.

(2) Constructing the network can be both time-consuming and labor-intensive; however, once the network structure is determined, adding new variables is straightforward.

(3) Bayesian networks are well-suited to handling incomplete data: an instance with missing attribute values can be handled by summing (or integrating) over all possible values of the missing attributes.

(4) Because data and prior knowledge are combined in a probabilistic manner, this method is very robust against overfitting issues in models.
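
Libraries such as pgmpy implement full belief networks, but the core idea fits in a few lines of plain Python. The two-node network below (Rain -> WetGrass) and all of its probabilities are invented for illustration:

```python
# Inference by enumeration in a tiny belief network: Rain -> WetGrass
# The structure and every probability below are invented for illustration.
P_rain = {True: 0.2, False: 0.8}            # prior P(Rain), the "prior knowledge"
P_wet_given_rain = {True: 0.9, False: 0.1}  # P(WetGrass=True | Rain)

# Joint probabilities for the observed evidence WetGrass=True
joint_true = P_rain[True] * P_wet_given_rain[True]     # P(Rain=T, Wet=T) = 0.18
joint_false = P_rain[False] * P_wet_given_rain[False]  # P(Rain=F, Wet=T) = 0.08

# Marginalizing (summing) over the unobserved Rain, as in point (3)
p_wet = joint_true + joint_false

# Posterior P(Rain=True | WetGrass=True) via Bayes' rule
print(joint_true / p_wet)  # 0.18 / 0.26 ≈ 0.692
```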

(6) Artificial Neural Networks (ANN)

(1) A multilayer neural network with at least one hidden layer is a universal approximator, meaning it can approximate any target function.

(2) ANNs can handle redundant features because weights are automatically learned during training.

(3) Neural networks are very sensitive to noise in training data.

(4) The gradient descent method used for weight learning in ANNs often converges to local minima. A method to avoid local minima is to add a momentum term to the weight update formula.

(5) Training ANNs is a time-consuming process, especially when the number of hidden nodes is large. However, classification of test samples is very fast.
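
A minimal multilayer perceptron sketch, assuming scikit-learn's MLPClassifier and the Iris dataset; it uses stochastic gradient descent with the momentum term mentioned in point (4):

```python
# A minimal MLP sketch (scikit-learn and Iris are assumptions, not from the post)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)

# One hidden layer; SGD with a momentum term in the weight updates, per point (4)
mlp = MLPClassifier(hidden_layer_sizes=(16,), solver='sgd', momentum=0.9,
                    learning_rate_init=0.1, max_iter=1000, random_state=0)
mlp.fit(scaler.transform(X_train), y_train)

# Training is slow; classifying test samples is fast, per point (5)
print(mlp.score(scaler.transform(X_test), y_test))
```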

(7) Support Vector Machines (SVM)

(1) The SVM learning problem can be represented as a convex optimization problem, allowing known efficient algorithms to find the global minimum of the objective function. In contrast, other classification methods typically adopt a greedy learning strategy to search the hypothesis space, which generally only yields local optimal solutions.

(2) SVM controls model capacity by maximizing the margin of the decision boundary.

(3) By introducing a dummy variable for each value of a categorical attribute, SVM can also be applied to categorical data.
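
A minimal SVM sketch, assuming scikit-learn and pandas; the toy data and its categorical attribute are invented to illustrate the dummy-variable encoding from point (3):

```python
# A minimal SVM sketch (scikit-learn, pandas, and the toy data are assumptions)
import pandas as pd
from sklearn.svm import SVC

# A toy dataset with one categorical attribute (values invented for illustration)
df = pd.DataFrame({
    "height": [1.6, 1.8, 1.7, 1.9, 1.5, 1.75],
    "color":  ["red", "blue", "red", "blue", "red", "blue"],
    "label":  [0, 1, 0, 1, 0, 1],
})

# One dummy (0/1) variable per categorical value, as in point (3)
X = pd.get_dummies(df[["height", "color"]], columns=["color"])
y = df["label"]

# Maximum-margin classifier; the underlying optimization problem is convex, per point (1)
svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)
print(svm.predict(X))
```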
