Implementing K-Nearest Neighbors Algorithm in Python

The K-Nearest Neighbors (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms.

The idea behind it is this: if the majority of a sample's k nearest neighbors in the feature space belong to a certain category, then the sample is assigned to that category as well.

The following diagram assumes that the blue square represents those who like the Python programming language, while the red triangle represents those who like Java.

[Figure: blue squares (Python), red triangles (Java), and a green circle whose preference is unknown]

Based on the KNN algorithm principle discussed earlier, can you guess which programming language the green circle might prefer?

Answer: Java. In the figure, K=3, and 2 of the green circle's 3 nearest neighbors prefer Java, so we predict that the green circle also prefers Java.

Since this is a prediction, it is not guaranteed to be correct; it is only an inference drawn from existing data.
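
The vote itself can be sketched in a few lines. This is only an illustration of the K=3 case described in the text; the list `neighbor_labels` is a hypothetical stand-in for the labels of the three nearest neighbors.

```python
from collections import Counter

# Labels of the 3 nearest neighbors of the green circle (K=3),
# written out by hand to mirror the example in the text.
neighbor_labels = ["Java", "Java", "Python"]

# Count the votes and take the most frequent label.
winner, votes = Counter(neighbor_labels).most_common(1)[0]
print(winner, votes)
```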

Suppose we have a dataset that records the preferred programming language at various locations.

Each record consists of a longitude, a latitude, and the preferred language at that location.

Our task is to build a prediction model from this data that outputs a predicted language for any new longitude/latitude point.

Prediction results:

Loading data:
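
The original loading code is not reproduced here, so the following is only a minimal sketch. It assumes the records are stored as comma-separated text in the order longitude, latitude, language; the two sample rows and the helper name `load_points` are hypothetical and exist purely for illustration.

```python
import csv
import io

# Hypothetical sample rows: longitude, latitude, preferred language.
SAMPLE = """\
-122.3,47.5,Python
-73.9,40.7,Java
"""

def load_points(text):
    """Parse CSV text into a list of ((longitude, latitude), language) pairs."""
    points = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        lon, lat, language = row
        points.append(((float(lon), float(lat)), language))
    return points

label_points = load_points(SAMPLE)
print(label_points)
```

Keeping each record as a `(point, label)` pair makes it easy to sort records by the distance of their point from a query later on.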

After loading, first define a function that calculates the distance between two points:
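
A minimal sketch of such a function, assuming plain Euclidean distance on (longitude, latitude) pairs. The original code is not shown, so both the name `distance` and the choice of metric are assumptions; treating longitude and latitude as flat coordinates is only a rough approximation of real geographic distance.

```python
import math

def distance(p, q):
    """Euclidean distance between two (longitude, latitude) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

print(distance((0, 0), (3, 4)))
```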

The smaller the result, the closer the two points are, and the more similar they are considered to be. Many similarity measures work on the same principle; image similarity, for example, can be computed this way and used for image search and recommendation.

Then define the prediction function, knn_classify, where k is the number of neighbors, label_points is the training data, and new_point is the point to classify.
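
One way to write knn_classify with the parameters described above is sketched below. The original implementation is not shown, so the body is an assumption: it sorts the training data by Euclidean distance from new_point, keeps the k nearest labels, and returns the most common one. The training points in the usage lines are hypothetical values chosen for illustration.

```python
import math
from collections import Counter

def distance(p, q):
    """Euclidean distance between two (longitude, latitude) pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def knn_classify(k, label_points, new_point):
    """Predict a label for new_point by majority vote of its k nearest neighbors.

    label_points is a list of ((longitude, latitude), label) pairs.
    """
    # Order the training records from nearest to farthest.
    by_distance = sorted(label_points,
                         key=lambda record: distance(record[0], new_point))
    # Keep only the labels of the k nearest records.
    k_nearest_labels = [label for _, label in by_distance[:k]]
    # The most frequent label among them is the prediction.
    return Counter(k_nearest_labels).most_common(1)[0][0]

# Hypothetical training data, for illustration only.
training = [
    ((-122.3, 47.5), "Python"),
    ((-122.2, 47.6), "Python"),
    ((-73.9, 40.7), "Java"),
    ((-74.0, 40.8), "Java"),
    ((-73.8, 40.6), "Java"),
]

print(knn_classify(3, training, (-74.1, 40.7)))
print(knn_classify(1, training, (-122.0, 47.0)))
```

Sorting the whole dataset for every query is simple but takes O(n log n) time per prediction, which is fine for a small dataset like this one. The sketch also does not handle ties specially; with an even k, a tied vote simply returns whichever label `Counter` reports first.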

Finally, the classifier can be used to predict the language for a new point.

To make the result clearer, it can be displayed in a chart.

The original data points are marked in red, green, and blue, while the black point represents the predicted point.
