KNN Algorithm Implementation in Python

KNN (K-Nearest Neighbors) algorithm is a simple yet effective classification and regression algorithm. Its basic idea is to classify or predict by calculating the distance between samples. Below is an example of KNN algorithm implementation in Python, including data preparation, model training, and prediction.

1. Install Required Libraries

If you haven’t installed scikit-learn and numpy, you can use the following command to install them:

pip install numpy scikit-learn

2. Implementation of KNN Algorithm

Here is a complete example of classification using the KNN algorithm, utilizing the famous Iris dataset.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 1. Load dataset
iris = datasets.load_iris()
X = iris.data  # Features
y = iris.target  # Labels

# 2. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Feature normalization
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 4. Create KNN classifier
k = 3  # Choose K value
knn = KNeighborsClassifier(n_neighbors=k)

# 5. Train the model
knn.fit(X_train, y_train)

# 6. Make predictions
y_pred = knn.predict(X_test)

# 7. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))

# 8. Visualization (optional)
# Here we only plot the scatter plot of the first two features
plt.figure(figsize=(10, 6))
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_pred, cmap='viridis', marker='o', edgecolor='k', s=100)
plt.title('KNN Classification on Iris Dataset')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Code Explanation

  1. Load Dataset: Use sklearn.datasets to load the Iris dataset.

  2. Split into Training and Testing Sets: Use train_test_split to split the dataset into training and testing sets, with the test set accounting for 20%.

  3. Feature Normalization: Use StandardScaler to normalize the features, making their mean 0 and variance 1.

  4. Create KNN Classifier: Use KNeighborsClassifier to create the KNN classifier and set the K value.

  5. Train the Model: Train the KNN model using the training data.

  6. Make Predictions: Make predictions using the testing data.

  7. Evaluate the Model: Calculate the accuracy of the model and output the classification report and confusion matrix.

  8. Visualization: Plot a scatter plot of the testing set to show the classification results (only using the first two features for visualization).

Conclusion

KNN is a simple and easy-to-use classification algorithm, suitable for beginners to learn and understand the basic concepts of machine learning. Through the above example, you can learn how to implement the KNN algorithm in Python and apply it to real datasets. If you have any questions or need further assistance, feel free to ask!

Leave a Comment