Comparison of Five Image Classification Methods: KNN, SVM, BPNN, CNN, and Transfer Learning

Selected from | Medium

Translated by | Machine Heart

Contributors | Jiang Siyuan, Huang Xiaotian, Wu Pan

Image classification is one of the fundamental research topics in artificial intelligence, and researchers have developed a large number of algorithms for it. Recently, Shiyu Mou published an article on Medium experimentally comparing five image classification methods: KNN, SVM, BP neural networks, CNN, and transfer learning. The dataset and code for this project have also been published on GitHub.

Project address: https://github.com/Fdevmsy/Image_Classification_with_5_methods

Image classification, as the name suggests, is the problem of assigning an input image a label from a fixed set of categories. It is one of the core problems in computer vision. Although it sounds simple, image classification has a large number of practical applications.

Traditional Methods: Feature Description and Detection

Hand-crafted feature description and detection may work for some simple sample tasks, but in reality things are much more complicated.

Therefore, instead of specifying what each category looks like directly in code, we use machine learning: we provide the computer with many examples of each category, then develop learning algorithms that observe these examples and learn what each category looks like.

However, image classification is complex enough that it often involves deep learning models such as CNNs (convolutional neural networks). We know that many of the algorithms we learn in class (such as KNN and SVM) are usually good at data mining, but they are not the best choice for image classification.

Thus, we will compare the algorithms learned in class with CNN and transfer learning.

Goals

Our goals are:

1. To compare KNN, SVM, and BP neural networks with CNN and transfer learning, the methods used in industry for image recognition.

2. To gain experience in deep learning.

3. To explore machine learning frameworks through TensorFlow.

System Design & Implementation Details

Algorithms and Tools

The five methods used in this project are KNN, SVM, BP neural networks, CNN, and transfer learning.

The entire project can be divided into three categories of methods:

  • First category: Use KNN, SVM, and BP neural networks, the algorithms we learned in class. These algorithms are powerful and easy to implement; we mainly use sklearn to implement them.

  • Second category: Although traditional multilayer perceptron models have been successfully applied to image recognition, their fully connected structure means the number of parameters grows rapidly with image resolution, so they do not scale well to higher-resolution images. Therefore, we built a CNN using the deep learning framework TensorFlow.

  • Third category: Retrain the last layer of a pre-trained deep neural network, Inception V3, also provided by TensorFlow. Inception V3 was trained for the ImageNet Large Scale Visual Recognition Challenge using data from 2012. This is a standard task in computer vision, in which models try to classify images into 1,000 categories such as zebras, Dalmatians, and dishwashers. To retrain the pre-trained network, we need to make sure our own dataset was not part of its pre-training.

Implementation

First category: Preprocess the dataset and use sklearn to implement KNN, SVM, and BP neural networks.

First, we defined two different preprocessing functions using the OpenCV package: the first, image_to_feature_vector, resizes an image and flattens it into a row of raw pixel values; the second, extract_color_histogram, computes a 3D color histogram in the HSV color space, normalizes it with cv2.normalize, and flattens the result.
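
A minimal sketch of these two helpers, reconstructed from the description above and assuming OpenCV's Python bindings; the 128×128 size and 32-bin histograms are the settings discussed later in the Evaluation section:

```python
import cv2

def image_to_feature_vector(image, size=(128, 128)):
    # Resize the image and flatten it into a row of raw pixel intensities.
    return cv2.resize(image, size).flatten()

def extract_color_histogram(image, bins=(32, 32, 32)):
    # Compute a 3D color histogram in the HSV color space, normalize it,
    # and flatten the result into a 1D feature vector.
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins,
                        [0, 180, 0, 256, 0, 256])
    cv2.normalize(hist, hist)
    return hist.flatten()
```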

Next, we defined several command-line arguments for the program to parse. Since we want to test accuracy on the whole dataset as well as on subsets with different numbers of labels, we pass the dataset in as an argument. We also expose the number of neighbors for the k-NN method as an argument.

After that, we started extracting the features of each image in the dataset and placing them into arrays. We used cv2.imread to read each image and extracted the class label from the image filename. In our dataset, the filenames all follow the format "label.number.jpg", so we can easily extract each image's class label. We then used the two functions above to extract both types of features, appending the raw-pixel vectors to the array rawImages and the histograms to the array features, while the extracted labels were appended to the array labels.
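
A sketch of this extraction loop, assuming the images live in a folder whose path is stored in `dataset` and that filenames follow the "label.number.jpg" convention:

```python
import os
from glob import glob

rawImages, features, labels = [], [], []
for imagePath in sorted(glob(os.path.join(dataset, '*.jpg'))):
    image = cv2.imread(imagePath)
    # The class label is the part of the filename before the first dot.
    label = os.path.basename(imagePath).split('.')[0]
    rawImages.append(image_to_feature_vector(image))
    features.append(extract_color_histogram(image))
    labels.append(label)
```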

The next step is to split the dataset using the function train_test_split imported from the sklearn package. The split with the suffixes RI and RL is the result of splitting rawImages and labels; the other split uses features and labels. We used 85% of the dataset as the training set and the remaining 15% as the test set.
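
In code, the two splits might look like this (the random seed is illustrative):

```python
from sklearn.model_selection import train_test_split

# 85% of the data for training, 15% for testing, for both feature types.
(trainRI, testRI, trainRL, testRL) = train_test_split(
    rawImages, labels, test_size=0.15, random_state=42)
(trainFeat, testFeat, trainLabels, testLabels) = train_test_split(
    features, labels, test_size=0.15, random_state=42)
```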

Finally, we applied the KNN, SVM, and BP neural network functions to the data: for KNN we used KNeighborsClassifier, for SVM we used SVC, and for BP neural networks we used MLPClassifier.

Second category: Build CNN using TensorFlow. The entire purpose of TensorFlow is to allow you to build a computation graph (using languages like Python) and then execute that graph in C++ (which is more efficient than Python under the same computational load).

TensorFlow can also automatically compute the gradients needed to optimize graph variables, allowing the model to perform better. This is because the graph is composed of simple mathematical expressions, so the gradients of the entire graph can be calculated using the chain rule of derivatives.

A TensorFlow graph consists of the following parts, each of which will be detailed below:

  • Placeholder variables for inputting data into the graph.

  • Variables to be optimized so as to improve the performance of the convolutional network.

  • The mathematical formulas of the convolutional network.

  • Cost metrics that can guide variable optimization.

  • Optimization methods for updating variables.

The CNN architecture consists of a stack of different layers that transform the input into an output through differentiable functions.

Thus, in our implementation, the first layer holds the input image, and we then built three convolutional layers with 2×2 max pooling and rectified linear units (ReLU); a sketch of one such layer follows the two lists below. The input is a 4-dimensional tensor:

  • Image index.

  • Y-axis of each image.

  • X-axis of each image.

  • Channel of each image.

The output is another 4-dimensional tensor:

  • Image index, the same as the input.

  • Y-axis of each image. If using 2×2 pooling, then the height and width of the input image are divided by 2.

  • X-axis of each image. Same as above.

  • Channels generated by convolution filters.
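
Here is a minimal sketch of one such convolutional layer in TensorFlow 1.x, in the style of the tutorial cited in the references; the helper name and weight initializers are illustrative:

```python
import tensorflow as tf

def new_conv_layer(input, num_input_channels, filter_size, num_filters):
    # Filter weights have shape [height, width, in_channels, out_channels].
    shape = [filter_size, filter_size, num_input_channels, num_filters]
    weights = tf.Variable(tf.truncated_normal(shape, stddev=0.05))
    biases = tf.Variable(tf.constant(0.05, shape=[num_filters]))

    # Convolve, then add a bias for each output channel.
    layer = tf.nn.conv2d(input=input, filter=weights,
                         strides=[1, 1, 1, 1], padding='SAME') + biases
    # 2x2 max pooling halves the image height and width.
    layer = tf.nn.max_pool(value=layer, ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1], padding='SAME')
    # ReLU sets all negative values to zero.
    return tf.nn.relu(layer)
```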

Next, we constructed two fully connected layers at the end of the network. Their input is a 2-dimensional tensor of shape [num_images, num_inputs]; the output is another 2-dimensional tensor of shape [num_images, num_outputs].

However, to connect the convolutional layers to the fully connected layers, we need a flatten layer that reshapes the 4-dimensional tensor into the 2-dimensional tensor the fully connected layer expects.
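
A sketch of such a flatten layer, again following the cited tutorial's style:

```python
import tensorflow as tf

def flatten_layer(layer):
    # Collapse [num_images, height, width, channels] into
    # [num_images, height * width * channels].
    num_features = layer.get_shape()[1:4].num_elements()
    return tf.reshape(layer, [-1, num_features]), num_features
```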

The end of the CNN is usually a softmax layer, which can normalize the output from the fully connected layer, so that each element is limited between 0 and 1, and the sum of all elements equals 1.

To optimize the training results, we need a cost metric and minimize the cost in each iteration. The cost function we used here is cross-entropy (tf.nn.softmax_cross_entropy_with_logits()), and we take the average of the cross-entropy across all image classifications. The optimization method is tf.train.AdamOptimizer(), which is a high-level form of gradient descent with a tunable learning rate.
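
Assuming `logits` is the output of the last fully connected layer and `y_true` holds the one-hot true labels (names are illustrative), the cost and optimizer can be set up as follows:

```python
import tensorflow as tf

# Placeholder for one-hot true labels (10 cat breeds in this project).
y_true = tf.placeholder(tf.float32, shape=[None, 10], name='y_true')

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=y_true)
cost = tf.reduce_mean(cross_entropy)  # average over the batch
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)
```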

Third category: Retrain Inception V3. Modern object recognition models have millions of parameters and can take weeks to fully train. Transfer learning is a shortcut: it takes a model already trained on a related classification dataset (such as ImageNet) and retrains only the weights for the new categories. Although such a model does not perform as well as a fully trained one, it is very efficient for many applications, since it requires no GPU and can finish training on a laptop in half an hour.

Readers can click on the link to further understand the training process of transfer learning: https://www.tensorflow.org/tutorials/image_retraining

First, we need to obtain the pre-trained model, remove its old top layer, and retrain a new output layer on our dataset. Although not all of our cat breeds are represented in the original ImageNet dataset the full model was trained on, the magic of transfer learning is that the low-level features of a trained model can be reused for many recognition tasks without significant changes. We then analyze all the local images and compute the "bottleneck" values for each one; the bottleneck is the layer just before the final output layer, and its output serves as a compact summary of each image. Because every image is reused many times during training and computing each bottleneck value takes a considerable amount of time, we cache these bottleneck values to avoid repeated computation.

The script runs for 4,000 training steps. Each step randomly selects 10 images from the training set, looks up their bottleneck values in the cache, and feeds them into the last layer to obtain predictions. These predictions are then compared with the true labels, and the weights of the last layer are updated through backpropagation.
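
Conceptually, the retrained part of the network is just a new softmax layer on top of the cached bottleneck vectors. A minimal sketch of this idea (not the actual retrain script; the sizes and learning rate are illustrative):

```python
import tensorflow as tf

BOTTLENECK_SIZE = 2048  # size of Inception V3's bottleneck feature vector
NUM_CLASSES = 10        # our 10 cat breeds

bottlenecks = tf.placeholder(tf.float32, [None, BOTTLENECK_SIZE])
labels = tf.placeholder(tf.float32, [None, NUM_CLASSES])

# The only trainable part: one fully connected softmax layer.
weights = tf.Variable(
    tf.truncated_normal([BOTTLENECK_SIZE, NUM_CLASSES], stddev=0.001))
biases = tf.Variable(tf.zeros([NUM_CLASSES]))
logits = tf.matmul(bottlenecks, weights) + biases

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
```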

Experiments

Dataset

Oxford-IIIT Pet Dataset: http://www.robots.ox.ac.uk/~vgg/data/pets/

This dataset contains 25 breeds of dogs and 12 breeds of cats. Each category has 200 photos. We will only use 10 breeds of cats in this project.

In this project, the categories we used are [Sphynx cat, Siamese cat, Ragdoll cat, Persian cat, Maine Coon, British Shorthair, Bombay cat, Burmese cat, Bengal cat, Abyssinian cat].

Therefore, we have a total of 2000 images in the dataset. Although the sizes of the images are different, we can resize them to a fixed size such as 64×64 or 128×128.

Preprocessing

In this project, we mainly use OpenCV to preprocess the images, such as reading images into arrays or resizing them to our required sizes.

A common way to improve image training results is to deform, crop, or adjust the brightness of the training inputs in random ways. This has the advantage of effectively expanding the training dataset through all the possible variants of the same images, and it tends to help the network learn to cope with the distortions that occur in real life.

For details, see: https://github.com/aleju/imgaug.
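
For example, with the linked imgaug library a simple augmentation pipeline might look like this (the specific augmenters are illustrative, not necessarily the ones used in this project):

```python
import imgaug.augmenters as iaa

seq = iaa.Sequential([
    iaa.Fliplr(0.5),             # horizontally flip 50% of the images
    iaa.Crop(percent=(0, 0.1)),  # random crops of up to 10%
    iaa.Multiply((0.8, 1.2)),    # darken or brighten each image
])

# `images` is a list or array of HxWxC uint8 images.
augmented = seq.augment_images(images)
```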

Evaluation

First category: Preprocess the dataset and use sklearn to apply KNN, SVM, and BP neural networks.

There are many parameters in the program that can be adjusted: in the image_to_feature_vector function, we set the image size to 128×128; we also tried training with other sizes (like 8×8, 64×64, 256×256). We found that although larger image sizes yield better results, larger images also increase execution time and memory requirements. Therefore, we finally decided to use an image size of 128×128, as it is not too large while ensuring accuracy.

In the extract_color_histogram function, we set the number of histogram bins per channel to 32, 32, 32; we also tried 8, 8, 8 and 64, 64, 64. Although higher bin counts can yield better results, they also require longer execution times, so we settled on 32, 32, 32 as appropriate.

For the dataset, we trained on three configurations. The first is a subset with 400 images and 2 labels; the second is a subset with 1,000 images and 5 labels; the last is the full dataset with 1,997 images and 10 labels. We passed the different datasets to the program as arguments.

In KNeighborsClassifier, we only varied the number of neighbors and recorded the classification results for the best k on each dataset. All other parameters were left at their defaults.

In MLPClassifier, we set 50 neurons per hidden layer. We did test multiple hidden layers, but this made no significant difference to the final results. The maximum number of iterations was set to 1000, with a convergence tolerance of 1e-4 to ensure the model converges. We left the L2 penalty parameter alpha at its default value, set the random state to 1, and used the 'sgd' solver with a learning rate of 0.1.

In SVC, the maximum number of iterations was set to 1000, and the class weights were set to ‘balanced’.
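
Putting the settings above together, the three classifiers might be constructed like this (parameters not mentioned in the text are left at their sklearn defaults, and the k value shown is just a placeholder):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

knn = KNeighborsClassifier(n_neighbors=5)  # k tuned per dataset
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, tol=1e-4,
                    solver='sgd', learning_rate_init=0.1, random_state=1)
svm = SVC(max_iter=1000, class_weight='balanced')

# Example: train the MLP on histogram features and report test accuracy.
mlp.fit(trainFeat, trainLabels)
print('MLP accuracy: {:.2f}%'.format(100 * mlp.score(testFeat, testLabels)))
```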

The runtime of our program was not too long, taking about 3 to 5 minutes for each of our three datasets.

Second category: Build a CNN using TensorFlow.

Computing the model's gradients over the entire large dataset would take a long time, so in each iteration of the optimizer we updated the weights using only a small batch of images, with batch sizes of 32 or 64. The dataset was divided into a training set of 1,600 images, a validation set of 400 images, and a test set of 300 images.
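
The optimization loop then draws a random batch at each step. A sketch, where `x`, `y_true`, `optimizer`, and `session` are assumed to be the graph objects and TensorFlow session defined earlier:

```python
import numpy as np

batch_size = 64

def optimize(num_iterations):
    # train_images / train_labels: numpy arrays of the 1600 training examples.
    for _ in range(num_iterations):
        idx = np.random.choice(len(train_images), batch_size, replace=False)
        feed = {x: train_images[idx], y_true: train_labels[idx]}
        session.run(optimizer, feed_dict=feed)
```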

The model also has many parameters that need adjustment.

The first is the learning rate. A good learning rate is small enough to allow the model to converge easily, yet large enough to prevent the convergence speed from being too slow. Therefore, we chose 1 x 10^-4.

The second parameter that needs adjustment is the size of the images fed into the network. We trained with both 64×64 and 128×128 image sizes, and the results showed that larger sizes yield higher model accuracy, but at the cost of longer run times.

Then there’s the number of layers and the shape of the neural network. However, in reality, there are so many parameters to adjust in this area that it is difficult to find an optimal value among all parameters.

According to many resources online, we found that a large part of the parameter selection for building neural networks is based on existing experience.

Initially, we aimed to build a fairly complex neural network with the following parameters:

  • Convolutional layer 1: filter_size1 = 5, num_filters1 = 64

  • Convolutional layer 2: filter_size2 = 5, num_filters2 = 64

  • Convolutional layer 3: filter_size3 = 5, num_filters3 = 128

  • Fully connected layer 1: fc1_size = 256

  • Fully connected layer 2: fc2_size = 256

We used 3 convolutional layers and 2 fully connected layers, a fairly complex structure.

However, the result was overfitting: with such a complex network, training accuracy reached 100% after 1000 iterations, while test accuracy was only 30%. At first we were puzzled about why the model was overfitting and began adjusting parameters at random, but the model's performance did not improve. A few days later, I happened to read an article by Google discussing deep learning: https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a. It points out a problem with the project it examines: "A technical issue is that if there are fewer than 2000 samples, it is insufficient to train and test convolutional neural networks like AlexNet without encountering overfitting." This made me realize that our dataset was indeed too small and our network architecture too complex, which caused the overfitting.

Our dataset contains exactly 2000 images.

Therefore, I began reducing the number of layers in the neural network and the size of the convolution kernels. I tried many parameter settings; here is the final network architecture we used:

  • Convolutional layer 1: filter_size1 = 5, num_filters1 = 64

  • Convolutional layer 2: filter_size2 = 3, num_filters2 = 64

  • Fully connected layer 1: fc1_size = 128 (number of neurons)

  • Fully connected layer 2: fc2_size = 128 (number of neurons)

  • Number of color channels: num_channels = 3 (3 for RGB images)

We used only 2 small convolutional layers and 2 fully connected layers. The training results were still not good: overfitting appeared again after 4000 iterations, though the test accuracy was about 10% higher than with the previous model.

We are still looking for solutions, but one obvious reason is that our dataset is indeed too small, and we do not have enough time to make more improvements.

As a final result, we achieved about 43% accuracy after 5000 iterations, which took an hour and a half. In fact, we were quite frustrated with this result, so we prepared to use another standard dataset, CIFAR-10.

The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 classes, with 6,000 images per class. This dataset includes 50,000 training images and 10,000 test images.

We used the same neural network architecture as above, and after 10 hours of training, we achieved 78% accuracy on the test set.

Third category: Retrain Inception V3. We randomly selected some images for training and used another batch for validation.

This model also has many parameters that need adjustment.

The first is the number of training steps, with the default value being 4000 steps. We can also increase or decrease it based on the situation to quickly obtain an acceptable result.

Next is the learning rate, which controls the magnitude of the updates to the last layer during training. Intuitively, a smaller learning rate means learning takes longer, but it may ultimately converge to better overall accuracy. The training batch size controls how many images are examined in one training step; because the learning rate is applied per batch, larger batches require a smaller learning rate to achieve the same overall effect.

Since deep learning tasks typically require long run times, we do not want the model to perform poorly after training for several hours. Therefore, we need to frequently obtain reports on validation accuracy. This way, we can also avoid overfitting. The dataset split allocates 80% of the images for primary training, 10% for a validation set that is frequently checked during training, and the remaining 10% for the final test set to predict the classifier’s performance in the real world.

Results

First category: Preprocess the dataset and use sklearn to implement KNN, SVM, and BP neural networks.

The results are shown in the table below. Since the SVM results were very poor, even lower than random guessing, we will not display its results.

(Table: classification accuracies of k-NN and the MLP classifier, using raw pixels and color histograms, on the 2-label, 5-label, and 10-label datasets.)

From the results, we see:

  • For k-NN, the accuracies of raw pixels and histograms are roughly comparable. On the 5-label subset, the histogram accuracy is slightly higher than that of raw pixels; overall, however, raw pixels give better results.

  • For the MLP neural network classifier, the accuracy of raw pixels is far lower than that of histograms. On the entire dataset (10 labels), raw-pixel accuracy is even below random guessing.

  • Neither of these sklearn methods performs well: on the entire dataset (10 labels), classification accuracy is only about 24%. These results indicate that these sklearn methods are inadequate for classifying complex images with many categories. They do improve on random guessing, just not by enough.

Based on the above results, we found that to improve accuracy, it is necessary to use some deep learning methods.

Second category: Build CNN using TensorFlow. As mentioned above, we cannot obtain good results due to overfitting.

Training typically took half an hour, but because the model overfit, we do not consider this runtime very meaningful. Comparing with the first category of methods, however, we see that even though the CNN overfit the training data, it still obtained better results.

Third category: Retrain Inception V3

The entire training process took no more than 10 minutes. We achieved excellent results and truly witnessed the power of deep learning and transfer learning.

Conclusion

Based on the above comparisons, we can see:

  • KNN, SVM, and BP neural networks are not adequate for image classification tasks like this one.

  • Although our CNN overfit, it still performed better than the classroom methods.

  • Transfer learning is highly efficient and powerful for image classification problems. It is accurate and fast; it can finish training in a short time without a GPU; and even with a small dataset it achieves good results and reduces the likelihood of overfitting.

We learned a lot from this image classification task, which is quite different from the other classification tasks in class: the datasets are large and dense, very complex networks are needed, and most methods rely on GPU computing power.

Experience:

  • Crop or resize images to make them smaller

  • Randomly select a small batch in each iteration of training

  • Randomly select a small batch from the validation set for validation, and record validation scores frequently during training

  • Image augmentation can be used to process images, increasing the size of the dataset

  • For image classification tasks, we need a dataset larger than 200 images × 10 categories; the CIFAR-10 dataset contains 60,000 images

  • More complex networks require larger datasets for training

  • Be cautious of overfitting

Note: The first method was implemented by Ji Tong: https://github.com/JI-tong

References

1. CS231n Convolutional Neural Networks for Visual Recognition: http://cs231n.github.io/convolutional-networks/

2. TensorFlow Convolutional Neural Networks: https://www.tensorflow.org/tutorials/deep_cnn

3. How to Retrain Inception’s Final Layer for New Categories: https://www.tensorflow.org/tutorials/image_retraining

4. k-NN classifier for image classification: http://www.pyimagesearch.com/2016/08/08/k-nn-classifier-for-image-classification/

5. Image Augmentation for Deep Learning With Keras: http://machinelearningmastery.com/image-augmentation-deep-learning-keras/

6. Convolutional Neural Network TensorFlow Tutorial: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/02_Convolutional_Neural_Network.ipynb
