Handwritten Digit Recognition and Application Based on TensorFlow Deep Learning

Abstract:

Handwritten digit recognition is an important component of artificial intelligence recognition systems. Because handwriting varies from person to person, the accuracy of existing recognition systems is relatively low. This paper implements handwritten digit recognition and its application on the TensorFlow deep learning framework. First, the TensorFlow deep learning framework is set up, and the structures of the Softmax and Convolutional Neural Network (CNN) models are analyzed. Then, both models are trained on the 60,000 training samples of the MNIST handwritten digit dataset and compared on its 10,000 test samples. Finally, the best-performing model is ported to the Android platform. Experimental results show that the recognition rate of the TensorFlow deep learning CNN model reaches 99.17%, 7.6 percentage points higher than the traditional Softmax model, which provides research value for the development of artificial intelligence recognition systems.

Citation format: Huang Rui, Lu Xuming, Wu Yilin. Handwriting digit recognition and application based on TensorFlow deep learning[J]. Application of Electronic Technique, 2018, 44(10): 6-10.


0 Introduction

With the development of technology, artificial intelligence recognition has been widely applied in many fields and has pushed computer applications toward greater intelligence. On the one hand, artificial intelligence models represented by deep learning and neural networks have drawn wide attention from researchers in China and abroad. On the other hand, open-source artificial intelligence and machine learning systems provide open technical platforms that promote artificial intelligence research. This paper builds Softmax and CNN models on TensorFlow, the second-generation open-source artificial intelligence platform, and uses them to recognize handwritten digits.

LeCun et al. proposed LeNet-5, a multi-layer neural network for recognizing the handwritten digits 0 to 9. Trained with the back-propagation (BP) algorithm, it is one of the earliest applications of CNNs[1-2]. With the rise of artificial intelligence image recognition, CNNs have become a research hotspot, applied mainly to image classification[3], object detection[4], object tracking[5], and text recognition[6]. Architectures such as AlexNet[7], GoogLeNet[8], and ResNet[9] have achieved significant success in recent years.

This paper, based on Google’s second-generation open-source artificial intelligence platform TensorFlow, compares and verifies the Softmax regression algorithm and the CNN model, and finally applies the trained model on the Android platform.

1 Introduction to TensorFlow

On November 9, 2015, Google released TensorFlow, its second-generation artificial intelligence learning system, and made it open source[10]. "Tensor" refers to N-dimensional arrays, and "Flow" refers to computation based on dataflow graphs: TensorFlow expresses a computation as tensors flowing from one end of a graph to the other. TensorFlow supports a variety of deep neural network models, including Long Short-Term Memory (LSTM) networks, Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). The basic architecture of TensorFlow is shown in Figure 1.

As shown in Figure 1, the basic architecture of TensorFlow is divided into a front end and a back end. The front end provides a programming environment that supports multiple languages and accesses the back-end programming model through system API calls. The back end provides the runtime environment, which consists of the distributed runtime, the kernels, the network layer, and the device layer.

2 Softmax Regression

The Softmax regression algorithm extends binary logistic regression to multi-class classification. Suppose the training set contains m samples belonging to K classes. It can then be expressed by formula (1):
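In the standard notation for Softmax regression, which matches the symbols defined in the next paragraph, formula (1) is the set of m labeled pairs. This is the usual textbook form, given as a reconstruction:

$$
\left\{\left(x^{(1)}, y^{(1)}\right), \left(x^{(2)}, y^{(2)}\right), \dots, \left(x^{(m)}, y^{(m)}\right)\right\} \tag{1}
$$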

Here $x^{(i)} \in \mathbb{R}^{n+1}$ and $y^{(i)} \in \{1, 2, \dots, K\}$, where $n+1$ is the dimension of the feature vector $x$. For a given input $x$, the $K$ estimated class probabilities are given by formula (2):
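In the standard form of Softmax regression, each class has one parameter vector $\theta_j \in \mathbb{R}^{n+1}$. Formula (2) then reads as follows. This is the usual textbook form, not a copy of the authors' typeset equation:

$$
h_\theta(x) =
\begin{bmatrix} p(y = 1 \mid x; \theta) \\ p(y = 2 \mid x; \theta) \\ \vdots \\ p(y = K \mid x; \theta) \end{bmatrix}
= \frac{1}{\sum_{j=1}^{K} e^{\theta_j^{\mathsf{T}} x}}
\begin{bmatrix} e^{\theta_1^{\mathsf{T}} x} \\ e^{\theta_2^{\mathsf{T}} x} \\ \vdots \\ e^{\theta_K^{\mathsf{T}} x} \end{bmatrix} \tag{2}
$$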

The Softmax regression model is obtained by optimizing the parameters $\theta_1, \theta_2, \dots, \theta_K$ with gradient descent. Its implementation in TensorFlow is shown in Figure 2.
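The sketch below shows this optimization in plain Java. It assumes batch gradient descent on the average cross-entropy cost, using the standard gradient $\nabla_{\theta_j} J = -\frac{1}{m}\sum_i x^{(i)}\left(1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta)\right)$. Labels here are 0-based rather than 1 to K. The toy data, the learning rate, and the class name `SoftmaxGD` are illustrative choices, not taken from the paper.

```java
/** Batch gradient descent for Softmax regression (illustrative sketch, 0-based labels). */
final class SoftmaxGD {

    /** p_j = exp(theta_j . x) / sum_k exp(theta_k . x) for one sample x. */
    static double[] probabilities(double[][] theta, double[] x) {
        double[] p = new double[theta.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < theta.length; j++) {
            for (int d = 0; d < x.length; d++) p[j] += theta[j][d] * x[d];
            max = Math.max(max, p[j]);
        }
        double sum = 0.0;
        for (int j = 0; j < p.length; j++) {
            p[j] = Math.exp(p[j] - max); // subtracting the max avoids overflow, same result
            sum += p[j];
        }
        for (int j = 0; j < p.length; j++) p[j] /= sum;
        return p;
    }

    /** Average cost J(theta) = -(1/m) * sum_i log p_{y(i)}(x(i)). */
    static double cost(double[][] theta, double[][] xs, int[] ys) {
        double total = 0.0;
        for (int i = 0; i < xs.length; i++) total -= Math.log(probabilities(theta, xs[i])[ys[i]]);
        return total / xs.length;
    }

    /** One step: theta_j -= lr * grad_j, grad_j = -(1/m) * sum_i x(i) * (1{y(i) = j} - p_j(x(i))). */
    static void step(double[][] theta, double[][] xs, int[] ys, double lr) {
        int m = xs.length;
        double[][] grad = new double[theta.length][theta[0].length];
        for (int i = 0; i < m; i++) {
            double[] p = probabilities(theta, xs[i]);
            for (int j = 0; j < theta.length; j++) {
                double err = (ys[i] == j ? 1.0 : 0.0) - p[j];
                for (int d = 0; d < xs[i].length; d++) grad[j][d] -= xs[i][d] * err / m;
            }
        }
        for (int j = 0; j < theta.length; j++)
            for (int d = 0; d < theta[j].length; d++) theta[j][d] -= lr * grad[j][d];
    }

    public static void main(String[] args) {
        // Toy data with K = 3; each x starts with a constant 1, so theta_j includes a bias term.
        double[][] xs = {{1, -2}, {1, -1.5}, {1, 0}, {1, 0.2}, {1, 2}, {1, 1.8}};
        int[] ys = {0, 0, 1, 1, 2, 2};
        double[][] theta = new double[3][2]; // zero initialization
        double before = cost(theta, xs, ys);  // equals ln 3 with all-zero parameters
        for (int it = 0; it < 500; it++) step(theta, xs, ys, 0.5);
        System.out.printf("cost %.4f -> %.4f%n", before, cost(theta, xs, ys));
    }
}
```

The constant first feature reproduces the paper's $(n+1)$-dimensional $x$, so no separate bias is needed.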

Writing the computation in Figure 2 in matrix form gives formula (5):
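Assume Figure 2 shows the usual weighted-sum-plus-bias network. Its matrix form is then the following, where $W$ is the $K \times n$ weight matrix ($10 \times 784$ for MNIST) and $b$ is the bias vector. Row $j$ of $W$ together with $b_j$ plays the role of $\theta_j$ in formula (2), with the constant component of $x$ moved into $b$:

$$
y = \operatorname{softmax}(Wx + b) \tag{5}
$$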

To classify a test sample, substitute it into formula (5) and compute the probability of each category. The category with the highest probability is the predicted result.
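The prediction rule can be sketched in Java as follows. `SoftmaxPredict` and its methods are hypothetical names. W has one row per class, as in formula (5), and the index of the largest probability is the predicted category.

```java
/** Prediction with a trained Softmax regression model: y = softmax(Wx + b), then argmax. */
final class SoftmaxPredict {

    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) max = Math.max(max, v);
        double sum = 0.0;
        double[] out = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            out[i] = Math.exp(z[i] - max); // shifting by the max does not change the result
            sum += out[i];
        }
        for (int i = 0; i < z.length; i++) out[i] /= sum;
        return out;
    }

    /** Formula (5): W is K x n, b has length K, x has length n. */
    static double[] probabilities(double[][] w, double[] b, double[] x) {
        double[] z = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            z[j] = b[j];
            for (int d = 0; d < x.length; d++) z[j] += w[j][d] * x[d];
        }
        return softmax(z);
    }

    /** Index of the largest probability, i.e. the predicted category. */
    static int argmax(double[] p) {
        int best = 0;
        for (int i = 1; i < p.length; i++) if (p[i] > p[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        double[][] w = {{1, 0}, {0, 1}, {-1, -1}}; // toy model: K = 3 classes, n = 2 features
        double[] b = {0, 0, 0};
        double[] p = probabilities(w, b, new double[]{0.2, 0.9});
        System.out.println(argmax(p)); // prints 1
    }
}
```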

3 CNN

Convolutional Neural Networks (CNNs) are feedforward neural networks, typically composed of a data input layer, convolutional layers, ReLU activation layers, pooling layers, and fully connected layers. In their convolutional layers, CNNs use convolution in place of general matrix multiplication. CNNs are commonly used to process image data, and the widely used LeNet-5 model is shown in Figure 3.

The model consists of 2 convolutional layers, 2 sampling layers (pooling layers), and 3 fully connected layers.

3.1 Convolutional Layer

The convolutional layer slides a convolution kernel with trainable parameters over the feature maps of the previous layer and adds a bias to obtain the net output. The net output is then passed through an activation function. Sliding the kernel over the entire image in this way produces a new feature map, as shown in formulas (6) and (7):
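Many LeNet-style derivations write this as $u_j^l = \sum_{i} x_i^{l-1} * k_{ij}^l + b_j^l$ and $x_j^l = f(u_j^l)$. That is a standard form and not necessarily the exact notation of formulas (6) and (7). The Java sketch below implements it for a single input map and kernel, with stride 1 and no padding, and assumes ReLU as $f$. Like TensorFlow's `conv2d`, it does not flip the kernel, so it is strictly a cross-correlation. The class name is hypothetical.

```java
import java.util.Arrays;

/** One "valid" 2-D convolution with bias and ReLU, stride 1, single channel (sketch). */
final class ConvLayer {

    static double[][] convolve(double[][] in, double[][] kernel, double bias) {
        int kh = kernel.length, kw = kernel[0].length;
        int oh = in.length - kh + 1, ow = in[0].length - kw + 1; // e.g. 28 x 28 with 5 x 5 -> 24 x 24
        double[][] out = new double[oh][ow];
        for (int r = 0; r < oh; r++)
            for (int c = 0; c < ow; c++) {
                double net = bias; // net output u = sum(window * kernel) + b
                for (int i = 0; i < kh; i++)
                    for (int j = 0; j < kw; j++) net += in[r + i][c + j] * kernel[i][j];
                out[r][c] = Math.max(0.0, net); // activation f(u), ReLU assumed here
            }
        return out;
    }

    public static void main(String[] args) {
        double[][] img = {{0, 0, 0, 0}, {0, 1, 1, 0}, {0, 1, 1, 0}, {0, 0, 0, 0}};
        double[][] kernel = {{1, -1}, {1, -1}}; // responds where the left column is brighter
        System.out.println(Arrays.deepToString(convolve(img, kernel, 0.0)));
        // [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [0.0, 0.0, 1.0]]
    }
}
```

The kernel in the usage example acts as a simple edge detector: it fires on the right edge of the bright square.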

3.2 Sampling Layer

The sampling layer divides each input feature map into non-overlapping n×n regions and takes the maximum or the average of each region, which reduces each spatial dimension of the map by a factor of n. It then adds a bias and applies the activation function to obtain the sampled output. The maximum method, the average method, and the output function are shown in formulas (8) to (10):
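The max and average rules can be sketched in Java as follows. The sketch assumes a stride of n so that regions do not overlap. Rows or columns left over when the map size is not a multiple of n are dropped. The bias and activation of the output function are left out, as in the pooling layers of most modern CNNs, including TensorFlow's `tf.nn.max_pool`. The class name is hypothetical.

```java
import java.util.Arrays;

/** Non-overlapping n x n max or average pooling (sketch; leftover rows/columns are dropped). */
final class Pooling {

    static double[][] pool(double[][] in, int n, boolean useMax) {
        int oh = in.length / n, ow = in[0].length / n; // each side shrinks by a factor of n
        double[][] out = new double[oh][ow];
        for (int r = 0; r < oh; r++)
            for (int c = 0; c < ow; c++) {
                double acc = useMax ? Double.NEGATIVE_INFINITY : 0.0;
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++) {
                        double v = in[r * n + i][c * n + j];
                        acc = useMax ? Math.max(acc, v) : acc + v;
                    }
                out[r][c] = useMax ? acc : acc / (n * n); // region maximum or region mean
            }
        return out;
    }

    public static void main(String[] args) {
        double[][] fmap = {{1, 2, 5, 6}, {3, 4, 7, 8}, {0, 0, 1, 1}, {0, 8, 1, 1}};
        System.out.println(Arrays.deepToString(pool(fmap, 2, true)));  // [[4.0, 8.0], [8.0, 1.0]]
        System.out.println(Arrays.deepToString(pool(fmap, 2, false))); // [[2.5, 6.5], [2.0, 1.0]]
    }
}
```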

3.3 Fully Connected Output Layer

The fully connected layers classify the original image using the extracted features. A common classification method is shown in formula (11):
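Before classification, the feature maps are flattened into a single vector and mapped to one score per class. Those scores would then be passed to the classifier of formula (11), for example the Softmax of step (5) in Section 4.3. The sketch below is illustrative and its names are hypothetical. It flattens channel by channel, whereas a TensorFlow reshape of an NHWC tensor orders values as row, column, channel. The flattening order must therefore match the layout the weights were trained with.

```java
/** Flattening feature maps and applying a fully connected layer z = Wv + b (sketch). */
final class FullyConnected {

    /** maps[channel][row][col] -> one vector, channel by channel, then row by row. */
    static double[] flatten(double[][][] maps) {
        double[] v = new double[maps.length * maps[0].length * maps[0][0].length];
        int k = 0;
        for (double[][] map : maps)
            for (double[] row : map)
                for (double x : row) v[k++] = x;
        return v;
    }

    /** One output score per class: z_j = b_j + sum_d W[j][d] * v[d]. */
    static double[] dense(double[][] w, double[] b, double[] v) {
        double[] z = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            z[j] = b[j];
            for (int d = 0; d < v.length; d++) z[j] += w[j][d] * v[d];
        }
        return z;
    }

    public static void main(String[] args) {
        double[][][] maps = {{{1, 0}, {0, 0}}, {{0, 0}, {0, 2}}}; // two 2 x 2 feature maps
        double[] v = flatten(maps);                              // [1, 0, 0, 0, 0, 0, 0, 2]
        double[][] w = {{1, 0, 0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0, 1}};
        double[] z = dense(w, new double[]{0, 0.5}, v);
        System.out.println(z[0] + " " + z[1]); // 1.0 2.5
    }
}
```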

4 Experimental Analysis

Using the TensorFlow deep learning framework with the MNIST dataset as the data source, this paper trains models with both the Softmax regression algorithm and a deep learning CNN. It then compares and verifies the trained models and applies the better one on the Android platform.

4.1 MNIST Dataset

The MNIST dataset contains a training set (train-images-idx3) of 60,000 images and a test set (t10k-images-idx3) of 10,000 images. Each sample has a corresponding label that identifies the digit, and each image contains 28×28 pixels, as shown in Figure 4.

As shown in Figure 4, each sample image consists of 28×28 pixels and can be flattened into a vector of length 784. The MNIST training images can therefore be represented as a tensor of shape [60,000, 784], where the first dimension indexes the images and the second indexes the pixels of each image. The corresponding labels are digits from 0 to 9 and can be represented with one-hot encoding, in which every component of the vector is 0 except a single component equal to 1. For example, label 0 is represented as [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], so the training labels form a tensor of shape [60,000, 10].
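The flattening and one-hot encoding described above can be sketched in Java as follows. The class and method names are illustrative. Row-by-row (row-major) flattening is assumed, so pixel (r, c) lands at index 28r + c.

```java
import java.util.Arrays;

/** Flattening a 28 x 28 MNIST image and one-hot encoding its label (illustrative sketch). */
final class MnistEncoding {
    static final int SIDE = 28, CLASSES = 10;

    /** Row-major flattening: pixel (r, c) goes to index r * 28 + c, giving a length-784 vector. */
    static float[] flatten(float[][] image) {
        float[] v = new float[SIDE * SIDE];
        for (int r = 0; r < SIDE; r++)
            for (int c = 0; c < SIDE; c++) v[r * SIDE + c] = image[r][c];
        return v;
    }

    /** One-hot vector: all zeros except a 1 at the position of the digit label. */
    static float[] oneHot(int label) {
        if (label < 0 || label >= CLASSES) throw new IllegalArgumentException("label must be 0-9: " + label);
        float[] v = new float[CLASSES];
        v[label] = 1f;
        return v;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(oneHot(0))); // [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
        float[][] img = new float[SIDE][SIDE];
        img[1][2] = 1f;
        float[] v = flatten(img);
        System.out.println(v.length + " " + v[30]); // 784 1.0
    }
}
```

Stacking 60,000 such vectors row by row gives the [60,000, 784] image tensor, and stacking the one-hot vectors gives the [60,000, 10] label tensor.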

4.2 Implementation of the Softmax Model

According to formula (5), the Softmax model can be decomposed into basic matrix operations and a Softmax call. The model is implemented as follows: (1) use symbolic variables, such as an input placeholder, to describe the interacting operations; (2) create the weights and biases; (3) implement Softmax regression according to formula (5).
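TensorFlow's introductory MNIST tutorial initializes the weights and biases to zero (`tf.zeros([784, 10])` and `tf.zeros([10])`) and computes `tf.nn.softmax(tf.matmul(x, W) + b)`. Assuming the same setup, the three steps reduce to the batched computation sketched below in Java. Here x is [N, 784] and W is [784, 10], so each row of the result holds one image's 10 class probabilities. With all-zero parameters, every probability starts at 1/10. The class name is hypothetical.

```java
/** Batched forward pass y = softmax(xW + b) with x: [N, 784], W: [784, 10], b: [10] (sketch). */
final class SoftmaxBatch {
    static final int PIXELS = 784, CLASSES = 10;

    static double[][] forward(double[][] x, double[][] w, double[] b) {
        double[][] y = new double[x.length][CLASSES];
        for (int i = 0; i < x.length; i++) {
            double max = Double.NEGATIVE_INFINITY, sum = 0.0;
            for (int j = 0; j < CLASSES; j++) {
                double z = b[j]; // the same bias vector is added to every row (broadcasting)
                for (int d = 0; d < PIXELS; d++) z += x[i][d] * w[d][j];
                y[i][j] = z;
                max = Math.max(max, z);
            }
            for (int j = 0; j < CLASSES; j++) {
                y[i][j] = Math.exp(y[i][j] - max); // numerically stable softmax per row
                sum += y[i][j];
            }
            for (int j = 0; j < CLASSES; j++) y[i][j] /= sum;
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] w = new double[PIXELS][CLASSES]; // step (2): zero-initialized weights
        double[] b = new double[CLASSES];           // and biases
        double[][] x = new double[2][PIXELS];       // step (1): a batch of two (blank) images
        double[][] y = forward(x, w, b);            // step (3): formula (5)
        System.out.println(y.length + " x " + y[0].length + ", p = " + y[0][0]); // 2 x 10, p = 0.1
    }
}
```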

4.3 Implementation of the CNN Model

Following the LeNet-5 neural network model, the TensorFlow deep learning model is implemented as follows:

(1) Initialize weights and biases;

(2) Create convolution and pooling templates;

(3) Perform convolution and pooling twice;

(4) Perform fully connected output;

(5) Softmax regression.
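The five steps can be made concrete by tracing tensor shapes and parameter counts. The paper does not list its layer sizes, so the sketch below assumes the architecture of TensorFlow's "Deep MNIST" tutorial:

- two 5×5 convolutions with 32 and 64 filters, stride 1 and SAME padding;
- each convolution followed by 2×2 max pooling with stride 2;
- a 1024-unit fully connected layer;
- a 10-way output for Softmax.

The class name and constants are illustrative.

```java
/** Shape and parameter trace for an assumed two-conv-layer MNIST CNN (illustrative sketch). */
final class CnnShapes {
    static final int[] FILTERS = {32, 64};            // assumed filter counts per conv layer
    static final int KERNEL = 5, HIDDEN = 1024, CLASSES = 10;

    /** Output side length with SAME padding: ceil(in / stride). */
    static int same(int in, int stride) { return (in + stride - 1) / stride; }

    /** Returns {side, channels, flattened length, parameter count}. */
    static long[] trace(int side) {
        int channels = 1;
        long params = 0;
        for (int f : FILTERS) {
            params += (long) KERNEL * KERNEL * channels * f + f; // step (1): weights + one bias per filter
            channels = f;
            side = same(side, 1);                               // step (3): convolution, stride 1
            side = same(side, 2);                               // 2 x 2 max pooling, stride 2
        }
        long flat = (long) side * side * channels;
        params += flat * HIDDEN + HIDDEN;                       // step (4): fully connected layer
        params += (long) HIDDEN * CLASSES + CLASSES;            // step (5): 10 scores fed to Softmax
        return new long[]{side, channels, flat, params};
    }

    public static void main(String[] args) {
        long[] t = trace(28);
        System.out.println(t[0] + " x " + t[0] + " x " + t[1] + ", flattened " + t[2] + ", params " + t[3]);
        // prints: 7 x 7 x 64, flattened 3136, params 3274634
    }
}
```

Under these assumptions, almost all parameters sit in the first fully connected layer (3136 × 1024 weights).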

4.4 Evaluation Metrics

The widely used cross-entropy cost function is employed, as shown in formula (12):
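In the form used by TensorFlow's MNIST tutorial, the cross-entropy is $H_{y'}(y) = -\sum_i y'_i \log(y_i)$, where $y'$ is the one-hot true label and $y$ is the predicted distribution. Assuming formula (12) takes this form, a Java sketch for one sample and for a batch mean follows. The clipping floor `EPS` is an addition to avoid log(0) and is not part of the formula. The class name is hypothetical.

```java
/** Cross-entropy H_{y'}(y) = -sum_i y'_i * log(y_i), for one sample and averaged over a batch. */
final class CrossEntropy {
    static final double EPS = 1e-12; // clipping floor, an addition not in the formula

    static double sample(double[] yTrue, double[] yPred) {
        double h = 0.0;
        for (int i = 0; i < yTrue.length; i++)
            if (yTrue[i] != 0.0) h -= yTrue[i] * Math.log(Math.max(yPred[i], EPS)); // 0 * log(.) terms skipped
        return h;
    }

    static double batchMean(double[][] yTrue, double[][] yPred) {
        double sum = 0.0;
        for (int n = 0; n < yTrue.length; n++) sum += sample(yTrue[n], yPred[n]);
        return sum / yTrue.length;
    }

    public static void main(String[] args) {
        double[] label = {0, 0, 1};                                     // one-hot: true class is 2
        System.out.println(sample(label, new double[]{0.1, 0.2, 0.7})); // -ln 0.7, about 0.357
        System.out.println(sample(label, new double[]{0.7, 0.2, 0.1})); // -ln 0.1, about 2.303
    }
}
```

With a one-hot label, only the predicted probability of the true class matters. The cost falls toward 0 as that probability approaches 1.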

4.5 Model Verification

The method for verifying prediction results is as follows:

(1) Save the trained model;

(2) Input test samples for label prediction;

(3) Call the tf.argmax function to obtain the predicted label values;

(4) Compare the predicted labels with the actual labels and calculate the recognition rate.
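Steps (3) and (4) can be sketched in Java as below. The sketch computes both the per-digit rates reported in Table 1 and the overall rate. The class name is hypothetical, and the model outputs and one-hot labels are assumed to be available as arrays.

```java
/** Recognition rates from model outputs and one-hot labels, per digit and overall (sketch). */
final class Evaluation {

    /** Index of the largest entry, the role tf.argmax(y, 1) plays for each row. */
    static int argmax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
        return best;
    }

    /** rates[d] for each true digit d (NaN if it never occurs), rates[classes] over all samples. */
    static double[] recognitionRates(double[][] outputs, double[][] labels, int classes) {
        int[] correct = new int[classes], total = new int[classes];
        int allCorrect = 0;
        for (int i = 0; i < outputs.length; i++) {
            int predicted = argmax(outputs[i]); // step (3): predicted label
            int actual = argmax(labels[i]);     // decode the one-hot true label
            total[actual]++;
            if (predicted == actual) {          // step (4): match against the actual label
                correct[actual]++;
                allCorrect++;
            }
        }
        double[] rates = new double[classes + 1];
        for (int d = 0; d < classes; d++)
            rates[d] = total[d] == 0 ? Double.NaN : (double) correct[d] / total[d];
        rates[classes] = (double) allCorrect / outputs.length;
        return rates;
    }

    public static void main(String[] args) {
        double[][] outputs = {{0.9, 0.05, 0.05}, {0.2, 0.7, 0.1}, {0.6, 0.3, 0.1}, {0.1, 0.1, 0.8}};
        double[][] labels = {{1, 0, 0}, {0, 1, 0}, {0, 1, 0}, {0, 0, 1}};
        System.out.println(java.util.Arrays.toString(recognitionRates(outputs, labels, 3)));
        // prints [1.0, 0.5, 1.0, 0.75]
    }
}
```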

Following these steps, the number of correctly recognized samples and the recognition rate for each handwritten digit from 0 to 9, under both the Softmax model and the convolutional neural network, are shown in Figure 5 and Table 1.

Table 1 shows that the Softmax model recognizes digit 1 best, with a rate of 97.9%. Its rates for digits 3 and 8 are relatively low, at 84.9% and 87.7%, respectively. The overall recognition rate of the Softmax model for handwritten digits 0 to 9 is 91.57%.

Figure 5 and Table 1 show that the CNN model achieves a higher overall recognition rate than the Softmax model. The rate for digit 3 increases by 14.7 percentage points, while that for digit 1 increases by only 1.7 percentage points. The overall recognition rate of the deep learning CNN model for handwritten digits 0 to 9 reaches 99.17%, 7.6 percentage points higher than that of the Softmax model.
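The 7.6 figure is the absolute difference between the two overall rates. Worked out from those rates, it corresponds to a relative gain of about 8.3%, and the error rate falls by roughly a factor of ten:

$$
\begin{aligned}
99.17\% - 91.57\% &= 7.60 \text{ percentage points} \\
\frac{7.60}{91.57} &\approx 8.3\% \quad \text{(relative gain)} \\
\frac{100\% - 91.57\%}{100\% - 99.17\%} &= \frac{8.43}{0.83} \approx 10.2 \quad \text{(error-rate ratio)}
\end{aligned}
$$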

4.6 Model Application

The comparative verification shows that the deep learning CNN model recognizes handwritten digits better than the Softmax model. The trained CNN model is therefore ported to the Android platform for cross-platform application, as follows.

(1) UI Design

A Bitmap is used to display the user’s handwriting trajectory on the touch screen, and two Button controls trigger digit recognition and clear the screen, respectively.

(2) TensorFlow Reference

First, compile the TensorFlow Android library (the .jar file and the native .so libraries). Then, copy the trained model (.pb) into the Android project’s assets directory, from which it is loaded through getAssets().

(3) Interface Implementation

① Interface definition and initialization:

inferenceInterface.initializeTensorFlow(getAssets(), MODEL_FILE);

② Call the interface:

inferenceInterface.fillNodeFloat(INPUT_NODE, new int[]{1, HEIGHT, WIDTH, CHANNEL}, inputs);

③ Run the graph up to the output node:

inferenceInterface.runInference(new String[]{OUTPUT_NODE});

④ Obtain prediction results:

inferenceInterface.readNodeFloat(OUTPUT_NODE, outputs);
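Before the input node is filled, the handwriting drawn on the Bitmap has to be converted into the `inputs` array. The paper does not describe this step, so the Java sketch below rests on several assumptions:

- the model expects 28×28×1 float inputs in [0, 1] with bright strokes on a dark background, as in MNIST;
- the canvas pixels arrive as ARGB integers, for example from Android's `Bitmap.getPixels`;
- the canvas side is a multiple of 28.

The class `CanvasToInput` and its methods are hypothetical. The last helper picks the digit to display from `outputs` by taking the largest value.

```java
/** Converts a square ARGB canvas into a 28 x 28 MNIST-style input (hypothetical helper). */
final class CanvasToInput {
    static final int SIZE = 28;

    /** argb: side * side pixels, row-major; side must be a multiple of 28 in this sketch. */
    static float[] toInput(int[] argb, int side) {
        if (side % SIZE != 0 || argb.length != side * side)
            throw new IllegalArgumentException("expected a square canvas whose side is a multiple of 28");
        int block = side / SIZE;
        float[] input = new float[SIZE * SIZE]; // same memory order as {1, 28, 28, 1}
        for (int r = 0; r < SIZE; r++)
            for (int c = 0; c < SIZE; c++) {
                float sum = 0f;
                for (int i = 0; i < block; i++)
                    for (int j = 0; j < block; j++) {
                        int p = argb[(r * block + i) * side + (c * block + j)];
                        int red = (p >> 16) & 0xFF, green = (p >> 8) & 0xFF, blue = p & 0xFF;
                        float gray = (red + green + blue) / (3f * 255f); // 0 = black, 1 = white
                        sum += 1f - gray; // invert: dark ink on a white canvas becomes a bright stroke
                    }
                input[r * SIZE + c] = sum / (block * block); // average down to 28 x 28
            }
        return input;
    }

    /** Digit to display: index of the largest value in the model's output vector. */
    static int predictedDigit(float[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) if (outputs[i] > outputs[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        int side = 56;
        int[] canvas = new int[side * side];
        java.util.Arrays.fill(canvas, 0xFFFFFFFF);                         // white background
        for (int r = 10; r < 46; r++) canvas[r * side + 28] = 0xFF000000; // a black vertical stroke
        float[] inputs = toInput(canvas, side);
        System.out.println(inputs.length + " " + inputs[10 * 28 + 14]); // 784 0.5
    }
}
```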

With the above steps, the TensorFlow environment is set up on the Android platform and the application is complete. In use, the touch screen captures and records the handwriting trajectory. After writing, the user taps the recognition button, the system calls the model, and the recognition result is shown in the user interface. Tapping the clear button then resets the screen so that the next digit can be written and recognized. Recognition results for some handwritten digits are shown in Figure 6.

As shown in Figure 6, handwritten digit recognition based on TensorFlow deep learning works on the Android platform. The trained CNN model recognizes the handwritten digits well, demonstrating cross-platform application of a TensorFlow-trained model.

5 Conclusion

Using the TensorFlow deep learning framework, this paper trains Softmax regression and CNN models on handwritten digits and ports the trained model to the Android platform for cross-platform application. Experimental data show that the Softmax regression model achieves a recognition rate of 91.57%, while the CNN model reaches 99.17%. This indicates that handwritten digit recognition based on deep learning has reference value for artificial intelligence recognition.

References

[1] HUBEL D H, WIESEL T N. Receptive fields and functional architecture of monkey striate cortex[J]. Journal of Physiology, 1968, 195(1): 215-243.

[2] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

[3] ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks[J]. arXiv: 1311.2901[cs.CV].

[4] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[J]. arXiv: 1512.03385[cs.CV].

[5] LI H, LI Y, PORIKLI F. DeepTrack: learning discriminative feature representations online for robust visual tracking[J]. IEEE Transactions on Image Processing, 2015, 25(4): 1834-1848.

[6] GOODFELLOW I J, BULATOV Y, IBARZ J, et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks[J]. arXiv: 1312.6082[cs.CV].

[7] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]. International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012: 1097-1105.

[8] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]. IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2015: 1-9.

[9] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[J]. arXiv: 1512.03385[cs.CV].

[10] ABADI M, AGARWAL A, BARHAM P, et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems[J]. arXiv: 1603.04467[cs.DC].

Author Information:

Huang Rui, Lu Xuming, Wu Yilin

(Department of Computer Science, Guangdong Second Normal University, Guangzhou, Guangdong 510303)
