This article briefly introduces the main work of the paper “Recognizing Online Handwritten Chinese Characters Using RNNs with New Computing Architectures”, accepted by Pattern Recognition in April 2019. The paper addresses the end-to-end recognition of handwritten Chinese characters.
Handwritten input is a very common human-computer interaction method. With the continuous development of deep learning, researchers have gradually applied deep neural networks to online handwritten Chinese character recognition[1][2][3], and the technology has become increasingly mature.
Generally speaking, the handwritten Chinese characters we commonly see are written on touch screens. When writing on a touch screen, the hand or arm is usually supported, so the characters written on the touch screen are mostly quite neat.
In recent years, a new handwriting method has emerged: air writing. It typically uses a sensor that can track finger position (such as Leap Motion) to record the trajectory of the user’s finger as a Chinese character is written, and displays the resulting trajectory (the character) on screen.
Characters produced by air writing are generally written in a single continuous stroke, with no pen-up or pen-down marks. In addition, their shapes tend to be more irregular than those of characters written on a touch screen.
We provide an introductory diagram of the air writing character recognition system and a comparison between air-written characters and traditional handwritten characters in Fig. 1.
This paper addresses these two types of handwritten Chinese characters and designs an end-to-end recognizer based on recurrent neural networks (RNNs) that achieves good recognition results on both types of handwritten Chinese character datasets.
In addition, we have added two new computing architectures based on traditional Recurrent Neural Networks:
1. Variance constraint;
2. Attention weight vector.
By adding these two new computing architectures, the recurrent neural network achieves a high recognition rate with fewer parameters.
Fig. 2 illustrates the basic network structure for handwritten character recognition. It consists of an N-layer unidirectional recurrent neural network, a hidden-state processing layer, and a fully connected layer.
At each time step t, the network receives one position coordinate of the handwritten character sample and computes the corresponding hidden state vector. After all position coordinates of the input sample have been received and processed, the hidden state vectors are aggregated and fed to the fully connected layer, and classification is performed by a softmax classifier.
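The pipeline above (coordinates in, one RNN step per coordinate, hidden states aggregated, softmax classification) can be sketched roughly as follows. This is a hypothetical minimal illustration, not the paper's implementation: it uses a single vanilla-RNN layer and simple mean pooling in place of the paper's hidden-state processing, and all weight names (`Wx`, `Wh`, `Wfc`, …) are made up for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_classify(coords, Wx, Wh, b, Wfc, bfc):
    """Run a single-layer unidirectional RNN over the (x, y) stroke
    coordinates of one character, mean-pool the hidden states, and
    classify with a fully connected softmax layer."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in coords:                # one coordinate per time step
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        states.append(h)
    pooled = np.mean(states, axis=0)  # stand-in for hidden-state processing
    return softmax(Wfc @ pooled + bfc)

# Toy example: 2-D coordinates, hidden size 8, 5 character classes.
rng = np.random.default_rng(0)
coords = rng.standard_normal((20, 2))         # 20 sampled pen positions
Wx = rng.standard_normal((8, 2)) * 0.1
Wh = rng.standard_normal((8, 8)) * 0.1
b = np.zeros(8)
Wfc = rng.standard_normal((5, 8)) * 0.1
bfc = np.zeros(5)
probs = rnn_classify(coords, Wx, Wh, b, Wfc, bfc)
print(probs.shape, round(probs.sum(), 6))     # (5,) 1.0
```

In the paper the aggregation step is more elaborate (see the attention weight vector below in the article), but the overall data flow is the same.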
1. Brief Description of Variance Constraint


From the above expression, the larger the absolute value of an element of the hidden state vector, the more likely the corresponding parameter is a key parameter for that sample; if an element is 0, the corresponding parameter contributes little to classifying that sample. In other words, the magnitude of each element of the hidden state vector determines how important the corresponding parameter is for classifying the sample, i.e., whether that parameter is a key parameter for it.
Therefore, during training we add a constraint on the variance of the hidden state vector to the loss function. This constraint shrinks the absolute values of the elements of the hidden state vector and concentrates them around the mean, reducing the number of large-magnitude elements and thereby the number of key parameters for the input sample.
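The combined objective can be sketched as cross-entropy plus a variance penalty on the hidden state. This is a minimal illustration, assuming a single hidden state vector and a trade-off hyperparameter `lam` (the paper tunes such a hyperparameter manually; the name and value here are hypothetical):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss_with_variance_constraint(h, logits, label, lam=0.1):
    """Cross-entropy loss plus a penalty on the variance of the
    hidden state vector h; lam trades off the two terms."""
    probs = softmax(logits)
    ce = -np.log(probs[label] + 1e-12)
    var_penalty = lam * np.var(h)   # pulls elements toward their mean
    return ce + var_penalty

h = np.array([0.9, -0.8, 0.05, 0.1])   # toy hidden state
logits = np.array([2.0, 0.5, -1.0])    # toy classifier outputs
print(round(loss_with_variance_constraint(h, logits, 0), 4))
```

Minimizing the penalty term drives the elements of `h` toward its mean, which is the effect described above.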
2. Brief Description of the Attention Weight Vector

For the current input sample, the hidden state vectors at different time steps of the RNN have different importance for recognizing that sample. We therefore let the RNN itself generate an attention weight vector that assigns a different weight to the hidden state at each time step: the last dimension of the RNN’s hidden state vector is taken directly as the weight of that hidden state, as shown in Fig. 4.
The stroke coordinates of a handwritten Chinese character are continuous: the current position is closely related to the previous and next positions, and likewise the hidden state vector at the current time step is closely related to those at adjacent time steps.
Therefore, when computing the weight of the hidden state at the current time step, the hidden states of adjacent time steps should also be taken into account, so we smooth the weights over time, as shown in Fig. 5.
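The weighting scheme above can be sketched as follows: take the last dimension of each hidden state as its raw weight, smooth the weights over time with a moving average, normalize, and pool the remaining dimensions as a weighted sum. This is a hedged sketch, not the paper's exact formulation; the window size `win` and the softmax normalization are assumptions made for the example.

```python
import numpy as np

def attention_pool(states, win=3):
    """Pool RNN hidden states using the last dimension of each state
    as its attention weight, smoothed over time. No extra trainable
    parameters are introduced."""
    states = np.asarray(states)                 # shape (T, H)
    raw = states[:, -1]                         # last dim -> raw weight
    kernel = np.ones(win) / win
    w = np.convolve(raw, kernel, mode="same")   # temporal smoothing
    w = np.exp(w - w.max())
    w /= w.sum()                                # normalize weights
    return (w[:, None] * states[:, :-1]).sum(axis=0)

rng = np.random.default_rng(1)
states = rng.standard_normal((10, 9))  # 10 time steps, hidden size 9
pooled = attention_pool(states)
print(pooled.shape)  # (8,)
```

Note that, consistent with the paper's claim, this pooling reuses the hidden state itself as the weight and so adds no parameters beyond the RNN's own.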
TABLE 1. Effectiveness comparison of the “variance constraint” on IAHCC-UCAS2016.
From TABLE 5 and TABLE 6, it can be seen that the proposed method achieves state-of-the-art results on two handwritten Chinese character datasets: the ICDAR-2013 competition database and IAHCC-UCAS2016. From TABLE 1 through TABLE 4, the two new computing architectures proposed in this article effectively improve the recognition performance of the system, especially for systems with fewer parameters.
This article proposes an end-to-end recognizer for online handwritten Chinese characters. It introduces two new computing architectures based on traditional RNNs: (1) variance constraint; (2) attention weight vector.
The variance constraint mechanism effectively reduces the number of key parameters used to represent a single sample, and therefore the number of samples each parameter must represent. This makes it more likely that the parameters of the RNN obtain an optimal representation of the input samples.
Using the attention weight vector to represent the importance of the hidden state vector at different time steps introduces no additional parameters, unlike existing attention mechanisms, while still achieving competitive results.
Extensive experimental results show that the two proposed computing architectures can effectively improve the performance of traditional RNNs. However, for the variance constraint mechanism, hyperparameter selection is particularly time-consuming in experiments, so an adaptive algorithm is needed to replace the current manual selection.
In addition, the two computing architectures proposed in this article should not be limited to recurrent neural networks but should be improved and applied to other network structures as well.
[1] X.-Y. Zhang, F. Yin, Y.-M. Zhang, C.-L. Liu, Y. Bengio, Drawing and recognizing Chinese characters with recurrent neural networks, TPAMI 40 (4) (2017) 849-862. Paper link: https://arxiv.org/pdf/1606.06539.pdf
[2] W. Yang, L. Jin, Z. Xie, Z. Feng, Improved deep convolutional neural network for online handwritten Chinese character recognition using domain-specific knowledge, ICDAR 15 (6) (2015) 551-555. Paper link: https://arxiv.org/abs/1505.07675
[3] H. Ren, W. Wang, K. Lu, J. Z. Q. Yuan, An end-to-end recognizer for in-air handwritten Chinese characters based on a new recurrent neural network, ICME (2017) 841-846. Paper link: https://ieeexplore.ieee.org/document/8019443
Original authors: Haiqing Ren, Weiqiang Wang, Chenglin Liu
Written by: Ren Haiqing
Formatted by: Gao Xue
Reviewed by: Yin Fei
Published by: Jin Lianwen
Disclaimer: (1) This article only represents the views of the author; personal understanding and summary may not be accurate or comprehensive, and the complete ideas and arguments of the paper should be based on the original paper. (2) The views in this article do not represent the position of this public account.