Research on OCR Applications Based on DCU Technology in Commercial Banks

Authors

Wu Yongfei, Chief Information Officer of Huaxia Bank and Chairman of Longying Zhida (Beijing) Technology Co., Ltd.

Wang Yanbo, Deputy General Manager of the Information Technology Department of Huaxia Bank and Chief Data Scientist of Longying Zhida (Beijing) Technology Co., Ltd.

Chen Zhihao, Manager of the Application Technology Research Office of the Information Technology Department of Huaxia Bank

Li Wenxuan and Kang Xiaobo, Credit Card Center of Huaxia Bank

In the era of the digital economy, using artificial intelligence to empower digital transformation has become an essential path for financial institutions to enhance their business capabilities. As important branches of financial artificial intelligence, computer vision and natural language processing have long been a focus of both academia and industry. Optical Character Recognition (OCR) in particular has attracted significant attention, both because it spans computer vision and natural language processing and because of its broad application prospects in finance. Since both fields depend on the computing power of Graphics Processing Units (GPUs), whether GPU technology is independent and controllable has become a focal concern for the industry. Drawing on practical financial applications, this article summarizes the full-stack, independent and controllable technology migration and substitution pathway for financial OCR applications, aiming to provide a reference for the digital transformation of the financial industry.

1. The Origin and Development of OCR Technology

OCR technology is primarily used to recognize text. It works by optically converting the text on paper documents into bitmap images, which recognition software then processes to convert the text in the images into electronic text that can be edited and processed further. The main goal of OCR is to recognize text accurately from the features of the character images.
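To make the bitmap-to-text idea concrete, here is a minimal sketch of a classical (pre-deep-learning) pipeline: binarize a grayscale bitmap, split it into glyphs at blank columns, and label each glyph by template matching. The 3x3 templates, the function names, and the fixed threshold are hypothetical illustrations, not part of any system described in this article; production OCR replaces hand-made templates with learned features.

```python
from typing import Dict, List

Grid = List[List[int]]  # rows of pixels

# Hypothetical 3x3 glyph templates (1 = ink, 0 = background).
TEMPLATES: Dict[str, Grid] = {
    "0": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "1": [[1, 1, 0], [0, 1, 0], [1, 1, 1]],
    "4": [[1, 0, 1], [1, 1, 1], [0, 0, 1]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 1, 0]],
}


def binarize(gray: Grid, threshold: int = 128) -> Grid:
    """Dark pixels (below threshold) become ink (1); light pixels become 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary: Grid) -> List[Grid]:
    """Cut the bitmap into glyphs wherever a column contains no ink."""
    width = len(binary[0])
    inked = [any(row[c] for row in binary) for c in range(width)]
    glyphs: List[Grid] = []
    start = None
    for c in range(width + 1):
        ink_here = c < width and inked[c]
        if ink_here and start is None:
            start = c  # a glyph begins
        elif not ink_here and start is not None:
            glyphs.append([row[start:c] for row in binary])  # the glyph ends
            start = None
    return glyphs


def classify(glyph: Grid) -> str:
    """Return the label of the template agreeing on the most pixels ('?' if none fits)."""
    best_label, best_score = "?", -1
    for label, tmpl in TEMPLATES.items():
        if len(tmpl) != len(glyph) or len(tmpl[0]) != len(glyph[0]):
            continue  # this toy matcher only compares same-sized bitmaps
        score = sum(
            g == t for g_row, t_row in zip(glyph, tmpl) for g, t in zip(g_row, t_row)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def recognize(gray: Grid) -> str:
    """Bitmap in, text out: binarize, segment, then classify each glyph."""
    return "".join(classify(g) for g in segment(binarize(gray)))


if __name__ == "__main__":
    B, W = 0, 255  # black ink, white paper
    page = [
        [B, B, B, W, B, B, B],
        [W, W, B, W, B, W, B],
        [W, B, W, W, B, B, B],
    ]
    print(recognize(page))  # prints 70
```

Because matching counts agreeing pixels, a glyph with a stray ink pixel still maps to the nearest template, but the approach breaks as soon as fonts, sizes, or styles differ from the stored templates, which is one reason the field moved to the learned feature extractors discussed in Section 3.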

The history of OCR technology can be traced back to 1929, when the Austrian engineer Gustav Tauschek first proposed the concept of OCR and obtained a patent in Germany. Later, the American Paul W. Handel also obtained a patent for a text recognition method in the United States. After decades of research and practice, text recognition technology gradually found practical applications. Early OCR software could typically recognize only digits and English letters. IBM in the United States was the first to attempt printed Chinese character recognition in the 1960s, and Japanese scholars began researching Chinese character recognition in the 1970s.

Research and application of OCR technology in China have been very active. Since 1986, domestic researchers have innovated in Chinese character modeling and recognition methods, and many of the resulting products are widely used across industries. Today's OCR technology is highly mature: recognition rates have improved significantly, and it can handle a wide variety of fonts, sizes, and layouts. In addition, with the development of artificial intelligence, and of deep learning (DL) in particular, OCR technology is evolving toward high recognition rates, high stability, and high usability, and is widely applied in banking, securities, insurance, automotive, logistics, healthcare, customs, and other fields.

In the financial industry, OCR technology can be used to recognize information on documents such as bank cards, ID cards, business licenses, and invoices. By accurately recognizing the information in images, relevant data can be extracted quickly, improving the efficiency and compliance of business processes and helping to prevent financial risks. Related technologies are widely applied in numerous financial business scenarios.

2. Applications of OCR Technology

1. Classification of OCR Technology Applications

Generally speaking, OCR applications can be divided into two major categories: printed character recognition and handwritten character recognition. When exploring applications in the financial industry, OCR scenarios can also be segmented using the “5N” framework methodology. The five categories of OCR recognition scenarios under the “5N” framework are detailed in Table 1.


The first is Numerical Recognition, which extracts numerical information from image data and is mainly used to recognize bank card numbers, ID card numbers, license plate numbers, and similar information.
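Recognized card and ID numbers can be checked against their built-in check digits before being accepted, so that a single misread digit is caught early. The source does not say whether Huaxia Bank does this; the sketch below is an illustration under that assumption, using two public rules: the Luhn (mod 10) check used by payment card numbers, and the ISO 7064 MOD 11-2 check digit that GB 11643-1999 specifies for 18-character Chinese resident ID numbers. The function names are hypothetical.

```python
# Weights and check characters for 18-character Chinese resident ID numbers
# (GB 11643-1999, ISO 7064 MOD 11-2).
ID_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
ID_CHECK_CODES = "10X98765432"


def luhn_valid(card_number: str) -> bool:
    """Luhn (mod 10) check; spaces are ignored, any other non-digit fails."""
    digits = card_number.replace(" ", "")
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        return False  # e.g. an OCR misread of "1" as "l"
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def cn_id_valid(id_number: str) -> bool:
    """Check the 18th character of a Chinese resident ID number."""
    s = id_number.strip().upper()
    body, check = s[:17], s[17:]
    if len(s) != 18 or not (body.isascii() and body.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(body, ID_WEIGHTS))
    return ID_CHECK_CODES[total % 11] == check
```

A passing check does not prove the number was read correctly. Luhn, for instance, catches every single-digit substitution and most adjacent transpositions, but a failed check is a cheap signal that the reading should not be trusted.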

The second is Named Entity Recognition (NER), which recognizes and extracts natural-language named entities (also known as “proper nouns”) from image data, chiefly specific entity information such as personal names, place names, proper nouns, and organization names on documents like business licenses, invoices, and driving licenses. Note that NER originally refers to a fundamental task in Natural Language Processing (NLP) and is an important technical foundation for information extraction, syntactic analysis, question answering, and machine translation. NER generally uses rule-based, unsupervised, or supervised learning methods to identify three major categories (entities, times, and numbers) and seven minor categories (person names, organization names, place names, times, dates, currencies, and percentages) of named entities in natural language text. In the OCR task, however, there is no need to identify named entities through such linguistic analysis; the corresponding information only needs to be extracted from the image data.
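Once the text on a business license or invoice has been recognized, mapping it to named fields can be as simple as the rule-based approach mentioned above. The following minimal sketch assumes the OCR engine returns one string per text line and that each field appears as a "Label: value" line. The English labels, field names, and loose 18-character pattern for the unified social credit code are hypothetical; real Chinese documents use Chinese labels and would need their own rules.

```python
import re
from typing import Dict, List

# Hypothetical "Label: value" rules; ASCII ":" and full-width "：" colons are both accepted.
FIELD_RULES = {
    "company_name": re.compile(r"Name\s*[:：]\s*(.+)"),
    "legal_representative": re.compile(r"Legal Representative\s*[:：]\s*(.+)"),
    # Loose check only: exactly 18 digits or uppercase letters.
    "credit_code": re.compile(r"Unified Social Credit Code\s*[:：]\s*([0-9A-Z]{18})"),
}


def extract_fields(ocr_lines: List[str]) -> Dict[str, str]:
    """Return the first value found for each field; lines matching no rule are ignored."""
    fields: Dict[str, str] = {}
    for line in ocr_lines:
        text = line.strip()
        for name, rule in FIELD_RULES.items():
            match = rule.fullmatch(text)  # the whole line must fit the rule
            if match and name not in fields:
                fields[name] = match.group(1).strip()
    return fields
```

Rules like these are transparent and easy to audit but brittle when layouts change; learned key-value extraction models are a common alternative when document layouts vary.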

The third is Neat Tabulated Data Recognition, which extracts neatly tabulated information from image data and is mainly used to recognize structured data in financial and statistical reports.
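For neat tables, a common post-processing step is to rebuild rows from the positions of the recognized text boxes. The sketch below assumes each recognized token comes with the left edge and vertical center of its box (a hypothetical `Token` type), groups tokens whose centers lie within a tolerance into the same row, and orders each row left to right. It deliberately ignores merged cells and ruling lines, which real table recognition must handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """One recognized text box (hypothetical OCR output format)."""
    text: str
    x: float  # left edge of the box, in pixels
    y: float  # vertical center of the box, in pixels


def rebuild_rows(tokens: List[Token], row_tolerance: float = 5.0) -> List[List[str]]:
    """Group tokens into table rows by vertical position, then order each row by x."""
    rows: List[List[Token]] = []
    for tok in sorted(tokens, key=lambda t: t.y):
        # Same row if the token sits close to the row's first (topmost) token.
        if rows and abs(tok.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(tok)
        else:
            rows.append([tok])
    return [[t.text for t in sorted(row, key=lambda t: t.x)] for row in rows]
```

The default tolerance is an assumption; in practice it would be tied to the typical text-line height of the scanned document.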

The fourth is Natural Language Text Recognition, which recognizes natural language text in image data, including full-text recognition of materials such as annual reports, due diligence reports, business analysis reports, performance evaluation reports, audit reports, meeting minutes, and supervisory notices.

The fifth is Natural Handwriting Recognition, which extracts handwritten text information from image data, including handwritten numbers, handwritten named entities, hand-drawn tables, and handwritten natural language text. If the task goes beyond recognizing the characters themselves to recognizing the features of a person's handwriting, OCR moves from a pure pattern recognition task into the realm of biometric recognition.

2. OCR Technology Application Practice at Huaxia Bank

Following the “5N” framework methodology above, Huaxia Bank has applied all of these OCR recognition and extraction technologies except natural handwriting recognition in its OCR application platforms. For the documents and tickets in existing customer imaging materials and business application materials, OCR can effectively recognize information on customer identity documents, cards, and tickets, as well as general text and handwriting, and can extract information from the recognized images to generate structured data for subsequent storage and retrieval. Through further technical expansion, the platforms can later support classification, recognition, and extraction for more types of image data.
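The step of turning recognized images into structured data for subsequent storage and retrieval can be sketched with Python's built-in `sqlite3` module. The table name `ocr_results`, its columns, and the choice to store extracted fields as a JSON string are hypothetical; the source does not describe Huaxia Bank's storage layer.

```python
import json
import sqlite3
from typing import Dict, List


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a minimal results table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_results ("
        " doc_id TEXT PRIMARY KEY,"
        " doc_type TEXT NOT NULL,"
        " fields_json TEXT NOT NULL)"
    )


def store_result(
    conn: sqlite3.Connection, doc_id: str, doc_type: str, fields: Dict[str, str]
) -> None:
    """Persist one document's extracted fields as a JSON string."""
    conn.execute(
        "INSERT INTO ocr_results (doc_id, doc_type, fields_json) VALUES (?, ?, ?)",
        (doc_id, doc_type, json.dumps(fields, ensure_ascii=False)),
    )


def find_by_type(conn: sqlite3.Connection, doc_type: str) -> List[Dict[str, str]]:
    """Retrieve all stored records of one document type, ordered by ID."""
    rows = conn.execute(
        "SELECT doc_id, fields_json FROM ocr_results WHERE doc_type = ? ORDER BY doc_id",
        (doc_type,),
    ).fetchall()
    return [{"doc_id": doc_id, **json.loads(payload)} for doc_id, payload in rows]
```

The `?` placeholders keep recognized text, which may contain arbitrary characters, from being interpreted as SQL.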

3. OCR System Based on GPU Technology

In OCR systems, neural networks (NNs) mainly serve as feature extractors and classifiers: they take character images as input and output recognition results. DL methods have significantly outperformed traditional algorithms in the OCR field. Techniques originally developed for computer vision (CV) and NLP, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, have expanded into OCR with remarkable results.
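A widely used way to combine these components for text-line recognition is the CRNN design: a CNN extracts features column by column, an LSTM models the resulting sequence, and the network is trained with Connectionist Temporal Classification (CTC) loss. The source does not state which architecture Huaxia Bank uses. Assuming a CTC-trained model, the sketch below shows only the final step, greedy (best-path) decoding: take the most probable class in each frame, merge consecutive repeats, and drop the blank symbol. The function name and the convention that class 0 is the blank are assumptions.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is the model's probability of class k at time step t;
    class 0 is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # 1. Most probable class in each frame.
    best_path: List[int] = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Merge consecutive repeats and 3. drop blanks, in a single pass.
    chars: List[str] = []
    prev = BLANK
    for k in best_path:
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)
```

The blank is what lets genuinely doubled characters, such as the "11" in a card number, survive the merge step: a blank frame between two runs of "1" keeps them as separate characters.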

The widespread use of GPUs in artificial intelligence has facilitated both the training and the inference of DL models. During training, GPUs use parallel computing to process large volumes of data and run many iterations in a short time, making NN training faster and more efficient. During inference, once the model is trained, GPUs can quickly process input character images, perform feature extraction and classification, and return recognition results; here too, their parallel computing capability makes inference faster and more efficient.
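Accelerators such as GPUs (and the DCU discussed below) reach their throughput by running one batched computation over many inputs at once, so OCR services commonly group text-line crops into batches before inference. Because crops differ in width, each batch is padded to its widest member. The helper below is a hypothetical sketch of that preparation step only: it assumes all crops were already resized to the same height and uses 1.0 as a white padding value. The actual model call depends on the deep learning framework and is not shown.

```python
from typing import Iterator, List, Tuple

Image = List[List[float]]  # height x width grayscale text-line crop


def make_batches(
    images: List[Image], batch_size: int, pad_value: float = 1.0
) -> Iterator[Tuple[List[int], List[Image]]]:
    """Yield (original indices, width-padded images) for batched inference."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Sort by width so crops of similar size share a batch and need little padding.
    order = sorted(range(len(images)), key=lambda i: len(images[i][0]))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        max_w = max(len(images[i][0]) for i in idx)
        # Pad every row on the right so all images in the batch share one width.
        padded = [
            [row + [pad_value] * (max_w - len(row)) for row in images[i]]
            for i in idx
        ]
        yield idx, padded
```

Returning the original indices alongside each batch lets the recognition results be put back in request order after the width-sorted batches come back from the accelerator.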

4. Exploration and Practice of Full-Stack Independent and Controllable OCR Applications at Huaxia Bank

Huaxia Bank has adapted its financial OCR applications to a full-stack independent and controllable intelligent computing platform, ensuring that computing power is controllable and smoothly completing the migration and transformation, which provides a useful reference for the in-depth application of artificial intelligence in the domestic financial industry. In general, the technical route for migrating and substituting financial OCR applications revolves around three core hardware and software components: the operating system (OS), the Central Processing Unit (CPU), and the GPU. After extensive research, comparison, and verification, Huaxia Bank settled on a technical route combining the Kirin operating system, the Haiguang (Hygon) x86 CPU, and the DCU GPU. On the Cyber Intelligence Learning Platform at its Credit Card Center, the bank tackled the technical challenges of the financial OCR application and verified migration and adaptation on this full-stack independent and controllable route. Inference verification across the OCR recognition scenarios described above effectively validated the platform's recognition capability, with model performance exceeding 90%. Furthermore, for full model inference, the platform can load and run all models at once on a single DCU, making better use of DCU computing resources.
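The source reports model performance above 90% without naming the metric. As an assumption, one common way to score OCR output is character accuracy, 1 - E/N, where E is the summed Levenshtein (edit) distance between predictions and ground truth and N is the number of ground-truth characters; it is often reported alongside line (exact-match) accuracy. The sketch below computes both. The function names are hypothetical, and nothing here reflects Huaxia Bank's actual evaluation data.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def char_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """1 - (summed edit distance / reference characters), floored at 0."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align one-to-one")
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total = sum(len(r) for r in references)
    if total == 0:
        return 1.0 if errors == 0 else 0.0
    return max(0.0, 1.0 - errors / total)


def line_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Share of lines recognized exactly."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need the same, non-zero number of predictions and references")
    return sum(p == r for p, r in zip(predictions, references)) / len(references)
```

Scoring the same labeled evaluation set on the original GPU deployment and on the migrated DCU deployment would give a like-for-like comparison of the two platforms.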

Through independent innovation, Huaxia Bank has completed the migration and substitution of the hardware and software systems of its intelligent learning platform for financial OCR applications onto a full-stack independent and controllable stack. In this environment, the platform delivers the required functions and meets performance indicators such as target accuracy. This provides a mature technical solution and implementation experience for the broader rollout of independent and controllable AI hardware and software environments in the financial sector, while supporting sustainable system expansion, upgrades, and optimization, further improving application availability, and meeting the requirements of the financial industry's digital transformation.

At the second International Internet Industry Technology Innovation Conference and Internet Innovation Product Exhibition, held in Beijing in August 2023, Huaxia Bank was awarded the 2022 Special Prize for Technological Innovation for the Intelligent Learning Platform Project at its Credit Card Customer Service Center.

With continuing technological progress, especially in large model technology, domestic GPUs are expected to play an increasingly important role in financial artificial intelligence scenarios, helping financial institutions achieve digital transformation. Huaxia Bank will also explore full-stack independent and controllable GPU migration and substitution in a broader range of financial technology applications, contributing to the sound development of financial technology.

[The authors thank Zhongke Controllable Information Industry Co., Ltd. and Beijing Yidaoboshi Technology Co., Ltd. for their contributions. Li Dawei and Xu Xiaofang of the Information Technology Department of Huaxia Bank, as well as Yang Xuan, Zhang Yue, Chen Sheng, and Feng Xiao of Longying Zhida (Beijing) Technology Co., Ltd., also contributed to this article.]
