Practical Insights on Intelligent Construction of OCR Platforms in Commercial Banks

Written by / Song Shouwen, Enterprise Architecture Construction Office, Bank of China

In recent years, with the rapid development of artificial intelligence, OCR technology has been continuously updated and iterated, its recognition capabilities have steadily improved, and its application in the financial industry has matured and become widespread. Most commercial banks in China have introduced OCR technology to automate the entry of some business vouchers. However, as the range of vouchers that banks must handle grows more diverse, the requirements for recognition accuracy and stability keep rising, model training demands more resources, and data security controls become stricter. Designing and building an integrated OCR framework and platform, and coordinating resources to form a comprehensive OCR service capability that serves all lines of business within the bank, is a significant challenge for banks in their digital transformation.

Progress and Issues in Building OCR Platforms in Commercial Banks

In their daily internal and external operations, commercial banks have many business scenarios that require entering text presented in image form, such as customer identification, account opening, contract signing, and bill redemption. For these processes, the development of OCR technology has made it possible for banks to replace manual input with machine processing, which is significant for saving human resources, improving processing efficiency, and reducing operational errors. As early as 2000, the Industrial and Commercial Bank of China piloted the use of OCR technology for machine recognition of savings-related vouchers at its Harbin branch. Since then, the application of OCR technology in banks has become increasingly widespread, and many business scenarios involving image and video recognition have introduced OCR technology for automated processing. The types of images that can be processed have also continued to expand: ID cards, business licenses, bank cards, various bills, and real estate registration certificates can all now be accurately recognized.

As commercial banks raise their requirements for intelligent operations during digital transformation, some banks have gradually recognized certain shortcomings and pain points and have begun to implement corresponding improvement plans. Currently, several commercial banks, such as the Industrial and Commercial Bank of China, Bank of China, and Everbright Bank, have begun comprehensively upgrading the OCR systems and modules dispersed across the bank, planning to evolve them into a comprehensive OCR platform; this has become a mainstream trend. The related progress is shown in the table. As the application of OCR technology in banks deepens, some typical problems and pain points are gradually emerging, mainly in the following aspects.

Table: Construction Status of OCR Platforms in Domestic Commercial Banks

1. Isolated Systems Across Lines, Difficulties in Data Sharing and Reuse

Although many domestic commercial banks have applied OCR technology to recognize images and videos in various scenarios, the corresponding OCR modules are often scattered across different business systems, such as customer information systems, bill systems, and loan systems. An OCR module serving a specific business scenario can usually process only certain fixed vouchers of that business, so its functions are narrow and its scalability is poor. Whenever a new type of voucher must be recognized, the OCR module and its host system often require significant upgrades and modifications. Meanwhile, the independent operation of each system inevitably leads to poor resource utilization. With the introduction of artificial intelligence technologies, sufficient data and high computing power are crucial for improving OCR recognition performance, and OCR models dispersed across different systems make it difficult to leverage banks’ scale advantages in data and resources. Therefore, for most domestic commercial banks, the main issue at present is the absence of a unified, standardized, and diversified OCR service capability and of a bank-wide technical platform for OCR services.

2. Recognition Technology Needs Improvement; Performance in Some Scenarios is Unsatisfactory

Currently, OCR technology can accurately and efficiently process various common documents and image types, but recognition of images containing a lot of handwritten text still cannot be fully automated: accuracy is significantly lower than for printed text, and frequent manual intervention is required. The key directions for the future development of bank OCR technology are therefore achieving breakthroughs in recognizing complex images dominated by handwritten text, and raising the intelligence level of recognition services to reduce manual intervention and improve recognition efficiency. OCR technology is applied in numerous scenarios within banks, and in many of them it is already very mature; for example, recognition accuracy for various documents and bank cards exceeds 99%. For printed forms such as invoices and financial reports, good recognition results can also be achieved by introducing template matching. However, recognition of bill-type documents remains a significant challenge for banks. The bills involved in banking business are numerous in type and varied in layout, and many contain a lot of handwritten text. In practice, problems such as unclear handwriting, overlapping stamps, and severe wear on the bill surface often arise. Therefore, for complex recognition scenarios like bills, OCR technology must continue to improve to achieve better automated recognition results.

3. R&D Implementation Depends on Outsourcing; Self-Sufficiency Needs Enhancement

The complete realization and application of OCR technology involves numerous aspects, including data extraction, integration, and labeling; model training, evaluation, application, and optimization; and system operation and maintenance. Currently, many banks’ OCR systems or modules are built through external procurement, so these banks lack experience in architecting and implementing OCR systems, as well as the corresponding data analysis and model implementation capabilities. It is also difficult for them to optimize and extend purchased models using internal capabilities. Therefore, the lack of technical foundation and self-sufficiency is another major challenge many banks face when applying OCR technology. Addressing it requires continuous resource investment, along with a long-term focus on cultivating and using relevant talent.

4. Diverse and Changing Application Demands; Model Iteration and Optimization Need Acceleration

In practical business scenarios, the formats of banking vouchers are continually adjusted as business needs or regulatory requirements change. Across the full range of vouchers, such changes are quite frequent. Even if the format of the recognition target does not change, there is always a demand for continuous optimization of models to improve recognition results. Therefore, effectively managing the lifecycle of OCR recognition models to achieve rapid updates and iterations is a major challenge in the use of OCR platforms. Model iteration requires retraining and redeploying the model, so the platform must establish rapid model adjustment plans to meet these demands. At the same time, it should make full use of artificial intelligence technologies and of idle platform resources for training and optimization, so that model adjustments are automated.

5. High Sensitivity of Business Data; Security Protection System Needs Strengthening

During the service process, the OCR platform acquires a large amount of business data from the bank’s systems. A data leak could lead to significant reputational incidents and substantial economic losses. Therefore, a comprehensive security architecture must be designed during construction of the OCR platform to protect customer information and transaction data from multiple angles. Specific security strategies and measures mainly include: strict management of system permissions, stringent desensitization of sensitive data, encryption of data in transmission, retention periods for stored data with regular purging, effective management of security audit logs, regular checks for data security risks, and thorough implementation of data protection measures.
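Of the measures listed above, desensitization is the easiest to illustrate. The following is a minimal sketch, assuming that recognized text may contain 18-character resident ID numbers and 16 to 19 digit bank card numbers. The regular expressions, the masking widths, and the function name desensitize are illustrative assumptions, not rules taken from any actual platform.

```python
import re

# Hypothetical patterns; real rules would come from the bank's own
# data classification policy.
ID_CARD = re.compile(r"\b(\d{6})\d{8}(\d{3}[\dXx])\b")   # 18-character ID number
BANK_CARD = re.compile(r"\b(\d{6})\d{6,9}(\d{4})\b")     # 16 to 19 digit card number


def desensitize(text: str) -> str:
    """Mask the middle part of ID and bank card numbers in recognized text."""
    # Keep the first 6 and last 4 characters of an ID number, mask the 8 between.
    text = ID_CARD.sub(lambda m: m.group(1) + "*" * 8 + m.group(2), text)
    # Keep the first 6 and last 4 digits of a card number, mask the rest.
    text = BANK_CARD.sub(
        lambda m: m.group(1) + "*" * (len(m.group(0)) - 10) + m.group(2), text
    )
    return text


if __name__ == "__main__":
    print(desensitize("ID 110101199003071234 card 6222020200112345"))
```

Masking before the text reaches logs or long-term storage means that audit records can still be correlated by the visible prefix and suffix without exposing the full number.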

Intelligent Architecture of Commercial Bank OCR Platforms

In response to the major pain points mentioned above, and combined with the characteristics and development trends of OCR technology, this article proposes a general enterprise-level OCR platform architecture, as shown in the figure. The platform includes two main functional layers, the development platform and the operation platform, a division that closely follows the logical sequence in which OCR technology is applied in practice. The development platform primarily provides functions for the research, development, and deployment of various recognition models, while the operation platform primarily offers specific recognition services for various objects across the bank.

Figure: Schematic Diagram of the Intelligent Architecture of Commercial Bank OCR Platforms

1. OCR Development Platform

The core function of the OCR development platform is the training and output of OCR recognition models, and the dataset required for modeling is the foundation for model training. Powerful intelligent data labeling and model training technologies are therefore the core capabilities of the OCR development platform. The complete OCR model development process can generally be divided into four stages: data labeling, model training, model testing, and model release.

(1) Data Labeling. Before developing an OCR model, it is necessary to prepare base image materials as samples according to the development objectives. Each image in the sample set needs to be labeled individually, and labeling is generally either manual or semi-automatic. The OCR platform should implement both visual labeling and system-assisted labeling functions. The visual labeling function provides a visual interface in which labeling personnel complete manual labeling tasks, supporting operations such as image rotation and custom selection. The platform can also add semi-automatic labeling, in which the system first completes text line position labeling, batch content labeling, category labeling, and structured entity labeling, and personnel then verify and correct the system results, significantly improving labeling efficiency and reliability.

Automatic sample expansion is another function that the OCR development platform should provide. Sample expansion refers to generating virtual samples by erasing specific areas of background images and applying user-specified rules regarding corpus, fonts, colors, and data perturbation, thereby enlarging the current sample set. Sample expansion increases the diversity of the sample set. Both data labeling and sample expansion belong to the data preparation phase of model development.
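The rule-driven part of sample expansion can be sketched as follows. This is only an illustration under stated assumptions: it produces rendering specifications (which text, font, color, and perturbation to use) and leaves the actual image erasing and drawing to a separate renderer that is not shown. The names SampleSpec and expand_samples, and the use of a small rotation as the only perturbation, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleSpec:
    """Parameters for one virtual sample; a renderer (not shown) would draw it."""
    text: str            # content taken from the user-specified corpus
    font: str            # font name taken from the user-specified font list
    color: tuple         # RGB color taken from the user-specified palette
    rotation_deg: float  # small random rotation used as data perturbation


def expand_samples(corpus, fonts, colors, n, max_rotation=2.0, seed=0):
    """Draw n virtual-sample specifications from the user-specified rules."""
    rng = random.Random(seed)  # fixed seed keeps the expansion reproducible
    return [
        SampleSpec(
            text=rng.choice(corpus),
            font=rng.choice(fonts),
            color=rng.choice(colors),
            rotation_deg=round(rng.uniform(-max_rotation, max_rotation), 2),
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    specs = expand_samples(
        corpus=["Payee", "Amount in words", "Date of issue"],
        fonts=["SimSun", "SimHei"],
        colors=[(0, 0, 0), (30, 30, 120)],
        n=5,
    )
    for spec in specs:
        print(spec)
```

Keeping the specification separate from rendering makes it possible to log exactly which rules produced each virtual sample.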

(2) Model Training. Once data preparation is complete, the OCR model is trained. Training mainly involves simulating recognition on the sample set, checking the recognition results, and optimizing the model structure and parameters. OCR model training requires a large amount of sample data. The OCR training platform must provide sufficient computational resources and continuously optimize its algorithms, so that models are trained quickly without sacrificing recognition accuracy, which shortens the OCR model release cycle. Where the model structure and parameters need adjusting during training, the OCR platform incorporates new technologies to achieve a degree of automatic, intelligent tuning, reducing the need for manual corrections and adjustments.

(3) Model Testing. To verify the performance of the model regarding structured data recognition, the demand side designs indicators such as recognition accuracy, missed recognition rate, recall rate, and recognition time before model development. The completed OCR model must be evaluated and verified according to these indicators. If the tests meet the standards, the model can be published through the model release function of the development platform to the operation platform. If the tests do not meet the standards, the recognition algorithm must be optimized to retrain the model to ensure that it meets the testing standards.
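The evaluation step described above can be sketched at the level of structured fields. The definitions used here are assumptions, since the article does not fix them: accuracy is the share of recognized fields that are correct, recall is the share of all expected fields that are correct, and the missed recognition rate is the share of expected fields for which the model returned nothing. The function names and the threshold values in the usage example are hypothetical.

```python
def evaluate(truth, predicted):
    """Compute field-level metrics from parallel lists of {field: value} dicts."""
    total = recognized = correct = 0
    for expected_fields, predicted_fields in zip(truth, predicted):
        for field, expected in expected_fields.items():
            total += 1
            got = predicted_fields.get(field)
            if got:  # a missing or empty value counts as a missed recognition
                recognized += 1
                if got == expected:
                    correct += 1
    return {
        "accuracy": correct / recognized if recognized else 0.0,
        "recall": correct / total if total else 0.0,
        "missed_rate": (total - recognized) / total if total else 0.0,
    }


def meets_standard(metrics, thresholds):
    """Return True when the model may be released to the operation platform."""
    return (
        metrics["accuracy"] >= thresholds["accuracy"]
        and metrics["recall"] >= thresholds["recall"]
        and metrics["missed_rate"] <= thresholds["missed_rate"]
    )


if __name__ == "__main__":
    truth = [{"name": "A", "amount": "100"}, {"name": "B", "amount": "200"}]
    predicted = [{"name": "A", "amount": "100"}, {"name": "B"}]
    metrics = evaluate(truth, predicted)
    print(metrics)
    # Hypothetical thresholds agreed with the demand side before development.
    print(meets_standard(metrics, {"accuracy": 0.99, "recall": 0.95, "missed_rate": 0.02}))
```

Recognition time, the fourth indicator mentioned above, would be measured separately around the inference call and is left out of this sketch.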

(4) Model Release. The OCR model that passes testing can enter the release stage, where the model is listed on the operation platform and deployed to specific business systems through its model management function to realize its required service functions. Additionally, to support the entire process of model research and development, the OCR development platform will also include auxiliary functional modules such as task management, dataset management, model management, and statistical support.

2. OCR Operation Platform

The goal of the OCR operation platform is to provide unified OCR recognition services for various products and systems during the bank’s production operations. The OCR operation platform is constructed in a layered manner from the bottom up, with various OCR models deployed at the bottom layer, encapsulating these models into various public recognition service capabilities, and then providing unified external calling interfaces through APIs and SDKs, ultimately supporting various service scenarios for the bank’s business operations. The specific functions of each layer are as follows.

(1) Model Management. This module stores and maintains various general models (printed character models, handwritten character recognition models, table detection models, etc.) as well as commonly used standardized models (document recognition models, voucher recognition models, report recognition models, etc.), and manages them by category according to model type. The model management layer is also responsible for deploying models, starting and stopping them, configuring the definitions of vouchers and elements to be recognized, and configuring image libraries. Additionally, developers can analyze and optimize models by monitoring system operation logs and model recognition rates.
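The bookkeeping side of model management (categorized storage, deployment, starting and stopping) can be sketched with a small in-memory registry. This is an illustration only: the class names ModelRecord and ModelRegistry, the category labels, and the version strings are hypothetical, and a real platform would persist this state and control actual serving processes.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    name: str
    category: str        # assumed labels: "general" or "standardized"
    version: str
    running: bool = False


class ModelRegistry:
    """In-memory catalogue of deployed OCR models, grouped by category."""

    def __init__(self):
        self._models = {}

    def deploy(self, name, category, version):
        # Deploying an existing name replaces it with the new, stopped version.
        self._models[name] = ModelRecord(name, category, version)

    def start(self, name):
        self._models[name].running = True

    def stop(self, name):
        self._models[name].running = False

    def running_models(self, category=None):
        """List running model names, optionally filtered by category."""
        return sorted(
            m.name
            for m in self._models.values()
            if m.running and (category is None or m.category == category)
        )


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.deploy("printed_text", "general", "1.0")
    registry.deploy("id_card", "standardized", "2.1")
    registry.start("printed_text")
    registry.start("id_card")
    print(registry.running_models("general"))
```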

(2) Public Services. This module’s main function is to form diverse public application service capabilities in the form of various recognition engines (including general text, standard vouchers, internal specialized vouchers, etc.) to meet the actual recognition needs of various business scenarios. The characteristics of each service are as follows.

General text recognition service is the most basic service of the OCR platform, with a wide range of use cases, high recognition accuracy, and short recognition time. Depending on the recognition target, it can be subdivided into sub-services such as text recognition, stamp recognition, and table recognition. Text recognition recognizes printed and handwritten characters (excluding structured information) in images line by line and outputs the results. Stamp recognition recognizes and outputs the characters on stamps that appear in images. General table recognition recognizes the character information and table structures in images, restores the structure, and outputs electronic spreadsheets containing the character information.

Standard voucher recognition service recognizes the standardized vouchers commonly seen in social and financial services. Its recognition targets include identification documents such as ID cards, household registration books, military officer certificates, travel permits for Hong Kong, Macao, and Taiwan residents, passports, and foreign residence permits; public proof documents such as business licenses, driving licenses, marriage certificates, real estate registration certificates, and land certificates; general bank business vouchers such as bank cards, checks, and acceptances; and other items such as license plate numbers, invoices, and customs declarations. The range of standard vouchers that can be effectively recognized already covers many business scenarios and is still expanding. Because some banking businesses are highly specialized, the standard voucher recognition function cannot fully meet their needs and must be supplemented by internal voucher recognition.

Internal voucher recognition service targets internal vouchers with personalized characteristics of the bank, such as business application forms, agreements, and bills. Recognition of internal vouchers requires the bank to implement it based on its business needs through model training, expanding the range of recognizable vouchers for the OCR platform to meet more personalized recognition needs. The realization of this function relies on the demand side providing a basic sample set to support model training. Currently, whether standard vouchers or internal vouchers, the OCR platform must continuously update models based on changes in voucher formats to maintain the ability to recognize the latest versions of voucher information.

(3) Service Interfaces. The OCR operation platform provides services to business systems in two forms: online API interfaces and offline SDKs. With the online API, recognition runs centrally on the OCR platform, which returns the results to the connected demand side. The offline SDK packages the OCR functions for a specific recognition need into a component that is embedded in a business system and requires no network connection. Compared with the online API, the offline SDK performs recognition within the business system, does not rely on online resources, and is relatively simple and quick to use. However, it supports a limited range of image types and its recognition rate is lower than that of the online service, so it is currently suitable only for ID cards, bank cards, and other simple document types. Therefore, most scenarios will primarily use the online API in the future, while offline SDKs will generally be deployed on mobile terminals that must provide recognition services while operating offline.
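The trade-off between the two interface forms can be expressed as a simple selection rule on the calling side. The sketch below is an assumption about how a business system might choose a channel; the function name choose_channel, the document type labels, and the returned strings are hypothetical and do not correspond to any actual platform API.

```python
# Document types the offline SDK is assumed to handle, following the
# article's statement that it currently suits ID cards and bank cards.
OFFLINE_SDK_TYPES = {"id_card", "bank_card"}


def choose_channel(doc_type: str, network_available: bool) -> str:
    """Pick a recognition channel for one request."""
    if network_available:
        # The online service covers more document types at a higher
        # recognition rate, so it is preferred whenever reachable.
        return "online_api"
    if doc_type in OFFLINE_SDK_TYPES:
        return "offline_sdk"
    # Complex vouchers cannot be recognized without the platform.
    return "unsupported_offline"


if __name__ == "__main__":
    print(choose_channel("invoice", network_available=True))
    print(choose_channel("id_card", network_available=False))
    print(choose_channel("invoice", network_available=False))
```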

(4) Service Scenario Management. In line with the construction goals of the OCR platform, every business scenario in the bank’s business processes that requires OCR technology will be supported uniformly by the OCR platform. The OCR operation platform centrally manages which business scenarios in the bank’s product and service processes are connected to OCR platform services, covering all business areas and channels of the bank. Typically, once the platform goes live, it can support at least customer information recognition, internal and external audit, internal financial reimbursement, and centralized operation scenarios.

Construction Ideas for Commercial Bank OCR Platforms

Although major banks have widely applied OCR technology in various business systems, there is generally a lack of technical accumulation and high dependence on external resources. In the future, to form enterprise-level general OCR service capabilities, and to build a unified OCR platform for various products and systems across the bank, it is still necessary to thoroughly plan the implementation path, clarify key implementation points, and conduct a comprehensive analysis of the important technical difficulties and challenges that may arise during the realization of the OCR platform, implementing targeted solutions.

1. Unified Logical Directory, Shared Services Across the Bank, Following Enterprise-Level Platform Architecture

The core concept of the enterprise-level OCR platform architecture is to form standardized and unified OCR recognition capabilities, achieving resource and service sharing across the bank. In the future, the types of images that banks need to automatically recognize will increase, the requirements for recognition effects will rise, and the demand for resources required for modeling will also continue to grow. This necessitates adherence to the principles of shared resources, reusable models, and standardized services throughout the OCR platform construction process. At the same time, in the design process of various information systems within the bank, relevant requirements involving OCR recognition must be fully coordinated and incorporated into the overall scope of the OCR platform, avoiding siloed construction that leads to redundant development and resource waste.

2. Detailed and Complete Labeling, Strengthening Data Foundations, Achieving Data Structural Transformation

High-quality labeled data is the foundation for model training and will directly affect the accuracy of model recognition. Banks have a natural advantage in data resources, generating massive amounts of image data in daily business processing. However, in the past, much of this image data has been stored in unstructured forms without labeling effective information, making it difficult to use for model development. Therefore, to improve the data foundation for OCR model training, developers need to establish labeling standards and provide convenient labeling operation processes and semi-automatic labeling assistance; business departments need to increase resource investment, organizing personnel to continuously conduct image data labeling work, and continuously expand the labeling database.

3. Integrating Artificial Intelligence, Using Machine Learning, Enhancing Model Algorithm Performance

With the rapid development of machine learning, deep learning, and other technologies in recent years, machine learning-based OCR recognition models have made significant progress in the diversification of recognized image types and the accuracy of recognition results compared to traditional OCR algorithms. Taking Convolutional Neural Networks (CNN) as an example, they can play a role in various stages of OCR processing, such as image preprocessing and text detection, with results significantly better than traditional algorithms. Some Natural Language Processing (NLP) technologies can also be integrated with computer vision technologies to effectively improve the quality of semantic analysis model outputs. These cutting-edge technologies are not only the foundation for future OCR technology but also the core technological competitiveness for banks in the future. Therefore, banks need to continuously increase their technical and talent resources in machine learning.

4. Broadly Integrating Industries, Expanding Graphic Recognition, Optimizing Text and Image Recognition Effects

General OCR models and algorithms are typically general-purpose, and their integration with specific businesses in the financial sector is still relatively preliminary. To improve OCR results for complex vouchers, it is necessary to fully incorporate specialized knowledge of the bank’s specific business. For example, in OCR for bills and financial reports, the numbers recognized from an image often have logical relationships with one another (such as cross-check and summation relationships). Using these correlations to verify the extracted numbers can significantly improve processing results and recognition accuracy. By introducing industry-specific data rules, recognition models can become more intelligent and better handle recognition tasks in specific fields.
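A summation relationship of the kind mentioned above can be checked in a few lines. The sketch below assumes that the recognized amounts arrive as strings that may contain thousands separators, and that a mismatch simply flags the voucher for manual review. The function names to_decimal and check_total are hypothetical.

```python
from decimal import Decimal


def to_decimal(amount: str) -> Decimal:
    """Parse a recognized amount such as '1,200.00' into an exact decimal."""
    return Decimal(amount.replace(",", "").strip())


def check_total(line_items, total) -> bool:
    """Return True when the recognized line items sum to the recognized total."""
    # Decimal avoids the rounding errors that binary floats introduce
    # when adding monetary amounts.
    items_sum = sum((to_decimal(x) for x in line_items), Decimal("0"))
    return items_sum == to_decimal(total)


if __name__ == "__main__":
    # Consistent recognition result.
    print(check_total(["1,200.00", "300.50"], "1,500.50"))
    # A single misread digit (3 read as 8) breaks the relationship,
    # so the voucher would be routed to manual review.
    print(check_total(["1,200.00", "800.50"], "1,500.50"))
```

A failed check does not say which number was misread, only that at least one of them needs a second look, which is still enough to keep an erroneous amount from entering downstream systems unreviewed.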

In the digital age, widely using technological means to raise the automation level of business processing is an inevitable trend and requirement for the intelligent operation of banks, and OCR is one of the important technologies in this regard. Banks should fully absorb the advantages of cutting-edge technologies and form unified OCR service capabilities under the framework of an enterprise-level OCR platform, providing automated, intelligent support services for the bank’s various business scenarios and adding important momentum to the digital transformation of operations and of the bank as a whole.