Building a Bank-wide Knowledge Graph with Graph Databases

In recent years, as big data and artificial intelligence have been applied more deeply in the financial industry, financial technology has developed rapidly, driven by a wide range of AI applications. Knowledge graphs, regarded as a key technology of third-generation artificial intelligence, are steadily spreading across financial applications and bring deeper information analysis and mining capabilities. Major banks have already deployed them in many business scenarios, including retail loans, corporate banking, credit cards, operational risk, auditing, and anti-money laundering, and the business value of these deployments is gradually being realized.

The continuous advancement of financial technology has driven rapid business growth at banks, but it has also created opportunities for criminals, giving rise to a large underground industry chain that targets the financial sector. As commercial banks keep upgrading their prevention and control measures, fraud has become increasingly difficult to commit alone and now relies on organized rings. Risk is shifting from individuals to groups, and commercial banks urgently need effective means to guard against fraud and money-laundering gangs. Risk prevention and control must therefore move from examining individual clients to taking a holistic view, that is, from digging into individual risks to exploring “individual risks + related risks.” A knowledge graph is, in essence, an information network that reveals the relationships between entities, which gives it an inherent advantage in mining the relationship-based risks typical of fraud. Graph computing can then extract valuable information, knowledge, and patterns from this relational network, providing a robust tool for risk prevention and control.

Guangfa Bank is pursuing a “Digital Guangfa” strategy and is comprehensively advancing its digital transformation. Given the advantages of knowledge graphs in mining relationship-based risk, the bank has carried out in-depth research and application in this area, achieved independent control of the underlying technology, and built a new technical application system for knowledge graphs. The goals are threefold: to raise the querying, insight, and predictive capabilities of its knowledge graph applications to a new level; to build a new line of defense, aimed at the needs and pain points of business operations and risk control, that identifies individual-related risks and group fraud risks, strengthening overall risk identification and business self-analysis efficiency while reducing asset losses; and to mine the latent value of key customer groups to support marketing recommendations and improve the bank’s ability to aggregate funds and attract inflows.

Guangfa Bank R&D Center

Deputy Chief Engineer: Wang Li

The Value and Construction Ideas of Graph Databases

Graph databases are the core component of knowledge graph technology and the foundation of the entire system. Compared with traditional forms of data representation, they express the relationships between entities more intuitively and handle complex relationships more efficiently. A high-performance, highly available, and highly scalable graph database can rapidly build relationship networks and support knowledge mining in both real-time and offline scenarios. It also underpins the front-end suites for knowledge graph construction, graph computing, and graph analysis, as well as large-scale data storage, providing the high-availability, high-throughput, low-latency data capabilities needed to improve the bank’s business efficiency.

1. The Value of Graph Databases

Traditional relational databases express data models as rows and columns, whereas graph databases use a more flexible model built from entity nodes, edges, and attributes. Nodes, edges, and their attributes can be added as business needs evolve, which makes graph databases better suited to scenarios where the data model changes frequently. For deep relational queries, a relational database must perform multiple join operations, which is inefficient; a graph database instead expands relationships by locally traversing nodes and edges, effectively overcoming the inefficiency of relational databases for deep relational queries and computations.
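To make the contrast concrete: in a relational schema, finding everything within three hops of a customer typically takes one self-join per hop over a link table, whereas a graph traversal only follows the edges attached to vertices it has already reached. The sketch below is a plain breadth-first search over an in-memory adjacency list, not the storage engine of any particular graph database; the function name and vertex names are hypothetical.

```python
from collections import deque


def k_hop_neighbors(adj: dict[str, list[str]], start: str, k: int) -> dict[str, int]:
    """Return each vertex reachable from `start` within `k` hops, mapped to its hop distance.

    Only the edges of already-visited vertices are read, which is the
    "local traversal" behaviour that avoids one join per hop.
    """
    dist = {start: 0}
    frontier = deque([start])
    while frontier:
        v = frontier.popleft()
        if dist[v] == k:
            continue  # hop limit reached: do not expand this vertex further
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                frontier.append(w)
    del dist[start]
    return dist


# Hypothetical toy graph: customers linked to the phone numbers and devices they used.
adj = {
    "cust_A": ["phone_1"],
    "phone_1": ["cust_A", "cust_B"],
    "cust_B": ["phone_1", "device_9"],
    "device_9": ["cust_B", "cust_C"],
    "cust_C": ["device_9"],
}
print(k_hop_neighbors(adj, "cust_A", 3))  # {'phone_1': 1, 'cust_B': 2, 'device_9': 3}
```

The work done for each extra hop depends on the edges around the current frontier, which is one reason graph engines tend to hold up better than chained joins as the hop count grows.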

Graph databases also support multimodal data access and can parse and store semi-structured and unstructured data. Graph data also lends itself to visualization, which makes it easier to present and understand; traditional relational databases lack both capabilities. In practice, graph databases have clearly outperformed traditional relational databases in data-model flexibility and scalability, query efficiency for relationship-heavy workloads, support for semi-structured and unstructured data, and data visualization.

2. Considerations for Choosing Graph Databases

The standardization system for graph databases in China’s financial industry is still maturing. The graph database products on the market differ significantly in read/write performance, real-time computing, offline analysis, and the richness of their algorithm libraries, and a single graph database often cannot cover every business scenario. Taking into account its technology stack and the characteristics of its business scenarios, Guangfa Bank chose two open-source, independently controllable, and horizontally scalable distributed graph databases, JanusGraph and NebulaGraph (hereinafter Nebula), as its technical foundation, using each to meet the storage and computing needs of different scenarios and building the application-layer platform functions on top of them.

JanusGraph provides powerful graph traversal capabilities suited to massive data processing tasks such as group mining and offline graph metric calculation, so offline business scenarios are implemented on JanusGraph. However, because of its storage layer and architecture, JanusGraph does not fully support computation pushdown and performs poorly in multi-hop traversals, which makes it hard to meet the high-concurrency, low-latency requirements of real-time OLTP scenarios. Comparative testing showed that Nebula performs well in multi-hop traversal and meets the requirements of real-time graph metric calculation and graph mining, so real-time business scenarios are planned to run on Nebula. Both databases provide reliable and efficient data storage as well as strong information security and disaster recovery capabilities.

3. Application Practices of Graph Databases

After several years of continuous refinement and practice, Guangfa Bank has built a bank-wide knowledge graph platform. As shown in the figure, a graph computing and graph application platform built on top of the graph databases constructs relationship networks and provides graph design, graph search, graph analysis, graph rules, group recognition, and graph exploration. The platform meets the business lines’ need to develop and tune relational features, rules, and models quickly, and offers one-stop services for graph construction, graph computing, and graph applications. It supports the bank’s various risk control scenarios, helping business teams quickly identify risk points, suspicious groups, and hidden connections, and enables intelligent analysis, judgment, and decision-making through “human-machine collaboration,” effectively raising the level of risk prevention and control.

Figure: Knowledge Graph Platform

(1) A “1+N” model, in which one platform supports multiple scenarios, reduces the cost of building and operating graphs. A universal knowledge graph platform serving the entire bank provides centralized management and knowledge reuse for the scenario graphs of all business lines. The platform handles data access, entity and relationship extraction, and graph updates through low-code, configuration-driven methods, supporting both batch construction and real-time updates across scenarios. It also provides operations functions such as task scheduling and graph evaluation, helping operations staff quickly spot abnormal data and tasks in a graph.

(2) A unified storage configuration center compatible with multiple graph databases. The platform uses a componentized, pluggable architecture: an abstraction layer over graph queries and graph computations lets it adapt quickly to different graph databases such as JanusGraph, Nebula, and Neo4j. Each scenario graph can be configured with its own storage scheme to meet its particular requirements, and data permissions are isolated between graphs for security.

(3) A distributed graph mining engine that supports analysis and mining of relationship graphs at the billion scale. To meet deep analysis and mining needs, the platform includes graph mining functions and task templates for subgraph extraction, relationship inference and completion, group mining, and graph metric calculation, together with dozens of general-purpose and in-house advanced graph algorithms and scenario-specific mining models. It can complete complex mining tasks on relationship graphs at the billion scale, effectively supporting risk-propagation analysis and group mining within the bank.

(4) A graph laboratory that opens up platform capabilities and maximizes the business value of graph mining. Implementing a graph scenario requires heavy investment in data preprocessing, graph construction, and the design, validation, and tuning of mining schemes, and its business value is hard to assess in advance. The bank therefore designed a bank-wide foundation graph and built a graph laboratory for modeling and analysis staff, offering flexible self-service graph construction, mining scheme design, automated evaluation, and visual exploratory analysis, so that analysts can quickly verify and tune schemes and realize the maximum business value from graphs.
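Item (2) describes an abstraction layer that lets one platform drive several graph databases, with each scenario graph choosing its own storage. The platform’s actual interface is not published; the sketch below assumes a hypothetical `GraphBackend` class whose subclasses render the same logical request (a k-hop neighbourhood query) in each database’s dialect, selected through a per-graph configuration. The class names, the `CONFIG` entries, the space name, and the `vid` property key are all illustrative.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical abstraction: one logical request, rendered per database dialect."""

    @abstractmethod
    def k_hop_query(self, vertex_id: str, hops: int) -> str:
        ...


class NebulaBackend(GraphBackend):
    def __init__(self, space: str):
        self.space = space

    def k_hop_query(self, vertex_id: str, hops: int) -> str:
        # nGQL: outgoing neighbours reached in 1..hops steps over any edge type.
        # IDs are interpolated for brevity; real code should guard against injection.
        return (f'USE {self.space}; GO 1 TO {hops} STEPS FROM "{vertex_id}" '
                "OVER * YIELD DISTINCT dst(edge) AS id")


class JanusGraphBackend(GraphBackend):
    def __init__(self, source: str = "g", id_key: str = "vid"):
        self.source = source
        self.id_key = id_key  # hypothetical business-key property on vertices

    def k_hop_query(self, vertex_id: str, hops: int) -> str:
        # Gremlin: emit every vertex reached along outgoing edges within `hops` steps.
        return (f"{self.source}.V().has('{self.id_key}', '{vertex_id}')"
                f".repeat(out()).emit().times({hops}).dedup().id()")


def backend_for(graph_name: str, config: dict) -> GraphBackend:
    """Pick the storage backend configured for one scenario graph."""
    entry = config[graph_name]
    if entry["backend"] == "nebula":
        return NebulaBackend(entry["space"])
    if entry["backend"] == "janusgraph":
        return JanusGraphBackend(entry.get("source", "g"))
    raise ValueError(f"unsupported backend: {entry['backend']}")


# Hypothetical per-scenario storage configuration.
CONFIG = {
    "card_issuance": {"backend": "nebula", "space": "card_issuance"},
    "fraud_rings": {"backend": "janusgraph"},
}
print(backend_for("card_issuance", CONFIG).k_hop_query("app_001", 2))
```

Callers only see `k_hop_query`, so adding another database (Neo4j, for instance) would mean adding one subclass and one configuration branch rather than changing the applications built on the platform.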

Application and Practice of Knowledge Graph Scenarios

1. Real-time Analysis Scenario

The main real-time application of knowledge graphs at Guangfa Bank is real-time prevention of group fraud in new credit card issuance. Anti-fraud screening of credit card applications is a crucial part of issuance risk management at every bank, and because fraud is organized crime whose methods keep evolving, it is a difficult challenge. Drawing on the graph database’s flexible data modeling, high-performance data processing, management of complex group relationships, and real-time graph queries, Guangfa Bank integrated data on credit card customers’ credit checks, devices, and phone numbers to build a new-card-issuance graph with over 800 million entities and over 300 million relationships. Through real-time queries of customer profile data, real-time graph construction, and calculation of more than 50 core metrics, the graph helps the card issuance decision system identify fraudulent applications in both automated decisioning and manual review, so that suspicious applications are intercepted in real time. The turnaround for risk indicators has dropped from T+7 to as little as 2 seconds, the interception rate of high-credit-risk customers in manual review has risen to 2.5 times its previous level, and the fraud-risk interception rate has risen to 1.5 times.

The key implementation logic is as follows. Flink jobs consume and parse transaction messages from Kafka in real time and build the graph incrementally. Using the Nebula graph database’s pattern-matching capabilities, the relevant business risk subgraphs are retrieved, graph metrics are computed on them in real time, and the results are labeled according to business rules and models, supplying risk assessment data to the business system.
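The production path (Flink consuming Kafka, Nebula pattern matching) cannot be reproduced in a few lines, but the per-message logic can be sketched with stand-ins, under stated assumptions: JSON strings play the role of Kafka messages, an in-memory adjacency map stands in for Nebula, and the “risk subgraph” is simply the set of other applications sharing a phone number or device with the new one. The two metrics, `BLACKLIST`, and `SHARED_THRESHOLD` are hypothetical and are not the bank’s 50 core metrics or its rules.

```python
import json
from collections import defaultdict


class InMemoryGraph:
    """Stand-in for the graph database: undirected adjacency sets."""

    def __init__(self):
        self.adj = defaultdict(set)

    def upsert_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def linked_applications(self, app_id):
        """Applications two hops away: app -> shared phone/device -> other app."""
        found = set()
        for attr in self.adj[app_id]:
            found.update(self.adj[attr])
        found.discard(app_id)
        return found


BLACKLIST = {"app_003"}  # hypothetical known-fraud applications
SHARED_THRESHOLD = 3     # hypothetical rule threshold


def process_event(raw, graph):
    """Parse one message, update the graph, compute metrics, and apply a rule label."""
    event = json.loads(raw)
    app_id = event["app_id"]
    for field in ("phone", "device"):
        if event.get(field):
            # Prefix attribute vertices so a phone and a device can never collide.
            graph.upsert_edge(app_id, f"{field}:{event[field]}")
    related = graph.linked_applications(app_id)
    metrics = {
        "shared_applications": len(related),
        "blacklisted_links": len(related & BLACKLIST),
    }
    risky = metrics["blacklisted_links"] > 0 or metrics["shared_applications"] >= SHARED_THRESHOLD
    return app_id, metrics, "review" if risky else "pass"


graph = InMemoryGraph()
messages = [
    '{"app_id": "app_001", "phone": "p1", "device": "d1"}',
    '{"app_id": "app_002", "phone": "p1"}',
    '{"app_id": "app_003", "device": "d1"}',
    '{"app_id": "app_004", "phone": "p1", "device": "d1"}',
]
for raw in messages:
    print(process_event(raw, graph))
```

Only the fourth application is labeled for review: it shares a phone or device with three earlier applications, one of which is on the hypothetical blacklist. Because each message updates the graph before metrics are computed, later applications see links created by earlier ones, which is the point of building the graph in real time.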

2. Offline Decision-Making Scenario

In offline decision-making scenarios, Guangfa Bank mainly applies knowledge graphs to risk control of accounts involved in gambling and fraud, and to mining fraud rings that abuse credit card points and installments.

The People’s Bank of China, the Ministry of Public Security, and other authorities have repeatedly issued notices, such as the “Guidelines for the Division of Responsibilities in Gambling Fund Chains,” requiring commercial banks to strengthen monitoring of accounts involved in gambling and fraud. To implement these regulatory requirements and improve the monitoring and identification of such accounts, Guangfa Bank integrated business data including customer transfers, devices, addresses, and blacklists to build a gambling and fraud graph with over 500 million entities and nearly 1 billion relationships. Exploiting the relational analysis strengths of graph databases, the graph outputs lists of indirectly linked suspicious accounts together with their graph features and association rules, which help branches investigate, verify, and manage accounts; the suspicious lists are also sent to the risk control system for real-time transaction monitoring. The graph has markedly improved control over gambling and fraud, cumulatively identifying tens of thousands of suspicious accounts and bringing account balances totaling nearly 100 million under control.
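One plausible reading of “indirectly linked suspicious accounts” is accounts that are not blacklisted themselves but sit within a few transfer hops of blacklisted ones. The sketch below assumes that reading: it aggregates transfer amounts per counterparty pair, ignores pairs below a minimum total, and runs a multi-source breadth-first search from every blacklisted account at once. The hop limit, amount threshold, and account IDs are hypothetical; the bank’s actual rules are not described.

```python
from collections import defaultdict, deque


def indirect_suspects(transfers, blacklist, max_hops=2, min_amount=0.0):
    """Map each non-blacklisted account within `max_hops` of a blacklisted one to its hop distance."""
    # Aggregate amounts per unordered account pair, so A->B and B->A count together.
    totals = defaultdict(float)
    for src, dst, amount in transfers:
        totals[frozenset((src, dst))] += amount

    adj = defaultdict(set)
    for pair, total in totals.items():
        if total >= min_amount and len(pair) == 2:  # skip small links and self-transfers
            a, b = tuple(pair)
            adj[a].add(b)
            adj[b].add(a)

    # Multi-source BFS: every blacklisted account starts at distance 0.
    dist = {acct: 0 for acct in blacklist}
    queue = deque(blacklist)
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {acct: d for acct, d in dist.items() if d > 0}


transfers = [("B1", "X", 5000), ("X", "Y", 3000), ("Y", "Z", 100), ("B1", "W", 10)]
print(indirect_suspects(transfers, {"B1"}, max_hops=2, min_amount=1000))  # {'X': 1, 'Y': 2}
```

`W` drops out because its only link is below the amount threshold, and `Z` because it is three hops away; the hop distance itself is a natural graph feature to hand to branch investigators alongside the list.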

Traditional controls against malicious credit card point redemption and installment fraud rely mainly on offline data retrieval, which is limited in scope and dimensions and struggles to surface the common characteristics that link such customers through deeper relationships. As a result, organized abuse of points and fraudulent applications for credit funds have caused losses to the bank. Guangfa Bank integrated business data such as credit card customers’ mobile devices, phone numbers, and addresses to build a credit card risk graph with over 1.5 billion entities and 2.5 billion relationships. Using the subgraph mining capabilities of graph databases, it mines specific graph structures for abnormal point redemption and installment fraud, outputting lists of risky and potentially risky customers that are sent to the business system for control, which improves the efficiency of association analysis and business review. The graph has effectively curbed malicious point redemption and intermediary-assisted fraudulent credit applications, raising the identification rate of risky customers roughly eightfold, improving the overall control rate of risk groups by over 90%, and saving tens of millions annually.
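The specific graph structures are not disclosed. One simple structure that fits the description is a star: a single device, phone number, or address shared by several customers. The sketch below flags stars in which enough members show the abnormal behaviour (here, hypothetically, abnormal point redemption); flagged members form the risk list and the remaining members the potential-risk list. The thresholds `min_members` and `min_redeemers` and all IDs are hypothetical.

```python
from collections import defaultdict


def shared_attribute_rings(links, redeemers, min_members=3, min_redeemers=2):
    """Find shared devices/phones/addresses whose customer group looks like a ring.

    links: iterable of (customer_id, attribute_id) pairs.
    redeemers: customers already flagged for abnormal behaviour.
    """
    redeemers = set(redeemers)
    members = defaultdict(set)
    for customer, attr in links:
        members[attr].add(customer)

    rings = []
    for attr, customers in members.items():
        hits = customers & redeemers
        if len(customers) >= min_members and len(hits) >= min_redeemers:
            rings.append({
                "attribute": attr,
                "risk": sorted(hits),                    # flagged members of the star
                "potential": sorted(customers - hits),   # unflagged members sharing the hub
            })
    return rings


links = [("c1", "dev1"), ("c2", "dev1"), ("c3", "dev1"), ("c4", "dev2"), ("c1", "ph9")]
print(shared_attribute_rings(links, {"c1", "c2"}))
```

Grouping by the shared attribute keeps the check to a single pass over the links; richer structures (chains through several shared attributes, for example) would need the traversal or community methods discussed next.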

Both scenarios rely on the relational analysis strengths and subgraph mining capabilities of graph databases. Every day, the Louvain algorithm, connected-component analysis, and business-customized group structure rules are used to mine risk communities and identify business risk groups. Group risk metrics are then calculated, risk groups are scored against rules, potential risk clients are managed according to those rules, and the resulting risk data is sent to the relevant systems for business review and risk control.
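Louvain community detection is available in common libraries (NetworkX, for example, provides `louvain_communities`), so the sketch below covers the simpler steps around it: connected components via a union-find structure, then group metrics and rule scoring. It is a single-machine illustration, not the bank’s distributed engine, and the rule names, thresholds, and points in `RULES` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure for finding connected components (single-machine sketch)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def connected_groups(edges):
    uf = UnionFind()
    for a, b in edges:
        uf.union(a, b)
    groups = defaultdict(set)
    for v in list(uf.parent):
        groups[uf.find(v)].add(v)
    return list(groups.values())


# Hypothetical scoring rules: (rule name, condition on group metrics, points).
RULES = [
    ("contains_blacklisted", lambda g: g["blacklisted"] > 0, 50),
    ("large_group", lambda g: g["size"] >= 5, 30),
    ("high_blacklist_ratio", lambda g: g["blacklisted"] / g["size"] >= 0.3, 20),
]


def score_groups(groups, blacklist):
    """Compute simple group risk metrics and sum the points of every rule that fires."""
    results = []
    for members in groups:
        metrics = {"size": len(members), "blacklisted": len(members & blacklist)}
        fired = [(name, points) for name, cond, points in RULES if cond(metrics)]
        results.append({
            "members": sorted(members),
            "score": sum(points for _, points in fired),
            "rules": [name for name, _ in fired],
        })
    return sorted(results, key=lambda r: r["score"], reverse=True)


edges = [("A1", "A2"), ("A2", "A3"), ("B1", "B2")]
for result in score_groups(connected_groups(edges), blacklist={"A3"}):
    print(result)
```

Keeping the rules as data rather than code mirrors the idea of business-customized rules: analysts can adjust thresholds and weights without touching the grouping logic.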

Prospects for Graph Database and Knowledge Graph Applications

Efficient processing of large-scale graph data in offline scenarios and high-concurrency, low-latency graph computing in real-time scenarios are likely to be both the focus and the main challenges of future knowledge graph construction. Both capabilities depend heavily on the graph database: large-scale storage and query performance, together with real-time graph updates and computation, will be key criteria when selecting graph databases in the future.

In addition, graph learning, represented by graph neural networks, has sparked a research boom in artificial intelligence in recent years. Graph learning is a family of machine learning methods that extract useful features and patterns from graph data by learning the relationships between nodes and edges, and it applies to tasks such as prediction, classification, clustering, and anomaly detection. Graph learning techniques can help mitigate problems such as the low interpretability of earlier machine learning algorithms. Overcoming the memory and hardware bottlenecks of graph learning and the challenges of large-scale graph deep learning, and applying these techniques to financial scenarios to create business value, is another important direction for future knowledge graph work.
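The core operation behind many graph neural networks is message passing: each node updates its representation from its neighbours’ representations. The sketch below shows one round of mean aggregation in plain Python with hypothetical two-dimensional features. It is deliberately simplified: a real GCN or GraphSAGE layer also applies a learned weight matrix and a nonlinearity and runs on a tensor library, and at scale it is the storage of these per-node vectors and neighbour lists that creates the memory and hardware pressure mentioned above.

```python
def mean_aggregate(adj, features):
    """One round of message passing: new vector = mean of the node's own vector and its neighbours'."""
    updated = {}
    for node, own in features.items():
        vectors = [own] + [features[n] for n in adj.get(node, [])]
        updated[node] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return updated


# Hypothetical three-node path a - b - c with two features per node.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(mean_aggregate(adj, features))
```

Applying the function twice lets information from `c` reach `a`, which is how stacking layers widens each node’s receptive field to multi-hop neighbourhoods.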

As a general-purpose artificial intelligence technology, knowledge graphs are widely applicable to scenarios such as anti-fraud, anti-money laundering, cash-out prevention, and precision marketing, and as business grows and the technology matures, the range of scenarios they support will keep expanding. Knowledge graph technology not only extends banks’ analysis from individual points to the broader picture (“from point to surface”) but also enables deep mining of hidden, relationship-based fraud risks. It provides strong support for an intelligent risk management strategy and lays a crucial capability foundation for accelerating banks’ digital transformation.

(Column Editors: Yang Kunhua, Wei Yanan)