Easily Process Text Data in New Financial Risk Control with Word2vec

Submission by Machine Heart

Author: Tang Zhengyang

The author of this article, Tang Zhengyang, is the Market Manager at CreditX. He provides a clear introduction to the deep learning technology Word2vec and its applications in the field of financial risk control.

In today's more inclusive market environment, the customer base and scope of new financial services have expanded and deepened further, and the business model is trending toward smaller, more dispersed, more efficient, and more scalable operations, which poses greater challenges to traditional risk control. On one hand, coverage of high-value financial data for these customer groups has dropped sharply; on the other hand, business experts face great uncertainty in relating the growing volume of unstructured data to risk. In practice, these data, which differ from traditional strong credit data, are playing an increasingly important role in new financial risk control. Numerous mature scenarios have shown that using them sensibly and extracting their value can often bring a remarkable improvement in overall risk control effectiveness.

Below, we take text data as an example to briefly introduce what the deep learning technique Word2vec is, how it came about, and how it is applied in our financial risk control scenarios.

One-Hot Vectors and Distributed Representations

Let me ask you a question: if you are given a handful of words, how would you want your computer to understand each of them? There are, of course, many ways to do this. Here we introduce a sparse representation called the one-hot vector, illustrated below:

[Figure: one-hot vector representation of the example words]

This representation solves our problem, but it also has clear drawbacks. Each word occupies its own dimension, so when the vocabulary grows dramatically we run into the curse of dimensionality, which makes modeling very difficult. At this point you might consider another approach: use only 4 dimensions to encode basic attributes such as gender and whether the word refers to an elderly person, an adult, or an infant. This is called a distributed representation, also known as a word vector:

[Figure: distributed (word vector) representation of the example words]
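As a minimal sketch of the contrast between the two representations (the vocabulary and the four attribute dimensions below are illustrative assumptions, not the figures from this article):

```python
import numpy as np

# Illustrative vocabulary (assumed, not the article's original example)
vocab = ["grandfather", "grandmother", "man", "woman", "boy", "girl"]

# One-hot: one dimension per word, so vector length equals vocabulary size
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
print(one_hot["girl"])          # [0. 0. 0. 0. 0. 1.]

# Distributed representation: 4 dense dimensions, e.g. [gender, elderly, adult, infant]
word_vec = {
    "grandfather": np.array([0.0, 1.0, 0.0, 0.0]),
    "grandmother": np.array([1.0, 1.0, 0.0, 0.0]),
    "man":         np.array([0.0, 0.0, 1.0, 0.0]),
    "woman":       np.array([1.0, 0.0, 1.0, 0.0]),
    "boy":         np.array([0.0, 0.0, 0.0, 1.0]),
    "girl":        np.array([1.0, 0.0, 0.0, 1.0]),
}
print(word_vec["girl"])
```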

After deep learning took off and computational bottlenecks were broken through, the concept of word vectors also gained popularity. The starting point is a common assumption: words with similar meanings appear in similar contexts in text; in other words, similar words have similar contexts. We can therefore use the context of a word, such as the frequencies with which it co-occurs with other words, to form a vector that represents that word. If a sentence is very long, we can restrict the window and count only co-occurrences with the n words before and after the target word.

For example, consider the following three sentences as a corpus:

  • I like deep learning.

  • I like NLP.

  • I enjoy modeling.

Taking a window length of n = 1, we obtain the co-occurrence matrix below, in which each column is the word vector of the corresponding word.

[Figure: word-word co-occurrence matrix for the example corpus with window n = 1]
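Here is a minimal sketch of how such a co-occurrence matrix can be counted from the three sentences above (the tokenization, including the trailing period, is our own assumption):

```python
from collections import defaultdict

# The three example sentences from the article, tokenized (tokenization assumed)
corpus = [
    ["I", "like", "deep", "learning", "."],
    ["I", "like", "NLP", "."],
    ["I", "enjoy", "modeling", "."],
]

window = 1  # n = 1: only the immediate left and right neighbours count

# Count co-occurrences within the window
counts = defaultdict(int)
for sentence in corpus:
    for i, word in enumerate(sentence):
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                counts[(word, sentence[j])] += 1

# Assemble the symmetric co-occurrence matrix; column w is then the vector for w
vocab = sorted({w for s in corpus for w in s})
for row in vocab:
    print(f"{row:>9}", [counts[(row, col)] for col in vocab])
```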

Word2vec

Now our main character, Word2vec, makes its entrance. Unlike the co-occurrence counts above, Word2vec, currently the mainstream word embedding algorithm, learns word vectors mainly by predicting, for each word, the probabilities of the words surrounding it within a window of length c. In this way, words are mapped into a high-dimensional vector space, allowing us to compute the distance between words, i.e., their semantic similarity.

In Word2vec, the two most important models are the CBOW and Skip-gram models. The former uses the context of words to predict the current word, while the latter uses the current word to predict the context.
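As a quick sketch of how the two models are trained in practice (this uses the open-source gensim library, version 4.x, which the article itself does not mention; the toy corpus is the one from the previous section):

```python
from gensim.models import Word2Vec

sentences = [
    ["I", "like", "deep", "learning"],
    ["I", "like", "NLP"],
    ["I", "enjoy", "modeling"],
]

# sg=0 selects CBOW (predict the current word from its context),
# sg=1 selects Skip-gram (predict the context from the current word);
# hs=1 enables the hierarchical-softmax (Huffman tree) output layer discussed below.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, hs=1)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, hs=1)

print(cbow.wv["NLP"][:5])                    # first few dimensions of a learned vector
print(cbow.wv.similarity("NLP", "learning")) # cosine similarity between two words
```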

Let’s first take CBOW as an example. CBOW stands for Continuous Bag-of-Words Model, as it uses continuous space to represent words, and the order of these words is not important. Its neural network structure is designed as follows:

[Figure: CBOW network structure with input, projection, and output layers]

  • Input layer: the word vectors of the 2c words that form the context of the word w.

  • Projection layer: Sum up the 2c vectors from the input layer.

  • Output layer: A Huffman tree, where the leaf nodes are the words that have appeared in the corpus, and their weights are the frequency of occurrence.

Why does the output layer of CBOW need to be a Huffman tree? Because we want to compute, from the training corpus, the probability of each possible word w given its context. Let's look at an example. Suppose the sentence is: I, like, watching, Brazil, football, World Cup, and w = football.

[Figure: Huffman tree built from the example corpus, showing the path to the leaf node "football"]

In this Huffman tree, the path taken by the word "football" is easy to see. The θ attached to each non-leaf node is a parameter vector to be trained, and the goal is the following: when the projection layer produces a new vector x, we can compute, at each node along the path, the probability of branching to the left child (coded 1) or the right child (coded 0) using the logistic regression formula:

σ(xᵀθ) = 1 / (1 + e^(−xᵀθ))

The probabilities are:

p(d = 1 | x, θ) = 1 − σ(xᵀθ)  and  p(d = 0 | x, θ) = σ(xᵀθ)

Thus, we have:

p(football | Context(football)) = ∏_j p(d_j | x, θ_(j−1)),

where the product runs over the branching decisions on the path from the root to the leaf "football": d_j is the code (0 or 1) chosen at step j, and θ_(j−1) is the parameter vector of the non-leaf node where that decision is made.

Now that the model is established, we can use the corpus to train and optimize the context word vectors v(Context(w)), their sum x in the projection layer, and the node parameters θ. Due to space limitations, we will not derive the specific update formulas here.
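A minimal numeric sketch of this path probability, with made-up dimensions, context vectors, path length, and Huffman codes (none of these numbers come from the article):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
dim = 8                                    # assumed embedding dimension

# Projection layer: sum of the 2c context word vectors (here 4 random ones)
context_vectors = rng.normal(size=(4, dim))
x = context_vectors.sum(axis=0)

# One θ per branching decision on the path to the leaf "football",
# plus the Huffman code d of each step (1 = left child, 0 = right child)
path_thetas = rng.normal(size=(3, dim))    # assumed path of length 3
path_codes = [1, 0, 1]                     # assumed codes

# p(football | Context) is the product of the per-node branch probabilities
prob = 1.0
for theta, d in zip(path_thetas, path_codes):
    s = sigmoid(x @ theta)
    prob *= (1.0 - s) if d == 1 else s     # p(d=1) = 1-σ(xᵀθ), p(d=0) = σ(xᵀθ)
print(prob)
```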

The Skip-gram model (Continuous Skip-gram Model) works in the other direction, using the known current word to predict its context; its derivation closely mirrors that of CBOW.

Practical Effect Examples

After all this, just how magical is Word2vec? Using the Chinese Wikipedia as the training corpus, here is an intuitive example: if we ask for the words with the highest semantic similarity to "linguistics", together with their similarity scores, we get the following results:

[Figure: words most similar to "linguistics" and their similarity scores]

Interestingly, as shown in the following figure, X(KING) – X(QUEEN) ≈ X(MAN) – X(WOMAN), where X(w) represents the word vector of word w learned using Word2vec. This means that word vectors can capture some common implicit semantic relationships between KING and QUEEN, and between MAN and WOMAN.

[Figure: the relationship X(KING) − X(QUEEN) ≈ X(MAN) − X(WOMAN) in the word vector space]
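With a trained gensim model, both the nearest-neighbour query and the analogy can be run directly (the model file name below is hypothetical, standing in for a model trained on a large corpus such as Wikipedia):

```python
from gensim.models import Word2Vec

# Hypothetical path to a previously trained and saved model
model = Word2Vec.load("wiki_word2vec.model")

# Nearest neighbours by cosine similarity, as in the "linguistics" example above
print(model.wv.most_similar("linguistics", topn=5))

# The analogy X(KING) - X(QUEEN) ≈ X(MAN) - X(WOMAN), queried as KING - MAN + WOMAN
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```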

Mature Application Cases in New Financial Risk Control Scenarios

In fact, in new financial risk control scenarios, data such as text often carries deep meaning that is closely related to default risk, and traditional statistical methods, manual labeling, or even rule-based definitions often fail to fully exploit that risk value. As shown in the figure below, by converting text through word vector models into representations that computers can "understand" and compute with, and then using deep learning for feature extraction, we can feed mature classifier networks and establish a strong association between text data and default risk.

[Figure: pipeline from raw text to word vectors, deep feature extraction, and a risk classifier]
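A very rough sketch of this kind of pipeline (this is not CreditX's actual system: the toy texts, the averaging of word vectors, and the logistic regression classifier below are simplifying assumptions standing in for the deep feature extractor and classifier network described above):

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Tokenized application texts and their default labels (toy placeholders)
texts = [["stable", "salary", "repaid", "on", "time"],
         ["multiple", "overdue", "loans", "urgent", "cash"]]
labels = [0, 1]  # 0 = good, 1 = default

# Train word vectors on the (tiny) corpus
w2v = Word2Vec(texts, vector_size=50, window=2, min_count=1)

# Represent each document as the average of its word vectors
def doc_vector(tokens, model):
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.vstack([doc_vector(t, w2v) for t in texts])

# A simple classifier stands in for the "mature classifier network"
clf = LogisticRegression().fit(X, labels)
print(clf.predict_proba(X))
```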

Practice in large, mature risk control scenarios has likewise found that, for the growing volume of unstructured data such as text, time series, and images generated under new financial business models, fully exploiting its value delivers a remarkable improvement in risk control effectiveness.
