A neural network cannot understand a word directly; the word has to be converted into numbers before it can be fed in. The most naive way to do this is one-hot encoding, but a one-hot vector is extremely sparse (its length equals the vocabulary size) and carries little useful information, so we improve on it by compressing the one-hot representation into a dense, low-dimensional vector.
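As a tiny illustration (the vocabulary, embedding dimension, and values below are made up for the example), this is what the sparse one-hot representation and its dense compressed counterpart look like in numpy:

```python
import numpy as np

vocab = ["cat", "dog", "apple", "banana", "car"]   # toy vocabulary, |W| = 5
word_to_id = {w: i for i, w in enumerate(vocab)}

# One-hot encoding: a |W|-dimensional vector with a single 1.
one_hot = np.zeros(len(vocab))
one_hot[word_to_id["dog"]] = 1.0
print(one_hot)        # [0. 1. 0. 0. 0.]  -- sparse, grows with the vocabulary

# Dense embedding: compress each word into a small D_h-dimensional vector.
D_h = 3                                        # embedding dimension (toy value)
embedding = np.random.randn(D_h, len(vocab))   # each column is a word vector
dense_dog = embedding[:, word_to_id["dog"]]
print(dense_dog)      # e.g. [ 0.12 -0.87  0.44 ]  -- dense, fixed small size
```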
The word2vec algorithm obtains the word vector matrix by predicting words from their context.
The prediction task itself is only a pretext: what we actually want is not the predicted words, but the weight matrices that are continuously updated while solving this prediction task.
The prediction task is accomplished by a simple three-layer neural network with two parameter matrices $V$ and $U$, where $V \in \mathbb{R}^{D_h \times |W|}$ and $U \in \mathbb{R}^{|W| \times D_h}$.
$V$ is the matrix from the input layer to the hidden layer, also called the look-up table: since the input is a one-hot vector, multiplying $V$ by that one-hot vector simply selects one column of $V$, so each column of $V$ is treated as a word vector.
$U$ is the matrix from the hidden layer to the output layer, also called the word representation matrix; each of its rows is treated as a word vector.
The final word vector matrix is obtained by adding the two: $V + U^{T}$, where each column is the vector of one word.
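A minimal numpy sketch, with made-up shapes, of the two properties just described: multiplying $V$ by a one-hot vector selects a column, and the final word vectors come from $V + U^{T}$:

```python
import numpy as np

W_size = 5          # vocabulary size |W| (toy value)
D_h = 3             # hidden-layer / embedding dimension (toy value)

V = np.random.randn(D_h, W_size)    # look-up table, shape (D_h, |W|)
U = np.random.randn(W_size, D_h)    # word representation matrix, shape (|W|, D_h)

# One-hot input for the word with index 2.
x = np.zeros(W_size)
x[2] = 1.0

# Multiplying V by a one-hot vector just selects column 2 of V.
hidden = V @ x
assert np.allclose(hidden, V[:, 2])

# Final word vector matrix: V + U^T, each column is one word's vector.
word_vectors = V + U.T               # shape (D_h, |W|)
vector_of_word_2 = word_vectors[:, 2]
```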
2.1. Skip-Gram
Training task: Predict context words based on the center word.
Input: a center word, as a one-hot vector $x \in \mathbb{R}^{|W| \times 1}$
Parameters: a look-up table $V \in \mathbb{R}^{D_h \times |W|}$ and a word representation matrix $U \in \mathbb{R}^{|W| \times D_h}$
Skip-Gram Steps Diagram:
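As a rough sketch of a single Skip-Gram forward pass and loss (the shapes and word indices are illustrative, and the negative-sampling / hierarchical-softmax tricks used in real word2vec implementations are omitted):

```python
import numpy as np

W_size, D_h = 5, 3                      # toy vocabulary size and hidden dimension
V = np.random.randn(D_h, W_size)        # look-up table
U = np.random.randn(W_size, D_h)        # word representation matrix

def softmax(z):
    e = np.exp(z - z.max())             # shift for numerical stability
    return e / e.sum()

# Center word as a one-hot vector x (word index 2 here).
x = np.zeros(W_size)
x[2] = 1.0

# Forward pass: the hidden layer is the center word's column of V,
# the output layer scores every vocabulary word as a possible context word.
h = V @ x                               # shape (D_h,)
scores = U @ h                          # shape (|W|,)
probs = softmax(scores)                 # P(context word | center word)

# Loss for one observed context word (index 4, chosen arbitrarily);
# during training, gradients of this loss update both U and V.
context_id = 4
loss = -np.log(probs[context_id])
```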
2.2. CBOW
In contrast to Skip-Gram, CBOW trains word vectors by predicting the center word from its context words.
CBOW Steps Diagram:
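A matching sketch of one CBOW step under the same illustrative assumptions; the context words' columns of $V$ are averaged to form the hidden layer:

```python
import numpy as np

W_size, D_h = 5, 3                      # toy vocabulary size and hidden dimension
V = np.random.randn(D_h, W_size)        # look-up table
U = np.random.randn(W_size, D_h)        # word representation matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Indices of the context words around the center word.
context_ids = [0, 1, 3, 4]

# Hidden layer: average of the context words' columns of V.
h = V[:, context_ids].mean(axis=1)      # shape (D_h,)

# Output layer: score every vocabulary word as the possible center word.
probs = softmax(U @ h)                  # P(center word | context words)

# Loss for the true center word (index 2 here); its gradient updates U and V.
center_id = 2
loss = -np.log(probs[center_id])
```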