Let A, B, C, D represent four different Chinese words. After calculating with word2vec, we obtain the following word vectors:
If A represents the word “excellent” and C represents “celebrate,” what words might B and D represent respectively? Why?
D is relatively easy to deduce: it is very similar to C, including in part of speech, so D is probably a positive verb such as “congratulate.”
B is trickier. The first reaction is to guess that B is a negative word, such as “bad” or “inferior.”
But this inference is incorrect. “Good” and “bad” are antonyms, so the distance between them should be fairly large, but not this large: note that the angle between A and B is already at its maximum. “Good” and “bad” are both adjectives, so they at least share a part of speech. C and A have different parts of speech, yet the angle between them is only 90 degrees. An angle larger than that indicates that A and B differ substantially in meaning, more than a pair of antonyms would.
Therefore, any noun is a reasonable guess for B: Beijing, Tsinghua, Real Madrid, or San Francisco would all be fine.
The similarity between two vectors can be assessed by calculating the angle between them:
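As a minimal sketch of this calculation: the cosine of the angle is the dot product of the two vectors divided by the product of their lengths, and the angle follows from the arccosine. The vectors below are hypothetical stand-ins chosen to match the relationships described in the answer (A and B opposite, A and C perpendicular, D pointing the same way as C); they are not the actual vectors from the problem, and the function names are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_degrees(u, v):
    """Angle between u and v in degrees, in the range [0, 180]."""
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.degrees(math.acos(cos))

# Hypothetical three-dimensional vectors (assumed, not from the problem).
A = [1.0, 0.0, 0.0]
B = [-1.0, 0.0, 0.0]   # opposite direction to A
C = [0.0, 1.0, 0.0]    # perpendicular to A
D = [0.0, 2.0, 0.0]    # same direction as C, different length

print(angle_degrees(A, B))  # 180.0
print(angle_degrees(A, C))  # 90.0
print(angle_degrees(C, D))  # 0.0
```

Because only the direction matters, C and D come out as identical in angle even though their lengths differ.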
Visualization of the four three-dimensional vectors in the problem:
Using the relevant code from the experimental textbook, practice word2vec. Can you find other similar examples like “Hubei – Hunan + Changsha = Wuhan”?
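The textbook's code is not reproduced here, so the following is only a sketch of how such an analogy query works: subtract one vector, add another, and look for the remaining word whose vector has the highest cosine similarity to the result. The embeddings are hand-made toy values and the `analogy` helper is a hypothetical name; a trained model would supply real vectors. With gensim, the comparable query would be `most_similar` on the model's `KeyedVectors`, passing the added words as `positive` and the subtracted word as `negative`.

```python
import math

def cosine(u, v):
    """Cosine similarity between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(emb, a, b, c):
    """Return the word nearest to emb[a] - emb[b] + emb[c], excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Toy embeddings (assumed for illustration, not trained):
# axis 0 = "Hubei-ness", axis 1 = "Hunan-ness", axis 2 = "is a city".
toy = {
    "Hubei":    [1.0, 0.0, 0.0],
    "Hunan":    [0.0, 1.0, 0.0],
    "Wuhan":    [1.0, 0.0, 1.0],
    "Changsha": [0.0, 1.0, 1.0],
    "Beijing":  [0.0, 0.0, 1.0],
}

# Hubei - Hunan + Changsha
print(analogy(toy, "Hubei", "Hunan", "Changsha"))  # Wuhan
```

The query words themselves are excluded from the candidates, since the result vector often stays closest to one of its own inputs.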