Training Word Vectors Based on Word2Vec (Part 1)

1. Review of Training Word Vectors with a DNN. Last time we discussed how to train word vectors using a DNN model; this time we will explain how to train word vectors using word2vec. Let’s first review the DNN model for training word vectors covered earlier: in the DNN model, we use the CBOW or Skip-gram mode … Read more
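For concreteness, here is a minimal sketch of the CBOW-style network the excerpt refers to (PyTorch; the vocabulary size, embedding dimension, and window are illustrative assumptions, not the article's code). The final layer scores every word in the vocabulary, which is exactly the softmax cost that word2vec's training tricks are designed to reduce:

```python
import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)  # input word vectors
        self.output = nn.Linear(embed_dim, vocab_size)         # scores over the whole vocabulary

    def forward(self, context_ids):
        # context_ids: (batch, 2 * window) indices of the surrounding words
        v = self.embeddings(context_ids).mean(dim=1)  # average the context vectors
        return self.output(v)                         # logits for the center word

model = CBOW(vocab_size=10_000, embed_dim=100)        # hypothetical sizes
context = torch.randint(0, 10_000, (4, 4))            # batch of 4, window of 2 per side
target = torch.randint(0, 10_000, (4,))
loss = nn.CrossEntropyLoss()(model(context), target)  # softmax over all 10,000 words
```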

Attention Mechanism Bug: Softmax as the Culprit Affecting All Transformers

“I found a bug in the attention formula, and no one has noticed it for eight years. All Transformer models, including GPT and LLaMA, are affected.” Recently, a statistical engineer named Evan Miller has stirred up a storm in the AI field with his statement. We know that the attention formula in machine learning is … Read more
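For reference, the formula in question is the softmax inside standard scaled dot-product attention; the fix Miller proposes in his post adds 1 to the softmax denominator (a variant he calls softmax₁), so that an attention head can assign near-zero weight everywhere instead of being forced to distribute a full unit of attention:

$$
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V,
\qquad
\big(\mathrm{softmax}_1(x)\big)_i=\frac{e^{x_i}}{1+\sum_j e^{x_j}}
$$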

Why Can Negative Sampling in Word2Vec Achieve Results Similar to Softmax?

Editor: Yizhen. Source: https://www.zhihu.com/question/321088108 (shared for academic exchange). The author found an interesting question on Zhihu titled “Why can negative sampling in word2vec achieve results similar to … Read more
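For context, the question turns on the negative-sampling objective from the original word2vec papers (Mikolov et al., 2013), which replaces the full softmax over the vocabulary with $k+1$ binary logistic classifications: one for the observed context word $w_O$ and one for each of $k$ noise words drawn from a noise distribution $P_n(w)$:

$$
\log\sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)
+\sum_{i=1}^{k}\mathbb{E}_{w_i\sim P_n(w)}\!\left[\log\sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right]
$$

where $v_{w_I}$ is the input word's vector, $v'_{w}$ are the output-side vectors, and $\sigma$ is the logistic sigmoid.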

Understanding Softmax Function in Neural Networks

This article covers the essence of Softmax, from its principle to its applications, helping you understand the Softmax function in one pass. 1. The Essence of Softmax: Softmax is generally used as the last layer of a neural network to produce the output in multi-class problems; in essence, it is an activation function … Read more
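As a quick illustration of that output layer (a minimal NumPy sketch, not the article's code), softmax exponentiates the logits and normalizes them into a probability distribution:

```python
import numpy as np

def softmax(x):
    # Subtract the max first: exp() of large logits would otherwise overflow.
    z = np.exp(x - np.max(x))
    return z / z.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099], sums to 1
```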

In-Depth Analysis of the Word2Vec Model

“This article provides a detailed explanation of the two structures in word2vec, CBOW and skip-gram, as well as the two optimization techniques, hierarchical softmax and negative sampling. Understanding these details and the principles behind the word2vec algorithm is very helpful!” Source: TianMin https://zhuanlan.zhihu.com/p/85998950 Word2vec is a lightweight neural network model that consists of an input … Read more
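For readers who want to try those two structures and two optimizations directly, here is a minimal usage sketch with gensim (used here purely for illustration; the article itself walks through the math rather than library calls):

```python
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"],
             ["jumps", "over", "the", "lazy", "dog"]]

# sg=0 -> CBOW, sg=1 -> skip-gram;
# hs=1 -> hierarchical softmax, hs=0 with negative=5 -> negative sampling.
model = Word2Vec(sentences, vector_size=100, window=2,
                 min_count=1, sg=1, hs=0, negative=5)
print(model.wv["fox"].shape)  # (100,)
```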