Changes in Transformer Architecture Since 2017
When reading articles about LLMs, you often see phrases like “we use the standard Transformer architecture.” But what does “standard” mean, and has it changed since the original paper? Interestingly, despite the rapid growth of the NLP field over the past five years, the vanilla Transformer still adheres to the Lindy Effect, which suggests that the …