Can Transformers Plan for Future Tokens?
Do language models plan for future tokens? This paper offers an answer. One commenter quipped, "Don't let Yann LeCun see this" — to which LeCun replied that it was too late: he had already seen it. Today we introduce this paper that "LeCun must see," which explores the question: is the Transformer a far-sighted language model? When it performs inference at a …