Can Transformers Plan for Future Tokens?

Do language models plan for future tokens? This paper sets out to answer that question. “Don’t let Yann LeCun see this.” LeCun replied that it was too late; he had already seen it. Today we introduce a paper that “LeCun must see,” which asks: is the Transformer a far-sighted language model? When it performs inference at a …