Exploring Attention as a Quadratic-Complexity RNN
This article is approximately 3,900 words long; an 8-minute read is recommended. In this article, we demonstrate that Causal Attention can be rewritten in the form of an RNN. In recent years, RNNs have rekindled interest among researchers and practitioners thanks to their linear training and inference complexity, hinting at a "Renaissance" of recurrent architectures.
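As a preview of the construction the article develops, here is a minimal sketch of the idea in NumPy, under the standard scaled-dot-product definition of attention: at step t, causal attention attends only to positions i ≤ t, and the softmax over those positions can be evaluated with a streaming (online-softmax) recurrence. Because the per-step work grows with t, the total cost is O(T²), which is the "quadratic-complexity RNN" view in the title. The function name `causal_attention_rnn` and the shapes are illustrative assumptions, not the article's own code.

```python
import numpy as np

def causal_attention_rnn(Q, K, V):
    """Causal softmax attention computed step by step, RNN-style.

    For each step t we sweep over positions i <= t with a numerically
    stable online-softmax recurrence, carrying a running max `m`,
    a running denominator `s`, and a running numerator `u`.
    Per-step work grows with t, so total cost is O(T^2).
    """
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        m = -np.inf                # running max of attention scores
        s = 0.0                    # running softmax denominator
        u = np.zeros(V.shape[1])   # running weighted sum of values
        for i in range(t + 1):     # causal mask: only positions <= t
            a = Q[t] @ K[i] / np.sqrt(d)
            m_new = max(m, a)
            scale = np.exp(m - m_new)   # rescale old state to new max
            w = np.exp(a - m_new)
            s = s * scale + w
            u = u * scale + w * V[i]
            m = m_new
        out[t] = u / s
    return out

# Sanity check against the direct (masked matrix) computation.
T, d = 5, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
scores = Q @ K.T / np.sqrt(d)
scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
P = np.exp(scores - scores.max(axis=-1, keepdims=True))
P /= P.sum(axis=-1, keepdims=True)
assert np.allclose(P @ V, causal_attention_rnn(Q, K, V))
```

The recurrence here is the same trick used by online softmax: rescaling the partial sums whenever a larger score arrives, so no full pass over the scores is needed before normalizing.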