Machine Heart Editorial Team
The research team from Google shows that replacing the Transformer self-attention sublayer with a Fourier Transform reaches 92% of BERT's accuracy on the GLUE benchmark, while training nearly 7 times faster on GPUs and twice as fast on TPUs.
Since its introduction in 2017, the Transformer architecture has dominated the NLP field. One of its few limitations is the enormous computational cost of a key component: the self-attention mechanism, whose complexity scales quadratically with sequence length.
Based on this, researchers from Google propose replacing the self-attention sublayer with simple linear transformations that "mix" input tokens, finding that even a standard, unparameterized Fourier Transform fills this role with only a limited loss of accuracy.
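To make the idea concrete, here is a minimal sketch of such a Fourier token-mixing sublayer, written in NumPy for illustration only; the function name and shapes are assumptions, and the paper's actual implementation (in JAX) adds the usual residual connections, layer normalization, and feed-forward sublayers around this step.

```python
import numpy as np

def fourier_mixing(x):
    """Parameter-free token mixing in the spirit of FNet.

    Instead of self-attention, apply a 2D discrete Fourier transform:
    first along the hidden dimension, then along the sequence dimension,
    and keep only the real part of the result.
    x: array of shape (seq_len, hidden_dim).
    """
    return np.fft.fft(np.fft.fft(x, axis=-1), axis=-2).real

# Toy usage: a sequence of 8 token embeddings of width 16.
x = np.random.randn(8, 16)
mixed = fourier_mixing(x)
print(mixed.shape)  # (8, 16)
```

Because the transform has no learned parameters and can use fast FFT routines, this sublayer avoids the quadratic cost of computing attention weights between every pair of tokens.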