Machine Heart Editorial Team
The research team from Google shows that replacing the Transformer self-attention sublayer with a Fourier Transform reaches 92% of BERT's accuracy on the GLUE benchmark, while training nearly 7 times faster on GPUs and twice as fast on TPUs.
Since its introduction in 2017, the Transformer architecture has dominated the NLP field. One of its few limitations is the enormous computational cost of a key component: the self-attention mechanism, whose complexity scales quadratically with sequence length.
Based on this, researchers from Google propose replacing the self-attention sublayer with simple linear transformations that "mix" input tokens, finding that even a standard, unparameterized Fourier Transform fills this role with only a limited loss of accuracy.
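To make the idea concrete, here is a minimal sketch of such a Fourier token-mixing sublayer, written in NumPy for illustration only; the function name and shapes are assumptions, and the paper's actual implementation (in JAX) adds the usual residual connections, layer normalization, and feed-forward sublayers around this step.

```python
import numpy as np

def fourier_mixing(x):
    """Parameter-free token mixing in the spirit of FNet.

    Instead of self-attention, apply a 2D discrete Fourier transform:
    first along the hidden dimension, then along the sequence dimension,
    and keep only the real part of the result.
    x: array of shape (seq_len, hidden_dim).
    """
    return np.fft.fft(np.fft.fft(x, axis=-1), axis=-2).real

# Toy usage: a sequence of 8 token embeddings of width 16.
x = np.random.randn(8, 16)
mixed = fourier_mixing(x)
print(mixed.shape)  # (8, 16)
```

Because the transform has no learned parameters and can use fast FFT routines, this sublayer avoids the quadratic cost of computing attention weights between every pair of tokens.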