New Ideas on Attention Mechanism: Frequency Domain + Attention, Precision Exceeds SOTA by 22.6%

Combining the frequency domain with attention mechanisms is an innovative network design approach: frequency-domain analysis enhances the feature extraction process, while attention mechanisms further optimize how efficiently the extracted features are used.

This strategy helps a model capture and exploit the key frequency components of a signal, which not only improves performance and accuracy but also, to some extent, simplifies model design and optimization.

Taking the FcaNet from the Zhejiang University team as an example:

FcaNet is a very clever channel attention mechanism that generalizes SE (squeeze-and-excitation) attention from the frequency-domain perspective using the discrete cosine transform (DCT). The method is simple and efficient, requiring only minor modifications to the original code to achieve a 1.8% performance improvement over the SENet-50 model.
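
To make the idea concrete, here is a minimal PyTorch sketch of FcaNet-style multi-spectral channel attention. This is a sketch under assumptions, not the authors' code: the frequency pairs, channel grouping, and reduction ratio are illustrative. The essential point is that SE's global average pooling is replaced by projections onto 2D DCT bases; since the (0, 0) DCT basis is constant, plain SE falls out as a special case (up to a constant factor).

```python
import math
import torch
import torch.nn as nn

def dct_basis(u, v, h, w):
    """2D DCT-II basis function for frequency pair (u, v) on an h x w grid."""
    xs = torch.arange(h, dtype=torch.float32)
    ys = torch.arange(w, dtype=torch.float32)
    cos_x = torch.cos(math.pi * (xs + 0.5) * u / h)   # (h,)
    cos_y = torch.cos(math.pi * (ys + 0.5) * v / w)   # (w,)
    return cos_x[:, None] * cos_y[None, :]            # (h, w)

class MultiSpectralChannelAttention(nn.Module):
    """SE-style channel attention with global average pooling replaced by
    a bank of 2D DCT components, one frequency per channel group."""
    def __init__(self, channels, h, w,
                 freqs=((0, 0), (0, 1), (1, 0), (1, 1)), reduction=16):
        super().__init__()
        assert channels % len(freqs) == 0
        per_group = channels // len(freqs)
        # Each channel group is pooled with its own DCT basis.
        bases = [dct_basis(u, v, h, w)
                 for (u, v) in freqs for _ in range(per_group)]
        self.register_buffer("bases", torch.stack(bases))  # (C, h, w)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (B, C, h, w)
        # "Frequency pooling": project each channel onto its DCT basis.
        pooled = (x * self.bases).sum(dim=(2, 3))           # (B, C)
        scale = self.fc(pooled)                             # (B, C)
        return x * scale[:, :, None, None]
```

Calling `MultiSpectralChannelAttention(64, 32, 32)` on a `(B, 64, 32, 32)` feature map returns a re-weighted map of the same shape; with `freqs=((0, 0),)` the module reduces to plain SE.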

The combination of the frequency domain and attention is therefore a popular direction in deep learning, and because the frequency domain has historically received less focus, there are still many innovative points to explore.

This article shares 9 innovative fusion approaches for frequency domain + attention mechanisms, including both the latest and classic methods, mainly covering adaptive frequency-domain feature extraction + attention, multi-scale frequency domain + attention, and more, for inspiration.

SpectFormer

SpectFormer: Frequency and Attention is What You Need in a Vision Transformer

Method: Earlier vision transformers used either all attention layers or all spectral layers. SpectFormer combines the two and performs better than architectures built entirely on either one. The authors also propose a parameterized formulation that allows the mix to be adapted to specific tasks.

Details of the SpectFormer Architecture

Innovative Points:

  • The key idea of the SpectFormer architecture is to split the image into a series of patches and obtain patch embeddings through a linear projection layer, followed by standard positional encoding. SpectFormer then stacks a series of transformer blocks built from spectral layers and attention layers, allowing the architecture to capture both local and semantic information in images.
  • Variants of the architecture (SpectFormer-S, SpectFormer-B, and SpectFormer-L) can use different spectral layers such as FNet, FNO, GFNet, and AFNO, trading off parameter count and computational complexity.
  • Placing spectral layers first and multi-head attention layers after them proved more beneficial than other configurations, and this ordering is what defines the proposed SpectFormer architecture (see the sketch below).
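
The following PyTorch sketch illustrates that ordering, assuming a GFNet-style learnable complex filter as the spectral mixer; `depth`, the spectral/attention split `alpha`, the token grid size, and the MLP design are illustrative placeholders rather than the paper's settings.

```python
import torch
import torch.nn as nn

class SpectralLayer(nn.Module):
    """GFNet-style spectral mixing: FFT over the token grid, elementwise
    multiplication by a learnable complex filter, then inverse FFT."""
    def __init__(self, h, w, dim):
        super().__init__()
        self.h, self.w = h, w
        # rfft2 keeps w // 2 + 1 frequencies along the last spatial axis.
        self.filter = nn.Parameter(torch.randn(h, w // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):                         # x: (B, h*w, dim)
        B, N, C = x.shape
        x = x.view(B, self.h, self.w, C)
        X = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        X = X * torch.view_as_complex(self.filter)
        x = torch.fft.irfft2(X, s=(self.h, self.w), dim=(1, 2), norm="ortho")
        return x.reshape(B, N, C)

class SpectFormerStage(nn.Module):
    """First `alpha` blocks use spectral mixing, the remaining blocks use
    multi-head attention -- the ordering the paper found most effective."""
    def __init__(self, dim, depth, alpha, h, w, heads=4):
        super().__init__()
        self.blocks = nn.ModuleList()
        for i in range(depth):
            mixer = (SpectralLayer(h, w, dim) if i < alpha
                     else nn.MultiheadAttention(dim, heads, batch_first=True))
            mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))
            self.blocks.append(nn.ModuleList(
                [nn.LayerNorm(dim), mixer, nn.LayerNorm(dim), mlp]))

    def forward(self, x):                         # x: (B, h*w, dim) patch tokens
        for norm1, mixer, norm2, mlp in self.blocks:
            y = norm1(x)
            y = mixer(y) if isinstance(mixer, SpectralLayer) else mixer(y, y, y)[0]
            x = x + y                             # residual around the mixer
            x = x + mlp(norm2(x))                 # residual around the MLP
        return x
```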

Some Experimental Results

SFANet

Spatial-Frequency Attention for Image Denoising

Method: The paper introduces a Spatial-Frequency Attention Network (SFANet) for image denoising. The authors propose a Windowed Frequency Channel Attention (WFCA) module to effectively model the long-range dependencies of images: WFCA applies channel attention in the deep frequency-feature domain, adaptively modeling the combined amplitude and phase information of deep frequency features. In parallel, a dilated self-attention (SA) module models long-range dependencies in the spatial domain.

SFANet Overview

Innovative Points:

  • A Windowed Frequency Channel Attention (WFCA) block is proposed to effectively model the long-range dependencies of images. The WFCA module uses channel attention in the frequency domain to extract global dependencies, enabling more global dependency modeling than traditional SA-based blocks, at log-linear complexity.
  • Windowed self-attention is applied in SFANet to model long-range dependencies in the spatial domain. A Multi-Scale Dilated Self-Attention (MDSA) block enlarges the receptive field of windowed self-attention on shallow features without extra computation, better capturing long-range information in images.
  • Within the frequency attention module (FAM), the WFCA block resolves the size mismatch of frequency-domain inputs with a simple and effective windowing strategy and uses channel attention in the frequency domain to improve image restoration performance (see the sketch after this list).
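
A minimal PyTorch sketch of the WFCA idea follows, assuming non-overlapping windows and SE-style attention over stacked real/imaginary frequency channels; the window size, reduction ratio, and pooling choice are illustrative, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class WindowedFrequencyChannelAttention(nn.Module):
    """Sketch of the WFCA idea: split the feature map into fixed-size
    windows, move each window to the frequency domain with an FFT,
    re-weight the (real, imaginary) frequency channels with SE-style
    channel attention, then transform back."""
    def __init__(self, channels, window=8, reduction=4):
        super().__init__()
        self.window = window
        # Real and imaginary parts are stacked along the channel axis.
        freq_ch = channels * 2
        self.fc = nn.Sequential(
            nn.Linear(freq_ch, freq_ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(freq_ch // reduction, freq_ch), nn.Sigmoid(),
        )

    def forward(self, x):                            # x: (B, C, H, W)
        B, C, H, W = x.shape
        ws = self.window
        assert H % ws == 0 and W % ws == 0, "pad input to a multiple of the window"
        # Partition into non-overlapping ws x ws windows (fixes the size
        # mismatch of frequency-domain inputs across resolutions).
        xw = x.view(B, C, H // ws, ws, W // ws, ws)
        xw = xw.permute(0, 2, 4, 1, 3, 5).reshape(-1, C, ws, ws)
        # Per-window FFT; amplitude and phase live in real/imag channels.
        X = torch.fft.rfft2(xw, norm="ortho")        # (B*, C, ws, ws//2 + 1)
        feat = torch.cat([X.real, X.imag], dim=1)    # (B*, 2C, ws, ws//2 + 1)
        pooled = feat.mean(dim=(2, 3))               # pool per frequency channel
        scale = self.fc(pooled)[:, :, None, None]
        feat = feat * scale                          # channel attention in frequency
        X = torch.complex(feat[:, :C], feat[:, C:])
        xw = torch.fft.irfft2(X, s=(ws, ws), norm="ortho")
        # Merge windows back to (B, C, H, W).
        xw = xw.reshape(B, H // ws, W // ws, C, ws, ws).permute(0, 3, 1, 4, 2, 5)
        return xw.reshape(B, C, H, W)
```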

Some Experimental Results

DFANet

DFANet: Denoising Frequency Attention Network for Building Footprint Extraction in Very-High-Resolution Remote Sensing Images

Method: The paper proposes a Denoising Frequency Attention Network (DFANet) for building footprint extraction in high-resolution remote sensing images. DFANet consists of three parts: a U-shaped backbone network, a Denoising Frequency Attention Block (DFAB), and a Pyramid Pooling Module (PPM). DFAB enhances building-related information at a lower cost, thereby improving feature maps at each layer. To better capture building features of different shapes and sizes, the widely used PPM is introduced to expand the receptive field of DFANet.

DFANet Overview

Innovative Points:

  • A Denoising Frequency Attention Network, DFANet, is proposed for extracting building footprints from high-resolution remote sensing images. The method exploits frequency-domain differences to filter and enhance feature maps through attention mechanisms, improving building detection performance.
  • The Pyramid Pooling Module (PPM) is introduced to handle buildings of different scales; by enlarging the receptive field, PPM significantly improves building extraction performance (a sketch follows this list).
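
PPM is the standard pyramid pooling module from PSPNet; the compact PyTorch sketch below uses commonly seen defaults (bins of 1, 2, 3, 6) rather than DFANet's exact channel configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """PSPNet-style pyramid pooling: pool the feature map to a few fixed
    grid sizes, project, upsample, and concatenate -- the mechanism DFANet
    uses to enlarge its receptive field across building scales."""
    def __init__(self, channels, bins=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = channels // len(bins)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(bin_size),        # pool to bin_size x bin_size
                nn.Conv2d(channels, branch_ch, kernel_size=1, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            )
            for bin_size in bins
        )
        self.fuse = nn.Conv2d(channels + branch_ch * len(bins), channels,
                              kernel_size=1)

    def forward(self, x):                              # x: (B, C, H, W)
        h, w = x.shape[2:]
        outs = [x]
        for branch in self.branches:
            y = branch(x)
            outs.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                      align_corners=False))
        return self.fuse(torch.cat(outs, dim=1))
```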

Some Experimental Results

FEDformer

FEDformer: Frequency Enhanced Decomposed Transformer for Long-term Series Forecasting

Method: The paper proposes a frequency-domain compressed representation of time series and applies it to long-term forecasting. Unlike other long-term forecasting algorithms, the authors carry out the neural network's operations in the frequency domain. They introduce a compressed representation built from randomly selected Fourier components so that the Transformer can be computed efficiently, and they additionally present a wavelet-based representation and compare it against the Fourier basis.

FEDformer Structure

Innovative Points:

  • Frequency Enhanced Decomposed Transformer (FEDformer): a model architecture combining Fourier analysis with the Transformer for long-term time series forecasting. Applying the Fourier transform inside the Transformer lets the model better capture the global features of a time series, and the design pushes the distribution of predictions closer to the distribution of the ground truth, improving accuracy. FEDformer achieves improvements of 14.8% and 22.6% on multivariate and univariate time series forecasting, respectively.
  • A random Fourier mode selection strategy addresses the problem of choosing frequency components for the Fourier transform. Theoretical analysis and empirical study show that randomly selecting a fixed number of Fourier components represents the time series well while reducing the Transformer's computational complexity from quadratic to linear (sketched below).
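
The PyTorch sketch below illustrates the random-mode idea: FFT along time, retain a fixed random subset of Fourier modes, mix them with learnable complex weights, inverse FFT. It simplifies the paper's frequency enhanced block (modes are drawn uniformly here, and the sequence length is fixed at construction), so treat the names and shapes as assumptions.

```python
import torch
import torch.nn as nn

class FrequencyEnhancedBlock(nn.Module):
    """Sketch of FEDformer's frequency-enhanced block: FFT along time,
    keep only a fixed random subset of Fourier modes (linear rather than
    quadratic cost in sequence length), mix them with learnable complex
    weights, then inverse FFT back to the time domain."""
    def __init__(self, seq_len, dim, n_modes=16, seed=0):
        super().__init__()
        g = torch.Generator().manual_seed(seed)
        n_freq = seq_len // 2 + 1                    # rfft output length
        modes = torch.randperm(n_freq, generator=g)[:n_modes]
        self.register_buffer("modes", modes)
        # One learnable complex weight per kept mode and channel.
        self.weight = nn.Parameter(torch.randn(n_modes, dim, 2) * 0.02)

    def forward(self, x):                            # x: (B, L, dim)
        B, L, D = x.shape
        X = torch.fft.rfft(x, dim=1, norm="ortho")   # (B, L//2 + 1, D)
        out = torch.zeros_like(X)
        # Only the randomly selected modes carry information forward.
        out[:, self.modes] = X[:, self.modes] * torch.view_as_complex(self.weight)
        return torch.fft.irfft(out, n=L, dim=1, norm="ortho")
```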

Some Experimental Results
