Significantly Improve Image Recognition Network Efficiency: Facebook’s IdleBlock Hybrid Composition Method

Selected from arXiv

Authors: Bing Xu, Andrew Tulloch, Yunpeng Chen, Xiaomeng Yang, Lin Qiao

Compiled by Machine Heart

Recently, Facebook AI proposed a new convolutional module, IdleBlock, together with a Hybrid Composition (HC) method built on it. Experiments show that this simple method not only significantly improves network efficiency but also surpasses most neural architecture search results, achieving SOTA performance at comparable computational cost. The work may offer new insights for image recognition networks, neural architecture search, and network design in other fields.

Link: https://arxiv.org/pdf/1911.08609.pdf

In recent years, convolutional neural networks (CNNs) have dominated computer vision. Since the birth of AlexNet, the community has discovered several designs that improve CNNs, making backbone networks stronger and more efficient. Notable single-branch networks include Network in Network, VGGNet, ResNet, DenseNet, ResNeXt, MobileNet v1/v2/v3, and ShuffleNet v1/v2. Recently, multi-resolution backbone networks have also attracted the attention of the research community. To achieve multi-resolution learning, researchers design complex connections within modules to handle information exchange between different resolutions. Effective examples of this approach include MultiGrid-Conv, OctaveConv, and HRNet. These methods have significantly advanced the design of backbone networks.
To design more efficient CNNs, two mainstream directions have emerged: neural architecture search (NAS) and network pruning (NP). The idea of NAS is: given limited computational resources, automatically determine the optimal network connectivity, module design, and hyperparameters. Hyperparameter search is a classic research topic in machine learning, so the NAS referred to in this paper is limited to searching for the connectivity and module design of neural networks. The idea of NP is: given a pre-trained network, use an automatic algorithm to remove unimportant connections, thereby reducing computation and parameter count.
Unlike NAS, which searches over connectivity and module design, and unlike NP, EfficientNet provides compound hyperparameters for scaling backbone networks: a depth scaling factor d, a width scaling factor w, and an input-resolution scaling factor r, known together as the compound scaling factors. Based on a variant of MobileNet v3, these jointly searched scaling factors make the EfficientNet family 5 to 10 times more efficient than all previous backbone networks in terms of computational cost (MAdds) or parameter count.
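For reference, the compound scaling idea can be sketched in a few lines of Python. The coefficients below are the ones reported in the EfficientNet paper; the rounding scheme is a simplification for illustration only, not the paper's exact procedure.

    # Compound scaling as described in the EfficientNet paper: depth grows as
    # alpha^phi, width as beta^phi, and input resolution as gamma^phi, with the
    # base coefficients chosen so that alpha * beta^2 * gamma^2 is roughly 2,
    # i.e. total MAdds grow roughly as 2^phi.
    ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # coefficients reported for EfficientNet

    def compound_scale(depth, width, resolution, phi):
        """Scale a base network's depth, width, and resolution by factor phi."""
        return (int(round(depth * ALPHA ** phi)),
                int(round(width * BETA ** phi)),
                int(round(resolution * GAMMA ** phi)))

    # Example: scaling a B0-like stage (3 layers, 40 channels, 224 px) by phi = 1.
    print(compound_scale(3, 40, 224, 1))  # approximately (4, 44, 258)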
The authors believe that the current workflow for achieving efficient convolutional networks can be divided into two steps: 1) Design a network architecture; 2) Prune the connections within that network.
In the first step, the authors studied the common patterns shared by architectures designed by human experts and those found by search: for each backbone network, the architecture is determined by the design of its regular modules and its reduction blocks. Concretely, a reduction block is inserted at the beginning of each stage, and regular modules are then stacked repeatedly; since each stage repeats its module a different number of times, the number of regular modules may vary from stage to stage. The authors call this design pattern Monotonous Design (as shown in Figure 3).
Figure 3: Monotonous Design.
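In code, monotonous design reduces to a very simple stage constructor. The following PyTorch sketch, with hypothetical reduction_block and regular_block factories, is only meant to make the pattern explicit:

    import torch.nn as nn

    def monotonous_stage(reduction_block, regular_block, num_repeats):
        """One stage of a monotonously designed backbone: a reduction block
        at the start of the stage, followed by repeatedly stacked copies of
        the same regular module. Both arguments are hypothetical factories
        returning fresh nn.Module instances."""
        layers = [reduction_block()] + [regular_block() for _ in range(num_repeats)]
        return nn.Sequential(*layers)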
For example, ResNet monotonously repeats the Bottleneck module, ShuffleNet monotonously repeats the ShuffleBlock, MobileNet v2/v3 and EfficientNet monotonously repeat the Inverted Residual Block (MBBlock), NASNet repeats the Normal Cell, and FBNet repeats variants of MBBlock with different hyperparameters. The authors note that all current mainstream network modules guarantee complete information exchange.
The second step then prunes some of these connections, which means that not every module is guaranteed to perform complete information exchange.
In this paper, researchers from Facebook AI designed a more efficient network for image recognition by taking pruning into account during the network design step. They propose a new module design method called Idle: a subspace of the input is not transformed at all; it simply stays idle and is passed directly to the output (as shown in Figure 1).
Figure 1: The Idle design concept. In the Idle design, information exchange happens outside the Idle module.
The researchers also break the monotonous design constraint of current state-of-the-art architectures, calling the newly proposed non-monotonous composition method Hybrid Composition (HC) (as shown in Figure 4).


Figure 4: Hybrid Composition.
The initial results matched expectations: if IdleBlock is used monotonously to construct a network, the result is a pruned network with an acceptable loss of accuracy; if IdleBlock and MBBlock are composed hybridly, substantial computation can be saved with a much smaller accuracy loss. The results also contained an unexpected finding: by reinvesting the computation saved through IdleBlock hybrid composition into additional network depth, one obtains a new SOTA network structure at the same computational cost, without complex multi-resolution designs or neural architecture search.
Design of Idle and IdleBlock
Key Convolutional Module Design
Below is a brief overview of some key convolutional building module designs:
The goal of the Bottleneck module is to reduce the computational cost of spatial convolutions. Each module has expanded (wide) inputs and outputs with no non-linearity applied to them, and the residual connection links these expanded representations.


Figure 5: Bottleneck module.
The goal of the Inverted Residual Block (MBBlock) is to extract rich spatial information from an expanded projection. Each module has narrow inputs and outputs with no non-linearity applied to them, and the residual connection links these narrow representations.


Figure 6: Inverted Residual Block.
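As a point of reference for the IdleBlock discussion below, here is a minimal PyTorch sketch of an MBBlock in the MobileNet v2 style; the expansion ratio and normalization choices are illustrative, not the paper's exact configuration:

    import torch.nn as nn

    class MBBlock(nn.Module):
        """Minimal inverted residual block sketch (MobileNet v2 style):
        narrow input -> pointwise expand -> depthwise conv -> pointwise
        project back to a narrow output, with the residual connection
        between the narrow tensors."""
        def __init__(self, channels, expansion=6):
            super().__init__()
            hidden = channels * expansion
            self.block = nn.Sequential(
                nn.Conv2d(channels, hidden, 1, bias=False),   # expand
                nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, hidden, 3, padding=1,
                          groups=hidden, bias=False),         # depthwise
                nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, channels, 1, bias=False),   # project
                nn.BatchNorm2d(channels),                     # no non-linearity
            )

        def forward(self, x):
            return x + self.block(x)  # residual between narrow representations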
ShuffleBlock v1 extends the Bottleneck module. It reduces the computational cost of the pointwise operations on the narrow representations by grouping them, and applies a channel shuffle afterward to exchange information across groups.


Figure 7: ShuffleBlock v1.
ShuffleBlock v2 removes the grouped pointwise operations and instead uses a channel split to obtain the narrow representation. Like the Bottleneck module and ShuffleBlock v1, each block has expanded inputs and outputs.


Figure 8: ShuffleBlock v2.
Idle Design
This paper proposes a new design pattern, Idle, which passes a subspace of the input directly to the output tensor without any transformation. Figure 1 above illustrates the relationship between the Idle concept and network pruning. The authors introduce an idle factor α ∈ (0, 1), which can also be viewed as a pruning factor: given an input tensor x with C channels, the tensor is split into two branches. The active branch x_1, with C·(1−α) channels, is transformed into an output y_1 with C·(1−α) channels; the idle branch x_2, with C·α channels, is copied untouched into the output tensor y, which again has C channels.
For differences between Idle design and Residual connection, Dense connection, and ShuffleBlock v2, please refer to the original text.
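A minimal sketch of the Idle split itself, assuming the split happens along the channel dimension (the function name is ours, not the paper's):

    import torch

    def idle_split(x, alpha=0.5):
        """Split a tensor with C channels into an active branch x1 with
        C*(1-alpha) channels and an idle branch x2 with C*alpha channels.
        x2 is later copied into the output without any transformation."""
        c = x.shape[1]
        c_active = c - int(c * alpha)
        return x[:, :c_active], x[:, c_active:]

    x = torch.randn(1, 8, 32, 32)
    x1, x2 = idle_split(x, alpha=0.5)  # x1: 4 active channels, x2: 4 idle channels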
IdleBlock
First, the authors present some intuitive results and experimental lessons learned from ShuffleBlock v1/v2 and MBBlock:
  • Depthwise convolution should be applied on expanded feature maps (MobileNet v1 vs. MobileNet v2);

  • Grouped convolutions are unnecessary (ShuffleNet v1 vs ShuffleNet v2);

  • Channel shuffle operations are unfriendly to many hardware accelerators and should be avoided.

Based on these lessons, the paper proposes an Idle version of MBBlock: IdleBlock. IdleBlock has two variants: if the function used to build the output tensor from the two branches is concat(y1, x2), the block is called an L-IdleBlock (as shown in Figure 9); if it is concat(x2, y1), it is an R-IdleBlock. If an IdleBlock immediately follows an information exchange module, the L and R variants are equivalent; however, when two or more IdleBlocks are stacked, mixing L- and R-IdleBlock units behaves differently from monotonously composing a single variant.
Figure 9: L-IdleBlock.
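Combining the MBBlock sketch above with the Idle split gives a rough IdleBlock implementation. The exact placement of the residual connection and normalization follows the paper's figures, which this sketch only approximates:

    import torch
    import torch.nn as nn

    class IdleBlock(nn.Module):
        """Sketch of an Idle version of MBBlock. Only the active branch is
        transformed; the idle branch is concatenated back untouched.
        variant='L' builds concat(y1, x2); variant='R' builds concat(x2, y1)."""
        def __init__(self, channels, alpha=0.5, expansion=6, variant="L"):
            super().__init__()
            self.c_active = channels - int(channels * alpha)
            self.variant = variant
            hidden = self.c_active * expansion
            self.transform = nn.Sequential(  # MBBlock-style expand/depthwise/project
                nn.Conv2d(self.c_active, hidden, 1, bias=False),
                nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden, bias=False),
                nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
                nn.Conv2d(hidden, self.c_active, 1, bias=False),
                nn.BatchNorm2d(self.c_active),
            )

        def forward(self, x):
            x1, x2 = x[:, :self.c_active], x[:, self.c_active:]
            y1 = x1 + self.transform(x1)  # residual on the active branch (our approximation)
            parts = (y1, x2) if self.variant == "L" else (x2, y1)
            return torch.cat(parts, dim=1)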
Hybrid Composition Network
Hybrid Composition (HC) is a new, non-monotonous network composition method.
In hybrid composition, each stage of the network composes several types of building modules non-monotonously. This is only possible when the different module types satisfy the same input and output dimensional constraints.
When using IdleBlock, both IdleBlock and MBBlock satisfy the input and output constraints of hybrid composition. Moreover, once IdleBlock and MBBlock are hybridized, the first pointwise convolution in MBBlock exchanges information between the two IdleBlock branches, without the explicit channel shuffle operation used in ShuffleBlock.
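Using the two block sketches above, a hybrid stage is then just a mixed sequence of the two block types. The alternating pattern below is purely illustrative, not one of the paper's named configurations:

    import torch.nn as nn

    def hybrid_stage(channels, pattern=("M", "I", "M", "I")):
        """Compose a stage non-monotonously from MBBlock ('M') and
        IdleBlock ('I') units. The first pointwise convolution of each
        MBBlock mixes all channels, exchanging information between the
        branches of any preceding IdleBlock."""
        blocks = [MBBlock(channels) if kind == "M" else IdleBlock(channels)
                  for kind in pattern]
        return nn.Sequential(*blocks)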
However, hybrid composition raises another problem: if a network stage contains n block slots, there are 2^n candidate ways to place MBBlock and IdleBlock units within it, yet only a small portion of these candidates can realistically be explored.
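To see how quickly this space grows, a tiny enumeration suffices (illustration only):

    from itertools import product

    # Each of the n block slots in a stage can hold an MBBlock ('M') or an
    # IdleBlock ('I'), giving 2**n candidate placements per stage.
    n = 4
    candidates = list(product("MI", repeat=n))
    print(len(candidates))               # 16, i.e. 2**4
    print(candidates[0], candidates[1])  # ('M','M','M','M') ('M','M','M','I')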
To address this challenge, the researchers explored three different hybrid composition configurations of MBBlock and IdleBlock: Maximum, None, and Adjacent. For specific explanations, please refer to the original paper.
Experiments
The authors conducted experiments based on the ImageNet 2012 classification dataset, which demonstrated the effectiveness of the hybrid composition using IdleBlock.


Table 1: Results of applying different hybrid composition configurations to MobileNet v3. ★ indicates distributed training was used. The None configuration is the standard MobileNet v3. Adjacent + 1 IdleBlock L/R replaces one MBBlock with one L-IdleBlock and one R-IdleBlock. When an IdleBlock is added or substituted for an MBBlock, it uses the same SE, channel, and activation settings as the replaced MBBlock.


Table 2: Comparison of MobileNet v3 with hybrid IdleBlock composition against SOTA human expert-designed networks and NAS networks. Results of the new method are denoted HC(M=x, I=y), where M is the total number of MBBlocks and I is the total number of IdleBlocks. ★ indicates distributed training was used.


Table 3: Results of applying different hybrid composition configurations to EfficientNet-B0. As in the MobileNet v3 experiments, the SE, channel, non-linear activation, and DropConnect settings are the same as those of the replaced MBBlock.


Table 4: Comparison of EfficientNet-B0 with hybrid composition against current best methods. ★ indicates results of the new method; ◇ indicates results from GluonCV; □ indicates networks trained and tested with images at a resolution of 320 × 320.
Additionally, the authors conducted ablation experiments to further validate the effectiveness of the new method.