How To Determine The Number Of Layers And Neurons In A Neural Network?

Author: Yu Yu Lu Ming | Editor: Peter

Hello everyone, I am Peter~

There are many doubts about the number of hidden layers and neurons in neural networks. I happened to see an article that answered these questions well, and I’m sharing it with you~

https://zhuanlan.zhihu.com/p/100419971

1. Introduction

BP (backpropagation) neural networks consist of an input layer, one or more hidden layers, and an output layer. The number of nodes in the input and output layers is fixed by the problem.

Whether for regression or classification tasks, choosing the right number of layers and the number of nodes in the hidden layers will greatly affect the performance of the neural network.

Image Source: Andrew Ng – Deep Learning

The number of nodes in the input and output layers is easy to determine:

  • The number of neurons in the input layer: equal to the number of input variables in the data to be processed
  • The number of neurons in the output layer: equal to the number of outputs associated with each input

The difficulty lies in determining the appropriate number of hidden layers and their neurons.
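
As a concrete illustration (the tiny dataset below is made up), the input and output sizes follow directly from the shape of the data:

```python
# Hypothetical toy dataset: each sample has 4 features, labels cover 3 classes.
X = [
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3],
    [7.7, 3.0, 6.1, 2.3],
]
y = [0, 1, 2]

# Input layer: one neuron per input variable (feature).
n_input = len(X[0])

# Output layer: for multi-class classification, one neuron per class.
n_output = len(set(y))

print(n_input, n_output)  # 4 3
```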

2. Number of Hidden Layers

Determining the number of hidden layers is a crucial question. First, one needs to note:

In neural networks, hidden layers are needed only when the data is not linearly separable!

Since a single sufficiently large hidden layer is adequate for approximation of most functions, why would anyone ever use more? One reason hangs on the words “sufficiently large”. Although a single hidden layer is optimal for some functions, there are others for which a single-hidden-layer solution is very inefficient compared to solutions with more layers. (Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999)

A network with no hidden layer can represent only linearly separable functions, that is, very simple problems such as two classes that a straight line can separate cleanly. For general simple datasets, one or two hidden layers are therefore usually sufficient; for complex data such as time series or computer vision, additional layers are required.

Specifically, the universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer with any “squashing” activation function (such as the logistic sigmoid activation function) can approximate any Borel measurable function from one finite-dimensional space to another with any desired non-zero amount of error, provided that the network is given enough hidden units. (Deep Learning, 2016)

In summary, multiple hidden layers can be used to fit non-linear functions.
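
To make this concrete, here is a minimal sketch (weights chosen by hand, not learned) of a single-hidden-layer network with step activations that computes XOR, a non-linearly-separable function that no network without a hidden layer can represent:

```python
def step(x):
    """Threshold ("squashing") activation: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: two neurons with hand-picked weights and biases.
    h1 = step(x1 + x2 - 0.5)   # fires if at least one input is 1
    h2 = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output layer: "at least one, but not both".
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```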

The relationship between the number of hidden layers and the capability of the neural network can be summarized as follows:

  • No hidden layers: can only represent linearly separable functions or decisions
  • Number of hidden layers = 1: can fit any function that “contains continuous mappings from one finite space to another”
  • Number of hidden layers = 2: with appropriate activation functions, can represent decision boundaries of arbitrary precision and can fit any smooth mapping with any precision
  • Number of hidden layers > 2: additional hidden layers can learn complex descriptions (some form of automatic feature engineering)

Empirically, greater depth does seem to result in better generalization for a wide variety of tasks. This suggests that using deep architectures does indeed express a useful prior over the space of functions the model learns. (Deep Learning, 2016)

In theory, deeper networks have greater capacity to fit functions, so performance should improve with depth. In practice, however, deeper networks are more prone to overfitting and are harder to train, making it difficult for the model to converge.

Therefore, my experience is that when using BP neural networks, it is best to refer to existing high-performing models. If there are none, start with one or two layers based on the table above and try not to use too many layers.

In specialized fields such as CV and NLP, purpose-built architectures such as CNNs, RNNs, and attention-based models should be used; one should not blindly stack fully connected layers without considering the practicalities. Trying transfer learning and fine-tuning existing pre-trained models can yield better results with less effort.

Image Source: beginners-ask-how-many-hidden-layers-neurons-to-use-in-artificial-neural-networks

Determining the number of hidden layers is only part of the problem; it is also necessary to decide how many neurons each hidden layer should contain. The following section introduces this process.

3. Number of Neurons in Hidden Layers

Using too few neurons in the hidden layers will lead to underfitting.

Conversely, using too many neurons will also cause some problems. First, having too many neurons in the hidden layers may lead to overfitting.

When a neural network has too many nodes (too much information processing capability), the limited amount of information in the training set is insufficient to train all the neurons in the hidden layers, thus leading to overfitting.

Even if the information in the training data is sufficient, having too many neurons in the hidden layers will increase training time, making it difficult to achieve the desired results. Clearly, choosing an appropriate number of hidden layer neurons is crucial.

Image Source: Andrew Ng – Deep Learning

Generally, using the same number of neurons in every hidden layer is sufficient. For certain datasets, a larger first layer followed by smaller layers performs better, because the first layer can learn many low-level features, which later layers can combine into higher-level features.

Note that adding another layer typically yields a larger performance improvement than adding more neurons to an existing layer, so avoid piling too many neurons into a single hidden layer.
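
One hypothetical way to realize the "larger first layer, then smaller layers" pattern is to taper widths geometrically; `tapered_widths` below is an illustrative helper, not a standard API:

```python
def tapered_widths(first, n_layers, shrink=0.5):
    """Return hidden-layer widths that shrink geometrically, so the
    first (widest) layer can learn many low-level features and later,
    narrower layers combine them into higher-level features."""
    return [max(1, round(first * shrink ** i)) for i in range(n_layers)]

print(tapered_widths(64, 3))  # [64, 32, 16]
```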

There are many heuristics for determining the number of neurons.

On Stack Overflow, practitioners have provided empirical formulas for reference:

[Image: empirical formula for hidden-layer size from Stack Overflow]
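
A widely cited heuristic from that Stack Overflow discussion (quoted here from memory, so treat the exact form as an assumption) upper-bounds the hidden-neuron count as N_h = N_s / (alpha * (N_i + N_o)), where N_s is the number of training samples, N_i and N_o are the input and output sizes, and alpha is a scaling factor usually taken between 2 and 10:

```python
def max_hidden_neurons(n_samples, n_in, n_out, alpha=2):
    """Heuristic upper bound on total hidden neurons to avoid overfitting:
    N_h = N_s / (alpha * (N_i + N_o)), with alpha typically in [2, 10]."""
    return n_samples / (alpha * (n_in + n_out))

# e.g. 1000 training samples, 10 inputs, 1 output, alpha = 2
print(max_hidden_neurons(1000, 10, 1))  # about 45.45
```

A larger alpha gives a more conservative (smaller) budget of hidden neurons for the same amount of training data.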

Another reference point is a set of commonly cited rule-of-thumb principles:

  • The number of hidden neurons should be between the size of the input layer and the size of the output layer.
  • The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
  • The number of hidden neurons should be less than twice the size of the input layer.
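
The three principles above can be turned into a quick sanity check. The helper below is illustrative only; for the second principle it uses the common Heaton formulation (two-thirds of the input size plus the output size):

```python
def hidden_size_guidelines(n_in, n_out):
    """Apply three rule-of-thumb principles for sizing one hidden layer."""
    lo, hi = sorted((n_out, n_in))
    return {
        "between_in_and_out": (lo, hi),                 # principle 1
        "two_thirds_rule": round(2 * n_in / 3) + n_out, # principle 2
        "upper_bound": 2 * n_in,                        # principle 3 (stay below this)
    }

print(hidden_size_guidelines(n_in=10, n_out=2))
```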

In summary, the optimal numbers of hidden layers and neurons must be determined through experimentation. Start small, for example 1 to 5 layers and 1 to 100 neurons per layer; if the model underfits, gradually add layers and neurons, and if it overfits, reduce them.

Additionally, practical methods such as Batch Normalization, Dropout, and Regularization can be considered to reduce overfitting.
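
As one example of these techniques, here is a minimal sketch of inverted dropout (the variant used by modern frameworks); this is an illustration under simplifying assumptions, not a framework implementation:

```python
import random

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each activation with probability p_drop and
    scale survivors by 1/(1 - p_drop), so the expected value of each unit
    is unchanged. Applied only during training, never at inference."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded for reproducibility
print(dropout([1.0, 2.0, 3.0, 4.0], 0.5, rng))
```

Because surviving activations are rescaled, the network can be used at test time without any extra correction.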

4. References

  1. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep learning.” Nature 521.7553 (2015): 436–444.
  2. Heaton Research: The Number of Hidden Layers
  3. Ahmed Gad, Beginners Ask “How Many Hidden Layers/Neurons to Use in Artificial Neural Networks?”
  4. Jason Brownlee, How to Configure the Number of Layers and Nodes in a Neural Network
  5. Lavanya Shukla, Designing Your Neural Networks

This article is reprinted from the WeChat public account “Turing Artificial Intelligence”.

(End)
