Implementing Neural Networks from Scratch in Python


Here is something that might surprise beginners: neural networks are not that complex! The term ‘neural network’ sounds sophisticated, but in reality the algorithms behind neural networks are simpler than most people think.

This article is entirely prepared for beginners. We will understand the principles of neural networks by implementing one from scratch using Python. The outline of this article is as follows:

  1. Introduction to the basic structure of neural networks – neurons;
  2. Using the sigmoid activation function in neurons;
  3. A neural network is just a collection of connected neurons;
  4. Constructing a dataset where the inputs (or features) are weight and height, and the output (or label) is gender;
  5. Learning about loss functions and mean squared error loss;
  6. Training the network to minimize its loss;
  7. Calculating partial derivatives using backpropagation;
  8. Training the network with stochastic gradient descent.

Building blocks: neurons

First, let’s take a look at the basic unit of a neural network, the neuron. A neuron receives inputs, performs some operations on the data, and then produces an output. For example, this is a 2-input neuron:

[Diagram: a 2-input neuron]

Three things happen here. First, each input is multiplied by a weight:

x1 → x1 * w1
x2 → x2 * w2

Then, the weighted inputs are summed and a bias b is added:

(x1 * w1) + (x2 * w2) + b

Finally, this sum is passed to an activation function f:

y = f(x1 * w1 + x2 * w2 + b)

The purpose of the activation function is to transform an unbounded input into a predictable form. A commonly used activation function is the sigmoid function:

[Graph: the sigmoid function]

The range of the sigmoid function is (0, 1). Simply put, it compresses (−∞, +∞) to (0, 1), where large negative numbers approximate to 0 and large positive numbers approximate to 1.
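
To make this squashing concrete, here is a quick numeric check; a minimal sketch, where sigmoid is the same function we implement in the next section:

import numpy as np

def sigmoid(x):
  # f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

print(sigmoid(-10)) # ~0.00005, large negative inputs approach 0
print(sigmoid(0))   # 0.5, the midpoint
print(sigmoid(10))  # ~0.99995, large positive inputs approach 1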

A simple example

Suppose we have a 2-input neuron that uses the sigmoid activation function and has the following parameters:

w = [0, 1]
b = 4

w = [0, 1] is just w1 = 0, w2 = 1 written in vector form. Now, let's give this neuron an input of x = [2, 3]. We use the dot product to write things more concisely:

(w · x) + b = (w1 * x1) + (w2 * x2) + b
            = (0 * 2) + (1 * 3) + 4
            = 7

y = f((w · x) + b) = f(7) = 0.999

When the input is [2, 3], the output of this neuron is 0.999. The process of getting an output given an input is called feedforward.

Encoding a neuron

Let’s implement a neuron! We will use Python’s NumPy library to handle the math:

import numpy as np

def sigmoid(x):
  # Our activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

class Neuron:
  def __init__(self, weights, bias):
    self.weights = weights
    self.bias = bias

  def feedforward(self, inputs):
    # Weight the inputs, add the bias, then apply the activation function
    total = np.dot(self.weights, inputs) + self.bias
    return sigmoid(total)

weights = np.array([0, 1]) # w1 = 0, w2 = 1
bias = 4                   # b = 4
n = Neuron(weights, bias)

x = np.array([2, 3])       # x1 = 2, x2 = 3
print(n.feedforward(x))    # 0.9990889488055994

Do you remember this number? It is the 0.999 we calculated earlier in the example.

Assembling neurons into a network

A neural network is a collection of neurons. Here is a simple neural network:

[Diagram: a network with 2 inputs, a hidden layer of 2 neurons (h1, h2), and an output neuron (o1)]

This network has two inputs, a hidden layer with two neurons (h1, h2), and an output layer with one neuron (o1). Note that the input to o1 is the output of h1 and h2, thus forming a network.

A hidden layer is any layer between the input (first) layer and the output (last) layer, and a network can have multiple hidden layers.

Example: Feedforward

Continuing with the network from the previous diagram, suppose every neuron has the same weights w = [0, 1], the same bias b = 0, and the same sigmoid activation function. Let h1, h2, and o1 denote the outputs of the corresponding neurons.

What results do we get when the input is x = [2, 3]?

h1 = h2 = f(w · x + b) = f((0 * 2) + (1 * 3) + 0) = f(3) = 0.9526

o1 = f(w · [h1, h2] + b) = f((0 * h1) + (1 * h2) + 0) = f(0.9526) = 0.7216

The output of this neural network for the input [2, 3] is 0.7216, and the calculation is very simple.

The number of layers in a neural network and the number of neurons in each layer can be arbitrary. The basic logic is the same: inputs are propagated forward through the neural network, ultimately yielding an output. Next, we will continue using this network.

Encoding the neural network: Feedforward

Next, we will implement the feedforward mechanism of this neural network, still using this diagram:

[Diagram: the same 2-input network as above]

import numpy as np

# ... code from previous section here

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)
  Each neuron has the same weights and bias:
    - w = [0, 1]
    - b = 0
  '''
  def __init__(self):
    weights = np.array([0, 1])
    bias = 0

    # This is the neuron class from the previous section
    self.h1 = Neuron(weights, bias)
    self.h2 = Neuron(weights, bias)
    self.o1 = Neuron(weights, bias)

  def feedforward(self, x):
    out_h1 = self.h1.feedforward(x)
    out_h2 = self.h2.feedforward(x)

    # The input to o1 is the output of h1 and h2
    out_o1 = self.o1.feedforward(np.array([out_h1, out_h2]))

    return out_o1

network = OurNeuralNetwork()
x = np.array([2, 3])
print(network.feedforward(x)) # 0.7216325609518421

We got 0.7216 again, matching our hand calculation. Everything looks fine.

Training the neural network: Part 1

Now we have the following data:

Name Weight (lbs) Height (inches) Gender
Alice 133 65 F
Bob 160 72 M
Charlie 152 70 M
Diana 120 60 F

Next, we will use this data to train the weights and biases of the neural network so that it can predict a person's gender from their weight and height.


We will represent male (M) as 0 and female (F) as 1, and we also shift the values to make them easier to work with:

Name Weight (minus 135) Height (minus 66) Gender
Alice -2 -1 1
Bob 25 6 0
Charlie 17 4 0
Diana -15 -6 1

I arbitrarily chose 135 and 66 as shift amounts to make the numbers nicer; normally you would shift by the mean.
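
As an aside, here is a minimal sketch of that usual mean-shifting approach (the variable names are just for illustration):

import numpy as np

# Raw data: [weight (lbs), height (inches)] per person
raw = np.array([
  [133, 65],  # Alice
  [160, 72],  # Bob
  [152, 70],  # Charlie
  [120, 60],  # Diana
])

# Shift each column by its mean so the features are centered around 0
shifted = raw - raw.mean(axis=0)
print(shifted)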

Loss

Before training the network, we need to quantify whether the current network is ‘good’ or ‘bad’, so we can look for a better network. This is the purpose of defining loss.

Here we use the mean squared error (MSE) loss:

MSE = (1/n) * Σ (y_true - y_pred)^2, summing over all n samples

Let's take a closer look:

  • n is the number of samples, which is 4 here (Alice, Bob, Charlie, and Diana).
  • y is the variable to predict, which is gender here.
  • y_true is the true value of the variable (‘correct answer’). For example, Alice’s y_true is 1 (female).
  • y_pred is the predicted value of the variable. This is the output of our network.

(y_true - y_pred)^2 is called the squared error. Our loss function is the average of all the squared errors (hence the name mean squared error). The better the predictions, the lower the loss.

Better prediction = less loss!

Training the network = minimizing its loss.

Example of loss calculation

Suppose our network always outputs 0, in other words, thinks everyone is male. How does the loss look?

Name y_true y_pred (y_true - y_pred)^2
Alice 1 0 1
Bob 0 0 0
Charlie 0 0 0
Diana 1 0 1

MSE = (1/4) * (1 + 0 + 0 + 1) = 0.5

Code: MSE loss

Here is the code to calculate the MSE loss:

import numpy as np

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

y_true = np.array([1, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0])

print(mse_loss(y_true, y_pred)) # 0.5

If you do not understand this code, you can take a look at the quick start guide on NumPy regarding array operations.

Alright, let’s continue.

Training the neural network: Part 2

Now we have a clear goal: to minimize the loss of the neural network. By adjusting the weights and biases of the network, we can change its predictions, but how can we gradually reduce the loss?

This section involves multivariable calculus; if you are not familiar with calculus, you may skip the mathematical content.

To simplify the problem, let’s assume our dataset only contains Alice:


Name Weight (minus 135) Height (minus 66) Gender
Alice -2 -1 1

Then the mean squared error loss is just Alice's squared error:

L = (1/1) * Σ (y_true - y_pred)^2 = (1 - y_pred)^2

We can also see the loss as a function of the weights and biases. Let’s label the weights and biases in the network:

[Diagram: the network with its weights (w1 ... w6) and biases (b1, b2, b3) labeled]

Thus, we can express the network's loss as a function of the weights and biases:

L(w1, w2, w3, w4, w5, w6, b1, b2, b3)

Suppose we want to optimize w1. How will the loss L change when we change w1? The partial derivative ∂L/∂w1 can answer this question. How do we calculate it?

The math that follows is somewhat involved; do not worry, prepare paper and pen and follow along.

First, let's rewrite this partial derivative using the chain rule:

∂L/∂w1 = (∂L/∂y_pred) * (∂y_pred/∂w1)

Since we already know L = (1 - y_pred)^2, we can calculate ∂L/∂y_pred:

∂L/∂y_pred = -2 * (1 - y_pred)

Now let's tackle ∂y_pred/∂w1. As before, h1, h2, and o1 are the outputs of the respective neurons, and we have:

y_pred = o1 = f(w5 * h1 + w6 * h2 + b3)

Since w1 only affects h1 (not h2), we can apply the chain rule again:

∂y_pred/∂w1 = (∂y_pred/∂h1) * (∂h1/∂w1)
∂y_pred/∂h1 = w5 * f'(w5 * h1 + w6 * h2 + b3)

For ∂h1/∂w1, we can do the same:

h1 = f(w1 * x1 + w2 * x2 + b1)
∂h1/∂w1 = x1 * f'(w1 * x1 + w2 * x2 + b1)

Here, x1 is weight and x2 is height. f'(x) is the derivative of the sigmoid function, which has a tidy closed form:

f(x) = 1 / (1 + e^(-x))
f'(x) = e^(-x) / (1 + e^(-x))^2 = f(x) * (1 - f(x))

We will use this form of the derivative later.
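
If you want to sanity-check this identity numerically, here is a minimal sketch comparing the analytic derivative against a finite-difference approximation (deriv_sigmoid matches the helper used in the full network code below):

import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  # f'(x) = f(x) * (1 - f(x))
  fx = sigmoid(x)
  return fx * (1 - fx)

x, eps = 0.5, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(deriv_sigmoid(x), numeric)  # both ~0.2350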

We have now decomposed ∂L/∂w1 into several computable parts:

∂L/∂w1 = (∂L/∂y_pred) * (∂y_pred/∂h1) * (∂h1/∂w1)

This method of calculating partial derivatives by working backwards from the loss is called the ‘backpropagation algorithm’.

That's a lot of mathematical symbols; if it hasn't clicked yet, let's look at a practical example.

Example: Calculating partial derivatives

We are still looking at the case where the dataset only contains Alice:

Name Weight (minus 135) Height (minus 66) Gender
Alice -2 -1 1

Initialize all the weights to 1 and all the biases to 0. Performing a feedforward pass through the network:

h1 = f(w1 * x1 + w2 * x2 + b1) = f(-2 + -1 + 0) = f(-3) = 0.0474
h2 = f(w3 * x1 + w4 * x2 + b2) = f(-3) = 0.0474
o1 = f(w5 * h1 + w6 * h2 + b3) = f(0.0474 + 0.0474 + 0) = f(0.0948) = 0.524

The output of the network is y_pred = 0.524, which does not strongly favor Male (0) or Female (1) yet. Let's calculate ∂L/∂w1:

∂L/∂w1 = (∂L/∂y_pred) * (∂y_pred/∂h1) * (∂h1/∂w1)

∂L/∂y_pred = -2 * (1 - y_pred) = -2 * (1 - 0.524) = -0.952

∂y_pred/∂h1 = w5 * f'(w5 * h1 + w6 * h2 + b3) = 1 * f'(0.0948) = 0.249

∂h1/∂w1 = x1 * f'(w1 * x1 + w2 * x2 + b1) = -2 * f'(-3) = -0.0904

Hint: we already obtained f'(x) = f(x) * (1 - f(x)) for the sigmoid activation function.

∂L/∂w1 = -0.952 * 0.249 * (-0.0904) = 0.0214

Done! This result means that increasing w1 will slightly increase the loss L.
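
As a cross-check, here is a minimal sketch that reproduces this number in code, using the same names the full network below will use:

import numpy as np

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  fx = sigmoid(x)
  return fx * (1 - fx)

# Alice: x1 = weight - 135 = -2, x2 = height - 66 = -1, y_true = 1
x1, x2, y_true = -2, -1, 1
w1 = w2 = w3 = w4 = w5 = w6 = 1  # all weights initialized to 1
b1 = b2 = b3 = 0                 # all biases initialized to 0

sum_h1 = w1 * x1 + w2 * x2 + b1
h1 = sigmoid(sum_h1)
sum_h2 = w3 * x1 + w4 * x2 + b2
h2 = sigmoid(sum_h2)
sum_o1 = w5 * h1 + w6 * h2 + b3
y_pred = sigmoid(sum_o1)

d_L_d_ypred = -2 * (y_true - y_pred)
d_ypred_d_h1 = w5 * deriv_sigmoid(sum_o1)
d_h1_d_w1 = x1 * deriv_sigmoid(sum_h1)
print(d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1)  # ~0.021, matching the hand calculation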

Training: Stochastic Gradient Descent

Now everything is ready to train the neural network! We will use an optimization algorithm called stochastic gradient descent (SGD) to optimize the weights and biases of the network and minimize the loss. The core is this update equation:

w1 ← w1 - α * ∂L/∂w1

α is a constant known as the learning rate, used to adjust the speed of training. All we are doing is subtracting α * ∂L/∂w1 from w1:

  • If ∂L/∂w1 is a positive number, w1 decreases, which makes L decrease.
  • If ∂L/∂w1 is a negative number, w1 increases, which also makes L decrease.

If we optimize each weight and bias in the network this way, the loss will continuously decrease, and the network’s performance will continuously improve.
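
To see the update rule in action on something tiny, here is a minimal sketch that minimizes the one-variable loss L(w) = (w - 3)^2, a function made up purely for illustration, by repeatedly applying the same update:

# Minimize L(w) = (w - 3)^2, whose derivative is dL/dw = 2 * (w - 3)
w = 0.0
learn_rate = 0.1
for step in range(50):
  d_L_d_w = 2 * (w - 3)
  w -= learn_rate * d_L_d_w  # the SGD update equation
print(w)  # ~3.0, the minimum of L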

Our training process is as follows:

  1. Select one sample from our dataset: operating on only one sample at a time is what makes the gradient descent stochastic;
  2. Calculate the partial derivative of the loss with respect to each weight and bias (e.g., ∂L/∂w1, ∂L/∂b1, etc.);
  3. Update each weight and bias using the update equation;
  4. Go back to step 1 and repeat.

Code: A Complete Neural Network

We can finally implement a complete neural network:

Name Weight (minus 135) Height (minus 66) Gender
Alice -2 -1 1
Bob 25 6 0
Charlie 17 4 0
Diana -15 -6 1
import numpy as np

def sigmoid(x):
  # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
  return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
  # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
  fx = sigmoid(x)
  return fx * (1 - fx)

def mse_loss(y_true, y_pred):
  # y_true and y_pred are numpy arrays of the same length.
  return ((y_true - y_pred) ** 2).mean()

class OurNeuralNetwork:
  '''
  A neural network with:
    - 2 inputs
    - a hidden layer with 2 neurons (h1, h2)
    - an output layer with 1 neuron (o1)

  *** Disclaimer ***:
    The code below is intended to be simple and educational, not optimal.
    Real neural network code looks nothing like this. Do not use this code in practice.
    Instead, read/run it to understand how this specific network works.
  '''
  def __init__(self):
    # Weights
    self.w1 = np.random.normal()
    self.w2 = np.random.normal()
    self.w3 = np.random.normal()
    self.w4 = np.random.normal()
    self.w5 = np.random.normal()
    self.w6 = np.random.normal()

    # Biases
    self.b1 = np.random.normal()
    self.b2 = np.random.normal()
    self.b3 = np.random.normal()

  def feedforward(self, x):
    # x is a numpy array with 2 elements.
    h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
    h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
    o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
    return o1

  def train(self, data, all_y_trues):
    '''
    - data is a (n x 2) numpy array, n = # of samples in the dataset.
    - all_y_trues is a numpy array with n elements.
      Elements in all_y_trues correspond to those in data.
    '''
    learn_rate = 0.1
    epochs = 1000 # Number of times to go through the entire dataset

    for epoch in range(epochs):
      for x, y_true in zip(data, all_y_trues):
        # --- Do a feedforward (we will need these values later)
        sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
        h1 = sigmoid(sum_h1)

        sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
        h2 = sigmoid(sum_h2)

        sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
        o1 = sigmoid(sum_o1)
        y_pred = o1

        # --- Calculate partial derivatives.
        # --- Naming: d_L_d_w1 represents "partial L / partial w1"
        d_L_d_ypred = -2 * (y_true - y_pred)

        # Neuron o1
        d_ypred_d_w5 = h1 * deriv_sigmoid(sum_o1)
        d_ypred_d_w6 = h2 * deriv_sigmoid(sum_o1)
        d_ypred_d_b3 = deriv_sigmoid(sum_o1)

        d_ypred_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
        d_ypred_d_h2 = self.w6 * deriv_sigmoid(sum_o1)

        # Neuron h1
        d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
        d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
        d_h1_d_b1 = deriv_sigmoid(sum_h1)

        # Neuron h2
        d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
        d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
        d_h2_d_b2 = deriv_sigmoid(sum_h2)

        # --- Update weights and biases
        # Neuron h1
        self.w1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
        self.w2 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
        self.b1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1

        # Neuron h2
        self.w3 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
        self.w4 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
        self.b2 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2

        # Neuron o1
        self.w5 -= learn_rate * d_L_d_ypred * d_ypred_d_w5
        self.w6 -= learn_rate * d_L_d_ypred * d_ypred_d_w6
        self.b3 -= learn_rate * d_L_d_ypred * d_ypred_d_b3

      # --- Calculate total loss every 10 epochs
      if epoch % 10 == 0:
        y_preds = np.apply_along_axis(self.feedforward, 1, data)
        loss = mse_loss(all_y_trues, y_preds)
        print("Epoch %d loss: %.3f" % (epoch, loss))

# Define dataset
data = np.array([
  [-2, -1],  # Alice
  [25, 6],   # Bob
  [17, 4],   # Charlie
  [-15, -6], # Diana
])
all_y_trues = np.array([
  1, # Alice
  0, # Bob
  0, # Charlie
  1, # Diana
])

# Train our neural network!
network = OurNeuralNetwork()
network.train(data, all_y_trues)

As the network learns, the loss steadily decreases.


Now we can use this network to predict gender:

# Make some predictions
emily = np.array([-7, -3]) # 128 lbs, 63 inches
frank = np.array([20, 2])  # 155 lbs, 68 inches
print("Emily: %.3f" % network.feedforward(emily)) # 0.951 - F
print("Frank: %.3f" % network.feedforward(frank)) # 0.039 - M

What’s next?

We have now built a complete, simple neural network. Let's quickly review what we did:

  • Introduced the basic structure of neural networks – neurons;
  • Used the sigmoid activation function in neurons;
  • Explained that a neural network is a collection of connected neurons;
  • Constructed a dataset with inputs (or features) as weight and height, and outputs (or labels) as gender;
  • Learned about loss functions and mean squared error loss;
  • Trained the network to minimize its loss;
  • Calculated partial derivatives using backpropagation;
  • Trained the network using stochastic gradient descent;

Next, you can:

  • Implement larger and better neural networks using machine learning libraries like TensorFlow, Keras, and PyTorch;
  • Explore other types of activation functions;
  • Explore other types of optimizers;
  • Learn about convolutional neural networks, which revolutionized the field of computer vision;
  • Learn about recurrent neural networks, commonly used for natural language processing;

Author: Victor Zhou
Original link: https://victorzhou.com/blog/intro-to-neural-networks/
