Creating Custom Loss Functions Using TensorFlow 2

Author: Arjun Sarkar

Translator: Chen Zhiyan

Proofreader: Ouyang Jin

This article is about 1,900 words and is recommended as an 8-minute read.
This article will teach you how to write custom loss functions using wrapper functions and OOP in Python.
Tags: TensorFlow 2, Loss Function

Figure 1: Gradient Descent Algorithm (Source: Public Domain, https://commons.wikimedia.org/w/index.php?curid=521422)
Neural networks use training data to map a set of inputs to a set of outputs, achieving this through some form of optimization algorithm, such as gradient descent, stochastic gradient descent, AdaGrad, AdaDelta, etc., with the latest algorithms including Adam, Nadam, or RMSProp. The “gradient” in gradient descent refers to the error gradient. After each iteration, the network compares its predicted output with the actual output and then calculates the “error”.
Typically, for neural networks, the goal is to minimize this error. The objective function that is minimized is commonly referred to as the cost function or loss function, and the value the “loss function” calculates is referred to as the “loss”. Typical loss functions used in various problems include:
  • Mean Squared Error;

  • Mean Squared Logarithmic Error;

  • Binary Crossentropy;

  • Categorical Crossentropy;

  • Sparse Categorical Crossentropy.

TensorFlow already includes the above loss functions, which can be called directly, as shown below:
1. Call the loss function as a string
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
2. Call the loss function as an object
from tensorflow.keras.losses import mean_squared_error
model.compile(loss=mean_squared_error, optimizer='sgd')
The advantage of calling the loss function as an object is that parameters, such as thresholds, can be passed to it. Note that a function-form loss such as mean_squared_error accepts only y_true and y_pred, so parameters must be passed through the corresponding loss class instead, for example:
from tensorflow.keras.losses import Huber
model.compile(loss=Huber(delta=1.0), optimizer='sgd')

Creating Custom Loss Functions Using Existing Functions:

To create a loss function using existing functions, first define the loss function itself. It accepts two parameters: y_true (true labels/outputs) and y_pred (predicted labels/outputs).
def loss_function(y_true, y_pred):
    # ... some calculation producing the loss ...
    return loss

Creating a Root Mean Squared Error (RMSE) Loss Function:

Define the loss function name – my_rmse. The goal is to return the root mean squared error between the target (y_true) and the prediction (y_pred).
The formula for RMSE is:

RMSE = √( mean( (y_true − y_pred)² ) )

  • Error: The difference between the true labels and predicted labels.

  • sqr_error: The square of the error.

  • mean_sqr_error: The mean of the squared errors.

  • sqrt_mean_sqr_error: The square root of the mean squared error (Root Mean Square Error).

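A minimal sketch of this loss, following the step names in the list above (the original code image is not reproduced here; Keras backend ops are one reasonable choice):
from tensorflow.keras import backend as K

def my_rmse(y_true, y_pred):
    error = y_true - y_pred                       # difference between true and predicted labels
    sqr_error = K.square(error)                   # square of the error
    mean_sqr_error = K.mean(sqr_error)            # mean of the squared errors
    sqrt_mean_sqr_error = K.sqrt(mean_sqr_error)  # root mean square error
    return sqrt_mean_sqr_error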

Creating a Huber Loss Function:

Figure 2: Huber Loss Function (green) and Squared Error Loss Function (blue) (Source: Qwertyus, own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=34836380)
The Huber loss is calculated as:

L_δ(a) = (1/2) a²  when |a| ≤ δ
L_δ(a) = δ (|a| − (1/2) δ)  when |a| > δ

Here, δ is the threshold, and a is the error (which will be calculated as the difference between the actual labels and predicted labels).
When |a| ≤ δ, loss = 1/2 * (a)²
When |a| > δ, loss = δ (|a| – (1/2) * δ)
Source Code:

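A minimal sketch matching the walkthrough that follows (the original code image is not reproduced here):
import tensorflow as tf

def my_huber_loss(y_true, y_pred):
    threshold = 1                     # δ, hard-coded for now
    error = y_true - y_pred           # a = y_true − y_pred
    is_small_error = tf.abs(error) <= threshold
    small_error_loss = tf.square(error) / 2
    big_error_loss = threshold * (tf.abs(error) - (1 / 2) * threshold)
    # element-wise choice between the two branches
    return tf.where(is_small_error, small_error_loss, big_error_loss)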

Detailed Explanation:
First, define a function, my_huber_loss, that takes two parameters: y_true and y_pred.
Set the threshold: threshold = 1.
Calculate the error: error = y_true − y_pred. Next, check whether the absolute value of the error is less than or equal to the threshold; is_small_error holds the resulting boolean (true or false).
When |a| ≤ δ, loss = 1/2 * a², so compute small_error_loss as the square of the error divided by 2. Otherwise, when |a| > δ, the loss equals δ (|a| − (1/2) δ), computed as big_error_loss.
Finally, the return statement checks whether is_small_error is true or false: where it is true the function returns small_error_loss, and otherwise big_error_loss, implemented with tf.where.
You can use the following code to compile the model:

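A minimal example, assuming model is an already-built Keras model:
model.compile(optimizer='sgd', loss=my_huber_loss)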

In the code above, the threshold is hard-coded to 1.
If you need to tune the threshold hyperparameter and pass a new threshold at compile time, you must use a wrapper function, i.e. wrap the loss function inside another, outer function. The wrapper is needed because, by default, a loss function accepts only the y_true and y_pred values, and no other parameters can be added to its signature.

Using the Wrapped Huber Loss Function

Source code for the wrapper function:

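A sketch of the wrapper, reusing the Huber logic above; the outer name my_huber_loss_with_threshold is illustrative:
import tensorflow as tf

def my_huber_loss_with_threshold(threshold):
    # the outer function receives the hyperparameter
    def my_huber_loss(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) <= threshold
        small_error_loss = tf.square(error) / 2
        big_error_loss = threshold * (tf.abs(error) - (1 / 2) * threshold)
        return tf.where(is_small_error, small_error_loss, big_error_loss)
    # the inner function closes over threshold and is returned to Keras
    return my_huber_loss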

At this point, the threshold is not hard-coded and can be passed during model compilation.

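For example (the threshold value 1.2 is illustrative):
model.compile(optimizer='sgd', loss=my_huber_loss_with_threshold(threshold=1.2))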

Implementing the Huber Loss Function Using a Class (OOP)

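A sketch of the class described below:
import tensorflow as tf
from tensorflow.keras.losses import Loss

class MyHuberLoss(Loss):
    # __init__ runs when an instance is created and stores the threshold
    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold

    # call() receives y_true and y_pred, like a plain loss function
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) <= self.threshold
        small_error_loss = tf.square(error) / 2
        big_error_loss = self.threshold * (tf.abs(error) - (1 / 2) * self.threshold)
        return tf.where(is_small_error, small_error_loss, big_error_loss)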

Here, MyHuberLoss is the class name; it inherits from the parent class Loss in tensorflow.keras.losses. Because it inherits from Loss, MyHuberLoss can be used directly as a loss function.
__init__ initializes an instance of the class. It is called when the instance is created; it receives the threshold and stores it as an instance variable, which can be given a default initial value. The call method receives the y_true and y_pred parameters.
In the __init__ function, the threshold is assigned to self.threshold; inside call, self.threshold reads back that instance variable. Use this loss function in model.compile:

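For example (again with an illustrative threshold value):
model.compile(optimizer='sgd', loss=MyHuberLoss(threshold=1.2))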

Creating Contrastive Loss (for Siamese Networks):

Loss = Y_true * D² + (1 − Y_true) * max(margin − D, 0)²

Siamese networks can be used to compare whether two images are similar, and the loss function used by Siamese networks is the contrastive loss.
In the equation above, Y_true is a tensor of image-similarity labels: 1 if the two images are similar, and 0 if they are not.
D is the tensor of Euclidean distances between image pairs, and margin is a constant that sets the minimum distance at which two images are considered different. If Y_true = 1, the first term of the equation is D² and the second term is 0, so as Y_true approaches 1, the D² term carries more weight.
If Y_true = 0, the first term becomes 0 and the second term takes effect, giving more weight to the max term and less to the D² term, so the max term dominates the loss calculation.

Implementing the Contrastive Loss Function Using a Wrapper Function:

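A sketch following the same wrapper pattern as the Huber loss above; the name contrastive_loss_with_margin is illustrative, and y_pred is assumed to be the Euclidean distance D produced by the network:
from tensorflow.keras import backend as K

def contrastive_loss_with_margin(margin):
    def contrastive_loss(y_true, y_pred):
        # y_pred plays the role of D, the distance between the pair
        square_pred = K.square(y_pred)                           # D² term, weighted by Y_true
        margin_square = K.square(K.maximum(margin - y_pred, 0))  # max(margin − D, 0)² term
        return K.mean(y_true * square_pred + (1 - y_true) * margin_square)
    return contrastive_loss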

Conclusion

Loss functions that are not built into TensorFlow can be created as plain functions, as wrapper functions, or as classes that inherit from the Loss class.
Original Title: Creating custom Loss functions using TensorFlow 2
Original Link: https://towardsdatascience.com/creating-custom-loss-functions-using-tensorflow-2-96c123d5ce6c
Editor: Huang Jiyan
Proofreader: Lin Yilin

Translator’s Profile


Chen Zhiyan graduated from Beijing Jiaotong University with a Master’s degree in Communication and Control Engineering, previously worked as an engineer at Great Wall Computer Software and Systems Company and at Datang Microelectronics Company, and currently provides technical support at Beijing Wuyi Super Translation Technology Co., Ltd., engaged in the operation and maintenance of an intelligent translation teaching system, with hands-on experience in deep learning and natural language processing (NLP). Enjoys translation and writing in spare time; translations include IEC-ISO 7816, the Iraqi Oil Engineering Project, and the New Fiscal Taxism Declaration, whose English translation was officially published in GLOBAL TIMES. Hopes to join the THU Data Pie platform’s translation volunteer group to exchange ideas, share with everyone, and make progress together.
