TensorBoard: Visualizing Training Process in TensorFlow 2.0

Written by / Li Xihan, Google Developers Expert

This article is excerpted from “Simple and Rough TensorFlow 2.0”

TensorBoard: Visualizing the Training Process

Sometimes you want to observe how various parameters change during model training (for example, the value of the loss function). While you can inspect them through command-line output, that is often not intuitive. TensorBoard is a tool that helps us visualize the training process.

First, create a folder in the code directory (e.g., ./tensorboard) to store the TensorBoard log files, and instantiate a logger in the code:

summary_writer = tf.summary.create_file_writer('./tensorboard')     # The parameter is the directory where the log files are saved
Next, whenever you need to log a parameter during training, use a with statement to select the logger, and call tf.summary.scalar(name, tensor, step=batch_index) on each parameter (usually a scalar) to record its value at the given step. The step argument can be set however you like, but it is usually the index of the current training batch. The overall framework is as follows:
summary_writer = tf.summary.create_file_writer('./tensorboard')
# Start model training
for batch_index in range(num_batches):
    # ... (training code, current batch loss value stored in variable loss)
    with summary_writer.as_default():                               # The logger you want to use
        tf.summary.scalar("loss", loss, step=batch_index)
        tf.summary.scalar("MyScalar", my_scalar, step=batch_index)  # You can also add other custom variables

Every time tf.summary.scalar() is called, the logger writes a record to the log file. Beyond simple scalars, TensorBoard can also visualize other types of data, such as images and audio; see the TensorBoard documentation for details.
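For instance, here is a minimal sketch (not from the original article) of logging a batch of images with tf.summary.image; the data below is random placeholder data, and the expected shape is a 4-D tensor of [batch, height, width, channels]:

import numpy as np
import tensorflow as tf

summary_writer = tf.summary.create_file_writer('./tensorboard')
images = np.random.random((4, 28, 28, 1)).astype('float32')  # placeholder data: [batch, height, width, channels]
with summary_writer.as_default():
    tf.summary.image("sample_images", images, step=0, max_outputs=4)  # write up to 4 images at this step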

To visualize the training process, open a terminal in the code directory (entering the TensorFlow conda environment if necessary) and run:

tensorboard --logdir=./tensorboard

Then open the URL printed by the command-line program (usually http://computer_name:6006) in a browser to reach the TensorBoard visual interface, as shown in the figure below:

[Figure: the TensorBoard visualization interface]

By default, TensorBoard updates its data every 30 seconds. You can also refresh manually by clicking the refresh button in the upper-right corner.
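If 30 seconds is too slow for your workflow, recent TensorBoard versions accept a --reload_interval flag (run tensorboard --help to confirm it is available in yours) that sets the polling interval in seconds:

tensorboard --logdir=./tensorboard --reload_interval=5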

There are a few points to note when using TensorBoard:

  • If you need to retrain, delete the records in the log folder and restart TensorBoard (or create a new log folder and start TensorBoard with --logdir pointing to the newly created folder; see the sketch after this list);

  • Keep the path of the log folder entirely in English (ASCII characters only).
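
One common pattern for the first point, offered here as a sketch rather than taken from the original article, is to create a fresh timestamped subdirectory for every run; pointing --logdir at the parent directory then shows the runs side by side:

import os
import time
import tensorflow as tf

# One log directory per run, e.g. ./tensorboard/20191001-120000
log_dir = os.path.join('tensorboard', time.strftime('%Y%m%d-%H%M%S'))
summary_writer = tf.summary.create_file_writer(log_dir)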

Finally, here is an example using the multilayer perceptron model from the previous chapter to demonstrate the use of TensorBoard:

import tensorflow as tf
from zh.model.mnist.mlp import MLP
from zh.model.utils import MNISTLoader

num_batches = 10000
batch_size = 50
learning_rate = 0.001
model = MLP()
data_loader = MNISTLoader()
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
summary_writer = tf.summary.create_file_writer('./tensorboard')     # Instantiate logger
for batch_index in range(num_batches):
    X, y = data_loader.get_batch(batch_size)
    with tf.GradientTape() as tape:
        y_pred = model(X)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
        loss = tf.reduce_mean(loss)
        print("batch %d: loss %f" % (batch_index, loss.numpy()))
        with summary_writer.as_default():                           # Specify logger
            tf.summary.scalar("loss", loss, step=batch_index)       # Write the current loss value to the logger
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))
* This code example uses the MLP (multilayer perceptron) and MNISTLoader (MNIST dataset loading) modules implemented in earlier chapters. Please visit https://github.com/snowkylin/tensorflow-handbook/tree/master/source/_static/code to obtain the source code.

Benefits | Q&A Session

We know that there are many challenges and difficulties to overcome when starting with a new technology. If you have any questions about TensorFlow, please leave a comment below this article; our engineers and GDEs will select representative questions to answer in the next issue~

In the previous article “Common Modules of TensorFlow 2.0 Part 1: Checkpoint”, we answered some representative questions as follows:

Q1: ModuleNotFoundError: No module named ‘zh’, what is this module? Thank you for your reply.

A: This part calls the code modules that have been implemented in the previously serialized articles. To obtain the source code, please visit:

  • https://github.com/snowkylin/tensorflow-handbook/tree/master/source/_static/code

Q2: Recently I wanted to migrate a TF1 ckpt to TF2, and it seems very troublesome: not only does the model need to be rewritten, but TF2’s restore cannot read TF1’s checkpoints. More importantly, loading the layers before the logit layer is very troublesome. When will loading methods like PyTorch’s be available? I really want to reshape a 2D network into 3D, and it is really troublesome.

A: You might consider using tf.compat.v1 to access the TensorFlow 1.X API and read checkpoints saved by version 1.X.
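
As an additional sketch (not part of the original answer, and with a made-up checkpoint path), tf.train.load_checkpoint can inspect a TF1-style checkpoint from TF2, after which individual tensors can be read and assigned to TF2 variables by hand:

import tensorflow as tf

# Hypothetical path to a TF1-style checkpoint
reader = tf.train.load_checkpoint('./tf1_model/model.ckpt')
# List the variables stored in the checkpoint along with their shapes
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)
# Read a single tensor by name, then assign it to a TF2 variable manually:
# weights = reader.get_tensor('dense/kernel')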

Q3: Does TF have any support and tutorials for model interpretation?

A: Model interpretation is a relatively broad topic, and perhaps some tools in TensorFlow Extended (TFX) (like TensorFlow Model Analysis) can help you. Reference:

  • https://www.tensorflow.org/tfx/model_analysis/get_started

Q4: Are there any tutorials for multi-GPU and distributed training?

A: Please refer to:

  • https://tf.wiki/zh/appendix/distributed.html

  • https://www.tensorflow.org/guide/distributed_training

Table of Contents for “Simple and Rough TensorFlow 2.0”

  • TensorFlow 2.0 Installation Guide

  • TensorFlow 2.0 Basics: Tensors, Automatic Differentiation, and Optimizers

  • TensorFlow 2.0 Models: Building Model Classes

  • TensorFlow 2.0 Models: Multilayer Perceptron

  • TensorFlow 2.0 Models: Convolutional Neural Networks

  • TensorFlow 2.0 Models: Recurrent Neural Networks

  • TensorFlow 2.0 Models: Keras Training Process and Custom Components

  • Common Modules of TensorFlow 2.0 Part 1: Checkpoint

  • Common Modules of TensorFlow 2.0 Part 2: TensorBoard (This Article)

Reply with the keyword “Manual” to get a collection of series content and FAQ.
