In the previous article, I introduced how to perform data parallelism with multiple GPUs in TensorFlow. In this article, I will discuss how to debug TensorFlow models.
Compared to regular Python code, debugging TensorFlow code can be relatively difficult due to the symbolic nature of TensorFlow. Here, I will introduce some debugging tools included in TensorFlow that make debugging easier.
Probably the most common error when using TensorFlow is passing tensors of the wrong shape to an operation. Many TensorFlow operations accept tensors of different ranks and shapes, which is convenient when using the API but makes shape mistakes hard to track down, because the graph may still build and run without complaint.
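As a small illustration of how a shape mistake can pass silently, the sketch below subtracts a vector from a column matrix. The names `labels` and `predictions` are made up for this example, and it assumes the TensorFlow 1.x graph API is reached through `tensorflow.compat.v1` so that it also runs on a TensorFlow 2.x install.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via the compat module

tf.disable_eager_execution()  # build a graph, as in TensorFlow 1.x

labels = tf.zeros([3])          # hypothetical targets, shape [3]
predictions = tf.zeros([3, 1])  # hypothetical model output, shape [3, 1]

# Intended: an element-wise difference with 3 entries.
# Actual: broadcasting expands both operands, giving a [3, 3] tensor,
# and no error is raised.
diff = predictions - labels
diff_shape = diff.shape.as_list()
print(diff_shape)  # [3, 3]
```

Nothing fails here, so a loss computed from `diff` would be averaged over nine values instead of three without any warning.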
1. Use tf.assert* ops
One way to catch such errors early is to use the tf.assert* ops to verify the rank and shape of intermediate tensors explicitly.
The official documentation has the complete list of assertion operations: https://www.TensorFlow.org/api_guides/python/check_ops
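The following is a minimal sketch of how assertion ops might guard a matrix multiplication. It assumes the TensorFlow 1.x graph API accessed through `tensorflow.compat.v1`; the tensors `a` and `b` and the error messages are invented for illustration.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via the compat module

tf.disable_eager_execution()  # use graph mode, as the article does

a = tf.placeholder(tf.float32, shape=None)  # shape only known at run time
b = tf.ones([3, 4])

# Assertions are ordinary ops: they run only if something depends on them,
# hence the control dependencies on the ops that consume the checked tensor.
check_rank = tf.assert_rank(a, 2, message="a must be a matrix")
with tf.control_dependencies([check_rank]):
    check_inner = tf.assert_equal(
        tf.shape(a)[1], tf.shape(b)[0],
        message="inner dimensions of a and b must match")
with tf.control_dependencies([check_inner]):
    c = tf.matmul(a, b)

with tf.Session() as sess:
    # A [2, 3] input passes both checks.
    ok = sess.run(c, feed_dict={a: np.ones([2, 3], dtype=np.float32)})
    print(ok.shape)

    # A [2, 5] input violates the inner-dimension check.
    try:
        sess.run(c, feed_dict={a: np.ones([2, 5], dtype=np.float32)})
        failed = False
    except tf.errors.InvalidArgumentError:
        failed = True
    print(failed)
```

Placing the consuming op inside `tf.control_dependencies` is what makes the checks run. An assertion op that nothing depends on is never evaluated.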
2. Use tf.Print
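Another useful built-in debugging function is tf.Print, which prints the given tensors to standard error. It returns its input tensor unchanged, and the printing happens each time the returned tensor is evaluated.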
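A short sketch of how tf.Print might be wired into a graph, again assuming the TensorFlow 1.x API through `tensorflow.compat.v1`. The tensor `x` and the message text are arbitrary.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via the compat module

tf.disable_eager_execution()

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# tf.Print acts as an identity op with a side effect: it returns its first
# argument and logs the listed tensors when the returned tensor is evaluated.
# Downstream ops must use the returned tensor, otherwise nothing is printed.
x = tf.Print(x, [tf.shape(x), x], message="x shape and values: ", summarize=10)
y = tf.reduce_sum(x)

with tf.Session() as sess:
    total = sess.run(y)  # the log line goes to standard error
    print(total)
```

Reassigning the result to `x` keeps the rest of the graph unchanged while routing it through the printing op.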
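3. Use tf.test.compute_gradient_error
Not all operations in TensorFlow have gradients, and it is easy to unintentionally construct a graph through which gradients cannot be computed correctly. Use tf.test.compute_gradient_error to compare the gradient that TensorFlow computes with a numerical estimate; a large error indicates a problem in the graph.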
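The sketch below shows one way such a check might look. It assumes the TensorFlow 1.x API through `tensorflow.compat.v1`. The "bad" graph is a contrived example that uses `tf.stop_gradient` to cut part of the gradient path on purpose, and the input values are arbitrary.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via the compat module

tf.disable_eager_execution()

x = tf.placeholder(tf.float64, shape=[3])
x0 = np.array([1.0, 2.0, 3.0])  # arbitrary point at which to check gradients

# Correct graph: y = x^2, gradient 2x.
y_good = tf.square(x)

# Contrived broken graph: the forward value is still x^2, but backprop
# treats one factor as a constant, so the computed gradient is x, not 2x.
y_bad = x * tf.stop_gradient(x)

# The checker runs the graph itself, so it needs a default session.
with tf.Session():
    err_good = tf.test.compute_gradient_error(
        x, [3], y_good, [3], x_init_value=x0)
    err_bad = tf.test.compute_gradient_error(
        x, [3], y_bad, [3], x_init_value=x0)

print(err_good)  # close to zero
print(err_bad)   # large: computed and numerical gradients disagree
```

The returned value is the largest absolute difference between the computed and the numerically estimated Jacobian, so it can be compared against a small tolerance in a unit test.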
4. Others
TensorFlow Summary and tfdbg (TensorFlow Debugger) are other tools available for debugging. Links:
https://www.TensorFlow.org/api_guides/python/summary
https://www.TensorFlow.org/api_guides/python/tfdbg