From: jacobbuckman.com
Author: Jacob Buckman
Compiled by: Machine Heart (Xiaoshi)
Although for most people the development language of TensorFlow is Python, it is not a standard Python library. This neural network framework runs by constructing a “computation graph,” which can be quite confusing for many beginners trying to understand its logic. In this article, Jacob Buckman, an engineer from Google Brain, will attempt to help you resolve the troubles you may encounter when first using TensorFlow.
Introduction
What Is This? Who Am I?
My name is Jacob, and I am a research scholar in the Google AI Resident program. I joined this program in the summer of 2017. Despite having extensive programming experience and a deep understanding of machine learning, I had never used TensorFlow before. At that time, I thought I would be able to get started quickly given my skills. But to my surprise, the learning curve was quite steep, and even months after joining the program, I occasionally felt confused about how to implement ideas using TensorFlow code. I treat this blog post as a message in a bottle to my past self: an introductory guide I wish I had been given at the start of my learning journey. I hope this blog post can also help others.
What Have Previous Tutorials Lacked?
Since its release three years ago, TensorFlow has become a cornerstone of the deep learning ecosystem. However, it is not intuitive for beginners, especially compared to define-by-run neural network libraries like PyTorch or DyNet, which execute operations as soon as they are defined.
There are many introductory TensorFlow tutorials available, covering everything from linear regression to MNIST classification to machine translation. These practical guides are excellent resources for getting a TensorFlow project up and running, and they can serve as entry points for similar projects. But for developers whose applications are not covered by a good tutorial, or who want to break the mold (as is common in research), a first encounter with TensorFlow can be frustrating.
I attempt to fill this gap with this article. I do not focus on a specific task but instead propose a more general approach and analyze the underlying abstract concepts behind TensorFlow. Once these concepts are mastered, deep learning with TensorFlow will become intuitive.
Target Audience
This tutorial is aimed at those who have some experience in programming and machine learning and want to learn TensorFlow. For example: a computer science student who wants to use TensorFlow for the final project of a machine learning course; a software engineer newly assigned to a deep learning project; or a confused newcomer to Google AI Resident (shoutout to my past self). If you want to further understand the basics, please refer to the following resources:
- https://ml.berkeley.edu/blog/2016/11/06/tutorial-1/
- http://colah.github.io/
- https://www.udacity.com/course/intro-to-machine-learning--ud120
- https://www.coursera.org/learn/machine-learning
Let’s get started!
Understanding TensorFlow
TensorFlow Is Not a Standard Python Library
Most Python libraries are written as natural extensions of Python. When you import a library, you get a set of variables, functions, and classes that extend and complement your code "toolbox," and you can predict how they will behave when you use them. In my view, this mental model should be discarded entirely when it comes to TensorFlow; it is fundamentally the wrong way to think about what TensorFlow is and how it interacts with the rest of your code.
The relationship between Python and TensorFlow can be likened to that between JavaScript and HTML. JavaScript is a fully functional programming language that can do all sorts of wonderful things. HTML is a framework for representing a certain type of practical computational abstraction (in this case, content that can be rendered by a web browser). The role of JavaScript in interactive web pages is to assemble the HTML objects seen by the browser and then interact with them by updating them to new HTML when needed.
Similar to HTML, TensorFlow is a framework for representing a certain type of computational abstraction (called a “computation graph”). However, when we operate TensorFlow with Python, the first thing we do with Python code is to construct the computation graph. Once that is done, the second thing we do is interact with it (starting a “session” in TensorFlow). But it is important to remember that the computation graph does not reside within variables; it exists in the global namespace. As Shakespeare said: “All RAM is a stage, and all variables are merely pointers.”
The First Key Abstraction: Computation Graph
When you browse the TensorFlow documentation, you may come across indirect references to “graphs” and “nodes.” If you read carefully, you may even have found this page (https://www.tensorflow.org/programmers_guide/graphs), which covers content that I will explain in a more precise and technical manner. This section is a high-level guide that captures important intuitive concepts while ignoring some technical details.
So: what is a computation graph? It is essentially a global data structure: a directed graph that captures instructions about how to compute.
Let’s take a look at an example of building a computation graph. In the image below, the upper part shows the code we run and its output, while the lower part shows the generated computation graph.
import tensorflow as tf
Computation Graph:
As you can see, merely importing TensorFlow does not generate an interesting computation graph. It is just a single, blank global variable. But what happens when we call a TensorFlow operation?
Code:
import tensorflow as tf
two_node = tf.constant(2)
print two_node
Output:
Tensor("Const:0", shape=(), dtype=int32)
Computation Graph:
Look! We have obtained a node. It contains the constant 2, courtesy of the function tf.constant. When we print the variable, we see that it returns a tf.Tensor object, which is a pointer to the node we just created. To emphasize this, here is another example:
Code:
import tensorflow as tf
two_node = tf.constant(2)
another_two_node = tf.constant(2)
two_node = tf.constant(2)
tf.constant(3)
Computation Graph:
Every time we call tf.constant, we create a new node in the graph. Even if that node performs the same function as existing nodes, even if we reassign the node to the same variable, or even if we do not assign it to a variable at all, the result is the same.
Code:
import tensorflow as tf
two_node = tf.constant(2)
another_pointer_at_two_node = two_node
two_node = None
print two_node
print another_pointer_at_two_node
Output:
None
Tensor("Const:0", shape=(), dtype=int32)
Computation Graph:
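This is the "all variables are merely pointers" point in action: the node survives even though our variable no longer points to it. If you want to verify that directly, here is a minimal sketch (assuming TensorFlow 1.x) that asks the global default graph itself, via tf.get_default_graph().get_operations(), how many nodes it contains:
import tensorflow as tf
two_node = tf.constant(2)
print(len(tf.get_default_graph().get_operations()))  # 1: one Const node registered in the global graph
two_node = None                                       # drop our pointer to it...
print(len(tf.get_default_graph().get_operations()))  # still 1: the node itself stays in the graph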
Alright, let’s take it a step further:
Code:
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node ## equivalent to tf.add(two_node, three_node)
Computation Graph:
Now we are getting somewhere: this is the computation graph we actually want! Note that the + operation is overloaded in TensorFlow, so adding two tensors adds a node to the graph, even though it does not look like a TensorFlow operation.
So: does two_node point to a node containing 2, three_node point to a node containing 3, and sum_node point to a node containing "2 + 3"? What is going on? Shouldn't it contain 5?
It turns out, it does not. The computation graph only contains the steps for computation; it does not contain the results. At least… not yet!
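You can see this directly by printing sum_node itself: it is just another tf.Tensor pointer that describes a pending addition, not the number 5. A minimal sketch (the exact tensor name in the output may vary):
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
print(sum_node)  # something like Tensor("add:0", shape=(), dtype=int32): a recipe, not a result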
The Second Key Abstraction: Session
If misunderstanding TensorFlow's abstractions had its own "March Madness" (the NCAA basketball tournament held each March), sessions would be the number one seed every year. Sessions hold that dubious honor because of their counterintuitive naming and their ubiquitous presence: almost every TensorFlow program explicitly calls tf.Session() at least once.
The role of a session is to handle memory allocation and optimization, allowing us to actually execute the computations specified by the computation graph. You can think of the computation graph as a “template” for the computations we want to perform: it lists all the steps. To use the computation graph, we need to start a session, which enables us to actually complete the task; for example, traversing all the nodes in the template to allocate a bunch of memory for storing the computation outputs. To perform various computations with TensorFlow, you need both the computation graph and the session.
A session contains a pointer to the global graph, which is constantly updated with pointers to all of its nodes. That means it does not matter whether the session is created before or after the nodes.
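To convince yourself of this, here is a small sketch (TF 1.x) in which the session is created before any node exists; because the session simply points at the global graph, it still sees the node we add afterwards (sess.run() is explained just below):
import tensorflow as tf
sess = tf.Session()        # session created first
two_node = tf.constant(2)  # node added to the global graph afterwards
print(sess.run(two_node))  # 2: the session sees the new node just fine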
Once a session object is created, you can use sess.run(node) to return the value of the node, and TensorFlow will perform all the computations necessary to determine that value.
Code:
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
print sess.run(sum_node)
Output:
5
Computation Graph:
Great! We can also pass a list, sess.run([node1, node2, …]), and have it return multiple outputs:
Code:
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
print sess.run([two_node, sum_node])
Output:
[2, 5]
Computation Graph:
In general, calls to sess.run() tend to be one of the biggest bottlenecks in TensorFlow, so the fewer times you call it, the better. If possible, return multiple items in a single call to sess.run() rather than making multiple calls.
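If keeping track of positions in a returned list becomes awkward, recent TF 1.x versions of sess.run() also accept nested structures of fetches, including dicts, which can make a single combined call easier to read. A sketch:
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
results = sess.run({"two": two_node, "sum": sum_node})  # one call, several outputs
print(results)  # a dict like {'two': 2, 'sum': 5}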
Placeholders and feed_dict
So far, the computations we have done have been rather boring: there is no opportunity to provide input, so they always produce the same output. A more useful application would involve building a computation graph that accepts input, processes it in some (consistent) way, and returns an output.
The most direct way to do this is to use placeholders. Placeholders are nodes used to accept external input.
Code:
import tensorflow as tf
input_placeholder = tf.placeholder(tf.int32)
sess = tf.Session()
print sess.run(input_placeholder)
Output:
Traceback (most recent call last):
...
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype int32
[[Node: Placeholder = Placeholder[dtype=DT_INT32, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Computation Graph:
… This is a bad example because it raises an exception. The placeholder expects to be given a value. But we did not provide a value, so TensorFlow crashed.
To provide a value, we use the feed_dict property of sess.run().
Code:
import tensorflow as tf
input_placeholder = tf.placeholder(tf.int32)
sess = tf.Session()
print sess.run(input_placeholder, feed_dict={input_placeholder: 2})
Output:
2
Computation Graph:
This is much better. Note the format of the dict passed to feed_dict: its keys are the variables pointing to the placeholder nodes in the graph (which, as discussed above, essentially means the placeholder nodes themselves), and the corresponding values are the data to substitute for each placeholder, typically a scalar or a Numpy array.
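Placeholders can also be given a shape, and the fed values are typically Numpy arrays rather than scalars. A minimal sketch (the shape [None, 3] is just an illustrative choice meaning "any number of rows, three columns"):
import numpy as np
import tensorflow as tf
matrix_placeholder = tf.placeholder(tf.float32, shape=[None, 3])
doubled = matrix_placeholder * 2
sess = tf.Session()
data = np.ones([2, 3], dtype=np.float32)
print(sess.run(doubled, feed_dict={matrix_placeholder: data}))  # a 2x3 array of 2.0s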
The Third Key Abstraction: Computation Path
Let’s look at another example using placeholders:
Code:
import tensorflow as tf
input_placeholder = tf.placeholder(tf.int32)
three_node = tf.constant(3)
sum_node = input_placeholder + three_node
sess = tf.Session()
print sess.run(three_node)
print sess.run(sum_node)
Output:
3
Traceback (most recent call last):
...
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder_2' with dtype int32
[[Node: Placeholder_2 = Placeholder[dtype=DT_INT32, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Computation Graph:
Why does the second call to sess.run() fail? Even though we did not evaluate input_placeholder, why does it still raise an error related to input_placeholder? The answer lies in the final key TensorFlow abstraction: the computation path. Fortunately, this abstraction is very intuitive.
When we call sess.run() on nodes that depend on other nodes in the graph, we also need to compute the values of those nodes. If those nodes have dependencies, we need to compute those values (and so on…) until we reach the “top” of the computation graph, i.e., nodes that have no parent nodes.
Computation path for sum_node:
All three nodes need to be evaluated to compute the value of sum_node. Most importantly, this includes the placeholder we did not fill, explaining the exception!
Now let’s look at the computation path for three_node:
According to the graph structure, we do not need to compute all nodes to evaluate the node we want! Because we do not need to evaluate the placeholder_node when evaluating three_node, running sess.run(three_node) does not raise an exception.
The fact that TensorFlow automatically computes only the necessary nodes is a huge advantage of the framework. If the computation graph is very large and has many unnecessary nodes, it can save a significant amount of runtime on calls. It allows us to build large “multi-purpose” computation graphs that use a single shared set of core nodes and do different things based on the different computation paths taken. Considering the calls to sess.run() based on the computation paths taken is crucial for nearly all applications.
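Here is a rough sketch of that "multi-purpose graph" idea: the two sess.run() calls below share the same graph but follow different computation paths, so each call only has to feed (and compute) what its own path requires.
import tensorflow as tf
input_placeholder = tf.placeholder(tf.int32)
three_node = tf.constant(3)
five_node = tf.constant(5)
sum_node = input_placeholder + three_node  # this path goes through the placeholder
product_node = three_node * five_node      # this path touches only constants
sess = tf.Session()
print(sess.run(product_node))                                # 15, no feed_dict needed: the placeholder is off this path
print(sess.run(sum_node, feed_dict={input_placeholder: 4}))  # 7: this path requires the placeholder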
Variables & Side Effects
So far, we have seen two types of "no-ancestor" nodes: tf.constant, which is the same on every run, and tf.placeholder, which is different on each run. There is a third case we often need: a node that generally keeps its value between runs but can also be updated to a new value.
This is where variables come in.
Variables are crucial for deep learning with TensorFlow because the parameters of the model are variables. During training, you want to update the parameters at each step via gradient descent; but during evaluation, you want to keep the parameters unchanged and input a variety of different test sets into the model. Typically, all trainable parameters of a model are variables.
To create a variable, use tf.get_variable(). The first two arguments of tf.get_variable() are required; the rest are optional: tf.get_variable(name, shape). name is a string that uniquely identifies the variable object. It must be unique relative to the global graph, so keep track of all the names you have used to make sure there are no duplicates. shape is a list of integers corresponding to the shape of the tensor, and its syntax is intuitive: one integer per dimension, in order. For example, a 3×8 matrix has shape [3, 8]. To create a scalar, use an empty list as the shape.
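For example, here is a small sketch of the shape syntax, creating a 3×8 weight matrix and a scalar bias (the names "weights" and "bias" are just illustrative):
import tensorflow as tf
weights = tf.get_variable("weights", [3, 8])  # a 3x8 matrix of parameters
bias = tf.get_variable("bias", [])            # a scalar: empty list as the shape
print(weights.get_shape())  # (3, 8)
print(bias.get_shape())     # ()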
Code:
import tensorflow as tf
count_variable = tf.get_variable("count", [])
sess = tf.Session()
print sess.run(count_variable)
Output:
Traceback (most recent call last):
...
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value count
[[Node: _retval_count_0_0 = _Retval[T=DT_FLOAT, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](count)]]
Computation Graph:
Oops, another exception. When the variable node is first created, its value is essentially “null,” and any operation trying to evaluate it will raise this exception. We can only evaluate it after putting a value into the variable. There are mainly two ways to put a value into a variable: initializers and tf.assign(). Let’s first look at tf.assign().
Code:
import tensorflow as tf
count_variable = tf.get_variable("count", [])
zero_node = tf.constant(0.)
assign_node = tf.assign(count_variable, zero_node)
sess = tf.Session()
sess.run(assign_node)
print sess.run(count_variable)
Output:
0
Computation Graph:
Compared to the nodes we have seen so far, tf.assign(target, value) has some unique properties:
- Identity operation. tf.assign(target, value) does not perform any interesting computation; it is always just equal to value.
- Side effects. When computation "flows through" assign_node, a side effect occurs elsewhere in the graph. Here, the side effect is replacing the value of count_variable with the value stored in zero_node.
- Non-dependent edges. Even though count_variable and assign_node are connected in the graph, neither depends on the other. This means computation does not flow back through that edge when evaluating either node. However, assign_node does depend on zero_node; it needs to know what value to assign.
The “side effect” nodes underpin most TensorFlow deep learning workflows, so make sure you truly understand what happens at that node. When we call sess.run(assign_node), the computation path flows through assign_node and zero_node.
Computation Graph:
When the computation flows through any node in the graph, it also performs any side effects controlled by that node, as shown in green in the graph. Due to the special side effect of tf.assign, the memory associated with count_variable (previously “null”) is now permanently set to 0. This means that when we call sess.run(count_variable) next time, it will not raise any exceptions. Instead, we will get a value of 0. Success!
Next, let’s look at initializers:
Code:
import tensorflow as tf
const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)
sess = tf.Session()
print sess.run([count_variable])
Output:
Traceback (most recent call last):
...
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value count
[[Node: _retval_count_0_0 = _Retval[T=DT_FLOAT, index=0, _device="/job:localhost/replica:0/task:0/device:CPU:0"](count)]]
Computation Graph:
So, what happened here? Why is the initializer not working?
The problem lies in the separation between the session and the graph. We have set the initializer property of get_variable to point to const_init_node, but it merely adds a new connection between nodes in the graph. We have not done anything to address the root cause of the exception: the memory associated with the variable node (stored in the session, not the computation graph) is still set to “null.” We need to make const_init_node update the variable through the session.
Code:
import tensorflow as tf
const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print sess.run(count_variable)
Output:
0
Computation Graph:
To do this, we add another special node: init = tf.global_variables_initializer(). Like tf.assign(), this is a node with side effects. Unlike tf.assign(), we do not need to specify what its inputs are! tf.global_variables_initializer() will look at the global graph upon its creation and automatically add dependencies to each tf.initializer in the graph. When we later evaluate sess.run(init), it tells every initializer to execute the variable initialization, allowing us to run sess.run(count_variable) without error.
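A side note: each variable node also exposes its own initializer op, so in a fresh script a sketch like the following initializes just that one variable without touching the rest of the graph. This is occasionally handy, although tf.global_variables_initializer() is the usual choice.
import tensorflow as tf
const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)
sess = tf.Session()
sess.run(count_variable.initializer)  # run only this variable's initializer op
print(sess.run(count_variable))       # 0.0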
Variable Sharing
You may encounter TensorFlow code with variable sharing that involves creating scopes and setting “reuse = True.” I strongly advise against using variable sharing in your own code. If you want to use a single variable in multiple places, simply programmatically keep a pointer to that variable node and reuse it when needed. In other words, for every variable you want to keep in memory, you only need to call tf.get_variable() once.
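In practice that looks like the following sketch: call tf.get_variable() once, keep the Python pointer, and reuse it wherever the variable appears in the graph (the names here are illustrative).
import tensorflow as tf
w = tf.get_variable("w", [], initializer=tf.constant_initializer(2.))
# reuse the same variable node in two different computations by reusing the Python pointer
doubled = 2 * w
squared = w * w
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
print(sess.run([doubled, squared]))  # [4.0, 4.0]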
Optimizers
Finally: let’s do some real deep learning! If you’ve been keeping up with me, the remaining concepts should be very simple for you.
In deep learning, a typical “inner loop” training process goes as follows:
1. Get input and true_output
2. Compute “predicted” value based on input and parameters
3. Compute “loss” based on the difference between predicted and true_output
4. Update parameters based on the gradients of the loss
Let’s put everything together in a quick script to solve a simple linear regression problem:
Code:
import tensorflow as tf
### build the graph
## first set up the parameters
m = tf.get_variable("m", [], initializer=tf.constant_initializer(0.))
b = tf.get_variable("b", [], initializer=tf.constant_initializer(0.))
init = tf.global_variables_initializer()
## then set up the computations
input_placeholder = tf.placeholder(tf.float32)
output_placeholder = tf.placeholder(tf.float32)
x = input_placeholder
y = output_placeholder
y_guess = m * x + b
loss = tf.square(y - y_guess)
## finally, set up the optimizer and minimization node
optimizer = tf.train.GradientDescentOptimizer(1e-3)
train_op = optimizer.minimize(loss)
### start the session
sess = tf.Session()
sess.run(init)
### perform the training loop
import random
## set up problem
true_m = random.random()
true_b = random.random()
for update_i in range(100000):
    ## (1) get the input and output
    input_data = random.random()
    output_data = true_m * input_data + true_b

    ## (2), (3), and (4) all take place within a single call to sess.run()!
    _loss, _ = sess.run([loss, train_op], feed_dict={input_placeholder: input_data, output_placeholder: output_data})
    print update_i, _loss

### finally, print out the values we learned for our two variables
print "True parameters:     m=%.4f, b=%.4f" % (true_m, true_b)
print "Learned parameters:  m=%.4f, b=%.4f" % tuple(sess.run([m, b]))
Output:
0 2.3205383
1 0.5792742
2 1.55254
3 1.5733259
4 0.6435648
5 2.4061265
6 1.0746256
7 2.1998715
8 1.6775116
9 1.6462421
10 2.441034
...
99990 2.9878322e-12
99991 5.158629e-11
99992 4.53646e-11
99993 9.422685e-12
99994 3.991829e-11
99995 1.134115e-11
99996 4.9467985e-11
99997 1.3219648e-11
99998 5.684342e-14
99999 3.007017e-11
True parameters: m=0.3519, b=0.3242
Learned parameters: m=0.3519, b=0.3242
As you can see, the loss essentially becomes zero, and we have made a good estimate of the true parameters. I hope you only feel unfamiliar with the following part of the code:
## finally, set up the optimizer and minimization node
optimizer = tf.train.GradientDescentOptimizer(1e-3)
train_op = optimizer.minimize(loss)
However, now that you have a good understanding of TensorFlow’s basic concepts, this code is easy to explain! The first line, optimizer = tf.train.GradientDescentOptimizer(1e-3), does not add nodes to the computation graph. It merely creates a Python object containing useful helper functions. The second line, train_op = optimizer.minimize(loss), adds a node to the graph and stores a pointer in the variable train_op. The train_op node has no output, but has a very complex side effect:
train_op traces back along the computation path of loss to find variable nodes. For each variable node it finds, it computes the gradient of the loss with respect to that variable. It then computes a new value for that variable: the current value minus the product of the gradient and the learning rate. Finally, it performs an assignment operation to update the value of the variable.
So basically, when we call sess.run(train_op), it takes a gradient descent step for all our variables. Of course, we also need to fill in the input and output placeholders using feed_dict, and we want to print the loss value for debugging purposes.
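If you want to see those steps spelled out, the optimizer also exposes them separately via compute_gradients() and apply_gradients(). The following standalone sketch (a deliberately tiny graph, assuming standard TF 1.x APIs) is roughly what optimizer.minimize(loss) sets up for you:
import tensorflow as tf
x = tf.get_variable("x", [], initializer=tf.constant_initializer(3.))
loss = tf.square(x)
optimizer = tf.train.GradientDescentOptimizer(1e-1)
## roughly what optimizer.minimize(loss) would set up:
grads_and_vars = optimizer.compute_gradients(loss)    # a list of (gradient, variable) pairs, here [(dloss/dx, x)]
train_op = optimizer.apply_gradients(grads_and_vars)  # side effect: x <- x - learning_rate * gradient
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(x))  # 3.0
sess.run(train_op)  # gradient of x^2 at 3.0 is 6.0, so x becomes 3.0 - 0.1 * 6.0
print(sess.run(x))  # 2.4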
Debugging with tf.Print
When you start doing more complex things with TensorFlow, you will need to debug. In general, it is quite hard to inspect what is happening inside the computation graph. You cannot use regular Python print statements, because you never have access to the values you want to print: they are locked away inside the call to sess.run(). To elaborate, suppose you want to inspect an intermediate value of a computation. Before the call to sess.run(), the intermediate value does not yet exist; and by the time sess.run() returns, it is gone!
Let’s look at a simple example.
Code:
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
print sess.run(sum_node)
Output:
5
This lets us see that the answer is 5. But if we want to check the intermediate values, two_node and three_node, what do we do? One way to check intermediate values is to add a return parameter to sess.run() that points to each intermediate node you want to check, and then print its value after returning.
Code:
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
sess = tf.Session()
answer, inspection = sess.run([sum_node, [two_node, three_node]])
print inspection
print answer
Output:
[2, 3]
5
This is generally useful, but as the code becomes more complex, it can get a bit awkward. A more convenient method is to use a tf.Print statement. Confusingly, tf.Print is itself a TensorFlow node, which has outputs and side effects! It has two required arguments: a node to copy and a list of things to print. The "node to copy" can be any node in the graph; tf.Print is an identity operation with respect to the node it copies, meaning it outputs an exact copy of its input. Its side effect, however, is to print the current values of everything in the "list of things to print."
Code:
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
print_sum_node = tf.Print(sum_node, [two_node, three_node])
sess = tf.Session()
print sess.run(print_sum_node)
Output:
[2][3]
5
Computation Graph:
An important and somewhat subtle point about tf.Print: printing is a side effect. Like all other side effects, it only occurs when the computation flows through the tf.Print node. If the tf.Print node is not on the computation path, nothing will be printed. In particular, even if the node that tf.Print is copying is on the computation path, the tf.Print node itself may not be. Watch out for this issue! When it happens (and it eventually will), it can be very frustrating if you are not specifically looking for this problem. As a rule of thumb, create your tf.Print node immediately after creating the node it copies.
Code:
import tensorflow as tf
two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
### this new copy of two_node is not on the computation path, so nothing prints!
print_two_node = tf.Print(two_node, [two_node, three_node, sum_node])
sess = tf.Session()
print sess.run(sum_node)
Output:
5
Computation Graph:
Here is a good resource (https://wookayin.github.io/tensorflow-talk-debugging/#1) that provides some other useful debugging tips.
Conclusion
I hope this blog post helps you better understand what TensorFlow is, how it works, and how to use it. In summary, the concepts introduced in this article are important for all TensorFlow projects, but they only scratch the surface. As you explore TensorFlow, you will likely run into other interesting concepts you need: conditionals, loops, distributed TensorFlow, variable scopes, saving and loading models, multiple graphs, multiple sessions, multi-core data-loader queues, and more. I will discuss these topics in future blog posts. But if you reinforce the ideas you learned here with the official documentation, some code examples, and a bit of deep learning magic, I believe you will be able to figure TensorFlow out!