TL;DR: TensorFlow's user programming interface has gone through several major evolutions since the framework's release in 2015. This article briefly introduces these evolutions and the motivations behind them, to help the reader understand TensorFlow's design thinking at each stage.

Notes:

  1. By user programming interface, we mean the programming interface that TensorFlow provides to users, not the internal programming interface of TensorFlow.
  2. We discuss TensorFlow in a generalized sense that covers both TensorFlow and JAX, two machine learning frameworks from Google built on the same underlying layers.

TensorFlow 0.x Era (2015 - 2017)

Source code: https://github.com/tensorflow/tensorflow/releases/tag/0.12.1

The user programming interface for the TensorFlow 0.x era, later known as TensorFlow Core, is based on Python and consists mainly of:

  • tf.placeholder: defines placeholders, which describe the shape and type of the input data but do not hold concrete data.
  • tf.Variable: defines variables, which represent model parameters and do hold concrete data.
  • tf.Session: executes the computation graph, mapping its nodes onto specific devices and running the computations.

Code example

A classic linear model (here, a softmax classifier for MNIST) can be expressed in TensorFlow 0.x code as follows:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) # load MNIST; used in the training loop below
sess = tf.InteractiveSession()

# Define the model inputs
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

# Define the model parameters
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
sess.run(tf.global_variables_initializer()) # initialize model parameters

# Calculate the predicted values
y = tf.matmul(x,W) + b # network design, here is a linear model, y = Wx + b, W and b are model parameters, x is model input, y is model output

# Define the loss function
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)) # cross entropy loss function

# Define the optimizer
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Train the model
for i in range(1000):
  batch = mnist.train.next_batch(100)
  train_step.run(feed_dict={x: batch[0], y_: batch[1]})

TensorFlow 1.x Era (2017 - 2019)

In terms of working principles, the TensorFlow 1.x era follows the same design as the 0.x era. In the user programming interface, however, the design thinking changed in several significant ways:

  • In addition to keeping the original approach to graph construction (the tf.placeholder style), TensorFlow 1.x strongly recommends the Estimator programming paradigm. Estimators had existed since the 0.x era, but only really took off in 1.x.
  • During the TensorFlow 1.x era, support for the Keras interface was introduced. Keras was originally a standalone machine learning framework that supported a variety of backends, including TensorFlow, Theano, and CNTK; its easy-to-use interface and wealth of documentation and examples made it one of the most popular machine learning frameworks of its time. Keras later moved under Google's wing, gradually dropped support for the other backends, and eventually became part of TensorFlow (as tf.keras) in the 1.x era.
  • Eager Execution was a major change in the TensorFlow 1.x era. It was primarily designed to address the separation of computation-graph construction from computation-graph execution in TensorFlow 0.x and 1.x. PyTorch, TensorFlow's main competitor at the time, could build and execute graphs at the same time, which gave it a huge advantage in development efficiency; Eager Execution was meant to close this gap. Eager Execution does not change the underlying implementation of TensorFlow; rather, it uses syntactic sugar to make TensorFlow's user programming interface look like PyTorch's. Because it is not based on a true dynamic computation graph, Eager Execution can in practice exhibit behavior that is difficult to understand in some cases.

Estimator code example

import tensorflow as tf
import tensorflow.feature_column as fc

def my_model_fn(
  features, # This is batch_features from input_fn
  labels, # This is batch_labels from input_fn
  mode, # An instance of tf.estimator.ModeKeys
  params): # Additional configuration

  x = features["x"]

  logits = tf.layers.dense(inputs=x, units=10, activation=tf.nn.relu)

  # Generate predictions (for predict and eval mode)
  predictions = {
      "classes": tf.argmax(inputs=logits, axis=1),
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  # if in prediction mode, return result and exit
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

  # if in training mode, return result and exit
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

  # if in evaluation mode, return result and exit
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])
  }
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
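
For completeness, a minimal usage sketch follows; my_input_fn is an assumed name for an input function returning a ({"x": features}, labels) tuple and is not part of the example above. The model_fn is wrapped in a tf.estimator.Estimator, which then drives training and evaluation.

# Minimal usage sketch with assumed names; my_input_fn is hypothetical.
estimator = tf.estimator.Estimator(model_fn=my_model_fn)
estimator.train(input_fn=my_input_fn, steps=1000)
estimator.evaluate(input_fn=my_input_fn)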

Keras code example

import tensorflow as tf
from tensorflow import keras

fashion_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5)

test_loss, test_acc = model.evaluate(test_images, test_labels)

predictions = model.predict(test_images)

Eager Execution code example

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

x = [[2.]]
m = tf.matmul(x, x)

print(m)
# The 1x1 matrix [[4.]]

TensorFlow 2.x Era (2019 - Now)

The user programming interface in the TensorFlow 2.x era has not changed significantly from that of the 1.x era. The main changes are a few tweaks to the default settings (illustrated by the code example after this list):

  • Use Eager Execution mode by default.
  • The Keras interface is used by default.
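
TensorFlow 2.x code example

A minimal sketch of what these defaults mean in practice, assuming a standard TensorFlow 2.x installation: operations run immediately without a Session, and models are built through tf.keras.

import tensorflow as tf

# Eager execution is on by default: no Session is needed, ops run immediately.
x = tf.constant([[2.]])
print(tf.matmul(x, x))  # tf.Tensor([[4.]], shape=(1, 1), dtype=float32)

# tf.keras is the built-in high-level API for defining and training models.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])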

JAX Era (2020 - Now)

JAX is not a replacement or successor to TensorFlow; it is simply another machine learning framework from Google whose underlying implementation is consistent with that of the TensorFlow 2.x era. JAX attempts to use a new programming paradigm to solve the problems that make TensorFlow less popular among researchers.

The JAX-era user programming interface differs from the TensorFlow 2.x design in two major ways:

  1. The main target group of JAX's programming approach is researchers, while the main target group of TensorFlow 2.x is engineers. This counters the threat of PyTorch, whose main target group is also researchers.
  2. JAX adopts a more advanced but relatively uncommon programming paradigm (functional programming built around composable function transformations), attempting to create a new paradigm rather than continuing that of TensorFlow or PyTorch, as the code example after this list illustrates.
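
JAX code example

A minimal sketch of this functional style; the loss function and data below are made up for illustration. Ordinary Python functions are transformed with jax.grad and jax.jit to obtain gradients and compiled execution.

import jax
import jax.numpy as jnp

# An ordinary Python function computing a mean-squared-error loss.
def loss(w, x, y):
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)

# Transformations compose: differentiate with respect to w, then JIT-compile.
grad_fn = jax.jit(jax.grad(loss))

w = jnp.zeros(3)
x = jnp.ones((5, 3))
y = jnp.ones(5)
print(grad_fn(w, x, y))  # gradient of the loss with respect to w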