
Is tf.GradientTape in TF 2.0 equivalent to tf.gradients?

I am migrating my training loop to the TensorFlow 2.0 API. In eager execution mode, tf.GradientTape replaces tf.gradients. Do they have the same functionality? Specifically:

  • In GradientTape.gradient():

    • Is the parameter output_gradients equivalent to grad_ys in the old API?
    • What about the parameters colocate_gradients_with_ops, aggregation_method and gate_gradients of tf.gradients? Are they deprecated due to lack of use? Can they be replaced by other methods in the 2.0 API? Are they needed in eager execution at all?
  • Is GradientTape.jacobian() equivalent to tf.python.ops.parallel_for.gradients?


Please find the response below.

  1. Regarding output_gradients and grad_ys: Yes, they can be considered the same.

Detailed Explanation: The docstring of output_gradients on GitHub says:

output_gradients: if not None, a list of gradient provided for each Target, or None if we are to use the target’s computed downstream gradient,

The TensorFlow documentation describes grad_ys as follows:

grad_ys: is a list of tensors of the same length as ys that holds the initial gradients for each y in ys. When grad_ys is None, we fill in a tensor of ‘1’s of the shape of y for each y in ys. A user can provide their own initial grad_ys to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).
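To make the equivalence concrete, here is a small sketch, assuming TensorFlow 2.x is installed. It seeds the same computation once with the default (ones) and once with a hypothetical weight vector w, using output_gradients on the tape and grad_ys on tf.gradients (which in TF 2.x only works inside a tf.function). The values of x and w are made up for illustration.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])
# Hypothetical per-element weights used as the initial (upstream) gradient.
w = tf.constant([1.0, 10.0, 100.0])

# TF 2.x eager: GradientTape.gradient with and without output_gradients.
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = x * x  # dy_i/dx_i = 2 * x_i
tape_default = tape.gradient(y, x)                       # seeded with ones: 2 * x
tape_weighted = tape.gradient(y, x, output_gradients=w)  # seeded with w: 2 * x * w
del tape  # release the persistent tape's resources

# Graph mode (tf.function): tf.gradients with and without grad_ys.
@tf.function
def graph_grads(x):
    y = x * x
    default = tf.gradients(y, x)[0]
    weighted = tf.gradients(y, x, grad_ys=w)[0]
    return default, weighted

graph_default, graph_weighted = graph_grads(x)
print(tape_weighted.numpy(), graph_weighted.numpy())
```

Both APIs give [2, 4, 6] for the default seed and [2, 40, 600] for the weighted one.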

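From the two descriptions above, both parameters supply the initial (upstream) gradient that seeds backpropagation at each target; when they are omitted, a tensor of ones with the target's shape is used. The code below, from page 394 of the book Hands-On Machine Learning with Scikit-Learn and TensorFlow, passes neither argument, so the gradient of mse is seeded with ones. Note that the random values in this snippet initialize theta (the model parameters), not the initial gradient, so they are unrelated to output_gradients or grad_ys.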

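```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API
# n, mse and learning_rate are defined earlier in the book's example
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
gradients = tf.gradients(mse, [theta])[0]  # grad_ys omitted: seeded with ones
training_op = tf.assign(theta, theta - learning_rate * gradients)
```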
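For the migration itself, the book's update step could be written with tf.GradientTape roughly as follows. This is a sketch assuming TensorFlow 2.x; the data X and y and the values of n and learning_rate are made-up placeholders, not the book's housing data.

```python
import tensorflow as tf

# Hypothetical toy data (y = 1 + x); the first column of X is the bias term.
X = tf.constant([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = tf.constant([[2.0], [3.0], [4.0]])
n = 1                 # number of features, excluding the bias term
learning_rate = 0.1   # placeholder value

# Random initialization of the parameters (not of the gradient seed).
theta = tf.Variable(tf.random.uniform([n + 1, 1], -1.0, 1.0), name="theta")

def train_step():
    with tf.GradientTape() as tape:
        error = tf.matmul(X, theta) - y
        mse = tf.reduce_mean(tf.square(error))
    # No output_gradients passed, so the tape seeds d(mse)/d(mse) with 1.
    gradients = tape.gradient(mse, theta)
    theta.assign_sub(learning_rate * gradients)  # replaces tf.assign
    return mse

for _ in range(1000):
    final_mse = train_step()

print(theta.numpy().ravel(), float(final_mse))
```

Because eager code runs immediately, there is no training_op to pass to a Session; calling train_step() performs one update.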
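  2. Regarding colocate_gradients_with_ops: It is not needed in eager execution. It controls whether each gradient op is placed on the same device as the forward op it differentiates, which is a graph-construction concern tied to the graph's control flow context.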

Detailed Explanation: colocate_gradients_with_ops leads to the code below in the TensorFlow source on GitHub, a context manager defined on the Graph class. It enters the graph's control flow context, and control flow contexts belong to graphs, as explained in the Graphs guide on the TF site.

```python
def _colocate_with_for_gradient(self, op, gradient_uid, ignore_existing=False):
    with self.colocate_with(op, ignore_existing):
        if gradient_uid is not None and self._control_flow_context is not None:
            self._control_flow_context.EnterGradientColocation(op, gradient_uid)
            try:
                yield
            finally:
                self._control_flow_context.ExitGradientColocation(op, gradient_uid)
        else:
            yield
```
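A minimal sketch of what migrating such a call looks like, assuming TensorFlow 2.x: tf.compat.v1.gradients still accepts colocate_gradients_with_ops in an explicit graph, whereas, as far as I know, the TF 2.x tf.gradients signature no longer lists it, and GradientTape.gradient has no such argument. The flag only affects where gradient ops are placed, so the computed value is the same; the toy function x * x is an illustrative assumption.

```python
import tensorflow as tf

# TF1-style graph mode via tf.compat.v1: the flag only influences device placement.
graph = tf.Graph()
with graph.as_default():
    x_g = tf.constant(3.0)
    y_g = x_g * x_g
    (dy_dx,) = tf.compat.v1.gradients(y_g, [x_g], colocate_gradients_with_ops=True)
    with tf.compat.v1.Session(graph=graph) as sess:
        graph_value = sess.run(dy_dx)

# TF2 eager: there is no colocation argument, so it is simply dropped.
x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x * x
eager_value = tape.gradient(y, x).numpy()

print(graph_value, eager_value)  # both equal d(x^2)/dx at x = 3
```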
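  3. Regarding aggregation_method: GradientTape.gradient() has no such argument. In eager execution, the tape sums the partial gradients of a tensor that is used more than once with an internal helper, _aggregate_grads (see the GitHub link), so the aggregation strategy is not user-selectable. tf.gradients() itself still accepts aggregation_method in TF 2.x, but it can only be called inside a tf.function.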
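The sketch below, assuming TensorFlow 2.x, uses a function in which x is consumed by several ops, so its partial gradients must be summed. Inside a tf.function, tf.gradients accepts aggregation_method (tf.AggregationMethod.ADD_N here); the eager tape takes no such option and aggregates on its own. The function f is a made-up example.

```python
import tensorflow as tf

def f(x):
    # x is consumed by several ops, so multiple partial gradients must be summed.
    return x * x + tf.sin(x) + 3.0 * x

@tf.function
def graph_grad(x):
    y = f(x)
    # tf.gradients (graph / tf.function only) still takes aggregation_method.
    return tf.gradients(y, x, aggregation_method=tf.AggregationMethod.ADD_N)[0]

x = tf.constant(2.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = f(x)
# The tape has no aggregation_method argument; it sums the partials itself.
eager_grad = tape.gradient(y, x)
graph_result = graph_grad(x)

print(float(eager_grad), float(graph_result))  # both approximate 2*x + cos(x) + 3 at x = 2
```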

  4. Regarding gate_gradients: It is not needed in eager execution either, because it is also tied to graph construction.

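Detailed Explanation: As the code below from GitHub shows, when gate_gradients is True and an op has more than one non-None input gradient, those gradients are grouped with control_flow_ops.tuple, so that none of them is returned before all of them have been computed. This grouping op is created inside _colocate_with_for_gradient, which in turn depends on the graph's control flow context. In eager execution, ops already run one after another, so there is nothing to gate.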

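```python
# Excerpt from TF's gradient-construction code; in_grads and gradient_uid
# are defined by the surrounding function.
if gate_gradients and len([x for x in in_grads
                           if x is not None]) > 1:
    with ops.device(None):
        with ops._colocate_with_for_gradient(  # pylint: disable=protected-access
                None, gradient_uid, ignore_existing=True):
            in_grads = control_flow_ops.tuple(in_grads)
```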
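As a sanity check, here is a sketch assuming TensorFlow 2.x: gate_gradients=True can still be passed to tf.gradients inside a tf.function, and the result matches the eager tape, which has no gating option. The function x * exp(x) is an arbitrary example whose Mul op receives two non-None input gradients, so the tuple path above is exercised in graph mode.

```python
import tensorflow as tf

def f(x):
    return tf.reduce_sum(x * tf.exp(x))

@tf.function
def graph_grad(x):
    y = f(x)
    # In graph mode, gate_gradients=True groups an op's input gradients with a tuple op.
    return tf.gradients(y, x, gate_gradients=True)[0]

x = tf.constant([1.0, 2.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = f(x)
# Eager: GradientTape.gradient has no gate_gradients argument.
eager_grad = tape.gradient(y, x)
graph_result = graph_grad(x)

print(eager_grad.numpy(), graph_result.numpy())  # both approximate exp(x) * (1 + x)
```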
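  5. Regarding jacobian: Yes, they are the same. By default GradientTape.jacobian() vectorizes the computation with pfor, the same parallel_for machinery used by tf.python.ops.parallel_for.gradients.jacobian; passing experimental_use_pfor=False makes it fall back to a sequential while loop.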
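A short sketch, assuming TensorFlow 2.x, comparing the default (pfor) Jacobian with the while-loop fallback on an elementwise square, whose Jacobian is diag(2 * x). The tape is made persistent because the while-loop path requires it and because jacobian() is called twice. The input values are illustrative.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = x * x  # elementwise, so the Jacobian is diagonal: diag(2 * x)

# Default: vectorized with pfor (parallel_for).
jac_pfor = tape.jacobian(y, x)
# Fallback: one gradient per output element computed in a loop.
jac_loop = tape.jacobian(y, x, experimental_use_pfor=False)
del tape  # release the persistent tape's resources

print(jac_pfor.numpy())
```

Both calls return the same 3x3 matrix; the pfor version is usually faster for larger outputs because it avoids a per-element loop.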