The torch.autograd package provides automatic differentiation for all operations on Tensors.
PyTorch's autograd is a reverse-mode automatic differentiation system.
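A minimal sketch of what this looks like in practice (the values below are chosen purely for illustration): record a forward computation on a tensor, then call backward() to propagate the derivative back to the input.

```python
import torch

# Forward pass is recorded as it runs; backward() replays it in reverse.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x          # forward pass: y = x^2 + 3x
y.backward()                # reverse pass: accumulate dy/dx into x.grad
print(x.grad)               # tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2
```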
Backpropagation is defined by how your code is run, and every single iteration can be different. For more details, visit this link.
Other frameworks that adopt a similar approach:
Conceptually, autograd maintains a graph that records all of the operations performed on variables as you execute them. This results in a directed acyclic graph whose leaves are the input variables and whose roots are the output variables. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.
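For example, a small graph with two leaf variables and one root might look like the following sketch (the variable names and values are arbitrary); backward() applies the chain rule from the root back to both leaves.

```python
import torch

# The leaves of the graph are w and b; the root is loss.
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(2.0)                    # plain input, no gradient tracked

y = w * x + b                            # intermediate node
loss = y ** 2                            # root of the graph

print(w.is_leaf, b.is_leaf, y.is_leaf)   # True True False

loss.backward()                          # chain rule from root to leaves
print(w.grad)   # d(loss)/dw = 2*y*x = 2*7*2 -> tensor(28.)
print(b.grad)   # d(loss)/db = 2*y   = 2*7   -> tensor(14.)
```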
Internally, autograd represents this graph as a graph of Function objects (really expressions), which can be apply()ed to compute the result of evaluating the graph. When computing the forward pass, autograd simultaneously performs the requested computations and builds up a graph representing the function that computes the gradient (the .grad_fn attribute of each Variable is an entry point into this graph). When the forward pass is completed, we evaluate this graph in the backward pass to compute the gradients.
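One way to peek at this internal graph is to follow .grad_fn and its next_functions attribute. The exact node class names printed below (e.g. ExpBackward0, MulBackward0) are an implementation detail and may differ between PyTorch versions.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = (x * 3).exp()

# y.grad_fn is the entry point into the recorded graph of Function objects.
print(y.grad_fn)                         # e.g. <ExpBackward0 ...>
print(y.grad_fn.next_functions)          # points at the node for the multiplication
print(y.grad_fn.next_functions[0][0].next_functions)
# Walking further, the chain ends at an AccumulateGrad node for the leaf x.
```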
An important thing to note is that the graph is recreated from scratch at every iteration, and this is exactly what allows for using arbitrary Python control flow statements, which can change the overall shape and size of the graph at every iteration. You don't have to encode all possible paths before you launch training: what you run is what you differentiate.
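A rough sketch of what this enables (the forward function below is made up for illustration): the number of operations recorded, and hence the shape of the graph, depends on the data, and backward() differentiates exactly the path that was executed.

```python
import torch

def forward(x):
    y = x
    while y.norm() < 10:      # loop length depends on the data
        y = y * 2
    if y.sum() > 0:           # branch taken depends on the data
        return y.sum()
    return -y.sum()

for _ in range(3):
    x = torch.randn(3, requires_grad=True)
    out = forward(x)
    out.backward()            # differentiates the graph built on this iteration
    print(x.grad)
```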