{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Autograd: Automatic Differentiation\n",
"===================================\n",
"\n",
"Central to all neural networks in PyTorch is the ``autograd`` package.\n",
"Let’s first briefly visit this, and we will then go on to training our\n",
"first neural network.\n",
"\n",
"\n",
"The ``autograd`` package provides automatic differentiation for all operations\n",
"on Tensors. It is a define-by-run framework, which means that your backprop is\n",
"defined by how your code is run, and that every single iteration can be\n",
"different.\n",
"\n",
"Let us see this in simpler terms with some examples.\n",
"\n",
"Tensor\n",
"--------\n",
"\n",
"``torch.Tensor`` is the central class of the package. If you set its attribute\n",
"``.requires_grad`` as ``True``, it starts to track all operations on it. When\n",
"you finish your computation you can call ``.backward()`` and have all the\n",
"gradients computed automatically. The gradient for this tensor will be\n",
"accumulated into the ``.grad`` attribute.\n",
"\n",
"To stop a tensor from tracking history, you can call ``.detach()`` to detach\n",
"it from the computation history and to prevent future computation from being\n",
"tracked.\n",
"\n",
"To prevent tracking history (and using memory), you can also wrap the code block\n",
"in ``with torch.no_grad():``. This can be particularly helpful when evaluating a\n",
"model because the model may have trainable parameters with\n",
"``requires_grad=True``, but for which we don't need the gradients.\n",
"\n",
"There’s one more class which is very important for the autograd\n",
"implementation - a ``Function``.\n",
"\n",
"``Tensor`` and ``Function`` are interconnected and build up an acyclic\n",
"graph that encodes a complete history of computation. Each tensor has\n",
"a ``.grad_fn`` attribute that references a ``Function`` that has created\n",
"the ``Tensor`` (except for Tensors created by the user - their\n",
"``grad_fn is None``).\n",
"\n",
"If you want to compute the derivatives, you can call ``.backward()`` on\n",
"a ``Tensor``. If the ``Tensor`` is a scalar (i.e. it holds a single element\n",
"of data), you don’t need to specify any arguments to ``backward()``;\n",
"however, if it has more elements, you need to specify a ``gradient``\n",
"argument that is a tensor of matching shape.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a tensor and set ``requires_grad=True`` to track computation with it:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.ones(2, 2, requires_grad=True)\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do a tensor operation:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y = x + 2\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"``y`` was created as a result of an operation, so it has a ``grad_fn``.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(y.grad_fn)"
]
},
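{
"cell_type": "markdown",
"metadata": {},
"source": [
"By contrast, ``x`` was created directly by the user, so its ``grad_fn`` is\n",
"``None``. A quick check (this small cell is an added illustration of the note\n",
"above, not an additional tutorial step):\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Added illustration: user-created (leaf) tensors have no grad_fn.\n",
"print(x.grad_fn)  # None - x was created by the user\n",
"print(y.grad_fn)  # a Function object - y resulted from the operation x + 2"
]
},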
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do more operations on ``y``:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z = y * y * 3\n",
"out = z.mean()\n",
"\n",
"print(z, out)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``\n",
"flag in-place. The input flag defaults to ``False`` if not given.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = torch.randn(2, 2)\n",
"a = ((a * 3) / (a - 1))\n",
"print(a.requires_grad)\n",
"a.requires_grad_(True)\n",
"print(a.requires_grad)\n",
"b = (a * a).sum()\n",
"print(b.grad_fn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Gradients\n",
"---------\n",
"Let's backprop now.\n",
"Because ``out`` contains a single scalar, ``out.backward()`` is\n",
"equivalent to ``out.backward(torch.tensor(1.))``.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"out.backward()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Print the gradients d(out)/dx:\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should have gotten a matrix filled with ``4.5``. Let’s call the ``out``\n",
"*Tensor* “$o$”.\n",
"We have that $o = \\frac{1}{4}\\sum_i z_i$,\n",
"$z_i = 3(x_i+2)^2$ and $z_i\\bigr\\rvert_{x_i=1} = 27$.\n",
"Therefore,\n",
"$\\frac{\\partial o}{\\partial x_i} = \\frac{1}{4}\\cdot 6(x_i+2) = \\frac{3}{2}(x_i+2)$, hence\n",
"$\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=1} = \\frac{9}{2} = 4.5$.\n",
"\n"
]
},
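{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recall that gradients are *accumulated* into ``.grad`` rather than overwritten,\n",
"so repeated ``backward()`` calls add up. The following cell is a small added\n",
"sketch of that behavior; the tensor ``w`` is introduced here only for\n",
"illustration. In training loops you typically zero the gradients before each\n",
"backward pass:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Added sketch: gradients accumulate in .grad across backward() calls.\n",
"w = torch.ones(2, 2, requires_grad=True)\n",
"\n",
"(w * w).sum().backward()\n",
"print(w.grad)  # d/dw of sum(w**2) = 2*w -> all entries 2.0\n",
"\n",
"(w * w).sum().backward()\n",
"print(w.grad)  # accumulated with the previous call -> all entries 4.0\n",
"\n",
"w.grad.zero_()  # reset the accumulated gradients before the next backward pass\n",
"print(w.grad)"
]
},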
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mathematically, if you have a vector-valued function $\\vec{y}=f(\\vec{x})$,\n",
"then the gradient of $\\vec{y}$ with respect to $\\vec{x}$\n",
"is a Jacobian matrix:\n",
"\n",
"\\begin{align}J=\\left(\\begin{array}{ccc}\n",
" \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{1}}{\\partial x_{n}}\\\\\n",
" \\vdots & \\ddots & \\vdots\\\\\n",
" \\frac{\\partial y_{m}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}}\n",
" \\end{array}\\right)\\end{align}\n",
"\n",
"Generally speaking, ``torch.autograd`` is an engine for computing\n",
"vector-Jacobian products. That is, given any vector\n",
"$v=\\left(\\begin{array}{cccc} v_{1} & v_{2} & \\cdots & v_{m}\\end{array}\\right)^{T}$,\n",
"compute the product $v^{T}\\cdot J$. If $v$ happens to be\n",
"the gradient of a scalar function $l=g\\left(\\vec{y}\\right)$,\n",
"that is,\n",
"$v=\\left(\\begin{array}{ccc}\\frac{\\partial l}{\\partial y_{1}} & \\cdots & \\frac{\\partial l}{\\partial y_{m}}\\end{array}\\right)^{T}$,\n",
"then by the chain rule, the vector-Jacobian product would be the\n",
"gradient of $l$ with respect to $\\vec{x}$:\n",
"\n",
"\\begin{align}J^{T}\\cdot v=\\left(\\begin{array}{ccc}\n",
" \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{1}}\\\\\n",
" \\vdots & \\ddots & \\vdots\\\\\n",
" \\frac{\\partial y_{1}}{\\partial x_{n}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}}\n",
" \\end{array}\\right)\\left(\\begin{array}{c}\n",
" \\frac{\\partial l}{\\partial y_{1}}\\\\\n",
" \\vdots\\\\\n",
" \\frac{\\partial l}{\\partial y_{m}}\n",
" \\end{array}\\right)=\\left(\\begin{array}{c}\n",
" \\frac{\\partial l}{\\partial x_{1}}\\\\\n",
" \\vdots\\\\\n",
" \\frac{\\partial l}{\\partial x_{n}}\n",
" \\end{array}\\right)\\end{align}\n",
"\n",
"(Note that $v^{T}\\cdot J$ gives a row vector which can be\n",
"treated as a column vector by taking $J^{T}\\cdot v$.)\n",
"\n",
"This characteristic of the vector-Jacobian product makes it very\n",
"convenient to feed external gradients into a model that has a\n",
"non-scalar output.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's take a look at an example of a vector-Jacobian product:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.randn(3, requires_grad=True)\n",
"\n",
"y = x * 2\n",
"while y.data.norm() < 1000:\n",
"    y = y * 2\n",
"\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now in this case ``y`` is no longer a scalar. ``torch.autograd``\n",
"cannot compute the full Jacobian directly, but if we just\n",
"want the vector-Jacobian product, we can simply pass the vector to\n",
"``backward`` as an argument:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)\n",
"y.backward(v)\n",
"\n",
"print(x.grad)"
]
},
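{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since ``y`` here is just ``x`` scaled elementwise by a constant factor, the\n",
"Jacobian is diagonal and ``x.grad`` should equal ``v`` times that factor. The\n",
"following is a small added sanity-check sketch; the ``scale`` tensor is\n",
"introduced only for this check:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Added sketch: verify the vector-Jacobian product for this diagonal Jacobian.\n",
"scale = (y / x).detach()  # the constant factor 2**k applied by the loop above\n",
"print(scale)\n",
"print(torch.allclose(x.grad, v * scale))  # expected: True"
]
},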
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also stop autograd from tracking history on Tensors\n",
"with ``.requires_grad=True`` either by wrapping the code block in\n",
"``with torch.no_grad():``\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(x.requires_grad)\n",
"print((x ** 2).requires_grad)\n",
"\n",
"with torch.no_grad():\n",
"    print((x ** 2).requires_grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or by using ``.detach()`` to get a new Tensor with the same\n",
"content but that does not require gradients:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(x.requires_grad)\n",
"y = x.detach()\n",
"print(y.requires_grad)\n",
"print(x.eq(y).all())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Read Later:**\n",
"\n",
"Documentation for ``autograd.Function`` is at\n",
"https://pytorch.org/docs/stable/autograd.html#function\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 1
}