{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "spread-happiness", "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import matplotlib.ticker as ticker\n", "from IPython.display import Markdown, display, HTML\n", "\n", "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "\n", "# Fix the dying kernel problem (only a problem in some installations - you can remove it, if it works without it)\n", "import os\n", "os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'" ] }, { "cell_type": "markdown", "id": "approximate-classic", "metadata": {}, "source": [ "# PyTorch\n", "\n", "Here's your best friend when working with PyTorch: https://pytorch.org/docs/stable/index.html.\n", "\n", "The beginning of this notebook shows that PyTorch tensors can be used exactly like numpy arrays. Later in the notebook additional features of tensors will be presented." ] }, { "cell_type": "markdown", "id": "renewable-chase", "metadata": {}, "source": [ "## Creating PyTorch tensors" ] }, { "cell_type": "markdown", "id": "afraid-consortium", "metadata": {}, "source": [ "### Directly" ] }, { "cell_type": "code", "execution_count": 2, "id": "textile-mainland", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 2. 3.]\n", " [4. 5. 6.]\n", " [7. 8. 
9.]]\n", "\n", "tensor([[1., 2., 3.],\n", " [4., 5., 6.],\n", " [7., 8., 9.]])\n" ] } ], "source": [ "a = np.array(\n", " [[1.0, 2.0, 3.0], \n", " [4.0, 5.0, 6.0], \n", " [7.0, 8.0, 9.0]]\n", ")\n", "\n", "print(a)\n", "print()\n", "\n", "t = torch.tensor(\n", " [[1.0, 2.0, 3.0], \n", " [4.0, 5.0, 6.0], \n", " [7.0, 8.0, 9.0]]\n", ")\n", "\n", "print(t)" ] }, { "cell_type": "markdown", "id": "floating-junior", "metadata": {}, "source": [ "### From a list" ] }, { "cell_type": "code", "execution_count": 3, "id": "reasonable-mistress", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]\n", "\n", "[[1. 2. 3.]\n", " [4. 5. 6.]\n", " [7. 8. 9.]]\n", "\n", "tensor([[1., 2., 3.],\n", " [4., 5., 6.],\n", " [7., 8., 9.]])\n" ] } ], "source": [ "l = [[1.0, 2.0, 3.0], \n", " [4.0, 5.0, 6.0], \n", " [7.0, 8.0, 9.0]]\n", "\n", "print(l)\n", "print()\n", "\n", "a = np.array(l)\n", "print(a)\n", "print()\n", "\n", "t = torch.tensor(l)\n", "print(t)" ] }, { "cell_type": "markdown", "id": "incorrect-practitioner", "metadata": {}, "source": [ "### From a list comprehension" ] }, { "cell_type": "code", "execution_count": 4, "id": "straight-cooling", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]\n", "\n", "[ 0 1 4 9 16 25 36 49 64 81]\n", "\n", "tensor([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])\n" ] } ], "source": [ "a = [i**2 for i in range(10)]\n", "\n", "print(a)\n", "print()\n", "print(np.array(a))\n", "print()\n", "print(torch.tensor(a))" ] }, { "cell_type": "markdown", "id": "enormous-drink", "metadata": {}, "source": [ "### From a numpy array" ] }, { "cell_type": "code", "execution_count": 5, "id": "parental-judges", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1., 2., 3.],\n", " [4., 5., 6.],\n", " [7., 8., 9.]], dtype=torch.float64)\n" ] } ], "source": [ "a = np.array(\n", 
" [[1.0, 2.0, 3.0], \n", " [4.0, 5.0, 6.0], \n", " [7.0, 8.0, 9.0]]\n", ")\n", "\n", "t = torch.tensor(a)\n", "\n", "print(t)" ] }, { "cell_type": "markdown", "id": "suffering-myanmar", "metadata": {}, "source": [ "### Ready-made functions in PyTorch" ] }, { "cell_type": "code", "execution_count": 6, "id": "expensive-bowling", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All zeros\n", "tensor([[0., 0., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [0., 0., 0., 0.]])\n", "\n", "All chosen value (variant 1)\n", "tensor([[7., 7., 7., 7.],\n", " [7., 7., 7., 7.],\n", " [7., 7., 7., 7.]])\n", "\n", "All chosen value (variant 2)\n", "tensor([[7., 7., 7., 7.],\n", " [7., 7., 7., 7.],\n", " [7., 7., 7., 7.]])\n", "\n", "Random integers\n", "[[6 6]\n", " [8 9]\n", " [1 0]]\n", "\n", "tensor([[9, 5],\n", " [9, 3],\n", " [3, 8]])\n", "\n", "Random values from the normal distribution\n", "[[ -5.34346728 0.97207777]\n", " [ -7.26648922 -12.2890286 ]\n", " [ -2.68082928 10.95819034]]\n", "\n", "tensor([[ 1.1231, -5.9980],\n", " [20.4600, -6.4359],\n", " [-6.6826, -0.4491]])\n" ] } ], "source": [ "# All zeros\n", "a = torch.zeros((3, 4))\n", "print(\"All zeros\")\n", "print(a)\n", "print()\n", "\n", "# All a chosen value\n", "a = torch.full((3, 4), 7.0)\n", "print(\"All chosen value (variant 1)\")\n", "print(a)\n", "print()\n", "\n", "# or\n", "\n", "a = torch.zeros((3, 4))\n", "a[:] = 7.0\n", "print(\"All chosen value (variant 2)\")\n", "print(a)\n", "print()\n", "\n", "# Random integers\n", "\n", "print(\"Random integers\")\n", "a = np.random.randint(low=0, high=10, size=(3, 2))\n", "print(a)\n", "print()\n", "a = torch.randint(low=0, high=10, size=(3, 2))\n", "print(a)\n", "print()\n", "\n", "# Random values from the normal distribution (Gaussian)\n", "\n", "print(\"Random values from the normal distribution\")\n", "a = np.random.normal(loc=0, scale=10, size=(3, 2))\n", "print(a)\n", "print()\n", "a = torch.normal(mean=0, std=10, size=(3, 
2))\n", "print(a)" ] }, { "cell_type": "markdown", "id": "aggressive-titanium", "metadata": {}, "source": [ "## Slicing PyTorch tensors" ] }, { "cell_type": "markdown", "id": "former-richardson", "metadata": {}, "source": [ "### Slicing in 1D\n", "\n", "To obtain only specific values from a PyTorch tensor, use slicing. It has the form\n", "\n", "**arr[low:high:step]**\n", "\n", "where low is the first index to be retrieved, high is the first index not to be retrieved, and step is the stride: every step-th element between low and high is taken." ] }, { "cell_type": "code", "execution_count": 7, "id": "desirable-documentary", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original: tensor([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])\n", "First 5 elements: tensor([ 0, 1, 4, 9, 16])\n", "Elements from index 3 to index 5: tensor([ 9, 16, 25])\n", "Last 3 elements (negative indexing): tensor([49, 64, 81])\n", "Every second element: tensor([ 0, 4, 16, 36, 64])\n", "Negative step a[::-1] to obtain reverse order does not work for tensors\n" ] } ], "source": [ "a = torch.tensor([i**2 for i in range(10)])\n", "\n", "print(\"Original: \", a)\n", "print(\"First 5 elements:\", a[:5])\n", "print(\"Elements from index 3 to index 5:\", a[3:6])\n", "print(\"Last 3 elements (negative indexing):\", a[-3:])\n", "print(\"Every second element:\", a[::2])\n", "\n", "print(\"Negative step a[::-1] to obtain reverse order does not work for tensors\")" ] }, { "cell_type": "markdown", "id": "micro-explosion", "metadata": {}, "source": [ "### Slicing in 2D\n", "\n", "Slicing works the same way in two dimensions, with a separate slice for each dimension." 
] }, { "cell_type": "code", "execution_count": 8, "id": "disciplinary-think", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original: \n", "tensor([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19],\n", " [20, 21, 22, 23, 24]])\n", "\n", "First 2 elements of the first 3 rows:\n", "tensor([[ 0, 1],\n", " [ 5, 6],\n", " [10, 11]])\n", "\n", "Middle 3 elements from the middle 3 rows:\n", "tensor([[ 6, 7, 8],\n", " [11, 12, 13],\n", " [16, 17, 18]])\n", "\n", "Bottom-right 3 by 3 submatrix (negative indexing):\n", "tensor([[12, 13, 14],\n", " [17, 18, 19],\n", " [22, 23, 24]])\n" ] } ], "source": [ "a = torch.tensor([i for i in range(25)]).reshape(5, 5)\n", "\n", "print(\"Original: \")\n", "print(a)\n", "print()\n", "print(\"First 2 elements of the first 3 rows:\")\n", "print(a[:3, :2])\n", "print()\n", "print(\"Middle 3 elements from the middle 3 rows:\")\n", "print(a[1:4, 1:4])\n", "print()\n", "print(\"Bottom-right 3 by 3 submatrix (negative indexing):\")\n", "print(a[-3:, -3:])" ] }, { "cell_type": "markdown", "id": "removable-canyon", "metadata": {}, "source": [ "### Setting PyTorch tensor field values" ] }, { "cell_type": "code", "execution_count": 9, "id": "senior-serbia", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original: \n", "tensor([[ 0, 1, 2, 3, 4],\n", " [ 5, 6, 7, 8, 9],\n", " [10, 11, 12, 13, 14],\n", " [15, 16, 17, 18, 19],\n", " [20, 21, 22, 23, 24]])\n", "\n", "Middle values changed to 5\n", "tensor([[ 0, 1, 2, 3, 4],\n", " [ 5, 5, 5, 5, 9],\n", " [10, 5, 5, 5, 14],\n", " [15, 5, 5, 5, 19],\n", " [20, 21, 22, 23, 24]])\n", "\n", "Second matrix\n", "tensor([[ 0, 0, 2],\n", " [ 6, 12, 20],\n", " [30, 42, 56]])\n", "\n", "Second matrix substituted into the middle of the first matrix\n", "tensor([[ 0, 1, 2, 3, 4],\n", " [ 5, 0, 0, 2, 9],\n", " [10, 6, 12, 20, 14],\n", " [15, 30, 42, 56, 19],\n", " [20, 21, 22, 23, 24]])\n" ] } ], 
"source": [ "a = torch.tensor([i for i in range(25)]).reshape(5, 5)\n", "\n", "print(\"Original: \")\n", "print(a)\n", "print()\n", "\n", "a[1:4, 1:4] = 5.0\n", "\n", "print(\"Middle values changed to 5\")\n", "print(a)\n", "print()\n", "\n", "b = torch.tensor([i**2 - i for i in range(9)]).reshape(3, 3)\n", "\n", "print(\"Second matrix\")\n", "print(b)\n", "print()\n", "\n", "a[1:4, 1:4] = b\n", "\n", "print(\"Second matrix substituted into the middle of the first matrix\")\n", "print(a)" ] }, { "cell_type": "markdown", "id": "federal-wayne", "metadata": {}, "source": [ "## Operations on PyTorch tensors\n", "\n", "It is important to remember that arithmetic operations on PyTorch tensors are always element-wise." ] }, { "cell_type": "code", "execution_count": 10, "id": "southwest-biotechnology", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 0, 1, 4],\n", " [ 9, 16, 25],\n", " [36, 49, 64]])\n", "\n", "tensor([[0.0000, 1.0000, 1.4142],\n", " [1.7321, 2.0000, 2.2361],\n", " [2.4495, 2.6458, 2.8284]])\n", "\n" ] } ], "source": [ "a = torch.tensor([i**2 for i in range(9)]).reshape((3, 3))\n", "print(a)\n", "print()\n", "\n", "b = torch.tensor([i**0.5 for i in range(9)]).reshape((3, 3))\n", "print(b)\n", "print()" ] }, { "cell_type": "markdown", "id": "intensive-gates", "metadata": {}, "source": [ "### Element-wise sum" ] }, { "cell_type": "code", "execution_count": 11, "id": "behavioral-safety", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 0.0000, 2.0000, 5.4142],\n", " [10.7321, 18.0000, 27.2361],\n", " [38.4495, 51.6458, 66.8284]])\n" ] } ], "source": [ "print(a + b)" ] }, { "cell_type": "markdown", "id": "occupied-trial", "metadata": {}, "source": [ "### Element-wise multiplication" ] }, { "cell_type": "code", "execution_count": 12, "id": "charming-pleasure", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 0.0000, 1.0000, 
5.6569],\n", " [ 15.5885, 32.0000, 55.9017],\n", " [ 88.1816, 129.6418, 181.0193]])\n" ] } ], "source": [ "print(a * b)" ] }, { "cell_type": "markdown", "id": "efficient-league", "metadata": {}, "source": [ "### Matrix multiplication" ] }, { "cell_type": "code", "execution_count": 13, "id": "changing-community", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 11.5300, 12.5830, 13.5498],\n", " [ 88.9501, 107.1438, 119.2157],\n", " [241.6378, 303.3281, 341.4984]], dtype=torch.float64)\n", "\n", "tensor([[ 0., 1., 4.],\n", " [ 9., 16., 25.],\n", " [36., 49., 64.]])\n" ] } ], "source": [ "print(np.matmul(a, b))\n", "print()\n", "\n", "# Multiplication by the identity matrix (to check it works as expected)\n", "id_matrix = torch.tensor(\n", " [[1.0, 0.0, 0.0], \n", " [0.0, 1.0, 0.0], \n", " [0.0, 0.0, 1.0]]\n", ")\n", "\n", "# Tensor a contained integers (type Long by default) and must be changed to the float type\n", "a = a.type(torch.FloatTensor)\n", "\n", "print(torch.matmul(id_matrix, a))" ] }, { "cell_type": "markdown", "id": "assisted-communications", "metadata": {}, "source": [ "### Calculating the mean" ] }, { "cell_type": "code", "execution_count": 14, "id": "defensive-wrong", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([3, 8, 7, 2, 6])\n", "\n", "Mean: tensor(5.2000)\n", "\n", "Mean: 5.199999809265137\n" ] } ], "source": [ "a = torch.randint(low=0, high=10, size=(5,))\n", "\n", "print(a)\n", "print()\n", "\n", "print(\"Mean: \", torch.sum(a) / len(a))\n", "print()\n", "\n", "# To get a single value use tensor.item()\n", "\n", "print(\"Mean: \", (torch.sum(a) / len(a)).item())" ] }, { "cell_type": "markdown", "id": "complex-karma", "metadata": {}, "source": [ "### Calculating the mean of every row" ] }, { "cell_type": "code", "execution_count": 15, "id": "correct-dietary", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1, 
6, 8],\n", " [6, 4, 8],\n", " [1, 5, 8],\n", " [2, 5, 7],\n", " [1, 0, 4]])\n", "\n", "Mean: tensor([5.0000, 6.0000, 4.6667, 4.6667, 1.6667])\n", "Mean in the original matrix form:\n", "tensor([[5.0000],\n", " [6.0000],\n", " [4.6667],\n", " [4.6667],\n", " [1.6667]])\n" ] } ], "source": [ "a = torch.randint(low=0, high=10, size=(5, 3))\n", "\n", "print(a)\n", "print()\n", "\n", "print(\"Mean:\", torch.sum(a, axis=1) / a.shape[1])\n", "\n", "print(\"Mean in the original matrix form:\")\n", "print((torch.sum(a, axis=1) / a.shape[1]).reshape(-1, 1)) # -1 calculates the right size to use all elements" ] }, { "cell_type": "markdown", "id": "indian-orlando", "metadata": {}, "source": [ "### More complex operations\n", "\n", "Note that more complex tensor operations can only be performed on tensors. Numpy operations can be performed on numpy arrays but also directly on lists." ] }, { "cell_type": "code", "execution_count": 16, "id": "presidential-cologne", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vector to power 2 (element-wise)\n", "tensor([1., 4., 9.])\n", "\n", "Euler number to the power a (element-wise)\n", "tensor([ 2.7183, 7.3891, 20.0855])\n", "\n", "An even more complex expression\n", "tensor([0.6197, 1.8982, 4.8476])\n" ] } ], "source": [ "a = torch.tensor([1.0, 2.0, 3.0])\n", "\n", "print(\"Vector to power 2 (element-wise)\")\n", "print(torch.pow(a, 2))\n", "print()\n", "print(\"Euler number to the power a (element-wise)\")\n", "print(torch.exp(a))\n", "print()\n", "print(\"An even more complex expression\")\n", "print((torch.pow(a, 2) + torch.exp(a)) / torch.sum(a))" ] }, { "cell_type": "markdown", "id": "hearing-street", "metadata": {}, "source": [ "## PyTorch basic operations tasks" ] }, { "cell_type": "markdown", "id": "regular-niger", "metadata": {}, "source": [ "**Task 1.** Calculate the sigmoid (logistic) function on every element of the following array [0.3, 1.2, -1.4, 0.2, -0.1, 0.1, 0.8, -0.25] and print the 
last 5 elements. Use only tensor operations." ] }, { "cell_type": "code", "execution_count": 17, "id": "agreed-single", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "another-catch", "metadata": {}, "source": [ "**Task 2.** Calculate the dot product of the following two vectors:
\n", "$x = [3, 1, 4, 2, 6, 1, 4, 8]$
\n", "$y = [5, 2, 3, 12, 2, 4, 17, 9]$
\n", "a) by using element-wise multiplication and torch.sum,
\n", "b) by using torch.dot,
\n", "c) by using torch.matmul and transposition (x.T)." ] }, { "cell_type": "code", "execution_count": 18, "id": "forbidden-journalism", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "acute-amber", "metadata": {}, "source": [ "**Task 3.** Calculate the following expression
\n", "$$\\frac{1}{1 + e^{-x_0 \\theta_0 - \\ldots - x_9 \\theta_9 - \\theta_{10}}}$$\n", "for
\n", "$x = [1.2, 2.3, 3.4, -0.7, 4.2, 2.7, -0.5, 1.4, -3.3, 0.2]$
\n", "$\\theta = [1.7, 0.33, -2.12, -1.73, 2.9, -5.8, -0.9, 12.11, 3.43, -0.5, -1.65]$
\n", "and print the result. Use only tensor operations." ] }, { "cell_type": "code", "execution_count": 19, "id": "falling-holder", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "latter-vector", "metadata": {}, "source": [ "# Tensor gradients\n", "\n", "Tensors are designed to be used in neural networks. Their most important functionality is automatic gradient and backward propagation calculation." ] }, { "cell_type": "code", "execution_count": 20, "id": "guided-interface", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "out=35.0\n", "\n", "gradient\n", "tensor([[12., 3.],\n", " [27., 3.]])\n" ] } ], "source": [ "x = torch.tensor([[2., -1.], [3., 1.]], requires_grad=True)\n", "out = x.pow(3).sum() # the actual derivative is 3*x^2\n", "print(\"out={}\".format(out))\n", "print()\n", "\n", "out.backward()\n", "print(\"gradient\")\n", "print(x.grad)" ] }, { "cell_type": "code", "execution_count": 21, "id": "nuclear-gothic", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 4., 2., -1.]])\n", "tensor([[ 2., -1., 3.]])\n", "tensor([[ 0.1807, 0.0904, -0.0452]])\n", "tensor([[ 0.0904, -0.0452, 0.1355]])\n" ] } ], "source": [ "x = torch.tensor([[2., -1., 3.]], requires_grad=True)\n", "y = torch.tensor([[4., 2., -1.]], requires_grad=True)\n", "\n", "z = torch.sum(x * y)\n", "\n", "z.backward()\n", "print(x.grad)\n", "print(y.grad)\n", "\n", "x.grad.data.zero_()\n", "y.grad.data.zero_()\n", "\n", "z = torch.sigmoid(torch.sum(x * y))\n", "\n", "z.backward()\n", "print(x.grad)\n", "print(y.grad)" ] }, { "cell_type": "markdown", "id": "innovative-provider", "metadata": {}, "source": [ "# Backpropagation\n", "\n", "In this section we train weights $w$ of a simple model $y = \\text{sigmoid}(w * x)$ to obtain $y = 0.65$ on $x = [2.0, -1.0, 3.0]$." 
] }, { "cell_type": "code", "execution_count": 22, "id": "supposed-sellers", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x\n", "tensor([ 2., -1., 3.])\n", "x.grad\n", "None\n", "w\n", "tensor([ 4., 2., -1.], requires_grad=True)\n", "w.grad\n", "None\n", "\n", "\n", "w\n", "tensor([ 3.9945, 2.0027, -1.0082], requires_grad=True)\n", "w.grad\n", "tensor([ 0.0547, -0.0273, 0.0820])\n", "y\n", "tensor(0.9526, grad_fn=)\n", "loss\n", "tensor(0.0916, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.9889, 2.0055, -1.0166], requires_grad=True)\n", "w.grad\n", "tensor([ 0.0563, -0.0281, 0.0844])\n", "y\n", "tensor(0.9508, grad_fn=)\n", "loss\n", "tensor(0.0905, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.9831, 2.0084, -1.0253], requires_grad=True)\n", "w.grad\n", "tensor([ 0.0579, -0.0290, 0.0869])\n", "y\n", "tensor(0.9489, grad_fn=)\n", "loss\n", "tensor(0.0894, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.6599, 2.1701, -1.5102], requires_grad=True)\n", "w.grad\n", "tensor([ 6.1291e-06, -3.0645e-06, 9.1936e-06])\n", "y\n", "tensor(0.6500, grad_fn=)\n", "loss\n", "tensor(4.5365e-11, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.6599, 2.1701, -1.5102], requires_grad=True)\n", "w.grad\n", "tensor([ 5.0985e-06, -2.5493e-06, 7.6478e-06])\n", "y\n", "tensor(0.6500, grad_fn=)\n", "loss\n", "tensor(3.1392e-11, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.6599, 2.1701, -1.5102], requires_grad=True)\n", "w.grad\n", "tensor([ 4.4477e-06, -2.2238e-06, 6.6715e-06])\n", "y\n", "tensor(0.6500, grad_fn=)\n", "loss\n", "tensor(2.3888e-11, grad_fn=)\n", "\n", "\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "x = torch.tensor([2., -1., 3.], requires_grad=False)\n", "w = torch.tensor([4., 2., -1.], requires_grad=True)\n", "y_target = 0.65\n", "\n", "print(\"x\")\n", "print(x)\n", "print(\"x.grad\")\n", "print(x.grad)\n", "print(\"w\")\n", "print(w)\n", "print(\"w.grad\")\n", "print(w.grad)\n", "print()\n", "print()\n", "\n", "optimizer = optim.SGD([w], lr=0.1)\n", "\n", "losses = []\n", "n_epochs = 100\n", "for epoch in range(n_epochs):\n", "\n", " optimizer.zero_grad()\n", " y = torch.sigmoid(torch.sum(x * w))\n", " loss = torch.pow(y - y_target, 2)\n", " loss.backward()\n", " losses.append(loss.item())\n", " optimizer.step()\n", "\n", " if epoch < 3 or epoch > 96:\n", " print(\"w\")\n", " print(w)\n", " print(\"w.grad\")\n", " print(w.grad)\n", " print(\"y\")\n", " print(y)\n", " print(\"loss\")\n", " print(loss)\n", " print()\n", " print()\n", " \n", "sns.lineplot(x=np.arange(n_epochs), y=losses).set_title('Training loss')\n", "plt.xlabel(\"epoch\")\n", "plt.ylabel(\"loss\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "addressed-anxiety", "metadata": {}, "source": [ "# Proper PyTorch model with a fully-connected layer\n", "\n", "A fully-connected layer is represented by torch.nn.Linear. 
Its parameters are:\n", " - in_features - the number of input neurons,\n", " - out_features - the number of output neurons,\n", " - bias - boolean if bias should be included.\n", " \n", "Documentation: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html" ] }, { "cell_type": "code", "execution_count": 23, "id": "lovely-wesley", "metadata": {}, "outputs": [], "source": [ "class FullyConnectedNetworkModel(nn.Module):\n", " def __init__(self, seed):\n", " super().__init__()\n", "\n", " self.seed = torch.manual_seed(seed)\n", "\n", " self.fc = nn.Linear(3, 1, bias=False)\n", "\n", " self.fc.weight.data = torch.tensor([4., 2., -1.], requires_grad=True)\n", "\n", " def forward(self, x):\n", " x = torch.sigmoid(self.fc(x))\n", "\n", " return x" ] }, { "cell_type": "code", "execution_count": 24, "id": "hourly-apollo", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "w\n", "tensor([ 3.9945, 2.0027, -1.0082])\n", "w.grad\n", "tensor([ 0.0547, -0.0273, 0.0820])\n", "y\n", "tensor(0.9526, grad_fn=)\n", "loss\n", "tensor(0.0916, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.9889, 2.0055, -1.0166])\n", "w.grad\n", "tensor([ 0.0563, -0.0281, 0.0844])\n", "y\n", "tensor(0.9508, grad_fn=)\n", "loss\n", "tensor(0.0905, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.9831, 2.0084, -1.0253])\n", "w.grad\n", "tensor([ 0.0579, -0.0290, 0.0869])\n", "y\n", "tensor(0.9489, grad_fn=)\n", "loss\n", "tensor(0.0894, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.6599, 2.1701, -1.5102])\n", "w.grad\n", "tensor([ 6.1291e-06, -3.0645e-06, 9.1936e-06])\n", "y\n", "tensor(0.6500, grad_fn=)\n", "loss\n", "tensor(4.5365e-11, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.6599, 2.1701, -1.5102])\n", "w.grad\n", "tensor([ 5.0985e-06, -2.5493e-06, 7.6478e-06])\n", "y\n", "tensor(0.6500, grad_fn=)\n", "loss\n", "tensor(3.1392e-11, grad_fn=)\n", "\n", "\n", "w\n", "tensor([ 3.6599, 2.1701, -1.5102])\n", "w.grad\n", "tensor([ 4.4477e-06, -2.2238e-06, 6.6715e-06])\n", 
"y\n", "tensor(0.6500, grad_fn=)\n", "loss\n", "tensor(2.3888e-11, grad_fn=)\n", "\n", "\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "x = torch.tensor([2., -1., 3.])\n", "y_target = 0.65\n", "\n", "fc_neural_net = FullyConnectedNetworkModel(seed=6789)\n", "\n", "optimizer = optim.SGD(fc_neural_net.parameters(), lr=0.1)\n", "\n", "losses = []\n", "n_epochs = 100\n", "for epoch in range(n_epochs):\n", "\n", " optimizer.zero_grad()\n", " y = fc_neural_net(x)\n", " loss = torch.pow(y - y_target, 2)\n", " loss.backward()\n", " losses.append(loss.item())\n", " optimizer.step()\n", " \n", " if epoch < 3 or epoch > 96:\n", " print(\"w\")\n", " print(fc_neural_net.fc.weight.data)\n", " print(\"w.grad\")\n", " print(next(fc_neural_net.parameters()).grad)\n", " print(\"y\")\n", " print(y)\n", " print(\"loss\")\n", " print(loss)\n", " print()\n", " print()\n", " \n", "sns.lineplot(x=np.arange(n_epochs), y=losses).set_title('Training loss')\n", "plt.xlabel(\"epoch\")\n", "plt.ylabel(\"loss\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "breeding-sailing", "metadata": {}, "source": [ "# Embedding layer\n", "\n", "An embedding layer is represented by torch.nn.Embedding. Its main parameters are:\n", " - num_embeddings - the number of ids to embed,\n", " - embedding_dim - the dimension of the embedding vector.\n", " \n", "Documentation: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html\n", "\n", "In the example below we will have 3 movies and 3 users. The movies have already trained representations:\n", " - $m0 = [0.6, 0.4, -0.2]$\n", " - $m1 = [-0.7, 0.8, -0.7]$\n", " - $m2 = [0.8, -0.75, 0.9]$\n", "where the three dimensions represent: level of violence, positive message, foul language.\n", "\n", "We want to find user embeddings so that:\n", " - user 0 likes movie 0 and dislikes movie 1 and 2,\n", " - user 1 likes movie 1 and dislikes movie 0 and 2,\n", " - user 2 likes movie 2 and dislikes movie 0 and 1." 
] }, { "cell_type": "code", "execution_count": 25, "id": "posted-performer", "metadata": {}, "outputs": [], "source": [ "class EmbeddingNetworkModel(nn.Module):\n", " def __init__(self, seed):\n", " super().__init__()\n", "\n", " self.seed = torch.manual_seed(seed)\n", "\n", " self.embedding = nn.Embedding(3, 3)\n", "\n", " def forward(self, x):\n", " user_id = x[0]\n", " item_repr = x[1]\n", " \n", " y = self.embedding(user_id) * item_repr\n", " y = torch.sum(y)\n", " y = torch.sigmoid(y)\n", "\n", " return y" ] }, { "cell_type": "code", "execution_count": 26, "id": "pleased-distributor", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "user_ids = [torch.tensor(0), torch.tensor(1), torch.tensor(2)]\n", "items = [torch.tensor([0.6, 0.4, -0.2]), \n", " torch.tensor([-0.7, 0.8, -0.7]), \n", " torch.tensor([0.8, -0.75, 0.9])]\n", "responses = [1, 0, 0, 0, 1, 0, 0, 0, 1]\n", "data = [(user_ids[user_id], items[item_id]) for user_id in range(3) for item_id in range(3)]\n", "\n", "embedding_nn = EmbeddingNetworkModel(seed=6789)\n", "\n", "optimizer = optim.SGD(embedding_nn.parameters(), lr=0.1)\n", "\n", "losses = []\n", "n_epochs = 1000\n", "for epoch in range(n_epochs):\n", "\n", " optimizer.zero_grad()\n", " \n", " for i in range(len(data)):\n", " user_id = data[i][0]\n", " item_repr = data[i][1]\n", " \n", " y = embedding_nn((user_id, item_repr))\n", " if i == 0:\n", " loss = torch.pow(y - responses[i], 2)\n", " else:\n", " loss += torch.pow(y - responses[i], 2)\n", " \n", " for param in embedding_nn.parameters():\n", " loss += 1 / 5 * torch.norm(param)\n", " \n", " loss.backward()\n", " losses.append(loss.item())\n", " optimizer.step()\n", "\n", "sns.lineplot(x=np.arange(n_epochs), y=losses).set_title('Training loss')\n", "plt.xlabel(\"epoch\")\n", "plt.ylabel(\"loss\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 27, "id": "turkish-thinking", "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Embedding for user 0\n", "tensor([ 0.9887, 0.2676, -0.7881], grad_fn=)\n", "Representation for item 0\n", "tensor([ 0.6000, 0.4000, -0.2000])\n", "Score=0.7\n", "\n", "Embedding for user 0\n", "tensor([ 0.9887, 0.2676, -0.7881], grad_fn=)\n", "Representation for item 1\n", "tensor([-0.7000, 0.8000, -0.7000])\n", "Score=0.52\n", "\n", "Embedding for user 0\n", "tensor([ 0.9887, 0.2676, -0.7881], grad_fn=)\n", "Representation for item 2\n", "tensor([ 0.8000, -0.7500, 0.9000])\n", "Score=0.47\n", "\n", "Embedding for user 1\n", "tensor([-1.7678, 
0.1267, -0.4628], grad_fn=)\n", "Representation for item 0\n", "tensor([ 0.6000, 0.4000, -0.2000])\n", "Score=0.29\n", "\n", "Embedding for user 1\n", "tensor([-1.7678, 0.1267, -0.4628], grad_fn=)\n", "Representation for item 1\n", "tensor([-0.7000, 0.8000, -0.7000])\n", "Score=0.84\n", "\n", "Embedding for user 1\n", "tensor([-1.7678, 0.1267, -0.4628], grad_fn=)\n", "Representation for item 2\n", "tensor([ 0.8000, -0.7500, 0.9000])\n", "Score=0.13\n", "\n", "Embedding for user 2\n", "tensor([-0.2462, -1.4256, 1.1095], grad_fn=)\n", "Representation for item 0\n", "tensor([ 0.6000, 0.4000, -0.2000])\n", "Score=0.28\n", "\n", "Embedding for user 2\n", "tensor([-0.2462, -1.4256, 1.1095], grad_fn=)\n", "Representation for item 1\n", "tensor([-0.7000, 0.8000, -0.7000])\n", "Score=0.15\n", "\n", "Embedding for user 2\n", "tensor([-0.2462, -1.4256, 1.1095], grad_fn=)\n", "Representation for item 2\n", "tensor([ 0.8000, -0.7500, 0.9000])\n", "Score=0.87\n", "\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "id_pairs = [(user_id, item_id) for user_id in range(3) for item_id in range(3)]\n", "\n", "for id_pair in id_pairs:\n", " print(\"Embedding for user {}\".format(id_pair[0]))\n", " print(embedding_nn.embedding(user_ids[id_pair[0]]))\n", " print(\"Representation for item {}\".format(id_pair[1]))\n", " print(items[id_pair[1]])\n", " print(\"Score={}\".format(round(embedding_nn((user_ids[id_pair[0]], items[id_pair[1]])).item(), 2)))\n", " print()\n", " \n", "embeddings = pd.DataFrame(\n", " [\n", " ['user_0'] + embedding_nn.embedding(user_ids[0]).tolist(),\n", " ['user_1'] + embedding_nn.embedding(user_ids[1]).tolist(),\n", " ['user_2'] + embedding_nn.embedding(user_ids[2]).tolist(),\n", " ['item_0'] + items[0].tolist(),\n", " ['item_1'] + items[1].tolist(),\n", " ['item_2'] + items[2].tolist()\n", " \n", " ],\n", " columns=['entity', 'violence', 'positive message', 'language']\n", ")\n", "\n", "ax = sns.heatmap(embeddings.loc[:, ['violence', 'positive message', 'language']], annot=True)\n", "ax.yaxis.set_major_formatter(ticker.FixedFormatter(embeddings.loc[:, 'entity'].tolist()))\n", "plt.yticks(rotation=0)\n", "plt.show()\n", "\n", "ax = sns.scatterplot(data=embeddings, x='violence', y='positive message')\n", "for i in range(embeddings.shape[0]):\n", " x = embeddings['violence'][i]\n", " x = x + (-0.1 + 0.1 * -np.sign(x - np.mean(embeddings['violence'])))\n", " y = embeddings['positive message'][i]\n", " y = y + (-0.02 + 0.13 * -np.sign(y - np.mean(embeddings['positive message'])))\n", " plt.text(x=x, y=y, s=embeddings['entity'][i])\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "middle-newman", "metadata": {}, "source": [ "## PyTorch advanced operations tasks" ] }, { "cell_type": "markdown", "id": "manual-serial", "metadata": {}, "source": [ "**Task 4.** Calculate the derivative $f'(w)$ using PyTorch and backward propagation (the backword method of the Tensor 
class) for the following functions and points:\n", " - $f(w) = w^3 + w^2$ and $w = 2.0$,\n", " - $f(w) = \\text{sin}(w)$ and $w = \\pi$,\n", " - $f(w) = \\ln(w * e^{3w})$ and $w = 1.0$." ] }, { "cell_type": "code", "execution_count": 28, "id": "copyrighted-perry", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "frequent-sarah", "metadata": {}, "source": [ "**Task 5.** Calculate the derivative $\\frac{\\partial f}{\\partial w_1}(w_1, w_2, w_3)$ using PyTorch and backward propagation (the backward method of the Tensor class) for the following functions and points:\n", " - $f(w_1, w_2) = w_1^3 + w_1^2 + w_2$ and $(w_1, w_2) = (2.0, 3.0)$,\n", " - $f(w_1, w_2, w_3) = \\text{sin}(w_1) * w_2 + w_1^2 * w_3$ and $(w_1, w_2, w_3) = (\\pi, 2.0, 4.0)$,\n", " - $f(w_1, w_2, w_3) = e^{w_1^2 + w_2^2 + w_3^2} + w_1^2 + w_2^2 + w_3^2$ and $(w_1, w_2, w_3) = (0.5, 0.67, 0.55)$." ] }, { "cell_type": "code", "execution_count": 29, "id": "dietary-columbia", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] }, { "cell_type": "markdown", "id": "short-border", "metadata": {}, "source": [ "**Task 6*.** Train a neural network with:\n", " - two input neurons, \n", " - four hidden neurons with sigmoid activation in the first hidden layer,\n", " - four hidden neurons with sigmoid activation in the second hidden layer,\n", " - one output neuron without sigmoid activation \n", " \n", "to get a good approximation of $f(x) = x_1 * x_2 + 1$ on the following dataset $D = \\{(1.0, 1.0), (0.0, 0.0), (2.0, -1.0), (-1.0, 0.5), (-0.5, -2.0)\\}$, i.e. 
the network should satisfy:\n", " - $\\text{net}(1.0, 1.0) \\sim 2.0$,\n", " - $\\text{net}(0.0, 0.0) \\sim 1.0$,\n", " - $\\text{net}(2.0, -1.0) \\sim -1.0$,\n", " - $\\text{net}(-1.0, 0.5) \\sim 0.5$,\n", " - $\\text{net}(-0.5, -2.0) \\sim 2.0$.\n", " \n", "After training print all weights and separately print $w_{1, 2}^{(1)}$ (the weight from the second input to the first hidden neuron in the first hidden layer) and $w_{1, 3}^{(3)}$ (the weight from the third hidden neuron in the second hidden layer to the output unit).\n", "\n", "Print the values of the network on the training points and verify that these values are closer to the real values of the $f$ function than $\\epsilon = 0.1$, i.e. $|\\text{net}(x) - f(x)| < \\epsilon$ for $x \\in D$.\n", "\n", "Because this network is only tested on the training set, it will certainly overfit if trained long enough. Train for 1000 epochs and then calculate\n", " - $\\text{net}(2.0, 2.0)$,\n", " - $\\text{net}(-1.0, -1.0)$,\n", " - $\\text{net}(3.0, -3.0)$.\n", " \n", "How far are these values from real values of the function $f$?" ] }, { "cell_type": "code", "execution_count": 30, "id": "documentary-petersburg", "metadata": {}, "outputs": [], "source": [ "# Write your code here" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 5 }