%matplotlib inline
%load_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
from IPython.display import Markdown, display, HTML
import torch
import torch.nn as nn
import torch.optim as optim
# Workaround for the dying kernel problem (only needed in some installations - remove it if the notebook works without it)
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'
PyTorch
Here's your best friend when working with PyTorch: https://pytorch.org/docs/stable/index.html.
The beginning of this notebook shows that PyTorch tensors can be used exactly like numpy arrays. Later in the notebook additional features of tensors will be presented.
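Before going further, it can be useful to check which PyTorch version is installed and whether a GPU is available (a minimal sketch - everything in this notebook runs fine on the CPU):
# Optional environment check - the rest of the notebook only needs the CPU
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())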
Creating PyTorch tensors
Directly
a = np.array(
[[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0],
[7.0, 8.0, 9.0]]
)
print(a)
print()
t = torch.tensor(
[[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0],
[7.0, 8.0, 9.0]]
)
print(t)
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
From a list
l = [[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0],
[7.0, 8.0, 9.0]]
print(l)
print()
a = np.array(l)
print(a)
print()
t = torch.tensor(l)
print(t)
[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
From a list comprehension
a = [i**2 for i in range(10)]
print(a)
print()
print(np.array(a))
print()
print(torch.tensor(a))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

[ 0  1  4  9 16 25 36 49 64 81]

tensor([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
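For regular sequences like this one both libraries also provide range constructors, so the list comprehension is not strictly needed; a minimal sketch of the same squares built with arange (note that ** is applied element-wise):
# Equivalent construction without a list comprehension
print(np.arange(10) ** 2)
print(torch.arange(10) ** 2)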
From a numpy array
a = np.array(
[[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0],
[7.0, 8.0, 9.0]]
)
t = torch.tensor(a)
print(t)
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]], dtype=torch.float64)
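Note that torch.tensor(a) copies the data, while torch.from_numpy(a) creates a tensor that shares memory with the numpy array; a small sketch of the difference:
a = np.zeros(3)
t_copy = torch.tensor(a)        # independent copy of the data
t_shared = torch.from_numpy(a)  # shares memory with a
a[0] = 5.0
print(t_copy)    # unaffected by the change to a
print(t_shared)  # reflects the change to a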
Ready-made functions in PyTorch
# All zeros
a = torch.zeros((3, 4))
print("All zeros")
print(a)
print()
# All a chosen value
a = torch.full((3, 4), 7.0)
print("All chosen value (variant 1)")
print(a)
print()
# or
a = torch.zeros((3, 4))
a[:] = 7.0
print("All chosen value (variant 2)")
print(a)
print()
# Random integers
print("Random integers")
a = np.random.randint(low=0, high=10, size=(3, 2))
print(a)
print()
a = torch.randint(low=0, high=10, size=(3, 2))
print(a)
print()
# Random values from the normal distribution (Gaussian)
print("Random values from the normal distribution")
a = np.random.normal(loc=0, scale=10, size=(3, 2))
print(a)
print()
a = torch.normal(mean=0, std=10, size=(3, 2))
print(a)
All zeros
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

All chosen value (variant 1)
tensor([[7., 7., 7., 7.],
        [7., 7., 7., 7.],
        [7., 7., 7., 7.]])

All chosen value (variant 2)
tensor([[7., 7., 7., 7.],
        [7., 7., 7., 7.],
        [7., 7., 7., 7.]])

Random integers
[[6 6]
 [8 9]
 [1 0]]

tensor([[9, 5],
        [9, 3],
        [3, 8]])

Random values from the normal distribution
[[ -5.34346728   0.97207777]
 [ -7.26648922 -12.2890286 ]
 [ -2.68082928  10.95819034]]

tensor([[ 1.1231, -5.9980],
        [20.4600, -6.4359],
        [-6.6826, -0.4491]])
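Two more commonly used constructors are torch.ones and torch.rand (uniform values from [0, 1)); random draws can be made reproducible with torch.manual_seed. A short sketch:
torch.manual_seed(0)       # fix the seed so the random values are reproducible
print(torch.ones((2, 3)))
print(torch.rand((2, 3)))  # uniform on [0, 1)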
Slicing PyTorch tensors
Slicing in 1D
To obtain only specific values from a PyTorch tensor one can use so-called slicing. It has the form
arr[low:high:step]
where low is the first index to be retrieved, high is the first index not to be retrieved, and step means that only every step-th element in that range is taken.
a = torch.tensor([i**2 for i in range(10)])
print("Original: ", a)
print("First 5 elements:", a[:5])
print("Elements from index 3 to index 5:", a[3:6])
print("Last 3 elements (negative indexing):", a[-3:])
print("Every second element:", a[::2])
print("Negative step a[::-1] to obtain reverse order does not work for tensors")
Original:  tensor([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
First 5 elements: tensor([ 0,  1,  4,  9, 16])
Elements from index 3 to index 5: tensor([ 9, 16, 25])
Last 3 elements (negative indexing): tensor([49, 64, 81])
Every second element: tensor([ 0,  4, 16, 36, 64])
Negative step a[::-1] to obtain reverse order does not work for tensors
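As noted above, a negative step is not supported for tensors; to reverse a tensor one can use torch.flip instead. A minimal sketch:
a = torch.tensor([i**2 for i in range(10)])
print("Reversed with torch.flip:", torch.flip(a, dims=[0]))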
Slicing in 2D
In two dimensions slicing works in the same way - it is simply specified separately for each dimension.
a = torch.tensor([i for i in range(25)]).reshape(5, 5)
print("Original: ")
print(a)
print()
print("First 2 elements of the first 3 row:")
print(a[:3, :2])
print()
print("Middle 3 elements from the middle 3 rows:")
print(a[1:4, 1:4])
print()
print("Bottom-right 3 by 3 submatrix (negative indexing):")
print(a[-3:, -3:])
Original:
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]])

First 2 elements of the first 3 rows:
tensor([[ 0,  1],
        [ 5,  6],
        [10, 11]])

Middle 3 elements from the middle 3 rows:
tensor([[ 6,  7,  8],
        [11, 12, 13],
        [16, 17, 18]])

Bottom-right 3 by 3 submatrix (negative indexing):
tensor([[12, 13, 14],
        [17, 18, 19],
        [22, 23, 24]])
Setting PyTorch tensor field values
a = torch.tensor([i for i in range(25)]).reshape(5, 5)
print("Original: ")
print(a)
print()
a[1:4, 1:4] = 5.0
print("Middle values changed to 5")
print(a)
print()
b = torch.tensor([i**2 - i for i in range(9)]).reshape(3, 3)
print("Second matrix")
print(b)
print()
a[1:4, 1:4] = b
print("Second matrix substituted into the middle of the first matrix")
print(a)
Original:
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]])

Middle values changed to 5
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  5,  5,  5,  9],
        [10,  5,  5,  5, 14],
        [15,  5,  5,  5, 19],
        [20, 21, 22, 23, 24]])

Second matrix
tensor([[ 0,  0,  2],
        [ 6, 12, 20],
        [30, 42, 56]])

Second matrix substituted into the middle of the first matrix
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  0,  0,  2,  9],
        [10,  6, 12, 20, 14],
        [15, 30, 42, 56, 19],
        [20, 21, 22, 23, 24]])
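Slices of a tensor are views of the same memory, which is why the assignments above modify the original tensor. If an independent copy is needed, clone the slice first; a small sketch:
a = torch.tensor([i for i in range(25)]).reshape(5, 5)
sub = a[1:4, 1:4].clone()  # an independent copy, not a view
sub[:] = 0
print(a[1:4, 1:4])         # the original tensor is unchanged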
Operations on PyTorch tensors
It is important to remember that the standard arithmetic operators (+, -, *, /) on PyTorch tensors always work element-wise.
a = torch.tensor([i**2 for i in range(9)]).reshape((3, 3))
print(a)
print()
b = torch.tensor([i**0.5 for i in range(9)]).reshape((3, 3))
print(b)
print()
tensor([[ 0,  1,  4],
        [ 9, 16, 25],
        [36, 49, 64]])

tensor([[0.0000, 1.0000, 1.4142],
        [1.7321, 2.0000, 2.2361],
        [2.4495, 2.6458, 2.8284]])
Element-wise sum
print(a + b)
tensor([[ 0.0000,  2.0000,  5.4142],
        [10.7321, 18.0000, 27.2361],
        [38.4495, 51.6458, 66.8284]])
Element-wise multiplication
print(a * b)
tensor([[  0.0000,   1.0000,   5.6569],
        [ 15.5885,  32.0000,  55.9017],
        [ 88.1816, 129.6418, 181.0193]])
Matrix multiplication
print(np.matmul(a, b))
print()
# Multiplication by the identity matrix (to check it works as expected)
id_matrix = torch.tensor(
[[1.0, 0.0, 0.0],
[0.0, 1.0, 0.0],
[0.0, 0.0, 1.0]]
)
# Tensor a contains integers (torch.int64, i.e. Long, by default) and must be converted to float before multiplying it with a float tensor
a = a.type(torch.FloatTensor)
print(torch.matmul(id_matrix, a))
tensor([[ 11.5300,  12.5830,  13.5498],
        [ 88.9501, 107.1438, 119.2157],
        [241.6378, 303.3281, 341.4984]], dtype=torch.float64)

tensor([[ 0.,  1.,  4.],
        [ 9., 16., 25.],
        [36., 49., 64.]])
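The @ operator is shorthand for torch.matmul; a minimal sketch repeating the multiplication above in operator form:
a = torch.tensor([i**2 for i in range(9)]).reshape((3, 3)).float()
b = torch.tensor([i**0.5 for i in range(9)]).reshape((3, 3))
print(a @ b)  # the same result as torch.matmul(a, b)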
Calculating the mean
a = torch.randint(low=0, high=10, size=(5,))
print(a)
print()
print("Mean: ", torch.sum(a) / len(a))
print()
# To get a single value use tensor.item()
print("Mean: ", (torch.sum(a) / len(a)).item())
tensor([3, 8, 7, 2, 6])

Mean:  tensor(5.2000)

Mean:  5.199999809265137
Calculating the mean of every row
a = torch.randint(low=0, high=10, size=(5, 3))
print(a)
print()
print("Mean:", torch.sum(a, axis=1) / a.shape[1])
print("Mean in the original matrix form:")
print((torch.sum(a, axis=1) / a.shape[1]).reshape(-1, 1)) # -1 calculates the right size to use all elements
tensor([[1, 6, 8],
        [6, 4, 8],
        [1, 5, 8],
        [2, 5, 7],
        [1, 0, 4]])

Mean: tensor([5.0000, 6.0000, 4.6667, 4.6667, 1.6667])
Mean in the original matrix form:
tensor([[5.0000],
        [6.0000],
        [4.6667],
        [4.6667],
        [1.6667]])
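The same row means can be obtained directly with torch.mean; note that it requires a floating-point tensor, so the integer tensor has to be converted first. A short sketch (keepdim=True keeps the column-vector shape):
a = torch.randint(low=0, high=10, size=(5, 3))
print(torch.mean(a.float(), dim=1))                # mean of every row
print(torch.mean(a.float(), dim=1, keepdim=True))  # the same means as a column vector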
More complex operations
Note that more complex operations such as torch.pow and torch.exp can only be applied to tensors, whereas the corresponding numpy functions accept numpy arrays as well as plain lists.
a = torch.tensor([1.0, 2.0, 3.0])
print("Vector to power 2 (element-wise)")
print(torch.pow(a, 2))
print()
print("Euler number to the power a (element-wise)")
print(torch.exp(a))
print()
print("An even more complex expression")
print((torch.pow(a, 2) + torch.exp(a)) / torch.sum(a))
Vector to power 2 (element-wise)
tensor([1., 4., 9.])

Euler number to the power a (element-wise)
tensor([ 2.7183,  7.3891, 20.0855])

An even more complex expression
tensor([0.6197, 1.8982, 4.8476])
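A small sketch illustrating the remark above - the numpy functions accept plain lists, while the torch functions require tensors:
print(np.power([1.0, 2.0, 3.0], 2))  # numpy accepts a plain list
print(np.exp([1.0, 2.0, 3.0]))
# torch.pow([1.0, 2.0, 3.0], 2) would raise a TypeError - convert the list to a tensor first
print(torch.pow(torch.tensor([1.0, 2.0, 3.0]), 2))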
PyTorch basic operations tasks
Task 1. Calculate the sigmoid (logistic) function on every element of the following array [0.3, 1.2, -1.4, 0.2, -0.1, 0.1, 0.8, -0.25] and print the last 5 elements. Use only tensor operations.
# Write your code here
Task 2. Calculate the dot product of the following two vectors:
$x = [3, 1, 4, 2, 6, 1, 4, 8]$
$y = [5, 2, 3, 12, 2, 4, 17, 9]$
a) by using element-wise multiplication and torch.sum,
b) by using torch.dot,
c) by using torch.matmul and transposition (x.T).
# Write your code here
Task 3. Calculate the following expression
$$\frac{1}{1 + e^{-x_0 \theta_0 - \ldots - x_9 \theta_9 - \theta_{10}}}$$
for
$x = [1.2, 2.3, 3.4, -0.7, 4.2, 2.7, -0.5, 1.4, -3.3, 0.2]$
$\theta = [1.7, 0.33, -2.12, -1.73, 2.9, -5.8, -0.9, 12.11, 3.43, -0.5, -1.65]$
and print the result. Use only tensor operations.
# Write your code here
Tensor gradients
Tensors are designed to be used in neural networks. Their most important functionality is the automatic computation of gradients, which is the basis of backpropagation.
x = torch.tensor([[2., -1.], [3., 1.]], requires_grad=True)
out = x.pow(3).sum() # the actual derivative is 3*x^2
print("out={}".format(out))
print()
out.backward()
print("gradient")
print(x.grad)
out=35.0

gradient
tensor([[12.,  3.],
        [27.,  3.]])
x = torch.tensor([[2., -1., 3.]], requires_grad=True)
y = torch.tensor([[4., 2., -1.]], requires_grad=True)
z = torch.sum(x * y)
z.backward()
print(x.grad)
print(y.grad)
x.grad.data.zero_()
y.grad.data.zero_()
z = torch.sigmoid(torch.sum(x * y))
z.backward()
print(x.grad)
print(y.grad)
tensor([[ 4.,  2., -1.]])
tensor([[ 2., -1.,  3.]])
tensor([[ 0.1807,  0.0904, -0.0452]])
tensor([[ 0.0904, -0.0452,  0.1355]])
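The gradients have to be zeroed between the two backward calls above because PyTorch accumulates gradients: every backward call adds to .grad instead of overwriting it. A minimal sketch of this accumulation:
w = torch.tensor(1.0, requires_grad=True)
(2 * w).backward()
print(w.grad)   # tensor(2.)
(2 * w).backward()
print(w.grad)   # tensor(4.) - the second gradient was added to the first
w.grad.zero_()
print(w.grad)   # tensor(0.)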
Backpropagation
In this section we train the weights $w$ of a simple model $y = \text{sigmoid}(w \cdot x)$ (a dot product followed by a sigmoid) so that it outputs $y = 0.65$ for $x = [2.0, -1.0, 3.0]$.
x = torch.tensor([2., -1., 3.], requires_grad=False)
w = torch.tensor([4., 2., -1.], requires_grad=True)
y_target = 0.65
print("x")
print(x)
print("x.grad")
print(x.grad)
print("w")
print(w)
print("w.grad")
print(w.grad)
print()
print()
optimizer = optim.SGD([w], lr=0.1)
losses = []
n_epochs = 100
for epoch in range(n_epochs):
    optimizer.zero_grad()
    y = torch.sigmoid(torch.sum(x * w))
    loss = torch.pow(y - y_target, 2)
    loss.backward()
    losses.append(loss.item())
    optimizer.step()
    if epoch < 3 or epoch > 96:
        print("w")
        print(w)
        print("w.grad")
        print(w.grad)
        print("y")
        print(y)
        print("loss")
        print(loss)
        print()
        print()
sns.lineplot(x=np.arange(n_epochs), y=losses).set_title('Training loss')
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()
x
tensor([ 2., -1.,  3.])
x.grad
None
w
tensor([ 4.,  2., -1.], requires_grad=True)
w.grad
None

w
tensor([ 3.9945,  2.0027, -1.0082], requires_grad=True)
w.grad
tensor([ 0.0547, -0.0273,  0.0820])
y
tensor(0.9526, grad_fn=<SigmoidBackward>)
loss
tensor(0.0916, grad_fn=<PowBackward0>)

w
tensor([ 3.9889,  2.0055, -1.0166], requires_grad=True)
w.grad
tensor([ 0.0563, -0.0281,  0.0844])
y
tensor(0.9508, grad_fn=<SigmoidBackward>)
loss
tensor(0.0905, grad_fn=<PowBackward0>)

w
tensor([ 3.9831,  2.0084, -1.0253], requires_grad=True)
w.grad
tensor([ 0.0579, -0.0290,  0.0869])
y
tensor(0.9489, grad_fn=<SigmoidBackward>)
loss
tensor(0.0894, grad_fn=<PowBackward0>)

w
tensor([ 3.6599,  2.1701, -1.5102], requires_grad=True)
w.grad
tensor([ 6.1291e-06, -3.0645e-06,  9.1936e-06])
y
tensor(0.6500, grad_fn=<SigmoidBackward>)
loss
tensor(4.5365e-11, grad_fn=<PowBackward0>)

w
tensor([ 3.6599,  2.1701, -1.5102], requires_grad=True)
w.grad
tensor([ 5.0985e-06, -2.5493e-06,  7.6478e-06])
y
tensor(0.6500, grad_fn=<SigmoidBackward>)
loss
tensor(3.1392e-11, grad_fn=<PowBackward0>)

w
tensor([ 3.6599,  2.1701, -1.5102], requires_grad=True)
w.grad
tensor([ 4.4477e-06, -2.2238e-06,  6.6715e-06])
y
tensor(0.6500, grad_fn=<SigmoidBackward>)
loss
tensor(2.3888e-11, grad_fn=<PowBackward0>)
Proper PyTorch model with a fully-connected layer
A fully-connected layer is represented by torch.nn.Linear. Its parameters are:
in_features - the number of input neurons,
out_features - the number of output neurons,
bias - a boolean indicating whether a bias term should be included.
Documentation: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
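A quick look at a freshly created layer (a minimal sketch): the weight matrix has shape (out_features, in_features) and the layer computes x @ weight.T (plus the bias when bias=True).
fc = nn.Linear(3, 1, bias=False)
print(fc.weight.shape)                  # torch.Size([1, 3])
print(fc(torch.tensor([2., -1., 3.])))  # one output value for a 3-dimensional input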
class FullyConnectedNetworkModel(nn.Module):

    def __init__(self, seed):
        super().__init__()
        self.seed = torch.manual_seed(seed)
        self.fc = nn.Linear(3, 1, bias=False)
        # Overwrite the randomly initialized weights with the same starting values as in the previous example
        self.fc.weight.data = torch.tensor([4., 2., -1.], requires_grad=True)

    def forward(self, x):
        x = torch.sigmoid(self.fc(x))
        return x
x = torch.tensor([2., -1., 3.])
y_target = 0.65
fc_neural_net = FullyConnectedNetworkModel(seed=6789)
optimizer = optim.SGD(fc_neural_net.parameters(), lr=0.1)
losses = []
n_epochs = 100
for epoch in range(n_epochs):
    optimizer.zero_grad()
    y = fc_neural_net(x)
    loss = torch.pow(y - y_target, 2)
    loss.backward()
    losses.append(loss.item())
    optimizer.step()
    if epoch < 3 or epoch > 96:
        print("w")
        print(fc_neural_net.fc.weight.data)
        print("w.grad")
        print(next(fc_neural_net.parameters()).grad)
        print("y")
        print(y)
        print("loss")
        print(loss)
        print()
        print()
sns.lineplot(x=np.arange(n_epochs), y=losses).set_title('Training loss')
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()
w
tensor([ 3.9945,  2.0027, -1.0082])
w.grad
tensor([ 0.0547, -0.0273,  0.0820])
y
tensor(0.9526, grad_fn=<SigmoidBackward>)
loss
tensor(0.0916, grad_fn=<PowBackward0>)

w
tensor([ 3.9889,  2.0055, -1.0166])
w.grad
tensor([ 0.0563, -0.0281,  0.0844])
y
tensor(0.9508, grad_fn=<SigmoidBackward>)
loss
tensor(0.0905, grad_fn=<PowBackward0>)

w
tensor([ 3.9831,  2.0084, -1.0253])
w.grad
tensor([ 0.0579, -0.0290,  0.0869])
y
tensor(0.9489, grad_fn=<SigmoidBackward>)
loss
tensor(0.0894, grad_fn=<PowBackward0>)

w
tensor([ 3.6599,  2.1701, -1.5102])
w.grad
tensor([ 6.1291e-06, -3.0645e-06,  9.1936e-06])
y
tensor(0.6500, grad_fn=<SigmoidBackward>)
loss
tensor(4.5365e-11, grad_fn=<PowBackward0>)

w
tensor([ 3.6599,  2.1701, -1.5102])
w.grad
tensor([ 5.0985e-06, -2.5493e-06,  7.6478e-06])
y
tensor(0.6500, grad_fn=<SigmoidBackward>)
loss
tensor(3.1392e-11, grad_fn=<PowBackward0>)

w
tensor([ 3.6599,  2.1701, -1.5102])
w.grad
tensor([ 4.4477e-06, -2.2238e-06,  6.6715e-06])
y
tensor(0.6500, grad_fn=<SigmoidBackward>)
loss
tensor(2.3888e-11, grad_fn=<PowBackward0>)
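For networks with more than one layer it is often convenient to stack layers with nn.Sequential instead of writing the forward pass by hand; a minimal sketch with illustrative layer sizes (not a solution to any of the tasks below):
net = nn.Sequential(
    nn.Linear(3, 5),
    nn.Sigmoid(),
    nn.Linear(5, 1)
)
print(net(torch.tensor([2., -1., 3.])))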
Embedding layer
An embedding layer is represented by torch.nn.Embedding. Its main parameters are:
num_embeddings - the number of ids to embed,
embedding_dim - the dimension of the embedding vector.
Documentation: https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
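An embedding layer is essentially a trainable lookup table: indexing it with an id returns the corresponding row of its weight matrix. A minimal sketch:
emb = nn.Embedding(3, 3)          # 3 ids, each mapped to a 3-dimensional vector
print(emb.weight.shape)           # torch.Size([3, 3])
print(emb(torch.tensor(1)))       # the embedding vector of id 1
print(emb(torch.tensor([0, 2])))  # several ids can be looked up at once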
In the example below we will have 3 movies and 3 users. The movies already have trained representations:
- $m_0 = [0.6, 0.4, -0.2]$
- $m_1 = [-0.7, 0.8, -0.7]$
- $m_2 = [0.8, -0.75, 0.9]$

where the three dimensions represent: level of violence, positive message, foul language.
We want to find user embeddings so that:
- user 0 likes movie 0 and dislikes movie 1 and 2,
- user 1 likes movie 1 and dislikes movie 0 and 2,
- user 2 likes movie 2 and dislikes movie 0 and 1.
class EmbeddingNetworkModel(nn.Module):

    def __init__(self, seed):
        super().__init__()
        self.seed = torch.manual_seed(seed)
        self.embedding = nn.Embedding(3, 3)

    def forward(self, x):
        user_id = x[0]
        item_repr = x[1]
        y = self.embedding(user_id) * item_repr
        y = torch.sum(y)
        y = torch.sigmoid(y)
        return y
user_ids = [torch.tensor(0), torch.tensor(1), torch.tensor(2)]
items = [torch.tensor([0.6, 0.4, -0.2]),
torch.tensor([-0.7, 0.8, -0.7]),
torch.tensor([0.8, -0.75, 0.9])]
# 1 = likes, 0 = dislikes; the order matches the (user, item) pairs generated below
responses = [1, 0, 0, 0, 1, 0, 0, 0, 1]
data = [(user_ids[user_id], items[item_id]) for user_id in range(3) for item_id in range(3)]
embedding_nn = EmbeddingNetworkModel(seed=6789)
optimizer = optim.SGD(embedding_nn.parameters(), lr=0.1)
losses = []
n_epochs = 1000
for epoch in range(n_epochs):
    optimizer.zero_grad()
    for i in range(len(data)):
        user_id = data[i][0]
        item_repr = data[i][1]
        y = embedding_nn((user_id, item_repr))
        if i == 0:
            loss = torch.pow(y - responses[i], 2)
        else:
            loss += torch.pow(y - responses[i], 2)
    # Regularization: add a scaled norm of the parameters to the loss once per epoch
    for param in embedding_nn.parameters():
        loss += 1 / 5 * torch.norm(param)
    loss.backward()
    losses.append(loss.item())
    optimizer.step()
sns.lineplot(x=np.arange(n_epochs), y=losses).set_title('Training loss')
plt.xlabel("epoch")
plt.ylabel("loss")
plt.show()
id_pairs = [(user_id, item_id) for user_id in range(3) for item_id in range(3)]
for id_pair in id_pairs:
    print("Embedding for user {}".format(id_pair[0]))
    print(embedding_nn.embedding(user_ids[id_pair[0]]))
    print("Representation for item {}".format(id_pair[1]))
    print(items[id_pair[1]])
    print("Score={}".format(round(embedding_nn((user_ids[id_pair[0]], items[id_pair[1]])).item(), 2)))
    print()
embeddings = pd.DataFrame(
[
['user_0'] + embedding_nn.embedding(user_ids[0]).tolist(),
['user_1'] + embedding_nn.embedding(user_ids[1]).tolist(),
['user_2'] + embedding_nn.embedding(user_ids[2]).tolist(),
['item_0'] + items[0].tolist(),
['item_1'] + items[1].tolist(),
['item_2'] + items[2].tolist()
],
columns=['entity', 'violence', 'positive message', 'language']
)
ax = sns.heatmap(embeddings.loc[:, ['violence', 'positive message', 'language']], annot=True)
ax.yaxis.set_major_formatter(ticker.FixedFormatter(embeddings.loc[:, 'entity'].tolist()))
plt.yticks(rotation=0)
plt.show()
ax = sns.scatterplot(data=embeddings, x='violence', y='positive message')
for i in range(embeddings.shape[0]):
    x = embeddings['violence'][i]
    x = x + (-0.1 + 0.1 * -np.sign(x - np.mean(embeddings['violence'])))
    y = embeddings['positive message'][i]
    y = y + (-0.02 + 0.13 * -np.sign(y - np.mean(embeddings['positive message'])))
    plt.text(x=x, y=y, s=embeddings['entity'][i])
plt.show()
Embedding for user 0
tensor([ 0.9887,  0.2676, -0.7881], grad_fn=<EmbeddingBackward>)
Representation for item 0
tensor([ 0.6000,  0.4000, -0.2000])
Score=0.7

Embedding for user 0
tensor([ 0.9887,  0.2676, -0.7881], grad_fn=<EmbeddingBackward>)
Representation for item 1
tensor([-0.7000,  0.8000, -0.7000])
Score=0.52

Embedding for user 0
tensor([ 0.9887,  0.2676, -0.7881], grad_fn=<EmbeddingBackward>)
Representation for item 2
tensor([ 0.8000, -0.7500,  0.9000])
Score=0.47

Embedding for user 1
tensor([-1.7678,  0.1267, -0.4628], grad_fn=<EmbeddingBackward>)
Representation for item 0
tensor([ 0.6000,  0.4000, -0.2000])
Score=0.29

Embedding for user 1
tensor([-1.7678,  0.1267, -0.4628], grad_fn=<EmbeddingBackward>)
Representation for item 1
tensor([-0.7000,  0.8000, -0.7000])
Score=0.84

Embedding for user 1
tensor([-1.7678,  0.1267, -0.4628], grad_fn=<EmbeddingBackward>)
Representation for item 2
tensor([ 0.8000, -0.7500,  0.9000])
Score=0.13

Embedding for user 2
tensor([-0.2462, -1.4256,  1.1095], grad_fn=<EmbeddingBackward>)
Representation for item 0
tensor([ 0.6000,  0.4000, -0.2000])
Score=0.28

Embedding for user 2
tensor([-0.2462, -1.4256,  1.1095], grad_fn=<EmbeddingBackward>)
Representation for item 1
tensor([-0.7000,  0.8000, -0.7000])
Score=0.15

Embedding for user 2
tensor([-0.2462, -1.4256,  1.1095], grad_fn=<EmbeddingBackward>)
Representation for item 2
tensor([ 0.8000, -0.7500,  0.9000])
Score=0.87
PyTorch advanced operations tasks
Task 4. Calculate the derivative $f'(w)$ using PyTorch and backward propagation (the backward method of the Tensor class) for the following functions and points:
- $f(w) = w^3 + w^2$ and $w = 2.0$,
- $f(w) = \text{sin}(w)$ and $w = \pi$,
- $f(w) = \ln(w * e^{3w})$ and $w = 1.0$.
# Write your code here
Task 5. Calculate the derivative $\frac{\partial f}{\partial w_1}(w_1, w_2, w_3)$ using PyTorch and backward propagation (the backward method of the Tensor class) for the following functions and points:
- $f(w_1, w_2) = w_1^3 + w_1^2 + w_2$ and $(w_1, w_2) = (2.0, 3.0)$,
- $f(w_1, w_2, w_3) = \text{sin}(w_1) * w_2 + w_1^2 * w_3$ and $(w_1, w_2, w_3) = (\pi, 2.0, 4.0)$,
- $f(w_1, w_2, w_3) = e^{w_1^2 + w_2^2 + w_3^2} + w_1^2 + w_2^2 + w_3^2$ and $(w_1, w_2, w_3) = (0.5, 0.67, 0.55)$.
# Write your code here
Task 6.* Train a neural network with:
two input neurons,
four hidden neurons with sigmoid activation in the first hidden layer,
four hidden neurons with sigmoid activation in the second hidden layer,
one output neuron without sigmoid activation
to get a good approximation of $f(x) = x_1 * x_2 + 1$ on the following dataset $D = \{(1.0, 1.0), (0.0, 0.0), (2.0, -1.0), (-1.0, 0.5), (-0.5, -2.0)\}$, i.e. the network should satisfy:
$\text{net}(1.0, 1.0) \sim 2.0$,
$\text{net}(0.0, 0.0) \sim 1.0$,
$\text{net}(2.0, -1.0) \sim -1.0$,
$\text{net}(-1.0, 0.5) \sim 0.5$,
$\text{net}(-0.5, -2.0) \sim 2.0$.
After training print all weights and separately print $w_{1, 2}^{(1)}$ (the weight from the second input to the first hidden neuron in the first hidden layer) and $w_{1, 3}^{(3)}$ (the weight from the third hidden neuron in the second hidden layer to the output unit).
Print the values of the network on the training points and verify that these values are closer to the real values of the $f$ function than $\epsilon = 0.1$, i.e. $|\text{net}(x) - f(x)| < \epsilon$ for $x \in D$.
Because this network is only tested on the training set, it will certainly overfit if trained long enough. Train for 1000 epochs and then calculate
$\text{net}(2.0, 2.0)$,
$\text{net}(-1.0, -1.0)$,
$\text{net}(3.0, -3.0)$.
How far are these values from real values of the function $f$?
# Write your code here