uczenie-maszynowe/lab/01_Podstawowe_narzędzia.ipynb

2289 lines
44 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Uczenie maszynowe — laboratoria\n",
"# 1. Podstawowe narzędzia uczenia maszynowego"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.1. Elementy języka Python przydatne w uczeniu maszynowym"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Listy składane (*List comprehension*)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Przypuśćmy, że mamy dane zdanie i chcemy utworzyć listę, która będzie zawierać długości kolejnych wyrazów tego zdania. Możemy to zrobić w następujący sposób:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5, 4, 7, 3, 4, 1, 4, 3, 4, 1, 4, 7, 6, 4]\n"
]
}
],
"source": [
"zdanie = \"tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł\"\n",
"wyrazy = zdanie.split()\n",
"dlugosci_wyrazow = []\n",
"for wyraz in wyrazy:\n",
" dlugosci_wyrazow.append(len(wyraz))\n",
"\n",
"print(dlugosci_wyrazow)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Możemy to też zrobić bardziej „pythonicznie”, przy użyciu list składanych:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5, 4, 7, 3, 4, 1, 4, 3, 4, 1, 4, 7, 6, 4]\n"
]
}
],
"source": [
"zdanie = \"tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł\"\n",
"dlugosci_wyrazow = [len(wyraz) for wyraz in zdanie.split()]\n",
"\n",
"print(dlugosci_wyrazow)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Jeżeli chcemy, żeby był sprawdzany dodatkowy warunek, np. chcemy pomijać wyraz „takt”, to wciąż możemy użyć list składanych:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5, 4, 7, 3, 1, 3, 1, 7, 6, 4]\n"
]
}
],
"source": [
"zdanie = \"tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł\"\n",
"wyrazy = zdanie.split()\n",
"\n",
"# Ta konstrukcja:\n",
"dlugosci_wyrazow = []\n",
"for wyraz in wyrazy:\n",
" if wyraz != \"takt\":\n",
" dlugosci_wyrazow.append(wyraz)\n",
"\n",
"# ...jest równoważna tej jednolinijkowej:\n",
"dlugosci_wyrazow = [len(wyraz) for wyraz in wyrazy if wyraz != \"takt\"]\n",
"\n",
"print(dlugosci_wyrazow)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indeksowanie"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wszystkie listy i krotki w Pythonie, w tym łańcuchy (które trakowane są jak krotki znaków), są indeksowane od 0:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[4, 16, 36, 64, 100]\n",
"[4, 16, 36, 64, 100]\n"
]
}
],
"source": [
"print(lista)\n",
"print(lista[:])"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a\n",
"e\n"
]
}
],
"source": [
"napis = \"abcde\"\n",
"print(napis[0]) # 'a'\n",
"print(napis[4]) # 'e'\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indeksy możemy liczyć również „od końca”:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"e\n",
"d\n",
"a\n"
]
}
],
"source": [
"napis = \"abcde\"\n",
"print(napis[-1]) # 'e' („ostatni”)\n",
"print(napis[-2]) # 'd' („drugi od końca”)\n",
"print(napis[-5]) # 'a' („piąty od końca”)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Łańcuchy możemy też „kroić na plasterki” (*slicing*):"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bcd\n",
"b\n",
"cd\n",
"bcd\n",
"de\n",
"abc\n",
"abcde\n"
]
}
],
"source": [
"napis = \"abcde\"\n",
"print(napis[1:4]) # 'bcd' („znaki od 1. włącznie do 4. wyłącznie”)\n",
"print(napis[1:2]) # 'b' (to samo co `napis[1]`)\n",
"print(napis[-3:-1]) # 'cd' (kroić można też stosując indeksowanie od końca)\n",
"print(napis[1:-1]) # 'bcd' (możemy nawet mieszać te dwa sposoby indeksowania)\n",
"print(\n",
" napis[3:]\n",
") # 'de' (jeżeli koniec przedziału nie jest podany, to kroimy do samego końca łańcucha)\n",
"print(\n",
" napis[:3]\n",
") # 'abc' (jeżeli początek przedziału nie jest podany, to kroimy od początku łańcucha)\n",
"print(napis[:]) # 'abcde' (kopia całego napisu)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.2. Biblioteka _NumPy_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tablice"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Głównym obiektem w NumPy jest **jednorodna**, **wielowymiarowa** tablica. Przykładem takiej tablicy jest macierz `x`.\n",
"\n",
"Macierz $x =\n",
" \\begin{pmatrix}\n",
" 1 & 2 & 3 \\\\\n",
" 4 & 5 & 6 \\\\\n",
" 7 & 8 & 9\n",
" \\end{pmatrix}$\n",
"można zapisać jako:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2 3]\n",
" [4 5 6]\n",
" [7 8 9]\n",
" [2 5 8]]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [2, 5, 8]])\n",
"print(x)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Najczęsciej używane metody tablic typu `array`:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(4, 3)\n"
]
}
],
"source": [
"print(x.shape) # wymiary macierzy\n"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[14 20 26]\n",
"[ 6 15 24 15]\n"
]
}
],
"source": [
"print(x.sum(axis=0)) # suma liczb w każdej kolumnie\n",
"print(x.sum(axis=1)) # suma liczb w każdym wierszu"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2. 5. 8. 5.]\n"
]
}
],
"source": [
"print(x.mean(axis=1)) # średnia liczb w każdym wierszu\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Do tworzenia sekwencji liczbowych jako obiekty typu `array` należy wykorzystać funkcję `arange`."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.arange(10)\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5, 6, 7, 8, 9])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.arange(5, 10)\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5. , 5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.arange(5, 10, 0.5)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Kształt tablicy można zmienić za pomocą metody `reshape`:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1 2 3 4 5 6 7 8 9 10 11 12]\n",
"[[ 1 2 3 4]\n",
" [ 5 6 7 8]\n",
" [ 9 10 11 12]]\n"
]
}
],
"source": [
"x = np.arange(1, 13)\n",
"print(x)\n",
"y = x.reshape(3, 4)\n",
"print(y)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Funkcją podobną do `arange` jest `linspace`, która wypełnia wektor określoną liczbą elementów z przedziału o równych automatycznie obliczonych odstępach (w `arange` należy podać rozmiar kroku):"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0. 0.625 1.25 1.875 2.5 3.125 3.75 4.375 5. ]\n"
]
}
],
"source": [
"x = np.linspace(0, 5, 9)\n",
"print(x)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dodatkowe informacje o funkcjach NumPy uzyskuje się za pomocą polecenia `help(nazwa_funkcji)`:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function shape in module numpy:\n",
"\n",
"shape(a)\n",
" Return the shape of an array.\n",
" \n",
" Parameters\n",
" ----------\n",
" a : array_like\n",
" Input array.\n",
" \n",
" Returns\n",
" -------\n",
" shape : tuple of ints\n",
" The elements of the shape tuple give the lengths of the\n",
" corresponding array dimensions.\n",
" \n",
" See Also\n",
" --------\n",
" alen\n",
" ndarray.shape : Equivalent array method.\n",
" \n",
" Examples\n",
" --------\n",
" >>> np.shape(np.eye(3))\n",
" (3, 3)\n",
" >>> np.shape([[1, 2]])\n",
" (1, 2)\n",
" >>> np.shape([0])\n",
" (1,)\n",
" >>> np.shape(0)\n",
" ()\n",
" \n",
" >>> a = np.array([(1, 2), (3, 4)], dtype=[('x', 'i4'), ('y', 'i4')])\n",
" >>> np.shape(a)\n",
" (2,)\n",
" >>> a.shape\n",
" (2,)\n",
"\n"
]
}
],
"source": [
"help(np.shape)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tablice mogą składać się z danych różnych typów (ale tylko jednego typu danych równocześnie, stąd jednorodność)."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 2 3] - typ: int32\n",
"[0.1 0.2 0.3] - typ: float64\n",
"[1. 2. 3.] - typ: float64\n"
]
}
],
"source": [
"x = np.array([1, 2, 3])\n",
"print(x, \"- typ: \", x.dtype)\n",
"\n",
"x = np.array([0.1, 0.2, 0.3])\n",
"print(x, \"- typ: \", x.dtype)\n",
"\n",
"x = np.array([1, 2, 3], dtype=\"float64\")\n",
"print(x, \"- typ: \", x.dtype)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tworzenie tablic składających się z samych zer lub jedynek umożliwiają funkcje `zeros` oraz `ones`:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0. 0. 0. 0.]\n",
" [0. 0. 0. 0.]\n",
" [0. 0. 0. 0.]]\n"
]
}
],
"source": [
"x = np.zeros([3, 4])\n",
"print(x)\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1. 1. 1. 1.]\n",
" [1. 1. 1. 1.]\n",
" [1. 1. 1. 1.]]\n"
]
}
],
"source": [
"x = np.ones([3, 4])\n",
"print(x)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Podstawowe operacje arytmetyczne"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Operatory arytmetyczne na tablicach w NumPy działają **element po elemencie**."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[3 4 5]\n",
"[1. 1. 1.]\n",
"[3. 4. 5.]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([3, 4, 5])\n",
"print(a)\n",
"b = np.ones(3)\n",
"print(b)\n",
"print(a / b)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Za mnożenie macierzy odpowiadają funkcje `dot` i `matmul` (**nie** operator `*`):"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2]\n",
" [3 4]]\n"
]
}
],
"source": [
"a = np.array([[1, 2], [3, 4]])\n",
"print(a)\n"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2]\n",
" [3 4]]\n"
]
}
],
"source": [
"b = np.array([[1, 2], [3, 4]])\n",
"print(b)\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 7, 10],\n",
" [15, 22]])"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a * b # mnożenie element po elemencie\n"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 7, 10],\n",
" [15, 22]])"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a @ b # mnożenie macierzowe"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 7, 10],\n",
" [15, 22]])"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.dot(a, b) # mnożenie macierzowe\n"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 7, 10],\n",
" [15, 22]])"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.matmul(a, b) # mnożenie macierzowe\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Przykłady innych operacji dodawania i mnożenia:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[5., 5.],\n",
" [5., 5.]])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.zeros((2, 2), dtype=\"float\")\n",
"a += 5\n",
"a\n"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[25., 25.],\n",
" [25., 25.]])"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a *= 5\n",
"a\n"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[50., 50.],\n",
" [50., 50.]])"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a + a\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sklejanie tablic:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([1, 2, 3])\n",
"b = np.array([4, 5, 6])\n",
"c = np.array([7, 8, 9])\n",
"np.hstack([a, b, c])\n"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2, 3],\n",
" [4, 5, 6],\n",
" [7, 8, 9]])"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.vstack([a, b, c])\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Typowe funkcje matematyczne:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3.14159265, 4.44288294, 5.44139809, 6.28318531])"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.arange(1, 5)\n",
"np.sqrt(x) * np.pi\n"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2**4\n"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.power(2, 4)\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.log(np.e)\n"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.arange(5)\n",
"x.max() - x.min()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indeksy i zakresy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tablice jednowymiarowe zachowują sie podobnie do zwykłych list pythonowych."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 3])"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.arange(10)\n",
"a[2:4]\n"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 2, 4, 6, 8])"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[:10:2]\n"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[::-1]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tablice wielowymiarowe mają po jednym indeksie na wymiar:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1, 2, 3],\n",
" [ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11]])"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.arange(12).reshape(3, 4)\n",
"x\n"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"11"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[2, 3]\n"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 5, 9])"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[:, 1] # kolumna nr 1\n"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([4, 5, 6, 7])"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[1, :] # wiersz nr 1\n"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11]])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[1:3, :]\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Warunki"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Warunki pozwalają na selekcję elementów tablicy."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 2, 2, 3, 3, 3])"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])\n",
"a[a > 1]\n"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 3, 3])"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[a == 3]\n"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([0, 1, 2, 3, 4, 5], dtype=int64),)"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.where(a < 3)\n"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4, 5], dtype=int64)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.where(a < 3)[0]\n"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([], dtype=int64),)"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.where(a > 9)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pętle i wypisywanie"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 1 2 3]\n",
"[4 5 6 7]\n",
"[ 8 9 10 11]\n"
]
}
],
"source": [
"for row in x:\n",
" print(row)\n"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"1\n",
"2\n",
"3\n",
"4\n",
"5\n",
"6\n",
"7\n",
"8\n",
"9\n",
"10\n",
"11\n"
]
}
],
"source": [
"for element in x.flat:\n",
" print(element)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Liczby losowe"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 3, 3, 1, 2])"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.randint(0, 10, 5)\n"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 2.25701199, -0.62666283, -0.58260693, 0.91053811, -0.12398967])"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.normal(0, 1, 5)\n"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.64188687, 1.98379682, 0.4690363 , 1.26967692, 0.84376779])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.uniform(0, 2, 5)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Macierze"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy jest pakietem wykorzystywanym do obliczeń w dziedzinie algebry liniowej, co jeszcze szczególnie przydatne w uczeniu maszynowym. \n",
"\n",
"Wektor o wymiarach $1 \\times N$ \n",
"$$\n",
" x =\n",
" \\begin{pmatrix}\n",
" x_{1} \\\\\n",
" x_{2} \\\\\n",
" \\vdots \\\\\n",
" x_{N}\n",
" \\end{pmatrix} \n",
"$$\n",
"\n",
"i jego transpozycję $x^\\top = (x_{1}, x_{2},\\ldots,x_{N})$ można wyrazić w Pythonie w następujący sposób:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 1)"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"x = np.array([[1, 2, 3]]).T\n",
"x.shape\n"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1, 3)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xt = x.T\n",
"xt.shape\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Macierz kolumnowa** w NumPy.\n",
"$$X =\n",
" \\begin{pmatrix}\n",
" 3 \\\\\n",
" 4 \\\\\n",
" 5 \\\\\n",
" 6 \n",
" \\end{pmatrix}$$"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[3],\n",
" [4],\n",
" [5],\n",
" [6]])"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.array([[3, 4, 5, 6]]).T\n",
"x\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Macierz wierszowa** w NumPy.\n",
"$$ X =\n",
" \\begin{pmatrix}\n",
" 3 & 4 & 5 & 6\n",
" \\end{pmatrix}$$"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[3, 4, 5, 6]])"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.array([[3, 4, 5, 6]])\n",
"x\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oprócz obiektów typu `array` istnieje wyspecjalizowany obiekt `matrix`, dla którego operacje `*` (mnożenie) oraz `**-1` (odwracanie) są określone w sposób właściwy dla macierzy (w przeciwieństwie do operacji elementowych dla obiektów `array`)."
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2 3]\n",
" [4 5 6]\n",
" [7 8 9]]\n"
]
}
],
"source": [
"x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(3, 3)\n",
"print(x)\n"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[4 6 3]\n",
" [8 7 1]\n",
" [3 0 3]]\n"
]
}
],
"source": [
"y = np.array([4, 6, 3, 8, 7, 1, 3, 0, 3]).reshape(3, 3)\n",
"print(y)\n"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {},
"outputs": [],
"source": [
"X = np.matrix(x)\n",
"Y = np.matrix(y)\n"
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 4 12 9]\n",
" [32 35 6]\n",
" [21 0 27]]\n"
]
}
],
"source": [
"print(x * y) # Tablice np.array mnożone są element po elemencie\n"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 29 20 14]\n",
" [ 74 59 35]\n",
" [119 98 56]]\n"
]
}
],
"source": [
"print(X * Y) # Macierze np.matrix mnożone są macierzowo\n"
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 29 20 14]\n",
" [ 74 59 35]\n",
" [119 98 56]]\n"
]
}
],
"source": [
"print(np.matmul(x, y))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Wyznacznik macierzy**"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"33.000000000000014"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([[3, -9], [2, 5]])\n",
"np.linalg.det(a)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Macierz odwrotna**"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-4, -2],\n",
" [ 5, 5]])"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A = np.array([[-4, -2], [5, 5]])\n",
"A\n"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.5, -0.2],\n",
" [ 0.5, 0.4]])"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"invA = np.linalg.inv(A)\n",
"invA\n"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1., 0.],\n",
" [0., 1.]])"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.round(np.dot(A, invA))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(ponieważ $AA^{-1} = A^{-1}A = I$)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Wartości i wektory własne**"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 0, 0],\n",
" [0, 2, 0],\n",
" [0, 0, 3]])"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.diag((1, 2, 3))\n",
"a\n"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1. 2. 3.]\n",
"[[1. 0. 0.]\n",
" [0. 1. 0.]\n",
" [0. 0. 1.]]\n"
]
}
],
"source": [
"w, v = np.linalg.eig(a)\n",
"print(w) # wartości własne\n",
"print(v) # wektory własne\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1.3. Biblioteka PyTorch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Biblioteka PyTorch została stworzona z myślą o uczeniu maszynowym. Oprócz wykonywania rozmaitych działań matematycznych takich jak te, które można wykonywać w bibliotece NumPy, dostarcza metod przydatnych w uczeniu maszynowym, z których chyba najbardziej charakterystyczną jest automatyczne różniczkowanie (moduł `autograd`).\n",
"\n",
"Ale o tym później."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Instalacja\n",
"\n",
" pip install torch torchvision\n",
"\n",
"lub\n",
"\n",
" conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tensory"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Podstawowym typem danych dla pakietu `pytorch` jest tensor (`torch.tensor`). Tensor to uogólnienie macierzy na dowolną liczbę wymiarów. Można powiedzieć, że macierze są dwuwymiarowymi tensorami."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1, 2, 3],\n",
" [4, 5, 6],\n",
" [7, 8, 9]])\n"
]
}
],
"source": [
"import torch\n",
"\n",
"x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
"print(x)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Operacje na tensorach"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Działania na tensorach w bibliotece PyTorch wykonuje się bardzo podobnie do działań na miacierzach w bibliotece NumPy. Czasami nazwy metod się trochę różnią."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([3, 3])\n",
"torch.Size([3, 3])\n"
]
}
],
"source": [
"# Wymiary (rozmiar) tensora\n",
"\n",
"print(x.shape)\n",
"print(x.size()) # Można użyć `size()` zamiast `shape`\n"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([1, 2, 3]) - type: torch.int64\n",
"tensor([0.1000, 0.2000, 0.3000]) - type: torch.float32\n",
"tensor([1., 2., 3.], dtype=torch.float64) - type: torch.float64\n"
]
}
],
"source": [
"# Typy elementów\n",
"\n",
"x = torch.tensor([1, 2, 3])\n",
"print(x, \"- type:\", x.dtype)\n",
"\n",
"x = torch.tensor([0.1, 0.2, 0.3])\n",
"print(x, \"- type:\", x.dtype)\n",
"\n",
"x = torch.tensor([1, 2, 3], dtype=torch.float64) # Uwaga: inaczej niż w NumPy\n",
"print(x, \"- type:\", x.dtype)\n"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0., 0., 0., 0.],\n",
" [0., 0., 0., 0.],\n",
" [0., 0., 0., 0.]])\n"
]
}
],
"source": [
"x = torch.zeros([3, 4])\n",
"print(x)\n"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1., 1., 1., 1.],\n",
" [1., 1., 1., 1.],\n",
" [1., 1., 1., 1.]])\n"
]
}
],
"source": [
"x = torch.ones([3, 4])\n",
"print(x)\n"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.9863, 0.9173, 0.5301, 0.4279],\n",
" [0.7708, 0.4671, 0.2965, 0.0578],\n",
" [0.6684, 0.4432, 0.9817, 0.1521]])\n"
]
}
],
"source": [
"x = torch.rand([3, 4])\n",
"print(x)\n"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Wiersz 0:\n",
"0.9863188862800598\n",
"0.917273998260498\n",
"0.53009432554245\n",
"0.42788761854171753\n",
"\n",
"Wiersz 1:\n",
"0.7708230018615723\n",
"0.46713775396347046\n",
"0.2964947819709778\n",
"0.057803571224212646\n",
"\n",
"Wiersz 2:\n",
"0.6684107780456543\n",
"0.4432327151298523\n",
"0.9817106127738953\n",
"0.15205740928649902\n"
]
}
],
"source": [
"# Iterowanie po elementach tensora\n",
"\n",
"for i, row in enumerate(x):\n",
" print(f\"\\nWiersz {i}:\")\n",
" for element in row:\n",
" print(\n",
" element.item()\n",
" ) # `item()` zamienia jednoelementowy (bezwymiarowy) tensor na liczbę\n"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.1572, 0.4592, 0.7481, 0.6673],\n",
" [0.1138, 0.9820, 0.4452, 0.5775],\n",
" [0.7510, 0.3174, 0.6937, 0.8904]])\n",
"tensor([[0.0661, 0.6596, 0.7498, 0.5254],\n",
" [0.3271, 0.8968, 0.3188, 0.9255],\n",
" [0.2099, 0.5828, 0.4611, 0.6856]])\n",
"tensor([[0.0738, 0.3776],\n",
" [0.2646, 0.5449],\n",
" [0.6779, 0.0567],\n",
" [0.0348, 0.3072]])\n"
]
}
],
"source": [
"# Przykładowe macierze\n",
"\n",
"A = torch.rand([3, 4])\n",
"print(A)\n",
"\n",
"B = torch.rand([3, 4])\n",
"print(B)\n",
"\n",
"C = torch.rand([4, 2])\n",
"print(C)\n"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.2232, 1.1188, 1.4978, 1.1928],\n",
" [0.4409, 1.8788, 0.7640, 1.5029],\n",
" [0.9609, 0.9002, 1.1548, 1.5760]])\n",
"tensor([[ 0.0911, -0.2005, -0.0017, 0.1419],\n",
" [-0.2133, 0.0853, 0.1264, -0.3480],\n",
" [ 0.5410, -0.2654, 0.2326, 0.2048]])\n",
"tensor([[0.0104, 0.3029, 0.5609, 0.3506],\n",
" [0.0372, 0.8807, 0.1419, 0.5344],\n",
" [0.1577, 0.1850, 0.3199, 0.6105]])\n",
"tensor([[2.3777, 0.6961, 0.9977, 1.2701],\n",
" [0.3478, 1.0951, 1.3966, 0.6240],\n",
" [3.5770, 0.5446, 1.5044, 1.2988]])\n"
]
}
],
"source": [
"# Działania \"element po elemencie\"\n",
"\n",
"print(A + B)\n",
"print(A - B)\n",
"print(A * B)\n",
"print(A / B)\n"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.6635, 0.5570],\n",
" [0.5902, 0.7807],\n",
" [0.6406, 0.7694]])\n"
]
}
],
"source": [
"# Mnożenie macierzowe\n",
"\n",
"print(torch.matmul(A, C))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Konwersja między PyTorch i NumPy"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.1572, 0.4592, 0.7481, 0.6673],\n",
" [0.1138, 0.9820, 0.4452, 0.5775],\n",
" [0.7510, 0.3174, 0.6937, 0.8904]])\n",
"[[0.15715027 0.45915365 0.7480644 0.66733134]\n",
" [0.11377418 0.98203135 0.4451999 0.5774748 ]\n",
" [0.7509776 0.3174067 0.69367564 0.8904279 ]]\n"
]
}
],
"source": [
"# Konwersja z PyTorch do NumPy\n",
"\n",
"print(A)\n",
"\n",
"A_numpy = A.numpy()\n",
"print(A_numpy)\n"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0.84580006 0.49270934 0.67969751 0.27546956 0.10600392]\n",
" [0.84610871 0.11680263 0.3535065 0.83725955 0.07995571]\n",
" [0.4586334 0.64818257 0.53201793 0.77786372 0.8584107 ]]\n",
"tensor([[0.8458, 0.4927, 0.6797, 0.2755, 0.1060],\n",
" [0.8461, 0.1168, 0.3535, 0.8373, 0.0800],\n",
" [0.4586, 0.6482, 0.5320, 0.7779, 0.8584]], dtype=torch.float64)\n"
]
}
],
"source": [
"# Konwersja z numpy do PyTorch\n",
"\n",
"X = np.random.rand(3, 5)\n",
"print(X)\n",
"\n",
"X_pytorch = torch.from_numpy(X)\n",
"print(X_pytorch)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Przydatne materiały\n",
"\n",
" * NumPy - dokumentacja: https://numpy.org/doc/stable\n",
" * PyTorch - dokumentacja: https://pytorch.org/docs/stable"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"livereveal": {
"start_slideshow_at": "selected",
"theme": "amu"
},
"vscode": {
"interpreter": {
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}