{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Uczenie maszynowe — laboratoria\n", "# 1a. Podstawowe narzędzia uczenia maszynowego" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Elementy języka Python przydatne w uczeniu maszynowym" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Listy składane (*List comprehension*)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Przypuśćmy, że mamy dane zdanie i chcemy utworzyć listę, która będzie zawierać długości kolejnych wyrazów tego zdania. Możemy to zrobić w następujący sposób:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 4, 7, 3, 4, 1, 4, 3, 4, 1, 4, 7, 6, 4]\n" ] } ], "source": [ "zdanie = 'tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł'\n", "wyrazy = zdanie.split()\n", "dlugosci_wyrazow = []\n", "for wyraz in wyrazy:\n", " dlugosci_wyrazow.append(len(wyraz))\n", " \n", "print(dlugosci_wyrazow)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Możemy to też zrobić bardziej „pythonicznie”, przy użyciu list składanych:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 4, 7, 3, 4, 1, 4, 3, 4, 1, 4, 7, 6, 4]\n" ] } ], "source": [ "zdanie = 'tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł'\n", "wyrazy = zdanie.split()\n", "dlugosci_wyrazow = [len(wyraz) for wyraz in wyrazy]\n", "\n", "print(dlugosci_wyrazow)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Jeżeli chcemy, żeby był sprawdzany dodatkowy warunek, np. 
chcemy pomijać wyraz „takt”, to wciąż możemy użyć list składanych:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5, 4, 7, 3, 1, 3, 1, 7, 6, 4]\n" ] } ], "source": [ "zdanie = 'tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł'\n", "wyrazy = zdanie.split()\n", "\n", "# Ta konstrukcja:\n", "dlugosci_wyrazow = []\n", "for wyraz in wyrazy:\n", "    if wyraz != 'takt':\n", "        dlugosci_wyrazow.append(len(wyraz))\n", "\n", "# ...jest równoważna tej jednolinijkowej:\n", "dlugosci_wyrazow = [len(wyraz) for wyraz in wyrazy if wyraz != 'takt']\n", "\n", "print(dlugosci_wyrazow)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indeksowanie" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wszystkie listy i krotki w Pythonie, w tym łańcuchy (które traktowane są jak krotki znaków), są indeksowane od 0:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a\n", "e\n" ] } ], "source": [ "napis = 'abcde'\n", "print(napis[0]) # 'a'\n", "print(napis[4]) # 'e'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeksy możemy liczyć również „od końca”:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "e\n", "d\n", "a\n" ] } ], "source": [ "napis = 'abcde'\n", "print(napis[-1]) # 'e' („ostatni”)\n", "print(napis[-2]) # 'd' („drugi od końca”)\n", "print(napis[-5]) # 'a' („piąty od końca”)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Łańcuchy możemy też „kroić na plasterki” (*slicing*):" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bcd\n", "b\n", "cd\n", "bcd\n", "de\n", "abc\n", "abcde\n" ] } ], "source": [ "napis = 'abcde'\n", "print(napis[1:4]) # 'bcd' („znaki od 1. 
włącznie do 4. wyłącznie”)\n", "print(napis[1:2]) # 'b' (to samo co `napis[1]`)\n", "print(napis[-3:-1]) # 'cd' (kroić można też stosując indeksowanie od końca)\n", "print(napis[1:-1]) # 'bcd' (możemy nawet mieszać te dwa sposoby indeksowania)\n", "print(napis[3:]) # 'de' (jeżeli koniec przedziału nie jest podany, to kroimy do samego końca łańcucha)\n", "print(napis[:3]) # 'abc' (jeżeli początek przedziału nie jest podany, to kroimy od początku łańcucha)\n", "print(napis[:]) # 'abcde' (kopia całego napisu)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Biblioteka _NumPy_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tablice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Głównym obiektem w NumPy jest **jednorodna**, **wielowymiarowa** tablica. Przykładem takiej tablicy jest macierz `x`.\n", "\n", "Macierz $x =\n", " \\begin{pmatrix}\n", " 1 & 2 & 3 \\\\\n", " 4 & 5 & 6 \\\\\n", " 7 & 8 & 9\n", " \\end{pmatrix}$\n", "można zapisać jako:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]\n", " [7 8 9]]\n" ] } ], "source": [ "import numpy as np\n", "\n", "x = np.array([[1,2,3],[4,5,6],[7,8,9]])\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Najczęściej używane atrybuty i metody tablic typu `array`:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3, 3)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.shape" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([12, 15, 18])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.sum(axis=0)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2., 5., 8.])" ] }, 
"execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.mean(axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do tworzenia sekwencji liczbowych jako obiekty typu `array` należy wykorzystać funkcję `arange`." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(10)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5, 6, 7, 8, 9])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(5, 10)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5. , 5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(5, 10, 0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kształt tablicy można zmienić za pomocą metody `reshape`:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1 2 3 4 5 6 7 8 9 10 11 12]\n", "[[ 1 2 3 4]\n", " [ 5 6 7 8]\n", " [ 9 10 11 12]]\n" ] } ], "source": [ "x = np.arange(1, 13)\n", "print(x)\n", "y = x.reshape(3, 4)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Funkcją podobną do `arange` jest `linspace`, która wypełnia wektor określoną liczbą elementów z przedziału o równych automatycznie obliczonych odstępach (w `arange` należy podać rozmiar kroku):" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 1.25 2.5 3.75 5. 
]\n" ] } ], "source": [ "x = np.linspace(0, 5, 5)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dodatkowe informacje o funkcjach NumPy uzyskuje się za pomocą polecenia `help(nazwa_funkcji)`:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function shape in module numpy:\n", "\n", "shape(a)\n", " Return the shape of an array.\n", " \n", " Parameters\n", " ----------\n", " a : array_like\n", " Input array.\n", " \n", " Returns\n", " -------\n", " shape : tuple of ints\n", " The elements of the shape tuple give the lengths of the\n", " corresponding array dimensions.\n", " \n", " See Also\n", " --------\n", " alen\n", " ndarray.shape : Equivalent array method.\n", " \n", " Examples\n", " --------\n", " >>> np.shape(np.eye(3))\n", " (3, 3)\n", " >>> np.shape([[1, 2]])\n", " (1, 2)\n", " >>> np.shape([0])\n", " (1,)\n", " >>> np.shape(0)\n", " ()\n", " \n", " >>> a = np.array([(1, 2), (3, 4)], dtype=[('x', 'i4'), ('y', 'i4')])\n", " >>> np.shape(a)\n", " (2,)\n", " >>> a.shape\n", " (2,)\n", "\n" ] } ], "source": [ "help(np.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tablice mogą składać się z danych różnych typów (ale tylko jednego typu danych równocześnie, stąd jednorodność)." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1 2 3] - typ: int32\n", "[0.1 0.2 0.3] - typ: float64\n", "[1. 2. 3.] 
- typ: float64\n" ] } ], "source": [ "x = np.array([1, 2, 3])\n", "print(x, \"- typ: \", x.dtype)\n", "\n", "x = np.array([0.1, 0.2, 0.3])\n", "print(x, \"- typ: \", x.dtype)\n", "\n", "x = np.array([1, 2, 3], dtype='float64')\n", "print(x, \"- typ: \", x.dtype)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tworzenie tablic składających się z samych zer lub jedynek umożliwiają funkcje `zeros` oraz `ones`:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 0. 0. 0.]\n", " [0. 0. 0. 0.]\n", " [0. 0. 0. 0.]]\n" ] } ], "source": [ "x = np.zeros([3,4])\n", "print(x)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 1. 1. 1.]\n", " [1. 1. 1. 1.]\n", " [1. 1. 1. 1.]]\n" ] } ], "source": [ "x = np.ones([3,4])\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Podstawowe operacje arytmetyczne" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Operatory arytmetyczne na tablicach w NumPy działają **element po elemencie**." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2. 3. 
4.]\n" ] } ], "source": [ "import numpy as np\n", "\n", "a = np.array([3, 4, 5])\n", "b = np.ones(3)\n", "print(a - b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Za mnożenie macierzy odpowiadają funkcje `dot` i `matmul` (**nie** operator `*`):" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]]\n" ] } ], "source": [ "a = np.array([[1, 2], [3, 4]])\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2]\n", " [3 4]]\n" ] } ], "source": [ "b = np.array([[1, 2], [3, 4]])\n", "print(b)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 4],\n", " [ 9, 16]])" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a * b # mnożenie element po elemencie" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 7, 10],\n", " [15, 22]])" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(a,b) # mnożenie macierzowe" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[ 7, 10],\n", " [15, 22]])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.matmul(a,b) # mnożenie macierzowe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Przykłady innych operacji dodawania i mnożenia:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[5., 5.],\n", " [5., 5.]])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.zeros((2, 2), dtype='float')\n", "a += 5\n", "a" ] }, { "cell_type": "code", 
"execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[25., 25.],\n", " [25., 25.]])" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a *= 5\n", "a" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[50., 50.],\n", " [50., 50.]])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a + a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sklejanie tablic:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3, 4, 5, 6, 7, 8, 9])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([1, 2, 3])\n", "b = np.array([4, 5, 6])\n", "c = np.array([7, 8, 9])\n", "np.hstack([a, b, c])" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "array([[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]])" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.vstack([a, b, c])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Typowe funkcje matematyczne:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3.14159265, 4.44288294, 5.44139809, 6.28318531])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(1, 5)\n", "np.sqrt(x) * np.pi" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2**4" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 40, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "np.power(2, 4)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.log(np.e)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(5)\n", "x.max() - x.min()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indeksy i zakresy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tablice jednowymiarowe zachowują się przy indeksowaniu i krojeniu podobnie do zwykłych list pythonowych." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 3])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.arange(10)\n", "a[2:4]" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 2, 4, 6, 8])" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[:10:2]" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[::-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tablice wielowymiarowe mają po jednym indeksie na wymiar:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0, 1, 2, 3],\n", " [ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11]])" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.arange(12).reshape(3, 4)\n", "x" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": 
{ "text/plain": [ "11" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[2, 3]" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 5, 9])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[:, 1]" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([4, 5, 6, 7])" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1, :]" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 4, 5, 6, 7],\n", " [ 8, 9, 10, 11]])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[1:3, :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Warunki" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Warunki pozwalają na selekcję elementów tablicy." 
] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 2, 2, 3, 3, 3])" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])\n", "a[a > 1]" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([3, 3, 3])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a[a == 3]" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([0, 1, 2, 3, 4, 5], dtype=int64),)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.where(a < 3)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 1, 2, 3, 4, 5], dtype=int64)" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.where(a < 3)[0]" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([], dtype=int64),)" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.where(a > 9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pętle i wypisywanie" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0 1 2 3]\n", "[4 5 6 7]\n", "[ 8 9 10 11]\n" ] } ], "source": [ "for row in x:\n", " print(row)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n", "5\n", "6\n", "7\n", "8\n", "9\n", "10\n", "11\n" ] } ], "source": [ "for element in x.flat:\n", " print(element) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Liczby losowe" ] }, { "cell_type": 
"code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 0, 7, 3, 5])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.randint(0, 10, 5)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([-0.7907838 , -0.65971486, 0.0375355 , 2.00045956, 0.32631216])" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.normal(0, 1, 5)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1.50130054, 1.20710594, 0.45451505, 0.70098876, 0.90371663])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.uniform(0, 2, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Macierze" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy jest pakietem wykorzystywanym do obliczeń w dziedzinie algebry liniowej, co jest szczególnie przydatne w uczeniu maszynowym. 
\n", "\n", "Wektor o wymiarach $N \\times 1$\n", "$$\n", " x =\n", " \\begin{pmatrix}\n", " x_{1} \\\\\n", " x_{2} \\\\\n", " \\vdots \\\\\n", " x_{N}\n", " \\end{pmatrix} \n", "$$\n", "\n", "i jego transpozycję $x^\\top = (x_{1}, x_{2},\\ldots,x_{N})$ można wyrazić w Pythonie w następujący sposób:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3, 1)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "x = np.array([[1, 2, 3]]).T\n", "x.shape" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 3)" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xt = x.T\n", "xt.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Macierz kolumnowa** w NumPy.\n", "$$X =\n", " \\begin{pmatrix}\n", " 3 \\\\\n", " 4 \\\\\n", " 5 \\\\\n", " 6 \n", " \\end{pmatrix}$$" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3],\n", " [4],\n", " [5],\n", " [6]])" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([[3,4,5,6]]).T\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Macierz wierszowa** w NumPy.\n", "$$ X =\n", " \\begin{pmatrix}\n", " 3 & 4 & 5 & 6\n", " \\end{pmatrix}$$" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[3, 4, 5, 6]])" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([[3,4,5,6]])\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oprócz obiektów typu `array` istnieje wyspecjalizowany obiekt `matrix`, dla którego operacje `*` (mnożenie) oraz `**-1` (odwracanie) są określone w sposób właściwy dla macierzy (w przeciwieństwie do 
operacji elementowych dla obiektów `array`)." ] }, { "cell_type": "code", "execution_count": 158, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1 2 3]\n", " [4 5 6]\n", " [7 8 9]]\n" ] } ], "source": [ "x = np.array([1,2,3,4,5,6,7,8,9]).reshape(3,3)\n", "print(x)" ] }, { "cell_type": "code", "execution_count": 159, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[4 6 3]\n", " [8 7 1]\n", " [3 0 3]]\n" ] } ], "source": [ "y = np.array([4,6,3,8,7,1,3,0,3]).reshape(3,3)\n", "print(y)" ] }, { "cell_type": "code", "execution_count": 160, "metadata": {}, "outputs": [], "source": [ "X = np.matrix(x)\n", "Y = np.matrix(y)" ] }, { "cell_type": "code", "execution_count": 161, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 4 12 9]\n", " [32 35 6]\n", " [21 0 27]]\n" ] } ], "source": [ "print(x * y) # Tablice np.array mnożone są element po elemencie" ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 29 20 14]\n", " [ 74 59 35]\n", " [119 98 56]]\n" ] } ], "source": [ "print(X * Y) # Macierze np.matrix mnożone są macierzowo" ] }, { "cell_type": "code", "execution_count": 164, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 29 20 14]\n", " [ 74 59 35]\n", " [119 98 56]]\n" ] } ], "source": [ "print(np.matmul(x, y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Wyznacznik macierzy**" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "33.000000000000014" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.array([[3,-9],[2,5]])\n", "np.linalg.det(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Macierz odwrotna**" ] }, { "cell_type": "code", "execution_count": 60, "metadata": 
{}, "outputs": [ { "data": { "text/plain": [ "array([[-4, -2],\n", " [ 5, 5]])" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = np.array([[-4,-2],[5,5]])\n", "A" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.5, -0.2],\n", " [ 0.5, 0.4]])" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invA = np.linalg.inv(A)\n", "invA" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0.],\n", " [0., 1.]])" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.round(np.dot(A, invA))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(ponieważ $AA^{-1} = A^{-1}A = I$)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Wartości i wektory własne**" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1, 0, 0],\n", " [0, 2, 0],\n", " [0, 0, 3]])" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = np.diag((1, 2, 3))\n", "a" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 2. 3.]\n", "[[1. 0. 0.]\n", " [0. 1. 0.]\n", " [0. 0. 1.]]\n" ] } ], "source": [ "w, v = np.linalg.eig(a)\n", "print(w) # wartości własne\n", "print(v) # wektory własne" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Biblioteka PyTorch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Biblioteka PyTorch została stworzona z myślą o uczeniu maszynowym. 
Oprócz wykonywania rozmaitych działań matematycznych takich jak te, które można wykonywać w bibliotece NumPy, dostarcza metod przydatnych w uczeniu maszynowym, z których chyba najbardziej charakterystyczną jest automatyczne różniczkowanie (moduł `autograd`).\n", "\n", "Ale o tym później." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Instalacja\n", "\n", "    pip install torch torchvision\n", "\n", "lub\n", "\n", "    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tensory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Podstawowym typem danych w bibliotece PyTorch jest tensor (`torch.Tensor`). Tensor to uogólnienie macierzy na dowolną liczbę wymiarów. Można powiedzieć, że macierze są dwuwymiarowymi tensorami." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]])\n" ] } ], "source": [ "import torch\n", "\n", "x = torch.tensor([[1,2,3],[4,5,6],[7,8,9]])\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Operacje na tensorach" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Działania na tensorach w bibliotece PyTorch wykonuje się bardzo podobnie do działań na macierzach w bibliotece NumPy. Czasami nazwy metod się trochę różnią." 
] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([3, 3])\n", "torch.Size([3, 3])\n" ] } ], "source": [ "# Wymiary (rozmiar) tensora\n", "\n", "print(x.shape)\n", "print(x.size()) # Można użyć `size()` zamiast `shape`" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([1, 2, 3]) - type: torch.int64\n", "tensor([0.1000, 0.2000, 0.3000]) - type: torch.float32\n", "tensor([1., 2., 3.], dtype=torch.float64) - type: torch.float64\n" ] } ], "source": [ "# Typy elementów\n", "\n", "x = torch.tensor([1, 2, 3])\n", "print(x, \"- type:\", x.dtype)\n", "\n", "x = torch.tensor([0.1, 0.2, 0.3])\n", "print(x, \"- type:\", x.dtype)\n", "\n", "x = torch.tensor([1, 2, 3], dtype=torch.float64) # Uwaga: inaczej niż w NumPy\n", "print(x, \"- type:\", x.dtype)\n" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0., 0., 0., 0.],\n", " [0., 0., 0., 0.],\n", " [0., 0., 0., 0.]])\n" ] } ], "source": [ "x = torch.zeros([3,4])\n", "print(x)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1., 1., 1., 1.],\n", " [1., 1., 1., 1.],\n", " [1., 1., 1., 1.]])\n" ] } ], "source": [ "x = torch.ones([3,4])\n", "print(x)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.9863, 0.9173, 0.5301, 0.4279],\n", " [0.7708, 0.4671, 0.2965, 0.0578],\n", " [0.6684, 0.4432, 0.9817, 0.1521]])\n" ] } ], "source": [ "x = torch.rand([3,4])\n", "print(x)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Wiersz 0:\n", "0.9863188862800598\n", 
"0.917273998260498\n", "0.53009432554245\n", "0.42788761854171753\n", "\n", "Wiersz 1:\n", "0.7708230018615723\n", "0.46713775396347046\n", "0.2964947819709778\n", "0.057803571224212646\n", "\n", "Wiersz 2:\n", "0.6684107780456543\n", "0.4432327151298523\n", "0.9817106127738953\n", "0.15205740928649902\n" ] } ], "source": [ "# Iterowanie po elementach tensora\n", "\n", "for i, row in enumerate(x):\n", "    print(f\"\\nWiersz {i}:\")\n", "    for element in row:\n", "        print(element.item()) # `item()` zamienia jednoelementowy (bezwymiarowy) tensor na liczbę" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.1572, 0.4592, 0.7481, 0.6673],\n", " [0.1138, 0.9820, 0.4452, 0.5775],\n", " [0.7510, 0.3174, 0.6937, 0.8904]])\n", "tensor([[0.0661, 0.6596, 0.7498, 0.5254],\n", " [0.3271, 0.8968, 0.3188, 0.9255],\n", " [0.2099, 0.5828, 0.4611, 0.6856]])\n", "tensor([[0.0738, 0.3776],\n", " [0.2646, 0.5449],\n", " [0.6779, 0.0567],\n", " [0.0348, 0.3072]])\n" ] } ], "source": [ "# Przykładowe macierze\n", "\n", "A = torch.rand([3, 4])\n", "print(A)\n", "\n", "B = torch.rand([3, 4])\n", "print(B)\n", "\n", "C = torch.rand([4, 2])\n", "print(C)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.2232, 1.1188, 1.4978, 1.1928],\n", " [0.4409, 1.8788, 0.7640, 1.5029],\n", " [0.9609, 0.9002, 1.1548, 1.5760]])\n", "tensor([[ 0.0911, -0.2005, -0.0017, 0.1419],\n", " [-0.2133, 0.0853, 0.1264, -0.3480],\n", " [ 0.5410, -0.2654, 0.2326, 0.2048]])\n", "tensor([[0.0104, 0.3029, 0.5609, 0.3506],\n", " [0.0372, 0.8807, 0.1419, 0.5344],\n", " [0.1577, 0.1850, 0.3199, 0.6105]])\n", "tensor([[2.3777, 0.6961, 0.9977, 1.2701],\n", " [0.3478, 1.0951, 1.3966, 0.6240],\n", " [3.5770, 0.5446, 1.5044, 1.2988]])\n" ] } ], "source": [ "# Działania \"element po elemencie\"\n", "\n", "print(A + B)\n", "print(A - B)\n", 
"print(A * B)\n", "print(A / B)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.6635, 0.5570],\n", " [0.5902, 0.7807],\n", " [0.6406, 0.7694]])\n" ] } ], "source": [ "# Mnożenie macierzowe\n", "\n", "print(torch.matmul(A, C))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Konwersja między PyTorch i NumPy" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.1572, 0.4592, 0.7481, 0.6673],\n", " [0.1138, 0.9820, 0.4452, 0.5775],\n", " [0.7510, 0.3174, 0.6937, 0.8904]])\n", "[[0.15715027 0.45915365 0.7480644 0.66733134]\n", " [0.11377418 0.98203135 0.4451999 0.5774748 ]\n", " [0.7509776 0.3174067 0.69367564 0.8904279 ]]\n" ] } ], "source": [ "# Konwersja z PyTorch do NumPy\n", "\n", "print(A)\n", "\n", "A_numpy = A.numpy()\n", "print(A_numpy)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.84580006 0.49270934 0.67969751 0.27546956 0.10600392]\n", " [0.84610871 0.11680263 0.3535065 0.83725955 0.07995571]\n", " [0.4586334 0.64818257 0.53201793 0.77786372 0.8584107 ]]\n", "tensor([[0.8458, 0.4927, 0.6797, 0.2755, 0.1060],\n", " [0.8461, 0.1168, 0.3535, 0.8373, 0.0800],\n", " [0.4586, 0.6482, 0.5320, 0.7779, 0.8584]], dtype=torch.float64)\n" ] } ], "source": [ "# Konwersja z numpy do PyTorch\n", "\n", "X = np.random.rand(3, 5)\n", "print(X)\n", "\n", "X_pytorch = torch.from_numpy(X)\n", "print(X_pytorch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Przydatne materiały\n", "\n", " * NumPy - dokumentacja: https://numpy.org/doc/stable\n", " * PyTorch - dokumentacja: https://pytorch.org/docs/stable" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { 
"codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" }, "livereveal": { "start_slideshow_at": "selected", "theme": "amu" } }, "nbformat": 4, "nbformat_minor": 4 }