{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Machine learning: labs\n",
"# 1a. Basic machine learning tools"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Python language features useful in machine learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### List comprehensions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Suppose we are given a sentence and want to build a list containing the lengths of its consecutive words. We can do it as follows:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5, 4, 7, 3, 4, 1, 4, 3, 4, 1, 4, 7, 6, 4]\n"
]
}
],
"source": [
"# Sample text is a Polish tongue twister; zdanie = sentence, wyrazy = words, dlugosci_wyrazow = word lengths\n",
"zdanie = 'tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł'\n",
"wyrazy = zdanie.split()\n",
"dlugosci_wyrazow = []\n",
"for wyraz in wyrazy:\n",
"    dlugosci_wyrazow.append(len(wyraz))\n",
"\n",
"print(dlugosci_wyrazow)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also do it in a more Pythonic way, using a list comprehension:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5, 4, 7, 3, 4, 1, 4, 3, 4, 1, 4, 7, 6, 4]\n"
]
}
],
"source": [
"zdanie = 'tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł'\n",
"wyrazy = zdanie.split()\n",
"dlugosci_wyrazow = [len(wyraz) for wyraz in wyrazy]\n",
"\n",
"print(dlugosci_wyrazow)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we want an additional condition to be checked, e.g. we want to skip the word `takt`, we can still use a list comprehension:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5, 4, 7, 3, 1, 3, 1, 7, 6, 4]\n"
]
}
],
"source": [
"zdanie = 'tracz tarł tarcicę tak takt w takt jak takt w takt tarcicę tartak tarł'\n",
"wyrazy = zdanie.split()\n",
"\n",
"# This construct:\n",
"dlugosci_wyrazow = []\n",
"for wyraz in wyrazy:\n",
"    if wyraz != 'takt':\n",
"        dlugosci_wyrazow.append(len(wyraz))\n",
"\n",
"# ...is equivalent to this one-liner:\n",
"dlugosci_wyrazow = [len(wyraz) for wyraz in wyrazy if wyraz != 'takt']\n",
"\n",
"print(dlugosci_wyrazow)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indexing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All sequences in Python, such as lists, tuples, and strings (which behave like immutable sequences of characters), are indexed from 0:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a\n",
"e\n"
]
}
],
"source": [
"napis = 'abcde'\n",
"print(napis[0]) # 'a'\n",
"print(napis[4]) # 'e'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indices can also be counted from the end:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"e\n",
"d\n",
"a\n"
]
}
],
"source": [
"napis = 'abcde'\n",
"print(napis[-1]) # 'e' (the last one)\n",
"print(napis[-2]) # 'd' (second from the end)\n",
"print(napis[-5]) # 'a' (fifth from the end)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Strings can also be cut into slices (*slicing*):"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bcd\n",
"b\n",
"cd\n",
"bcd\n",
"de\n",
"abc\n",
"abcde\n"
]
}
],
"source": [
"napis = 'abcde'\n",
"print(napis[1:4]) # 'bcd' (characters from index 1 inclusive to index 4 exclusive)\n",
"print(napis[1:2]) # 'b' (same as `napis[1]`)\n",
"print(napis[-3:-1]) # 'cd' (slicing also works with indices counted from the end)\n",
"print(napis[1:-1]) # 'bcd' (the two ways of indexing can even be mixed)\n",
"print(napis[3:]) # 'de' (if the end of the range is omitted, the slice runs to the end of the string)\n",
"print(napis[:3]) # 'abc' (if the start of the range is omitted, the slice starts at the beginning of the string)\n",
"print(napis[:]) # 'abcde' (a copy of the whole string)"
]
},
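{
"cell_type": "markdown",
"metadata": {},
"source": [
"A slice may also take a third component, the step. The short sketch below reuses the `napis` string from the cells above to show a step of 2 and a negative step, which walks through the string backwards:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"napis = 'abcde'\n",
"print(napis[::2])   # 'ace' (every second character, starting from index 0)\n",
"print(napis[1::2])  # 'bd' (every second character, starting from index 1)\n",
"print(napis[::-1])  # 'edcba' (negative step: the string reversed)"
]
},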
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The _NumPy_ library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Arrays"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main object in NumPy is a **homogeneous**, **multidimensional** array. An example of such an array is the matrix `x`.\n",
"\n",
"The matrix $x =\n",
" \\begin{pmatrix}\n",
" 1 & 2 & 3 \\\\\n",
" 4 & 5 & 6 \\\\\n",
" 7 & 8 & 9\n",
" \\end{pmatrix}$\n",
"can be written as:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2 3]\n",
" [4 5 6]\n",
" [7 8 9]]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"x = np.array([[1,2,3],[4,5,6],[7,8,9]])\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The most commonly used attributes and methods of `array` objects:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 3)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x.shape"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([12, 15, 18])"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x.sum(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2., 5., 8.])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x.mean(axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To create numeric sequences as `array` objects, use the `arange` function."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.arange(10)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5, 6, 7, 8, 9])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.arange(5, 10)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5. , 5.5, 6. , 6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.arange(5, 10, 0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The shape of an array can be changed with the `reshape` method:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1 2 3 4 5 6 7 8 9 10 11 12]\n",
"[[ 1 2 3 4]\n",
" [ 5 6 7 8]\n",
" [ 9 10 11 12]]\n"
]
}
],
"source": [
"x = np.arange(1, 13)\n",
"print(x)\n",
"y = x.reshape(3, 4)\n",
"print(y)"
]
},
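{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small aside, one of the dimensions passed to `reshape` can be given as `-1`, in which case NumPy infers it from the total number of elements. A minimal sketch using the same 12-element array as above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"x = np.arange(1, 13)\n",
"y = x.reshape(3, -1)  # the second dimension is inferred: 12 / 3 = 4\n",
"print(y.shape)\n",
"z = y.reshape(-1)     # back to a one-dimensional array\n",
"print(z.shape)"
]
},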
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A function similar to `arange` is `linspace`, which fills a vector with a given number of evenly spaced elements from an interval; the spacing is computed automatically (in `arange` the step size has to be given explicitly):"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0. 1.25 2.5 3.75 5. ]\n"
]
}
],
"source": [
"x = np.linspace(0, 5, 5)\n",
"print(x)"
]
},
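{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between the two functions is easiest to see side by side. The sketch below (the interval from 0 to 1 is an arbitrary choice) shows that `arange` excludes the stop value, while `linspace` includes it by default unless `endpoint=False` is passed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# arange: we choose the step; the stop value is excluded\n",
"print(np.arange(0, 1, 0.25))\n",
"# linspace: we choose the number of points; the stop value is included by default\n",
"print(np.linspace(0, 1, 5))\n",
"# endpoint=False makes linspace exclude the stop value as well\n",
"print(np.linspace(0, 1, 4, endpoint=False))"
]
},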
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Additional information about NumPy functions can be obtained with `help(function_name)`:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function shape in module numpy:\n",
"\n",
"shape(a)\n",
" Return the shape of an array.\n",
" \n",
" Parameters\n",
" ----------\n",
" a : array_like\n",
" Input array.\n",
" \n",
" Returns\n",
" -------\n",
" shape : tuple of ints\n",
" The elements of the shape tuple give the lengths of the\n",
" corresponding array dimensions.\n",
" \n",
" See Also\n",
" --------\n",
" alen\n",
" ndarray.shape : Equivalent array method.\n",
" \n",
" Examples\n",
" --------\n",
" >>> np.shape(np.eye(3))\n",
" (3, 3)\n",
" >>> np.shape([[1, 2]])\n",
" (1, 2)\n",
" >>> np.shape([0])\n",
" (1,)\n",
" >>> np.shape(0)\n",
" ()\n",
" \n",
" >>> a = np.array([(1, 2), (3, 4)], dtype=[('x', 'i4'), ('y', 'i4')])\n",
" >>> np.shape(a)\n",
" (2,)\n",
" >>> a.shape\n",
" (2,)\n",
"\n"
]
}
],
"source": [
"help(np.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Arrays can hold data of various types (but only one data type at a time, hence the homogeneity)."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 2 3] - type: int32\n",
"[0.1 0.2 0.3] - type: float64\n",
"[1. 2. 3.] - type: float64\n"
]
}
],
"source": [
"x = np.array([1, 2, 3])\n",
"print(x, \"- type:\", x.dtype)\n",
"\n",
"x = np.array([0.1, 0.2, 0.3])\n",
"print(x, \"- type:\", x.dtype)\n",
"\n",
"x = np.array([1, 2, 3], dtype='float64')\n",
"print(x, \"- type:\", x.dtype)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The functions `zeros` and `ones` create arrays filled entirely with zeros or ones:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0. 0. 0. 0.]\n",
" [0. 0. 0. 0.]\n",
" [0. 0. 0. 0.]]\n"
]
}
],
"source": [
"x = np.zeros([3,4])\n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1. 1. 1. 1.]\n",
" [1. 1. 1. 1.]\n",
" [1. 1. 1. 1.]]\n"
]
}
],
"source": [
"x = np.ones([3,4])\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Basic arithmetic operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Arithmetic operators on NumPy arrays work **element by element**."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2. 3. 4.]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([3, 4, 5])\n",
"b = np.ones(3)\n",
"print(a - b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Matrix multiplication is performed by the functions `dot` and `matmul` (**not** by the `*` operator):"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2]\n",
" [3 4]]\n"
]
}
],
"source": [
"a = np.array([[1, 2], [3, 4]])\n",
"print(a)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2]\n",
" [3 4]]\n"
]
}
],
"source": [
"b = np.array([[1, 2], [3, 4]])\n",
"print(b)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1, 4],\n",
" [ 9, 16]])"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a * b # element-wise multiplication"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 7, 10],\n",
" [15, 22]])"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.dot(a,b) # matrix multiplication"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 7, 10],\n",
" [15, 22]])"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.matmul(a,b) # matrix multiplication"
]
},
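{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to the two functions above, Python 3.5 and later offer the `@` operator, which for NumPy arrays performs matrix multiplication. The sketch below rebuilds the same two matrices and checks that `@` agrees with `np.matmul`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([[1, 2], [3, 4]])\n",
"b = np.array([[1, 2], [3, 4]])\n",
"\n",
"print(a @ b)  # matrix multiplication written with an operator\n",
"print(np.array_equal(a @ b, np.matmul(a, b)))  # same result as np.matmul"
]
},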
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Examples of other addition and multiplication operations:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[5., 5.],\n",
" [5., 5.]])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.zeros((2, 2), dtype='float')\n",
"a += 5\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[25., 25.],\n",
" [25., 25.]])"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a *= 5\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[50., 50.],\n",
" [50., 50.]])"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a + a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Stacking arrays:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([1, 2, 3])\n",
"b = np.array([4, 5, 6])\n",
"c = np.array([7, 8, 9])\n",
"np.hstack([a, b, c])"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 2, 3],\n",
" [4, 5, 6],\n",
" [7, 8, 9]])"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.vstack([a, b, c])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Typical mathematical functions:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3.14159265, 4.44288294, 5.44139809, 6.28318531])"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.arange(1, 5)\n",
"np.sqrt(x) * np.pi"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2**4"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.power(2, 4)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.log(np.e)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.arange(5)\n",
"x.max() - x.min()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indices and ranges"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One-dimensional arrays behave much like ordinary Python lists."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 3])"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.arange(10)\n",
"a[2:4]"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 2, 4, 6, 8])"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[:10:2]"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[::-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multidimensional arrays have one index per dimension:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1, 2, 3],\n",
" [ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11]])"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.arange(12).reshape(3, 4)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"11"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[2, 3]"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 5, 9])"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[:, 1]"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([4, 5, 6, 7])"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[1, :]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11]])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x[1:3, :]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conditions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conditions make it possible to select elements of an array."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 2, 2, 3, 3, 3])"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])\n",
"a[a > 1]"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 3, 3])"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a[a == 3]"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([0, 1, 2, 3, 4, 5], dtype=int64),)"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.where(a < 3)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4, 5], dtype=int64)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.where(a < 3)[0]"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(array([], dtype=int64),)"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.where(a > 9)"
]
},
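{
"cell_type": "markdown",
"metadata": {},
"source": [
"Conditions can also be combined. The sketch below reuses the array `a` from this section and shows the element-wise operators `&` (and) and `|` (or), as well as assignment through a condition; the copy `b` is introduced here only for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])\n",
"mask = (a > 1) & (a < 3)  # element-wise AND; the parentheses are required\n",
"print(mask)\n",
"print(a[mask])\n",
"print(a[(a == 1) | (a == 3)])  # element-wise OR\n",
"\n",
"b = a.copy()\n",
"b[b == 3] = 0  # a condition can also be used for assignment\n",
"print(b)"
]
},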
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loops and printing"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 1 2 3]\n",
"[4 5 6 7]\n",
"[ 8 9 10 11]\n"
]
}
],
"source": [
"for row in x:\n",
"    print(row)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"1\n",
"2\n",
"3\n",
"4\n",
"5\n",
"6\n",
"7\n",
"8\n",
"9\n",
"10\n",
"11\n"
]
}
],
"source": [
"for element in x.flat:\n",
"    print(element)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Random numbers"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 0, 7, 3, 5])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.randint(0, 10, 5)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([-0.7907838 , -0.65971486, 0.0375355 , 2.00045956, 0.32631216])"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.normal(0, 1, 5) "
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1.50130054, 1.20710594, 0.45451505, 0.70098876, 0.90371663])"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.uniform(0, 2, 5)"
]
},
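{
"cell_type": "markdown",
"metadata": {},
"source": [
"The values printed above change on every run. If reproducible results are needed, the generator can be seeded. The sketch below uses `np.random.default_rng` (available in NumPy 1.17 and later); the seed value 42 is an arbitrary choice for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng1 = np.random.default_rng(42)\n",
"rng2 = np.random.default_rng(42)\n",
"\n",
"# Two generators created with the same seed produce the same numbers\n",
"print(rng1.integers(0, 10, 5))\n",
"print(rng2.integers(0, 10, 5))\n",
"print(rng1.normal(0, 1, 3))"
]
},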
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Matrices"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy is a package used for linear algebra computations, which makes it particularly useful in machine learning.\n",
"\n",
"A vector of dimensions $N \\times 1$\n",
"$$\n",
" x =\n",
" \\begin{pmatrix}\n",
" x_{1} \\\\\n",
" x_{2} \\\\\n",
" \\vdots \\\\\n",
" x_{N}\n",
" \\end{pmatrix} \n",
"$$\n",
"\n",
"and its transpose $x^\\top = (x_{1}, x_{2},\\ldots,x_{N})$ can be expressed in Python as follows:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 1)"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"x = np.array([[1, 2, 3]]).T\n",
"x.shape"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(1, 3)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xt = x.T\n",
"xt.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A **column matrix** in NumPy:\n",
"$$X =\n",
" \\begin{pmatrix}\n",
" 3 \\\\\n",
" 4 \\\\\n",
" 5 \\\\\n",
" 6 \n",
" \\end{pmatrix}$$"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[3],\n",
" [4],\n",
" [5],\n",
" [6]])"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.array([[3,4,5,6]]).T\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A **row matrix** in NumPy:\n",
"$$ X =\n",
" \\begin{pmatrix}\n",
" 3 & 4 & 5 & 6\n",
" \\end{pmatrix}$$"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[3, 4, 5, 6]])"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = np.array([[3,4,5,6]])\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides `array` objects there is a specialized `matrix` object, for which the operations `*` (multiplication) and `**-1` (inversion) are defined in the matrix sense (as opposed to the element-wise operations on `array` objects). Note that the NumPy documentation no longer recommends using `matrix`; regular arrays with `matmul` are the preferred approach."
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1 2 3]\n",
" [4 5 6]\n",
" [7 8 9]]\n"
]
}
],
"source": [
"x = np.array([1,2,3,4,5,6,7,8,9]).reshape(3,3)\n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[4 6 3]\n",
" [8 7 1]\n",
" [3 0 3]]\n"
]
}
],
"source": [
"y = np.array([4,6,3,8,7,1,3,0,3]).reshape(3,3)\n",
"print(y)"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {},
"outputs": [],
"source": [
"X = np.matrix(x)\n",
"Y = np.matrix(y)"
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 4 12 9]\n",
" [32 35 6]\n",
" [21 0 27]]\n"
]
}
],
"source": [
"print(x * y) # np.array arrays are multiplied element by element"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 29 20 14]\n",
" [ 74 59 35]\n",
" [119 98 56]]\n"
]
}
],
"source": [
"print(X * Y) # np.matrix objects are multiplied as matrices"
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 29 20 14]\n",
" [ 74 59 35]\n",
" [119 98 56]]\n"
]
}
],
"source": [
"print(np.matmul(x, y))"
]
},
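{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `**-1` operation mentioned above is not demonstrated in the preceding cells, so here is a minimal sketch. The matrix `M` is a made-up invertible example (the matrix `X` above is singular, so it cannot be inverted), and the result is compared with `np.linalg.inv` applied to a regular array:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"M = np.matrix([[2, 1], [1, 1]])  # hypothetical example matrix\n",
"print(M ** -1)                   # matrix inverse for np.matrix\n",
"\n",
"m = np.asarray(M)                # the same data as a regular array\n",
"print(np.linalg.inv(m))          # the usual way to invert an np.array\n",
"print(np.allclose(M ** -1, np.linalg.inv(m)))"
]
},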
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Matrix determinant**"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"33.000000000000014"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.array([[3,-9],[2,5]])\n",
"np.linalg.det(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Inverse matrix**"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-4, -2],\n",
" [ 5, 5]])"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A = np.array([[-4,-2],[5,5]])\n",
"A"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.5, -0.2],\n",
" [ 0.5, 0.4]])"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"invA = np.linalg.inv(A)\n",
"invA"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1., 0.],\n",
" [0., 1.]])"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.round(np.dot(A, invA))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(because $AA^{-1} = A^{-1}A = I$)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Eigenvalues and eigenvectors**"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 0, 0],\n",
" [0, 2, 0],\n",
" [0, 0, 3]])"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.diag((1, 2, 3))\n",
"a"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1. 2. 3.]\n",
"[[1. 0. 0.]\n",
" [0. 1. 0.]\n",
" [0. 0. 1.]]\n"
]
}
],
"source": [
"w, v = np.linalg.eig(a)\n",
"print(w) # eigenvalues\n",
"print(v) # eigenvectors (stored as columns)"
]
},
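{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a diagonal matrix the result can be read off directly. As an additional illustration, the sketch below takes a small symmetric matrix (chosen arbitrarily) and checks the defining property $Av = \\lambda v$ for every pair returned by `np.linalg.eig`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([[2.0, 1.0], [1.0, 2.0]])\n",
"w, v = np.linalg.eig(a)\n",
"\n",
"for i in range(len(w)):\n",
"    # v[:, i] is the eigenvector belonging to the eigenvalue w[i]\n",
"    print(w[i], np.allclose(a @ v[:, i], w[i] * v[:, i]))"
]
},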
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The PyTorch library"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The PyTorch library was created with machine learning in mind. Besides performing various mathematical operations such as those available in NumPy, it provides tools useful in machine learning, the most characteristic of which is probably automatic differentiation (the `autograd` module).\n",
"\n",
"But more on that later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Installation\n",
"\n",
"    pip install torch torchvision\n",
"\n",
"or\n",
"\n",
"    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tensors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The basic data type of the `torch` package is the tensor (class `torch.Tensor`, created e.g. with the `torch.tensor` function). A tensor is a generalization of a matrix to an arbitrary number of dimensions. One can say that matrices are two-dimensional tensors."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1, 2, 3],\n",
" [4, 5, 6],\n",
" [7, 8, 9]])\n"
]
}
],
"source": [
"import torch\n",
"\n",
"x = torch.tensor([[1,2,3],[4,5,6],[7,8,9]])\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Operations on tensors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Operations on tensors in PyTorch are performed very similarly to operations on arrays in NumPy. Sometimes the method names differ slightly."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([3, 3])\n",
"torch.Size([3, 3])\n"
]
}
],
"source": [
"# Dimensions (size) of a tensor\n",
"\n",
"print(x.shape)\n",
"print(x.size()) # `size()` can be used instead of `shape`"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([1, 2, 3]) - type: torch.int64\n",
"tensor([0.1000, 0.2000, 0.3000]) - type: torch.float32\n",
"tensor([1., 2., 3.], dtype=torch.float64) - type: torch.float64\n"
]
}
],
"source": [
"# Element types\n",
"\n",
"x = torch.tensor([1, 2, 3])\n",
"print(x, \"- type:\", x.dtype)\n",
"\n",
"x = torch.tensor([0.1, 0.2, 0.3])\n",
"print(x, \"- type:\", x.dtype)\n",
"\n",
"x = torch.tensor([1, 2, 3], dtype=torch.float64) # Note: unlike in the NumPy example, the dtype is given as `torch.float64`, not as a string\n",
"print(x, \"- type:\", x.dtype)\n"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0., 0., 0., 0.],\n",
" [0., 0., 0., 0.],\n",
" [0., 0., 0., 0.]])\n"
]
}
],
"source": [
"x = torch.zeros([3,4])\n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1., 1., 1., 1.],\n",
" [1., 1., 1., 1.],\n",
" [1., 1., 1., 1.]])\n"
]
}
],
"source": [
"x = torch.ones([3,4])\n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.9863, 0.9173, 0.5301, 0.4279],\n",
" [0.7708, 0.4671, 0.2965, 0.0578],\n",
" [0.6684, 0.4432, 0.9817, 0.1521]])\n"
]
}
],
"source": [
"x = torch.rand([3,4])\n",
"print(x)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Row 0:\n",
"0.9863188862800598\n",
"0.917273998260498\n",
"0.53009432554245\n",
"0.42788761854171753\n",
"\n",
"Row 1:\n",
"0.7708230018615723\n",
"0.46713775396347046\n",
"0.2964947819709778\n",
"0.057803571224212646\n",
"\n",
"Row 2:\n",
"0.6684107780456543\n",
"0.4432327151298523\n",
"0.9817106127738953\n",
"0.15205740928649902\n"
]
}
],
"source": [
"# Iterating over the elements of a tensor\n",
"\n",
"for i, row in enumerate(x):\n",
"    print(f\"\\nRow {i}:\")\n",
"    for element in row:\n",
"        print(element.item()) # `item()` converts a one-element (zero-dimensional) tensor to a Python number"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.1572, 0.4592, 0.7481, 0.6673],\n",
" [0.1138, 0.9820, 0.4452, 0.5775],\n",
" [0.7510, 0.3174, 0.6937, 0.8904]])\n",
"tensor([[0.0661, 0.6596, 0.7498, 0.5254],\n",
" [0.3271, 0.8968, 0.3188, 0.9255],\n",
" [0.2099, 0.5828, 0.4611, 0.6856]])\n",
"tensor([[0.0738, 0.3776],\n",
" [0.2646, 0.5449],\n",
" [0.6779, 0.0567],\n",
" [0.0348, 0.3072]])\n"
]
}
],
"source": [
"# Example matrices\n",
"\n",
"A = torch.rand([3, 4])\n",
"print(A)\n",
"\n",
"B = torch.rand([3, 4])\n",
"print(B)\n",
"\n",
"C = torch.rand([4, 2])\n",
"print(C)"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.2232, 1.1188, 1.4978, 1.1928],\n",
" [0.4409, 1.8788, 0.7640, 1.5029],\n",
" [0.9609, 0.9002, 1.1548, 1.5760]])\n",
"tensor([[ 0.0911, -0.2005, -0.0017, 0.1419],\n",
" [-0.2133, 0.0853, 0.1264, -0.3480],\n",
" [ 0.5410, -0.2654, 0.2326, 0.2048]])\n",
"tensor([[0.0104, 0.3029, 0.5609, 0.3506],\n",
" [0.0372, 0.8807, 0.1419, 0.5344],\n",
" [0.1577, 0.1850, 0.3199, 0.6105]])\n",
"tensor([[2.3777, 0.6961, 0.9977, 1.2701],\n",
" [0.3478, 1.0951, 1.3966, 0.6240],\n",
" [3.5770, 0.5446, 1.5044, 1.2988]])\n"
]
}
],
"source": [
"# Element-wise operations\n",
"\n",
"print(A + B)\n",
"print(A - B)\n",
"print(A * B)\n",
"print(A / B)"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.6635, 0.5570],\n",
" [0.5902, 0.7807],\n",
" [0.6406, 0.7694]])\n"
]
}
],
"source": [
"# Matrix multiplication\n",
"\n",
"print(torch.matmul(A, C))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conversion between PyTorch and NumPy"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.1572, 0.4592, 0.7481, 0.6673],\n",
" [0.1138, 0.9820, 0.4452, 0.5775],\n",
" [0.7510, 0.3174, 0.6937, 0.8904]])\n",
"[[0.15715027 0.45915365 0.7480644 0.66733134]\n",
" [0.11377418 0.98203135 0.4451999 0.5774748 ]\n",
" [0.7509776 0.3174067 0.69367564 0.8904279 ]]\n"
]
}
],
"source": [
"# Conversion from PyTorch to NumPy\n",
"\n",
"print(A)\n",
"\n",
"A_numpy = A.numpy()\n",
"print(A_numpy)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0.84580006 0.49270934 0.67969751 0.27546956 0.10600392]\n",
" [0.84610871 0.11680263 0.3535065 0.83725955 0.07995571]\n",
" [0.4586334 0.64818257 0.53201793 0.77786372 0.8584107 ]]\n",
"tensor([[0.8458, 0.4927, 0.6797, 0.2755, 0.1060],\n",
" [0.8461, 0.1168, 0.3535, 0.8373, 0.0800],\n",
" [0.4586, 0.6482, 0.5320, 0.7779, 0.8584]], dtype=torch.float64)\n"
]
}
],
"source": [
"# Conversion from NumPy to PyTorch\n",
"\n",
"X = np.random.rand(3, 5)\n",
"print(X)\n",
"\n",
"X_pytorch = torch.from_numpy(X)\n",
"print(X_pytorch)"
]
},
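{
"cell_type": "markdown",
"metadata": {},
"source": [
"One detail worth knowing about these conversions: `torch.from_numpy` does not copy the data, so the tensor and the array share memory, whereas `torch.tensor` makes a copy. A minimal sketch with made-up values (the names `shared` and `copied` are introduced here only for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import torch\n",
"\n",
"X = np.zeros(3)\n",
"shared = torch.from_numpy(X)  # shares memory with X\n",
"copied = torch.tensor(X)      # independent copy\n",
"\n",
"X[0] = 1.0                    # modify the NumPy array in place\n",
"print(shared)                 # the change is visible in the tensor\n",
"print(copied)                 # the copy is unaffected"
]
},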
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Useful materials\n",
"\n",
" * NumPy documentation: https://numpy.org/doc/stable\n",
" * PyTorch documentation: https://pytorch.org/docs/stable"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
},
"livereveal": {
"start_slideshow_at": "selected",
"theme": "amu"
}
},
"nbformat": 4,
"nbformat_minor": 4
}