{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Machine learning\n", "# 3. Linear regression – part 2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 3.1. Linear regression with multiple variables" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "To predict the value of $y$, we can use more than one feature $x$:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example – flat prices" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " y:price x1:isNew x2:rooms x3:floor x4:location x5:sqrMetres\n", "1 476118.0 False 3 1 Centrum 78\n", "2 459531.0 False 3 2 Sołacz 62\n", "3 411557.0 False 3 0 Sołacz 15\n", "4 496416.0 False 4 0 Sołacz 14\n", "5 406032.0 False 3 0 Sołacz 15\n", "... ... ... ... ... ... 
...\n", "1335 349000.0 False 4 0 Szczepankowo 29\n", "1336 399000.0 False 5 0 Szczepankowo 68\n", "1337 234000.0 True 2 7 Wilda 50\n", "1338 210000.0 True 2 1 Wilda 65\n", "1339 279000.0 True 2 2 Łazarz 36\n", "\n", "[1339 rows x 6 columns]\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "data = pd.read_csv(\"data02_train.tsv\", sep=\"\\t\")\n", "data.rename(columns={col: f\"x{i}:{col}\" if i > 0 else f\"y:{col}\" for i, col in enumerate(data.columns)}, inplace=True)\n", "data.index = np.arange(1, len(data) + 1)\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$ x^{(2)} = ({\\rm \"False\"}, 3, 2, {\\rm \"Sołacz\"}, 62), \\quad x_3^{(2)} = 2 $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Hypothesis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In our case (we chose 5 features):\n", "\n", "$$ h_\\theta(x) = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\theta_3 x_3 + \\theta_4 x_4 + \\theta_5 x_5 $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In general ($n$ features):\n", "\n", "$$ h_\\theta(x) = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\ldots + \\theta_n x_n $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If we define $x_0 = 1$, the formula above can be written more compactly:\n", "\n", "$$\n", "\\begin{array}{rcl}\n", "h_\\theta(x)\n", " & = & \\theta_0 x_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\ldots + \\theta_n x_n \\\\\n", " & = & \\displaystyle\\sum_{i=0}^{n} \\theta_i x_i \\\\\n", " & = & \\theta^T \\, x \\\\\n", " & = & x^T \\, \\theta \\\\\n", "\\end{array}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Gradient descent 
– matrix notation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Gradient descent takes a very elegant form when it is written with vectors and matrices." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\n", "X=\\left[\\begin{array}{cc}\n", "1 & \\left( \\vec x^{(1)} \\right)^T \\\\\n", "1 & \\left( \\vec x^{(2)} \\right)^T \\\\\n", "\\vdots & \\vdots\\\\\n", "1 & \\left( \\vec x^{(m)} \\right)^T \\\\\n", "\\end{array}\\right] \n", "= \\left[\\begin{array}{cccc}\n", "1 & x_1^{(1)} & \\cdots & x_n^{(1)} \\\\\n", "1 & x_1^{(2)} & \\cdots & x_n^{(2)} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots\\\\\n", "1 & x_1^{(m)} & \\cdots & x_n^{(m)} \\\\\n", "\\end{array}\\right]\n", "\\quad\n", "\\vec{y} = \n", "\\left[\\begin{array}{c}\n", "y^{(1)}\\\\\n", "y^{(2)}\\\\\n", "\\vdots\\\\\n", "y^{(m)}\\\\\n", "\\end{array}\\right]\n", "\\quad\n", "\\theta = \\left[\\begin{array}{c}\n", "\\theta_0\\\\\n", "\\theta_1\\\\\n", "\\vdots\\\\\n", "\\theta_n\\\\\n", "\\end{array}\\right]\n", "$$" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# Matrix versions of the functions that draw the scatter plot and the regression line\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "\n", "def h(theta, x):\n", "    return x * theta\n", "\n", "\n", "def regdots(x, y, xlabel=\"population\", ylabel=\"profit\"):\n", "    fig = plt.figure(figsize=(16 * 0.6, 9 * 0.6))\n", "    ax = fig.add_subplot(111)\n", "    fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n", "    ax.scatter([x[:, 1]], [y], c=\"r\", s=50, label=\"Data\")\n", "\n", "    ax.set_xlabel(xlabel)\n", "    ax.set_ylabel(ylabel)\n", "    ax.margins(0.05, 0.05)\n", "    plt.ylim(y.min() - 1, y.max() + 1)\n", "    plt.xlim(np.min(x[:, 1]) - 1, np.max(x[:, 1]) + 1)\n", "    return fig\n", "\n", "\n", "def regline(fig, 
fun, theta, x):\n", "    ax = fig.axes[0]\n", "    x_min = np.min(x[:, 1])\n", "    x_max = np.max(x[:, 1])\n", "    x_range = [x_min, x_max]\n", "    x_matrix = np.matrix([1, x_min, 1, x_max]).reshape(2, 2)\n", "    ax.plot(\n", "        x_range,\n", "        fun(theta, x_matrix),\n", "        linewidth=\"2\",\n", "        label=(\n", "            r\"$y={theta0:.2}{op}{theta1:.2}x$\".format(\n", "                theta0=float(theta[0][0]),\n", "                theta1=(\n", "                    float(theta[1][0]) if theta[1][0] >= 0 else float(-theta[1][0])\n", "                ),\n", "                op=\"+\" if theta[1][0] >= 0 else \"-\",\n", "            )\n", "        ),\n", "    )\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x[:5]=matrix([[ 1., 3., 1., 78.],\n", " [ 1., 3., 2., 62.],\n", " [ 1., 3., 0., 15.],\n", " [ 1., 4., 0., 14.],\n", " [ 1., 3., 0., 15.]])\n", "x.shape=(1339, 4)\n", "y[:5]=matrix([[476118.],\n", " [459531.],\n", " [411557.],\n", " [496416.],\n", " [406032.]])\n", "y.shape=(1339, 1)\n" ] } ], "source": [ "# Loading data from a file – linear regression with multiple variables – matrix notation\n", "\n", "import pandas as pd\n", "\n", "data = pd.read_csv(\n", "    \"data02_train.tsv\", delimiter=\"\\t\", usecols=[\"price\", \"rooms\", \"floor\", \"sqrMetres\"]\n", ")\n", "m, n_plus_1 = data.values.shape\n", "n = n_plus_1 - 1\n", "xn = data.values[:, 1:].reshape(m, n)\n", "\n", "# Prepend a column of ones to the feature matrix\n", "x = np.matrix(np.concatenate((np.ones((m, 1)), xn), axis=1)).reshape(m, n_plus_1)\n", "y = np.matrix(data.values[:, 0]).reshape(m, 1)\n", "\n", "print(f\"{x[:5]=}\")\n", "print(f\"{x.shape=}\")\n", "print(f\"{y[:5]=}\")\n", "print(f\"{y.shape=}\")\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Cost function – matrix notation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$J(\\theta)=\\dfrac{1}{2|\\vec 
y|}\\left(X\\theta-\\vec{y}\\right)^T\\left(X\\theta-\\vec{y}\\right)$$ \n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [ { "data": { "text/latex": [ "$\\displaystyle \\Large J(\\theta) = 85104141370.9717$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, Math, Latex\n", "\n", "\n", "def J(theta, X, y):\n", "    \"\"\"Matrix version of the cost function\"\"\"\n", "    m = len(y)\n", "    cost = 1.0 / (2.0 * m) * ((X * theta - y).T * (X * theta - y))\n", "    return cost.item()\n", "\n", "\n", "theta = np.matrix([10, 90, -1, 2.5]).reshape(4, 1)\n", "\n", "cost = J(theta, x, y)\n", "display(Math(r\"\\Large J(\\theta) = %.4f\" % cost))\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Gradient – matrix notation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\\nabla J(\\theta) = \\frac{1}{|\\vec y|} X^T\\left(X\\theta-\\vec y\\right)$$" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [], "source": [ "# Rendering a matrix in LaTeX\n", "\n", "\n", "def latex_matrix(matrix):\n", "    ltx = r\"\\left[\\begin{array}\"\n", "    m, n = matrix.shape\n", "    ltx += \"{\" + (\"r\" * n) + \"}\"\n", "    for i in range(m):\n", "        ltx += r\" & \".join([(\"%.4f\" % j.item()) for j in matrix[i]]) + r\" \\\\ \"\n", "    ltx += r\"\\end{array}\\right]\"\n", "    return ltx\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [ { "data": { "text/latex": [ "$\\displaystyle \\large \\theta = \\left[\\begin{array}{r}10.0000 \\\\ 90.0000 \\\\ -1.0000 \\\\ 2.5000 \\\\ \\end{array}\\right]\\quad\\large \\nabla J(\\theta) = \\left[\\begin{array}{r}-373492.7442 \\\\ -1075656.5086 \\\\ -989554.4921 \\\\ 
-23806475.6561 \\\\ \\end{array}\\right]$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, Math, Latex\n", "\n", "\n", "def dJ(theta, X, y):\n", "    \"\"\"Matrix version of the gradient of the cost function\"\"\"\n", "    return 1.0 / len(y) * (X.T * (X * theta - y))\n", "\n", "\n", "theta = np.matrix([10, 90, -1, 2.5]).reshape(4, 1)\n", "\n", "display(\n", "    Math(\n", "        r\"\\large \\theta = \"\n", "        + latex_matrix(theta)\n", "        + r\"\\quad\"\n", "        + r\"\\large \\nabla J(\\theta) = \"\n", "        + latex_matrix(dJ(theta, x, y))\n", "    )\n", ")\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Gradient descent algorithm – matrix notation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$ \\theta := \\theta - \\alpha \\, \\nabla J(\\theta) $$" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [ { "data": { "text/latex": [ "$\\displaystyle \\large\\textrm{Result:}\\quad \\theta = \\left[\\begin{array}{r}17446.2135 \\\\ 86476.7960 \\\\ -1374.8950 \\\\ 2165.0689 \\\\ \\end{array}\\right] \\quad J(\\theta) = 10324864803.1591 \\quad \\textrm{after 374575 iterations}$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Gradient descent implemented with NumPy matrices\n", "\n", "\n", "def gradient_descent(fJ, fdJ, theta, X, y, alpha, eps):\n", "    current_cost = fJ(theta, X, y)\n", "    history = [[current_cost, theta]]\n", "    while True:\n", "        theta = theta - alpha * fdJ(theta, X, y)  # the update rule given above\n", "        current_cost, prev_cost = fJ(theta, X, y), current_cost\n", "        if abs(prev_cost - current_cost) <= eps:\n", "            break\n", "        if current_cost > prev_cost:\n", "            print(\"The step size (alpha) is too large!\")\n", "            break\n", "        history.append([current_cost, theta])\n", "    return theta, history\n", "\n", "\n", "theta_start = np.zeros((n + 1, 1))\n", "\n", "# Adjust alpha (step size) and eps (stopping criterion) here\n", "theta_best, history = gradient_descent(J, dJ, theta_start, x, y, alpha=0.0001, eps=0.1)\n", "\n", "display(\n", "    Math(\n", "        r\"\\large\\textrm{Result:}\\quad \\theta = \"\n", "        + latex_matrix(theta_best)\n", "        + (r\" \\quad J(\\theta) = %.4f\" % history[-1][0])\n", "        + r\" \\quad \\textrm{after %d iterations}\" % len(history)\n", "    )\n", ")\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 3.2. Gradient descent in practice" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Stopping criterion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Gradient descent repeats the same update step in a loop. The question is: when should the loop stop?\n", "\n", "With each iteration, the cost function decreases by a smaller and smaller amount.\n", "The parameter `eps` sets the threshold: the loop stops once the change in cost between two consecutive iterations is no greater than `eps`:\n", "\n", " * The smaller `eps` is, the more accurate the result, but the longer the algorithm runs.\n", " * The larger `eps` is, the shorter the running time, but the less accurate the result." 
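, "\n", "\n", "A compact way to write this rule, given only as a sketch: it assumes that $\\theta^{(k)}$ denotes the parameter vector after $k$ updates (a symbol introduced here for illustration, not used elsewhere in this notebook) and that $\\varepsilon$ stands for `eps`:\n", "\n", "$$ \\left| J\\left(\\theta^{(k-1)}\\right) - J\\left(\\theta^{(k)}\\right) \\right| \\leq \\varepsilon $$\n", "\n", "This mirrors the test `abs(prev_cost - current_cost) <= eps` in `gradient_descent`; for example, with `eps=0.1` the loop ends as soon as a single update changes the cost by at most $0.1$."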
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The plot compares the regression lines obtained for different values of `eps`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "theta_start = np.zeros((2, 1))\n", "\n", "# Pass the whole matrix: regdots plots column 1 (the first feature, rooms) against y\n", "fig = regdots(x, y, xlabel=\"rooms\", ylabel=\"price\")\n", "# theta_e1, history1 = GDMx(\n", "# JMx, dJMx, thetaStartMx, XMx, yMx, alpha=0.01, eps=0.01\n", "# ) # blue line\n", "# reglineMx(fig, hMx, theta_e1, XMx)\n", "# theta_e2, history2 = GDMx(\n", "# JMx, dJMx, thetaStartMx, XMx, yMx, alpha=0.01, eps=0.000001\n", "# ) # orange line\n", "# reglineMx(fig, hMx, theta_e2, XMx)\n", "# legend(fig)\n" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/latex": [ "$\\displaystyle \\theta_{10^{-2}} = \\left[\\begin{array}{r}0.0531 \\\\ 0.8365 \\\\ \\end{array}\\right]\\quad\\theta_{10^{-6}} = \\left[\\begin{array}{r}-3.4895 \\\\ 1.1786 \\\\ \\end{array}\\right]$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(\n", "    Math(\n", "        r\"\\theta_{10^{-2}} = \"\n", "        + latex_matrix(theta_e1)\n", "        + r\"\\quad\\theta_{10^{-6}} = \"\n", "        + latex_matrix(theta_e2)\n", "    )\n", ")\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Step size ($\\alpha$)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [], "source": [ "# How the cost changes over successive steps, depending on alpha\n", "\n", "import ipywidgets as widgets\n", "\n", "\n", "def costchangeplot(history):\n", "    fig = plt.figure(figsize=(16 * 0.6, 9 * 0.6))\n", "    ax = fig.add_subplot(111)\n", "    fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n", "    ax.set_xlabel(\"step\")\n", "    ax.set_ylabel(r\"$J(\\theta)$\")\n", "\n", "    # Plot at most the first 500 steps; the history may be shorter than that\n", "    X = np.arange(0, min(500, len(history)), 1)\n", "    Y = [history[step][0] for step in X]\n", "    ax.plot(X, Y, linewidth=\"2\", label=(r\"$J(\\theta)$\"))\n", "    return fig\n", "\n", "\n", "def slide7(alpha):\n", "    # gradient_descent expects the cost function, its gradient and a column vector theta\n", "    best_theta, history = gradient_descent(\n", "        J, dJ, np.zeros((x.shape[1], 1)), x, y, alpha=alpha, eps=0.0001\n", "    )\n", "    fig = costchangeplot(history)\n", "    fig.axes[0].legend()\n", "\n", "\n", "sliderAlpha1 = widgets.FloatSlider(\n", "    min=0.01, max=0.03, step=0.001, value=0.02, description=r\"$\\alpha$\", width=300\n", ")\n" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "52b0d91e39104f4facbb7f57819aae0c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(FloatSlider(value=0.02, description='$\\\\alpha$', max=0.03, min=0.01, step=0.001), Button…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "widgets.interact_manual(slide7, alpha=sliderAlpha1)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 3.3. Data normalization" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Data normalization is the process of transforming the input data so that gradient descent can work more easily.\n", "\n", "Let me explain this with an example." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We will use data from the “Gratka flats challenge 2017”.\n", "\n", "Consider the model $h(x) = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2$, in which the price of a flat is predicted from the number of rooms $x_1$ and the floor area $x_2$:" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr><th></th><th>price</th><th>rooms</th><th>sqrMetres</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>476118.00</td><td>3</td><td>78</td></tr>\n", "    <tr><th>1</th><td>459531.00</td><td>3</td><td>62</td></tr>\n", "    <tr><th>2</th><td>411557.00</td><td>3</td><td>15</td></tr>\n", "    <tr><th>3</th><td>496416.00</td><td>4</td><td>14</td></tr>\n", "    <tr><th>4</th><td>406032.00</td><td>3</td><td>15</td></tr>\n", "    <tr><th>5</th><td>450026.00</td><td>3</td><td>80</td></tr>\n", "    <tr><th>6</th><td>571229.15</td><td>2</td><td>39</td></tr>\n", "    <tr><th>7</th><td>325000.00</td><td>3</td><td>54</td></tr>\n", "    <tr><th>8</th><td>268229.00</td><td>2</td><td>90</td></tr>\n", "    <tr><th>9</th><td>604836.00</td><td>4</td><td>40</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " price rooms sqrMetres\n", "0 476118.00 3 78\n", "1 459531.00 3 62\n", "2 411557.00 3 15\n", "3 496416.00 4 14\n", "4 406032.00 3 15\n", "5 450026.00 3 80\n", "6 571229.15 2 39\n", "7 325000.00 3 54\n", "8 268229.00 2 90\n", "9 604836.00 4 40" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Load the data with pandas\n", "import pandas\n", "\n", "alldata = pandas.read_csv(\n", "    \"data_flats.tsv\", header=0, sep=\"\\t\", usecols=[\"price\", \"rooms\", \"sqrMetres\"]\n", ")\n", "alldata[:10]\n" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [], "source": [ "# Helper that shows the minimum and maximum of each column of the matrix X\n", "\n", "\n", "def show_mins_and_maxs(XMx):\n", "    mins = np.amin(XMx, axis=0).tolist()[0]  # column minima\n", "    maxs = np.amax(XMx, axis=0).tolist()[0]  # column maxima\n", "    for i, (xmin, xmax) in enumerate(zip(mins, maxs)):\n", "        display(Math(r\"${:.2F} \\leq x_{} \\leq {:.2F}$\".format(xmin, i, xmax)))\n" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [], "source": [ "# Data preparation\n", "\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "\n", "data2 = np.matrix(alldata[['rooms', 'sqrMetres', 'price']])\n", "\n", "m, n_plus_1 = data2.shape\n", "n = n_plus_1 - 1\n", "Xn = data2[:, 0:n]\n", "\n", "XMx2 = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n", "yMx2 = np.matrix(data2[:, -1]).reshape(m, 1) / 1000.0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The features in the training data take values in the following ranges:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/latex": [ "$\\displaystyle 1.00 \\leq x_0 \\leq 
1.00$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/latex": [ "$\\displaystyle 2.00 \\leq x_1 \\leq 7.00$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/latex": [ "$\\displaystyle 12.00 \\leq x_2 \\leq 196.00$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_mins_and_maxs(XMx2)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As we can see, $x_2$ takes values much larger than $x_1$.\n", "As a result, the plot of the cost function is strongly “flattened” along one of the axes:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "slideshow": { "slide_type": "notes" } }, "outputs": [], "source": [ "def contour_plot(X, y, rescale=10**8):\n", "    # theta_0 is held fixed at 1.0; only theta_1 and theta_2 vary\n", "    theta1_vals = np.linspace(-100000, 100000, 100)\n", "    theta2_vals = np.linspace(-100000, 100000, 100)\n", "\n", "    J_vals = np.zeros(shape=(theta1_vals.size, theta2_vals.size))\n", "    for t1, element in enumerate(theta1_vals):\n", "        for t2, element2 in enumerate(theta2_vals):\n", "            thetaT = np.matrix([1.0, element, element2]).reshape(3, 1)\n", "            J_vals[t1, t2] = J(thetaT, X, y) / rescale\n", "\n", "    plt.figure()\n", "    plt.contour(theta1_vals, theta2_vals, J_vals.T, np.logspace(-2, 3, 20))\n", "    plt.xlabel(r\"$\\theta_1$\")\n", "    plt.ylabel(r\"$\\theta_2$\")\n" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": "
", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "contour_plot(XMx2, yMx2, rescale=10**10)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "If the cost function has the shape shown in the plot above, it is easy to imagine that finding the minimum with gradient descent is quite a challenge: the algorithm quickly finds the “trough”, but the “descent” along the “trough” towards the minimum is very slow.\n", "\n", "How can we remedy this?\n", "\n", "We will try to transform the data so that the cost function has a “nice”, regular shape." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Scaling" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We want every feature to take values in a similar range.\n", "\n", "To achieve this, we scale the values of each feature by dividing them by the feature's maximum value:\n", "\n", "$$ \\hat{x_i}^{(j)} := \\frac{x_i^{(j)}}{\\max_j x_i^{(j)}} $$" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/latex": [ "$\\displaystyle 1.00 \\leq x_0 \\leq 1.00$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/latex": [ "$\\displaystyle 0.29 \\leq x_1 \\leq 1.00$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/latex": [ "$\\displaystyle 0.06 \\leq x_2 \\leq 1.00$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "XMx2_scaled = XMx2 / np.amax(XMx2, axis=0)\n", "\n", "show_mins_and_maxs(XMx2_scaled)\n" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": "
", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "contour_plot(XMx2_scaled, yMx2)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Mean normalization" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In addition, we want the mean of each feature to be close to $0$.\n", "\n", "To achieve this, besides scaling, we subtract the feature's mean $\\mu_i$ from each of its values:\n", "\n", "$$ \\hat{x_i}^{(j)} := \\frac{x_i^{(j)} - \\mu_i}{\\max_j x_i^{(j)}} $$" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/latex": [ "$\\displaystyle 0.00 \\leq x_0 \\leq 0.00$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/latex": [ "$\\displaystyle -0.10 \\leq x_1 \\leq 0.62$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/latex": [ "$\\displaystyle -0.23 \\leq x_2 \\leq 0.70$" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Note: the column of ones x_0 is centred as well, so it becomes 0 (see the range printed for x_0)\n", "XMx2_norm = (XMx2 - np.mean(XMx2, axis=0)) / np.amax(XMx2, axis=0)\n", "\n", "show_mins_and_maxs(XMx2_norm)\n" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": "
", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "contour_plot(XMx2_norm, yMx2)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "The plot of the cost function now has a very regular shape – in this case gradient descent will find the minimum of the cost function very quickly." ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" }, "livereveal": { "start_slideshow_at": "selected", "theme": "white" }, "vscode": { "interpreter": { "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" } } }, "nbformat": 4, "nbformat_minor": 4 }