diff --git a/wyk/02_Regresja_liniowa.ipynb b/wyk/02_Regresja_liniowa.ipynb
new file mode 100644
index 0000000..c627d51
--- /dev/null
+++ b/wyk/02_Regresja_liniowa.ipynb
@@ -0,0 +1,39155 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Uczenie maszynowe\n",
+ "# 2. Regresja liniowa – część 1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## 2.1. Funkcja kosztu"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Zadanie\n",
+ "Znając $x$ – ludność miasta, należy przewidzieć $y$ – dochód firmy transportowej.\n",
+ "\n",
+ "(Dane pochodzą z kursu „Machine Learning”, Andrew Ng, Coursera)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "**Uwaga**: Ponieważ ten przykład ma być tak prosty, jak to tylko możliwe, ludność miasta podana jest w dziesiątkach tysięcy mieszkańców, a dochód firmy w dziesiątkach tysięcy dolarów. Dzięki temu funkcja kosztu obliczona w dalszej części wykładu będzie osiągać wartości, które łatwo przedstawić na wykresie."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "import ipywidgets as widgets\n",
+ "\n",
+ "%matplotlib inline\n",
+ "%config InlineBackend.figure_format = \"svg\"\n",
+ "\n",
+ "from IPython.display import display, Math, Latex"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Dane"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " x y\n",
+ "0 6.1101 17.59200\n",
+ "1 5.5277 9.13020\n",
+ "2 8.5186 13.66200\n",
+ "3 7.0032 11.85400\n",
+ "4 5.8598 6.82330\n",
+ ".. ... ...\n",
+ "75 6.5479 0.29678\n",
+ "76 7.5386 3.88450\n",
+ "77 5.0365 5.70140\n",
+ "78 10.2740 6.75260\n",
+ "79 5.1077 2.05760\n",
+ "\n",
+ "[80 rows x 2 columns]\n"
+ ]
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "data = pd.read_csv(\"data01_train.csv\", names=[\"x\", \"y\"])\n",
+ "print(data)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "x = data[\"x\"].to_numpy()\n",
+ "y = data[\"y\"].to_numpy()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Hipoteza i parametry modelu"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Jak przewidzieć $y$ na podstawie danego $x$? W celu odpowiedzi na to pytanie będziemy starać się znaleźć taką funkcję $h(x)$, która będzie najlepiej obrazować zależność między $x$ a $y$, tj. $y \\sim h(x)$.\n",
+ "\n",
+ "Zacznijmy od najprostszego przypadku, kiedy $h(x)$ jest po prostu funkcją liniową. Ogólny wzór funkcji liniowej to"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$ h(x) = a \\, x + b $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Pamiętajmy jednak, że współczynniki $a$ i $b$ nie są w tej chwili dane z góry – naszym zadaniem właśnie będzie znalezienie takich ich wartości, żeby $h(x)$ było „możliwie jak najbliżej” $y$ (co właściwie oznacza to sformułowanie, wyjaśnię potem)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "Poszukiwaną funkcję $h$ będziemy nazywać **funkcją hipotezy**, a jej współczynniki – **parametrami modelu**."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "W teorii uczenia maszynowego parametry modelu oznacza się na ogół grecką literą $\\theta$ z odpowiednimi indeksami, dlatego powyższy wzór opisujący liniową funkcję hipotezy zapiszemy jako\n",
+ "$$ h(x) = \\theta_0 + \\theta_1 x $$\n",
+ "\n",
+ "**Parametry modelu** tworzą wektor, który oznaczymy po prostu przez $\\theta$:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "$$ \\theta = \\left[\\begin{array}{c}\\theta_0\\\\ \\theta_1\\end{array}\\right] $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Żeby podkreślić fakt, że funkcja hipotezy zależy od parametrów modelu, będziemy pisać $h_\\theta$ zamiast $h$:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$ h_{\\theta}(x) = \\theta_0 + \\theta_1 x $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Przyjrzyjmy się teraz, jak wyglądają dane, które mamy modelować:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Na poniższym wykresie możesz spróbować ręcznie dopasować parametry modelu $\\theta_0$ i $\\theta_1$ tak, aby jak najlepiej modelowały zależność między $x$ a $y$:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Funkcje rysujące wykres kropkowy oraz prostą regresyjną\n",
+ "\n",
+ "\n",
+ "def regdots(x, y):\n",
+ " fig = plt.figure(figsize=(16 * 0.6, 9 * 0.6))\n",
+ " ax = fig.add_subplot(111)\n",
+ " fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
+ " ax.scatter(x, y, c=\"r\", label=\"Dane\")\n",
+ "\n",
+ " ax.set_xlabel(\"Wielkość miejscowości\")\n",
+ " ax.set_ylabel(\"Dochód firmy\")\n",
+ " ax.margins(0.05, 0.05)\n",
+ " plt.ylim(min(y) - 1, max(y) + 1)\n",
+ " plt.xlim(min(x) - 1, max(x) + 1)\n",
+ " return fig\n",
+ "\n",
+ "\n",
+ "def regline(fig, fun, theta, x):\n",
+ " ax = fig.axes[0]\n",
+ " x0, x1 = min(x), max(x)\n",
+ " X = [x0, x1]\n",
+ " Y = [fun(theta, x) for x in X]\n",
+ " ax.plot(\n",
+ " X,\n",
+ " Y,\n",
+ " linewidth=\"2\",\n",
+ " label=(\n",
+ " r\"$y={theta0}{op}{theta1}x$\".format(\n",
+ " theta0=theta[0],\n",
+ " theta1=(theta[1] if theta[1] >= 0 else -theta[1]),\n",
+ " op=\"+\" if theta[1] >= 0 else \"-\",\n",
+ " )\n",
+ " ),\n",
+ " )\n",
+ "\n",
+ "\n",
+ "def legend(fig):\n",
+ " ax = fig.axes[0]\n",
+ " handles, labels = ax.get_legend_handles_labels()\n",
+ " # try-except block is a fix for a bug in Poly3DCollection\n",
+ " try:\n",
+ " fig.legend(handles, labels, fontsize=\"15\", loc=\"lower right\")\n",
+ " except AttributeError:\n",
+ " pass\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig = regdots(x, y)\n",
+ "legend(fig)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Hipoteza: funkcja liniowa jednej zmiennej\n",
+ "\n",
+ "\n",
+ "def h(theta, x):\n",
+ " return theta[0] + theta[1] * x\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Przygotowanie interaktywnego wykresu\n",
+ "\n",
+ "sliderTheta01 = widgets.FloatSlider(\n",
+ " min=-10, max=10, step=0.1, value=0, description=r\"$\\theta_0$\", width=300\n",
+ ")\n",
+ "sliderTheta11 = widgets.FloatSlider(\n",
+ " min=-5, max=5, step=0.1, value=0, description=r\"$\\theta_1$\", width=300\n",
+ ")\n",
+ "\n",
+ "\n",
+ "def slide1(theta0, theta1):\n",
+ " fig = regdots(x, y)\n",
+ " regline(fig, h, [theta0, theta1], x)\n",
+ " legend(fig)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "4880be41c52643798571f509b333a025",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "interactive(children=(FloatSlider(value=0.0, description='$\\\\theta_0$', max=10.0, min=-10.0), FloatSlider(valu…"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 47,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "widgets.interact_manual(slide1, theta0=sliderTheta01, theta1=sliderTheta11)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Skąd wiadomo, że przewidywania modelu (wartości funkcji $h(x)$) zgadzaja się z obserwacjami (wartości $y$)?\n",
+ "\n",
+ "Aby to zmierzyć wprowadzimy pojęcie funkcji kosztu."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Funkcja kosztu"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Funkcję kosztu zdefiniujemy w taki sposób, żeby odzwierciedlała ona różnicę między przewidywaniami modelu a obserwacjami.\n",
+ "\n",
+ "Jedną z możliwosci jest zdefiniowanie funkcji kosztu jako wartość **błędu średniokwadratowego** (metoda najmniejszych kwadratów, *mean-square error, MSE*).\n",
+ "\n",
+ "My zdefiniujemy funkcję kosztu jako *połowę* błędu średniokwadratowego w celu ułatwienia późniejszych obliczeń (obliczenie pochodnej funkcji kosztu w dalszej części wykładu). Możemy tak zrobić, ponieważ $\\frac{1}{2}$ jest stałą, a pomnożenie przez stałą nie wpływa na przebieg zmienności funkcji."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$ J(\\theta) \\, = \\, \\frac{1}{2m} \\sum_{i = 1}^{m} \\left( h_{\\theta} \\left( x^{(i)} \\right) - y^{(i)} \\right) ^2 $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "gdzie $m$ jest liczbą wszystkich przykładów (obserwacji), czyli wielkością zbioru danych uczących.\n",
+ "\n",
+ "W powyższym wzorze sumujemy kwadraty różnic między przewidywaniami modelu ($h_\\theta \\left( x^{(i)} \\right)$) a obserwacjami ($y^{(i)}$) po wszystkich przykładach $i$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Teraz nasze zadanie sprowadza się do tego, że będziemy szukać takich parametrów $\\theta = \\left[\\begin{array}{c}\\theta_0\\\\ \\theta_1\\end{array}\\right]$, które minimalizują fukcję kosztu $J(\\theta)$:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$ \\hat\\theta = \\mathop{\\arg\\min}_{\\theta} J(\\theta) $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$ \\theta \\in \\mathbb{R}^2, \\quad J \\colon \\mathbb{R}^2 \\to \\mathbb{R} $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Proszę zwrócić uwagę, że dziedziną funkcji kosztu jest zbiór wszystkich możliwych wartości parametrów $\\theta$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "$$ J(\\theta_0, \\theta_1) \\, = \\, \\frac{1}{2m} \\sum_{i = 1}^{m} \\left( \\theta_0 + \\theta_1 x^{(i)} - y^{(i)} \\right) ^2 $$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "def J(h, theta, x, y):\n",
+ " \"\"\"Funkcja kosztu\"\"\"\n",
+ " m = len(y)\n",
+ " return 1.0 / (2 * m) * sum((h(theta, x[i]) - y[i]) ** 2 for i in range(m))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "skip"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Oblicz wartość funkcji kosztu i pokaż na wykresie\n",
+ "\n",
+ "\n",
+ "def regline2(fig, fun, theta, xx, yy):\n",
+ " \"\"\"Rysuj regresję liniową\"\"\"\n",
+ " ax = fig.axes[0]\n",
+ " x0, x1 = min(xx), max(xx)\n",
+ " X = [x0, x1]\n",
+ " Y = [fun(theta, x) for x in X]\n",
+ " cost = J(fun, theta, xx, yy)\n",
+ " ax.plot(\n",
+ " X,\n",
+ " Y,\n",
+ " linewidth=\"2\",\n",
+ " label=(\n",
+ " r\"$y={theta0}{op}{theta1}x, \\; J(\\theta)={cost:.3}$\".format(\n",
+ " theta0=theta[0],\n",
+ " theta1=(theta[1] if theta[1] >= 0 else -theta[1]),\n",
+ " op=\"+\" if theta[1] >= 0 else \"-\",\n",
+ " cost=cost,\n",
+ " )\n",
+ " ),\n",
+ " )\n",
+ "\n",
+ "\n",
+ "sliderTheta02 = widgets.FloatSlider(\n",
+ " min=-10, max=10, step=0.1, value=0, description=r\"$\\theta_0$\", width=300\n",
+ ")\n",
+ "sliderTheta12 = widgets.FloatSlider(\n",
+ " min=-5, max=5, step=0.1, value=0, description=r\"$\\theta_1$\", width=300\n",
+ ")\n",
+ "\n",
+ "\n",
+ "def slide2(theta0, theta1):\n",
+ " fig = regdots(x, y)\n",
+ " regline2(fig, h, [theta0, theta1], x, y)\n",
+ " legend(fig)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Poniższy interaktywny wykres pokazuje wartość funkcji kosztu $J(\\theta)$. Czy teraz łatwiej jest dobrać parametry modelu?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "c67ea652bba946cf83a86485848bb0b0",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "interactive(children=(FloatSlider(value=0.0, description='$\\\\theta_0$', max=10.0, min=-10.0), FloatSlider(valu…"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 50,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "widgets.interact_manual(slide2, theta0=sliderTheta02, theta1=sliderTheta12)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Funkcja kosztu jako funkcja zmiennej $\\theta$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Funkcja kosztu zdefiniowana jako MSE jest funkcją zmiennej wektorowej $\\theta$, czyli funkcją dwóch zmiennych rzeczywistych: $\\theta_0$ i $\\theta_1$.\n",
+ " \n",
+ "Zobaczmy, jak wygląda jej wykres."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Wykres funkcji kosztu dla ustalonego theta_1=1.0\n",
+ "\n",
+ "\n",
+ "def costfun(fun, x, y):\n",
+ " return lambda theta: J(fun, theta, x, y)\n",
+ "\n",
+ "\n",
+ "def costplot(hypothesis, x, y, theta1=1.0):\n",
+ " fig = plt.figure(figsize=(16 * 0.6, 9 * 0.6))\n",
+ " ax = fig.add_subplot(111)\n",
+ " fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
+ " ax.set_xlabel(r\"$\\theta_0$\")\n",
+ " ax.set_ylabel(r\"$J(\\theta)$\")\n",
+ " j = costfun(hypothesis, x, y)\n",
+ " fun = lambda theta0: j([theta0, theta1])\n",
+ " X = np.arange(-10, 10, 0.1)\n",
+ " Y = [fun(x) for x in X]\n",
+ " ax.plot(\n",
+ " X, Y, linewidth=\"2\", label=(r\"$J(\\theta_0, {theta1})$\".format(theta1=theta1))\n",
+ " )\n",
+ " return fig\n",
+ "\n",
+ "\n",
+ "def slide3(theta1):\n",
+ " fig = costplot(h, x, y, theta1)\n",
+ " legend(fig)\n",
+ "\n",
+ "\n",
+ "sliderTheta13 = widgets.FloatSlider(\n",
+ " min=-5, max=5, step=0.1, value=1.0, description=r\"$\\theta_1$\", width=300\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 52,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "f5ea28655cad4743b9e58a3ecd0b1fc3",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "interactive(children=(FloatSlider(value=1.0, description='$\\\\theta_1$', max=5.0, min=-5.0), Button(description…"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 52,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "widgets.interact_manual(slide3, theta1=sliderTheta13)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Wykres funkcji kosztu względem theta_0 i theta_1\n",
+ "\n",
+ "from mpl_toolkits.mplot3d import Axes3D\n",
+ "import pylab\n",
+ "\n",
+ "%matplotlib inline\n",
+ "\n",
+ "def costplot3d(hypothesis, x, y, show_gradient=False):\n",
+ " fig = plt.figure(figsize=(16*.6, 9*.6))\n",
+ " ax = fig.add_subplot(111, projection='3d')\n",
+ " fig.subplots_adjust(left=0.0, right=1.0, bottom=0.0, top=1.0)\n",
+ " ax.set_xlabel(r'$\\theta_0$')\n",
+ " ax.set_ylabel(r'$\\theta_1$')\n",
+ " ax.set_zlabel(r'$J(\\theta)$')\n",
+ " \n",
+ " j = lambda theta0, theta1: costfun(hypothesis, x, y)([theta0, theta1])\n",
+ " X = np.arange(-10, 10.1, 0.1)\n",
+ " Y = np.arange(-1, 4.1, 0.1)\n",
+ " X, Y = np.meshgrid(X, Y)\n",
+ " Z = np.array([[J(hypothesis, [theta0, theta1], x, y) \n",
+ " for theta0, theta1 in zip(xRow, yRow)] \n",
+ " for xRow, yRow in zip(X, Y)])\n",
+ " \n",
+ " ax.plot_surface(X, Y, Z, rstride=2, cstride=8, linewidth=0.5,\n",
+ " alpha=0.5, cmap='jet', zorder=0,\n",
+ " label=r\"$J(\\theta)$\")\n",
+ " ax.view_init(elev=20., azim=-150)\n",
+ "\n",
+ " ax.set_xlim3d(-10, 10);\n",
+ " ax.set_ylim3d(-1, 4);\n",
+ " ax.set_zlim3d(-100, 800);\n",
+ "\n",
+ " N = range(0, 800, 20)\n",
+ " plt.contour(X, Y, Z, N, zdir='z', offset=-100, cmap='coolwarm', alpha=1)\n",
+ " \n",
+ " ax.plot([-3.89578088] * 2,\n",
+ " [ 1.19303364] * 2,\n",
+ " [-100, 4.47697137598], \n",
+ " color='red', alpha=1, linewidth=1.3, zorder=100, linestyle='dashed',\n",
+ " label=r'minimum: $J(-3.90, 1.19) = 4.48$')\n",
+ " ax.scatter([-3.89578088] * 2,\n",
+ " [ 1.19303364] * 2,\n",
+ " [-100, 4.47697137598], \n",
+ " c='r', s=80, marker='x', alpha=1, linewidth=1.3, zorder=100, \n",
+ " label=r'minimum: $J(-3.90, 1.19) = 4.48$')\n",
+ " \n",
+ " if show_gradient:\n",
+ " ax.plot([3.0, 1.1],\n",
+ " [3.0, 2.4],\n",
+ " [263.0, 125.0], \n",
+ " color='green', alpha=1, linewidth=1.3, zorder=100)\n",
+ " ax.scatter([3.0],\n",
+ " [3.0],\n",
+ " [263.0], \n",
+ " c='g', s=30, marker='D', alpha=1, linewidth=1.3, zorder=100)\n",
+ "\n",
+ " ax.margins(0,0,0)\n",
+ " fig.tight_layout()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 54,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "costplot3d(h, x, y)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Na powyższym wykresie poszukiwane minimum funkcji kosztu oznaczone jest czerwonym krzyżykiem.\n",
+ "\n",
+ "Możemy też zobaczyć rzut powyższego trójwymiarowego wykresu na płaszczyznę $(\\theta_0, \\theta_1)$ poniżej:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 55,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "def costplot2d(hypothesis, x, y, gradient_values=[], nohead=False):\n",
+ " fig = plt.figure(figsize=(16 * 0.6, 9 * 0.6))\n",
+ " ax = fig.add_subplot(111)\n",
+ " fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
+ " ax.set_xlabel(r\"$\\theta_0$\")\n",
+ " ax.set_ylabel(r\"$\\theta_1$\")\n",
+ "\n",
+ " j = lambda theta0, theta1: costfun(hypothesis, x, y)([theta0, theta1])\n",
+ " X = np.arange(-10, 10.1, 0.1)\n",
+ " Y = np.arange(-1, 4.1, 0.1)\n",
+ " X, Y = np.meshgrid(X, Y)\n",
+ " Z = np.array(\n",
+ " [\n",
+ " [\n",
+ " J(hypothesis, [theta0, theta1], x, y)\n",
+ " for theta0, theta1 in zip(xRow, yRow)\n",
+ " ]\n",
+ " for xRow, yRow in zip(X, Y)\n",
+ " ]\n",
+ " )\n",
+ "\n",
+ " N = range(0, 800, 20)\n",
+ " plt.contour(X, Y, Z, N, cmap=\"coolwarm\", alpha=1)\n",
+ "\n",
+ " ax.scatter(\n",
+ " [-3.89578088],\n",
+ " [1.19303364],\n",
+ " c=\"r\",\n",
+ " s=80,\n",
+ " marker=\"x\",\n",
+ " label=r\"minimum: $J(-3.90, 1.19) = 4.48$\",\n",
+ " )\n",
+ "\n",
+ " if len(gradient_values) > 0:\n",
+ " prev_theta = gradient_values[0][1]\n",
+ " ax.scatter(\n",
+ " [prev_theta[0]], [prev_theta[1]], c=\"g\", s=30, marker=\"D\", zorder=100\n",
+ " )\n",
+ " for cost, theta in gradient_values[1:]:\n",
+ " dtheta = [theta[0] - prev_theta[0], theta[1] - prev_theta[1]]\n",
+ " ax.arrow(\n",
+ " prev_theta[0],\n",
+ " prev_theta[1],\n",
+ " dtheta[0],\n",
+ " dtheta[1],\n",
+ " color=\"green\",\n",
+ " head_width=(0.0 if nohead else 0.1),\n",
+ " head_length=(0.0 if nohead else 0.2),\n",
+ " zorder=100,\n",
+ " )\n",
+ " prev_theta = theta\n",
+ "\n",
+ " return fig\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig = costplot2d(h, x, y)\n",
+ "legend(fig)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Cechy funkcji kosztu\n",
+ "Funkcja kosztu $J(\\theta)$ zdefiniowana powyżej jest funkcją wypukłą, dlatego posiada tylko jedno minimum lokalne."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## 2.2. Metoda gradientu prostego"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Metoda gradientu prostego\n",
+ "Metoda znajdowania minimów lokalnych."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "Idea:\n",
+ " * Zacznijmy od dowolnego $\\theta$.\n",
+ " * Zmieniajmy powoli $\\theta$ tak, aby zmniejszać $J(\\theta)$, aż w końcu znajdziemy minimum."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 57,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "costplot3d(h, x, y, show_gradient=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Przykładowe wartości kolejnych przybliżeń (sztuczne)\n",
+ "\n",
+ "gv = [\n",
+ " [_, [3.0, 3.0]],\n",
+ " [_, [2.6, 2.4]],\n",
+ " [_, [2.2, 2.0]],\n",
+ " [_, [1.6, 1.6]],\n",
+ " [_, [0.4, 1.2]],\n",
+ "]\n",
+ "\n",
+ "# Przygotowanie interaktywnego wykresu\n",
+ "\n",
+ "sliderSteps1 = widgets.IntSlider(\n",
+ " min=0, max=3, step=1, value=0, description=\"kroki\", width=300\n",
+ ")\n",
+ "\n",
+ "\n",
+ "def slide4(steps):\n",
+ " costplot2d(h, x, y, gradient_values=gv[: steps + 1])\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "ba49ab01f3694550a13b124b599f9d17",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "interactive(children=(IntSlider(value=0, description='kroki', max=3), Output()), _dom_classes=('widget-interac…"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 59,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "widgets.interact(slide4, steps=sliderSteps1)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Metoda gradientu prostego\n",
+ "W każdym kroku będziemy aktualizować parametry $\\theta_j$:\n",
+ "\n",
+ "$$ \\theta_j := \\theta_j - \\alpha \\frac{\\partial}{\\partial \\theta_j} J(\\theta) $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "Współczynnik $\\alpha$ nazywamy **długością kroku** lub **współczynnikiem szybkości uczenia** (*learning rate*)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "$$ \\begin{array}{rcl}\n",
+ "\\dfrac{\\partial}{\\partial \\theta_j} J(\\theta)\n",
+ " & = & \\dfrac{\\partial}{\\partial \\theta_j} \\dfrac{1}{2m} \\displaystyle\\sum_{i = 1}^{m} \\left( h_{\\theta} \\left( x^{(i)} \\right) - y^{(i)} \\right) ^2 \\\\\n",
+ " & = & 2 \\cdot \\dfrac{1}{2m} \\displaystyle\\sum_{i=1}^m \\left( h_\\theta \\left( x^{(i)} \\right) - y^{(i)} \\right) \\cdot \\dfrac{\\partial}{\\partial\\theta_j} \\left( h_\\theta \\left( x^{(i)} \\right) - y^{(i)} \\right) \\\\\n",
+ " & = & \\dfrac{1}{m}\\displaystyle\\sum_{i=1}^m \\left( h_\\theta \\left( x^{(i)} \\right) - y^{(i)} \\right) \\cdot \\dfrac{\\partial}{\\partial\\theta_j} \\left( \\displaystyle\\sum_{i=0}^n \\theta_i x_i^{(i)} - y^{(i)} \\right)\\\\\n",
+ " & = & \\dfrac{1}{m}\\displaystyle\\sum_{i=1}^m \\left( h_\\theta \\left( x^{(i)} \\right) -y^{(i)} \\right) x_j^{(i)} \\\\\n",
+ "\\end{array} $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Czyli dla regresji liniowej jednej zmiennej:\n",
+ "\n",
+ "$$ h_\\theta(x) = \\theta_0 + \\theta_1x $$\n",
+ "\n",
+ "w każdym kroku będziemy aktualizować:\n",
+ "\n",
+ "$$\n",
+ "\\begin{array}{rcl}\n",
+ "\\theta_0 & := & \\theta_0 - \\alpha \\, \\dfrac{1}{m}\\displaystyle\\sum_{i=1}^m \\left( h_\\theta(x^{(i)})-y^{(i)} \\right) \\\\ \n",
+ "\\theta_1 & := & \\theta_1 - \\alpha \\, \\dfrac{1}{m}\\displaystyle\\sum_{i=1}^m \\left( h_\\theta(x^{(i)})-y^{(i)} \\right) x^{(i)}\\\\ \n",
+ "\\end{array}\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "###### Uwaga!\n",
+ " * W każdym kroku aktualizujemy *jednocześnie* $\\theta_0$ i $\\theta_1$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ " * Kolejne kroki wykonujemy aż uzyskamy zbieżność"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Metoda gradientu prostego – implementacja"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Wyświetlanie macierzy w LaTeX-u\n",
+ "\n",
+ "\n",
+ "def LatexMatrix(matrix):\n",
+ " ltx = r\"\\left[\\begin{array}\"\n",
+ " m, n = matrix.shape\n",
+ " ltx += \"{\" + (\"r\" * n) + \"}\"\n",
+ " for i in range(m):\n",
+ " ltx += r\" & \".join([(\"%.4f\" % j.item()) for j in matrix[i]]) + r\" \\\\ \"\n",
+ " ltx += r\"\\end{array}\\right]\"\n",
+ " return ltx\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 61,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "def gradient_descent(h, cost_fun, theta, x, y, alpha, eps):\n",
+ " current_cost = cost_fun(h, theta, x, y)\n",
+ " history = [\n",
+ " [current_cost, theta]\n",
+ " ] # zapiszmy wartości kosztu i parametrów, by potem zrobić wykres\n",
+ " m = len(y)\n",
+ " while True:\n",
+ " new_theta = [\n",
+ " theta[0] - alpha / float(m) * sum(h(theta, x[i]) - y[i] for i in range(m)),\n",
+ " theta[1]\n",
+ " - alpha / float(m) * sum((h(theta, x[i]) - y[i]) * x[i] for i in range(m)),\n",
+ " ]\n",
+ " theta = new_theta # jednoczesna aktualizacja - używamy zmiennej tymczasowej\n",
+ " try:\n",
+ " prev_cost = current_cost\n",
+ " current_cost = cost_fun(h, theta, x, y)\n",
+ " except OverflowError:\n",
+ " break\n",
+ " if abs(prev_cost - current_cost) <= eps:\n",
+ " break\n",
+ " history.append([current_cost, theta])\n",
+ " return theta, history\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/latex": [
+ "$\\displaystyle \\large\\textrm{Wynik:}\\quad \\theta = \\left[\\begin{array}{r}-1.8792 \\\\ 1.0231 \\\\ \\end{array}\\right] \\quad J(\\theta) = 5.0010 \\quad \\textrm{po 4114 iteracjach}$"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "best_theta, history = gradient_descent(h, J, [0.0, 0.0], x, y, alpha=0.001, eps=0.0001)\n",
+ "\n",
+ "display(\n",
+ " Math(\n",
+ " r\"\\large\\textrm{Wynik:}\\quad \\theta = \"\n",
+ " + LatexMatrix(np.matrix(best_theta).reshape(2, 1))\n",
+ " + (r\" \\quad J(\\theta) = %.4f\" % history[-1][0])\n",
+ " + r\" \\quad \\textrm{po %d iteracjach}\" % len(history)\n",
+ " )\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Przygotowanie interaktywnego wykresu\n",
+ "\n",
+ "sliderSteps2 = widgets.IntSlider(\n",
+ " min=0, max=500, step=1, value=1, description=\"kroki\", width=300\n",
+ ")\n",
+ "\n",
+ "\n",
+ "def slide5(steps):\n",
+ " costplot2d(h, x, y, gradient_values=history[: steps + 1], nohead=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "59091adc5a5f4d20bf2ad5e92c17b234",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "interactive(children=(IntSlider(value=1, description='kroki', max=500), Button(description='Run Interact', sty…"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 64,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "widgets.interact_manual(slide5, steps=sliderSteps2)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Współczynnik szybkości uczenia $\\alpha$ (długość kroku)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Tempo zbieżności metody gradientu prostego możemy regulować za pomocą parametru $\\alpha$, pamiętając, że:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ " * Jeżeli długość kroku jest zbyt mała, algorytm może działać zbyt wolno."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ " * Jeżeli długość kroku jest zbyt duża, algorytm może nie być zbieżny."
+ ]
+ },
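+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Oba przypadki można zilustrować prostym szkicem (przykład ilustracyjny spoza danych wykładu – zakładamy $J(\\theta) = \\theta^2$, czyli $\\nabla J(\\theta) = 2\\theta$):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Szkic ilustracyjny: gradient prosty dla J(theta) = theta**2\n",
+ "def gd_steps(alpha, theta=1.0, n=10):\n",
+ "    for _ in range(n):\n",
+ "        theta = theta - alpha * 2 * theta  # theta := theta - alpha * dJ/dtheta\n",
+ "    return theta\n",
+ "\n",
+ "print(gd_steps(0.01))  # zbyt mała alpha – powolna zbieżność\n",
+ "print(gd_steps(0.45))  # odpowiednia alpha – szybka zbieżność\n",
+ "print(gd_steps(1.1))   # zbyt duża alpha – |theta| rośnie, brak zbieżności\n"
+ ]
+ },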
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## 2.3. Predykcja wyników"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Zbudowaliśmy model, dzięki któremu wiemy, jaka jest zależność między dochodem firmy transportowej ($y$) a ludnością miasta ($x$).\n",
+ "\n",
+ "Wróćmy teraz do postawionego na początku wykładu pytania: jak przewidzieć dochód firmy transportowej w mieście o danej wielkości?\n",
+ "\n",
+ "Wystarczy po prostu zastosować funkcję $h$ z wyznaczonymi w poprzednim kroku parametrami $\\theta$.\n",
+ "\n",
+ "Na przykład, jeżeli miasto ma $536\\,000$ ludności, to $x = 53.6$ (bo dane trenujące były wyrażone w dziesiątkach tysięcy mieszkańców, a $536\\,000 = 53.6 \\cdot 10\\,000$) i możemy użyć znalezionych parametrów $\\theta$, by wykonać następujące obliczenia:\n",
+ "$$ \\hat{y} \\, = \\, h_\\theta(x) \\, = \\, \\theta_0 + \\theta_1 \\, x \\, = \\, 0.0494 + 0.7591 \\cdot 53.6 \\, = \\, 40.7359 $$\n",
+ "\n",
+ "Czyli używając zdefiniowanych wcześniej funkcji:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "52.96131370254696\n"
+ ]
+ }
+ ],
+ "source": [
+ "example_x = 53.6\n",
+ "predicted_y = h(best_theta, example_x)\n",
+ "print(\n",
+ " predicted_y\n",
+ ") ## taki jest przewidywany dochód tej firmy transportowej w 536-tysięcznym mieście\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## 2.4. Ewaluacja modelu"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Jak ocenić jakość stworzonego przez nas modelu?\n",
+ "\n",
+ " * Trzeba sprawdzić, jak przewidywania modelu zgadzają się z oczekiwaniami!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Czy możemy w tym celu użyć danych, których użyliśmy do wytrenowania modelu?\n",
+ "**NIE!**\n",
+ "\n",
+ " * Istotą uczenia maszynowego jest budowanie modeli/algorytmów, które dają dobre przewidywania dla **nieznanych** danych – takich, z którymi algorytm nie miał jeszcze styczności! Nie sztuką jest przewidywać rzeczy, które już się zna.\n",
+ " * Dlatego testowanie/ewaluowanie modelu na zbiorze uczącym mija się z celem i jest nieprzydatne.\n",
+ " * Do ewaluacji modelu należy użyć oddzielnego zbioru danych.\n",
+ " * **Dane uczące i dane testowe zawsze powinny stanowić oddzielne zbiory!**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Na wykładzie *5. Dobre praktyki w uczeniu maszynowym* dowiesz się, jak podzielić posiadane dane na zbiór uczący i zbiór testowy.\n",
+ "\n",
+ "Tutaj, na razie, do ewaluacji użyjemy specjalnie przygotowanego zbioru testowego."
+ ]
+ },
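+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Sam podział można naszkicować np. tak (szkic na losowych danych ilustracyjnych – nazwy zmiennych są tu tylko przykładowe):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Szkic losowego podziału danych na zbiór uczący (80%) i testowy (20%)\n",
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(42)\n",
+ "data_all = rng.normal(size=(100, 2))  # przykładowe dane ilustracyjne\n",
+ "\n",
+ "indices = rng.permutation(len(data_all))  # losowa kolejność przykładów\n",
+ "split = int(0.8 * len(data_all))\n",
+ "train_set = data_all[indices[:split]]\n",
+ "test_set = data_all[indices[split:]]\n",
+ "print(train_set.shape, test_set.shape)  # (80, 2) (20, 2)\n"
+ ]
+ },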
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Jako metrykę ewaluacji wykorzystamy znany nam już błąd średniokwadratowy (MSE):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "skip"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "def mse(expected, predicted):\n",
+ " \"\"\"Błąd średniokwadratowy\"\"\"\n",
+ " m = len(expected)\n",
+ " if len(predicted) != m:\n",
+ "        raise ValueError(\"Wektory mają różne długości!\")\n",
+ " return 1.0 / (2 * m) * sum((expected[i] - predicted[i]) ** 2 for i in range(m))\n"
+ ]
+ },
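+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Tę samą metrykę można równoważnie zapisać wektorowo w numpy (szkic – nazwa `mse_vec` jest tu tylko przykładowa):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "def mse_vec(expected, predicted):\n",
+ "    \"\"\"Błąd średniokwadratowy – wersja wektorowa, równoważna definicji z pętlą\"\"\"\n",
+ "    expected = np.asarray(expected, dtype=float)\n",
+ "    predicted = np.asarray(predicted, dtype=float)\n",
+ "    if expected.shape != predicted.shape:\n",
+ "        raise ValueError(\"Wektory mają różne długości!\")\n",
+ "    return float(np.mean((expected - predicted) ** 2) / 2)\n",
+ "\n",
+ "\n",
+ "print(mse_vec([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / (2 * 3)\n"
+ ]
+ },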
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "4.36540743711836\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Wczytywanie danych testowych z pliku za pomocą numpy\n",
+ "\n",
+ "test_data = np.loadtxt(\"data01_test.csv\", delimiter=\",\")\n",
+ "x_test = test_data[:, 0]\n",
+ "y_test = test_data[:, 1]\n",
+ "\n",
+ "# Obliczenie przewidywań modelu\n",
+ "y_pred = h(best_theta, x_test)\n",
+ "\n",
+ "# Obliczenie MSE na zbiorze testowym (im mniejszy MSE, tym lepiej!)\n",
+ "evaluation_result = mse(y_test, y_pred)\n",
+ "\n",
+ "print(evaluation_result)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Otrzymana wartość mówi nam o tym, jak dobry jest stworzony przez nas model.\n",
+ "\n",
+ "W przypadku metryki MSE im mniejsza wartość, tym lepiej.\n",
+ "\n",
+ "W ten sposób możemy np. porównywać różne modele."
+ ]
+ }
+ ],
+ "metadata": {
+ "celltoolbar": "Slideshow",
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.4"
+ },
+ "livereveal": {
+ "start_slideshow_at": "selected",
+ "theme": "white"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/wyk/03_Regresja_liniowa_2.ipynb b/wyk/03_Regresja_liniowa_2.ipynb
new file mode 100644
index 0000000..3e786c8
--- /dev/null
+++ b/wyk/03_Regresja_liniowa_2.ipynb
@@ -0,0 +1,9657 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Uczenie maszynowe\n",
+ "# 3. Regresja liniowa – część 2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## 3.1. Regresja liniowa wielu zmiennych"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Do przewidywania wartości $y$ możemy użyć więcej niż jednej cechy $x$:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Przykład – ceny mieszkań"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 70,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "y : price x1: isNew x2: rooms x3: floor x4: location x5: sqrMetres\n",
+ "476118.0 False 3 1 Centrum 78 \n",
+ "459531.0 False 3 2 Sołacz 62 \n",
+ "411557.0 False 3 0 Sołacz 15 \n",
+ "496416.0 False 4 0 Sołacz 14 \n",
+ "406032.0 False 3 0 Sołacz 15 \n",
+ "450026.0 False 3 1 Naramowice 80 \n",
+ "571229.15 False 2 4 Wilda 39 \n",
+ "325000.0 False 3 1 Grunwald 54 \n",
+ "268229.0 False 2 1 Grunwald 90 \n"
+ ]
+ }
+ ],
+ "source": [
+ "import csv\n",
+ "\n",
+ "reader = csv.reader(open(\"data02_train.tsv\", encoding=\"utf-8\"), delimiter=\"\\t\")\n",
+ "for i, row in enumerate(list(reader)[:10]):\n",
+ " if i == 0:\n",
+ " print(\n",
+ " \" \".join(\n",
+ " [\n",
+ " \"{}: {:8}\".format(\"x\" + str(j) if j > 0 else \"y \", entry)\n",
+ " for j, entry in enumerate(row)\n",
+ " ]\n",
+ " )\n",
+ " )\n",
+ " else:\n",
+ " print(\" \".join([\"{:12}\".format(entry) for entry in row]))\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$ x^{(2)} = ({\\rm \"False\"}, 3, 2, {\\rm \"Sołacz\"}, 62), \\quad x_3^{(2)} = 2 $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Hipoteza"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "W naszym przypadku (wybraliśmy 5 cech):\n",
+ "\n",
+ "$$ h_\\theta(x) = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\theta_3 x_3 + \\theta_4 x_4 + \\theta_5 x_5 $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "W ogólności ($n$ cech):\n",
+ "\n",
+ "$$ h_\\theta(x) = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\ldots + \\theta_n x_n $$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Jeżeli zdefiniujemy $x_0 = 1$, będziemy mogli powyższy wzór zapisać w bardziej kompaktowy sposób:\n",
+ "\n",
+ "$$\n",
+ "\\begin{array}{rcl}\n",
+ "h_\\theta(x)\n",
+ " & = & \\theta_0 x_0 + \\theta_1 x_1 + \\theta_2 x_2 + \\ldots + \\theta_n x_n \\\\\n",
+ " & = & \\displaystyle\\sum_{i=0}^{n} \\theta_i x_i \\\\\n",
+ " & = & \\theta^T \\, x \\\\\n",
+ " & = & x^T \\, \\theta \\\\\n",
+ "\\end{array}\n",
+ "$$"
+ ]
+ },
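+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Równość $\\theta^T \\, x = \\sum_{i=0}^{n} \\theta_i x_i$ łatwo sprawdzić numerycznie (wartości w szkicu są przykładowe):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "theta = np.array([1.0, 2.0, 3.0])  # theta_0, theta_1, theta_2\n",
+ "x = np.array([1.0, 0.5, -1.0])     # x_0 = 1, pozostałe cechy przykładowe\n",
+ "\n",
+ "print(theta @ x)                               # iloczyn skalarny theta^T x\n",
+ "print(sum(t * xi for t, xi in zip(theta, x)))  # suma theta_i * x_i – ten sam wynik\n"
+ ]
+ },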
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Metoda gradientu prostego – notacja macierzowa"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Metoda gradientu prostego przyjmie bardzo elegancką formę, jeżeli do jej zapisu użyjemy wektorów i macierzy."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$\n",
+ "X=\\left[\\begin{array}{cc}\n",
+ "1 & \\left( \\vec x^{(1)} \\right)^T \\\\\n",
+ "1 & \\left( \\vec x^{(2)} \\right)^T \\\\\n",
+ "\\vdots & \\vdots\\\\\n",
+ "1 & \\left( \\vec x^{(m)} \\right)^T \\\\\n",
+ "\\end{array}\\right] \n",
+ "= \\left[\\begin{array}{cccc}\n",
+ "1 & x_1^{(1)} & \\cdots & x_n^{(1)} \\\\\n",
+ "1 & x_1^{(2)} & \\cdots & x_n^{(2)} \\\\\n",
+ "\\vdots & \\vdots & \\ddots & \\vdots\\\\\n",
+ "1 & x_1^{(m)} & \\cdots & x_n^{(m)} \\\\\n",
+ "\\end{array}\\right]\n",
+ "\\quad\n",
+ "\\vec{y} = \n",
+ "\\left[\\begin{array}{c}\n",
+ "y^{(1)}\\\\\n",
+ "y^{(2)}\\\\\n",
+ "\\vdots\\\\\n",
+ "y^{(m)}\\\\\n",
+ "\\end{array}\\right]\n",
+ "\\quad\n",
+ "\\theta = \\left[\\begin{array}{c}\n",
+ "\\theta_0\\\\\n",
+ "\\theta_1\\\\\n",
+ "\\vdots\\\\\n",
+ "\\theta_n\\\\\n",
+ "\\end{array}\\right]\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 71,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "skip"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Wersje macierzowe funkcji rysowania wykresów punktowych oraz krzywej regresyjnej\n",
+ "\n",
+ "\n",
+ "def hMx(theta, X):\n",
+ " return X * theta\n",
+ "\n",
+ "\n",
+ "def regdotsMx(X, y):\n",
+ " fig = plt.figure(figsize=(16 * 0.6, 9 * 0.6))\n",
+ " ax = fig.add_subplot(111)\n",
+ " fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
+ " ax.scatter([X[:, 1]], [y], c=\"r\", s=50, label=\"Dane\")\n",
+ "\n",
+ " ax.set_xlabel(\"Populacja\")\n",
+ " ax.set_ylabel(\"Zysk\")\n",
+ " ax.margins(0.05, 0.05)\n",
+ " plt.ylim(y.min() - 1, y.max() + 1)\n",
+ " plt.xlim(np.min(X[:, 1]) - 1, np.max(X[:, 1]) + 1)\n",
+ " return fig\n",
+ "\n",
+ "\n",
+ "def reglineMx(fig, fun, theta, X):\n",
+ " ax = fig.axes[0]\n",
+ " x0, x1 = np.min(X[:, 1]), np.max(X[:, 1])\n",
+ " L = [x0, x1]\n",
+ " LX = np.matrix([1, x0, 1, x1]).reshape(2, 2)\n",
+ " ax.plot(\n",
+ " L,\n",
+ " fun(theta, LX),\n",
+ " linewidth=\"2\",\n",
+ " label=(\n",
+ " r\"$y={theta0:.2}{op}{theta1:.2}x$\".format(\n",
+ " theta0=float(theta[0][0]),\n",
+ " theta1=(\n",
+ " float(theta[1][0]) if theta[1][0] >= 0 else float(-theta[1][0])\n",
+ " ),\n",
+ " op=\"+\" if theta[1][0] >= 0 else \"-\",\n",
+ " )\n",
+ " ),\n",
+ " )\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 72,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[ 1. 3. 1. 78.]\n",
+ " [ 1. 3. 2. 62.]\n",
+ " [ 1. 3. 0. 15.]\n",
+ " [ 1. 4. 0. 14.]\n",
+ " [ 1. 3. 0. 15.]]\n",
+ "(1339, 4)\n",
+ "\n",
+ "[[476118.]\n",
+ " [459531.]\n",
+ " [411557.]\n",
+ " [496416.]\n",
+ " [406032.]]\n",
+ "(1339, 1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Wczytywanie danych z pliku za pomocą numpy – regresja liniowa wielu zmiennych – notacja macierzowa\n",
+ "\n",
+ "import pandas\n",
+ "\n",
+ "data = pandas.read_csv(\n",
+ " \"data02_train.tsv\", delimiter=\"\\t\", usecols=[\"price\", \"rooms\", \"floor\", \"sqrMetres\"]\n",
+ ")\n",
+ "m, n_plus_1 = data.values.shape\n",
+ "n = n_plus_1 - 1\n",
+ "Xn = data.values[:, 1:].reshape(m, n)\n",
+ "\n",
+ "# Dodaj kolumnę jedynek do macierzy\n",
+ "XMx = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
+ "yMx = np.matrix(data.values[:, 0]).reshape(m, 1)\n",
+ "\n",
+ "print(XMx[:5])\n",
+ "print(XMx.shape)\n",
+ "\n",
+ "print()\n",
+ "\n",
+ "print(yMx[:5])\n",
+ "print(yMx.shape)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Funkcja kosztu – notacja macierzowa"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$J(\\theta)=\\dfrac{1}{2|\\vec y|}\\left(X\\theta-\\vec{y}\\right)^T\\left(X\\theta-\\vec{y}\\right)$$ \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 73,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/latex": [
+ "$\\displaystyle \\Large J(\\theta) = 85104141370.9717$"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from IPython.display import display, Math, Latex\n",
+ "\n",
+ "\n",
+ "def JMx(theta, X, y):\n",
+ " \"\"\"Wersja macierzowa funkcji kosztu\"\"\"\n",
+ " m = len(y)\n",
+ " J = 1.0 / (2.0 * m) * ((X * theta - y).T * (X * theta - y))\n",
+ " return J.item()\n",
+ "\n",
+ "\n",
+ "thetaMx = np.matrix([10, 90, -1, 2.5]).reshape(4, 1)\n",
+ "\n",
+ "cost = JMx(thetaMx, XMx, yMx)\n",
+ "display(Math(r\"\\Large J(\\theta) = %.4f\" % cost))\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Gradient – notacja macierzowa"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$\\nabla J(\\theta) = \\frac{1}{|\\vec y|} X^T\\left(X\\theta-\\vec y\\right)$$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 74,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/latex": [
+ "$\\displaystyle \\large \\theta = \\left[\\begin{array}{r}10.0000 \\\\ 90.0000 \\\\ -1.0000 \\\\ 2.5000 \\\\ \\end{array}\\right]\\quad\\large \\nabla J(\\theta) = \\left[\\begin{array}{r}-373492.7442 \\\\ -1075656.5086 \\\\ -989554.4921 \\\\ -23806475.6561 \\\\ \\end{array}\\right]$"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from IPython.display import display, Math, Latex\n",
+ "\n",
+ "\n",
+ "def dJMx(theta, X, y):\n",
+ "    \"\"\"Wersja macierzowa gradientu funkcji kosztu\"\"\"\n",
+ " return 1.0 / len(y) * (X.T * (X * theta - y))\n",
+ "\n",
+ "\n",
+ "thetaMx = np.matrix([10, 90, -1, 2.5]).reshape(4, 1)\n",
+ "\n",
+ "display(\n",
+ " Math(\n",
+ " r\"\\large \\theta = \"\n",
+ " + LatexMatrix(thetaMx)\n",
+ " + r\"\\quad\"\n",
+ " + r\"\\large \\nabla J(\\theta) = \"\n",
+ " + LatexMatrix(dJMx(thetaMx, XMx, yMx))\n",
+ " )\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Algorytm gradientu prostego – notacja macierzowa"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "$$ \\theta := \\theta - \\alpha \\, \\nabla J(\\theta) $$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 75,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/latex": [
+ "$\\displaystyle \\large\\textrm{Wynik:}\\quad \\theta = \\left[\\begin{array}{r}17446.2135 \\\\ 86476.7960 \\\\ -1374.8950 \\\\ 2165.0689 \\\\ \\end{array}\\right] \\quad J(\\theta) = 10324864803.1591 \\quad \\textrm{po 374575 iteracjach}$"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Implementacja algorytmu gradientu prostego za pomocą numpy i macierzy\n",
+ "\n",
+ "\n",
+ "def GDMx(fJ, fdJ, theta, X, y, alpha, eps):\n",
+ " current_cost = fJ(theta, X, y)\n",
+ " history = [[current_cost, theta]]\n",
+ " while True:\n",
+ " theta = theta - alpha * fdJ(theta, X, y) # implementacja wzoru\n",
+ " current_cost, prev_cost = fJ(theta, X, y), current_cost\n",
+ " if abs(prev_cost - current_cost) <= eps:\n",
+ " break\n",
+ " if current_cost > prev_cost:\n",
+ " print(\"Długość kroku (alpha) jest zbyt duża!\")\n",
+ " break\n",
+ " history.append([current_cost, theta])\n",
+ " return theta, history\n",
+ "\n",
+ "\n",
+ "thetaStartMx = np.zeros((n + 1, 1))\n",
+ "\n",
+ "# Zmieniamy wartości alpha (rozmiar kroku) oraz eps (kryterium stopu)\n",
+ "thetaBestMx, history = GDMx(JMx, dJMx, thetaStartMx, XMx, yMx, alpha=0.0001, eps=0.1)\n",
+ "\n",
+ "######################################################################\n",
+ "display(\n",
+ " Math(\n",
+ " r\"\\large\\textrm{Wynik:}\\quad \\theta = \"\n",
+ " + LatexMatrix(thetaBestMx)\n",
+ " + (r\" \\quad J(\\theta) = %.4f\" % history[-1][0])\n",
+ " + r\" \\quad \\textrm{po %d iteracjach}\" % len(history)\n",
+ " )\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## 3.2. Metoda gradientu prostego w praktyce"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "### Kryterium stopu"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Algorytm gradientu prostego polega na wykonywaniu określonych kroków w pętli. Pytanie brzmi: kiedy należy zatrzymać wykonywanie tej pętli?\n",
+ "\n",
+ "W każdej kolejnej iteracji funkcja kosztu maleje o coraz mniejszą wartość.\n",
+ "Parametr `eps` określa, jaka wartość graniczna tej różnicy jest dla nas wystarczająca:\n",
+ "\n",
+ " * Im mniejsza wartość `eps`, tym dokładniejszy wynik, ale dłuższy czas działania algorytmu.\n",
+ " * Im większa wartość `eps`, tym krótszy czas działania algorytmu, ale mniej dokładny wynik."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "source": [
+ "Na wykresie zobaczymy porównanie regresji dla różnych wartości `eps`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 76,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Wczytywanie danych z pliku za pomocą numpy – wersja macierzowa\n",
+ "data = np.loadtxt(\"data01_train.csv\", delimiter=\",\")\n",
+ "m, n_plus_1 = data.shape\n",
+ "n = n_plus_1 - 1\n",
+ "Xn = data[:, 0:n].reshape(m, n)\n",
+ "\n",
+ "# Dodaj kolumnę jedynek do macierzy\n",
+ "XMx = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
+ "yMx = np.matrix(data[:, 1]).reshape(m, 1)\n",
+ "\n",
+ "thetaStartMx = np.zeros((2, 1))\n",
+ "\n",
+ "fig = regdotsMx(XMx, yMx)\n",
+ "theta_e1, history1 = GDMx(\n",
+ " JMx, dJMx, thetaStartMx, XMx, yMx, alpha=0.01, eps=0.01\n",
+ ") # niebieska linia\n",
+ "reglineMx(fig, hMx, theta_e1, XMx)\n",
+ "theta_e2, history2 = GDMx(\n",
+ " JMx, dJMx, thetaStartMx, XMx, yMx, alpha=0.01, eps=0.000001\n",
+ ") # pomarańczowa linia\n",
+ "reglineMx(fig, hMx, theta_e2, XMx)\n",
+ "legend(fig)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 77,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/latex": [
+ "$\\displaystyle \\theta_{10^{-2}} = \\left[\\begin{array}{r}0.0531 \\\\ 0.8365 \\\\ \\end{array}\\right]\\quad\\theta_{10^{-6}} = \\left[\\begin{array}{r}-3.4895 \\\\ 1.1786 \\\\ \\end{array}\\right]$"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display(\n",
+ " Math(\n",
+ " r\"\\theta_{10^{-2}} = \"\n",
+ " + LatexMatrix(theta_e1)\n",
+ " + r\"\\quad\\theta_{10^{-6}} = \"\n",
+ " + LatexMatrix(theta_e2)\n",
+ " )\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### Długość kroku ($\\alpha$)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 78,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Jak zmienia się koszt w kolejnych krokach w zależności od alfa\n",
+ "\n",
+ "\n",
+ "def costchangeplot(history):\n",
+ " fig = plt.figure(figsize=(16 * 0.6, 9 * 0.6))\n",
+ " ax = fig.add_subplot(111)\n",
+ " fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
+ " ax.set_xlabel(\"krok\")\n",
+ " ax.set_ylabel(r\"$J(\\theta)$\")\n",
+ "\n",
+ " X = np.arange(0, 500, 1)\n",
+ " Y = [history[step][0] for step in X]\n",
+ " ax.plot(X, Y, linewidth=\"2\", label=(r\"$J(\\theta)$\"))\n",
+ " return fig\n",
+ "\n",
+ "\n",
+ "def slide7(alpha):\n",
+ " best_theta, history = gradient_descent(\n",
+ " h, J, [0.0, 0.0], x, y, alpha=alpha, eps=0.0001\n",
+ " )\n",
+ " fig = costchangeplot(history)\n",
+ " legend(fig)\n",
+ "\n",
+ "\n",
+ "sliderAlpha1 = widgets.FloatSlider(\n",
+ " min=0.01, max=0.03, step=0.001, value=0.02, description=r\"$\\alpha$\", width=300\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 79,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "52b0d91e39104f4facbb7f57819aae0c",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "interactive(children=(FloatSlider(value=0.02, description='$\\\\alpha$', max=0.03, min=0.01, step=0.001), Button…"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 79,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "widgets.interact_manual(slide7, alpha=sliderAlpha1)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## 3.3. Normalizacja danych"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Normalizacja danych polega na takim dostosowaniu danych wejściowych, żeby ułatwić działanie algorytmu gradientu prostego.\n",
+ "\n",
+ "Wyjaśnię to na przykładzie."
+ ]
+ },
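+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "source": [
+ "Najpierw ogólny szkic: jednym z typowych wariantów normalizacji jest standaryzacja cech, po której każda cecha ma średnią 0 i odchylenie standardowe 1 (wartości w szkicu są ilustracyjne):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Szkic standaryzacji cech (dane przykładowe)\n",
+ "X_raw = np.array([[3.0, 78.0], [3.0, 62.0], [4.0, 14.0], [3.0, 15.0]])\n",
+ "X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)\n",
+ "\n",
+ "print(X_std.mean(axis=0))  # w przybliżeniu [0, 0]\n",
+ "print(X_std.std(axis=0))   # [1, 1]\n"
+ ]
+ },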
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "Użyjemy danych z „Gratka flats challenge 2017”.\n",
+ "\n",
+ "Rozważmy model $h(x) = \\theta_0 + \\theta_1 x_1 + \\theta_2 x_2$, w którym cena mieszkania prognozowana jest na podstawie liczby pokoi $x_1$ i metrażu $x_2$:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 81,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " price | \n",
+ " rooms | \n",
+ " sqrMetres | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 | \n",
+ " 476118.00 | \n",
+ " 3 | \n",
+ " 78 | \n",
+ "
\n",
+ " \n",
+ " 1 | \n",
+ " 459531.00 | \n",
+ " 3 | \n",
+ " 62 | \n",
+ "
\n",
+ " \n",
+ " 2 | \n",
+ " 411557.00 | \n",
+ " 3 | \n",
+ " 15 | \n",
+ "
\n",
+ " \n",
+ " 3 | \n",
+ " 496416.00 | \n",
+ " 4 | \n",
+ " 14 | \n",
+ "
\n",
+ " \n",
+ " 4 | \n",
+ " 406032.00 | \n",
+ " 3 | \n",
+ " 15 | \n",
+ "
\n",
+ " \n",
+ " 5 | \n",
+ " 450026.00 | \n",
+ " 3 | \n",
+ " 80 | \n",
+ "
\n",
+ " \n",
+ " 6 | \n",
+ " 571229.15 | \n",
+ " 2 | \n",
+ " 39 | \n",
+ "
\n",
+ " \n",
+ " 7 | \n",
+ " 325000.00 | \n",
+ " 3 | \n",
+ " 54 | \n",
+ "
\n",
+ " \n",
+ " 8 | \n",
+ " 268229.00 | \n",
+ " 2 | \n",
+ " 90 | \n",
+ "
\n",
+ " \n",
+ " 9 | \n",
+ " 604836.00 | \n",
+ " 4 | \n",
+ " 40 | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " price rooms sqrMetres\n",
+ "0 476118.00 3 78\n",
+ "1 459531.00 3 62\n",
+ "2 411557.00 3 15\n",
+ "3 496416.00 4 14\n",
+ "4 406032.00 3 15\n",
+ "5 450026.00 3 80\n",
+ "6 571229.15 2 39\n",
+ "7 325000.00 3 54\n",
+ "8 268229.00 2 90\n",
+ "9 604836.00 4 40"
+ ]
+ },
+ "execution_count": 81,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Wczytanie danych przy pomocy biblioteki pandas\n",
+ "import pandas\n",
+ "\n",
+ "alldata = pandas.read_csv(\n",
+ " \"data_flats.tsv\", header=0, sep=\"\\t\", usecols=[\"price\", \"rooms\", \"sqrMetres\"]\n",
+ ")\n",
+ "alldata[:10]\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 82,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Funkcja, która pokazuje wartości minimalne i maksymalne w macierzy X\n",
+ "\n",
+ "\n",
+ "def show_mins_and_maxs(XMx):\n",
+ " mins = np.amin(XMx, axis=0).tolist()[0] # wartości minimalne\n",
+ " maxs = np.amax(XMx, axis=0).tolist()[0] # wartości maksymalne\n",
+ " for i, (xmin, xmax) in enumerate(zip(mins, maxs)):\n",
+ " display(Math(r\"${:.2F} \\leq x_{} \\leq {:.2F}$\".format(xmin, i, xmax)))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 83,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Przygotowanie danych\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "%matplotlib inline\n",
+ "\n",
+ "data2 = np.matrix(alldata[[\"rooms\", \"sqrMetres\", \"price\"]])\n",
+ "\n",
+ "m, n_plus_1 = data2.shape\n",
+ "n = n_plus_1 - 1\n",
+ "Xn = data2[:, 0:n]\n",
+ "\n",
+ "XMx2 = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
+ "yMx2 = np.matrix(data2[:, -1]).reshape(m, 1) / 1000.0"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Cechy w danych treningowych przyjmują wartości z zakresu:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 84,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/latex": [
+ "$\\displaystyle 1.00 \\leq x_0 \\leq 1.00$"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/latex": [
+ "$\\displaystyle 2.00 \\leq x_1 \\leq 7.00$"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/latex": [
+ "$\\displaystyle 12.00 \\leq x_2 \\leq 196.00$"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "show_mins_and_maxs(XMx2)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "subslide"
+ }
+ },
+ "source": [
+ "Jak widzimy, $x_2$ przyjmuje wartości dużo większe niż $x_1$.\n",
+ "Powoduje to, że wykres funkcji kosztu jest bardzo „spłaszczony” wzdłuż jednej z osi:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 85,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "notes"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "def contour_plot(X, y, rescale=10**8):\n",
+ " theta0_vals = np.linspace(-100000, 100000, 100)\n",
+ " theta1_vals = np.linspace(-100000, 100000, 100)\n",
+ "\n",
+ " J_vals = np.zeros(shape=(theta0_vals.size, theta1_vals.size))\n",
+ " for t1, element in enumerate(theta0_vals):\n",
+ " for t2, element2 in enumerate(theta1_vals):\n",
+ " thetaT = np.matrix([1.0, element, element2]).reshape(3, 1)\n",
+ " J_vals[t1, t2] = JMx(thetaT, X, y) / rescale\n",
+ "\n",
+ " plt.figure()\n",
+ " plt.contour(theta0_vals, theta1_vals, J_vals.T, np.logspace(-2, 3, 20))\n",
+ " plt.xlabel(r\"$\\theta_1$\")\n",
+ " plt.ylabel(r\"$\\theta_2$\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 86,
+ "metadata": {
+ "slideshow": {
+ "slide_type": "fragment"
+ }
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ "