{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Machine learning\n",
"# 3. Evaluation, regularization, optimization"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 3.1. Testing methodology"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Evaluating the model is a crucial part of machine learning. The available data should therefore be split into disjoint sets: one for training and one for testing. In some cases a third set, the so-called validation set, also has to be set aside."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Training set vs. test set"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Algorithms are trained on the training set, and their quality is checked on the test set.\n",
"* The training set should be several times larger than the test set (e.g. 4:1, 9:1).\n",
"* The test set is often unknown.\n",
"* Training and test data must never be mixed: the training data must not be “contaminated” with test data!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Sometimes we need to choose parameters of the learning procedure, e.g. $\\alpha$. Which set should be used for that?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Validation set"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"To choose such parameters it is best to use yet another set, the so-called **validation set**."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" * The validation set should be about the same size as the test set, e.g. the data can be split into the three sets in the ratios 3:1:1, 8:1:1, etc."
]
},
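{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The three-way split described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `split_dataset`, the default 8:1:1 ratio and the fixed random seed are hypothetical choices, not part of the original material."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def split_dataset(data, ratios=(8, 1, 1), seed=0):\n",
"    # Hypothetical helper: shuffle the rows, then cut them into\n",
"    # training / validation / test parts in the given proportions\n",
"    rng = np.random.default_rng(seed)\n",
"    shuffled = data[rng.permutation(len(data))]\n",
"    total = sum(ratios)\n",
"    n_train = len(data) * ratios[0] // total\n",
"    n_val = len(data) * ratios[1] // total\n",
"    return (shuffled[:n_train],\n",
"            shuffled[n_train:n_train + n_val],\n",
"            shuffled[n_train + n_val:])\n",
"\n",
"example = np.arange(100).reshape(50, 2)  # 50 toy examples\n",
"train, val, test = split_dataset(example)\n",
"print(len(train), len(val), len(test))"
]
},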
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Cross-validation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Which part of the data should be set aside as the validation set so that the result is “best”?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" * Let every part of the data play this role in turn!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"<img width=\"100%\" src=\"https://chrisjmccormick.files.wordpress.com/2013/07/10_fold_cv.png\"/>\n",
"Source: https://chrisjmccormick.wordpress.com/2013/07/31/k-fold-cross-validation-with-matlab-code/"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Cross-validation\n",
"\n",
"* Split the data $D = \\left\\{ (x^{(1)}, y^{(1)}), \\ldots, (x^{(m)}, y^{(m)})\\right\\} $ into $N$ disjoint sets $T_1,\\ldots,T_N$.\n",
"* For $i=1,\\ldots,N$:\n",
"    * Use $T_i$ for validation and $S_i$ for training, where $S_i = D \\smallsetminus T_i$.\n",
"    * Save the model $\\theta_i$.\n",
"* Accumulate the results of the models $\\theta_i$ on the sets $T_i$.\n",
"* Choose the learning parameters based on the accumulated results."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Cross-validation: tips\n",
"\n",
"* $N$ is usually chosen between $4$ and $10$; this is called $N$-fold cross-validation.\n",
"* It is worth shuffling the set $D$ before splitting it.\n",
"* How should the results for all the sets $T_i$ be accumulated?\n",
"* Once the parameters have been chosen based on all the sets $T_i$, we train the model on the whole training data with those parameters.\n",
"* We test on the test set (if one is available)."
]
},
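{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The procedure above can be sketched as follows. This is a minimal illustration, not a reference implementation: the generator name `k_fold_indices` is hypothetical, the model is a deliberately trivial one (predict the mean of the training targets), and averaging the per-fold RMSE is just one possible answer to the question of how to accumulate the results. Setting `n_folds` equal to the number of examples would give leave-one-out."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def k_fold_indices(m, n_folds, seed=0):\n",
"    # Shuffle the indices 0..m-1 and cut them into n_folds disjoint parts T_1..T_N\n",
"    rng = np.random.default_rng(seed)\n",
"    folds = np.array_split(rng.permutation(m), n_folds)\n",
"    for i in range(n_folds):\n",
"        val_idx = folds[i]  # T_i\n",
"        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])  # S_i\n",
"        yield train_idx, val_idx\n",
"\n",
"# Toy usage: a model that always predicts the mean of the training targets\n",
"y = np.arange(20, dtype=float)\n",
"scores = []\n",
"for train_idx, val_idx in k_fold_indices(len(y), 5):\n",
"    prediction = y[train_idx].mean()\n",
"    scores.append(np.sqrt(np.mean((y[val_idx] - prediction) ** 2)))\n",
"print('mean validation RMSE:', np.mean(scores))"
]
},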
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### _Leave-one-out_\n",
"\n",
"This is the special case of cross-validation in which $N = m$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* What is the size of a single set $T_i$?\n",
"* What are the advantages and disadvantages of this method?\n",
"* When can it be useful?"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Validation set and optimization algorithms\n",
"\n",
"* If the error increases on the training set, the parameter $\\alpha$ is badly chosen and should be decreased.\n",
"* If the error decreases on the training set but increases on the validation set, we are dealing with **overfitting**.\n",
"* Optimization should then be stopped. Doing this automatically is called _early stopping_."
]
},
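{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"A minimal sketch of early stopping, under stated assumptions: the model is plain linear regression trained by gradient descent on synthetic data, and the function name `gd_with_early_stopping` and its `patience` parameter (how many non-improving steps are tolerated) are hypothetical. The idea is only to show how the validation error can decide when to stop."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def gd_with_early_stopping(X_train, y_train, X_val, y_val,\n",
"                           alpha=0.1, patience=5, max_steps=1000):\n",
"    # Squared-error cost of linear regression, used only as an example model\n",
"    def cost(theta, X, y):\n",
"        return float(np.mean((X @ theta - y) ** 2) / 2)\n",
"\n",
"    theta = np.zeros(X_train.shape[1])\n",
"    best_theta, best_val = theta.copy(), cost(theta, X_val, y_val)\n",
"    bad_steps = 0\n",
"    for _ in range(max_steps):\n",
"        grad = X_train.T @ (X_train @ theta - y_train) / len(y_train)\n",
"        theta = theta - alpha * grad\n",
"        val_cost = cost(theta, X_val, y_val)\n",
"        if val_cost < best_val:\n",
"            best_theta, best_val, bad_steps = theta.copy(), val_cost, 0\n",
"        else:\n",
"            bad_steps += 1\n",
"            if bad_steps >= patience:\n",
"                break  # the validation error has stopped improving\n",
"    return best_theta, best_val\n",
"\n",
"# Synthetic data: y = 2 + 3x + noise; first 40 rows for training, last 20 for validation\n",
"rng = np.random.default_rng(0)\n",
"X = np.column_stack([np.ones(60), rng.uniform(-1, 1, 60)])\n",
"y = 2.0 + 3.0 * X[:, 1] + rng.normal(0, 0.1, 60)\n",
"theta, val_cost = gd_with_early_stopping(X[:40], y[:40], X[40:], y[40:])\n",
"print(theta, val_cost)"
]
},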
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 3.2. Evaluation metrics"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"To evaluate a model, we have to choose the **measure** (**metric**) we are going to use.\n",
"\n",
"Which metric is best?\n",
" * That depends on the type of task.\n",
" * Different metrics are used for regression and for classification."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Metrics for regression tasks\n",
"\n",
"For regression tasks we can use, for example:\n",
" * root-mean-square error (RMSE):\n",
"    $$ \\mathrm{RMSE} \\, = \\, \\sqrt{ \\frac{1}{m} \\sum_{i=1}^{m} \\left( \\hat{y}^{(i)} - y^{(i)} \\right)^2 } $$\n",
" * mean absolute error (MAE):\n",
"    $$ \\mathrm{MAE} \\, = \\, \\frac{1}{m} \\sum_{i=1}^{m} \\left| \\hat{y}^{(i)} - y^{(i)} \\right| $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"In the formulas above, $y^{(i)}$ denotes the **expected** value of the variable $y$ in the $i$-th example, and $\\hat{y}^{(i)}$ denotes the value of $y$ in the $i$-th example computed (**predicted**) by our model."
]
},
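{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Both formulas translate directly into NumPy. The sketch below is illustrative: the function names `rmse` and `mae` and the four toy values are made up for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def rmse(y_true, y_pred):\n",
"    # Square root of the mean squared difference between predictions and targets\n",
"    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))\n",
"\n",
"def mae(y_true, y_pred):\n",
"    # Mean absolute difference between predictions and targets\n",
"    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))\n",
"\n",
"y_true = [3.0, 5.0, 2.0, 7.0]  # expected values (toy data)\n",
"y_pred = [2.0, 5.0, 4.0, 7.0]  # predicted values (toy data)\n",
"print('RMSE:', rmse(y_true, y_pred))\n",
"print('MAE:', mae(y_true, y_pred))"
]
},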
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Metrics for classification tasks\n",
"\n",
"To present some of the most popular metrics used for classification tasks, let us use the following example:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Useful imports\n",
"\n",
"import ipywidgets as widgets\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas\n",
"import random\n",
"import seaborn\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"def powerme(x1,x2,n):\n",
"    \"\"\"Generate the powers of x1 and x2, and of their products, up to degree n\"\"\"\n",
"    X = []\n",
"    for m in range(n+1):\n",
"        for i in range(m+1):\n",
"            X.append(np.multiply(np.power(x1,i),np.power(x2,(m-i))))\n",
"    return np.hstack(X)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"def plot_data_for_classification(X, Y, xlabel=None, ylabel=None, Y_predicted=None, highlight=None):\n",
"    \"\"\"Plot the data for a classification task\"\"\"\n",
"    fig = plt.figure(figsize=(16*.6, 9*.6))\n",
"    ax = fig.add_subplot(111)\n",
"    fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
"    X = X.tolist()\n",
"    Y = Y.tolist()\n",
"    X1n = [x[1] for x, y in zip(X, Y) if y[0] == 0]\n",
"    X1p = [x[1] for x, y in zip(X, Y) if y[0] == 1]\n",
"    X2n = [x[2] for x, y in zip(X, Y) if y[0] == 0]\n",
"    X2p = [x[2] for x, y in zip(X, Y) if y[0] == 1]\n",
"\n",
"    # None (rather than a mutable []) is the default meaning: no predictions given\n",
"    if Y_predicted is not None and len(Y_predicted) > 0:\n",
"        Y_predicted = Y_predicted.tolist()\n",
"        X1tn = [x[1] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 0 and yp[0] == 0]\n",
"        X1fn = [x[1] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 1 and yp[0] == 0]\n",
"        X1tp = [x[1] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 1 and yp[0] == 1]\n",
"        X1fp = [x[1] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 0 and yp[0] == 1]\n",
"        X2tn = [x[2] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 0 and yp[0] == 0]\n",
"        X2fn = [x[2] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 1 and yp[0] == 0]\n",
"        X2tp = [x[2] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 1 and yp[0] == 1]\n",
"        X2fp = [x[2] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 0 and yp[0] == 1]\n",
"\n",
"        if highlight == 'tn':\n",
"            ax.scatter(X1tn, X2tn, c='r', marker='x', s=100, label='Data')\n",
"            ax.scatter(X1fn, X2fn, c='k', marker='o', s=50, label='Data')\n",
"            ax.scatter(X1tp, X2tp, c='k', marker='o', s=50, label='Data')\n",
"            ax.scatter(X1fp, X2fp, c='k', marker='x', s=50, label='Data')\n",
"        elif highlight == 'fn':\n",
"            ax.scatter(X1tn, X2tn, c='k', marker='x', s=50, label='Data')\n",
"            ax.scatter(X1fn, X2fn, c='g', marker='o', s=100, label='Data')\n",
"            ax.scatter(X1tp, X2tp, c='k', marker='o', s=50, label='Data')\n",
"            ax.scatter(X1fp, X2fp, c='k', marker='x', s=50, label='Data')\n",
"        elif highlight == 'tp':\n",
"            ax.scatter(X1tn, X2tn, c='k', marker='x', s=50, label='Data')\n",
"            ax.scatter(X1fn, X2fn, c='k', marker='o', s=50, label='Data')\n",
"            ax.scatter(X1tp, X2tp, c='g', marker='o', s=100, label='Data')\n",
"            ax.scatter(X1fp, X2fp, c='k', marker='x', s=50, label='Data')\n",
"        elif highlight == 'fp':\n",
"            ax.scatter(X1tn, X2tn, c='k', marker='x', s=50, label='Data')\n",
"            ax.scatter(X1fn, X2fn, c='k', marker='o', s=50, label='Data')\n",
"            ax.scatter(X1tp, X2tp, c='k', marker='o', s=50, label='Data')\n",
"            ax.scatter(X1fp, X2fp, c='r', marker='x', s=100, label='Data')\n",
"        else:\n",
"            ax.scatter(X1tn, X2tn, c='r', marker='x', s=50, label='Data')\n",
"            ax.scatter(X1fn, X2fn, c='g', marker='o', s=50, label='Data')\n",
"            ax.scatter(X1tp, X2tp, c='g', marker='o', s=50, label='Data')\n",
"            ax.scatter(X1fp, X2fp, c='r', marker='x', s=50, label='Data')\n",
"\n",
"    else:\n",
"        ax.scatter(X1n, X2n, c='r', marker='x', s=50, label='Data')\n",
"        ax.scatter(X1p, X2p, c='g', marker='o', s=50, label='Data')\n",
"\n",
"    if xlabel:\n",
"        ax.set_xlabel(xlabel)\n",
"    if ylabel:\n",
"        ax.set_ylabel(ylabel)\n",
"\n",
"    ax.margins(.05, .05)\n",
"    return fig"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Load the data\n",
"import pandas\n",
"import numpy as np\n",
"\n",
"alldata = pandas.read_csv('data-metrics.tsv', sep='\\t')\n",
"data = np.matrix(alldata)\n",
"\n",
"m, n_plus_1 = data.shape\n",
"n = n_plus_1 - 1\n",
"\n",
"X2 = powerme(data[:, 1], data[:, 2], n)\n",
"Y2 = np.matrix(data[:, 0]).reshape(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X2, Y2, xlabel=r'$x_1$', ylabel=r'$x_2$')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"def safeSigmoid(x, eps=0):\n",
"    \"\"\"Sigmoid function modified so that its values\n",
"    always stay at least eps away from the asymptotes\n",
"    \"\"\"\n",
"    y = 1.0/(1.0 + np.exp(-x))\n",
"    if eps > 0:\n",
"        y[y < eps] = eps\n",
"        y[y > 1 - eps] = 1 - eps\n",
"    return y\n",
"\n",
"def h(theta, X, eps=0.0):\n",
"    \"\"\"Hypothesis function (logistic regression)\"\"\"\n",
"    return safeSigmoid(X*theta, eps)\n",
"\n",
"def J(h,theta,X,y, lamb=0):\n",
"    \"\"\"Cost function for logistic regression\"\"\"\n",
"    m = len(y)\n",
"    f = h(theta, X, eps=10**-7)\n",
"    j = -np.sum(np.multiply(y, np.log(f)) + \n",
"                np.multiply(1 - y, np.log(1 - f)), axis=0)/m\n",
"    if lamb > 0:\n",
"        j += lamb/(2*m) * np.sum(np.power(theta[1:],2))\n",
"    return j\n",
"\n",
"def dJ(h,theta,X,y,lamb=0):\n",
"    \"\"\"Gradient of the cost function\"\"\"\n",
"    g = 1.0/y.shape[0]*(X.T*(h(theta,X)-y))\n",
"    if lamb > 0:\n",
"        g[1:] += lamb/float(y.shape[0]) * theta[1:] \n",
"    return g\n",
"\n",
"def classifyBi(theta, X):\n",
"    \"\"\"Prediction function for binary classification (returns the probability of class 1)\"\"\"\n",
"    prob = h(theta, X)\n",
"    return prob"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"def GD(h, fJ, fdJ, theta, X, y, alpha=0.01, eps=10**-3, maxSteps=10000):\n",
"    \"\"\"Gradient descent for logistic regression\"\"\"\n",
"    errorCurr = fJ(h, theta, X, y)\n",
"    errors = [[errorCurr, theta]]\n",
"    while True:\n",
"        # compute the new theta\n",
"        theta = theta - alpha * fdJ(h, theta, X, y)\n",
"        # record the current error\n",
"        errorCurr, errorPrev = fJ(h, theta, X, y), errorCurr\n",
"        # stopping criteria\n",
"        if abs(errorPrev - errorCurr) <= eps:\n",
"            break\n",
"        if len(errors) > maxSteps:\n",
"            break\n",
"        errors.append([errorCurr, theta]) \n",
"    return theta, errors"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"theta = [[ 1.37136167]\n",
" [ 0.90128948]\n",
" [ 0.54708112]\n",
" [-5.9929264 ]\n",
" [ 2.64435168]\n",
" [-4.27978238]]\n"
]
}
],
"source": [
"# Run gradient descent for logistic regression\n",
"theta_start = np.matrix(np.zeros(X2.shape[1])).reshape(X2.shape[1],1)\n",
"theta, errors = GD(h, J, dJ, theta_start, X2, Y2, \n",
"                   alpha=0.1, eps=10**-7, maxSteps=10000)\n",
"print('theta = {}'.format(theta))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"def plot_decision_boundary(fig, theta, X):\n",
"    \"\"\"Plot the decision boundary between the classes\"\"\"\n",
"    ax = fig.axes[0]\n",
"    xx, yy = np.meshgrid(np.arange(-1.0, 1.0, 0.02),\n",
"                         np.arange(-1.0, 1.0, 0.02))\n",
"    l = len(xx.ravel())\n",
"    C = powerme(xx.reshape(l, 1), yy.reshape(l, 1), n)\n",
"    z = classifyBi(theta, C).reshape(int(np.sqrt(l)), int(np.sqrt(l)))\n",
"\n",
"    # contour takes `linewidths` (the `lw` shorthand is not a contour keyword)\n",
"    ax.contour(xx, yy, z, levels=[0.5], linewidths=3);"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"Y_expected = Y2.astype(int)\n",
"Y_predicted = (classifyBi(theta, X2) > 0.5).astype(int)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Prepare the interactive plot\n",
"\n",
"dropdown_highlight = widgets.Dropdown(options=['all', 'tp', 'fp', 'tn', 'fn'], value='all', description='highlight')\n",
"\n",
"def interactive_classification(highlight):\n",
"    fig = plot_data_for_classification(X2, Y2, xlabel=r'$x_1$', ylabel=r'$x_2$',\n",
"                                       Y_predicted=Y_predicted, highlight=highlight)\n",
"    plot_decision_boundary(fig, theta, X2)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b208b75eb3484bef8d52e8aec9b89448",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(Dropdown(description='highlight', options=('all', 'tp', 'fp', 'tn', 'fn'), value='all'),…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<function __main__.interactive_classification(highlight)>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"widgets.interact(interactive_classification, highlight=dropdown_highlight)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The classification task in the example above is to assign points to one of two categories:\n",
" 0. <font color=\"red\">red crosses</font>\n",
" 1. <font color=\"green\">green circles</font>\n",
"\n",
"Logistic regression was used for this purpose."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The resulting model divides the plane into two regions:\n",
" 0. <font color=\"red\">outside the navy-blue curve</font>\n",
" 1. <font color=\"green\">inside the navy-blue curve</font>\n",
"\n",
"The model predicts class <font color=\"red\">0 (“red”)</font> for points lying in the region outside the curve, and class <font color=\"green\">1 (“green”)</font> for points lying in the region inside the curve."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"All observations can therefore be divided into four groups:\n",
" * **true positives (TP)**: correctly classified positive examples (<font color=\"green\">green circles</font> in the <font color=\"green\">inner region</font>)\n",
" * **true negatives (TN)**: correctly classified negative examples (<font color=\"red\">red crosses</font> in the <font color=\"red\">outer region</font>)\n",
" * **false positives (FP)**: negative examples classified as positive (<font color=\"red\">red crosses</font> in the <font color=\"green\">inner region</font>)\n",
" * **false negatives (FN)**: positive examples classified as negative (<font color=\"green\">green circles</font> in the <font color=\"red\">outer region</font>)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"In other words:\n",
"\n",
"<img width=\"50%\" src=\"https://blog.aimultiple.com/wp-content/uploads/2019/07/positive-negative-true-false-matrix.png\">"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TP = 5\n",
"TN = 35\n",
"FP = 3\n",
"FN = 6\n"
]
}
],
"source": [
"# Count TP, TN, FP and FN\n",
"\n",
"tp = 0\n",
"tn = 0\n",
"fp = 0\n",
"fn = 0\n",
"\n",
"for i in range(len(Y_expected)):\n",
"    if Y_expected[i] == 1 and Y_predicted[i] == 1:\n",
"        tp += 1\n",
"    elif Y_expected[i] == 0 and Y_predicted[i] == 0:\n",
"        tn += 1\n",
"    elif Y_expected[i] == 0 and Y_predicted[i] == 1:\n",
"        fp += 1\n",
"    elif Y_expected[i] == 1 and Y_predicted[i] == 0:\n",
"        fn += 1\n",
"\n",
"print('TP =', tp)\n",
"print('TN =', tn)\n",
"print('FP =', fp)\n",
"print('FN =', fn)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We can now define the following metrics:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Accuracy\n",
"$$ \\mbox{accuracy} = \\frac{\\mbox{correctly classified cases}}{\\mbox{all cases}} = \\frac{TP + TN}{TP + TN + FP + FN} $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Accuracy is obtained by dividing the number of correctly classified cases by the number of all cases:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 0.8163265306122449\n"
]
}
],
"source": [
"accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
"print('Accuracy:', accuracy)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"**Note:** Accuracy is not always a good measure, especially when the classes are heavily imbalanced!\n",
"\n",
"*Example:* Imagine a coronavirus test that **always** returns a negative result. How useful would such a test be in practice? Not at all. And what would its *accuracy* be? Let us compute it:\n",
"$$ \\mbox{accuracy} \\, = \\, \\frac{\\mbox{estimated number of healthy people in the world}}{\\mbox{population of the Earth}} \\, \\approx \\, \\frac{7\\,700\\,000\\,000 - 600\\,000}{7\\,700\\,000\\,000} \\, \\approx \\, 0.99992 $$\n",
"(rounded data as of 27 March 2020)\n",
"\n",
"The result is so high because the vast majority of people in the world are not infected, so for a randomly chosen person we can blindly guess that they do not have the coronavirus.\n",
"\n",
"In this case the large difference in the sizes of the two groups (infected/uninfected) makes *accuracy* a poor metric.\n",
"\n",
"That is why we also have other metrics:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Precision\n",
"$$ \\mbox{precision} = \\frac{TP}{TP + FP} $$"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Precision: 0.625\n"
]
}
],
"source": [
"precision = tp / (tp + fp)\n",
"print('Precision:', precision)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Precision tells us what fraction of the examples classified as positive are actually positive."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### Recall (sensitivity)\n",
"$$ \\mbox{recall} = \\frac{TP}{TP + FN} $$"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Recall: 0.45454545454545453\n"
]
}
],
"source": [
"recall = tp / (tp + fn)\n",
"print('Recall:', recall)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Recall tells us what fraction of the positive examples were classified correctly."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### *$F$-measure* (*$F$-score*)\n",
"$$ F = \\frac{2 \\cdot \\mbox{precision} \\cdot \\mbox{recall}}{\\mbox{precision} + \\mbox{recall}} $$"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"F-score: 0.5263157894736842\n"
]
}
],
"source": [
"fscore = (2 * precision * recall) / (precision + recall)\n",
"print('F-score:', fscore)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"$F$-_measure_ is a compromise between precision and recall (more precisely: it is the harmonic mean of precision and recall)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"$F$-_measure_ is a special case of a more general measure:"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"*$F_\\beta$-measure*:\n",
"$$ F_\\beta = \\frac{(1 + \\beta^2) \\cdot \\mbox{precision} \\cdot \\mbox{recall}}{\\beta^2 \\cdot \\mbox{precision} + \\mbox{recall}} $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For $\\beta = 1$ we get:\n",
"$$ F_1 \\, = \\, \\frac{(1 + 1^2) \\cdot \\mbox{precision} \\cdot \\mbox{recall}}{1^2 \\cdot \\mbox{precision} + \\mbox{recall}} \\, = \\, \\frac{2 \\cdot \\mbox{precision} \\cdot \\mbox{recall}}{\\mbox{precision} + \\mbox{recall}} \\, = \\, F $$"
]
},
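{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"A short sketch of the general measure, using the standard definition with $(1 + \\beta^2)$ in the numerator. The function name `f_beta` and the convention of returning 0 when both precision and recall are 0 are assumptions made for this example; the precision and recall values are the ones obtained earlier from TP = 5, FP = 3, FN = 6."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"def f_beta(precision, recall, beta=1.0):\n",
"    # Assumed convention: return 0 when the measure would otherwise be 0/0\n",
"    if precision == 0 and recall == 0:\n",
"        return 0.0\n",
"    b2 = beta ** 2\n",
"    return (1 + b2) * precision * recall / (b2 * precision + recall)\n",
"\n",
"p, r = 5 / 8, 5 / 11  # precision and recall from the example above\n",
"print('F_1:', f_beta(p, r))\n",
"print('F_2:', f_beta(p, r, beta=2.0))    # beta > 1 puts more weight on recall\n",
"print('F_0.5:', f_beta(p, r, beta=0.5))  # beta < 1 puts more weight on precision"
]
},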
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 3.3. Outliers"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"**Outliers** are observations with atypical values.\n",
"\n",
"They can, for example, result from a measurement error or from a mistake made when entering data into a database, but these are not the only causes.\n",
"\n",
"Outliers can sometimes significantly affect the model parameters, so it is important to discard such observations before building the model."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The example below shows the effect of outliers on the modelling result, using data on flat prices collected from listings on the Gratka.pl portal. Here, an example of an outlier could be a listing in which the price was given in thousands of PLN instead of PLN."
]
},
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
"# Helper functions\n",
"\n",
"def h_linear(Theta, x):\n",
"    \"\"\"Linear regression hypothesis\"\"\"\n",
"    return x * Theta\n",
"\n",
"def linear_regression(theta):\n",
"    \"\"\"Return the linear regression function for the given parameter vector theta\"\"\"\n",
"    return lambda x: h_linear(theta, x)\n",
"\n",
"def cost(theta, X, y):\n",
"    \"\"\"Cost function (matrix version)\"\"\"\n",
"    m = len(y)\n",
"    J = 1.0 / (2.0 * m) * ((X * theta - y).T * (X * theta - y))\n",
"    return J.item()\n",
"\n",
"def gradient(theta, X, y):\n",
"    \"\"\"Gradient of the cost function (matrix version)\"\"\"\n",
"    return 1.0 / len(y) * (X.T * (X * theta - y))\n",
"\n",
"def gradient_descent(fJ, fdJ, theta, X, y, alpha=0.1, eps=10**-5):\n",
"    \"\"\"Gradient descent (matrix version)\"\"\"\n",
"    current_cost = fJ(theta, X, y)\n",
"    logs = [[current_cost, theta]]\n",
"    while True:\n",
"        theta = theta - alpha * fdJ(theta, X, y)\n",
"        current_cost, prev_cost = fJ(theta, X, y), current_cost\n",
"        # A huge jump in the cost means the iteration is diverging\n",
"        if abs(prev_cost - current_cost) > 10**15:\n",
"            print('Algorithm does not converge!')\n",
"            break\n",
"        # Stop once the cost changes by no more than eps\n",
"        if abs(prev_cost - current_cost) <= eps:\n",
"            break\n",
"        logs.append([current_cost, theta])\n",
"    return theta, logs\n",
"\n",
"def plot_data(X, y, xlabel, ylabel):\n",
"    \"\"\"Scatter plot of the data (matrix version)\"\"\"\n",
"    fig = plt.figure(figsize=(16*.6, 9*.6))\n",
"    ax = fig.add_subplot(111)\n",
"    fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
"    ax.scatter([X[:, 1]], [y], c='r', s=50, label='Dane')\n",
"\n",
"    ax.set_xlabel(xlabel)\n",
"    ax.set_ylabel(ylabel)\n",
"    ax.margins(.05, .05)\n",
"    plt.ylim(y.min() - 1, y.max() + 1)\n",
"    plt.xlim(np.min(X[:, 1]) - 1, np.max(X[:, 1]) + 1)\n",
"    return fig\n",
"\n",
"def plot_regression(fig, fun, theta, X):\n",
"    \"\"\"Plot of the regression line (matrix version)\"\"\"\n",
"    ax = fig.axes[0]\n",
"    x0 = np.min(X[:, 1]) - 1.0\n",
"    x1 = np.max(X[:, 1]) + 1.0\n",
"    L = [x0, x1]\n",
"    LX = np.matrix([1, x0, 1, x1]).reshape(2, 2)\n",
"    ax.plot(L, fun(theta, LX), linewidth='2',\n",
"            label=(r'$y={theta0:.2}{op}{theta1:.2}x$'.format(\n",
"                theta0=float(theta[0][0]),\n",
"                theta1=(float(theta[1][0]) if theta[1][0] >= 0 else float(-theta[1][0])),\n",
"                op='+' if theta[1][0] >= 0 else '-')))"
]
|
||
},
|
||
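The helpers above rely on `np.matrix`, a class that the NumPy documentation no longer recommends for new code. As a minimal sketch, assuming plain `ndarray` inputs and the `@` operator, the same cost, gradient and gradient-descent loop might look as follows. The names `cost_nd`, `gradient_nd` and `gradient_descent_nd`, the `max_iter` cap and the toy data are illustrative assumptions, not part of the lecture code. The result is compared with the least-squares solution returned by `np.linalg.lstsq`.

```python
import numpy as np

def cost_nd(theta, X, y):
    """Cost J(theta) = 1/(2m) * ||X theta - y||^2 for 1-D theta and y."""
    residual = X @ theta - y
    return float(residual @ residual) / (2.0 * len(y))

def gradient_nd(theta, X, y):
    """Gradient of the cost: 1/m * X^T (X theta - y)."""
    return X.T @ (X @ theta - y) / len(y)

def gradient_descent_nd(X, y, alpha=0.1, eps=1e-12, max_iter=100_000):
    """Batch gradient descent; stops when the cost changes by at most eps."""
    theta = np.zeros(X.shape[1])
    current = cost_nd(theta, X, y)
    for _ in range(max_iter):  # hypothetical safety cap, not in the notebook
        theta = theta - alpha * gradient_nd(theta, X, y)
        previous, current = current, cost_nd(theta, X, y)
        if abs(previous - current) <= eps:
            break
    return theta

# Toy data (assumed for illustration): y = 1 + 2x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.05, size=50)
X = np.column_stack([np.ones_like(x), x])  # prepend the bias column

theta_gd = gradient_descent_nd(X, y)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("gradient descent:", theta_gd)
print("least squares:   ", theta_ls)
```

Both estimates should agree to several decimal places, which is a cheap sanity check for a hand-written gradient.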
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
"# Load the data (flats) with pandas\n",
"\n",
"alldata = pandas.read_csv('data_flats_with_outliers.tsv', sep='\\t',\n",
"                          names=['price', 'isNew', 'rooms', 'floor', 'location', 'sqrMetres'])\n",
"data = np.matrix(alldata[['price', 'sqrMetres']])\n",
"\n",
"m, n_plus_1 = data.shape\n",
"n = n_plus_1 - 1\n",
"Xn = data[:, 0:n]\n",
"\n",
"Xo = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n + 1)\n",
"yo = np.matrix(data[:, -1]).reshape(m, 1)\n",
"\n",
"# Scale every column by its maximum value\n",
"Xo /= np.amax(Xo, axis=0)\n",
"yo /= np.amax(yo, axis=0)"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 691.2x388.8 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"fig = plot_data(Xo, yo, xlabel=u'metraż', ylabel=u'cena')\n",
|
||
"theta_start = np.matrix([0.0, 0.0]).reshape(2, 1)\n",
|
||
"theta, logs = gradient_descent(cost, gradient, theta_start, Xo, yo, alpha=0.01)\n",
|
||
"plot_regression(fig, h_linear, theta, Xo)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"In the example above, the outlier appears as a single point on the right-hand side of the plot. Instead of capturing the general trend, the resulting regression line tries to “fit” this single observation.\n",
"\n",
"Such an observation should therefore be removed from the data set (see below)."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
"# Drop the outliers\n",
"alldata_no_outliers = alldata.loc[(alldata['price'] > 100) & (alldata['sqrMetres'] > 100)]"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"data = np.matrix(alldata_no_outliers[['price', 'sqrMetres']])\n",
|
||
"\n",
|
||
"m, n_plus_1 = data.shape\n",
|
||
"n = n_plus_1 - 1\n",
|
||
"Xn = data[:, 0:n]\n",
|
||
"\n",
|
||
"Xo = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n + 1)\n",
|
||
"yo = np.matrix(data[:, -1]).reshape(m, 1)\n",
|
||
"\n",
|
||
"Xo /= np.amax(Xo, axis=0)\n",
|
||
"yo /= np.amax(yo, axis=0)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"metadata": {
|
||
"scrolled": true,
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 691.2x388.8 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"fig = plot_data(Xo, yo, xlabel=u'metraż', ylabel=u'cena')\n",
|
||
"theta_start = np.matrix([0.0, 0.0]).reshape(2, 1)\n",
|
||
"theta, logs = gradient_descent(cost, gradient, theta_start, Xo, yo, alpha=0.01)\n",
|
||
"plot_regression(fig, h_linear, theta, Xo)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"The plot above shows that after discarding the outliers we obtain a much more “credible” regression line."
]
|
||
},
|
||
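The thresholds used above to drop outliers are hard-coded for this particular data set. As a rough sketch of a more generic alternative, assuming Tukey's interquartile-range rule suits the data at hand, rows can be filtered column by column with pandas. The helper `drop_outliers_iqr`, the factor `k=1.5` and the toy data frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def drop_outliers_iqr(df, columns, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] in every given column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df.loc[mask]

# Toy data (made up): one price was entered in thousands of zł instead of zł.
flats = pd.DataFrame({
    'sqrMetres': [35, 42, 50, 58, 64, 71, 80],
    'price': [210_000, 250_000, 295_000, 340_000, 380, 420_000, 470_000],
})
clean = drop_outliers_iqr(flats, ['price', 'sqrMetres'])
print(clean)
```

A rule like this is only a heuristic: it flags values that are unusual relative to the bulk of the column, so the flagged rows are still worth inspecting before they are discarded.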
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"## 3.4. The problem of overfitting"
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
"### Bias versus variance"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
"# Data for a simple example\n",
"\n",
"data = np.matrix([\n",
"    [0.0, 0.0],\n",
"    [0.5, 1.8],\n",
"    [1.0, 4.8],\n",
"    [1.6, 7.2],\n",
"    [2.6, 8.8],\n",
"    [3.0, 9.0],\n",
"    ])\n",
"\n",
"m, n_plus_1 = data.shape\n",
"n = n_plus_1 - 1\n",
"# Powers of x up to degree 5, each scaled by its maximum value\n",
"Xn1 = data[:, 0:n]\n",
"Xn1 /= np.amax(Xn1, axis=0)\n",
"Xn2 = np.power(Xn1, 2)\n",
"Xn2 /= np.amax(Xn2, axis=0)\n",
"Xn3 = np.power(Xn1, 3)\n",
"Xn3 /= np.amax(Xn3, axis=0)\n",
"Xn4 = np.power(Xn1, 4)\n",
"Xn4 /= np.amax(Xn4, axis=0)\n",
"Xn5 = np.power(Xn1, 5)\n",
"Xn5 /= np.amax(Xn5, axis=0)\n",
"\n",
"X1 = np.matrix(np.concatenate((np.ones((m, 1)), Xn1), axis=1)).reshape(m, n + 1)\n",
"X2 = np.matrix(np.concatenate((np.ones((m, 1)), Xn1, Xn2), axis=1)).reshape(m, 2 * n + 1)\n",
"X5 = np.matrix(np.concatenate((np.ones((m, 1)), Xn1, Xn2, Xn3, Xn4, Xn5), axis=1)).reshape(m, 5 * n + 1)\n",
"y = np.matrix(data[:, -1]).reshape(m, 1)"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmEAAAFoCAYAAAAfEiweAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAT+0lEQVR4nO3df4zteV3f8dd79kKQmWlcwgXWhRbqnYCWP8TeEpRJQ0Xa9bZxW6OZNVFXc5NNm1Kx17RS20jSNC1pGlPbWJvNQtEUYQhi3dhblaJEb7Rk765bYbmSmVCF27u6lzbB2Wkb3M6nf5y5vdfLvXtnl5nve+6cxyPZnJnzPXPOO9987/Dk+2tqjBEAAKa10D0AAMA8EmEAAA1EGABAAxEGANBAhAEANBBhAAANDizCquq9VfVUVX3qmudeUlUfraqN3cc7D+rzAQAOs4PcE/a+JPdc99w7k3xsjLGS5GO73wMAzJ06yJu1VtWrk/ziGOP1u99/JslbxhhPVtVdST4+xnjtgQ0AAHBITX1O2MvHGE8mye7jyyb+fACAQ+FY9wA3U1UPJHkgSRYXF//86173uuaJAAD+pEcfffQLY4zjz+dnp46wP6yqu645HPnUzV44xngwyYNJcvLkyXH+/PmpZgQA2JOq+v3n+7NTH458OMn9u1/fn+QXJv58AIBD4SBvUfGBJL+V5LVVdbGqTid5d5K3VdVGkrftfg8AMHcO7HDkGOO7b7LorQf1mQAAtwt3zAcAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGx7oHAOCI2tpK1teTjY1kZSVZW0uWl7ungkNDhAGw/86dS06dSnZ2ku3tZHExOXMmOXs2WV3tng4OBYcjAdhfW1uzANvamgVYMnu88vzTT/fOB4eECANgf62vz/aA3cjOzmw5IMIA2GcbG1f3gF1vezvZ3Jx2HjikRBgA+2tlZXYO2I0sLiYnTkw7DxxSIgxgHm1tJQ89lPzIj8wet7b2773X1pKFm/zPy8LCbDng6kiAuXPQVy4uL8/e6/rPWFiYPb+09JV/BhwBIgxgnlx75eIVV87fOnUquXRpfyJpdXX2Xuvrs3PATpyY7QETYPD/iTCAebKXKxdPn96fz1pa2r/3giPIOWEA88SVi3BoiDCAeeLKRTg0RBjAPHHlIhwaIgxgnly5cnF5+eoescXFq887cR4m48R8gHnjykU4FEQYwDxy5SK0czgSAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKBBS4RV1d+tqieq6lNV9YGqelHHHAAAXSaPsKq6O8kPJjk5xnh9kjuS3Df1HAAAnboORx5L8lVVdSzJi5NcapoDAKDF5BE2xvjvSf5Fks8leTLJF8cYv3L966rqgao6X1XnL1++PPWYAAAHquNw5J1J7k3ymiRfk2Sxqr7n+teNMR4cY5wcY5w8fvz41GMCAByojsOR35rkv40xLo8x/jjJR5J8c8McAABtOiLsc0neVFUvrqpK8tYkFxrmAABo03FO2CeSfDjJY0k+uTvDg1PPAQDQ6VjHh44x3pXkXR2fDQBwGLhjPgBAAxEGANBAhAEANBBhAAANRBgAQAMRBgDQQIQBADQQYQAADUQYAEADEQYA0ECEAQA0EGEAAA1EGABAg2PdAwC029pK1teTjY1kZSVZW0uWl7unAo44EQbMt3PnklOnkp2dZHs7WVxMzpxJzp5NVle7pwOOMIcjgfm1tTULsK2tWYAls8crzz/9dO98wJEmwoD5tb4+2wN2Izs7s+UAB0SEAfNrY+PqHrDrbW8nm5vTzgPMFREGzK+Vldk5YDeyuJicODHtPMBcEW
HA/FpbSxZu8mtwYWG2HOCAiDBgfi0vz66CXF6+ukdscfHq80tLvfMBR5pbVADzbXU1uXRpdhL+5ubsEOTamgADDpwIA1haSk6f7p4CmDMORwIANBBhAAANRBgAQAMRBgDQQIQBADQQYQAADUQYAEADEQYA0ECEAQA0EGEAAA1EGABAAxEGANBAhAEANBBhAAANRBgAQAMRBgDQQIQBADQQYQAADUQYAEADEQYA0ECEAQA0EGEAAA1EGABAAxEGANCgJcKq6qur6sNV9btVdaGqvqljDgCALseaPvcnkvzSGOM7q+qFSV7cNAcAQIvJI6yq/lSSv5jk+5NkjPGlJF+aeg4AgE4dhyP/bJLLSf5dVf12VT1UVYsNcwAAtOmIsGNJvjHJT40x3pBkO8k7r39RVT1QVeer6vzly5ennhEA4EB1RNjFJBfHGJ/Y/f7DmUXZnzDGeHCMcXKMcfL48eOTDggAcNAmj7Axxh8k+XxVvXb3qbcm+fTUcwAAdOq6OvLvJHn/7pWRn03yA01zAAC0aImwMcbjSU52fDYAwGHgjvkAAA1EGABAAxEGANBAhAEANBBhAAANRBgAQAMRBgDQQIQBADQQYQAADUQYAEADEQYA0ECEAQA0EGEAAA1EGABAAxEGANBAhAEANBBhAAANRBgAQAMRBgDQQIQBADQQYQAADUQYAEADEQYA0ECEAQA0EGEAAA2OdQ8AzJmtrWR9PdnYSFZWkrW1ZHm5eyqAyYkwYDrnziWnTiU7O8n2drK4mJw5k5w9m6yudk8HMCmHI4FpbG3NAmxraxZgyezxyvNPP907H8DERBgwjfX12R6wG9nZmS0HmCMiDJjGxsbVPWDX295ONjennQegmQgDprGyMjsH7EYWF5MTJ6adB6CZCAOmsbaWLNzkV87Cwmw5wBwRYcA0lpdnV0EuL1/dI7a4ePX5paXe+QAm5hYVwHRWV5NLl2Yn4W9uzg5Brq0JMGAuiTBgWktLyenT3VMAtHM4EgCgwS0jrKreXlV3TjEMAMC82MuesFckeaSqPlRV91RVHfRQAABH3S0jbIzxj5KsJHlPku9PslFV/7SqvvaAZwMAOLL2dE7YGGMk+YPd/55JcmeSD1fVPz/A2QAAjqxbXh1ZVT+Y5P4kX0jyUJK/N8b446paSLKR5O8f7IgAAEfPXm5R8dIk3zHG+P1rnxxj7FTVXzuYsQAAjrZbRtgY48eeZdmF/R0HAGA+uE8YAEADEQYA0ECEAQA0EGEAAA1EGABAg7YIq6o7quq3q+oXu2YAAOjSuSfsHUnc4gIAmEstEVZVr0zyVzO7Az8AwNzp2hP2LzP7c0c7N3tBVT1QVeer6vzly5enmwwAYAKTR9junzp6aozx6LO9bozx4Bjj5Bjj5PHjxyeaDgBgGh17wt6c5Nur6veSfDDJt1TVv2+YAwCgzeQRNsb4B2OMV44xXp3kviS/Osb4nqnnAADo5D5hAAANjnV++Bjj40k+3jkDAEAHe8IAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoMHkEVZVr6qqX6uqC1X1RFW9Y+oZAAC6HWv4zGeS/PAY47GqWk7yaFV9dIzx6YZZAABaTL4nbIzx5Bjjsd2vt5JcSHL31HMAAHRqPSesql6d5A1JPtE5BwDA1NoirKqWkvxckh8aY/zRDZY/UFXnq+r85cuXpx8QAOAAtURYVb0gswB7/xjjIzd6zRjjwTHGyTHGyePHj0
87IADAAZv8xPyqqiTvSXJhjPHjU38+kGRrK1lfTzY2kpWVZG0tWV7ungpgrnRcHfnmJN+b5JNV9fjucz86xjjbMAvMn3PnklOnkp2dZHs7WVxMzpxJzp5NVle7pwOYG5NH2BjjXJKa+nOBzPaAnTo1e7xie3v2eOpUculSsrTUMxvAnHHHfJgn6+uzPWA3srMzWw7AJEQYzJONjat7vq63vZ1sbk47D8AcE2EwT1ZWZueA3cjiYnLixLTzAMwxEQbzZG0tWbjJP/uFhdlyACYhwmCeLC/ProJcXr66R2xx8erzTsoHmEzHLSqATqurs6sg19dn54CdODHbAybAACYlwmAeLS0lp093TwEw1xyOBABoIMIAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKCBCAMAaHCs40Or6p4kP5HkjiQPjTHe3TEHtNraStbXk42NZGUlWVtLlpe7pwJgIpNHWFXdkeQnk7wtycUkj1TVw2OMT089C7Q5dy45dSrZ2Um2t5PFxeTMmeTs2WR1tXs6ACbQcTjyjUk2xxifHWN8KckHk9zbMAf02NqaBdjW1izAktnjleeffrp3PgAm0RFhdyf5/DXfX9x9DubD+vpsD9iN7OzMlgNw5HVEWN3gufFlL6p6oKrOV9X5y5cvTzAWTGRj4+oesOttbyebm9POA0CLjgi7mORV13z/yiSXrn/RGOPBMcbJMcbJ48ePTzYcHLiVldk5YDeyuJicODHtPAC06IiwR5KsVNVrquqFSe5L8nDDHNBjbS1ZuMk/vYWF2XIAjrzJI2yM8UyStyf55SQXknxojPHE1HNAm+Xl2VWQy8tX94gtLl59fmmpdz4AJtFyn7AxxtkkZzs+Gw6F1dXk0qXZSfibm7NDkGtrAgxgjrREGJBZcJ0+3T0FAE382SIAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKCBCAMAaCDCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgwAoIEIAwBoIMIAABqIMACABiIMAKBBjTG6Z7ilqtpK8pnuOebIS5N8oXuIOWJ9T8v6np51Pi3re1qvHWMsP58fPLbfkxyQz4wxTnYPMS+q6rz1PR3re1rW9/Ss82lZ39OqqvPP92cdjgQAaCDCAAAa3C4R9mD3AHPG+p6W9T0t63t61vm0rO9pPe/1fVucmA8AcNTcLnvCAACOlEMZYVX1XVX1RFXtVNVNr/Coqnuq6jNVtVlV75xyxqOkql5SVR+tqo3dxztv8rrfq6pPVtXjX8nVIPPqVttrzfyr3eW/U1Xf2DHnUbGH9f2Wqvri7vb8eFX9WMecR0VVvbeqnqqqT91kue17H+1hfdu+91FVvaqqfq2qLuz2yTtu8JrnvI0fyghL8qkk35Hk12/2gqq6I8lPJvm2JF+f5Lur6uunGe/IeWeSj40xVpJ8bPf7m/lLY4xvcPnzc7PH7fXbkqzs/vdAkp+adMgj5Dn8fviN3e35G8YY/3jSIY+e9yW551mW27731/vy7Os7sX3vp2eS/PAY4+uSvCnJ396P3+GHMsLGGBfGGLe6Oesbk2yOMT47xvhSkg8muffgpzuS7k3y07tf/3SSv944y1G1l+313iQ/M2b+S5Kvrqq7ph70iPD7YWJjjF9P8j+f5SW27320h/XNPhpjPDnGeGz3660kF5Lcfd3LnvM2figjbI/uTvL5a76/mC9fIezNy8cYTyazDS3Jy27yupHkV6rq0ap6YLLpjoa9bK+26f2z13
X5TVX1X6vqP1XVn5tmtLll+56e7fsAVNWrk7whySeuW/Sct/G2O+ZX1X9O8oobLPqHY4xf2Mtb3OA5l3rexLOt7+fwNm8eY1yqqpcl+WhV/e7u/xvj1vayvdqm989e1uVjSf7MGOPpqjqV5D9kdhiBg2H7npbt+wBU1VKSn0vyQ2OMP7p+8Q1+5Fm38bYIG2N861f4FheTvOqa71+Z5NJX+J5H1rOt76r6w6q6a4zx5O6u06du8h6Xdh+fqqqfz+yQjwjbm71sr7bp/XPLdXntL9Axxtmq+jdV9dIxhr+5dzBs3xOyfe+/qnpBZgH2/jHGR27wkue8jd/OhyMfSbJSVa+pqhcmuS/Jw80z3a4eTnL/7tf3J/myPZFVtVhVy1e+TvKXM7uAgr3Zy/b6cJLv273C5k1JvnjlMDHP2S3Xd1W9oqpq9+s3Zvb78H9MPun8sH1PyPa9v3bX5XuSXBhj/PhNXvact/FD+Qe8q+pvJPnXSY4n+Y9V9fgY469U1dckeWiMcWqM8UxVvT3JLye5I8l7xxhPNI59O3t3kg9V1ekkn0vyXUly7fpO8vIkP7/7b/pYkp8dY/xS07y3nZttr1X1N3eX/9skZ5OcSrKZ5H8l+YGueW93e1zf35nkb1XVM0n+d5L7hrtXP29V9YEkb0ny0qq6mORdSV6Q2L4Pwh7Wt+17f705yfcm+WRVPb773I8m+dPJ89/G3TEfAKDB7Xw4EgDgtiXCAAAaiDAAgAYiDACggQgDAGggwgAAGogwAIAGIgyYC1X1F6rqd6rqRbt/AeKJqnp991zA/HKzVmBuVNU/SfKiJF+V5OIY4581jwTMMREGzI3dvyP5SJL/k+Sbxxj/t3kkYI45HAnMk5ckWUqynNkeMYA29oQBc6OqHk7ywSSvSXLXGOPtzSMBc+xY9wAAU6iq70vyzBjjZ6vqjiS/WVXfMsb41e7ZgPlkTxgAQAPnhAEANBBhAAANRBgAQAMRBgDQQIQBADQQYQAADUQYAEADEQYA0OD/AWCU6vzjnoR/AAAAAElFTkSuQmCC\n",
|
||
"text/plain": [
|
||
"<Figure size 691.2x388.8 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"fig = plot_data(X1, y, xlabel='x', ylabel='y')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"# Polynomial regression function\n",
"\n",
"def h_poly(Theta, x):\n",
"    \"\"\"Polynomial function\"\"\"\n",
"    # Theta is a column vector; its i-th coefficient multiplies x**i\n",
"    return sum(theta * np.power(x, i) for i, theta in enumerate(Theta.tolist()))\n",
"\n",
"def polynomial_regression(theta):\n",
"    \"\"\"Polynomial regression function\"\"\"\n",
"    return lambda x: h_poly(theta, x)"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"def plot_fun(fig, fun, X):\n",
"    \"\"\"Plot the function `fun`\"\"\"\n",
"    ax = fig.axes[0]\n",
"    x0 = np.min(X[:, 1]) - 1.0\n",
"    x1 = np.max(X[:, 1]) + 1.0\n",
"    Arg = np.arange(x0, x1, 0.1)\n",
"    Val = fun(Arg)\n",
"    return ax.plot(Arg, Val, linewidth='2')"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"metadata": {
|
||
"scrolled": true,
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[<matplotlib.lines.Line2D at 0x19d045ece20>]"
|
||
]
|
||
},
|
||
"execution_count": 31,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 691.2x388.8 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"fig = plot_data(X1, y, xlabel='x', ylabel='y')\n",
|
||
"theta_start = np.matrix([0, 0]).reshape(2, 1)\n",
|
||
"theta, _ = gradient_descent(cost, gradient, theta_start, X1, y, eps=0.00001)\n",
|
||
"plot_fun(fig, polynomial_regression(theta), X1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
"This model has high **bias** (**systematic error**): it suffers from **underfitting**."
]
|
||
},
|
||
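To put a number on the underfitting described above, the training error of polynomials of increasing degree can be compared on the same six points. The sketch below is only an illustration: it swaps the notebook's gradient descent for NumPy's closed-form least-squares fit `np.polyfit`, works on the raw (unscaled) values, and the helper name `training_mse` is an assumption.

```python
import numpy as np

# The six (x, y) points from the simple example in this section.
x = np.array([0.0, 0.5, 1.0, 1.6, 2.6, 3.0])
y = np.array([0.0, 1.8, 4.8, 7.2, 8.8, 9.0])

def training_mse(degree):
    """Fit a polynomial of the given degree by least squares; return its MSE on the training points."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

for degree in (1, 2, 5):
    print(f"degree {degree}: training MSE = {training_mse(degree):.6f}")
```

The straight line leaves the largest training error, which is the signature of high bias. A degree-5 polynomial passes through all six points, so its training error is essentially zero, but that alone says nothing about how well it would generalize to new data.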
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[<matplotlib.lines.Line2D at 0x19d0462d880>]"
|
||
]
|
||
},
|
||
"execution_count": 32,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmEAAAFoCAYAAAAfEiweAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dd3yV5cH/8e91ssmAACFA2CTsKWGDIm4caBVxVlstWldt+1StHc/z9Pm1tba1tVatVq0LFQsOFMRRQUWGhr1JGBkEQoCQBZnn+v1xAiKCBEjOdcbn/XrxSnLOCfl6v26O31zXfV+XsdYKAAAA/uVxHQAAACAcUcIAAAAcoIQBAAA4QAkDAABwgBIGAADgACUMAADAgWYrYcaY54wxu40xa494rLUx5kNjTHbDx+Tm+vkAAACBrDlHwp6XdOFRjz0g6T/W2gxJ/2n4GgAAIOyY5lys1RjTTdK71toBDV9vkjTBWrvTGNNB0gJrbe9mCwAAABCg/H1NWKq1dqckNXxs5+efDwAAEBAiXQc4HmPMNEnTJCk+Pn5Ynz59HCcCAAD4umXLlu2x1qacyvf6u4QVGWM6HDEduft4L7TWPi3paUnKzMy0WVlZ/soIAADQKMaY3FP9Xn9PR86WdFPD5zdJetvPPx8AACAgNOcSFa9KWiyptzGmwBhzi6SHJJ1njMmWdF7D1wAAAGGn2aYjrbXXHuepc5rrZwIAAAQLVswHAABwgBIGAADgACUMAADAAUoYAACAA5QwAAAAByhhAAAADlDCAAAAHKCEAQAAOEAJAwAAcIASBgAA4AAlDAAAwAFKGAAAgAOUMAAAAAcoYQAAAA5QwgAAAByghAEAADhACQMAAHCAEgYAAOBApOsAAIDwYa1Vndeqps7r+1Pv+1h91Ne+z+u/9lyrFtE6o0srtUmIcf2fATQJShgAoEmVHqzV5zl79MnaQn2xvkDlNV5VR0SqxhOpmnqvrD29v79H23id0TVZmV2TNaxrsnqmJMjjMU0THvAjShgA4LRYa7WusEyfbC7WJ5uKtSyvRPXeQ00rUjKSvJK8XklShMcoOsKj6MiGPxEexUR+/etjfb6j5KBWFezX1j2V2rqnUjOXFUiSWsZFaVhDIRvWNVmDO7VSXHSEk2MBnAxKGADgpJUeqNVnOcVasKlYn2wuVnF59eHnIow0YscGTcheqjO3r1C7in2Krq/1/YmLVeSOAikh4ZR+bm29V+sLy7Qst0TLckuUlbtPRWXV+njjbn28cbckKdJj1L9jkoZ1ba3Mbr5ilpoU2yT/3UBTMvZ0x4X9IDMz02ZlZbmOAQBhy+v1jXYt2LRbCzYXa0VeibxH/O+jfVKsJvRO0Vm9UjR20Vwl/fRHUmXlN/+i+Hjp0UelW25pklzWWu3Yf/CrUra9RBt3lX0tmyR1So47PH05vHtr9U5NlDFMYeL0GWOWWWszT+V7GQkDABxTSWWNPs32jXR9urlYeypqDj8X6TEa2b21zuqdogm9U75eal7efOwCJvkez8lpsozGGHVKbqFOyS00eUiaJKmiuk4r8/YrK3efluWWaEXefhWUHFRByUG9tbJQknRGl1a6e2KGJvROoYzBGUoYAOCwmjqvZmTl643lBVqVv/9rI0odW8bqrN7tNKF3isamt1VCzHH+F5KR4RvxOt5IWHp684RvkBATqXEZbTUuo60kqd5rtbmoXFm5JVqeW6L5m3Zred5+fe/5LzUgLUl3nZ2h8/ulcnE//I7pSAAIR+Xl0owZUna2lJEh79VXa862Cv3pg03K3XtAkhQVYTSie2tN6OUrXuntEho3alReLqWl+T4eLTFRKiw85WvCmkJldZ1eWZqnpz7dqj0VvmvZeqcm6s6J6bp4YAdFUMZwEk5nOpISBgDhZuFCadIk392KlZVa1GuEHhp7vVa36ylJ6pkSrx+d20vn9Gmn+OONdp3kz1B8vOTxSH
PnSuPGNeF/zKmrqq3XjC/z9Y9PtmhnaZUk3/IXd5ydrslDOioqgvXMcWKUMABA4xwxSrU+pbv+MOEmfdLD9/+PdpUl+vGUkZoytqcim6KAVFT4RttycnxTkFOnOh0BO56aOq9mLS/QEwtylL/voCTfhfw/nNBTVw3rpJhIlrvA8VHCAACN88wzyv/Vb/XIsO/orf4TZI1HidWVun3JTH1vw3/U4s8PN9mdi8Gmtt6r2SsL9fiCHG0t9l3P1j4pVred1UPXDO/C2mM4Ju6OBACcUElljR7PrteL1/9FNZFRiqqv1Y3LZ+uuxa+r9cEy34ua8M7FYBMV4dGVwzrp8qFpmrtmp/7+cY42FZXrf99Zr8fn5+gH43vo+lFdj39DAnCSOJMAIMQdrKnXc59v0z8WbFG56SRFSpevm6+ffvayOpcWffVCP9y5GAwiPEaXDu6oiwd20EcbivTYxzlas6NUv39vo578ZIu+P7a7bhrTTS3jolxHRZBjOhIAQlRdvVczlxXoLx9tVlGZ7y7A8T2Sdf9Dt2vAtjXf/IYAuHMxEFlr9cnmYj32cY6W5ZZIkhJjInXTmG66ZVx3JcdHO04Il7gmDABwmLVWH64v0sPvb1LO7gpJ0oC0JD1wYV/f2llBcOdiILLWavHWvfr7xzlatGWvJKltQrT+MnWIxmekOE4HVyhhAABJUtb2fXrovY3Kahix6dw6Tv91fm9dOqjj1xcjDZI7FwPVstx9+sO8Tfpi2z4ZI90xoad+fG6vprmrFEGFEgYAYa64vFq/fGuN3l/nu8ardXy07p6YrutHdlV0JMWgOdR7rR6fn6O/frRZXisN75asv107VB1axrmOBj+ihAFAGFuVv1+3v7xMO0urFBcVoVvHd9e0M3soMZYLx/1hyda9+tFrK1RUVq3kFlH689WDNbFPqutY8JPTKWH8egQAQez1rHxNeWqxdpZW6YwurfTxf52ln57fmwLmR6N6tNHce8ZrQu8UlRyo1fefz9Jv56xXTZ3XdTQEOEoYAAShmjqvfv32Wt03c7Vq6ry6fmQXvTZtNFNhjrRJiNFzNw3Xzy/qowiP0T8/26YpTy1W/r4DrqMhgFHCACDI7C6v0vXPLNGLi3MVHeHRQ98ZqN9eMZBrvxzzeIxuO6unXr9ttNJaxWlV/n5N+ttnem/NTtfREKD4FwsAQWRFXokufWyhvtxeotSkGM24bZSuGdHFdSwcYVjXZM25Z5zO75eq8qo6/XD6cv367bWqqq13HQ0BhhIGAEFixpd5mvrUEhWVVWt4t2S9c/c4De2S7DoWjqFVi2g9deMw/c+l/RQd4dGLi3P1nScWaWtxhetoCCCUMAAIcDV1Xv3izTW6f9Ya1dR79d3RXTX91lFqlxjrOhq+hTFGN4/trlk/HKOubVpo/c4yXfrYQr29cofraAgQlDAACGC7y6p07T+XaPrSPEVHevTwVYP0m8kDuP4riAzs1FLv3j1OlwzqoMqaev3otZW6f+ZqHaxhejLc8a8YAALUstx9uuSxhVqWW6IOLWP179tG6+rMzq5j4RQkxkbpsWuH6ndXDFRMpEczsvI1+fGF2lxU7joaHKKEAUAAemVpnq55eol2l1drRPfWeufucRrcuZXrWDgNxhhdN7KL3rpzrHqmxGtzUYUu+/tCvf5lvoJh4XQ0PUoYAASQ6rp6/fyN1XrwzTWqrbe6eUw3Tb91pNomxLiOhibSt0OSZt81Tlee0UlVtV7dN2u1fjxjJXdPhqFI1wEAAD5FZVW6/eVlWpG3X9GRHv3uioG6algn17HQDOJjIvXnqwdrdM82+tVba/XWykLtqajRMzdlKjYqwnU8+AkjYQAQALK2+67/WpG3Xx1bxmrW7WMoYGHgqmGd9NadY9U2IUYLc/bo1heyGBELI05KmDHmx8aYdcaYtcaYV40x3GcNICxZa/XSklxd8/QSFZdXa1QP3/VfAzu1dB0NftK7faJe/cFIilgY8nsJM8akSb
pHUqa1doCkCEnX+DsHALhmrdVv3l2vX721VnVeq1vGddfLt4xUG67/CjsZqRSxcORqOjJSUpwxJlJSC0mFjnIAgDMPv79J//p8u6IjPPrL1MH61SX9FBnBVSLhKiM1Ua9N+3oRYy2x0Ob3f+3W2h2S/iQpT9JOSaXW2g+Ofp0xZpoxJssYk1VcXOzvmADQrB6fn6MnF2xRhMfo8evP0BVDuf4LUnq7rxexH7xIEQtlLqYjkyVNltRdUkdJ8caYG45+nbX2aWttprU2MyUlxd8xAaDZPP/5Nv3x/U0yRnrk6sE6r1+q60gIIBSx8OFi3PtcSdustcXW2lpJb0ga4yAHAPjd61n5+p931kuSfnfFQE0ekuY4EQIRRSw8uChheZJGGWNaGGOMpHMkbXCQAwD8as7qnXpg1mpJ0i8v7qtrR3RxnAiB7OgiduuLX1LEQoyLa8KWSpopabmkNQ0ZnvZ3DgDwp483FulHr62Q10o/PreXbh3fw3UkBAFfERultgkx+jxnL0UsxDi5Dcda+9/W2j7W2gHW2huttdUucgCAPyzaske3v7xcdV6raWf20D3npLuOhCCS3i6BIhaiuBcaAJrRirwS/eCFLNXUeXXdyC76+UV95LsSA2g8ilhoooQBQDNZX1imm577QpU19bp8SEf9v8kDKGA4ZYeKWEoiRSxUUMIAoBlsKa7Qjc8uVVlVnc7vl6o/TRksj4cChtOT3i5Br/7gqyJ2ywsUsWBGCQOAJpa/74BueGap9lbWaHxGWz123VBWwkeTObKILdpCEQtmvCsAQBPaXValG55dqp2lVRreLVlP35ipmMgI17EQYihioYESBgBNZF9lja5/Zqly9x7QwLSWevbm4YqLpoCheRx5jRhFLDhRwgCgCZRV1eq7zy1V9u4KZbRL0AvfH6Gk2CjXsRDieqZQxIIZJQwATtOBmjp9/19fau2OMnVt00LTbx2p1vHRrmMhTBxdxO6Yvkz1Xus6FhqBEgYAp6G6rl63vbRMWbkl6tAyVtNvHal2SbGuYyHMHCpiyS2iNH9Tsf7y4WbXkdAIlDAAOEW19V7d/coKfZa9R20TojX91pHqlNzCdSyEqZ4pCXr8ujPkMdLf5+do3tqdriPhBChhAHAKvF6rn/17lT5YX6Sk2Ei9dMtI9UhJcB0LYW5Mels9OKmvJOmnr69SdlG540T4NpQwADhJ1lr98u21emtloeKjI/TC90eob4ck17EASdIt47rr0sEdVVlTr2kvLVNZVa3rSDgOShgAnKQXF+fqlaV5ion06Jmbhmtol2TXkYDDjDH6w5UD1ad9orbtqdSPX1spLxfqByRKGACUl0vPPCPdf7/vY/nxp3BW5u/X/5uzXpL0pymDNbpnG3+lBBqtRXSknr4xUy3jovSfjbv1t4+zXUfCMVDCAIS3hQultDTp3nulhx/2fUxL8z1+lJLKGt05fblq661uGt1Vlw7u6CAw0Dhd2rTQY9cOlcdIf/0oWx+uL3IdCUehhAEIX+Xl0qRJvo+Vlb7HKiu/eryi4vBLvV6rn7y+Ujv2H9Tgzq304MV9HYUGGu/MXin62QV9JEk/mbFSW4orTvAd8CdKGIDwNWOG5PUe+zmv1/d8gyc/2aL5m4rVMi5Kj183lP0gETRuP6uHJg1sr/LqOk17MUvlXKgfMChhAMJXdvZXI2BHq6yUcnIkSYu37NWfP9gkSfrL1MGsBYagYozRH68arF6pCdpSXKmfvr6KC/UDBCUMQPjKyJDi44/9XHy8lJ6u3WVVuvvVFfJa6Y4JPTWxT6p/MwJNID4mUk/dmKnE2Eh9sL5ITyzIcR0JooQBCGdTp0qe47wNejyqm3K17n51hfZUVGtUj9b6yXm9/JsPaELd28br0WuGyBjpzx9u1vyNu11HCnuUMADhKzFRmjvX9/HQiFh8/OHHH1m0Q0u37VNKYoz+du1QRUbwlongNrFPqn5ybi9ZK93z2gpt33Oc6Xj4Be8oAMLbuH
FSYaH06KPSAw/4PhYW6uO2GXpiwRZ5jPS3a4aqXSKbciM03Hl2us7vl6ryqjpNeylLldV1riOFLWNt4F+cl5mZabOyslzHABAm8vcd0CWPLVTpwVrdd2Fv3TEh3XUkoEmVV9Xq8sc/15biSl08sIP+ft1QGWNcxwpKxphl1trMU/leRsIA4AjVdfW665XlKj1Yq3P6tNPtZ/Z0HQlocomxUXrqxkwlxERqzpqdeurTra4jhSVKGAAc4XdzNmhVQanSWsXpz1cPlsfD6ABCU3q7BD1y9WBJ0sPzNurTzcWOE4UfShgANHhnVaFeWJyrqAijx68/Q61aRLuOBDSr8/u31z3nZMhrpbtfXaG8vQdcRworlDAAkLSluEIPzFotSfrVJf00pHMrx4kA/7j3nAyd06edSg/WatpLWTpQw4X6/kIJAxD2DtbU646Xl6uypl6XDu6oG0d1dR0J8BuPx+iRqUPUvW28Nu4q1wOz1igYbtoLBZQwAGHNWqtfvLVGm4rK1SMlXr//zkDuEkPYaRkXpaduHKYW0RGavapQzy7c5jpSWKCEAQhrM77M1xvLdyg2yqMnrx+mhJhI15EAJ3qlJurPU3wX6v9u7gYtytnjOFHoo4QBCFvrCkv169nrJEm/vXygerdPdJwIcOuigR10x4Se8lrpzleWq6isynWkkEYJAxCWyqpqdcf05aqp8+qa4Z115bBOriMBAeGn5/fW+Iy2KjlQq1+8uZbrw5oRJQxA2LHW6r5/r1bu3gPq1yFJ/3NZf9eRgIAR4TF6+KpBSoyJ1EcbijR7VaHrSCGLEgYg7Dy7cJvmrdulxJhIPXH9GYqNinAdCQgoHVrG6RcX95Uk/ffsddpdzrRkc6CEAQgry3L36aH3NkqS/jhlkLq1jXecCAhMU4d31viMttp/oFa/fmsd05LNgBIGIGzsq6zRXa+sUJ3X6pZx3XXhgA6uIwEByxijh64cpPjoCM1bt0tz1ux0HSnkUMIAhI1fvLlGO0urdEaXVnrgoj6u4wABL61VnH4+yTct+eu312lvRbXjRKGFEgYgLLy/bpfeW7tLLaIj9Ldrhyoqgrc/oDGuG9FFo3u00b7KmsNLuqBp8C4EIOSVVdXq12+vlSTdd0FvdUpu4TgREDw8DXdLtoiO0JzVOzVvLdOSTYUSBiDkPTxvo4rKqjWkcyvdOLqb6zhA0OncuoXuv9A3hf/Lt9aqpLLGcaLQQAkDENKytu/Ty0vyFOkxeujKgYrwsC8kcCpuHNVVI7q31p6KGv3vO0xLNgVKGICQVV1Xr/tnrZYk/XBCT/Vpn+Q4ERC8PB6jh68cpNgoj95aWagP1xe5jhT0KGEAQtYT87doS3GleqTE686z013HAYJet7bx+tkFvmnJX7y5RqUHah0nCm6UMAAhKbuoXE8syJEk/f6KgayKDzSRm8d007CuydpdXq3fvLvedZygRgkDEHK8XqsH3lij2nqra0d01sgebVxHAkLGob0lYyI9mrW8QPM37nYdKWhRwgCEnOlLc7Ust0QpiTF64KK+ruMAIadnSoJ+en4vSdLP31ijsiqmJU8FJQxASNlZelB/mLdJkvSby/qrZVyU40RAaLplXA8N6dxKu8qq9Nt3N7iOE5SclDBjTCtjzExjzEZjzAZjzGgXOQCEFmutfvXWOlVU1+m8fqm6cEB715GAkBXhMfrjVYMUHeHRjKx8fbq52HWkoONqJOxRSfOstX0kDZZEhQZw2t5bu0sfbShSYkyk/m/yABnDmmBAc8pITdS952VI8k1LljMteVL8XsKMMUmSzpT0rCRZa2ustfv9nQNAaCk9UKv/btjX7r6L+qh9y1jHiYDwMG18Dw3q1FI79h/U79/b6DpOUHExEtZDUrGkfxljVhhjnjHGxDvIASCEPDRvg4rLq5XZNVnXj+jiOg4QNiIjPPrjVYMVFWH0ytI8LcrZ4zpS0HBRwiIlnSHpSWvtUEmVkh44+kXGmGnGmCxjTFZxMf
PMAI5vyda9evWLfEVFGP3+OwPlYWsiwK96t0/UPRN905L3zVqtyuo6x4mCg4sSViCpwFq7tOHrmfKVsq+x1j5trc201mampKT4NSCA4FFVW68H31gjSbrz7HRlpCY6TgSEp9sn9FT/jkkqKDmoh+cxLdkYfi9h1tpdkvKNMb0bHjpHEkvuAjglf/84R1v3VCq9XYJ+OKGn6zhA2IpqmJaM9Bi9sDhXS7budR0p4Lm6O/JuSdONMaslDZH0O0c5AASxjbvK9I9PtkiSHvrOQMVEsjUR4FK/jkmH92m9f9ZqHaypd5wosDkpYdbalQ1TjYOstZdba0tc5AAQvOq9VvfPWqM6r9WNo7oqs1tr15EAyHdZQJ/2icrde0B/fH+T6zgBjRXzAQSlFxdv16r8/WqfFKv7Lux9wtcD8I/oSI/+NGWwIjxG/1q0TVnb97mOFLAoYQCCzo79Bw//hv2byf2VGMvWREAgGZDWUref1UPWSvfNXK2qWqYlj4USBiCoWGv1yzfX6EBNvS4a0F7n92drIiAQ3XNOhjLaJWjrnko9+p9s13ECEiUMQFB5Z/VOzd9UrMTYSP3vZf1dxwFwHDGREXr4qkGSpGc/26b8fQccJwo8lDAAQaOkskb/27A10YOT+qpdElsTAYFsaJdkXTE0TTX1Xj3ElkbfQAkDEDR+O3eD9lbWaET31pqa2dl1HACN8LMLeis2yqM5a3Zykf5RKGEAgsLnOXs0c1mBoiM9bE0EBJGOreI0bXwPSdL/vbteXq91nChwUMIABLyq2no9+KZva6J7JqarZ0qC40QATsZtZ/VUu8QYrSoo1durdriOEzAoYQAC3l8/ylbu3gPqnZqoaWeyNREQbOJjIvWzC3zr+T08bxMr6TeghAEIaFuLK/TMZ1tljPTQlQMVHcnbFhCMrjyjk/p3TNLO0ir987OtruMEBN7NAAS0h+dtUp3X6uphnTW0S7LrOABOkcdj9KtL+kmSnlywRUVlVY4TuUcJAxCwluXu07x1uxQb5dFPzu/lOg6A0zSqRxtd0D9VB2vr9Sf2laSEAQhM1lr9bq5vXaEfjO+hVNYEA0LCzy/qq6gIo5nLC7R2R6nrOE5RwgAEpPfX7dKy3BK1iY/WtDN7uI4DoIl0axuvm0Z3k7W+JSusDd8lKyhhAAJObb1Xf5jnm6q499wMNugGQszd52QouUWUlm7bpw/WF7mO4wwlDEDAee2LPG3bU6nubeN1zYguruMAaGIt46J077m+6zx/P3eDauq8jhO5QQkDEFDKq2r114+yJUn3X9hbURG8TQGh6LqRXdQzJV7b9x7Qi4u3u47jBO9uAALK059u1d7KGg3rmqwL+rd3HQdAM4mK8OiXF/uWrHj0P9naV1njOJH/UcIABIyisq8WcXxwUh8Zw/6QQCib0DtF4zPaqryqTo9+tNl1HL+jhAEIGI98sFlVtV5d2L+9hnVt7ToOgGZmjNEvL+4nj5FeXpqnnN3lriP5FSUMQEDYtKtc/16Wr0iP0X0X9nYdB4Cf9G6fqGtGdFG91+q3cza4juNXlDAAAeEP8zbKa30X6/ZISXAdB4Af/eS8XkqIidT8TcX6dHOx6zh+QwkD4NyiLXv08cbdSoiJ1D3nZLiOA8DP2ibE6M6z0yVJv52zQXX14bFkBSUMgFNer9XvG7Ynuv2sHmqbEOM4EQAXvje2mzolx2lTUblmZOW7juMXlDAATr2zulBrdpQqNSlGt4xjeyIgXMVGRejnF/WV5LtJp7yq1nGi5hfpOgCAMFNeLs2YIWVnqzo9Q3/c002S75qQuOgIt9kAODVpYHtldk1WVm6JHp+/RQ9c1Md1pGbFSBgA/1m4UEpLk+69V3r4Yb30wocqKK1Wr0SPrhrW2XU6AI4ZY/SrS3wLuD63cJvy9x1wnKh5UcIA+Ed5uTRpku9jZaVKY+L12LArJEk/f/X3ijhQ6TgggEAwuHMrXTE0TTX1Xj00b6PrOM2KEgbAP2bMkL
xf3fH0+OirVRqXqNG5qzRh+3Lf8wAg6WcX9FZslEdzVu9U1vZ9ruM0G0oYAP/IzpYqfaNd+Unt9PywyyRJD85/TqayUsrJcZkOQADp2CpO08b7btT5v3fXy+u1jhM1D0oYAP/IyJDi4yVJj4y/QTWRUZq8boEGFm3xPZ6e7jgggEBy21k91S4xRqsKSjV7VaHrOM2CEgbAP6ZOlTwerU3tqTcHTFR0Xa3+67OXfM95PL7nAaBBfEyk/usC3xZmf5i3UQdr6h0nanqUMAD+kZgoO2eOfnfuDyRJ313+rjrXVUiJidLcuVICWxUB+Lqrzuik/h2TtLO0Sv/8bKvrOE2OEgbAbz5p11uLOg1Qkseru0Z0kB59VCoslMaNcx0NQADyeIx+ebFvyYonF2xRUVmV40RNixIGwC/qj9ie6K4L+6nVQ/8n3XILI2AAvtXonm10fr9UHayt15/e3+Q6TpOihAHwi1nLC7SpqFxpreL03dHdXMcBEEQenNRXURFGM5cXKGd3ues4TeaEJcwYc5cxJtkfYQCEpoM19Xrkg82SDq3/w/ZEABqvW9t4XZ3ZWdZKj30cOsvZNGYkrL2kL40xrxtjLjTGmOYOBSC0PPf5Nu0qq1L/jkm6bHBH13EABKE7zk5XVITRO6sKtaW4wnWcJnHCEmat/aWkDEnPSrpZUrYx5nfGmJ7NnA1ACNhbUa0nF2yR5JtS8Hj4PQ7AyUtrFaerhnWS10qPh8hoWKOuCbPWWkm7Gv7USUqWNNMY83AzZgMQAh77OEcV1XWa0DtFY9Pbuo4DIIjdMSFdkR6jt1bu0PY9wb/fbGOuCbvHGLNM0sOSPpc00Fr7Q0nDJF3ZzPkABLFteyr18pJcGSM9cFEf13EABLnOrVvoO2ek+UbD5gf/aFhjRsLaSvqOtfYCa+2/rbW1kmSt9Uq6pFnTAQhqf3x/o+q8Vled0Ul92ie5jgMgBNx5droiPEZvrNihvL0HXMc5LY25JuzX1trc4zy3oekjAQgFK/JKNHfNLsVGefST83u5jgMgRHRtE6/Lh6Sp3mv1xILgHg1jnTAAzeLR/2RLkr43trs6tIxznAZAKLnz7J7yGGnmsgIVlATvaBglDECTW7ujVAs2FSsuKjx57UQAABYhSURBVEI/GN/DdRwAIaZHSoIuG9xRdV6rJxruvg5GlDAATe7QBbPXj+yi1vHRjtMACEV3TcyQMdK/s/JVuP+g6zinhBIGoEnl7C7XvHW7FB3h0Q/OZBQMQPNIb5egSwZ1VG291T8+Cc7RMEoYgCb1xPwtslaaktlJqUmxruMACGF3T0yXMdJrX+RrV2mV6zgnzVkJM8ZEGGNWGGPedZUBQNPK23tAb68qVITH6Paz2FQDQPPqlZqoSQM6qKbeG5SjYS5Hwn4kiSUugBDyj0+3qN5rNXlIR3Vu3cJ1HABh4O5z0iVJr36Rp91lwTUa5qSEGWM6SbpY0jMufj6AprertEozswpkjG9rEQDwhz7tk3Rh//aqrvPqqU+3uo5zUlyNhP1V0n2SvMd7gTFmmjEmyxiTVVxc7L9kAE7JPz/bqpp6ry4a0F7p7RJcxwEQRg6Nhk1fmqvi8mrHaRrP7yXMGHOJpN3W2mXf9jpr7dPW2kxrbWZKSoqf0gE4Ffsqa/TK0jxJjIIB8L/+HVvqvH6pqqr16p+fBc9omIuRsLGSLjPGbJf0mqSJxpiXHeQA0ESeW7hNB2vrdXbvFA1Ia+k6DoAwdM/EDEnSS4tztbciOEbD/F7CrLU/t9Z2stZ2k3SNpI+ttTf4OweAplFWVasXFm+XJN01kVEwAG4M7NRSE/u008Haej2zcJvrOI3COmEATstLi3NVXlWnUT1aa1jX1q7jAAhj95zjGw17cdF2lVTWOE5zYk5LmLV2gbX2EpcZAJy6AzV1erbhN867zs5wnAZAuBvSuZXO6pWiypr6w+9NgYyRMACn7NUv8rWvskaDO7fS2PQ2ruMAwOHRsOcXbdf+A4
E9GkYJA3BKquvq9fSnvhWq7zo7XcYYx4kAQBrWNVnjM9qqorpOz32+3XWcb0UJA3BKZi3boaKyavVpn6hz+rRzHQcADjs0Gvavz7ep9GCt4zTHRwkDcNLqjtin7Y6z0+XxMAoGIHAM79ZaY3q2UXlVnZ4P4NEwShiAk/bO6kLl7Tug7m3jdfHADq7jAMA3HBoNe3bhVpVXBeZoGCUMwEnxeq2emO8bBfvhWT0VwSgYgAA0qkcbjejeWmVVdXpxca7rOMdECQNwUj5Yv0vZuyvUsWWsLh+a5joOABzXvQ2jYf/8bKsqquscp/kmShiARrPW6u/zcyRJt53VU9GRvIUACFyje7ZRZtdk7T9Qq5cCcDSMd1AAjfbJ5mKt3VGmtgkxmjq8s+s4APCtjDGHrw3752dbdaAmsEbDKGEAGu3xhlGwW8d3V2xUhOM0AHBi4zPaamiXVtpXWaOXlwTWaBglDECjLN26V19uL1HLuCjdMKqr6zgA0ChHjoY9/elWHaypd5zoK5QwAI1y6Fqwm8d0U0JMpOM0ANB4E3qlaHCnltpTUaPpSwNnNIwSBuCEVuXv12fZexQfHaHvje3mOg4AnJQjR8Oe+nSrqmoDYzSMEgbghA5dC3bDqK5q1SLacRoAOHkT+7TTgLQkFZdX67Uv8lzHkUQJA3ACm3aV64P1RYqO9OiW8d1dxwGAU2KM0T0TfaNhT36yJSBGwyhhAL7VEwt8o2DXDO+sdomxjtMAwKk7r1+q+nZIUlFZtf6dle86DiUMwPFt31Opd1YVKtJjdNtZPV3HAYDT4hsNS1fr+GhFRrivQNziBOC4/vHJFnmtdOUZaUprFec6DgCctgv6t9dZvVPUItp9BXJfAwEEpML9BzVreYE8RvrhBEbBAIQGj8cERAGTKGEAjuPpT7eqtt7q4kEd1SMlwXUcAAg5lDAA37Cnolqvfem7hfvOsxkFA4DmQAkD8A3PLtymqlqvzu2bqj7tk1zHAYCQRAkD8DWlB2r10mLfth53TUx3nAYAQhclDMDXPL9ouyqq6zQuva2GdG7lOg4AhCxKGIDDqmrr9cLi7ZKkO89mFAwAmhMlDMBh76wq1L7KGg1IS9KoHq1dxwGAkEYJAyBJstbq+UXbJUk3j+kuY4zbQAAQ4ihhACRJy3JLtK6wTK3jo3XJoA6u4wBAyKOEAZAk/athFOy6EV0UGxXhNgwAhAFKGADtLD2oeWt3KcJjdP2oLq7jAEBYoIQB0PQlear3Wl04oL06tGSjbgDwB0oYEOaqauv16he+LYpuHtPNbRgACCOUMCDMvbt6p/ZW1qh/xyRldk12HQcAwgYlDAhj1lq90HBB/k1jurEsBQD4ESUMCGPL80q0ZkepWsdH67LBHV3HAYCwQgkDwtjzi3wbdV8zvDPLUgCAn1HCgDBVVFal99bsVITH6IZRXV3HAYCwQwkDwtT0Jbmq81pd0D9VHVuxLAUA+BslDAhD1XX1euXwshTdHacBgPBECQPC0JzVO7WnokZ9OyRpeDeWpQAAFyhhQJix1ur5hmUpvseyFADgDCUMCDMr8vdrdUGpkltE6bIhLEsBAK5QwoAw8/zn2yVJ14zowrIUAOAQJQwII0VlVZq7Zqc8RixLAQCOUcKAMDJ9aV7DshTtlcayFADgFCUMCBPVdfV6ZalvWYqbxnRzGwYAQAkDwsXcNTu1p6JafdonamT31q7jAEDY83sJM8Z0NsbMN8ZsMMasM8b8yN8ZgHB0aJ/Im1mWAgACQqSDn1kn6afW2uXGmERJy4wxH1pr1zvIAoSFFXklWpW/X61aRGnykDTXcQAAcjASZq3daa1d3vB5uaQNkvi/AtCMXmhYnHXq8M6Ki2ZZCgAIBE6vCTPGdJM0VNJSlzmAULa7vEpzGpaluJFlKQAgYDgrYcaYBEmzJN1rrS07xvPTjDFZxpis4uJi/wcEQsQrS/NUW291Xr9UdUpu4ToOAKCBkxJmjImSr4BNt9a+cazXWG
ufttZmWmszU1JS/BsQCBE1dV5Nb1iW4uYx3R2nAQAcye8X5hvfbVnPStpgrX3E3z8fCCfvrd2p4vJq9U5N1KgeRyxLUV4uzZghZWdLGRnS1KlSYqK7oAAQhlzcHTlW0o2S1hhjVjY89qC1dq6DLEBI+1fDPpE3jz1iWYqFC6VJkySvV6qslOLjpZ/8RJo7Vxo3zl1YAAgzfi9h1tqFklikCGhmq/L3a2X+frWMi9Llh5alKC/3FbDy8q9eWFnp+zhpklRYKCUk+D8sAIQhVswHQtShZSmuOXJZihkzfCNgx+L1+p4HAPgFJQwIQcXl1XpndaE8RrrhyGUpsrO/Gvk6WmWllJPjn4AAAEoYEIpe/cK3LMW5fVPVufURy1JkZPiuATuW+HgpPd0/AQEAlDAg1NTUefXykq/2ifyaqVMlz3H+2Xs8vucBAH5BCQNCzLx1u7S7vFq9UhM0umebrz+ZmOi7CzIx8asRsfj4rx7nonwA8BsXS1QAaEbPf75NknTTmCOWpTjSuHG+uyBnzPBdA5ae7hsBo4ABgF9RwoAQsrpgv5bn7VdSbKSuGJp2/BcmJEi33OK/YACAb2A6EgghzzcsSzF1eGe1iOZ3LAAIZJQwIETsqajWu6t2yhjpu6O7uY4DADgBShgQIl5dmqeaeq/O6XPUshQAgIBECQNCQG29Vy8v9S1L8b2x3dyGAQA0CiUMCAEfri9SUVm1MtolaMzRy1IAAAISJQwIATO+zJckXTeyy7GXpQAABBxKGBDkCvcf1KfZxYqO8OjyId+yLAUAIKBQwoAgN2tZgayVzuufquT4aNdxAACNRAkDgpjXa/X6Mt9U5NTMzo7TAABOBiUMCGJLtu1V/r6DSmsVp7HpbV3HAQCcBEoYEMReb7gg/8phnRTh4YJ8AAgmlDAgSJUerNV7a3fJGGnKsE6u4wAAThIlDAhSs1cVqrrOq7E927JCPgAEIUoYEKQOTUVOyWQUDACCESUMCELrC8u0ZkepWsZF6YL+7V3HAQCcAkoYEIRez/KNgl0+pKNioyIcpwEAnApKGBBkqmrr9eaKHZKkKawNBgBBixIGBJkP1xep9GCt+ndM0oC0lq7jAABOESUMCDKHpiKnDmcUDACCGSUMCCIFJQe0MGePoiM9mjyYzboBIJhRwoAgMrNhs+4L+7dXyxZRruMAAE4DJQwIEl6v1b+zCiQxFQkAoYASBgSJRVv2asf+g+qUHKfRPdq4jgMAOE2UMCBIzGi4IH/KsM7ysFk3AAQ9ShgQBPYfqNH763ybdV/FNkUAEBIoYUAQeHtloWrqvBqfkaK0VnGu4wAAmgAlDAgCMxo2676aUTAACBmUMCDArd1RqvU7y5TcIkrn9Ut1HQcA0EQoYUCAO7xZ99A0xUSyWTcAhApKGBDAqmrr9VbDZt1Xs1k3AIQUShgQwN5ft0tlVXUa1Kml+nZIch0HANCEKGFAADs0FckoGACEHkoYEKDy9x3Q5zl7FRPp0aWDO7qOAwBoYpQwIED9u2EUbNLADmoZx2bdABBqKGFAAKr3Ws1c5tusm6lIAAhNlDAgAC3M2aPC0ip1ad1CI7u3dh0HANAMKGFAAHr9iBXy2awbAEITJQwIMPsqa/TB+l3yGOnKYWxTBAChihIGBJi3VuxQbb3Vmb1S1KElm3UDQKiihAEBxFp7eG2wqVyQDwAhjRIGBJDVBaXauKtcreOjdU5fNusGgFAW6eKHGmMulPSopAhJz1hrH3KRA3CqvFyaMUPKzpYyMqSpUw+Pgl0xNE3RkfyOBAChzO8lzBgTIelxSedJKpD0pTFmtrV2vb+zAM4sXChNmiR5vVJlpRQfr4P3PaDZd7wgSZo6nKlIAAh1Ln7VHiEpx1q71VpbI+k1SZMd5ADcKC/3FbDycl8Bk6TKSr3XcZDK66QhaYnqlZroNiMAoNm5KGFpkvKP+Lqg4TEgPMyY4RsBO8rrA8+TJF1dW+DvRAAAB1yUsGOtPGm/8SJjph
ljsowxWcXFxX6IBfhJdvZXI2ANclu115KugxRXU6VLdzMzDwDhwEUJK5B05AUvnSQVHv0ia+3T1tpMa21mSkqK38IBzS4jQ4qP/9pDh0bBJm1ZosSM7i5SAQD8zEUJ+1JShjGmuzEmWtI1kmY7yAG4MXWq5Pnqn1698WjmwHMlSVdv/MT3PAAg5Pm9hFlr6yTdJel9SRskvW6tXefvHIAziYnS3Lm+j/Hx+rT7GSpKbKPu+3dqxDN/lhISXCcEAPiBk3XCrLVzJc118bOBgDBunFRYKM2YoRnZvn+GU64YIzO+n+NgAAB/YTVIwJWEBO2deoM+8qT4Nuse3cN1IgCAH1HCAIfeXLFDdV6rs3u3U2pSrOs4AAA/ooQBDs1c5lsTbAqbdQNA2KGEAY5sLirXxl3lahkXpYl92rmOAwDwM0oY4Mjslb7l8SYNbM9m3QAQhnjnBxyw1urtVTskSZcNZtcuAAhHlDDAgRX5+5W/76BSk2I0ontr13EAAA5QwgAHDk1FXjqooyI8x9pOFQAQ6ihhgJ/V1Xv17mpfCZs8hKlIAAhXlDDAzxZv3as9FTXq0TZeA9KSXMcBADhCCQP87O1DU5GDO8oYpiIBIFxRwgA/qqqt17y1uyRJlw3p6DgNAMAlShjgR/M37lZFdZ0GprVUz5QE13EAAA5RwgA/OjQVOZlRMAAIe5QwwE/Kqmr18abdMka6ZBAlDADCHSUM8JP31+5STZ1XI7u3VvuWsa7jAAAco4QBfjJ7FWuDAQC+QgkD/GB3eZU+z9mjqAijiwa0dx0HABAAKGGAH8xZvVNeK53VK0WtWkS7jgMACACUMMAPDk1FXsZUJACgASUMaGZ5ew9oRd5+tYiO0Ll927mOAwAIEJQwoJnNXrVDknR+v1S1iI50nAYAECgoYUAzstYeXqCVbYoAAEcy1lrXGU7IGFMuaZPrHGGkraQ9rkOEEY63f3G8/Y9j7l8cb//qba1NPJVvDJa5kU3W2kzXIcKFMSaL4+0/HG//4nj7H8fcvzje/mWMyTrV72U6EgAAwAFKGAAAgAPBUsKedh0gzHC8/Yvj7V8cb//jmPsXx9u/Tvl4B8WF+QAAAKEmWEbCAAAAQkpAljBjzBRjzDpjjNcYc9w7PIwxFxpjNhljcowxD/gzYygxxrQ2xnxojMlu+Jh8nNdtN8asMcasPJ27QcLVic5X4/O3hudXG2POcJEzVDTieE8wxpQ2nM8rjTG/dpEzVBhjnjPG7DbGrD3O85zfTagRx5vzuwkZYzobY+YbYzY09JMfHeM1J32OB2QJk7RW0nckfXq8FxhjIiQ9LukiSf0kXWuM6eefeCHnAUn/sdZmSPpPw9fHc7a1dgi3P5+cRp6vF0nKaPgzTdKTfg0ZQk7i/eGzhvN5iLX2N34NGXqel3ThtzzP+d20nte3H2+J87sp1Un6qbW2r6RRku5sivfwgCxh1toN1toTLc46QlKOtXartbZG0muSJjd/upA0WdILDZ+/IOlyh1lCVWPO18mSXrQ+SyS1MsZ08HfQEMH7g59Zaz+VtO9bXsL53YQacbzRhKy1O621yxs+L5e0QVLaUS876XM8IEtYI6VJyj/i6wJ984CgcVKttTsl34km6Xi7TFtJHxhjlhljpvktXWhozPnKOd10GnssRxtjVhlj3jPG9PdPtLDF+e1/nN/NwBjTTdJQSUuPeuqkz3FnK+YbYz6S1P4YT/3CWvt2Y/6KYzzGrZ7H8W3H+yT+mrHW2kJjTDtJHxpjNjb8NoYTa8z5yjnddBpzLJdL6mqtrTDGTJL0lnzTCGgenN/+xfndDIwxCZJmSbrXWlt29NPH+JZvPcedlTBr7bmn+VcUSOp8xNedJBWe5t8Zsr7teBtjiowxHay1OxuGTncf5+8obPi42xjzpnxTPpSwxmnM+co53XROeCyPfAO11s41xjxhjGlrrWXPvebB+e1HnN9NzxgTJV8Bm26tfeMYLznpczyYpyO/lJRhjO
lujImWdI2k2Y4zBavZkm5q+PwmSd8YiTTGxBtjEg99Lul8+W6gQOM05nydLem7DXfYjJJUemiaGCfthMfbGNPeGGMaPh8h3/vhXr8nDR+c337E+d20Go7ls5I2WGsfOc7LTvocD8gNvI0xV0h6TFKKpDnGmJXW2guMMR0lPWOtnWStrTPG3CXpfUkRkp6z1q5zGDuYPSTpdWPMLZLyJE2RpCOPt6RUSW82/JuOlPSKtXaeo7xB53jnqzHm9obn/yFprqRJknIkHZD0PVd5g10jj/dVkn5ojKmTdFDSNZbVq0+ZMeZVSRMktTXGFEj6b0lREud3c2jE8eb8blpjJd0oaY0xZmXDYw9K6iKd+jnOivkAAAAOBPN0JAAAQNCihAEAADhACQMAAHCAEgYAAOAAJQwAAMABShgAAIADlDAAAAAHKGEAwoIxZrgxZrUxJrZhB4h1xpgBrnMBCF8s1gogbBhj/p+kWElxkgqstb93HAlAGKOEAQgbDftIfimpStIYa22940gAwhjTkQDCSWtJCZIS5RsRAwBnGAkDEDaMMbMlvSapu6QO1tq7HEcCEMYiXQcAAH8wxnxXUp219hVjTISkRcaYidbaj11nAxCeGAkDAABwgGvCAAAAHKCEAQAAOEAJAwAAcIASBgAA4AAlDAAAwAFKGAAAgAOUMAAAAAcoYQAAAA78f/lvrVq4UvfLAAAAAElFTkSuQmCC\n",
|
||
"text/plain": [
|
||
"<Figure size 691.2x388.8 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"fig = plot_data(X2, y, xlabel='x', ylabel='y')\n",
|
||
"theta_start = np.matrix([0, 0, 0]).reshape(3, 1)\n",
|
||
"theta, _ = gradient_descent(cost, gradient, theta_start, X2, y, eps=0.000001)\n",
|
||
"plot_fun(fig, polynomial_regression(theta), X1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
"This model fits the data well."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[<matplotlib.lines.Line2D at 0x19d05047b20>]"
|
||
]
|
||
},
|
||
"execution_count": 33,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmEAAAFoCAYAAAAfEiweAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXxU1eH///fJvhMgCzsBEkABFUFEBeqCG221tSK471pbq1Zr66efbp/+PvXbRWu1WvtRUNGqgFatVeuKG4LsyCokQIAQyAYkkz0zc35/TLBUWRLIzJnl9Xw8fCSZmcy8vY/L5c2595xrrLUCAABAaMW5DgAAABCLKGEAAAAOUMIAAAAcoIQBAAA4QAkDAABwgBIGAADgQNBKmDHmCWNMpTFmzX6P9TDGvGOMKW7/2j1Ynw8AABDOgjkS9pSk87702D2S3rPWFkl6r/1nAACAmGOCuVirMaZA0mvW2pHtP2+QdLq1dqcxprekD6y1w4IWAAAAIEyF+pqwfGvtTklq/5oX4s8HAAAICwmuAxyMMeYmSTdJUnp6+pjhw4c7TgQACFdev9WGXR75rdWgnHRlJIftX2+IMsuWLau21uYeye+Gei+tMMb03u90ZOXBXmitfUzSY5I0duxYu3Tp0lBlBABEmP9+ebX2Ltqms4bnaeY1J7mOgxhijNl6pL8b6tORr0q6uv37qyX9I8SfDwCIMiWV9Zq9ZLvijHTP+Zw1QeQI5hIVz0taKGmYMabMGHO9pN9KOtsYUyzp7PafAQA4Yr/91+fy+a2mjxugovxM13GADgva6Uhr7aUHeeqsYH0mACC2fLq5Ru+ur1BaUrzumFzkOg7QKayYDwCISH6/1b1vrJck3TxpiPIyUxwnAjqHEgYAiEj/XFWuVWW1ystM1o2TBrmOA3QaJQwAEHGa23z6/ZsbJEl3nTNUaUksSYHIQwkDAEScpxeWasfeJg3Lz9TFY/q7jgMcEUoYACCi7G1s1cPzSiRJ90wZrvg44zgRcGQoYQCAiPLneSWqa/ZqQmGOTh96RAuVA2GBEgYAiBhbaxr09MJSGSP915ThMoZRMEQuShgAIGL8/q0NavNZfXt0X43o0811HOCoUMIAABFh+bY9en3VTiUnxOlH5wxzHQc4apQwAEDYs9bq3tcDC7NeP2GQ+mSnOk4EHD1KGAAg7L21tkJLt+5Rj/Qkfff0Ia7jAF2CEgYACGsLSqr1q1fXSpLumFykrJREx4mArsESwwCAsLS3sVW/eX29XlhWJkk6oX+2Lh03wHEqoOtQwgAAYcVaq3+u2qlf/3OtqutblRQfpx+cWaibvzZEifGcwEH0oIQBAMLGjr1N+vkrazTv80pJ0rhBPfT/LhqlIbkZjpMBXY8SBgBwzue3enphqf7w1gY1tvqUmZKgn045RtPG9lcctyVClKKEAQCc+nxXne75+2qt3L5XknT+yF76nwtGKC8rxXEyILgoYQAAJ5rbfHp4Xon++uEmef1W+VnJ+v8uHKlzRvRyHQ0ICUoYACDkPt1co5++tFqbqxskSVeMH6Afnzec5ScQUyhhAIDg8HikOXOk4mKpqEiaNk21CSn67b/W6/nF2yVJhXkZ+u1FozS2oIfjsEDoUcIAAF1v/nxpyhTJ75caGmTT0/Wvh2frlxfeqaoWq8R4o++fUahbTh+i5IR412kBJyhhAICu5fEECpjHI0naldFTPz/7u3pn6ClSi9WYfln67dQTVJSf6Tgo4BYlDADQtebMkd9vVZ6Vq3cLT9Z9k65SfXKaMloa9ZOFz+nyG76huPyJrlMCzlHCAABHrMXr05bqBm2qbNCmqnqVVNZrU3GGNt/0lJqS/r3ExNkbF+rX7/5VvT010hnDHSYGwgclDABwWLWNbSqp8mhTZYNKquq1qbJeJVX12r67UX77pRebTClJyq3frcKa7bpq+es6b+MCGUlKT5cKCx38HwDhhx
IGALHoADMXbUaGKj0t2rDLo5L2krWpsl6bqupVXd96wLeJM9KgnHQNyU3XkLwMDcnNUGFGnIZMHKtu1bsO8Atx0rRpQf6fAyIDJQwAYoz9+GNVX3yZNnbvq40Zedq4tlXFHz+mjQOGq67twL+TmhivIXnpgZKVm6EheRkqzMvQwJ5pB57d+PIL/zE7UunpgQL2xhtSBveBBCRKGABEtd0NrdpY4fn3f+W1Kt6wQ3uu/etXX9wmdUtJ0LBeWSrMD5StwrxA4eqdldK5ezhOmCCVlwdG20pKAqcgp02jgAH7oYQBQBSw1mr1jlqtKqtVcYVHGyvqVVzpOfBpxNRMZbY0qKh6m4ZVbVVR9TYNrd6qoY3Vyr33f2RuuL5rQmVkSNd30XsBUYgSBgARzFqrj4qr9dB7xVq2dc9Xnk9PildhfqaG5mVoWK9MFb02V0MfvU+9PDU64LjWppKgZwYQQAkDgAhkrdX7Gyr14Hsl+mz7XklSdlqizhyep6H5mRqWn6mi/Az16Zb6n6cRN+RI/uYDvykzF4GQooQBQASx1uqddRV6aF6x1uyokyT1TE/SjZMG64rxA5WRfJjD+rRp0p13Hvg5Zi4CIUUJA4AI4Pdbvbl2l/48r0TrdwbKV05Gsr77tcG67OQBSkvq4OE8MzMwQ5GZi4BzlDAACGM+v9Xrq3fq4XnF2lhRL0nKz0rWd782RJeOG6CUxCO4+TUzF4GwQAkDgDDk9fn1z1Xl+vO8Em2uapAk9emWolvOKNTUMf2OrHztj5mLgHOUMAAII20+v15ZsUOPvF+i0ppGSVK/7qn6/hmF+s6J/ZSUEOc4IYCuQgkDgDDQ6vXrpeVleuSDEm3f3SRJGtgzTd8/o1DfHt1XifGULyDaUMIAwKEWr08vLC3Tox9s0o69gfI1OCddt55ZqAuO76MEyhcQtShhAOBIXXObrntyiZa2L7JamJehH5xZqG8c10fxnblFEICIRAkDAAdqG9t01ZOL9dn2verdLUU/+/qxOn9kr87dnxFARKOEAUCI7Wlo1RUzF2lteZ36dU/V8zeOV/8eaa5jAQgxShgAhFB1fYuumLFIn+/yaGDPND1/43j1yU51HQuAA5QwAAiRyrpmXT5jkYor6zU4N13P3zhe+VkprmMBcIQSBgAhsKu2WZc9/qk2VzdoaH6Gnr1hvHIzk13HAuAQJQwAgqxsT6Mue3yRtu1u1DG9s/S368epZwYFDIh1lDAACKJtNY269PFPtWNvk0b17aZnrh+n7LQk17EAhAFKGAAEyZbqBl32+KfaWdus0QOy9dS149QtNdF1LABhghIGAEFQUunRZY8vUqWnRScVdNcT15ykzBQKGIB/c3I/DGPMD40xa40xa4wxzxtjmB4EIGps2OXR9Mc+VaWnRacM7qmnrh1HAQPwFSEvYcaYvpJukzTWWjtSUryk6aHOAQDBsLa8VtMfW6jq+lZNLMrRE9ecpPRkTjoA+CpXR4YESanGmDZJaZLKHeUAgC6zqmyvrpy5WLVNbTpzeJ7+cvmJSkmMdx0LQJgK+UiYtXaHpPskbZO0U1KttfbtL7/OGHOTMWapMWZpVVVVqGMCQKcs27pHlz++SLVNbTrn2Hz99YoxFDAAh+TidGR3SRdKGiSpj6R0Y8wVX36dtfYxa+1Ya+3Y3NzcUMcEgA5bvGW3rpq5SJ4Wr74+qrceufxEJSU4ueQWQARxcZSYLGmLtbbKWtsm6SVJpzrIAQBHbUFJta5+YrEaWn361gl99OD0E5QYTwEDcHgujhTbJI03xqQZY4yksyStd5ADAI7KRxurdO1TS9TU5tPFY/rp/ktOUAIFDEAHubgmbJGkFyUtl7S6PcNjoc4BAEdj3ucVumHWUrV4/bp03AD9/jvHKT7OuI4FIII4mR1prf2lpF+6+GwAOFrryuv03WeWq9Xn19WnDNSvLhihwMA+AHQc4+YA0AmtXr9+9MJnavX5NW1sfwoYgCNGCQOATnjk/RKt21
mnAT3S9ItvHksBA3DEKGEA0EFrdtTqkfdLJEm/v/g4VsIHcFQoYQDQAftOQ3r9VtecWqDxg3u6jgQgwlHCAKAD/jyvWJ/v8qigZ5p+fN4w13EARAFKGAAcxqqyvfrLB5tkjPSHqccrLYnTkACOHiUMAA6hxevTXXM/k89vdd1pg3RSQQ/XkQBECUoYABzCn94tVnFlvQbnpOtH53AaEkDXoYQBwEGs3L5X//fhJsW1n4ZMTYp3HQlAFKGEAcABNLf5dNfclfJb6caJgzVmYHfXkQBEGUoYABzAA+9s1KaqBg3JTdcPzx7qOg6AKEQJA4AvWbZ1tx77eLPijHTf1OOVkshpSABdjxIGAPtpbvPp7hdWyVrp5q8N0egBnIYEEBwsdgMAHo80Z45UXKz7up+ozXszNDQ/Q3dMLnKdDEAUo4QBiG3z50tTpkh+v5ZkD9TMyyco3vp0X5FVcgKnIQEED6cjAcQujydQwDweNbZ6dfeUO2RNnG759AUdd9kFUn2964QAohglDEDsmjNH8vslSb+fdLVKe/TR8Mot+sGC2YHH58xxHBBANON0JIDYVVwsNTTo0/4j9dTYC5Tg8+q+N/6kZJ9XavBKJSWuEwKIYoyEAYhdRUVqyO6hH59/uyTpe5/O1ciKTYHn0tOlwkKH4QBEO0bCAMSuadP0u5dXaVv33jqmYrNuXTD338/FxUnTprnLBiDqUcIAxKwFFS16etS5SvB5df+8vyrJ7w2MgMXFSW+8IWVkuI4IIIpRwgDEpPoWr+5+cZUk6bYzC3Xs8DsD14AVFgZGwChgAIKMEgYgJt37xnrt2NukkX2zdMs5x0jxI1xHAhBjuDAfQMz5uLhKzy3apsR4o/umHq/EeA6FAEKPIw+AmOJpbtNP2k9D3jF5qIb3ynKcCECsooQBiCm/eX29ymubdVy/brp50mDXcQDEMEoYgJjx0cYqzV6yXUnxcbp/6vFK4DQkAIc4AgGICX6/1b1vrJck3T65SEX5mY4TAYh1lDAAMeH11Tv1+S6P+nRL0Q0TB7mOAwCUMADRz+vz64F3N0qSbjurSMkJ8Y4TAQAlDEAMeGVluTZXNWhgzzR9Z0w/13EAQBIlDECUa/X69af2UbA7JhexJhiAsMHRCEBUm7t0u8r2NKkwL0MXHN/XdRwA+AIlDEDUam7z6c/ziiVJd549VPFxxnEiAPg3ShiAqPXsom2qqGvRsb2zdN6IXq7jAMB/oIQBiEoNLV49+kGJJOlH5w5VHKNgAMIMJQxAVJq1sFTV9a0aPSBbZwzLcx0HAL6CEgYg6tQ2ten/PtwsSfrROcNkDKNgAMIPJQxA1Jk5f4tqm9o0fnAPnTqkp+s4AHBAlDAAUWV3Q6uemL9FknQXo2AAwhglDEBU+b+PNqm+xauvDc3VSQU9XMcBgIOihAGIGpWeZs1aUCopcC0YAIQzShiAqPGX9zepuc2vc0fka1S/bq7jAMAhUcIARIUde5v03KJtMkb64dlDXccBgMOihAGICg/PK1arz69vHtdHw3tluY4DAIdFCQMQ8UqrGzR3aZnijHTH5CLXcQCgQyhhACLeQ+8Vy+e3+s6J/TQ4N8N1HADoECclzBiTbYx50RjzuTFmvTHmFBc5AES+4gqPXl65Q4nxRredxSgYgMiR4OhzH5T0prX2YmNMkqQ0RzkARLg/vVssa6XpJw1Q/x4cSgBEjpCXMGNMlqRJkq6RJGttq6TWUOcAEPnWltfq9dU7lZwQp1vPLHQdBwA6xcXpyMGSqiQ9aYxZYYyZYYxJd5ADQIT749sbJUlXjh+o/KwUx2kAoHNclLAESSdKetRaO1pSg6R7vvwiY8xNxpilxpilVVVVoc4IIMwt37ZH731eqbSkeH339CGu4wBAp7koYWWSyqy1i9p/flGBUvYfrLWPWWvHWmvH5ubmhjQggPC3bxTs2tMKlJOR7DgNAHReyEuYtXaXpO3GmH03djtL0rpQ5wAQuRZuqtH8kmplpi
TopomMggGITK5mR/5A0rPtMyM3S7rWUQ4AEcZaqz++s0GSdOPEweqWlug4EQAcGSclzFq7UtJYF58NILJ9VFytJaV71D0tUddNGOQ6DgAcMVbMBxAxrLW6/+3AKNgtpw9RRrKrwXwAOHqUMAAR4+11FVpVVqvczGRdOb7AdRwAOCqUMAARwe+3X8yIvPWMQqUmxTtOBABHhxIGICK8tnqnNlR41KdbiqaP6+86DgAcNUoYgLDn9fn1p3cCo2C3nVWk5ARGwQBEPkoYgLD38ood2lzdoIKeafrOmH6u4wBAl6CEAQhrXp9fD75XLEm6Y/JQJcZz2AIQHTiaAQhr/1qzS2V7mjQ4J13fPL6P6zgA0GUoYQDClrVWMz7eLEm6fuIgxccZx4kAoOtQwgCEraVb9+izslp1T0vURaO5FgxAdKGEAQhb+0bBrhw/kHXBAEQdShiAsFRa3aC311UoKT5OV5wy0HUcAOhylDAAYenJT7bIWunCE/ooLzPFdRwA6HKUMABhp7axTXOXlkmSbpg42HEaAAgOShiAsPPc4m1qavNpYlGOhvXKdB0HAIKCEgYgrLR6/XpqwRZJjIIBiG6UMABh5fXV5aqoa9HQ/AxNKspxHQcAgoYSBiBsBBZnbR8FmzBYxrA4K4DoRQkDEDYWbq7R2vI65WQk6YITuEURgOhGCQMQNma2j4JdOb5AKYkszgogulHCAISFTVX1eu/zSiUnxOmK8QNcxwGAoKOEAQgLM+cHRsEuOrGfemYkO04DAMFHCQPg3O6GVv19WWBx1usnFLgNAwAhQgkD4Nyzn25Vi9evM4blqjCPxVkBxAZKGACnmtt8mrVwqyTpRhZnBRBDElwHABBjPB5pzhypuFgqKtKrw7+m6voWHdM7S6cM6ek6HQCEDCUMQOjMny9NmSL5/VJDg2x6umZe+gep5wDdMGEQi7MCiCmUMACh4fEECpjH88VDH+cO1YaeA5TXsEffLMxyGA4AQo9rwgCExpw5gRGw/cw46VuSpKtXvamkF19wkQoAnKGEAQiN4mKpoeGLHzfkDNRHg8cotbVZly/+h1RS4jAcAIQeJQxAaBQVSenpX/z4xNgLJElTV7+r7HgrFRa6SgYATlDCAITGtGlSXOCQU5WWrZdHnClj/bp22auBx6dNcxwQAEKLEgYgNDIzpTfekDIz9czJ31JrQqImb1mmQV5P4PGMDNcJASCkKGEAQmfCBDVv3a6/nXKRJOmGSYOl8nJpwgTHwQAg9FiiAkBIvbSxVrt9cTquXzeN+/4UibXBAMQoRsIAhIzfbzVz/mZJ0vUszgogxh22hBljbjXGdA9FGADR7cONVdpU1aDe3VI0ZVRv13EAwKmOjIT1krTEGDPXGHOe4Z+uAI7Q4x8HRsGuPa1AifEMxAOIbYc9ClprfyapSNJMSddIKjbG3GuMGRLkbACiyNryWi3YVKP0pHhNO2mA6zgA4FyH/ilqrbWSdrX/55XUXdKLxpjfBzEbgCgyc/4WSdIlJ/VXt9REx2kAwL3Dzo40xtwm6WpJ1ZJmSLrbWttmjImTVCzpx8GNCCDSVdQ165+flSvOSNedNsh1HAAICx1ZoiJH0kXW2q37P2it9RtjvhGcWACiyawFpWrzWU0Z1Uv9e6S5jgMAYeGwJcxa+4tDPLe+a+MAiDaNrV49u2ibJOn6CYMdpwGA8MH0JABB9eKyMtU2tWn0gGyNGchqNwCwDyUMQND4/FZPtF+Qf+NERsEAYH+UMABB8976CpXWNKpf91Sdc2y+6zgAEFYoYQCCZkb7KNi1pw1SAouzAsB/4KgIIChWle3V4i27lZmcoEvG9nMdBwDCjrMSZoyJN8asMMa85ioDgOB58pNSSdL0cf2VmcLirADwZS5Hwm6XxBIXQBSq8rTotVXlMka66pQC13EAICw5KWHGmH6Svq7ACvwAoszzi7epzWd11vB8FmcFgINwNRL2JwVud+Q/2AuMMTcZY5YaY5ZWVVWFLh
mAo9Lm8+vZRYEbbFxzaoHbMAAQxkJewtpvdVRprV12qNdZax+z1o611o7Nzc0NUToAR+vNNbtUUdeiwrwMnVbY03UcAAhbLkbCTpN0gTGmVNJsSWcaY/7mIAeAIJi1oFSSdPWpBTLGuA0DAGEs5CXMWvtf1tp+1toCSdMlzbPWXhHqHAC63podtVq6dY8yUxJ00ei+ruMAQFhjnTAAXWbfKNjUMf2VnpzgNgwAhDmnR0lr7QeSPnCZAUDXqKlv0T8+27csxUDXcQAg7DESBqBLzF6yXa1ev04fmquCnHTXcQAg7FHCABw1r8+vZz8NLEtxNctSAECHUMIAHLV31lWovLZZg3LSNamIJWUAoCMoYQCO2lPtF+RfdcpAxcWxLAUAdAQlDMBRWb+zTou27FZ6UrwuHtPPdRwAiBiUMABH5emFpZKk74zpp8yURKdZACCSUMIAHLG9ja16ecUOSdJVpxS4DQMAEYYSBuCIzV26Xc1tfk0sylFhXobrOAAQUShhAI6Iz2/19ML2ZSkYBQOATqOEATgi8z6vVNmeJg3okaYzhue5jgMAEYcSBuCIzNpvWYp4lqUAgE6jhAHotJJKj+aXVCs1MV5Tx/Z3HQcAIhIlDECnzVoQuBbs2yf2VbdUlqUAgCNBCQPQKXXNbfr78jJJXJAPAEeDEgagU15YWqbGVp9OGdxTw3pluo4DABGLEgagw/x+q2cWlkqSrj61wGUUAIh4lDAAHfbhxiqV1jSqb3aqJh/DshQAcDQoYQA67Kn2ZSmuGD9QCfEcPgDgaHAUBdAhm6vq9eHGKiUnxGn6SSxLAQBHixIGoEP23aLowhP6qHt6kuM0ABD5KGEADqu+xasXl7UvS8EF+QDQJShhAA7rpeVlqm/xalxBD43o0811HACICpQwAIdkrf3iPpGMggFA16GEATik+SXV2lTVoF5ZKTpnRL7rOAAQNShhAA5p1hfLUgxQIstSAECX4YgK4KC21TTqvc8rlRQfp+njBriOAwBRhRIG4KCe+bRU1krfOL63cjKSXccBgKhCCQNwQI2tXs1Zsl2SdA0X5ANAl6OEATigV1aUq67Zq9EDsnVcv2zXcQAg6lDCAHyFtVZPLdgiiVEwAAgWShiAr1i4uUYbK+qVm5ms80f2dh0HAKISJQzAV+xbluKycQOUlMBhAgCCgaMrgP9QtqdR76yrUEKc0eUnsywFAAQLJQzAf/jbp9vkt9KUUb2Vl5XiOg4ARC1KGIAvNLZ6NXvJNkncJxIAgo0SBuALf1++Q3sb23RC/2ydOIBlKQAgmChhACRJfr/VE/MDy1LcOHGwjDGOEwFAdKOEAZAkvbu+QluqG9Q3O1Xnjsh3HQcAoh4lDIAkaUb7KNh1EwYpIZ5DAwAEG0daAFpVtleLt+xWZnKCLhnbz3UcAIgJlDAAmvFxYBTs0pMHKDMl0XEaAIgNlDAgxu3Y26TXV+9UfJzhPpEAEEKUMCDGzVpQKp/f6uujeqtPdqrrOAAQMyhhQAzzNLfp+UWBxVlvmDjIcRoAiC2UMCCGzVmyXZ4Wr8YN6qHj+rE4KwCEEiUMiFFen19PflIqKbA4KwAgtChhQIx6c+0u7djbpEE56TpreJ7rOAAQc0Jewowx/Y0x7xtj1htj1hpjbg91BiDWWWv1+Mf/Xpw1Lo5bFAFAqCU4+EyvpLustcuNMZmSlhlj3rHWrnOQBYhJy7bu0Wfb9yo7LVEXn8jirADgQshHwqy1O621y9u/90haL6lvqHMAsWzf4qxXnDxQqUnxjtMAQGxyek2YMaZA0mhJi1zmAGLJ1poGvbVul5Li43TVqQNdxwGAmOWshBljMiT9XdId1tq6Azx/kzFmqTFmaVVVVegDAlHqiflbZK10wQl9lJeZ4joOAMQsJyXMGJOoQAF71lr70oFeY619zFo71lo7Njc3N7QBgShV29imuUvLJLE4KwC4FvIL840xRtJMSeuttX8M9ecDsezZxVvV1ObTxEHZGv
7aXKm4WCoqkqZNkzIzXccDgJjiYnbkaZKulLTaGLOy/bGfWmvfcJAFiBmtXr9mLSiVJN3w0E+krSulhgYpPV26807pjTekCRPchgSAGBLyEmatnS+JRYmAEHttVbkq6lo0tGa7Jq375N9PNDQEvk6ZIpWXSxkZbgICQIxhxXwgBuy/OOsNK1878L+C/H5pzpyQ5gKAWEYJA2LAwk01Wr+zTjm2VReueOvAL2pokEpKQhsMAGIYJQyIAY9/vFmSdFXPFiWnJB/4RenpUmFhCFMBQGyjhAFRrqTSo/c3VCk5IU5XXH2OFHeQP/ZxcYFZkgCAkKCEAVFu5vzAtWDfGdNPPfJ7BGZBZmYGRr6kwNfMzMDjXJQPACHjYokKACFSU9+ivy/fIUm6fkL74qwTJgRmQc6ZE7gGrLAwMAJGAQOAkKKEAVHsmU+3qtXr11nD8zQkd7+SlZEhXX+9u2AAAE5HAtGquc2nZxZulSTdMHGw4zQAgC+jhAFR6pUVO1TT0KoRfbI0fnAP13EAAF/C6cgjUFHXrPoWr7JTE9UtNVEJ8XRZhBe/32pG+wX5N04crMAtWwEA4YQS1kkbdnn0zYfnq9Xr/+KxzJQEdU9LUnZaorLTkpSdmqjuaYnqlpak7mmJX3o88LqslETFxfEXI4Ljw+IqlVTWq1dWir5+XG/XcQAAB0AJ66Q/vLVBrV6/eqYnyW+tapva5Gn2ytPs1bbdHX8fY6RuqYkampep+y85Xv17pAUvNGLOjPbFWa85rUCJjNQCQFiihHXCsq279e76CqUlxevNOyYpNzNZfr+Vp9mrPY2t2tvUpj2NraptDHzd09im2vave5vatLexVXvbn/M0e7W3sU2LS3frypmL9OItpyon4yArmQOdsK68Tp+U1CgtKV6XjhvgOg4A4CAoYR1krdXv3twgKbDeUm5moDDFxRl1S0tUt7TETr2f1+dXVX2Lbpi1VGvL63TNk4v1/I3jlZnSufcBvmzG/MAo2CVj+6tbKvsTAIQrzlN00Icbq7R4y25lpyXqxklHP90/IT5Ovbul6qlrx2lgzzSt2VGnm55epuY2XxekRayqqGvWPz8rV5yRrjttkOs4AIBDoIR1gN9v9Ye3AqNg3zt9iNPQR9kAABTlSURBVLK6cLQqNzNZz1x3snIzk7Vwc43umL1SPr/tsvdHbJm1oFRtPqtzR/TSgJ5cZwgA4YwS1gGvr96pteV16pWVoqtOKejy9x/QM01PXzdOmSkJenPtLv3slTWyliKGzmls9erZRdsksTgrAEQCSthhtPn8+uM7GyVJt08uUkpifFA+55jeWZp59UlKTojT84u36f63NwblcxC9XlxWptqmNo0ekK0xA7u7jgMAOAxK2GG8sLRMW6obNDgnXVPH9AvqZ40b1EOPXHai4uOMHn6/RE+0L7YJHI7PbzVzv8VZAQDhjxJ2CM1tPj34XmBE6s5zhoZkZfzJx+brtxeNkiT9+rV1emXFjqB/JiLfv9bs1NaaRvXvkapzR/RyHQcA0AGUsEOYtaBUFXUtGtk3S1NGhm7V8alj++unU4ZLkn70wmf6YENlyD4bkce73ynzmycNUTx3YgCAiEAJO4japjb95YNNkqS7zx0e8lsM3TRpiG6eNFhev9Utf1uuZVv3hPTzETleWr5Dm6saNLBnmqad1N91HABAB1HCDuLxjzartqlN4wf30KSiHCcZ7jl/uC4e009NbT5d99QSbazwOMmB8NXc5tOf3m0/ZX72UG5RBAARhCP2AVR6mr+4yPnH5w2XMW5O7xhj9NuLRmnyMfmqbWrTVTMXa8feJidZEJ6eW7RN5bXNGt4rU988ro/rOACATqCEHcAj80rU1ObT2cfm68QBbqf6J8TH6eHLRuukgu7aVdesK2cuUk19i9NMCA/1LV498n6JJOlH5wwL+SlzAMDRoYR9yfbdjXpu8TYZE/iLLRykJMZrxtUnaXivTG2uatC1Ty1RfYvXdSw49uT8La
ppaNXoAdk665g813EAAJ1ECfuSB97ZqDaf1bdH99WwXpmu43yhW2qinr5unPr3SNWqslp995llavFyn8lYtaehVY99FLhR993nDnN2yhwAcOQoYfv5fFedXl65Q4nxRj+cPNR1nK/Iy0rRM9edrJyMJM0vqdZdcz/jPpMx6q8fbZKnxauJRTk6dYibiSMAgKNDCdvPfW9tkLXS5ScPVP8e4Xnz44KcdD117ThlJCfotVU79atX13KfyRhTUdesWQtKJYXPKXMAQOdRwtot27pb766vVFpSvL5/RqHrOIc0sm83PX7VWCUlxOmZT7fqwfeKXUdCCP15XrGa2/w6b0QvHd8/23UcAMARooRJstbqd29ukCRdP2GQcjOTHSc6vFOG9NRD00crzkh/erdYzywsdR0JIbCtplGzF29XnJHuOif8TpkDADqOEibpw41VWrxlt7LTEnXjpMi5+fF5I3vp3m8H7jP5i1fX6l+rdzpOhGB74N2N8vqtvj26n4ryw2fiCACg82K+hPn9Vr9vHwX73ulDlJWS6DhR50wfN0B3nztM1ko/fnGVtu9udB0JQbJhl0evtE8cuWNykes4AICjFPMl7PXVO7VuZ516ZaXoqlMKXMc5It87fYjOHZEvT4tXd8xZKa/P7zoSguC+twMTRy4bNyBsJ44AADoupktYm8+v+98OjILdPrlIKYnxjhMdmcDtjY5Tflaylm3do0fe3+Q6ErrYim179M66CqUmxuv7Z4b3xBEAQMfEdAl7YWmZSmsaNTgnXVPH9HMd56h0T0/SHy85QcZID80r1rKte1xHQhf6w1uBfyxce1qB8jJTHKcBAHSFmC1hTa0+PfjeRknSnecMVUJ85G+K0wpzdNPEwfL5re6Ys0Ke5jbXkdAFPimp1oJNNcpKSdDNk4a4jgMA6CKR3zyO0KyFpaqoa9HIvlmaMrK36zhd5s5zhmpEnyxt392kX7661nUcHCVrrX7fPgp289eGqFtaZE0cAQAcXEyWsNqmNj36QeC6qR+fO1xxcdFz373khHg9OH20UhLj9NLyHXr1s3LXkXAU3l5Xoc+271VORrKuPa3AdRwAQBeKyRL22EebVNvUpvGDe2hiUfTdd68wL0M//8axkqT/fnm1yvawbEUk8vntFxNHfnBmodKSEhwnAgB0pZgrYZWeZj0xv1SS9OPzhsuY6BkF299l4wbo7GPz5Wn26s453Og7Ev1j5Q5trKhX3+xUTR/X33UcAEAXi7kS9vC8EjW1+XT2sfk6cUB313GCxhij333nOOVmJmtx6W49+kGJ60johFavXw+8G5g4csfkIiUnRObyKQCAg4upEratplHPL94mY6S7zx3mOk7Q9UhP0v1Tj5ckPfBusVZu3+s4ETpqzpJt2r67SYV5GbroxMhePgUAcGAxVcIeeHej2nxW3x7dV0Nj5L57k4bm6voJg+TzW90+e4UaWryuI+Ewmlp9emheYOTyrrOHKj6KJo4AAP4tZkrY57vqvrjv3g8nD3UdJ6R+fN4wDe+Vqa01jfoVy1aEvacWlKrK06JRfbvpvJG9XMcBAARJzJSwWQtKZa10+ckDY+6+e8kJ8frzpaOVnBCnF5aV6fVVO11HwkHUNrXprx8Glk+5+9xhUTtxBAAQQyVsxbbA9VDfPD56FmbtjKL8TP3s68dIkv7rpVUq39vkOBEO5PGPNkf18ikAgH+LiRLW3OZTcWW94ox0bO9uruM4c8X4gTpreJ7qmr364ZyVLFsRZqo8LXriky2SpLvPjd7lUwAAAU5KmDHmPGPMBmNMiTHmnmB/3vqddfL5rQrzMpSaFLtT/Y0x+t3FxyknI1mLtuzWYx9tdh0ptnk80owZ0k9+Is2YoUfeXqfGVp8mH5OnMQOjd/kUAEBAyJfgNsbES3pE0tmSyiQtMca8aq1dF6zPXFNeJ0ka2Td2R8H2yclI1n1Tj9M1Ty7R/W9v0GmFPXVcv2zXsWLP/PnSlCmS3y81NKis10
A9d8WfZOITddc50b98CgDAzUjYOEkl1trN1tpWSbMlXRjMD1xTVitJGtmHEiZJpw/L0zWnFsjrt7p99ko1trJsRUh5PIEC5vFIDQ2SpAdHf0ut8Ym6YON8HZMZE1cJAEDMc3G07ytp+34/l7U/FjSrdwRK2Kh+lLB97jl/uIblZ2pLdYN+/c+gDULiQObMCYyAtSvp0U9/H3mmEnxe/XDR3MDzAICo56KEHehq469cIW6MuckYs9QYs7SqquqIP6y5zaeNFR4ZIx3bO+uI3yfapCTG66FLRyspIU6zl2zXm2tYtiJkiou/GAGTpAcmXi5/XLwuWfWOCso3SyXcYgoAYoGLElYmaf+7EfeTVP7lF1lrH7PWjrXWjs3NzT3iD9tY4ZHXbzUkN0PpySG/BC6sDeuVqZ+eP1ySdM9Lq7WrttlxohhRVCSlp0uS1uQP0evDJyrJ26rbFswOPF5Y6DggACAUXJSwJZKKjDGDjDFJkqZLejVYH7bvVOTIPoyCHcjVpxboa0NztbexTXe9sFJ+lq0IvmnTpLg4+Uyc/uesmyRJVy9/Tb3qa6S4uMDzAICoF/ISZq31SrpV0luS1kuaa60N2r101uwrYcyMPCBjjO6berx6pifpk5IazZjPshVBl5kpvfGGHj79Si3pP0J5nhp9b9XrXzyujAzXCQEAIeBkGpa19g1r7VBr7RBr7W+C+VlfXJRPCTuo3Mxk/WHqcZKkP7y14YviiuBZ2u9YPXjyVBlZPZBZru6/+41UXi5NmOA6GgAgRKJ6Lnyr168NuzySpBGUsEM6c3i+rjploNp8VrfNXiFPc5vrSFGrtqlNt89eKb+Vbv5aoU777T3S9dczAgYAMSaqS9jGCo/afFaDc9KVwUX5h/XTKcdoaH6GNlc1cFujILHW6qcvrdaOvU06vn+27jpnqOtIAABHorqEreZ6sE5JSYzXY1eOVbfURL27vlL3vb3BdaSoM3fpdr2+eqcykhP00PQTlBgf1X8EAQCHENV/A6zherBOK8hJ16OXn6j4OKNHP9ikV1bscB0papRU1utXrwYWxv3fb43UwJ7pjhMBAFyKiRLGSFjnnFqYo19981hJ0o//vkortu1xnCjyNbf59IPnV6ipzaeLRvfVt0YH9SYRAIAIELUlrM3n1/ovLspnjbDOuvKUAl1+8gC1ev266Zll2lnb5DpSRPvdm59r/c46DeyZpl9/a6TrOACAMBC1JWxjhUetXr8KeqYpKyXRdZyI9KsLRmj84B6q8rTopqeXqanV5zpSRJr3eYWe/KRUCXFGD00fzSQRAICkKC5ha3fUSeJU5NFIjI/To5eP0YAeaVq9o1Z3v/iZrGXGZGdU1jXrRy+skiT96NxhOr5/tuNEAIBwEbUljJmRXaN7epJmXD1WGckJem3VTj08j5tLd5Tfb3Xn3M+0u6FVEwpzdNPEwa4jAQDCSNSXMGZGHr2h+Zl6cPoJMka6/52NenPNLteRIsLjH2/W/JJq9UhP0h8vOV5xccZ1JABAGInKEub1+bV+Z/vpyD6UsK5w1jH5+sl5wyVJd85dqXXldY4ThbfPtu/VH94KrLN2/9TjlZeV4jgRACDcRGUJK6mqV4vXrwE90tQtjYvyu8rNkwbrotF91djq041PL1V1fYvrSGGpvsWr22avkNdvde1pBTpjeJ7rSACAMBSVJWx12b7rwViaoisZY3TvRaN0Qv9s7djbpFv+tkwtXmZMftkvXlmjrTWNOqZ3lu45f7jrOACAMBWVJYxFWoMncGujMeqVlaIlpXv081fWMGNyPy+vKNNLK3YoNTFef750tJIT4l1HAgCEqagsYVyUH1x5WSl6/KqxSkmM09ylZXrik1LXkcLC1poG/ezlNZKkX37zWBXmZThOBAAIZ1FXwnx+q3VclB90o/p1031Tj5ck/eb1dfpwY5XjRG61+fy6bfZKNbT69PVRvTXtpP6uIwEAwlzUlbBNVfVqbvOrb3aquqcnuY4T1b
5xXB/ddmah/Fa69bnl2lRV7zqSM398Z6M+275XfbNTde9Fo2QMy1EAAA4t6krYvovyORUZGndMHqrzRvSSp9mrG2YtVW1jm+tIIfdJSbX++uEmxRnpweknqFsqM3IBAIcXfSVs3/Vg/ShhoRAXZ/THacfrmN5Z2lLdoO8/t1xen991rJCpqW/RD+eslLXS7WcN1diCHq4jAQAiRNSVsLXlzIwMtbSkBD1+1Rj1TE/S/JJq/e/r611HCglrre5+cZUqPS0aV9BDt55Z6DoSACCCRFUJ8/mt1pbvuyifNcJCqV/3NP3flWOUGG/01IJSPb94m+tIQTdrQanmfV6pbqmJemD6CYrntkQAgE6IqhK2pbpeja0+9emWop4Zya7jxJyxBT30m2+PkiT9/JU1+nRzjeNEwbOuvE73vvG5JOl33xmlvtmpjhMBACJNVJWwNTvaR8E4FenMJWP764YJg+T1W93yt2VROWNyx94m3fr8crX6/Lp03ACdN7K360gAgAhkImG1c2OMR9IG1zliSI6katchYgjbO7TY3qHHNg8ttndoDbPWZh7JLyZ0dZIg2WCtHes6RKwwxixle4cO2zu02N6hxzYPLbZ3aBljlh7p70bV6UgAAIBIQQkDAABwIFJK2GOuA8QYtndosb1Di+0demzz0GJ7h9YRb++IuDAfAAAg2kTKSBgAAEBUCcsSZoyZaoxZa4zxG2MOOsPDGHOeMWaDMabEGHNPKDNGE2NMD2PMO8aY4vav3Q/yulJjzGpjzMqjmQ0Sqw63v5qAh9qfX2WMOdFFzmjRge19ujGmtn1/XmmM+YWLnNHCGPOEMabSGLPmIM+zf3ehDmxv9u8uZIzpb4x53xizvr2f3H6A13R6Hw/LEiZpjaSLJH10sBcYY+IlPSLpfEnHSrrUGHNsaOJFnXskvWetLZL0XvvPB3OGtfYEpj93Tgf31/MlFbX/d5OkR0MaMop04vjwcfv+fIK19tchDRl9npJ03iGeZ//uWk/p0NtbYv/uSl5Jd1lrj5E0XtL3u+IYHpYlzFq73lp7uMVZx0kqsdZutta2Spot6cLgp4tKF0qa1f79LEnfcpglWnVkf71Q0tM24FNJ2cYYluM/MhwfQsxa+5Gk3Yd4Cft3F+rA9kYXstbutNYub//eI2m9pL5felmn9/GwLGEd1FfS9v1+LtNXNwg6Jt9au1MK7GiS8g7yOivpbWPMMmPMTSFLFx06sr+yT3edjm7LU4wxnxlj/mWMGRGaaDGL/Tv02L+DwBhTIGm0pEVfeqrT+7izFfONMe9K6nWAp/7bWvuPjrzFAR5jqudBHGp7d+JtTrPWlhtj8iS9Y4z5vP1fYzi8juyv7NNdpyPbcrmkgdbaemPMFEmvKHAaAcHB/h1a7N9BYIzJkPR3SXdYa+u+/PQBfuWQ+7izEmatnXyUb1Emqf9+P/eTVH6U7xm1DrW9jTEVxpje1tqd7UOnlQd5j/L2r5XGmJcVOOVDCeuYjuyv7NNd57Dbcv8DqLX2DWPMX4wxOdZa7rkXHOzfIcT+3fWMMYkKFLBnrbUvHeAlnd7HI/l05BJJRcaYQcaYJEnTJb3qOFOkelXS1e3fXy3pKyORxph0Y0zmvu8lnaPABAp0TEf211clXdU+w2a8pNp9p4nRaYfd3saYXsYY0/79OAWOhzUhTxo72L9DiP27a7Vvy5mS1ltr/3iQl3V6Hw/LG3gbY74t6c+SciW9boxZaa091xjTR9IMa+0Ua63XGHOrpLckxUt6wlq71mHsSPZbSXONMddL2iZpqiTtv70l5Ut6uf3PdIKk56y1bzrKG3EOtr8aY77b/vxfJb0haYqkEkmNkq51lTfSdXB7XyzpFmOMV1KTpOmW1auPmDHmeUmnS8oxxpRJ+qWkRIn9Oxg6sL3Zv7vWaZKulLTaGLOy/bGfShogHfk+zor5AAAADkTy6UgAAICIRQkDAABwgBIGAADgACUMAADAAUoYAACAA5QwAAAABy
hhAAAADlDCAMQEY8xJxphVxpiU9jtArDXGjHSdC0DsYrFWADHDGPO/klIkpUoqs9b+P8eRAMQwShiAmNF+H8klkpolnWqt9TmOBCCGcToSQCzpISlDUqYCI2IA4AwjYQBihjHmVUmzJQ2S1Ntae6vjSABiWILrAAAQCsaYqyR5rbXPGWPiJS0wxpxprZ3nOhuA2MRIGAAAgANcEwYAAOAAJQwAAMABShgAAIADlDAAAAAHKGEAAAAOUMIAAAAcoIQBAAA4QAkDAABw4P8H00tIfoKKIg8AAAAASUVORK5CYII=\n",
|
||
"text/plain": [
|
||
"<Figure size 691.2x388.8 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"fig = plot_data(X5, y, xlabel='x', ylabel='y')\n",
|
||
"theta_start = np.matrix([0, 0, 0, 0, 0, 0]).reshape(6, 1)\n",
|
||
"theta, _ = gradient_descent(cost, gradient, theta_start, X5, y, alpha=0.5, eps=10**-7)\n",
|
||
"plot_fun(fig, polynomial_regression(theta), X1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
"This model has high **variance**: it **overfits** the data."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"(Note the odd shape of the curve in the left part of the plot: it is, among other things, an effect of overfitting.)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Overfitting occurs when the model has too many degrees of freedom relative to the amount of input data.\n",
"\n",
"It is undesirable, because such a model generalizes poorly to new data.\n",
"\n",
"Informally, overfitting occurs when the model starts to fit the noise in the data rather than the underlying trend."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"See also: https://pl.wikipedia.org/wiki/Nadmierne_dopasowanie"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img style=\"margin:auto\" width=\"90%\" src=\"fit.png\"/>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Bias (systematic error)\n",
"\n",
"* Arises from erroneous assumptions made by the learning algorithm.\n",
"* High bias causes underfitting."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Variance\n",
"\n",
"* Arises from oversensitivity to small fluctuations in the training set.\n",
"* High variance can cause overfitting (modeling the noise instead of the signal)."
|
||
]
|
||
},
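{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Bias and variance are linked by the standard bias-variance decomposition of the expected squared error. As a sketch, assume (these symbols are introduced here for illustration only and do not appear elsewhere in this lecture) that the data follow $y = f(x) + \\varepsilon$ with zero-mean noise $\\varepsilon$ of variance $\\sigma^2$, and let $\\hat{h}(x)$ denote the model obtained from a randomly drawn training set. Then, with the expectation taken over training sets and noise:\n",
"\n",
"$$\n",
"\\mathbb{E}\\left[ \\left( y - \\hat{h}(x) \\right)^2 \\right] \\, = \\, \\underbrace{\\left( \\mathbb{E}[\\hat{h}(x)] - f(x) \\right)^2}_{\\textrm{bias}^2} + \\underbrace{\\mathbb{E}\\left[ \\left( \\hat{h}(x) - \\mathbb{E}[\\hat{h}(x)] \\right)^2 \\right]}_{\\textrm{variance}} + \\underbrace{\\sigma^2}_{\\textrm{noise}}\n",
"$$\n",
"\n",
"Under these assumptions, changing the model complexity to lower one of the first two terms typically raises the other, which is why neither an underfitted nor an overfitted model gives the lowest test error."
]
},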
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img style=\"margin:auto\" width=\"60%\" src=\"bias2.png\"/>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img style=\"margin:auto\" width=\"60%\" src=\"curves.jpg\"/>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## 3.5. Regularization"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def SGD(h, fJ, fdJ, theta, X, Y, \n",
|
||
" alpha=0.001, maxEpochs=1.0, batchSize=100, \n",
|
||
" adaGrad=False, logError=False, validate=0.0, valStep=100, lamb=0, trainsetsize=1.0):\n",
|
||
"    \"\"\"Stochastic Gradient Descent: a stochastic version of the gradient descent method\n",
"    (more on this in lecture 11)\n",
|
||
" \"\"\"\n",
|
||
" errorsX, errorsY = [], []\n",
|
||
" errorsVX, errorsVY = [], []\n",
|
||
" \n",
|
||
" XT, YT = X, Y\n",
|
||
" \n",
|
||
" m_end=int(trainsetsize*len(X))\n",
|
||
" \n",
|
||
" if validate > 0:\n",
|
||
" mv = int(X.shape[0] * validate)\n",
|
||
" XV, YV = X[:mv], Y[:mv] \n",
|
||
" XT, YT = X[mv:m_end], Y[mv:m_end] \n",
|
||
" m, n = XT.shape\n",
|
||
"\n",
|
||
" start, end = 0, batchSize\n",
|
||
" maxSteps = (m * float(maxEpochs)) / batchSize\n",
|
||
" \n",
|
||
" if adaGrad:\n",
|
||
" hgrad = np.matrix(np.zeros(n)).reshape(n,1)\n",
|
||
" \n",
|
||
" for i in range(int(maxSteps)):\n",
|
||
" XBatch, YBatch = XT[start:end,:], YT[start:end,:]\n",
|
||
"\n",
|
||
" grad = fdJ(h, theta, XBatch, YBatch, lamb=lamb)\n",
|
||
" if adaGrad:\n",
|
||
" hgrad += np.multiply(grad, grad)\n",
|
||
" Gt = 1.0 / (10**-7 + np.sqrt(hgrad))\n",
|
||
" theta = theta - np.multiply(alpha * Gt, grad)\n",
|
||
" else:\n",
|
||
" theta = theta - alpha * grad\n",
|
||
" \n",
|
||
" if logError:\n",
|
||
" errorsX.append(float(i*batchSize)/m)\n",
|
||
" errorsY.append(fJ(h, theta, XBatch, YBatch).item())\n",
|
||
" if validate > 0 and i % valStep == 0:\n",
|
||
" errorsVX.append(float(i*batchSize)/m)\n",
|
||
" errorsVY.append(fJ(h, theta, XV, YV).item())\n",
|
||
" \n",
|
||
" if start + batchSize < m:\n",
|
||
" start += batchSize\n",
|
||
" else:\n",
|
||
" start = 0\n",
|
||
" end = min(start + batchSize, m)\n",
|
||
" return theta, (errorsX, errorsY, errorsVX, errorsVY)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 57,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Prepare the data for the regularization example\n",
|
||
"\n",
|
||
"n = 6\n",
|
||
"\n",
|
||
"data = np.matrix(np.loadtxt(\"ex2data2.txt\", delimiter=\",\"))\n",
|
||
"np.random.shuffle(data)\n",
|
||
"\n",
|
||
"X = powerme(data[:,0], data[:,1], n)\n",
|
||
"Y = data[:,2]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def draw_regularization_example(X, Y, lamb=0, alpha=1, adaGrad=True, maxEpochs=2500, validate=0.25):\n",
|
||
"    \"\"\"Draws a regularization example\"\"\"\n",
|
||
" plt.figure(figsize=(16,8))\n",
|
||
" plt.subplot(121)\n",
|
||
" plt.scatter(X[:, 2].tolist(), X[:, 1].tolist(),\n",
|
||
" c=Y.tolist(),\n",
|
||
"                s=100, cmap='prism');\n",
|
||
"\n",
|
||
" theta = np.matrix(np.zeros(X.shape[1])).reshape(X.shape[1],1)\n",
|
||
" thetaBest, err = SGD(h, J, dJ, theta, X, Y, alpha=alpha, adaGrad=adaGrad, maxEpochs=maxEpochs, batchSize=100, \n",
|
||
" logError=True, validate=validate, valStep=1, lamb=lamb)\n",
|
||
"\n",
|
||
" xx, yy = np.meshgrid(np.arange(-1.5, 1.5, 0.02),\n",
|
||
" np.arange(-1.5, 1.5, 0.02))\n",
|
||
" l = len(xx.ravel())\n",
|
||
" C = powerme(xx.reshape(l, 1),yy.reshape(l, 1), n)\n",
|
||
" z = classifyBi(thetaBest, C).reshape(int(np.sqrt(l)), int(np.sqrt(l)))\n",
|
||
"\n",
|
||
"    plt.contour(xx, yy, z, levels=[0.5], linewidths=3);\n",
|
||
" plt.ylim(-1,1.2);\n",
|
||
" plt.xlim(-1,1.2);\n",
|
||
" plt.legend();\n",
|
||
" plt.subplot(122)\n",
|
||
" plt.plot(err[0],err[1], lw=3, label=\"Training error\")\n",
|
||
" if validate > 0:\n",
|
||
" plt.plot(err[2],err[3], lw=3, label=\"Validation error\");\n",
|
||
" plt.legend()\n",
|
||
" plt.ylim(0.2,0.8);"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 59,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<ipython-input-6-09634685a32b>:5: RuntimeWarning: overflow encountered in exp\n",
|
||
" y = 1.0/(1.0 + np.exp(-x))\n",
|
||
"<ipython-input-58-f0220c89a5e3>:19: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
||
" plt.contour(xx, yy, z, levels=[0.5], lw=3);\n",
|
||
"No handles with labels found to put in legend.\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 1152x576 with 2 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"draw_regularization_example(X, Y)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Regularization\n",
"\n",
"Regularization is a method of preventing overfitting by suitably modifying the cost function.\n",
"\n",
"A special term (the **regularization term**, marked in red in the formulas below) is added to the cost function as a “penalty” for extreme values of the parameters $\\theta$.\n",
"\n",
"As a result, vectors $\\theta$ with smaller parameter values are preferred, since they incur a smaller penalty and thus a lower cost.\n",
"\n",
"How strong should the regularization be? This is controlled by the choice of the **regularization parameter** $\\lambda$."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Regularization for linear regression: cost function\n",
|
||
"\n",
|
||
"$$\n",
|
||
"J(\\theta) \\, = \\, \\dfrac{1}{2m} \\left( \\displaystyle\\sum_{i=1}^{m} \\left( h_\\theta(x^{(i)}) - y^{(i)} \\right)^2 \\color{red}{ + \\lambda \\displaystyle\\sum_{j=1}^{n} \\theta^2_j } \\right)\n",
|
||
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
"* $\\lambda$: the regularization parameter\n",
"* if $\\lambda$ is too small, the model overfits\n",
"* if $\\lambda$ is too large, the model underfits"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Regularyzacja dla regresji liniowej – gradient\n",
|
||
"\n",
|
||
"$$\\small\n",
|
||
"\\begin{array}{llll}\n",
|
||
"\\dfrac{\\partial J(\\theta)}{\\partial \\theta_0} &=& \\dfrac{1}{m}\\displaystyle\\sum_{i=1}^m \\left( h_{\\theta}(x^{(i)})-y^{(i)} \\right) x^{(i)}_0 & \\textrm{dla $j = 0$ }\\\\\n",
|
||
"\\dfrac{\\partial J(\\theta)}{\\partial \\theta_j} &=& \\dfrac{1}{m}\\displaystyle\\sum_{i=1}^m \\left( h_{\\theta}(x^{(i)})-y^{(i)} \\right) x^{(i)}_j \\color{red}{+ \\dfrac{\\lambda}{m}\\theta_j} & \\textrm{dla $j = 1, 2, \\ldots, n $} \\\\\n",
|
||
"\\end{array} \n",
|
||
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Regularyzacja dla regresji logistycznej – funkcja kosztu\n",
|
||
"\n",
|
||
"$$\n",
|
||
"\\begin{array}{rtl}\n",
|
||
"J(\\theta) & = & -\\dfrac{1}{m} \\left( \\displaystyle\\sum_{i=1}^{m} y^{(i)} \\log h_\\theta(x^{(i)}) + \\left( 1-y^{(i)} \\right) \\log \\left( 1-h_\\theta(x^{(i)}) \\right) \\right) \\\\\n",
|
||
"& & \\color{red}{ + \\dfrac{\\lambda}{2m} \\displaystyle\\sum_{j=1}^{n} \\theta^2_j } \\\\\n",
|
||
"\\end{array}\n",
|
||
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Regularyzacja dla regresji logistycznej – gradient\n",
|
||
"\n",
|
||
"$$\\small\n",
|
||
"\\begin{array}{llll}\n",
|
||
"\\dfrac{\\partial J(\\theta)}{\\partial \\theta_0} &=& \\dfrac{1}{m}\\displaystyle\\sum_{i=1}^m \\left( h_{\\theta}(x^{(i)})-y^{(i)} \\right) x^{(i)}_0 & \\textrm{dla $j = 0$ }\\\\\n",
|
||
"\\dfrac{\\partial J(\\theta)}{\\partial \\theta_j} &=& \\dfrac{1}{m}\\displaystyle\\sum_{i=1}^m \\left( h_{\\theta}(x^{(i)})-y^{(i)} \\right) x^{(i)}_j \\color{red}{+ \\dfrac{\\lambda}{m}\\theta_j} & \\textrm{dla $j = 1, 2, \\ldots, n $} \\\\\n",
|
||
"\\end{array} \n",
|
||
"$$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Implementacja metody regularyzacji"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 60,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def J_(h,theta,X,y,lamb=0):\n",
|
||
" \"\"\"Funkcja kosztu z regularyzacją\"\"\"\n",
|
||
" m = float(len(y))\n",
|
||
" f = h(theta, X, eps=10**-7)\n",
|
||
" j = 1.0/m \\\n",
|
||
" * -np.sum(np.multiply(y, np.log(f)) + \n",
|
||
" np.multiply(1 - y, np.log(1 - f)), axis=0) \\\n",
|
||
" + lamb/(2*m) * np.sum(np.power(theta[1:] ,2))\n",
|
||
" return j\n",
|
||
"\n",
|
||
"def dJ_(h,theta,X,y,lamb=0):\n",
|
||
" \"\"\"Gradient funkcji kosztu z regularyzacją\"\"\"\n",
|
||
" m = float(y.shape[0])\n",
|
||
" g = 1.0/y.shape[0]*(X.T*(h(theta,X)-y))\n",
|
||
" g[1:] += lamb/m * theta[1:]\n",
|
||
" return g"
|
||
]
|
||
},
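{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"One way to gain confidence in a regularized cost and gradient pair is to compare the analytic gradient with finite differences. The sketch below is a self-contained illustration on plain NumPy arrays rather than the `np.matrix` objects used above; the names `sigmoid`, `cost_reg`, `grad_reg` and the toy data are assumptions made for this example only.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def cost_reg(theta, X, y, lamb=0.0, eps=1e-7):\n",
"    # Regularized cross-entropy cost; the bias theta[0] is not penalized\n",
"    m = len(y)\n",
"    p = np.clip(sigmoid(X @ theta), eps, 1 - eps)\n",
"    data_term = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))\n",
"    return data_term + lamb / (2 * m) * np.sum(theta[1:] ** 2)\n",
"\n",
"def grad_reg(theta, X, y, lamb=0.0):\n",
"    # Analytic gradient matching cost_reg\n",
"    m = len(y)\n",
"    g = X.T @ (sigmoid(X @ theta) - y) / m\n",
"    g[1:] += lamb / m * theta[1:]\n",
"    return g\n",
"\n",
"# Hypothetical toy data: a bias column plus two random features\n",
"rng = np.random.default_rng(0)\n",
"X_toy = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])\n",
"y_toy = (X_toy[:, 1] + X_toy[:, 2] > 0).astype(float)\n",
"theta_toy = np.array([0.1, -0.2, 0.3])\n",
"\n",
"# Central finite differences, one coordinate direction at a time\n",
"h_step = 1e-6\n",
"num_grad = np.array([\n",
"    (cost_reg(theta_toy + h_step * e, X_toy, y_toy, lamb=0.5)\n",
"     - cost_reg(theta_toy - h_step * e, X_toy, y_toy, lamb=0.5)) / (2 * h_step)\n",
"    for e in np.eye(3)\n",
"])\n",
"ana_grad = grad_reg(theta_toy, X_toy, y_toy, lamb=0.5)\n",
"print(np.allclose(num_grad, ana_grad, atol=1e-6))\n",
"```\n",
"\n",
"If the two gradients agree, the penalty term and its derivative are consistent with each other; note that the bias `theta[0]` is left out of the penalty in both functions."
]
},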
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 65,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"slider_lambda = widgets.FloatSlider(min=0.0, max=0.5, step=0.005, value=0.01, description=r'$\\lambda$', width=300)\n",
|
||
"\n",
|
||
"def slide_regularization_example_2(lamb):\n",
|
||
" draw_regularization_example(X, Y, lamb=lamb)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 66,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"application/vnd.jupyter.widget-view+json": {
|
||
"model_id": "9a01a44941e544cb9df51b38c035da62",
|
||
"version_major": 2,
|
||
"version_minor": 0
|
||
},
|
||
"text/plain": [
|
||
"interactive(children=(FloatSlider(value=0.01, description='$\\\\lambda$', max=0.5, step=0.005), Button(descripti…"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"<function __main__.slide_regularization_example_2(lamb)>"
|
||
]
|
||
},
|
||
"execution_count": 66,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"widgets.interact_manual(slide_regularization_example_2, lamb=slider_lambda)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 67,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def cost_lambda_fun(lamb):\n",
|
||
" \"\"\"Koszt w zależności od parametru regularyzacji lambda\"\"\"\n",
|
||
" theta = np.matrix(np.zeros(X.shape[1])).reshape(X.shape[1],1)\n",
|
||
" thetaBest, err = SGD(h, J, dJ, theta, X, Y, alpha=1, adaGrad=True, maxEpochs=2500, batchSize=100, \n",
|
||
" logError=True, validate=0.25, valStep=1, lamb=lamb)\n",
|
||
" return err[1][-1], err[3][-1]\n",
|
||
"\n",
|
||
"def plot_cost_lambda():\n",
|
||
" \"\"\"Wykres kosztu w zależności od parametru regularyzacji lambda\"\"\"\n",
|
||
" plt.figure(figsize=(16,8))\n",
|
||
" ax = plt.subplot(111)\n",
|
||
" Lambda = np.arange(0.0, 1.0, 0.01)\n",
|
||
" Costs = [cost_lambda_fun(lamb) for lamb in Lambda]\n",
|
||
" CostTrain = [cost[0] for cost in Costs]\n",
|
||
" CostCV = [cost[1] for cost in Costs]\n",
|
||
" plt.plot(Lambda, CostTrain, lw=3, label='training error')\n",
|
||
" plt.plot(Lambda, CostCV, lw=3, label='validation error')\n",
|
||
" ax.set_xlabel(r'$\\lambda$')\n",
|
||
" ax.set_ylabel(u'cost')\n",
|
||
" plt.legend()\n",
|
||
" plt.ylim(0.2,0.8)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 68,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 1152x576 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"plot_cost_lambda()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## 3.6. Krzywa uczenia się"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"* Krzywa uczenia pozwala sprawdzić, czy uczenie przebiega poprawnie.\n",
|
||
"* Krzywa uczenia to wykres zależności między wielkością zbioru treningowego a wartością funkcji kosztu.\n",
|
||
"* Wraz ze wzrostem wielkości zbioru treningowego wartość funkcji kosztu na zbiorze treningowym rośnie.\n",
|
||
"* Wraz ze wzrostem wielkości zbioru treningowego wartość funkcji kosztu na zbiorze walidacyjnym maleje."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 69,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def cost_trainsetsize_fun(m):\n",
|
||
" \"\"\"Koszt w zależności od wielkości zbioru uczącego\"\"\"\n",
|
||
" theta = np.matrix(np.zeros(X.shape[1])).reshape(X.shape[1],1)\n",
|
||
" thetaBest, err = SGD(h, J, dJ, theta, X, Y, alpha=1, adaGrad=True, maxEpochs=2500, batchSize=100, \n",
|
||
" logError=True, validate=0.25, valStep=1, lamb=0.01, trainsetsize=m)\n",
|
||
" return err[1][-1], err[3][-1]\n",
|
||
"\n",
|
||
"def plot_learning_curve():\n",
|
||
" \"\"\"Wykres krzywej uczenia się\"\"\"\n",
|
||
" plt.figure(figsize=(16,8))\n",
|
||
" ax = plt.subplot(111)\n",
|
||
" M = np.arange(0.3, 1.0, 0.05)\n",
|
||
" Costs = [cost_trainsetsize_fun(m) for m in M]\n",
|
||
" CostTrain = [cost[0] for cost in Costs]\n",
|
||
" CostCV = [cost[1] for cost in Costs]\n",
|
||
" plt.plot(M, CostTrain, lw=3, label='training error')\n",
|
||
" plt.plot(M, CostCV, lw=3, label='validation error')\n",
|
||
" ax.set_xlabel(u'trainset size')\n",
|
||
" ax.set_ylabel(u'cost')\n",
|
||
" plt.legend()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Krzywa uczenia a obciążenie i wariancja\n",
|
||
"\n",
|
||
"Wykreślenie krzywej uczenia pomaga diagnozować nadmierne i niedostateczne dopasowanie:\n",
|
||
"\n",
|
||
"<img width=\"100%\" src=\"learning-curves.png\"/>\n",
|
||
"\n",
|
||
"Źródło: http://www.ritchieng.com/machinelearning-learning-curve"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 70,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "\n",
|
||
"text/plain": [
|
||
"<Figure size 1152x576 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {
|
||
"needs_background": "light"
|
||
},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"plot_learning_curve()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## 3.7. Warianty metody gradientu prostego\n",
|
||
"\n",
|
||
"* Batch gradient descent\n",
|
||
"* Stochastic gradient descent\n",
|
||
"* Mini-batch gradient descent"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### _Batch gradient descent_\n",
|
||
"\n",
|
||
"* Klasyczna wersja metody gradientu prostego\n",
|
||
"* Obliczamy gradient funkcji kosztu względem całego zbioru treningowego:\n",
|
||
" $$ \\theta := \\theta - \\alpha \\cdot \\nabla_\\theta J(\\theta) $$\n",
|
||
"* Dlatego może działać bardzo powoli\n",
|
||
"* Nie można dodawać nowych przykładów na bieżąco w trakcie trenowania modelu (*online learning*)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### *Stochastic gradient descent* (SGD)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Algorytm\n",
|
||
"\n",
|
||
"Powtórz określoną liczbę razy (liczba epok):\n",
|
||
" 1. Randomizuj dane treningowe\n",
|
||
" 1. Powtórz dla każdego przykładu $i = 1, 2, \\ldots, m$:\n",
|
||
" $$ \\theta := \\theta - \\alpha \\cdot \\nabla_\\theta \\, J \\! \\left( \\theta, x^{(i)}, y^{(i)} \\right) $$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"**Randomizacja danych** to losowe potasowanie przykładów uczących (wraz z odpowiedziami)."
|
||
]
|
||
},
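{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The SGD loop described above can be sketched in a few lines. This is a minimal illustration for a least-squares cost on plain NumPy arrays, not the `SGD` function used elsewhere in this notebook; the name `sgd_linear`, the toy data and the hyperparameter values are assumptions chosen for the example.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sgd_linear(X, y, alpha=0.05, epochs=50, seed=0):\n",
"    # Plain SGD for least squares: one training example per update\n",
"    rng = np.random.default_rng(seed)\n",
"    m, n = X.shape\n",
"    theta = np.zeros(n)\n",
"    for _ in range(epochs):\n",
"        order = rng.permutation(m)   # shuffle example indices, so inputs and targets stay paired\n",
"        for i in order:\n",
"            error = X[i] @ theta - y[i]\n",
"            theta -= alpha * error * X[i]   # gradient of 0.5 * error**2 for example i\n",
"    return theta\n",
"\n",
"# Hypothetical noiseless data generated from y = 1 + 2x\n",
"x = np.linspace(-1, 1, 40)\n",
"X_sgd = np.column_stack([np.ones_like(x), x])\n",
"y_sgd = 1.0 + 2.0 * x\n",
"theta_sgd = sgd_linear(X_sgd, y_sgd)\n",
"print(theta_sgd)\n",
"```\n",
"\n",
"Shuffling a permutation of indices, rather than the arrays themselves, is one simple way to keep every example paired with its target."
]
},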
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### SGD - zalety\n",
|
||
"\n",
|
||
"* Dużo szybszy niż _batch gradient descent_\n",
|
||
"* Można dodawać nowe przykłady na bieżąco w trakcie trenowania (*online learning*)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### SGD\n",
|
||
"\n",
|
||
"* Częsta aktualizacja parametrów z dużą wariancją:\n",
|
||
"\n",
|
||
"<img src=\"http://ruder.io/content/images/2016/09/sgd_fluctuation.png\" style=\"margin: auto;\" width=\"50%\" />\n",
|
||
"\n",
|
||
"* Z jednej strony dzięki temu nie utyka w złych minimach lokalnych, ale z drugiej strony może „wyskoczyć” z dobrego minimum"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### _Mini-batch gradient descent_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Algorytm\n",
|
||
"\n",
|
||
"1. Ustal rozmiar \"paczki/wsadu\" (*batch*) $b \\leq m$.\n",
|
||
"2. Powtórz określoną liczbę razy (liczba epok):\n",
|
||
" 1. Powtórz dla każdego batcha (czyli dla $i = 1, 1 + b, 1 + 2 b, \\ldots$):\n",
|
||
" $$ \\theta := \\theta - \\alpha \\cdot \\nabla_\\theta \\, J \\left( \\theta, x^{(i : i+b)}, y^{(i : i+b)} \\right) $$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### _Mini-batch gradient descent_\n",
|
||
"\n",
|
||
"* Kompromis między _batch gradient descent_ i SGD\n",
|
||
"* Stabilniejsza zbieżność dzięki redukcji wariancji aktualizacji parametrów\n",
|
||
"* Szybszy niż klasyczny _batch gradient descent_\n",
|
||
"* Typowa wielkość batcha: między kilka a kilkaset przykładów\n",
|
||
" * Im większy batch, tym bliżej do BGD; im mniejszy batch, tym bliżej do SGD\n",
|
||
" * BGD i SGD można traktować jako odmiany MBGD dla $b = m$ i $b = 1$"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 71,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "skip"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Mini-batch gradient descent - przykładowa implementacja\n",
|
||
"\n",
|
||
"def MiniBatchSGD(h, fJ, fdJ, theta, X, y, \n",
|
||
" alpha=0.001, maxEpochs=1.0, batchSize=100, \n",
|
||
" logError=True):\n",
|
||
" errorsX, errorsY = [], []\n",
|
||
" \n",
|
||
" m, n = X.shape\n",
|
||
" start, end = 0, batchSize\n",
|
||
" \n",
|
||
" maxSteps = (m * float(maxEpochs)) / batchSize\n",
|
||
" for i in range(int(maxSteps)):\n",
|
||
" XBatch, yBatch = X[start:end,:], y[start:end,:]\n",
|
||
"\n",
|
||
" theta = theta - alpha * fdJ(h, theta, XBatch, yBatch)\n",
|
||
" \n",
|
||
" if logError:\n",
|
||
" errorsX.append(float(i*batchSize)/m)\n",
|
||
" errorsY.append(fJ(h, theta, XBatch, yBatch).item())\n",
|
||
" \n",
|
||
" if start + batchSize < m:\n",
|
||
" start += batchSize\n",
|
||
" else:\n",
|
||
" start = 0\n",
|
||
" end = min(start + batchSize, m)\n",
|
||
" \n",
|
||
" return theta, (errorsX, errorsY)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Porównanie uśrednionych krzywych uczenia na przykładzie klasyfikacji dwuklasowej zbioru [MNIST](https://en.wikipedia.org/wiki/MNIST_database):\n",
|
||
"\n",
|
||
"<img src=\"sgd-comparison.png\" width=\"70%\" />"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Wady klasycznej metody gradientu prostego, czyli dlaczego potrzebujemy optymalizacji\n",
|
||
"\n",
|
||
"* Trudno dobrać właściwą szybkość uczenia (*learning rate*)\n",
|
||
"* Jedna ustalona wartość stałej uczenia się dla wszystkich parametrów\n",
|
||
"* Funkcja kosztu dla sieci neuronowych nie jest wypukła, więc uczenie może utknąć w złym minimum lokalnym lub punkcie siodłowym"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## 3.8. Algorytmy optymalizacji metody gradientu\n",
|
||
"\n",
|
||
"* Momentum\n",
|
||
"* Nesterov Accelerated Gradient\n",
|
||
"* Adagrad\n",
|
||
"* Adadelta\n",
|
||
"* RMSprop\n",
|
||
"* Adam\n",
|
||
"* Nadam\n",
|
||
"* AMSGrad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Momentum\n",
|
||
"\n",
|
||
"* SGD źle radzi sobie w „wąwozach” funkcji kosztu\n",
|
||
"* Momentum rozwiązuje ten problem przez dodanie współczynnika $\\gamma$, który można trakować jako „pęd” spadającej piłki:\n",
|
||
" $$ v_t := \\gamma \\, v_{t-1} + \\alpha \\, \\nabla_\\theta J(\\theta) $$\n",
|
||
" $$ \\theta := \\theta - v_t $$"
|
||
]
|
||
},
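{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The Momentum update can be tried out on a small quadratic “ravine”. The sketch below is only an illustration: the test function, the names `grad_ravine` and `momentum_descent`, and the values of `alpha`, `gamma` and `steps` are assumptions, not part of the lecture code.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def grad_ravine(theta):\n",
"    # Gradient of J(theta) = 0.5 * (theta[0]**2 + 25 * theta[1]**2)\n",
"    return np.array([theta[0], 25.0 * theta[1]])\n",
"\n",
"def momentum_descent(theta0, alpha=0.02, gamma=0.9, steps=200):\n",
"    theta = np.array(theta0, dtype=float)\n",
"    v = np.zeros_like(theta)\n",
"    for _ in range(steps):\n",
"        v = gamma * v + alpha * grad_ravine(theta)   # accumulate velocity\n",
"        theta = theta - v\n",
"    return theta\n",
"\n",
"theta_plain = momentum_descent([5.0, 1.0], gamma=0.0)   # gamma = 0 reduces to plain gradient descent\n",
"theta_mom = momentum_descent([5.0, 1.0], gamma=0.9)\n",
"print(np.linalg.norm(theta_plain), np.linalg.norm(theta_mom))\n",
"```\n",
"\n",
"With the same learning rate and number of steps, the run with momentum ends closer to the minimum at the origin, because velocity builds up along the shallow direction of the ravine."
]
},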
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Przyspiesony gradient Nesterova (*Nesterov Accelerated Gradient*, NAG)\n",
|
||
"\n",
|
||
"* Momentum czasami powoduje niekontrolowane rozpędzanie się piłki, przez co staje się „mniej sterowna”\n",
|
||
"* Nesterov do piłki posiadającej pęd dodaje „hamulec”, który spowalnia piłkę przed wzniesieniem:\n",
|
||
" $$ v_t := \\gamma \\, v_{t-1} + \\alpha \\, \\nabla_\\theta J(\\theta - \\gamma \\, v_{t-1}) $$\n",
|
||
" $$ \\theta := \\theta - v_t $$"
|
||
]
|
||
},
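{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The only change relative to Momentum is where the gradient is evaluated: at the look-ahead point instead of the current parameters. A minimal sketch, assuming a hypothetical quadratic test function and illustrative hyperparameters (`grad_ravine` and `nag_descent` are names invented for this example):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def grad_ravine(theta):\n",
"    # Gradient of J(theta) = 0.5 * (theta[0]**2 + 25 * theta[1]**2)\n",
"    return np.array([theta[0], 25.0 * theta[1]])\n",
"\n",
"def nag_descent(theta0, alpha=0.02, gamma=0.9, steps=200):\n",
"    theta = np.array(theta0, dtype=float)\n",
"    v = np.zeros_like(theta)\n",
"    for _ in range(steps):\n",
"        lookahead = theta - gamma * v   # where the current velocity alone would take us\n",
"        v = gamma * v + alpha * grad_ravine(lookahead)\n",
"        theta = theta - v\n",
"    return theta\n",
"\n",
"theta_nag = nag_descent([5.0, 1.0])\n",
"print(theta_nag)\n",
"```\n",
"\n",
"Evaluating the gradient at `lookahead` lets the update react to the slope ahead of the ball, which is the “brake” mentioned above."
]
},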
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Adagrad\n",
|
||
"\n",
|
||
"* “<b>Ada</b>ptive <b>grad</b>ient”\n",
|
||
"* Adagrad dostosowuje współczynnik uczenia (*learning rate*) do parametrów: zmniejsza go dla cech występujących częściej, a zwiększa dla występujących rzadziej:\n",
|
||
"* Świetny do trenowania na rzadkich (*sparse*) zbiorach danych\n",
|
||
"* Wada: współczynnik uczenia może czasami gwałtownie maleć\n",
|
||
"* Wyniki badań pokazują, że często **starannie** dobrane $\\alpha$ daje lepsze wyniki na zbiorze testowym"
|
||
]
|
||
},
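{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The slide above does not give the update rule, so the sketch below uses the commonly published form of Adagrad: each parameter's step is divided by the square root of the running sum of its squared gradients. The test function, the names `grad_ravine` and `adagrad_descent`, and all hyperparameter values are assumptions made for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def grad_ravine(theta):\n",
"    # Gradient of J(theta) = 0.5 * (theta[0]**2 + 25 * theta[1]**2)\n",
"    return np.array([theta[0], 25.0 * theta[1]])\n",
"\n",
"def adagrad_descent(grad_fn, theta0, alpha=0.5, steps=300, eps=1e-8):\n",
"    theta = np.array(theta0, dtype=float)\n",
"    g_sq_sum = np.zeros_like(theta)   # running sum of squared gradients, per parameter\n",
"    for _ in range(steps):\n",
"        g = grad_fn(theta)\n",
"        g_sq_sum += g ** 2\n",
"        theta -= alpha * g / (np.sqrt(g_sq_sum) + eps)\n",
"    effective_lr = alpha / (np.sqrt(g_sq_sum) + eps)\n",
"    return theta, effective_lr\n",
"\n",
"theta_ada, effective_lr = adagrad_descent(grad_ravine, [5.0, 1.0])\n",
"print(theta_ada, effective_lr)\n",
"```\n",
"\n",
"Because the sum of squared gradients never decreases, the effective learning rate can only shrink over time, and it shrinks most for the parameter that has seen the largest gradients."
]
},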
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Adadelta i RMSprop\n",
|
||
"* Warianty algorytmu Adagrad, które radzą sobie z problemem gwałtownych zmian współczynnika uczenia"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Adam\n",
|
||
"\n",
|
||
"* “<b>Ada</b>ptive <b>m</b>oment estimation”\n",
|
||
"* Łączy zalety algorytmów RMSprop i Momentum\n",
|
||
"* Można go porównać do piłki mającej ciężar i opór\n",
|
||
"* Obecnie jeden z najpopularniejszych algorytmów optymalizacji"
|
||
]
|
||
},
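{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"As a rough illustration of how Adam combines a running mean of gradients with a running mean of squared gradients, here is a sketch of the commonly published update with bias correction. The slide does not spell out the formula, so the details below, the hypothetical test function and the hyperparameter values (`beta1`, `beta2`, `eps`, `alpha`) should be read as assumptions.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def cost_ravine(theta):\n",
"    # Hypothetical test function J(theta) = 0.5 * (theta[0]**2 + 25 * theta[1]**2)\n",
"    return 0.5 * (theta[0] ** 2 + 25.0 * theta[1] ** 2)\n",
"\n",
"def grad_ravine(theta):\n",
"    return np.array([theta[0], 25.0 * theta[1]])\n",
"\n",
"def adam_descent(grad_fn, theta0, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):\n",
"    theta = np.array(theta0, dtype=float)\n",
"    m = np.zeros_like(theta)   # running mean of gradients (the Momentum-like part)\n",
"    v = np.zeros_like(theta)   # running mean of squared gradients (the RMSprop-like part)\n",
"    for t in range(1, steps + 1):\n",
"        g = grad_fn(theta)\n",
"        m = beta1 * m + (1 - beta1) * g\n",
"        v = beta2 * v + (1 - beta2) * g ** 2\n",
"        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero initialization\n",
"        v_hat = v / (1 - beta2 ** t)\n",
"        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)\n",
"    return theta\n",
"\n",
"theta_start = np.array([5.0, 1.0])\n",
"theta_adam = adam_descent(grad_ravine, theta_start)\n",
"print(cost_ravine(theta_start), cost_ravine(theta_adam))\n",
"```\n",
"\n",
"The division by the square root of `v_hat` gives every parameter its own step size, while `m_hat` smooths the direction of the update."
]
},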
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Nadam\n",
|
||
"* “<b>N</b>esterov-accelerated <b>ada</b>ptive <b>m</b>oment estimation”\n",
|
||
"* Łączy zalety algorytmów Adam i Nesterov Accelerated Gradient"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### AMSGrad\n",
|
||
"* Wariant algorytmu Adam lepiej dostosowany do zadań takich jak rozpoznawanie obiektów czy tłumaczenie maszynowe"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"contours_evaluation_optimizers.gif\" style=\"margin: auto;\" width=\"80%\" />"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"<img src=\"saddle_point_evaluation_optimizers.gif\" style=\"margin: auto;\" width=\"80%\" />"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## 3.9. Metody zbiorcze"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
" * **Metody zbiorcze** (*ensemble methods*) używają połączonych sił wielu modeli uczenia maszynowego w celu uzyskania lepszej skuteczności niż mogłaby być osiągnięta przez każdy z tych modeli z osobna."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
" * Na metodę zbiorczą składa się:\n",
|
||
" * dobór modeli\n",
|
||
" * sposób agregacji wyników"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
" * Warto zastosować randomizację, czyli przetasować zbiór uczący przed trenowaniem każdego modelu."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Uśrednianie prawdopodobieństw"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Przykład\n",
|
||
"\n",
|
||
"Mamy 3 modele, które dla klas $c=1, 2, 3, 4, 5$ zwróciły prawdopodobieństwa:\n",
|
||
"\n",
|
||
"* $M_1$: [0.10, 0.40, **0.50**, 0.00, 0.00]\n",
|
||
"* $M_2$: [0.10, **0.60**, 0.20, 0.00, 0.10]\n",
|
||
"* $M_3$: [0.10, 0.30, **0.40**, 0.00, 0.20]\n",
|
||
"\n",
|
||
"Która klasa zostanie wybrana według średnich prawdopodobieństw dla każdej klasy?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Średnie prawdopodobieństwo: [0.10, **0.43**, 0.36, 0.00, 0.10]\n",
|
||
"\n",
|
||
"Została wybrana klasa $c = 2$"
|
||
]
|
||
},
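{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The averaging step from this example can be reproduced with NumPy. The array `probs` simply restates the three probability vectors given above; the variable names are chosen for this sketch only.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# One row per model, one column per class c = 1..5\n",
"probs = np.array([\n",
"    [0.10, 0.40, 0.50, 0.00, 0.00],   # M_1\n",
"    [0.10, 0.60, 0.20, 0.00, 0.10],   # M_2\n",
"    [0.10, 0.30, 0.40, 0.00, 0.20],   # M_3\n",
"])\n",
"\n",
"mean_probs = probs.mean(axis=0)                 # average over models, per class\n",
"chosen_class = int(np.argmax(mean_probs)) + 1   # classes are numbered from 1\n",
"print(np.round(mean_probs, 2), chosen_class)\n",
"```\n",
"\n",
"Averaging keeps the information about how confident each model was, which is why a single very confident model can outweigh two moderately confident ones."
]
},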
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Głosowanie klas"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Przykład\n",
|
||
"\n",
|
||
"Mamy 3 modele, które dla klas $c=1, 2, 3, 4, 5$ zwróciły prawdopodobieństwa:\n",
|
||
"\n",
|
||
"* $M_1$: [0.10, 0.40, **0.50**, 0.00, 0.00]\n",
|
||
"* $M_2$: [0.10, **0.60**, 0.20, 0.00, 0.10]\n",
|
||
"* $M_3$: [0.10, 0.30, **0.40**, 0.00, 0.20]\n",
|
||
"\n",
|
||
"Która klasa zostanie wybrana według głosowania?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "subslide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Liczba głosów: [0, 1, **2**, 0, 0]\n",
|
||
"\n",
|
||
"Została wybrana klasa $c = 3$"
|
||
]
|
||
},
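{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Voting on the same data can be sketched as follows: each model casts one vote for its most probable class, and the class with the most votes wins. The variable names are assumptions made for this example, and ties are not handled specially (`np.argmax` returns the first maximum).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# One row per model, one column per class c = 1..5\n",
"probs = np.array([\n",
"    [0.10, 0.40, 0.50, 0.00, 0.00],   # M_1\n",
"    [0.10, 0.60, 0.20, 0.00, 0.10],   # M_2\n",
"    [0.10, 0.30, 0.40, 0.00, 0.20],   # M_3\n",
"])\n",
"\n",
"votes_per_model = probs.argmax(axis=1)   # index of the winning class for each model\n",
"vote_counts = np.bincount(votes_per_model, minlength=probs.shape[1])\n",
"voted_class = int(np.argmax(vote_counts)) + 1   # classes are numbered from 1\n",
"print(vote_counts, voted_class)\n",
"```\n",
"\n",
"Unlike averaging, voting ignores how confident each model was, so the two aggregation rules can disagree on the same predictions, as they do here."
]
},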
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Inne metody zbiorcze\n",
|
||
"\n",
|
||
" * Bagging\n",
|
||
" * Boostng\n",
|
||
" * Stacking\n",
|
||
" \n",
|
||
"https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"celltoolbar": "Slideshow",
|
||
"kernelspec": {
|
||
"display_name": "Python 3",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.8.3"
|
||
},
|
||
"livereveal": {
|
||
"start_slideshow_at": "selected",
|
||
"theme": "white"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 4
|
||
}
|