zuma/wyk/3b_Przegląd_metod_uczenia_nadzorowanego.ipynb

2915 lines
632 KiB
Plaintext
Raw Normal View History

2022-04-23 07:59:14 +02:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### AITech — Uczenie maszynowe\n",
"# 8. Przegląd metod uczenia nadzorowanego"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 8.1. Naiwny klasyfikator bayesowski"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Naiwny klasyfikator bayesowski jest algorytmem dla problemu klasyfikacji wieloklasowej.\n",
"* Naszym celem jest znalezienie funkcji uczącej $f \\colon x \\mapsto y$, gdzie $y$ oznacza jedną ze zdefiniowanych wcześniej klas.\n",
"* Klasyfikacja probabilistyczna polega na wskazaniu klasy o najwyższym prawdopodobieństwie:\n",
"$$ \\hat{y} = \\mathop{\\arg \\max}_y P( y \\,|\\, x ) $$\n",
"* Naiwny klasyfikator bayesowski należy do rodziny klasyfikatorów probabilistycznych"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"<img style=\"float: right;\" src=\"https://upload.wikimedia.org/wikipedia/commons/d/d4/Thomas_Bayes.gif\">\n",
"\n",
"**Thomas Bayes** (wymowa: /beɪz/) (17021761) angielski matematyk i duchowny"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Twierdzenie Bayesa wzór ogólny\n",
"\n",
"$$ P( Y \\,|\\, X ) = \\frac{ P( X \\,|\\, Y ) \\cdot P( Y ) }{ P ( X ) } $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Twierdzenie Bayesa opisuje związek między prawdopodobieństwami warunkowymi dwóch zdarzeń warunkujących się nawzajem."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Twierdzenie Bayesa\n",
"(po zastosowaniu wzoru na prawdopodobieństwo całkowite)\n",
"\n",
"$$ \\underbrace{P( y_k \\,|\\, x )}_\\textrm{ prawd. a posteriori } = \\frac{ \\overbrace{ P( x \\,|\\, y_k )}^\\textrm{ model klasy } \\cdot \\overbrace{P( y_k )}^\\textrm{ prawd. a priori } }{ \\underbrace{\\sum_{i} P( x \\,|\\, y_i ) \\, P( y_i )}_\\textrm{wyrażenie normalizacyjne} } $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
" * W tym przypadku „zdarzenie $x$” oznacza, że cechy wejściowe danej obserwacji przyjmują wartości opisane wektorem $x$.\n",
" * „Zdarzenie $y_k$” oznacza, że dana obserwacja należy do klasy $y_k$.\n",
" * **Model klasy** $y_k$ opisuje rozkład prawdopodobieństwa cech obserwacji należących do tej klasy.\n",
" * **Prawdopodobieństwo *a priori*** to prawdopodobienstwo, że losowa obserwacja należy do klasy $y_k$.\n",
" * **Prawdopodobieństwo *a posteriori*** to prawdopodobieństwo, którego szukamy: że obserwacja opisana wektorem cech $x$ należy do klasy $y_k$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Rola wyrażenia normalizacyjnego w twierdzeniu Bayesa\n",
"\n",
" * Wartość wyrażenia normalizacyjnego nie wpływa na wynik klasyfikacji.\n",
"\n",
"**Przykład**: obserwacja nietypowa ma małe prawdopodobieństwo względem dowolnej klasy, wyrażenie normalizacyjne sprawia, że to prawdopodobieństwo staje się porównywalne z prawdopodobieństwami typowych obserwacji, ale nie wpływa na klasyfikację!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Klasyfikatory dyskryminatywne a generatywne\n",
"\n",
"* Klasyfikatory generatywne tworzą model rozkładu prawdopodobieństwa dla każdej z klas.\n",
"* Klasyfikatory dyskryminatywne wyznaczają granicę klas (*decision boundary*) bezpośrednio.\n",
"* Naiwny klasyfikator bayesowski jest klasyfikatorem generatywnym (ponieważ wyznacza $P( x \\,|\\, y )$).\n",
"* Wszystkie klasyfikatory generatywne są probabilistyczne, ale nie na odwrót.\n",
"* Regresja logistyczna jest przykładem klasyfikatora dyskryminatywnego."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Założenie niezależności dla naiwnego klasyfikatora bayesowskiego\n",
"\n",
"* Naiwny klasyfikator bayesowski jest *naiwny*, ponieważ zakłada, że poszczególne cechy są niezależne od siebie:\n",
"$$ P( x_1, \\ldots, x_n \\,|\\, y ) \\,=\\, \\prod_{i=1}^n P( x_i \\,|\\, x_1, \\ldots, x_{i-1}, y ) \\,=\\, \\prod_{i=1}^n P( x_i \\,|\\, y ) $$\n",
"* To założenie jest bardzo przydatne ze względów obliczeniowych, ponieważ bardzo często mamy do czynienia z ogromną liczbą cech (bitmapy, słowniki itp.)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Naiwny klasyfikator bayesowski przykład"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Przydtne importy\n",
"\n",
"import ipywidgets as widgets\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# Wczytanie danych (gatunki kosaćców)\n",
"\n",
"data_iris = pandas.read_csv('iris.csv')\n",
"data_iris_setosa = pandas.DataFrame()\n",
"data_iris_setosa['dł. płatka'] = data_iris['pl'] # \"pl\" oznacza \"petal length\"\n",
"data_iris_setosa['szer. płatka'] = data_iris['pw'] # \"pw\" oznacza \"petal width\"\n",
"data_iris_setosa['Iris setosa?'] = data_iris['Gatunek'].apply(lambda x: 1 if x=='Iris-setosa' else 0)\n",
"\n",
"m, n_plus_1 = data_iris_setosa.values.shape\n",
"n = n_plus_1 - 1\n",
"Xn = data_iris_setosa.values[:, 0:n].reshape(m, n)\n",
"\n",
"X = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
"Y = np.matrix(data_iris_setosa.values[:, 2]).reshape(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"liczba przykładów: {0: 100, 1: 50}\n",
"prior probability: {0: 0.6666666666666666, 1: 0.3333333333333333}\n"
]
}
],
"source": [
"classes = [0, 1]\n",
"count = [sum(1 if y == c else 0 for y in Y.T.tolist()[0]) for c in classes]\n",
"prior_prob = [float(count[c]) / float(Y.shape[0]) for c in classes]\n",
"\n",
"print('liczba przykładów: ', {c: count[c] for c in classes})\n",
"print('prior probability:', {c: prior_prob[c] for c in classes})"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Wykres danych (wersja macierzowa)\n",
"def plot_data_for_classification(X, Y, xlabel, ylabel): \n",
" fig = plt.figure(figsize=(16*.6, 9*.6))\n",
" ax = fig.add_subplot(111)\n",
" fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
" X = X.tolist()\n",
" Y = Y.tolist()\n",
" X1n = [x[1] for x, y in zip(X, Y) if y[0] == 0]\n",
" X1p = [x[1] for x, y in zip(X, Y) if y[0] == 1]\n",
" X2n = [x[2] for x, y in zip(X, Y) if y[0] == 0]\n",
" X2p = [x[2] for x, y in zip(X, Y) if y[0] == 1]\n",
" ax.scatter(X1n, X2n, c='r', marker='x', s=50, label='Dane')\n",
" ax.scatter(X1p, X2p, c='g', marker='o', s=50, label='Dane')\n",
" \n",
" ax.set_xlabel(xlabel)\n",
" ax.set_ylabel(ylabel)\n",
" ax.margins(.05, .05)\n",
" return fig"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFkCAYAAAD13eXtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dfZRddX3v8c83DyBNhqIkNkCI4SGiQHXIRMSKBXwGLEymQEBbsWWZ0mLXiHYF4q1etbeGlXuv0+GWaila5F7FEBkCC7H4AG3h3lJJQkAiDwElJQ2IoMgkspLMOd/7xz6HOTNzztk7c/Zvn332eb/W2mtmP5zf/u7fZHG+7P3bv6+5uwAAABDOjHYHAAAAUHQkXAAAAIGRcAEAAARGwgUAABAYCRcAAEBgJFwAAACBzWp3APtr3rx5vnjx4naHAQAAMMGmTZued/f59fZ1XMK1ePFibdy4sd1hAAAATGBm2xvt45EiAABAYCRcAAAAgZFwAQAABEbCBQAAEBgJFwAAQGAkXAAAAIGRcAEAAARGwgUAABBYsITLzI40s7vN7BEz22pmg3WOOd3MfmVmWyrLZ0LFAwBAEO7SLbdEP5NsD3WOLOLAtIW8wzUm6ZPu/kZJp0i6zMyOr3PcPe7eW1k+HzAeAADSt2GDNDAgXX75eFLjHq0PDET7szhHFnFg2oKV9nH3ZyQ9U/l91MwekXSEpB+HOicAAJnr75cGB6Xh4Wh9aChKcoaHo+39/dmdI3QcmDbzDG4xmtliSf8q6UR3f6lm++mSbpa0Q9JOSX/h7lubtbVs2TKnliIAIFeqd5KqyY4UJTlDQ5JZdufIIg40ZGab3H1Z3X2hEy4zmyvpXyT9tbuPTNp3sKSyu+8ys7MkDbv7kjptrJS0UpIWLVrUt317w9qQAAC0h7s0o2akTrmcfpKT5BxZxIG6miVcQd9SNLPZiu5gfX1ysiVJ7v6Su++q/H6HpNlmNq/Ocde6+zJ3XzZ//vyQIQMAsP+qd5Zq1Y6lyuocWcSBaQn5lqJJ+oqkR9z9iw2OWVA5TmZ2ciWeF0LFBABA6mof4w0ORneUqmOp0kp2kpwjizgwbcEGzUt6u6Q/lPQjM9tS2fYpSYskyd2/LOk8SX9qZmOSXpZ0oWcxqAwAgLRs2DCe5FTHSg0NRfuGh6XTTpOWLw9/jurvIePAtGUyaD5NDJoHAOSKe5QQ9fdPHCvVaHuoc0jh40BTbR00nzYSLgAAkEdtGzQPAAAAEi4AAIDgSLgAANnrlLp/5bJ0xRXRzyTbgQZIuAAA2euUun+rV0tr10p9fePJVbkcra9dG+0HEiDhAgBkr7Y2YDXpymPdvzVrpN5eacuW8aSrry9a7+2N9gMJhJyHCwCA+ibPEVWt/Ze3un8zZkibNo0nWTNnRtt7e6PtM7hvgWSYFgIA0D6dUvevXB5PtiSpVCLZwhRMCwEAyJ9OqftXfYxYq3ZMF5AACRcAIHudUvdv8pitUmnqmC4gARIuAED2GtUGrCZdeXpLsZpsVcdsbdo0nnTxliISYgwXACB7WdQfTEO5HCVVa9ZMHWtWbzu6GrUUAQAAAmPQPAAAQBuRcAEAshdX2qdcji/9k0YbWVxLkvPkpY0iyVt/uHtHLX19fQ4A6HAjI1HKNDjoXi5H28rlaF1yX7Wq+f6RkXTayOJakpwnL20USRv6Q9JGb5C/tD2B2t+FhAsACqD2i6/6hVi7Xio1318up9NGFteS5Dx5aaNI2tAfJFwAgPyp/QKsLo3uRtTbn1YbWVxLJ7VRJBn3R7OEi7cUAQDt4zGlfeL2p9VGGtI4T17aKJIM+4O3FAEA+eMxpX3i9qfVRhrSOE9e2iiSPPVHo1tfeV14pAgABcAYrny2USSM4SLhAoCux1uK+WyjSHhLkYQLALpeuRx94U2+y1DdXio131+9w9VqG1lcS9K7U3loo0ja0B/NEi4GzQMAAKSAQfMAAABtRMIFAAAQGAkXAACNeAr1+NJoo9sUsM9IuAAAaGTDBmlgoP7cXgMD0f4s2ug2BeyzWe0OAACA3OrvlwYHpeHhaH1oKPrSHx6Otvf3Z9NGtylgn/GWIgAAzVTvrFS//KXoS39oKHmJmDTa6DYd2GfN3lIk4QIAII5T47AtOqzPmBYCAIDpqt5pqUWNw/AK1mckXAAANFL7WGtwMLrDUh1blPTLP402uk0B+4xB8wAANLJhw/iXfnXs0NBQtG94WDrtNGn58vBtdJsC9hljuAAAaMQ9+vLv7584dqjR9lBtdJsO7TMGzQMAAATGoHkAAIA2IuECAAAIjIQLAFBMSerxxR1TLrfeBvUWJ+qma61BwgUAKKYk9fjijlm9uvU2qLc4UTdday1376ilr6/PAQCIVS67Dw5G96AGB+uvxx1TKrXeRrmcTqxFUeBrlbTRG+QvbU+g9nch4QIAJFb7ZV5dJn+pxx2TRhtpxVoUBb3WZgkX00IAAIrNE9TjizsmjTbSirUoCnitTAsBAOhOnqAeX9wxabSRVqxF0U3XWtXo1ldeFx4pAgASYQxXPhX4WsUYLgBA1xkZmfolXvvlPjISf8yqVa23MTKSTqxFUeBrbZZwMYYLAFBMnqAen9T8mHPPlW69tbU2qLc4UYGvlVqKAAAAgTFoHgAAoI1IuAAAAAILlnCZ2ZFmdreZPWJmW81ssM4xZmZXm9kTZvaQmS0NFQ8AICWeQf3BJG0ge3F/t7T+LlmdJ0Mh73CNSfqku79R0imSLjOz4ycdc6akJZVlpaQvBYwHAJCGLOoPJmkD2cuqDmIR6y02en0x7UXSrZLeM2nb30u6qGb9MUmHNWuHaSEAoM2ymLsqSRvIXlZzaHXoXF1q9zxckhZL+g9JB0/afrukU2vWfyBpWbO2SLgAIAeyqD9Y0Hp7HS+rv0sH/v2bJVzBp4Uws7mS/kXSX7v7yKR935a0xt3vraz/QNIqd9806biVih45atGiRX3bt28PGjMAIAHPoP5gkjaQvaz+Lh3292/btBBmNlvSzZK+PjnZqtgh6cia9YWSdk4+yN2vdfdl7r5s/vz5YYIFACRXHU9TK+36g0naQPay+rsU7e/f6NZXq4skk3SDpL9pcszZkr5TOfYUST+Ma5dHigDQZozh6l6M4WpK7RjDJelUSS7pIUlbKstZki6VdKmPJ2XXSHpS0o8UM37LSbgAoP2yqD+YpA1kL6s6iB1ab7FZwkVpHwDA/vGYWnhp1B9M0kaOx/IUVtzfPq2/S1bnSRm1FAEAAAKjliIAAEAbkXABAAAERsIFAEiXJ6iDVy5LV1wR/azVaPt0z9NN6I9cI+ECAKQrSR281aultWulvr7x5KpcjtbXro32p3GebkJ/5Fuj1xfzujAtBADkXJI5lEol997eaFtvb/31NM7TTeiPthPTQgAAMlW9szI8PL5tcFAaGhp/nb96R2vLlvFjenulTZsmlnNp9TzdhP5oK6aFAABkzxPUwSuXpZkzx9dLpeTJ1v6cp5vQH23DtBAAgGxV77TUmlwHr3qHq1btmK60ztNN6I/cIuECAKSr9rHW4GCUQA0ORuvVL//ax4m9vdGdrd7eaD1p0pXkPN2E/si3RoO78rowaB4Aci5JHbxqrcTaAfK1A+dXrUrnPN2E/mg7MWgeAJAZT1AHzz2a+mHNmqnjjeptn+55umnsEv3RdgyaBwAACIxB8wAAAG1EwgUAGFcqScuXRz8bbS9SWZ64aymVWo8zjWvNqr/y8ncpokaDu/K6MGgeAALq748GWM+b5z42Fm0bG4vWpWh/kQa8x11LtT9aiTONa82qv/Lyd+lQajJovu0J1P4uJFwAEFBtclVNuiavF6ksT9y1jI21Hmca15pVf+Xl79KhSLgAAMnVJlnVpfaOl/vExKS6JE22qmq/zKtLO77U464ljTjz0kaezlNAzRIu3lIEAExVKkmzZo2vj41NLMEjFassT9y1pBFnXtrI03kKhrcUAQDJlUrSggUTty1YMHEgfZHK8sRdSxpx5qWNPJ2n2zS69ZXXhUeKABAQY7gYw5WHv0uHEmO4AACJ8JY
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"średnia: [matrix([[1. , 4.906, 1.676]]), matrix([[1. , 1.464, 0.244]])]\n",
"odchylenie standardowe: [matrix([[0. , 0.8214402 , 0.42263933]]), matrix([[0. , 0.17176728, 0.10613199]])]\n",
"(1, 3)\n"
]
}
],
"source": [
"XY = np.column_stack((X, Y))\n",
"XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]\n",
"X_split = [XY_split[c][:,0:3] for c in classes]\n",
"Y_split = [XY_split[c][:,3] for c in classes]\n",
"\n",
"X_mean = [np.mean(X_split[c], axis=0) for c in classes]\n",
"X_std = [np.std(X_split[c], axis=0) for c in classes]\n",
"print('średnia: ', X_mean) \n",
"print('odchylenie standardowe: ', X_std)\n",
"\n",
"print(X_std[0].shape)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Rysowanie średnich\n",
"def draw_means(fig, means, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):\n",
" class_color = {0: 'r', 1: 'g'}\n",
" classes = range(len(means))\n",
" ax = fig.axes[0]\n",
" mean_x1 = [means[c].item(0, 1) for c in classes]\n",
" mean_x2 = [means[c].item(0, 2) for c in classes]\n",
" for c in classes:\n",
" ax.plot([mean_x1[c], mean_x1[c]], [xmin, xmax],\n",
" color=class_color.get(c, 'c'), linestyle='dashed')\n",
" ax.plot([ymin, ymax], [mean_x2[c], mean_x2[c]],\n",
" color=class_color.get(c, 'c'), linestyle='dashed') "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"from scipy.stats import norm\n",
"\n",
"# Prawdopodobieństwo klasy dla pojedynczej cechy\n",
"# Uwaga: jeżeli odchylenie standardowe dla danej cechy jest równe 0, \n",
"# to nie można określić prawdopodbieństwa klasy!\n",
"def prob(x, c, feature, mean, std):\n",
" sd = std[c].item(0, feature)\n",
" if sd == 0:\n",
" print('Nie można określić prawdopodobieństwa klasy dla cechy {}.!'.format(feature))\n",
" return norm(mean[c].item(0, feature), sd).pdf(x)\n",
"\n",
"# Prawdopodobieństwo klasy\n",
"# Uwaga: tu bierzemy iloczyn dwóch cech (1. i 2.), w ogólności może być ich więcej\n",
"def class_prob(x, c, mean, std, features=[1, 2]):\n",
" result = 1\n",
" for feature in features:\n",
" result *= prob(x[feature], c, feature, mean, std)\n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(1, 3)\n",
"[matrix([[0. , 0.8214402 , 0.42263933]]), matrix([[0. , 0.17176728, 0.10613199]])]\n",
"[matrix([[1. , 4.906, 1.676]]), matrix([[1. , 1.464, 0.244]])]\n",
"[[1.57003335e-06 1.61965173e-23 3.09005273e-08]]\n"
]
}
],
"source": [
"print(X_std[0].shape)\n",
"print(X_std)\n",
"print(X_mean)\n",
"\n",
"X_prob_0=class_prob(X, 0, X_mean, X_std)\n",
"print(X_prob_0)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Wykres prawdopodobieństw klas\n",
"def plot_prob(fig, X_mean, X_std, classes, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):\n",
" class_color = {0: 'r', 1: 'g'}\n",
" ax = fig.axes[0]\n",
" x1, x2 = np.meshgrid(np.arange(xmin, xmax, 0.02),\n",
" np.arange(xmin, xmax, 0.02))\n",
" for c in classes:\n",
" fun1 = lambda x: prob(x, c, 1, X_mean, X_std)\n",
" fun2 = lambda x: prob(x, c, 2, X_mean, X_std)\n",
" p = fun1(x1) * fun2(x2)\n",
" plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),\n",
" colors=class_color.get(c, 'c'), lw=3)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-10-793ac8294852>:11: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd3gU5fYH8O8kkEIJLXQI1dAlQECaEgERBAQpYsH7w4blihQVBdHrtYEIigW86BWwN6pYERW9iAUQFFBAekAghJ4AKbvn98dh2JLdZLNtUr6f55kn2ZmdmXc3y+zhfc97xhAREBEREVHhRVjdACIiIqLiioEUERERkZ8YSBERERH5iYEUERERkZ8YSBERERH5iYEUERERkZ/KWN0AZ/Hx8dKwYUOrm0FERETkYv369ekiUt19fZEKpBo2bIh169ZZ3QwiIiIiF4Zh7PW0nkN7RERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH4KWSBlGEYzwzA2Oi2nDMMYF6rzEREREYVbyG4RIyLbACQBgGEYkQAOAFgSqvMRERERhVu4hvZ6AdgpIh7vU0NERERUHIUrkLoOwHthOhcRERFRWIQ8kDIMIwrA1QA+8rJ9tGEY6wzDWHfkyJFQN4eCLGVBClIWpFjdDCKyWkqKLkSlTMhypJz0A/CriBz2tFFEXgXwKgAkJydLGNpDQVQvrp7VTSCioqAerwVUOoUjkLoeHNYrsd4e8rbVTSCiouBtXguodArp0J5hGOUAXAFgcSjPQ0RERGSFkPZIicgZANVCeQ6y1rgvtDTYrL6zLG4JEVlq3PkygbN4LaDSJRxDe1SCbTy00eomEFFRsJHXAiqdeIsYIiIiIj8xkCIiIiLyEwMpIiIiIj8xR4oCklgt0eomEFFRkMhrAZVOhkjRqYGZnJws69ats7oZRERERC4Mw1gvIsnu6zm0R0REROQnBlIUkNHLR2P08tFWN4OIrDZ6tC5EpQxzpCgg249ut7oJRFQUbOe1gEon9kgRERER+YmBFBEREZGfGEgRERER+Yk5UhSQpFpJVjeBiIqCJF4LqHRiHSkiIiKiArCOFBEREVGQMZCigIxcPBIjF4+0uhlEZLWRI3UhKmWYI0UB2X9qv9VNIKKiYD+vBVQ6sUeKiIiIyE8MpIiIiIj8xECKiIiIyE/MkaKAdKnXxeomEFFR0IXXAiqdWEeKiIiIqACsI0VEREQUZAykKCBDPxyKoR8OtboZRGS1oUN1ISplmCNFATl65qjVTSCiouAorwVUOrFHioiIiMhPDKSIiIiI/MRAioiIiMhPzJGigPRq1MvqJhBRUdCL1wIqnVhHioiIiKgArCNFREREFGQMpCgg/d7ph37v9LO6GURktX79dCEqZZgjRQE5m3PW6iYQUVFwltcCKp1C2iNlGEZlwzAWGoax1TCMPw3D4F0tiYiIqMQIdY/UCwC+EJFhhmFEASgX4vMRERERhU3IAinDMOIAXAZgFACISDaA7FCdj4iIiCjcQtkj1RjAEQDzDcNoC2A9gLEikhnCc1KYDUgcYHUTiKgoGMBrAZVOIasjZRhGMoCfAHQTkZ8Nw3gBwCkRecTteaMBjAaAhISEDnv37g1Je4iIiIj8ZUUdqf0A9ovIz+cfLwTQ3v1JIvKqiCSLSHL16tVD2BwiIiKi4ApZICUihwCkGobR7PyqXgD+CNX5yBopC1KQsiDF6mYQkdVSUnQhKmVCPWtvDIB3zs/Y2wXg5hCfj4iIiChsQhpIichGAHnGE4mIiIhKAt4ihoiIiMhPDKSIiIiI/MR77VFArm11rdVNIKKi4FpeC6h0ClkdKX8kJyfLunXrrG4GERERkQsr6khRKXAm5wzO5JyxuhlEZLUzZ3QhKmU4tEcBueqdqwAAq0atsrYhRGStq/RagFWrLG0GUbixR4qIiIjITwykiIiIiPzEQIqIiIjITwykiIiIiPzEZHMKyKikUVY3gYiKglGjrG4BkSVYR4qIiIioAKwjRSGRfiYd6WfSrW4GEVktPV0XolKGQ3sUkGEfDgPAOlJEpd4wvRawjhSVNuyRIiIiIvITAykiIiIiPzGQIiIiIvITAykiIiIiPzHZnAJyV/JdVjeBiIqCu3gtoNKJgRQFZETrEVY3gYiKghG8FlDpxKE9CkjqyVSknky1uhlEZLXUVF2IShn2SFFAblpyEwDWkSIq9W7SawHrSFFpwx4pIiIiIj8xkCIiIiLyEwMpIiIiIj8xkCIiIiLyE5PNKSD3dbnP6iYQUVFwH68FVDoxkKKADGw20OomEFFRMJDXAiqdOLRHAdmWvg3b0rdZ3Qwistq2bboQlTLskaKA3PHJHQBYR4qo1LtDrwWsI0WlDXukiIiIiPzEQIqIiIjITwykiIiIiPzEQIqIiIjITyFNNjcMYw+A0wBsAHJFJDmU56Pwm3LZFKubQERFwRReC6h0CsesvctFJD0M5yEL9G7c2+omEFFR0JvXAiqdOLRHAdl4aCM2HtpodTOIyGobN+pCVMqEukdKAKwwDEMAzBWRV92fYBjGaACjASAhISHEzaFgG/fFOACsI0VU6o3TawHrSFFpE+oeqW4i0h5APwD/NAzjMvcniMirIpIsIsnVq1cPcXOIiIiIgiekgZSI/H3+ZxqAJQA6hfJ8REREROEUskDKMIzyhmFUNH8H0AfA5lCdj4iIiCjcQpkjVRPAEsMwzPO8KyJfhPB8RERERGEVskBKRHYBaBuq41PR8HSvp61uAhEVBU/zWkClUzjqSFEJ1rV+V6ubQERFQVdeC6h0Yh0pCsia1DVYk7rG6mYQkdXWrNGFqJRhjxQFZPLXkwGwjhRRqTdZrwWsI0WlDXukiIiIiPzEQIqIiIjITwykiIiIiPzEQIqIiIjIT0w2p4DM6jvL6iYQUVEwi9cCKp0YSFFAkmolWd0EIioKkngtoNKJQ3sUkJW7VmLlrpVWN4OIrLZypS5EpQx7pCggT37/JACgd+PeFreEiCz1pF4L0JvXAipd2CNFRERE5CcGUkRERER+YiBFRERE5CcGUkRERER+YrI5BWTugLlWN4GIioK5vBZQ6cRAigLSLL6Z1U0goqKgGa8FVDpxaI8CsnzbcizfttzqZhCR1ZYv14WolGGPFAVk5o8zAQADmw20uCVEZKmZei3AQF4LqHRhjxQRERGRnxhIEREREfmJQ3shlJaZhlk/zcLhjMNWNyVktqVvAwDcuuxWi1sSOmUjy2JU0ih0rtfZ6qYQEVERw0DKg5/3/4ynVz+No2eOBnScTWmbkJmdidoVawepZUXPsXPHAAArdq2wuCWhc/LcScxdPxfJdZIRHRnt93HKRJTBDW1uwK3tbkVkRGQQW0hERFYxRMS3JxpGDQAx5mMR2RfsxiQnJ8u6deu8bs/MzsS53HMFHkcgWLp1KZ754RmcOHeiUG0QERw9exQ1y9dE6xqtC7Wvu9oVa2PKpVNKdImA1JOpAID6lepb3JLQycjOwIw1M7B63+qAjpOWmYZNaZsQFx2HqMgon/erF1cPj6c8jq71u/q8T4QRgcoxlWEYhj9NJSq8VL0WoH7JvRZQ6WYYxnoRSc6zvqBAyjCMqwHMBFAHQBqABgD+FJFWwW5k84uby7xP5+VZLyL4cMuHmLNuDnLtuT4fr3O9zmhfq32h25FQKQF3d7wbFaMrFnpfIm9EBEu2LsHXu772fR8IVu5aib+O/VXo813W4DJM6j4JcdFx+T6vRvkaaFq1aaGPT0RUmgQSSP0GoCeAlSLSzjCMywFcLyKjg97IOobgDs/bIowI3Jx0M9rWbOvTsRpXaYyrLrqK/yMPsQ82fwAAGNF6hMUtKbmybdlY+MfCQg01nzh3Ai/98hKOnDni0/NvaHMDejfqne9zDMNASsMUNKzc0Od2UCnygV4LMILXAiqZAgmk1olI8vmAqp2I2A3D+EVEOgW7kYltEuXlpS973NakShM0qdok2KekAKUsSAEArBq1ytJ2UF6nsk7
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"draw_means(fig, X_mean)\n",
"plot_prob(fig, X_mean, X_std, classes)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Prawdopodobieństwo a posteriori\n",
"def posterior_prob(x, c):\n",
" normalizer = sum(class_prob(x, c, X_mean, X_std)\n",
" * prior_prob[c]\n",
" for c in classes)\n",
" return (class_prob(x, c, X_mean, X_std) \n",
" * prior_prob[c]\n",
" / normalizer)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Aby teraz przewidzieć klasę $y$ dla dowolnego zestawu cech $x$, wystarczy sprawdzić, dla której klasy prawdopodobieństwo *a posteriori* jest większe:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"# Funkcja klasyfikująca (funkcja predykcji)\n",
"def predict_class(x):\n",
" p = [posterior_prob(x, c) for c in classes]\n",
" if p[1] > p[0]:\n",
" return 1\n",
" else:\n",
" return 0"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"0\n"
]
}
],
"source": [
"x = [1, 2.0, 0.5] # długość płatka: 2.0, szerokość płatka: 0.5\n",
"y = predict_class(x)\n",
"print(y) # 1 To prawdopodobnie jest Iris setosa\n",
"\n",
"x = [1, 2.5, 1.0] # długość płatka: 2.5, szerokość płatka: 1.0\n",
"y = predict_class(x)\n",
"print(y) # 0 To prawdopodobnie nie jest Iris setosa"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Zobaczmy, jak to wygląda na wykresie. Narysujemy w tym celu granicę między klasą 1 a 0:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"# Wykres granicy klas dla naiwnego Bayesa\n",
"def plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):\n",
" ax = fig.axes[0]\n",
" x1, x2 = np.meshgrid(np.arange(xmin, xmax, 0.02),\n",
" np.arange(ymin, ymax, 0.02))\n",
" p = [posterior_prob([1, x1, x2], c) for c in classes]\n",
" p_diff = p[1] - p[0]\n",
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5QcdZn/8c8zt0wyM2EyJGRIAkkgEETWhDAiArILKCK4SQwIuuqKi7K7hx+JoIBR/K2iLi7uEnFl3eUXFVRW0Vy4CKuyK4quIiYYESFAhEQw9/vkMpfufn5/VDfTM+lbqqf6Nu/XOX0mVdVV9e1KDvPhW08/Ze4uAAAAHL66cg8AAACgWhGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAIKSGcg8g3fjx433atGnlHgYAAMAgq1ev3u7uE4aur6ggNW3aNK1atarcwwAAABjEzDZkWs+tPQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQUqRByszazWyZma01s2fN7I1Rng8AAKCUGiI+/u2SfuDul5pZk6QxEZ8PAACgZCILUmY2VtI5kq6QJHfvk9QX1fkAAABKLcpbe8dJ2ibp62b2GzNbamYtQ99kZleZ2SozW7Vt27YIhwMAADC8ogxSDZLmSPqKu58qab+kjw19k7vf6e5d7t41YcKECIcDAAAwvKIMUq9IesXdf5VcXqYgWAEAANSEyIKUu2+W9LKZzUyuOl/SM1GdDwAAoNSi/tbeNZLuSX5j70VJH4j4fAAAACUTaZBy9zWSuqI8BwAAQLnQ2RwAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkBqiPLiZrZfULSkuKebuXVGeDwAAoJQiDVJJ57r79hKcBwAAoKS4tQcAABBS1EHKJf3IzFab2VURnwsAAKCkor61d5a7bzSzoyQ9YmZr3f2x9DckA9ZVknTsscdGPBwAAIDhE+mMlLtvTP7cKmmlpNMzvOdOd+9y964JEyZEORwAAIBhFVmQMrMWM2tL/VnSBZKejup8AAAApRblrb2JklaaWeo8/+nuP4jwfAAAACUVWZBy9xclzYrq+AAAAOVG+wMAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQmoo9I1mdpSk5tSyu/8xkhEBAABUibwzUmY218xekPSSpJ9KWi/pvwo9gZnVm9lvzOz7oUcJAABQgQq5tfcZSWdIet7dp0s6X9L/HsY5Fkl6NsTYAAAAKlohQarf3XdIqjOzOnd/VNLsQg5uZlMkXSxpaRFjBAAAqEiF1EjtNrNWSY9JusfMtkqKFXj8L0q6QVJbyPEBAABUrEJmpOZJOijpWkk/kPQHSW/Pt5OZvV3SVndfned9V5nZKjNbtW3btgKGAwAAUBkKCVLvcve4u8fc/W53/5KCWaZ8zpI018zWS/qOpPPM7FtD3+Tud7p7l7t3TZgw4bAGDwAAUE6FBKlLzew9qQUzu0NS3sTj7ovdfYq7T5P0Lkk/dvf3hh4pAABAhSmkRmqBpAfMLCHpbZJ2uvvV0Q4LAACg8mUNUmbWkbb4QUn3KWh7cLOZdbj7zkJP4u4/kfSTkGMEAACoSLlmpFZLckmW9vPi5MslHRf56AAAACpY1iCVbL4pM2t29570bWbWnHkvAACAkaOQYvNfFLgOAABgRMlVI9UpabKk0WZ2qoJbe5I0VtKYEowNAACgouWqkXqrpCskTZF0W9r6bkkfj3BMAAAAVSFXjdTdku42s0vcfXkJxwQAAFAV8vaRcvflZnaxpNdKak5bf3OUAwMAAKh0eYvNzezfJV0u6RoFdVLvlDQ14nEBAABUvEK+tXemu/+1pF3u/mlJb5R0TLTDAgAAqHyFBKmDyZ8HzGySpH5J06MbEgAAQHUo5Fl73zezdklfkPSkgq7mSyMdFQAAQBUoJEh9XlJrsuj8+5Ka3X1PxOMCAACoeLkaci5I/nGKpAVm9qW0bXL3FVEPDgAAoJLlmpH6y7Q/b5b0JUmPJJddEkEKAACMaLkacn4gfdnMLnX3ZdEPCQAAoDoU0kfqyORtvU+Y2Wozu93MjizB2AAAACpaIe0PviNpm6QFki5N/vneKAcFAABQDQr51l6Hu38mbfmzZjY/qgEBAABUi0JmpB41s3eZWV3ydZmkh6IeGAAAQKUrJEj9raT/lNSbfH1H0nVm1m1me6McHAAAQCXLe2vP3dtKMRAAAIBqU8iMFAAAADIgSAEAAIREkAIAAAiJIAUAABBSqCBlZt8f7oEAAABUm5xByszqzewLGTZ9KKLxAAAAVI2cQcrd45JOMzMbsn5TpKMCAACoAoU8IuY3ku43s+9J2p9a6e4rIhsVAABAFSjoWXuSdkg6L22dSyJIAQCAEa2QzuYfKMVAAAAAqk3eb+2Z2Ylm9j9m9nRy+XVmdlP0QwMAAKhshbQ/+H+SFkvqlyR3f0rSu6IcFAAAQDUoJEiNcfcnhqyLRTEYAACAalJIkNpuZscrKDCXmV0qifYHAABgxCvkW3tXS7pT0klm9idJL0l6b6SjAgAAqAKFfGvvRUlvNrMWSXXu3l3Igc2sWdJjkkYlz7PM3f+hmMECAABUkkK+tRc3s89LOpAKUWb2ZAHH7pV0nrvPkjRb0oVmdkZRowUAAKgghdRI/T75vh+ZWUdyneV4vyTJA/uSi43Jl4caJQAAQAUqJEjF3P0GBW0QfmZmp6nAQJR86PEaSVslPeLuv8rwnqvMbJWZrdq2bdvhjB0AAKCsCglSJknu/l1Jl0n6uqTjCjm4u8fdfbakKZJON7NTMrznTnfvcveuCRMmFD5yAACAMiskSH0w9Qd3/72ksyUtPJyTuPtuST+RdOHh7AcAAFDJCglSx5lZmyQlHw1zl6Sn8+1kZhPMrD3559GS3ixpbfihAgAAVJZCgtQn3b3bzM6W9FZJd0v6SgH7HS3pUTN7StKvFdRIfT/8UAEAACpLIQ0548mfF0v6irvfb2afyrdT8pl8pxYxNgAAgIpWyIzUn8zsPxQUmj9sZqMK3A8AAKCmFRKILpP0Q0kXJovGOyRdH+moAAAAqkAhj4g5IGlF2vIm8dBiAAAAbtEBAACERZACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgCgnNyllSuDn4WsH+7jJxLRnr/GEaQAACin++6TFiyQrr12ILS4B8sLFgTbozz+4sXRnr/G5X3WHgAAiND8+dKiRdLttwfLS5YEIeb224P18+dHe/xbbpF
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_decision_boundary_bayes(fig, X_mean, X_std)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Dla porównania: regresja logistyczna na tych samych danych"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"def powerme(x1,x2,n):\n",
" X = []\n",
" for m in range(n+1):\n",
" for i in range(m+1):\n",
" X.append(np.multiply(np.power(x1,i),np.power(x2,(m-i))))\n",
" return np.hstack(X)\n",
"\n",
"# Funkcja logistyczna\n",
"def safeSigmoid(x, eps=0):\n",
" y = 1.0/(1.0 + np.exp(-x))\n",
" if eps > 0:\n",
" y[y < eps] = eps\n",
" y[y > 1 - eps] = 1 - eps\n",
" return y\n",
"\n",
"# Funkcja hipotezy dla regresji logistycznej\n",
"def h(theta, X, eps=0.0):\n",
" return safeSigmoid(X*theta, eps)\n",
"\n",
"# Funkcja kosztu dla regresji logistycznej\n",
"def J(h,theta,X,y, lamb=0):\n",
" m = len(y)\n",
" f = h(theta, X, eps=10**-7)\n",
" j = -np.sum(np.multiply(y, np.log(f)) + \n",
" np.multiply(1 - y, np.log(1 - f)), axis=0)/m\n",
" if lamb > 0:\n",
" j += lamb/(2*m) * np.sum(np.power(theta[1:],2))\n",
" return j\n",
"\n",
"# Gradient funkcji kosztu\n",
"def dJ(h,theta,X,y,lamb=0):\n",
" g = 1.0/y.shape[0]*(X.T*(h(theta,X)-y))\n",
" if lamb > 0:\n",
" g[1:] += lamb/float(y.shape[0]) * theta[1:] \n",
" return g\n",
"\n",
"# Funkcja klasyfikująca\n",
"def classifyBi(theta, X):\n",
" prob = h(theta, X)\n",
" return prob"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# Przygotowanie danych dla wielomianowej regresji logistycznej\n",
"\n",
"data = np.matrix(data_iris_setosa)\n",
"\n",
"Xpl = powerme(data[:, 1], data[:, 0], n)\n",
"Ypl = np.matrix(data[:, 2]).reshape(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Metoda gradientu prostego dla regresji logistycznej\n",
"def GD(h, fJ, fdJ, theta, X, y, alpha=0.01, eps=10**-3, maxSteps=10000):\n",
" errorCurr = fJ(h, theta, X, y)\n",
" errors = [[errorCurr, theta]]\n",
" while True:\n",
" # oblicz nowe theta\n",
" theta = theta - alpha * fdJ(h, theta, X, y)\n",
" # raportuj poziom błędu\n",
" errorCurr, errorPrev = fJ(h, theta, X, y), errorCurr\n",
" # kryteria stopu\n",
" if abs(errorPrev - errorCurr) <= eps:\n",
" break\n",
" if len(errors) > maxSteps:\n",
" break\n",
" errors.append([errorCurr, theta]) \n",
" return theta, errors"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"theta = [[ 4.01960795]\n",
" [ 3.89499137]\n",
" [ 0.18747599]\n",
" [-1.3524039 ]\n",
" [-2.00123783]\n",
" [-0.87625505]]\n"
]
}
],
"source": [
"# Uruchomienie metody gradientu prostego dla regresji logistycznej\n",
"theta_start = np.matrix(np.zeros(Xpl.shape[1])).reshape(Xpl.shape[1], 1)\n",
"theta, errors = GD(h, J, dJ, theta_start, Xpl, Ypl, \n",
" alpha=0.1, eps=10**-7, maxSteps=100000)\n",
"print(r'theta = {}'.format(theta))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Wykres granicy klas\n",
"def plot_decision_boundary(fig, theta, Xpl, xmin=0.0, xmax=7.0):\n",
" ax = fig.axes[0]\n",
" xx, yy = np.meshgrid(np.arange(xmin, xmax, 0.02),\n",
" np.arange(xmin, xmax, 0.02))\n",
" l = len(xx.ravel())\n",
" C = powerme(yy.reshape(l, 1), xx.reshape(l, 1), n)\n",
" z = classifyBi(theta, C).reshape(int(np.sqrt(l)), int(np.sqrt(l)))\n",
"\n",
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5xVdb3/8fdnZkBgALnLPVSEUhMEUlGzzE4ZdhQpL2XndLc6pSTlrfp1OdWxtOTgyS5G15OVBYimnsxqRFQGYQAFBAHB4TLcBgSG21z2/vz+2Hs7M7BvrD1r9t4zr+fjsR8ze629vuuzV+Z8/H4/67PM3QUAAIATV5LvAAAAAIoViRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQGX5DqClAQMG+KhRo/IdBgAAQCtVVVW17j7w2O0FlUiNGjVKS5cuzXcYAAAArZhZdbLtLO0BAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQUKiJlJn1MbM5ZrbWzNaY2eQwzwcAANCeykIef5akv7r7B82sq6QeIZ8PAACg3YSWSJlZb0mXSPqYJLl7g6SGsM4HAADQ3sJc2jtN0m5JvzKz5WY228zKj/2Qmd1oZkvNbOnu3btDDAcAAKBthZlIlUmaIOkn7n6upEOS7jj2Q+7+gLtPcvdJAwcODDEcAACAthVmIrVV0lZ3Xxx/P0exxAoAAKBDCC2RcvcdkraY2dj4psskvRzW+QAAANpb2Hft3STpwfgdexslfTzk8wEAALSbUBMpd18haVKY5wAAAMgXOpsDAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAZWEObmavSaqTFJHU5O6TwjwfAABAewo1kYq71N1r2+E8AAAA7YqlPQAAgIDCTqRc0t/MrMrMbgz5XAAAAO0q7KW9i9y9xswGSXrKzNa6+zMtPxBPsG6UpJEjR4YcDgAAQNsJdUbK3WviP3dJeljSeUk+84C7T3L3SQMHDgwzHAAAgDYVWiJlZuVm1ivxu6T3SFoV1vkAAADaW5hLe6dIetjMEuf5vbv/NcTzAQAAtKvQEil33yhpXFjjAwAA5BvtDwAAAAIikQIAAAiIRAoAACAgEikAAICASKQAAAACIpECAAAIiEQKAAAgIBIpAACAgEikAAAAAiKRAgAACIhECgAAICASKQAAgIBIpAAAAAIikQIAAAiIRAoAACAgEikAAICASKQAAAACIpECAAAIiEQKAAAgIBIpAACAgEikAAAAAiKRAgAACIhECgAAICASKQAAgIBIpAAAAAIikQIAAAiIRAoAACAgEikAAICASKQAAAACIpECAAAIiEQKAAAgIBIpAACAgEikAAAAAiKRAgAACIhECgAAIKCybD9oZoMkdUu8d/fNoUQEAABQJDLOSJnZlWa2XtImSQskvSbp/7I9gZmVmtlyM3sscJQAAAAFKJulvW9LukDSOnc/VdJlkp47gXNMl7QmQGwAAAAFLZtEqtHd90gqMbMSd6+QND6bwc1suKQrJM3OIUYAAICClE2N1D4z6ynpGUkPmtkuSU1Zjv/fkm6T1CtgfAAAAAUrmxmpqyQdkXSLpL9KelXS+zMdZGbvl7TL3asyfO5GM1tqZkt3796dRTgAAACFIZtE6np3j7h7k7v/xt3vU2yWKZOLJF1pZq9J+qOkd5nZ7479kLs/4O6T3H3SwIEDTyh4AACAfMomkfqgmd2QeGNm90vKmPG4+53uPtzdR0m6XtI/3f0jgSMFAAAoMNnUSE2T9KiZRSW9T9Jed/98uGEBAAAUvpSJlJn1a/H2U5LmK9b24D/NrJ+77832JO7+tKSnA8YIAABQkNLNSFVJcknW4ucV8ZdLOi306AAAAApYykQq3nxTZtbN3Y+23Gdm3ZIfBQAA0HlkU2z+fJbbAAAAOpV0NVKDJQ2T1N3MzlVsaU+Sekvq0Q6xAQAAFLR0NVLvlfQxScMl3dtie52kr4QYEwAAQFFIVyP1G0m/MbMPuPvcdowJAACgKGTsI+Xuc83sCklnSerWYvt/hhkYAABAoctYbG5mP5V0naSbFKuTukbSm0KOCwAAoOBlc9fehe7+75Jed/dvSZosaUS4YQEAABS+bBKpI/Gfh81sqKRGSaeGFxIAAEBxyOZZe4+ZWR9J90haplhX89mhRgUAAFAEskmkviepZ7zo/DFJ3dx9f8hxAQAAFLx0DTmnxX8dLmmamd3XYp/cfV7YwQEAABSydDNS/9ri9x2S7pP0VPy9SyKRAgAAnVq6hpwfb/nezD7o7nPCDwkAAKA4ZNNHqn98We+rZlZlZrPMrH87xAYAAFDQsml/8EdJuyVNk/TB+O8PhRkUAABAMcjmrr1+7v7tFu+/Y2ZTwwoIAACgWGQzI1VhZtebWUn8da2kx8MODAAAoNBlk0h9RtLvJdXHX3+UNMPM6szsQJjBAQAAFLKMS3vu3qs9AgEAACg22cxIAQAAIAkSKQAAgIBIpAAAAAIikQIAAAgoUCJlZo+1dSAAAADFJm0iZWalZnZPkl2fDikeAACAopE2kXL3iKSJZmbHbN8ealQAAABFIJtHxCyX9IiZ/VnSocRGd58XWlQAAABFIKtn7UnaI+ldLba5JBIpAADQqWXT2fzj7REIAABAscl4156ZjTGzf5jZqvj7c8zsa+GHBgAAUNiyaX/wc0l3SmqUJHd/SdL1YQYFAABQDLJJpHq4+wvHbGsKIxgAAIBikk0iVWtmpytWYC4z+6Ak2h8AAIBOL5u79j4v6QFJbzazbZI2SfpIqFEBAAAUgWzu2tso6d1mVi6pxN3rshnYzLpJekbSSfHzzHH3b+QSLAAAQCHJ5q69iJl9T9LhRBJlZsuyGLte0rvcfZyk8ZIuN7MLcooWAACggGRTI7U6/rm/mVm/+DZL83lJksccjL/tEn95oCgBAAAKUDaJVJO736ZYG4SFZjZRWSZE8Ycer5C0S9JT7r44yWduNLOlZrZ09+7dJxI7AABAXmWTSJkkufufJF0r6VeSTstmcHePuPt4ScMlnWdmZyf5zAPuPsndJw0cODD7yAEAAPIsm0TqU4lf3H21pIsl3XwiJ3H3fZKelnT5iRwHAABQyLJJpE4zs16SFH80zK8lrcp0kJkNNLM+8d+7S3q3pLXBQwUAACgs2SRS/8/d68zsYknvlfQbST/J4rghkirM7CVJSxSrkXoseKgAAACFJZuGnJH4zysk/cTdHzGzb2Y6KP5MvnNziA0AAKCgZTMjtc3MfqZYofkTZnZSlscBAAB0aNkkRNdKelLS5fGi8X6Sbg01KgAAgCKQzSNiDkua1+L9dvHQYgAAAJboAAAAgiKRAgAACIhECgAAICASKQAAgIBIpAAAAAIikQIAAAiIRAoAACAgEikAAICASKQAAAACIpECACCf3KWHH479zGZ7W48fjYZ7/g6ORAoAgHyaP1+aNk265ZbmpMU99n7atNj+MMe/885wz9/BZXzWHgAACNHUqdL06dKsWbH3M2fGkphZs2L
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_decision_boundary(fig, theta, Xpl)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n",
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXxU5fU/8M/JRnZICBAIq2wuVBAoIlpbtVWLfgFpXVrbr7Za2v6sotS1td9utrZqodjaxdLFtrZq2bRqa2kbEIUgBJBFIKyBJIRsZF8mM/f8/pgZsjAbd+bmzkw+79drXmHunfvcM1c0x+c591xRVRARERHRuUuwOwAiIiKiWMVEioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSUykiIiIiExKsjuA7vLy8nTs2LF2h0FERETUQ3FxcY2qDum9PaoSqbFjx2Lbtm12h0FERETUg4iU+trOpT0iIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZBITKSIiIiKTmEgRERERmcREioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSUykiIiIiExiIkVERERkEhMpIiIiIpOYSBERERGZxESKiIiIyCQmUkREREQmMZEiIiIiMomJFBEREZFJTKSIiIiITGIiRURERGQSEykiIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZBITKSIiIiKTmEgRERERmcREioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSZYmUiIySERWish+EdknIpdZeT4iIiKivpRk8fjLAfxTVT8tIikA0i0+HxEREVGfsSyREpFsAFcCuBMAVNUBwGHV+YiIiIj6mpVLe+cBqAbwexHZISIrRCSj94dEZJGIbBORbdXV1RaGQ0RERBRZViZSSQCmA/ilql4CoAXAo70/pKrPq+pMVZ05ZMgQC8MhIiIiiiwrE6kyAGWqusXzfiXciRURERFRXLAskVLVSgAnRGSyZ9M1AD6w6nxEREREfc3qu/buBfCi5469IwC+YPH5iIiIiPqMpYmUqu4EMNPKcxARERHZhZ3NiYiIiExiIkVERERkEhMpIiIiIpOYSBERERGZxESKiIiIyCQmUkREREQmMZEiIiIiMomJFBEREZFJTKSIiIiITGIiRURERGQSEykiIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZBITKSIiIiKTmEgRERERmcREioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSUykiIiIiExiIkVERERkEhMpIiIiIpOYSBERERGZxESKiIiIyCQmUkREREQmMZEiIiIiMomJFBEREZFJTKSIiIiITGIiRURERGQSEykiIiIik5KsHFxEjgFoAuAC4FTVmVaej4iIiKgvWZpIeVylqjV9cB4iIiKiPsWlPSIiIiKTrE6kFMC/RKRYRBZZfC4iIiKiPmX10t7lqlohIkMBrBOR/ar6dvcPeBKsRQAwevRoi8MhIiIiihxLZ6RUtcLzswrAGgCzfHzmeVWdqaozhwwZYmU4RERERBFlWSIlIhkikuX9M4BrAeyx6nxEREREfc3Kpb1hANaIiPc8f1HVf1p4PiIiIqI+ZVkipapHAEy1anwiIiIiu7H9AREREZFJTKSIiIiITGIiRURERGQSEykiIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZBITKSIiIiKTmEgRERERmcREioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSUykiIiIiExiIkVERERkEhMpIiIiIpOYSBERERGZxESKiIiIyCQmUkREREQmMZEiIiIiMomJFBEREZFJTKSIiIiITGIiRURERGQSEykiIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZFJSqB8UkaEAUr3vVfW4JRERERERxYigM1IiMk9EDgI4CmADgGMA/hHqCUQkUUR2iMjrpqMkIiIiikKhLO19H8BsACWqOg7ANQDePYdzLAawz0RsRERERFEtlESqU1VrASSISIKqFgKYFsrgIjISwA0AVoQRIxEREVFUCqVGql5EMgG8DeBFEakC4Axx/J8CeBhAlsn4iIiIiKJWKDNS8wG0AXgAwD8BHAZwY7CDRORGAFWqWhzkc4tEZJuIbKuurg4hHCIiIqLoEEoidZuqulTVqaovqOqzcM8yBXM5gHkicgzASwCuFpE/9/6Qqj6vqjNVdeaQIUPOKXgiIiIiO4WSSH1aRG73vhGR5wAEzXhU9TFVHamqYwHcBuC/qvo505ESERERRZlQaqQWAnhNRAwAnwRQp6r3WBsWERERUfTzm0iJSG63t3cDWAt324PviUiuqtaFehJVXQ9gvckYiYiIiKJSoBmpYgAKQLr9vMHzUgDnWR4dERERURTzm0h5mm9CRFJVtb37PhFJ9X0UERERUf8RSrH5phC3EREREfUrgWqk8gEUAEgTkUvgXtoDgGwA6X0QGxEREVFUC1QjdR2AOwGMBLC02/YmAN+wMCYiIiKimBCoRuoFAC+IyKdUdVUfxkREREQUE4L2kVLVVSJyA4CLAKR22/49KwMjIiIiinZBi81F5FcAbgVwL9x1UjcDGGNxXERERERRL5S79uao6v8COK2q3wVwGYBR1oZFREREFP1CSaTaPD9bRWQEgE4A46wLiYiIiCg2hPKsvddFZBCApwFsh7ur+QpLoyIiIiKKAaEkUj8CkOkpOn8dQKqqNlgcFxEREVHUC9SQc6HnjyMBLBSRZ7vtg6qutjo4IiIiomgWaEbqf7r9uRLAswDWed4rACZSRERE1K8Fasj5he7vReTTqrrS+pCIiIiIYkMofaQGe5b1vikixSKyXEQG90FsRERERFEtlPYHLwGoBrAQwKc9f37ZyqCIiIiIYkEod+3lqur3u71/QkQWWBUQERERUawIZUaqUERuE5EEz+sWAG9YHRgRERFRtAslkfoygL8A6PC8XgKwRESaRKTRyuCIiIiIolnQpT1VzeqLQIiIiIhiTSgzUkRERETkAxMpIiIiIpOYSBERERGZxESKiIiIyCRTiZSIvB7pQIiIiIhiTcBESkQSReRpH7u+ZFE8RERERDEjYCKlqi4AM0REem0/aWlURERERDEglEfE7ADwqoj8DUCLd6OqrrYsKiIiIqIYENKz9gDUAri62zYFwESKiIiI+rVQOpt/oS8CISIiIoo1Qe/aE5FJIvIfEdnjeX+xiDxufWhERERE0S2U9ge/AfAYgE4AUNVdAG6zMigiIiKiWBBKIpWuqu/12ua0IhgiIiKiWBJKIlUjIuPhLjCHiHwaANsfEBERUb8Xyl179wB4HsD5IlIO4CiAz1kaFREREVEMCOWuvSMAPi4iGQASVLUplIFFJBXA2wAGeM6zUlW/HU6wRERERNEklLv2XCLyIwCt3iRKRLaHMHYHgKtVdSqAaQCuF5HZYUVLREREFEVCqZHa6/ncv0Qk17NNAnweAKBuzZ63yZ6XmoqSiIiIKAqFkkg5VfVhuNsgbBSRGQgxIfI89HgngCoA61R1i4/PLBKRbSKyrbq6+lxiJyIiIrJVKImUAICqvgLgFgC/B3BeKIOrqktVpwEYCWCWiEzx8ZnnVXWmqs4cMmRI6JETERER2SyUROpu7x9UdS+AKwDcdy4nUdV6AOsBXH8uxxERERFFs1ASqfNEJAsAPI+G+QOAPcEOEpEhIjLI8+c0AB8HsN98qERERETRJZRE6luq2iQiVwC4DsALAH4ZwnHDARSKyC4AW+GukXrdfKhERERE0SWUhpwuz88bAPxSVV8Vke8EO8jzTL5LwoiNiIiIKKqFMiNVLiK/hrvQ/E0RGRDicUR
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_decision_boundary(fig, theta, Xpl)\n",
"plot_decision_boundary_bayes(fig, X_mean, X_std)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Inny przykład"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# Wczytanie danych (gatunki kosaćców)\n",
"\n",
"data_iris = pandas.read_csv('iris.csv')\n",
"data_iris_versicolor = pandas.DataFrame()\n",
"data_iris_versicolor['dł. płatka'] = data_iris['pl'] # \"pl\" oznacza \"petal length\"\n",
"data_iris_versicolor['szer. płatka'] = data_iris['pw'] # \"pw\" oznacza \"petal width\"\n",
"data_iris_versicolor['Iris versicolor?'] = data_iris['Gatunek'].apply(lambda x: 1 if x=='Iris-versicolor' else 0)\n",
"\n",
"m, n_plus_1 = data_iris_versicolor.values.shape\n",
"n = n_plus_1 - 1\n",
"Xn = data_iris_versicolor.values[:, 0:n].reshape(m, n)\n",
"\n",
"X = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
"Y = np.matrix(data_iris_setosa.values[:, 2]).reshape(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"liczba przykładów: {0: 100, 1: 50}\n",
"prior probability: {0: 0.6666666666666666, 1: 0.3333333333333333}\n"
]
}
],
"source": [
"classes = [0, 1]\n",
"count = [sum(1 if y == c else 0 for y in Y.T.tolist()[0]) for c in classes]\n",
"prior_prob = [float(count[c]) / float(Y.shape[0]) for c in classes]\n",
"\n",
"print('liczba przykładów: ', {c: count[c] for c in classes})\n",
"print('prior probability:', {c: prior_prob[c] for c in classes})"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFkCAYAAAD13eXtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dfZRddX3v8c83DyBNhqIkNkCI4SGiQHXIRMSKBXwGLEymQEBbsWWZ0mLXiHYF4q1etbeGlXuv0+GWaila5F7FEBkCC7H4AG3h3lJJQkAiDwElJQ2IoMgkspLMOd/7xz6HOTNzztk7c/Zvn332eb/W2mtmP5zf/u7fZHG+7P3bv6+5uwAAABDOjHYHAAAAUHQkXAAAAIGRcAEAAARGwgUAABAYCRcAAEBgJFwAAACBzWp3APtr3rx5vnjx4naHAQAAMMGmTZued/f59fZ1XMK1ePFibdy4sd1hAAAATGBm2xvt45EiAABAYCRcAAAAgZFwAQAABEbCBQAAEBgJFwAAQGAkXAAAAIGRcAEAAARGwgUAABBYsITLzI40s7vN7BEz22pmg3WOOd3MfmVmWyrLZ0LFAwBAEO7SLbdEP5NsD3WOLOLAtIW8wzUm6ZPu/kZJp0i6zMyOr3PcPe7eW1k+HzAeAADSt2GDNDAgXX75eFLjHq0PDET7szhHFnFg2oKV9nH3ZyQ9U/l91MwekXSEpB+HOicAAJnr75cGB6Xh4Wh9aChKcoaHo+39/dmdI3QcmDbzDG4xmtliSf8q6UR3f6lm++mSbpa0Q9JOSX/h7lubtbVs2TKnliIAIFeqd5KqyY4UJTlDQ5JZdufIIg40ZGab3H1Z3X2hEy4zmyvpXyT9tbuPTNp3sKSyu+8ys7MkDbv7kjptrJS0UpIWLVrUt317w9qQAAC0h7s0o2akTrmcfpKT5BxZxIG6miVcQd9SNLPZiu5gfX1ysiVJ7v6Su++q/H6HpNlmNq/Ocde6+zJ3XzZ//vyQIQMAsP+qd5Zq1Y6lyuocWcSBaQn5lqJJ+oqkR9z9iw2OWVA5TmZ2ciWeF0LFBABA6mof4w0ORneUqmOp0kp2kpwjizgwbcEGzUt6u6Q/lPQjM9tS2fYpSYskyd2/LOk8SX9qZmOSXpZ0oWcxqAwAgLRs2DCe5FTHSg0NRfuGh6XTTpOWLw9/jurvIePAtGUyaD5NDJoHAOSKe5QQ9fdPHCvVaHuoc0jh40BTbR00nzYSLgAAkEdtGzQPAAAAEi4AAIDgSLgAANnrlLp/5bJ0xRXRzyTbgQZIuAAA2euUun+rV0tr10p9fePJVbkcra9dG+0HEiDhAgBkr7Y2YDXpymPdvzVrpN5eacuW8aSrry9a7+2N9gMJhJyHCwCA+ibPEVWt/Ze3un8zZkibNo0nWTNnRtt7e6PtM7hvgWSYFgIA0D6dUvevXB5PtiSpVCLZwhRMCwEAyJ9OqftXfYxYq3ZMF5AACRcAIHudUvdv8pitUmnqmC4gARIuAED2GtUGrCZdeXpLsZpsVcdsbdo0nnTxliISYgwXACB7WdQfTEO5HCVVa9ZMHWtWbzu6GrUUAQAAAmPQPAAAQBuRcAEAshdX2qdcji/9k0YbWVxLkvPkpY0iyVt/uHtHLX19fQ4A6HAjI1HKNDjoXi5H28rlaF1yX7Wq+f6RkXTayOJakpwnL20USRv6Q9JGb5C/tD2B2t+FhAsACqD2i6/6hVi7Xio1318up9NGFteS5Dx5aaNI2tAfJFwAgPyp/QKsLo3uRtTbn1YbWVxLJ7VRJBn3R7OEi7cUAQDt4zGlfeL2p9VGGtI4T17aKJIM+4O3FAEA+eMxpX3i9qfVRhrSOE9e2iiSPPVHo1tfeV14pAgABcAYrny2USSM4SLhAoCux1uK+WyjSHhLkYQLALpeuRx94U2+y1DdXio131+9w9VqG1lcS9K7U3loo0ja0B/NEi4GzQMAAKSAQfMAAABtRMIFAAAQGAkXAACNeAr1+NJoo9sUsM9IuAAAaGTDBmlgoP7cXgMD0f4s2ug2BeyzWe0OAACA3OrvlwYHpeHhaH1oKPrSHx6Otvf3Z9NGtylgn/GWIgAAzVTvrFS//KXoS39oKHmJmDTa6DYd2GfN3lIk4QIAII5T47AtOqzPmBYCAIDpqt5pqUWNw/AK1mckXAAANFL7WGtwMLrDUh1blPTLP402uk0B+4xB8wAANLJhw/iXfnXs0NBQtG94WDrtNGn58vBtdJsC9hljuAAAaMQ9+vLv7584dqjR9lBtdJsO7TMGzQMAAATGoHkAAIA2IuECAAAIjIQLAFBMSerxxR1TLrfeBvUWJ+qma61BwgUAKKYk9fjijlm9uvU2qLc4UTdday1376ilr6/PAQCIVS67Dw5G96AGB+uvxx1TKrXeRrmcTqxFUeBrlbTRG+QvbU+g9nch4QIAJFb7ZV5dJn+pxx2TRhtpxVoUBb3WZgkX00IAAIrNE9TjizsmjTbSirUoCnitTAsBAOhOnqAeX9wxabSRVqxF0U3XWtXo1ldeFx4pAgASYQxXPhX4WsUYLgBA1xkZmfolXvvlPjISf8yqVa23MTKSTqxFUeBrbZZwMYYLAFBMnqAen9T8mHPPlW69tbU2qLc4UYGvlVqKAAAAgTFoHgAAoI1IuAAAAAILlnCZ2ZFmdreZPWJmW81ssM4xZmZXm9kTZvaQmS0NFQ8AICWeQf3BJG0ge3F/t7T+LlmdJ0Mh73CNSfqku79R0imSLjOz4ycdc6akJZVlpaQvBYwHAJCGLOoPJmkD2cuqDmIR6y02en0x7UXSrZLeM2nb30u6qGb9MUmHNWuHaSEAoM2ymLsqSRvIXlZzaHXoXF1q9zxckhZL+g9JB0/afrukU2vWfyBpWbO2SLgAIAeyqD9Y0Hp7HS+rv0sH/v2bJVzBp4Uws7mS/kXSX7v7yKR935a0xt3vraz/QNIqd9806biVih45atGiRX3bt28PGjMAIAHPoP5gkjaQvaz+Lh3292/btBBmNlvSzZK+PjnZqtgh6cia9YWSdk4+yN2vdfdl7r5s/vz5YYIFACRXHU9TK+36g0naQPay+rsU7e/f6NZXq4skk3SDpL9pcszZkr5TOfYUST+Ma5dHigDQZozh6l6M4WpK7RjDJelUSS7pIUlbKstZki6VdKmPJ2XXSHpS0o8UM37LSbgAoP2yqD+YpA1kL6s6iB1ab7FZwkVpHwDA/vGYWnhp1B9M0kaOx/IUVtzfPq2/S1bnSRm1FAEAAAKjliIAAEAbkXABAAAERsIFAEiXJ6iDVy5LV1wR/azVaPt0z9NN6I9cI+ECAKQrSR281aultWulvr7x5KpcjtbXro32p3GebkJ/5Fuj1xfzujAtBADkXJI5lEol997eaFtvb/31NM7TTeiPthPTQgAAMlW9szI8PL5tcFAaGhp/nb96R2vLlvFjenulTZsmlnNp9TzdhP5oK6aFAABkzxPUwSuXpZkzx9dLpeTJ1v6cp5vQH23DtBAAgGxV77TUmlwHr3qHq1btmK60ztNN6I/cIuECAKSr9rHW4GCUQA0ORuvVL//ax4m9vdGdrd7eaD1p0pXkPN2E/si3RoO78rowaB4Aci5JHbxqrcTaAfK1A+dXrUrnPN2E/mg7MWgeAJAZT1AHzz2a+mHNmqnjjeptn+55umnsEv3RdgyaBwAACIxB8wAAAG1EwgUAGFcqScuXRz8bbS9SWZ64aymVWo8zjWvNqr/y8ncpokaDu/K6MGgeAALq748GWM+b5z42Fm0bG4vWpWh/kQa8x11LtT9aiTONa82qv/Lyd+lQajJovu0J1P4uJFwAEFBtclVNuiavF6ksT9y1jI21Hmca15pVf+Xl79KhSLgAAMnVJlnVpfaOl/vExKS6JE22qmq/zKtLO77U464ljTjz0kaezlNAzRIu3lIEAExVKkmzZo2vj41NLMEjFassT9y1pBFnXtrI03kKhrcUAQDJlUrSggUTty1YMHEgfZHK8sRdSxpx5qWNPJ2n2zS69ZXXhUeKABAQY7gYw5WHv0uHEmO4AACJ8JY
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"średnia: [matrix([[1. , 4.906, 1.676]]), matrix([[1. , 1.464, 0.244]])]\n",
"odchylenie standardowe: [matrix([[0. , 0.8214402 , 0.42263933]]), matrix([[0. , 0.17176728, 0.10613199]])]\n"
]
}
],
"source": [
"XY = np.column_stack((X, Y))\n",
"XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]\n",
"X_split = [XY_split[c][:,0:3] for c in classes]\n",
"Y_split = [XY_split[c][:,3] for c in classes]\n",
"\n",
"X_mean = [np.mean(X_split[c], axis=0) for c in classes]\n",
"X_std = [np.std(X_split[c], axis=0) for c in classes]\n",
"print('średnia: ', X_mean) \n",
"print('odchylenie standardowe: ', X_std)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-10-793ac8294852>:11: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd3gU5fYH8O8kkEIJLXQI1dAlQECaEgERBAQpYsH7w4blihQVBdHrtYEIigW86BWwN6pYERW9iAUQFFBAekAghJ4AKbvn98dh2JLdZLNtUr6f55kn2ZmdmXc3y+zhfc97xhAREBEREVHhRVjdACIiIqLiioEUERERkZ8YSBERERH5iYEUERERkZ8YSBERERH5iYEUERERkZ/KWN0AZ/Hx8dKwYUOrm0FERETkYv369ekiUt19fZEKpBo2bIh169ZZ3QwiIiIiF4Zh7PW0nkN7RERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH4KWSBlGEYzwzA2Oi2nDMMYF6rzEREREYVbyG4RIyLbACQBgGEYkQAOAFgSqvMRERERhVu4hvZ6AdgpIh7vU0NERERUHIUrkLoOwHthOhcRERFRWIQ8kDIMIwrA1QA+8rJ9tGEY6wzDWHfkyJFQN4eCLGVBClIWpFjdDCKyWkqKLkSlTMhypJz0A/CriBz2tFFEXgXwKgAkJydLGNpDQVQvrp7VTSCioqAerwVUOoUjkLoeHNYrsd4e8rbVTSCiouBtXguodArp0J5hGOUAXAFgcSjPQ0RERGSFkPZIicgZANVCeQ6y1rgvtDTYrL6zLG4JEVlq3PkygbN4LaDSJRxDe1SCbTy00eomEFFRsJHXAiqdeIsYIiIiIj8xkCIiIiLyEwMpIiIiIj8xR4oCklgt0eomEFFRkMhrAZVOhkjRqYGZnJws69ats7oZRERERC4Mw1gvIsnu6zm0R0REROQnBlIUkNHLR2P08tFWN4OIrDZ6tC5EpQxzpCgg249ut7oJRFQUbOe1gEon9kgRERER+YmBFBEREZGfGEgRERER+Yk5UhSQpFpJVjeBiIqCJF4LqHRiHSkiIiKiArCOFBEREVGQMZCigIxcPBIjF4+0uhlEZLWRI3UhKmWYI0UB2X9qv9VNIKKiYD+vBVQ6sUeKiIiIyE8MpIiIiIj8xECKiIiIyE/MkaKAdKnXxeomEFFR0IXXAiqdWEeKiIiIqACsI0VEREQUZAykKCBDPxyKoR8OtboZRGS1oUN1ISplmCNFATl65qjVTSCiouAorwVUOrFHioiIiMhPDKSIiIiI/MRAioiIiMhPzJGigPRq1MvqJhBRUdCL1wIqnVhHioiIiKgArCNFREREFGQMpCgg/d7ph37v9LO6GURktX79dCEqZZgjRQE5m3PW6iYQUVFwltcCKp1C2iNlGEZlwzAWGoax1TCMPw3D4F0tiYiIqMQIdY/UCwC+EJFhhmFEASgX4vMRERERhU3IAinDMOIAXAZgFACISDaA7FCdj4iIiCjcQtkj1RjAEQDzDcNoC2A9gLEikhnCc1KYDUgcYHUTiKgoGMBrAZVOIasjZRhGMoCfAHQTkZ8Nw3gBwCkRecTteaMBjAaAhISEDnv37g1Je4iIiIj8ZUUdqf0A9ovIz+cfLwTQ3v1JIvKqiCSLSHL16tVD2BwiIiKi4ApZICUihwCkGobR7PyqXgD+CNX5yBopC1KQsiDF6mYQkdVSUnQhKmVCPWtvDIB3zs/Y2wXg5hCfj4iIiChsQhpIichGAHnGE4mIiIhKAt4ihoiIiMhPDKSIiIiI/MR77VFArm11rdVNIKKi4FpeC6h0ClkdKX8kJyfLunXrrG4GERERkQsr6khRKXAm5wzO5JyxuhlEZLUzZ3QhKmU4tEcBueqdqwAAq0atsrYhRGStq/RagFWrLG0GUbixR4qIiIjITwykiIiIiPzEQIqIiIjITwykiIiIiPzEZHMKyKikUVY3gYiKglGjrG4BkSVYR4qIiIioAKwjRSGRfiYd6WfSrW4GEVktPV0XolKGQ3sUkGEfDgPAOlJEpd4wvRawjhSVNuyRIiIiIvITAykiIiIiPzGQIiIiIvITAykiIiIiPzHZnAJyV/JdVjeBiIqCu3gtoNKJgRQFZETrEVY3gYiKghG8FlDpxKE9CkjqyVSknky1uhlEZLXUVF2IShn2SFFAblpyEwDWkSIq9W7SawHrSFFpwx4pIiIiIj8xkCIiIiLyEwMpIiIiIj8xkCIiIiLyE5PNKSD3dbnP6iYQUVFwH68FVDoxkKKADGw20OomEFFRMJDXAiqdOLRHAdmWvg3b0rdZ3Qwistq2bboQlTLskaKA3PHJHQBYR4qo1LtDrwWsI0WlDXukiIiIiPzEQIqIiIjITwykiIiIiPzEQIqIiIjITyFNNjcMYw+A0wBsAHJFJDmU56Pwm3LZFKubQERFwRReC6h0CsesvctFJD0M5yEL9G7c2+omEFFR0JvXAiqdOLRHAdl4aCM2HtpodTOIyGobN+pCVMqEukdKAKwwDEMAzBWRV92fYBjGaACjASAhISHEzaFgG/fFOACsI0VU6o3TawHrSFFpE+oeqW4i0h5APwD/NAzjMvcniMirIpIsIsnVq1cPcXOIiIiIgiekgZSI/H3+ZxqAJQA6hfJ8REREROEUskDKMIzyhmFUNH8H0AfA5lCdj4iIiCjcQpkjVRPAEsMwzPO8KyJfhPB8RERERGEVskBKRHYBaBuq41PR8HSvp61uAhEVBU/zWkClUzjqSFEJ1rV+V6ubQERFQVdeC6h0Yh0pCsia1DVYk7rG6mYQkdXWrNGFqJRhjxQFZPLXkwGwjhRRqTdZrwWsI0WlDXukiIiIiPzEQIqIiIjITwykiIiIiPzEQIqIiIjIT0w2p4DM6jvL6iYQUVEwi9cCKp0YSFFAkmolWd0EIioKkngtoNKJQ3sUkJW7VmLlrpVWN4OIrLZypS5EpQx7pCggT37/JACgd+PeFreEiCz1pF4L0JvXAipd2CNFRERE5CcGUkRERER+YiBFRERE5CcGUkRERER+YrI5BWTugLlWN4GIioK5vBZQ6cRAigLSLL6Z1U0goqKgGa8FVDpxaI8CsnzbcizfttzqZhCR1ZYv14WolGGPFAVk5o8zAQADmw20uCVEZKmZei3AQF4LqHRhjxQRERGRnxhIEREREfmJQ3shlJaZhlk/zcLhjMNWNyVktqVvAwDcuuxWi1sSOmUjy2JU0ih0rtfZ6qYQEVERw0DKg5/3/4ynVz+No2eOBnScTWmbkJmdidoVawepZUXPsXPHAAArdq2wuCWhc/LcScxdPxfJdZIRHRnt93HKRJTBDW1uwK3tbkVkRGQQW0hERFYxRMS3JxpGDQAx5mMR2RfsxiQnJ8u6deu8bs/MzsS53HMFHkcgWLp1KZ754RmcOHeiUG0QERw9exQ1y9dE6xqtC7Wvu9oVa2PKpVNKdImA1JOpAID6lepb3JLQycjOwIw1M7B63+qAjpOWmYZNaZsQFx2HqMgon/erF1cPj6c8jq71u/q8T4QRgcoxlWEYhj9NJSq8VL0WoH7JvRZQ6WYYxnoRSc6zvqBAyjCMqwHMBFAHQBqABgD+FJFWwW5k84uby7xP5+VZLyL4cMuHmLNuDnLtuT4fr3O9zmhfq32h25FQKQF3d7wbFaMrFnpfIm9EBEu2LsHXu772fR8IVu5aib+O/VXo813W4DJM6j4JcdFx+T6vRvkaaFq1aaGPT0RUmgQSSP0GoCeAlSLSzjCMywFcLyKjg97IOobgDs/bIowI3Jx0M9rWbOvTsRpXaYyrLrqK/yMPsQ82fwAAGNF6hMUtKbmybdlY+MfCQg01nzh3Ai/98hKOnDni0/NvaHMDejfqne9zDMNASsMUNKzc0Od2UCnygV4LMILXAiqZAgmk1olI8vmAqp2I2A3D+EVEOgW7kYltEuXlpS973NakShM0qdok2KekAKUsSAEArBq1ytJ2UF6nsk7
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"draw_means(fig, X_mean)\n",
"plot_prob(fig, X_mean, X_std, classes)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5QcdZn/8c8zt0wyM2EyJGRIAkkgEETWhDAiArILKCK4SQwIuuqKi7K7hx+JoIBR/K2iLi7uEnFl3eUXFVRW0Vy4CKuyK4quIiYYESFAhEQw9/vkMpfufn5/VDfTM+lbqqf6Nu/XOX0mVdVV9e1KDvPhW08/Ze4uAAAAHL66cg8AAACgWhGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAIKSGcg8g3fjx433atGnlHgYAAMAgq1ev3u7uE4aur6ggNW3aNK1atarcwwAAABjEzDZkWs+tPQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQUqRByszazWyZma01s2fN7I1Rng8AAKCUGiI+/u2SfuDul5pZk6QxEZ8PAACgZCILUmY2VtI5kq6QJHfvk9QX1fkAAABKLcpbe8dJ2ibp62b2GzNbamYtQ99kZleZ2SozW7Vt27YIhwMAADC8ogxSDZLmSPqKu58qab+kjw19k7vf6e5d7t41YcKECIcDAAAwvKIMUq9IesXdf5VcXqYgWAEAANSEyIKUu2+W9LKZzUyuOl/SM1GdDwAAoNSi/tbeNZLuSX5j70VJH4j4fAAAACUTaZBy9zWSuqI8BwAAQLnQ2RwAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkBqiPLiZrZfULSkuKebuXVGeDwAAoJQiDVJJ57r79hKcBwAAoKS4tQcAABBS1EHKJf3IzFab2VURnwsAAKCkor61d5a7bzSzoyQ9YmZr3f2x9DckA9ZVknTsscdGPBwAAIDhE+mMlLtvTP7cKmmlpNMzvOdOd+9y964JEyZEORwAAIBhFVmQMrMWM2tL/VnSBZKejup8AAAApRblrb2JklaaWeo8/+nuP4jwfAAAACUVWZBy9xclzYrq+AAAAOVG+wMAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQmoo9I1mdpSk5tSyu/8xkhEBAABUibwzUmY218xekPSSpJ9KWi/pvwo9gZnVm9lvzOz7oUcJAABQgQq5tfcZSWdIet7dp0s6X9L/HsY5Fkl6NsTYAAAAKlohQarf3XdIqjOzOnd/VNLsQg5uZlMkXSxpaRFjBAAAqEiF1EjtNrNWSY9JusfMtkqKFXj8L0q6QVJbyPEBAABUrEJmpOZJOijpWkk/kPQHSW/Pt5OZvV3SVndfned9V5nZKjNbtW3btgKGAwAAUBkKCVLvcve4u8fc/W53/5KCWaZ8zpI018zWS/qOpPPM7FtD3+Tud7p7l7t3TZgw4bAGDwAAUE6FBKlLzew9qQUzu0NS3sTj7ovdfYq7T5P0Lkk/dvf3hh4pAABAhSmkRmqBpAfMLCHpbZJ2uvvV0Q4LAACg8mUNUmbWkbb4QUn3KWh7cLOZdbj7zkJP4u4/kfSTkGMEAACoSLlmpFZLckmW9vPi5MslHRf56AAAACpY1iCVbL4pM2t29570bWbWnHkvAACAkaOQYvNfFLgOAABgRMlVI9UpabKk0WZ2qoJbe5I0VtKYEowNAACgouWqkXqrpCskTZF0W9r6bkkfj3BMAAAAVSFXjdTdku42s0vcfXkJxwQAAFAV8vaRcvflZnaxpNdKak5bf3OUAwMAAKh0eYvNzezfJV0u6RoFdVLvlDQ14nEBAABUvEK+tXemu/+1pF3u/mlJb5R0TLTDAgAAqHyFBKmDyZ8HzGySpH5J06MbEgAAQHUo5Fl73zezdklfkPSkgq7mSyMdFQAAQBUoJEh9XlJrsuj8+5Ka3X1PxOMCAACoeLkaci5I/nGKpAVm9qW0bXL3FVEPDgAAoJLlmpH6y7Q/b5b0JUmPJJddEkEKAACMaLkacn4gfdnMLnX3ZdEPCQAAoDoU0kfqyORtvU+Y2Wozu93MjizB2AAAACpaIe0PviNpm6QFki5N/vneKAcFAABQDQr51l6Hu38mbfmzZjY/qgEBAABUi0JmpB41s3eZWV3ydZmkh6IeGAAAQKUrJEj9raT/lNSbfH1H0nVm1m1me6McHAAAQCXLe2vP3dtKMRAAAIBqU8iMFAAAADIgSAEAAIREkAIAAAiJIAUAABBSqCBlZt8f7oEAAABUm5xByszqzewLGTZ9KKLxAAAAVI2cQcrd45JOMzMbsn5TpKMCAACoAoU8IuY3ku43s+9J2p9a6e4rIhsVAABAFSjoWXuSdkg6L22dSyJIAQCAEa2QzuYfKMVAAAAAqk3eb+2Z2Ylm9j9m9nRy+XVmdlP0QwMAAKhshbQ/+H+SFkvqlyR3f0rSu6IcFAAAQDUoJEiNcfcnhqyLRTEYAACAalJIkNpuZscrKDCXmV0qifYHAABgxCvkW3tXS7pT0klm9idJL0l6b6SjAgAAqAKFfGvvRUlvNrMWSXXu3l3Igc2sWdJjkkYlz7PM3f+hmMECAABUkkK+tRc3s89LOpAKUWb2ZAHH7pV0nrvPkjRb0oVmdkZRowUAAKgghdRI/T75vh+ZWUdyneV4vyTJA/uSi43Jl4caJQAAQAUqJEjF3P0GBW0QfmZmp6nAQJR86PEaSVslPeLuv8rwnqvMbJWZrdq2bdvhjB0AAKCsCglSJknu/l1Jl0n6uqTjCjm4u8fdfbakKZJON7NTMrznTnfvcveuCRMmFD5yAACAMiskSH0w9Qd3/72ksyUtPJyTuPtuST+RdOHh7AcAAFDJCglSx5lZmyQlHw1zl6Sn8+1kZhPMrD3559GS3ixpbfihAgAAVJZCgtQn3b3bzM6W9FZJd0v6SgH7HS3pUTN7StKvFdRIfT/8UAEAACpLIQ0548mfF0v6irvfb2afyrdT8pl8pxYxNgAAgIpWyIzUn8zsPxQUmj9sZqMK3A8AAKCmFRKILpP0Q0kXJovGOyRdH+moAAAAqkAhj4g5IGlF2vIm8dBiAAAAbtEBAACERZACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgCgnNyllSuDn4WsH+7jJxLRnr/GEaQAACin++6TFiyQrr12ILS4B8sLFgTbozz+4sXRnr/G5X3WHgAAiND8+dKiRdLttwfLS5YEIeb224P18+dHe/xbbpF
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_decision_boundary_bayes(fig, X_mean, X_std)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# Przygotowanie danych dla wielomianowej regresji logistycznej\n",
"\n",
"data = np.matrix(data_iris_versicolor)\n",
"\n",
"Xpl = powerme(data[:, 1], data[:, 0], n)\n",
"Ypl = np.matrix(data[:, 2]).reshape(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"theta = [[-10.68923095]\n",
" [ 5.52649967]\n",
" [ 5.83316957]\n",
" [ -0.60707243]\n",
" [ -0.46353729]\n",
" [ -2.82974456]]\n"
]
}
],
"source": [
"# Uruchomienie metody gradientu prostego dla regresji logistycznej\n",
"theta_start = np.matrix(np.zeros(Xpl.shape[1])).reshape(Xpl.shape[1], 1)\n",
"theta, errors = GD(h, J, dJ, theta_start, Xpl, Ypl, \n",
" alpha=0.05, eps=10**-7, maxSteps=100000)\n",
"print(r'theta = {}'.format(theta))"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de3wU9bk/8M+zu7lfSEJCAgREQEBQRBOvYK3VqkUFRFusvfrT2lNtRbAK6PH0Yisc2wOlp7TnWGqt1lOpXG2lVmtB1CLIXbnfBRMgF0Lu2ezu8/tjdskm2Ruzmexu8nm/XvPazMzOzHe3Nvkw88wzoqogIiIionNni/UAiIiIiBIVgxQRERGRSQxSRERERCYxSBERERGZxCBFREREZBKDFBEREZFJjlgPwF9+fr4OGTIk1sMgIiIiamfz5s2VqlrQcXlcBakhQ4Zg06ZNsR4GERERUTsicjTQcl7aIyIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJEuDlIjkiMhSEdkjIrtF5Gorj0dERETUnRwW738hgDdU9S4RSQaQbvHxiIiIiLqNZUFKRLIBfAbANwFAVZ0AnFYdj4iIiKi7WXlpbyiACgC/F5GtIrJYRDI6vklEHhCRTSKyqaKiwsLhEBEREXUtK4OUA8BlAH6jqpcCaAAwu+ObVPU5VS1V1dKCggILh0NERETUtawMUscBHFfVDd75pTCCFREREVGPYFmQUtUTAI6JyEjvohsA7LLqeERERETdzeq79r4H4GXvHXuHANxr8fGIiIiIuo2lQUpVtwEotfIYRERERLHCzuZEREREJjFIEREREZnEIEVERERkEoMUERERkUkMUkREREQmMUgRERERmcQgRURERGQSgxQRERGRSQxSRERERCYxSBERERGZxCBFREREZBKDFBEREZFJDFJEREREJjFIEREREZnEIEVERERkEoMUERERkUkMUkREREQmMUgRERERmcQgRURERGQSgxQRERGRSQxSRERERCYxSBERERGZxCBFREREZBKDFBEREZFJDFJEREREJjFIEREREZnEIEVERERkEoMUERERkUkMUkREREQmMUgRERERmcQgRURERGQSgxQRERGRSQxSRERERCYxSBERERGZxCBFREREZJLDyp2LyBEAdQDcAFyqWmrl8YiIiIi6k6VByut6Va3shuMQERERdSte2iMiIiIyyeogpQDeFJHNIvKAxcciIiIi6lZWX9obr6plItIPwFsiskdV1/m/wRuwHgCAwYMHWzwcIiIioq5j6RkpVS3zvp4CsALAFQHe85yqlqpqaUFBgZXDISIiIupSlgUpEckQkSzfzwBuAvCxVccjIiIi6m5WXtorBLBCRHzH+T9VfcPC4xERERF1K8uClKoeAnCJVfsnIiIiijW2PyAiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxyRvlFE+gFI9c2r6ieWjIiIiIgoQYQ9IyUik0RkP4DDAN4BcATA3yI9gIjYRWSriPzV9CiJiIiI4lAkl/aeBnAVgH2qej6AGwC8fw7HmA5gt4mxEREREcW1SIJUq6pWAbCJiE1V1wAYF8nORaQYwK0AFkcxRiIiIqK4FEmNVI2IZAJYB+BlETkFwBXh/n8B4HEAWSbHR0RERBS3IjkjNRlAE4AZAN4AcBDAbeE2EpHbAJxS1c1h3veAiGwSkU0VFRURDIeIiIgoPkQSpO5WVbequlT1D6r6SxhnmcIZD2CSiBwB8AqAz4nIHzu+SVWfU9VSVS0tKCg4p8ETERERxVIkQeouEfmKb0ZEFgEIm3hUdY6qFqvqEAB3A/inqn7V9EiJiIiI4kwkNVJTAbwmIh4AXwBQraoPWTssIiIiovgXNEiJSJ7f7P0AVsJoe/BjEclT1epID6KqawGsNTlGIiIiorgU6ozUZgAKQPxeb/VOCmCo5aMjIiIiimNBg5S3+SZEJFVVm/3XiUhq4K2IiIiIeo9Iis3/FeEyIiIiol4lVI1UEYCBANJE5FIYl/YAIBtAejeMjYiIiCiuhaqRuhnANwEUA5jvt7wOwBMWjomIiIgoIYSqkfoDgD+IyJ2quqwbx0RERESUEML2kVLVZSJyK4AxAFL9lv/YyoERERERxbuwxeYi8j8ApgH4How6qS8COM/icRERERHFvUju2rtGVb8O4LSq/gjA1QAGWTssIiIiovgXSZBq8r42isgAAK0AzrduSERERESJIZJn7f1VRHIA/AzAFhhdzRdbOioiIiKiBBBJkJoHINNbdP5XAKmqesbicRERERHFvVANOad6fywGMFVEfum3Dqq63OrBEREREcWzUGekbvf7+QSAXwJ4yzuvABikiIiIqFcL1ZDzXv95EblLVZdaPyQiIiKixBBJH6m+3st6T4rIZhFZKCJ9u2FsRERERHEtkvYHrwCoADAVwF3en5dYOSgiIiKiRBDJXXt5qvq03/xPRGSKVQMiIiIiShSRnJFaIyJ3i4jNO30JwOtWD4yIiIgo3kUSpL4N4P8AtHinVwDMFJE6Eam1cnBERERE8SzspT1VzeqOgRARERElmkjOSBERERFRAAxSRERERCYxSBERERGZxCBFREREZJKpICUif+3qgRARERElmpBBSkTsIvKzAKu+ZdF4iIiIiBJGyCClqm4AJSIiHZaXWzoqIiIiogQQySNitgJYJSKvAmjwLVTV5ZaNioiIiCgBRPSsPQBVAD7nt0wBMEgRERFRrxZJZ/N7u2MgRERERIkm7F17IjJCRN4WkY+982NF5N+tHxoRERFRfIuk/cFvAcwB0AoAqroDwN1WDoqIiIgoEUQSpNJVdWOHZS4rBkNERESUSCIJUpUiMgxGgTlE5C4AbH9AREREvV4kd+09BOA5AKNE5FMAhwF81dJRERERESWASO7aOwTgRhHJAGBT1bpIdiwiqQDWAUjxHmepqv4gmsESERERxZNI7tpzi8g8AI2+ECUiWyLYdwuAz6nqJQDGAbhFRK6KarREREREcSSSGqmd3ve9KSJ53mUS4v0AADXUe2eTvJOaGiURERFRHIokSLlU9XEYbRDeFZESRBiIvA893gbgFIC3VHVDgPc8ICKbRGRTRUXFuYydiIiIKKYiCVICAKr6ZwBfAvB7AEMj2bmqulV1HIBiAFeIyEUB3vOcqpaqamlBQUHkIyciIiKKsUiC1P2+H1R1J4AJAB4+l4Ooag2AtQBuOZftiIiIiOJZJEFqqIhkAYD30TAvAPg43EYiUiAiOd6f0wDcCGCP+aESERERxZdIgtRTqlonIhMA3AzgDwB+E8F2/QGsEZEdAD6EUSP1V/NDJSIiIoovkTTkdHtfbwXwG1VdJSI/DLeR95l8l0YxNiIiIqK4FskZqU9F5H9hFJqvFpGUCLcjIiIi6tEiCURfAvB3ALd
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_decision_boundary(fig, theta, Xpl)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n",
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeZwU9Zk/8M+3u+fsuZlhZpjhEBAQFNBBRPGI0VVDjOAVzLFJXF2zq4koRhETf7lWcU0iIbuaXYMaNW7UcGkiGk0yiAeIDKccww0DczD3PdNHPb8/qpvpOfqwumu6e+bzfr3q1VNVXVVPlzj9zLeeekqJCIiIiIjo87NEOwAiIiKieMVEioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAyyRTsAX7m5uTJu3Lhoh0FERETUS1lZWZ2I5PVdHlOJ1Lhx47B169Zoh0FERETUi1Lq+EDLeWmPiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig5hIERERERnERIqIiIjIICZSRERERAYxkSIiIiIyiIkUERERkUFMpIiIiIgMYiJFREREZBATKSIiIiKDmEgRERERGcREioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZxESKiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig5hIERERERnERIqIiIjIICZSRERERAYxkSIiIiIyiIkUERERkUFMpIiIiIgMYiJFREREZJCpiZRSKksptUoptV8ptU8pdbGZxyMiIiIaTDaT978CwDsicotSKhFAqsnHIyIiIho0piVSSqkMAJcD+A4AiIgDgMOs4xERERENNjMv7Y0HUAvgBaXUdqXUSqWUve+blFJ3KaW2KqW21tbWmhgOERERUWSZmUjZAFwA4Lcicj6AdgAP932TiDwrIrNEZFZeXp6J4RARERFFlpmJ1EkAJ0XkE8/8KuiJFREREdGQYFoiJSLVACqUUpM9i64CsNes4xERERENNrPv2vs+gFc8d+wdAXC7yccjIiIiGjSmJlIisgPALDOPQURERBQt7GxOREREZBATKSIiIiKDmEgRERERGcREioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZxESKiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig5hIERERERnERIqIiIjIICZSRERERAYxkSIiIiIyiIkUERERkUFMpIiIiIgMYiJFREREZBATKSIiIiKDmEgRERERGcREioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZZDNz50qpYwBaAbgBuERklpnHIyIiIhpMpiZSHleKSN0gHIeIiIhoUPHSHhEREZFBZidSAuBdpVSZUuouk49FRERENKjMvrQ3V0QqlVIjAbynlNovIht93+BJsO4CgDFjxpgcDhEREVHkmDoiJSKVntfTANYCmD3Ae54VkVkiMisvL8/McIiIiIgiyrRESillV0qle38GcA2Az8w6HhEREdFgM/PSXj6AtUop73H+T0TeMfF4RERERIPKtERKRI4AmGHW/omIiIiije0PiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZxESKiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig5hIERERERnERIqIiIjIICZSRERERAYxkSIiIiIyiIkUERERkUFMpIiIiIgMYiJFREREZBATKSIiIiKDmEgRERERGcREioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZxESKiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig2yhvlEpNRJAsndeRE6YEhERERFRnAg6IqWUukEpdRDAUQDvAzgG4O1QD6CUsiqltiul/mI4SiIiIqIYFMqlvZ8DmAPggIicBeAqAB99jmMsArDPQGxEREREMS2URMopIvUALEopi4iUApgZys6VUsUAvgxgZRgxEhEREcWkUGqkmpRSaQA2AnhFKXUagCvE/f8awEMA0g3GR0RERBSzQhmRmg+gE8D9AN4BcBjA9cE2UkpdD+C0iJQFed9dSqmtSqmttbW1IYRDREREFBtCSaRuExG3iLhE5EUR+Q30UaZg5gK4QSl1DMCrAL6olPpD3zeJyLMiMktEZuXl5X2u4ImIiIiiKZRE6hal1De8M0qppwEEzXhEZKmIFIvIOAC3AfiHiHzTcKREREREMSaUGqmbALyplNIAfAlAg4jcY25YRERERLHPbyKllMrxmb0TwDrobQ9+ppTKEZGGUA8iIhsAbDAYIxEREVFMCjQiVQZAACif1y97JgEw3vToiIiIiGKY30TK03wTSqlkEenyXaeUSh54KyIiIqLhI5Ri849DXEZEREQ0rASqkSoAUAQgRSl1PvRLewCQASB1EGIjIiIiimmBaqSuBfAdAMUAnvJZ3grgERNjIiIiIooLgWqkXgTwolLqZhFZPYgxEREREcWFoH2kRGS1UurLAKYBSPZZ/jMzAyMiIiKKdUGLzZVS/wNgIYDvQ6+TuhXAWJPjIiIiIop5ody1d4mIfAtAo4j8FMDFAEabGxYRERFR7Aslker0vHYopUYBcAI4y7yQiIiIiOJDKM/a+4tSKgvALwBsg97VfKWpURERERHFgVASqScApHmKzv8CIFlEmk2Oi4iIiCjmBWrIeZPnx2IANymlfuOzDiKyxuzgiIiIiGJZoBGpr/j8XA3gNwDe88wLACZSRERENKwFash5u++8UuoWEVllfkhERERE8SGUPlIjPJf1fqiUKlNKrVBKjRiE2IiIiIhiWijtD14FUAvgJgC3eH5+zcygiIiIiOJBKHft5YjIz33m/0MptcCsgIiIiIjiRSgjUqVKqduUUhbP9FUAb5kdGBEREVGsCyWR+i6A/wPQ7ZleBbBYKdWqlGoxMzgiIiKiWBb00p6IpA9GIERERETxJpQRKSIiIiIaABMpIiIiIoOYSBEREREZxESKiIiIyCBDiZRS6i+RDoSIiIgo3gRMpJRSVqXULwZY9a8mxUNEREQUNwImUiLiBlCilFJ9lleZGhURERFRHAjlETHbAbyhlPoTgHbvQhFZY1pURERERHEgpGftAagH8EWfZQKAiRQRERENa6F0Nr99MAIhIiIiijdB79pTSk1SSv1dKfWZZ366UupH5odGREREFNtCaX/wOwBLATgBQER2AbjNzKCIiIiI4kEoiVSqiGzps8xlRjBERERE8SSURKpOKTUBeoE5lFK3AGD7AyIiIhr2Qrlr7x4AzwKYopQ6BeAogG+aGhURERFRHAjlrr0jAK5WStkBWESkNZQdK6WSAWwEkOQ5zioR+XE4wRIRERHFklDu2nMrpZ4A0OFNopRS20LYdzeAL4rIDAAzAVynlJoTVrREREREMSSUGqk9nve9q5TK8SxTAd4PABBdm2c2wTOJoSiJiIiIYlAoiZRLRB6C3gbhA6VUCUJMiDwPPd4B4DSA90TkkwHec5dSaqtSamttbe3niZ2IiIgoqkJJpBQAiMjrAL4K4AUA40PZuYi4RWQmgGIAs5VS5w7wnmdFZJaIzMrLyws9ciIiIqIoCyWRutP7g4jsAXApgHs/z0FEpAnABgDXfZ7tiIiIiGJZKInUeKVUOgB4Hg3zewCfBdtIKZWnlMry/JwC4GoA+42HSkRERBRbQkmkHhWRVqXUpQCuBfAigN+GsF0hgFKl1C4An0KvkfqL8VCJiIiIYksoDTndntcvA/itiLyhlPp
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_decision_boundary(fig, theta, Xpl)\n",
"plot_decision_boundary_bayes(fig, X_mean, X_std)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Kiedy naiwny Bayes nie działa?"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# Wczytanie danych\n",
"import pandas\n",
"import numpy as np\n",
"\n",
"alldata = pandas.read_csv('bayes_nasty.tsv', sep='\\t')\n",
"data = np.matrix(alldata)\n",
"\n",
"m, n_plus_1 = data.shape\n",
"n = n_plus_1 - 1\n",
"Xn = data[:, 1:]\n",
"\n",
"Xbn = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
"Xbnp = powerme(data[:, 1], data[:, 2], n)\n",
"Ybn = np.matrix(data[:, 0]).reshape(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFmCAYAAADQ5sbeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df5Dcd33n+dd7zIyvaPUlSBgwA3MmO1OptbSJIukcEk+tYPklJkU89oYaOB1ha5Xz+S4kElKCRXGXdS0LZr21EnItm8TMUudcaaGpYyS8MLER3iR4lkoWSWVAKq+ZgcPCGR92JDa0OnczA/2+P779lb7q6Z7pnvn291c/H1Vd0/39frvn8+2enn735/P+vD/m7gIAAEB2DaTdAAAAAKyNgA0AACDjCNgAAAAyjoANAAAg4wjYAAAAMo6ADQAAIONelnYD0vDKV77Sb7vttrSbAQAAcINz5879jbvf0ry9LwO22267TWfPnk27GQAAADcws+dabWdIFAAAIOMI2AAAADKOgA0AACDjCNgAAAAyjoANAAAg4wjYAAAAMo6ADQAAIOMI2AAAADIu9YDNzD5jZi+a2YU2+83MHjazBTP7lpntiuzbZ2bPNvYdTa7VADrmLp06FfzsZHuexH1uRX6uAGxK6gGbpP9D0r419r9T0ljjcq+kP5QkM7tJ0qca+2+X9F4zu72nLQXQvdOnpXvukT74wesBh3tw+557gv15Ffe5Ffm5ArApqQds7v41SVfWOOQuSX/igb+U9LNmdqukOyQtuPv33H1Z0ucaxwLIkslJ6eBB6cSJ64HIBz8Y3D54MNifV3GfW5GfKwCbkoe1RIcl/SBy+/nGtlbbf7ndg5jZvQp66DQyMhJ/KwG0ZiYdPx5cP3EiuEhBAHL8eLA/r+I+tyI/VwA2xTwDORFmdpukL7n7jhb7vizpQXefa9x+UtKHJP2cpHe4+281tr9P0h3u/jvr/b49e/Y4i78DCXOXBiKd+vV6cQKQuM+tyM8VgDWZ2Tl339O8PfUh0Q48L+n1kduvk7S4xnYAWRMO7UVF87TyLO5zK/JzBWDD8hCwPSbpNxuzRd8o6W/d/QVJ35A0ZmZvMLMhSe9pHAsgS5rzsOr11XlaeRX3uRX5uQJ6oLpU1fT5ad1/5n5Nn59WdamadpN6x91TvUj6rKQXJK0o6DU7IOk+Sfc19puC2aDflfRtSXsi952Q9J3Gvo90+jt3797tABIyM+MuuR886F6vB9vq9eC2FOzPq7jPrcjPFRCzp557yssfL3vpYyXXA/LSx0pe/njZn3ruqbSbtimSznqL2CUTOWxJI4cNSJB7UI5icvLGPKx22/Mk7nMr8nMFxKi6VNXwsWFVl1f3qJWHylo8sqgtQ1tSaNnm5TmHDUCemUl337060Gi3PU/iPrciP1dAjCoXK6p7veW+utdVuVBJuEW9R8AGAAByZf7yvGortZb7ais1LVxZSLhFvZeHOmwAAKDHqktVVS5WNH95XmPbxjS1fUrlm8tpN6ulsW1jKg2WWgZtpcGSRreOptCq3qKHDQCwNtY4Lby5S3MaPjasQ48f0kNff0iHHj+k4WPDmrs0l3bTWpraPqUBax3CDNiApnZMJdyi3iNgAwCsjTVOC626VNXEyQlVl6vXeqxqKzVVl4PtV5evdvVYSZTZKN9c1uz+WZWHyioNliQFPWvloWB7XiccrIUhUQCrMVsRUdE1TqVgmSzWOC2MThL4D+w6sO7jzF2a08TJCdW9rtpKTaXBkg4/cViz+2c1PjIed7M1PjKuxSOLqlyoaOHKgka3jmpqx1QhgzWJgA1AK2GPSnQNy2hR15mZYNYi+gNrnF6TpzyvTsWRwB/tpYveV5ImTk70rMzGlqEtHQWTRcCQKIDVoj0q4TAYPSr9LRq0hfosWMtbnlenwgT+VjpN4O/HMhtJI2ADsFr44RwGbQMD14O1PvuQRkOfr3EaZ55X1sSRwN+PZTaSRsAGoDV6VBBijdNC9yDFkcAfRy8d1kYOG4DW2vWoELT1n9OnV/ewRnPa9u4tfE5j0XuQNpvAP7V9SoefONxyX1HLbCSNgA3Aas09KtFZgRJBW7+ZnAwmmkRnB4dB2969fZHT2A+FWjeTwB/20jXPEh2wgcKW2Ugai78DWO3UKWaJAhFFXmw8TleXr/ZNmY1eabf4OwEbgNWowwas0qrOWNiD1Is6Y+hPBGwRBGwAgI2gBwm91i5gI4cNAIAO9VOhVmQLZT0AAAAyjoAN6GfuwQSD5tSIdtsBAKkgYAP6WbhmaLT4aTgb9J57gv0AgNSRwwb0s+iaodKN9dZYMxQAMoOADehnzRXrw8CNNUMBIFMo6wEgGAYdiGRI1OsEawCQgnZlPchhQ7xIYs+fdmuG8loBQGYQsCFeJLFny3oBdL1+Y85avX49p42gDQAygxw2xIsk9mwJA+h2a4J+6EM3LvDenNO2dy9rhgJIVXWpqsrFiuYvz2ts25imtk+pfHM57WYljhw2xC8aEIT6IYk9i+tvRl+L8DWI3j52TPriF7PVZgBo6Mf1W1lLNIKALQH9mMR+6tTavVkzM+n0VvVrAA0g16pLVQ0fG1Z1ubpqX3morMUji4VcxzXTkw7MbJ+ZPWtmC2Z2tMX+3zezpxuXC2b2UzPb2tj3fTP7dmMfUVgW9GsSe3Q4ODzfLAwHR4c5QwRrADKucrGiutdb7qt7XZULlYRblK7UAzYzu0nSpyS9U9Ltkt5rZrdHj3H3f+XuO919p6QPS/oLd78SOeTNjf2rIlIkrDlI6ack9jAwCs93YGB1flga+jWABpBr85fnVVuptdxXW6lp4cpCV49XXapq+vy07j9zv6bPT6u6tLrnLstSD9gk3SFpwd2/5+7Lkj4n6a41jn+vpM8m0jJ07/Tp1knsYRCTl1minZQnaXWMWZAXFpWFYK0fA2gAuTa2bUylwVLLfaXBkka3jnb8WHOX5jR8bFiHHj+kh77+kA49fkjDx4Y1d2kurub2XBYCtmFJP4jcfr6xbRUze7mkfZK+ENnskr5iZufM7N6etRKdmZwMcrWiQUoYtM3M5GeWaCflSVodU69Lu3ff+FhpBkZFCaAB9J2p7VMasNZhyoANaGrHVEePU12qauLkhKrL1Ws9drWVmqrLwfary1dja3MvZSFga9X10O7T7V2S/lPTcOid7r5LwZDqb5vZP2z5S8zuNbOzZnb2pZde2lyL0Z5ZkFjf3KPUbntWdZKP1nxMGKw9/bS0c6f005+m35tVlAA6LRSCBlJTvrms2f2zKg+Vr/W0lQZLKg8F2zudcFCUXLgs1GF7XtLrI7dfJ2mxzbHvUdNwqLsvNn6+aGanFAyxfq35ju7+iKRHpGCW6OabjULrdI3NVsfs3CmdOxfksKVd0ywMlDvdjhutV8eu1czfLJZ3AXJqfGRci0cWVblQ0cKVBY1uHdXUjqmuZofGnQuXliz0sH1D0piZvcHMhhQEZY81H2RmPyNpr6QvRraVzKwcXpf0dkkXEmk1iq+T2ZWtjgmDteh+erPyaSMzf7O02gc9hCiALUNbdGDXAT341gd1YNeBrkt5xJkLl6bUAzZ3/4mkD0h6QtIzkj7v7hfN7D4zuy9y6N2SvuLu0TD51ZLmzOybkv6zpC+7++NJtR0F18nsylbHHD68eiJCnoaDcd1GZv5mqbxLjMFj3mfYoX/FlQuXOnfvu8vu3bsdWFO97n7wYDAX9ODB1rc7OQbFUK+H84KDy3qvbfRvIbyk8TcR09/oU8895eWPl730sZLrAXnpYyUvf7zsTz331Pq/f2Zm9e9ptx3okQ3/DadA0llvEbukHjylcSFgw7pmZlZ/qEU/7GZmOjsG+bfR4KvbIK9XNhk8/vj/+7GXP152PaBVl/LHy15dqra/M+8RZEh1qerT56b96JmjPn1ueu2/3RQRsBGwoRud9AzQe1B8G+2hykoPW7Q9GwweP33u09d6JZovpY+VfPrc9Nq/l15ooCvtArbUc9iATOqkPElRSpigvY3UsfOMFSsO2xPVRTs2NcMuq6t/ADlEwAb
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"liczba przykładów: {0: 69, 1: 30}\n",
"prior probability: {0: 0.696969696969697, 1: 0.30303030303030304}\n"
]
}
],
"source": [
"classes = [0, 1]\n",
"count = [sum(1 if y == c else 0 for y in Ybn.T.tolist()[0]) for c in classes]\n",
"prior_prob = [float(count[c]) / float(Ybn.shape[0]) for c in classes]\n",
"\n",
"print('liczba przykładów: ', {c: count[c] for c in classes})\n",
"print('prior probability:', {c: prior_prob[c] for c in classes})"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"średnia: [matrix([[1. , 0.03949835, 0.02825019]]), matrix([[1. , 0.09929617, 0.06206227]])]\n",
"odchylenie standardowe: [matrix([[0. , 0.52318432, 0.60106092]]), matrix([[0. , 0.61370281, 0.6081128 ]])]\n"
]
}
],
"source": [
"XY = np.column_stack((Xbn, Ybn))\n",
"XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]\n",
"X_split = [XY_split[c][:,0:3] for c in classes]\n",
"Y_split = [XY_split[c][:,3] for c in classes]\n",
"\n",
"X_mean = [np.mean(X_split[c], axis=0) for c in classes]\n",
"X_std = [np.std(X_split[c], axis=0) for c in classes]\n",
"print('średnia: ', X_mean) \n",
"print('odchylenie standardowe: ', X_std)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-10-793ac8294852>:11: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFmCAYAAADQ5sbeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydd1gUVxfG3116RxFREVvEisbeCxoTezexJ+bTRE1M1GiixkSxIPZuYhSVaFSwt9iwd41dERUQEASk97JlzvfHlaUIirC7swv39zz3GXbn3pkzs8PMO+eee66EiMDhcDgcDofD0V2kYhvA4XA4HA6Hw3k3XLBxOBwOh8Ph6DhcsHE4HA6Hw+HoOFywcTgcDofD4eg4XLBxOBwOh8Ph6DhcsHE4HA6Hw+HoOIZiGyAGFSpUoBo1aohtBofD4XA4HE4e7ty5E0tE9vm/L5OCrUaNGrh9+7bYZnA4HA6Hw+HkQSKRhBb0Pe8S5XA4HA6Hw9FxuGDjcDgcDofD0XG4YONwOBwOh8PRcbhg43A4HA6Hw9FxuGDjcDgcDofD0XG4YONwOBwOh8PRcbhg43A4HA6Hw9FxuGDjcDgcDofD0XG4YONwOBwOh8PRcUQXbBKJZKtEIomWSCSPC1kvkUgkayUSSaBEInkokUia5VrXQyKRPHuzbqb2rOZwOKUCIuDgQbYsyvccDocjEqILNgBeAHq8Y31PAM5vyrcA/gQAiURiAGDDm/UNAAyXSCQNNGoph1OWKAti5tAhYNAgYOrUnOMhYp8HDWLrP5SycN44HI7WEV2wEdElAPHvqNIfwHZi3ABgK5FIKgNoBSCQiF4QkQyA95u6HA5HHWhCzOgaAwYAkycDa9bkHOfUqezz5Mls/YdSFs4bh8PROvow+bsjgLBcn8PffFfQ9621aBeHU7rJLWYAYNWqkosZXUMiYccFsOPKPtbJk9n3EsmHb7MsnDcOh6N19EGwFXTHpHd8X/BGJJJvwbpUUa1aNfVYxuGUZjQhZnSR7OPMPj7g3cfn6sqWFy68e3tAgefN9e8urPmYQtpzOBxOAYjeJVoEwgE45fpcFUDEO74vECLaREQtiKiFvb29RgzlcEoducVHNqVJrAE53ZW5yd2dmZ+qVVl5F+84b1Wtq6Kq9XvaczgcTj70QbAdAfDlm9GibQAkEVEkgP8AOEskkpoSicQYwLA3dTkccYmKYl6V1q2BmzdVX0emRGLL3S0Y5DMIow+OFtHAD+BDxYy+kT9mTRDejmnLzz//sFKU7ebmzfb+GfQP/hn0nvYccTh5EujUCdi4EYiN1cguzr44i2Z/NcOc83NwM/wmBBI0sh9O6UN0wSaRSHYDuA6grkQiCZdIJGMlEskEiUQy4U2V4wBeAAgEsBnAdwBARAoAkwCcAuAPYA8R+Wn9ADgcAMjMBHx8gJ49AUdHYMoUkFyGx68fYeGlhWixqQWqrKyCcUfH4XbEbThYOIht8fspjpjRNw4dyjm+bM/hqlU5x1ncUaKl/byVVrKymFCbOBGoXBno35+N7JXJ1LobC2MLuF92R5stbVBpeSV8ffhrHH56GBnyDLXupzSQkpUCz7uemOE7A553PZGSlSK2SaIhoTJ482jRogXdvn1bbDM4+g4R86B5eQHe3kBSEpTVquL6l11xqIEUh2IuIyghCADQpmob9KvTD33q9IFLRRdI9KFL8eBBNqoxt5jJLUYOHAAGDhTbypJBxETZgAF5u3kL+x4Apkxhy9WrC97me87blPW9gY9qY3WPQtpzxIUIePgQ2LmTeVIjIwE7O2DECGDMGKBpU7WEBMSlx+FU0Ckce34MxwOOIykrCeZG5uhRuwcG1B2A3nV6o7xZ+ZIfjx5z5eUV9NrZCwIJSJOnwcLIAlKJFMdHHkeHah3ENk9jSCSSO0TU4q3vuWDjcD6QmBhg+3bA0xN4+hQKCzNcHNke+5ub40DyTbxOew1jA2N8UvMTDKg3AH3r9EVlq8piW/3hFEfMlAXeN+jgPefNNYkNQuCDDvQAhQLw9WUvZYcOMU9b48bAuHHAqFFAuXJq2Y1cKcfF0Is46H8Qh54dQkRKBAylhuhasysG1x+MAfUGoKJFRbXsS19IyUqB40pHpMje9qhZGVshYloELI0tRbBM83DBlgsu2DgfjFIJnDnDRNrhw1Ao5TjXuwH2dLLDIaUf4jLjYW5kjl7OvTC4/mD0cu4FaxNrsa3maIL3Cbb3Nfdi7blg0zMSEpgnfcsW4M4dwNQUGDKEibdOndT28iKQgDsRd7Dffz/2PdmHoIQgSCVSdK7eGYPrD8aQBkPgYKkHIRUlxPOuJ6acnII0edpb6yyMLLCmxxqMbTZWBMs0T2GCTR/SenA44hEdDWzdCmzcCOFlKK64WMN7en3stQpDrOwJrORW6Fu3LwbXH4wetXvA3MhcbIs5HI4mKFeOxbZNnAjcu8de3rIHoNSpA0yYwLpMS+h1k0qkaOnYEi0dW8LjEw88fP0Q+57sw37//Zh0YhJ+PPkjutbsimENh2FQ/UEoZ/b+/aVkpcDHzwcBcQFwtnPG0IZDYWViVSI7NU1AXECBYg0A0uRpCIwP1LJF4sM9bBxOfoiAGzeADRuAvXvxoJwM//R2gnetNIQr4mFmaIZ+dfthmMsw9KjdA6aGpmJbzNEm+uxh493c6iU9Hdi7F/jrL+D6dcDMjMW6ff89i3VTM37RfvB+7I3dj3cjKCEIRlIj9HTuiREuI9Cvbj+YGZm91UZf48C4h413iQLggo1TCFlZbKTnmjUID7yLXS2M8U87KzwyjIOh1BA9avfAcJfh6Fe3X6mNneAUgW+/ZctNm4rX/Chrv6lv8dqXiLIwkEQs7t0D/viDDVbIyADatmXnefBgwFC9nVlEhDuRd7D70W54+3kjIiUC1ibWGFJ/CL78+Et0rN4RUolUr+PA9Nn2ksIFWy64YOPkIToa2LgR6Zs24IBdNLzaW+CcQzoIhLZV22JU41H4ouEXqGBeQWxLOZySkT/lSP5ps0pbUmQxSEgA/v6beegDAwEnJ2DSJOCbb9Q2SCE3SkGJCyEXsOPhDuz3349UWSqq2VTDqEajYG5kDo8rHmrzUmm7a1VfvYMlhQu2XHDBxgEA+PuDVizHzXM7sNVFDu8mBkgxUKJWuVr4svGXGNl4JGqXry22lUWHd3dxikJu0ZZNGRdrGhEiggD8+y9L/3LuHGBuDnz9NTv3H32kHsPzkSZLw+Fnh7Hj4Q6cDjr93qS8M9vPhEc3jyJtWyzxlCpLhc9jHwTGB6J2+doY6jK01HrWsuGCLRdcsJVxrl1D7PL52B51Cp7NJfCvQDA3MMPnLl/g6yZfq7oT9A7e3aUd9LlLNBsiQJrrGheEMivWtCJEHjxgwm3XLpYqZMgQYMYMoFkz9Wy/ACJSIvDD8R9w8OlBUAHTbH+Ih60sd0+KQWGCTQ+fShxOMRAE0NGjuNi3MUasbA9Hl1OY1h2wadgMnn09EfXza3gN8ELnGp31U6wBzIOWP6N+7u6uAQPEtrB08Pw5K8VtHvccz+OK377ElPbpxj6AlKwU9NrZCymyFFW3YZo8DSky9n2qLFU9O/r4Y2DbNiAkBPj5ZzYFVvPmwKefsnRBGjj3VayqwGuAFyyMLApcrxAU6FG7R5G25ePnU6i3TiABPo99im0np+jo6ZOJwykigoBEby+sGeKE+qf7wbXFIxxvZIrxLSbg4YSHuD7hNsY2G6vzQ9yLRP5plaRSHpvEyQufNisPWhcilSsDixcDL1+y5ePHTLS1bg0cO6b2829lYoUTo07AythKJdyMpcaQSqTIUmahzvo6GHdkHO5G3n3ndniKDd2ACzZO6USpxKNtizHhKzs4PvoaUz6OQDnHj+DVxxMRv8Zhbb8/0cihkdhWqp9s0ZYbLtY42Whi7lQ9RjQhYmPDukSDg1lKkNhYoG9f5nU7dIgJaTXRoVoHREyLwJoeazCz/Uz80fsPJM1Mwn/f/IcRLiOw+/FuNN/UHO23tsfuR7shU749b6qznXOhnjoLIwv9ivXVY7hg45QqFLJM7Pv
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
"draw_means(fig, X_mean, xmin=-1.0, xmax=1.0, ymin=-1.0, ymax=1.0)\n",
"plot_prob(fig, X_mean, X_std, classes, xmin=-1.0, xmax=1.0, ymin=-1.0, ymax=1.0)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFmCAYAAAC4FUTmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXxT15k38N8jW5JtWWbfl7CGhBDigNu0WaCTNBkCDWkSWqbbdDq0mUmbsKUFOuk2bbM0nRdC2mmnKW3nnTbT0jdAlkKzDglk2iaBFBIIIRiCWcxiFmN50WLpvH9cXUuWtdpXPpLv7/v5+BNbkq8eKzb3p3POfY4opUBERERE+ePQXQARERFRX8fARURERJRnDFxEREREecbARURERJRnDFxEREREecbARURERJRnpboL6I7BgwercePG6S6DyNYUgMN+P86FQri4ogLekhLdJRWdsFJ4u6UFThFcXFEBp4jukoioh3bu3HlGKTUk8faiDFzjxo3Djh07dJdBZFuhSASf3rcPbzY04KEJE7By7FjdJRWtbY2NmPf222h1ufBydTVGud26SyKiHhCRumS3c0qRiHLSHongs/v24YmGBvyfiRMZtnpoVv/+eH76dJwKBvE3u3ahPhDQXRIR5QEDFxFlLaIUFu3fj983NOCHEyZg+ZgxukvqEz7crx+enT4dJ4JB3Lh7NxqCQd0lEZHFGLiIKCtKKSyprcV/nTqF744bh69yZMtSV/frh2emTcMhvx9z3noLTe3tuksiIgsxcBFRVr5XV4cfHz+Oe0ePxjcuukh3OX3SRwYMwIbLLsPu5mZ8fM8eBCIR3SURkUUYuIgoo1+cOIFvHz6Mvx82DD+cOBHCq+nyZu6gQfjVJZdga2Mj/uHddxFRSndJRGSBorxKkYh6z/PnzuGf9u/HTQMGYN2UKQxbveBzw4ejPhjEqkOHMK6sDA9OmKC7JCLqIQYuIkppb0sLPrF3Ly7zePDEZZfB6eCgeG9ZMWYMDvv9eOjIEUwqL8eiESN0l0REPcB/PYkoqTPBIG55+21UlJTgD5dfDm8p35/1JhHBjyZNwk0DBuCu997DtsZG3SURUQ8wcBFRF+2RCD75zjuoDwTw5LRpGFNWprskWyp1OLB+6lSMLyvDHXv3os7v110SEXUTAxcRdbHi0CFsbWzEY1Om4KqqKt3l2Fp/pxNPX345gpEIbt+zB23hsO6SiKgbGLiIqJP1p09jzbFjuHvUKPz98OG6yyEAUyoq8JtLL8Wbzc24+8AB3eUQUTcwcBFRh/2trfji/v24uqoKqydO1F0Oxbll8GDcN3YsfnnyJH514oTucogoRwxcRAQAaAuHsWDvXpRF1w3xisTC86/jx+P6/v3x5QMHsKe5WXc5RJQD/otKRACAZbW12NPSgl9fcglGc5F8QSoRwX9PnYp+JSVY+M47aOV6LqKiwcBFRNjY0ICfnTiBr40ZgzmDBukuh9IY5nLh15deindaW7G8tlZ3OUSUJQYuIpurDwTwpf37UeP14vvjx+suh7Jw48CB+NqYMfjZiRN4+swZ3eUQURYYuIhsTCmFL7z7LvyRCH5z6aVwcd1W0fje+PGorqzEov37cToY1F0OEWXAf12JbOyn9fV4/vx5/NvEiZhSUaG7HMqB2+HA45deCl97O+7cvx+Km1wTFTQGLiKbOtTWhq8dPIibBgzAP48cqbsc6oapHg/unzABT509i9+cOqW7HCJKg4GLyIYiSuGL+/ejVATrpkyBiOguibpp6ejRuLqqCktqa3EiENBdDhGlwMBFZEM/P3ECWxsb8W8TJ3KfxCJXIoJfXnIJ2iIRfIVd6IkKFgMXkc3UBwJYcfAgru/fH18cMUJ3OWSBKRUV+M64cdh05gw2NjToLoeIkmDgIrKZew4cQFAp/OziizmV2IcsHz0a1ZWVuPvAATS1t+suh4gSMHAR2cjms2ex8cwZfOuiizCJVyX2KU6HA49dfDFOBoP4xvvv6y6HiBIwcBHZRGs4jLsPHMDUigrcO2aM7nIoDz5QVYUvjxyJfz9+HDt9Pt3lEFEcBi4im3joyBEc9vvxk4svZoPTPuz748djsNOJr7z3HiLszUVUMPivLpENHGxrw8NHjuDTQ4didv/+usuhPOrvdOLhiRPxms+H/3vypO5yiCiKgYvIBu6trYXT4cDDEyfqLoV6weeGDcPVVVVYdegQF9ATFQgGLqI+7qXz5/HU2bO4b+xYjHK7dZdDvcAhgrWTJqEhFML9dXW6yyEiMHAR9WlhpbCsthbjy8qwdPRo3eVQL6qpqsLnhw/HI8eO4WBbm+5yiGyPgYuoD/vViRN4u6UFP5gwAWUlJbrLoV52//jxKBXBqkOHdJdCZHsMXER9VEs4jG8ePowPV1VhwZAhusshDUa63fjamDF4oqEBf75wQXc5RLamPXCJSJmIvC4iu0Vkr4j8q+6aiPqCNUeP4mQwiH+bOJEd5W3sq2PGYLjLhRWHDkGxTQSRNtoDF4AAgOuVUlcAqAYwR0Q+pLkmoqJ2JhjEw0eP4uODB+Pqfv10l0MaVZaW4tsXXYRXL1zA5rNndZdDZFvaA5cyNEe/dEY/+DaMqAceOHIELeEw7h8/XncpVAAWjRiByeXl+Jf332czVCJNtAcuABCREhHZBeA0gBeUUq/promoWB0PBPCT48fx98OHY6rHo7scKgBOhwPfHTcOb7e0YP3p07rLIbKlgghcSqmwUqoawGgAHxSRaYmPEZE7RWSHiOxoaGjo/SKJisT9dXWIAPjWRRfpLoUKyCeHDsXlHg++c/gw2iMR3eUQ2U5BBC6TUqoRwMsA5iS57zGlVI1SqmYIr7giSqrO78e6EyewaMQIjC8v110OFRCHCL47bhzea2vD4xzlIup12gOXiAwRkf7Rz8sBfBTAu3qrIipOD9TVQQD8y9ixukuhAnTr4MG4srIS3+MoF1Gv0x64AIwAsFVE3gLwBow1XH/QXBNR0Tni9+NXJ09i0YgRGFNWprscKkAigm+PG4eDfj9HuYh6WanuApRSbwG4UncdRMXuB0eOAABWcXSL0pg/aBCu8HjwQF0dPjtsGErYo42oVxTCCBcR9dCJQAC/OHECnx8+HGM5ukVpiAjuu+givNfWhg28AImo1zBwEfUBa44dQ0gprBwzRncpVARuHzIEl1RU4IG6OnafJ+olDFxERa4xFMJP6+uxcOhQTKqo0F0OFYESEawcMwa7W1rw7LlzusshsgUGLqIi99P6ejSHw1jB0S3KwaeHDcNot7tj7R8R5RcDF1ER84fDWHvsGG4aMADVXq/ucqiIuBwOLBs9Gq9cuIDXm5p0l0PU5zFwERWx/z59GqdCIXyNo1vUDV8aMQL9Skrwf44e1V0KUZ/HwEVUpJRSWH30KK7weHDDgAG6y6Ei5C0txZ0jR2JDQwPq/H7d5RD1aQxcREXqhfPnsbe1FcvGjIGwlxJ10z2jRgEAHj12THMlRH0bAxdRkVp77BiGOZ34u6FDdZdCRWxMWRk+MXQofnHiBJrb23WXQ9RnMXARFaEDra3Ycu4c7ho1Cm4H/4ypZ5aMGoUL4TD+69Qp3aUQ9Vn8l5qoCP34+HE4RfBPI0boLoX6gKuqqlDj9eLHx4+zESpRnjBwERWZ5vZ2/OfJk/jEkCEY7nbrLof6ABHBPaNGYV9rK/6nsVF3OUR9EgMXUZF5/PRpNIXDuDu62JnICp8cMgSDSkvxk+PHdZdC1CcxcBEVEaUUfnr8OK7wePChqird5VAfUlZSgkUjRuCpM2dwPBDQXQ5Rn8PARVREXmtqwu6WFvzzyJFsBUGW+6eRIxEGsO7ECd2lEPU5DFxEReRnJ06gsqQEnxk2THcp1AdNKC/HTQMG4BcnTiDMxfNElmLgIioSF9rbsf70aXxq6FB4S0t1l0N91JdGjMDRQADPnTunuxSiPoWBi6hI/PbUKbRFIvgSW0FQHs0fPBhDnU78nNOKRJZi4CIqEutOnMB0jwc1Xq/uUqgPczkc+Pzw4XjmzBmcCgZ1l0PUZzBwERWBt5ubsbO5GYt
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
"plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=-4.0, xmax=4.0, ymin=-4.0, ymax=4.0)"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"theta = [[-0.31582268]\n",
" [ 0.43496774]\n",
" [-0.21840373]\n",
" [-7.88802319]\n",
" [22.73897346]\n",
" [-4.43682364]]\n"
]
}
],
"source": [
"# Uruchomienie metody gradientu prostego dla regresji logistycznej\n",
"theta_start = np.matrix(np.zeros(Xbnp.shape[1])).reshape(Xbnp.shape[1], 1)\n",
"theta, errors = GD(h, J, dJ, theta_start, Xbnp, Ybn, \n",
" alpha=0.05, eps=10**-7, maxSteps=100000)\n",
"print(r'theta = {}'.format(theta))"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFmCAYAAAC4FUTmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzde5xT5Z0/8M+TTDKX5ASYOxcRFeoFlMsMqC21tdqqsAqCLSrebf1t2xWE7qr9dbfb32613XYLYtu9WLrrBazUctOC9VbrtQIzFAS8gRRRYDLDMJCTTCbX5/fHyUkymSSTzCRzkszn/XrlJbmdPHMGZj4+z/d8HyGlBBERERHlj8noARARERGVOgYuIiIiojxj4CIiIiLKMwYuIiIiojxj4CIiIiLKMwYuIiIiojwrM3oAA1FbWysnTJhg9DCIiIpG0BWEd78XVZ+pglkxGz0copLV2tp6XEpZl/h4UQauCRMmoKWlxehhEBEVjb0L9uJU1ylc/M7FMJVzcYMoX4QQHyd7nP/qiIhKnP+4H53PdqLxtkaGLSKD8F8eEVGJ6/hdB2RQon5xvdFDIRq2GLiIiEqc83Enqs6tgn2q3eihEA1bDFxERCXMvccN159dGP310RBCGD0comGLgYuIqIQ5n3BClAk03NJg9FCIhjUGLiKiEiXDEu1PtWPUFaNgrbUaPRyiYY2Bi4ioRHW93AXfJz40LObsFpHRGLiIiErUsUeOwVJrQd2CPj0YiWiIMXAREZWgoCuI488eR/0N9ey9RVQA+K+QiKgEdazvgPRJ1N/A3ltEhYCBi4ioBB1bfQyVn6mE4yKH0UMhIjBwERGVnO4PuuF6y4XR32DvLaJCwcBFRFRinE86AQFenUhUQBi4iIhKiAxLONc6MfLSkSgfXW70cIgogoGLiKiEnHzlJHo+6sHoO0YbPRQiisPARURUQtoebUPZyDLULqw1eihEFIeBi4ioRIS6Qzi+6ThqF9bCXGE2ejhEFIeBi4ioRBzfeBwhdwgNN7FYnqjQMHAREZWIo48cReXESoy8ZKTRQyGiBAxcREQlwHvIi1OvnULjbY0QJvbeIio0DFxERCWg/TftAID6G7mVD1EhYuAiIipyUkq0PdqGEbNHoPKMSqOHQ0RJMHARERW5U6+fgvdDL0Z/g723iAoVAxcRUZFzrnXCVGVC3cI6o4dCRCkwcBERFbGwL4yOpztQO78WZht7bxEVKgYuIqIi1rGhA8GuIBpvazR6KESUBgMXEVERa/vfNlRMqMCoy0YZPRQiSoOBi4ioSPnafOh6uQv1i+vZe4uowDFwEREVqfYn24Ew0LCYW/kQFToGLiKiIiSlxLHVx+C4yAHbuTajh0NE/WDgIiIqQmqLiu73utF4B4vliYoBAxcRURFyrnVCWAXqvsreW0TFwPDAJYSoEEJsF0LsFkLsE0L8P6PHRERUyML+MNqfbEfN39TAMtJi9HCIKANlRg8AgA/Al6SUbiGEBcAbQojnpJRvGz0wIqJC1PlsJwIdAYz+OrfyISoWhgcuKaUE4I7ctURu0rgREREVtrYn2mAdbUX1V6qNHgoRZcjwJUUAEEKYhRC7ALQDeFFKuc3oMRERFaLAiQBOPHcC9dfXQ5jZe4uoWBRE4JJShqSU0wCMAzBLCDEl8TVCiLuEEC1CiJaOjo6hHyQRUQFwrnVC+iUabmHvLaJiUhCBSyelPAngTwCuTPLcI1LKZillc10dr8ohouGp7X/aYG+yQ5mmGD0UIsqC4YFLCFEnhBgZ+XMlgMsBvG/sqIiICo9nnwfuXW403sLeW0TFxvCieQCjATwmhDBDC4C/lVL+3uAxEREVHOcaJ2AG6hfVGz0UIsqS4YFLSvkOgOlGj4OIqJCFg2G0PdqGmqtqYG2wGj0cIsqS4UuKRETUv64XuuBv86PxTi4nEhUjBi4ioiLgXOtE2agy1MypMXooRDQADFxERAUu6Ari+MbjqF9UD5OVP7aJihH/5RIRFbj2p9oR9obReDuXE4mKFQMXEVGBa3usDVWTq6DMZO8tomLFwEVEVMC8f/XC9ZYLDYsbIAS38iEqVgxcREQFrO2xNkAADYu5lQ9RMWPgIiIqUDIs0fZoG0ZdPgoV4yuMHg4RDQIDFxFRgTr1+in4Pvah8TYWyxMVOwYuIqIC5VzjhMlmQu28WqOHQkSDxMBFRFSAQp4Q2te1o+66OphtZqOHQ0SDxMBFRFSAOjZ0IKSGMPqO0UYPhYhygIGLiKgAOdc4UTGhAiM+P8LooRBRDjBwEREVGN9RH7pe6kL94nr23iIqEQxcREQFpu3xNiAMNN7CqxOJSgUDFxFRAZFSwvmYEyNmj0DVZ6qMHg4R5QgDFxFRAXHvdKP7/W403MzO8kSlhIGLiKiAtD3eBmEVqPtqndFDIaIcYuAiIioQYV8YzjVO1M6rhWWUxejhEFEOMXARERWIzuc6ETwRROPtLJYnKjUMXEREBcK5xglLvQWjvjzK6KEQUY4xcBERFYBAZwCdz3ai/oZ6mMr4o5mo1PBfNRFRAXA+6YT0S4y+nVv5EJUiBi4iogLgfMIJ21Qb7FPtRg+FiPKAgYuIyGDdH3ZD3aGi4Sb23iIqVQxcREQGa3u0DTABDTcycBGVKgYuIiIDyZBE2+NtqL6yGuVjyo0eDhHlCQMXEZGBTr56Ev4jfm5UTVTiGLiIiAzkfMIJs2JGzdU1Rg+FiPKIgYuIyCBBdxDtT7ej7mt1MFeZjR4OEeWR4YFLCHGaEOIVIcR7Qoh9QoilRo+JiGgoHN9wHGFPGI23cTmRqNSVGT0AAEEA35FS7hRCKABahRAvSinfNXpgRET55FzjRMUZFRjxuRFGD4WI8szwGS4p5TEp5c7In1UA7wEYa+yoiIjyy3fEh66Xu9CwuAFCCKOHQ0R5ZnjgiieEmABgOoBtxo6EiCi/2p5oA8JAw63svUU0HBRM4BJC2AGsB3CPlNKV5Pm7hBAtQoiWjo6OoR8gEVGOSCnhfNwJx+ccqJpYZfRwiGgIFETgEkJYoIWttVLKDcleI6V8RErZLKVsrqurG9oBEhHlkHunG93vdaPxZhbLEw0XhgcuoRUv/BrAe1LKFUaPh4go39qeaIOwCtR9lf/zSDRcGB64AHwOwM0AviSE2BW5zTF6UERE+RD2h9G+th2182phqbYYPRwiGiKGt4WQUr4BgJfoENGwcOL5EwgcD6DhFhbLEw0nhTDDRUQ0bDjXOGGptaD6imqjh0JEQ4iBi4hoiAS6Aji++Tjqr6+HycIfv0TDCf/FExENkY6nOyB9kr23iIYhBi4ioiHifMKJqnOroDQpRg+FiIYYAxcR0RDw/tWLU2+cQsNN3MqHaDhi4CIiGgLOJ5wAgIbFXE4kGo4YuIiI8kxKCecTToy8dCQqTq8wejhEZAAGLiKiPHNtc8F7wIuGmzm7RTRcMXAREeWZ8wknTBUm1C3kVj5EwxUDFxFRHoV9YbQ/1Y7a+bUocxi+uQcRGYSBi4gojzqf60TwRJBb+RANcwxcRER55FzjhKXeglFfHmX0UIjIQAxcRER5EugKoPPZTm0rnzL+uCUazvgTgIgoTzp+2wHpl2i8pdHooRCRwVjBSUSUJ8412lY+9hl2o4dCRHkmpUTIE0r5PAMXEVEeeA9pW/mc8cAZ3MqHqAT52nxwt7rh2u6C620X1BYVIz4/IuXrGbiIiPKgfW07AKD+xnqDR0JEgxV0BaG2qtpthwrXWy74PvVpT5oA2/k21C2sw8hLRwKbkx+DgYuIKMeklGh7vA0jLhmBygmVRg+HiLIQDoTheccD1zZt1kptVeHZ6wHC2vPlp5fD8VkHHBc5oDQpsM+wo8weF6duTH5cBi4iohxTW1R4P/TitH84zeihEFEaMizhPeCFukPFqT+fgtqiwrPbg3CPlq4sdRbYZ9hRe20tRlw8AvYZdljrrAP6LAYuIqIcc651QlgFt/IhKjC+Iz5t1qolErB2qAi5tEJ3k80EpVnBmG+OgeNCB5QLFVScXpGzGkwGLiKiHAoHta18aubWwDLKYvRwiIa
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xbnp, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
"plot_decision_boundary(fig, theta, Xbnp, xmin=-4.0, xmax=4.0)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n",
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFmCAYAAAC4FUTmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeXyU1b0/8M+ZPbNl30jYQdmECAFcEKxWq1DFBaVK1Xpd7rVVEbxFfrft7Wprvffi1ltvK22tSitWQKXiUi0CbixBQJElAVmTTPbMvj7n98czT2YymZnMJDN5Zibf9+uVF8ksz5wJSeYz53yf72GccxBCCCGEkPRRyD0AQgghhJBcR4GLEEIIISTNKHARQgghhKQZBS5CCCGEkDSjwEUIIYQQkmYUuAghhBBC0kwl9wAGoqSkhI8ZM0buYRAyrHEAJ9xudPh8OEevh0mplHtIWSfAOT53OKBmDOfo9VAzlrbH8lv9cNW7oD9HD6WJ/q8ISZe6uro2znlp5OVZGbjGjBmDPXv2yD0MQoYtnyDg1kOHsLe1FY+NG4dHRo2Se0hZa3tXFxZ9/jmcGg0+qKlBlVablsf54oYv0N3ZjQsPXAiFlhY3CEkXxtjJaJfTbx0hJCl+QcC3Dx3Cq62t+J/x4ylsDdL8ggK8O306LF4vvrZvHxo9npQ/hrfNi/bN7aj4TgWFLUJkQr95hJCECZzjriNH8EprK/5r3DisHDlS7iHlhAvz8/H29Olo8npxxf79aPV6U3r81ldbwf0cZcvKUnpcQkjiKHARQhLCOcfyhga8YLHgZ2PG4N9pZiulLsrPx+Zp03Dc7cZVBw7A6ven7NiWFyzQT9bDOMOYsmMSQpJDgYsQkpCfnzyJ35w9i4erq/HD0aPlHk5OurSwEBumTsV+ux3XffEFPIIw6GPaP7fD+okVlXdXgqWxKJ8QEh8FLkJIv/7Q1IQfnziB28vL8V/jx9MLdxotLC7GnyZNwtauLnzn8GEInA/qeJYXLWAqhvLby1M0QkLIQGTlWYqEkKHzbkcH/vXIEVxZWIi1555LYWsI3FZRgUavF6uPH8cYnQ6/GjduQMfhAkfLyy0o/EYhNCWaFI+SEJIMClyEkJgOOhy46eBBTDUY8OrUqVAraFJ8qKwaORIn3G48duoUJuTl4a7KyqSP0fl+JzynPRj364EFNkJI6tBfT0JIVG1eL675/HPolUr8/bzzYFLR+7OhxBjDMxMm4MrCQtx39Ci2d3UlfYym3zdBXaJG6Q19ejASQoYYBS5CSB9+QcDNX36JRo8Hr02bhpE6ndxDGpZUCgXWT5mCsTodbjx4ECfd7oTv67f60ba5DWW3lFHvLUIyAP0WEkL6WHX8OLZ2deH3556LuWaz3MMZ1grUarxx3nnwCgJu+OILuAKBhO7XuqEV3MNRdgv13iIkE1DgIoT0sr6lBU+cOYP7q6pwe0WF3MMhAM7V6/HS5MnYa7fj/vr6hO7TtLYJeefkwXwBBWZCMgEFLkJIjyNOJ+4+cgQXmc1YM3683MMhYa4pKcEPRo3CH5ub8aempri3dR5xwvqxFZX3UO8tQjIFBS5CCADAFQhgycGD0AXrhuiMxMzz07FjcVlBAb5bX48v7PaYt7P8xQIwoHwZ9d4iJFPQX1RCCABgRUMDvnA48OKkSaimIvmMpGQMf5kyBflKJZZ++SWcUeq5uMBhWWdBwdcKoK3UyjBKQkg0FLgIIdjY2orfNTXh+yNH4qriYrmHQ+Io12jw4uTJ+NLpxMqGhj7Xd23tgvuYG5X/knzfLkJI+lDgImSYa/R4cM+RI6g1mfCLsWPlHg5JwBVFRfj+yJH4XVMT3mhr63Vd8/PNUBWoUHJjiUyjI4REQ4GLkGGMc447Dx+GWxDw0uTJ0FDdVtb4+dixqDEacdeRI2jxegEAAWcAba+1oeTGEih1SplHSAgJR39dCRnGnm1sxLudnfjv8eNxrl4v93BIErQKBdZNngyb3497jxwB5xxtm9oQsAdQ/m0qlick01DgImSYOu5y4fvHjuHKwkL824gRcg+HDMAUgwGPjhuH19vb8ZLFgsbfNyJvQh4K5hfIPTRCSAQKXIQMQwLnuPvIEagYw9pzz6VeTVnsoepqXGQ24xc76tG9vRsV36kAU9D/JyGZhgIXIcPQc01N2NrVhf8eP572ScxySsbwx0mTcME/BABA2a20lQ8hmYgCFyHDTKPHg1XHjuGyggLcXUmtA3LBOXl5+Nb7Khw4D3jLGLshKiFEPhS4CBlmHqivh5dz/O6cc2gpMUd07+hG3nEfPr9Ri/vr62H1++UeEiEkAgUuQoaRN9vbsbGtDf85ejQm0FmJOcOyzgKFXoF77p2EZq8XP/zqK7mHRAiJQIGLkGHCGQjg/vp6TNHr8fDIkXIPh6SI4BHQ+rdWlFxXgjmVhfjuiBH437NnUWezyT00QkgYClyEDBOPnTqFE243fnvOOdTgNIe0bmyFv9OPiu9UAAB+MXYsStRqfO/oUQicyzw6QoiE/uoSMgwcc7nw+KlTuLWsDAsKqEdTLmn+UzN0Y3QovLwQAFCgVuPx8eOx02bDn5ubZR4dIURCgYuQYeDhhgaoFQo8Pn683EMhKeRp9qDz/U6ULSvr1XvrtvJyXGQ2Y/Xx41RAT0iGoMBFSI57v7MTr7e34wejRqFKq5V7OCSFWv7SAghA+bLeW/koGMNTEyag1efDoydPyjQ6Qkg4ClyE5LAA51jR0ICxOh0eqq6WezgkhTjnaFrbBPMFZhgmG/pcX2s2446KCjx55gyOuVwyjJAQEo4CFyE57E9NTfjc4cCvx42DTqmUezgkhWx7bHAecqLiXypi3ubRsWOhYgyrjx8fwpERQqKhwEVIjnIEAvjRiRO40GzGktJSuYdDUsyyzgKmYSi9Kfb/7QitFt8fORKvtrbik+7uIRwdISSS7IGLMaZjjO1ijO1njB1kjP1U7jERkgueOH0azV4v/nv8eOoon2MEr4CWv7Sg+JvFUBeo497230eORIVGg1XHj4NTmwhCZCN74ALgAXAZ53wGgBoAVzHGLpB5TIRktTavF4+fPo3rSkpwUX6+3MMhKda+uR2+Vh8q7+5/L0yjSoUfjx6ND7u78WZ7+xCMjhASjeyBi4uk3VbVwQ96G0bIIPzy1Ck4AgE8Onas3EMhadD8YjM0lRoUXVmU0O3vqqzExLw8/MdXX1EzVEJkInvgAgDGmJIxtg9AC4B/cM53yj0mQrLVWY8Hvz17FrdXVGCKoe/ZayS7+Tp86HirA2XfKgNTJrZUrFYo8LMxY/C5w4H1LS1pHiEhJJqMCFyc8wDnvAZANYA5jLFpkbdhjN3LGNvDGNvT2to69IMkJEs8evIkBAD/OXq03EMhaWBZZwH3cpTfXt7/jcPcXFaG8wwG/OTECfgFIU2jI4TEkhGBS8I57wLwAYCrolz3e855Lee8tpTOuCIkqpNuN9Y2NeGuykqMzcuTezgkDZr/2AzjLCNMNaak7qdgDD8bMwZHXS6so1kuQoac7IGLMVbKGCsIfp4H4OsADss7KkKy0y9PngQD8B+jRsk9FJIGjoMO2PfZUXF77N5b8SwuKcH5RiN+TrNchAw52QMXgEoAWxljBwDshljD9XeZx0RI1jnlduNPzc24q7ISI3U6uYdD0sDykgVQAmVLywZ0f8YYfjxmDI653TTLRcgQU8k9AM75AQDnyz0OQrLdr0+dAgCsptmtnCT4BTQ/34ziq4uhKdcM+DjXFhdjhsGAX548iW+Xl0NJPdoIGRKZMMNFCBmkJo8Hf2hqwh0VFRhFs1s5qfPdTnibvai4a2DLiRLGGH4wejSOulzYQCcgETJkKHARkgOeOHMGPs7xyMiRcg+FpIllnQWqQhWKFxYP+lg3lJZikl6PX548Sd3nCRkiFLgIyXJdPh+ebWzE0rIyTNDr5R4OSQO/1Y+2TW0oW1oGhWbwf7aVjOGRkSOx3+HA2x0dKRghIaQ/FLgIyXLPNjbCHghgFc1u5ayWl1sguARU3Dm45cRwt5aXo1qr7an9I4SkFwUuQrK
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
"plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=-4.0, xmax=4.0, ymin=-4.0, ymax=4.0)\n",
"plot_decision_boundary(fig, theta, Xbnp, xmin=-4.0, xmax=4.0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Naiwny klasyfikator Bayesa nie działa, jeżeli dane nie różnią się ani średnią, ani odchyleniem standardowym."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 8.2. Algorytm $k$ najbliższych sąsiadów"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### KNN intuicja"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Do której kategorii powinien należeć punkt oznaczony gwiazdką?"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Przydatne importy\n",
"\n",
"import ipywidgets as widgets\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Wczytanie danych (gatunki kosaćców)\n",
"\n",
"data_iris = pandas.read_csv('iris.csv')\n",
"data_iris_setosa = pandas.DataFrame()\n",
"data_iris_setosa['dł. płatka'] = data_iris['pl'] # \"pl\" oznacza \"petal length\"\n",
"data_iris_setosa['szer. płatka'] = data_iris['pw'] # \"pw\" oznacza \"petal width\"\n",
"data_iris_setosa['Iris setosa?'] = data_iris['Gatunek'].apply(lambda x: 1 if x=='Iris-setosa' else 0)\n",
"\n",
"m, n_plus_1 = data_iris_setosa.values.shape\n",
"n = n_plus_1 - 1\n",
"Xn = data_iris_setosa.values[:, 0:n].reshape(m, n)\n",
"\n",
"X = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
"Y = np.matrix(data_iris_setosa.values[:, 2]).reshape(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Wykres danych (wersja macierzowa)\n",
"def plot_data_for_classification(X, Y, xlabel, ylabel): \n",
" fig = plt.figure(figsize=(16*.6, 9*.6))\n",
" ax = fig.add_subplot(111)\n",
" fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
" X = X.tolist()\n",
" Y = Y.tolist()\n",
" X1n = [x[1] for x, y in zip(X, Y) if y[0] == 0]\n",
" X1p = [x[1] for x, y in zip(X, Y) if y[0] == 1]\n",
" X2n = [x[2] for x, y in zip(X, Y) if y[0] == 0]\n",
" X2p = [x[2] for x, y in zip(X, Y) if y[0] == 1]\n",
" ax.scatter(X1n, X2n, c='r', marker='x', s=50, label='Dane')\n",
" ax.scatter(X1p, X2p, c='g', marker='o', s=50, label='Dane')\n",
" \n",
" ax.set_xlabel(xlabel)\n",
" ax.set_ylabel(ylabel)\n",
" ax.margins(.05, .05)\n",
" return fig"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"def plot_new_example(fig, x, y):\n",
" ax = fig.axes[0]\n",
" ax.scatter([x], [y], c='k', marker='*', s=100, label='?')"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFkCAYAAAD13eXtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dfZRddX3v8c83mYAhGURJbIAQw0PEAtVJJiJWLOAzYGEyhQRqK7YsU1psR7QrEG/lqr01rNx7nQ6WYilayL2KITIEFmDxAdRwhUoSEiQ8BZSUNCCCApPISjLnfO8f+xzmzMw5Z++Zs5/OOe/XWnvN7Ifz29/9myzOl71/+/c1dxcAAACSMyXrAAAAAFodCRcAAEDCSLgAAAASRsIFAACQMBIuAACAhJFwAQAAJKwj6wAmatasWT5//vyswwAAABhl06ZNL7j77Gr7mi7hmj9/vjZu3Jh1GAAAAKOY2Y5a+3ikCAAAkDASLgAAgISRcAEAACSMhAsAACBhJFwAAAAJI+ECAABIGAkXAABAwki4AAAAEpZYwmVmR5rZPWb2qJltM7O+KsecZmYvm9mW0nJFUvEAAJAId+mWW4KfUbYndY404sCkJXmHa1jSZ9z9dyWdLOkSMzu+ynEb3L2rtHwxwXgAAIjf+vVSb6906aUjSY17sN7bG+xP4xxpxIFJS6y0j7s/K+nZ0u9DZvaopCMkPZLUOQEASF1Pj9TXJw0MBOv9/UGSMzAQbO/pSe8cSceBSTNP4Rajmc2X9GNJJ7r7KxXbT5N0s6SdknZJ+lt331avrcWLFzu1FAEAuVK+k1ROdqQgyenvl8zSO0cacaAmM9vk7our7ks64TKzmZJ+JOkf3H1wzL6DJRXdfbeZnSlpwN0XVGljuaTlkjRv3rzuHTtq1oYEACAb7tKUipE6xWL8SU6Uc6QRB6qql3Al+paimU1TcAfrG2OTLUly91fcfXfp9zslTTOzWVWOu9bdF7v74tmzZycZMgAAE1e+s1SpcixVWudIIw5MSpJvKZqkr0l61N2/XOOYOaXjZGYnleJ5MamYAACIXeVjvL6+4I5SeSxVXMlOlHOkEQcmLbFB85LeLelPJf3MzLaUtn1W0jxJcvevSjpX0l+a2bCkVyWd72kMKgMAIC7r148kOeWxUv39wb6BAenUU6UlS5I/R/n3JOPApKUyaD5ODJoHAOSKe5AQ9fSMHitVa3tS55CSjwN1ZTpoPm4kXAAAII8yGzQPAAAAEi4AAIDEkXABANLXLHX/ikXpssuCn1G2AzWQcAEA0tcsdf9WrpRWr5a6u0eSq2IxWF+9OtgPREDCBQBIX2VtwHLSlce6f6tWSV1d0pYtI0lXd3ew3tUV7AciSHIeLgAAqhs7R1S59l/e6v5NmSJt2jSSZE2dGmzv6gq2T+G+BaJhWggAQHaape5fsTiSbElSoUCyhXGYFgIAkD/NUvev/BixUuWYLiACEi4AQPqape7f2DFbhcL4MV1ABCRcAID01aoNWE668vSWYjnZKo/Z2rRpJOniLUVExBguAED60qg/GIdiMUiqVq0aP9as2na0NWopAgAAJIxB8wAAABki4QIApC+stE+xGF76J4420riWKOfJSxutJG/94e5NtXR3dzsAoMkNDgYpU1+fe7EYbCsWg3XJfcWK+vsHB+NpI41riXKevLTRSjLoD0kbvUb+knkCNdGFhAsAWkDlF1/5C7FyvVCov79YjKeNNK4lynny0kYryaA/SLgAAPlT+QVYXmrdjai2P6420riWZmqjlaTcH/USLt5SBABkx0NK+4Ttj6uNOMRxnry00UpS7A/eUgQA5I+HlPYJ2x9XG3GI4zx5aaOV5Kk/at36yuvCI0UAaAGM4cpnG62EMVwkXADQ9nhLMZ9ttBLeUiThAoC2VywGX3hj7zKUtxcK9feX73A12kYa1xL17lQe2mglGfRHvYSLQfMAAAAxYNA8AABAhki4AAAAEkbCBQBALR5DPb442mg3LdhnJFwAANSyfr3U21t9bq/e3mB/Gm20mxbss46sAwAAILd6eqS+PmlgIFjv7w++9AcGgu09Pem00W5asM94SxEAgHrKd1bKX/5S8KXf3x+9REwcbbSbJuyzem8pknABABDGqXGYiSbrM6aFAABgssp3WipR4zB5LdZnJFwAANRS+Virry+4w1IeWxT1yz+ONtpNC/YZg+YBAKhl/fqRL/3y2KH+/mDfwIB06qnSkiXJt9FuWrDPGMMFAEAt7sGXf0/P6LFDtbYn1Ua7adI+Y9A8AABAwhg0DwAAkCESLgAAgISRcAEAWlOUenxhxxSLjbdBvcXR2ulaK5BwAQBaU5R6fGHHrFzZeBvUWxytna61krs31dLd3e0AAIQqFt37+oJ7UH191dfDjikUGm+jWIwn1lbRwtcqaaPXyF8yT6AmupBwAQAiq/wyLy9jv9TDjomjjbhibRUteq31Ei6mhQAAtDaPUI8v7Jg42ogr1lbRgtfKtBAAgPbkEerxhR0TRxtxxdoq2ulay2rd+srrwiNFAEAkjOHKpxa+VjGGCwDQdgYHx3+JV365Dw6GH7NiReNtDA7GE2uraOFrrZdwMYYLANCaPEI9Pqn+MeecI916a2NtUG9xtBa+VmopAgAAJIxB8wAAABki4QIAAEhYYgmXmR1pZveY2aNmts3M+qocY2Z2lZk9aWYPmdmipOIBAMTEU6g/GKUNpC/s7xbX3yWt86QoyTtcw5I+4+6/K+lkSZeY2fFjjjlD0oLSslzSNQnGAwCIQxr1B6O0gfSlVQexFest1np9Me5F0q2SPjBm279IuqBi/XFJh9Vrh2khACBjacxdFaUNpC+tObSadK4uZT0Pl6T5kv5T0sFjtt8u6ZSK9R9IWlyvLRIuAMiBNOoPtmi9vaaX1t+lCf/+9RKuxKeFMLOZkn4k6R/cfXDMvjskrXL3e0vrP5C0wt03jTluuYJHjpo3b173jh07Eo0ZABCBp1B/MEobSF9af5cm+/tnNi2EmU2TdLOkb4xNtkp2SjqyYn2upF1jD3L3a919sbsvnj17djLBAgCiK4+nqRR3/cEobSB9af1dWu3vX+vWV6OLJJO0RtI/1jnmLEnfKR17sqSfhrXLI0UAyBhjuNoXY7jqUhZjuCSdIsklPSRpS2k5U9LFki72kaTsaklPSfqZQsZvOQkXAGQvjfqDUdpA+tKqg9ik9RbrJVyU9gEATIyH1MKLo/5glDZyPJanZYX97eP6u6R1nphRSxEAACBh1FIEAADIEAkXAABAwki4AADx8gh18IpF6bLLgp+Vam2f7HnaCf2RayRcAIB4RamDt3KltHq11N09klwVi8H66tXB/jjO007oj3yr9fpiXhemhQCAnIsyh1Kh4N7VFWzr6qq+Hsd52gn9kTkxLQQAIFXlOysDAyPb+vqk/v6R1/nLd7S2bBk5pqtL2rRpdDmXRs/TTuiPTDEtBAAgfR6hDl6xKE2dOrJeKERPtiZynnZCf2SGaSEAAOkq32mpNLYOXvkOV6XKMV1xnaed0B+5RcIFAIhX5WOtvr4ggerrC9bLX/6VjxO7uoI7W11dwXrUpCvKedoJ/ZFvtQZ35XVh0DwA5FyUOnjlWomVA+QrB86vWBHPedoJ/ZE5MWgeAJAaj1AHzz2Y+mHVqvHjjaptn+x52mnsEv2ROQbNAwAAJIxB8wAAABki4QIAjCgUpCVLgp+1trdSWZ6waykUGo8zjmtNq7/y8ndpRbUGd+V1YdA8ACSopycYYD1rlvvwcLBteDhYl4L9rTTgPexayv3RSJxxXGta/ZWXv0uTUp1B85knUBNdSLgAIEGVyVU56Rq73kplecKuZXi48TjjuNa0+isvf5cmRcIFAIiuMskqL5V3vNxHJyblJWqyVVb5ZV5esvhSD7uWOOLMSxt5Ok8Lqpdw8ZYiAGC8QkHq6BhZHx4eXYJHaq2yPGHXEkeceWkjT+dpMbylCACIrlCQ5swZvW3OnNED6VupLE/YtcQRZ17ayNN52k2tW195XXikCAAJYgwXY7jy8HdpUmIMFwA
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_new_example(fig, 2.8, 0.9)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Wydaje się sensownym przyjąć, że punkt oznaczony gwiazdką powinien być czerwony, ponieważ sąsiednie punkty są czerwone. Najbliższe czerwone punkty są położone bliżej niż najbliższe zielone."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Algorytm oparty na tej intuicji nazywamy algorytmem **$k$ najbliższych sąsiadów** (*$k$ nearest neighbors*, KNN)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Idea (KNN dla $k = 1$):\n",
" 1. Dla nowego przykładu $x'$ znajdź najbliższy przykład $x$ ze zbioru uczącego.\n",
" 1. Jego klasa $y$ to szukana klasa $y'$."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"from scipy.spatial import Voronoi, voronoi_plot_2d\n",
"\n",
"def plot_voronoi(fig, points):\n",
" ax = fig.axes[0]\n",
" vor = Voronoi(points)\n",
" ax.scatter(vor.vertices[:, 0], vor.vertices[:, 1], s=1)\n",
" \n",
" for simplex in vor.ridge_vertices:\n",
" simplex = np.asarray(simplex)\n",
" if np.all(simplex >= 0):\n",
" ax.plot(vor.vertices[simplex, 0], vor.vertices[simplex, 1],\n",
" color='orange', linewidth=1)\n",
" \n",
" xmin, ymin = points.min(axis=0).tolist()[0]\n",
" xmax, ymax = points.max(axis=0).tolist()[0]\n",
" pad = 0.1\n",
" ax.set_xlim(xmin - pad, xmax + pad)\n",
" ax.set_ylim(ymin - pad, ymax + pad)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAl8AAAFkCAYAAAAe6l7uAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydd3hUZfbHvzeNNFoghRJ6rxFCDxAUFFABWbFgxXVFXV2s+ENd2xZW17LWFSu6gtICoUrvCCGBAKGHBBIgIT2kJzNzfn8cxrQp987cuXcG3s/zzAOZee97zr2ZyT3zvuecr0REEAgEAoFAIBBog5feDggEAoFAIBDcSIjgSyAQCAQCgUBDRPAlEAgEAoFAoCEi+BIIBAKBQCDQEBF8CQQCgUAgEGiICL4EAoFAIBAINMRHbweU0rp1a+rUqZPebrgfRceApt0A7wC9PRFYo+gY0LQ74O2vtyfyKb8ISN5AQBtt7VYXAlX5gCQBPsGAf7i29pVQkQUYy4HgrnWeJKDyClCRDQS0BfzDdHNPUIeKywAZgMAO/HNBEhDUAagpAYK72D+eTMDVk4B/BNCklWt9FTAlZwG/VkCTEOtjTDVA8XGgxQBA0ndNKSkpKY+IQu2N87jgq1OnTkhMTNTbDfcjYTbQrBfQ63m9PRFY47dHgdbDgO5P6e2JfC6uAU5/DNyyRVu7FVeAtT2Bm7cAO+8ApuwCfAK19UEuhgpgbS9g5IdA2Jj6rxWfBPY/Bng3AYZ9CzTtankOgeshE7C6KzB6LRAyiJ9bLAF3HwHiOwFTNsoLqIqOA1tjgVvWAC36udJjQflFYP0AYFqK/c//9klApweAzg9q45sVJEm6IGec2Ha8XoiYAGRt1tsLgS3CxwFXdujthTLCRgP5BwBjlbZ2A8KBwHYACAgdCZz9r7b2leATAES9CyQ9zzf4ujTvDUzYA7SbAmwaBpz6D2Ay6uPnjU7OTsC3GdDypvrP+7UA2k4Gzi+WN0+LvsBN7wN77gZqStX3U1BL+o9A5N3yvnh1mQWkfe96n1RCBF/XC+E3A7l7tL9JCuQTHgvk7AA8SVXCrwXQrCeQn6C97bBY4Mp2oP9bwMl/A4Yy7X2QS8d7eXUr/cfGr3l5A71fACb8BmTGAVvGAFdPa+/jjc657/kGLUmNX1N64+7yCBAawzsOnvR59iSIan9ncmg/BSg6ApSed6lbaiGCr+uFJiH8LTtvn96eCKwR1BHwDgKKT+jtiTLCx3EQpJfdFv15O+/M59r7IBdJAgZ9BBx5zfpqSLPuwPgdQMf7gM2jgBP/FqtgWlFzFbi0mrelLBF+M1CVBxQekT/n4E+B4hQg9St1fBTUJ3cvf3FpPVzeeG9/oMN9QPoPrvVLJUTwdT0hth7dn/BxvPrlSYTp5HPYWCBvL2AyAP3eBE594N7bPK2H8e/3xLvWx0heQM9ngdsSgKwNwOaRnheMeyIXlnKA5W8lD9rLG+j8iLLVL58AIGYZcPR1oOCQOn4KakmzsVJpja6zgLSFjbf/3RARfF1PREwAskXw5daEx+qziuQMYTG87Wis1NaufyhXpRUkcZ5N+M3Amc+09UEpA+cDZ78AyjJsjwvuwsUEXWYBW8YCx//JFVsC15AmY/uqy6Oc92Wslj9vsx5A9GfAnnuA6mKnXBTUwVDGW/SdHlJ2XMtBgG9Tzu9zc0TwdT3RegRQcoZL9AXuiXnlywO+mf2ObzOgeV8gb7/2tutuefZ7Azj1IW8huStBkUCPZ4Dk/7M/VvICuj8JTEwCruwENg4HCo+63scbjaungdI0oO0k2+OaduXUjctrlc3f8V6gzUTgwGMi/0stMpZzoU1gW2XHSRIH2efcP/HeZcGXJEmRkiRtlyTppCRJxyVJmmNhTKwkScWSJCVfe7zhKn9uCLz9gNDRQPZWvT0RWCOwPeDbknvSeBJ6VWrW3aZt3ptXd09/qr0fSugzF8jZBeT+Jm98UAdg3K9Ajz8D224Bjr6lbPVFYJu0hdx+wEtGZyVHb9yDPgDKLgCnP7E+hghYubJxgGbteWeRY09rn+QiZ6XSGp0e5Pw+d/6SBteufBkAvEhEvQEMB/BnSZL6WBi3m4iirj3ecaE/NwZi69H98citx1ggRwefw8YAuftqt+T6vwGc/o97b/H4BAED/wkcstB6whqSBHR9DJiUDBQkAhuHiDwiNTAZuQJV7o088m6uGq/IVmbHuwnnfx3/h/UV4lWrgOnTgeefrw1qiPjn6dP5dTWRY09rn+RQmsZfTtvd6djx/qH8pe3CUnX9UhmXBV9ElEVEh679vwTASQDtXGVPcI02t3LwJZa/3Re9qgedITSGc68MFdrabdKK86PyD/LPzXoCbSZx41d3pvODABmBC78oOy6wHTB2DdD7JWD7RK6eFO1jHCd7E682N7f0vd8CvsFA5F1A+v+U2wruDAz7Gth7n+XUj2nTgDlzgI8/rg12nn+ef54zh19XEzn2tPZJDmkLgY73c0DrKJ7Q84uIXP4A0AlABoBmDZ6PBZAP4AiADQD62ptr8ODBJLCByUQU146o+LTengisUXaJaFkIkcmotyfK2DiCKGur9nYTnyNK+Uftz8VniJa3Iqoq1N4XJVzZTbQykqimzLHjyy8T7ZxGtLYPUe4BdX27Udg9g+jMf62/vgiNn7uyi2hNb/5b6ghJLxJtn2z5820yEc2ZY97w48ecOY7bsocce1r7ZNNfI9HKDkT5h5ybx1hNtCKcqPiUOn4pAEAiyYiLXJ5wL0lSMIAVAJ4jooabsIcAdCSigQA+BWBxjVOSpCckSUqUJCkxNzfXtQ57OpIEtBFbj25NYFugSWugyMOSq81NT7Wm4TZts+68JXHqP9r7ooSwGC6COfmBY8cHtAFGxwF9/wrsmgIcnqv9yqMnU1UAZG3ivmpKCI3hbW5HGwtHzWdt0pP/bvyaJAEffVT/uY8+UtZOQQly7Gntky2ubAP8WgIhN9kfawsvX879SluoiluuwKXBlyRJvuDAaxERxTV8nYiuElHptf+vB+ArSVJrC+O+IqJoIooODbWrVykQeV/ujyduPYaP0y/vK29//e23vq8DZz/jm5w7E/Uu56iVX3LseEkCOt0HTD4KlJ0HNkRxDpzAPucXs2yQXwtlx0kSt51wdNvKyxcYtQQ49REXXtTFvK1Xl7r5Vmojx57WPtlCSUd7e3SZxfl+btrI2JXVjhKAbwGcJKIPrYyJuDYOkiQNveaP6JPgLBHjuTJN9A1yXzxR5zF0JFCYrL3Mj19LoGmP2rwvgNsCtJsKnLT4p8V9CO4EdJvNuVvO4B8GxCzlRP49d7OOpDvLLbkDzlTMdX4YyFjq+EpjUCQwfCGwdyZQmcPPNcynMpka51upiRx7Wvtki+oibvNhTYVAKS36AgHtOO/PHZGzN+nIA0AMAAJwFEDytcdkAE8CePLamGcAHAfnfO0HMNLevCLnSybro4hy9ujthcAa5VlES1sQGQ16e6KMTaOILm/S3m7SC0RH36n/XEka585V5mnvjxKqrxLFtSHKO6jOfJV5RHsfIIrvSpS9Q505rzcKjnC+nb3Pl6WcLzNbbyVKX+ScH8mvEW0dz37ExTXOp6qbbxUX55ythsixp7VPtjjzJdGu6SrP+QXn/WkIZOZ8aZJwr+ZDBF8yOfQy0ZE39fZCYIs1vYnyE/X2QhnJrxMdnqe93YtriLaMa/z8/j/p449SUr8h2hSjbhJzZjwX1yQ8TVRdot681wOJz/F71R62gq/0nzlwcgZjDdHmWKKjb10rhopr/B6w9ryzyLGntU+2+HUYf87VpKqAaGlzosp8dee1gdzgS3S4v14ReV/ujyduPeqlTRk62rLEUb/XgNQFQGWe9j4pofOjgKEUyFyu3pztpwC3HwOM5cD6/kD2FvXm9mTKs0BnPseynKEoKHOiWW3kNO61Zk8qyhZePsCoxfwevbIVuOuuxonskmT5eQvUbBiOc3ETUXC1xL5ta/PWfV7OGC0oPslNattMVHdev5asbHB+sbrzqoAIvq5XQmO4ms6dm1He6Hh
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_new_example(fig, 2.8, 0.9)\n",
"plot_voronoi(fig, X[:, 1:])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Podział płaszczyzny jak na powyższym wykresie nazywamy **diagramem Woronoja** (*Voronoi diagram*)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Taki algorytm wyznacza dość efektowne granice klas, zwłaszcza jak na tak prosty algorytm. "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Niestety jest bardzo podatny na obserwacje odstające:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"X_outliers = np.vstack((X, np.matrix([[1.0, 3.9, 1.7]])))\n",
"Y_outliers = np.vstack((Y, np.matrix([[1]])))"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAl8AAAFkCAYAAAAe6l7uAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydd3xUVfr/35NA6C2U0KugSBWCUgUUFFARVCzYFt0Vt4mu/nRdXVfd4lfXlXXVXbFiAaU36b23hN47oZckhPRkZp7fH4eYNuXemTtzB3Ler9e8YOaee85z7kzmPnPO8zwfh4ig0Wg0Go1GowkPUXYboNFoNBqNRlOW0M6XRqPRaDQaTRjRzpdGo9FoNBpNGNHOl0aj0Wg0Gk0Y0c6XRqPRaDQaTRjRzpdGo9FoNBpNGClntwFmqVOnjjRv3txuMyKPSzuh2nUQXcluS65N8lIg5zxUv8FuSzSRTP5lyD4D1dtAzjnIPguVGkLFenZbpgHIPg3ihMpN1fOURKjSFPLToWpL/+eLGy7vhYr1oULt0NqqUaQfhJjaUCHWext3PqTthpodwWHvmlJiYuJFEanrr91V53w1b96chIQEu82IPDaNVo7BDS/Ybcm1idsF8zpA139AgzvstkYTqeRfhhkN4f71EB0DaXthw1MQXQFu+RKqtbLbwrKLuGF2K+jzE8R2Ua9NdMAD22FWcxi60JhDdWk3LO0Ht8+Bmu1DabEm6yTM6wjDdkG5yr7bLh8MzR+FFo+FxzYvOByO40ba6W3Ha4X6A+HMYrutuHaJiob2b8COv4AuTKzxRvnqULUVpG5Tz2u0hYFroNFQWHQL7Pu3cuQ14ef8SvX+1Lqp+OsxNaHhEDg20Vg/NdvBTe/DmgcgP8N6OzWFHP0Wmjzg3/ECaDkKjnwdepssQjtf1wpxt8GFNeDKtduSa5emI8CZDmcW2m2JJpKp0xMurit8HhUNbf8AA9fDiemw5Fa4vN8++8oqh79WN2iHo/Qxszfulk9C3d5qx0H/GAsNIoXvmREaD4VL2yHjWEjNsgrtfF0rVIhVv7KLfulrrCUqGjr8BXa8ob9wNd6p29Pz32H11jBgBTR7GBb3gj3/1Ktg4SL/MpyarbalPBF3G+RehNTtxvvs+hGk7YJDn1ljo6Y4F9aq79w63Y21j64ITR+Go9+E1i6L0M7XtYTeegw9Te4Hdw6cnme3JZpIpU5PdePw5KA7ouD638Odm+DMfFjcE9L2hN/GssbxycrBquglDjoqGlo8aW71q1wl6D0FdrwOKVussVNTyBEfK5XeaDUKjoxX8X0Rjna+riXqD4Sz2vkKKY4o6PAm7NSxXxovVG0J4oKsJN9tbluibi5L+sLuf6iMLU1oOGJg+6rlL1TclyvPeL/V20D8x7DmQchLC8pETRGcmWqLvvnj5s6r1QXKV1PxfRGOdr6uJer0gPQDkJtstyXXNo2HgdsJp+bYbYkmEnE4rqx++QkBcERB62dhUCKcWwkLu0PqjvDYWJa4vB8yjkDDwb7bVWulQjdO/2Su/2YPQYNBsPEp/YPMKpKmqu37yg3NnedwKCf7cOQH3ofM+XI4HE0cDsdyh8Ox1+Fw7HY4HGM8tOnncDjSHA7HtiuPN0JlT5kgOgbq9oGzS+225Nrm59WvN/WXrcYz3uK+PFGlKfRfAG1+C8tuhx1vmlt90fjmyHhVfiDKQGWlQG/cXf4Fmcdh/3+8txGBGTNKf2d4ez1YjIwXbpuMYmSl0hvNH1PxffmXrbXJYkK58uUEXhSRtkB34LcOh+NGD+1Wi0jnK4+3Q2hP2UBvPYaHxveqf0/OstcOTUSSHt2I3IPfkJKRY+wEhwNaPQWDt0FKAizspuOIrMDtUuUKjN7Imzygssazz5obJ7qCiv/a/Xe4uMFzm5kz4b774IUXCp0aEfX8vvvUcSsxMl64bTJCxhFVMLXRPYGdX7EuxPVXcX4RTMicLxE5IyJbrvw/HdgLNArVeJorNLhDOV96RSa0OBzQ4a0rsV+RH9ypCS8/nupABUln6+r/mjuxciPoOwfavgTLB8H213T5mGA4uwgqN4Yann73e6B8VWgyHI5+Z36sqi3gls9h7cOeQz+GDYMxY+DDDwudnRdeUM/HjFHHrcTIeOG2yQhHxkOzR5RDGyhXQ80vEQn5A2gOJAHVS7zeD0gGtgPzgXb++uratatofOB2i0xvJJK2325Lrn3cbpH58SLHp9ptiSbCSM7IFZmAOKc1FsnPDKyTrNMiK4eJ/HSjyIWN1hpYVlg9QuTA/7wfn0Dp186tEpnTVv19B0LiiyLLh4i4XaWPud0iY8YUbPipx5gxgY/lDyPjhdsmn/a6RGY0FUneElw/rjyRaXEiafusscsEQIIY8IscEuIVEofDURVYCfxdRKaXOFYdcItIhsPhGAJ8KCKtPfTxDPAMQNOmTbseP26oen/ZZcMoiI1XMSSa0HJqLmz7IwzZbrummCbCmOhQhXlrtIcOAYazisDxSbDleWjxhFptLaf1Ww2RmwKzW8K9x1QVe09MdMBID/FOc9pAz++hzi3mx3XnqwzWxvfCja+UPi4CUUW+K9xuc+UUzGJkvHDb5I2zS2DLSzBkW/B9bXkJospD53eC78sEDocjUUTi/bUL6d3C4XCUB6YBE0o6XgAicllEMq78fx5Q3uFw1PHQ7jMRiReR+Lp1/epVanTcV/hoOERJXyRNtdsSTSTS+V3Y/yFknQrsfIcDmj8MQ3ZA5jGY39l/FqVGcWyi+vv05nh5w+FQZScC3baKKg+9JsG+sXB+VfFjBdt6RSkab2U1RsYLt02+MFPR3h8tR6l4vwgtZBzKbEcH8CWwV0Q+8NKm/pV2OByOm6/Yo+skBEv9AXBuha4bFA4KYr92vRWxf+QaG6naAq57Brb/Kbh+KtaD3pOh0z+UpmDiC6oWksY7wWTMtXgCkiaDMzuw86s0ge7jYe1IyDmvXisZT+V2l463shIj44XbJl/kXVJlPrypEJilZjuo1EjF/UUiRvYmA3kAvQEBdgDbrjyGAM8Cz15p8ztgNyrmawPQ01+/OubLIPM6i5xfY7cVZQO3W2RBd5GjP9htiSaSKIgnyksTmd5A5OJma/rNuSiy9lGRWa1Ezq6wps9rjZTtIjOaiLicvtt5ivkqYOkdIkcnBGfHttdElg5QdkyfXjqeqmi81fTpwY1VEiPjhdsmXxz4VGTVfRb3+V8V9xdGiJSYL6uJj4+XhIQEu82IfLa+DNGVoeObdltSNjizGBKfgyG7lFSJRlM0nujQF3B0PAxYbV0szcnZsPk3Krao87sqU0+jSHwBylWFTn/13c5TzFcBx36EI1/CbUGEcLidsGwgxPWD9m+o0g3DhhX/DIh4fj1YvPVb9HUIr02+WNgd2r8Oje62rs+8VJjVAoYeUfrHYSAiYr40NqLjvsJL/QFQoTYc/9FuSzSRSMtRkJ8BJyyMDWw8FO7aCa4smNdBBStrIOsMcuATppy/mZTMIIrVNhmmaq1l+pCJ8kdUOeg1EQ6Ng3NLYfjw0s6Mw+H5dQ/kz+/O4emDSLmc7n9sb/0Wfd1Im3CQtlcVqW0wyNp+Y2opZYNjE63t1wK083WtUrc3XNqh9cbCxc+xX2+rX7saTVGioqHrWLUi7TJYeNUIMbWg+9fQ7b+w4SnY+EzEV/YOKeKGTb/kQMW7GXFhKFMSTgTeV3RFJR105JvgbKrUAHpOgHWPQ9bpwPtJmkbu5SSSzl2EBV0heXNwdkUSR76GFo8bUyEwS4TW/NLO17VKuUpK6/HccrstKTvE3QaV6kfkryxNBBDXH2p1hn3/tr7vhoNhyE71/7nt4fQC68e4GtjzHuSlUneAKpI6Ir5JcP21HKW2i4MtpBzXX5X+WftwYD/O0g/B5mdx9ZrKgbY/EtPpdVh5typzY6Uzbwdupypqa1WWY0nibofc8xGnm6qdr2sZvfUYXn5e/fqrXv3SeOamf8K+983L1xghpgbc8hl0/wo2/1rV+8tLtX6cUCMB6g2eXwX7/w29JhFbrQoAsVVigrMlNh6iK8H
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X_outliers, Y_outliers, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
"plot_new_example(fig, 2.8, 0.9)\n",
"plot_voronoi(fig, X_outliers[:, 1:])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Pojedyncza obserwacja odstająca dramatycznie zmienia granice klas."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Aby temu zaradzić, użyjemy więcej niż jednego najbliższego sąsiada ($k > 1$)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Algorytm $k$ najbliższych sąsiadów dla problemu klasyfikacji\n",
"\n",
"1. Dany jest zbiór uczący zawierajacy przykłady $(x_i, y_i)$, gdzie: $x_i$ zestaw cech, $y_i$ klasa.\n",
"1. Dany jest przykład testowy $x'$, dla którego chcemy określić klasę.\n",
"1. Oblicz odległość $d(x', x_i)$ dla każdego przykładu $x_i$ ze zbioru uczącego.\n",
"1. Wybierz $k$ przykładów $x_{i_1}, \\ldots, x_{i_k}$, dla których wyliczona odległość jest najmniejsza.\n",
"1. Jako wynik $y'$ zwróć tę spośrod klas $y_{i_1}, \\ldots, y_{i_k}$, która występuje najczęściej."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Algorytm $k$ najbliższych sąsiadów dla problemu klasyfikacji przykład"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# Odległość euklidesowa\n",
"def euclidean_distance(x1, x2):\n",
" return np.linalg.norm(x1 - x2)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# Algorytm k najbliższych sąsiadów\n",
"def knn(X, Y, x_new, k, distance=euclidean_distance):\n",
" data = np.concatenate((X, Y), axis=1)\n",
" nearest = sorted(\n",
" data, key=lambda xy:distance(xy[0, :-1], x_new))[:k]\n",
" y_nearest = [xy[0, -1] for xy in nearest]\n",
" return max(y_nearest, key=lambda y:y_nearest.count(y))"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Wykres klas dla KNN\n",
"def plot_knn(fig, X, Y, k, distance=euclidean_distance):\n",
" ax = fig.axes[0]\n",
" x1min, x2min = X.min(axis=0).tolist()[0]\n",
" x1max, x2max = X.max(axis=0).tolist()[0]\n",
" pad1 = (x1max - x1min) / 10\n",
" pad2 = (x2max - x2min) / 10\n",
" step1 = (x1max - x1min) / 50\n",
" step2 = (x2max - x2min) / 50\n",
" x1grid, x2grid = np.meshgrid(\n",
" np.arange(x1min - pad1, x1max + pad1, step1),\n",
" np.arange(x2min - pad2, x2max + pad2, step2))\n",
" z = np.matrix([[knn(X, Y, [x1, x2], k, distance) \n",
" for x1, x2 in zip(x1row, x2row)] \n",
" for x1row, x2row in zip(x1grid, x2grid)])\n",
" plt.contour(x1grid, x2grid, z, levels=[0.5]);"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Przygotowanie interaktywnego wykresu\n",
"\n",
"slider_k = widgets.IntSlider(min=1, max=10, step=1, value=1, description=r'$k$', width=300)\n",
"\n",
"def interactive_knn_1(k):\n",
" fig = plot_data_for_classification(X_outliers, Y_outliers, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
" plot_voronoi(fig, X_outliers[:, 1:])\n",
" plot_knn(fig, X_outliers[:, 1:], Y_outliers, k)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "6e20fc55e2ad4e59874fd59b9f128073",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(IntSlider(value=1, description='$k$', max=10, min=1), Button(description='Run Interact',…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<function __main__.interactive_knn_1(k)>"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"widgets.interact_manual(interactive_knn_1, k=slider_k)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Wczytanie danych (inny przykład)\n",
"\n",
"alldata = pandas.read_csv('classification.tsv', sep='\\t')\n",
"data = np.matrix(alldata)\n",
"\n",
"m, n_plus_1 = data.shape\n",
"n = n_plus_1 - 1\n",
"Xn = data[:, 1:].reshape(m, n)\n",
"\n",
"X2 = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
"Y2 = np.matrix(data[:, 0]).reshape(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFmCAYAAADQ5sbeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df4wc533f8c/nJJ5QHw+JSVEyLYmREx6CWmosq1vlh4meZZuudKjD4yHFyRVcpSVKCK3QM6UEYOE0NZoWdR3Y7DlRbNCOALlQrEth8kQk58iyGkRhDKc6GvpBWpbvrPoHfaxEU459uhQilf32j5kVR8fdu70fu/Ps3vsFLHbmeWaW3x3u7X1u5pkZR4QAAACQrp6yCwAAAMDSCGwAAACJI7ABAAAkjsAGAACQOAIbAABA4ghsAAAAibu87ALKcOWVV8b1119fdhkAAABvcOLEiR9GxLbF7RsysF1//fWanp4uuwwAAIA3sP3deu0cEgUAAEgcgQ0AACBxBDYAAIDEEdgAAAASR2ADAABIHIENAAAgcQQ2AACAxBHYAAAAEkdgAwAA3SdCOno0e26mPXEENgAA0H0mJ6WREenAgYvhLCKbHxnJ+jvIhrw1FQAA6HLDw9LYmDQ+ns0fOpSFtfHxrH14uNz6VojABgAAuo+dhTQpC2m14DY2lrXb5dW2Co4OO4a7HiqVSnDzdwAANoAIqacwAqxaTTqs2T4REZXF7YxhA8rSZQNiASA5tTFrRcUxbR2EwAaUpcsGxAJAUmrfp7Uxa9XqxTFtHRjaGMMGlKXLBsQCQFImJy9+n9bGrBXHtA0OSnv3llvjCjCGDShT8S/Amg4dEAsASYnIQtvw8Bu/Txu1J6LRGDYCG1C2DhsQCwBoHU46AFLURQNiAQCtQ2ADytJlA2IBAK3DSQdAWbpsQCwAoHUIbEBZhoelI0feOPC1FtoGBzlLFADwOgIbUBa7/h60Ru0AgA2LMWwAAACJSyKw2b7N9vO2Z20frNP/m7afyh8nbf+d7S1533dsP5v3ca0OAADQdUo/JGr7Mkn3S9ot6bSkJ20fi4hv1JaJiN+V9Lv58h+QdCAiXi68zK0R8cM2lg0AANA2Kexhu0XSbES8EBHnJT0sac8Sy39Q0hfaUhkAAEACUghs10j6fmH+dN52CdtvknSbpC8WmkPSl22fsL2/ZVUCAACUpPRDopLq3YOn0RVDPyDprxYdDn1XRMzZvkrSY7a/GRFPXPKPZGFuvyTt2LFjrTUDAAC0TQp72E5Luq4wf62kuQbL3qFFh0MjYi5/fknSUWWHWC8REYcjohIRlW3btq25aAAAgHZJIbA9KWnA9tts9yoLZccWL2T7pyQNSnqk0NZnu782Len9kk62pWoAAIA2Kf2QaES8ZvseSY9KukzSAxFxyvbdef9n8kX3SvpyRCwUVr9a0lFnV4m/XNIfRcSfta96AACA1nNswBtMVyqVmJ7mkm0AACAttk9ERGVxewqHRAEAALAEAhsAAEDiCGwAAACJI7ABAAAkjsAGAACQOAIbAABA4ghsAAAAiSOwAQAAJI7Atl4ipKNHs+dm2gEAAJpEYFsvk5PSyIh04MDFcBaRzY+MZP0AAACrUPq9RLvG8LA0NiaNj2fzhw5lYW18PGsfHi63PgAA0LEIbOvFzkKalIW0WnAbG8vasxvUAwAArBg3f19vEVJP4UhztUpYAwAATeHm7+1QG7NWVBzTBgAAsAoEtvVSC2u1MWvV6sUxbYQ2AACwBoxhWy+TkxfDWm3MWnFM2+CgtHdvuTUCAICORGBbL8PD0pEj2XNtzFottA0OcpYoAABYNQLberHr70Fr1A4AANAkxrABAAAkjsAGAACQOAIbAABA4ghsAAAAiSOwAQAAJI7ABgAAkDgCGwAAQOIIbAAAAIkjsAEAACSOwAYAAJA4AhsAAEDikghstm+z/bztWdsH6/S/2/aPbT+VP3672XUBAAA6Xek3f7d9maT7Je2WdFrSk7aPRcQ3Fi36lxHxT1e5LoAG5l+d18SpCc2cm9HA1gGN3jCq/iv6yy4LAFBQemCTdIuk2Yh4QZJsPyxpj6RmQtda1gU2vOPfO66hh4ZUjaoWLiyob1Of7n30Xk3dOaVdO3aVXR4AIJfCIdFrJH2/MH86b1vsl20/bftLtm9Y4boAFpl/dV5DDw1p/vy8Fi4sSJIWLixo/nzW/sr5V0quEABQk0Jgc522WDT/dUk/ExHvkPR7kiZXsG62oL3f9rTt6bNnz666WKBbTJyaUDWqdfuqUdXEyYk2VwQAaCSFwHZa0nWF+WslzRUXiIifRMQr+fSUpE22r2xm3cJrHI6ISkRUtm3btp71Ax1p5tzM63vWFlu4sKDZl2fbXBEAoJEUAtuTkgZsv812r6Q7JB0rLmD7LbadT9+irO5zzawLoL6BrQPq29RXt69vU592btnZ5ooAAI2UHtgi4jVJ90h6VNJzkv44Ik7Zvtv23flivybppO2nJX1K0h2Rqbtu+98F0HlGbxhVj+t/BfS4R6M3jra5IgBAI46oO+Srq1UqlZieni67DKB09c4S7XEPZ4kCQElsn4iIyuL2FC7rAaAku3bs0tx9c5o4OaHZl2e1c8tOjd44qs29m8suDQBQQGADNrjNvZu17+Z9ZZcBAFhC6WPYAAAAsDQCGwAAQOI4JAoAieC+rgAaIbABQAK4ryuApXBIFABKxn1dASyHwAYAJeO+rgCWQ2ADgJJxX1cAyyGwAUDJuK8rgOUQ2ACgZNzXFcByCGwAULL+K/o1deeU+nv7X9/T1repT/29WTu3CgPAZT0AIAHc1xVIUIQ0OSkND0v28u0tRGADgERwX1cgMZOT0siINDYmHTqUhbMI6cABaXxcOnJE2ru3LaVwSBTNi5COHs2em2kHAKCTDQ9nYW18PAtpxbA2Npb1twmBDc2r/aVR+9BKFz+8IyNZPwAA3cLO9qzVQltPz8WwVtvj1iYENjQvob80AABoi1poK2pzWJMIbFiJhP7SAACgLWo7J4qKR5rahMCGlUnkLw0AAFpu8ZGkavXSI01tQmDDyiTylwYAAC03OXnpkaTikaY2jt0msKF5Cf2lAQBAyw0PZ5fuKB5JqoW2I0faOnab67CheY3+0pCy9sHBtl2PBgCAlrPr/15r1N5CBDY0r/aXRvHKzrXQNjjIWaIAALQIgQ3NS+gvDQAANhLGsAEAACSOwAYAAJA4AhsAAEDiCGwAAACJI7ABAAAkjsAGAACQuCQCm+3bbD9ve9b2wTr9d9p+Jn981fY7Cn3fsf2s7adsT7e3cgAAgNYr/Tpsti+TdL+k3ZJOS3rS9rGI+EZhsf8jaTAifmT7dkmHJf1iof/WiPhh24oGAABooxT2sN0iaTYiXoiI85IelrSnuEBEfDUifpTPfk3StW2uEQAAoDQpBLZrJH2/MH86b2tkn6QvFeZD0pdtn7C9v9FKtvfbnrY9ffbs2TUVDAAA0E6lHxKV5DptUXdB+1ZlgW1XofldETFn+ypJj9n+ZkQ8cckLRhxWdihVlUql7usDAACkKIU9bKclXVeYv1bS3OKFbP+CpM9J2hMR52rtETGXP78k6aiyQ6wAAABdI4XA9qSkAdtvs90r6Q5Jx4oL2N4h6YikD0XEtwrtfbb7a9OS3i/pZNsqBwAAaIPSD4lGxGu275H0qKTLJD0QEads3533f0bSb0vaKukPbEvSaxFRkXS1pKN52+WS/igi/qyEtwEAANAyjth4w7kqlUpMT3PJNgAAkBbbJ/KdUm+QwiFRAAAALIHABgAAkDgCGwAAQOIIbAAAYOOKkI4ezZ6baS8JgQ0AAGxck5PSyIh04MDFcBaRzY+MZP0JKP2yHgAAAKUZHpbGxqTx8Wz+0KEsrI2PZ+3Dw+XWlyOwAQCAjcvOQpqUhbRacBsby9pd7w6a7cd12AAAACKknsJIsWq1lLDGddgAAADqqY1ZKyqOaUsAgQ0AAGxctbBWG7NWrV4c05ZQaGMMGwAAWBfzr85r4tSEZs7NaGDrgEZvGFX/Ff1ll7W0ycmLYa0
"text/plain": [
"<Figure size 691.2x388.8 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plot_data_for_classification(X2, Y2, xlabel=r'$x_1$', ylabel=r'$x_2$')"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Przygotowanie interaktywnego wykresu\n",
"\n",
"slider_k = widgets.IntSlider(min=1, max=10, step=1, value=1, description=r'$k$', width=300)\n",
"\n",
"def interactive_knn_2(k):\n",
" fig = plot_data_for_classification(X2, Y2, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
" plot_voronoi(fig, X2[:, 1:])\n",
" plot_knn(fig, X2[:, 1:], Y2, k)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "db278f40e7fa4fdc90a9fe2f15a917c6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(IntSlider(value=1, description='$k$', max=10, min=1), Button(description='Run Interact',…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<function __main__.interactive_knn_2(k)>"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"widgets.interact_manual(interactive_knn_2, k=slider_k)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Algorytm $k$ najbliższych sąsiadów dla problemu regresji\n",
"\n",
"1. Dany jest zbiór uczący zawierajacy przykłady $(x_i, y_i)$, gdzie: $x_i$ zestaw cech, $y_i$ liczba rzeczywista.\n",
"1. Dany jest przykład testowy $x'$, dla którego chcemy określić klasę.\n",
"1. Oblicz odległość $d(x', x_i)$ dla każdego przykładu $x_i$ ze zbioru uczącego.\n",
"1. Wybierz $k$ przykładów $x_{i_1}, \\ldots, x_{i_k}$, dla których wyliczona odległość jest najmniejsza.\n",
"1. Jako wynik $y'$ zwróć średnią liczb $y_{i_1}, \\ldots, y_{i_k}$:\n",
" $$ y' = \\frac{1}{k} \\sum_{j=1}^{k} y_{i_j} $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Wybór $k$\n",
"\n",
"* Wartość $k$ ma duży wpływ na wynik działania algorytmu KNN:\n",
" * Jeżeli $k$ jest zbyt duże, wszystkie nowe przykłady są klasyfikowane jako klasa większościowa.\n",
" * Jeżeli $k$ jest zbyt małe, granice klas są niestabilne, a algorytm jest bardzo podatny na obserwacje odstające.\n",
"* Aby dobrać optymalną wartość $k$, najlepiej użyć zbioru walidacyjnego."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Miary podobieństwa"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Odległość euklidesowa\n",
"$$ d(x, x') = \\sqrt{ \\sum_{i=1}^n \\left( x_i - x'_i \\right) ^2 } $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Dobry wybór w przypadku numerycznych cech.\n",
"* Symetryczna, traktuje wszystkie wymiary jednakowo.\n",
"* Wrażliwa na duże wahania jednej cechy."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Odległość Hamminga\n",
"$$ d(x, x') = \\sum_{i=1}^n \\mathbf{1}_{x_i \\neq x'_i} $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Dobry wybór w przypadku cech zero-jedynkowych.\n",
"* Liczba cech, którymi różnią się dane przykłady."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Odległość Minkowskiego ($p$-norma)\n",
"$$ d(x, x') = \\sqrt[p]{ \\sum_{i=1}^n \\left| x_i - x'_i \\right| ^p } $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Dla $p = 2$ jest to odległość euklidesowa.\n",
"* Dla $p = 1$ jest to odległość taksówkowa.\n",
"* Jeżeli $p \\to \\infty$, to $p$-norma zbliża się do logicznej alternatywy.\n",
"* Jeżeli $p \\to 0$, to $p$-norma zbliża się do logicznej koniunkcji."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### KNN praktyczne porady"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Co zrobić z remisami?\n",
" * Można wybrać losową klasę.\n",
" * Można wybrać klasę o wyższym prawdopodobieństwie _a priori_.\n",
" * Można wybrać klasę wskazaną przez algorytm 1NN."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* KNN źle radzi sobie z brakującymi wartościami cech (nie można wówczas sensownie wyznaczyć odległości)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 8.3. Drzewa decyzyjne"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Drzewa decyzyjne przykład"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"outputs": [],
"source": [
"# Przydatne importy\n",
"\n",
"import ipywidgets as widgets\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Day Outlook Humidity Wind Play\n",
"0 1 Sunny High Weak No\n",
"1 2 Sunny High Strong No\n",
"2 3 Overcast High Weak Yes\n",
"3 4 Rain High Weak Yes\n",
"4 5 Rain Normal Weak Yes\n",
"5 6 Rain Normal Strong No\n",
"6 7 Overcast Normal Strong Yes\n",
"7 8 Sunny High Weak No\n",
"8 9 Sunny Normal Weak Yes\n",
"9 10 Rain Normal Weak Yes\n",
"10 11 Sunny Normal Strong Yes\n",
"11 12 Overcast High Strong Yes\n",
"12 13 Overcast Normal Weak Yes\n",
"13 14 Rain High Strong No\n"
]
}
],
"source": [
"alldata = pandas.read_csv('tennis.tsv', sep='\\t')\n",
"print(alldata)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'Outlook': {'Overcast', 'Rain', 'Sunny'},\n",
" 'Humidity': {'High', 'Normal'},\n",
" 'Wind': {'Strong', 'Weak'}}"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Dane jako lista słowników\n",
"data = alldata.T.to_dict().values()\n",
"features = ['Outlook', 'Humidity', 'Wind']\n",
"\n",
"# Możliwe wartości w poszczególnych kolumnach\n",
"values = {feature: set(row[feature] for row in data)\n",
" for feature in features}\n",
"values"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Czy John zagra w tenisa, jeżeli będzie padać, przy wysokiej wilgotności i silnym wietrze?\n",
"* Algorytm drzew decyzyjnych spróbuje _zrozumieć_ „taktykę” Johna.\n",
"* Wykorzystamy metodę „dziel i zwyciężaj”."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# Podziel dane\n",
"def split(features, data):\n",
" values = {feature: list(set(row[feature]\n",
" for row in data))\n",
" for feature in features}\n",
" if not features:\n",
" return data\n",
" return {val: split(features[1:],\n",
" [row for row in data\n",
" if row[features[0]] == val])\n",
" for val in values[features[0]]}"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\tOutlook\tHumid\tWind\tPlay\n",
"Day 1:\tSunny\tHigh\tWeak\tNo\n",
"Day 2:\tSunny\tHigh\tStrong\tNo\n",
"Day 8:\tSunny\tHigh\tWeak\tNo\n",
"Day 9:\tSunny\tNormal\tWeak\tYes\n",
"Day 11:\tSunny\tNormal\tStrong\tYes\n",
"\n",
"\tOutlook\tHumid\tWind\tPlay\n",
"Day 4:\tRain\tHigh\tWeak\tYes\n",
"Day 5:\tRain\tNormal\tWeak\tYes\n",
"Day 6:\tRain\tNormal\tStrong\tNo\n",
"Day 10:\tRain\tNormal\tWeak\tYes\n",
"Day 14:\tRain\tHigh\tStrong\tNo\n",
"\n",
"\tOutlook\tHumid\tWind\tPlay\n",
"Day 3:\tOvercast\tHigh\tWeak\tYes\n",
"Day 7:\tOvercast\tNormal\tStrong\tYes\n",
"Day 12:\tOvercast\tHigh\tStrong\tYes\n",
"Day 13:\tOvercast\tNormal\tWeak\tYes\n"
]
}
],
"source": [
"split_data = split(['Outlook'], data)\n",
"\n",
"for outlook in values['Outlook']:\n",
" print('\\n\\tOutlook\\tHumid\\tWind\\tPlay')\n",
" for row in split_data[outlook]:\n",
" print('Day {Day}:\\t{Outlook}\\t{Humidity}\\t{Wind}\\t{Play}'\n",
" .format(**row))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Obserwacja: John lubi grać, gdy jest pochmurnie.\n",
"\n",
"W pozostałych przypadkach podzielmy dane ponownie:"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\tOutlook\tHumid\tWind\tPlay\n",
"Day 1:\tSunny\tHigh\tWeak\tNo\n",
"Day 2:\tSunny\tHigh\tStrong\tNo\n",
"Day 8:\tSunny\tHigh\tWeak\tNo\n",
"\n",
"\tOutlook\tHumid\tWind\tPlay\n",
"Day 9:\tSunny\tNormal\tWeak\tYes\n",
"Day 11:\tSunny\tNormal\tStrong\tYes\n"
]
}
],
"source": [
"split_data_sunny = split(['Outlook', 'Humidity'], data)\n",
"\n",
"for humidity in values['Humidity']:\n",
" print('\\n\\tOutlook\\tHumid\\tWind\\tPlay')\n",
" for row in split_data_sunny['Sunny'][humidity]:\n",
" print('Day {Day}:\\t{Outlook}\\t{Humidity}\\t{Wind}\\t{Play}'\n",
" .format(**row))"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"\tOutlook\tHumid\tWind\tPlay\n",
"Day 6:\tRain\tNormal\tStrong\tNo\n",
"Day 14:\tRain\tHigh\tStrong\tNo\n",
"\n",
"\tOutlook\tHumid\tWind\tPlay\n",
"Day 4:\tRain\tHigh\tWeak\tYes\n",
"Day 5:\tRain\tNormal\tWeak\tYes\n",
"Day 10:\tRain\tNormal\tWeak\tYes\n"
]
}
],
"source": [
"split_data_rain = split(['Outlook', 'Wind'], data)\n",
"\n",
"for wind in values['Wind']:\n",
" print('\\n\\tOutlook\\tHumid\\tWind\\tPlay')\n",
" for row in split_data_rain['Rain'][wind]:\n",
" print('Day {Day}:\\t{Outlook}\\t{Humidity}\\t{Wind}\\t{Play}'\n",
" .format(**row))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* Outlook=\n",
" * Overcast\n",
" * → Playing\n",
" * Sunny\n",
" * Humidity=\n",
" * High\n",
" * → Not playing\n",
" * Normal\n",
" * → Playing\n",
" * Rain\n",
" * Wind=\n",
" * Weak\n",
" * → Playing\n",
" * Strong\n",
" * → Not playing"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* (9/5)\n",
" * Outlook=Overcast (4/0)\n",
" * YES\n",
" * Outlook=Sunny (2/3)\n",
" * Humidity=High (0/3)\n",
" * NO\n",
" * Humidity=Normal (2/0)\n",
" * YES\n",
" * Outlook=Rain (3/2)\n",
" * Wind=Weak (3/0)\n",
" * YES\n",
" * Wind=Strong (0/2)\n",
" * NO"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Algorytm ID3"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Pseudokod algorytmu:\n",
"\n",
"* podziel(węzeł, zbiór przykładów):\n",
" 1. A ← najlepszy atrybut do podziału zbioru przykładów\n",
" 1. Dla każdej wartości atrybutu A, utwórz nowy węzeł potomny\n",
" 1. Podziel zbiór przykładów na podzbiory według węzłów potomnych\n",
" 1. Dla każdego węzła potomnego i podzbioru:\n",
" * jeżeli podzbiór jest jednolity: zakończ\n",
" * w przeciwnym przypadku: podziel(węzeł potomny, podzbiór)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Jak wybrać „najlepszy atrybut”?\n",
"* powinien zawierać jednolity podzbiór\n",
"* albo przynajmniej „w miarę jednolity”\n",
"\n",
"Skąd wziąć miarę „jednolitości” podzbioru?\n",
"* miara powinna być symetryczna (4/0 vs. 0/4)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Entropia\n",
"\n",
"$$ H(S) = - p_{(+)} \\log p_{(+)} - p_{(-)} \\log p_{(-)} $$\n",
"\n",
"* $S$ podzbiór przykładów\n",
"* $p_{(+)}$, $p_{(-)}$ procent pozytywnych/negatywnych przykładów w $S$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Entropię można traktować jako „liczbę bitów” potrzebną do sprawdzenia, czy losowo wybrany $x \\in S$ jest pozytywnym, czy negatywnym przykładem."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Przykład:\n",
"\n",
"* (3 TAK / 3 NIE):\n",
"$$ H(S) = -\\frac{3}{6} \\log\\frac{3}{6} - \\frac{3}{6} \\log\\frac{3}{6} = 1 \\mbox{ bit} $$\n",
"* (4 TAK / 0 NIE):\n",
"$$ H(S) = -\\frac{4}{4} \\log\\frac{4}{4} - \\frac{0}{4} \\log\\frac{0}{4} = 0 \\mbox{ bitów} $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### *Information gain*\n",
"\n",
"*Information gain* różnica między entropią przed podziałem a entropią po podziale (podczas podziału entropia zmienia się):\n",
"\n",
"$$ \\mathop{\\rm Gain}(S,A) = H(S) - \\sum_{V \\in \\mathop{\\rm Values(A)}} \\frac{|S_V|}{|S|} H(S_V) $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Przykład:\n",
"\n",
"$$ \\mathop{\\rm Gain}(S, Wind) = H(S) - \\frac{8}{14} H(S_{Wind={\\rm Weak}}) - \\frac{6}{14} H(S_{Wind={\\rm Strong}}) = \\\\\n",
"= 0.94 - \\frac{8}{14} \\cdot 0.81 - \\frac{6}{14} \\cdot 1.0 = 0.049 $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* _Information gain_ jest całkiem sensowną heurystyką wskazującą, który atrybut jest najlepszy do dokonania podziału.\n",
"* **Ale**: _information gain_ przeszacowuje użyteczność atrybutów, które mają dużo różnych wartości.\n",
"* **Przykład**: gdybyśmy wybrali jako atrybut *datę*, otrzymalibyśmy bardzo duży *information gain*, ponieważ każdy podzbiór byłby jednolity, a nie byłoby to ani trochę użyteczne!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### _Information gain ratio_\n",
"\n",
"$$ \\mathop{\\rm GainRatio}(S, A) = \\frac{ \\mathop{\\rm Gain}(S, A) }{ -\\sum_{V \\in \\mathop{\\rm Values}(A)} \\frac{|S_V|}{|S|} \\log\\frac{|S_V|}{|S|} } $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* _Information gain ratio_ może być lepszym wyborem heurystyki wskazującej najużyteczniejszy atrybut, jeżeli atrybuty mają wiele różnych wartości."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Drzewa decyzyjne a formuły logiczne\n",
"\n",
"Drzewo decyzyjne można pzekształcić na formułę logiczną w postaci normalnej (DNF):\n",
"\n",
"$$ Play={\\rm True} \\Leftrightarrow \\left( Outlook={\\rm Overcast} \\vee \\\\\n",
"( Outlook={\\rm Rain} \\wedge Wind={\\rm Weak} ) \\vee \\\\\n",
"( Outlook={\\rm Sunny} \\wedge Humidity={\\rm Normal} ) \\right) $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Klasyfikacja wieloklasowa przy użyciu drzew decyzyjnych\n",
"\n",
"Algorytm przebiega analogicznie, zmienia się jedynie wzór na entropię:\n",
"\n",
"$$ H(S) = -\\sum_{y \\in Y} p_{(y)} \\log p_{(y)} $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Skuteczność algorytmu ID3\n",
"\n",
"* Przyjmujemy, że wśród danych uczących nie ma duplikatów (tj. przykładów, które mają jednakowe cechy $x$, a mimo to należą do różnych klas $y$).\n",
"* Wówczas algorytm drzew decyzyjnych zawsze znajdzie rozwiązanie, ponieważ w ostateczności będziemy mieli węzły 1-elementowe na liściach drzewa."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Nadmierne dopasowanie drzew decyzyjnych\n",
"\n",
"* Zauważmy, że w miarę postępowania algorytmu dokładność przewidywań drzewa (*accuracy*) liczona na zbiorze uczącym dąży do 100% (i w ostateczności osiąga 100%, nawet kosztem jednoelementowych liści).\n",
"* Takie rozwiązanie niekoniecznie jest optymalne. Dokładność na zbiorze testowym może być dużo niższa, a to oznacza nadmierne dopasowanie."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Jak zapobiec nadmiernemu dopasowaniu?\n",
"\n",
"Aby zapobiegać nadmiernemu dopasowaniu drzew decyzyjnych, należy je przycinać (*pruning*)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Można tego dokonywać na kilka sposobów:\n",
"* Można zatrzymywać procedurę podziału w pewnym momencie (np. kiedy podzbiory staja się zbyt małe).\n",
"* Można najpierw wykonać algorytm ID3 w całości, a następnie przyciąć drzewo, np. kierując się wynikami uzyskanymi na zbiorze walidacyjnym.\n",
"* Algorytm _sub-tree replacement pruning_ (algorytm zachłanny)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Algorytm _Sub-tree replacement pruning_\n",
"\n",
"1. Dla każdego węzła:\n",
" 1. Udaj, że usuwasz węzeł wraz z całym zaczepionym w nim poddrzewem.\n",
" 1. Dokonaj ewaluacji na zbiorze walidacyjnym.\n",
"1. Usuń węzeł, którego usunięcie daje największą poprawę wyniku.\n",
"1. Powtarzaj, dopóki usuwanie węzłów poprawia wynik."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Zalety drzew decyzyjnych\n",
"\n",
"* Zasadę działania drzew decyzyjnych łatwo zrozumieć człowiekowi.\n",
"* Atrybuty, które nie wpływają na wynik, mają _gain_ równy 0, zatem są od razu pomijane przez algorytm.\n",
"* Po zbudowaniu, drzewo decyzyjne jest bardzo szybkim klasyfikatorem (złożoność $O(d)$, gdzie $d$ jest głębokościa drzewa)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Wady drzew decyzyjnych\n",
"\n",
"* ID3 jest algorytmem zachłannym może nie wskazać najlepszego drzewa.\n",
"* Nie da się otrzymać granic klas (*decision boundaries*), które nie są równoległe do osi wykresu."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Lasy losowe"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Algorytm lasów losowych idea\n",
"\n",
"* Algorytm lasów losowych jest rozwinięciem algorytmu ID3.\n",
"* Jest to bardzo wydajny algorytm klasyfikacji.\n",
"* Zamiast jednego, będziemy budować $k$ drzew."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Algorytm lasów losowych budowa lasu\n",
"\n",
"1. Weź losowy podzbiór $S_r$ zbioru uczącego.\n",
"1. Zbuduj pełne (tj. bez przycinania) drzewo decyzyjne dla $S_r$, używając algorytmu ID3 z następującymi modyfikacjami:\n",
" * podczas podziału używaj losowego $d$-elementowego podzbioru atrybutów,\n",
" * obliczaj _gain_ względem $S_r$.\n",
"1. Powyższą procedurę powtórz $k$-krotnie, otrzymując $k$ drzew ($T_1, T_2, \\ldots, T_k$)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Algorytm lasów losowych predykcja\n",
"\n",
"1. Sklasyfikuj $x$ według każdego z drzew $T_1, T_2, \\ldots, T_k$ z osobna.\n",
"1. Użyj głosowania większościowego: przypisz klasę przewidzianą przez najwięcej drzew."
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
},
"livereveal": {
"start_slideshow_at": "selected",
"theme": "white"
}
},
"nbformat": 4,
"nbformat_minor": 4
}