2915 lines
632 KiB
Plaintext
2915 lines
632 KiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### AITech — Uczenie maszynowe\n",
|
|||
|
"# 8. Przegląd metod uczenia nadzorowanego"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## 8.1. Naiwny klasyfikator bayesowski"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Naiwny klasyfikator bayesowski jest algorytmem dla problemu klasyfikacji wieloklasowej.\n",
|
|||
|
"* Naszym celem jest znalezienie funkcji uczącej $f \\colon x \\mapsto y$, gdzie $y$ oznacza jedną ze zdefiniowanych wcześniej klas.\n",
|
|||
|
"* Klasyfikacja probabilistyczna polega na wskazaniu klasy o najwyższym prawdopodobieństwie:\n",
|
|||
|
"$$ \\hat{y} = \\mathop{\\arg \\max}_y P( y \\,|\\, x ) $$\n",
|
|||
|
"* Naiwny klasyfikator bayesowski należy do rodziny klasyfikatorów probabilistycznych"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"<img style=\"float: right;\" src=\"https://upload.wikimedia.org/wikipedia/commons/d/d4/Thomas_Bayes.gif\">\n",
|
|||
|
"\n",
|
|||
|
"**Thomas Bayes** (wymowa: /beɪz/) (1702–1761) – angielski matematyk i duchowny"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Twierdzenie Bayesa – wzór ogólny\n",
|
|||
|
"\n",
|
|||
|
"$$ P( Y \\,|\\, X ) = \\frac{ P( X \\,|\\, Y ) \\cdot P( Y ) }{ P ( X ) } $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Twierdzenie Bayesa opisuje związek między prawdopodobieństwami warunkowymi dwóch zdarzeń warunkujących się nawzajem."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Twierdzenie Bayesa\n",
|
|||
|
"(po zastosowaniu wzoru na prawdopodobieństwo całkowite)\n",
|
|||
|
"\n",
|
|||
|
"$$ \\underbrace{P( y_k \\,|\\, x )}_\\textrm{ prawd. a posteriori } = \\frac{ \\overbrace{ P( x \\,|\\, y_k )}^\\textrm{ model klasy } \\cdot \\overbrace{P( y_k )}^\\textrm{ prawd. a priori } }{ \\underbrace{\\sum_{i} P( x \\,|\\, y_i ) \\, P( y_i )}_\\textrm{wyrażenie normalizacyjne} } $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
" * W tym przypadku „zdarzenie $x$” oznacza, że cechy wejściowe danej obserwacji przyjmują wartości opisane wektorem $x$.\n",
|
|||
|
" * „Zdarzenie $y_k$” oznacza, że dana obserwacja należy do klasy $y_k$.\n",
|
|||
|
" * **Model klasy** $y_k$ opisuje rozkład prawdopodobieństwa cech obserwacji należących do tej klasy.\n",
|
|||
|
" * **Prawdopodobieństwo *a priori*** to prawdopodobienstwo, że losowa obserwacja należy do klasy $y_k$.\n",
|
|||
|
" * **Prawdopodobieństwo *a posteriori*** to prawdopodobieństwo, którego szukamy: że obserwacja opisana wektorem cech $x$ należy do klasy $y_k$."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Rola wyrażenia normalizacyjnego w twierdzeniu Bayesa\n",
|
|||
|
"\n",
|
|||
|
" * Wartość wyrażenia normalizacyjnego nie wpływa na wynik klasyfikacji.\n",
|
|||
|
"\n",
|
|||
|
"**Przykład**: obserwacja nietypowa ma małe prawdopodobieństwo względem dowolnej klasy, wyrażenie normalizacyjne sprawia, że to prawdopodobieństwo staje się porównywalne z prawdopodobieństwami typowych obserwacji, ale nie wpływa na klasyfikację!"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Klasyfikatory dyskryminatywne a generatywne\n",
|
|||
|
"\n",
|
|||
|
"* Klasyfikatory generatywne tworzą model rozkładu prawdopodobieństwa dla każdej z klas.\n",
|
|||
|
"* Klasyfikatory dyskryminatywne wyznaczają granicę klas (*decision boundary*) bezpośrednio.\n",
|
|||
|
"* Naiwny klasyfikator bayesowski jest klasyfikatorem generatywnym (ponieważ wyznacza $P( x \\,|\\, y )$).\n",
|
|||
|
"* Wszystkie klasyfikatory generatywne są probabilistyczne, ale nie na odwrót.\n",
|
|||
|
"* Regresja logistyczna jest przykładem klasyfikatora dyskryminatywnego."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Założenie niezależności dla naiwnego klasyfikatora bayesowskiego\n",
|
|||
|
"\n",
|
|||
|
"* Naiwny klasyfikator bayesowski jest *naiwny*, ponieważ zakłada, że poszczególne cechy są niezależne od siebie:\n",
|
|||
|
"$$ P( x_1, \\ldots, x_n \\,|\\, y ) \\,=\\, \\prod_{i=1}^n P( x_i \\,|\\, x_1, \\ldots, x_{i-1}, y ) \\,=\\, \\prod_{i=1}^n P( x_i \\,|\\, y ) $$\n",
|
|||
|
"* To założenie jest bardzo przydatne ze względów obliczeniowych, ponieważ bardzo często mamy do czynienia z ogromną liczbą cech (bitmapy, słowniki itp.)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Naiwny klasyfikator bayesowski – przykład"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 1,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przydtne importy\n",
|
|||
|
"\n",
|
|||
|
"import ipywidgets as widgets\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import pandas\n",
|
|||
|
"\n",
|
|||
|
"%matplotlib inline"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wczytanie danych (gatunki kosaćców)\n",
|
|||
|
"\n",
|
|||
|
"data_iris = pandas.read_csv('iris.csv')\n",
|
|||
|
"data_iris_setosa = pandas.DataFrame()\n",
|
|||
|
"data_iris_setosa['dł. płatka'] = data_iris['pl'] # \"pl\" oznacza \"petal length\"\n",
|
|||
|
"data_iris_setosa['szer. płatka'] = data_iris['pw'] # \"pw\" oznacza \"petal width\"\n",
|
|||
|
"data_iris_setosa['Iris setosa?'] = data_iris['Gatunek'].apply(lambda x: 1 if x=='Iris-setosa' else 0)\n",
|
|||
|
"\n",
|
|||
|
"m, n_plus_1 = data_iris_setosa.values.shape\n",
|
|||
|
"n = n_plus_1 - 1\n",
|
|||
|
"Xn = data_iris_setosa.values[:, 0:n].reshape(m, n)\n",
|
|||
|
"\n",
|
|||
|
"X = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
|
|||
|
"Y = np.matrix(data_iris_setosa.values[:, 2]).reshape(m, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"liczba przykładów: {0: 100, 1: 50}\n",
|
|||
|
"prior probability: {0: 0.6666666666666666, 1: 0.3333333333333333}\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"classes = [0, 1]\n",
|
|||
|
"count = [sum(1 if y == c else 0 for y in Y.T.tolist()[0]) for c in classes]\n",
|
|||
|
"prior_prob = [float(count[c]) / float(Y.shape[0]) for c in classes]\n",
|
|||
|
"\n",
|
|||
|
"print('liczba przykładów: ', {c: count[c] for c in classes})\n",
|
|||
|
"print('prior probability:', {c: prior_prob[c] for c in classes})"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wykres danych (wersja macierzowa)\n",
|
|||
|
"def plot_data_for_classification(X, Y, xlabel, ylabel): \n",
|
|||
|
" fig = plt.figure(figsize=(16*.6, 9*.6))\n",
|
|||
|
" ax = fig.add_subplot(111)\n",
|
|||
|
" fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
|
|||
|
" X = X.tolist()\n",
|
|||
|
" Y = Y.tolist()\n",
|
|||
|
" X1n = [x[1] for x, y in zip(X, Y) if y[0] == 0]\n",
|
|||
|
" X1p = [x[1] for x, y in zip(X, Y) if y[0] == 1]\n",
|
|||
|
" X2n = [x[2] for x, y in zip(X, Y) if y[0] == 0]\n",
|
|||
|
" X2p = [x[2] for x, y in zip(X, Y) if y[0] == 1]\n",
|
|||
|
" ax.scatter(X1n, X2n, c='r', marker='x', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1p, X2p, c='g', marker='o', s=50, label='Dane')\n",
|
|||
|
" \n",
|
|||
|
" ax.set_xlabel(xlabel)\n",
|
|||
|
" ax.set_ylabel(ylabel)\n",
|
|||
|
" ax.margins(.05, .05)\n",
|
|||
|
" return fig"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFkCAYAAAD13eXtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dfZRddX3v8c83DyBNhqIkNkCI4SGiQHXIRMSKBXwGLEymQEBbsWWZ0mLXiHYF4q1etbeGlXuv0+GWaila5F7FEBkCC7H4AG3h3lJJQkAiDwElJQ2IoMgkspLMOd/7xz6HOTNzztk7c/Zvn332eb/W2mtmP5zf/u7fZHG+7P3bv6+5uwAAABDOjHYHAAAAUHQkXAAAAIGRcAEAAARGwgUAABAYCRcAAEBgJFwAAACBzWp3APtr3rx5vnjx4naHAQAAMMGmTZued/f59fZ1XMK1ePFibdy4sd1hAAAATGBm2xvt45EiAABAYCRcAAAAgZFwAQAABEbCBQAAEBgJFwAAQGAkXAAAAIGRcAEAAARGwgUAABBYsITLzI40s7vN7BEz22pmg3WOOd3MfmVmWyrLZ0LFAwBAEO7SLbdEP5NsD3WOLOLAtIW8wzUm6ZPu/kZJp0i6zMyOr3PcPe7eW1k+HzAeAADSt2GDNDAgXX75eFLjHq0PDET7szhHFnFg2oKV9nH3ZyQ9U/l91MwekXSEpB+HOicAAJnr75cGB6Xh4Wh9aChKcoaHo+39/dmdI3QcmDbzDG4xmtliSf8q6UR3f6lm++mSbpa0Q9JOSX/h7lubtbVs2TKnliIAIFeqd5KqyY4UJTlDQ5JZdufIIg40ZGab3H1Z3X2hEy4zmyvpXyT9tbuPTNp3sKSyu+8ys7MkDbv7kjptrJS0UpIWLVrUt317w9qQAAC0h7s0o2akTrmcfpKT5BxZxIG6miVcQd9SNLPZiu5gfX1ysiVJ7v6Su++q/H6HpNlmNq/Ocde6+zJ3XzZ//vyQIQMAsP+qd5Zq1Y6lyuocWcSBaQn5lqJJ+oqkR9z9iw2OWVA5TmZ2ciWeF0LFBABA6mof4w0ORneUqmOp0kp2kpwjizgwbcEGzUt6u6Q/lPQjM9tS2fYpSYskyd2/LOk8SX9qZmOSXpZ0oWcxqAwAgLRs2DCe5FTHSg0NRfuGh6XTTpOWLw9/jurvIePAtGUyaD5NDJoHAOSKe5QQ9fdPHCvVaHuoc0jh40BTbR00nzYSLgAAkEdtGzQPAAAAEi4AAIDgSLgAANnrlLp/5bJ0xRXRzyTbgQZIuAAA2euUun+rV0tr10p9fePJVbkcra9dG+0HEiDhAgBkr7Y2YDXpymPdvzVrpN5eacuW8aSrry9a7+2N9gMJhJyHCwCA+ibPEVWt/Ze3un8zZkibNo0nWTNnRtt7e6PtM7hvgWSYFgIA0D6dUvevXB5PtiSpVCLZwhRMCwEAyJ9OqftXfYxYq3ZMF5AACRcAIHudUvdv8pitUmnqmC4gARIuAED2GtUGrCZdeXpLsZpsVcdsbdo0nnTxliISYgwXACB7WdQfTEO5HCVVa9ZMHWtWbzu6GrUUAQAAAmPQPAAAQBuRcAEAshdX2qdcji/9k0YbWVxLkvPkpY0iyVt/uHtHLX19fQ4A6HAjI1HKNDjoXi5H28rlaF1yX7Wq+f6RkXTayOJakpwnL20USRv6Q9JGb5C/tD2B2t+FhAsACqD2i6/6hVi7Xio1318up9NGFteS5Dx5aaNI2tAfJFwAgPyp/QKsLo3uRtTbn1YbWVxLJ7VRJBn3R7OEi7cUAQDt4zGlfeL2p9VGGtI4T17aKJIM+4O3FAEA+eMxpX3i9qfVRhrSOE9e2iiSPPVHo1tfeV14pAgABcAYrny2USSM4SLhAoCux1uK+WyjSHhLkYQLALpeuRx94U2+y1DdXio131+9w9VqG1lcS9K7U3loo0ja0B/NEi4GzQMAAKSAQfMAAABtRMIFAAAQGAkXAACNeAr1+NJoo9sUsM9IuAAAaGTDBmlgoP7cXgMD0f4s2ug2BeyzWe0OAACA3OrvlwYHpeHhaH1oKPrSHx6Otvf3Z9NGtylgn/GWIgAAzVTvrFS//KXoS39oKHmJmDTa6DYd2GfN3lIk4QIAII5T47AtOqzPmBYCAIDpqt5pqUWNw/AK1mckXAAANFL7WGtwMLrDUh1blPTLP402uk0B+4xB8wAANLJhw/iXfnXs0NBQtG94WDrtNGn58vBtdJsC9hljuAAAaMQ9+vLv7584dqjR9lBtdJsO7TMGzQMAAATGoHkAAIA2IuECAAAIjIQLAFBMSerxxR1TLrfeBvUWJ+qma61BwgUAKKYk9fjijlm9uvU2qLc4UTdday1376ilr6/PAQCIVS67Dw5G96AGB+uvxx1TKrXeRrmcTqxFUeBrlbTRG+QvbU+g9nch4QIAJFb7ZV5dJn+pxx2TRhtpxVoUBb3WZgkX00IAAIrNE9TjizsmjTbSirUoCnitTAsBAOhOnqAeX9wxabSRVqxF0U3XWtXo1ldeFx4pAgASYQxXPhX4WsUYLgBA1xkZmfolXvvlPjISf8yqVa23MTKSTqxFUeBrbZZwMYYLAFBMnqAen9T8mHPPlW69tbU2qLc4UYGvlVqKAAAAgTFoHgAAoI1IuAAAAAILlnCZ2ZFmdreZPWJmW81ssM4xZmZXm9kTZvaQmS0NFQ8AICWeQf3BJG0ge3F/t7T+LlmdJ0Mh73CNSfqku79R0imSLjOz4ycdc6akJZVlpaQvBYwHAJCGLOoPJmkD2cuqDmIR6y02en0x7UXSrZLeM2nb30u6qGb9MUmHNWuHaSEAoM2ymLsqSRvIXlZzaHXoXF1q9zxckhZL+g9JB0/afrukU2vWfyBpWbO2SLgAIAeyqD9Y0Hp7HS+rv0sH/v2bJVzBp4Uws7mS/kXSX7v7yKR935a0xt3vraz/QNIqd9806biVih45atGiRX3bt28PGjMAIAHPoP5gkjaQvaz+Lh3292/btBBmNlvSzZK+PjnZqtgh6cia9YWSdk4+yN2vdfdl7r5s/vz5YYIFACRXHU9TK+36g0naQPay+rsU7e/f6NZXq4skk3SDpL9pcszZkr5TOfYUST+Ma5dHigDQZozh6l6M4WpK7RjDJelUSS7pIUlbKstZki6VdKmPJ2XXSHpS0o8UM37LSbgAoP2yqD+YpA1kL6s6iB1ab7FZwkVpHwDA/vGYWnhp1B9M0kaOx/IUVtzfPq2/S1bnSRm1FAEAAAKjliIAAEAbkXABAAAERsIFAEiXJ6iDVy5LV1wR/azVaPt0z9NN6I9cI+ECAKQrSR281aultWulvr7x5KpcjtbXro32p3GebkJ/5Fuj1xfzujAtBADkXJI5lEol997eaFtvb/31NM7TTeiPthPTQgAAMlW9szI8PL5tcFAaGhp/nb96R2vLlvFjenulTZsmlnNp9TzdhP5oK6aFAABkzxPUwSuXpZkzx9dLpeTJ1v6cp5vQH23DtBAAgGxV77TUmlwHr3qHq1btmK60ztNN6I/cIuECAKSr9rHW4GCUQA0ORuvVL//ax4m9vdGdrd7eaD1p0pXkPN2E/si3RoO78rowaB4Aci5JHbxqrcTaAfK1A+dXrUrnPN2E/mg7MWgeAJAZT1AHzz2a+mHNmqnjjeptn+55umnsEv3RdgyaBwAACIxB8wAAAG1EwgUAGFcqScuXRz8bbS9SWZ64aymVWo8zjWvNqr/y8ncpokaDu/K6MGgeAALq748GWM+b5z42Fm0bG4vWpWh/kQa8x11LtT9aiTONa82qv/Lyd+lQajJovu0J1P4uJFwAEFBtclVNuiavF6ksT9y1jI21Hmca15pVf+Xl79KhSLgAAMnVJlnVpfaOl/vExKS6JE22qmq/zKtLO77U464ljTjz0kaezlNAzRIu3lIEAExVKkmzZo2vj41NLMEjFassT9y1pBFnXtrI03kKhrcUAQDJlUrSggUTty1YMHEgfZHK8sRdSxpx5qWNPJ2n2zS69ZXXhUeKABAQY7gYw5WHv0uHEmO4AACJ8JY
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"średnia: [matrix([[1. , 4.906, 1.676]]), matrix([[1. , 1.464, 0.244]])]\n",
|
|||
|
"odchylenie standardowe: [matrix([[0. , 0.8214402 , 0.42263933]]), matrix([[0. , 0.17176728, 0.10613199]])]\n",
|
|||
|
"(1, 3)\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"XY = np.column_stack((X, Y))\n",
|
|||
|
"XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]\n",
|
|||
|
"X_split = [XY_split[c][:,0:3] for c in classes]\n",
|
|||
|
"Y_split = [XY_split[c][:,3] for c in classes]\n",
|
|||
|
"\n",
|
|||
|
"X_mean = [np.mean(X_split[c], axis=0) for c in classes]\n",
|
|||
|
"X_std = [np.std(X_split[c], axis=0) for c in classes]\n",
|
|||
|
"print('średnia: ', X_mean) \n",
|
|||
|
"print('odchylenie standardowe: ', X_std)\n",
|
|||
|
"\n",
|
|||
|
"print(X_std[0].shape)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Rysowanie średnich\n",
|
|||
|
"def draw_means(fig, means, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):\n",
|
|||
|
" class_color = {0: 'r', 1: 'g'}\n",
|
|||
|
" classes = range(len(means))\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" mean_x1 = [means[c].item(0, 1) for c in classes]\n",
|
|||
|
" mean_x2 = [means[c].item(0, 2) for c in classes]\n",
|
|||
|
" for c in classes:\n",
|
|||
|
" ax.plot([mean_x1[c], mean_x1[c]], [xmin, xmax],\n",
|
|||
|
" color=class_color.get(c, 'c'), linestyle='dashed')\n",
|
|||
|
" ax.plot([ymin, ymax], [mean_x2[c], mean_x2[c]],\n",
|
|||
|
" color=class_color.get(c, 'c'), linestyle='dashed') "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 8,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from scipy.stats import norm\n",
|
|||
|
"\n",
|
|||
|
"# Prawdopodobieństwo klasy dla pojedynczej cechy\n",
|
|||
|
"# Uwaga: jeżeli odchylenie standardowe dla danej cechy jest równe 0, \n",
|
|||
|
"# to nie można określić prawdopodbieństwa klasy!\n",
|
|||
|
"def prob(x, c, feature, mean, std):\n",
|
|||
|
" sd = std[c].item(0, feature)\n",
|
|||
|
" if sd == 0:\n",
|
|||
|
" print('Nie można określić prawdopodobieństwa klasy dla cechy {}.!'.format(feature))\n",
|
|||
|
" return norm(mean[c].item(0, feature), sd).pdf(x)\n",
|
|||
|
"\n",
|
|||
|
"# Prawdopodobieństwo klasy\n",
|
|||
|
"# Uwaga: tu bierzemy iloczyn dwóch cech (1. i 2.), w ogólności może być ich więcej\n",
|
|||
|
"def class_prob(x, c, mean, std, features=[1, 2]):\n",
|
|||
|
" result = 1\n",
|
|||
|
" for feature in features:\n",
|
|||
|
" result *= prob(x[feature], c, feature, mean, std)\n",
|
|||
|
" return result"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"(1, 3)\n",
|
|||
|
"[matrix([[0. , 0.8214402 , 0.42263933]]), matrix([[0. , 0.17176728, 0.10613199]])]\n",
|
|||
|
"[matrix([[1. , 4.906, 1.676]]), matrix([[1. , 1.464, 0.244]])]\n",
|
|||
|
"[[1.57003335e-06 1.61965173e-23 3.09005273e-08]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"print(X_std[0].shape)\n",
|
|||
|
"print(X_std)\n",
|
|||
|
"print(X_mean)\n",
|
|||
|
"\n",
|
|||
|
"X_prob_0=class_prob(X, 0, X_mean, X_std)\n",
|
|||
|
"print(X_prob_0)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 10,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wykres prawdopodobieństw klas\n",
|
|||
|
"def plot_prob(fig, X_mean, X_std, classes, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):\n",
|
|||
|
" class_color = {0: 'r', 1: 'g'}\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" x1, x2 = np.meshgrid(np.arange(xmin, xmax, 0.02),\n",
|
|||
|
" np.arange(xmin, xmax, 0.02))\n",
|
|||
|
" for c in classes:\n",
|
|||
|
" fun1 = lambda x: prob(x, c, 1, X_mean, X_std)\n",
|
|||
|
" fun2 = lambda x: prob(x, c, 2, X_mean, X_std)\n",
|
|||
|
" p = fun1(x1) * fun2(x2)\n",
|
|||
|
" plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),\n",
|
|||
|
" colors=class_color.get(c, 'c'), lw=3)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 11,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-10-793ac8294852>:11: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd3gU5fYH8O8kkEIJLXQI1dAlQECaEgERBAQpYsH7w4blihQVBdHrtYEIigW86BWwN6pYERW9iAUQFFBAekAghJ4AKbvn98dh2JLdZLNtUr6f55kn2ZmdmXc3y+zhfc97xhAREBEREVHhRVjdACIiIqLiioEUERERkZ8YSBERERH5iYEUERERkZ8YSBERERH5iYEUERERkZ/KWN0AZ/Hx8dKwYUOrm0FERETkYv369ekiUt19fZEKpBo2bIh169ZZ3QwiIiIiF4Zh7PW0nkN7RERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH4KWSBlGEYzwzA2Oi2nDMMYF6rzEREREYVbyG4RIyLbACQBgGEYkQAOAFgSqvMRERERhVu4hvZ6AdgpIh7vU0NERERUHIUrkLoOwHthOhcRERFRWIQ8kDIMIwrA1QA+8rJ9tGEY6wzDWHfkyJFQN4eCLGVBClIWpFjdDCKyWkqKLkSlTMhypJz0A/CriBz2tFFEXgXwKgAkJydLGNpDQVQvrp7VTSCioqAerwVUOoUjkLoeHNYrsd4e8rbVTSCiouBtXguodArp0J5hGOUAXAFgcSjPQ0RERGSFkPZIicgZANVCeQ6y1rgvtDTYrL6zLG4JEVlq3PkygbN4LaDSJRxDe1SCbTy00eomEFFRsJHXAiqdeIsYIiIiIj8xkCIiIiLyEwMpIiIiIj8xR4oCklgt0eomEFFRkMhrAZVOhkjRqYGZnJws69ats7oZRERERC4Mw1gvIsnu6zm0R0REROQnBlIUkNHLR2P08tFWN4OIrDZ6tC5EpQxzpCgg249ut7oJRFQUbOe1gEon9kgRERER+YmBFBEREZGfGEgRERER+Yk5UhSQpFpJVjeBiIqCJF4LqHRiHSkiIiKiArCOFBEREVGQMZCigIxcPBIjF4+0uhlEZLWRI3UhKmWYI0UB2X9qv9VNIKKiYD+vBVQ6sUeKiIiIyE8MpIiIiIj8xECKiIiIyE/MkaKAdKnXxeomEFFR0IXXAiqdWEeKiIiIqACsI0VEREQUZAykKCBDPxyKoR8OtboZRGS1oUN1ISplmCNFATl65qjVTSCiouAorwVUOrFHioiIiMhPDKSIiIiI/MRAioiIiMhPzJGigPRq1MvqJhBRUdCL1wIqnVhHioiIiKgArCNFREREFGQMpCgg/d7ph37v9LO6GURktX79dCEqZZgjRQE5m3PW6iYQUVFwltcCKp1C2iNlGEZlwzAWGoax1TCMPw3D4F0tiYiIqMQIdY/UCwC+EJFhhmFEASgX4vMRERERhU3IAinDMOIAXAZgFACISDaA7FCdj4iIiCjcQtkj1RjAEQDzDcNoC2A9gLEikhnCc1KYDUgcYHUTiKgoGMBrAZVOIasjZRhGMoCfAHQTkZ8Nw3gBwCkRecTteaMBjAaAhISEDnv37g1Je4iIiIj8ZUUdqf0A9ovIz+cfLwTQ3v1JIvKqiCSLSHL16tVD2BwiIiKi4ApZICUihwCkGobR7PyqXgD+CNX5yBopC1KQsiDF6mYQkdVSUnQhKmVCPWtvDIB3zs/Y2wXg5hCfj4iIiChsQhpIichGAHnGE4mIiIhKAt4ihoiIiMhPDKSIiIiI/MR77VFArm11rdVNIKKi4FpeC6h0ClkdKX8kJyfLunXrrG4GERERkQsr6khRKXAm5wzO5JyxuhlEZLUzZ3QhKmU4tEcBueqdqwAAq0atsrYhRGStq/RagFWrLG0GUbixR4qIiIjITwykiIiIiPzEQIqIiIjITwykiIiIiPzEZHMKyKikUVY3gYiKglGjrG4BkSVYR4qIiIioAKwjRSGRfiYd6WfSrW4GEVktPV0XolKGQ3sUkGEfDgPAOlJEpd4wvRawjhSVNuyRIiIiIvITAykiIiIiPzGQIiIiIvITAykiIiIiPzHZnAJyV/JdVjeBiIqCu3gtoNKJgRQFZETrEVY3gYiKghG8FlDpxKE9CkjqyVSknky1uhlEZLXUVF2IShn2SFFAblpyEwDWkSIq9W7SawHrSFFpwx4pIiIiIj8xkCIiIiLyEwMpIiIiIj8xkCIiIiLyE5PNKSD3dbnP6iYQUVFwH68FVDoxkKKADGw20OomEFFRMJDXAiqdOLRHAdmWvg3b0rdZ3Qwistq2bboQlTLskaKA3PHJHQBYR4qo1LtDrwWsI0WlDXukiIiIiPzEQIqIiIjITwykiIiIiPzEQIqIiIjITyFNNjcMYw+A0wBsAHJFJDmU56Pwm3LZFKubQERFwRReC6h0CsesvctFJD0M5yEL9G7c2+omEFFR0JvXAiqdOLRHAdl4aCM2HtpodTOIyGobN+pCVMqEukdKAKwwDEMAzBWRV92fYBjGaACjASAhISHEzaFgG/fFOACsI0VU6o3TawHrSFFpE+oeqW4i0h5APwD/NAzjMvcniMirIpIsIsnVq1cPcXOIiIiIgiekgZSI/H3+ZxqAJQA6hfJ8REREROEUskDKMIzyhmFUNH8H0AfA5lCdj4iIiCjcQpkjVRPAEsMwzPO8KyJfhPB8RERERGEVskBKRHYBaBuq41PR8HSvp61uAhEVBU/zWkClUzjqSFEJ1rV+V6ubQERFQVdeC6h0Yh0pCsia1DVYk7rG6mYQkdXWrNGFqJRhjxQFZPLXkwGwjhRRqTdZrwWsI0WlDXukiIiIiPzEQIqIiIjITwykiIiIiPzEQIqIiIjIT0w2p4DM6jvL6iYQUVEwi9cCKp0YSFFAkmolWd0EIioKkngtoNKJQ3sUkJW7VmLlrpVWN4OIrLZypS5EpQx7pCggT37/JACgd+PeFreEiCz1pF4L0JvXAipd2CNFRERE5CcGUkRERER+YiBFRERE5CcGUkRERER+YrI5BWTugLlWN4GIioK5vBZQ6cRAigLSLL6Z1U0goqKgGa8FVDpxaI8CsnzbcizfttzqZhCR1ZYv14WolGGPFAVk5o8zAQADmw20uCVEZKmZei3AQF4LqHRhjxQRERGRnxhIEREREfmJQ3shlJaZhlk/zcLhjMNWNyVktqVvAwDcuuxWi1sSOmUjy2JU0ih0rtfZ6qYQEVERw0DKg5/3/4ynVz+No2eOBnScTWmbkJmdidoVawepZUXPsXPHAAArdq2wuCWhc/LcScxdPxfJdZIRHRnt93HKRJTBDW1uwK3tbkVkRGQQW0hERFYxRMS3JxpGDQAx5mMR2RfsxiQnJ8u6deu8bs/MzsS53HMFHkcgWLp1KZ754RmcOHeiUG0QERw9exQ1y9dE6xqtC7Wvu9oVa2PKpVNKdImA1JOpAID6lepb3JLQycjOwIw1M7B63+qAjpOWmYZNaZsQFx2HqMgon/erF1cPj6c8jq71u/q8T4QRgcoxlWEYhj9NJSq8VL0WoH7JvRZQ6WYYxnoRSc6zvqBAyjCMqwHMBFAHQBqABgD+FJFWwW5k84uby7xP5+VZLyL4cMuHmLNuDnLtuT4fr3O9zmhfq32h25FQKQF3d7wbFaMrFnpfIm9EBEu2LsHXu772fR8IVu5aib+O/VXo813W4DJM6j4JcdFx+T6vRvkaaFq1aaGPT0RUmgQSSP0GoCeAlSLSzjCMywFcLyKjg97IOobgDs/bIowI3Jx0M9rWbOvTsRpXaYyrLrqK/yMPsQ82fwAAGNF6hMUtKbmybdlY+MfCQg01nzh3Ai/98hKOnDni0/NvaHMDejfqne9zDMNASsMUNKzc0Od2UCnygV4LMILXAiqZAgmk1olI8vmAqp2I2A3D+EVEOgW7kYltEuXlpS973NakShM0qdok2KekAKUsSAEArBq1ytJ2UF6nsk7
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"draw_means(fig, X_mean)\n",
|
|||
|
"plot_prob(fig, X_mean, X_std, classes)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Prawdopodobieństwo a posteriori\n",
|
|||
|
"def posterior_prob(x, c):\n",
|
|||
|
" normalizer = sum(class_prob(x, c, X_mean, X_std)\n",
|
|||
|
" * prior_prob[c]\n",
|
|||
|
" for c in classes)\n",
|
|||
|
" return (class_prob(x, c, X_mean, X_std) \n",
|
|||
|
" * prior_prob[c]\n",
|
|||
|
" / normalizer)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Aby teraz przewidzieć klasę $y$ dla dowolnego zestawu cech $x$, wystarczy sprawdzić, dla której klasy prawdopodobieństwo *a posteriori* jest większe:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "skip"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Funkcja klasyfikująca (funkcja predykcji)\n",
|
|||
|
"def predict_class(x):\n",
|
|||
|
" p = [posterior_prob(x, c) for c in classes]\n",
|
|||
|
" if p[1] > p[0]:\n",
|
|||
|
" return 1\n",
|
|||
|
" else:\n",
|
|||
|
" return 0"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "skip"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"1\n",
|
|||
|
"0\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"x = [1, 2.0, 0.5] # długość płatka: 2.0, szerokość płatka: 0.5\n",
|
|||
|
"y = predict_class(x)\n",
|
|||
|
"print(y) # 1 – To prawdopodobnie jest Iris setosa\n",
|
|||
|
"\n",
|
|||
|
"x = [1, 2.5, 1.0] # długość płatka: 2.5, szerokość płatka: 1.0\n",
|
|||
|
"y = predict_class(x)\n",
|
|||
|
"print(y) # 0 – To prawdopodobnie nie jest Iris setosa"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Zobaczmy, jak to wygląda na wykresie. Narysujemy w tym celu granicę między klasą 1 a 0:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "skip"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wykres granicy klas dla naiwnego Bayesa\n",
|
|||
|
"def plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" x1, x2 = np.meshgrid(np.arange(xmin, xmax, 0.02),\n",
|
|||
|
" np.arange(ymin, ymax, 0.02))\n",
|
|||
|
" p = [posterior_prob([1, x1, x2], c) for c in classes]\n",
|
|||
|
" p_diff = p[1] - p[0]\n",
|
|||
|
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5QcdZn/8c8zt0wyM2EyJGRIAkkgEETWhDAiArILKCK4SQwIuuqKi7K7hx+JoIBR/K2iLi7uEnFl3eUXFVRW0Vy4CKuyK4quIiYYESFAhEQw9/vkMpfufn5/VDfTM+lbqqf6Nu/XOX0mVdVV9e1KDvPhW08/Ze4uAAAAHL66cg8AAACgWhGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAIKSGcg8g3fjx433atGnlHgYAAMAgq1ev3u7uE4aur6ggNW3aNK1atarcwwAAABjEzDZkWs+tPQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQUqRByszazWyZma01s2fN7I1Rng8AAKCUGiI+/u2SfuDul5pZk6QxEZ8PAACgZCILUmY2VtI5kq6QJHfvk9QX1fkAAABKLcpbe8dJ2ibp62b2GzNbamYtQ99kZleZ2SozW7Vt27YIhwMAADC8ogxSDZLmSPqKu58qab+kjw19k7vf6e5d7t41YcKECIcDAAAwvKIMUq9IesXdf5VcXqYgWAEAANSEyIKUu2+W9LKZzUyuOl/SM1GdDwAAoNSi/tbeNZLuSX5j70VJH4j4fAAAACUTaZBy9zWSuqI8BwAAQLnQ2RwAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkBqiPLiZrZfULSkuKebuXVGeDwAAoJQiDVJJ57r79hKcBwAAoKS4tQcAABBS1EHKJf3IzFab2VURnwsAAKCkor61d5a7bzSzoyQ9YmZr3f2x9DckA9ZVknTsscdGPBwAAIDhE+mMlLtvTP7cKmmlpNMzvOdOd+9y964JEyZEORwAAIBhFVmQMrMWM2tL/VnSBZKejup8AAAApRblrb2JklaaWeo8/+nuP4jwfAAAACUVWZBy9xclzYrq+AAAAOVG+wMAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQmoo9I1mdpSk5tSyu/8xkhEBAABUibwzUmY218xekPSSpJ9KWi/pvwo9gZnVm9lvzOz7oUcJAABQgQq5tfcZSWdIet7dp0s6X9L/HsY5Fkl6NsTYAAAAKlohQarf3XdIqjOzOnd/VNLsQg5uZlMkXSxpaRFjBAAAqEiF1EjtNrNWSY9JusfMtkqKFXj8L0q6QVJbyPEBAABUrEJmpOZJOijpWkk/kPQHSW/Pt5OZvV3SVndfned9V5nZKjNbtW3btgKGAwAAUBkKCVLvcve4u8fc/W53/5KCWaZ8zpI018zWS/qOpPPM7FtD3+Tud7p7l7t3TZgw4bAGDwAAUE6FBKlLzew9qQUzu0NS3sTj7ovdfYq7T5P0Lkk/dvf3hh4pAABAhSmkRmqBpAfMLCHpbZJ2uvvV0Q4LAACg8mUNUmbWkbb4QUn3KWh7cLOZdbj7zkJP4u4/kfSTkGMEAACoSLlmpFZLckmW9vPi5MslHRf56AAAACpY1iCVbL4pM2t29570bWbWnHkvAACAkaOQYvNfFLgOAABgRMlVI9UpabKk0WZ2qoJbe5I0VtKYEowNAACgouWqkXqrpCskTZF0W9r6bkkfj3BMAAAAVSFXjdTdku42s0vcfXkJxwQAAFAV8vaRcvflZnaxpNdKak5bf3OUAwMAAKh0eYvNzezfJV0u6RoFdVLvlDQ14nEBAABUvEK+tXemu/+1pF3u/mlJb5R0TLTDAgAAqHyFBKmDyZ8HzGySpH5J06MbEgAAQHUo5Fl73zezdklfkPSkgq7mSyMdFQAAQBUoJEh9XlJrsuj8+5Ka3X1PxOMCAACoeLkaci5I/nGKpAVm9qW0bXL3FVEPDgAAoJLlmpH6y7Q/b5b0JUmPJJddEkEKAACMaLkacn4gfdnMLnX3ZdEPCQAAoDoU0kfqyORtvU+Y2Wozu93MjizB2AAAACpaIe0PviNpm6QFki5N/vneKAcFAABQDQr51l6Hu38mbfmzZjY/qgEBAABUi0JmpB41s3eZWV3ydZmkh6IeGAAAQKUrJEj9raT/lNSbfH1H0nVm1m1me6McHAAAQCXLe2vP3dtKMRAAAIBqU8iMFAAAADIgSAEAAIREkAIAAAiJIAUAABBSqCBlZt8f7oEAAABUm5xByszqzewLGTZ9KKLxAAAAVI2cQcrd45JOMzMbsn5TpKMCAACoAoU8IuY3ku43s+9J2p9a6e4rIhsVAABAFSjoWXuSdkg6L22dSyJIAQCAEa2QzuYfKMVAAAAAqk3eb+2Z2Ylm9j9m9nRy+XVmdlP0QwMAAKhshbQ/+H+SFkvqlyR3f0rSu6IcFAAAQDUoJEiNcfcnhqyLRTEYAACAalJIkNpuZscrKDCXmV0qifYHAABgxCvkW3tXS7pT0klm9idJL0l6b6SjAgAAqAKFfGvvRUlvNrMWSXXu3l3Igc2sWdJjkkYlz7PM3f+hmMECAABUkkK+tRc3s89LOpAKUWb2ZAHH7pV0nrvPkjRb0oVmdkZRowUAAKgghdRI/T75vh+ZWUdyneV4vyTJA/uSi43Jl4caJQAAQAUqJEjF3P0GBW0QfmZmp6nAQJR86PEaSVslPeLuv8rwnqvMbJWZrdq2bdvhjB0AAKCsCglSJknu/l1Jl0n6uqTjCjm4u8fdfbakKZJON7NTMrznTnfvcveuCRMmFD5yAACAMiskSH0w9Qd3/72ksyUtPJyTuPtuST+RdOHh7AcAAFDJCglSx5lZmyQlHw1zl6Sn8+1kZhPMrD3559GS3ixpbfihAgAAVJZCgtQn3b3bzM6W9FZJd0v6SgH7HS3pUTN7StKvFdRIfT/8UAEAACpLIQ0548mfF0v6irvfb2afyrdT8pl8pxYxNgAAgIpWyIzUn8zsPxQUmj9sZqMK3A8AAKCmFRKILpP0Q0kXJovGOyRdH+moAAAAqkAhj4g5IGlF2vIm8dBiAAAAbtEBAACERZACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgCgnNyllSuDn4WsH+7jJxLRnr/GEaQAACin++6TFiyQrr12ILS4B8sLFgTbozz+4sXRnr/G5X3WHgAAiND8+dKiRdLttwfLS5YEIeb224P18+dHe/xbbpF
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_decision_boundary_bayes(fig, X_mean, X_std)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Dla porównania: regresja logistyczna na tych samych danych"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"def powerme(x1,x2,n):\n",
|
|||
|
" X = []\n",
|
|||
|
" for m in range(n+1):\n",
|
|||
|
" for i in range(m+1):\n",
|
|||
|
" X.append(np.multiply(np.power(x1,i),np.power(x2,(m-i))))\n",
|
|||
|
" return np.hstack(X)\n",
|
|||
|
"\n",
|
|||
|
"# Funkcja logistyczna\n",
|
|||
|
"def safeSigmoid(x, eps=0):\n",
|
|||
|
" y = 1.0/(1.0 + np.exp(-x))\n",
|
|||
|
" if eps > 0:\n",
|
|||
|
" y[y < eps] = eps\n",
|
|||
|
" y[y > 1 - eps] = 1 - eps\n",
|
|||
|
" return y\n",
|
|||
|
"\n",
|
|||
|
"# Funkcja hipotezy dla regresji logistycznej\n",
|
|||
|
"def h(theta, X, eps=0.0):\n",
|
|||
|
" return safeSigmoid(X*theta, eps)\n",
|
|||
|
"\n",
|
|||
|
"# Funkcja kosztu dla regresji logistycznej\n",
|
|||
|
"def J(h,theta,X,y, lamb=0):\n",
|
|||
|
" m = len(y)\n",
|
|||
|
" f = h(theta, X, eps=10**-7)\n",
|
|||
|
" j = -np.sum(np.multiply(y, np.log(f)) + \n",
|
|||
|
" np.multiply(1 - y, np.log(1 - f)), axis=0)/m\n",
|
|||
|
" if lamb > 0:\n",
|
|||
|
" j += lamb/(2*m) * np.sum(np.power(theta[1:],2))\n",
|
|||
|
" return j\n",
|
|||
|
"\n",
|
|||
|
"# Gradient funkcji kosztu\n",
|
|||
|
"def dJ(h,theta,X,y,lamb=0):\n",
|
|||
|
" g = 1.0/y.shape[0]*(X.T*(h(theta,X)-y))\n",
|
|||
|
" if lamb > 0:\n",
|
|||
|
" g[1:] += lamb/float(y.shape[0]) * theta[1:] \n",
|
|||
|
" return g\n",
|
|||
|
"\n",
|
|||
|
"# Funkcja klasyfikująca\n",
|
|||
|
"def classifyBi(theta, X):\n",
|
|||
|
" prob = h(theta, X)\n",
|
|||
|
" return prob"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 18,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przygotowanie danych dla wielomianowej regresji logistycznej\n",
|
|||
|
"\n",
|
|||
|
"data = np.matrix(data_iris_setosa)\n",
|
|||
|
"\n",
|
|||
|
"Xpl = powerme(data[:, 1], data[:, 0], n)\n",
|
|||
|
"Ypl = np.matrix(data[:, 2]).reshape(m, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 19,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Metoda gradientu prostego dla regresji logistycznej\n",
|
|||
|
"def GD(h, fJ, fdJ, theta, X, y, alpha=0.01, eps=10**-3, maxSteps=10000):\n",
|
|||
|
" errorCurr = fJ(h, theta, X, y)\n",
|
|||
|
" errors = [[errorCurr, theta]]\n",
|
|||
|
" while True:\n",
|
|||
|
" # oblicz nowe theta\n",
|
|||
|
" theta = theta - alpha * fdJ(h, theta, X, y)\n",
|
|||
|
" # raportuj poziom błędu\n",
|
|||
|
" errorCurr, errorPrev = fJ(h, theta, X, y), errorCurr\n",
|
|||
|
" # kryteria stopu\n",
|
|||
|
" if abs(errorPrev - errorCurr) <= eps:\n",
|
|||
|
" break\n",
|
|||
|
" if len(errors) > maxSteps:\n",
|
|||
|
" break\n",
|
|||
|
" errors.append([errorCurr, theta]) \n",
|
|||
|
" return theta, errors"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"theta = [[ 4.01960795]\n",
|
|||
|
" [ 3.89499137]\n",
|
|||
|
" [ 0.18747599]\n",
|
|||
|
" [-1.3524039 ]\n",
|
|||
|
" [-2.00123783]\n",
|
|||
|
" [-0.87625505]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Uruchomienie metody gradientu prostego dla regresji logistycznej\n",
|
|||
|
"theta_start = np.matrix(np.zeros(Xpl.shape[1])).reshape(Xpl.shape[1], 1)\n",
|
|||
|
"theta, errors = GD(h, J, dJ, theta_start, Xpl, Ypl, \n",
|
|||
|
" alpha=0.1, eps=10**-7, maxSteps=100000)\n",
|
|||
|
"print(r'theta = {}'.format(theta))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 21,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wykres granicy klas\n",
|
|||
|
"def plot_decision_boundary(fig, theta, Xpl, xmin=0.0, xmax=7.0):\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" xx, yy = np.meshgrid(np.arange(xmin, xmax, 0.02),\n",
|
|||
|
" np.arange(xmin, xmax, 0.02))\n",
|
|||
|
" l = len(xx.ravel())\n",
|
|||
|
" C = powerme(yy.reshape(l, 1), xx.reshape(l, 1), n)\n",
|
|||
|
" z = classifyBi(theta, C).reshape(int(np.sqrt(l)), int(np.sqrt(l)))\n",
|
|||
|
"\n",
|
|||
|
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 22,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5xVdb3/8fdnZkBgALnLPVSEUhMEUlGzzE4ZdhQpL2XndLc6pSTlrfp1OdWxtOTgyS5G15OVBYimnsxqRFQGYQAFBAHB4TLcBgSG21z2/vz+2Hs7M7BvrD1r9t4zr+fjsR8ze629vuuzV+Z8/H4/67PM3QUAAIATV5LvAAAAAIoViRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQGX5DqClAQMG+KhRo/IdBgAAQCtVVVW17j7w2O0FlUiNGjVKS5cuzXcYAAAArZhZdbLtLO0BAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQUKiJlJn1MbM5ZrbWzNaY2eQwzwcAANCeykIef5akv7r7B82sq6QeIZ8PAACg3YSWSJlZb0mXSPqYJLl7g6SGsM4HAADQ3sJc2jtN0m5JvzKz5WY228zKj/2Qmd1oZkvNbOnu3btDDAcAAKBthZlIlUmaIOkn7n6upEOS7jj2Q+7+gLtPcvdJAwcODDEcAACAthVmIrVV0lZ3Xxx/P0exxAoAAKBDCC2RcvcdkraY2dj4psskvRzW+QAAANpb2Hft3STpwfgdexslfTzk8wEAALSbUBMpd18haVKY5wAAAMgXOpsDAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAJFIAAAABkUgBAAAERCIFAAAQEIkUAABAQCRSAAAAAZFIAQAABEQiBQAAEBCJFAAAQEAkUgAAAAGRSAEAAAREIgUAABAQiRQAAEBAZWEObmavSaqTFJHU5O6TwjwfAABAewo1kYq71N1r2+E8AAAA7YqlPQAAgIDCTqRc0t/MrMrMbgz5XAAAAO0q7KW9i9y9xswGSXrKzNa6+zMtPxBPsG6UpJEjR4YcDgAAQNsJdUbK3WviP3dJeljSeUk+84C7T3L3SQMHDgwzHAAAgDYVWiJlZuVm1ivxu6T3SFoV1vkAAADaW5hLe6dIetjMEuf5vbv/NcTzAQAAtKvQEil33yhpXFjjAwAA5BvtDwAAAAIikQIAAAiIRAoAACAgEikAAICASKQAAAACIpECAAAIiEQKAAAgIBIpAACAgEikAAAAAiKRAgAACIhECgAAICASKQAAgIBIpAAAAAIikQIAAAiIRAoAACAgEikAAICASKQAAAACIpECAAAIiEQKAAAgIBIpAACAgEikAAAAAiKRAgAACIhECgAAICASKQAAgIBIpAAAAAIikQIAAAiIRAoAACAgEikAAICASKQAAAACIpECAAAIiEQKAAAgIBIpAACAgEikAAAAAiKRAgAACIhECgAAIKCybD9oZoMkdUu8d/fNoUQEAABQJDLOSJnZlWa2XtImSQskvSbp/7I9gZmVmtlyM3sscJQAAAAFKJulvW9LukDSOnc/VdJlkp47gXNMl7QmQGwAAAAFLZtEqtHd90gqMbMSd6+QND6bwc1suKQrJM3OIUYAAICClE2N1D4z6ynpGUkPmtkuSU1Zjv/fkm6T1CtgfAAAAAUrmxmpqyQdkXSLpL9KelXS+zMdZGbvl7TL3asyfO5GM1tqZkt3796dRTgAAACFIZtE6np3j7h7k7v/xt3vU2yWKZOLJF1pZq9J+qOkd5nZ7479kLs/4O6T3H3SwIEDTyh4AACAfMomkfqgmd2QeGNm90vKmPG4+53uPtzdR0m6XtI/3f0jgSMFAAAoMNnUSE2T9KiZRSW9T9Jed/98uGEBAAAUvpSJlJn1a/H2U5LmK9b24D/NrJ+77832JO7+tKSnA8YIAABQkNLNSFVJcknW4ucV8ZdLOi306AAAAApYykQq3nxTZtbN3Y+23Gdm3ZIfBQAA0HlkU2z+fJbbAAAAOpV0NVKDJQ2T1N3MzlVsaU+Sekvq0Q6xAQAAFLR0NVLvlfQxScMl3dtie52kr4QYEwAAQFFIVyP1G0m/MbMPuPvcdowJAACgKGTsI+Xuc83sCklnSerWYvt/hhkYAABAoctYbG5mP5V0naSbFKuTukbSm0KOCwAAoOBlc9fehe7+75Jed/dvSZosaUS4YQEAABS+bBKpI/Gfh81sqKRGSaeGFxIAAEBxyOZZe4+ZWR9J90haplhX89mhRgUAAFAEskmkviepZ7zo/DFJ3dx9f8hxAQAAFLx0DTmnxX8dLmmamd3XYp/cfV7YwQEAABSydDNS/9ri9x2S7pP0VPy9SyKRAgAAnVq6hpwfb/nezD7o7nPCDwkAAKA4ZNNHqn98We+rZlZlZrPMrH87xAYAAFDQsml/8EdJuyVNk/TB+O8PhRkUAABAMcjmrr1+7v7tFu+/Y2ZTwwoIAACgWGQzI1VhZtebWUn8da2kx8MODAAAoNBlk0h9RtLvJdXHX3+UNMPM6szsQJjBAQAAFLKMS3vu3qs9AgEAACg22cxIAQAAIAkSKQAAgIBIpAAAAAIikQIAAAgoUCJlZo+1dSAAAADFJm0iZWalZnZPkl2fDikeAACAopE2kXL3iKSJZmbHbN8ealQAAABFIJtHxCyX9IiZ/VnSocRGd58XWlQAAABFIKtn7UnaI+ldLba5JBIpAADQqWXT2fzj7REIAABAscl4156ZjTGzf5jZqvj7c8zsa+GHBgAAUNiyaX/wc0l3SmqUJHd/SdL1YQYFAABQDLJJpHq4+wvHbGsKIxgAAIBikk0iVWtmpytWYC4z+6Ak2h8AAIBOL5u79j4v6QFJbzazbZI2SfpIqFEBAAAUgWzu2tso6d1mVi6pxN3rshnYzLpJekbSSfHzzHH3b+QSLAAAQCHJ5q69iJl9T9LhRBJlZsuyGLte0rvcfZyk8ZIuN7MLcooWAACggGRTI7U6/rm/mVm/+DZL83lJksccjL/tEn95oCgBAAAKUDaJVJO736ZYG4SFZjZRWSZE8Ycer5C0S9JT7r44yWduNLOlZrZ09+7dJxI7AABAXmWTSJkkufufJF0r6VeSTstmcHePuPt4ScMlnWdmZyf5zAPuPsndJw0cODD7yAEAAPIsm0TqU4lf3H21pIsl3XwiJ3H3fZKelnT5iRwHAABQyLJJpE4zs16SFH80zK8lrcp0kJkNNLM+8d+7S3q3pLXBQwUAACgs2SRS/8/d68zsYknvlfQbST/J4rghkirM7CVJSxSrkXoseKgAAACFJZuGnJH4zysk/cTdHzGzb2Y6KP5MvnNziA0AAKCgZTMjtc3MfqZYofkTZnZSlscBAAB0aNkkRNdKelLS5fGi8X6Sbg01KgAAgCKQzSNiDkua1+L9dvHQYgAAAJboAAAAgiKRAgAACIhECgAAICASKQAAgIBIpAAAAAIikQIAAAiIRAoAACAgEikAAICASKQAAAACIpECACCf3KWHH479zGZ7W48fjYZ7/g6ORAoAgHyaP1+aNk265ZbmpMU99n7atNj+MMe/885wz9/BZXzWHgAACNHUqdL06dKsWbH3M2fGkphZs2L
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_decision_boundary(fig, theta, Xpl)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 23,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n",
|
|||
|
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXxU5fU/8M/JRnZICBAIq2wuVBAoIlpbtVWLfgFpXVrbr7Za2v6sotS1td9utrZqodjaxdLFtrZq2bRqa2kbEIUgBJBFIKyBJIRsZF8mM/f8/pgZsjAbd+bmzkw+79drXmHunfvcM1c0x+c591xRVRARERHRuUuwOwAiIiKiWMVEioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSUykiIiIiExKsjuA7vLy8nTs2LF2h0FERETUQ3FxcY2qDum9PaoSqbFjx2Lbtm12h0FERETUg4iU+trOpT0iIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZBITKSIiIiKTmEgRERERmcREioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSUykiIiIiExiIkVERERkEhMpIiIiIpOYSBERERGZxESKiIiIyCQmUkREREQmMZEiIiIiMomJFBEREZFJTKSIiIiITGIiRURERGQSEykiIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZBITKSIiIiKTmEgRERERmcREioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSZYmUiIySERWish+EdknIpdZeT4iIiKivpRk8fjLAfxTVT8tIikA0i0+HxEREVGfsSyREpFsAFcCuBMAVNUBwGHV+YiIiIj6mpVLe+cBqAbwexHZISIrRCSj94dEZJGIbBORbdXV1RaGQ0RERBRZViZSSQCmA/ilql4CoAXAo70/pKrPq+pMVZ05ZMgQC8MhIiIiiiwrE6kyAGWqusXzfiXciRURERFRXLAskVLVSgAnRGSyZ9M1AD6w6nxEREREfc3qu/buBfCi5469IwC+YPH5iIiIiPqMpYmUqu4EMNPKcxARERHZhZ3NiYiIiExiIkVERERkEhMpIiIiIpOYSBERERGZxESKiIiIyCQmUkREREQmMZEiIiIiMomJFBEREZFJTKSIiIiITGIiRURERGQSEykiIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZBITKSIiIiKTmEgRERERmcREioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSUykiIiIiExiIkVERERkEhMpIiIiIpOYSBERERGZxESKiIiIyCQmUkREREQmMZEiIiIiMomJFBEREZFJTKSIiIiITGIiRURERGQSEykiIiIik5KsHFxEjgFoAuAC4FTVmVaej4iIiKgvWZpIeVylqjV9cB4iIiKiPsWlPSIiIiKTrE6kFMC/RKRYRBZZfC4iIiKiPmX10t7lqlohIkMBrBOR/ar6dvcPeBKsRQAwevRoi8MhIiIiihxLZ6RUtcLzswrAGgCzfHzmeVWdqaozhwwZYmU4RERERBFlWSIlIhkikuX9M4BrAeyx6nxEREREfc3Kpb1hANaIiPc8f1HVf1p4PiIiIqI+ZVkipapHAEy1anwiIiIiu7H9AREREZFJTKSIiIiITGIiRURERGQSEykiIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZBITKSIiIiKTmEgRERERmcREioiIiMgkJlJEREREJjGRIiIiIjKJiRQRERGRSUykiIiIiExiIkVERERkEhMpIiIiIpOYSBERERGZxESKiIiIyCQmUkREREQmMZEiIiIiMomJFBEREZFJTKSIiIiITGIiRURERGQSEykiIiIik5hIEREREZnERIqIiIjIJCZSRERERCYxkSIiIiIyiYkUERERkUlMpIiIiIhMYiJFREREZFJSqB8UkaEAUr3vVfW4JRERERERxYigM1IiMk9EDgI4CmADgGMA/hHqCUQkUUR2iMjrpqMkIiIiikKhLO19H8BsACWqOg7ANQDePYdzLAawz0RsRERERFEtlESqU1VrASSISIKqFgKYFsrgIjISwA0AVoQRIxEREVFUCqVGql5EMgG8DeBFEakC4Axx/J8CeBhAlsn4iIiIiKJWKDNS8wG0AXgAwD8BHAZwY7CDRORGAFWqWhzkc4tEZJuIbKuurg4hHCIiIqLoEEoidZuqulTVqaovqOqzcM8yBXM5gHkicgzASwCuFpE/9/6Qqj6vqjNVdeaQIUPOKXgiIiIiO4WSSH1aRG73vhGR5wAEzXhU9TFVHamqYwHcBuC/qvo505ESERERRZlQaqQWAnhNRAwAnwRQp6r3WBsWERERUfTzm0iJSG63t3cDWAt324PviUiuqtaFehJVXQ9gvckYiYiIiKJSoBmpYgAKQLr9vMHzUgDnWR4dERERURTzm0h5mm9CRFJVtb37PhFJ9X0UERERUf8RSrH5phC3EREREfUrgWqk8gEUAEgTkUvgXtoDgGwA6X0QGxEREVFUC1QjdR2AOwGMBLC02/YmAN+wMCYiIiKimBCoRuoFAC+IyKdUdVUfxkREREQUE4L2kVLVVSJyA4CLAKR22/49KwMjIiIiinZBi81F5FcAbgVwL9x1UjcDGGNxXERERERRL5S79uao6v8COK2q3wVwGYBR1oZFREREFP1CSaTaPD9bRWQEgE4A46wLiYiIiCg2hPKsvddFZBCApwFsh7ur+QpLoyIiIiKKAaEkUj8CkOkpOn8dQKqqNlgcFxEREVHUC9SQc6HnjyMBLBSRZ7vtg6qutjo4IiIiomgWaEbqf7r9uRLAswDWed4rACZSRERE1K8Fasj5he7vReTTqrrS+pCIiIiIYkMofaQGe5b1vikixSKyXEQG90FsRERERFEtlPYHLwGoBrAQwKc9f37ZyqCIiIiIYkEod+3lqur3u71/QkQWWBUQERERUawIZUaqUERuE5EEz+sWAG9YHRgRERFRtAslkfoygL8A6PC8XgKwRESaRKTRyuCIiIiIolnQpT1VzeqLQIiIiIhiTSgzUkRERETkAxMpIiIiIpOYSBERERGZxESKiIiIyCRTiZSIvB7pQIiIiIhiTcBESkQSReRpH7u+ZFE8RERERDEjYCKlqi4AM0REem0/aWlURERERDEglEfE7ADwqoj8DUCLd6OqrrYsKiIiIqIYENKz9gDUAri62zYFwESKiIiI+rVQOpt/oS8CISIiIoo1Qe/aE5FJIvIfEdnjeX+xiDxufWhERERE0S2U9ge/AfAYgE4AUNVdAG6zMigiIiKiWBBKIpWuqu/12ua0IhgiIiKiWBJKIlUjIuPhLjCHiHwaANsfEBERUb8Xyl179wB4HsD5IlIO4CiAz1kaFREREVEMCOWuvSMAPi4iGQASVLUplIFFJBXA2wAGeM6zUlW/HU6wRERERNEklLv2XCLyIwCt3iRKRLaHMHYHgKtVdSqAaQCuF5HZYUVLREREFEVCqZHa6/ncv0Qk17NNAnweAKBuzZ63yZ6XmoqSiIiIKAqFkkg5VfVhuNsgbBSRGQgxIfI89HgngCoA61R1i4/PLBKRbSKyrbq6+lxiJyIiIrJVKImUAICqvgLgFgC/B3BeKIOrqktVpwEYCWCWiEzx8ZnnVXWmqs4cMmRI6JETERER2SyUROpu7x9UdS+AKwDcdy4nUdV6AOsBXH8uxxERERFFs1ASqfNEJAsAPI+G+QOAPcEOEpEhIjLI8+c0AB8HsN98qERERETRJZRE6luq2iQiVwC4DsALAH4ZwnHDARSKyC4AW+GukXrdfKhERERE0SWUhpwuz88bAPxSVV8Vke8EO8jzTL5LwoiNiIiIKKqFMiNVLiK/hrvQ/E0RGRDicUR
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_decision_boundary(fig, theta, Xpl)\n",
|
|||
|
"plot_decision_boundary_bayes(fig, X_mean, X_std)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Inny przykład"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 24,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wczytanie danych (gatunki kosaćców)\n",
|
|||
|
"\n",
|
|||
|
"data_iris = pandas.read_csv('iris.csv')\n",
|
|||
|
"data_iris_versicolor = pandas.DataFrame()\n",
|
|||
|
"data_iris_versicolor['dł. płatka'] = data_iris['pl'] # \"pl\" oznacza \"petal length\"\n",
|
|||
|
"data_iris_versicolor['szer. płatka'] = data_iris['pw'] # \"pw\" oznacza \"petal width\"\n",
|
|||
|
"data_iris_versicolor['Iris versicolor?'] = data_iris['Gatunek'].apply(lambda x: 1 if x=='Iris-versicolor' else 0)\n",
|
|||
|
"\n",
|
|||
|
"m, n_plus_1 = data_iris_versicolor.values.shape\n",
|
|||
|
"n = n_plus_1 - 1\n",
|
|||
|
"Xn = data_iris_versicolor.values[:, 0:n].reshape(m, n)\n",
|
|||
|
"\n",
|
|||
|
"X = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
|
|||
|
"Y = np.matrix(data_iris_setosa.values[:, 2]).reshape(m, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 25,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"liczba przykładów: {0: 100, 1: 50}\n",
|
|||
|
"prior probability: {0: 0.6666666666666666, 1: 0.3333333333333333}\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"classes = [0, 1]\n",
|
|||
|
"count = [sum(1 if y == c else 0 for y in Y.T.tolist()[0]) for c in classes]\n",
|
|||
|
"prior_prob = [float(count[c]) / float(Y.shape[0]) for c in classes]\n",
|
|||
|
"\n",
|
|||
|
"print('liczba przykładów: ', {c: count[c] for c in classes})\n",
|
|||
|
"print('prior probability:', {c: prior_prob[c] for c in classes})"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 26,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFkCAYAAAD13eXtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dfZRddX3v8c83DyBNhqIkNkCI4SGiQHXIRMSKBXwGLEymQEBbsWWZ0mLXiHYF4q1etbeGlXuv0+GWaila5F7FEBkCC7H4AG3h3lJJQkAiDwElJQ2IoMgkspLMOd/7xz6HOTNzztk7c/Zvn332eb/W2mtmP5zf/u7fZHG+7P3bv6+5uwAAABDOjHYHAAAAUHQkXAAAAIGRcAEAAARGwgUAABAYCRcAAEBgJFwAAACBzWp3APtr3rx5vnjx4naHAQAAMMGmTZued/f59fZ1XMK1ePFibdy4sd1hAAAATGBm2xvt45EiAABAYCRcAAAAgZFwAQAABEbCBQAAEBgJFwAAQGAkXAAAAIGRcAEAAARGwgUAABBYsITLzI40s7vN7BEz22pmg3WOOd3MfmVmWyrLZ0LFAwBAEO7SLbdEP5NsD3WOLOLAtIW8wzUm6ZPu/kZJp0i6zMyOr3PcPe7eW1k+HzAeAADSt2GDNDAgXX75eFLjHq0PDET7szhHFnFg2oKV9nH3ZyQ9U/l91MwekXSEpB+HOicAAJnr75cGB6Xh4Wh9aChKcoaHo+39/dmdI3QcmDbzDG4xmtliSf8q6UR3f6lm++mSbpa0Q9JOSX/h7lubtbVs2TKnliIAIFeqd5KqyY4UJTlDQ5JZdufIIg40ZGab3H1Z3X2hEy4zmyvpXyT9tbuPTNp3sKSyu+8ys7MkDbv7kjptrJS0UpIWLVrUt317w9qQAAC0h7s0o2akTrmcfpKT5BxZxIG6miVcQd9SNLPZiu5gfX1ysiVJ7v6Su++q/H6HpNlmNq/Ocde6+zJ3XzZ//vyQIQMAsP+qd5Zq1Y6lyuocWcSBaQn5lqJJ+oqkR9z9iw2OWVA5TmZ2ciWeF0LFBABA6mof4w0ORneUqmOp0kp2kpwjizgwbcEGzUt6u6Q/lPQjM9tS2fYpSYskyd2/LOk8SX9qZmOSXpZ0oWcxqAwAgLRs2DCe5FTHSg0NRfuGh6XTTpOWLw9/jurvIePAtGUyaD5NDJoHAOSKe5QQ9fdPHCvVaHuoc0jh40BTbR00nzYSLgAAkEdtGzQPAAAAEi4AAIDgSLgAANnrlLp/5bJ0xRXRzyTbgQZIuAAA2euUun+rV0tr10p9fePJVbkcra9dG+0HEiDhAgBkr7Y2YDXpymPdvzVrpN5eacuW8aSrry9a7+2N9gMJhJyHCwCA+ibPEVWt/Ze3un8zZkibNo0nWTNnRtt7e6PtM7hvgWSYFgIA0D6dUvevXB5PtiSpVCLZwhRMCwEAyJ9OqftXfYxYq3ZMF5AACRcAIHudUvdv8pitUmnqmC4gARIuAED2GtUGrCZdeXpLsZpsVcdsbdo0nnTxliISYgwXACB7WdQfTEO5HCVVa9ZMHWtWbzu6GrUUAQAAAmPQPAAAQBuRcAEAshdX2qdcji/9k0YbWVxLkvPkpY0iyVt/uHtHLX19fQ4A6HAjI1HKNDjoXi5H28rlaF1yX7Wq+f6RkXTayOJakpwnL20USRv6Q9JGb5C/tD2B2t+FhAsACqD2i6/6hVi7Xio1318up9NGFteS5Dx5aaNI2tAfJFwAgPyp/QKsLo3uRtTbn1YbWVxLJ7VRJBn3R7OEi7cUAQDt4zGlfeL2p9VGGtI4T17aKJIM+4O3FAEA+eMxpX3i9qfVRhrSOE9e2iiSPPVHo1tfeV14pAgABcAYrny2USSM4SLhAoCux1uK+WyjSHhLkYQLALpeuRx94U2+y1DdXio131+9w9VqG1lcS9K7U3loo0ja0B/NEi4GzQMAAKSAQfMAAABtRMIFAAAQGAkXAACNeAr1+NJoo9sUsM9IuAAAaGTDBmlgoP7cXgMD0f4s2ug2BeyzWe0OAACA3OrvlwYHpeHhaH1oKPrSHx6Otvf3Z9NGtylgn/GWIgAAzVTvrFS//KXoS39oKHmJmDTa6DYd2GfN3lIk4QIAII5T47AtOqzPmBYCAIDpqt5pqUWNw/AK1mckXAAANFL7WGtwMLrDUh1blPTLP402uk0B+4xB8wAANLJhw/iXfnXs0NBQtG94WDrtNGn58vBtdJsC9hljuAAAaMQ9+vLv7584dqjR9lBtdJsO7TMGzQMAAATGoHkAAIA2IuECAAAIjIQLAFBMSerxxR1TLrfeBvUWJ+qma61BwgUAKKYk9fjijlm9uvU2qLc4UTdday1376ilr6/PAQCIVS67Dw5G96AGB+uvxx1TKrXeRrmcTqxFUeBrlbTRG+QvbU+g9nch4QIAJFb7ZV5dJn+pxx2TRhtpxVoUBb3WZgkX00IAAIrNE9TjizsmjTbSirUoCnitTAsBAOhOnqAeX9wxabSRVqxF0U3XWtXo1ldeFx4pAgASYQxXPhX4WsUYLgBA1xkZmfolXvvlPjISf8yqVa23MTKSTqxFUeBrbZZwMYYLAFBMnqAen9T8mHPPlW69tbU2qLc4UYGvlVqKAAAAgTFoHgAAoI1IuAAAAAILlnCZ2ZFmdreZPWJmW81ssM4xZmZXm9kTZvaQmS0NFQ8AICWeQf3BJG0ge3F/t7T+LlmdJ0Mh73CNSfqku79R0imSLjOz4ycdc6akJZVlpaQvBYwHAJCGLOoPJmkD2cuqDmIR6y02en0x7UXSrZLeM2nb30u6qGb9MUmHNWuHaSEAoM2ymLsqSRvIXlZzaHXoXF1q9zxckhZL+g9JB0/afrukU2vWfyBpWbO2SLgAIAeyqD9Y0Hp7HS+rv0sH/v2bJVzBp4Uws7mS/kXSX7v7yKR935a0xt3vraz/QNIqd9806biVih45atGiRX3bt28PGjMAIAHPoP5gkjaQvaz+Lh3292/btBBmNlvSzZK+PjnZqtgh6cia9YWSdk4+yN2vdfdl7r5s/vz5YYIFACRXHU9TK+36g0naQPay+rsU7e/f6NZXq4skk3SDpL9pcszZkr5TOfYUST+Ma5dHigDQZozh6l6M4WpK7RjDJelUSS7pIUlbKstZki6VdKmPJ2XXSHpS0o8UM37LSbgAoP2yqD+YpA1kL6s6iB1ab7FZwkVpHwDA/vGYWnhp1B9M0kaOx/IUVtzfPq2/S1bnSRm1FAEAAAKjliIAAEAbkXABAAAERsIFAEiXJ6iDVy5LV1wR/azVaPt0z9NN6I9cI+ECAKQrSR281aultWulvr7x5KpcjtbXro32p3GebkJ/5Fuj1xfzujAtBADkXJI5lEol997eaFtvb/31NM7TTeiPthPTQgAAMlW9szI8PL5tcFAaGhp/nb96R2vLlvFjenulTZsmlnNp9TzdhP5oK6aFAABkzxPUwSuXpZkzx9dLpeTJ1v6cp5vQH23DtBAAgGxV77TUmlwHr3qHq1btmK60ztNN6I/cIuECAKSr9rHW4GCUQA0ORuvVL//ax4m9vdGdrd7eaD1p0pXkPN2E/si3RoO78rowaB4Aci5JHbxqrcTaAfK1A+dXrUrnPN2E/mg7MWgeAJAZT1AHzz2a+mHNmqnjjeptn+55umnsEv3RdgyaBwAACIxB8wAAAG1EwgUAGFcqScuXRz8bbS9SWZ64aymVWo8zjWvNqr/y8ncpokaDu/K6MGgeAALq748GWM+b5z42Fm0bG4vWpWh/kQa8x11LtT9aiTONa82qv/Lyd+lQajJovu0J1P4uJFwAEFBtclVNuiavF6ksT9y1jI21Hmca15pVf+Xl79KhSLgAAMnVJlnVpfaOl/vExKS6JE22qmq/zKtLO77U464ljTjz0kaezlNAzRIu3lIEAExVKkmzZo2vj41NLMEjFassT9y1pBFnXtrI03kKhrcUAQDJlUrSggUTty1YMHEgfZHK8sRdSxpx5qWNPJ2n2zS69ZXXhUeKABAQY7gYw5WHv0uHEmO4AACJ8JY
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 27,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"średnia: [matrix([[1. , 4.906, 1.676]]), matrix([[1. , 1.464, 0.244]])]\n",
|
|||
|
"odchylenie standardowe: [matrix([[0. , 0.8214402 , 0.42263933]]), matrix([[0. , 0.17176728, 0.10613199]])]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"XY = np.column_stack((X, Y))\n",
|
|||
|
"XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]\n",
|
|||
|
"X_split = [XY_split[c][:,0:3] for c in classes]\n",
|
|||
|
"Y_split = [XY_split[c][:,3] for c in classes]\n",
|
|||
|
"\n",
|
|||
|
"X_mean = [np.mean(X_split[c], axis=0) for c in classes]\n",
|
|||
|
"X_std = [np.std(X_split[c], axis=0) for c in classes]\n",
|
|||
|
"print('średnia: ', X_mean) \n",
|
|||
|
"print('odchylenie standardowe: ', X_std)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 28,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-10-793ac8294852>:11: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdd3gU5fYH8O8kkEIJLXQI1dAlQECaEgERBAQpYsH7w4blihQVBdHrtYEIigW86BWwN6pYERW9iAUQFFBAekAghJ4AKbvn98dh2JLdZLNtUr6f55kn2ZmdmXc3y+zhfc97xhAREBEREVHhRVjdACIiIqLiioEUERERkZ8YSBERERH5iYEUERERkZ8YSBERERH5iYEUERERkZ/KWN0AZ/Hx8dKwYUOrm0FERETkYv369ekiUt19fZEKpBo2bIh169ZZ3QwiIiIiF4Zh7PW0nkN7RERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH5iIEVERETkJwZSRERERH4KWSBlGEYzwzA2Oi2nDMMYF6rzEREREYVbyG4RIyLbACQBgGEYkQAOAFgSqvMRERERhVu4hvZ6AdgpIh7vU0NERERUHIUrkLoOwHthOhcRERFRWIQ8kDIMIwrA1QA+8rJ9tGEY6wzDWHfkyJFQN4eCLGVBClIWpFjdDCKyWkqKLkSlTMhypJz0A/CriBz2tFFEXgXwKgAkJydLGNpDQVQvrp7VTSCioqAerwVUOoUjkLoeHNYrsd4e8rbVTSCiouBtXguodArp0J5hGOUAXAFgcSjPQ0RERGSFkPZIicgZANVCeQ6y1rgvtDTYrL6zLG4JEVlq3PkygbN4LaDSJRxDe1SCbTy00eomEFFRsJHXAiqdeIsYIiIiIj8xkCIiIiLyEwMpIiIiIj8xR4oCklgt0eomEFFRkMhrAZVOhkjRqYGZnJws69ats7oZRERERC4Mw1gvIsnu6zm0R0REROQnBlIUkNHLR2P08tFWN4OIrDZ6tC5EpQxzpCgg249ut7oJRFQUbOe1gEon9kgRERER+YmBFBEREZGfGEgRERER+Yk5UhSQpFpJVjeBiIqCJF4LqHRiHSkiIiKiArCOFBEREVGQMZCigIxcPBIjF4+0uhlEZLWRI3UhKmWYI0UB2X9qv9VNIKKiYD+vBVQ6sUeKiIiIyE8MpIiIiIj8xECKiIiIyE/MkaKAdKnXxeomEFFR0IXXAiqdWEeKiIiIqACsI0VEREQUZAykKCBDPxyKoR8OtboZRGS1oUN1ISplmCNFATl65qjVTSCiouAorwVUOrFHioiIiMhPDKSIiIiI/MRAioiIiMhPzJGigPRq1MvqJhBRUdCL1wIqnVhHioiIiKgArCNFREREFGQMpCgg/d7ph37v9LO6GURktX79dCEqZZgjRQE5m3PW6iYQUVFwltcCKp1C2iNlGEZlwzAWGoax1TCMPw3D4F0tiYiIqMQIdY/UCwC+EJFhhmFEASgX4vMRERERhU3IAinDMOIAXAZgFACISDaA7FCdj4iIiCjcQtkj1RjAEQDzDcNoC2A9gLEikhnCc1KYDUgcYHUTiKgoGMBrAZVOIasjZRhGMoCfAHQTkZ8Nw3gBwCkRecTteaMBjAaAhISEDnv37g1Je4iIiIj8ZUUdqf0A9ovIz+cfLwTQ3v1JIvKqiCSLSHL16tVD2BwiIiKi4ApZICUihwCkGobR7PyqXgD+CNX5yBopC1KQsiDF6mYQkdVSUnQhKmVCPWtvDIB3zs/Y2wXg5hCfj4iIiChsQhpIichGAHnGE4mIiIhKAt4ihoiIiMhPDKSIiIiI/MR77VFArm11rdVNIKKi4FpeC6h0ClkdKX8kJyfLunXrrG4GERERkQsr6khRKXAm5wzO5JyxuhlEZLUzZ3QhKmU4tEcBueqdqwAAq0atsrYhRGStq/RagFWrLG0GUbixR4qIiIjITwykiIiIiPzEQIqIiIjITwykiIiIiPzEZHMKyKikUVY3gYiKglGjrG4BkSVYR4qIiIioAKwjRSGRfiYd6WfSrW4GEVktPV0XolKGQ3sUkGEfDgPAOlJEpd4wvRawjhSVNuyRIiIiIvITAykiIiIiPzGQIiIiIvITAykiIiIiPzHZnAJyV/JdVjeBiIqCu3gtoNKJgRQFZETrEVY3gYiKghG8FlDpxKE9CkjqyVSknky1uhlEZLXUVF2IShn2SFFAblpyEwDWkSIq9W7SawHrSFFpwx4pIiIiIj8xkCIiIiLyEwMpIiIiIj8xkCIiIiLyE5PNKSD3dbnP6iYQUVFwH68FVDoxkKKADGw20OomEFFRMJDXAiqdOLRHAdmWvg3b0rdZ3Qwistq2bboQlTLskaKA3PHJHQBYR4qo1LtDrwWsI0WlDXukiIiIiPzEQIqIiIjITwykiIiIiPzEQIqIiIjITyFNNjcMYw+A0wBsAHJFJDmU56Pwm3LZFKubQERFwRReC6h0CsesvctFJD0M5yEL9G7c2+omEFFR0JvXAiqdOLRHAdl4aCM2HtpodTOIyGobN+pCVMqEukdKAKwwDEMAzBWRV92fYBjGaACjASAhISHEzaFgG/fFOACsI0VU6o3TawHrSFFpE+oeqW4i0h5APwD/NAzjMvcniMirIpIsIsnVq1cPcXOIiIiIgiekgZSI/H3+ZxqAJQA6hfJ8REREROEUskDKMIzyhmFUNH8H0AfA5lCdj4iIiCjcQpkjVRPAEsMwzPO8KyJfhPB8RERERGEVskBKRHYBaBuq41PR8HSvp61uAhEVBU/zWkClUzjqSFEJ1rV+V6ubQERFQVdeC6h0Yh0pCsia1DVYk7rG6mYQkdXWrNGFqJRhjxQFZPLXkwGwjhRRqTdZrwWsI0WlDXukiIiIiPzEQIqIiIjITwykiIiIiPzEQIqIiIjIT0w2p4DM6jvL6iYQUVEwi9cCKp0YSFFAkmolWd0EIioKkngtoNKJQ3sUkJW7VmLlrpVWN4OIrLZypS5EpQx7pCggT37/JACgd+PeFreEiCz1pF4L0JvXAipd2CNFRERE5CcGUkRERER+YiBFRERE5CcGUkRERER+YrI5BWTugLlWN4GIioK5vBZQ6cRAigLSLL6Z1U0goqKgGa8FVDpxaI8CsnzbcizfttzqZhCR1ZYv14WolGGPFAVk5o8zAQADmw20uCVEZKmZei3AQF4LqHRhjxQRERGRnxhIEREREfmJQ3shlJaZhlk/zcLhjMNWNyVktqVvAwDcuuxWi1sSOmUjy2JU0ih0rtfZ6qYQEVERw0DKg5/3/4ynVz+No2eOBnScTWmbkJmdidoVawepZUXPsXPHAAArdq2wuCWhc/LcScxdPxfJdZIRHRnt93HKRJTBDW1uwK3tbkVkRGQQW0hERFYxRMS3JxpGDQAx5mMR2RfsxiQnJ8u6deu8bs/MzsS53HMFHkcgWLp1KZ754RmcOHeiUG0QERw9exQ1y9dE6xqtC7Wvu9oVa2PKpVNKdImA1JOpAID6lepb3JLQycjOwIw1M7B63+qAjpOWmYZNaZsQFx2HqMgon/erF1cPj6c8jq71u/q8T4QRgcoxlWEYhj9NJSq8VL0WoH7JvRZQ6WYYxnoRSc6zvqBAyjCMqwHMBFAHQBqABgD+FJFWwW5k84uby7xP5+VZLyL4cMuHmLNuDnLtuT4fr3O9zmhfq32h25FQKQF3d7wbFaMrFnpfIm9EBEu2LsHXu772fR8IVu5aib+O/VXo813W4DJM6j4JcdFx+T6vRvkaaFq1aaGPT0RUmgQSSP0GoCeAlSLSzjCMywFcLyKjg97IOobgDs/bIowI3Jx0M9rWbOvTsRpXaYyrLrqK/yMPsQ82fwAAGNF6hMUtKbmybdlY+MfCQg01nzh3Ai/98hKOnDni0/NvaHMDejfqne9zDMNASsMUNKzc0Od2UCnygV4LMILXAiqZAgmk1olI8vmAqp2I2A3D+EVEOgW7kYltEuXlpS973NakShM0qdok2KekAKUsSAEArBq1ytJ2UF6nsk7
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"draw_means(fig, X_mean)\n",
|
|||
|
"plot_prob(fig, X_mean, X_std, classes)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 29,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5QcdZn/8c8zt0wyM2EyJGRIAkkgEETWhDAiArILKCK4SQwIuuqKi7K7hx+JoIBR/K2iLi7uEnFl3eUXFVRW0Vy4CKuyK4quIiYYESFAhEQw9/vkMpfufn5/VDfTM+lbqqf6Nu/XOX0mVdVV9e1KDvPhW08/Ze4uAAAAHL66cg8AAACgWhGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAIKSGcg8g3fjx433atGnlHgYAAMAgq1ev3u7uE4aur6ggNW3aNK1atarcwwAAABjEzDZkWs+tPQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQUqRByszazWyZma01s2fN7I1Rng8AAKCUGiI+/u2SfuDul5pZk6QxEZ8PAACgZCILUmY2VtI5kq6QJHfvk9QX1fkAAABKLcpbe8dJ2ibp62b2GzNbamYtQ99kZleZ2SozW7Vt27YIhwMAADC8ogxSDZLmSPqKu58qab+kjw19k7vf6e5d7t41YcKECIcDAAAwvKIMUq9IesXdf5VcXqYgWAEAANSEyIKUu2+W9LKZzUyuOl/SM1GdDwAAoNSi/tbeNZLuSX5j70VJH4j4fAAAACUTaZBy9zWSuqI8BwAAQLnQ2RwAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkBqiPLiZrZfULSkuKebuXVGeDwAAoJQiDVJJ57r79hKcBwAAoKS4tQcAABBS1EHKJf3IzFab2VURnwsAAKCkor61d5a7bzSzoyQ9YmZr3f2x9DckA9ZVknTsscdGPBwAAIDhE+mMlLtvTP7cKmmlpNMzvOdOd+9y964JEyZEORwAAIBhFVmQMrMWM2tL/VnSBZKejup8AAAApRblrb2JklaaWeo8/+nuP4jwfAAAACUVWZBy9xclzYrq+AAAAOVG+wMAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgAACIkgBQAAEBJBCgAAICSCFAAAQEgEKQAAgJAIUgAAACERpAAAAEIiSAEAAIREkAIAAAiJIAUAABASQQoAACAkghQAAEBIBCkAAICQCFIAAAAhEaQAAABCIkgBAACERJACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQmoo9I1mdpSk5tSyu/8xkhEBAABUibwzUmY218xekPSSpJ9KWi/pvwo9gZnVm9lvzOz7oUcJAABQgQq5tfcZSWdIet7dp0s6X9L/HsY5Fkl6NsTYAAAAKlohQarf3XdIqjOzOnd/VNLsQg5uZlMkXSxpaRFjBAAAqEiF1EjtNrNWSY9JusfMtkqKFXj8L0q6QVJbyPEBAABUrEJmpOZJOijpWkk/kPQHSW/Pt5OZvV3SVndfned9V5nZKjNbtW3btgKGAwAAUBkKCVLvcve4u8fc/W53/5KCWaZ8zpI018zWS/qOpPPM7FtD3+Tud7p7l7t3TZgw4bAGDwAAUE6FBKlLzew9qQUzu0NS3sTj7ovdfYq7T5P0Lkk/dvf3hh4pAABAhSmkRmqBpAfMLCHpbZJ2uvvV0Q4LAACg8mUNUmbWkbb4QUn3KWh7cLOZdbj7zkJP4u4/kfSTkGMEAACoSLlmpFZLckmW9vPi5MslHRf56AAAACpY1iCVbL4pM2t29570bWbWnHkvAACAkaOQYvNfFLgOAABgRMlVI9UpabKk0WZ2qoJbe5I0VtKYEowNAACgouWqkXqrpCskTZF0W9r6bkkfj3BMAAAAVSFXjdTdku42s0vcfXkJxwQAAFAV8vaRcvflZnaxpNdKak5bf3OUAwMAAKh0eYvNzezfJV0u6RoFdVLvlDQ14nEBAABUvEK+tXemu/+1pF3u/mlJb5R0TLTDAgAAqHyFBKmDyZ8HzGySpH5J06MbEgAAQHUo5Fl73zezdklfkPSkgq7mSyMdFQAAQBUoJEh9XlJrsuj8+5Ka3X1PxOMCAACoeLkaci5I/nGKpAVm9qW0bXL3FVEPDgAAoJLlmpH6y7Q/b5b0JUmPJJddEkEKAACMaLkacn4gfdnMLnX3ZdEPCQAAoDoU0kfqyORtvU+Y2Wozu93MjizB2AAAACpaIe0PviNpm6QFki5N/vneKAcFAABQDQr51l6Hu38mbfmzZjY/qgEBAABUi0JmpB41s3eZWV3ydZmkh6IeGAAAQKUrJEj9raT/lNSbfH1H0nVm1m1me6McHAAAQCXLe2vP3dtKMRAAAIBqU8iMFAAAADIgSAEAAIREkAIAAAiJIAUAABBSqCBlZt8f7oEAAABUm5xByszqzewLGTZ9KKLxAAAAVI2cQcrd45JOMzMbsn5TpKMCAACoAoU8IuY3ku43s+9J2p9a6e4rIhsVAABAFSjoWXuSdkg6L22dSyJIAQCAEa2QzuYfKMVAAAAAqk3eb+2Z2Ylm9j9m9nRy+XVmdlP0QwMAAKhshbQ/+H+SFkvqlyR3f0rSu6IcFAAAQDUoJEiNcfcnhqyLRTEYAACAalJIkNpuZscrKDCXmV0qifYHAABgxCvkW3tXS7pT0klm9idJL0l6b6SjAgAAqAKFfGvvRUlvNrMWSXXu3l3Igc2sWdJjkkYlz7PM3f+hmMECAABUkkK+tRc3s89LOpAKUWb2ZAHH7pV0nrvPkjRb0oVmdkZRowUAAKgghdRI/T75vh+ZWUdyneV4vyTJA/uSi43Jl4caJQAAQAUqJEjF3P0GBW0QfmZmp6nAQJR86PEaSVslPeLuv8rwnqvMbJWZrdq2bdvhjB0AAKCsCglSJknu/l1Jl0n6uqTjCjm4u8fdfbakKZJON7NTMrznTnfvcveuCRMmFD5yAACAMiskSH0w9Qd3/72ksyUtPJyTuPtuST+RdOHh7AcAAFDJCglSx5lZmyQlHw1zl6Sn8+1kZhPMrD3559GS3ixpbfihAgAAVJZCgtQn3b3bzM6W9FZJd0v6SgH7HS3pUTN7StKvFdRIfT/8UAEAACpLIQ0548mfF0v6irvfb2afyrdT8pl8pxYxNgAAgIpWyIzUn8zsPxQUmj9sZqMK3A8AAKCmFRKILpP0Q0kXJovGOyRdH+moAAAAqkAhj4g5IGlF2vIm8dBiAAAAbtEBAACERZACAAAIiSAFAAAQEkEKAAAgJIIUAABASAQpAACAkAhSAAAAIRGkAAAAQiJIAQAAhESQAgCgnNyllSuDn4WsH+7jJxLRnr/GEaQAACin++6TFiyQrr12ILS4B8sLFgTbozz+4sXRnr/G5X3WHgAAiND8+dKiRdLttwfLS5YEIeb224P18+dHe/xbbpF
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_decision_boundary_bayes(fig, X_mean, X_std)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 30,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przygotowanie danych dla wielomianowej regresji logistycznej\n",
|
|||
|
"\n",
|
|||
|
"data = np.matrix(data_iris_versicolor)\n",
|
|||
|
"\n",
|
|||
|
"Xpl = powerme(data[:, 1], data[:, 0], n)\n",
|
|||
|
"Ypl = np.matrix(data[:, 2]).reshape(m, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 31,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"theta = [[-10.68923095]\n",
|
|||
|
" [ 5.52649967]\n",
|
|||
|
" [ 5.83316957]\n",
|
|||
|
" [ -0.60707243]\n",
|
|||
|
" [ -0.46353729]\n",
|
|||
|
" [ -2.82974456]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Uruchomienie metody gradientu prostego dla regresji logistycznej\n",
|
|||
|
"theta_start = np.matrix(np.zeros(Xpl.shape[1])).reshape(Xpl.shape[1], 1)\n",
|
|||
|
"theta, errors = GD(h, J, dJ, theta_start, Xpl, Ypl, \n",
|
|||
|
" alpha=0.05, eps=10**-7, maxSteps=100000)\n",
|
|||
|
"print(r'theta = {}'.format(theta))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 32,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de3wU9bk/8M+zu7lfSEJCAgREQEBQRBOvYK3VqkUFRFusvfrT2lNtRbAK6PH0Yisc2wOlp7TnWGqt1lOpXG2lVmtB1CLIXbnfBRMgF0Lu2ezu8/tjdskm2Ruzmexu8nm/XvPazMzOzHe3Nvkw88wzoqogIiIionNni/UAiIiIiBIVgxQRERGRSQxSRERERCYxSBERERGZxCBFREREZBKDFBEREZFJjlgPwF9+fr4OGTIk1sMgIiIiamfz5s2VqlrQcXlcBakhQ4Zg06ZNsR4GERERUTsicjTQcl7aIyIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJEuDlIjkiMhSEdkjIrtF5Gorj0dERETUnRwW738hgDdU9S4RSQaQbvHxiIiIiLqNZUFKRLIBfAbANwFAVZ0AnFYdj4iIiKi7WXlpbyiACgC/F5GtIrJYRDI6vklEHhCRTSKyqaKiwsLhEBEREXUtK4OUA8BlAH6jqpcCaAAwu+ObVPU5VS1V1dKCggILh0NERETUtawMUscBHFfVDd75pTCCFREREVGPYFmQUtUTAI6JyEjvohsA7LLqeERERETdzeq79r4H4GXvHXuHANxr8fGIiIiIuo2lQUpVtwEotfIYRERERLHCzuZEREREJjFIEREREZnEIEVERERkEoMUERERkUkMUkREREQmMUgRERERmcQgRURERGQSgxQRERGRSQxSRERERCYxSBERERGZxCBFREREZBKDFBEREZFJDFJEREREJjFIEREREZnEIEVERERkEoMUERERkUkMUkREREQmMUgRERERmcQgRURERGQSgxQRERGRSQxSRERERCYxSBERERGZxCBFREREZBKDFBEREZFJDFJEREREJjFIEREREZnEIEVERERkEoMUERERkUkMUkREREQmMUgRERERmcQgRURERGQSgxQRERGRSQxSRERERCYxSBERERGZxCBFREREZJLDyp2LyBEAdQDcAFyqWmrl8YiIiIi6k6VByut6Va3shuMQERERdSte2iMiIiIyyeogpQDeFJHNIvKAxcciIiIi6lZWX9obr6plItIPwFsiskdV1/m/wRuwHgCAwYMHWzwcIiIioq5j6RkpVS3zvp4CsALAFQHe85yqlqpqaUFBgZXDISIiIupSlgUpEckQkSzfzwBuAvCxVccjIiIi6m5WXtorBLBCRHzH+T9VfcPC4xERERF1K8uClKoeAnCJVfsnIiIiijW2PyAiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxikiIiIiExikCIiIiIyiUGKiIiIyCQGKSIiIiKTGKSIiIiITGKQIiIiIjKJQYqIiIjIJAYpIiIiIpMYpIiIiIhMYpAiIiIiMolBioiIiMgkBikiIiIikxyRvlFE+gFI9c2r6ieWjIiIiIgoQYQ9IyUik0RkP4DDAN4BcATA3yI9gIjYRWSriPzV9CiJiIiI4lAkl/aeBnAVgH2qej6AGwC8fw7HmA5gt4mxEREREcW1SIJUq6pWAbCJiE1V1wAYF8nORaQYwK0AFkcxRiIiIqK4FEmNVI2IZAJYB+BlETkFwBXh/n8B4HEAWSbHR0RERBS3IjkjNRlAE4AZAN4AcBDAbeE2EpHbAJxS1c1h3veAiGwSkU0VFRURDIeIiIgoPkQSpO5WVbequlT1D6r6SxhnmcIZD2CSiBwB8AqAz4nIHzu+SVWfU9VSVS0tKCg4p8ETERERxVIkQeouEfmKb0ZEFgEIm3hUdY6qFqvqEAB3A/inqn7V9EiJiIiI4kwkNVJTAbwmIh4AXwBQraoPWTssIiIiovgXNEiJSJ7f7P0AVsJoe/BjEclT1epID6KqawGsNTlGIiIiorgU6ozUZgAKQPxeb/VOCmCo5aMjIiIiimNBg5S3+SZEJFVVm/3XiUhq4K2IiIiIeo9Iis3/FeEyIiIiol4lVI1UEYCBANJE5FIYl/YAIBtAejeMjYiIiCiuhaqRuhnANwEUA5jvt7wOwBMWjomIiIgoIYSqkfoDgD+IyJ2quqwbx0RERESUEML2kVLVZSJyK4AxAFL9lv/YyoERERERxbuwxeYi8j8ApgH4How6qS8COM/icRERERHFvUju2rtGVb8O4LSq/gjA1QAGWTssIiIiovgXSZBq8r42isgAAK0AzrduSERERESJIZJn7f1VRHIA/AzAFhhdzRdbOioiIiKiBBBJkJoHINNbdP5XAKmqesbicRERERHFvVANOad6fywGMFVEfum3Dqq63OrBEREREcWzUGekbvf7+QSAXwJ4yzuvABikiIiIqFcL1ZDzXv95EblLVZdaPyQiIiKixBBJH6m+3st6T4rIZhFZKCJ9u2FsRERERHEtkvYHrwCoADAVwF3en5dYOSgiIiKiRBDJXXt5qvq03/xPRGSKVQMiIiIiShSRnJFaIyJ3i4jNO30JwOtWD4yIiIgo3kUSpL4N4P8AtHinVwDMFJE6Eam1cnBERERE8SzspT1VzeqOgRARERElmkjOSBERERFRAAxSRERERCYxSBERERGZxCBFREREZJKpICUif+3qgRARERElmpBBSkTsIvKzAKu+ZdF4iIiIiBJGyCClqm4AJSIiHZaXWzoqIiIiogQQySNitgJYJSKvAmjwLVTV5ZaNioiIiCgBRPSsPQBVAD7nt0wBMEgRERFRrxZJZ/N7u2MgRERERIkm7F17IjJCRN4WkY+982NF5N+tHxoRERFRfIuk/cFvAcwB0AoAqroDwN1WDoqIiIgoEUQSpNJVdWOHZS4rBkNERESUSCIJUpUiMgxGgTlE5C4AbH9AREREvV4kd+09BOA5AKNE5FMAhwF81dJRERERESWASO7aOwTgRhHJAGBT1bpIdiwiqQDWAUjxHmepqv4gmsESERERxZNI7tpzi8g8AI2+ECUiWyLYdwuAz6nqJQDGAbhFRK6KarREREREcSSSGqmd3ve9KSJ53mUS4v0AADXUe2eTvJOaGiURERFRHIokSLlU9XEYbRDeFZESRBiIvA893gbgFIC3VHVDgPc8ICKbRGRTRUXFuYydiIiIKKYiCVICAKr6ZwBfAvB7AEMj2bmqulV1HIBiAFeIyEUB3vOcqpaqamlBQUHkIyciIiKKsUiC1P2+H1R1J4AJAB4+l4Ooag2AtQBuOZftiIiIiOJZJEFqqIhkAYD30TAvAPg43EYiUiAiOd6f0wDcCGCP+aESERERxZdIgtRTqlonIhMA3AzgDwB+E8F2/QGsEZEdAD6EUSP1V/NDJSIiIoovkTTkdHtfbwXwG1VdJSI/DLeR95l8l0YxNiIiIqK4FskZqU9F5H9hFJqvFpGUCLcjIiIi6tEiCURfAvB3ALd
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_decision_boundary(fig, theta, Xpl)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 33,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n",
|
|||
|
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAFkCAYAAADrFNVeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeZwU9Zk/8M+3u+fsuZlhZpjhEBAQFNBBRPGI0VVDjOAVzLFJXF2zq4koRhETf7lWcU0iIbuaXYMaNW7UcGkiGk0yiAeIDKccww0DczD3PdNHPb8/qpvpOfqwumu6e+bzfr3q1VNVXVVPlzj9zLeeekqJCIiIiIjo87NEOwAiIiKieMVEioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAyyRTsAX7m5uTJu3Lhoh0FERETUS1lZWZ2I5PVdHlOJ1Lhx47B169Zoh0FERETUi1Lq+EDLeWmPiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig5hIERERERnERIqIiIjIICZSRERERAYxkSIiIiIyiIkUERERkUFMpIiIiIgMYiJFREREZBATKSIiIiKDmEgRERERGcREioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZxESKiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig5hIERERERnERIqIiIjIICZSRERERAYxkSIiIiIyiIkUERERkUFMpIiIiIgMYiJFREREZJCpiZRSKksptUoptV8ptU8pdbGZxyMiIiIaTDaT978CwDsicotSKhFAqsnHIyIiIho0piVSSqkMAJcD+A4AiIgDgMOs4xERERENNjMv7Y0HUAvgBaXUdqXUSqWUve+blFJ3KaW2KqW21tbWmhgOERERUWSZmUjZAFwA4Lcicj6AdgAP932TiDwrIrNEZFZeXp6J4RARERFFlpmJ1EkAJ0XkE8/8KuiJFREREdGQYFoiJSLVACqUUpM9i64CsNes4xERERENNrPv2vs+gFc8d+wdAXC7yccjIiIiGjSmJlIisgPALDOPQURERBQt7GxOREREZBATKSIiIiKDmEgRERERGcREioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZxESKiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig5hIERERERnERIqIiIjIICZSRERERAYxkSIiIiIyiIkUERERkUFMpIiIiIgMYiJFREREZBATKSIiIiKDmEgRERERGcREioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZZDNz50qpYwBaAbgBuERklpnHIyIiIhpMpiZSHleKSN0gHIeIiIhoUPHSHhEREZFBZidSAuBdpVSZUuouk49FRERENKjMvrQ3V0QqlVIjAbynlNovIht93+BJsO4CgDFjxpgcDhEREVHkmDoiJSKVntfTANYCmD3Ae54VkVkiMisvL8/McIiIiIgiyrRESillV0qle38GcA2Az8w6HhEREdFgM/PSXj6AtUop73H+T0TeMfF4RERERIPKtERKRI4AmGHW/omIiIiije0PiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZxESKiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig5hIERERERnERIqIiIjIICZSRERERAYxkSIiIiIyiIkUERERkUFMpIiIiIgMYiJFREREZBATKSIiIiKDmEgRERERGcREioiIiMggJlJEREREBjGRIiIiIjKIiRQRERGRQUykiIiIiAxiIkVERERkEBMpIiIiIoOYSBEREREZxESKiIiIyCAmUkREREQGMZEiIiIiMoiJFBEREZFBTKSIiIiIDGIiRURERGQQEykiIiIig2yhvlEpNRJAsndeRE6YEhERERFRnAg6IqWUukEpdRDAUQDvAzgG4O1QD6CUsiqltiul/mI4SiIiIqIYFMqlvZ8DmAPggIicBeAqAB99jmMsArDPQGxEREREMS2URMopIvUALEopi4iUApgZys6VUsUAvgxgZRgxEhEREcWkUGqkmpRSaQA2AnhFKXUagCvE/f8awEMA0g3GR0RERBSzQhmRmg+gE8D9AN4BcBjA9cE2UkpdD+C0iJQFed9dSqmtSqmttbW1IYRDREREFBtCSaRuExG3iLhE5EUR+Q30UaZg5gK4QSl1DMCrAL6olPpD3zeJyLMiMktEZuXl5X2u4ImIiIiiKZRE6hal1De8M0qppwEEzXhEZKmIFIvIOAC3AfiHiHzTcKREREREMSaUGqmbALyplNIAfAlAg4jcY25YRERERLHPbyKllMrxmb0TwDrobQ9+ppTKEZGGUA8iIhsAbDAYIxEREVFMCjQiVQZAACif1y97JgEw3vToiIiIiGKY30TK03wTSqlkEenyXaeUSh54KyIiIqLhI5Ri849DXEZEREQ0rASqkSoAUAQgRSl1PvRLewCQASB1EGIjIiIiimmBaqSuBfAdAMUAnvJZ3grgERNjIiIiIooLgWqkXgTwolLqZhFZPYgxEREREcWFoH2kRGS1UurLAKYBSPZZ/jMzAyMiIiKKdUGLzZVS/wNgIYDvQ6+TuhXAWJPjIiIiIop5ody1d4mIfAtAo4j8FMDFAEabGxYRERFR7Aslker0vHYopUYBcAI4y7yQiIiIiOJDKM/a+4tSKgvALwBsg97VfKWpURERERHFgVASqScApHmKzv8CIFlEmk2Oi4iIiCjmBWrIeZPnx2IANymlfuOzDiKyxuzgiIiIiGJZoBGpr/j8XA3gNwDe88wLACZSRERENKwFash5u++8UuoWEVllfkhERERE8SGUPlIjPJf1fqiUKlNKrVBKjRiE2IiIiIhiWijtD14FUAvgJgC3eH5+zcygiIiIiOJBKHft5YjIz33m/0MptcCsgIiIiIjiRSgjUqVKqduUUhbP9FUAb5kdGBEREVGsCyWR+i6A/wPQ7ZleBbBYKdWqlGoxMzgiIiKiWBb00p6IpA9GIERERETxJpQRKSIiIiIaABMpIiIiIoOYSBEREREZxESKiIiIyCBDiZRS6i+RDoSIiIgo3gRMpJRSVqXULwZY9a8mxUNEREQUNwImUiLiBlCilFJ9lleZGhURERFRHAjlETHbAbyhlPoTgHbvQhFZY1pURERERHEgpGftAagH8EWfZQKAiRQRERENa6F0Nr99MAIhIiIiijdB79pTSk1SSv1dKfWZZ366UupH5odGREREFNtCaX/wOwBLATgBQER2AbjNzKCIiIiI4kEoiVSqiGzps8xlRjBERERE8SSURKpOKTUBeoE5lFK3AGD7AyIiIhr2Qrlr7x4AzwKYopQ6BeAogG+aGhURERFRHAjlrr0jAK5WStkBWESkNZQdK6WSAWwEkOQ5zioR+XE4wRIRERHFklDu2nMrpZ4A0OFNopRS20LYdzeAL4rIDAAzAVynlJoTVrREREREMSSUGqk9nve9q5TK8SxTAd4PABBdm2c2wTOJoSiJiIiIYlAoiZRLRB6C3gbhA6VUCUJMiDwPPd4B4DSA90TkkwHec5dSaqtSamttbe3niZ2IiIgoqkJJpBQAiMjrAL4K4AUA40PZuYi4RWQmgGIAs5VS5w7wnmdFZJaIzMrLyws9ciIiIqIoCyWRutP7g4jsAXApgHs/z0FEpAnABgDXfZ7tiIiIiGJZKInUeKVUOgB4Hg3zewCfBdtIKZWnlMry/JwC4GoA+42HSkRERBRbQkmkHhWRVqXUpQCuBfAigN+GsF0hgFKl1C4An0KvkfqL8VCJiIiIYksoDTndntcvA/itiLyhlPp
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_decision_boundary(fig, theta, Xpl)\n",
|
|||
|
"plot_decision_boundary_bayes(fig, X_mean, X_std)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Kiedy naiwny Bayes nie działa?"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 34,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wczytanie danych\n",
|
|||
|
"import pandas\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"\n",
|
|||
|
"alldata = pandas.read_csv('bayes_nasty.tsv', sep='\\t')\n",
|
|||
|
"data = np.matrix(alldata)\n",
|
|||
|
"\n",
|
|||
|
"m, n_plus_1 = data.shape\n",
|
|||
|
"n = n_plus_1 - 1\n",
|
|||
|
"Xn = data[:, 1:]\n",
|
|||
|
"\n",
|
|||
|
"Xbn = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
|
|||
|
"Xbnp = powerme(data[:, 1], data[:, 2], n)\n",
|
|||
|
"Ybn = np.matrix(data[:, 0]).reshape(m, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 35,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFmCAYAAADQ5sbeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df5Dcd33n+dd7zIyvaPUlSBgwA3MmO1OptbSJIukcEk+tYPklJkU89oYaOB1ha5Xz+S4kElKCRXGXdS0LZr21EnItm8TMUudcaaGpYyS8MLER3iR4lkoWSWVAKq+ZgcPCGR92JDa0OnczA/2+P779lb7q6Z7pnvn291c/H1Vd0/39frvn8+2enn735/P+vD/m7gIAAEB2DaTdAAAAAKyNgA0AACDjCNgAAAAyjoANAAAg4wjYAAAAMo6ADQAAIONelnYD0vDKV77Sb7vttrSbAQAAcINz5879jbvf0ry9LwO22267TWfPnk27GQAAADcws+dabWdIFAAAIOMI2AAAADKOgA0AACDjCNgAAAAyjoANAAAg4wjYAAAAMo6ADQAAIOMI2AAAADIu9YDNzD5jZi+a2YU2+83MHjazBTP7lpntiuzbZ2bPNvYdTa7VADrmLp06FfzsZHuexH1uRX6uAGxK6gGbpP9D0r419r9T0ljjcq+kP5QkM7tJ0qca+2+X9F4zu72nLQXQvdOnpXvukT74wesBh3tw+557gv15Ffe5Ffm5ArApqQds7v41SVfWOOQuSX/igb+U9LNmdqukOyQtuPv33H1Z0ucaxwLIkslJ6eBB6cSJ64HIBz8Y3D54MNifV3GfW5GfKwCbkoe1RIcl/SBy+/nGtlbbf7ndg5jZvQp66DQyMhJ/KwG0ZiYdPx5cP3EiuEhBAHL8eLA/r+I+tyI/VwA2xTwDORFmdpukL7n7jhb7vizpQXefa9x+UtKHJP2cpHe4+281tr9P0h3u/jvr/b49e/Y4i78DCXOXBiKd+vV6cQKQuM+tyM8VgDWZ2Tl339O8PfUh0Q48L+n1kduvk7S4xnYAWRMO7UVF87TyLO5zK/JzBWDD8hCwPSbpNxuzRd8o6W/d/QVJ35A0ZmZvMLMhSe9pHAsgS5rzsOr11XlaeRX3uRX5uQJ6oLpU1fT5ad1/5n5Nn59WdamadpN6x91TvUj6rKQXJK0o6DU7IOk+Sfc19puC2aDflfRtSXsi952Q9J3Gvo90+jt3797tABIyM+MuuR886F6vB9vq9eC2FOzPq7jPrcjPFRCzp557yssfL3vpYyXXA/LSx0pe/njZn3ruqbSbtimSznqL2CUTOWxJI4cNSJB7UI5icvLGPKx22/Mk7nMr8nMFxKi6VNXwsWFVl1f3qJWHylo8sqgtQ1tSaNnm5TmHDUCemUl337060Gi3PU/iPrciP1dAjCoXK6p7veW+utdVuVBJuEW9R8AGAAByZf7yvGortZb7ais1LVxZSLhFvZeHOmwAAKDHqktVVS5WNH95XmPbxjS1fUrlm8tpN6ulsW1jKg2WWgZtpcGSRreOptCq3qKHDQCwNtY4Lby5S3MaPjasQ48f0kNff0iHHj+k4WPDmrs0l3bTWpraPqUBax3CDNiApnZMJdyi3iNgAwCsjTVOC626VNXEyQlVl6vXeqxqKzVVl4PtV5evdvVYSZTZKN9c1uz+WZWHyioNliQFPWvloWB7XiccrIUhUQCrMVsRUdE1TqVgmSzWOC2MThL4D+w6sO7jzF2a08TJCdW9rtpKTaXBkg4/cViz+2c1PjIed7M1PjKuxSOLqlyoaOHKgka3jmpqx1QhgzWJgA1AK2GPSnQNy2hR15mZYNYi+gNrnF6TpzyvTsWRwB/tpYveV5ImTk70rMzGlqEtHQWTRcCQKIDVoj0q4TAYPSr9LRq0hfosWMtbnlenwgT+VjpN4O/HMhtJI2ADsFr44RwGbQMD14O1PvuQRkOfr3EaZ55X1sSRwN+PZTaSRsAGoDV6VBBijdNC9yDFkcAfRy8d1kYOG4DW2vWoELT1n9OnV/ewRnPa9u4tfE5j0XuQNpvAP7V9SoefONxyX1HLbCSNgA3Aas09KtFZgRJBW7+ZnAwmmkRnB4dB2969fZHT2A+FWjeTwB/20jXPEh2wgcKW2Ugai78DWO3UKWaJAhFFXmw8TleXr/ZNmY1eabf4OwEbgNWowwas0qrOWNiD1Is6Y+hPBGwRBGwAgI2gBwm91i5gI4cNAIAO9VOhVmQLZT0AAAAyjoAN6GfuwQSD5tSIdtsBAKkgYAP6WbhmaLT4aTgb9J57gv0AgNSRwwb0s+iaodKN9dZYMxQAMoOADehnzRXrw8CNNUMBIFMo6wEgGAYdiGRI1OsEawCQgnZlPchhQ7xIYs+fdmuG8loBQGYQsCFeJLFny3oBdL1+Y85avX49p42gDQAygxw2xIsk9mwJA+h2a4J+6EM3LvDenNO2dy9rhgJIVXWpqsrFiuYvz2ts25imtk+pfHM57WYljhw2xC8aEIT6IYk9i+tvRl+L8DWI3j52TPriF7PVZgBo6Mf1W1lLNIKALQH9mMR+6tTavVkzM+n0VvVrAA0g16pLVQ0fG1Z1ubpqX3morMUji4VcxzXTkw7MbJ+ZPWtmC2Z2tMX+3zezpxuXC2b2UzPb2tj3fTP7dmMfUVgW9GsSe3Q4ODzfLAwHR4c5QwRrADKucrGiutdb7qt7XZULlYRblK7UAzYzu0nSpyS9U9Ltkt5rZrdHj3H3f+XuO919p6QPS/oLd78SOeTNjf2rIlIkrDlI6ack9jAwCs93YGB1flga+jWABpBr85fnVVuptdxXW6lp4cpCV49XXapq+vy07j9zv6bPT6u6tLrnLstSD9gk3SFpwd2/5+7Lkj4n6a41jn+vpM8m0jJ07/Tp1knsYRCTl1minZQnaXWMWZAXFpWFYK0fA2gAuTa2bUylwVLLfaXBkka3jnb8WHOX5jR8bFiHHj+kh77+kA49fkjDx4Y1d2kurub2XBYCtmFJP4jcfr6xbRUze7mkfZK+ENnskr5iZufM7N6etRKdmZwMcrWiQUoYtM3M5GeWaCflSVodU69Lu3ff+FhpBkZFCaAB9J2p7VMasNZhyoANaGrHVEePU12qauLkhKrL1Ws9drWVmqrLwfary1dja3MvZSFga9X10O7T7V2S/lPTcOid7r5LwZDqb5vZP2z5S8zuNbOzZnb2pZde2lyL0Z5ZkFjf3KPUbntWdZKP1nxMGKw9/bS0c6f005+m35tVlAA6LRSCBlJTvrms2f2zKg+Vr/W0lQZLKg8F2zudcFCUXLgs1GF7XtLrI7dfJ2mxzbHvUdNwqLsvNn6+aGanFAyxfq35ju7+iKRHpGCW6OabjULrdI3NVsfs3CmdOxfksKVd0ywMlDvdjhutV8eu1czfLJZ3AXJqfGRci0cWVblQ0cKVBY1uHdXUjqmuZofGnQuXliz0sH1D0piZvcHMhhQEZY81H2RmPyNpr6QvRraVzKwcXpf0dkkXEmk1iq+T2ZWtjgmDteh+erPyaSMzf7O02gc9hCiALUNbdGDXAT341gd1YNeBrkt5xJkLl6bUAzZ3/4mkD0h6QtIzkj7v7hfN7D4zuy9y6N2SvuLu0TD51ZLmzOybkv6zpC+7++NJtR0F18nsylbHHD68eiJCnoaDcd1GZv5mqbxLjMFj3mfYoX/FlQuXOnfvu8vu3bsdWFO97n7wYDAX9ODB1rc7OQbFUK+H84KDy3qvbfRvIbyk8TcR09/oU8895eWPl730sZLrAXnpYyUvf7zsTz331Pq/f2Zm9e9ptx3okQ3/DadA0llvEbukHjylcSFgw7pmZlZ/qEU/7GZmOjsG+bfR4KvbIK9XNhk8/vj/+7GXP152PaBVl/LHy15dqra/M+8RZEh1qerT56b96JmjPn1ueu2/3RQRsBGwoRud9AzQe1B8G+2hykoPW7Q9GwweP33u09d6JZovpY+VfPrc9Nq/l15ooCvtArbUc9iATOqkPElRSpigvY3UsfOMFSsO2xPVRTs2NcMuq6t/ADlEwAb
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 36,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"liczba przykładów: {0: 69, 1: 30}\n",
|
|||
|
"prior probability: {0: 0.696969696969697, 1: 0.30303030303030304}\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"classes = [0, 1]\n",
|
|||
|
"count = [sum(1 if y == c else 0 for y in Ybn.T.tolist()[0]) for c in classes]\n",
|
|||
|
"prior_prob = [float(count[c]) / float(Ybn.shape[0]) for c in classes]\n",
|
|||
|
"\n",
|
|||
|
"print('liczba przykładów: ', {c: count[c] for c in classes})\n",
|
|||
|
"print('prior probability:', {c: prior_prob[c] for c in classes})"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 37,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"średnia: [matrix([[1. , 0.03949835, 0.02825019]]), matrix([[1. , 0.09929617, 0.06206227]])]\n",
|
|||
|
"odchylenie standardowe: [matrix([[0. , 0.52318432, 0.60106092]]), matrix([[0. , 0.61370281, 0.6081128 ]])]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"XY = np.column_stack((Xbn, Ybn))\n",
|
|||
|
"XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]\n",
|
|||
|
"X_split = [XY_split[c][:,0:3] for c in classes]\n",
|
|||
|
"Y_split = [XY_split[c][:,3] for c in classes]\n",
|
|||
|
"\n",
|
|||
|
"X_mean = [np.mean(X_split[c], axis=0) for c in classes]\n",
|
|||
|
"X_std = [np.std(X_split[c], axis=0) for c in classes]\n",
|
|||
|
"print('średnia: ', X_mean) \n",
|
|||
|
"print('odchylenie standardowe: ', X_std)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 38,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-10-793ac8294852>:11: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFmCAYAAADQ5sbeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydd1gUVxfG3116RxFREVvEisbeCxoTezexJ+bTRE1M1GiixkSxIPZuYhSVaFSwt9iwd41dERUQEASk97JlzvfHlaUIirC7swv39zz3GXbn3pkzs8PMO+eee66EiMDhcDgcDofD0V2kYhvA4XA4HA6Hw3k3XLBxOBwOh8Ph6DhcsHE4HA6Hw+HoOFywcTgcDofD4eg4XLBxOBwOh8Ph6DhcsHE4HA6Hw+HoOIZiGyAGFSpUoBo1aohtBofD4XA4HE4e7ty5E0tE9vm/L5OCrUaNGrh9+7bYZnA4HA6Hw+HkQSKRhBb0Pe8S5XA4HA6Hw9FxuGDjcDgcDofD0XG4YONwOBwOh8PRcbhg43A4HA6Hw9FxuGDjcDgcDofD0XG4YONwOBwOh8PRcbhg43A4HA6Hw9FxuGDjcDgcDofD0XG4YONwOBwOh8PRcUQXbBKJZKtEIomWSCSPC1kvkUgkayUSSaBEInkokUia5VrXQyKRPHuzbqb2rOZwOKUCIuDgQbYsyvccDocjEqILNgBeAHq8Y31PAM5vyrcA/gQAiURiAGDDm/UNAAyXSCQNNGoph1OWKAti5tAhYNAgYOrUnOMhYp8HDWLrP5SycN44HI7WEV2wEdElAPHvqNIfwHZi3ABgK5FIKgNoBSCQiF4QkQyA95u6HA5HHWhCzOgaAwYAkycDa9bkHOfUqezz5Mls/YdSFs4bh8PROvow+bsjgLBcn8PffFfQ9621aBeHU7rJLWYAYNWqkosZXUMiYccFsOPKPtbJk9n3EsmHb7MsnDcOh6N19EGwFXTHpHd8X/BGJJJvwbpUUa1aNfVYxuGUZjQhZnSR7OPMPj7g3cfn6sqWFy68e3tAgefN9e8urPmYQtpzOBxOAYjeJVoEwgE45fpcFUDEO74vECLaREQtiKiFvb29RgzlcEoducVHNqVJrAE53ZW5yd2dmZ+qVVl5F+84b1Wtq6Kq9XvaczgcTj70QbAdAfDlm9GibQAkEVEkgP8AOEskkpoSicQYwLA3dTkccYmKYl6V1q2BmzdVX0emRGLL3S0Y5DMIow+OFtHAD+BDxYy+kT9mTRDejmnLzz//sFKU7ebmzfb+GfQP/hn0nvYccTh5EujUCdi4EYiN1cguzr44i2Z/NcOc83NwM/wmBBI0sh9O6UN0wSaRSHYDuA6grkQiCZdIJGMlEskEiUQy4U2V4wBeAAgEsBnAdwBARAoAkwCcAuAPYA8R+Wn9ADgcAMjMBHx8gJ49AUdHYMoUkFyGx68fYeGlhWixqQWqrKyCcUfH4XbEbThYOIht8fspjpjRNw4dyjm+bM/hqlU5x1ncUaKl/byVVrKymFCbOBGoXBno35+N7JXJ1LobC2MLuF92R5stbVBpeSV8ffhrHH56GBnyDLXupzSQkpUCz7uemOE7A553PZGSlSK2SaIhoTJ482jRogXdvn1bbDM4+g4R86B5eQHe3kBSEpTVquL6l11xqIEUh2IuIyghCADQpmob9KvTD33q9IFLRRdI9KFL8eBBNqoxt5jJLUYOHAAGDhTbypJBxETZgAF5u3kL+x4Apkxhy9WrC97me87blPW9gY9qY3WPQtpzxIUIePgQ2LmTeVIjIwE7O2DECGDMGKBpU7WEBMSlx+FU0Ckce34MxwOOIykrCeZG5uhRuwcG1B2A3nV6o7xZ+ZIfjx5z5eUV9NrZCwIJSJOnwcLIAlKJFMdHHkeHah3ENk9jSCSSO0TU4q3vuWDjcD6QmBhg+3bA0xN4+hQKCzNcHNke+5ub40DyTbxOew1jA2N8UvMTDKg3AH3r9EVlq8piW/3hFEfMlAXeN+jgPefNNYkNQuCDDvQAhQLw9WUvZYcOMU9b48bAuHHAqFFAuXJq2Y1cKcfF0Is46H8Qh54dQkRKBAylhuhasysG1x+MAfUGoKJFRbXsS19IyUqB40pHpMje9qhZGVshYloELI0tRbBM83DBlgsu2DgfjFIJnDnDRNrhw1Ao5TjXuwH2dLLDIaUf4jLjYW5kjl7OvTC4/mD0cu4FaxNrsa3maIL3Cbb3Nfdi7blg0zMSEpgnfcsW4M4dwNQUGDKEibdOndT28iKQgDsRd7Dffz/2PdmHoIQgSCVSdK7eGYPrD8aQBkPgYKkHIRUlxPOuJ6acnII0edpb6yyMLLCmxxqMbTZWBMs0T2GCTR/SenA44hEdDWzdCmzcCOFlKK64WMN7en3stQpDrOwJrORW6Fu3LwbXH4wetXvA3MhcbIs5HI4mKFeOxbZNnAjcu8de3rIHoNSpA0yYwLpMS+h1k0qkaOnYEi0dW8LjEw88fP0Q+57sw37//Zh0YhJ+PPkjutbsimENh2FQ/UEoZ/b+/aVkpcDHzwcBcQFwtnPG0IZDYWViVSI7NU1AXECBYg0A0uRpCIwP1LJF4sM9bBxOfoiAGzeADRuAvXvxoJwM//R2gnetNIQr4mFmaIZ+dfthmMsw9KjdA6aGpmJbzNEm+uxh493c6iU9Hdi7F/jrL+D6dcDMjMW6ff89i3VTM37RfvB+7I3dj3cjKCEIRlIj9HTuiREuI9Cvbj+YGZm91UZf48C4h413iQLggo1TCFlZbKTnmjUID7yLXS2M8U87KzwyjIOh1BA9avfAcJfh6Fe3X6mNneAUgW+/ZctNm4rX/Chrv6lv8dqXiLIwkEQs7t0D/viDDVbIyADatmXnefBgwFC9nVlEhDuRd7D70W54+3kjIiUC1ibWGFJ/CL78+Et0rN4RUolUr+PA9Nn2ksIFWy64YOPkIToa2LgR6Zs24IBdNLzaW+CcQzoIhLZV22JU41H4ouEXqGBeQWxLOZySkT/lSP5ps0pbUmQxSEgA/v6beegDAwEnJ2DSJOCbb9Q2SCE3SkGJCyEXsOPhDuz3349UWSqq2VTDqEajYG5kDo8rHmrzUmm7a1VfvYMlhQu2XHDBxgEA+PuDVizHzXM7sNVFDu8mBkgxUKJWuVr4svGXGNl4JGqXry22lUWHd3dxikJu0ZZNGRdrGhEiggD8+y9L/3LuHGBuDnz9NTv3H32kHsPzkSZLw+Fnh7Hj4Q6cDjr93qS8M9vPhEc3jyJtWyzxlCpLhc9jHwTGB6J2+doY6jK01HrWsuGCLRdcsJVxrl1D7PL52B51Cp7NJfCvQDA3MMPnLl/g6yZfq7oT9A7e3aUd9LlLNBsiQJrrGheEMivWtCJEHjxgwm3XLpYqZMgQYMYMoFkz9Wy/ACJSIvDD8R9w8OlBUAHTbH+Ih60sd0+KQWGCTQ+fShxOMRAE0NGjuNi3MUasbA9Hl1OY1h2wadgMnn09EfXza3gN8ELnGp31U6wBzIOWP6N+7u6uAQPEtrB08Pw5K8VtHvccz+OK377ElPbpxj6AlKwU9NrZCymyFFW3YZo8DSky9n2qLFU9O/r4Y2DbNiAkBPj5ZzYFVvPmwKefsnRBGjj3VayqwGuAFyyMLApcrxAU6FG7R5G25ePnU6i3TiABPo99im0np+jo6ZOJwykigoBEby+sGeKE+qf7wbXFIxxvZIrxLSbg4YSHuD7hNsY2G6vzQ9yLRP5plaRSHpvEyQufNisPWhcilSsDixcDL1+y5ePHTLS1bg0cO6b2829lYoUTo07AythKJdyMpcaQSqTIUmahzvo6GHdkHO5G3n3ndniKDd2ACzZO6USpxKNtizHhKzs4PvoaUz6OQDnHj+DVxxMRv8Zhbb8/0cihkdhWqp9s0ZYbLtY42Whi7lQ9RjQhYmPDukSDg1lKkNhYoG9f5nU7dIgJaTXRoVoHREyLwJoeazCz/Uz80fsPJM1Mwn/f/IcRLiOw+/FuNN/UHO23tsfuR7shU749b6qznXOhnjoLIwv9ivXVY7hg45QqFLJM7Pv
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
|
|||
|
"draw_means(fig, X_mean, xmin=-1.0, xmax=1.0, ymin=-1.0, ymax=1.0)\n",
|
|||
|
"plot_prob(fig, X_mean, X_std, classes, xmin=-1.0, xmax=1.0, ymin=-1.0, ymax=1.0)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 39,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFmCAYAAAC4FUTmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXxT15k38N8jW5JtWWbfl7CGhBDigNu0WaCTNBkCDWkSWqbbdDq0mUmbsKUFOuk2bbM0nRdC2mmnKW3nnTbT0jdAlkKzDglk2iaBFBIIIRiCWcxiFmN50WLpvH9cXUuWtdpXPpLv7/v5+BNbkq8eKzb3p3POfY4opUBERERE+ePQXQARERFRX8fARURERJRnDFxEREREecbARURERJRnDFxEREREecbARURERJRnpboL6I7BgwercePG6S6DyNYUgMN+P86FQri4ogLekhLdJRWdsFJ4u6UFThFcXFEBp4jukoioh3bu3HlGKTUk8faiDFzjxo3Djh07dJdBZFuhSASf3rcPbzY04KEJE7By7FjdJRWtbY2NmPf222h1ufBydTVGud26SyKiHhCRumS3c0qRiHLSHongs/v24YmGBvyfiRMZtnpoVv/+eH76dJwKBvE3u3ahPhDQXRIR5QEDFxFlLaIUFu3fj983NOCHEyZg+ZgxukvqEz7crx+enT4dJ4JB3Lh7NxqCQd0lEZHFGLiIKCtKKSyprcV/nTqF744bh69yZMtSV/frh2emTcMhvx9z3noLTe3tuksiIgsxcBFRVr5XV4cfHz+Oe0ePxjcuukh3OX3SRwYMwIbLLsPu5mZ8fM8eBCIR3SURkUUYuIgoo1+cOIFvHz6Mvx82DD+cOBHCq+nyZu6gQfjVJZdga2Mj/uHddxFRSndJRGSBorxKkYh6z/PnzuGf9u/HTQMGYN2UKQxbveBzw4ejPhjEqkOHMK6sDA9OmKC7JCLqIQYuIkppb0sLPrF3Ly7zePDEZZfB6eCgeG9ZMWYMDvv9eOjIEUwqL8eiESN0l0REPcB/PYkoqTPBIG55+21UlJTgD5dfDm8p35/1JhHBjyZNwk0DBuCu997DtsZG3SURUQ8wcBFRF+2RCD75zjuoDwTw5LRpGFNWprskWyp1OLB+6lSMLyvDHXv3os7v110SEXUTAxcRdbHi0CFsbWzEY1Om4KqqKt3l2Fp/pxNPX345gpEIbt+zB23hsO6SiKgbGLiIqJP1p09jzbFjuHvUKPz98OG6yyEAUyoq8JtLL8Wbzc24+8AB3eUQUTcwcBFRh/2trfji/v24uqoKqydO1F0Oxbll8GDcN3YsfnnyJH514oTucogoRwxcRAQAaAuHsWDvXpRF1w3xisTC86/jx+P6/v3x5QMHsKe5WXc5RJQD/otKRACAZbW12NPSgl9fcglGc5F8QSoRwX9PnYp+JSVY+M47aOV6LqKiwcBFRNjY0ICfnTiBr40ZgzmDBukuh9IY5nLh15deindaW7G8tlZ3OUSUJQYuIpurDwTwpf37UeP14vvjx+suh7Jw48CB+NqYMfjZiRN4+swZ3eUQURYYuIhsTCmFL7z7LvyRCH5z6aVwcd1W0fje+PGorqzEov37cToY1F0OEWXAf12JbOyn9fV4/vx5/NvEiZhSUaG7HMqB2+HA45deCl97O+7cvx+Km1wTFTQGLiKbOtTWhq8dPIibBgzAP48cqbsc6oapHg/unzABT509i9+cOqW7HCJKg4GLyIYiSuGL+/ejVATrpkyBiOguibpp6ejRuLqqCktqa3EiENBdDhGlwMBFZEM/P3ECWxsb8W8TJ3KfxCJXIoJfXnIJ2iIRfIVd6IkKFgMXkc3UBwJYcfAgru/fH18cMUJ3OWSBKRUV+M64cdh05gw2NjToLoeIkmDgIrKZew4cQFAp/OziizmV2IcsHz0a1ZWVuPvAATS1t+suh4gSMHAR2cjms2ex8cwZfOuiizCJVyX2KU6HA49dfDFOBoP4xvvv6y6HiBIwcBHZRGs4jLsPHMDUigrcO2aM7nIoDz5QVYUvjxyJfz9+HDt9Pt3lEFEcBi4im3joyBEc9vvxk4svZoPTPuz748djsNOJr7z3HiLszUVUMPivLpENHGxrw8NHjuDTQ4didv/+usuhPOrvdOLhiRPxms+H/3vypO5yiCiKgYvIBu6trYXT4cDDEyfqLoV6weeGDcPVVVVYdegQF9ATFQgGLqI+7qXz5/HU2bO4b+xYjHK7dZdDvcAhgrWTJqEhFML9dXW6yyEiMHAR9WlhpbCsthbjy8qwdPRo3eVQL6qpqsLnhw/HI8eO4WBbm+5yiGyPgYuoD/vViRN4u6UFP5gwAWUlJbrLoV52//jxKBXBqkOHdJdCZHsMXER9VEs4jG8ePowPV1VhwZAhusshDUa63fjamDF4oqEBf75wQXc5RLamPXCJSJmIvC4iu0Vkr4j8q+6aiPqCNUeP4mQwiH+bOJEd5W3sq2PGYLjLhRWHDkGxTQSRNtoDF4AAgOuVUlcAqAYwR0Q+pLkmoqJ2JhjEw0eP4uODB+Pqfv10l0MaVZaW4tsXXYRXL1zA5rNndZdDZFvaA5cyNEe/dEY/+DaMqAceOHIELeEw7h8/XncpVAAWjRiByeXl+Jf332czVCJNtAcuABCREhHZBeA0gBeUUq/promoWB0PBPCT48fx98OHY6rHo7scKgBOhwPfHTcOb7e0YP3p07rLIbKlgghcSqmwUqoawGgAHxSRaYmPEZE7RWSHiOxoaGjo/SKJisT9dXWIAPjWRRfpLoUKyCeHDsXlHg++c/gw2iMR3eUQ2U5BBC6TUqoRwMsA5iS57zGlVI1SqmYIr7giSqrO78e6EyewaMQIjC8v110OFRCHCL47bhzea2vD4xzlIup12gOXiAwRkf7Rz8sBfBTAu3qrIipOD9TVQQD8y9ixukuhAnTr4MG4srIS3+MoF1Gv0x64AIwAsFVE3gLwBow1XH/QXBNR0Tni9+NXJ09i0YgRGFNWprscKkAigm+PG4eDfj9HuYh6WanuApRSbwG4UncdRMXuB0eOAABWcXSL0pg/aBCu8HjwQF0dPjtsGErYo42oVxTCCBcR9dCJQAC/OHECnx8+HGM5ukVpiAjuu+givNfWhg28AImo1zBwEfUBa44dQ0gprBwzRncpVARuHzIEl1RU4IG6OnafJ+olDFxERa4xFMJP6+uxcOhQTKqo0F0OFYESEawcMwa7W1rw7LlzusshsgUGLqIi99P6ejSHw1jB0S3KwaeHDcNot7tj7R8R5RcDF1ER84fDWHvsGG4aMADVXq/ucqiIuBwOLBs9Gq9cuIDXm5p0l0PU5zFwERWx/z59GqdCIXyNo1vUDV8aMQL9Skrwf44e1V0KUZ/HwEVUpJRSWH30KK7weHDDgAG6y6Ei5C0txZ0jR2JDQwPq/H7d5RD1aQxcREXqhfPnsbe1FcvGjIGwlxJ10z2jRgEAHj12THMlRH0bAxdRkVp77BiGOZ34u6FDdZdCRWxMWRk+MXQofnHiBJrb23WXQ9RnMXARFaEDra3Ycu4c7ho1Cm4H/4ypZ5aMGoUL4TD+69Qp3aUQ9Vn8l5qoCP34+HE4RfBPI0boLoX6gKuqqlDj9eLHx4+zESpRnjBwERWZ5vZ2/OfJk/jEkCEY7nbrLof6ABHBPaNGYV9rK/6nsVF3OUR9EgMXUZF5/PRpNIXDuDu62JnICp8cMgSDSkvxk+PHdZdC1CcxcBEVEaUUfnr8OK7wePChqird5VAfUlZSgkUjRuCpM2dwPBDQXQ5Rn8PARVREXmtqwu6WFvzzyJFsBUGW+6eRIxEGsO7ECd2lEPU5DFxEReRnJ06gsqQEnxk2THcp1AdNKC/HTQMG4BcnTiDMxfNElmLgIioSF9rbsf70aXxq6FB4S0t1l0N91JdGjMDRQADPnTunuxSiPoWBi6hI/PbUKbRFIvgSW0FQHs0fPBhDnU78nNOKRJZi4CIqEutOnMB0jwc1Xq/uUqgPczkc+Pzw4XjmzBmcCgZ1l0PUZzBwERWBt5ubsbO5GYt
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
|
|||
|
"plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=-4.0, xmax=4.0, ymin=-4.0, ymax=4.0)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 40,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"theta = [[-0.31582268]\n",
|
|||
|
" [ 0.43496774]\n",
|
|||
|
" [-0.21840373]\n",
|
|||
|
" [-7.88802319]\n",
|
|||
|
" [22.73897346]\n",
|
|||
|
" [-4.43682364]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Uruchomienie metody gradientu prostego dla regresji logistycznej\n",
|
|||
|
"theta_start = np.matrix(np.zeros(Xbnp.shape[1])).reshape(Xbnp.shape[1], 1)\n",
|
|||
|
"theta, errors = GD(h, J, dJ, theta_start, Xbnp, Ybn, \n",
|
|||
|
" alpha=0.05, eps=10**-7, maxSteps=100000)\n",
|
|||
|
"print(r'theta = {}'.format(theta))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 41,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFmCAYAAAC4FUTmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzde5xT5Z0/8M+TTDKX5ASYOxcRFeoFlMsMqC21tdqqsAqCLSrebf1t2xWE7qr9dbfb32613XYLYtu9WLrrBazUctOC9VbrtQIzFAS8gRRRYDLDMJCTTCbX5/fHyUkymSSTzCRzkszn/XrlJbmdPHMGZj4+z/d8HyGlBBERERHlj8noARARERGVOgYuIiIiojxj4CIiIiLKMwYuIiIiojxj4CIiIiLKMwYuIiIiojwrM3oAA1FbWysnTJhg9DCIiIpG0BWEd78XVZ+pglkxGz0copLV2tp6XEpZl/h4UQauCRMmoKWlxehhEBEVjb0L9uJU1ylc/M7FMJVzcYMoX4QQHyd7nP/qiIhKnP+4H53PdqLxtkaGLSKD8F8eEVGJ6/hdB2RQon5xvdFDIRq2GLiIiEqc83Enqs6tgn2q3eihEA1bDFxERCXMvccN159dGP310RBCGD0comGLgYuIqIQ5n3BClAk03NJg9FCIhjUGLiKiEiXDEu1PtWPUFaNgrbUaPRyiYY2Bi4ioRHW93AXfJz40LObsFpHRGLiIiErUsUeOwVJrQd2CPj0YiWiIMXAREZWgoCuI488eR/0N9ey9RVQA+K+QiKgEdazvgPRJ1N/A3ltEhYCBi4ioBB1bfQyVn6mE4yKH0UMhIjBwERGVnO4PuuF6y4XR32DvLaJCwcBFRFRinE86AQFenUhUQBi4iIhKiAxLONc6MfLSkSgfXW70cIgogoGLiKiEnHzlJHo+6sHoO0YbPRQiisPARURUQtoebUPZyDLULqw1eihEFIeBi4ioRIS6Qzi+6ThqF9bCXGE2ejhEFIeBi4ioRBzfeBwhdwgNN7FYnqjQMHAREZWIo48cReXESoy8ZKTRQyGiBAxcREQlwHvIi1OvnULjbY0QJvbeIio0DFxERCWg/TftAID6G7mVD1EhYuAiIipyUkq0PdqGEbNHoPKMSqOHQ0RJMHARERW5U6+fgvdDL0Z/g723iAoVAxcRUZFzrnXCVGVC3cI6o4dCRCkwcBERFbGwL4yOpztQO78WZht7bxEVKgYuIqIi1rGhA8GuIBpvazR6KESUBgMXEVERa/vfNlRMqMCoy0YZPRQiSoOBi4ioSPnafOh6uQv1i+vZe4uowDFwEREVqfYn24Ew0LCYW/kQFToGLiKiIiSlxLHVx+C4yAHbuTajh0NE/WDgIiIqQmqLiu73utF4B4vliYoBAxcRURFyrnVCWAXqvsreW0TFwPDAJYSoEEJsF0LsFkLsE0L8P6PHRERUyML+MNqfbEfN39TAMtJi9HCIKANlRg8AgA/Al6SUbiGEBcAbQojnpJRvGz0wIqJC1PlsJwIdAYz+OrfyISoWhgcuKaUE4I7ctURu0rgREREVtrYn2mAdbUX1V6qNHgoRZcjwJUUAEEKYhRC7ALQDeFFKuc3oMRERFaLAiQBOPHcC9dfXQ5jZe4uoWBRE4JJShqSU0wCMAzBLCDEl8TVCiLuEEC1CiJaOjo6hHyQRUQFwrnVC+iUabmHvLaJiUhCBSyelPAngTwCuTPLcI1LKZillc10dr8ohouGp7X/aYG+yQ5mmGD0UIsqC4YFLCFEnhBgZ+XMlgMsBvG/sqIiICo9nnwfuXW403sLeW0TFxvCieQCjATwmhDBDC4C/lVL+3uAxEREVHOcaJ2AG6hfVGz0UIsqS4YFLSvkOgOlGj4OIqJCFg2G0PdqGmqtqYG2wGj0cIsqS4UuKRETUv64XuuBv86PxTi4nEhUjBi4ioiLgXOtE2agy1MypMXooRDQADFxERAUu6Ari+MbjqF9UD5OVP7aJihH/5RIRFbj2p9oR9obReDuXE4mKFQMXEVGBa3usDVWTq6DMZO8tomLFwEVEVMC8f/XC9ZYLDYsbIAS38iEqVgxcREQFrO2xNkAADYu5lQ9RMWPgIiIqUDIs0fZoG0ZdPgoV4yuMHg4RDQIDFxFRgTr1+in4Pvah8TYWyxMVOwYuIqIC5VzjhMlmQu28WqOHQkSDxMBFRFSAQp4Q2te1o+66OphtZqOHQ0SDxMBFRFSAOjZ0IKSGMPqO0UYPhYhygIGLiKgAOdc4UTGhAiM+P8LooRBRDjBwEREVGN9RH7pe6kL94nr23iIqEQxcREQFpu3xNiAMNN7CqxOJSgUDFxFRAZFSwvmYEyNmj0DVZ6qMHg4R5QgDFxFRAXHvdKP7/W403MzO8kSlhIGLiKiAtD3eBmEVqPtqndFDIaIcYuAiIioQYV8YzjVO1M6rhWWUxejhEFEOMXARERWIzuc6ETwRROPtLJYnKjUMXEREBcK5xglLvQWjvjzK6KEQUY4xcBERFYBAZwCdz3ai/oZ6mMr4o5mo1PBfNRFRAXA+6YT0S4y+nVv5EJUiBi4iogLgfMIJ21Qb7FPtRg+FiPKAgYuIyGDdH3ZD3aGi4Sb23iIqVQxcREQGa3u0DTABDTcycBGVKgYuIiIDyZBE2+NtqL6yGuVjyo0eDhHlCQMXEZGBTr56Ev4jfm5UTVTiGLiIiAzkfMIJs2JGzdU1Rg+FiPKIgYuIyCBBdxDtT7ej7mt1MFeZjR4OEeWR4YFLCHGaEOIVIcR7Qoh9QoilRo+JiGgoHN9wHGFPGI23cTmRqNSVGT0AAEEA35FS7hRCKABahRAvSinfNXpgRET55FzjRMUZFRjxuRFGD4WI8szwGS4p5TEp5c7In1UA7wEYa+yoiIjyy3fEh66Xu9CwuAFCCKOHQ0R5ZnjgiieEmABgOoBtxo6EiCi/2p5oA8JAw63svUU0HBRM4BJC2AGsB3CPlNKV5Pm7hBAtQoiWjo6OoR8gEVGOSCnhfNwJx+ccqJpYZfRwiGgIFETgEkJYoIWttVLKDcleI6V8RErZLKVsrqurG9oBEhHlkHunG93vdaPxZhbLEw0XhgcuoRUv/BrAe1LKFUaPh4go39qeaIOwCtR9lf/zSDRcGB64AHwOwM0AviSE2BW5zTF6UERE+RD2h9G+th2182phqbYYPRwiGiKGt4WQUr4BgJfoENGwcOL5EwgcD6DhFhbLEw0nhTDDRUQ0bDjXOGGptaD6imqjh0JEQ4iBi4hoiAS6Aji++Tjqr6+HycIfv0TDCf/FExENkY6nOyB9kr23iIYhBi4ioiHifMKJqnOroDQpRg+FiIYYAxcR0RDw/tWLU2+cQsNN3MqHaDhi4CIiGgLOJ5wAgIbFXE4kGo4YuIiI8kxKCecTToy8dCQqTq8wejhEZAAGLiKiPHNtc8F7wIuGmzm7RTRcMXAREeWZ8wknTBUm1C3kVj5EwxUDFxFRHoV9YbQ/1Y7a+bUocxi+uQcRGYSBi4gojzqf60TwRJBb+RANcwxcRER55FzjhKXeglFfHmX0UIjIQAxcRER5EugKoPPZTm0rnzL+uCUazvgTgIgoTzp+2wHpl2i8pdHooRCRwVjBSUSUJ8412lY+9hl2o4dCRHkmpUTIE0r5PAMXEVEeeA9pW/mc8cAZ3MqHqAT52nxwt7rh2u6C620X1BYVIz4/IuXrGbiIiPKgfW07AKD+xnqDR0JEgxV0BaG2qtpthwrXWy74PvVpT5oA2/k21C2sw8hLRwKbkx+DgYuIKMeklGh7vA0jLhmBygmVRg+HiLIQDoTheccD1zZt1kptVeHZ6wHC2vPlp5fD8VkHHBc5oDQpsM+wo8weF6duTH5cBi4iohxTW1R4P/TitH84zeihEFEaMizhPeCFukPFqT+fgtqiwrPbg3CPlq4sdRbYZ9hRe20tRlw8AvYZdljrrAP6LAYuIqIcc651QlgFt/IhKjC+Iz5t1qolErB2qAi5tEJ3k80EpVnBmG+OgeNCB5QLFVScXpGzGkwGLiKiHAoHta18aubWwDLKYvRwiIa
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xbnp, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
|
|||
|
"plot_decision_boundary(fig, theta, Xbnp, xmin=-4.0, xmax=4.0)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 42,
|
|||
|
"metadata": {
|
|||
|
"scrolled": true,
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"<ipython-input-15-da039958d168>:8: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);\n",
|
|||
|
"<ipython-input-21-f44dd646c57d>:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFmCAYAAAC4FUTmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeXyU1b0/8M+ZPbNl30jYQdmECAFcEKxWq1DFBaVK1Xpd7rVVEbxFfrft7Wprvffi1ltvK22tSitWQKXiUi0CbixBQJElAVmTTPbMvj7n98czT2YymZnMJDN5Zibf9+uVF8ksz5wJSeYz53yf72GccxBCCCGEkPRRyD0AQgghhJBcR4GLEEIIISTNKHARQgghhKQZBS5CCCGEkDSjwEUIIYQQkmYUuAghhBBC0kwl9wAGoqSkhI8ZM0buYRAyrHEAJ9xudPh8OEevh0mplHtIWSfAOT53OKBmDOfo9VAzlrbH8lv9cNW7oD9HD6WJ/q8ISZe6uro2znlp5OVZGbjGjBmDPXv2yD0MQoYtnyDg1kOHsLe1FY+NG4dHRo2Se0hZa3tXFxZ9/jmcGg0+qKlBlVablsf54oYv0N3ZjQsPXAiFlhY3CEkXxtjJaJfTbx0hJCl+QcC3Dx3Cq62t+J/x4ylsDdL8ggK8O306LF4vvrZvHxo9npQ/hrfNi/bN7aj4TgWFLUJkQr95hJCECZzjriNH8EprK/5r3DisHDlS7iHlhAvz8/H29Olo8npxxf79aPV6U3r81ldbwf0cZcvKUnpcQkjiKHARQhLCOcfyhga8YLHgZ2PG4N9pZiulLsrPx+Zp03Dc7cZVBw7A6ven7NiWFyzQT9bDOMOYsmMSQpJDgYsQkpCfnzyJ35w9i4erq/HD0aPlHk5OurSwEBumTsV+ux3XffEFPIIw6GPaP7fD+okVlXdXgqWxKJ8QEh8FLkJIv/7Q1IQfnziB28vL8V/jx9MLdxotLC7GnyZNwtauLnzn8GEInA/qeJYXLWAqhvLby1M0QkLIQGTlWYqEkKHzbkcH/vXIEVxZWIi1555LYWsI3FZRgUavF6uPH8cYnQ6/GjduQMfhAkfLyy0o/EYhNCWaFI+SEJIMClyEkJgOOhy46eBBTDUY8OrUqVAraFJ8qKwaORIn3G48duoUJuTl4a7KyqSP0fl+JzynPRj364EFNkJI6tBfT0JIVG1eL675/HPolUr8/bzzYFLR+7OhxBjDMxMm4MrCQtx39Ci2d3UlfYym3zdBXaJG6Q19ejASQoYYBS5CSB9+QcDNX36JRo8Hr02bhpE6ndxDGpZUCgXWT5mCsTodbjx4ECfd7oTv67f60ba5DWW3lFHvLUIyAP0WEkL6WHX8OLZ2deH3556LuWaz3MMZ1grUarxx3nnwCgJu+OILuAKBhO7XuqEV3MNRdgv13iIkE1DgIoT0sr6lBU+cOYP7q6pwe0WF3MMhAM7V6/HS5MnYa7fj/vr6hO7TtLYJeefkwXwBBWZCMgEFLkJIjyNOJ+4+cgQXmc1YM3683MMhYa4pKcEPRo3CH5ub8aempri3dR5xwvqxFZX3UO8tQjIFBS5CCADAFQhgycGD0AXrhuiMxMzz07FjcVlBAb5bX48v7PaYt7P8xQIwoHwZ9d4iJFPQX1RCCABgRUMDvnA48OKkSaimIvmMpGQMf5kyBflKJZZ++SWcUeq5uMBhWWdBwdcKoK3UyjBKQkg0FLgIIdjY2orfNTXh+yNH4qriYrmHQ+Io12jw4uTJ+NLpxMqGhj7Xd23tgvuYG5X/knzfLkJI+lDgImSYa/R4cM+RI6g1mfCLsWPlHg5JwBVFRfj+yJH4XVMT3mhr63Vd8/PNUBWoUHJjiUyjI4REQ4GLkGGMc447Dx+GWxDw0uTJ0FDdVtb4+dixqDEacdeRI2jxegEAAWcAba+1oeTGEih1SplHSAgJR39dCRnGnm1sxLudnfjv8eNxrl4v93BIErQKBdZNngyb3497jxwB5xxtm9oQsAdQ/m0qlick01DgImSYOu5y4fvHjuHKwkL824gRcg+HDMAUgwGPjhuH19vb8ZLFgsbfNyJvQh4K5hfIPTRCSAQKXIQMQwLnuPvIEagYw9pzz6VeTVnsoepqXGQ24xc76tG9vRsV36kAU9D/JyGZhgIXIcPQc01N2NrVhf8eP572ScxySsbwx0mTcME/BABA2a20lQ8hmYgCFyHDTKPHg1XHjuGyggLcXUmtA3LBOXl5+Nb7Khw4D3jLGLshKiFEPhS4CBlmHqivh5dz/O6cc2gpMUd07+hG3nEfPr9Ri/vr62H1++UeEiEkAgUuQoaRN9vbsbGtDf85ejQm0FmJOcOyzgKFXoF77p2EZq8XP/zqK7mHRAiJQIGLkGHCGQjg/vp6TNHr8fDIkXIPh6SI4BHQ+rdWlFxXgjmVhfjuiBH437NnUWezyT00QkgYClyEDBOPnTqFE243fnvOOdTgNIe0bmyFv9OPiu9UAAB+MXYsStRqfO/oUQicyzw6QoiE/uoSMgwcc7nw+KlTuLWsDAsKqEdTLmn+UzN0Y3QovLwQAFCgVuPx8eOx02bDn5ubZR4dIURCgYuQYeDhhgaoFQo8Pn683EMhKeRp9qDz/U6ULSvr1XvrtvJyXGQ2Y/Xx41RAT0iGoMBFSI57v7MTr7e34wejRqFKq5V7OCSFWv7SAghA+bLeW/koGMNTEyag1efDoydPyjQ6Qkg4ClyE5LAA51jR0ICxOh0eqq6WezgkhTjnaFrbBPMFZhgmG/pcX2s2446KCjx55gyOuVwyjJAQEo4CFyE57E9NTfjc4cCvx42DTqmUezgkhWx7bHAecqLiXypi3ubRsWOhYgyrjx8fwpERQqKhwEVIjnIEAvjRiRO40GzGktJSuYdDUsyyzgKmYSi9Kfb/7QitFt8fORKvtrbik+7uIRwdISSS7IGLMaZjjO1ijO1njB1kjP1U7jERkgueOH0azV4v/nv8eOoon2MEr4CWv7Sg+JvFUBeo497230eORIVGg1XHj4NTmwhCZCN74ALgAXAZ53wGgBoAVzHGLpB5TIRktTavF4+fPo3rSkpwUX6+3MMhKda+uR2+Vh8q7+5/L0yjSoUfjx6ND7u78WZ7+xCMjhASjeyBi4uk3VbVwQ96G0bIIPzy1Ck4AgE8Onas3EMhadD8YjM0lRoUXVmU0O3vqqzExLw8/MdXX1EzVEJkInvgAgDGmJIxtg9AC4B/cM53yj0mQrLVWY8Hvz17FrdXVGCKoe/ZayS7+Tp86HirA2XfKgNTJrZUrFYo8LMxY/C5w4H1LS1pHiEhJJqMCFyc8wDnvAZANYA5jLFpkbdhjN3LGNvDGNvT2to69IMkJEs8evIkBAD/OXq03EMhaWBZZwH3cpTfXt7/jcPcXFaG8wwG/OTECfgFIU2jI4TEkhGBS8I57wLwAYCrolz3e855Lee8tpTOuCIkqpNuN9Y2NeGuykqMzcuTezgkDZr/2AzjLCNMNaak7qdgDD8bMwZHXS6so1kuQoac7IGLMVbKGCsIfp4H4OsADss7KkKy0y9PngQD8B+jRsk9FJIGjoMO2PfZUXF77N5b8SwuKcH5RiN+TrNchAw52QMXgEoAWxljBwDshljD9XeZx0RI1jnlduNPzc24q7ISI3U6uYdD0sDykgVQAmVLywZ0f8YYfjxmDI653TTLRcgQU8k9AM75AQDnyz0OQrLdr0+dAgCsptmtnCT4BTQ/34ziq4uhKdcM+DjXFhdjhsGAX548iW+Xl0NJPdoIGRKZMMNFCBmkJo8Hf2hqwh0VFRhFs1s5qfPdTnibvai4a2DLiRLGGH4wejSOulzYQCcgETJkKHARkgOeOHMGPs7xyMiRcg+FpIllnQWqQhWKFxYP+lg3lJZikl6PX548Sd3nCRkiFLgIyXJdPh+ebWzE0rIyTNDr5R4OSQO/1Y+2TW0oW1oGhWbwf7aVjOGRkSOx3+HA2x0dKRghIaQ/FLgIyXLPNjbCHghgFc1u5ayWl1sguARU3Dm45cRwt5aXo1qr7an9I4SkFwUuQrK
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
|
|||
|
"plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=-4.0, xmax=4.0, ymin=-4.0, ymax=4.0)\n",
|
|||
|
"plot_decision_boundary(fig, theta, Xbnp, xmin=-4.0, xmax=4.0)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Naiwny klasyfikator Bayesa nie działa, jeżeli dane nie różnią się ani średnią, ani odchyleniem standardowym."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## 8.2. Algorytm $k$ najbliższych sąsiadów"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### KNN – intuicja"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Do której kategorii powinien należeć punkt oznaczony gwiazdką?"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 43,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przydatne importy\n",
|
|||
|
"\n",
|
|||
|
"import ipywidgets as widgets\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import pandas\n",
|
|||
|
"\n",
|
|||
|
"%matplotlib inline"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 44,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wczytanie danych (gatunki kosaćców)\n",
|
|||
|
"\n",
|
|||
|
"data_iris = pandas.read_csv('iris.csv')\n",
|
|||
|
"data_iris_setosa = pandas.DataFrame()\n",
|
|||
|
"data_iris_setosa['dł. płatka'] = data_iris['pl'] # \"pl\" oznacza \"petal length\"\n",
|
|||
|
"data_iris_setosa['szer. płatka'] = data_iris['pw'] # \"pw\" oznacza \"petal width\"\n",
|
|||
|
"data_iris_setosa['Iris setosa?'] = data_iris['Gatunek'].apply(lambda x: 1 if x=='Iris-setosa' else 0)\n",
|
|||
|
"\n",
|
|||
|
"m, n_plus_1 = data_iris_setosa.values.shape\n",
|
|||
|
"n = n_plus_1 - 1\n",
|
|||
|
"Xn = data_iris_setosa.values[:, 0:n].reshape(m, n)\n",
|
|||
|
"\n",
|
|||
|
"X = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
|
|||
|
"Y = np.matrix(data_iris_setosa.values[:, 2]).reshape(m, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 45,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wykres danych (wersja macierzowa)\n",
|
|||
|
"def plot_data_for_classification(X, Y, xlabel, ylabel): \n",
|
|||
|
" fig = plt.figure(figsize=(16*.6, 9*.6))\n",
|
|||
|
" ax = fig.add_subplot(111)\n",
|
|||
|
" fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
|
|||
|
" X = X.tolist()\n",
|
|||
|
" Y = Y.tolist()\n",
|
|||
|
" X1n = [x[1] for x, y in zip(X, Y) if y[0] == 0]\n",
|
|||
|
" X1p = [x[1] for x, y in zip(X, Y) if y[0] == 1]\n",
|
|||
|
" X2n = [x[2] for x, y in zip(X, Y) if y[0] == 0]\n",
|
|||
|
" X2p = [x[2] for x, y in zip(X, Y) if y[0] == 1]\n",
|
|||
|
" ax.scatter(X1n, X2n, c='r', marker='x', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1p, X2p, c='g', marker='o', s=50, label='Dane')\n",
|
|||
|
" \n",
|
|||
|
" ax.set_xlabel(xlabel)\n",
|
|||
|
" ax.set_ylabel(ylabel)\n",
|
|||
|
" ax.margins(.05, .05)\n",
|
|||
|
" return fig"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 46,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"def plot_new_example(fig, x, y):\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" ax.scatter([x], [y], c='k', marker='*', s=100, label='?')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 47,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFkCAYAAAD13eXtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3dfZRddX3v8c83mYAhGURJbIAQw0PEAtVJJiJWLOAzYGEyhQRqK7YsU1psR7QrEG/lqr01rNx7nQ6WYilayL2KITIEFmDxAdRwhUoSEiQ8BZSUNCCCApPISjLnfO8f+xzmzMw5Z++Zs5/OOe/XWnvN7Ifz29/9myzOl71/+/c1dxcAAACSMyXrAAAAAFodCRcAAEDCSLgAAAASRsIFAACQMBIuAACAhJFwAQAAJKwj6wAmatasWT5//vyswwAAABhl06ZNL7j77Gr7mi7hmj9/vjZu3Jh1GAAAAKOY2Y5a+3ikCAAAkDASLgAAgISRcAEAACSMhAsAACBhJFwAAAAJI+ECAABIGAkXAABAwki4AAAAEpZYwmVmR5rZPWb2qJltM7O+KsecZmYvm9mW0nJFUvEAAJAId+mWW4KfUbYndY404sCkJXmHa1jSZ9z9dyWdLOkSMzu+ynEb3L2rtHwxwXgAAIjf+vVSb6906aUjSY17sN7bG+xP4xxpxIFJS6y0j7s/K+nZ0u9DZvaopCMkPZLUOQEASF1Pj9TXJw0MBOv9/UGSMzAQbO/pSe8cSceBSTNP4Rajmc2X9GNJJ7r7KxXbT5N0s6SdknZJ+lt331avrcWLFzu1FAEAuVK+k1ROdqQgyenvl8zSO0cacaAmM9vk7our7ks64TKzmZJ+JOkf3H1wzL6DJRXdfbeZnSlpwN0XVGljuaTlkjRv3rzuHTtq1oYEACAb7tKUipE6xWL8SU6Uc6QRB6qql3Al+paimU1TcAfrG2OTLUly91fcfXfp9zslTTOzWVWOu9bdF7v74tmzZycZMgAAE1e+s1SpcixVWudIIw5MSpJvKZqkr0l61N2/XOOYOaXjZGYnleJ5MamYAACIXeVjvL6+4I5SeSxVXMlOlHOkEQcmLbFB85LeLelPJf3MzLaUtn1W0jxJcvevSjpX0l+a2bCkVyWd72kMKgMAIC7r148kOeWxUv39wb6BAenUU6UlS5I/R/n3JOPApKUyaD5ODJoHAOSKe5AQ9fSMHitVa3tS55CSjwN1ZTpoPm4kXAAAII8yGzQPAAAAEi4AAIDEkXABANLXLHX/ikXpssuCn1G2AzWQcAEA0tcsdf9WrpRWr5a6u0eSq2IxWF+9OtgPREDCBQBIX2VtwHLSlce6f6tWSV1d0pYtI0lXd3ew3tUV7AciSHIeLgAAqhs7R1S59l/e6v5NmSJt2jSSZE2dGmzv6gq2T+G+BaJhWggAQHaape5fsTiSbElSoUCyhXGYFgIAkD/NUvev/BixUuWYLiACEi4AQPqape7f2DFbhcL4MV1ABCRcAID01aoNWE668vSWYjnZKo/Z2rRpJOniLUVExBguAED60qg/GIdiMUiqVq0aP9as2na0NWopAgAAJIxB8wAAABki4QIApC+stE+xGF76J4420riWKOfJSxutJG/94e5NtXR3dzsAoMkNDgYpU1+fe7EYbCsWg3XJfcWK+vsHB+NpI41riXKevLTRSjLoD0kbvUb+knkCNdGFhAsAWkDlF1/5C7FyvVCov79YjKeNNK4lynny0kYryaA/SLgAAPlT+QVYXmrdjai2P6420riWZmqjlaTcH/USLt5SBABkx0NK+4Ttj6uNOMRxnry00UpS7A/eUgQA5I+HlPYJ2x9XG3GI4zx5aaOV5Kk/at36yuvCI0UAaAGM4cpnG62EMVwkXADQ9nhLMZ9ttBLeUiThAoC2VywGX3hj7zKUtxcK9feX73A12kYa1xL17lQe2mglGfRHvYSLQfMAAAAxYNA8AABAhki4AAAAEkbCBQBALR5DPb442mg3LdhnJFwAANSyfr3U21t9bq/e3mB/Gm20mxbss46sAwAAILd6eqS+PmlgIFjv7w++9AcGgu09Pem00W5asM94SxEAgHrKd1bKX/5S8KXf3x+9REwcbbSbJuyzem8pknABABDGqXGYiSbrM6aFAABgssp3WipR4zB5LdZnJFwAANRS+Virry+4w1IeWxT1yz+ONtpNC/YZg+YBAKhl/fqRL/3y2KH+/mDfwIB06qnSkiXJt9FuWrDPGMMFAEAt7sGXf0/P6LFDtbYn1Ua7adI+Y9A8AABAwhg0DwAAkCESLgAAgISRcAEAWlOUenxhxxSLjbdBvcXR2ulaK5BwAQBaU5R6fGHHrFzZeBvUWxytna61krs31dLd3e0AAIQqFt37+oJ7UH191dfDjikUGm+jWIwn1lbRwtcqaaPXyF8yT6AmupBwAQAiq/wyLy9jv9TDjomjjbhibRUteq31Ei6mhQAAtDaPUI8v7Jg42ogr1lbRgtfKtBAAgPbkEerxhR0TRxtxxdoq2ulay2rd+srrwiNFAEAkjOHKpxa+VjGGCwDQdgYHx3+JV365Dw6GH7NiReNtDA7GE2uraOFrrZdwMYYLANCaPEI9Pqn+MeecI916a2NtUG9xtBa+VmopAgAAJIxB8wAAABki4QIAAEhYYgmXmR1pZveY2aNmts3M+qocY2Z2lZk9aWYPmdmipOIBAMTEU6g/GKUNpC/s7xbX3yWt86QoyTtcw5I+4+6/K+lkSZeY2fFjjjlD0oLSslzSNQnGAwCIQxr1B6O0gfSlVQexFest1np9Me5F0q2SPjBm279IuqBi/XFJh9Vrh2khACBjacxdFaUNpC+tObSadK4uZT0Pl6T5kv5T0sFjtt8u6ZSK9R9IWlyvLRIuAMiBNOoPtmi9vaaX1t+lCf/+9RKuxKeFMLOZkn4k6R/cfXDMvjskrXL3e0vrP5C0wt03jTluuYJHjpo3b173jh07Eo0ZABCBp1B/MEobSF9af5cm+/tnNi2EmU2TdLOkb4xNtkp2SjqyYn2upF1jD3L3a919sbsvnj17djLBAgCiK4+nqRR3/cEobSB9af1dWu3vX+vWV6OLJJO0RtI/1jnmLEnfKR17sqSfhrXLI0UAyBhjuNoXY7jqUhZjuCSdIsklPSRpS2k5U9LFki72kaTsaklPSfqZQsZvOQkXAGQvjfqDUdpA+tKqg9ik9RbrJVyU9gEATIyH1MKLo/5glDZyPJanZYX97eP6u6R1nphRSxEAACBh1FIEAADIEAkXAABAwki4AADx8gh18IpF6bLLgp+Vam2f7HnaCf2RayRcAIB4RamDt3KltHq11N09klwVi8H66tXB/jjO007oj3yr9fpiXhemhQCAnIsyh1Kh4N7VFWzr6qq+Hsd52gn9kTkxLQQAIFXlOysDAyPb+vqk/v6R1/nLd7S2bBk5pqtL2rRpdDmXRs/TTuiPTDEtBAAgfR6hDl6xKE2dOrJeKERPtiZynnZCf2SGaSEAAOkq32mpNLYOXvkOV6XKMV1xnaed0B+5RcIFAIhX5WOtvr4ggerrC9bLX/6VjxO7uoI7W11dwXrUpCvKedoJ/ZFvtQZ35XVh0DwA5FyUOnjlWomVA+QrB86vWBHPedoJ/ZE5MWgeAJAaj1AHzz2Y+mHVqvHjjaptn+x52mnsEv2ROQbNAwAAJIxB8wAAABki4QIAjCgUpCVLgp+1trdSWZ6waykUGo8zjmtNq7/y8ndpRbUGd+V1YdA8ACSopycYYD1rlvvwcLBteDhYl4L9rTTgPexayv3RSJxxXGta/ZWXv0uTUp1B85knUBNdSLgAIEGVyVU56Rq73kplecKuZXi48TjjuNa0+isvf5cmRcIFAIiuMskqL5V3vNxHJyblJWqyVVb5ZV5esvhSD7uWOOLMSxt5Ok8Lqpdw8ZYiAGC8QkHq6BhZHx4eXYJHaq2yPGHXEkeceWkjT+dpMbylCACIrlCQ5swZvW3OnNED6VupLE/YtcQRZ17ayNN52k2tW195XXikCAAJYgwXY7jy8HdpUmIMFwA
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_new_example(fig, 2.8, 0.9)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Wydaje się sensownym przyjąć, że punkt oznaczony gwiazdką powinien być czerwony, ponieważ sąsiednie punkty są czerwone. Najbliższe czerwone punkty są położone bliżej niż najbliższe zielone."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Algorytm oparty na tej intuicji nazywamy algorytmem **$k$ najbliższych sąsiadów** (*$k$ nearest neighbors*, KNN)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Idea (KNN dla $k = 1$):\n",
|
|||
|
" 1. Dla nowego przykładu $x'$ znajdź najbliższy przykład $x$ ze zbioru uczącego.\n",
|
|||
|
" 1. Jego klasa $y$ to szukana klasa $y'$."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 48,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"from scipy.spatial import Voronoi, voronoi_plot_2d\n",
|
|||
|
"\n",
|
|||
|
"def plot_voronoi(fig, points):\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" vor = Voronoi(points)\n",
|
|||
|
" ax.scatter(vor.vertices[:, 0], vor.vertices[:, 1], s=1)\n",
|
|||
|
" \n",
|
|||
|
" for simplex in vor.ridge_vertices:\n",
|
|||
|
" simplex = np.asarray(simplex)\n",
|
|||
|
" if np.all(simplex >= 0):\n",
|
|||
|
" ax.plot(vor.vertices[simplex, 0], vor.vertices[simplex, 1],\n",
|
|||
|
" color='orange', linewidth=1)\n",
|
|||
|
" \n",
|
|||
|
" xmin, ymin = points.min(axis=0).tolist()[0]\n",
|
|||
|
" xmax, ymax = points.max(axis=0).tolist()[0]\n",
|
|||
|
" pad = 0.1\n",
|
|||
|
" ax.set_xlim(xmin - pad, xmax + pad)\n",
|
|||
|
" ax.set_ylim(ymin - pad, ymax + pad)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 49,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAl8AAAFkCAYAAAAe6l7uAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydd3hUZfbHvzeNNFoghRJ6rxFCDxAUFFABWbFgxXVFXV2s+ENd2xZW17LWFSu6gtICoUrvCCGBAKGHBBIgIT2kJzNzfn8cxrQp987cuXcG3s/zzAOZee97zr2ZyT3zvuecr0REEAgEAoFAIBBog5feDggEAoFAIBDcSIjgSyAQCAQCgUBDRPAlEAgEAoFAoCEi+BIIBAKBQCDQEBF8CQQCgUAgEGiICL4EAoFAIBAINMRHbweU0rp1a+rUqZPebrgfRceApt0A7wC9PRFYo+gY0LQ74O2vtyfyKb8ISN5AQBtt7VYXAlX5gCQBPsGAf7i29pVQkQUYy4HgrnWeJKDyClCRDQS0BfzDdHNPUIeKywAZgMAO/HNBEhDUAagpAYK72D+eTMDVk4B/BNCklWt9FTAlZwG/VkCTEOtjTDVA8XGgxQBA0ndNKSkpKY+IQu2N87jgq1OnTkhMTNTbDfcjYTbQrBfQ63m9PRFY47dHgdbDgO5P6e2JfC6uAU5/DNyyRVu7FVeAtT2Bm7cAO+8ApuwCfAK19UEuhgpgbS9g5IdA2Jj6rxWfBPY/Bng3AYZ9CzTtankOgeshE7C6KzB6LRAyiJ9bLAF3HwHiOwFTNsoLqIqOA1tjgVvWAC36udJjQflFYP0AYFqK/c//9klApweAzg9q45sVJEm6IGec2Ha8XoiYAGRt1tsLgS3CxwFXdujthTLCRgP5BwBjlbZ2A8KBwHYACAgdCZz9r7b2leATAES9CyQ9zzf4ujTvDUzYA7SbAmwaBpz6D2Ay6uPnjU7OTsC3GdDypvrP+7UA2k4Gzi+WN0+LvsBN7wN77gZqStX3U1BL+o9A5N3yvnh1mQWkfe96n1RCBF/XC+E3A7l7tL9JCuQTHgvk7AA8SVXCrwXQrCeQn6C97bBY4Mp2oP9bwMl/A4Yy7X2QS8d7eXUr/cfGr3l5A71fACb8BmTGAVvGAFdPa+/jjc657/kGLUmNX1N64+7yCBAawzsOnvR59iSIan9ncmg/BSg6ApSed6lbaiGCr+uFJiH8LTtvn96eCKwR1BHwDgKKT+jtiTLCx3EQpJfdFv15O+/M59r7IBdJAgZ9BBx5zfpqSLPuwPgdQMf7gM2jgBP/FqtgWlFzFbi0mrelLBF+M1CVBxQekT/n4E+B4hQg9St1fBTUJ3cvf3FpPVzeeG9/oMN9QPoPrvVLJUTwdT0hth7dn/BxvPrlSYTp5HPYWCBvL2AyAP3eBE594N7bPK2H8e/3xLvWx0heQM9ngdsSgKwNwOaRnheMeyIXlnKA5W8lD9rLG+j8iLLVL58AIGYZcPR1oOCQOn4KakmzsVJpja6zgLSFjbf/3RARfF1PREwAskXw5daEx+qziuQMYTG87Wis1NaufyhXpRUkcZ5N+M3Amc+09UEpA+cDZ78AyjJsjwvuwsUEXWYBW8YCx//JFVsC15AmY/uqy6Oc92Wslj9vsx5A9GfAnnuA6mKnXBTUwVDGW/SdHlJ2XMtBgG9Tzu9zc0TwdT3RegRQcoZL9AXuiXnlywO+mf2ObzOgeV8gb7/2tutuefZ7Azj1IW8huStBkUCPZ4Dk/7M/VvICuj8JTEwCruwENg4HCo+63scbjaungdI0oO0k2+OaduXUjctrlc3f8V6gzUTgwGMi/0stMpZzoU1gW2XHSRIH2efcP/HeZcGXJEmRkiRtlyTppCRJxyVJmmNhTKwkScWSJCVfe7zhKn9uCLz9gNDRQPZWvT0RWCOwPeDbknvSeBJ6VWrW3aZt3ptXd09/qr0fSugzF8jZBeT+Jm98UAdg3K9Ajz8D224Bjr6lbPVFYJu0hdx+wEtGZyVHb9yDPgDKLgCnP7E+hghYubJxgGbteWeRY09rn+QiZ6XSGp0e5Pw+d/6SBteufBkAvEhEvQEMB/BnSZL6WBi3m4iirj3ecaE/NwZi69H98citx1ggRwefw8YAuftqt+T6vwGc/o97b/H4BAED/wkcstB6whqSBHR9DJiUDBQkAhuHiDwiNTAZuQJV7o088m6uGq/IVmbHuwnnfx3/h/UV4lWrgOnTgeefrw1qiPjn6dP5dTWRY09rn+RQmsZfTtvd6djx/qH8pe3CUnX9UhmXBV9ElEVEh679vwTASQDtXGVPcI02t3LwJZa/3Re9qgedITSGc68MFdrabdKK86PyD/LPzXoCbSZx41d3pvODABmBC78oOy6wHTB2DdD7JWD7RK6eFO1jHCd7E682N7f0vd8CvsFA5F1A+v+U2wruDAz7Gth7n+XUj2nTgDlzgI8/rg12nn+ef54zh19XEzn2tPZJDmkLgY73c0DrKJ7Q84uIXP4A0AlABoBmDZ6PBZAP4AiADQD62ptr8ODBJLCByUQU146o+LTengisUXaJaFkIkcmotyfK2DiCKGur9nYTnyNK+Uftz8VniJa3Iqoq1N4XJVzZTbQykqimzLHjyy8T7ZxGtLYPUe4BdX27Udg9g+jMf62/vgiNn7uyi2hNb/5b6ghJLxJtn2z5820yEc2ZY97w48ecOY7bsocce1r7ZNNfI9HKDkT5h5ybx1hNtCKcqPiUOn4pAEAiyYiLXJ5wL0lSMIAVAJ4jooabsIcAdCSigQA+BWBxjVOSpCckSUqUJCkxNzfXtQ57OpIEtBFbj25NYFugSWugyMOSq81NT7Wm4TZts+68JXHqP9r7ooSwGC6COfmBY8cHtAFGxwF9/wrsmgIcnqv9yqMnU1UAZG3ivmpKCI3hbW5HGwtHzWdt0pP/bvyaJAEffVT/uY8+UtZOQQly7Gntky2ubAP8WgIhN9kfawsvX879SluoiluuwKXBlyRJvuDAaxERxTV8nYiuElHptf+vB+ArSVJrC+O+IqJoIooODbWrVykQeV/ujyduPYaP0y/vK29//e23vq8DZz/jm5w7E/Uu56iVX3LseEkCOt0HTD4KlJ0HNkRxDpzAPucXs2yQXwtlx0kSt51wdNvKyxcYtQQ49REXXtTFvK1Xl7r5Vmojx57WPtlCSUd7e3SZxfl+btrI2JXVjhKAbwGcJKIPrYyJuDYOkiQNveaP6JPgLBHjuTJN9A1yXzxR5zF0JFCYrL3Mj19LoGmP2rwvgNsCtJsKnLT4p8V9CO4EdJvNuVvO4B8GxCzlRP49d7OOpDvLLbkDzlTMdX4YyFjq+EpjUCQwfCGwdyZQmcPPNcynMpka51upiRx7Wvtki+oibvNhTYVAKS36AgHtOO/PHZGzN+nIA0AMAAJwFEDytcdkAE8CePLamGcAHAfnfO0HMNLevCLnSybro4hy9ujthcAa5VlES1sQGQ16e6KMTaOILm/S3m7SC0RH36n/XEka585V5mnvjxKqrxLFtSHKO6jOfJV5RHsfIIrvSpS9Q505rzcKjnC+nb3Pl6WcLzNbbyVKX+ScH8mvEW0dz37ExTXOp6qbbxUX55ythsixp7VPtjjzJdGu6SrP+QXn/WkIZOZ8aZJwr+ZDBF8yOfQy0ZE39fZCYIs1vYnyE/X2QhnJrxMdnqe93YtriLaMa/z8/j/p449SUr8h2hSjbhJzZjwX1yQ8TVRdot681wOJz/F71R62gq/0nzlwcgZjDdHmWKKjb10rhopr/B6w9ryzyLGntU+2+HUYf87VpKqAaGlzosp8dee1gdzgS3S4v14ReV/ujyduPeqlTRk62rLEUb/XgNQFQGWe9j4pofOjgKEUyFyu3pztpwC3HwOM5cD6/kD2FvXm9mTKs0BnPseynKEoKHOiWW3kNO61Zk8qyhZePsCoxfwevbIVuOuuxonskmT5eQvUbBiOc3ETUXC1xL5ta/PWfV7OGC0oPslNattMVHdev5asbHB+sbrzqoAIvq5XQmO4ms6dm1He6Hh
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_new_example(fig, 2.8, 0.9)\n",
|
|||
|
"plot_voronoi(fig, X[:, 1:])"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Podział płaszczyzny jak na powyższym wykresie nazywamy **diagramem Woronoja** (*Voronoi diagram*)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Taki algorytm wyznacza dość efektowne granice klas, zwłaszcza jak na tak prosty algorytm. "
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Niestety jest bardzo podatny na obserwacje odstające:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 50,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"X_outliers = np.vstack((X, np.matrix([[1.0, 3.9, 1.7]])))\n",
|
|||
|
"Y_outliers = np.vstack((Y, np.matrix([[1]])))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 51,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAl8AAAFkCAYAAAAe6l7uAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydd3xUVfr/35NA6C2U0KugSBWCUgUUFFARVCzYFt0Vt4mu/nRdXVfd4lfXlXXVXbFiAaU36b23hN47oZckhPRkZp7fH4eYNuXemTtzB3Ler9e8YOaee85z7kzmPnPO8zwfh4ig0Wg0Go1GowkPUXYboNFoNBqNRlOW0M6XRqPRaDQaTRjRzpdGo9FoNBpNGNHOl0aj0Wg0Gk0Y0c6XRqPRaDQaTRjRzpdGo9FoNBpNGClntwFmqVOnjjRv3txuMyKPSzuh2nUQXcluS65N8lIg5zxUv8FuSzSRTP5lyD4D1dtAzjnIPguVGkLFenZbpgHIPg3ihMpN1fOURKjSFPLToWpL/+eLGy7vhYr1oULt0NqqUaQfhJjaUCHWext3PqTthpodwWHvmlJiYuJFEanrr91V53w1b96chIQEu82IPDaNVo7BDS/Ybcm1idsF8zpA139AgzvstkYTqeRfhhkN4f71EB0DaXthw1MQXQFu+RKqtbLbwrKLuGF2K+jzE8R2Ua9NdMAD22FWcxi60JhDdWk3LO0Ht8+Bmu1DabEm6yTM6wjDdkG5yr7bLh8MzR+FFo+FxzYvOByO40ba6W3Ha4X6A+HMYrutuHaJiob2b8COv4AuTKzxRvnqULUVpG5Tz2u0hYFroNFQWHQL7Pu3cuQ14ef8SvX+1Lqp+OsxNaHhEDg20Vg/NdvBTe/DmgcgP8N6OzWFHP0Wmjzg3/ECaDkKjnwdepssQjtf1wpxt8GFNeDKtduSa5emI8CZDmcW2m2JJpKp0xMurit8HhUNbf8AA9fDiemw5Fa4vN8++8oqh79WN2iHo/Qxszfulk9C3d5qx0H/GAsNIoXvmREaD4VL2yHjWEjNsgrtfF0rVIhVv7KLfulrrCUqGjr8BXa8ob9wNd6p29Pz32H11jBgBTR7GBb3gj3/1Ktg4SL/MpyarbalPBF3G+RehNTtxvvs+hGk7YJDn1ljo6Y4F9aq79w63Y21j64ITR+Go9+E1i6L0M7XtYTeegw9Te4Hdw6cnme3JZpIpU5PdePw5KA7ouD638Odm+DMfFjcE9L2hN/GssbxycrBquglDjoqGlo8aW71q1wl6D0FdrwOKVussVNTyBEfK5XeaDUKjoxX8X0Rjna+riXqD4Sz2vkKKY4o6PAm7NSxXxovVG0J4oKsJN9tbluibi5L+sLuf6iMLU1oOGJg+6rlL1TclyvPeL/V20D8x7DmQchLC8pETRGcmWqLvvnj5s6r1QXKV1PxfRGOdr6uJer0gPQDkJtstyXXNo2HgdsJp+bYbYkmEnE4rqx++QkBcERB62dhUCKcWwkLu0PqjvDYWJa4vB8yjkDDwb7bVWulQjdO/2Su/2YPQYNBsPEp/YPMKpKmqu37yg3NnedwKCf7cOQH3ofM+XI4HE0cDsdyh8Ox1+Fw7HY4HGM8tOnncDjSHA7HtiuPN0JlT5kgOgbq9oGzS+225Nrm59WvN/WXrcYz3uK+PFGlKfRfAG1+C8tuhx1vmlt90fjmyHhVfiDKQGWlQG/cXf4Fmcdh/3+8txGBGTNKf2d4ez1YjIwXbpuMYmSl0hvNH1PxffmXrbXJYkK58uUEXhSRtkB34LcOh+NGD+1Wi0jnK4+3Q2hP2UBvPYaHxveqf0/OstcOTUSSHt2I3IPfkJKRY+wEhwNaPQWDt0FKAizspuOIrMDtUuUKjN7Imzygssazz5obJ7qCiv/a/Xe4uMFzm5kz4b774IUXCp0aEfX8vvvUcSsxMl64bTJCxhFVMLXRPYGdX7EuxPVXcX4RTMicLxE5IyJbrvw/HdgLNArVeJorNLhDOV96RSa0OBzQ4a0rsV+RH9ypCS8/nupABUln6+r/mjuxciPoOwfavgTLB8H213T5mGA4uwgqN4Yann73e6B8VWgyHI5+Z36sqi3gls9h7cOeQz+GDYMxY+DDDwudnRdeUM/HjFHHrcTIeOG2yQhHxkOzR5RDGyhXQ80vEQn5A2gOJAHVS7zeD0gGtgPzgXb++uratatofOB2i0xvJJK2325Lrn3cbpH58SLHp9ptiSbCSM7IFZmAOKc1FsnPDKyTrNMiK4eJ/HSjyIWN1hpYVlg9QuTA/7wfn0Dp186tEpnTVv19B0LiiyLLh4i4XaWPud0iY8YUbPipx5gxgY/lDyPjhdsmn/a6RGY0FUneElw/rjyRaXEiafusscsEQIIY8IscEuIVEofDURVYCfxdRKaXOFYdcItIhsPhGAJ8KCKtPfTxDPAMQNOmTbseP26oen/ZZcMoiI1XMSSa0HJqLmz7IwzZbrummCbCmOhQhXlrtIcOAYazisDxSbDleWjxhFptLaf1Ww2RmwKzW8K9x1QVe09MdMBID/FOc9pAz++hzi3mx3XnqwzWxvfCja+UPi4CUUW+K9xuc+UUzGJkvHDb5I2zS2DLSzBkW/B9bXkJospD53eC78sEDocjUUTi/bUL6d3C4XCUB6YBE0o6XgAicllEMq78fx5Q3uFw1PHQ7jMRiReR+Lp1/epVanTcV/hoOERJXyRNtdsSTSTS+V3Y/yFknQrsfIcDmj8MQ3ZA5jGY39l/FqVGcWyi+vv05nh5w+FQZScC3baKKg+9JsG+sXB+VfFjBdt6RSkab2U1RsYLt02+MFPR3h8tR6l4vwgtZBzKbEcH8CWwV0Q+8NKm/pV2OByOm6/Yo+skBEv9AXBuha4bFA4KYr92vRWxf+QaG6naAq57Brb/Kbh+KtaD3pOh0z+UpmDiC6oWksY7wWTMtXgCkiaDMzuw86s0ge7jYe1IyDmvXisZT+V2l463shIj44XbJl/kXVJlPrypEJilZjuo1EjF/UUiRvYmA3kAvQEBdgDbrjyGAM8Cz15p8ztgNyrmawPQ01+/OubLIPM6i5xfY7cVZQO3W2RBd5GjP9htiSaSKIgnyksTmd5A5OJma/rNuSiy9lGRWa1Ezq6wps9rjZTtIjOaiLicvtt5ivkqYOkdIkcnBGfHttdElg5QdkyfXjqeqmi81fTpwY1VEiPjhdsmXxz4VGTVfRb3+V8V9xdGiJSYL6uJj4+XhIQEu82IfLa+DNGVoeObdltSNjizGBKfgyG7lFSJRlM0nujQF3B0PAxYbV0szcnZsPk3Krao87sqU0+jSHwBylWFTn/13c5TzFcBx36EI1/CbUGEcLidsGwgxPWD9m+o0g3DhhX/DIh4fj1YvPVb9HUIr02+WNgd2r8Oje62rs+8VJjVAoYeUfrHYSAiYr40NqLjvsJL/QFQoTYc/9FuSzSRSMtRkJ8BJyyMDWw8FO7aCa4smNdBBStrIOsMcuATppy/mZTMIIrVNhmmaq1l+pCJ8kdUOeg1EQ6Ng3NLYfjw0s6Mw+H5dQ/kz+/O4emDSLmc7n9sb/0Wfd1Im3CQtlcVqW0wyNp+Y2opZYNjE63t1wK083WtUrc3XNqh9cbCxc+xX2+rX7saTVGioqHrWLUi7TJYeNUIMbWg+9fQ7b+w4SnY+EzEV/YOKeKGTb/kQMW7GXFhKFMSTgTeV3RFJR105JvgbKrUAHpOgHWPQ9bpwPtJmkbu5SSSzl2EBV0heXNwdkUSR76GFo8bUyEwS4TW/NLO17VKuUpK6/HccrstKTvE3QaV6kfkryxNBBDXH2p1hn3/tr7vhoNhyE71/7nt4fQC68e4GtjzHuSlUneAKpI6Ir5JcP21HKW2i4MtpBzXX5X+WftwYD/O0g/B5mdx9ZrKgbY/EtPpdVh5typzY6Uzbwdupypqa1WWY0nibofc8xGnm6qdr2sZvfUYXn5e/fqrXv3SeOamf8K+983L1xghpgbc8hl0/wo2/1rV+8tLtX6cUCMB6g2eXwX7/w29JhFbrQoAsVVigrMlNh6iK8H
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X_outliers, Y_outliers, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
"plot_new_example(fig, 2.8, 0.9)\n",
|
|||
|
"plot_voronoi(fig, X_outliers[:, 1:])"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Pojedyncza obserwacja odstająca dramatycznie zmienia granice klas."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Aby temu zaradzić, użyjemy więcej niż jednego najbliższego sąsiada ($k > 1$)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Algorytm $k$ najbliższych sąsiadów dla problemu klasyfikacji\n",
|
|||
|
"\n",
|
|||
|
"1. Dany jest zbiór uczący zawierajacy przykłady $(x_i, y_i)$, gdzie: $x_i$ – zestaw cech, $y_i$ – klasa.\n",
|
|||
|
"1. Dany jest przykład testowy $x'$, dla którego chcemy określić klasę.\n",
|
|||
|
"1. Oblicz odległość $d(x', x_i)$ dla każdego przykładu $x_i$ ze zbioru uczącego.\n",
|
|||
|
"1. Wybierz $k$ przykładów $x_{i_1}, \\ldots, x_{i_k}$, dla których wyliczona odległość jest najmniejsza.\n",
|
|||
|
"1. Jako wynik $y'$ zwróć tę spośrod klas $y_{i_1}, \\ldots, y_{i_k}$, która występuje najczęściej."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Algorytm $k$ najbliższych sąsiadów dla problemu klasyfikacji – przykład"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 52,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Odległość euklidesowa\n",
|
|||
|
"def euclidean_distance(x1, x2):\n",
|
|||
|
" return np.linalg.norm(x1 - x2)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 53,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Algorytm k najbliższych sąsiadów\n",
|
|||
|
"def knn(X, Y, x_new, k, distance=euclidean_distance):\n",
|
|||
|
" data = np.concatenate((X, Y), axis=1)\n",
|
|||
|
" nearest = sorted(\n",
|
|||
|
" data, key=lambda xy:distance(xy[0, :-1], x_new))[:k]\n",
|
|||
|
" y_nearest = [xy[0, -1] for xy in nearest]\n",
|
|||
|
" return max(y_nearest, key=lambda y:y_nearest.count(y))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 54,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wykres klas dla KNN\n",
|
|||
|
"def plot_knn(fig, X, Y, k, distance=euclidean_distance):\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" x1min, x2min = X.min(axis=0).tolist()[0]\n",
|
|||
|
" x1max, x2max = X.max(axis=0).tolist()[0]\n",
|
|||
|
" pad1 = (x1max - x1min) / 10\n",
|
|||
|
" pad2 = (x2max - x2min) / 10\n",
|
|||
|
" step1 = (x1max - x1min) / 50\n",
|
|||
|
" step2 = (x2max - x2min) / 50\n",
|
|||
|
" x1grid, x2grid = np.meshgrid(\n",
|
|||
|
" np.arange(x1min - pad1, x1max + pad1, step1),\n",
|
|||
|
" np.arange(x2min - pad2, x2max + pad2, step2))\n",
|
|||
|
" z = np.matrix([[knn(X, Y, [x1, x2], k, distance) \n",
|
|||
|
" for x1, x2 in zip(x1row, x2row)] \n",
|
|||
|
" for x1row, x2row in zip(x1grid, x2grid)])\n",
|
|||
|
" plt.contour(x1grid, x2grid, z, levels=[0.5]);"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 55,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przygotowanie interaktywnego wykresu\n",
|
|||
|
"\n",
|
|||
|
"slider_k = widgets.IntSlider(min=1, max=10, step=1, value=1, description=r'$k$', width=300)\n",
|
|||
|
"\n",
|
|||
|
"def interactive_knn_1(k):\n",
|
|||
|
" fig = plot_data_for_classification(X_outliers, Y_outliers, xlabel=u'dł. płatka', ylabel=u'szer. płatka')\n",
|
|||
|
" plot_voronoi(fig, X_outliers[:, 1:])\n",
|
|||
|
" plot_knn(fig, X_outliers[:, 1:], Y_outliers, k)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 56,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "6e20fc55e2ad4e59874fd59b9f128073",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"text/plain": [
|
|||
|
"interactive(children=(IntSlider(value=1, description='$k$', max=10, min=1), Button(description='Run Interact',…"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<function __main__.interactive_knn_1(k)>"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 56,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"widgets.interact_manual(interactive_knn_1, k=slider_k)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 58,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wczytanie danych (inny przykład)\n",
|
|||
|
"\n",
|
|||
|
"alldata = pandas.read_csv('classification.tsv', sep='\\t')\n",
|
|||
|
"data = np.matrix(alldata)\n",
|
|||
|
"\n",
|
|||
|
"m, n_plus_1 = data.shape\n",
|
|||
|
"n = n_plus_1 - 1\n",
|
|||
|
"Xn = data[:, 1:].reshape(m, n)\n",
|
|||
|
"\n",
|
|||
|
"X2 = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)\n",
|
|||
|
"Y2 = np.matrix(data[:, 0]).reshape(m, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 59,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFmCAYAAADQ5sbeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df4wc533f8c/nJJ5QHw+JSVEyLYmREx6CWmosq1vlh4meZZuudKjD4yHFyRVcpSVKCK3QM6UEYOE0NZoWdR3Y7DlRbNCOALlQrEth8kQk58iyGkRhDKc6GvpBWpbvrPoHfaxEU459uhQilf32j5kVR8fdu70fu/Ps3vsFLHbmeWaW3x3u7X1u5pkZR4QAAACQrp6yCwAAAMDSCGwAAACJI7ABAAAkjsAGAACQOAIbAABA4ghsAAAAibu87ALKcOWVV8b1119fdhkAAABvcOLEiR9GxLbF7RsysF1//fWanp4uuwwAAIA3sP3deu0cEgUAAEgcgQ0AACBxBDYAAIDEEdgAAAASR2ADAABIHIENAAAgcQQ2AACAxBHYAAAAEkdgAwAA3SdCOno0e26mPXEENgAA0H0mJ6WREenAgYvhLCKbHxnJ+jvIhrw1FQAA6HLDw9LYmDQ+ns0fOpSFtfHxrH14uNz6VojABgAAuo+dhTQpC2m14DY2lrXb5dW2Co4OO4a7HiqVSnDzdwAANoAIqacwAqxaTTqs2T4REZXF7YxhA8rSZQNiASA5tTFrRcUxbR2EwAaUpcsGxAJAUmrfp7Uxa9XqxTFtHRjaGMMGlKXLBsQCQFImJy9+n9bGrBXHtA0OSnv3llvjCjCGDShT8S/Amg4dEAsASYnIQtvw8Bu/Txu1J6LRGDYCG1C2DhsQCwBoHU46AFLURQNiAQCtQ2ADytJlA2IBAK3DSQdAWbpsQCwAoHUIbEBZhoelI0feOPC1FtoGBzlLFADwOgIbUBa7/h60Ru0AgA2LMWwAAACJSyKw2b7N9vO2Z20frNP/m7afyh8nbf+d7S1533dsP5v3ca0OAADQdUo/JGr7Mkn3S9ot6bSkJ20fi4hv1JaJiN+V9Lv58h+QdCAiXi68zK0R8cM2lg0AANA2Kexhu0XSbES8EBHnJT0sac8Sy39Q0hfaUhkAAEACUghs10j6fmH+dN52CdtvknSbpC8WmkPSl22fsL2/ZVUCAACUpPRDopLq3YOn0RVDPyDprxYdDn1XRMzZvkrSY7a/GRFPXPKPZGFuvyTt2LFjrTUDAAC0TQp72E5Luq4wf62kuQbL3qFFh0MjYi5/fknSUWWHWC8REYcjohIRlW3btq25aAAAgHZJIbA9KWnA9tts9yoLZccWL2T7pyQNSnqk0NZnu782Len9kk62pWoAAIA2Kf2QaES8ZvseSY9KukzSAxFxyvbdef9n8kX3SvpyRCwUVr9a0lFnV4m/XNIfRcSfta96AACA1nNswBtMVyqVmJ7mkm0AACAttk9ERGVxewqHRAEAALAEAhsAAEDiCGwAAACJI7ABAAAkjsAGAACQOAIbAABA4ghsAAAAiSOwAQAAJI7Atl4ipKNHs+dm2gEAAJpEYFsvk5PSyIh04MDFcBaRzY+MZP0AAACrUPq9RLvG8LA0NiaNj2fzhw5lYW18PGsfHi63PgAA0LEIbOvFzkKalIW0WnAbG8vasxvUAwAArBg3f19vEVJP4UhztUpYAwAATeHm7+1QG7NWVBzTBgAAsAoEtvVSC2u1MWvV6sUxbYQ2AACwBoxhWy+TkxfDWm3MWnFM2+CgtHdvuTUCAICORGBbL8PD0pEj2XNtzFottA0OcpYoAABYNQLberHr70Fr1A4AANAkxrABAAAkjsAGAACQOAIbAABA4ghsAAAAiSOwAQAAJI7ABgAAkDgCGwAAQOIIbAAAAIkjsAEAACSOwAYAAJA4AhsAAEDikghstm+z/bztWdsH6/S/2/aPbT+VP3672XUBAAA6Xek3f7d9maT7Je2WdFrSk7aPRcQ3Fi36lxHxT1e5LoAG5l+d18SpCc2cm9HA1gGN3jCq/iv6yy4LAFBQemCTdIuk2Yh4QZJsPyxpj6RmQtda1gU2vOPfO66hh4ZUjaoWLiyob1Of7n30Xk3dOaVdO3aVXR4AIJfCIdFrJH2/MH86b1vsl20/bftLtm9Y4boAFpl/dV5DDw1p/vy8Fi4sSJIWLixo/nzW/sr5V0quEABQk0Jgc522WDT/dUk/ExHvkPR7kiZXsG62oL3f9rTt6bNnz666WKBbTJyaUDWqdfuqUdXEyYk2VwQAaCSFwHZa0nWF+WslzRUXiIifRMQr+fSUpE22r2xm3cJrHI6ISkRUtm3btp71Ax1p5tzM63vWFlu4sKDZl2fbXBEAoJEUAtuTkgZsv812r6Q7JB0rLmD7LbadT9+irO5zzawLoL6BrQPq29RXt69vU592btnZ5ooAAI2UHtgi4jVJ90h6VNJzkv44Ik7Zvtv23flivybppO2nJX1K0h2Rqbtu+98F0HlGbxhVj+t/BfS4R6M3jra5IgBAI46oO+Srq1UqlZieni67DKB09c4S7XEPZ4kCQElsn4iIyuL2FC7rAaAku3bs0tx9c5o4OaHZl2e1c8tOjd44qs29m8suDQBQQGADNrjNvZu17+Z9ZZcBAFhC6WPYAAAAsDQCGwAAQOI4JAoAieC+rgAaIbABQAK4ryuApXBIFABKxn1dASyHwAYAJeO+rgCWQ2ADgJJxX1cAyyGwAUDJuK8rgOUQ2ACgZNzXFcByCGwAULL+K/o1deeU+nv7X9/T1repT/29WTu3CgPAZT0AIAHc1xVIUIQ0OSkND0v28u0tRGADgERwX1cgMZOT0siINDYmHTqUhbMI6cABaXxcOnJE2ru3LaVwSBTNi5COHs2em2kHAKCTDQ9nYW18PAtpxbA2Npb1twmBDc2r/aVR+9BKFz+8IyNZPwAA3cLO9qzVQltPz8WwVtvj1iYENjQvob80AABoi1poK2pzWJMIbFiJhP7SAACgLWo7J4qKR5rahMCGlUnkLw0AAFpu8ZGkavXSI01tQmDDyiTylwYAAC03OXnpkaTikaY2jt0msKF5Cf2lAQBAyw0PZ5fuKB5JqoW2I0faOnab67CheY3+0pCy9sHBtl2PBgCAlrPr/15r1N5CBDY0r/aXRvHKzrXQNjjIWaIAALQIgQ3NS+gvDQAANhLGsAEAACSOwAYAAJA4AhsAAEDiCGwAAACJI7ABAAAkjsAGAACQuCQCm+3bbD9ve9b2wTr9d9p+Jn981fY7Cn3fsf2s7adsT7e3cgAAgNYr/Tpsti+TdL+k3ZJOS3rS9rGI+EZhsf8jaTAifmT7dkmHJf1iof/WiPhh24oGAABooxT2sN0iaTYiXoiI85IelrSnuEBEfDUifpTPfk3StW2uEQAAoDQpBLZrJH2/MH86b2tkn6QvFeZD0pdtn7C9v9FKtvfbnrY9ffbs2TUVDAAA0E6lHxKV5DptUXdB+1ZlgW1XofldETFn+ypJj9n+ZkQ8cckLRhxWdihVlUql7usDAACkKIU9bKclXVeYv1bS3OKFbP+CpM9J2hMR52rtETGXP78k6aiyQ6wAAABdI4XA9qSkAdtvs90r6Q5Jx4oL2N4h6YikD0XEtwrtfbb7a9OS3i/pZNsqBwAAaIPSD4lGxGu275H0qKTLJD0QEads3533f0bSb0vaKukPbEvSaxFRkXS1pKN52+WS/igi/qyEtwEAANAyjth4w7kqlUpMT3PJNgAAkBbbJ/KdUm+QwiFRAAAALIHABgAAkDgCGwAAQOIIbAAAYOOKkI4ezZ6baS8JgQ0AAGxck5PSyIh04MDFcBaRzY+MZP0JKP2yHgAAAKUZHpbGxqTx8Wz+0KEsrI2PZ+3Dw+XWlyOwAQCAjcvOQpqUhbRacBsby9pd7w6a7cd12AAAACKknsJIsWq1lLDGddgAAADqqY1ZKyqOaUsAgQ0AAGxctbBWG7NWrV4c05ZQaGMMGwAAWBfzr85r4tSEZs7NaGDrgEZvGFX/Ff1ll7W0ycmLYa0
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X2, Y2, xlabel=r'$x_1$', ylabel=r'$x_2$')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 60,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przygotowanie interaktywnego wykresu\n",
|
|||
|
"\n",
|
|||
|
"slider_k = widgets.IntSlider(min=1, max=10, step=1, value=1, description=r'$k$', width=300)\n",
|
|||
|
"\n",
|
|||
|
"def interactive_knn_2(k):\n",
|
|||
|
" fig = plot_data_for_classification(X2, Y2, xlabel=r'$x_1$', ylabel=r'$x_2$')\n",
|
|||
|
" plot_voronoi(fig, X2[:, 1:])\n",
|
|||
|
" plot_knn(fig, X2[:, 1:], Y2, k)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 61,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"application/vnd.jupyter.widget-view+json": {
|
|||
|
"model_id": "db278f40e7fa4fdc90a9fe2f15a917c6",
|
|||
|
"version_major": 2,
|
|||
|
"version_minor": 0
|
|||
|
},
|
|||
|
"text/plain": [
|
|||
|
"interactive(children=(IntSlider(value=1, description='$k$', max=10, min=1), Button(description='Run Interact',…"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<function __main__.interactive_knn_2(k)>"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 61,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"widgets.interact_manual(interactive_knn_2, k=slider_k)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Algorytm $k$ najbliższych sąsiadów dla problemu regresji\n",
|
|||
|
"\n",
|
|||
|
"1. Dany jest zbiór uczący zawierajacy przykłady $(x_i, y_i)$, gdzie: $x_i$ – zestaw cech, $y_i$ – liczba rzeczywista.\n",
|
|||
|
"1. Dany jest przykład testowy $x'$, dla którego chcemy określić klasę.\n",
|
|||
|
"1. Oblicz odległość $d(x', x_i)$ dla każdego przykładu $x_i$ ze zbioru uczącego.\n",
|
|||
|
"1. Wybierz $k$ przykładów $x_{i_1}, \\ldots, x_{i_k}$, dla których wyliczona odległość jest najmniejsza.\n",
|
|||
|
"1. Jako wynik $y'$ zwróć średnią liczb $y_{i_1}, \\ldots, y_{i_k}$:\n",
|
|||
|
" $$ y' = \\frac{1}{k} \\sum_{j=1}^{k} y_{i_j} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Wybór $k$\n",
|
|||
|
"\n",
|
|||
|
"* Wartość $k$ ma duży wpływ na wynik działania algorytmu KNN:\n",
|
|||
|
" * Jeżeli $k$ jest zbyt duże, wszystkie nowe przykłady są klasyfikowane jako klasa większościowa.\n",
|
|||
|
" * Jeżeli $k$ jest zbyt małe, granice klas są niestabilne, a algorytm jest bardzo podatny na obserwacje odstające.\n",
|
|||
|
"* Aby dobrać optymalną wartość $k$, najlepiej użyć zbioru walidacyjnego."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Miary podobieństwa"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Odległość euklidesowa\n",
|
|||
|
"$$ d(x, x') = \\sqrt{ \\sum_{i=1}^n \\left( x_i - x'_i \\right) ^2 } $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Dobry wybór w przypadku numerycznych cech.\n",
|
|||
|
"* Symetryczna, traktuje wszystkie wymiary jednakowo.\n",
|
|||
|
"* Wrażliwa na duże wahania jednej cechy."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Odległość Hamminga\n",
|
|||
|
"$$ d(x, x') = \\sum_{i=1}^n \\mathbf{1}_{x_i \\neq x'_i} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Dobry wybór w przypadku cech zero-jedynkowych.\n",
|
|||
|
"* Liczba cech, którymi różnią się dane przykłady."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Odległość Minkowskiego ($p$-norma)\n",
|
|||
|
"$$ d(x, x') = \\sqrt[p]{ \\sum_{i=1}^n \\left| x_i - x'_i \\right| ^p } $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Dla $p = 2$ jest to odległość euklidesowa.\n",
|
|||
|
"* Dla $p = 1$ jest to odległość taksówkowa.\n",
|
|||
|
"* Jeżeli $p \\to \\infty$, to $p$-norma zbliża się do logicznej alternatywy.\n",
|
|||
|
"* Jeżeli $p \\to 0$, to $p$-norma zbliża się do logicznej koniunkcji."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### KNN – praktyczne porady"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Co zrobić z remisami?\n",
|
|||
|
" * Można wybrać losową klasę.\n",
|
|||
|
" * Można wybrać klasę o wyższym prawdopodobieństwie _a priori_.\n",
|
|||
|
" * Można wybrać klasę wskazaną przez algorytm 1NN."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* KNN źle radzi sobie z brakującymi wartościami cech (nie można wówczas sensownie wyznaczyć odległości)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## 8.3. Drzewa decyzyjne"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Drzewa decyzyjne – przykład"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 61,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przydatne importy\n",
|
|||
|
"\n",
|
|||
|
"import ipywidgets as widgets\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import pandas\n",
|
|||
|
"\n",
|
|||
|
"%matplotlib inline"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 64,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
" Day Outlook Humidity Wind Play\n",
|
|||
|
"0 1 Sunny High Weak No\n",
|
|||
|
"1 2 Sunny High Strong No\n",
|
|||
|
"2 3 Overcast High Weak Yes\n",
|
|||
|
"3 4 Rain High Weak Yes\n",
|
|||
|
"4 5 Rain Normal Weak Yes\n",
|
|||
|
"5 6 Rain Normal Strong No\n",
|
|||
|
"6 7 Overcast Normal Strong Yes\n",
|
|||
|
"7 8 Sunny High Weak No\n",
|
|||
|
"8 9 Sunny Normal Weak Yes\n",
|
|||
|
"9 10 Rain Normal Weak Yes\n",
|
|||
|
"10 11 Sunny Normal Strong Yes\n",
|
|||
|
"11 12 Overcast High Strong Yes\n",
|
|||
|
"12 13 Overcast Normal Weak Yes\n",
|
|||
|
"13 14 Rain High Strong No\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"alldata = pandas.read_csv('tennis.tsv', sep='\\t')\n",
|
|||
|
"print(alldata)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 65,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"{'Outlook': {'Overcast', 'Rain', 'Sunny'},\n",
|
|||
|
" 'Humidity': {'High', 'Normal'},\n",
|
|||
|
" 'Wind': {'Strong', 'Weak'}}"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 65,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Dane jako lista słowników\n",
|
|||
|
"data = alldata.T.to_dict().values()\n",
|
|||
|
"features = ['Outlook', 'Humidity', 'Wind']\n",
|
|||
|
"\n",
|
|||
|
"# Możliwe wartości w poszczególnych kolumnach\n",
|
|||
|
"values = {feature: set(row[feature] for row in data)\n",
|
|||
|
" for feature in features}\n",
|
|||
|
"values"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Czy John zagra w tenisa, jeżeli będzie padać, przy wysokiej wilgotności i silnym wietrze?\n",
|
|||
|
"* Algorytm drzew decyzyjnych spróbuje _zrozumieć_ „taktykę” Johna.\n",
|
|||
|
"* Wykorzystamy metodę „dziel i zwyciężaj”."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 66,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Podziel dane\n",
|
|||
|
"def split(features, data):\n",
|
|||
|
" values = {feature: list(set(row[feature]\n",
|
|||
|
" for row in data))\n",
|
|||
|
" for feature in features}\n",
|
|||
|
" if not features:\n",
|
|||
|
" return data\n",
|
|||
|
" return {val: split(features[1:],\n",
|
|||
|
" [row for row in data\n",
|
|||
|
" if row[features[0]] == val])\n",
|
|||
|
" for val in values[features[0]]}"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 67,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"\n",
|
|||
|
"\tOutlook\tHumid\tWind\tPlay\n",
|
|||
|
"Day 1:\tSunny\tHigh\tWeak\tNo\n",
|
|||
|
"Day 2:\tSunny\tHigh\tStrong\tNo\n",
|
|||
|
"Day 8:\tSunny\tHigh\tWeak\tNo\n",
|
|||
|
"Day 9:\tSunny\tNormal\tWeak\tYes\n",
|
|||
|
"Day 11:\tSunny\tNormal\tStrong\tYes\n",
|
|||
|
"\n",
|
|||
|
"\tOutlook\tHumid\tWind\tPlay\n",
|
|||
|
"Day 4:\tRain\tHigh\tWeak\tYes\n",
|
|||
|
"Day 5:\tRain\tNormal\tWeak\tYes\n",
|
|||
|
"Day 6:\tRain\tNormal\tStrong\tNo\n",
|
|||
|
"Day 10:\tRain\tNormal\tWeak\tYes\n",
|
|||
|
"Day 14:\tRain\tHigh\tStrong\tNo\n",
|
|||
|
"\n",
|
|||
|
"\tOutlook\tHumid\tWind\tPlay\n",
|
|||
|
"Day 3:\tOvercast\tHigh\tWeak\tYes\n",
|
|||
|
"Day 7:\tOvercast\tNormal\tStrong\tYes\n",
|
|||
|
"Day 12:\tOvercast\tHigh\tStrong\tYes\n",
|
|||
|
"Day 13:\tOvercast\tNormal\tWeak\tYes\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"split_data = split(['Outlook'], data)\n",
|
|||
|
"\n",
|
|||
|
"for outlook in values['Outlook']:\n",
|
|||
|
" print('\\n\\tOutlook\\tHumid\\tWind\\tPlay')\n",
|
|||
|
" for row in split_data[outlook]:\n",
|
|||
|
" print('Day {Day}:\\t{Outlook}\\t{Humidity}\\t{Wind}\\t{Play}'\n",
|
|||
|
" .format(**row))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Obserwacja: John lubi grać, gdy jest pochmurnie.\n",
|
|||
|
"\n",
|
|||
|
"W pozostałych przypadkach podzielmy dane ponownie:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 68,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"\n",
|
|||
|
"\tOutlook\tHumid\tWind\tPlay\n",
|
|||
|
"Day 1:\tSunny\tHigh\tWeak\tNo\n",
|
|||
|
"Day 2:\tSunny\tHigh\tStrong\tNo\n",
|
|||
|
"Day 8:\tSunny\tHigh\tWeak\tNo\n",
|
|||
|
"\n",
|
|||
|
"\tOutlook\tHumid\tWind\tPlay\n",
|
|||
|
"Day 9:\tSunny\tNormal\tWeak\tYes\n",
|
|||
|
"Day 11:\tSunny\tNormal\tStrong\tYes\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"split_data_sunny = split(['Outlook', 'Humidity'], data)\n",
|
|||
|
"\n",
|
|||
|
"for humidity in values['Humidity']:\n",
|
|||
|
" print('\\n\\tOutlook\\tHumid\\tWind\\tPlay')\n",
|
|||
|
" for row in split_data_sunny['Sunny'][humidity]:\n",
|
|||
|
" print('Day {Day}:\\t{Outlook}\\t{Humidity}\\t{Wind}\\t{Play}'\n",
|
|||
|
" .format(**row))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 69,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"\n",
|
|||
|
"\tOutlook\tHumid\tWind\tPlay\n",
|
|||
|
"Day 6:\tRain\tNormal\tStrong\tNo\n",
|
|||
|
"Day 14:\tRain\tHigh\tStrong\tNo\n",
|
|||
|
"\n",
|
|||
|
"\tOutlook\tHumid\tWind\tPlay\n",
|
|||
|
"Day 4:\tRain\tHigh\tWeak\tYes\n",
|
|||
|
"Day 5:\tRain\tNormal\tWeak\tYes\n",
|
|||
|
"Day 10:\tRain\tNormal\tWeak\tYes\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"split_data_rain = split(['Outlook', 'Wind'], data)\n",
|
|||
|
"\n",
|
|||
|
"for wind in values['Wind']:\n",
|
|||
|
" print('\\n\\tOutlook\\tHumid\\tWind\\tPlay')\n",
|
|||
|
" for row in split_data_rain['Rain'][wind]:\n",
|
|||
|
" print('Day {Day}:\\t{Outlook}\\t{Humidity}\\t{Wind}\\t{Play}'\n",
|
|||
|
" .format(**row))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Outlook=\n",
|
|||
|
" * Overcast\n",
|
|||
|
" * → Playing\n",
|
|||
|
" * Sunny\n",
|
|||
|
" * Humidity=\n",
|
|||
|
" * High\n",
|
|||
|
" * → Not playing\n",
|
|||
|
" * Normal\n",
|
|||
|
" * → Playing\n",
|
|||
|
" * Rain\n",
|
|||
|
" * Wind=\n",
|
|||
|
" * Weak\n",
|
|||
|
" * → Playing\n",
|
|||
|
" * Strong\n",
|
|||
|
" * → Not playing"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* (9/5)\n",
|
|||
|
" * Outlook=Overcast (4/0)\n",
|
|||
|
" * YES\n",
|
|||
|
" * Outlook=Sunny (2/3)\n",
|
|||
|
" * Humidity=High (0/3)\n",
|
|||
|
" * NO\n",
|
|||
|
" * Humidity=Normal (2/0)\n",
|
|||
|
" * YES\n",
|
|||
|
" * Outlook=Rain (3/2)\n",
|
|||
|
" * Wind=Weak (3/0)\n",
|
|||
|
" * YES\n",
|
|||
|
" * Wind=Strong (0/2)\n",
|
|||
|
" * NO"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Algorytm ID3"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Pseudokod algorytmu:\n",
|
|||
|
"\n",
|
|||
|
"* podziel(węzeł, zbiór przykładów):\n",
|
|||
|
" 1. A ← najlepszy atrybut do podziału zbioru przykładów\n",
|
|||
|
" 1. Dla każdej wartości atrybutu A, utwórz nowy węzeł potomny\n",
|
|||
|
" 1. Podziel zbiór przykładów na podzbiory według węzłów potomnych\n",
|
|||
|
" 1. Dla każdego węzła potomnego i podzbioru:\n",
|
|||
|
" * jeżeli podzbiór jest jednolity: zakończ\n",
|
|||
|
" * w przeciwnym przypadku: podziel(węzeł potomny, podzbiór)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Jak wybrać „najlepszy atrybut”?\n",
|
|||
|
"* powinien zawierać jednolity podzbiór\n",
|
|||
|
"* albo przynajmniej „w miarę jednolity”\n",
|
|||
|
"\n",
|
|||
|
"Skąd wziąć miarę „jednolitości” podzbioru?\n",
|
|||
|
"* miara powinna być symetryczna (4/0 vs. 0/4)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Entropia\n",
|
|||
|
"\n",
|
|||
|
"$$ H(S) = - p_{(+)} \\log p_{(+)} - p_{(-)} \\log p_{(-)} $$\n",
|
|||
|
"\n",
|
|||
|
"* $S$ – podzbiór przykładów\n",
|
|||
|
"* $p_{(+)}$, $p_{(-)}$ – procent pozytywnych/negatywnych przykładów w $S$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Entropię można traktować jako „liczbę bitów” potrzebną do sprawdzenia, czy losowo wybrany $x \\in S$ jest pozytywnym, czy negatywnym przykładem."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Przykład:\n",
|
|||
|
"\n",
|
|||
|
"* (3 TAK / 3 NIE):\n",
|
|||
|
"$$ H(S) = -\\frac{3}{6} \\log\\frac{3}{6} - \\frac{3}{6} \\log\\frac{3}{6} = 1 \\mbox{ bit} $$\n",
|
|||
|
"* (4 TAK / 0 NIE):\n",
|
|||
|
"$$ H(S) = -\\frac{4}{4} \\log\\frac{4}{4} - \\frac{0}{4} \\log\\frac{0}{4} = 0 \\mbox{ bitów} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### *Information gain*\n",
|
|||
|
"\n",
|
|||
|
"*Information gain* – różnica między entropią przed podziałem a entropią po podziale (podczas podziału entropia zmienia się):\n",
|
|||
|
"\n",
|
|||
|
"$$ \\mathop{\\rm Gain}(S,A) = H(S) - \\sum_{V \\in \\mathop{\\rm Values(A)}} \\frac{|S_V|}{|S|} H(S_V) $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Przykład:\n",
|
|||
|
"\n",
|
|||
|
"$$ \\mathop{\\rm Gain}(S, Wind) = H(S) - \\frac{8}{14} H(S_{Wind={\\rm Weak}}) - \\frac{6}{14} H(S_{Wind={\\rm Strong}}) = \\\\\n",
|
|||
|
"= 0.94 - \\frac{8}{14} \\cdot 0.81 - \\frac{6}{14} \\cdot 1.0 = 0.049 $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* _Information gain_ jest całkiem sensowną heurystyką wskazującą, który atrybut jest najlepszy do dokonania podziału.\n",
|
|||
|
"* **Ale**: _information gain_ przeszacowuje użyteczność atrybutów, które mają dużo różnych wartości.\n",
|
|||
|
"* **Przykład**: gdybyśmy wybrali jako atrybut *datę*, otrzymalibyśmy bardzo duży *information gain*, ponieważ każdy podzbiór byłby jednolity, a nie byłoby to ani trochę użyteczne!"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### _Information gain ratio_\n",
|
|||
|
"\n",
|
|||
|
"$$ \\mathop{\\rm GainRatio}(S, A) = \\frac{ \\mathop{\\rm Gain}(S, A) }{ -\\sum_{V \\in \\mathop{\\rm Values}(A)} \\frac{|S_V|}{|S|} \\log\\frac{|S_V|}{|S|} } $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* _Information gain ratio_ może być lepszym wyborem heurystyki wskazującej najużyteczniejszy atrybut, jeżeli atrybuty mają wiele różnych wartości."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Drzewa decyzyjne a formuły logiczne\n",
|
|||
|
"\n",
|
|||
|
"Drzewo decyzyjne można pzekształcić na formułę logiczną w postaci normalnej (DNF):\n",
|
|||
|
"\n",
|
|||
|
"$$ Play={\\rm True} \\Leftrightarrow \\left( Outlook={\\rm Overcast} \\vee \\\\\n",
|
|||
|
"( Outlook={\\rm Rain} \\wedge Wind={\\rm Weak} ) \\vee \\\\\n",
|
|||
|
"( Outlook={\\rm Sunny} \\wedge Humidity={\\rm Normal} ) \\right) $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Klasyfikacja wieloklasowa przy użyciu drzew decyzyjnych\n",
|
|||
|
"\n",
|
|||
|
"Algorytm przebiega analogicznie, zmienia się jedynie wzór na entropię:\n",
|
|||
|
"\n",
|
|||
|
"$$ H(S) = -\\sum_{y \\in Y} p_{(y)} \\log p_{(y)} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Skuteczność algorytmu ID3\n",
|
|||
|
"\n",
|
|||
|
"* Przyjmujemy, że wśród danych uczących nie ma duplikatów (tj. przykładów, które mają jednakowe cechy $x$, a mimo to należą do różnych klas $y$).\n",
|
|||
|
"* Wówczas algorytm drzew decyzyjnych zawsze znajdzie rozwiązanie, ponieważ w ostateczności będziemy mieli węzły 1-elementowe na liściach drzewa."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Nadmierne dopasowanie drzew decyzyjnych\n",
|
|||
|
"\n",
|
|||
|
"* Zauważmy, że w miarę postępowania algorytmu dokładność przewidywań drzewa (*accuracy*) liczona na zbiorze uczącym dąży do 100% (i w ostateczności osiąga 100%, nawet kosztem jednoelementowych liści).\n",
|
|||
|
"* Takie rozwiązanie niekoniecznie jest optymalne. Dokładność na zbiorze testowym może być dużo niższa, a to oznacza nadmierne dopasowanie."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Jak zapobiec nadmiernemu dopasowaniu?\n",
|
|||
|
"\n",
|
|||
|
"Aby zapobiegać nadmiernemu dopasowaniu drzew decyzyjnych, należy je przycinać (*pruning*)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Można tego dokonywać na kilka sposobów:\n",
|
|||
|
"* Można zatrzymywać procedurę podziału w pewnym momencie (np. kiedy podzbiory staja się zbyt małe).\n",
|
|||
|
"* Można najpierw wykonać algorytm ID3 w całości, a następnie przyciąć drzewo, np. kierując się wynikami uzyskanymi na zbiorze walidacyjnym.\n",
|
|||
|
"* Algorytm _sub-tree replacement pruning_ (algorytm zachłanny)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Algorytm _Sub-tree replacement pruning_\n",
|
|||
|
"\n",
|
|||
|
"1. Dla każdego węzła:\n",
|
|||
|
" 1. Udaj, że usuwasz węzeł wraz z całym zaczepionym w nim poddrzewem.\n",
|
|||
|
" 1. Dokonaj ewaluacji na zbiorze walidacyjnym.\n",
|
|||
|
"1. Usuń węzeł, którego usunięcie daje największą poprawę wyniku.\n",
|
|||
|
"1. Powtarzaj, dopóki usuwanie węzłów poprawia wynik."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Zalety drzew decyzyjnych\n",
|
|||
|
"\n",
|
|||
|
"* Zasadę działania drzew decyzyjnych łatwo zrozumieć człowiekowi.\n",
|
|||
|
"* Atrybuty, które nie wpływają na wynik, mają _gain_ równy 0, zatem są od razu pomijane przez algorytm.\n",
|
|||
|
"* Po zbudowaniu, drzewo decyzyjne jest bardzo szybkim klasyfikatorem (złożoność $O(d)$, gdzie $d$ jest głębokościa drzewa)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Wady drzew decyzyjnych\n",
|
|||
|
"\n",
|
|||
|
"* ID3 jest algorytmem zachłannym – może nie wskazać najlepszego drzewa.\n",
|
|||
|
"* Nie da się otrzymać granic klas (*decision boundaries*), które nie są równoległe do osi wykresu."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Lasy losowe"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Algorytm lasów losowych – idea\n",
|
|||
|
"\n",
|
|||
|
"* Algorytm lasów losowych jest rozwinięciem algorytmu ID3.\n",
|
|||
|
"* Jest to bardzo wydajny algorytm klasyfikacji.\n",
|
|||
|
"* Zamiast jednego, będziemy budować $k$ drzew."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Algorytm lasów losowych – budowa lasu\n",
|
|||
|
"\n",
|
|||
|
"1. Weź losowy podzbiór $S_r$ zbioru uczącego.\n",
|
|||
|
"1. Zbuduj pełne (tj. bez przycinania) drzewo decyzyjne dla $S_r$, używając algorytmu ID3 z następującymi modyfikacjami:\n",
|
|||
|
" * podczas podziału używaj losowego $d$-elementowego podzbioru atrybutów,\n",
|
|||
|
" * obliczaj _gain_ względem $S_r$.\n",
|
|||
|
"1. Powyższą procedurę powtórz $k$-krotnie, otrzymując $k$ drzew ($T_1, T_2, \\ldots, T_k$)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Algorytm lasów losowych – predykcja\n",
|
|||
|
"\n",
|
|||
|
"1. Sklasyfikuj $x$ według każdego z drzew $T_1, T_2, \\ldots, T_k$ z osobna.\n",
|
|||
|
"1. Użyj głosowania większościowego: przypisz klasę przewidzianą przez najwięcej drzew."
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"celltoolbar": "Slideshow",
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.8.3"
|
|||
|
},
|
|||
|
"livereveal": {
|
|||
|
"start_slideshow_at": "selected",
|
|||
|
"theme": "white"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 4
|
|||
|
}
|