1275 lines
100 KiB
Plaintext
1275 lines
100 KiB
Plaintext
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## Uczenie maszynowe UMZ 2018/2019\n",
|
|||
|
"### 31 marca 2020\n",
|
|||
|
"# 4. Metody ewaluacji"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## 4.1. Metodologia testowania"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"W uczeniu maszynowym bardzo ważna jest ewaluacja budowanego modelu. Dlatego dobrze jest podzielić posiadane dane na odrębne zbiory – osobny zbiór danych do uczenia i osobny do testowania. W niektórych przypadkach potrzeba będzie dodatkowo wyodrębnić tzw. zbiór walidacyjny."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Zbiór uczący a zbiór testowy"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Na zbiorze uczącym (treningowym) uczymy algorytmy, a na zbiorze testowym sprawdzamy ich poprawność.\n",
|
|||
|
"* Zbiór uczący powinien być kilkukrotnie większy od testowego (np. 4:1, 9:1 itp.).\n",
|
|||
|
"* Zbiór testowy często jest nieznany.\n",
|
|||
|
"* Należy unikać mieszania danych testowych i treningowych – nie wolno „zanieczyszczać” danych treningowych danymi testowymi!"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Czasami potrzebujemy dobrać parametry modelu, np. $\\alpha$ – który zbiór wykorzystać do tego celu?"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Zbiór walidacyjny"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Do doboru parametrów najlepiej użyć jeszcze innego zbioru – jest to tzw. **zbiór walidacyjny**"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
" * Zbiór walidacyjny powinien mieć wielkość zbliżoną do wielkości zbioru testowego, czyli np. dane można podzielić na te trzy zbiory w proporcjach 3:1:1, 8:1:1 itp."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Walidacja krzyżowa"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Którą część danych wydzielić jako zbiór walidacyjny tak, żeby było „najlepiej”?"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
" * Niech każda partia danych pełni tę rolę naprzemiennie!"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"<img width=\"100%\" src=\"https://chrisjmccormick.files.wordpress.com/2013/07/10_fold_cv.png\"/>\n",
|
|||
|
"Żródło: https://chrisjmccormick.wordpress.com/2013/07/31/k-fold-cross-validation-with-matlab-code/"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Walidacja krzyżowa\n",
|
|||
|
"\n",
|
|||
|
"* Podziel dane $D = \\left\\{ (x^{(1)}, y^{(1)}), \\ldots, (x^{(m)}, y^{(m)})\\right\\} $ na $N$ rozłącznych zbiorów $T_1,\\ldots,T_N$\n",
|
|||
|
"* Dla $i=1,\\ldots,N$, wykonaj:\n",
|
|||
|
" * Użyj $T_i$ do walidacji i zbiór $S_i$ do trenowania, gdzie $S_i = D \\smallsetminus T_i$. \n",
|
|||
|
" * Zapisz model $\\theta_i$.\n",
|
|||
|
"* Akumuluj wyniki dla modeli $\\theta_i$ dla zbiorów $T_i$.\n",
|
|||
|
"* Ustalaj parametry uczenia na akumulowanych wynikach."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Walidacja krzyżowa – wskazówki\n",
|
|||
|
"\n",
|
|||
|
"* Zazwyczaj ustala się $N$ w przedziale od $4$ do $10$, tzw. $N$-krotna walidacja krzyżowa (_$N$-fold cross validation_). \n",
|
|||
|
"* Zbiór $D$ warto zrandomizować przed podziałem.\n",
|
|||
|
"* W jaki sposób akumulować wyniki dla wszystkich zbiórow $T_i$?\n",
|
|||
|
"* Po ustaleniu parametrów dla każdego $T_i$, trenujemy model na całych danych treningowych z ustalonymi parametrami.\n",
|
|||
|
"* Testujemy na zbiorze testowym (jeśli nim dysponujemy)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### _Leave-one-out_\n",
|
|||
|
"\n",
|
|||
|
"Jest to szczególny przypadek walidacji krzyżowej, w której $N = m$."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"* Jaki jest rozmiar pojedynczego zbioru $T_i$?\n",
|
|||
|
"* Jakie są zalety i wady tej metody?\n",
|
|||
|
"* Kiedy może być przydatna?"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Zbiór walidujący a algorytmy optymalizacji\n",
|
|||
|
"\n",
|
|||
|
"* Gdy błąd rośnie na zbiorze uczącym, mamy źle dobrany parametr $\\alpha$. Należy go wtedy zmniejszyć.\n",
|
|||
|
"* Gdy błąd zmniejsza się na zbiorze trenującym, ale rośnie na zbiorze walidującym, mamy do czynienia ze zjawiskiem **nadmiernego dopasowania** (_overfitting_).\n",
|
|||
|
"* Należy wtedy przerwać optymalizację. Automatyzacja tego procesu to _early stopping_."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## 4.2. Miary jakości"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Aby przeprowadzić ewaluację modelu, musimy wybrać **miarę** (**metrykę**), jakiej będziemy używać.\n",
|
|||
|
"\n",
|
|||
|
"Jakiej miary użyc najlepiej?\n",
|
|||
|
" * To zależy od rodzaju zadania.\n",
|
|||
|
" * Innych metryk używa się do regresji, a innych do klasyfikacji"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Metryki dla zadań regresji\n",
|
|||
|
"\n",
|
|||
|
"Dla zadań regresji możemy zastosować np.:\n",
|
|||
|
" * błąd średniokwadratowy (*root-mean-square error*, RMSE):\n",
|
|||
|
" $$ \\mathrm{RMSE} \\, = \\, \\sqrt{ \\frac{1}{m} \\sum_{i=1}^{m} \\left( \\hat{y}^{(i)} - y^{(i)} \\right)^2 } $$\n",
|
|||
|
" * średni błąd bezwzględny (*mean absolute error*, MAE):\n",
|
|||
|
" $$ \\mathrm{MAE} \\, = \\, \\frac{1}{m} \\sum_{i=1}^{m} \\left| \\hat{y}^{(i)} - y^{(i)} \\right| $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"W powyższych wzorach $y^{(i)}$ oznacza **oczekiwaną** wartości zmiennej $y$ w $i$-tym przykładzie, a $\\hat{y}^{(i)}$ oznacza wartość zmiennej $y$ w $i$-tym przykładzie wyliczoną (**przewidzianą**) przez nasz model."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"### Metryki dla zadań klasyfikacji\n",
|
|||
|
"\n",
|
|||
|
"Aby przedstawić kilka najpopularniejszych metryk stosowanych dla zadań klasyfikacyjnych, posłużmy się następującym przykładem:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 1,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przydatne importy\n",
|
|||
|
"\n",
|
|||
|
"import ipywidgets as widgets\n",
|
|||
|
"import matplotlib.pyplot as plt\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"import pandas\n",
|
|||
|
"import random\n",
|
|||
|
"import seaborn\n",
|
|||
|
"\n",
|
|||
|
"%matplotlib inline"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"def powerme(x1,x2,n):\n",
|
|||
|
" \"\"\"Funkcja, która generuje n potęg dla zmiennych x1 i x2 oraz ich iloczynów\"\"\"\n",
|
|||
|
" X = []\n",
|
|||
|
" for m in range(n+1):\n",
|
|||
|
" for i in range(m+1):\n",
|
|||
|
" X.append(np.multiply(np.power(x1,i),np.power(x2,(m-i))))\n",
|
|||
|
" return np.hstack(X)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"def plot_data_for_classification(X, Y, xlabel=None, ylabel=None, Y_predicted=[], highlight=None):\n",
|
|||
|
" \"\"\"Wykres danych dla zadania klasyfikacji\"\"\"\n",
|
|||
|
" fig = plt.figure(figsize=(16*.6, 9*.6))\n",
|
|||
|
" ax = fig.add_subplot(111)\n",
|
|||
|
" fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
|
|||
|
" X = X.tolist()\n",
|
|||
|
" Y = Y.tolist()\n",
|
|||
|
" X1n = [x[1] for x, y in zip(X, Y) if y[0] == 0]\n",
|
|||
|
" X1p = [x[1] for x, y in zip(X, Y) if y[0] == 1]\n",
|
|||
|
" X2n = [x[2] for x, y in zip(X, Y) if y[0] == 0]\n",
|
|||
|
" X2p = [x[2] for x, y in zip(X, Y) if y[0] == 1]\n",
|
|||
|
" \n",
|
|||
|
" if len(Y_predicted) > 0:\n",
|
|||
|
" Y_predicted = Y_predicted.tolist()\n",
|
|||
|
" X1tn = [x[1] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 0 and yp[0] == 0]\n",
|
|||
|
" X1fn = [x[1] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 1 and yp[0] == 0]\n",
|
|||
|
" X1tp = [x[1] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 1 and yp[0] == 1]\n",
|
|||
|
" X1fp = [x[1] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 0 and yp[0] == 1]\n",
|
|||
|
" X2tn = [x[2] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 0 and yp[0] == 0]\n",
|
|||
|
" X2fn = [x[2] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 1 and yp[0] == 0]\n",
|
|||
|
" X2tp = [x[2] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 1 and yp[0] == 1]\n",
|
|||
|
" X2fp = [x[2] for x, y, yp in zip(X, Y, Y_predicted) if y[0] == 0 and yp[0] == 1]\n",
|
|||
|
" \n",
|
|||
|
" if highlight == 'tn':\n",
|
|||
|
" ax.scatter(X1tn, X2tn, c='r', marker='x', s=100, label='Dane')\n",
|
|||
|
" ax.scatter(X1fn, X2fn, c='k', marker='o', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1tp, X2tp, c='k', marker='o', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1fp, X2fp, c='k', marker='x', s=50, label='Dane')\n",
|
|||
|
" elif highlight == 'fn':\n",
|
|||
|
" ax.scatter(X1tn, X2tn, c='k', marker='x', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1fn, X2fn, c='g', marker='o', s=100, label='Dane')\n",
|
|||
|
" ax.scatter(X1tp, X2tp, c='k', marker='o', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1fp, X2fp, c='k', marker='x', s=50, label='Dane')\n",
|
|||
|
" elif highlight == 'tp':\n",
|
|||
|
" ax.scatter(X1tn, X2tn, c='k', marker='x', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1fn, X2fn, c='k', marker='o', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1tp, X2tp, c='g', marker='o', s=100, label='Dane')\n",
|
|||
|
" ax.scatter(X1fp, X2fp, c='k', marker='x', s=50, label='Dane')\n",
|
|||
|
" elif highlight == 'fp':\n",
|
|||
|
" ax.scatter(X1tn, X2tn, c='k', marker='x', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1fn, X2fn, c='k', marker='o', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1tp, X2tp, c='k', marker='o', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1fp, X2fp, c='r', marker='x', s=100, label='Dane')\n",
|
|||
|
" else:\n",
|
|||
|
" ax.scatter(X1tn, X2tn, c='r', marker='x', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1fn, X2fn, c='g', marker='o', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1tp, X2tp, c='g', marker='o', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1fp, X2fp, c='r', marker='x', s=50, label='Dane')\n",
|
|||
|
"\n",
|
|||
|
" else:\n",
|
|||
|
" ax.scatter(X1n, X2n, c='r', marker='x', s=50, label='Dane')\n",
|
|||
|
" ax.scatter(X1p, X2p, c='g', marker='o', s=50, label='Dane')\n",
|
|||
|
" \n",
|
|||
|
" if xlabel:\n",
|
|||
|
" ax.set_xlabel(xlabel)\n",
|
|||
|
" if ylabel:\n",
|
|||
|
" ax.set_ylabel(ylabel)\n",
|
|||
|
" \n",
|
|||
|
" ax.margins(.05, .05)\n",
|
|||
|
" return fig"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wczytanie danych\n",
|
|||
|
"import pandas\n",
|
|||
|
"import numpy as np\n",
|
|||
|
"\n",
|
|||
|
"alldata = pandas.read_csv('data-metrics.tsv', sep='\\t')\n",
|
|||
|
"data = np.matrix(alldata)\n",
|
|||
|
"\n",
|
|||
|
"m, n_plus_1 = data.shape\n",
|
|||
|
"n = n_plus_1 - 1\n",
|
|||
|
"\n",
|
|||
|
"X2 = powerme(data[:, 1], data[:, 2], n)\n",
|
|||
|
"Y2 = np.matrix(data[:, 0]).reshape(m, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAm0AAAFmCAYAAAA/JK3gAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3dfXRc9X3n8c9XYDkbWS22cVLH4EBqNQ04u0BVmm18KkIDIfoDDw6JTMiWtM5ySJPiAu3inHSbHNKcQrInirKlaalDQ3d9YCiRhbtV6uWxPd4NKYLlwYZDpJAtuGKDY5N0rDSSyHz3j3uvfTWakUb2zH2Yeb/OmaO5v3tn/Js7dzyfuff3YO4uAAAAZFtH2hUAAADA4ghtAAAAOUBoAwAAyAFCGwAAQA4Q2gAAAHKA0AYAAJADp6ZdgTScfvrpftZZZ6VdDQAAgDmeeOKJH7j7mmrr2jK0nXXWWRobG0u7GgAAAHOY2T/VWsflUQAAgBwgtAEAAOQAoQ0AACAHCG0AAAA5kInQZmZ3mtmrZra/xnozs6+Y2YSZPWNmF8TWXWNm4+HtmuRqDQAAkJxMhDZJX5d02QLr3y+pJ7xdK+mrkmRmqyR9RtKvSLpQ0mfMbGVTawoAAJCCTIQ2d/8HSUcW2GSzpL/ywGOSTjOztZLeJ+kBdz/i7q9JekALhz8AAIBcykRoq8M6SS/Hlg+GZbXKAQAAWkpeQptVKfMFyuc/gdm1ZjZmZmOHDh1qaOUAAACaLS+h7aCkM2PLZ0iaXKB8Hne/w9173b13zZqqs0MAAABkVl5C2x5JvxH2In2XpB+5+yuS9kq61MxWhh0QLg3LAMS5S7t3B3/rKQcAZE4mQpuZ3S3pW5LebmYHzWybmV1nZteFm4xKelHShKS/kPTbkuTuRyR9TtLj4e2WsAxA3MiItGWLdMMNxwOae7C8ZUuwHgCQaZmYMN7dr1pkvUv6RI11d0q6sxn1AlpGoSBt3y4NDQXLg4NBYBsaCsoLhXTrBwBYVCZCG4AmMwuCmhQEtSi8bd8elFu1Pj0AgCwxb8O2LL29vT42NpZ2NYDkuUsdsVYR5TKBDQAyxMyecPfeausy0aYNQAKiNmxx8TZuAIBMI7QB7SAKbFEbtnL5eBs3ghsA5AJt2oB2MDJyPLBFbdjibdz6+qQrrki3jgCABRHagHZQKEjDw8HfqA1bFNz6+ug9CgA5QGgD2oFZ9TNptcoBAJlDmzYAAIAcILQBAADkAKENAAAgBwhtAAAAOUBoAwAAyAFCGwAAQA4Q2gAAAHKAcdoAAKhQmi6peKCo8cPj6lndo4FzB9S9vDvtaqHNEdoAAIjZ99I+9e/qV9nLmpqdUteyLt2490aNXj2qTes3pV09tDEujwIAECpNl9S/q1+lmZKmZqckSVOzUyrNBOVHZ46mXEO0M0IbAACh4oGiyl6uuq7sZRX3FxOuEXAcoQ0AgND44fFjZ9gqTc1OaeLIRMI1Ao4jtAEAEOpZtUFdtrzqui5brg2rfj7hGgHHEdoAAAgNfK9LHT+Zrrqu4yfTGnjxjQnXCDiO0AYAQKj7A1dpdPpKdU9LXR4MsNDlp6p7WhqdvlIrPnBVyjVEO2PIDwAAImba9F/u1eQNn1Dx776qiVXShiOva+DXPq4Vg7dLZmnXEG3M3D3tOiSut7fXx8bG0q4GACCr3KWO2MWocpnAhkSY2RPu3lttHZdHAQCIc5duuGFu2Q03BOVAightAABEosA2NCRt3x6cYdu+PVgmuCFltGkDACAyMnI8sA0OBpdEBweDdUNDUl+fdMUV6dYRbSsToc3MLpM0JOkUSTvd/daK9YOS3hMuvlHSm9z9tHDdTyU9G657yd0vT6bWAICWUyhIw8PB36gNWxTc+vqCciAlqXdEMLNTJH1H0iWSDkp6XNJV7v5cje1/R9L57v5b4fJRd1+xlH+TjggAACCLst4R4UJJE+7+orvPSLpH0uYFtr9K0t2J1AwAACAjshDa1kl6ObZ8MCybx8zeKulsSQ/Hit9gZmNm9piZcd4aQHLcpd275zdOr1UOACchC6Gt2sA3tf6n2yrpPnf/aaxsfXga8cOSvmxmVSeGM7Nrw3A3dujQoZOrMQBIQaP1LVvm9iqMeh9u2RKsB4AGyUJoOyjpzNjyGZIma2y7VRWXRt19Mvz7oqRHJZ1f7YHufoe797p775o1a062zgAQNEqvHA4iPlwEjdYBNFAWeo8+LqnHzM6W9M8KgtmHKzcys7dLWinpW7GylZJ+7O7TZna6pHdL+kIitQaAyuEghoaC+/HhIgCgQVI/0+bur0v6pKS9kp6XdK+7HzCzW8wsPnzHVZLu8bndXd8haczMnpb0iKRba/U6BYCmiAe3CIENQBNk4Uyb3H1U0mhF2R9WLH+2yuP+t6R3NrVyALCQWlMeEdwANFjqZ9oAILeY8ghAgjJxpg0AcokpjwAkiNAGACeKKY8AJIjQBgAnyqz6mbRa5QBwEmjTBgAAkAOENgAAgBwgtAEAAOQAoQ0AACAHCG0AAAA5QGgDAADIAUIbAABADhDaAAAAcoDQBgAAkAOENgAAgBwgtAEAAOQAoQ0AACAHCG0AAAA5QGgDAADIAUIbAABADhDaAAAAcoDQBgAAkAOENgAAgBwgtAEAAOQAoQ0AACAHCG0AAAA5QGgDAADIAUIbAABADmQitJnZZWb2gplNmNmOKus/amaHzOyp8Pax2LprzGw8vF2TbM0BAACScWraFTCzUyTdLukSSQclPW5me9z9uYpNi+7+yYrHrpL0GUm9klzSE+FjX0ug6gAAAInJwpm2CyVNuPuL7j4j6R5Jm+t87PskPeDuR8Kg9oCky5pUTwAAgNRkIbStk/RybPlgWFbpA2b2jJndZ2ZnLvGxMrNrzWzMzMYOHTrUiHoDAAAkJguhzaqUecXy30g6y93/raQHJd21hMcGhe53uHuvu/euWbPmhCsLpK00XdLOJ3fq5gdu1s4nd6o0XUq7SgCABKTepk3B2bEzY8tnSJqMb+Duh2OLfyHptthjL6p47KMNryGQEfte2qf+Xf0qe1lTs1PqWtalG/feqNGrR7Vp/aa0qwcAaKIsnGl7XFKPmZ1tZp2StkraE9/AzNbGFi+X9Hx4f6+kS81spZmtlHRpWAa0nNJ0Sf27+lWaKWlqdkqSNDU7pdJMUH505mjKNQQANFPqoc3dX5f0SQVh63lJ97r7ATO7xcwuDze73swOmNnTkq6X9NHwsUckfU5B8Htc0i1hGdByigeKKnu56rqyl1XcX0y4RgCAJGXh8qjcfVTSaEXZH8buf0rSp2o89k5Jdza1gkAGjB8eP3aGrdLU7JQmjkwkXCMAQJJSP9MGoD49q3vUtayr6rquZV3asGpDwjUCACSJ0AbkxMC5A+qw6h/ZDuvQwMaBhGtUhbu0e3fwt55yAEDdCG1ATnQv79bo1aPq7uw+dsata1mXujuD8hWdK1KuoaSREWnLFumGG44HNPdgecuWYD0A4IRkok0bgPpsWr9JkzdNqri/qIkjE9qwaoMGNg5kI7BJUqEgbd8uDQ0Fy4ODQWAbGgrKC4V06wcAOWbehpcrent7fWxsLO1qAK0pOrMWBTcpCGyDg5JVGw8bABAxsyfcvbfqOkIbgIZzlzpirS/KZQIbANRhodBGm7ZWRqNwpCE60xYXb+MGADghhLZWRqNwJC1+aXT79uAMW9TGjeAGACeFjgitjEbhSNrIyPHjK2rDNjgYrBsakvr6pCuuSLeOAJBTtGlrdTQKR5Lcg+BWKMw9vmqVAwDmoCNChbYKbRKNwgEAyAk6IrQzGoUDANASCG2tjEbhAAC0DDoitDIahQMA0DIIba2sUJCGh+c2/o6CW18fvUcBAMgRLo+2MrPgTFplp4Na5WhvDMYMAJlGaAOg0nRJO7/2Cd38p1u08+ZLVPrJvwQrGIy55ZWmS9r55E7d/MDN2vnkTpWmS2lXCUANDPkBtLl9L+1T/65+lb2sqdkpdU1LHZ3LNPqbD2nT4Dfmt4tEy5j33i/rUod1aPTqUW1avynt6jU
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data_for_classification(X2, Y2, xlabel=r'$x_1$', ylabel=r'$x_2$')"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"def safeSigmoid(x, eps=0):\n",
|
|||
|
" \"\"\"Funkcja sigmoidalna zmodyfikowana w taki sposób, \n",
|
|||
|
" żeby wartości zawsz były odległe od asymptot o co najmniej eps\n",
|
|||
|
" \"\"\"\n",
|
|||
|
" y = 1.0/(1.0 + np.exp(-x))\n",
|
|||
|
" if eps > 0:\n",
|
|||
|
" y[y < eps] = eps\n",
|
|||
|
" y[y > 1 - eps] = 1 - eps\n",
|
|||
|
" return y\n",
|
|||
|
"\n",
|
|||
|
"def h(theta, X, eps=0.0):\n",
|
|||
|
" \"\"\"Funkcja hipotezy (regresja logistyczna)\"\"\"\n",
|
|||
|
" return safeSigmoid(X*theta, eps)\n",
|
|||
|
"\n",
|
|||
|
"def J(h,theta,X,y, lamb=0):\n",
|
|||
|
" \"\"\"Funkcja kosztu dla regresji logistycznej\"\"\"\n",
|
|||
|
" m = len(y)\n",
|
|||
|
" f = h(theta, X, eps=10**-7)\n",
|
|||
|
" j = -np.sum(np.multiply(y, np.log(f)) + \n",
|
|||
|
" np.multiply(1 - y, np.log(1 - f)), axis=0)/m\n",
|
|||
|
" if lamb > 0:\n",
|
|||
|
" j += lamb/(2*m) * np.sum(np.power(theta[1:],2))\n",
|
|||
|
" return j\n",
|
|||
|
"\n",
|
|||
|
"def dJ(h,theta,X,y,lamb=0):\n",
|
|||
|
" \"\"\"Gradient funkcji kosztu\"\"\"\n",
|
|||
|
" g = 1.0/y.shape[0]*(X.T*(h(theta,X)-y))\n",
|
|||
|
" if lamb > 0:\n",
|
|||
|
" g[1:] += lamb/float(y.shape[0]) * theta[1:] \n",
|
|||
|
" return g\n",
|
|||
|
"\n",
|
|||
|
"def classifyBi(theta, X):\n",
|
|||
|
" \"\"\"Funkcja predykcji - klasyfikacja dwuklasowa\"\"\"\n",
|
|||
|
" prob = h(theta, X)\n",
|
|||
|
" return prob"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"def GD(h, fJ, fdJ, theta, X, y, alpha=0.01, eps=10**-3, maxSteps=10000):\n",
|
|||
|
" \"\"\"Metoda gradientu prostego dla regresji logistycznej\"\"\"\n",
|
|||
|
" errorCurr = fJ(h, theta, X, y)\n",
|
|||
|
" errors = [[errorCurr, theta]]\n",
|
|||
|
" while True:\n",
|
|||
|
" # oblicz nowe theta\n",
|
|||
|
" theta = theta - alpha * fdJ(h, theta, X, y)\n",
|
|||
|
" # raportuj poziom błędu\n",
|
|||
|
" errorCurr, errorPrev = fJ(h, theta, X, y), errorCurr\n",
|
|||
|
" # kryteria stopu\n",
|
|||
|
" if abs(errorPrev - errorCurr) <= eps:\n",
|
|||
|
" break\n",
|
|||
|
" if len(errors) > maxSteps:\n",
|
|||
|
" break\n",
|
|||
|
" errors.append([errorCurr, theta]) \n",
|
|||
|
" return theta, errors"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 8,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"theta = [[ 1.37136167]\n",
|
|||
|
" [ 0.90128948]\n",
|
|||
|
" [ 0.54708112]\n",
|
|||
|
" [-5.9929264 ]\n",
|
|||
|
" [ 2.64435168]\n",
|
|||
|
" [-4.27978238]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Uruchomienie metody gradientu prostego dla regresji logistycznej\n",
|
|||
|
"theta_start = np.matrix(np.zeros(X2.shape[1])).reshape(X2.shape[1],1)\n",
|
|||
|
"theta, errors = GD(h, J, dJ, theta_start, X2, Y2, \n",
|
|||
|
" alpha=0.1, eps=10**-7, maxSteps=10000)\n",
|
|||
|
"print('theta = {}'.format(theta))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"def plot_decision_boundary(fig, theta, X):\n",
|
|||
|
" \"\"\"Wykres granicy klas\"\"\"\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" xx, yy = np.meshgrid(np.arange(-1.0, 1.0, 0.02),\n",
|
|||
|
" np.arange(-1.0, 1.0, 0.02))\n",
|
|||
|
" l = len(xx.ravel())\n",
|
|||
|
" C = powerme(xx.reshape(l, 1), yy.reshape(l, 1), n)\n",
|
|||
|
" z = classifyBi(theta, C).reshape(int(np.sqrt(l)), int(np.sqrt(l)))\n",
|
|||
|
"\n",
|
|||
|
" plt.contour(xx, yy, z, levels=[0.5], lw=3);"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 10,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"Y_expected = Y2.astype(int)\n",
|
|||
|
"Y_predicted = (classifyBi(theta, X2) > 0.5).astype(int)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 11,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przygotowanie interaktywnego wykresu\n",
|
|||
|
"\n",
|
|||
|
"dropdown_highlight = widgets.Dropdown(options=['all', 'tp', 'fp', 'tn', 'fn'], value='all', description='highlight')\n",
|
|||
|
"\n",
|
|||
|
"def interactive_classification(highlight):\n",
|
|||
|
" fig = plot_data_for_classification(X2, Y2, xlabel=r'$x_1$', ylabel=r'$x_2$',\n",
|
|||
|
" Y_predicted=Y_predicted, highlight=highlight)\n",
|
|||
|
" plot_decision_boundary(fig, theta, X2)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stderr",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:10: UserWarning: The following kwargs were not used by contour: 'lw'\n",
|
|||
|
" # Remove the CWD from sys.path while we load stuff.\n"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFmCAYAAADQ5sbeAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzdd3iV5f3H8fedkARIwkgIKxBWwgYRoqCCoODCFVAMiuun1Wq1ZWgLdc+KODDaalXqaKUaqwFpBReKypCNbEjYECBAgAwg69y/P06Ch5DISvKc8Xld17nOOc/znOR7IOOTexprLSIiIiLivYKcLkBEREREfp0Cm4iIiIiXU2ATERER8XIKbCIiIiJeToFNRERExMspsImIiIh4uVpOF+CERo0a2datWztdhkjVOHAANmyAxo2hZctfjm/bBllZ0K4dNGjgXH0iInLSFi9evNdaG1P+eEAGttatW7No0SKnyxCpGtbC6NGQkgI33ggTJ/7yfORI93NjnK5SREROgjFmS0XHAzKwifgVY9yhDNwhLSXF/VhhTUTEb5hA3OkgMTHRqoVN/I61EOQxLNXlUlgTEfExxpjF1trE8sc16UDEH5R1i3oaPdp9XEREfJ4Cm4iv8xzDNnKku2Vt5Ej3c4U2ERG/oDFsIr5u6tTjJxh4jmnr3x+GDHG2RhEROSMKbCK+LikJ0tLc92Vj1spCW//+7uMiIuLTFNhEfJ0xFbegVXZcRER8jsawiYiIiHg5BTYRERERL6fAJiIiIuLlFNhEREREvJwCm4iIiIiXU2ATERER8XIKbCIiIiJeToFNRERExMspsImIiIh4Oe10ICIicpJyC3JJXZVK+r50EqITSO6STGRYpNNlSQBQYBMRETkJs7fOZvDkwbisi/yifMJDwhnz5Rimj5hO37i+Tpcnfk5doiIiIieQW5DL4MmDyS3MJb8oH4D8onxyC93H8wrzHK5Q/J0Cm4iIyAmkrkrFZV0VnnNZF6krU2u4Igk0CmwiIiInkL4v/WjLWnn5RflkZGfUcEUSaBTYRERETiAhKp5wE1bhuXATRnxUuxquSAKNApuIiMgJJG8KJ+hIQYXngo4UkLyxbg1XJIFGgU1EROQEIq+7kekF1xNZAOHWvcBCuK1FZAFML7ieiOtudLhC8Xda1kNEROREjKHvix+TOfo+Ur94g4woiM8uJvnCe4mY+DcwxukKxc8Za63TNdS4xMREu2jRIqfLEBERX2MtBHl0TrlcCmtSpYwxi621ieWPq0tURETkZFgLo0cfe2z0aPdxkWqmwCYiInIiZWEtJQVGjnS3rI0c6X6u0CY1wCvGsBljLgdSgGBgkrV2fLnzE4GLSp/WBRpbaxuUnisBVpSe22qtvaZmqhYRkYAxdeovYW3iRHc36MSJ7nMpKdC/PwwZ4myN4tccH8NmjAkG1gOXANuBhcCN1trVlVz/e+Bsa+0dpc/zrLURp/I5NYZNREROibXu0JaUdOyYtcqOi5wmbx7Ddi6QYa3daK0tBD4Crv2V628EPqyRykRERMAdxoYMOT6UVXZcpIp5Q2CLBbZ5PN9eeuw4xphWQBvgW4/DtY0xi4wxPxljkqqvTBERERFneENgq+jPksr6aYcDn1hrSzyOxZU2Hd4EvGKMqXB/EGPM3aXBbtGePXvOrGIRCVzWwpQpxw8yr+y4iEgV8IbAth1o6fG8BZBZybXDKdcdaq3NLL3fCMwCzq7ohdbat6y1idbaxJiYmDOtWUQC1dSpMHTosTMDy2YQDh3qPi8iUsW8IbAtBBKMMW2MMaG4Q9m08hcZYzoADYF5HscaGuPejdcY0wi4AKhwsoKISJVISjp+OQfP5R6SNDJDRKqe48t6WGuLjTH3A1/iXtbjHWvtKmPMU8Aia21ZeLsR+MgeO621E/CmMcaFO3yOr2x2qYhIlSi/nENKivux53IPIiJVzPFlPZygZT1E5IxpiyIRqQbevKyHiIhv0RZFIlLDFNhERE6FtigSEQc4PoZNRMSnaIsiEXGAApuIyKlISoK0tGO3IioLbf37a5aoiFQLBTYRkVNRthXRyR4XEakCGsMmIiIi4uUU2ERERES8nAKbiIiIiJdTYBMRERHxcgpsIiIiIl5OgU1ERETEyymwiYiIiHg5BTYRERERL6fAJiIiIuLlFNhEREREvJwCm4iIiIiX016iIiJCSUkJh3IOk3cgn/yDhziSd4TDeUc4kl/A4dLHhYcLKTxSROGRX+6LCoopKSmhpLiEkmIXJcUluIpLcLlshZ8nKDiIWiHBBNcqvQUHUSu0FqG1QwmtHUJo7VBCaocQVieU2uFh1I2sQ53IOtSNrE3tiNqE16tLZFQEdevVIShIbQ4SOBTYRET8jLWW3P157N99kAO7D5K96wAH9+RwcG8OB/fmkrOv9H5vLnkH8sk7kM+hnMMn/fGDgow7YNUJpVZoLY8AFkRwrWCCgoMICj4+TFlrcZW4jgl2JcUuigqKKCwoovBwIcVFJSddQ0TDCCKj3LcGMfXctyYNaNikPg0a1yeqaQMaxUYRHRtFnfDaJ/3+RLyRApuIiA8pKixi7/ZssrbuZc/2fezdkc2+Hdnszcx2P87M5sDugxQVFh/3WmMMkVER1IuOoF6jejRu1Yh2PVoT0SD86C28QV3q1qtL3cja1ImoTe3wsvswwuqGEVo7hOBawRhjquX9lZSUUFRQTOHhwqMte4dyD3Ok9D7vwCHy9ueRtz+fnOw8cvfnkbMvlz3b95G+ZCMHsnIoKT4+9EU2DKdRi2gaxUbRuGUjmrZpfPTWpHVjGsTUq7b3JFIVFNhERLxISXEJWdv2snNjFrs27mbnxt3s2pxF1ta97N6yh+ydB7D22O7GuvXq0Cg2ikaxUbS8uCtRTRoQ1bQhDZq4W5kaNqlP/Zh6REZFEBwc7NA7OznBwcEE1w2mdt0w6kVHnvLrXS4Xefvz2Z91kP27DrB3RzZ7t+87Gm73bN/H+kUbOLg395jX1Q4PIzahGS3aN6NFQnNi2zejRfvmxHVsTnj98Kp6eyKnzZT/xg8EiYmJdtGiRU6XISIBylpL9q4DbF+Xyfb1mWxbl8mO9J1sW5fJrk1Zx7QQBdcKpkmrRjRuFUPjuEY0iXPfN45rRExLd4tRnYg6Dr4b33Q47zC7Nu9h16Ysdm3KYufG3ezI2Mn29TvZtSkLV4nr6LVNWsXQpnscbbu1ok23ONp0b0WL9s28PvyKbzLGLLbWJh53XIFNRKR6WGvJ2rqXLau3s2X1drau3saWNe7HnmPGQmuH0KK9u1Untl1TmrVrSrO2jWnerimNWkQpGNSwosIidm3KYtu6TLas2s6mlVvYtHwrW9fuOBrkatcNI75nG9r3akf7xHa0T2xLbEIzTYSQM6bA5kGBTUSqWt6BfDb8vJlNK7ayecVWNq3cyuaV2ziU+0swa9ikPq06tyCuUwtadoylZYfmtOzQnEYtovWL3gcUFhSxdc12Ni3fSvqSjaxfvIGMJZsoOFwIuLumO/ZOoHOf9nQ+vwOdz2tPeL26DlctvkaBzYMCm4iciYN7c0hfsomMJRtJX7qJ9MUb2blx99HzkQ3Dad0tjjZd42jTrRWtu7hD2umMyRLvVlJcwta1O1i/aAPrFmSw+qf1bFq+BZfLEhRkaNejNd37d6HnoO50u7CTZqvKCSmweVBgE5GTdSj3MOmLN7JuYQbrSn8p796y5+j5Zm2bEN+zDQlnt6Vdj9a07R5HdPMozTgMYIdyD7N2fjorflzDih/XsHreeooKiqgVEkzn8zvQc2B3zh7UjQ7ntFN3txxHgc2DApuIVMTlcrFt7Q5Wz1vP6nnrWTs/nS2rtx+dldm0TWM6nNOODonxJPRyB7TIhhEOVy3e7sihAlbNWcuSb1awdOZyMpZuxlpLZFQE51zeg96De5J4eQ/qRakFVhTYjqHAJiLgHpO0fmEGy39Yw8o5a1kzbz15B/IBiIyKoFOfBDqem0CHc+LpcE476jeq53DF4g8O7s1h6cwVLJixlIUzlnJgTw5BwUF079+ZvkN60++63kQ
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
},
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"<function __main__.interactive_classification(highlight)>"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"widgets.interact(interactive_classification, highlight=dropdown_highlight)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Zadanie klasyfikacyjne z powyższego przykładu polega na przypisaniu punktów do jednej z dwóch kategorii:\n",
|
|||
|
" 0. <font color=\"red\">czerwone krzyżyki</font>\n",
|
|||
|
" 1. <font color=\"green\">zielone kółka</font>\n",
|
|||
|
"\n",
|
|||
|
"W tym celu zastosowano regresję logistyczną.\n",
|
|||
|
"\n",
|
|||
|
"W rezultacie otrzymano model, który dzieli płaszczyznę na dwa obszary:\n",
|
|||
|
" 0. <font color=\"red\">na zewnątrz granatowej krzywej</font>\n",
|
|||
|
" 1. <font color=\"green\">wewnątrz granatowej krzywej</font>\n",
|
|||
|
" \n",
|
|||
|
"Model przewiduje klasę <font color=\"red\">0 („czerwoną”)</font> dla punktów znajdujący się w obszarze na zewnątrz krzywej, natomiast klasę <font color=\"green\">1 („zieloną”)</font> dla punktów znajdujących sie w obszarze wewnąrz krzywej."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Wszysktie obserwacje możemy podzielić zatem na cztery grupy:\n",
|
|||
|
" * **true positives (TP)** – prawidłowo sklasyfikowane pozytywne przykłady (<font color=\"green\">zielone kółka</font> w <font color=\"green\">wewnętrznym obszarze</font>)\n",
|
|||
|
" * **true negatives (TN)** – prawidłowo sklasyfikowane negatywne przykłady (<font color=\"red\">czerwone krzyżyki</font> w <font color=\"red\">zewnętrznym obszarze</font>)\n",
|
|||
|
" * **false positives (FP)** – negatywne przykłady sklasyfikowane jako pozytywne (<font color=\"red\">czerwone krzyżyki</font> w <font color=\"green\">wewnętrznym obszarze</font>)\n",
|
|||
|
" * **false negatives (FN)** – pozytywne przykłady sklasyfikowane jako negatywne (<font color=\"green\">zielone kółka</font> w <font color=\"red\">zewnętrznym obszarze</font>)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Innymi słowy:\n",
|
|||
|
"\n",
|
|||
|
"<img width=\"50%\" src=\"https://blog.aimultiple.com/wp-content/uploads/2019/07/positive-negative-true-false-matrix.png\">"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"metadata": {},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"TP = 5\n",
|
|||
|
"TN = 35\n",
|
|||
|
"FP = 3\n",
|
|||
|
"FN = 6\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"# Obliczmy TP, TN, FP i FN\n",
|
|||
|
"\n",
|
|||
|
"tp = 0\n",
|
|||
|
"tn = 0\n",
|
|||
|
"fp = 0\n",
|
|||
|
"fn = 0\n",
|
|||
|
"\n",
|
|||
|
"for i in range(len(Y_expected)):\n",
|
|||
|
" if Y_expected[i] == 1 and Y_predicted[i] == 1:\n",
|
|||
|
" tp += 1\n",
|
|||
|
" elif Y_expected[i] == 0 and Y_predicted[i] == 0:\n",
|
|||
|
" tn += 1\n",
|
|||
|
" elif Y_expected[i] == 0 and Y_predicted[i] == 1:\n",
|
|||
|
" fp += 1\n",
|
|||
|
" elif Y_expected[i] == 1 and Y_predicted[i] == 0:\n",
|
|||
|
" fn += 1\n",
|
|||
|
" \n",
|
|||
|
"print('TP =', tp)\n",
|
|||
|
"print('TN =', tn)\n",
|
|||
|
"print('FP =', fp)\n",
|
|||
|
"print('FN =', fn)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Możemy teraz zdefiniować następujące metryki:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Dokładność (_accuracy_)\n",
|
|||
|
"$$ \\mbox{accuracy} = \\frac{\\mbox{przypadki poprawnie sklasyfikowane}}{\\mbox{wszystkie przypadki}} = \\frac{TP + TN}{TP + TN + FP + FN} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Dokładność otrzymujemy przez podzielenie liczby przypadków poprawnie sklasyfikowanych przez liczbę wszystkich przypadków:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Accuracy: 0.8163265306122449\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
|
|||
|
"print('Accuracy:', accuracy)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"**Uwaga:** Nie zawsze dokładność będzie dobrą miarą, zwłaszcza gdy klasy są bardzo asymetryczne!\n",
|
|||
|
"\n",
|
|||
|
"*Przykład:* Wyobraźmy sobie test na koronawirusa, który **zawsze** zwraca wynik negatywny. Jaką przydatność będzie miał taki test w praktyce? Żadną. A jaka będzie jego *dokładność*? Policzmy:\n",
|
|||
|
"$$ \\mbox{accuracy} \\, = \\, \\frac{\\mbox{szacowana liczba osób zdrowych na świecie}}{\\mbox{populacja Ziemi}} \\, \\approx \\, \\frac{7\\,700\\,000\\,000 - 600\\,000}{7\\,700\\,000\\,000} \\, \\approx \\, 0.99992 $$\n",
|
|||
|
"(zaokrąglone dane z 27 marca 2020)\n",
|
|||
|
"\n",
|
|||
|
"Powyższy wynik jest tak wysoki, ponieważ zdecydowana większość osób na świecie nie jest zakażona, więc biorąc losowego Ziemianina możemy w ciemno strzelać, że nie ma koronawirusa.\n",
|
|||
|
"\n",
|
|||
|
"W tym przypadku duża różnica w liczności obu zbiorów (zakażeni/niezakażeni) powoduje, że *accuracy* nie jest dobrą metryką.\n",
|
|||
|
"\n",
|
|||
|
"Dlatego dysponujemy również innymi metrykami:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Precyzja (_precision_)\n",
|
|||
|
"$$ \\mbox{precision} = \\frac{TP}{TP + FP} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Precision: 0.625\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"precision = tp / (tp + fp)\n",
|
|||
|
"print('Precision:', precision)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Precyzja określa, jaka część przykładów sklasyfikowanych jako pozytywne to faktycznie przykłady pozytywne."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### Pokrycie (czułość, _recall_)\n",
|
|||
|
"$$ \\mbox{recall} = \\frac{TP}{TP + FN} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"Recall: 0.45454545454545453\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"recall = tp / (tp + fn)\n",
|
|||
|
"print('Recall:', recall)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Pokrycie mówi nam, jaka część przykładów pozytywnych została poprawnie sklasyfikowana."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"#### _$F$-measure_ (_$F$-score_)\n",
|
|||
|
"$$ F = \\frac{2 \\cdot \\mbox{precision} \\cdot \\mbox{recall}}{\\mbox{precision} + \\mbox{recall}} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"F-score: 0.5263157894736842\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fscore = (2 * precision * recall) / (precision + recall)\n",
|
|||
|
"print('F-score:', fscore)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"$F$-_measure_ jest kompromisem między precyzją a pokryciem (a ściślej: jest średnią harmoniczną precyzji i pokrycia)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"$F$-_measure_ jest szczególnym przypadkiem ogólniejszej miary:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"_$F_\\beta$-measure_:\n",
|
|||
|
"$$ F_\\beta = \\frac{(1 + \\beta) \\cdot \\mbox{precision} \\cdot \\mbox{recall}}{\\beta^2 \\cdot \\mbox{precision} + \\mbox{recall}} $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "fragment"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Dla $\\beta = 1$ otrzymujemy:\n",
|
|||
|
"$$ F_1 \\, = \\, \\frac{(1 + 1) \\cdot \\mbox{precision} \\cdot \\mbox{recall}}{1^2 \\cdot \\mbox{precision} + \\mbox{recall}} \\, = \\, \\frac{2 \\cdot \\mbox{precision} \\cdot \\mbox{recall}}{\\mbox{precision} + \\mbox{recall}} \\, = \\, F $$"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "slide"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"## 4.3. Obserwacje odstające"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"**Obserwacje odstające** (*outliers*) – to wszelkie obserwacje posiadające nietypową wartość.\n",
|
|||
|
"\n",
|
|||
|
"Mogą być na przykład rezultatem błędnego pomiaru albo pomyłki przy wprowadzaniu danych do bazy, ale nie tylko.\n",
|
|||
|
"\n",
|
|||
|
"Obserwacje odstające mogą niekiedy znacząco wpłynąć na parametry modelu, dlatego ważne jest, żeby takie obserwacje odrzucić zanim przystąpi się do tworzenia modelu."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"W poniższym przykładzie można zobaczyć wpływ obserwacji odstających na wynik modelowania na przykładzie danych dotyczących cen mieszkań zebranych z ogłoszeń na portalu Gratka.pl: tutaj przykładem obserwacji odstającej może być ogłoszenie, w którym podano cenę w tys. zł zamiast ceny w zł."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 18,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Przydatne funkcje\n",
|
|||
|
"\n",
|
|||
|
"def h_linear(Theta, x):\n",
|
|||
|
" \"\"\"Funkcja regresji liniowej\"\"\"\n",
|
|||
|
" return x * Theta\n",
|
|||
|
"\n",
|
|||
|
"def linear_regression(theta):\n",
|
|||
|
" \"\"\"Ta funkcja zwraca funkcję regresji liniowej dla danego wektora parametrów theta\"\"\"\n",
|
|||
|
" return lambda x: h_linear(theta, x)\n",
|
|||
|
"\n",
|
|||
|
"def cost(theta, X, y):\n",
|
|||
|
" \"\"\"Wersja macierzowa funkcji kosztu\"\"\"\n",
|
|||
|
" m = len(y)\n",
|
|||
|
" J = 1.0 / (2.0 * m) * ((X * theta - y).T * (X * theta - y))\n",
|
|||
|
" return J.item()\n",
|
|||
|
"\n",
|
|||
|
"def gradient(theta, X, y):\n",
|
|||
|
" \"\"\"Wersja macierzowa gradientu funkcji kosztu\"\"\"\n",
|
|||
|
" return 1.0 / len(y) * (X.T * (X * theta - y)) \n",
|
|||
|
"\n",
|
|||
|
"def gradient_descent(fJ, fdJ, theta, X, y, alpha=0.1, eps=10**-5):\n",
|
|||
|
" \"\"\"Algorytm gradientu prostego (wersja macierzowa)\"\"\"\n",
|
|||
|
" current_cost = fJ(theta, X, y)\n",
|
|||
|
" logs = [[current_cost, theta]]\n",
|
|||
|
" while True:\n",
|
|||
|
" theta = theta - alpha * fdJ(theta, X, y)\n",
|
|||
|
" current_cost, prev_cost = fJ(theta, X, y), current_cost\n",
|
|||
|
" if abs(prev_cost - current_cost) > 10**15:\n",
|
|||
|
" print('Algorithm does not converge!')\n",
|
|||
|
" break\n",
|
|||
|
" if abs(prev_cost - current_cost) <= eps:\n",
|
|||
|
" break\n",
|
|||
|
" logs.append([current_cost, theta]) \n",
|
|||
|
" return theta, logs\n",
|
|||
|
"\n",
|
|||
|
"def plot_data(X, y, xlabel, ylabel):\n",
|
|||
|
" \"\"\"Wykres danych (wersja macierzowa)\"\"\"\n",
|
|||
|
" fig = plt.figure(figsize=(16*.6, 9*.6))\n",
|
|||
|
" ax = fig.add_subplot(111)\n",
|
|||
|
" fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)\n",
|
|||
|
" ax.scatter([X[:, 1]], [y], c='r', s=50, label='Dane')\n",
|
|||
|
" \n",
|
|||
|
" ax.set_xlabel(xlabel)\n",
|
|||
|
" ax.set_ylabel(ylabel)\n",
|
|||
|
" ax.margins(.05, .05)\n",
|
|||
|
" plt.ylim(y.min() - 1, y.max() + 1)\n",
|
|||
|
" plt.xlim(np.min(X[:, 1]) - 1, np.max(X[:, 1]) + 1)\n",
|
|||
|
" return fig\n",
|
|||
|
"\n",
|
|||
|
"def plot_regression(fig, fun, theta, X):\n",
|
|||
|
" \"\"\"Wykres krzywej regresji (wersja macierzowa)\"\"\"\n",
|
|||
|
" ax = fig.axes[0]\n",
|
|||
|
" x0 = np.min(X[:, 1]) - 1.0\n",
|
|||
|
" x1 = np.max(X[:, 1]) + 1.0\n",
|
|||
|
" L = [x0, x1]\n",
|
|||
|
" LX = np.matrix([1, x0, 1, x1]).reshape(2, 2)\n",
|
|||
|
" ax.plot(L, fun(theta, LX), linewidth='2',\n",
|
|||
|
" label=(r'$y={theta0:.2}{op}{theta1:.2}x$'.format(\n",
|
|||
|
" theta0=float(theta[0][0]),\n",
|
|||
|
" theta1=(float(theta[1][0]) if theta[1][0] >= 0 else float(-theta[1][0])),\n",
|
|||
|
" op='+' if theta[1][0] >= 0 else '-')))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 19,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Wczytanie danych (mieszkania) przy pomocy biblioteki pandas\n",
|
|||
|
"\n",
|
|||
|
"alldata = pandas.read_csv('data_flats_with_outliers.tsv', sep='\\t',\n",
|
|||
|
" names=['price', 'isNew', 'rooms', 'floor', 'location', 'sqrMetres'])\n",
|
|||
|
"data = np.matrix(alldata[['price', 'sqrMetres']])\n",
|
|||
|
"\n",
|
|||
|
"m, n_plus_1 = data.shape\n",
|
|||
|
"n = n_plus_1 - 1\n",
|
|||
|
"Xn = data[:, 0:n]\n",
|
|||
|
"\n",
|
|||
|
"Xo = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n + 1)\n",
|
|||
|
"yo = np.matrix(data[:, -1]).reshape(m, 1)\n",
|
|||
|
"\n",
|
|||
|
"Xo /= np.amax(Xo, axis=0)\n",
|
|||
|
"yo /= np.amax(yo, axis=0)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFoCAYAAADq7KeuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAcvElEQVR4nO3de7CkZ10n8O9vkpDozBQBEgyEZNHKLIgXAnskQVK7iFILs2zCrlgT19KAwYhcRLmsQSxBtMq4LiJCECNyiWvhWGhhdAOIXMQUBphgAiQxO2NcYZwRwmXjyZALk3n2j+5xTk56Jj3n9Ol+Ts/nU3Wqu9/3ed/3d97uPvOd57081VoLAAD92jDrAgAAODKBDQCgcwIbAEDnBDYAgM4JbAAAnRPYAAA6N7PAVlVnVNVHqurmqrqxql46ok1V1W9V1a6q+kxVPXEWtQIAzNLxM9z2/iQvb619uqo2J7muqj7YWrtpSZtnJtky/DknyW8PHwEAjhkz62Frre1trX16+Hwxyc1JTl/W7IIkV7aBa5OcXFWPmHKpAAAz1cU5bFX16CRPSPKJZbNOT/KFJa935/6hDgBgrs3ykGiSpKo2JfnjJD/TWvuX5bNHLHK/sbSq6pIklyTJxo0b/91jH/vYidcJALAa11133Zdba6euZNmZBraqOiGDsPYHrbU/GdFkd5Izlrx+VJI9yxu11q5IckWSLCwstB07dqxBtQAAK1dV/7jSZWd5lWgl+b0kN7fWfuMwza5K8mPDq0XPTXJ7a23v1IoEAOjALHvYnpLkR5N8tqquH077+SRnJklr7a1Jrk6yNcmuJF9P8rwZ1AkAMFMzC2yttWsy+hy1pW1akhdNpyIAgD51cZUoAACHJ7ABAHROYAMA6JzABgDQOYENAKBzAhsAQOcENgCAzglsAACdE9gAADonsAEAdE5gAwDonMAGANA5gQ0AoHMCGwBA5wQ2AIDOCWwAAJ0T2AAAOiewAQB0TmADAOicwAYA0DmBDQCgcwIbAEDnBDYAgM4JbAAAnRPYAAA6J7ABAHROYAMA6JzABgDQOYENAKBzAhsAQOcENgCAzglsAACdE9gAADonsAEAdE5gAwDonMAGANA5gQ0AoHMCGwBA5wQ2AIDOCWwAAJ0T2AAAOiewAQB0TmADAOjcTANbVb29qr5UVZ87zPynVtXtVXX98OcXp10jAMCsHT/j7b8zyZuTXHmENn/dWnvWdMoBAOjPTHvYWmsfS/LVWdYAANC79XAO25Or6oaqel9VfcesiwEAmLZZHxJ9IJ9O8m9aa3dU1dYk702yZXmjqrokySVJcuaZZ063QgCANdZ1D1tr7V9aa3cMn1+d5ISqOmVEuytaawuttYVTTz116nUCAKylrgNbVZ1WVTV8/qQM6v3KbKsCAJiumR4Srap3J3lqklOqaneS1yQ5IUlaa29N8pwkP1VV+5PcmeTC1lqbUbkAADMx08DWWvvhB5j/5gxu+wEAcMzq+pAoAAACGwBA9wQ2AIDOCWwAAJ0T2AAAOiewAQB0TmADAOicwAYA0DmBDQCgcwIbAEDnBDYAgM4JbAAAnRPYAAA6J7ABAHROYAMA6JzABgDQOYENAKBzAhsAQOcENgCAzglsAACdE9gAADonsAEAdE5gAwDonMAGANA5gQ0AoHMCGwBA5wQ2AIDOCWwAAJ0T2AAAOiewAQB0TmADAOicwAYA0DmBDQCgcwIbAEDnBDYAgM4JbAAAnRPYAAA6J7ABAHROYAMA6JzABgDQOYENAKBzAhsAQOcENgCAzs00sFXV26vqS1X1ucPMr6r6raraVVWfqaonTrtG6NbiYvK2tyU/93ODx8XFWVcEwBo5fsbbf2eSNye58jDzn5lky/DnnCS/PXyEY9s11yRbtyYHDiT79iUbNyYve1ly9dXJeefNujoAJmymPWyttY8l+eoRmlyQ5Mo2cG2Sk6vqEdOpDjq1uDgIa4uLg7CWDB4PTr/jjtnWB8DE9X4O2+lJvrDk9e7hNDh2bd8+6Fkb5cCBwXwA5krvga1GTGv3a1R1SVXtqKodt9122xTKghnaufNQz9py+/Ylu3ZNtx4A1lzvgW13kjOWvH5Ukj3LG7XWrmitLbTWFk499dSpFQczsWXL4Jy1UTZuTM46a7r1ALDmeg9sVyX5seHVoucmub21tnfWRcFMbduWbDjMV3fDhsF8AObKTK8Srap3J3lqklOqaneS1yQ5IUlaa29NcnWSrUl2Jfl6kufNplLoyObNg6tBl18lumHDYPqmTbOuEIAJm2lga6398APMb0leNKVyYP0477xkz57BBQa7dg0Og27bJqwBzKlZ34cNWKlNm5KLL551FQBMQe/nsAEAHPMENgCAzglsAACdE9gAADonsAEAdE5gAwDonMAGANA5gQ0AoHMCGwBA5wQ2AIDOCWwAAJ0T2AAAOiewAQB0TmADAOicwAYA0DmBDQCgcwIbAEDnBDYAgM4JbAAAnRPYAAA6d/ysCwBWaXEx2b492bkz2bIl2bYt2bx51lUBMEECG6xn11yTbN2aHDiQ7NuXbNyYvOxlydVXJ+edN+vqAJgQh0RhvVpcHIS1xcVBWEsGjwen33HHbOsDYGIENlivtm8f9KyNcuDAYD4Ac0Fgg/Vq585DPWvL7duX7No13XoAWDMCG6xXW7YMzlkbZePG5KyzplsPAGtGYIP1atu2ZMNhvsIbNgzmAzAXBDZYrzZvHlwNunnzoZ62jRsPTd+0abb1ATAxbusB69l55yV79gwuMNi1a3AYdNs2YQ1gzghssN5t2pRcfPGh14uLydve5ka6AHNEYIN54ka6AHPJOWwwL9xIF2BuCWwwL9xIF2BuCWwwL9xIF2BuCWwwL9xIF2BuCWwwL9xIF2BujX2VaFU9JMmWJCcdnNZa+9haFAWswMEb5i6/SnTDBjfSBVjnxgpsVfX8JC9N8qgk1yc5N8nfJHna2pUGHDU30gWYS+P2sL00yfckuba19n1V9dgkv7R2ZQErtvxGugCse+Oew3ZXa+2uJKmqE1trf5fkMWtXFgAAB43bw7a7qk5O8t4kH6yqryXZs3ZlASu2uDg4JGpoKoC5Ua21o1ug6j8keXCS97fW7lmTqlZhYWGh7dixY9ZlwGyMGprq4EUHhqYCmKmquq61trCSZce+rUdVHVdVj0zyDxlceHDaSja4bJ3PqKpbqmpXVV06Yv5zq+q2qrp++PP81W4T5pahqQDm1rhXib4kyWuSfDHJwbFvWpLvXumGq+q4JJcneXqS3Uk+VVVXtdZuWtZ0e2vtxSvdDsy1pYc///mfk298Y3S7b3xj0M7FCADr0tFcJfqY1tpXJrjtJyXZ1Vq7NUmq6g+TXJBkeWADRll++PP445P9+0e3veuu5CZfLYD1atxDol9IcvuEt336cL0H7R5OW+4Hq+ozVfWeqjpj1Iqq6pKq2lFVO2677bYJlwkdGnX483Bh7aCvTPL/WwBM07g9bLcm+WhV/e8kdx+c2Fr7jVVsu0ZMW34FxJ8leXdr7e6qekGSd2XEzXpba1ckuSIZXHSwippgfdi+fdCzdjQe9rC1qQWANTduYPv88OdBw59J2J1kaY/Zo7LsViHLDsH+bpJfm9C2YX3bufNQz9o4Tjopedzj1q4eANbUWIGttfZLSVJVG1trR/GvxBF9KsmWqvrWJP+U5MIk/21pg6p6RGtt7/Dl+UluntC2YX3bsiU57rjk3nvHa3/CCQZ/B1jHxjqHraqeXFU3ZRiYqurxVfWW1Wy4tbY/yYuTfGC43j9qrd1YVa+rqvOHzX66qm6sqhuS/HSS565mmzA3tm4dL6xt3HhoUHjjiQKsW+MeEv3NJP8xyVVJ0lq7oar+/Wo33lq7OsnVy6b94pLnr0ryqtVuB+bOG9/4wG2qkksuSV73OmENYJ0b+8a5rbUvLJs05rEYYKIWF5Pf/M0Hbtda8oY3JB//+NrXBMCaGvu2HlX1vUlaVT2oql4R55PBbGzfPv65a0ly/vlGOQBY58YNbC9I8qIM7pO2O8nZSV64VkUBR/C3f3t0gW3
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data(Xo, yo, xlabel=u'metraż', ylabel=u'cena')\n",
|
|||
|
"theta_start = np.matrix([0.0, 0.0]).reshape(2, 1)\n",
|
|||
|
"theta, logs = gradient_descent(cost, gradient, theta_start, Xo, yo, alpha=0.01)\n",
|
|||
|
"plot_regression(fig, h_linear, theta, Xo)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"source": [
|
|||
|
"Na powyższym przykładzie obserwacja odstająca jawi sie jako pojedynczy punkt po prawej stronie wykresu. Widzimy, że otrzymana krzywa regresji zamiast odwzorowywać ogólny trend, próbuje „dopasować się” do tej pojedynczej obserwacji.\n",
|
|||
|
"\n",
|
|||
|
"Dlatego taką obserwację należy usunąć ze zbioru danych (zobacz ponizej)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 21,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# Odrzućmy obserwacje odstające\n",
|
|||
|
"alldata_no_outliers = [\n",
|
|||
|
" (index, item) for index, item in alldata.iterrows() \n",
|
|||
|
" if item.price > 100 and item.sqrMetres > 10]\n",
|
|||
|
"\n",
|
|||
|
"alldata_no_outliers = alldata.loc[(alldata['price'] > 100) & (alldata['sqrMetres'] > 100)]"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 22,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "notes"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"data = np.matrix(alldata_no_outliers[['price', 'sqrMetres']])\n",
|
|||
|
"\n",
|
|||
|
"m, n_plus_1 = data.shape\n",
|
|||
|
"n = n_plus_1 - 1\n",
|
|||
|
"Xn = data[:, 0:n]\n",
|
|||
|
"\n",
|
|||
|
"Xo = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n + 1)\n",
|
|||
|
"yo = np.matrix(data[:, -1]).reshape(m, 1)\n",
|
|||
|
"\n",
|
|||
|
"Xo /= np.amax(Xo, axis=0)\n",
|
|||
|
"yo /= np.amax(yo, axis=0)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 23,
|
|||
|
"metadata": {
|
|||
|
"slideshow": {
|
|||
|
"slide_type": "subslide"
|
|||
|
}
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmwAAAFoCAYAAADq7KeuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3dfZDkVX3v8c93nndnengIq7ssbJBiFbEUMBOBuOXlqiSwZSBX0cWbiosXiktYooJaQrzXx1hiKvgMmBXJAjG4CaZ0E1cpHy/ZKOhAQFxgYUJKWWfFFXCnZ3aeeuZ7//j9eqen59c93TPd/TvT/X5VdfXD79fdZ6anlw/nfM855u4CAABAuNrSbgAAAADKI7ABAAAEjsAGAAAQOAIbAABA4AhsAAAAgSOwAQAABC61wGZmJ5rZ983sMTPba2bvTDjHzOyzZjZkZj81s1em0VYAAIA0daT43jlJ73b3B80sI+kBM/u2uz9acM4FkjbGl7Mk3RJfAwAAtIzUetjc/YC7Pxjfzkp6TNL6otMuknSHR+6TdLSZrWtwUwEAAFIVRA2bmZ0k6UxJ9xcdWi/p6YL7+7Uw1AEAADS1NIdEJUlm1ifpq5Le5e4jxYcTnrJgLy0zu0LSFZLU29v7e6eeemrN2wkAALAcDzzwwG/cfc1SnptqYDOzTkVh7cvu/s8Jp+yXdGLB/RMkDRef5O7bJW2XpIGBAR8cHKxDawEAAJbOzH6+1OemOUvUJH1J0mPu/skSp+2S9LZ4tujZkg65+4GGNRIAACAAafawvVrSn0l6xMweih/7S0kbJMndvyBpt6TNkoYkHZb09hTaCQAAkKrUApu771FyjVrhOS5pW2NaBAAAEKYgZokCAACgNAIbAABA4AhsAAAAgSOwAQAABI7ABgAAEDgCGwAAQOAIbAAAAIEjsAEAAASOwAYAABA4AhsAAEDgCGwAAACBI7ABAAAEjsAGAAAQOAIbAABA4AhsAAAAgSOwAQAABI7ABgAAEDgCGwAAQOAIbAAAAIEjsAEAAASOwAYAABA4AhsAAEDgCGwAAACBI7ABAAAEjsAGAAAQOAIbAABA4AhsAAAAgSOwAQAABI7ABgAAEDgCGwAAQOAIbAAAAIEjsAEAAASOwAYAABA4AhsAAEDgCGwAAACBI7ABAAAEjsAGAAAQOAIbAABA4AhsAAAAgSOwAQAABI7ABgAAEDgCGwAAQOAIbAAAAIEjsAEAAASOwAYAABA4AhsAAEDgCGwAAACBI7ABAAAELtXAZma3mdmvzexnJY6fa2aHzOyh+PKBRrcRAAAgbR0pv/8OSZ+XdEeZc/7N3d/QmOYAAACEJ9UeNne/V9JzabYBAAAgdCuhhu0cM3vYzL5pZi9LOsHMrjCzQTMbPHjwYKPbBwAAUFehB7YHJf2uu58u6XOSvpZ0krtvd/cBdx9Ys2ZNQxsIAABQb0EHNncfcffR+PZuSZ1mdlzKzQIAAGiooAObma01M4tvv0pRe59Nt1UAAACNleosUTO7S9K5ko4zs/2SPiipU5Lc/QuSLpb052aWkzQu6RJ395SaCwAAkIpUA5u7v3WR459XtOwHAABAywp6SBQAAAAENgAAgOAR2AAAAAJHYAMAAAgcgQ0AACBwBDYAAIDAEdgAAAACR2ADAAAIHIENAAAgcAQ2AACAwBHYAAAAAkdgAwAACByBDQAAIHAENgAAgMAR2AAAAAJHYAMAAAgcgQ0AACBwBDYAAIDAEdgAAAACR2ADAAAIHIENAAAgcAQ2AACAwBHYAAAAAkdgAwAACByBDQAAIHAENgAAgMAR2AAAAAJHYAMAAAgcgQ0AACBwBDYAAIDAEdgAAAACR2ADAAAIHIENAAAgcAQ2AACAwBHYAAAAAkdgAwAACByBDQAAIHAENgAAgMAR2AAAAAJHYAMAAAgcgQ0AACBwBDYAAIDAEdgAAAACR2ADAAAIHIENAAAgcAQ2AACAwBHYAAAAApdqYDOz28zs12b2sxLHzcw+a2ZDZvZTM3tlo9sIIGXZrHTrrdL73hddZ7NptwgAGq4j5fffIenzku4ocfwCSRvjy1mSbomvAbSCPXukzZul2VlpbEzq7ZWuvVbavVvatCnt1gFAw6Taw+bu90p6rswpF0m6wyP3STrazNY1pnUAUpXNRmEtm43CmhRd5x8fHU23fQDQQKHXsK2X9HTB/f3xYwCa3c6dUc9aktnZ6DgAtIjQA5slPOYLTjK7wswGzWzw4MGDDWgWgLp78sm5nrViY2PS0FBj2wMAKQo9sO2XdGLB/RMkDRef5O7b3X3A3QfWrFnTsMYBqKONG6OatSS9vdIppzS2PQCQotAD2y5Jb4tni54t6ZC7H0i7UQAaYMsWqa3EP1FtbdFxAGgRqc4SNbO7JJ0r6Tgz2y/pg5I6JcndvyBpt6TNkoYkHZb09nRaCqDhMploNmjxLNG2tujxvr60WwgADZNqYHP3ty5y3CVta1BzAIRm0yZpeDiaYDA0FA2DbtlCWAPQctJehw0Ayuvrky67LO1WAECqQq9hAwAAaHkENgAAgMAR2AAAAAJHYAMAAAgcgQ0AACBwBDYAAIDAEdgAAAACR2ADAAAIHIENAAAgcAQ2AACAwBHYAAAAAkdgAwAACByBDQAAIHAENgAAgMAR2AAAAAJHYAMAAAgcgQ0AACBwBDYAAIDAEdgAAAACR2ADAAAIHIENAAAgcAQ2AACAwBHYAAAAAteRdgMAYEXIZqWdO6Unn5Q2bpS2bJEymbRbBaBFENgAYDF79kibN0uzs9LYmNTbK117rbR7t7RpU9qtA9ACGBIFgHKy2SisZbNRWJOi6/zjo6Pptg9ASyCwAUA5O3dGPWtJZmej4wBQZxUPiZrZMZI2SurJP+bu99ajUQAQjCefnOtZKzY2Jg0NNbY9AFpSRYHNzC6X9E5JJ0h6SNLZkn4k6bX1axoABGDjxqhmLSm09fZKp5zS+DYBaDmVDom+U9LvS/q5u/93SWdKOli3VgFAKLZskdpK/FPZ1hYdB4A6qzSwTbj7hCSZWbe7Py7pJfVrFgAEIpOJZoNmMlGPmhRd5x/v60u3fQBaQqU1bPvN7GhJX5P0bTN7XtJw/ZoFAAHZtEkaHo4mGAwNRcOgW7YQ1gA0jLl7dU8w+2+SjpL0LXefqkurlmFgYMAHBwfTbgYAAMA8ZvaAuw8s5bnVzBJtl/RCSf8VP7RW0i+W8qYAAACoXKWzRP9C0gclPSMpvyCRS3pFndoFAAgF23IBqau0h+2dkl7i7s/WszEAgMCwLRcQhEpniT4t6VA9GwIACAzbcgHBqLSH7SlJPzCzb0iazD/o7p+sS6sAAOmrZFuuyy5rbJuAFlVpYPtFfOmKLwCAZse2XEAwKgps7v5hSTKzXncv8e0FADQVtuUCglFRDZuZnWNmj0p6LL5/upndXNeWAQDSxbZcQDAqnXTwaUl/JOlZSXL3hyW9pl6NAgAEgG25gGBUvHCuuz9tZoUPzdS+OQCAoLAtFxCESgPb02b2B5LczLokvUPx8CgAoMn19TEbFEhZpUOiV0raJmm9pP2SzpB0Vb0aBQAAgDmV9rDdKOlqd39ekszsmPix/1WvhgEAAsL2VECqKg1sr8iHNUly9+fN7Mw6tQkAEBK2pwJSV+mQaFvcqyZJMrNjVcWEhVLM7Hwz22dmQ2Z2XcLxS83soJk9FF8uX+57AgCqwPZUQBCqGRL9oZndLcklvUXSx5bzxmbWLukmSecpqov7iZntcvdHi07d6e5XL+e9AABLxPZUQBAq3engDjMblPRaSSbpjQnBqlqvkjTk7k9Jkpl9RdJFkpb7ugBaGbVWtcX2VEAQqlmH7VHVNkytl/R0wf39ks5KOO9NZvYaSU9Iusbdn044BwDqW2s1PCxdf730+OPSqadKH/+4dPzxtWl3yE48sfzxE05oTDuAFldpDVs9WMJjXnT/XySd5O6vkPQdSbcnvpDZFWY2aGaDBw8erHEzAawI9ay1uvlmaf166Y47pB//OLpevz56HAAaIM3Atl9S4f+6nSBpuPAEd3/W3Sf
|
|||
|
"text/plain": [
|
|||
|
"<Figure size 691.2x388.8 with 1 Axes>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {
|
|||
|
"needs_background": "light"
|
|||
|
},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"fig = plot_data(Xo, yo, xlabel=u'metraż', ylabel=u'cena')\n",
|
|||
|
"theta_start = np.matrix([0.0, 0.0]).reshape(2, 1)\n",
|
|||
|
"theta, logs = gradient_descent(cost, gradient, theta_start, Xo, yo, alpha=0.01)\n",
|
|||
|
"plot_regression(fig, h_linear, theta, Xo)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Na powyższym wykresie widać, że po odrzuceniu obserwacji odstających otrzymujemy dużo bardziej „wiarygodną” krzywą regresji."
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"celltoolbar": "Slideshow",
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.8.3"
|
|||
|
},
|
|||
|
"livereveal": {
|
|||
|
"start_slideshow_at": "selected",
|
|||
|
"theme": "amu"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 4
|
|||
|
}
|