{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"# Classification in Python"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Exercise 1** Which of the following problems are regression problems, and which are classification problems?\n",
" 1. Checking whether a message is spam.\n",
" 1. Predicting a rating (from 1 to 5 stars) from a comment.\n",
" 1. Digit OCR: recognizing a digit from an image.\n",
" \n",
" If the problem is a classification problem, what are the classes?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## Classification metrics"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"There are many measures (metrics) that can be used to assess the quality of a model. As with linear regression, two lists are needed: the list of correct classes and the list of model predictions. The most popular metric is accuracy, defined as follows:\n",
" $$ACC = \\frac{k}{N}$$ \n",
" $$SUM = \\sum_{x}^{y}{k}{n}{i}$$\n",
" \n",
" where: \n",
" * $k$ is the number of correctly classified examples,\n",
" * $N$ is the size of the test set."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Exercise** Write a function that takes two lists as parameters (the list of correct classes and the classifier output) and returns the accuracy."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"ACC: 0.4\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"def accuracy_measure(true_label, predicted):\n",
"    \"\"\"Return the fraction of positions where both lists agree.\"\"\"\n",
"    if len(true_label) != len(predicted):\n",
"        raise ValueError(\"Input lists can't have different sizes.\")\n",
"\n",
"    correct_values = 0\n",
"    for i in range(len(true_label)):\n",
"        if true_label[i] == predicted[i]:\n",
"            correct_values += 1\n",
"\n",
"    return correct_values / len(predicted)\n",
"\n",
"true_label = [1, 1, 1, 0, 0]\n",
"predicted = [0, 1, 0, 1, 0]\n",
"print(\"ACC:\", accuracy_measure(true_label, predicted))"
|
||
]
|
||
},
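The accuracy definition above can also be written without an explicit index loop. The following is only a sketch of an equivalent formulation; the name `accuracy` and the extra check for empty input are assumptions added for illustration and are not part of the exercise.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions at which the two sequences hold the same label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both sequences must have the same length.")
    if not true_labels:
        # Assumption: treat empty input as an error instead of dividing by zero.
        raise ValueError("Cannot compute accuracy of an empty sequence.")
    # k = number of matching positions, N = number of examples.
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


print("ACC:", accuracy([1, 1, 1, 0, 0], [0, 1, 0, 1, 0]))  # ACC: 0.4
```

Two of the five positions agree, so the result is $2/5 = 0.4$, matching the loop-based version.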
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## The $k$-nearest neighbors classifier (KNN)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"The [KNN](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) classifier, introduced in the last lecture, is very intuitive. The idea behind it is simple: given a new object to classify, we look in the training data for the $k$ examples most similar to it and decide on their basis (e.g. by majority vote) which class the object should belong to."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Example 1** The task is to assign objects to one of two classes: triangles or squares. The object under consideration is marked with a green circle. With $k=3$, its neighbors are 2 triangles and 1 square, so the object should be classified as a triangle. How does the situation change when we take $k=5$?\n",
"\n",
"![Example 1](./KnnClassification.svg.png)\n",
"\n",
"(Image source: https://pl.wikipedia.org/wiki/K_najbli%C5%BCszych_s%C4%85siad%C3%B3w)"
|
||
]
|
||
},
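The majority-vote rule from Example 1 can be sketched in a few lines of plain Python. This is a minimal illustration and not the scikit-learn implementation: the function name `knn_predict`, the use of Euclidean distance and the toy coordinates are all assumptions chosen for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Order the training examples by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda idx: math.dist(train_points[idx], query),
    )
    # Count the labels of the k closest examples and return the most common one.
    votes = Counter(train_labels[idx] for idx in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D toy data: two triangles close to the query, three squares farther away.
points = [(1.0, 1.0), (1.2, 0.8), (3.0, 3.0), (3.2, 2.9), (2.9, 3.3)]
labels = ["triangle", "triangle", "square", "square", "square"]

print(knn_predict(points, labels, (1.1, 1.1), k=3))  # triangle
print(knn_predict(points, labels, (1.1, 1.1), k=5))  # square
```

With $k=3$ the vote is 2 triangles against 1 square; with $k=5$ all five points vote and the three squares win, so the predicted class depends on the choice of $k$.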
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Herbal Iris"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"*Herbal Iris* is a classic machine learning dataset created in 1936. It contains information on 150 plant specimens, each belonging to one of 3 varieties."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Exercise 2** Load the *Herbal Iris* dataset, stored in the file ``iris.data``, into the variable ``data``. It is a CSV file.\n",
"\n",
"The columns are as follows:\n",
|
||
"\n",
|
||
"1. sepal length in cm\n",
|
||
"2. sepal width in cm\n",
|
||
"3. petal length in cm\n",
|
||
"4. petal width in cm\n",
|
||
"5. class: \n",
|
||
" * Iris Setosa\n",
|
||
" * Iris Versicolour\n",
|
||
" * Iris Virginica"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Sepal length</th>\n",
|
||
" <th>Sepal width</th>\n",
|
||
" <th>Petal length</th>\n",
|
||
" <th>Petal width</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Class</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>5.1</td>\n",
|
||
" <td>3.5</td>\n",
|
||
" <td>1.4</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>4.9</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>1.4</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>4.7</td>\n",
|
||
" <td>3.2</td>\n",
|
||
" <td>1.3</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>4.6</td>\n",
|
||
" <td>3.1</td>\n",
|
||
" <td>1.5</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>5.0</td>\n",
|
||
" <td>3.6</td>\n",
|
||
" <td>1.4</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>6.7</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>5.2</td>\n",
|
||
" <td>2.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>6.3</td>\n",
|
||
" <td>2.5</td>\n",
|
||
" <td>5.0</td>\n",
|
||
" <td>1.9</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>6.5</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>5.2</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>6.2</td>\n",
|
||
" <td>3.4</td>\n",
|
||
" <td>5.4</td>\n",
|
||
" <td>2.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>5.9</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>5.1</td>\n",
|
||
" <td>1.8</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>150 rows × 4 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Sepal length Sepal width Petal length Petal width\n",
|
||
"Class \n",
|
||
"Iris-setosa 5.1 3.5 1.4 0.2\n",
|
||
"Iris-setosa 4.9 3.0 1.4 0.2\n",
|
||
"Iris-setosa 4.7 3.2 1.3 0.2\n",
|
||
"Iris-setosa 4.6 3.1 1.5 0.2\n",
|
||
"Iris-setosa 5.0 3.6 1.4 0.2\n",
|
||
"... ... ... ... ...\n",
|
||
"Iris-virginica 6.7 3.0 5.2 2.3\n",
|
||
"Iris-virginica 6.3 2.5 5.0 1.9\n",
|
||
"Iris-virginica 6.5 3.0 5.2 2.0\n",
|
||
"Iris-virginica 6.2 3.4 5.4 2.3\n",
|
||
"Iris-virginica 5.9 3.0 5.1 1.8\n",
|
||
"\n",
|
||
"[150 rows x 4 columns]"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"\n",
|
||
"df = pd.read_csv(\n",
|
||
" './iris.data', \n",
|
||
" index_col=4, \n",
|
||
" names=[\n",
|
||
" 'Sepal length', \n",
|
||
" 'Sepal width', \n",
|
||
" 'Petal length', \n",
|
||
" 'Petal width',\n",
|
||
" 'Class'\n",
|
||
" ])\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Exercise 3** Answer the following questions:\n",
" 1. Which attributes are inputs, and which column contains the output class?\n",
" 1. How many distinct classes are there? Print them to the screen.\n",
" 1. What is the mean value of the ``sepal_length`` column? How does the mean behave if we compute it separately for each class?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"3"
|
||
]
|
||
},
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import numpy as np\n",
|
||
"\n",
|
||
"np.average(df['Sepal length'])\n",
|
||
"df.groupby('Class')['Sepal length'].mean()\n",
|
||
"len(df.index.drop_duplicates())\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Let's train a *KNN* classifier, but first let's prepare the data. The ``train_test_split`` function splits a given dataset into two parts. We will use it to split the data into a training set (67%) and a test set (33%); the ``test_size`` parameter controls this."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.model_selection import train_test_split\n",
|
||
"\n",
|
||
"data = pd.read_csv(\n",
|
||
" './iris.data', \n",
|
||
" names=[\n",
|
||
" 'sepal_length', \n",
|
||
" 'sepal_width', \n",
|
||
" 'petal_length', \n",
|
||
" 'petal_width',\n",
|
||
" 'class'\n",
|
||
" ])\n",
|
||
"\n",
|
||
"X = data.loc[:, 'sepal_length':'petal_width']\n",
|
||
"Y = data['class']\n",
|
||
"\n",
|
||
"(train_X, test_X, train_Y, test_Y) = train_test_split(X, Y, test_size=0.33, random_state=42)\n"
|
||
]
|
||
},
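To make the effect of ``test_size`` concrete, here is a rough sketch of a shuffle-and-cut split in plain Python. The helper `simple_train_test_split` is hypothetical, rounding the test size up is an assumption of this sketch, and its seed does not reproduce the shuffle that scikit-learn performs for ``random_state=42``.

```python
import math
import random


def simple_train_test_split(rows, test_size=0.33, seed=42):
    """Shuffle row indices and cut them into a training part and a test part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    # Assumption of this sketch: the number of test rows is rounded up.
    n_test = math.ceil(test_size * len(rows))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(150))  # stand-in for the 150 rows of the dataset
train_rows, test_rows = simple_train_test_split(rows)
print(len(train_rows), len(test_rows))  # 100 50
```

Every row ends up in exactly one of the two parts, so the model is never scored on an example it was trained on.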
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"Training the classifier looks very similar to training a linear regression model:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<style>#sk-container-id-1 {color: black;}#sk-container-id-1 pre{padding: 0;}#sk-container-id-1 div.sk-toggleable {background-color: white;}#sk-container-id-1 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-1 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-1 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-1 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-1 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-1 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-1 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-1 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label 
{background-color: #d4ebff;}#sk-container-id-1 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-1 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-1 div.sk-item {position: relative;z-index: 1;}#sk-container-id-1 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-1 div.sk-item::before, #sk-container-id-1 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-1 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-1 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-1 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-1 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-1 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-1 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-1 div.sk-label-container {text-align: center;}#sk-container-id-1 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-1 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-1\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KNeighborsClassifier(n_neighbors=3)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-1\" type=\"checkbox\" checked><label for=\"sk-estimator-id-1\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">KNeighborsClassifier</label><div class=\"sk-toggleable__content\"><pre>KNeighborsClassifier(n_neighbors=3)</pre></div></div></div></div></div>"
|
||
],
|
||
"text/plain": [
|
||
"KNeighborsClassifier(n_neighbors=3)"
|
||
]
|
||
},
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"from sklearn.neighbors import KNeighborsClassifier\n",
|
||
"\n",
|
||
"model = KNeighborsClassifier(n_neighbors=3)\n",
|
||
"model.fit(train_X, train_Y)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"With a trained model, we can use it to make predictions on the test set."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
"Predicted: Iris-versicolor, Actual: Iris-versicolor\n",
"Predicted: Iris-setosa, Actual: Iris-setosa\n",
"Predicted: Iris-virginica, Actual: Iris-virginica\n",
"Predicted: Iris-versicolor, Actual: Iris-versicolor\n",
"Predicted: Iris-versicolor, Actual: Iris-versicolor\n",
"Predicted: Iris-setosa, Actual: Iris-setosa\n",
"Predicted: Iris-versicolor, Actual: Iris-versicolor\n",
"Predicted: Iris-virginica, Actual: Iris-virginica\n",
"Predicted: Iris-versicolor, Actual: Iris-versicolor\n",
"Predicted: Iris-versicolor, Actual: Iris-versicolor\n"
]
}
],
"source": [
"predicted = model.predict(test_X)\n",
"\n",
"for i in range(10):\n",
"    print(\"Predicted: {}, Actual: {}\".format(predicted[i], test_Y.iloc[i]))\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"We can compute the *accuracy*:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0.98\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"from sklearn.metrics import accuracy_score\n",
|
||
"\n",
|
||
"print(accuracy_score(test_Y, predicted))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Exercise 4** Train a new model ``model_2``, changing the number of neighbors to 20. Did the results change?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0.98\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"from sklearn.metrics import accuracy_score\n",
"from sklearn.neighbors import KNeighborsClassifier\n",
"\n",
"model_2 = KNeighborsClassifier(n_neighbors=20)\n",
"model_2.fit(train_X, train_Y)\n",
"\n",
"# Predict with the new model; reusing the earlier `predicted` would only re-score the k=3 model.\n",
"predicted_2 = model_2.predict(test_X)\n",
"\n",
"print(accuracy_score(test_Y, predicted_2))\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Exercise 5** Train a model with $k=1$. Run the validation on the training set instead of the test set. What results did you get? Is this an exception? Why does this happen?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"1.0\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"from sklearn.neighbors import KNeighborsClassifier\n",
|
||
"\n",
|
||
"model = KNeighborsClassifier(n_neighbors=1)\n",
|
||
"model.fit(train_X, train_Y)\n",
|
||
"\n",
|
||
"predicted = model.predict(train_X)\n",
|
||
"from sklearn.metrics import accuracy_score\n",
|
||
"\n",
|
||
"print(accuracy_score(train_Y, predicted))\n"
|
||
]
|
||
},
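The perfect score on the training set is what one would expect for $k=1$: each training example is at distance 0 from itself, so it is its own nearest neighbor and gets its own label back. A small sketch of that effect follows; the helper `one_nn_predict` and the four toy points are assumptions for illustration, not rows taken from the dataset.

```python
def one_nn_predict(train_points, train_labels, query):
    """Return the label of the single training point closest to `query`."""
    closest = min(
        range(len(train_points)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_points[j], query)),
    )
    return train_labels[closest]


# Hypothetical toy training set with two features per example.
points = [(5.1, 3.5), (4.9, 3.0), (6.7, 3.0), (6.3, 2.5)]
labels = ["setosa", "setosa", "virginica", "virginica"]

# Score the model on the very data it memorized.
hits = sum(one_nn_predict(points, labels, p) == y for p, y in zip(points, labels))
print(hits / len(points))  # 1.0
```

The score can drop below 1.0 only when two training examples have identical features but different labels, which is why accuracy measured on the training set says little about how the model handles new data.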
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"## Cross-validation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"The *Herbal Iris* dataset is very small. Carving a test set out of it leads to high variance in the results, i.e. depending on how the test set is chosen, the results can differ considerably. To remedy this, the [cross-validation](https://en.wikipedia.org/wiki/Cross-validation_(statistics)) algorithm is used. The algorithm works as follows:\n",
" 1. Split the dataset into $n$ parts (randomly).\n",
" 1. For each $i$ from 1 to $n$:\n",
"     1. Take the $i$-th part as the test set and the remaining data as the training set.\n",
"     1. Train the model on the training set.\n",
"     1. Run the model on the test data and record the results.\n",
" 1. The final result is the mean of the $n$ partial results. \n",
" \n",
" In scikit-learn, the ``cross_val_score`` function serves this purpose; it takes as parameters (in order) the model, the set X and the set Y. We can set the ``cv`` parameter, which specifies how many parts the dataset is split into, and the ``scoring`` parameter, which specifies the metric.\n",
" \n",
" In the example below, we split the dataset into 10 parts (10-fold cross-validation) and use accuracy as the metric."
|
||
]
|
||
},
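The cross-validation steps listed above can be sketched by hand in plain Python. This is an illustrative sketch under stated assumptions, not how ``cross_val_score`` is implemented: the helpers `k_fold_indices`, `nearest_neighbor_predict` and `cross_validate`, the one-feature toy data and the 1-nearest-neighbor model are all made up for the example, and scikit-learn's own fold assignment may differ in its details.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Step 1: shuffle the indices 0..n_samples-1 and deal them into n_folds parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def nearest_neighbor_predict(train_x, train_y, x):
    """1-NN on a single numeric feature: label of the closest training value."""
    closest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[closest]


def cross_validate(xs, ys, n_folds):
    """Steps 2 and 3: score every fold in turn and average the fold accuracies."""
    scores = []
    for test_idx in k_fold_indices(len(xs), n_folds):
        held_out = set(test_idx)
        # The i-th part is the test set, everything else is the training set.
        train_x = [xs[j] for j in range(len(xs)) if j not in held_out]
        train_y = [ys[j] for j in range(len(xs)) if j not in held_out]
        # "Training" a nearest-neighbor model just means keeping the training data.
        hits = sum(
            nearest_neighbor_predict(train_x, train_y, xs[j]) == ys[j]
            for j in test_idx
        )
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical, well-separated toy data: class "a" near 0, class "b" near 5.
xs = [i / 10 for i in range(10)] + [5 + i / 10 for i in range(10)]
ys = ["a"] * 10 + ["b"] * 10
print(cross_validate(xs, ys, n_folds=5))  # 1.0
```

Because every example is used for testing exactly once, the averaged score depends less on one lucky or unlucky split than a single train/test division does.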
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"from sklearn.model_selection import cross_val_score\n",
"\n",
"k = 3  # number of neighbors; assumed here to match the model trained above\n",
"knn = KNeighborsClassifier(n_neighbors=k)\n",
"scores = cross_val_score(knn, X, Y, cv=10, scoring='accuracy')\n",
"print(\"Cross-validation score:\", scores.mean())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"**Exercise 6** The $k$-nearest neighbors classifier has one parameter, $k$, which sets the number of neighbors used during classification. As we have seen, the choice of $k$ can matter a great deal for the quality of the classifier. Do the following:\n",
" 1. Create a list ``neighbors`` of all odd numbers from 1 to 50.\n",
" 1. For each element ``i`` of ``neighbors``, build a *KNN* classifier with ``i`` neighbors. Then run cross-validation (same parameters as above) and store the results in the array ``cv_scores``.\n",
" 1. Find the ``k`` for which the classifier achieves the highest score."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.9800000000000001"
|
||
]
|
||
},
|
||
"execution_count": 55,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
"from sklearn.model_selection import cross_val_score\n",
"\n",
"# Odd values of k from 1 to 49, as the exercise asks.\n",
"neighbors = list(range(1, 50, 2))\n",
"\n",
"cv_scores = list()\n",
"\n",
"for i in neighbors:\n",
"    knn = KNeighborsClassifier(n_neighbors=i)\n",
"    cv_scores.append(cross_val_score(knn, X, Y, cv=10, scoring='accuracy').mean())\n",
"\n",
"# k with the highest mean cross-validation accuracy (the first one on ties).\n",
"best_k = neighbors[int(np.argmax(cv_scores))]\n",
"\n",
"np.max(cv_scores)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
"A plot of the percentage of errors as a function of the number of neighbors."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAkAAAAGxCAYAAACKvAkXAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAABV9UlEQVR4nO3de1xUdf4/8NfMADNch6vcb+YFUQRBRby38gvLSspW1rW8oX1rLS2qTdvUdnu01ramtbq15iXb1dWstM3KNEpSxBsDXvKuwIAwXETu95nz+wOZIlFhGDhzeT0fj3k8dOYzZ15z1Hj3OZ/z/kgEQRBAREREZEWkYgcgIiIi6m0sgIiIiMjqsAAiIiIiq8MCiIiIiKwOCyAiIiKyOiyAiIiIyOqwACIiIiKrwwKIiIiIrI6N2AFMlU6nQ2FhIZydnSGRSMSOQ0RERJ0gCAKqq6vh5+cHqfT28zwsgG6jsLAQgYGBYscgIiIiA+Tn5yMgIOC2r7MAug1nZ2cArSfQxcVF5DRERETUGVVVVQgMDNT/HL8dFkC30XbZy8XFhQUQERGRmbnb8hUugiYiIiKrwwKIiIiIrA4LICIiIrI6LICIiIjI6rAAIiIiIqvDAoiIiIisDgsgIiIisjosgIiIiMjqsAAiIiIiq8MCiIiIiKwOCyAiIiKyOiyAiIiIyOqwACIiIqJetXCbCu+lXkJlXbNoGVgAERERUa85XVCJr04V4b3US6hrbhEtBwsgIiIi6jUbDl0FADw41Be+SnvRcrAAIiIiol5RWFGPr04VAQDmj+srahYWQERERNQrthzORYtOwKi+7hjirxQ1CwsgIiIi6nE1jS3YdkwNAJg/VtzZH4AFEBEREfWCT47no7qhBX09HfGbsD5ix2EBRERERD1LqxOwKT0HADBvbCikUonIiVgAERERUQ/79icNCm7Uw83BFtOiA8SOA4AFEBEREfWwDQdbb31/fFQw7O1kIqdpxQKIiIiIekxm3g2o1BWwk0nxRFyw2HH0WAARERFRj9l4s/Hh1Cg/9HFWiJzmZyyAiIiIqEfkl9dh7xkNAPEbH/4aCyAiIiLqEZvSc6ATgHH9PTHQx1nsOO2wACIiIiKjq6xvxifH8wEAC0xs9gdgAUREREQ94L/H1Kht0mKgtzPG9fcUO84tWAARERGRUTVrdfgoPRcAkDwuFBKJ+I0Pf40FEBERERnVV6eKoKlqgKeTHFOj/MSO0yGTKIDWrVuHkJAQKBQKxMbG4tixY3ccv3PnToSFhUGhUCAiIgJff/11u9clEkmHj7fffrsnvwYREZHVEwQBG27e+j47LhhyG9NofPhrohdAO3bsQEpKClasWAGVSoXIyEgkJCSgpKSkw/GHDx/GjBkzkJycjKysLCQmJiIxMRFnzpzRjykqKmr32LRpEyQSCaZNm9ZbX4uIiMgqHblajjPXqqCwlWLmKNNpfPhrEkEQBDEDxMbGYsSIEVi7di0AQKfTITAwEM8++yyWLFlyy/ikpCTU1tZiz549+udGjRqFqKgofPDBBx1+RmJiIqqrq5GamtrpXFVVVVAqlaisrISLi0sXvxUREZF1mr/lOL47V4KZsUF445GIXv/8zv78FnUGqKmpCZmZmYiPj9c/J5VKER8fj4yMjA7fk5GR0W48ACQkJNx2fHFxMb766iskJyffMUtjYyOqqqraPYiIiKjzrpTW4LtzrVdwkseGipzmzkQtgMrKyqDVauHt7d3ueW9vb2g0mg7fo9FoujR+y5YtcHZ2xqOPPnrHLCtXroRSqdQ/AgMDu/BNiIiIaNOhHABA/KA+6OvlJHKaOxN9DVBP27RpE2bOnAmF4s77jyxduhSVlZX6R35+fi8lJCIiMn/ltU34NLMAgOlte9ERGzE/3NPTEzKZDMXFxe2eLy4uho+PT4fv8fHx6fT4gwcP4sKFC9ixY8dds8jlcsjl8i6kJyIiojZbj+ShsUWHIf4uiA11Fz
vOXYk6A2RnZ4eYmJh2i5N1Oh1SU1MRFxfX4Xvi4uJuWcy8f//+Dsdv3LgRMTExiIyMNG5wIiIi0mto1mJLRh6A1m0vTLHx4a+JOgMEACkpKZg9ezaGDx+OkSNHYs2aNaitrcXcuXMBALNmzYK/vz9WrlwJAFi8eDEmTJiAVatWYcqUKdi+fTtOnDiB9evXtztuVVUVdu7ciVWrVvX6dyIiIrIm/ztZiLKaRvgqFXggwlfsOJ0iegGUlJSE0tJSLF++HBqNBlFRUdi7d69+obNarYZU+vNE1ejRo7Ft2za8+uqreOWVV9C/f3/s3r0bQ4YMaXfc7du3QxAEzJgxo1e/DxERkTURBAEbD7Yufp4zOgS2MvNYXix6HyBTxT5AREREd/fjxVLM2nQMjnYyHF46CUp7W1HzmEUfICIiIjJvHx5s3fZi+ohA0YufrmABRERERAa5oKnGwUtlkEqAeWNMu/Hhr7EAIiIiIoNsuDn7M3mIDwLdHURO0zUsgIiIiKjLSqob8EV2IQDzaHz4ayyAiIiIqMv+nZGHJq0O0UGuiA5yEztOl7EAIiIioi6pb9LiP0d+bnxojlgAERERUZd8pirAjbpmBLrb477BHW9dZepYABEREVGn6XSCftf3eWNCIZOa/rYXHWEBRERERJ32/fkSXC2rhbPCBr8dHih2HIOxACIiIqJOa2t8+PvYIDjJRd9Ry2AsgIiIiKhTThdU4mhOOWykEswZHSJ2nG5hAURERESdsuFQ6+zPg0N94au0FzlN97AAIiIiorsqrKjHV6eKAJhn48NfM9+Ld0RERHRHF4ursTk9F81aXbePdbW0Bi06AaP6umOIv9II6cTFAoiIiMhCLf38NDLzbhj1mOba+PDXWAARERFZIJX6BjLzbsBOJsXi+P6QSrrfr8ffzR6/CetjhHTiYwFERERkgdp2ap8a5YeF9/YTOY3p4SJoIiIiC5NfXoe9ZzQAgORxoSKnMU0sgIiIiCzMpvQc6ARgXH9PhPm4iB3HJLEAIiIisiCV9c345Hg+AMu4Xb2nsAAiIiKyINuPqVHbpMUAbyeM7+8pdhyTxQKIiIjIQjRrdfjocC4AYP7YvpAY4c4vS8UCiIiIyEJ8fboIRZUN8HSSY+owP7HjmDQWQERERBZAEAT9Tu2z4oIht5GJnMi0sQAiIiKyAEdzynHmWhXkNlI8PipY7DgmjwUQERGRBdhwMAcAMC0mAO6OdiKnMX0sgIiIiMzc1dIapJ4vBgAkj2Xjw85gAURERGTmNqXnQBCA+EF9cI+Xk9hxzAILICIiIjN2o7YJn2YWAACSx7LxYWexACIiIjJjW4/moaFZhyH+LhjV113sOGaDBRAREZGZamzRYktGHgA2PuwqFkBERERm6n/ZhSitboSPiwJThvqKHcessAAiIiIyQ4IgYOOh1lvf54wJga2MP9K7gmeLiIjIDB26XIbzmmo42MkwY2SQ2HHMDgsgIiIiM/ThzcaH04cHQmlvK3Ia88MCiIiIyMxc0FTjx4ulkEqAeWPY+NAQLICIiIjMzMZDrZueJgz2QZCHg8hpzBMLICIiIjNSUt2A3VmFAID549j40FAsgIiIiMzIfzLy0KTVYViQK2KC3cSOY7ZYABEREZmJ+iYt/n2ktfHhAs7+dAsLICIiIjPxeVYBbtQ1I8DNHveFe4sdx6yxACIiIjIDOp2AjTdvfZ83JhQ2bHzYLTx7REREZuCHCyW4WlYLZ4UNpo8IFDuO2WMBREREZAY+PNh66/vvRwbBSW4jchrzxwKIiIjIxJ25VokjV8thI5Vg9ugQseNYBBZAREREJm7DzdmfKUN94edqL3Iay8ACiIiIyIQVVdZjz6kiAMD8sbz13VhYABEREZmwjw7nokUnIDbUHREBSrHjWAwWQERERCaqtrEF246qAXDbC2NjAURERGSiPjmRj+qGFoR6OmJSWB+x41gUFkBEREQmSKsTsCn9ZuPDsaGQSiUiJ7IsLICIiI
hM0L6fNMgvr4ebgy0eiw4QO47FMYkCaN26dQgJCYFCoUBsbCyOHTt2x/E7d+5EWFgYFAoFIiIi8PXXX98y5ty5c3j44YehVCrh6OiIESNGQK1W99RXICIiMqoNh1pnfx4fFQx7O5nIaSyP6AXQjh07kJKSghUrVkClUiEyMhIJCQkoKSnpcPzhw4cxY8YMJCcnIysrC4mJiUhMTMSZM2f0Y65cuYKxY8ciLCwMBw4cwKlTp7Bs2TIoFIre+lpEREQGU6lvIDPvBuxkUjwRFyx2HIskEQRBEDNAbGwsRowYgbVr1wIAdDodAgMD8eyzz2LJkiW3jE9KSkJtbS327Nmjf27UqFGIiorCBx98AAD43e9+B1tbW/z73/82OFdVVRWUSiUqKyvh4uJi8HGIiIi6auFWFb46XYTfxgTg7d9Gih3HrHT257eoM0BNTU3IzMxEfHy8/jmpVIr4+HhkZGR0+J6MjIx24wEgISFBP16n0+Grr77CgAEDkJCQgD59+iA2Nha7d+++Y5bGxkZUVVW1exAREfW2/PI6fHOmtfFh8rhQkdNYLlELoLKyMmi1Wnh7e7d73tvbGxqNpsP3aDSaO44vKSlBTU0N3nzzTUyePBn79u3DI488gkcffRRpaWm3zbJy5UoolUr9IzCQO+0SEVHv+/JUIXQCMLafJ8J8eAWip4i+BsjYdDodAGDq1Kl4/vnnERUVhSVLluDBBx/UXyLryNKlS1FZWal/5Ofn91ZkIiIiPVXeDQDAxIFeIiexbDZifrinpydkMhmKi4vbPV9cXAwfH58O3+Pj43PH8Z6enrCxsUF4eHi7MYMGDcKhQ4dum0Uul0MulxvyNYiIiIxCEASo1BUAgGFBbuKGsXCizgDZ2dkhJiYGqamp+ud0Oh1SU1MRFxfX4Xvi4uLajQeA/fv368fb2dlhxIgRuHDhQrsxFy9eRHAwV9ITEZHpyrteh/LaJtjJpBjiz8tfPUnUGSAASElJwezZszF8+HCMHDkSa9asQW1tLebOnQsAmDVrFvz9/bFy5UoAwOLFizFhwgSsWrUKU6ZMwfbt23HixAmsX79ef8yXXnoJSUlJGD9+PO69917s3bsXX375JQ4cOCDGVyQiIuoUlbr18tcQfxfIbdj7pyeJXgAlJSWhtLQUy5cvh0ajQVRUFPbu3atf6KxWqyGV/jxRNXr0aGzbtg2vvvoqXnnlFfTv3x+7d+/GkCFD9GMeeeQRfPDBB1i5ciUWLVqEgQMH4rPPPsPYsWN7/fsRERF1VubN9T/RvPzV40TvA2Sq2AeIiIh62/3vHsS5oir8c2Y0HojwFTuOWTKLPkBERETUqqaxBRc0rT3oYoI5A9TTWAARERGZgFP5FdAJgL+rPbxduHVTT2MBREREZALa1v8MC3IVN4iVYAFERERkAtruAOMC6N7BAoiIiEhkgiAgK78CANf/9BYWQERERCK7WlaLirpmyG2kGOTLO497AwsgIiIikbWt/xkaoISdDX809waeZSIiIpFlcf1Pr2MBREREJDJVXgUAIJrrf3oNCyAiIiIRVTU042JJNQDOAPUmFkBEREQiylZXQBCAQHd7eDnLxY5jNVgAERERiYj9f8TBAoiIiEhEKnUFAPb/6W0sgIiIiESi0wm8A0wkLICIiIhEcrm0BtUNLbC3lSHMx1nsOFaFBRAREZFIVDcbIEYGKmEj44/k3sSzTUREJBIugBYPCyAiIiKRtG2BwQKo97EAIiIiEkFFXROulNYCAIYFuYobxgqxACIiIhJBVn4FACDU0xEeTmyA2NtYABEREYkg6+blL87+iIMFEBERkQgyuQBaVCyAiIiIeplWJyD7ZgdoFkDiYAFERETUyy4WV6O2SQtHOxkGsgGiKFgAERER9bK2/j9RQa6QSSUip7FOBhVA33//PRoaGoydhYiIyCqw/4/4bAx508MPP4yWlhaMGDECEydOxIQJEzBmzBjY29sbOx8REZHFyeL6H9EZNAN048
YNpKam4v7778exY8fwyCOPwNXVFWPGjMGrr75q7IxEREQWo7y2CTllbIAoNokgCEJ3D/LTTz/h7bffxtatW6HT6aDVao2RTVRVVVVQKpWorKyEi4uL2HGIiMhCpJ4rRvKWE7jHyxGpL0wUO47F6ezPb4MugV28eBEHDhzAgQMHkJaWhsbGRowbNw5///vfMXHiREMzExERWTyu/zENBhVAYWFh8PLywuLFi7FkyRJERERAIuEqdiIiorvR7wAfzAJITAatAVq0aBH8/f3xl7/8BU899RT+9Kc/Yd++fairqzN2PiIiIovRotXhZH4lACCGBZCoDCqA1qxZA5VKBY1Gg6VLl6KpqQl/+tOf4OnpiTFjxhg7IxERkUU4r6lGfbMWzgob9PNyEjuOVetWI0StVovm5mY0NjaioaEBjY2NuHDhgrGyERERWRR9A8RAV0jZAFFUBq0BWrRoEQ4cOICzZ8/Czc0N48ePx4IFCzBx4kREREQYOyMREXWSIAj4149XcbmkRuwoPcrLWY7n4vtDbiMTO0qXqLgA2mQYVAAVFRXhySefxMSJEzFkyBBjZyIiIgMduFiKN785L3aMXtHHWY65Y0LFjtElqpsNELn+R3wGFUA7d+40dg4iIjKCDQevAgDuHeiFkaEeIqfpGZdKqvG56ho2pedgVlyI2eylVVrdCHV5HSSS1j3ASFwGFUAAcOXKFaxZswbnzp0DAISHh2Px4sW45557jBaOiIg672xhFdIvX4dMKsHriUMQ4OYgdqQeUd+kxffnS5BfXo99P2lwf4Sv2JE6pW39T/8+TnBR2Iqchjq1CFqlUrXr7vztt98iPDwcx44dw9ChQzF06FAcPXoUgwcPxv79+3ssLBER3d6GQ62zP/cP8bHY4gcA7O1keDw2GADw4c0ZL3PQVgDx8pdp6FQBlJaWhilTpqC2tnXvkiVLluD555/H0aNH8c477+Cdd97B0aNH8dxzz+Hll1/u0cBERHSr4qoGfHmyEAAwf1xfkdP0vFmjg2Enk0KlrtB3VjZ1WXkVAIBhXABtEjpVAD3//PMYP348JkyYAAA4d+4ckpOTbxk3b948nD171rgJiYjorrYczkWzVsCIEDdEBbqKHafH9XFW4OEoPwDAxkOmPwvUrNXhZEEFAN4BZio63QfolVdewerVqwEAXl5eyM7OvmVMdnY2+vTpY7RwRER0d3VNLdh6VA3AOmZ/2swf13oH2N4zGuSXm/ZOBGcLq9DYooPS3hZ9PR3FjkPo4iLocePGAQAWLFiAJ598ElevXsXo0aMBAOnp6XjrrbeQkpJi/JRERHRbn2YWoLK+GcEeDogf5C12nF4T5uOCcf09cfBSGTal52DFQ4PFjnRb+v2/gtgA0VQYdBfYsmXL4OzsjFWrVmHp0qUAAD8/P7z22mtYtGiRUQMSEdHtaXUCNh7KAQAkjw01m1vCjWX+uL44eKkMnxzPx3PxA6C0N827q9r6//Dyl+kwaCsMiUSC559/HgUFBaisrERlZSUKCgqwePFi7gpPRNSLvjtXjLzrdVDa2+KxmACx4/S68f09McDbCbVNWmw/phY7zm3pO0DzDjCT0a29wADA2dkZzs7OxshCRERd1Nb4cGZsEBzsDG7tZrYkEgnmj21d9/TR4Vw0a3UiJ7pVcVUDrlXUQyoBIq1ggbq56PS/lmHDhnV6dkelUhkciIiIOic7vwLHc2/AVibB7NEhYscRzdRhfvjbtxdQVNmAr08XYWqUv9iR2mmb/Rno4wInufUVqaaq0zNAiYmJmDp1KqZOnYqEhARcuXIFcrkcEydOxMSJE6FQKHDlyhUkJCT0ZF4iIrqpbfbnoUg/eLsoRE4jHrmNDLPifm6MKAiCyIna++UCaDIdnS5FV6xYof/1/PnzsWjRIrz++uu3jMnPzzdeOiIi6lDBjTp8c0YDAPpLQNbs8VHBWPfDZZy5VoWjOeUY1dd09kHL5A7wJsmgNUA7d+7ErFmzbnn+8c
cfx2effdbtUEREdGcfpedCqxMwpp8Hwv1cxI4jOndHO0y7uQh8gwltj9HYosWZa1UAuADa1BhUANnb2yM9Pf2W59PT06FQWO80LBFRb6huaMb2462z7Zz9+Vny2NbGiN+dK8HV0hqR07T6qbAKTVod3B3tEOJhufuzmSODVmM999xzePrpp6FSqTBy5EgAwNGjR7Fp0yYsW7bMqAGJiKi9HcfzUdPYgn59nDBhgJfYcUzGPV5OmBTWB6nnS7DxUA7eeCRC7Eg/3/4e5Mo2MSbGoBmgJUuWYMuWLcjMzMSiRYuwaNEiqFQqbN68GUuWLOny8datW4eQkBAoFArExsbi2LFjdxy/c+dOhIWFQaFQICIiAl9//XW71+fMmQOJRNLuMXny5C7nIiIyNS1aHTan5wIA5o8NZVfhX2nbCuQzVQHKa5tETvPzAmhugGp6DO4DNH36dKSnp6O8vBzl5eVIT0/H9OnTu3ycHTt2ICUlBStWrIBKpUJkZCQSEhJQUlLS4fjDhw9jxowZSE5ORlZWFhITE5GYmIgzZ860Gzd58mQUFRXpH//9738N+p5ERKbkmzMaXKuoh4ejHRKHmdbt3qZgVF93DPZzQUOzDluP5IkdB6qbO8BzAbTp6VYjxO+//x6rV6/GH//4R5SXlyMnJwfXrl3Tv15bW3vXY7zzzjtYsGAB5s6di/DwcHzwwQdwcHDApk2bOhz/7rvvYvLkyXjppZcwaNAgvP7664iOjsbatWvbjZPL5fDx8dE/3Nz4l4+IzJsgCPoFvk/EBUNhKxM5kemRSCRYcHMWaEtGHhpbtKJlKayoh6aqATKpBJGBStFyUMcMKoDUajViYmLw4IMPYsuWLVi9ejUqKipw+PBh/SUwQRAwePCdN6ZrampCZmYm4uPjfw4klSI+Ph4ZGRkdvicjI6PdeABISEi4ZfyBAwfQp08fDBw4EE8//TSuX79+xyyNjY2oqqpq9yAiMiUn8m7gZEEl7GykeHxUsNhxTNaUob7wcVGgrKYRX2QXipaj7fLXIF9nq+zSbeoMKoCeeuop+Pj4QK1WIzs7W3/nV1RUFLZu3YopU6Zg8eLF0GrvXHmXlZVBq9XC27v97sXe3t7QaDQdvkej0dx1/OTJk/Hxxx8jNTUVb731FtLS0nD//fffMc/KlSuhVCr1j8DAwDtmJyLqbW2zP9Oi/eHpJBc5jemylUkxZ0wIAGDjwRzRGiOy/49pM6gAOnDgAP72t7/B09Oz3fNKpRK2trZ4++23ERISgt27dxsjY5f97ne/w8MPP4yIiAgkJiZiz549OH78OA4cOHDb9yxdulS/sWtlZSUbOhKRScktq8W+s8UAfr7dm25vxsggONjJcKG4GgcvlYmSgTvAmzaD5uScnJxQVnbrXyiNRgMvLy+Eh4cjPDz8rsfx9PSETCZDcXFxu+eLi4vh4+PT4Xt8fHy6NB4A+vbtC09PT1y+fBmTJk3qcIxcLodczv+jIiLTtDk9B4IA3DvQC/36cAPqu1Ha22L68EB8dDgXGw7lYHwvtwtoaNbibGElACCGDRBNkkEzQA8//DD+8Ic/4OTJkwCg723wz3/+E9OmTev0cezs7BATE4PU1FT9czqdDqmpqYiLi+vwPXFxce3GA8D+/ftvOx4ACgoKcP36dfj6+nY6GxGRqaioa8InJwoA/HybN93dvDGhkEqAHy+W4oKmulc/+8y1SjRrBXg6yRHgZt+rn02dY1ABtGrVKvj5+SE6Ohq+vr6oq6vDqFGjcOXKFfz1r3/t0rFSUlLw4YcfYsuWLTh37hyefvpp1NbWYu7cuQCAWbNmYenSpfrxixcvxt69e7Fq1SqcP38er732Gk6cOIFnnnkGAFBTU4OXXnoJR44cQW5uLlJTUzF16lT069ePG7USkVnadkyN+mYtBvm6YPQ9prPHlakL8nBAwuDWqwMbD/Xu9hiZbIBo8gy6BKZUKrF//34cOnQIp06dQk1NDaKjo2+5O6szkp
KSUFpaiuXLl0Oj0SAqKgp79+7VL3RWq9WQSn+u00aPHo1t27bh1VdfxSuvvIL+/ftj9+7dGDJkCABAJpPh1KlT2LJlCyoqKuDn54f77rsPr7/+Oi9xEZHZaWrRYcvhXACtjQ/5w7Rr5o/ri2/OaLA7qxAvJgxEH+fe2a5JvwM8L3+ZLIlgwPJ4tVqNrVu3tpuZsTRVVVVQKpWorKyEiws3GiQicXyuKkDKJyfRx1mOQy//BnY23WrfZpUe+Wc6stQVWPSbfki5b2CPf54gCBj511SUVjdi51NxGBHi3uOfST/r7M/vTs8Avffee/pft7S0YOXKlairq4OX160LyxYtWtTFuERE9GuCIODDgzkAgNmjQ1j8GGjBuL74w1YV/n0kD09P7Ad7u55tIFlwox6l1Y2wkUoQ4c8GiKaq0wXQ6tWr2/3ew8MDf//73+Hi4gIHh593uJVIJCyAiIiMIOPKdZwrqoK9rQwzY4PEjmO27gv3RoCbPQpu1OPzrALMjO3ZJpJtl78G+7mwW7cJ63QBlJOTc8tzW7duxZdffont27cbNRQREQEf3mx8+NvhAXB1sBM5jfmykUkxb0wo/rLnLDYezMGMEUE9uomsfgd4rv8xad2aT505cyY2b95srCxERHTT5ZJq/HChFBJJ6+3c1D3TRwTCWWGDq2W1+OFCx5ttGwsbIJoHgwug1NRUPPjggxgyZAjuuecePPjgg/juu++MmY2IyGptPNQ66/7/BnkjxNNR5DTmz0lug9+PbL2M2Daz1hPqmlpwtqh1L0nOAJk2gwqgf/7zn5g8eTKcnZ2xePFiLF68GC4uLnjggQewbt06Y2ckIrIq12sa8ZnqGgBgwXg2PjSWOWNCYCOV4MjVcpy5Vtkjn3GqoBJanQBvFzn8lL1zyz0ZxqAC6K9//StWr16N//73v1i0aBEWLVqEbdu2YfXq1V1uhEhERO39+0gemlp0iAxQYjhnEYzGV2mPKUNbdwTY0EOzQG0LoGOC3dizycQZVABVVFRg8uTJtzx/3333obKyZ6pqIiJr0NCsxb8z8gC0NvHjD1Hjmj+2dUZtz6kiFFXWG/34qrwKAFz/Yw4M3gts165dtzz/xRdf4MEHH+x2KCIia7U76xqu1zbB39Ue9w+5/SbPZJiIACViQ93RohPw0c0O28YiCIJ+BmgYCyCTZ1AjxPDwcLzxxhs4cOCAfhPSI0eOID09HS+88ILxUxIRWQGdTsCGm4uf544JgY2MjQ97woJxfXE0pxzbjqrx7G/6w0lu0K5Qt8i7Xofy2ibYyaQY4s8dBEydwY0Q3dzccPbsWZw9e1b/nKurKzZt2oRXX33VeAmJiKxE2qVSXC6pgZPcBtNHBIodx2L9JqwP+no64mpZLXaeyMdcI7UZaJv9GeLvArkNGyCaum41QiQiIuNpW5j7uxGBcFHYipzGckmlEswbG4pXd5/BpvQczIoLgcwIjRH1G6Dy8pdZ4PwqEZEJOFtYhfTL1yGTSjBnTIjYcSzetOgAuDnYIr+8Hvt+0hjlmJltC6B5555ZMM6FT+qU6oZmfJB2BRc01Vj/xPAebcVORD0vO78C/z2qhlYQun2ss4WtzfPuH+KDADeHu4ym7rK3k+HxUcH4x/eXsfKb80g9373u0IIAXNDcbIDIGSCzwAKoFylsZdh4KAcNzTpcLatFvz5OYkciIgNpdQKe256F3Ot1Rj3u/HFsfNhbnogLxr9+vAp1eR3U5cb5cwzxcIAPGyCaBRZAvchWJsVQf1ccyy2HKu8GCyAiM/bduWLkXq+Di8IGT0/sZ5Rj9uvjhKhAV6Mci+6uj7MCW+fH4kTuDaMcTyIB7h3YxyjHop5nUAGkVqsRGBh4S4MuQRCQn5+PoKAgo4SzRMOCbxZA6hu8y4PIjG082HpjyMxRwXh64j0ipyFDjQhxx4gQd7FjkAgMWgQdGhqK0tLSW54vLy9HaCh3Lb6TmJvXhtvuFiAi83MyvwLHcsthI5Vgdl
yI2HGIyAAGFUCCIHTYnr2mpgYKBa993knb3QGXSmpQWd8schoiMkTbbuIPR/pxvQeRmerSJbCUlBQAgEQiwbJly+Dg8POdClqtFkePHkVUVJRRA1oaTyc5gtwdoC6vQ3Z+BSYM8BI7EhF1QcGNOnxzpvW26eRxnPEmMlddKoCysrIAtM4AnT59GnZ2dvrX7OzsEBkZiRdffNG4CS1QdJAr1OV1UOXdYAFEZGY+Ss+FVidg9D0eGOynFDsOERmoSwXQDz/8AACYO3cu3n33Xbi4cK8TQ8QEu2F3diHXARGZmeqGZmw/ng8AmM/ZHyKzZtBdYJs3bzZ2DqvStktwtroCOp3AhohEZmLH8XzUNLbgHi9HTBzA252JzJlBBVBtbS3efPNNpKamoqSkBDqdrt3rV69eNUo4SxXm4wwHOxmqG1twqaQGA32cxY5ERHfRotVhc3ouACB5bF/+jwuRmTOoAJo/fz7S0tLwxBNPwNfXt8M7wuj2bGRSDA1Q4sjV1n5ALICITN83ZzS4VlEPd0c7PBrtL3YcIuomgwqgb775Bl999RXGjBlj7DxWIybYrbUAyruBGSPZOJLIlAmCoN+p/fFRwVDYykRORETdZVAfIDc3N7i7s3Nmd7RtlpfJhdBEJu9E3g2cLKiEnY0UT4wKFjsOERmBQQXQ66+/juXLl6OuzribAFqTtoXQV0trUVHXJHIaIrqTttmfR6L84eUsFzkNERmDQZfAVq1ahStXrsDb2xshISGwtbVt97pKpTJKOEvm7miHUE9H5JTVIktdgXvDeEcJkSnKLavFvrPFANj4kMiSGFQAJSYmGjmGdYoOckNOWS1U6hssgIhM1Ob0HAgCMGGAFwZ484YFIkthUAG0YsUKY+ewStHBrvhMVYDMPK4DIjJFFXVN+OREAQBgwbi+IqchImMyaA0QAFRUVGDDhg1YunQpysvLAbRe+rp27ZrRwlm6toXQJ/MroNUJIqchol/bdkyN+mYtwnycMaafh9hxiMiIDJoBOnXqFOLj46FUKpGbm4sFCxbA3d0dn3/+OdRqNT7++GNj57RIA7yd4SS3QU1jCy5oqhHux61FiExFU4sOWw7nAgDmj+vLfmdEFsagGaCUlBTMmTMHly5dgkKh0D//wAMP4McffzRaOEsnk0oQFegKANwXjMjE7DlViOKqRvRxluPhSD+x4xCRkRlUAB0/fhz/93//d8vz/v7+0Gg03Q5lTaKDXAEAKq4DIjIZgiDgw4M5AIDZo0NgZ2PwagEiMlEG/auWy+Woqqq65fmLFy/Cy8ur26GsybDg1nVAnAEiMh0ZV67jXFEV7G1lmBnLTu1ElsigAujhhx/GX/7yFzQ3NwMAJBIJ1Go1Xn75ZUybNs2oAS1ddGBrAZR7vQ7XaxpFTkNEAPDhzcaHj8UEwNXBTuQ0RNQTDCqAVq1ahZqaGvTp0wf19fWYMGEC+vXrB2dnZ7zxxhvGzmjRlA626NfHCQCQpa4QNwwR4XJJNX64UAqJBJg3lo0PiSyVQXeBKZVK7N+/H+np6Th58iRqamoQHR2N+Ph4Y+ezCtFBrrhcUoNM9Q3Eh3uLHYfIqm081Lr2J36QN0I9HUVOQ0Q9xaACqM2YMWO4I7wRRAe54ZMTBVwITSSy6zWN+EzV2suMjQ+JLJtBl8AWLVqE995775bn165di+eee667maxOzM2F0KcKKtGi1Ymchsh6/ftIHppadBgaoMSIEDex4xBRDzKoAPrss886nPkZPXo0Pv30026Hsjb3eDnBRWGD+mYtzmuqxY5DZJUamrX4d0YeADY+JLIGBhVA169fh1KpvOV5FxcXlJWVdTuUtZFKJYi6uS0G9wUjEsfurGu4XtsEP6UC9w/xETsOEfUwgwqgfv36Ye/evbc8/80336BvX143N4S+ISL7ARH1OkEQsOHm4ue5Y0JhK2PjQyJLZ9Ai6JSUFDzzzDMoLS3Fb37zGwBAamoqVq1ahTVr1hgzn9WIYUNEIt
EcuFiKyyU1cJLbIGlkoNhxiKgXGFQAzZs3D42NjXjjjTfw+uuvAwBCQkLw/vvvY9asWUYNaC2iAl0hkQD55fUoqW5AH2fF3d9EREax8ea2F0kjAuGisBU5DRH1BoPneZ9++mkUFBSguLgYVVVVuHr1KoufbnBW2GJAH2cAgCqvQtwwRFbkbGEVDl0ug1QCzBkdInYcIuol3brQXVpaigsXLiA7O5uLn40gOtgVAJDFy2BEvaat8eH9Eb4IdHcQOQ0R9RaDCqDa2lrMmzcPvr6+GD9+PMaPHw9fX18kJyejrq7O2BmtRnQQ1wER9abiqgb872Rr48P53PaCyKoYVAClpKQgLS0NX375JSoqKlBRUYEvvvgCaWlpeOGFF4yd0WpE/6IhYlMLGyIS9bQth3PRrBUwPNgNw4LY+JDImhjcCHHjxo24//774eLiAhcXFzzwwAP48MMP2QixG/p6OsLVwRaNLTqcLaoSOw6RRatrasHWo2oAwPxxnP0hsjYGFUB1dXXw9r51084+ffrwElg3SCQSDAt0BQDuC0bUwz7NLEBlfTOC3B3w/8LZ+JDI2hhUAMXFxWHFihVoaGjQP1dfX48///nPiIuL6/Lx1q1bh5CQECgUCsTGxuLYsWN3HL9z506EhYVBoVAgIiICX3/99W3HPvXUU5BIJGbTn4j9gIh6nlYnYNPNxc/zxoRAJuW2F0TWxqACaM2aNUhPT0dAQAAmTZqESZMmITAwEIcPH8a7777bpWPt2LEDKSkpWLFiBVQqFSIjI5GQkICSkpIOxx8+fBgzZsxAcnIysrKykJiYiMTERJw5c+aWsbt27cKRI0fg5+dnyNcURdtC6Cx1hbhBiCzYd+eKkXu9Di4KG/x2OBsfElkjgwqgiIgIXLp0CStXrkRUVBSioqLw5ptv4tKlSxg8eHCXjvXOO+9gwYIFmDt3LsLDw/HBBx/AwcEBmzZt6nD8u+++i8mTJ+Oll17CoEGD8PrrryM6Ohpr165tN+7atWt49tlnsXXrVtjamk9js8hAV0glwLWKemgqG+7+BiLqsrbGh7+PDYaj3KB+sERk5rr8L7+5uRlhYWHYs2cPFixY0K0Pb2pqQmZmJpYuXap/TiqVIj4+HhkZGR2+JyMjAykpKe2eS0hIwO7du/W/1+l0eOKJJ/DSSy91uiBrbGxEY2Oj/vdVVeIsQnaU22CgjwvOFVVBpb6BByJ8RclBZEoaW7RYvf8Symoa7z74LppadDiWWw4bqYSND4msWJcLIFtb23Zrf7qjrKwMWq32lgXV3t7eOH/+fIfv0Wg0HY7XaDT637/11luwsbHBokWLOp1l5cqV+POf/9yF9D0nJti1tQDKYwFEBAD/OaLGB2lXjHrMhyP94KPkljNE1sqgud+FCxfirbfewoYNG2BjY1rTx5mZmXj33XehUqkgkXR+YePSpUvbzSxVVVUhMFCctQHRQW74zxE1MrkQmggtWp1+wfK06AD06+PU7WPKbaR4NNq/28chIvNlUPVy/PhxpKamYt++fYiIiICjo2O71z///PNOHcfT0xMymQzFxcXtni8uLoaPT8e3pfr4+Nxx/MGDB1FSUoKgoCD961qtFi+88ALWrFmD3NzcDo8rl8shl8s7lbuntS2E/ulaFRpbtJDbyERORCSevT9pcK2iHu6OdnjjkSFQ2PLfAxF1n0GLoF1dXTFt2jQkJCTAz88PSqWy3aOz7OzsEBMTg9TUVP1zOp0Oqampt72dPi4urt14ANi/f79+/BNPPIFTp04hOztb//Dz88NLL72Eb7/91oBv2/uCPRzg7miHJq0OZ66xISJZL0EQ8OHNBcuPjwpm8UNERmPQDNDmzZuNFiAlJQWzZ8/G8OHDMXLkSKxZswa1tbWYO3cuAGDWrFnw9/fHypUrAQCLFy/GhAkTsGrVKkyZMgXbt2/HiRMnsH79egCAh4cHPDw82n2Gra0tfHx8MHDgQKPl7kkSiQTRQW747lwxstQ39L
2BiKxNZt4NnMyvgJ2NFE+MChY7DhFZkC7NAOl0Orz11lsYM2YMRowYgSVLlqC+vr5bAZKSkvD3v/8dy5cvR1RUFLKzs7F37179Qme1Wo2ioiL9+NGjR2Pbtm1Yv349IiMj8emnn2L37t0YMmRIt3KYmrad4TPZEZqs2Iabsz+PRPnDy9k0LlETkWWQCIIgdHbw66+/jtdeew3x8fGwt7fHt99+ixkzZty2Z485q6qqglKpRGVlJVxcXHr9849cvY7frT8Cbxc5jiyd1KUF3USWIO96LSb+/QAEAdj3/HgM8HYWOxIRmYHO/vzu0gzQxx9/jH/+85/49ttvsXv3bnz55ZfYunUrdDruXG5skQGukEklKK5qRCEbIpIV2pyeC0EAJgzwYvFDREbXpQJIrVbjgQce0P8+Pj4eEokEhYWFRg9m7eztZAj3ba1cuTEqWZvKumZ8ciIfALBgXF+R0xCRJepSAdTS0gKFon3jMFtbWzQ3Nxs1FLWKDnIFwHVAZH22HVOjrkmLMB9njOnncfc3EBF1UZfuAhMEAXPmzGnXL6ehoQFPPfVUu15Ane0DRHcWHeyGLRl5yGJDRLIiTS06fHS4dfHz/HF9uf6NiHpElwqg2bNn3/Lc448/brQw1J6+IWJhFRqateyBQlbhq9OFKK5qhJezHA9FcisYIuoZXSqAjNn/h+4uwM0eXs5ylFY34vS1SowIcRc7ElGPEgQBH/7YOvszZ3QIu6ATUY8xqBM09Y7WhoiuALgOiKxDxtXrOFtUBXtbGWbGBt39DUREBmIBZOLaLoPxTjCyBm2NDx+LCYCrg53IaYjIkrEAMnFt22Co1BXoQs9KIrNzuaQG358vgUQCzBsbKnYcIrJwLIBM3BB/JWxlEpTVNKLgRve2HSEyZRsPtc7+xA/yRqin411GExF1DwsgE6ewlSHcTwmA64DIcl2vacTnqgIAbHxIRL2DBZAZaFsIrWI/ILJQ/zmiRmOLDkMDlBgR4iZ2HCKyAiyAzMDP64BYAJHlaWjW4t9HcgGw8SER9R4WQGag7U6wc0XVqGtqETkNkXF9kX0NZTVN8FMqcP8QH7HjEJGVYAFkBvxc7eHjooBWJ+BkfqXYcYiMRhAE/a3vc8eEwlbG/yQRUe/gf23MRHSwKwBeBiPLknaxFJdKauAkt0HSyECx4xCRFWEBZCbaLoNxY1SyJG2zP0kjAuGisBU5DRFZExZAZiKaDRHJwpwrqsKhy2WQSlr3/SIi6k0sgMzEYD8X2MmkKK9tQu71OrHjEHVb2+zP/RG+CHR3EDkNEVkbFkBmQm4jwxB/FwDcF4zMX0lVA/538hoAYD63vSAiEbAAMiPsB0SWYktGLpq1AoYHu2FYEBsfElHvYwFkRvQ7w6srxA1C1A11TS3YelQNAJg/jrM/RCQOFkBmpG0h9AVNFWoa2RCRzNNnmQWoqGtGkLsD/l84Gx8SkThYAJkRbxcF/F3toROAk/kVYsch6jKdTtDv+j5vTAhkUm57QUTiYAFkZvS3w3MhNJmh784VI/d6HVwUNvjtcDY+JCLxsAAyM9wZnszZhpuzP7+PDYaj3EbkNERkzVgAmZlfLoTW6dgQkczHqYIKHMsph41UwsaHRCQ6FkBmJtzPBQpbKSrrm3G1rFbsOESd1tb48KFIP/goFSKnISJrxwLIzNjKpBjq7wqAl8HIfFyrqMdXp4sAAMlsfEhEJoAFkBka1rYzPBdCk5nYcjgXWp2AuL4eGOKvFDsOERELIHP08zogFkBk+qobmvHfm40PF4zn7A8RmQYWQGaorQC6VFKDqoZmkdMQ3dknJwpQ3diCe7wcMXFAH7HjEBEBYAFklryc5Qhyd4AgANncFoNMWItWh003b31PHtsXUjY+JCITwQLITLX1A8rkOiAyYd/+VIxrFfVwd7TDo9H+YschItJjJzIzFR3sht3ZhfhMVYBrFfVix+kxPi4KLI7vD1sZa3VzIwgCPjx4FQDw+KhgKGxlIi
ciIvoZCyAzFRvqAQAouFGPTzMLRE7Ts/xc7fH72CCxY1AXqcvrkJ1fAVuZBE+MChY7DhFROyyAzNRAH2f864kYXC213GaI5zVV+CK7EBsOXcXvRgRy/YiZabs8O8RfCS9nuchpiIjaYwFkxhIG+4gdoUfVNLbg+3MluFpaiwMXS/CbMG+xI1EXtLVpaLtrkYjIlHBhBZksJ7kNZty89PXhjzkip6GuUuVVAABiglkAEZHpYQFEJm3O6BDIpBJkXL2OM9cqxY5DnVTb2ILzmioAnAEiItPEAohMmp+rPaZE+AIANh7iLJC5OJlfAZ0A+CkV3PiUiEwSCyAyefPHtW6f8OXJQhRVWu4t/5akbf3PMF7+IiITxQKITN7QAFeMDHVHi07AlsN5YsehTlDd7FAew8tfRGSiWACRWVgwri8AYNvRPNQ2toichu5EEISf7wDjDBARmSgWQGQWJoX1QainI6oaWrDzRL7YcegOrpbVoqKuGXIbKcJ9XcSOQ0TUIRZAZBakUgnmjW1dC7QpPRdanSByIrod1c0GiBH+StjZ8D8xRGSa+F8nMhuPRQfA1cEW6vI67D+rETsO3YZ+/Q8vfxGRCWMBRGbD3k6Gx2Nb95T68CBviTdVWW13gHEBNBGZMBZAZFZmxQXDTiZFZt4N/UJbMh1VDc24UFwNAIgOdhU3DBHRHbAAIrPSx0WBh6P8AAAbOQtkck7mV0AQgAA3e/RxZgNEIjJdLIDI7CTfXAz9zZki5JfXiZyGfon7fxGRuWABRGZnkK8LxvX3hE4ANqfnih2HfoE7wBORuTCJAmjdunUICQmBQqFAbGwsjh07dsfxO3fuRFhYGBQKBSIiIvD111+3e/21115DWFgYHB0d4ebmhvj4eBw9erQnvwL1svk3GyPuOK5GVUOzyGkIAHQ6gQUQEZkN0QugHTt2ICUlBStWrIBKpUJkZCQSEhJQUlLS4fjDhw9jxowZSE5ORlZWFhITE5GYmIgzZ87oxwwYMABr167F6dOncejQIYSEhOC+++5DaWlpb30t6mHj+3tigLcTapu02H5MLXYcAnCltAbVDS1Q2EoR5ussdhwiojuSCIIgake52NhYjBgxAmvXrgUA6HQ6BAYG4tlnn8WSJUtuGZ+UlITa2lrs2bNH/9yoUaMQFRWFDz74oMPPqKqqglKpxHfffYdJkyZ1KlfbeyorK+Hiwm62puiT4/n442en4KdUIO2P98JWJno9b9V2HFfj5c9OIzbUHTv+L07sOERkpTr781vUnxhNTU3IzMxEfHy8/jmpVIr4+HhkZGR0+J6MjIx24wEgISHhtuObmpqwfv16KJVKREZG3jZLY2Mjqqqq2j3ItD0c5QdPJzsUVjbg69NFYsexem0LoLn/FxGZA1ELoLKyMmi1Wnh7e7d73tvbGxpNx51+NRpNp8bv2bMHTk5OUCgUWL16Nfbv3w9PT8/bZlm5ciWUSqX+ERgYaOC3ot6isJVhVlwIAGDjoRyIPJlp9TK5/oeIzIjFXjO49957kZ2djcOHD2Py5MmYPn36bdcVAcDSpUtRWVmpf+Tnc8NNczAzNghyGylOFVTiWE652HGsVmVdMy6X1AAAooNcxQ1DRNQJohZAnp6ekMlkKC4ubvd8cXExfHx8OnyPj49Pp8Y7OjqiX79+GDVqFDZu3AgbGxts3LjxtlnkcjlcXFzaPcj0eTjJMS0mAACw4RAbI4olK7919ifEwwEeTnKR0xAR3Z2oBZCdnR1iYmKQmpqqf06n0yE1NRVxcR0vooyLi2s3HgD2799/2/G/PG5jY2P3Q5PJmTemtTHid+eKkVNWK3Ia69S2AzwvfxGRuRD9ElhKSgo+/PBDbNmyBefOncPTTz+N2tpazJ07FwAwa9YsLF26VD9+8eLF2Lt3L1atWoXz58/jtddew4kTJ/DMM88AAGpra/HKK6/gyJEjyMvLQ2ZmJubNm4dr167ht7/9rSjfkXpWvz5OmBTWB4IAbOIskC
jadoAfxgXQRGQmbMQOkJSUhNLSUixfvhwajQZRUVHYu3evfqGzWq2GVPpznTZ69Ghs27YNr776Kl555RX0798fu3fvxpAhQwAAMpkM58+fx5YtW1BWVgYPDw+MGDECBw8exODBg0X5jtTzkseFIvV8CXZm5iPl/w2Am6Od2JGshlYnIDu/AgAQwxkgIjITovcBMlXsA2ReBEHAg/84hJ8Kq/BSwkAsvLef2JGsxnlNFSavOQhHOxlOvZYAmVQidiQismJm0QeIyFgkEgnmj2tdC/TR4Vw0tmhFTmQ9Mm+u/4kMdGXxQ0RmgwUQWYwpEX7wcVGgtLoRX55kY8Teom+AyMtfRGRGWACRxbCzkWL26BAAwIaDV9kYsZdk3WyAGMMF0ERkRlgAkUX5/cggONjJcF5TjfTL18WOY/HKa5tw9WbrgWFsgEhEZoQFEFkUpYMtpg9v3cbkw4NXRU5j+dpmf/p6OcLVgXfeEZH5YAFEFmfemFBIJEDaxVJcLK4WO45FU3H/LyIyUyyAyOIEeTggIbx1a5SNB9kYsSe1LYDm+h8iMjcsgMgiLRjfekv8rqxrKK3mFig9oUWrw8mCCgCcASIi88MCiCxSdJAbogJd0aTV4d9H8sSOY5HOa6pR16SFs9wG/fs4iR2HiKhLWACRRZJIJFgwri8A4D9H8tDQzMaIxta2ADoqyBVSNkAkIjPDAogsVsJgb/i72qO8tgmfq66JHcfitG2AystfRGSOWACRxbKRSTFvbOtaoA2HrkKnY2NEY9LfAcYF0ERkhlgAkUWbPjwAznIbXC2txYGLJWLHsRhlNY3Iu14HAIgKdBU3DBGRAWzEDkDUk5wVtpgRG4T1P17F63vO4evTGrEjtWMrk2LemBD093YWO0qXqG5ugNq/jxOU9rYipyEi6joWQGTx5owOweb0HOSU1SLn5rYNpuRySTV2PjVa7Bhd0rb+h/1/iMhcsQAii+fnao+P58UiO79C7Cjt6AQBa767iOO5N5CdX2FWl5LYAZqIzB0LILIKcfd4IO4eD7Fj3OJqaS0+UxVgw8GrWPv7aLHjdEqzVodTbQ0Qg11FzUJEZCgugiYSUfLNu9S+OaNBwY06kdN0zrmiKjQ066C0t0VfTzZAJCLzxAKISEThfi4Y288TWp2Azem5YsfplLYF0MPYAJGIzBgLICKRJY9rnQXacTwfVQ3NIqe5OzZAJCJLwAKISGQTB3ihfx8n1DS2YMexfLHj3FVmHhdAE5H5YwFEJDKJRKJfC7Q5PQctWp3IiW6vpKoB1yrqIZUAkYFKseMQERmMBRCRCUgc5g8PRzsUVjbg6zOm1azxl9pufx/g7QxnBRsgEpH5YgFEZAIUtjI8ERcMANhw8CoEwTT3LdNf/mIDRCIycyyAiEzEE6OCYWcjxamCShzPvSF2nA5xATQRWQoWQEQmwsNJjmnR/gBaZ4FMTVOLDqevVQLgFhhEZP5YABGZkLbF0PvPFZvcvmU/FVaiqUUHd0c7hHg4iB2HiKhbWAARmZB+fZxx70AvCELrHWGmpG39z7BAV0gkbIBIROaNBRCRiVkwri8AYOeJAlTUNYmc5mdZbet/ePmLiCwACyAiExN3jwcG+bqgvlmLrUfVYsfR4w7wRGRJWAARmRiJRIIFN7fH2HI4F00t4jdGLKyoR1FlA2RSCRsgEpFFYAFEZIIeHOoHbxc5Sqob8eXJQrHj6Gd/wnyc4WBnI3IaIqLuYwFEZILsbKSYPToEAPChCTRGVOVVAODlLyKyHCyAiEzUzJHBsLeV4bymGoevXBc1S9sMEPv/EJGlYAFEZKKUDraYPjwAQOsskFgamrX4qbC1ASJngIjIUrAAIjJh88aGQiIBDlwoxaXialEynLlWiWatAE8nOwS624uSgYjI2FgAEZmwYA9H3BfuDQDYeEicxohtl7+GBbmxASIRWQwWQEQmrq0x4udZ11BW09jrn9+2AJrrf4jIkrAAIjJxMcFuiAx0RVOLDv
/OyOvVzxYEAZlsgEhEFogFEJGJ+2VjxP8cyUNDs7bXPrvgRj1KqxthI5VgaAAbIBKR5WABRGQGJg/2gb+rPa7XNmFX1rVe+9y29T/hfi5Q2Mp67XOJiHoaCyAiM2Ajk2LumBAArYuhdbreaYyo3wCVl7+IyMKwACIyE0kjAuEst8HlkhqkXSztlc/Ub4DKBdBEZGFYABGZCWeFLX43MhAAsOFQzzdGrG/S4mxhFQAgOsi1xz+PiKg3sQAiMiNzxoRCJpUg/fJ1fXfmnnKqoAItOgHeLnL4u7IBIhFZFhZARGbE39UeD0T4Auj5xoiqX6z/YQNEIrI0LICIzMz8sa23xH95shDFVQ099jkq9v8hIgvGAojIzEQGumJkiDuatQK2HM7tkc8QBAGqvLYF0K498hlERGJiAURkhpJvNkbcelSNuqYWox9fXV6H67VNsJNJMdiPDRCJyPKwACIyQ/GDvBHi4YDK+mbsPFFg9OO3Xf4a7M8GiERkmVgAEZkhmVSCeTfXAm1Kz4HWyI0RM/O4/oeILBsLICIz9VhMAJT2tsi7Xof9Z4uNeuy2HeBZABGRpTKJAmjdunUICQmBQqFAbGwsjh07dsfxO3fuRFhYGBQKBSIiIvD111/rX2tubsbLL7+MiIgIODo6ws/PD7NmzUJhYWFPfw2iXuVgZ4OZsUEAgI1GbIxY29iC85qbDRC5AJqILJToBdCOHTuQkpKCFStWQKVSITIyEgkJCSgpKelw/OHDhzFjxgwkJycjKysLiYmJSExMxJkzZwAAdXV1UKlUWLZsGVQqFT7//HNcuHABDz/8cG9+LaJeMXt0CGxlEhzPvYHs/AqjHPNkQQV0AuCnVMBXyQaIRGSZJIIg9M6uircRGxuLESNGYO3atQAAnU6HwMBAPPvss1iyZMkt45OSklBbW4s9e/bonxs1ahSioqLwwQcfdPgZx48fx8iRI5GXl4egoKBO5aqqqoJSqURlZSVcXFwM+GZEvSPlk2x8rrqGB4f6Yu3vo7t9vLXfX8Lf913ElKG+WGeE4xER9abO/vy26cVMt2hqakJmZiaWLl2qf04qlSI+Ph4ZGRkdvicjIwMpKSntnktISMDu3btv+zmVlZWQSCRwdXW97ZjGxkY0Njbqf19VVdW5L0Eksvlj++Jz1TV8c0aDFz45ie42bT6WUw6A63+IyLKJWgCVlZVBq9XC29u73fPe3t44f/58h+/RaDQdjtdoNB2Ob2howMsvv4wZM2bcsRJcuXIl/vznP3fxGxCJL9zPBWP7eeLQ5TJ8pjLeLfGxoe5GOxYRkakRtQDqac3NzZg+fToEQcD7779/x7FLly5tN7NUVVWFwMDAno5IZBSrpkfiy5OFaNYa54p2iIcDhvizASIRWS5RCyBPT0/IZDIUF7e/hbe4uBg+Pj4dvsfHx6dT49uKn7y8PHz//fd3Xccjl8shl8sN+BZE4vN2UWD+uL5ixyAiMhui3gVmZ2eHmJgYpKam6p/T6XRITU1FXFxch++Ji4trNx4A9u/f3258W/Fz6dIlfPfdd/Dw8OiZL0BERERmSfRLYCkpKZg9ezaGDx+OkSNHYs2aNaitrcXcuXMBALNmzYK/vz9WrlwJAFi8eDEmTJiAVatWYcqUKdi+fTtOnDiB9evXA2gtfh577DGoVCrs2bMHWq1Wvz7I3d0ddnZ24nxRIiIiMhmiF0BJSUkoLS3F8uXLodFoEBUVhb179+oXOqvVakilP09UjR49Gtu2bcOrr76KV155Bf3798fu3bsxZMgQAMC1a9fwv//9DwAQFRXV7rN++OEHTJw4sVe+FxEREZku0fsAmSr2ASIiIjI/nf35LXonaCIiIqLexgKIiIiIrA4LICIiIrI6LICIiIjI6rAAIiIiIqvDAoiIiIisDgsgIiIisjosgIiIiMjqsAAiIiIiqyP6Vhimqq1BdlVVlchJiIiIqLPafm7fbaMLFkC3UV1dDQAIDAwUOQkRERF1VXV1NZ
RK5W1f515gt6HT6VBYWAhnZ2dUV1cjMDAQ+fn53BesF1VVVfG8i4DnXRw87+LgeRdHT553QRBQXV0NPz+/dpup/xpngG5DKpUiICAAACCRSAAALi4u/AciAp53cfC8i4PnXRw87+LoqfN+p5mfNlwETURERFaHBRARERFZHRZAnSCXy7FixQrI5XKxo1gVnndx8LyLg+ddHDzv4jCF885F0ERERGR1OANEREREVocFEBEREVkdFkBERERkdVgAERERkdVhAdQJ69atQ0hICBQKBWJjY3Hs2DGxI1mUH3/8EQ899BD8/PwgkUiwe/fudq8LgoDly5fD19cX9vb2iI+Px6VLl8QJayFWrlyJESNGwNnZGX369EFiYiIuXLjQbkxDQwMWLlwIDw8PODk5Ydq0aSguLhYpsWV4//33MXToUH3zt7i4OHzzzTf613nOe8ebb74JiUSC5557Tv8cz73xvfbaa5BIJO0eYWFh+tfFPucsgO5ix44dSElJwYoVK6BSqRAZGYmEhASUlJSIHc1i1NbWIjIyEuvWrevw9b/97W9477338MEHH+Do0aNwdHREQkICGhoaejmp5UhLS8PChQtx5MgR7N+/H83NzbjvvvtQW1urH/P888/jyy+/xM6dO5GWlobCwkI8+uijIqY2fwEBAXjzzTeRmZmJEydO4De/+Q2mTp2Kn376CQDPeW84fvw4/vWvf2Ho0KHtnue57xmDBw9GUVGR/nHo0CH9a6Kfc4HuaOTIkcLChQv1v9dqtYKfn5+wcuVKEVNZLgDCrl279L/X6XSCj4+P8Pbbb+ufq6ioEORyufDf//5XhISWqaSkRAAgpKWlCYLQeo5tbW2FnTt36secO3dOACBkZGSIFdMiubm5CRs2bOA57wXV1dVC//79hf379wsTJkwQFi9eLAgC/773lBUrVgiRkZEdvmYK55wzQHfQ1NSEzMxMxMfH65+TSqWIj49HRkaGiMmsR05ODjQaTbs/A6VSidjYWP4ZGFFlZSUAwN3dHQCQmZmJ5ubmduc9LCwMQUFBPO9GotVqsX37dtTW1iIuLo7nvBcsXLgQU6ZMaXeOAf5970mXLl2Cn58f+vbti5kzZ0KtVgMwjXPOzVDvoKysDFqtFt7e3u2e9/b2xvnz50VKZV00Gg0AdPhn0PYadY9Op8Nzzz2HMWPGYMiQIQBaz7udnR1cXV3bjeV5777Tp08jLi4ODQ0NcHJywq5duxAeHo7s7Gye8x60fft2qFQqHD9+/JbX+Pe9Z8TGxuKjjz7CwIEDUVRUhD//+c8YN24czpw5YxLnnAUQkZVbuHAhzpw50+7aPPWcgQMHIjs7G5WVlfj0008xe/ZspKWliR3LouXn52Px4sXYv38/FAqF2HGsxv3336//9dChQxEbG4vg4GB88sknsLe3FzFZK14CuwNPT0/IZLJbVqUXFxfDx8dHpFTWpe0888+gZzzzzDPYs2cPfvjhBwQEBOif9/HxQVNTEyoqKtqN53nvPjs7O/Tr1w8xMTFYuXIlIiMj8e677/Kc96DMzEyUlJQgOjoaNjY2sLGxQVpaGt577z3Y2NjA29ub574XuLq6YsCAAbh8+bJJ/H1nAXQHdnZ2iImJQWpqqv45nU6H1NRUxMXFiZjMeoSGhsLHx6fdn0FVVRWOHj3KP4NuEAQBzzzzDHbt2oXvv/8eoaGh7V6PiYmBra1tu/N+4cIFqNVqnncj0+l0aGxs5DnvQZMmTcLp06eRnZ2tfwwfPhwzZ87U/5rnvufV1NTgypUr8PX1NY2/772y1NqMbd++XZDL5cJHH30knD17VnjyyScFV1dXQaPRiB3NYlRXVwtZWVlCVlaWAEB45513hKysLCEvL08QBEF48803BVdXV+GLL74QTp06JUydOlUIDQ0V6uvrRU5uvp5++mlBqVQKBw4cEIqKivSPuro6/ZinnnpKCAoKEr7//nvhxIkTQlxcnBAXFydiavO3ZMkSIS0tTcjJyR
FOnTolLFmyRJBIJMK+ffsEQeA5702/vAtMEHjue8ILL7wgHDhwQMjJyRHS09OF+Ph4wdPTUygpKREEQfxzzgKoE/7xj38IQUFBgp2dnTBy5EjhyJEjYkeyKD/88IMA4JbH7NmzBUFovRV+2bJlgre3tyCXy4VJkyYJFy5cEDe0mevofAMQNm/erB9TX18v/OEPfxDc3NwEBwcH4ZFHHhGKiorEC20B5s2bJwQHBwt2dnaCl5eXMGnSJH3xIwg8573p1wUQz73xJSUlCb6+voKdnZ3g7+8vJCUlCZcvX9a/LvY5lwiCIPTOXBMRERGRaeAaICIiIrI6LICIiIjI6rAAIiIiIqvDAoiIiIisDgsgIiIisjosgIiIiMjqsAAiIiIiq8MCiIjIBF2+fBl//etfUV9fL3YUIovEAoiIDCKRSLB79+5uHyc3NxcSiQTZ2dndPlZv+eijj+Dq6trt44SEhGDNmjW3PN/Q0IDHHnsMfn5+JrFrNpElshE7ABGZpjlz5qCiouK2RU5RURHc3Nx6N5SJSEpKwgMPPNBjx3/22WeRmJiIOXPm9NhnEFk7FkBEZBAfHx+xI4jG3t6+R2dmPvzwwx47NhG14iUwIjLIry+BFRQUYMaMGXB3d4ejoyOGDx+Oo0ePAmi91CORSG55/NL58+cxevRoKBQKDBkyBGlpafrXtFotkpOTERoaCnt7ewwcOBDvvvvuHfPduHEDM2fOhJeXF+zt7dG/f39s3rxZ//rLL7+MAQMGwMHBAX379sWyZcvQ3Nysf/3kyZO499574ezsDBcXF8TExODEiRMAbr0EduXKFUydOhXe3t5wcnLCiBEj8N1337XLU1JSgoceegj29vYIDQ3F1q1bb8msVqsxdepUODk5wcXFBdOnT0dxcTEAoLKyEjKZTJ9Bp9PB3d0do0aN0r//P//5DwIDA+94XoioFWeAiKjbampqMGHCBPj7++N///sffHx8oFKpoNPpAADHjx+HVqsF0FrMPPbYY7C1tW13jJdeeglr1qxBeHg43nnnHTz00EPIycmBh4cHdDodAgICsHPnTnh4eODw4cN48skn4evri+nTp3eYadmyZTh79iy++eYbeHp64vLly+0WFDs7O+Ojjz6Cn58fTp8+jQULFsDZ2Rl//OMfAQAzZ87EsGHD8P7770MmkyE7O/uWzL/8/g888ADeeOMNyOVyfPzxx3jooYdw4cIFBAUFAWi9pFhYWIgffvgBtra2WLRoEUpKSvTH0Ol0+uInLS0NLS0tWLhwIZKSknDgwAEolUpERUXhwIEDGD58OE6fPg2JRIKsrCzU1NTo3zdhwgQD/xSJrEyv7TtPRGZl9uzZwtSpU2/7OgBh165dgiAIwr/+9S/B2dlZuH79+l2Pu2jRIiE4OFgoKSkRBEEQcnJyBADCm2++qR/T3NwsBAQECG+99dZtj7Nw4UJh2rRpt339oYceEubOnXvXPG3efvttISYmRv97Z2dn4aOPPupw7ObNmwWlUnnH4w0ePFj4xz/+IQiCIFy4cEEAIBw7dkz/+rlz5wQAwurVqwVBEIR9+/YJMplMUKvV+jE//fRTu/elpKQIU6ZMEQRBENasWSMkJSUJkZGRwjfffCMIgiD069dPWL9+fae/M5E14yUwIuq27OxsDBs2DO7u7ncct379emzcuBH/+9//4OXl1e61uLg4/a9tbGwwfPhwnDt3Tv/cunXrEBMTAy8vLzg5OWH9+vVQq9W3/aynn34a27dvR1RUFP74xz/i8OHD7V7fsWMHxowZAx8fHzg5OeHVV19td7yUlBTMnz8f8fHxePPNN3HlypXbflZNTQ1efPFFDBo0CK6urnBycsK5c+f0xzt37hxsbGwQExOjf09YWFi7y2jnzp1DYGBgu0tY4eHhcHV11Z+HCRMm4NChQ9BqtUhLS8PEiRMxceJEHDhwAIWFhbh8+TImTpx425xE9DMWQETUbZ1ZEPzDDz/g2Wefxccff4yhQ4
d26fjbt2/Hiy++iOTkZOzbtw/Z2dmYO3cumpqabvue+++/H3l5eXj++edRWFiISZMm4cUXXwQAZGRkYObMmXjggQewZ88eZGVl4U9/+lO747322mv46aefMGXKFHz//fcIDw/Hrl27OvysF198Ebt27cJf//pXHDx4ENnZ2YiIiLhjPkOMHz8e1dXVUKlU+PHHH9sVQGlpafDz80P//v2N+plElooFEBF129ChQ5GdnY3y8vIOX798+TIee+wxvPLKK3j00Uc7HHPkyBH9r1taWpCZmYlBgwYBANLT0zF69Gj84Q9/wLBhw9CvX787zsi08fLywuzZs/Gf//wHa9aswfr16wEAhw8fRnBwMP70pz9h+PDh6N+/P/Ly8m55/4ABA/D8889j3759ePTRR9stov6l9PR0zJkzB4888ggiIiLg4+OD3Nxc/ethYWH679TmwoULqKio0P9+0KBByM/PR35+vv65s2fPoqKiAuHh4QAAV1dXDB06FGvXroWtrS3CwsIwfvx4ZGVlYc+ePVz/Q9QFLICI6LYqKyuRnZ3d7vHLH9BtZsyYAR8fHyQmJiI9PR1Xr17FZ599hoyMDNTX1+Ohhx7CsGHD8OSTT0Kj0egfv7Ru3Trs2rUL58+fx8KFC3Hjxg3MmzcPANC/f3+cOHEC3377LS5evIhly5bh+PHjd8y+fPlyfPHFF7h8+TJ++ukn7NmzR19Q9e/fH2q1Gtu3b8eVK1fw3nvvtZvdqa+vxzPPPIMDBw4gLy8P6enpOH78uP79v9a/f398/vnnyM7OxsmTJ/H73/9evwAcAAYOHIjJkyfj//7v/3D06FFkZmZi/vz57WbO4uPjERERgZkzZ0KlUuHYsWOYNWsWJkyYgOHDh+vHTZw4EVu3btUXO+7u7hg0aBB27NjBAoioK8RehEREpmn27NkCgFseycnJgiC0XwQtCIKQm5srTJs2TXBxcREcHByE4cOHC0ePHtUvcu7oIQg/L4Letm2bMHLkSMHOzk4IDw8Xvv/+e/2xGxoahDlz5ghKpVJwdXUVnn76aWHJkiVCZGTkbfO//vrrwqBBgwR7e3vB3d1dmDp1qnD16lX96y+99JLg4eEhODk5CUlJScLq1av1C5sbGxuF3/3ud0JgYKBgZ2cn+Pn5Cc8884xQX18vCMKti6BzcnKEe++9V7C3txcCAwOFtWvXChMmTBAWL16sH1NUVCRMmTJFkMvlQlBQkPDxxx8LwcHB+kXQgiAIeXl5wsMPPyw4OjoKzs7Owm9/+1tBo9G0+167du0SAAjvv/++/rnFixcLAITz58/f8c+UiH4mEQRBEKHuIiIyKY8++iiWLFmCkSNHih2FiHoBCyAisnoVFRUIDg7GjRs3AABSKVcHEFk6/isnIqvn4uKCkSNHIiAgAO+9957YcYioF3AGiIiIiKwOZ4CIiIjI6rAAIiIiIqvDAoiIiIisDgsgIiIisjosgIiIiMjqsAAiIiIiq8MCiIiIiKwOCyAiIiKyOiyAiIiIyOr8f0Nm4EXiTlmYAAAAAElFTkSuQmCC",
|
||
"text/plain": [
|
||
"<Figure size 640x480 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"import matplotlib.pyplot as plt\n",
|
||
"# changing to misclassification error\n",
|
||
"MSE = [1 - x for x in cv_scores]\n",
|
||
"\n",
|
||
"# plot misclassification error vs k\n",
|
||
"plt.plot(neighbors, MSE)\n",
|
||
"plt.xlabel('Liczba sąsiadów')\n",
|
||
"plt.ylabel('Procent błędów')\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## TF IDF Vectorizer\n",
|
||
"\n",
|
||
"Czasami, żeby wytrenować model nie da się zastosować bezpośrednio danego typu danych, ponieważ najczęściej wejściem do algorytmu ML jest wektor, macierz lub tensor.\n",
|
||
"Dane tekstowe musimy również przekształcić do wektorów. Przydatny w tym przypadku jest TF IDF Vectorizer.\n",
|
||
"Oto przyład z dokumentacji jak można z niego skorzystać (https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html)\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 57,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"(4, 9)\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
|
||
"corpus = [\n",
|
||
" 'This is the first document.',\n",
|
||
" 'This document is the second document.',\n",
|
||
" 'And this is the third one.',\n",
|
||
" 'Is this the first document?',\n",
|
||
"]\n",
|
||
"vectorizer = TfidfVectorizer()\n",
|
||
"X = vectorizer.fit_transform(corpus)\n",
|
||
"vectorizer.get_feature_names_out()\n",
|
||
"print(X.shape)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array(['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third',\n",
|
||
" 'this'], dtype=object)"
|
||
]
|
||
},
|
||
"execution_count": 58,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"vectorizer.get_feature_names_out()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"<4x9 sparse matrix of type '<class 'numpy.float64'>'\n",
|
||
"\twith 21 stored elements in Compressed Sparse Row format>"
|
||
]
|
||
},
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"X"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 60,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"matrix([[0. , 0.46979139, 0.58028582, 0.38408524, 0. ,\n",
|
||
" 0. , 0.38408524, 0. , 0.38408524],\n",
|
||
" [0. , 0.6876236 , 0. , 0.28108867, 0. ,\n",
|
||
" 0.53864762, 0.28108867, 0. , 0.28108867],\n",
|
||
" [0.51184851, 0. , 0. , 0.26710379, 0.51184851,\n",
|
||
" 0. , 0.26710379, 0.51184851, 0.26710379],\n",
|
||
" [0. , 0.46979139, 0.58028582, 0.38408524, 0. ,\n",
|
||
" 0. , 0.38408524, 0. , 0.38408524]])"
|
||
]
|
||
},
|
||
"execution_count": 60,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"X.todense()"
|
||
]
|
||
},
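{
"cell_type": "markdown",
"metadata": {},
"source": [
"The values in the matrix above can be reproduced by hand. The sketch below is a minimal pure-Python illustration, not the library's implementation. It assumes the documented scikit-learn defaults: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. The helper names `tokenize` and `tfidf_row` are hypothetical, and the tokenizer is only an ASCII approximation of the default token pattern.\n",
"\n",
"```python\n",
"import math\n",
"import re\n",
"from collections import Counter\n",
"\n",
"corpus = [\n",
"    'This is the first document.',\n",
"    'This document is the second document.',\n",
"    'And this is the third one.',\n",
"    'Is this the first document?',\n",
"]\n",
"\n",
"def tokenize(text):\n",
"    # Lowercase and keep runs of at least two word characters (ASCII approximation).\n",
"    return re.findall('[a-z0-9_]{2,}', text.lower())\n",
"\n",
"docs = [tokenize(text) for text in corpus]\n",
"vocab = sorted({token for doc in docs for token in doc})\n",
"n_docs = len(docs)\n",
"\n",
"# Document frequency: in how many documents each term occurs.\n",
"df = {term: sum(term in doc for doc in docs) for term in vocab}\n",
"# Smoothed inverse document frequency (assumed library default).\n",
"idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}\n",
"\n",
"def tfidf_row(doc):\n",
"    # Weight each term count by its IDF, then scale the row to unit L2 norm.\n",
"    counts = Counter(doc)\n",
"    weights = [counts[term] * idf[term] for term in vocab]\n",
"    norm = math.sqrt(sum(w * w for w in weights))\n",
"    return [w / norm for w in weights]\n",
"\n",
"rows = [tfidf_row(doc) for doc in docs]\n",
"print(vocab)\n",
"print([round(value, 4) for value in rows[0]])\n",
"```\n",
"\n",
"Terms that occur in every document (`is`, `the`, `this`) get the lowest IDF, so after normalization they weigh less than rarer terms such as `first` or `second`."
]
},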
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Na podstawie tych danych możemy wytrenowąc model regresji logistycznej. Jest to model regresji liniowej z dodatkową nałożoną funkcją logistyczną:\n",
|
||
" ( https://en.wikipedia.org/wiki/Logistic_function )\n",
|
||
" \n",
|
||
" \n",
|
||
"![Przykład 1](./logistic.png)\n",
|
||
"\n",
|
||
"\n",
|
||
"Dzięki wyjściu modelu zawsze pomiędzy 0, a 1 można traktować wynik jako prawdopodobieństwo\n"
|
||
]
|
||
},
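{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea concrete, here is a minimal sketch of a linear score passed through the logistic function. The weights, the bias and the helper name `predict_probability` are hypothetical values chosen for illustration, not parameters learned by any model in this notebook.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def sigmoid(z):\n",
"    # Logistic function: maps any real number into the interval (0, 1).\n",
"    return 1.0 / (1.0 + math.exp(-z))\n",
"\n",
"def predict_probability(features, weights, bias):\n",
"    # Linear regression part: weighted sum of the features plus a bias.\n",
"    score = bias + sum(w * x for w, x in zip(weights, features))\n",
"    # Logistic part: squash the score into a probability-like value.\n",
"    return sigmoid(score)\n",
"\n",
"weights = [0.8, -0.4]  # hypothetical coefficients\n",
"bias = 0.1             # hypothetical intercept\n",
"\n",
"print(sigmoid(0.0))\n",
"print(predict_probability([1.0, 2.0], weights, bias))\n",
"```\n",
"\n",
"A score of 0 corresponds to a probability of 0.5, so the sign of the linear score decides which of the two classes is predicted."
]
},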
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 61,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.linear_model import LogisticRegression"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 62,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([0, 0, 1, 0])"
|
||
]
|
||
},
|
||
"execution_count": 62,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"y = [0,0,1,1]\n",
|
||
"model = LogisticRegression()\n",
|
||
"model.fit(X, y)\n",
|
||
"model.predict(X)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 63,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"array([[0.51514316, 0.48485684],\n",
|
||
" [0.56428483, 0.43571517],\n",
|
||
" [0.40543928, 0.59456072],\n",
|
||
" [0.51514316, 0.48485684]])"
|
||
]
|
||
},
|
||
"execution_count": 63,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"model.predict_proba(X)"
|
||
]
|
||
},
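{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two outputs above are consistent with each other: for each document the predicted class is the one with the highest probability. The short sketch below checks this using the probabilities printed above (copied by hand, so they are rounded) and also computes the accuracy on the four training documents.\n",
"\n",
"```python\n",
"# Probabilities copied from the predict_proba output above (columns: class 0, class 1).\n",
"proba = [\n",
"    [0.51514316, 0.48485684],\n",
"    [0.56428483, 0.43571517],\n",
"    [0.40543928, 0.59456072],\n",
"    [0.51514316, 0.48485684],\n",
"]\n",
"classes = [0, 1]\n",
"y = [0, 0, 1, 1]\n",
"\n",
"# Pick the class with the largest probability in each row.\n",
"predictions = [classes[row.index(max(row))] for row in proba]\n",
"# Accuracy: fraction of documents whose prediction matches the label.\n",
"accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)\n",
"\n",
"print(predictions)\n",
"print(accuracy)\n",
"```\n",
"\n",
"The first and the fourth document contain exactly the same words, so their TF-IDF rows are identical and the model cannot separate them even though their labels differ."
]
},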
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Sieci neuronowe\n",
|
||
"\n",
|
||
"Warto zauważyć, że sieci neuronowe w najprostszym wariancie to tak naprawdę złożenie wielu funkcji regresji logistycznej ze sobą, gdzie wejściem jednego modelu regresji logistycznej jest wyjście poprzedniej. W przypadku danych tekstowych zazwyczaj jest wybierana wtedy inna reprezentacja danych niż TF IDF, ponieważ TF IDF nie uwzględnia kolejności słów"
|
||
]
|
||
},
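{
"cell_type": "markdown",
"metadata": {},
"source": [
"The composition described above can be sketched in a few lines of pure Python. This is only an illustration of the forward pass under simplifying assumptions (sigmoid activations everywhere, no training); all weights are hypothetical and the helper name `logistic_layer` is made up for this example.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + math.exp(-z))\n",
"\n",
"def logistic_layer(inputs, weights, biases):\n",
"    # Each row of weights together with its bias is one logistic regression unit.\n",
"    return [\n",
"        sigmoid(b + sum(w * x for w, x in zip(row, inputs)))\n",
"        for row, b in zip(weights, biases)\n",
"    ]\n",
"\n",
"x = [0.5, -1.0, 2.0]                                     # input features\n",
"hidden_weights = [[0.2, -0.3, 0.1], [-0.5, 0.4, 0.3]]  # two hidden units\n",
"hidden_biases = [0.0, 0.1]\n",
"output_weights = [[1.5, -2.0]]                          # one output unit\n",
"output_biases = [0.2]\n",
"\n",
"# The outputs of the first layer become the inputs of the second layer.\n",
"hidden = logistic_layer(x, hidden_weights, hidden_biases)\n",
"output = logistic_layer(hidden, output_weights, output_biases)\n",
"print(hidden, output)\n",
"```\n",
"\n",
"Real networks usually use other activation functions in the hidden layers and learn the weights with gradient-based optimization; the stacking of layers is the part this sketch is meant to show."
]
},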
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Standard Scaler\n",
|
||
"\n",
|
||
"**Zadanie 7**\n",
|
||
"\n",
|
||
"\n",
|
||
"Sprawdź dokumentację https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html\n",
|
||
"\n",
|
||
"KNN jest wrażliwy na liniowe skalowanie danych (w przeciwieństwie do modeli bazujących na regresji, gdyż współczynniki liniowe rekompensują skalowanie liniowe).\n",
|
||
"\n",
|
||
"Wytrenuj dowolny model KNN na cechach pozyskanych ze StandardScaler. Pamiętaj, żeby wyskalować zarówno dane ze zbioru test jak i train.\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"Zauważ, że scaler ma podobne API (fit, transform) jak TF IDF Vectorier"
|
||
]
|
||
},
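{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hint for the exercise, the sketch below shows what standardization does, written in pure Python rather than with the library. It assumes the usual definition `z = (x - mean) / std` with the population standard deviation, computed per column on the train set only and then applied to both sets. The data and the helper names `fit_scaler` and `transform` are hypothetical.\n",
"\n",
"```python\n",
"import statistics\n",
"\n",
"def fit_scaler(rows):\n",
"    # Learn the per-column mean and population standard deviation.\n",
"    columns = list(zip(*rows))\n",
"    means = [statistics.fmean(col) for col in columns]\n",
"    stds = [statistics.pstdev(col) for col in columns]\n",
"    return means, stds\n",
"\n",
"def transform(rows, means, stds):\n",
"    # Apply z = (x - mean) / std using statistics learned elsewhere.\n",
"    return [\n",
"        [(x - m) / s for x, m, s in zip(row, means, stds)]\n",
"        for row in rows\n",
"    ]\n",
"\n",
"# Two features on very different scales (made-up numbers).\n",
"train = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]\n",
"test = [[2.0, 100.0]]\n",
"\n",
"means, stds = fit_scaler(train)           # fit on train only\n",
"scaled_train = transform(train, means, stds)\n",
"scaled_test = transform(test, means, stds)  # reuse the train statistics\n",
"print(scaled_train)\n",
"print(scaled_test)\n",
"```\n",
"\n",
"After scaling, both columns contribute comparably to the distance between two points, which is what a distance-based model such as KNN needs."
]
},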
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 64,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Sepal length</th>\n",
|
||
" <th>Sepal width</th>\n",
|
||
" <th>Petal length</th>\n",
|
||
" <th>Petal width</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Class</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>5.1</td>\n",
|
||
" <td>3.5</td>\n",
|
||
" <td>1.4</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>4.9</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>1.4</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>4.7</td>\n",
|
||
" <td>3.2</td>\n",
|
||
" <td>1.3</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>4.6</td>\n",
|
||
" <td>3.1</td>\n",
|
||
" <td>1.5</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-setosa</th>\n",
|
||
" <td>5.0</td>\n",
|
||
" <td>3.6</td>\n",
|
||
" <td>1.4</td>\n",
|
||
" <td>0.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>6.7</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>5.2</td>\n",
|
||
" <td>2.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>6.3</td>\n",
|
||
" <td>2.5</td>\n",
|
||
" <td>5.0</td>\n",
|
||
" <td>1.9</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>6.5</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>5.2</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>6.2</td>\n",
|
||
" <td>3.4</td>\n",
|
||
" <td>5.4</td>\n",
|
||
" <td>2.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Iris-virginica</th>\n",
|
||
" <td>5.9</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" <td>5.1</td>\n",
|
||
" <td>1.8</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>150 rows × 4 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Sepal length Sepal width Petal length Petal width\n",
|
||
"Class \n",
|
||
"Iris-setosa 5.1 3.5 1.4 0.2\n",
|
||
"Iris-setosa 4.9 3.0 1.4 0.2\n",
|
||
"Iris-setosa 4.7 3.2 1.3 0.2\n",
|
||
"Iris-setosa 4.6 3.1 1.5 0.2\n",
|
||
"Iris-setosa 5.0 3.6 1.4 0.2\n",
|
||
"... ... ... ... ...\n",
|
||
"Iris-virginica 6.7 3.0 5.2 2.3\n",
|
||
"Iris-virginica 6.3 2.5 5.0 1.9\n",
|
||
"Iris-virginica 6.5 3.0 5.2 2.0\n",
|
||
"Iris-virginica 6.2 3.4 5.4 2.3\n",
|
||
"Iris-virginica 5.9 3.0 5.1 1.8\n",
|
||
"\n",
|
||
"[150 rows x 4 columns]"
|
||
]
|
||
},
|
||
"execution_count": 64,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"from sklearn.preprocessing import StandardScaler\n",
|
||
"from sklearn \n",
|
||
"\n",
|
||
"df = pd.read_csv(\n",
|
||
" './iris.data', \n",
|
||
" index_col=4, \n",
|
||
" names=[\n",
|
||
" 'Sepal length', \n",
|
||
" 'Sepal width', \n",
|
||
" 'Petal length', \n",
|
||
" 'Petal width',\n",
|
||
" 'Class'\n",
|
||
" ])\n",
|
||
"df"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.10.11"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 2
|
||
}
|