{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>Naiwna klasyfikacja bayesowska</b>\n",
"\n",
"Naiwna klasyfikacja bayesowska jest to klasyfikacja polegająca na przydzielaniu obiektom prawdopodobieństwa przynależności do danej klasy. Naiwny klasyfikator jest to bardzo prosta metoda klasyfikacji jednak mimo swej prostoty sprawdza się w wielu przypadkach gdy bardziej złożone metody zawodzą.Jednym z powodów dla których naiwny klasyfikator wypadak dobrze jest jego założenie o niezależności predykatów, własnie dlatego nazywany jest klasyfikatorem naiwnym, naiwnie zakłada niezależność atrybutów opisujących dany przykład, co często nie ma odzwierciedlenia w rzeczywistości. Klasyfikator nazywany jets bayesowskim dlatego że wykorzystuje twierdzenie bayesa do obliczania prawdopodobieństw.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>Twierdzenie Bayesa</b>\n",
"\n",
"Twierdzneie bayesa jets to twierdzenie tworii prawdopodobieństwa wiążące prawdopodobieństwa warunkowe dwóch zdarzeń warunkujących się nawzajem. Prawdopodobieństwo wystąpienia zdarzenia A gdy wystąpiło zdarzenie B oblicza się wzorem:\n",
"\n",
"$P(A|B) = \\frac{P(B|A) P(A)}{P(B)}$\n"
]
},
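{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the formula in Python, using exact fractions. The helper `bayes` and the input probabilities are hypothetical and chosen only for illustration. $P(B)$ is expanded with the law of total probability, $P(B) = P(B|A)P(A) + P(B|\\neg A)P(\\neg A)$. With a rare event A (1%), even a fairly reliable signal B gives only $P(A|B) = \\frac{1}{12}$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fractions import Fraction\n",
"\n",
"\n",
"def bayes(p_b_given_a, p_a, p_b_given_not_a):\n",
"    # hypothetical helper: P(A|B) from P(B|A), P(A) and P(B|not A)\n",
"    # law of total probability: P(B) = P(B|A)P(A) + P(B|not A)(1 - P(A))\n",
"    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)\n",
"    return p_b_given_a * p_a / p_b\n",
"\n",
"\n",
"# hypothetical numbers: P(A) = 1/100, P(B|A) = 9/10, P(B|not A) = 1/10\n",
"print(bayes(Fraction(9, 10), Fraction(1, 100), Fraction(1, 10)))  # 1/12"
]
},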
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>Wykorzystanie twierdzenia Bayesa w naiwnej klasyfikacji bayesowskiej</b>\n",
"\n",
"<b>1\\) Nawiązując do twierdzenia bayesa prawdopodobieństwo problem klasyfikacji danego obiektu możemy zapisać w następujący sposb:</b>\n",
"<br><br><br>\n",
"$P(K|X)$ Zapis ten oznacza że mając obiekt X chcemy obliczyć prawdopodobieństo przynaleźności do klasy K\n",
"\n",
"$P(K|X) = \\frac{P(X|K) P(K)}{P(X)}$ \n",
"\n",
"$P(K)$ - jest to prawdopodobieństwo a-priori klasy K\n",
"\n",
"$P(X|K)$ - ten zapis możemy interpretować w taki sposób: jeżeli klasa K to szansa że X do niej należy \n",
"\n",
"$P(X)$ - ale jak interpretować prawdopodobieństwo od odibkeu X? okazuje się że nie trzeba tego obliczać gdyż obliczając prawpodobieństwa przynależności X do klasy K w mianowniku zawsze bęzie $P(X)$ więc możemy to pomijać\n",
"\n",
"Najlepiej będzie pokazać to na przykładzie:\n",
"<br><br><br><br>\n",
"<b>2\\) Prawdopobobieństwa a-priori</b>\n",
"<br><br><br>\n",
"Prawdopodobieństwo a priori jest to prawdopodobieństwo obliczane przed realizacją odpowiednich eksperymentów lub obserwacji.\n",
"\n",
"![klasy](https://www.statsoft.pl/textbook/graphics/xNaiveBayesIntro1.gif.pagespeed.ic.tNwdOtpcQH.webp)\n",
"\n",
"Powyższa ilustracja przedstawia 2 klasy czerwona(20 obiektów) i zielona(40 obiektów), prawdopodobieństwa a priori tym przykładzie to nic innego jak \n",
"\n",
"$\\frac{liczebność\\ klasy}{liczebność\\ wszystkich\\ elementów}$, więc prawdopodobieńswta a priori dla powyższego przykładu są równe\n",
"\n",
"\n",
"a priori klasy zielonej = $\\frac{2}{3}\\\\$\n",
"a priori klasy czerwonej = $\\frac{1}{3}\\\\$\n",
"\n",
"To jest nasze $P(K)$\n",
"<br><br><br><br>\n",
"<b>3\\) Klasyfikacja nowego obiektu</b>\n",
"<br><br><br>\n",
"![nowy obiekt](https://www.statsoft.pl/textbook/graphics/xNaiveBayesIntro4.gif.pagespeed.ic.kzJ0xzlOjU.webp)\n",
"\n",
"Klasyfikując nowy obiekt korzystamy z prawdopodobieństw a-priori, jednak same prawdopodobieństwa a-priori są niweystarczające, w tym przykładzie rozsądnym założeniem jest klasyfikowanie obiektu również na podstawie jego najbliższych sąsiadów. Zaznaczając okręgiem obszar możemy obliczyć prawdopodobieństwo że nowy obiekt będzie czerwony albo zielony, prawdopodobieństwa te obliczane są ze wzoru, prawdopodobieństwa te zostaną użyte do obliczenia prawdopodobieństwa a posteriori.\n",
"\n",
"$\\frac{liczba\\ obiektów\\ danej\\ klasy\\ w\\ sąsiedztwie\\ nowego\\ obiektu}{liczba\\ obiektów\\ danej\\ klasy}$ i wynoszą odpowiednio:\n",
"\n",
"prawdopodobieństow że obiekt będzie w klasie zielonej = $\\frac{1}{40}\\\\$\n",
"prawdopodobieństow że obiekt będzie w klasie czerwonej = $\\frac{3}{20}\\\\$\n",
"\n",
"To jets nasze $P(X|K)$\n",
"<br><br><br><br>\n",
"<b>4\\) Prawdopodobieństwo a posteriori</b>\n",
"<br><br><br>\n",
"Mając już obliczone wszystkie potrzebne prawdopodobieństwa możemy obliczyć prawdopodobieństwa a posteriori\n",
"\n",
"Prawdopodobieństwo a posteriori jest to prawdopodobieństwo pewnego zdarzenia gdy wiedza o tym zdarzeniu wzbogacona jest przez pewne obserwacje lub eksperymenty.\n",
"\n",
"W naszym przykładzie prawdopodobieństwo a posteriori obliczymy ze wzoru \n",
"\n",
"$(prawdopodobieństwo\\ a\\ priori\\ przynależności\\ do\\ danej\\ klasy) * (prawdopodobieństwo\\ że\\ nowy\\ obiekt\\ będzie\\ w\\ danej\\ klasie\\ na\\ podstawie\\ jego\\ sąsiadów)$ czyli $P(X|K) P(K)$\n",
"\n",
"A więc prawdopodobieństwa a posteriori są równe:\n",
"\n",
"Prawdopodobieństwo a posteriori ze nowy obiekt będzie w klasie zielonej = $\\frac{2}{3} * \\frac{1}{40} = \\frac{1}{60}\\\\$\n",
"Prawdopodobieństwo a posteriori ze nowy obiekt będzie w klasie czerwonej = $\\frac{1}{3} * \\frac{3}{40} = \\frac{1}{40}\\\\$\n",
"\n"
]
},
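{
"cell_type": "markdown",
"metadata": {},
"source": [
"The calculation above can be reproduced with a short sketch using exact fractions. The counts (40 green and 20 red objects, with 1 green and 3 red inside the circle) come from the illustration. The variable names are hypothetical. Dividing the scores by their sum, which equals $P(X)$ under this model, turns them into probabilities that add up to 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fractions import Fraction\n",
"\n",
"# counts from the illustration: 40 green and 20 red objects,\n",
"# of which 1 green and 3 red lie inside the circle around the new object\n",
"class_counts = {'green': 40, 'red': 20}\n",
"in_circle = {'green': 1, 'red': 3}\n",
"total = sum(class_counts.values())\n",
"\n",
"scores = {}\n",
"for k in class_counts:\n",
"    prior = Fraction(class_counts[k], total)              # P(K)\n",
"    likelihood = Fraction(in_circle[k], class_counts[k])  # P(X|K)\n",
"    scores[k] = prior * likelihood                        # P(X|K) P(K), proportional to P(K|X)\n",
"\n",
"print(scores)                       # green: 1/60, red: 1/20\n",
"print(max(scores, key=scores.get))  # red\n",
"\n",
"# dividing by the common factor P(X) (the sum of the scores) normalizes them\n",
"p_x = sum(scores.values())\n",
"print({k: v / p_x for k, v in scores.items()})  # green: 1/4, red: 3/4"
]
},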
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>Naiwna klasyfikacja bayesowska przy wielu cechach</b>\n",
"<br><br><br>\n",
"Klasyfikator bayesowski możemy stosować na wielu cechach, wykonuje się to w następujący sposób:\n",
"\n",
"W naszym przypadku obiekt X posiada wiele cech $X = (x1, x2, x3, ... xn)$ wtedy stosując znany już wzór:\n",
"\n",
"$P(K|X) = \\frac{P(X|K) P(K)}{P(X)}$ \n",
"\n",
"$P(K)$ pozostaje niezmienne jest to $\\frac{liczba\\ elementów\\ klasy\\ K}{wszystkie\\ elementy\\ zbioru}$\n",
"\n",
"Element X posiada wiele cech więc:\n",
"\n",
"$P(X|K) = P(x1|K)*P(x2|K)*...*P(xn|K)$\n",
"\n",
2022-05-18 00:17:17 +02:00
"Zauażmy że w rzeczywistości $P(X|K)$ obliczymy z twierdzenia bayesa, jednak dzięki temu że nasz klasyfikator jest naiwny zakładamy niezależność cech $x1, x2,...,xn$ więc możemy uprościć obliczenia do powyższego wzoru\n",
"<br><br><br>\n",
2022-05-17 01:12:04 +02:00
"$P(xk|K) = \\frac{Liczba\\ elementów\\ klasy\\ K\\ dla\\ których\\ wartość\\ cechy\\ Ak\\ jest\\ równa\\ xk}{liczba\\ wszystkich\\ obiektów\\ klasy\\ K}$\n",
"<br><br><br><br>\n",
"<b>Powyższy wzór ma jedna pewne problemy z zerowymi prawdopodobieństwami</b>\n",
"<br><br><br>\n",
"Co w przypadku gdy dla którejś cechy $P(xk|K) = 0$\n",
"\n",
"Może się tak zdarzyć gdy cecha $Ak$ nie będzie przyjmowała wartości $xk$ danego obiektu $X$ wtedy zgodnie ze wzorem otrzymamy $\\frac{0}{liczba\\ wszystkich\\ obiektów\\ klasy\\ K}$\n",
"Gdy tak się stanie obliczone prawdopodobieństwo będzie równe 0, a przecież brak wartości xk dla cechy Ak klasy K nie musi wcale oznaczać śe dany obiekt nie może należeć do klasy K. Aby temu zaradzić stosuje się wygładzanie\n",
"<br><br><br>\n",
"<b>Wygładzanie Laplace'a</b>\n",
"<br><br><br>\n",
2022-05-18 00:17:17 +02:00
"Wygładzanie Laplace'a zwane jest również wygładzaniem + 1, jest to bardzo prosty sposób wystarczy dla każdego $P(xk|K)$ dodać do licznika 1 a do mianownika dodać liczbę klas obiektu, dodając 1 do licznika możemy to interpretować jako dodanie nowego obiektu do klasy, robiąc to dla każdej klasy \"mamy\" dodatkowe obiekty równe liczbie klas dlatego do mianownika musimy dodać liczbę klas.\n",
2022-05-17 01:12:04 +02:00
"\n",
2022-05-18 00:17:17 +02:00
"$P(xk|K) = \\frac{(Liczba\\ elementów\\ klasy\\ K\\ dla\\ których\\ wartość\\ cechy\\ Ak\\ jest\\ równa\\ xk\\\\)\\ + 1}{(liczba\\ wszystkich\\ obiektów\\ klasy\\ K\\\\) + liczba klas}$\n",
2022-05-17 01:12:04 +02:00
"\n",
2022-05-18 00:17:17 +02:00
"Łatwo można zauważyć że samo dodanie 1 do licznika nie jest wystarczające ponieważ wtedy $P(xk|K)$ mogłoby być $> 1$\n",
2022-05-17 01:12:04 +02:00
"<br><br><br>\n",
"Dzięki wygładzaniu Laplace'a $P(X|K)$ nigdy nie bęzie zerowe minimalna wartość to\n",
"\n",
2022-05-18 00:17:17 +02:00
"$\\frac{1}{n *(liczba\\ wszystkich\\ obiektów\\ klasy\\ K\\\\) + liczba\\ klas}$\n",
2022-05-17 01:12:04 +02:00
"<br><br><br>\n",
"\n"
]
},
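{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of Laplace smoothing on a hypothetical toy table. The column names `color` and `label` and the helper `laplace_prob` are made up for illustration. The denominator grows by the number of distinct values of the attribute, so for each class the smoothed probabilities of all values add up to 1. `blue`, which never occurs in class A, still gets a non-zero probability."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical toy data: one attribute 'color' and the class column 'label'\n",
"df = pd.DataFrame({\n",
"    'color': ['red', 'red', 'green', 'green', 'blue', 'red'],\n",
"    'label': ['A', 'A', 'A', 'B', 'B', 'B'],\n",
"})\n",
"\n",
"\n",
"def laplace_prob(data, attrib, value, class_col, cls):\n",
"    # hypothetical helper: P(x_k|K) = (count + 1) / (class size + number of distinct values of A_k)\n",
"    class_data = data[data[class_col] == cls]\n",
"    count = int((class_data[attrib] == value).sum())\n",
"    return (count + 1) / (len(class_data) + data[attrib].nunique())\n",
"\n",
"\n",
"for cls in ['A', 'B']:\n",
"    probs = {v: laplace_prob(df, 'color', v, 'label', cls) for v in ['red', 'green', 'blue']}\n",
"    # 'blue' never occurs in class A but still gets a non-zero probability\n",
"    print(cls, probs, round(sum(probs.values()), 10))"
]
},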
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### <b>Implementacja</b>\n",
2022-05-18 00:17:17 +02:00
"\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n",
"import scipy.stats as stats\n",
"import numpy as np\n",
"import plotly\n",
"from plotly.subplots import make_subplots\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import random"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"class NaiveBayes():\n",
"\n",
"    def __init__(self, classes, className, attribsNames, data):\n",
"        self.classes = classes\n",
"        self.className = className\n",
"        self.attribsNames = attribsNames\n",
"        self.data = data\n",
"        # probability of an attribute value never seen in a class; filled by getDictOfAttribProbs()\n",
"        self.unseenProbs = {}\n",
"\n",
"    # conditional probabilities P(xk|K) of attribute values given the class, with Laplace smoothing\n",
"    def getDictOfAttribProbs(self):\n",
"        dictionaries = {}\n",
"        for value in self.classes:\n",
"            classData = self.data[self.data[self.className] == value]\n",
"            classFreq = {}\n",
"            self.unseenProbs[value] = {}\n",
"            for name in self.attribsNames:\n",
"                # Laplace: +1 in the numerator, +(number of distinct values of the attribute) in the denominator\n",
"                denominator = len(classData) + self.data[name].nunique()\n",
"                counts = classData[name].value_counts()\n",
"                classFreq[name] = {k: (v + 1) / denominator for k, v in counts.items()}\n",
"                self.unseenProbs[value][name] = 1.0 / denominator\n",
"            dictionaries[value] = classFreq\n",
"        return dictionaries\n",
"\n",
"    # a priori probability of a class\n",
"    def classProb(self, class_):\n",
"        x = len(self.data[self.data[self.className] == class_])\n",
"        y = len(self.data)\n",
"        return x / y\n",
"\n",
"    # P(value|clas) for one attribute; `data` is kept for backward compatibility and is not used\n",
"    def getAttribProbs(self, attrib, value, data, clas, dictProbs):\n",
"        return dictProbs[clas][attrib].get(value, self.unseenProbs[clas][attrib])\n",
"\n",
"    # logarithm of the (unnormalized) a posteriori probability of one object for class `clas`\n",
"    def getPosteriori(self, attribs, attribsNames, clas, dictProbs):\n",
"        logSum = 0.0\n",
"        for name, value in zip(attribsNames, attribs):\n",
"            logSum += np.log(self.getAttribProbs(name, value, self.data, clas, dictProbs))\n",
"        return logSum + np.log(self.classProb(clas))\n",
"\n",
"    # predicts the class of every row of `data`\n",
"    def predict(self, data, model):\n",
"        attribNames = list(data.columns)\n",
"        predictions = []\n",
"        for i in range(len(data)):\n",
"            row = list(data.iloc[i])\n",
"            probs = {name: self.getPosteriori(row, attribNames, name, model) for name in self.classes}\n",
"            # class with the highest log-posterior\n",
"            predictions.append(max(probs, key=probs.get))\n",
"        return predictions\n",
"\n",
"    def fitModel(self):\n",
"        probabilities = self.getDictOfAttribProbs()\n",
"        return probabilities\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"features = [\n",
" 'edible', 'cap-shape', 'cap-surface', 'cap-color', 'bruises', 'odor',\n",
" 'gill-attachment', 'gill-spacing', 'gill-size', 'gill-color',\n",
" 'stalk-shape', 'stalk-root', 'stalk-surface-above-ring',\n",
" 'stalk-surface-below-ring', 'stalk-color-above-ring',\n",
" 'stalk-color-below-ring', 'veil-type', 'veil-color', 'ring-number',\n",
" 'ring-type', 'spore-print-color', 'population', 'habitat'\n",
"]\n",
"\n",
"mushrooms = pd.read_csv('mushrooms.tsv', sep='\\t', names=features)\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>edible</th>\n",
" <th>cap-shape</th>\n",
" <th>cap-surface</th>\n",
" <th>cap-color</th>\n",
" <th>bruises</th>\n",
" <th>odor</th>\n",
" <th>gill-attachment</th>\n",
" <th>gill-spacing</th>\n",
" <th>gill-size</th>\n",
" <th>gill-color</th>\n",
" <th>...</th>\n",
" <th>stalk-surface-below-ring</th>\n",
" <th>stalk-color-above-ring</th>\n",
" <th>stalk-color-below-ring</th>\n",
" <th>veil-type</th>\n",
" <th>veil-color</th>\n",
" <th>ring-number</th>\n",
" <th>ring-type</th>\n",
" <th>spore-print-color</th>\n",
" <th>population</th>\n",
" <th>habitat</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>edible</td>\n",
" <td>bell</td>\n",
" <td>smooth</td>\n",
" <td>white</td>\n",
" <td>bruises</td>\n",
" <td>anise</td>\n",
" <td>free</td>\n",
" <td>close</td>\n",
" <td>broad</td>\n",
" <td>brown</td>\n",
" <td>...</td>\n",
" <td>smooth</td>\n",
" <td>white</td>\n",
" <td>white</td>\n",
" <td>partial</td>\n",
" <td>white</td>\n",
" <td>one</td>\n",
" <td>pendant</td>\n",
" <td>brown</td>\n",
" <td>numerous</td>\n",
" <td>meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>poisonous</td>\n",
" <td>convex</td>\n",
" <td>scaly</td>\n",
" <td>brown</td>\n",
" <td>bruises</td>\n",
" <td>pungent</td>\n",
" <td>free</td>\n",
" <td>close</td>\n",
" <td>narrow</td>\n",
" <td>brown</td>\n",
" <td>...</td>\n",
" <td>smooth</td>\n",
" <td>white</td>\n",
" <td>white</td>\n",
" <td>partial</td>\n",
" <td>white</td>\n",
" <td>one</td>\n",
" <td>pendant</td>\n",
" <td>brown</td>\n",
" <td>several</td>\n",
" <td>grasses</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>edible</td>\n",
" <td>bell</td>\n",
" <td>scaly</td>\n",
" <td>white</td>\n",
" <td>bruises</td>\n",
" <td>almond</td>\n",
" <td>free</td>\n",
" <td>close</td>\n",
" <td>broad</td>\n",
" <td>white</td>\n",
" <td>...</td>\n",
" <td>smooth</td>\n",
" <td>white</td>\n",
" <td>white</td>\n",
" <td>partial</td>\n",
" <td>white</td>\n",
" <td>one</td>\n",
" <td>pendant</td>\n",
" <td>brown</td>\n",
" <td>numerous</td>\n",
" <td>meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>edible</td>\n",
" <td>bell</td>\n",
" <td>smooth</td>\n",
" <td>white</td>\n",
" <td>bruises</td>\n",
" <td>anise</td>\n",
" <td>free</td>\n",
" <td>close</td>\n",
" <td>broad</td>\n",
" <td>gray</td>\n",
" <td>...</td>\n",
" <td>smooth</td>\n",
" <td>white</td>\n",
" <td>white</td>\n",
" <td>partial</td>\n",
" <td>white</td>\n",
" <td>one</td>\n",
" <td>pendant</td>\n",
" <td>black</td>\n",
" <td>scattered</td>\n",
" <td>meadows</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>edible</td>\n",
" <td>convex</td>\n",
" <td>scaly</td>\n",
" <td>yellow</td>\n",
" <td>bruises</td>\n",
" <td>anise</td>\n",
" <td>free</td>\n",
" <td>close</td>\n",
" <td>broad</td>\n",
" <td>brown</td>\n",
" <td>...</td>\n",
" <td>smooth</td>\n",
" <td>white</td>\n",
" <td>white</td>\n",
" <td>partial</td>\n",
" <td>white</td>\n",
" <td>one</td>\n",
" <td>pendant</td>\n",
" <td>brown</td>\n",
" <td>numerous</td>\n",
" <td>meadows</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 23 columns</p>\n",
"</div>"
],
"text/plain": [
" edible cap-shape cap-surface cap-color bruises odor \\\n",
"0 edible bell smooth white bruises anise \n",
"1 poisonous convex scaly brown bruises pungent \n",
"2 edible bell scaly white bruises almond \n",
"3 edible bell smooth white bruises anise \n",
"4 edible convex scaly yellow bruises anise \n",
"\n",
" gill-attachment gill-spacing gill-size gill-color ... \\\n",
"0 free close broad brown ... \n",
"1 free close narrow brown ... \n",
"2 free close broad white ... \n",
"3 free close broad gray ... \n",
"4 free close broad brown ... \n",
"\n",
" stalk-surface-below-ring stalk-color-above-ring stalk-color-below-ring \\\n",
"0 smooth white white \n",
"1 smooth white white \n",
"2 smooth white white \n",
"3 smooth white white \n",
"4 smooth white white \n",
"\n",
" veil-type veil-color ring-number ring-type spore-print-color population \\\n",
"0 partial white one pendant brown numerous \n",
"1 partial white one pendant brown several \n",
"2 partial white one pendant brown numerous \n",
"3 partial white one pendant black scattered \n",
"4 partial white one pendant brown numerous \n",
"\n",
" habitat \n",
"0 meadows \n",
"1 grasses \n",
"2 meadows \n",
"3 meadows \n",
"4 meadows \n",
"\n",
"[5 rows x 23 columns]"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"NAMES_DICT = {\n",
" 'edible': {\n",
" 'p': 'poisonous',\n",
" 'e': 'edible'\n",
" },\n",
" \"cap-shape\": {\n",
" 'b': 'bell',\n",
" 'c': 'conical',\n",
" 'x': 'convex',\n",
" 'f': 'flat',\n",
" 'k': 'knobbed',\n",
" 's': 'sunken'\n",
" },\n",
" \"cap-surface\": {\n",
" 'f': 'fibrous',\n",
" 'g': 'grooves',\n",
" 'y': 'scaly',\n",
" 's': 'smooth'\n",
" },\n",
" \"cap-color\": {\n",
" 'n': 'brown',\n",
" 'b': 'buff',\n",
" 'c': 'cinnamon',\n",
" 'g': 'gray',\n",
" 'r': 'green',\n",
" 'p': 'pink',\n",
" 'u': 'purple',\n",
" 'e': 'red',\n",
" 'w': 'white',\n",
" 'y': 'yellow'\n",
" },\n",
" \"bruises\": {\n",
" 't': 'bruises',\n",
" 'f': 'none'\n",
" },\n",
" \"odor\": {\n",
" 'a': 'almond',\n",
" 'l': 'anise',\n",
" 'c': 'creosote',\n",
" 'y': 'fishy',\n",
" 'f': 'foul',\n",
" 'm': 'musty',\n",
" 'n': 'none',\n",
" 'p': 'pungent',\n",
" 's': 'spicy'\n",
" },\n",
" \"gill-attachment\": {\n",
" 'a': 'attached',\n",
" 'd': 'descending',\n",
" 'f': 'free',\n",
" 'n': 'notched'\n",
" },\n",
" \"gill-spacing\": {\n",
" 'c': 'close',\n",
" 'w': 'crowded',\n",
" 'd': 'distant'\n",
" },\n",
" \"gill-size\": {\n",
" 'b': 'broad',\n",
" 'n': 'narrow'\n",
" },\n",
" \"gill-color\": {\n",
" 'k': 'black',\n",
" 'n': 'brown',\n",
" 'b': 'buff',\n",
" 'h': 'chocolate',\n",
" 'g': 'gray',\n",
" 'r': 'green',\n",
" 'o': 'orange',\n",
" 'p': 'pink',\n",
" 'u': 'purple',\n",
" 'e': 'red',\n",
" 'w': 'white',\n",
" 'y': 'yellow'\n",
" },\n",
" \"stalk-shape\": {\n",
" 'e': 'enlarging',\n",
" 't': 'tapering'\n",
" },\n",
" \"stalk-root\": {\n",
" 'b': 'bulbous',\n",
" 'c': 'club',\n",
" 'u': 'cup',\n",
" 'e': 'equal',\n",
" 'z': 'rhizomorphs',\n",
" 'r': 'rooted',\n",
" '?': 'missing'\n",
" },\n",
" \"stalk-surface-above-ring\": {\n",
" 'f': 'fibrous',\n",
" 'y': 'scaly',\n",
" 'k': 'silky',\n",
" 's': 'smooth'\n",
" },\n",
" \"stalk-surface-below-ring\": {\n",
" 'f': 'fibrous',\n",
" 'y': 'scaly',\n",
" 'k': 'silky',\n",
" 's': 'smooth'\n",
" },\n",
" \"stalk-color-above-ring\": {\n",
" 'n': 'brown',\n",
" 'b': 'buff',\n",
" 'c': 'cinnamon',\n",
" 'g': 'gray',\n",
" 'o': 'orange',\n",
" 'p': 'pink',\n",
" 'e': 'red',\n",
" 'w': 'white',\n",
" 'y': 'yellow'\n",
" },\n",
" \"stalk-color-below-ring\": {\n",
" 'n': 'brown',\n",
" 'b': 'buff',\n",
" 'c': 'cinnamon',\n",
" 'g': 'gray',\n",
" 'o': 'orange',\n",
" 'p': 'pink',\n",
" 'e': 'red',\n",
" 'w': 'white',\n",
" 'y': 'yellow'\n",
" },\n",
" \"veil-type\": {\n",
" 'p': 'partial',\n",
" 'u': 'universal'\n",
" },\n",
" \"veil-color\": {\n",
" 'n': 'brown',\n",
" 'o': 'orange',\n",
" 'w': 'white',\n",
" 'y': 'yellow'\n",
" },\n",
" \"ring-number\": {\n",
" 'n': 'none',\n",
" 'o': 'one',\n",
" 't': 'two'\n",
" },\n",
" \"ring-type\": {\n",
" 'c': 'cobwebby',\n",
" 'e': 'evanescent',\n",
" 'f': 'flaring',\n",
" 'l': 'large',\n",
" 'n': 'none',\n",
" 'p': 'pendant',\n",
" 's': 'sheathing',\n",
" 'z': 'zone'\n",
" },\n",
" \"spore-print-color\": {\n",
" 'k': 'black',\n",
" 'n': 'brown',\n",
" 'b': 'buff',\n",
" 'h': 'chocolate',\n",
" 'r': 'green',\n",
" 'o': 'orange',\n",
" 'u': 'purple',\n",
" 'w': 'white',\n",
" 'y': 'yellow'\n",
" },\n",
" \"population\": {\n",
" 'a': 'abundant',\n",
" 'c': 'clustered',\n",
" 'n': 'numerous',\n",
" 's': 'scattered',\n",
" 'v': 'several',\n",
" 'y': 'solitary'\n",
" },\n",
" \"habitat\": {\n",
" 'g': 'grasses',\n",
" 'l': 'leaves',\n",
" 'm': 'meadows',\n",
" 'p': 'paths',\n",
" 'u': 'urban',\n",
" 'w': 'waste',\n",
" 'd': 'woods'\n",
" },\n",
"}\n",
"\n",
"for key in NAMES_DICT.keys():\n",
" mushrooms[key] = mushrooms[key].apply(lambda x: NAMES_DICT[key][x])\n",
"mushrooms.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Features' distribution \n",
"\n",
"Wśród cech zawartych w danych możemy zauważyć, że rodzaj woalki/pierścienia (veil-type) jest tylko jeden, zatem ta cecha nie powinna mieć wpływu na skuteczność algorytmu, jako że wszystkie obserwacje dzielą tę samą wartość (partial)"
]
},
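{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick check of this claim on a hypothetical toy table (the column names `veil` and `label` are made up). When an attribute takes only one value, $P(x_k|K) = 1$ for every class. This also holds with standard Laplace smoothing, because the numerator and the denominator both grow by 1. Its factor is therefore the same for all classes and does not change which class wins. The column could equally be dropped before training, e.g. with `mushrooms.drop(columns=['veil-type'])`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical toy data: 'veil' has a single value, like veil-type in the mushroom data\n",
"df = pd.DataFrame({\n",
"    'veil': ['partial'] * 5,\n",
"    'label': ['edible', 'edible', 'poisonous', 'poisonous', 'poisonous'],\n",
"})\n",
"\n",
"n_values = df['veil'].nunique()  # 1\n",
"for cls in df['label'].unique():\n",
"    class_data = df[df['label'] == cls]\n",
"    count = int((class_data['veil'] == 'partial').sum())\n",
"    plain = count / len(class_data)                        # without smoothing\n",
"    smoothed = (count + 1) / (len(class_data) + n_values)  # with Laplace smoothing\n",
"    print(cls, plain, smoothed)  # both equal 1.0 for every class"
]
},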
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"domain": {
"x": [
0,
0.16799999999999998
],
"y": [
0.848,
1
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"edible",
"poisonous"
],
"name": "edible",
"textinfo": "none",
"type": "pie",
"values": [
51.89472233705388,
48.10527766294612
]
},
{
"domain": {
"x": [
0,
0.16799999999999998
],
"y": [
0.6359999999999999,
0.7879999999999999
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"convex",
"flat",
"knobbed",
"bell",
"sunken",
"conical"
],
"name": "cap-shape",
"textinfo": "none",
"type": "pie",
"values": [
44.90836433788067,
38.790133664048504,
10.26595011712829,
5.649717514124294,
0.3307151715584952,
0.05511919525974921
]
},
{
"domain": {
"x": [
0,
0.16799999999999998
],
"y": [
0.424,
0.576
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"scaly",
"smooth",
"fibrous",
"grooves"
],
"name": "cap-surface",
"textinfo": "none",
"type": "pie",
"values": [
39.98897616094805,
31.45928069450186,
28.51040374810528,
0.0413393964448119
]
},
{
"domain": {
"x": [
0,
0.16799999999999998
],
"y": [
0.212,
0.364
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"brown",
"gray",
"red",
"yellow",
"white",
"buff",
"pink",
"cinnamon",
"purple",
"green"
],
"name": "cap-color",
"textinfo": "none",
"type": "pie",
"values": [
28.34504616232603,
22.502411464792615,
18.23067383216205,
13.338845252859308,
12.87033209315144,
2.0118506269808463,
1.7362546506821,
0.5374121537825548,
0.22047678103899684,
0.20669698222405952
]
},
{
"domain": {
"x": [
0,
0.16799999999999998
],
"y": [
0,
0.152
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"none",
"bruises"
],
"name": "bruises",
"textinfo": "none",
"type": "pie",
"values": [
58.49524596940885,
41.50475403059115
]
},
{
"domain": {
"x": [
0.208,
0.376
],
"y": [
0.848,
1
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"none",
"foul",
"fishy",
"spicy",
"anise",
"almond",
"pungent",
"creosote",
"musty"
],
"name": "odor",
"textinfo": "none",
"type": "pie",
"values": [
43.43392586468237,
26.60879151164393,
7.096596389692711,
7.082816590877774,
5.015846768637178,
4.905608378117679,
3.1280143309907675,
2.2598870056497176,
0.4685131597078683
]
},
{
"domain": {
"x": [
0.208,
0.376
],
"y": [
0.6359999999999999,
0.7879999999999999
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"free",
"attached"
],
"name": "gill-attachment",
"textinfo": "none",
"type": "pie",
"values": [
97.40939782279179,
2.5906021772082126
]
},
{
"domain": {
"x": [
0.208,
0.376
],
"y": [
0.424,
0.576
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"close",
"crowded"
],
"name": "gill-spacing",
"textinfo": "none",
"type": "pie",
"values": [
83.7122778007441,
16.287722199255892
]
},
{
"domain": {
"x": [
0.208,
0.376
],
"y": [
0.212,
0.364
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"broad",
"narrow"
],
"name": "gill-size",
"textinfo": "none",
"type": "pie",
"values": [
69.3399476367645,
30.660052363235497
]
},
{
"domain": {
"x": [
0.208,
0.376
],
"y": [
0,
0.152
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"buff",
"pink",
"white",
"brown",
"gray",
"chocolate",
"purple",
"black",
"red",
"yellow",
"orange",
"green"
],
"name": "gill-color",
"textinfo": "none",
"type": "pie",
"values": [
21.28978916907813,
18.258233429791925,
14.620366542648478,
12.966790684856,
9.356483395342428,
9.122226815488494,
6.007992283312664,
5.043406366267052,
1.1161637040099215,
1.1161637040099215,
0.7992283312663635,
0.30315557392862064
]
},
{
"domain": {
"x": [
0.416,
0.584
],
"y": [
0.848,
1
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"tapering",
"enlarging"
],
"name": "stalk-shape",
"textinfo": "none",
"type": "pie",
"values": [
56.80033071517156,
43.19966928482844
]
},
{
"domain": {
"x": [
0.416,
0.584
],
"y": [
0.6359999999999999,
0.7879999999999999
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"bulbous",
"missing",
"equal",
"club",
"rooted"
],
"name": "stalk-root",
"textinfo": "none",
"type": "pie",
"values": [
46.1761058288549,
30.59115336916081,
13.848697809011988,
7.082816590877774,
2.3012264020945294
]
},
{
"domain": {
"x": [
0.416,
0.584
],
"y": [
0.424,
0.576
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"smooth",
"silky",
"fibrous",
"scaly"
],
"name": "stalk-surface-above-ring",
"textinfo": "none",
"type": "pie",
"values": [
63.400854347526526,
29.309632079371642,
6.972578200358274,
0.31693537274355793
]
},
{
"domain": {
"x": [
0.416,
0.584
],
"y": [
0.212,
0.364
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"smooth",
"silky",
"fibrous",
"scaly"
],
"name": "stalk-surface-below-ring",
"textinfo": "none",
"type": "pie",
"values": [
60.58977538927932,
28.35882596114097,
7.5926691470304535,
3.4587295025492626
]
},
{
"domain": {
"x": [
0.416,
0.584
],
"y": [
0,
0.152
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"white",
"pink",
"gray",
"brown",
"buff",
"orange",
"red",
"cinnamon",
"yellow"
],
"name": "stalk-color-above-ring",
"textinfo": "none",
"type": "pie",
"values": [
54.99517707041477,
22.998484222130358,
7.055256993247899,
5.6221579164944195,
5.2087639520463,
2.370125396169216,
1.1712828992696707,
0.4685131597078683,
0.11023839051949842
]
},
{
"domain": {
"x": [
0.624,
0.792
],
"y": [
0.848,
1
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"white",
"pink",
"gray",
"brown",
"buff",
"orange",
"red",
"cinnamon",
"yellow"
],
"name": "stalk-color-below-ring",
"textinfo": "none",
"type": "pie",
"values": [
54.07193054981397,
22.83312663635111,
7.013917596803086,
6.407606448945845,
5.319002342565798,
2.370125396169216,
1.2126222957144825,
0.4685131597078683,
0.30315557392862064
]
},
{
"domain": {
"x": [
0.624,
0.792
],
"y": [
0.6359999999999999,
0.7879999999999999
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"partial"
],
"name": "veil-type",
"textinfo": "none",
"type": "pie",
"values": [
100
]
},
{
"domain": {
"x": [
0.624,
0.792
],
"y": [
0.424,
0.576
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"white",
"brown",
"orange",
"yellow"
],
"name": "veil-color",
"textinfo": "none",
"type": "pie",
"values": [
97.51963621331129,
1.2126222957144825,
1.1575031004547334,
0.11023839051949842
]
},
{
"domain": {
"x": [
0.624,
0.792
],
"y": [
0.212,
0.364
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"one",
"two",
"none"
],
"name": "ring-number",
"textinfo": "none",
"type": "pie",
"values": [
92.14551467548574,
7.385972164806394,
0.4685131597078683
]
},
{
"domain": {
"x": [
0.624,
0.792
],
"y": [
0,
0.152
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"pendant",
"evanescent",
"large",
"flaring",
"none"
],
"name": "ring-type",
"textinfo": "none",
"type": "pie",
"values": [
48.68402921317349,
34.32547884800882,
15.915667631252584,
0.6063111478572413,
0.4685131597078683
]
},
{
"domain": {
"x": [
0.832,
1
],
"y": [
0.848,
1
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"white",
"brown",
"black",
"chocolate",
"green",
"yellow",
"orange",
"buff",
"purple"
],
"name": "spore-print-color",
"textinfo": "none",
"type": "pie",
"values": [
29.41987046989114,
24.169767121400028,
23.13628221027973,
20.104726470993523,
0.8819071241559874,
0.6200909466721786,
0.5649717514124294,
0.5649717514124294,
0.5374121537825548
]
},
{
"domain": {
"x": [
0.832,
1
],
"y": [
0.6359999999999999,
0.7879999999999999
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"several",
"solitary",
"scattered",
"numerous",
"abundant",
"clustered"
],
"name": "population",
"textinfo": "none",
"type": "pie",
"values": [
49.44191814799504,
21.04175279040926,
15.447154471544716,
5.00206696982224,
4.85048918285793,
4.216618437370815
]
},
{
"domain": {
"x": [
0.832,
1
],
"y": [
0.424,
0.576
]
},
"hole": 0.4,
"hoverinfo": "label+percent+name",
"labels": [
"woods",
"grasses",
"paths",
"leaves",
"urban",
"meadows",
"waste"
],
"name": "habitat",
"textinfo": "none",
"type": "pie",
"values": [
38.44563869367507,
26.774149097423177,
13.958936199531486,
10.36240870883285,
4.3957558219649995,
3.7067658812181343,
2.3563455973542786
]
}
],
"layout": {
"annotations": [
{
"font": {
"size": 10
},
"showarrow": false,
"text": "edible",
"x": 0.05300000000000002,
"y": 1.045
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "cap-shape",
"x": 0.05300000000000002,
"y": 0.823
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "cap-surface",
"x": 0.05300000000000002,
"y": 0.6009999999999999
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "cap-color",
"x": 0.05300000000000002,
"y": 0.3789999999999999
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "bruises",
"x": 0.05300000000000002,
"y": 0.1569999999999998
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "odor",
"x": 0.278,
"y": 1.045
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "gill-attachment",
"x": 0.278,
"y": 0.823
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "gill-spacing",
"x": 0.278,
"y": 0.6009999999999999
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "gill-size",
"x": 0.278,
"y": 0.3789999999999999
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "gill-color",
"x": 0.278,
"y": 0.1569999999999998
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "stalk-shape",
"x": 0.5030000000000001,
"y": 1.045
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "stalk-root",
"x": 0.5030000000000001,
"y": 0.823
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "stalk-surface-above-ring",
"x": 0.5030000000000001,
"y": 0.6009999999999999
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "stalk-surface-below-ring",
"x": 0.5030000000000001,
"y": 0.3789999999999999
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "stalk-color-above-ring",
"x": 0.5030000000000001,
"y": 0.1569999999999998
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "stalk-color-below-ring",
"x": 0.728,
"y": 1.045
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "veil-type",
"x": 0.728,
"y": 0.823
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "veil-color",
"x": 0.728,
"y": 0.6009999999999999
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "ring-number",
"x": 0.728,
"y": 0.3789999999999999
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "ring-type",
"x": 0.728,
"y": 0.1569999999999998
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "spore-print-color",
"x": 0.9530000000000001,
"y": 1.045
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "population",
"x": 0.9530000000000001,
"y": 0.823
},
{
"font": {
"size": 10
},
"showarrow": false,
"text": "habitat",
"x": 0.9530000000000001,
"y": 0.6009999999999999
}
],
"showlegend": false,
"template": {
"data": {
"bar": [
{
"error_x": {
"color": "#2a3f5f"
},
"error_y": {
"color": "#2a3f5f"
},
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "bar"
}
],
"barpolar": [
{
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "barpolar"
}
],
"carpet": [
{
"aaxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"baxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"type": "carpet"
}
],
"choropleth": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "choropleth"
}
],
"contour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "contour"
}
],
"contourcarpet": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "contourcarpet"
}
],
"heatmap": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmap"
}
],
"heatmapgl": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmapgl"
}
],
"histogram": [
{
"marker": {
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "histogram"
}
],
"histogram2d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2d"
}
],
"histogram2dcontour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2dcontour"
}
],
"mesh3d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "mesh3d"
}
],
"parcoords": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "parcoords"
}
],
"pie": [
{
"automargin": true,
"type": "pie"
}
],
"scatter": [
{
"fillpattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
},
"type": "scatter"
}
],
"scatter3d": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter3d"
}
],
"scattercarpet": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattercarpet"
}
],
"scattergeo": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergeo"
}
],
"scattergl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergl"
}
],
"scattermapbox": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermapbox"
}
],
"scatterpolar": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolar"
}
],
"scatterpolargl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolargl"
}
],
"scatterternary": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterternary"
}
],
"surface": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "surface"
}
],
"table": [
{
"cells": {
"fill": {
"color": "#EBF0F8"
},
"line": {
"color": "white"
}
},
"header": {
"fill": {
"color": "#C8D4E3"
},
"line": {
"color": "white"
}
},
"type": "table"
}
]
},
"layout": {
"annotationdefaults": {
"arrowcolor": "#2a3f5f",
"arrowhead": 0,
"arrowwidth": 1
},
"autotypenumbers": "strict",
"coloraxis": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"colorscale": {
"diverging": [
[
0,
"#8e0152"
],
[
0.1,
"#c51b7d"
],
[
0.2,
"#de77ae"
],
[
0.3,
"#f1b6da"
],
[
0.4,
"#fde0ef"
],
[
0.5,
"#f7f7f7"
],
[
0.6,
"#e6f5d0"
],
[
0.7,
"#b8e186"
],
[
0.8,
"#7fbc41"
],
[
0.9,
"#4d9221"
],
[
1,
"#276419"
]
],
"sequential": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"sequentialminus": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
]
},
"colorway": [
"#636efa",
"#EF553B",
"#00cc96",
"#ab63fa",
"#FFA15A",
"#19d3f3",
"#FF6692",
"#B6E880",
"#FF97FF",
"#FECB52"
],
"font": {
"color": "#2a3f5f"
},
"geo": {
"bgcolor": "white",
"lakecolor": "white",
"landcolor": "#E5ECF6",
"showlakes": true,
"showland": true,
"subunitcolor": "white"
},
"hoverlabel": {
"align": "left"
},
"hovermode": "closest",
"mapbox": {
"style": "light"
},
"paper_bgcolor": "white",
"plot_bgcolor": "#E5ECF6",
"polar": {
"angularaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"radialaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"scene": {
"xaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"yaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"zaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
}
},
"shapedefaults": {
"line": {
"color": "#2a3f5f"
}
},
"ternary": {
"aaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"baxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"caxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"title": {
"x": 0.05
},
"xaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
},
"yaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
}
}
},
"title": {
"font": {
"family": "Arial",
"size": 25
},
"text": "Features"
}
}
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
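"def domain_plots(data):\n",
"    # one donut chart per column, showing the percentage share of each category\n",
"    specs = [[{'type': 'domain'} for _ in range(5)] for _ in range(5)]\n",
"    fig = make_subplots(rows=5, cols=5, specs=specs)\n",
"    # a, b: current subplot row and column (the 5x5 grid is filled column by column);\n",
"    # xx, yy: offsets for the per-chart title annotations collected in l\n",
"    a, b, xx, yy, l = 1, 1, -0.172, 1.267, []\n",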
" for col in data.columns:\n",
" fig.add_trace(\n",
" plotly.graph_objects.Pie(\n",
"            labels=data[col].value_counts().index.tolist(),\n",
"            # percentage share of each category (same order as labels)\n",
"            values=(data[col].value_counts(normalize=True) * 100).tolist(),\n",
" name=col), a, b)\n",
" l.append(\n",
" dict(text=col,\n",
" x=xx + (0.225 * b),\n",
" y=yy - (0.222 * a),\n",
" font_size=10,\n",
" showarrow=False))\n",
" a += 1\n",
" if a > 5:\n",
" a = 1\n",
" b += 1\n",
" fig.update(layout_title_text='Features', layout_showlegend=False)\n",
"\n",
" fig.update_layout(title_font_family='Arial',\n",
" title_font_size=25,\n",
" annotations=l)\n",
" fig.update_traces(hole=.4, hoverinfo=\"label+percent+name\", textinfo='none')\n",
" fig.show()\n",
"\n",
"\n",
"domain_plots(mushrooms)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"##### Variable correlation\n",
"\n",
"The heatmap below shows a strong association between gill-color and gill-attachment, but what interests us more is how edibility relates to odor, gill-size and spore-print-color. Since all variables are categorical, the values shown are bias-corrected Cramér's V coefficients (0 = no association, 1 = perfect association) rather than Pearson correlations. Selecting the features with the highest coefficient with respect to edibility should, with high probability, give good classification results."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA2wAAAMpCAYAAAB4+bCOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOzdd3xUVfrH8c8zSYBA6Gn03ntREKUXxd5dOzbE3te1Ibr2spbVVfnZuyio2BBBekd67zU9pFJCMnN+f8wYEkoSEJJBvu99zcuZe889zz03VzdPnnPPmHMOERERERERCT6esj4BEREREREROTglbCIiIiIiIkFKCZuIiIiIiEiQUsImIiIiIiISpJSwiYiIiIiIBKnQsj4B+dvS8qMiIiIixx8r6xM4FF9C8zL//dITu6bUr48qbCIiIiIiIkFKCZuIiIiIiEiQ0pRIEREREREJej58ZX0KZVLtUoVNREREREQkSClhExERERERCVKaEikiIiIiIkHP68p+SmRZJE+qsImIiIiIiAQpVdhERERERCTo+U7Qr/lVhU1ERERERCRIKWETEREREREJUkrYThBmNsTM3gi8H2Zm1wTeTzazrkW1FxEREREpa74g+F9Z0DNsJyDn3NtlfQ4iIiIiIlI8Vdj+JszsKjOba2aLzOwdMwsxs+vMbI2ZzQVOLdB2hJndX+DwqwPHLTOzkw/Sd5SZjTazeYHXqfu3ERERERE5lrzOlfmrLChh+xsws1bAZcCpzrmOgBe4CngCf6J2GtC6iC4qBo67FXj/IPtfA15xzp0EXAS8e9ROXkREREREDklTIv8e+gNdgHlmBhAO9AAmO+eSAczsK6D5IY7/AsA5N9XMqphZtf32DwBaB/oGqGJmEc657IKNzGwoMBTgnXfeYejQoX91XCIiIiIiJzQlbH8PBnzknHsof4PZ+cCFJTx+//ru/p89QHfn3J4iO3FuJDDyEH2IiIiIiBwxfQ+bHM8mAhebWTSAmdUAFgK9zaymmYUBlxRx/GWB404DMpxzGfvtHw/c8ecHM+t4FM9dREREREQOQRW2vwHn3AozexQYb2YeIBe4DRgBzALSgUVFdLHHzBYCYcD1B9l/J/CmmS3Bf89MBYYdrfMXERERESmO9wStsJkro9VO5G9PN5aIiIjI8ceKb1I2UuPqlvnvlzVrbyv166MpkSIiIiIiIkFKUyJFRERERCToadERERERERERCSqqsImIiIiISNDznqBrb6jCJiIiIiIiEqSUsImIiIiIiAQpTYkUEREREZGg5yvrEygjqrCJiIiIiIgEKSVsIiIiIiIiQUpTIkVEREREJOh5T9DvYVPCJsdEr3NfLPWYU8c+UOoxRURERESOJSVsIiIiIiIS9LwnZoFNz7CJiIiIiIgEKyVsIiIiIiIiQUpTIkVEREREJOjpe9hEREREREQkqKjCJiIiIiIiQc+LlfUplAlV2ERERERERIKUEjYREREREZEgpSmRIiIiIiIS9Hz6HjYREREREREJJqqw/U2Z2RCgq3Pu9rI+lz+d3Lkhd97YH0+I8dP4JXw2em6h/Zee15WzB7bD63OkZ+ziudfHkZicCcCkb+9jw+YUAJKSM3no6W9L/fxFREREpOycqIuOKGGTUuHxGPfcPJB7h48iOTWLkS9fzfS569m8NTW/zdoNidx07yJy9uZx3uCO3DKkNyNe/AGAnL153HD3R2V1+iIiIiIiZUJTIsuAmV1jZkvMbLGZfWJm55jZHDNbaGYTzCwm0G5EYP8sM1trZjcdor9LzGxZoL+pBXbVNrNxgWNfKND+LTObb2bLzeyJAts3mdkLZrbUzOaaWdPA9igzG21m8wKvUw93zK2a1WJ7fBrxiRnk5fmYOG0Vp3VrWqjNwqVbydmbB8CK1XFERVY+3DAiIiIiIn8rqrCVMjNrAzwK9HDOpZhZDcAB3Z1zzsxuBP4J3Bc4pD3QHagELDSzn5xzcft1Ox
w43Tm33cyqFdjeEegE5ACrzey/zrmtwCPOuR1mFgJMNLP2zrklgWMynHPtzOwa4FXgbOA14BXn3HQzqw/8CrQ6nHFH1owgKSUr/3NyShatW9Q6ZPuzBrZjzh8b8j+XKxfKyJevxuvz8dk3c5g+Z93hhBcRERGR45ymREpp6Qd87ZxLAQgkTu2Ar8ysFlAO2Fig/ffOud3AbjObBJwMfLdfnzOAD81sFDCmwPaJzrkMADNbATQAtgKXmtlQ/D//WkBr4M+E7YsC/3wl8H4A0Nos/1+SKmYW4ZzLLngSgT6HAjRtfyG1GnQv+VUpYGCf1rRoGsudD32Zv+3SG94hZUc2tWKq8upTl7FhcwpxCelH1L+IiIiIyPFCUyKDw3+BN5xz7YCbgQoF9u2/gKkzs6fNbJGZLQJwzg3DX7WrB/xhZjUDbXMKHOcFQs2sEXA/0N851x74qYh4f7734K8Adgy86uyfrAXOY6Rzrqtzruv+yVpKajbRBaY4RkVWJjn1gC7o0qEB11zSnYee+pbcPO++43f428YnZrBo2VaaNY4+4FgRERER+fvyOSvzV1lQwlb6fgcu+TOpCkyJrApsD+y/dr/255lZhUD7PsA859wjfyZPgT6aOOfmOOeGA8n4E7dDqQLsBDICz8oN3m//ZQX+OSvwfjxwx58NzKxjCceab9XaeOrWrk6tmKqEhnro37MlM/ab1tiscTT33zqIh54aQ3rGrvztEZXKExYaAkDVyuG0a1WHTQUWKxERERER+bvSlMhS5pxbbmZPA1PMzAssBEYAX5tZGv6ErlGBQ5YAk4BI4N8HeX4N4EUzawYYMBFYjP/5tYPFX2xmC4FV+KdHztivSXUzW4K/Ond5YNudwJuB7aHAVGDY4Yzb63O8+s4EXhpxMR6Ph58nLGXT1lSuv+JUVq9LYMbc9dwypA/h4WE88eB5wL7l+xvWq8n9tw7C5xweMz4bPafQ6pIiIiIiIn9X5twJ+pXhxwEzGwFkO+deKqV4m/B/d1vKX+2r17kvlvqNNXXsA6UdUkREROTvJmhX9liwpX6ZJy6d628p9eujKZEiIiIiIiJBSlMig5hzbkQpx2tYmvFERERERErKe4LWmk7MUYuIiIiIiBwHlLCJiIiIiIgEKU2JFBERERGRoFdW34NW1lRhExERERERCVJK2ERERERERIKUpkSKiIiIiEjQ8wbvV8QdU6qwiYiIiIiIBClV2EREREREJOh53YlZa1LCJsdE+dScUo/Zb+BzpRrv99/+VarxREREROTEc2KmqSIiIiIiIscBVdhERERERCTo+U7QWtOJOWoREREREZHjgCpsIiIiIiIS9LSsv4iIiIiIiAQVJWwiIiIiIiJBSlMiRUREREQk6J2o38N2Yo5aRERERETkOKAKm4iIiIiIBD3fCbroiBK2E4SZ9QTeBnKBU5xzu0v7HLp2a8ytd5+Ox2P88sMivvp0ZqH97TrU55a7BtK4SQxPPz6GaZNX5e+78ZZ+nNyjKQCffTidKRNXFBvvpK6NuP3WAXg8Hn7+ZTFffDW70P6LLzqJMwd3wOv1kZGxixdf+pnEpEyaNInm7jtPp1LFcnh9js8+n8nkKasOEUVERERE5NhRwnYCMLMQ4ErgWefcp2VxDh6Pccd9g3nw7s9IScrkjXdvYNb0NWzZlJLfJikxgxef/oFLLu9e6NiTT2lK0xaxDBvyf5QLC+WlN65m3qx17Nq1t8h4d90xiAce/JLklCzeemMIM2etZfOW1Pw269YlcsttH5KTk8e5Z3di6E19+ffT35OzJ5fnXviR7dvTqFkzgrffHMK8+RvZuTPn6F8YEREREZEi6Bm2Y8TMrjGzJWa22Mw+MbNzzGyOmS00swlmFhNoNyKwf5aZrTWzmw7R3yVmtizQ39TAtiFm9kaBNj+aWZ/A+2wze9nMFgMPAZcC/zazz8wswswmmtkCM1tqZucd6rwD26LMbLSZzQu8Tj3c69GiVW3itu0gIS
6dvDwfkycup0fP5oXaJCZksHF9Es65QtsbNIpk6aIt+LyOPXty2bAuia7dmxQZr2WLWmyPSyM+IYO8PB+/T15Bjx7
"text/plain": [
"<Figure size 1008x864 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def cramers_v(x, y):\n",
"    # bias-corrected Cramér's V: strength of association between two categorical variables (0..1)\n",
"    contingency = pd.crosstab(x, y)\n",
"    chi2 = stats.chi2_contingency(contingency)[0]\n",
"    n = contingency.sum().sum()\n",
"    phi2 = chi2 / n\n",
"    r, k = contingency.shape\n",
"    phi2corr = max(0, phi2 - ((k - 1) * (r - 1)) / (n - 1))\n",
"    rcorr = r - ((r - 1) ** 2) / (n - 1)\n",
"    kcorr = k - ((k - 1) ** 2) / (n - 1)\n",
"    return np.sqrt(phi2corr / min((kcorr - 1), (rcorr - 1)))\n",
"\n",
"\n",
"# veil-type has only one value in this dataset, so Cramér's V is undefined for it\n",
"reduced_data = mushrooms.drop(['veil-type'], axis=1)\n",
"corr_list = [[\n",
"    round(cramers_v(reduced_data[col], reduced_data[corr_col]), 2)\n",
"    for corr_col in reduced_data.columns\n",
"] for col in reduced_data.columns]\n",
"corr_df = pd.DataFrame(np.array(corr_list), columns=reduced_data.columns)\n",
"corr_df.index = corr_df.columns\n",
"\n",
"plt.figure(figsize=(14, 12))\n",
"corr_mask = np.triu(np.ones_like(corr_df, dtype=bool))\n",
"sns.heatmap(corr_df, mask=corr_mask, cmap='viridis', annot=True)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA4EAAAG5CAYAAAAwHDElAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAwRklEQVR4nO3deZyddX33/9fbBAmLLEJ+3MhiooRNwmYasBZFUERRoHVDDYtaaS3Y2/7Uim29obS0tHVpXW4oVgwCQkEEU6QgqxYtSyKRLQgBWUKpxAAistTA5/7jXAMnw0wyCXPmJHO9no9HHnOd77V9zvecnDnv+V5LqgpJkiRJUju8qN8FSJIkSZLGjiFQkiRJklrEEChJkiRJLWIIlCRJkqQWMQRKkiRJUosYAiVJkiSpRQyBkqTVRpLZSf6633X0SpLjkpzR7zp6JcnWSR5LMqHftUiShmcIlCSNWJK7k/xPkk0Htd+QpJJM6VNpYy7J2kn+Nsm9SZ5IckeSTybJGNaQZp93NDXcm+Rvkrx4jPZ/d5I3Djyuqnurav2qerqZf1WS3x+LWiRJI2cIlCStrJ8B7x14kGQ6sO5YF5Fk4ljvc5BzgX2BtwIvAQ4FjgT+abR3tJzn+sVmn4c1NbwFeCNw9mjXIEkaPwyBkqSVdTqd0DHgcOAb3QsMHgFKckSSq5vpJPlCkgeTPJrkpiQ7da2+cZLvJvlVkmuTvLJrO5XkqCR3AHc0bR9OsjDJQ0nmJHlZ1/K/neT6JL9sfv72oBr/OsmPmkMY/y3JJknObOq6friRzST7AvsB76iqm6tqaVVdA8wCjkqyTbPc1CTfb57LpcDgEdQDk9yS5JGmnh265t2d5FNJbgR+PTgIJpkG/BHw/qr6z6aGW4B3AAckef2KXovm8T8lua95zvOS7NU177gk5yT5RvMcbkkyo5l3OrA18G9N//1pkinNazQxyQnAXsCXm/lfTvKVJJ8b9DzmJPmTofpZktQbhkBJ0sq6BtggyQ7NuV+HACtzntt+wOuAbYENgXcDS7rmHwL8JbAxsBA4YdD6BwN7ADsm2Qf422YbmwP30IyCJXkp8F06o2WbAJ8Hvptkk0H7OhTYAngl8J/A14GXAguAY4d5Dm8Crq2q+7obq+paYBGdEUKAbwLz6IS/v6ITmGnq2xY4C/gYMBm4iE6g6j6U873AAcBGVbV0UA37Aouq6rpBNdxH5zXab5jaB7se2JXOc/4mcG6SSV3zD6TTpxsBc4AvN/s5FLgXeHtzCOjfD6rjz4H/AI5u5h8NnAa8N8mLmj7YlM7I5TdHWKskaRQYAiVJq2JgNPBNdMLS/Sux7m/oHLq4PZCqWlBVD3TNP7+qrmtCz5l0Akq3v62qh6rqCeD9wKlV9eOqegr4NPCaZgTvAOCOqjq9GSU7C7gNeHvXtr5eVXdW1S+BfwfurKrLmn2fC+w2zHPYFHhgmHkPAJsm2Rr4LeAzVfVUVf0A+Leu5d4DfLeqLq2q3wCfBdYBfrtrmS9W1X3Nc13ZGiYPM28ZVXVGVS1p+uhzwNrAdl2LXF1VFzXn+Z0O7DKS7Q6zr+uAX/JcSD4EuKqqfr6q25QkrTxDoCRpVZwOvA84gkGHgq5IVV1BZzTpK8CDSU5JskHXIv/dNf04sP6gTXSPvr2MzujfwLYfozOquMXgeY17mnkDusPHE0M8HrzvAb+gM/I4lM2b+S8DHq6qXw/a/3C1P0PnuXXXt8xI4yrUsEJJPpFkQXPI7CN0Rme7D1sd/HpMeoHnY55G57BZmp+nv4BtSZJWgSFQkrTSquoeOheIeSvw7SEW+TXLXizmfw1a/4tV9WpgRzqHhX5yZXbfNf1fwMsHHiRZj86hn/cPntfYmpUbtRzOZcAeSbbqbkyyB7AVcAWd0biNm5q69z9c7WnW7a6v+7kOdgWwVZKZg2rYCtgTuKppGva1aM7/+1M6h9NuXFUb0RmpG+kVTpdX33DzzwAOSr
ILsANwwQj3JUkaJYZASdKq+hCwz6CRrgHzgd9Lsm5zkZQPDcxI8ltJ9kiyFp2A8iTwzCrWcBbwgSS7Jlkb+Bs65+rdTeccu22TvK+5UMl76ITOC1dxX8+qqsuAy4HzkrwqyYQke9IJOCdV1R1NUJ4L/GWSFyf5HZY9FPUcOhdw2bfpi48DTwE/GmENtwMnA2cm2bOp4VXAec02LmsWnc8wrwWdw3KXAouBiUn+D9A9KrsiPwdesTLzq2oRnfMQTwfOG+ZQV0lSDxkCJUmrpDmXbu4ws78A/A+dEHAanXP7BmwAfBV4mM7hkEuAf1jFGi4DPkMn+DxA5+IuhzTzlgBvoxOultAZ8XpbVY3oMMkReAdwJXAx8BidAPg14KNdy7yPzkVsHqJzkZlnD52tqp/SORzyS3QO3Xw7nYus/M9K1HA08C/Nvh8HbqbTpwc3h5fC8l+LS5r6b2/We5LlH4I62N8Cf9Fc3fQTQ8z/J+CdSR5O8sWu9tOA6XgoqCT1RapWdCSHJElaEyT5S+B3gddV1SN9LmdYSV5HJ7i+vPwiIkljzhAoSdI4kuRoYGFVXdzvWobSHPp6NvCTqjq+3/VIUhsZAiVJ0phIsgOd8yR/AuxfVY/2uSRJaiVDoCRJkiS1iBeGkSRJkqQWeSE3e11tbbrppjVlypR+lyFJkiRJfTFv3rxfVNXkoeaNyxA4ZcoU5s4d7qrlkiRJkjS+JblnuHkeDipJkiRJLWIIlCRJkqQWMQRKkiRJUouMy3MCJUmSJK2efvOb37Bo0SKefPLJfpcyLkyaNIktt9yStdZaa8TrGAIlSZIkjZlFixbxkpe8hClTppCk3+Ws0aqKJUuWsGjRIqZOnTri9TwcVJIkSdKYefLJJ9lkk00MgKMgCZtssslKj6oaAiVJkiSNKQPg6FmVvjQESpIkSVKLeE6gJEmSpL6Zd9KMUd3eqz8yd4XLTJgwgenTp7N06VJ22GEHTjvtNNZdd90hl50zZw633norxxxzzKjW2U+OBEqSJElqlXXWWYf58+dz88038+IXv5iTTz552GUPPPDAcRUAwRAoSZIkqcX22msvFi5cyEMPPcTBBx/MzjvvzJ577smNN94IwOzZszn66KMBOPfcc9lpp53YZZddeN3rXgd0LnTzgQ98gOnTp7Pbbrtx5ZVXPrve7/3e77H//vszbdo0/vRP//TZfZ511llMnz6dnXbaiU996lPPtq+//vrPTn/rW9/iiCOOGHa/L4SHg0qSJElqpaVLl/Lv//7v7L///hx77LHstttuXHDBBVxxxRUcdthhzJ8/f5nljz/+eC655BK22GILHnnkEQC+8pWvkISbbrqJ2267jf3224/bb78dgPnz53PDDTew9tprs9122/HRj36UCRMm8KlPfYp58+ax8cYbs99++3HBBRdw8MEHD1vnUPt9IRwJlCRJktQqTzzxBLvuuiszZsxg66235kMf+hBXX301hx56KAD77LMPS5Ys4dFHH11mvde+9rUcccQRfPWrX+Xpp58G4Oqrr2bWrFkAbL/99rz85S9/NgTuu+++bLjhhkyaNIkdd9yRe+65h+uvv569996byZMnM3HiRN7//vfzgx/8YLn1DrXfF8KRQEmSJEmtMnBO4Mo6+eSTufbaa/nud7/Lq1/9aubNm7fc5ddee+1npydMmMDSpUuXu3z37R667/031H432WSTla5/gCOBkiRJklpvr7324swzzwTgqquuYtNNN2WDDTZYZpk777yTPfbYg+OPP57Jkydz3333LbPe7bffzr333st222037H5mzpzJ97//fX7xi1/w9NNPc9ZZZ/H6178egM0224wFCxbwzDPPcP755y93vy+EI4GSJEmS+mYkt3QYC8cddxwf/OAH2XnnnVl33XU57bTTnrfMJz/5Se644w6qin333ZdddtmF7bffno985CNMnz6diRMnMnv27GVGAAfbfPPNOfHEE3nDG95AVXHAAQdw0EEHAX
DiiSfytre9jcmTJzNjxgwee+yxYff7QqSqXtAGVkczZsyouXNXjzeTpPYZ7fsdjYbV5ResJEkLFixghx126HcZ48p
"text/plain": [
"<Figure size 1080x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def plot_chosen_features(data,\n",
" col,\n",
" hue=None,\n",
" color=['orange', 'lightgreen'],\n",
" labels=None):\n",
" fig, ax = plt.subplots(figsize=(15, 7))\n",
" sns.countplot(x=col,\n",
" hue=hue,\n",
" palette=color,\n",
" saturation=0.6,\n",
" data=data,\n",
" dodge=True,\n",
" ax=ax)\n",
" ax.set(title=f\"Mushroom {col.title()} Quantity\",\n",
" xlabel=f\"{col.title()}\",\n",
" ylabel=\"Quantity\")\n",
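"    if labels is not None:\n",
"        # tick labels are applied in the order seaborn draws the categories\n",
"        # (order of first appearance for non-categorical columns), so `labels` must follow it\n",
"        ax.set_xticklabels(labels)\n",
"    if hue is not None:\n",
"        ax.legend(('Poisonous', 'Edible'), loc=0)\n",
"\n",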
" plt.show()\n",
"\n",
"\n",
"training_cols = [\n",
" 'odor',\n",
" 'spore-print-color',\n",
" 'gill-color',\n",
" 'ring-type',\n",
" 'stalk-surface-above-ring',\n",
" 'gill-size',\n",
"]\n",
"\n",
"plot_chosen_features(mushrooms,\n",
" col='odor',\n",
" labels=NAMES_DICT['odor'].values(),\n",
" hue='edible')\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA4EAAAG5CAYAAAAwHDElAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAA3QUlEQVR4nO3deZglZX33//fHGWQRkG2CyOJgGDbZGQEXVEARFYUYF5BdIolRE39JVMyTBDTxCSY+ajQKwYisQcV1oriAyKKyzcjIjgzIMogybOIC6sD390dV46Gne6Zn6NNnuuv9uq6+us5d96n6Vp3T1f3pu6pOqgpJkiRJUjc8ZdAFSJIkSZImjiFQkiRJkjrEEChJkiRJHWIIlCRJkqQOMQRKkiRJUocYAiVJkiSpQwyBktRhSU5N8i+DrqNLkhyS5NuDrmPIVH8PJNkzyU2DrkOSViaGQElaiSW5LcnvkmwwrP2qJJVk5oBKm3BJjk5yY5JfJvl5knOTrLUS1PWSJI8l+VVb201Jjhqtf1WdVVX7jnHZRyb53hj6rZ3ko0nuaOu4pX28wbKeO96SrJrkX9taHk5yc5K/S5IJWn8l2WLocVVdUlVb9cy/LclLJ6IWSVpZGQIlaeX3E+DgoQdJtgfWmOgikkyf6HX2rPvFwP8FDq6qtYBtgM/1YT1JsiK/G39aVWsCawPvAT6VZNsRlj/u+zDJU4HvAM8B9mtreB5wH7DbeK+vZ72jbcs5wD7AK4G1gMOAPwf+X79qkSQtH0OgJK38zgAO73l8BHB6b4ckFyb5s57Hj48gtcHmI0nuSfJQkmuSbNfz9HWTfL0dxbo8yR/3LKeSvC3JzcDNbdtbkixIcn+SOUme2dP/+UmuTPKL9vvzh9X4L0l+0I5W/W+S9ZOc1dZ15VJGNp8LXFpVVwFU1f1VdVpV/bJd9qlJTkpyXrsdFyV51nLU9YEk3wd+Azw7ydbtsu5vR/besPSXqFGNrwAPANu2r8P32/1/H3D88NG9dh//RTti9mCST7Sv2TbAScDz2v314CirPRzYDPiTqrq+qh6rqnuq6p+r6tx2Hdu02/lgkuuSvGa0bVjG67vE+2HYc/cB9gX+tKqurarFVXUZcCjw10me3fZ7wmhckuOTnNnz+JwkP2tfr4uTPKdn3qntPlriPZvk4rbbj9p99sY0I7UL2/lntPvqf9v5726X845h23F1kj8ZbR9J0mRnCJSkld9lwNrtH/LTgIOAM5fxnF77Ai8CtgSeDryBZpRoyEHA+4B1gQXAB4Y9/0Bgd5pQszfwr+0yNgJuBz4LkGQ94OvAx4D1gQ8DX0+y/rB1HQZsDPwxcCnwGWA94AbguFG24XLg5Unel+QFSVYdoc8hwD8DGwDzgbOWo67DgGNoRq4WAecB/wP8UVvzJzPCyN5wSZ7Shod1gGva5t2BW4ENWXLfDtmfJujuQLNvX15VNwB/QRN+16yqdUZ57kuBb1bVr0apaRXgf4Fvt9vzDuCsJFuN0HfU17fHge02jbQ/XgZcXlV39jZW1eXAQpoRwrH4BjCrrfeHtK9ljxHfs1X1onb+ju0+e8JocVUdBtwBvLqd/2/AaTQhFYAkO9K8P78+xloladIxBErS5DA0GvgymrB013I89/c04WZrIFV1Q1Xd3TP/y1V1RVUtpvlje6dhz//XduTtYZqgdUpV/bCqfgu8l2akaibwKuDmqjqjHQE6G7gReHXPsj5TVbdU1S9o/tC/parOb9d9DrDzSBtQVZcArwV2ofnj/L4kH25D8ZCvV9XFbV3/p61r0zHWdWpVXdfWsR9wW1V9pu1/FfBF4PVL2cfPbEfq7qUJsodV1dDNSH5aVR9vl/XwKM8/oaoerKo7gO+y5GuwNOsDdy9l/h7Amu06fldVFwBfo+cU4x5Le32H9L4fhttgKbXcDcxY+qY0quqUqvplW8PxwI5Jnt7TZVnv2eUxB9gyyaz28WHA56rqd09imZ
K0UjMEStLkcAbwJuBIhp0KuiztH/3/CXwCuCfJyUnW7unys57p39AEhl69ozrPpBkdGlr2r2hGFTcePq91eztvyM97ph8e4fHwdfduxzeq6tU0o4YH0OyLP+vpcmdP318B97c1jaWu3m18FrB7e+rkg224OwR4RpLN2tMIf5Wkd+Ttp1W1TlWtV1U7VdVnR1n2aJb1GgAwyvrvoxm1G80zgTur6rGetuHb39t3tNd3yNK2596l1LJRO3+pkkxLckKam9s8BNzWzuq9yc2Y9tdYVNUjNNeXHprmetCDaX7eJGnKMgRK0iRQVbfT3CDmlcCXRujya554s5hnDHv+x6pqV5pT+LYE3rU8q++Z/ilNSAIgydNoRqLuGj6vtRnLN2q57GKaa96+A1wA9F7buGlPXWvShMWfjrGu3m28E7ioDXVDX2tW1Vur6o52es32RjBjKnmM/Zb53FHWfz7NqbJPG2UZPwU2zRNveDPa67K013fEmoY5nyZAb9rbmGT3dp0XtU1Le7++iSbkv5Tm9OWZQ4tZynqXx0j1n0YT9PcBflNVl47TuiRppWQIlKTJ42hg76r69Qjz5gOvTbJGmtvjHz00I8lzk+zeXhv2a+AR4LERljEWZwNHJdmpvS7v/9JcA3YbcC7NaXVvSjI9yRtpQufXVnBdj0tyQJKDkqzb3jRlN+DFNNdLDnllkhemuVvmPwOXtdemLW9dX2v7H5Zklfbrue2NWibaz4FN2m0azRk0wfWLaW5o85Q0N9z5+ySvpLme8jfAu9tteQnNqbDDr/WDpb++y1RV59PcqfSLSZ7TjurtQXMN6+k9p8jOBw5q65kNvK5nMWsBv6UZgVyjrWF5/Bx49vLMb0PfYzR3MHUUUNKUZwiUpEmivZZu7iizPwL8juYP3NN44o001gY+RXPHyttp/rj+9xWs4XzgH2mukbub5uYuB7Xz7qO5wcnftut4N7B/VS3zFMAxeAB4C80dKR+iCRX/XlW92/k/NNfj3Q/sSnuzj+Wtq73j6L7tdv2U5tTDDwIj3Yym3y4ArgN+lmS0en9LM2p2I80NbR4CrqA5ffLy9tq2VwOvoDkd85PA4VV14wjLGvX1XQ5/SnNd4zdp/uFwaTt9TE+ff2yX/QDNDV7+p2fe6TTv07uA63li0B+L44HT2lN5R7qr678C/9DO/7th692e5bvpkiRNSql6MmepSJI0eElOBRZW1T8MuhY9UZLTaK41fNXKfLOVJIcDx1TVCwddiyT1myOBkiSpn/6M5lrBXQZdyGiSrAH8JXDyoGuRpIlgCJQkSX1TVb+vqg+2Hxq/0knycprPhvw5TzwtVZKmLE8HlSRJkqQOcSRQkiRJkjpk+qAL6IcNNtigZs6cOegyJEmSJGkg5s2bd29VzRhp3pQMgTNnzmTu3NHuoi5JkiRJU1uS20eb5+mgkiRJktQhhkBJkiRJ6hBDoCRJkiR1yJS8JlCSJEnSyun3v/89Cxcu5JFHHhl0KVPCaqutxiabbMIqq6wy5ucYAiVJkiRNmIULF7LWWmsxc+ZMkgy6nEmtqrjvvvtYuHAhm2+++Zif5+mgkiRJkibMI488wvrrr28AHAdJWH/99Zd7VNUQKEmSJGlCGQDHz4rsS0OgJEmSJHWI1wRKkiRJGph5J84e1+Xt+ta5y+wzbdo0tt9+exYvXsw222zDaaedxhprrDFi3zlz5nD99ddz7LHHjmudg+RIoCRJkqROWX311Zk/fz7XXnstT33qUznppJNG7fua17xmSgVAMARKkiRJ6rA999yTBQsWcP/993PggQeyww47sMcee3D11VcDcOqpp/L2t78dgHPOOYftttuOHXfckRe96EVAc6Obo446iu23356dd96Z7373u48/77WvfS377bcfs2bN4t3vfvfj6zz77LPZfvvt2W677XjPe97zePuaa675+PQXvvAFjjzyyFHX+2R4OqgkSZKkTlq8eDHf+MY32G+//TjuuOPYeeed+c
pXvsIFF1zA4Ycfzvz585/Q//3vfz/f+ta32HjjjXnwwQcB+MQnPkESrrnmGm688Ub23XdffvzjHwMwf/58rrrqKlZ
"text/plain": [
"<Figure size 1080x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_chosen_features(mushrooms,\n",
" col='spore-print-color',\n",
" labels=NAMES_DICT['spore-print-color'].values(),\n",
" hue='edible')"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA4EAAAG5CAYAAAAwHDElAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAA2IklEQVR4nO3deZwlZX3v8c+XGWBARLYJEgacMSCLgCAtYhQVUMQVNC4YEQaJ3BDcrnHB3GtAE25I9KrRGAgqguJFxQUnLiiICBpZZmRkRwYEGYIygLiCOPC7f1Q1HnpOz3QPfbqnpz7v16tfXeepOlVPPV2nz/me56mqVBWSJEmSpG5YZ6orIEmSJEmaPIZASZIkSeoQQ6AkSZIkdYghUJIkSZI6xBAoSZIkSR1iCJQkSZKkDjEESpIGIslpSf5xqusxUZJcneTZ7fTxSc5op+cmqSQzB7TdC5L81SDWvSZI8pok35rqekhSlxgCJamjktyc5P4kW4wov7wNNXOnqGqTLsl6Sf4+yfVJfpvktiTfSHLA8DJV9cSqumA1179Vkk8kuT3Jr5Ncl+Q9SR41YTsx9rpskuSkJD9L8rskVyY5fJK2vUJgrqrPVNUBPctUku0moz6S1FWGQEnqtp8Arx5+kGRXYMPJrsSgetHG4QvAQcBhwKbAPOBfgRc+0hUn2Qz4AbAB8LSqejTwXGAT4M8e6fpXst0V2jTJesB5wOOApwGPAd4O/EuSNw2qLpKkNYshUJK67dM0wWfY4cCnehcYORwxyfwk32unk+SDSe5I8qu2V2mXnqdvmuRrbe/XJUn+rGc9leSYJDcAN7Rlr0+yJMndSRYk+dOe5f88yWVJftn+/vMRdfzHJP+V5DdJ/jPJ5kk+09brstF6NpM8hyaUHVRVl1TV/e3POVX15p7lbm6XHa+3Ar8GDq2qmwGq6taqenNVXbGqfRtR13WS/O8kt7Rt/qkkj2nnDfeyHZnkp8D5fVbxWmBb4BVV9ZOq+kNVnQO8CfjHJBu163pYb1zv0N4kmyb5apJlSX7RTs/pWfaCJP+Q5Pvt3/1bPb3NF7a/72n/Tk8bcTwNz/9RO/9VSa5K8uKe9a+b5M4ke4z9TyBJ6mUIlKRuuxjYOMlOSWYAhwBnjOP5BwDPBJ5A06v0SuCunvmHAO+h6V1bApww4vkHA08Fdk6yH/BP7Tq2Am4BPgsP9aZ9DfgwsDnwAeBrSTYfsa3XAlvT9LD9APgksBlwLXDcKPvwHOCSqlo6jv0ej+cAX6qqB/vNHOO+DZvf/uwLPB7YCPi3Ecs8C9gJeF6f5z8X+EZV/XZE+RdpeoCfturdYR2adn0cTaC8t08d/hI4AvgTYD3gbW35M9vfm1TVRlX1g94nVdXw/Ce18z9H86XEoT2LvQC4vaouH0NdJUl9GAIlScO9gc+lCUu3jeO5fwAeDewIpKqurarbe+Z/uaourarlwGeA3Uc8/5+q6u6quhd4DXBqVf2wqn4PvAt4WtuD90Lghqr6dFUtr6ozgeuAF/es65NVdWNV/RL4BnBjVZ3XbvssYLSeoy2Anw0/SLJZknvaXrn7xtEWo9kcuH0l88eyb8NeA3ygqm6qqt/QtNEhI4Z+Hl9Vv23bdKQt+tWlbaM7gdmr2pmququqvlhVv6uqX9ME+2eNWOyTVfXjtg6fZ8W/+3icAbwgycbt49fSHLOSpNVkCJQkfZqm52Y+I4aCrkpVnU/TC/RR4I4kp/R8WIeecAX8jqbnqtetPdN/StP7N7zu39D0Km49cl7rlnbesJ/3TN/b5/HIbQ+7i6bncXi7d1fVJsCewPqjPKevJPu0wxh/k+TqfuvvYyz7NtqytwAzgS17ym5ldHf2q0sbIrdo569Ukg2T/Ec7JPVXNEM8N2l7koet6u8+ZlX138D3gb9IsgnwfJovFCRJq8kQKEkdV1W30Fwg5gXAl/os8lsefrGYx454/oerak9gZ5phoW8fz+Z7pv+bZo
ghAGmunLk5Tc/kw+a1tmV8vZaj+TbwlN7z2lZXVV3UDmPcqKqe2BafB7w0yWjvuePZt5HLbgss5+GBtxjdecDzs+JVSf8CuB+4pH38O0b/m/8tsAPw1KramD8O8cxKtjuWuq3M6TRDQl8B/KCqJuLvLkmdZQiUJAEcCezX51wxgMXAy9oeoO3aZQFI8pQkT02yLk1YvA/oe+7bGJwJHJFk9yTrA/+H5ly9m4GvA09I8pdJZiZ5FU3o/OpqbushVfUt4DvA2e2+rNfuz96PdN2tDwAbA6cneRxAkq2TfCDJboxv384E/meSee1FXP4P8Ll2OOdYfBpYCpzVXkhm3STPozkf8X3tUFpo/uZ/mWRGkgN5+HDPR9P0rN7Tns842rmW/SyjOT4ev5Jlft5n/tnAk4E3M87eaknSigyBkiTac+kWjjL7gzS9RD+n6ZHpHYq3MfAx4Bc0QxPvAt63mnU4D3g3zUVKbqe5uMsh7by7gBfR9ELdBbwDeFFVrXL44hi9lCZ0nQHcQ9Mz+hr6X1xlXKrqbuDPac6fvCTJr2l6H38JLBnnvp1KE+QubOt4H/DGcdTl9zQXqrmVptfvXuAc4EM0F/AZ9maacxLvoWmHs3vmfYjmdhd30lxY6JxxbP93NOcQfr8977Jf0D6eJjDfk+SV7fPupTku5tG/t1qSNA6pWt2RGZIkaTprezy/QTP0dH6twR8Kkvw98ISqOnSVC0uSVsqeQEmSOqqq/kBzPuCNNOf5rZHaYadHAqdMdV0kaW1gT6AkSVpjJXk9zRDUT1fVX09xdSRprWAIlCRJkqQOcTioJEmSJHXIzKmuwCBsscUWNXfu3KmuhiRJkiRNiUWLFt1ZVbP7zVsrQ+DcuXNZuHC0K51LkiRJ0totyS2jzRvYcNAkpya5I8lVI8rfmOS6JFcn+Zee8nclWZLk+vbGtcPlB7ZlS5IcO6j6SpIkSVIXDLIn8DTg34BPDRck2Rc4CHhSVf0+yZ+05TvT3BD4icCfAucleUL7tI8CzwWWApclWVBV1wyw3pIkSZK01hpYCKyqC5PMHVF8NHBiVf2+XeaOtvwg4LNt+U+SLAH2auctqaqbAJJ8tl3WEChJkiRJq2Gyzwl8ArBPkhOA+4C3VdVlwNbAxT3LLW3LAG4dUf7UfitOchRwFMC22267wvw//OEPLF26lPvuu++R7oNas2bNYs6cOay77rpTXRVJkiRJYzTZIXAmsBmwN/AU4PNJHj8RK66qU4BTAIaGhla4+eHSpUt59KMfzdy5c0kyEZvstKrirrvuYunSpcybN2+qqyNJkiRpjCb7PoFLgS9V41LgQWAL4DZgm57l5rRlo5WP23333cfmm29uAJwgSdh8883tWZUkSZKmmckOgWcD+wK0F35ZD7gTWAAckmT9JPOA7YFLgcuA7ZPMS7IezcVjFqzuxg2AE8v2lCRJkqafgQ0HTXIm8GxgiyRLgeOAU4FT29tG3A8cXlUFXJ3k8zQXfFkOHFNVD7TreQPwTWAGcGpVXT2oOkuSJEnS2m6QVwd99SizDh1l+ROAE/qUfx34+gRWDYBFJw1N6Pr2PHrVN6efMWMGu+66K8uXL2ennXbi9NNPZ8MNN+y77IIFC7jmmms49lhvjShJkiRp4kz2cNBO22CDDVi8eDFXXXUV6623HieffPKoy77kJS8xAEqSJEmacIbAKbLPPvuwZMkS7r77bg4++GB222039t57b6644goATjvtNN7whjcAcNZZZ7HLLrvwpCc9iWc+85lAc6GbI444gl133ZU99tiD73znOw8972UvexkHHngg22+/Pe94xzse2uaZZ57Jrrvuyi677MI73/nOh8o32mijh6a/8IUvMH/+/FG3K0mSJGl6m+xbRAhYvnw53/jGNzjwwAM57rjj2GOPPTj77LM5//zzOeyww1i8ePHDln/ve9/LN7/5TbbeemvuueceAD760Y+ShCuvvJLrrruOAw44gB//+McALF68mM
svv5z111+fHXbYgTe+8Y3MmDGDd77znSxatIhNN92UAw44gLPPPpuDDz541Hr2264kSZKk6c2ewEl07733svvuuzM
"text/plain": [
"<Figure size 1080x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_chosen_features(mushrooms,\n",
" col='gill-color',\n",
" labels=NAMES_DICT['gill-color'].values(),\n",
" hue='edible')"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1080x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_chosen_features(mushrooms,\n",
" col='ring-type',\n",
" labels=list(NAMES_DICT['ring-type'].values()) + ['cobwebby'],  # list.append() returns None, so concatenate instead\n",
" hue='edible')"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1080x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_chosen_features(mushrooms,\n",
" col='stalk-surface-above-ring',\n",
" labels=NAMES_DICT['stalk-surface-above-ring'].values(),\n",
" hue='edible')"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1080x504 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plot_chosen_features(mushrooms,\n",
" col='gill-size',\n",
" labels=NAMES_DICT['gill-size'].values(),\n",
" hue='edible')"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"# Stratified 80/20 train/test split, so both sets keep the edible/poisonous ratio\n",
"X_train, X_test = train_test_split(mushrooms,\n",
" test_size=0.2,\n",
" random_state=0,\n",
" stratify=mushrooms['edible'])\n",
"# Categorical features used by the classifier\n",
"columns = [\n",
" 'odor',\n",
" 'spore-print-color',\n",
" 'gill-color',\n",
" 'ring-type',\n",
" 'stalk-surface-above-ring',\n",
" 'gill-size',\n",
"]\n",
"className = 'edible'\n",
"# Distinct class labels present in the training set\n",
"classValue = list(set(X_train[className]))"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"X_test_data = X_test[columns]\n",
"X_test_results = X_test[className]\n"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"bayModel = NaiveBayes(classValue, className, columns, X_train)\n",
"model = bayModel.fitModel()\n"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy score of the naive Bayes classifier\n",
"accuracy score = 0.9944903581267218\n",
"\n",
"accuracy score of random predictions\n",
"accuracy score = 0.48829201101928377\n"
]
}
],
"source": [
"pred = bayModel.predict(X_test[columns], model)\n",
"print('accuracy score of the naive Bayes classifier')\n",
"print(\"accuracy score =\", accuracy_score(list(X_test_results), pred))\n",
"\n",
"print('\\naccuracy score of random predictions')\n",
"# Baseline: guess 'edible' or 'poisonous' uniformly at random (unseeded, so this score varies between runs)\n",
"randomPred = ['poisonous' if random.randint(0, 1) == 1 else 'edible' for _ in range(len(X_test_results))]\n",
"print(\"accuracy score =\", accuracy_score(list(X_test_results), randomPred))"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy score of the naive Bayes classifier (without odor)\n",
"accuracy score = 0.8980716253443526\n"
]
}
],
"source": [
"# Same model, retrained with the 'odor' feature removed\n",
"columns_without_odor = [\n",
" 'spore-print-color',\n",
" 'gill-color',\n",
" 'ring-type',\n",
" 'stalk-surface-above-ring',\n",
" 'gill-size',\n",
"]\n",
"\n",
"X_test_data = X_test[columns_without_odor]\n",
"X_test_results = X_test[className]\n",
"\n",
"bayModel = NaiveBayes(classValue, className, columns_without_odor, X_train)\n",
"model = bayModel.fitModel()\n",
"pred = bayModel.predict(X_test_data, model)\n",
"print('accuracy score of the naive Bayes classifier (without odor)')\n",
"print(\"accuracy score =\", accuracy_score(list(X_test_results), pred))\n"
]
}
],
"metadata": {
"interpreter": {
"hash": "393784674bcf6e74f2fc9b1b5fb3713f9bd5fc6f8172c594e5cfa8e8c12849bc"
},
"kernelspec": {
"display_name": "Python 3.9.2 64-bit",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.2"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}