{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Plan for today\n",
"1. Motivation\n",
"2. Splitting the data\n",
"3. Where to get data?\n",
"4. Preparing the data\n",
"5. Assignment"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Motivation\n",
"- The \"garbage in, garbage out\" principle\n",
"- The better the quality of the data, the better the model\n",
"- The best architecture, the most powerful compute resources, and the most sophisticated methods will not help if the data used to develop the model does not match the data it will be used on, or if the data contains no dependencies at all\n",
"- If the data is poorly chosen, we can waste a lot of time, energy, and resources optimizing our model in the wrong direction"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Data sources\n",
"- Ready-made datasets:\n",
" - Open challenges\n",
" - Repositories of open datasets\n",
" - Data published by companies\n",
" - Repositories of commercial datasets\n",
" - Internal data (e.g. a company's own)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Data sources\n",
"- Creating data:\n",
" - Synthetic generation\n",
" - Crowdsourcing\n",
" - Data scraping\n",
" - Extraction\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Open challenges (shared tasks)\n",
"- Kaggle: https://www.kaggle.com/datasets\n",
"- Gonito: https://gonito.net/list-challenges - a Polish Kaggle (from Poznań, from UAM)\n",
"- Semeval: https://semeval.github.io/ - semantics tasks\n",
"- Poleval: http://poleval.pl/ - Polish language processing\n",
"- WMT: http://www.statmt.org/wmt20/ (machine translation)\n",
"- IWSLT: https://iwslt.org/2021/#shared-tasks (speech translation)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Repositories and search engines for open datasets\n",
"- Papers with code: https://paperswithcode.com/datasets\n",
"- UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/\n",
"- Google dataset search: https://datasetsearch.research.google.com/\n",
"- Google datasets: https://research.google/tools/datasets/\n",
"- Registry of Open Data on AWS: https://registry.opendata.aws/\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Open datasets\n",
"- Speech recognition:\n",
" - https://www.openslr.org/ - LibriSpeech, TED-LIUM\n",
" - Mozilla Common Voice: https://commonvoice.mozilla.org/\n",
"- NLP:\n",
" - LINDAT/CLARIN (Czech node): https://lindat.cz/repository/xmlui/\n",
" - CLARIN-PL: https://clarin-pl.eu/index.php/zasoby/\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Crowdsourcing\n",
"- Amazon Mechanical Turk: https://www.mturk.com/\n",
"- Yandex Toloka\n",
"- reCAPTCHA\n",
"<img src=\"https://upload.wikimedia.org/wikipedia/commons/8/8b/Tuerkischer_schachspieler_windisch4.jpg\">\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Licenses\n",
"- Before deciding to use a dataset, be sure to check its license!\n",
"- Many datasets available online are released under open licenses\n",
"- Usually, however, using them requires meeting certain conditions, e.g. attributing the source\n",
"- Many publicly available datasets cannot be used free of charge for commercial purposes!\n",
"- Some licenses may even require that any derivative work created with the dataset be released under the same license (GPL). This is a \"danger\" when the resources are used by a commercial company!\n",
"- How CC licenses work: https://creativecommons.pl/\n",
"- The most popular licenses:\n",
" - Friendly to commercial use as well: MIT, BSD, Apache, CC (without the NC suffix)\n",
" - GPL (GNU General Public License) - a \"viral\" open-source license"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example\n",
"- Using standard bash tools, we will perform a preliminary inspection and split of the data\n",
"- As an example we will use the classic Iris dataset: https://archive.ics.uci.edu/ml/datasets/Iris\n",
"- The dataset contains the lengths and widths of the sepals and petals of three iris species:\n",
" - Iris Setosa\n",
" - Iris Versicolour\n",
" - Iris Virginica\n",
" \n",
"<img src=IUM_02/iris.png>\n",
"https://www.kaggle.com/vinayshaw/iris-species-100-accuracy-using-naive-bayes"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Inspection\n",
"- Before we start training a model on the data, we should get to know its characteristics\n",
"- This will allow us to:\n",
" - remove or fix invalid examples\n",
" - select the features we will use in our model\n",
" - choose an appropriate learning algorithm\n",
" - decide how to split the dataset and whether to normalize it\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Inspection\n",
"- To inspect the data we will use the popular Python library Pandas: https://pandas.pydata.org/\n",
"- Pandas is used for analyzing and manipulating tabular data as well as time series\n",
"- For visualization we will use the Seaborn library: https://seaborn.pydata.org/index.html"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: kaggle in /home/tomek/.local/lib/python3.8/site-packages (1.5.12)\n",
"Requirement already satisfied: python-dateutil in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (2.8.1)\n",
"Requirement already satisfied: six>=1.10 in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (1.15.0)\n",
"Requirement already satisfied: urllib3 in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (1.25.11)\n",
"Requirement already satisfied: python-slugify in /home/tomek/.local/lib/python3.8/site-packages (from kaggle) (4.0.1)\n",
"Requirement already satisfied: certifi in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (2020.6.20)\n",
"Requirement already satisfied: tqdm in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (4.50.2)\n",
"Requirement already satisfied: requests in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (2.24.0)\n",
"Requirement already satisfied: text-unidecode>=1.3 in /home/tomek/.local/lib/python3.8/site-packages (from python-slugify->kaggle) (1.3)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /home/tomek/anaconda3/lib/python3.8/site-packages (from requests->kaggle) (3.0.4)\n",
"Requirement already satisfied: idna<3,>=2.5 in /home/tomek/anaconda3/lib/python3.8/site-packages (from requests->kaggle) (2.10)\n",
"Requirement already satisfied: pandas in /home/tomek/anaconda3/lib/python3.8/site-packages (1.1.3)\n",
"Requirement already satisfied: python-dateutil>=2.7.3 in /home/tomek/anaconda3/lib/python3.8/site-packages (from pandas) (2.8.1)\n",
"Requirement already satisfied: numpy>=1.15.4 in /home/tomek/anaconda3/lib/python3.8/site-packages (from pandas) (1.19.2)\n",
"Requirement already satisfied: pytz>=2017.2 in /home/tomek/anaconda3/lib/python3.8/site-packages (from pandas) (2020.1)\n",
"Requirement already satisfied: six>=1.5 in /home/tomek/anaconda3/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)\n"
]
}
],
"source": [
"# Let's install the required libraries\n",
"!pip install --user kaggle # Kaggle API, used to download the dataset\n",
"!pip install --user pandas"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/tomek/.kaggle/kaggle.json'\n",
"iris.zip: Skipping, found more recently modified local copy (use --force to force download)\n"
]
}
],
"source": [
"# For the command below to work, you need a ~/.kaggle/kaggle.json file containing your Kaggle API token.\n",
"# Instructions: https://www.kaggle.com/docs/api\n",
"!kaggle datasets download -d uciml/iris"
]
},
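{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The warning in the output above comes from the Kaggle CLI itself: the token file is readable by other users on the system. A minimal fix, assuming the token lives in the default `~/.kaggle/kaggle.json` location (adjust the path if yours differs), is to restrict its permissions as the warning suggests:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Make the Kaggle API token readable and writable by the current user only.\n",
"# Assumption: the token is stored in the default ~/.kaggle location.\n",
"!chmod 600 ~/.kaggle/kaggle.json"
]
},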
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Archive: iris.zip\r\n",
" inflating: Iris.csv \r\n",
" inflating: database.sqlite \r\n"
]
}
],
"source": [
"!unzip -o iris.zip"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\r\n",
"1,5.1,3.5,1.4,0.2,Iris-setosa\r\n",
"2,4.9,3.0,1.4,0.2,Iris-setosa\r\n",
"3,4.7,3.2,1.3,0.2,Iris-setosa\r\n",
"4,4.6,3.1,1.5,0.2,Iris-setosa\r\n"
]
}
],
"source": [
"!head -n 5 Iris.csv"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>SepalLengthCm</th>\n",
" <th>SepalWidthCm</th>\n",
" <th>PetalLengthCm</th>\n",
" <th>PetalWidthCm</th>\n",
" <th>Species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>5.1</td>\n",
" <td>3.5</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>Iris-setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>4.9</td>\n",
" <td>3.0</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>Iris-setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>4.7</td>\n",
" <td>3.2</td>\n",
" <td>1.3</td>\n",
" <td>0.2</td>\n",
" <td>Iris-setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>4.6</td>\n",
" <td>3.1</td>\n",
" <td>1.5</td>\n",
" <td>0.2</td>\n",
" <td>Iris-setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>5.0</td>\n",
" <td>3.6</td>\n",
" <td>1.4</td>\n",
" <td>0.2</td>\n",
" <td>Iris-setosa</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>146</td>\n",
" <td>6.7</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.3</td>\n",
" <td>Iris-virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>147</td>\n",
" <td>6.3</td>\n",
" <td>2.5</td>\n",
" <td>5.0</td>\n",
" <td>1.9</td>\n",
" <td>Iris-virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>147</th>\n",
" <td>148</td>\n",
" <td>6.5</td>\n",
" <td>3.0</td>\n",
" <td>5.2</td>\n",
" <td>2.0</td>\n",
" <td>Iris-virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>149</td>\n",
" <td>6.2</td>\n",
" <td>3.4</td>\n",
" <td>5.4</td>\n",
" <td>2.3</td>\n",
" <td>Iris-virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>150</td>\n",
" <td>5.9</td>\n",
" <td>3.0</td>\n",
" <td>5.1</td>\n",
" <td>1.8</td>\n",
" <td>Iris-virginica</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>150 rows × 6 columns</p>\n",
"</div>"
],
"text/plain": [
" Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm \\\n",
"0 1 5.1 3.5 1.4 0.2 \n",
"1 2 4.9 3.0 1.4 0.2 \n",
"2 3 4.7 3.2 1.3 0.2 \n",
"3 4 4.6 3.1 1.5 0.2 \n",
"4 5 5.0 3.6 1.4 0.2 \n",
".. ... ... ... ... ... \n",
"145 146 6.7 3.0 5.2 2.3 \n",
"146 147 6.3 2.5 5.0 1.9 \n",
"147 148 6.5 3.0 5.2 2.0 \n",
"148 149 6.2 3.4 5.4 2.3 \n",
"149 150 5.9 3.0 5.1 1.8 \n",
"\n",
" Species \n",
"0 Iris-setosa \n",
"1 Iris-setosa \n",
"2 Iris-setosa \n",
"3 Iris-setosa \n",
"4 Iris-setosa \n",
".. ... \n",
"145 Iris-virginica \n",
"146 Iris-virginica \n",
"147 Iris-virginica \n",
"148 Iris-virginica \n",
"149 Iris-virginica \n",
"\n",
"[150 rows x 6 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"iris=pd.read_csv('Iris.csv')\n",
"iris"
]
},
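{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Before computing any statistics, it can help to check the column types, missing values, and duplicated rows. The cell below is a minimal sketch that assumes the `iris` DataFrame loaded above. Dropping `Id` before looking for duplicates is our own choice here, since `Id` is only a row counter and would make every row unique."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Column names, dtypes and non-null counts\n",
"iris.info()\n",
"\n",
"# Number of missing values in each column\n",
"print(iris.isnull().sum())\n",
"\n",
"# Number of rows that repeat an earlier row once the Id counter is ignored\n",
"print(iris.drop(columns=\"Id\").duplicated().sum())"
]
},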
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Id</th>\n",
" <th>SepalLengthCm</th>\n",
" <th>SepalWidthCm</th>\n",
" <th>PetalLengthCm</th>\n",
" <th>PetalWidthCm</th>\n",
" <th>Species</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>150.000000</td>\n",
" <td>150.000000</td>\n",
" <td>150.000000</td>\n",
" <td>150.000000</td>\n",
" <td>150.000000</td>\n",
" <td>150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Iris-virginica</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>75.500000</td>\n",
" <td>5.843333</td>\n",
" <td>3.054000</td>\n",
" <td>3.758667</td>\n",
" <td>1.198667</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>43.445368</td>\n",
" <td>0.828066</td>\n",
" <td>0.433594</td>\n",
" <td>1.764420</td>\n",
" <td>0.763161</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>4.300000</td>\n",
" <td>2.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.100000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>38.250000</td>\n",
" <td>5.100000</td>\n",
" <td>2.800000</td>\n",
" <td>1.600000</td>\n",
" <td>0.300000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>75.500000</td>\n",
" <td>5.800000</td>\n",
" <td>3.000000</td>\n",
" <td>4.350000</td>\n",
" <td>1.300000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>112.750000</td>\n",
" <td>6.400000</td>\n",
" <td>3.300000</td>\n",
" <td>5.100000</td>\n",
" <td>1.800000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>150.000000</td>\n",
" <td>7.900000</td>\n",
" <td>4.400000</td>\n",
" <td>6.900000</td>\n",
" <td>2.500000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm \\\n",
"count 150.000000 150.000000 150.000000 150.000000 150.000000 \n",
"unique NaN NaN NaN NaN NaN \n",
"top NaN NaN NaN NaN NaN \n",
"freq NaN NaN NaN NaN NaN \n",
"mean 75.500000 5.843333 3.054000 3.758667 1.198667 \n",
"std 43.445368 0.828066 0.433594 1.764420 0.763161 \n",
"min 1.000000 4.300000 2.000000 1.000000 0.100000 \n",
"25% 38.250000 5.100000 2.800000 1.600000 0.300000 \n",
"50% 75.500000 5.800000 3.000000 4.350000 1.300000 \n",
"75% 112.750000 6.400000 3.300000 5.100000 1.800000 \n",
"max 150.000000 7.900000 4.400000 6.900000 2.500000 \n",
"\n",
" Species \n",
"count 150 \n",
"unique 3 \n",
"top Iris-virginica \n",
"freq 50 \n",
"mean NaN \n",
"std NaN \n",
"min NaN \n",
"25% NaN \n",
"50% NaN \n",
"75% NaN \n",
"max NaN "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.describe(include='all')"
]
},
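{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The `Id` column is just a row counter, so its mean and quartiles in the table above say nothing about the flowers. A small sketch, assuming the `iris` DataFrame from the earlier cells, that summarizes only the measurement columns:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Drop the Id counter, then summarize the remaining numeric columns\n",
"iris.drop(columns=\"Id\").describe()"
]
},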
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Iris-virginica 50\n",
"Iris-setosa 50\n",
"Iris-versicolor 50\n",
"Name: Species, dtype: int64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris[\"Species\"].value_counts()"
]
},
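{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The same counts can also be expressed as fractions, which makes class balance easier to compare across datasets of different sizes. A sketch assuming the `iris` DataFrame from above, using the `normalize` parameter of `value_counts`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Share of each species in the dataset (the fractions sum to 1)\n",
"iris[\"Species\"].value_counts(normalize=True)"
]
},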
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAEyCAYAAADjiYtYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAASkklEQVR4nO3de6xlZX3G8e8zgOKNCuFAplwcbFGrlpujEaGaglhaVKgVkaqdGCq9YEtTi4HeEmusWBPjpd5GRKf1SivIFI1CByiSEHC4CkGD5aYyMgNVGcEil1//2OvIdDgzZ5+zz9lr3tnfT3Ky9nr33rN/yTrznLXf9b7vSlUhSWrPkr4LkCTNjwEuSY0ywCWpUQa4JDXKAJekRhngktSoHcf5YbvvvnstW7ZsnB8pSc27+uqr76mqqc3bxxrgy5YtY+3ateP8SElqXpI7Zmq3C0WSGmWAS1KjDHBJapQBLkmNMsAlqVFDjUJJcjuwEXgEeLiqlifZDfgisAy4HXhdVf1occqUJG1uLmfgv1lVB1XV8m7/dGBNVe0PrOn2JUljMkoXyrHAqu7xKuC4kauRJA1t2Ik8BVyYpICPV9VKYM+qWgdQVeuS7DHTG5OcDJwMsO+++y5AycNbdvpXxvp543b7mcf0XcKi8di1zeM3HsMG+GFVdVcX0hcl+fawH9CF/UqA5cuXe/sfSVogQ3WhVNVd3XY9cB7wIuDuJEsBuu36xSpSkvR4swZ4kqckedr0Y+AVwI3AamBF97IVwPmLVaQk6fGG6ULZEzgvyfTrP1dVX0vyTeCcJCcBdwLHL16ZkqTNzRrgVXUrcOAM7fcCRy5GUZKk2TkTU5IaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktSooQM8yQ5Jrk1yQbe/W5KLktzSbXddvDIlSZubyxn4qcDNm+yfDqypqv2BNd2+JGlMhgrwJHsDxwBnbdJ8LLCqe7wKOG5BK5MkbdWwZ+DvB94OPLpJ255VtQ6g2+6xsKVJkrZm1gBP8kpgfVVdPZ8PSHJykrVJ1m7YsGE+/4QkaQbDnIEfBrw6ye3AF4AjknwGuDvJUoBuu36mN1fVyqpaXlXLp6amFqhsSdKsAV5VZ1TV3lW1DHg9cHFVvRFYDazoXrYCOH/RqpQkPc4o48DPBI5KcgtwVLcvSRqTHefy4qq6FLi0e3wvcOTClyRJGoYzMSWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNmjXAk+yc5Kok1ye5Kck7uvbdklyU5JZuu+vilytJmjbMGfiDwBFVdSBwEHB0khcDpwNrqmp/YE23L0kak1kDvAZ+2u3u1P0UcCywqmtfBRy3GAVKkmY2VB94kh2SXAesBy6qqiuBPatqHUC33WPRqpQkPc5QAV5Vj1TVQcDewIuSPH/YD0hycpK1SdZu2LBhnmVKkjY3p1EoVfVj4FLgaODuJEsBuu36LbxnZVUtr6rlU1NTo1UrSfqFYUahTCV5evf4ScDLgW8Dq4EV3ctWAOcvUo2SpBnsOMRrlgKrkuzAIPDPqaoLklwBnJPkJOBO4PhFrFOStJlZA7yqbgAOnqH9XuDIxShKkjQ7Z2JKUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqV
EGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjZg3wJPskuSTJzUluSnJq175bkouS3NJtd138ciVJ04Y5A38YeFtV/RrwYuCUJM8FTgfWVNX+wJpuX5I0JrMGeFWtq6pruscbgZuBvYBjgVXdy1YBxy1SjZKkGcypDzzJMuBg4Epgz6paB4OQB/ZY8OokSVs0dIAneSrwJeAvquq+Obzv5CRrk6zdsGHDfGqUJM1gqABPshOD8P5sVZ3bNd+dZGn3/FJg/UzvraqVVbW8qpZPTU0tRM2SJIYbhRLgk8DNVfW+TZ5aDazoHq8Azl/48iRJW7LjEK85DHgT8K0k13Vtfw2cCZyT5CTgTuD4RalQkjSjWQO8qi4HsoWnj1zYciRJw3ImpiQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRswZ4krOTrE9y4yZtuyW5KMkt3XbXxS1TkrS5Yc7APw0cvVnb6cCaqtofWNPtS5LGaNYAr6rLgP/ZrPlYYFX3eBVw3MKWJUmazXz7wPesqnUA3XaPhStJkjSMRb+ImeTkJGuTrN2wYcNif5wkTYz5BvjdSZYCdNv1W3phVa2squVVtXxqamqeHydJ2tx8A3w1sKJ7vAI4f2HKkSQNa5hhhJ8HrgCeneT7SU4CzgSOSnILcFS3L0kaox1ne0FVnbiFp45c4FokSXPgTExJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWrUSAGe5Ogk30ny3SSnL1RRkqTZzTvAk+wAfBj4beC5wIlJnrtQhUmStm6UM/AXAd+tqlur6ufAF4BjF6YsSdJsRgnwvYDvbbL//a5NkjQGO47w3szQVo97UXIycHK3+9Mk3xnhM7d1uwP3jOvD8p5xfdJE8Ni1bXs/fs+YqXGUAP8+sM8m+3sDd23+oqpaCawc4XOakWRtVS3vuw7NnceubZN6/EbpQvkmsH+S/ZI8AXg9sHphypIkzWbeZ+BV9XCStwJfB3YAzq6qmxasMknSVo3ShUJVfRX46gLVsj2YiK6i7ZTHrm0TefxS9bjrjpKkBjiVXpIaZYBLUqMMcEnNSbIkyUv6rqNv9oEvgCTHAM8Ddp5uq6p/6K8iDctj164kV1TVoX3X0SfPwEeU5GPACcCfMZidejxbmDWlbYvHrnkXJvm9JDPNCp8InoGPKMkNVXXAJtunAudW1Sv6rk1b57FrW5KNwFOAR4CfMfgjXFW1S6+FjdFI48AFDH5xAB5I8svAvcB+Pdaj4XnsGlZVT+u7hr4Z4KO7IMnTgfcC1zBY0OusXivSsDx2jUvyauCl3e6lVXVBn/WMm10oCyjJE4Gdq+onfdeiufHYtSfJmcALgc92TScCV1fVxNwdzIuYI0pySncWR1U9CCxJ8qf9VqVhJDk+yfTX8NOATyU5uM+aNCe/AxxVVWdX1dnA0V3bxDDAR/eWqvrx9E5V/Qh4S3/laA7+rqo2Jjkc+C1gFfCxnmvS3Dx9k8e/1FcRfTHAR7dk02FM3b1Cn9BjPRreI932GOCjVXU+HruWvBu4Nsmnk6wCrgb+seeaxso+8BEleS+wjMGZWwF/DHyvqt
7WZ12aXZILgB8ALwdewGBUylVVdWCvhWloSZYy6AcPcGVV/bDnksbKAB9RkiXAHwFHMvgluhA4q6oe2eob1bskT2b
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"iris[\"Species\"].value_counts().plot(kind=\"bar\")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PetalLengthCm</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Species</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Iris-setosa</th>\n",
" <td>1.464</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Iris-versicolor</th>\n",
" <td>4.260</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Iris-virginica</th>\n",
" <td>5.552</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PetalLengthCm\n",
"Species \n",
"Iris-setosa 1.464\n",
"Iris-versicolor 4.260\n",
"Iris-virginica 5.552"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris[[\"Species\",\"PetalLengthCm\"]].groupby(\"Species\").mean()"
]
},
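{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The mean alone hides how much the species overlap. As a hedged sketch (again assuming the `iris` DataFrame from above, with `Id` dropped because it is not a feature), several statistics for every measurement column can be computed in a single call:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Per-species mean, standard deviation and range of each measurement\n",
"iris.drop(columns=\"Id\").groupby(\"Species\").agg([\"mean\", \"std\", \"min\", \"max\"])"
]
},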
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Species'>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAFACAYAAACV7zazAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAY+ElEQVR4nO3dfZRU9Z3n8c+nGxQSMG603WPEBFRGI0+NNixCIFHiw4qTmU1iiJKsZ+LT7IYdNpnokTiYE0ej2XjUjJPEIIO46xNO8GnUzGhURs1xeZIGRXQh2kZGFDQZRPAB8Lt/1K22hYa+jV11f9X1fp1Tp+reunXr21TXh1//7u/+riNCAIB0NRRdAABgzwhqAEgcQQ0AiSOoASBxBDUAJI6gBoDE9anETg888MAYPHhwJXYNAL3SsmXLXo+Ips6eq0hQDx48WEuXLq3ErgGgV7L90u6eo+sDABJHUANA4ghqAEhcRfqoO7Nt2zatW7dO77zzTrXeEj2gX79+GjRokPr27Vt0KUDdqlpQr1u3TgMHDtTgwYNlu1pvi48gIvTGG29o3bp1GjJkSNHlAHWral0f77zzjg444ABCuobY1gEHHMBfQUDBqtpHTUjXHj4zoHh1dTCxsbFRzc3NGj58uE4//XRt3bp1t9u2trbqgQce6HKfCxcu1GmnnSZJmjdvnqZPn95j9e6sra1Nt956a/vynt7vrbfe0vnnn6/DDz9cw4YN06RJk7Ro0aKK1QagcqrWR72zwRfd36P7a7tySpfb9O/fX62trZKkadOm6frrr9d3v/vdTrdtbW3V0qVLdeqpp/ZkmR9JOajPPPPMLrc955xzNGTIEK1Zs0YNDQ164YUXtHr16ipUiXrS09/jlOTJlGqpqxZ1RxMnTtTatWu1ZcsWfetb39KYMWM0evRo3XPPPXrvvfd0ySWXaP78+Wpubtb8+fO1ePFijR8/XqNHj9b48eP1/PPP536vm2++WWPHjlVzc7POP/987dixQ5I0YMAAXXzxxRo1apTGjRun1157TZL0u9/9TuPGjdOYMWN0ySWXaMCAAZKkiy66SI8//riam5t1zTXXSJJeeeUVnXLKKRo6dKguvPDC9tcvWrRIl112mRoaSh/xYYcdpilTpqitrU1HHXWUzjnnHA0fPlzTpk3Tb37zG02YMEFDhw7V4sWLe+zfGEDPqMug3r59u379619rxIgRuvzyy3XCCSdoyZIlevTRR3XBBRdo27ZtuvTSSzV16lS1trZq6tSpOuqoo/TYY49p+fLluvTSS/X9738/13utXr1a8+fP129/+1u1traqsbFRt9xyiyRpy5YtGjdunFasWKFJkybphhtukCTNmDFDM2bM0JIlS/SpT32qfV9XXnmlJk6cqNbWVn3nO9+RVGr5z58/X08//bTmz5+vl19+WatWrVJzc7MaGxs7rWnt2rWaMWOGVq5cqeeee0633nqrnnjiCV111VX60Y9+9FH+aQFUQGFdH0V4++231dzcLKnUoj777LM1fvx43XvvvbrqqqsklUan/P73v9/ltZs2bdJZZ52lNWvWyLa2bduW6z0ffvhhLVu2TGPGjGmv4aCDDpIk7bPPPu3928cee6weeughSdKTTz6pu+++W5J05pln6nvf+95u9z958mR94hOfkCQdffTReuml3U4X0G7IkCEaMWKEJGnYsGGaPHmybGvEiBFqa2vL9XMBqJ66CuqOfdRlEaEFCxboyCOP/ND6nQ+8zZo1S8cff7zuuusutbW16Qtf+EKu94wInXXWWbriiit2ea5v377toyoaGxu1ffv2/D9MZt99921/XN7HsGHDtGLFCr3//vvtXR+7e01DQ0P7ckNDw17VAKCy6rLro6OTTz5Z1113ncpXY1++fLkkaeDAgdq8eXP7dps2bdIhhxwiqTTaIq/JkyfrV7/6lTZs2CBJ+sMf/tBlq3fcuHFasGCBJOn2229vX79zTbtz+OGHq6
WlRT/4wQ/af641a9bonnvuyV03gHTUfVDPmjVL27Zt08iRIzV8+HDNmjVLknT88cfr2WefbT+YeOGFF2rmzJmaMGFC+8HAzsybN0+DBg1qv+2333667LLLdNJJJ2nkyJE68cQTtX79+j3WdO211+rqq6/W2LFjtX79+vaujZEjR6pPnz4aNWpU+8HE3ZkzZ45effVVHXHEERoxYoTOPffcD/V3A6gdLre4elJLS0vsPB/16tWr9dnPfrbH36s32rp1q/r37y/buv3223XbbbcV2hrms8PuMDyv59heFhEtnT1XV33UtWLZsmWaPn26IkL777+/5s6dW3RJAApEUCdo4sSJWrFiRdFlAEhE3fdRA0DqqhrUlegPR2XxmQHFq1pQ9+vXT2+88QZf/BpSno+6X79+RZcC1LWq9VEPGjRI69at08aNG6v1lugB5Su8AChO1YK6b9++XCUEAPYCBxMBIHEENQAkLlfXh+02SZsl7ZC0fXdnzwAAel53+qiPj4jXK1YJAKBTdH0AQOLyBnVIetD2MtvnVbIgAMCH5e36mBARr9g+SNJDtp+LiMc6bpAF+HmS9OlPf7qHywSA+pWrRR0Rr2T3GyTdJWlsJ9vMjoiWiGhpamrq2SoBoI51GdS2P257YPmxpJMkPVPpwgAAJXm6Pv6jpLuya/v1kXRrRPxzRasCALTrMqgj4gVJo6pQCwCgEwzPA4DEEdQAkDiCGgASR1ADQOIIagBIHEENAIkjqAEgcQQ1ACSOoAaAxBHUAJA4ghoAEkdQA0DiCGoASBxBDQCJI6gBIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0AiSOoASBxfYouAPVt8EX3F11CRbVdOaXoEtAL0KIGgMQR1ACQOIIaABJHUANA4nIHte1G28tt31fJggAAH9adFvUMSasrVQgAoHO5gtr2IElTJM2pbDkAgJ3lbVFfK+lCSe9XrhQAQGe6DGrbp0naEBHLutjuPNtLbS/duHFjjxUIAPUuT4t6gqQv2W6TdLukE2zfvPNGETE7IloioqWpqamHywSA+tVlUEfEzIgYFBGDJX1d0iMR8Y2KVwYAkMQ4agBIXrcmZYqIhZIWVqQSAECnaFEDQOIIagBIHEENAIkjqAEgcQQ1ACSOoAaAxBHUAJA4ghoAEkdQA0DiCGoASBxBDQCJI6gBIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0AiSOoASBxBDUAJI6gBoDEEdQAkDiCGgASR1ADQOIIagBIHEENAIkjqAEgcQQ1ACSuy6C23c/2YtsrbK+y/cNqFAYAKOmTY5t3JZ0QEW/Z7ivpCdu/joj/W+HaAADKEdQREZLeyhb7ZreoZFEAgA/k6qO23Wi7VdIGSQ9FxKKKVgUAaJcrqCNiR0Q0Sxokaazt4TtvY/s820ttL924cWMPlwkA9atboz4i4t8lLZR0SifPzY6IlohoaWpq6pnqAAC5Rn002d4/e9xf0hclPVfhugAAmTyjPg6WdJPtRpWC/Y6IuK+yZQEAyvKM+lgpaXQVagEAdIIzEwEgcQQ1ACSOoAaAxBHUAJA4ghoAEkdQA0DiCGoASBxBDQCJI6gBIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0AiSOoASBxBDUAJI6gBoDEEdQAkDiCGgASR1ADQOIIagBIHEENAIkjqAEgcQQ1ACSOoAaAxBHUAJC4LoPa9qG2H7W92vYq2zOqURgAoKRPjm22S/rriHjK9kBJy2w/FBHPVrg2AIBytKgjYn1EPJU93ixptaRDKl0YAKCkW33UtgdLGi1pUUWqAQDsIndQ2x4gaYGk/xkRb3by/Hm2l9peunHjxp6sEQDqWq6gtt1XpZC+JSLu7GybiJgdES0R0dLU1NSTNQJAXcsz6sOS/kHS6oi4uvIlAQA6ytOiniDpm5JOsN2a3U6tcF0AgEyXw/Mi4glJrkItAIBOcG
YiACSOoAaAxBHUAJA4ghoAEkdQA0DiCGoASBxBDQCJI6gBIHEENQAkjqAGgMQR1ACQOIIaABJHUANA4ghqAEgcQQ0
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"iris[[\"Species\",\"PetalLengthCm\"]].groupby(\"Species\").mean().plot(kind=\"bar\")"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7f97eed545b0>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAdoAAAFtCAYAAACgK6tiAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAABg1ElEQVR4nO3deXwU9f348dfMnrnvC8KNHMohEO5LARUBgYIHXtSq0NYDa2ulnigiX1FbWlGLtrX+rEetiiKIioCK3CAooIDIkQC5T5JNstfM74/AwrIhB2R3s+H9fDx4PNhP5j3zzhLy3pn5zPuj6LquI4QQQgi/UIOdgBBCCNGSSaEVQggh/EgKrRBCCOFHUmiFEEIIP5JCK4QQQviRFFohhBDCj4zBTqCpFRVVoGmNe2IpLi6ckpJKP2XkP6GaN0juwRCqeUPLzz0pKSpA2YhgkDNawGg0BDuFcxKqeYPkHgyhmjdI7iK0SaEVQggh/EgKrRBCCOFHUmiFEEIIP5JCK4QQQviRFFohhBDCj6TQCiGEEH4khVYIIYTwIym0QgghhB8FpDNUSUkJDz74IFlZWZjNZtq1a8fcuXOJj4/32m7RokW8/fbbJCcnA9C3b1/mzJkTiBSFEH5mMCiAgtutnUOcN6NRRdP0RneBEyIYAlJoFUXhzjvvZODAgQAsWLCA559/nvnz5/tsO3nyZGbPnh2ItIQQAaAokO/KY8PhrdhcVQxrM4B0azqqXnfHJEWBPGceGw9vpXJfFUPbDCA1LJnD5VlsOPotaZHJDGjVh3g1AV3qrWjGAlJoY2NjPUUW4NJLL+Wdd94JxKGFEEFW4Mpn3jd/w6W5AFiftZU/DP41HcM61Rv39LpTcbvz9zGhy2je3PmhZ5vVh9bx6NDfEa3E+i1/Ic5XwO/RaprGO++8w6hRo2r9+ieffMI111zD7bffzo4dOwKcnRCiKSmKwq6CPZ5iedKyn74Aw9kvISuKws5877hBbfry0d6VXtvZHJUcrchu2qSFaGIBX73nqaeeIjw8nFtuucXna9OmTeM3v/kNJpOJ9evXc9ddd7FixQri4uIavP+EhMhzyitUV88I1bxBcg+GYOSt5PqO6ehER4dhMZrPHpjjfT1YQUGv5RqxwaA2+3+P5p6f8K+AFtoFCxaQmZnJ4sWLUVXfk+mkpCTP34cOHUpaWhr79+9nwIABDT7GuSyTl5QURUFBeaNimoNQzRsk92AIVt6XJHRlifIpbv3UGez4zmM4XmIH7GeN65HYjY+Uzzxxm4/u4JquY/jv7o8924SZrKSGpTTrf4+GvO9SiFu2gBXahQsXsnv3bl599VXM5to/xebl5ZGSkgLAnj17OHbsGB06dAhUikIIP0gypfDI8PtYc3g9Nmclo9sPo214W6jn83DyaXGVzkpGtR9Gq/AUEsLi+TpzI62iUhneZiCxapxMhhLNmqLXdi2mie3fv58JEybQvn17rFYrAOnp6bz00kvMmDGDWbNm0bNnT2bPns0PP/yAqqqYTCZmzZrFyJEjG3UsOaMNDZJ74AU7b4Oh5ipW4x/vUYmLC6ewsMIzZjSq6LqO2938K6yc0YqAFNpAkkIbGiT3wAvVvKHl5y6FtmWTzlBCCCGEH0mhFUIIIfxICq0QQgjhR1JohbhAqQZQDP6ZomEwKGByYzTKrxghAt6wQggRZIpOtuMYn+75Epujkqs6jaRTVCeMuqlJdl+k5/PNwc38XHyYnindGdiqD7EkNMm+hQhFUmiFuMDkOfOYv26Rp8vSvqID3NP/NrpHXXze+65Uj/PSxtfJsxUCcKjkCAeLM5nZ+1YM7jq6QAnRgsl1HSEuIIqisLtgr08rw0/2r0ZX3ee9/2xbnqfInrQ7fx/59oLz3rcQoUoKrRAXFB2zwfcSscVoQfFd9rXRDIrvrxQFBbWWcSEuFPLTL8QFRNfh4oQuPsX2mi5XgLvu9WEbIi08lc7x7b3GhrbNIMmSfN
77FiJUyT1aIS4wicZkHh32O3bk7cLmrCIjrRdpltb19h5uCKsWwe2XTuPHwp84WJJJt8TOdI3vjOo6/yIuRKiSQivEBUbXdRIMSVyZPhqgpmVpEz7lE0M8Q5MHMSJtKE6nu0n3LUQokkIrxAWqsT3BG7dv0LTzn1wlREsg92iFEEIIP5JCK4QQQviRFFohhBDCj6TQCnGB0gwu3KrD8/ysqio4VQe64bR7q6qGU7WjqPpZ4871eLVS9ZrjKWe/f1xrns2AprpwncP7Ilo+mQwlxAVGUzQyKw+zZO8KbM4qxl80mm7xndmes5PVh9aRGB7PlO7jiDRFsGzvSvYWHaBfWk+u6DCS/MpCluxdQeWJuJ7xl2DSLXUeT1c0Dlce5oO9K6iqI65EL+KTvV+wr+gg/dJ6Mrr9CKKI8dqmChtbcraz5tB6kiMSmNptPKnmNNCDV900xc0h2yGW7F2B3eVgQpcxXBLXvd73RVw4FP3MXmwhrqiootGzKZOSoigoKPdTRv4TqnmD5B4MJ/POcR5j3jd/84ynRiaR0bo3y/et8owZVAM39ZrEf75b4hm7o+8N/Gv7u177nNnvFi6N7UVdv0WyHUd5et0LXmO/7ncLvU+Lq1JszFu/kNLq455tuiR05J4+d2DQTCQlRVFYVM5nWav4+KeVXnk+MfwPxBsSG/VeNKVjjiPMX7fIa+y3GdPpEd0DaNjPS1JSlN/yE8Enl46FuICoqsKu/L1eY31b9WT1wXVeY27NTaWj2vM6PiyWgyVHfPb32c9fotXRI1lVFXbm7/GNO+Adl19V4FVkAX4qOkiJs8Tzulqv5PMDX/nkecyWc9bj+5uqKmzP3e0zvvLA12DQgpCRaI6k0ApxAdF1nUhzhNdYlbOKSFO4z7an9ye2uxyEmaw+28RYolA4+2VbXdeJskT4jMdYolFO+/VjVn37L6uKium0cVUxEGH2zdOsBm9VIF2HaEukz3iMNbrO90VcWKTQCnEB0XW4OLGLV8Hacux7rusxwWu7pPAEok4rIDZnJZ3j2xNxWkFWFZVrul4J7rP/GtF1uCShq29clyvAfaoQJVmT6JvW0yt23EWjiTHEel6bdQs39ZzstU1yRCLpka3q/qb9SNd1eiVd7PUhxKCojOs8Ct0thVbUkHu0hP49t1AkuQfeybwVBUrdxewvPYTdZeei+I4kmhPJrc5lf8khYi3RdIptjxETmRVZZFfk0Ta6NW0j21DptrG/5FRcsiml3olIigIlWjE/n4jrEt+JZFMy+hlxVdjIrDhCTkUebaPTaRuR7plQdDJ3TXGTY8/hZ0+eHYgk2m/vWUMoChS7i/i59BBOt5OL4juSZEz2vC9yj1ZIoSX0f3GGIsk98GrLW1Hwmsh05uvGjDVEQ+Jq2+bM3M/1+P7WkNxrI4W2ZZNLx0JcwM4sCrUVr4aOncvx/LlNMDTXvERwSaEVQggh/EgKrRBCCOFHUmiFEEIIP5JCK4RocprBRZViQ6+jmYWi6lQrlbhURwAzazxFJSTyFM2X9DoWQjQZRYF8Vx7vfP8RB0oy6ZXSjWu7XUOMEue1nY1yVh38mq8yNxIfFsetvabQPqxDUHsW18ZGOV8c/IqvMzeREBbHLb2m0j6sfbPLUzRvckYrhGgyNr2c5zb+nX1FB3BpLrbn7Obv2/8fztPOBhVV54vDX7Py4Focbie5Ffk8v/EVCpz5Qcy8FqrO54fW8MXBb3C4neRU5PP8xsUUugqCnZkIMVJohRBNJr+qCJuj0mvsSFk2pY5TPYur9Cq+ytzotY2u62Tb8gKSY0NV6ZV8lbnJa0zXdXKaWZ6i+ZNCK4RoMmFG337IBtWAxXBqyTijYiTBGuuzXXgtscFkVIzEW2N8xmv7HoWoixRaIUSTSTQncFn7wV5j13UfT4x6qmAZNTO39J7q1XS/Q2wb0iOC17O4Nmbdwq29r/XKs2NcW1o3szxF8yctGGlZLfVCheQeeIHK20E12VW5lNpLSQ
xLINWailE/Y3UeRafAmU+2LY9wo5X0iFaE4bsKzklBe88VjXxnPjm2/Jo8I1sTpvuuRlQXacEoZNaxEKJJmbHWzMw
"text/plain": [
"<Figure size 474.35x360 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 474.35x360 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import seaborn as sns\n",
"sns.set_theme()\n",
"sns.relplot(data=iris, x=\"PetalLengthCm\", y=\"PetalWidthCm\", hue=\"Species\")\n",
"sns.relplot(data=iris, x=\"SepalLengthCm\", y=\"SepalWidthCm\", hue=\"Species\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7f97ef942eb0>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 474.35x360 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"irisv = iris[iris[\"Species\"] != \"Iris-setosa\"]\n",
"sns.relplot(data=irisv, x=\"SepalLengthCm\", y=\"SepalWidthCm\", hue=\"Species\")"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.PairGrid at 0x7f97f2ad3550>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 834.35x720 with 20 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.pairplot(data=iris.drop(columns=[\"Id\"]), hue=\"Species\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Splitting the data\n",
" - ### Training set\n",
"     - Used to fit the model's parameters (e.g. the weights of a neural network).\n",
"     - During training, the algorithm minimizes a cost function computed on the training set\n",
" - ### Validation set (a.k.a. \"dev set\")\n",
"     - Used to compare models built with different hyperparameters (e.g. network architecture, number of training iterations)\n",
"     - Helps avoid overfitting the model to the training set, for example through early stopping\n",
" - ### Test set\n",
"     - Used to evaluate the final model selected and trained using the training and validation sets"
]
},
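{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the three-way split described above, assuming scikit-learn is available and using its bundled copy of the Iris dataset (`load_iris`) rather than the file used elsewhere in this notebook. The 6:2:2 proportions and the `random_state` value are illustrative choices, not a recommendation. The test set is carved off first so that it stays untouched while hyperparameters are compared on the dev set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = load_iris(return_X_y=True)  # 150 examples, 4 features, 3 classes\n",
"\n",
"# Step 1: set aside 20% of all examples as the test set.\n",
"X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
"# Step 2: take 25% of the remainder as the dev set (0.25 * 0.8 = 0.2 of the whole).\n",
"X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)\n",
"\n",
"print(len(X_train), len(X_dev), len(X_test))"
]
},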
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Splitting the data\n",
"- The training, validation and test sets should be independent of each other, but drawn from the same distribution\n",
"- For classification, the class distribution should be similar across the sets\n",
"- It is crucial that the validation and test sets reflect our business goals and the real data the model will run on\n"
]
},
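{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to keep the class distribution similar across the sets, sketched under the assumption that scikit-learn is installed: the `stratify` argument of `train_test_split` samples each class proportionally. The bundled Iris data and the 80:20 split are only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"from sklearn.datasets import load_iris\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = load_iris(return_X_y=True)\n",
"\n",
"# stratify=y keeps the per-class proportions the same in both parts.\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    X, y, test_size=0.2, stratify=y, random_state=0\n",
")\n",
"\n",
"print('train:', sorted(Counter(y_train.tolist()).items()))\n",
"print('test: ', sorted(Counter(y_test.tolist()).items()))"
]
},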
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Splitting methods:\n",
"- Use a ready-made split of the data :)\n",
"- If we split the dataset ourselves:\n",
"    - \"Classic\" approach: Train:Dev:Test proportions of 6:2:2 or 8:1:1\n",
"    - Deep learning:\n",
"        - \"deep\" methods are very data-hungry; datasets are on the order of more than 1,000,000 examples\n",
"        - Suppose the whole dataset has 1,000,000 examples\n",
"        - we set the sizes of the dev and test sets in absolute terms, e.g. 1,000 or 10,000 examples each\n",
"        - 10,000 examples is (sufficiently) many, even though it is only 1% of the whole dataset\n",
"        - it would be a pity to \"waste\" an additional 180,000 examples on the test and validation sets (the 200,000 that an 8:1:1 split would set aside, minus the 20,000 used here); a larger training set is better\n"
]
},
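{
"cell_type": "markdown",
"metadata": {},
"source": [
"The arithmetic behind the last bullet above, as a small sketch. The helper name `held_out_size` is made up for this illustration; the numbers are the ones from the list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def held_out_size(total, ratio):\n",
"    '''Number of examples a train:dev:test ratio puts into dev + test.'''\n",
"    train, dev, test = ratio\n",
"    parts = train + dev + test\n",
"    return total * dev // parts + total * test // parts\n",
"\n",
"total = 1_000_000\n",
"proportional = held_out_size(total, (8, 1, 1))  # 8:1:1 split\n",
"absolute = 10_000 + 10_000  # fixed-size dev and test sets\n",
"\n",
"print('held out with 8:1:1:', proportional)\n",
"print('held out with fixed sizes:', absolute)\n",
"print('extra training examples:', proportional - absolute)"
]
},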
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example split using standard Bash tools"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2021-03-15 11:16:36-- https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\n",
"Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252\n",
"Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.\n",
"HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable\n",
"\n",
" The file is already fully retrieved; nothing to do.\n",
"\n"
]
}
],
"source": [
"# Download the dataset file from the repository\n",
"!cd IUM_02; wget -c https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"151 IUM_02/iris.data\r\n"
]
}
],
"source": [
"# Check the size of the dataset\n",
"!wc -l IUM_02/iris.data"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5.1,3.5,1.4,0.2,Iris-setosa\r\n",
"4.9,3.0,1.4,0.2,Iris-setosa\r\n",
"4.7,3.2,1.3,0.2,Iris-setosa\r\n",
"4.6,3.1,1.5,0.2,Iris-setosa\r\n",
"5.0,3.6,1.4,0.2,Iris-setosa\r\n"
]
}
],
"source": [
"# Check the structure\n",
"!head -n 5 IUM_02/iris.data"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1 \r\n",
" 50 Iris-setosa\r\n",
" 50 Iris-versicolor\r\n",
" 50 Iris-virginica\r\n"
]
}
],
"source": [
"# Check which classes there are and how many examples each has:\n",
"!cut -f 5 -d \",\" IUM_02/iris.data | sort | uniq -c"
]
},
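{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough Python counterpart of the `cut | sort | uniq -c` pipeline above, using only the standard library. To stay self-contained it parses a small inline sample (a few rows in the same five-column format) instead of `IUM_02/iris.data`; blank lines are skipped rather than counted as a class."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"import io\n",
"from collections import Counter\n",
"\n",
"# Inline stand-in for the data file, including a trailing blank line.\n",
"sample = '5.1,3.5,1.4,0.2,Iris-setosa\\n7.0,3.2,4.7,1.4,Iris-versicolor\\n6.3,3.3,6.0,2.5,Iris-virginica\\n\\n'\n",
"\n",
"# csv.reader yields an empty list for a blank line, so 'if row' filters it out.\n",
"counts = Counter(row[4] for row in csv.reader(io.StringIO(sample)) if row)\n",
"for species, n in sorted(counts.items()):\n",
"    print(n, species)"
]
},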
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"151:\r\n"
]
}
],
"source": [
"# Find the empty line:\n",
"! grep -P \"^$\" -n IUM_02/iris.data"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text&