{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Plan na dzisiaj\n",
"1. Motywacja\n",
"2. Podział danych\n",
"3. Skąd wziąć dane?\n",
"4. Przygotowanie danych\n",
"5. Zadanie"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Motywacja\n",
"- Zasada \"Garbage in - garbage out\"\n",
"- Im lepszej jakości dane - tym lepszy model\n",
"- Najlepsza architektura, najpotężniejsze zasoby obliczeniowe i najbardziej wyrafinowane metody nie pomogą, jeśli dane użyte do rozwoju modelu nie odpowiadają tym, z którymi będzie on używany, albo jeśli w danych nie będzie żadnych zależności\n",
"- Możemy stracić dużo czasu, energii i zasobów optymalizując nasz model w złym kierunku, jeśli dane są źle dobrane"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Źródła danych\n",
"- Gotowe zbiory:\n",
" - Otwarte wyzwania (challenge)\n",
" - Repozytoria otwartych zbiorów danych\n",
" - Dane udostępniane przez firmy\n",
" - Repozytoria zbiorów komercyjnych\n",
" - Dane wewnętrzne (np. firmy)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Źródła danych\n",
"- Tworzenie danych:\n",
" - Generowanie syntetyczne\n",
" - Crowdsourcing\n",
" - Data scrapping\n",
" - Ekstrakcja\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Otwarte wyzwania (shared task / challenge)\n",
"- Kaggle: https://www.kaggle.com/datasets\n",
"- Gonito: https://gonito.net/list-challenges - polski (+poznański +z UAM) Kaggle\n",
"- Semeval: https://semeval.github.io/ - zadania z semantyki\n",
"- Poleval: http://poleval.pl/ - przetwarzanie języka polskiego\n",
"- WMT http://www.statmt.org/wmt20/ (tłumaczenie maszynowe)\n",
"- IWSLT https://iwslt.org/2021/#shared-tasks (tłumaczenie mowy)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Repozytoria/wyszukiwarki otwartych zbiorów danych\n",
"- Papers with code: https://paperswithcode.com/datasets\n",
"- UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/\n",
"- Google dataset search: https://datasetsearch.research.google.com/\n",
"- Zbiory google:https://research.google/tools/datasets/\n",
"- https://registry.opendata.aws/\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Otwarte zbiory\n",
"- Rozpoznawanie mowy:\n",
" - https://www.openslr.org/ - Libri Speech, TED Lium\n",
" - Mozilla Open Voice: https://commonvoice.mozilla.org/\n",
"- NLP:\n",
" - Clarin PL: https://lindat.cz/repository/xmlui/\n",
" - Clarin: https://clarin-pl.eu/index.php/zasoby/\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Crowdsourcing\n",
"- Amazon Mechanical Turk: https://www.mturk.com/\n",
"- Yandex Toloka\n",
"- reCAPTCHA\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Licencje\n",
"- Przed podjęciem decyzji o użyciu danego zbioru koniecznie sprawdź jego licencję!\n",
"- Wiele dostępnych w internecie zbiorów jest udostępniana na podstawie otwartych licencji\n",
"- Zazwyczaj jednak ich użycie wymaga spełnienia pewnych warunków, np. podania źródła\n",
"- Wiele ogólnie dostępnych zbiorów nie może być jednak użytych za darmo w celach komercyjnych!\n",
"- Niektóre z nich mogą nawet powodować, że praca pochodna, która zostanie stworzona z ich wykorzystaniem, będzie musiała być udostępniona na tej samej licencji (GPL). Jest to \"niebezpieczeństwo\" w przypadku wykorzystania zasobów przez firmę komercyjną!\n",
"- Zasady działania licencji CC: https://creativecommons.pl/\n",
"- Najbardziej popularne licencje:\n",
" - Przyjazne również w zastosowaniach komercyjnych: MIT, BSD, Appache, CC (bez dopisku NC)\n",
" - GPL (GNU Public License) - \"zaraźliwa\" licencja Open Source"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Przykład \n",
"- Za pomocą standardowych narzędzi bash dokonamy wstępnej inspekcji i podziału danych\n",
"- Jako przykładu użyjemy klasycznego zbioru IRIS: https://archive.ics.uci.edu/ml/datasets/Iris\n",
"- Zbiór zawiera dane dotyczące długości i szerokości płatków kwiatowych trzech gatunków irysa:\n",
" - Iris Setosa\n",
" - Iris Versicolour\n",
" - Iris Virginica\n",
" \n",
"\n",
"https://www.kaggle.com/vinayshaw/iris-species-100-accuracy-using-naive-bayes"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Inspekcja\n",
"- Zanim zaczniemy trenować model na danych, powinniśmy poznać ich specyfikę\n",
"- Pozwoli nam to:\n",
" - usunąć lub naprawić nieprawidłowe przykłady\n",
" - dokonać selekcji cech, których użyjemy w naszym modelu\n",
" - wybrać odpowiedni algorytm uczenia\n",
" - podjąć dezycję dotyczącą podziału zbioru i ewentualnej normalizacji\n"
]
},
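{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The checks listed above can be sketched in a few lines of Pandas. This is only an illustrative sketch: the small `df` table and its values are made up for the example (they are not read from the Iris file), and the clean-up rule at the end (drop rows with missing values, then drop exact duplicates) is an assumption that will not suit every dataset.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with one missing value and one duplicated row\n",
"df = pd.DataFrame({\n",
"    'PetalLengthCm': [1.4, 1.4, None, 5.1],\n",
"    'PetalWidthCm': [0.2, 0.2, 1.3, 1.8],\n",
"    'Species': ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],\n",
"})\n",
"\n",
"# Column types: numeric features should not show up as 'object'\n",
"print(df.dtypes)\n",
"\n",
"# Number of missing values in each column\n",
"missing = df.isna().sum()\n",
"print(missing)\n",
"\n",
"# Number of rows that exactly repeat an earlier row\n",
"n_duplicates = int(df.duplicated().sum())\n",
"print(n_duplicates)\n",
"\n",
"# One possible clean-up (an assumption, not the only option)\n",
"clean = df.dropna().drop_duplicates().reset_index(drop=True)\n",
"print(len(clean))\n",
"```\n",
"\n",
"Whether to drop or repair a problematic row depends on the data; dropping is simply the shortest option to show here."
]
},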
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Inspekcja\n",
"- Do inspekcji danych użyjemy popularnej biblioteki pythonowej Pandas: https://pandas.pydata.org/\n",
"- Do wizualizacji użyjemy biblioteki Seaborn: https://seaborn.pydata.org/index.html\n",
"- Służy ona do analizy i operowania na danych tabelarycznych jak i szeregach czasowych"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: kaggle in /home/tomek/.local/lib/python3.8/site-packages (1.5.12)\n",
"Requirement already satisfied: python-dateutil in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (2.8.1)\n",
"Requirement already satisfied: six>=1.10 in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (1.15.0)\n",
"Requirement already satisfied: urllib3 in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (1.25.11)\n",
"Requirement already satisfied: python-slugify in /home/tomek/.local/lib/python3.8/site-packages (from kaggle) (4.0.1)\n",
"Requirement already satisfied: certifi in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (2020.6.20)\n",
"Requirement already satisfied: tqdm in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (4.50.2)\n",
"Requirement already satisfied: requests in /home/tomek/anaconda3/lib/python3.8/site-packages (from kaggle) (2.24.0)\n",
"Requirement already satisfied: text-unidecode>=1.3 in /home/tomek/.local/lib/python3.8/site-packages (from python-slugify->kaggle) (1.3)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /home/tomek/anaconda3/lib/python3.8/site-packages (from requests->kaggle) (3.0.4)\n",
"Requirement already satisfied: idna<3,>=2.5 in /home/tomek/anaconda3/lib/python3.8/site-packages (from requests->kaggle) (2.10)\n",
"Requirement already satisfied: pandas in /home/tomek/anaconda3/lib/python3.8/site-packages (1.1.3)\n",
"Requirement already satisfied: python-dateutil>=2.7.3 in /home/tomek/anaconda3/lib/python3.8/site-packages (from pandas) (2.8.1)\n",
"Requirement already satisfied: numpy>=1.15.4 in /home/tomek/anaconda3/lib/python3.8/site-packages (from pandas) (1.19.2)\n",
"Requirement already satisfied: pytz>=2017.2 in /home/tomek/anaconda3/lib/python3.8/site-packages (from pandas) (2020.1)\n",
"Requirement already satisfied: six>=1.5 in /home/tomek/anaconda3/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)\n"
]
}
],
"source": [
"#Zainstalujmy potrzebne biblioteki \n",
"!pip install --user kaggle #API Kaggle, do pobrania zbioru\n",
"!pip install --user pandas"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/tomek/.kaggle/kaggle.json'\n",
"iris.zip: Skipping, found more recently modified local copy (use --force to force download)\n"
]
}
],
"source": [
"# Żeby poniższa komenda zadziałała, musisz posiadać plik /.kaggle/kaggle.json, zawierający Kaggle API token.\n",
"# Instrukcje: https://www.kaggle.com/docs/api\n",
"!kaggle datasets download -d uciml/iris"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Archive: iris.zip\r\n",
" inflating: Iris.csv \r\n",
" inflating: database.sqlite \r\n"
]
}
],
"source": [
"!unzip -o iris.zip"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\r\n",
"1,5.1,3.5,1.4,0.2,Iris-setosa\r\n",
"2,4.9,3.0,1.4,0.2,Iris-setosa\r\n",
"3,4.7,3.2,1.3,0.2,Iris-setosa\r\n",
"4,4.6,3.1,1.5,0.2,Iris-setosa\r\n"
]
}
],
"source": [
"!head -n 5 Iris.csv"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Id | \n",
" SepalLengthCm | \n",
" SepalWidthCm | \n",
" PetalLengthCm | \n",
" PetalWidthCm | \n",
" Species | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 5.1 | \n",
" 3.5 | \n",
" 1.4 | \n",
" 0.2 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 4.9 | \n",
" 3.0 | \n",
" 1.4 | \n",
" 0.2 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 4.7 | \n",
" 3.2 | \n",
" 1.3 | \n",
" 0.2 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 4.6 | \n",
" 3.1 | \n",
" 1.5 | \n",
" 0.2 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" 5.0 | \n",
" 3.6 | \n",
" 1.4 | \n",
" 0.2 | \n",
" Iris-setosa | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 145 | \n",
" 146 | \n",
" 6.7 | \n",
" 3.0 | \n",
" 5.2 | \n",
" 2.3 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" 146 | \n",
" 147 | \n",
" 6.3 | \n",
" 2.5 | \n",
" 5.0 | \n",
" 1.9 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" 147 | \n",
" 148 | \n",
" 6.5 | \n",
" 3.0 | \n",
" 5.2 | \n",
" 2.0 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" 148 | \n",
" 149 | \n",
" 6.2 | \n",
" 3.4 | \n",
" 5.4 | \n",
" 2.3 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" 149 | \n",
" 150 | \n",
" 5.9 | \n",
" 3.0 | \n",
" 5.1 | \n",
" 1.8 | \n",
" Iris-virginica | \n",
"
\n",
" \n",
"
\n",
"
150 rows × 6 columns
\n",
"
"
],
"text/plain": [
" Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm \\\n",
"0 1 5.1 3.5 1.4 0.2 \n",
"1 2 4.9 3.0 1.4 0.2 \n",
"2 3 4.7 3.2 1.3 0.2 \n",
"3 4 4.6 3.1 1.5 0.2 \n",
"4 5 5.0 3.6 1.4 0.2 \n",
".. ... ... ... ... ... \n",
"145 146 6.7 3.0 5.2 2.3 \n",
"146 147 6.3 2.5 5.0 1.9 \n",
"147 148 6.5 3.0 5.2 2.0 \n",
"148 149 6.2 3.4 5.4 2.3 \n",
"149 150 5.9 3.0 5.1 1.8 \n",
"\n",
" Species \n",
"0 Iris-setosa \n",
"1 Iris-setosa \n",
"2 Iris-setosa \n",
"3 Iris-setosa \n",
"4 Iris-setosa \n",
".. ... \n",
"145 Iris-virginica \n",
"146 Iris-virginica \n",
"147 Iris-virginica \n",
"148 Iris-virginica \n",
"149 Iris-virginica \n",
"\n",
"[150 rows x 6 columns]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"iris=pd.read_csv('Iris.csv')\n",
"iris"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Id | \n",
" SepalLengthCm | \n",
" SepalWidthCm | \n",
" PetalLengthCm | \n",
" PetalWidthCm | \n",
" Species | \n",
"
\n",
" \n",
" \n",
" \n",
" count | \n",
" 150.000000 | \n",
" 150.000000 | \n",
" 150.000000 | \n",
" 150.000000 | \n",
" 150.000000 | \n",
" 150 | \n",
"
\n",
" \n",
" unique | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 3 | \n",
"
\n",
" \n",
" top | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" Iris-virginica | \n",
"
\n",
" \n",
" freq | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 50 | \n",
"
\n",
" \n",
" mean | \n",
" 75.500000 | \n",
" 5.843333 | \n",
" 3.054000 | \n",
" 3.758667 | \n",
" 1.198667 | \n",
" NaN | \n",
"
\n",
" \n",
" std | \n",
" 43.445368 | \n",
" 0.828066 | \n",
" 0.433594 | \n",
" 1.764420 | \n",
" 0.763161 | \n",
" NaN | \n",
"
\n",
" \n",
" min | \n",
" 1.000000 | \n",
" 4.300000 | \n",
" 2.000000 | \n",
" 1.000000 | \n",
" 0.100000 | \n",
" NaN | \n",
"
\n",
" \n",
" 25% | \n",
" 38.250000 | \n",
" 5.100000 | \n",
" 2.800000 | \n",
" 1.600000 | \n",
" 0.300000 | \n",
" NaN | \n",
"
\n",
" \n",
" 50% | \n",
" 75.500000 | \n",
" 5.800000 | \n",
" 3.000000 | \n",
" 4.350000 | \n",
" 1.300000 | \n",
" NaN | \n",
"
\n",
" \n",
" 75% | \n",
" 112.750000 | \n",
" 6.400000 | \n",
" 3.300000 | \n",
" 5.100000 | \n",
" 1.800000 | \n",
" NaN | \n",
"
\n",
" \n",
" max | \n",
" 150.000000 | \n",
" 7.900000 | \n",
" 4.400000 | \n",
" 6.900000 | \n",
" 2.500000 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm \\\n",
"count 150.000000 150.000000 150.000000 150.000000 150.000000 \n",
"unique NaN NaN NaN NaN NaN \n",
"top NaN NaN NaN NaN NaN \n",
"freq NaN NaN NaN NaN NaN \n",
"mean 75.500000 5.843333 3.054000 3.758667 1.198667 \n",
"std 43.445368 0.828066 0.433594 1.764420 0.763161 \n",
"min 1.000000 4.300000 2.000000 1.000000 0.100000 \n",
"25% 38.250000 5.100000 2.800000 1.600000 0.300000 \n",
"50% 75.500000 5.800000 3.000000 4.350000 1.300000 \n",
"75% 112.750000 6.400000 3.300000 5.100000 1.800000 \n",
"max 150.000000 7.900000 4.400000 6.900000 2.500000 \n",
"\n",
" Species \n",
"count 150 \n",
"unique 3 \n",
"top Iris-virginica \n",
"freq 50 \n",
"mean NaN \n",
"std NaN \n",
"min NaN \n",
"25% NaN \n",
"50% NaN \n",
"75% NaN \n",
"max NaN "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris.describe(include='all')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Iris-virginica 50\n",
"Iris-setosa 50\n",
"Iris-versicolor 50\n",
"Name: Species, dtype: int64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris[\"Species\"].value_counts()"
]
},
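{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Class counts like the ones above matter when deciding how to split the data: a stratified split keeps the same class proportions in every subset. The sketch below is a hedged illustration only. The toy `df` table is made up, the 80/20 proportion and `random_state=0` are arbitrary assumptions, and it relies on `DataFrameGroupBy.sample`, which is available in pandas 1.1 and newer.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: 3 classes with 10 examples each\n",
"df = pd.DataFrame({\n",
"    'PetalLengthCm': [float(i) for i in range(30)],\n",
"    'Species': ['Iris-setosa'] * 10 + ['Iris-versicolor'] * 10 + ['Iris-virginica'] * 10,\n",
"})\n",
"\n",
"# Draw 20% of the rows from every class for the test set\n",
"test = df.groupby('Species').sample(frac=0.2, random_state=0)\n",
"\n",
"# Everything that was not drawn stays in the training set\n",
"train = df.drop(test.index)\n",
"\n",
"print(train['Species'].value_counts())\n",
"print(test['Species'].value_counts())\n",
"```\n",
"\n",
"Fixing `random_state` makes the split reproducible, which helps when results are compared between runs."
]
},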
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAEyCAYAAADjiYtYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAASkklEQVR4nO3de6xlZX3G8e8zgOKNCuFAplwcbFGrlpujEaGaglhaVKgVkaqdGCq9YEtTi4HeEmusWBPjpd5GRKf1SivIFI1CByiSEHC4CkGD5aYyMgNVGcEil1//2OvIdDgzZ5+zz9lr3tnfT3Ky9nr33rN/yTrznLXf9b7vSlUhSWrPkr4LkCTNjwEuSY0ywCWpUQa4JDXKAJekRhngktSoHcf5YbvvvnstW7ZsnB8pSc27+uqr76mqqc3bxxrgy5YtY+3ateP8SElqXpI7Zmq3C0WSGmWAS1KjDHBJapQBLkmNMsAlqVFDjUJJcjuwEXgEeLiqlifZDfgisAy4HXhdVf1occqUJG1uLmfgv1lVB1XV8m7/dGBNVe0PrOn2JUljMkoXyrHAqu7xKuC4kauRJA1t2Ik8BVyYpICPV9VKYM+qWgdQVeuS7DHTG5OcDJwMsO+++y5AycNbdvpXxvp543b7mcf0XcKi8di1zeM3HsMG+GFVdVcX0hcl+fawH9CF/UqA5cuXe/sfSVogQ3WhVNVd3XY9cB7wIuDuJEsBuu36xSpSkvR4swZ4kqckedr0Y+AVwI3AamBF97IVwPmLVaQk6fGG6ULZEzgvyfTrP1dVX0vyTeCcJCcBdwLHL16ZkqTNzRrgVXUrcOAM7fcCRy5GUZKk2TkTU5IaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktSooQM8yQ5Jrk1yQbe/W5KLktzSbXddvDIlSZubyxn4qcDNm+yfDqypqv2BNd2+JGlMhgrwJHsDxwBnbdJ8LLCqe7wKOG5BK5MkbdWwZ+DvB94OPLpJ255VtQ6g2+6xsKVJkrZm1gBP8kpgfVVdPZ8PSHJykrVJ1m7YsGE+/4QkaQbDnIEfBrw6ye3AF4AjknwGuDvJUoBuu36mN1fVyqpaXlXLp6amFqhsSdKsAV5VZ1TV3lW1DHg9cHFVvRFYDazoXrYCOH/RqpQkPc4o48DPBI5KcgtwVLcvSRqTHefy4qq6FLi0e3wvcOTClyRJGoYzMSWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNmjXAk+yc5Kok1ye5Kck7uvbdklyU5JZuu+vilytJmjbMGfiDwBFVdSBwEHB0khcDpwNrqmp/YE23L0kak1kDvAZ+2u3u1P0UcCywqmtfBRy3GAVKkmY2VB94kh2SXAesBy6qqiuBPatqHUC33WPRqpQkPc5QAV5Vj1TVQcDewIuSPH/YD0hycpK1SdZu2LBhnmVKkjY3p1EoVfVj4FLgaODuJEsBuu36LbxnZVUtr6rlU1NTo1UrSfqFYUahTCV5evf4ScDLgW8Dq4EV3ctWAOcvUo2SpBnsOMRrlgKrkuzAIPDPqaoLklwBnJPkJOBO4PhFrFOStJlZA7yqbgAOnqH9XuDIxShKkjQ7Z2JKUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqV
EGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjZg3wJPskuSTJzUluSnJq175bkouS3NJtd138ciVJ04Y5A38YeFtV/RrwYuCUJM8FTgfWVNX+wJpuX5I0JrMGeFWtq6pruscbgZuBvYBjgVXdy1YBxy1SjZKkGcypDzzJMuBg4Epgz6paB4OQB/ZY8OokSVs0dIAneSrwJeAvquq+Obzv5CRrk6zdsGHDfGqUJM1gqABPshOD8P5sVZ3bNd+dZGn3/FJg/UzvraqVVbW8qpZPTU0tRM2SJIYbhRLgk8DNVfW+TZ5aDazoHq8Azl/48iRJW7LjEK85DHgT8K0k13Vtfw2cCZyT5CTgTuD4RalQkjSjWQO8qi4HsoWnj1zYciRJw3ImpiQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRswZ4krOTrE9y4yZtuyW5KMkt3XbXxS1TkrS5Yc7APw0cvVnb6cCaqtofWNPtS5LGaNYAr6rLgP/ZrPlYYFX3eBVw3MKWJUmazXz7wPesqnUA3XaPhStJkjSMRb+ImeTkJGuTrN2wYcNif5wkTYz5BvjdSZYCdNv1W3phVa2squVVtXxqamqeHydJ2tx8A3w1sKJ7vAI4f2HKkSQNa5hhhJ8HrgCeneT7SU4CzgSOSnILcFS3L0kaox1ne0FVnbiFp45c4FokSXPgTExJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWqUAS5JjTLAJalRBrgkNcoAl6RGGeCS1CgDXJIaZYBLUqMMcElqlAEuSY0ywCWpUQa4JDXKAJekRhngktQoA1ySGmWAS1KjDHBJapQBLkmNMsAlqVEGuCQ1ygCXpEYZ4JLUKANckhplgEtSowxwSWrUSAGe5Ogk30ny3SSnL1RRkqTZzTvAk+wAfBj4beC5wIlJnrtQhUmStm6UM/AXAd+tqlur6ufAF4BjF6YsSdJsRgnwvYDvbbL//a5NkjQGO47w3szQVo97UXIycHK3+9Mk3xnhM7d1uwP3jOvD8p5xfdJE8Ni1bXs/fs+YqXGUAP8+sM8m+3sDd23+oqpaCawc4XOakWRtVS3vuw7NnceubZN6/EbpQvkmsH+S/ZI8AXg9sHphypIkzWbeZ+BV9XCStwJfB3YAzq6qmxasMknSVo3ShUJVfRX46gLVsj2YiK6i7ZTHrm0TefxS9bjrjpKkBjiVXpIaZYBLUqMMcEnNSbIkyUv6rqNv9oEvgCTHAM8Ddp5uq6p/6K8iDctj164kV1TVoX3X0SfPwEeU5GPACcCfMZidejxbmDWlbYvHrnkXJvm9JDPNCp8InoGPKMkNVXXAJtunAudW1Sv6rk1b57FrW5KNwFOAR4CfMfgjXFW1S6+FjdFI48AFDH5xAB5I8svAvcB+Pdaj4XnsGlZVT+u7hr4Z4KO7IMnTgfcC1zBY0OusXivSsDx2jUvyauCl3e6lVXVBn/WMm10oCyjJE4Gdq+onfdeiufHYtSfJmcALgc92TScCV1fVxNwdzIuYI0pySncWR1U9CCxJ8qf9VqVhJDk+yfTX8NOATyU5uM+aNCe/AxxVVWdX1dnA0V3bxDDAR/eWqvrx9E5V/Qh4S3/laA7+rqo2Jjkc+C1gFfCxnmvS3Dx9k8e/1FcRfTHAR7dk02FM3b1Cn9BjPRreI932GOCjVXU+HruWvBu4Nsmnk6wCrgb+seeaxso+8BEleS+wjMGZWwF/DHyvqt
7WZ12aXZILgB8ALwdewGBUylVVdWCvhWloSZYy6AcPcGVV/bDnksbKAB9RkiXAHwFHMvgluhA4q6oe2eob1bskT2bQb/qtqrqlC4Nfr6oLey5NW5HkkK09X1XXjKuWvhngmmhJDgR+o9v9RlVd32c9ml2SS7bydFXVEWMrpmcG+DwlOaeqXpfkW8xwM+eqOqCHsjQHSU5lcMH53K7pd4GVVfWh/qqShmeAz1OSpVW1LsmMa2dU1R3jrklzk+QG4NCqur/bfwpwhX9825BkJ+BP2GQiD/Dxqnqot6LGzJmY81RV67qtQd2u8NhIFLrHE7swUoM+CuwEfKTbf1PX9oe9VTRmBviIkrwGeA+wB4P//BO3oE7DPgVcmeS8bv844Oz+ytEcvXCzEUMXJ5moaxh2oYwoyXeBV1XVzX3XornrRjQczuAP72VVdW3PJWlISa4Bjq+q/+72nwn8e1VtdZTK9sQz8NHdbXi3Kcm/VtWbGCxktXmbtn2nAZckuZXBH+BnAG/ut6TxMsBHtzbJF4EvAw9ON1bVuVt8h7YVz9t0p5tF+4KeatEcVdWaJPsDz2YQ4N/u1iOaGE6lH90uwAPAK4BXdT+v7LUibVWSM7qbARyQ5L4kG7v99cD5PZenISU5BXhSVd3Qjd9/8qQtJGcfuCZWkndX1Rl916H5SXJdVR20Wdu1VTUxK0rahTJPSd5eVf+U5EPMPJHnz3soS3PzN0neCOxXVe9Msg+wtKqu6rswDWVJklR3FjqJC8kZ4PM3feFyba9VaBQfBh4FjgDeCfy0a3thn0VpaF8HzuluTj29kNzX+i1pvOxC0cRKck1VHbLp1+4k17saYRtcSM4z8JEl+Q8e34XyEwZn5h+vqv8df1Ua0kPd1+7pr+BTDM7I1YCqepTBzMuP9l1LXwzw0d0KTAGf7/ZPAO4GngV8gsH0Xm2bPgicB+yR5F3Aa4G/7bckzWYrC8lNz4KemLVs7EIZUZLLquqlM7Uluamqnrel96p/SZ7DY1/B1zgpa9vnQnKPcRz46KaS7Du90z3evdv9eT8laRhJfgW4rao+DNwIHDV9g2ptu6YXkgPuYXD3qzuAJwIHAnf1VlgPDPDR/SVweZJLklwKfAM4rVuadFWvlWk2XwIeSfKrwFnAfsDn+i1Jc3AZsHOSvYA1DKbRf7rXisbMPvARdFfBnwbsDzyHx6bzTl+4fH9PpWk4j1bVw92Kkh+oqg8lcTGrdqSqHkhyEvChbl7GRB0/z8BH0F0Ff2tVPVhV11fVdY46acpDSU4E/gC4oGvbqcd6NDdJcijwBuArXdtEnZQa4KO7KMlfJdknyW7TP30XpaG8GTgUeFdV3ZZkP+AzPdek4Z0KnAGcV1U3dcvJbu1+mdsdR6GMKMltMzRXVT1z7MVo3pIcMkl3M29dN37/zKo6re9a+jRRXzcWQ1Xt13cNWhBnARNzI4DWVdUjSSZ+6V8DfJ6SHFFVF3cXwB7H9cCb470w23NtktXAvwH3TzdO0v89A3z+XgZczGD9780VMDG/RNuJd/RdgOZsN+BeBouRTZuo/3v2gY8oyQ6TtHjO9iTJYcB1VXV/t6zsIQyGE07MTD61zVEoo7stycokRybxa3hbPgo8kORABvdXvAP4l35L0rCSPCvJmiQ3dvsHJJmotWwM8NE9G/hP4BQGYf7PSQ7vuSYN5+HuZgDHAh+sqg8wmJilNnyCwTDChwCq6gbg9b1WNGYG+Iiq6mdVdU5VvQY4mME9Mv+r57I0nI1JzgDeCHylG5rmRJ52PHmGuyc93EslPTHAF0CSlyX5CHANsDPwup5L0nBOAB4ETqqqHwJ7Ae/ttyTNwT3dgmTT67m/Fli39bdsX7yIOaJuIs91wDnA6qq6f+vvkLQQupmXK4GXAD8CbgPeMEkXoQ3wESXZparu6x47m68BSS6vqsOTbGTmGwLs0lNpmoPpEWDdyp9Lqmpj3zWNmwG+gKbvsdh3HdIkSHIng5sYfxG4uC
YwzOwDX1gOI2xEkiXTw8/UrIkfAWaALyxn8zWiWwr4+k3vpqS2OALMAB9ZksO6PjiApyZ535bu1adtzlLgpm4yyOrpn76L0vAmfQSYfeAjSnIDg3vxHcBgFt/ZwGuq6mW9FqZZJZnxGFXVRJ3FtcoRYAb4yKYvXCb5e+AHVfVJL2ZKi88RYK5GuBA2nc33UmfzbftmGD74i6dwGGEzpsO7M5HruRvgozsB+H262XzdRTFn823Dqsr1TrY/EzkCzC4USc1LclxVfbnvOsbNUSjzlOTybrsxyX2b/GxMct9s75c0GkeAeQYuqVGOAPMMfCTO5pN6NfHruRvgI3A2n9SriV/P3VEoo5uezXcV///O2K/uryRpIkz8CDD7wEfkbD5JfTHAJTXF9dwfY4DPk7P5JPXNAJfUnCRLgBuq6vl919InR6FIao4jwAYchSKpVRM/AswAl9Sqib8Dln3gktQoz8AlNcURYI/xDFySGuUoFElqlAEuSY0ywCWpUQa4JDXKAJekRv0f24qF5Vr84pkAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"iris[\"Species\"].value_counts().plot(kind=\"bar\")"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PetalLengthCm | \n",
"
\n",
" \n",
" Species | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" Iris-setosa | \n",
" 1.464 | \n",
"
\n",
" \n",
" Iris-versicolor | \n",
" 4.260 | \n",
"
\n",
" \n",
" Iris-virginica | \n",
" 5.552 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PetalLengthCm\n",
"Species \n",
"Iris-setosa 1.464\n",
"Iris-versicolor 4.260\n",
"Iris-virginica 5.552"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iris[[\"Species\",\"PetalLengthCm\"]].groupby(\"Species\").mean()"
]
},
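{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Means and standard deviations like those computed during inspection are also what a simple normalization needs. The sketch below shows standardization (z-score scaling) as one possible choice; it is an assumption made for illustration, not a method prescribed by these slides, and the toy `df` values are made up.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy feature table\n",
"df = pd.DataFrame({\n",
"    'PetalLengthCm': [1.4, 1.3, 4.5, 4.7, 5.1, 6.0],\n",
"    'PetalWidthCm': [0.2, 0.2, 1.5, 1.4, 1.8, 2.5],\n",
"})\n",
"\n",
"# Per-column statistics; with a train/test split they would normally\n",
"# be computed on the training part only and reused for the other parts\n",
"mean = df.mean()\n",
"std = df.std()\n",
"\n",
"# Subtract the mean and divide by the standard deviation, column by column\n",
"standardized = (df - mean) / std\n",
"\n",
"print(standardized.mean().round(6))\n",
"print(standardized.std().round(6))\n",
"```\n",
"\n",
"After this transformation every column has mean 0 and standard deviation 1, so features measured on different scales contribute comparably."
]
},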
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"