2023-programowanie-w-pythonie/zajecia2/data_analysis.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#  Analiza Danych w Pythonie: `pandas`\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### `pandas`\n",
    "Biblioteka `pandas` jest podstawowym narzędziem w ekosystemie Pythona do analizy danych:\n",
    " * dostarcza dwa podstawowe typy danych: \n",
    "   * `Series` (szereg, 1D)\n",
    "   * `DataFrame` (ramka danych, 2D)\n",
    " * operacje na tych obiektach: obsługa brakujących wartości, łączenie danych;\n",
    " * obsługuje dane różnego typu, np. szeregi czasowe;\n",
    " * biblioteka bazuje na `numpy` -- bibliotece do obliczeń numerycznych;\n",
    " * pozwala też na prostą wizualizację danych;\n",
    " * ETL: extract, transform, load."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Żeby zaimportowąc bibliotekę `pandas` wystarczy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### __Zadanie 0__: sprawdź, czy masz zainstalowaną bibliotekę `pandas`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### [Szeregi](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) (`pd.Series`)\n",
    "\n",
    " Szereg reprezentuje jednorodne dane jednowymiarowe - jest odpowiednikiem wektora w R.\n",
    "  * Szeregi możemy tworzyć na różne sposoby (więcej za chwilę), np. z obiektów tj. listy i słowniki.\n",
    "  * Dane muszą być jednorodne. W przeciwnym przypadku nastąpi automatyczna konwersja.\n",
    "  * Podczas tworzenia szeregu musimy podać jeden obowiązkowy argument `data` - dane.\n",
    "  * Ponadto możemy podać też indeks (`index`), typ danych (`dtype`) lub nazwę (`name`).\n",
    "  \n",
    "  \n",
    "  ```\n",
    "  class pandas.Series(data=None, index=None, dtype=None, name=None)\n",
    "  ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Podczas tworzenie szeregu mozemy podać dane w formacie listy lub słownika.\n",
    "\n",
    "Poniżej jest przykład przedstawiający tworzenie szeregu z danych, które są zawarte w liście:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    211819\n",
       "1    682758\n",
       "2    737011\n",
       "3    779511\n",
       "4    673790\n",
       "5    673790\n",
       "6    444177\n",
       "7    136791\n",
       "dtype: int64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "data = [211819, 682758, 737011, 779511, 673790, 673790, 444177, 136791]\n",
    "\n",
    "s = pd.Series(data)\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "W przypadku, gdy dane pochodzą z listy i nie podaliśmy indeksu, pandas doda automatyczny indeks liczbowy zaczynający się od 0."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "W przypadku przekazania słownika jako danych do szeregu, pandas wykorzysta klucze do stworzenia indeksu:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "April    211819\n",
       "May      682758\n",
       "June     737011\n",
       "July     779511\n",
       "dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
    "\n",
    "s = pd.Series(members)\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Podczas tworzenia szeregu możemy zdefiniować indeks, jak i nazwę szeregu:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "April    211819.0\n",
       "May      682758.0\n",
       "June     737011.0\n",
       "July     779511.0\n",
       "Name: Rides, dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "months = ['April', 'May', 'June', 'July']\n",
    "\n",
    "data = [211819, 682758, 737011, 779511]\n",
    "\n",
    "s = pd.Series(data=data, index=months, dtype=float, name='Rides')\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Odwołanie się do poszczególnego elementu odbywa się przy pomocy klucza z indeksu."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "211819\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "April     211819\n",
       "May       682758\n",
       "June      737011\n",
       "July      779511\n",
       "August    673790\n",
       "dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
    "\n",
    "s = pd.Series(members)\n",
    "\n",
    "print(s['April'])\n",
    "\n",
    "s['August'] = 673790\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Dodanie elementu do szeregu odbywa się poprzez definiowanie nowego klucza:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "April     211819\n",
       "May       682758\n",
       "June      737011\n",
       "July      779511\n",
       "August    673790\n",
       "dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
    "\n",
    "s = pd.Series(members)\n",
    "\n",
    "s['August'] = 673790\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Więcej nt. indeksowania w szeregach w dalszej części kursu."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Podstawowa cechą szeregu jest wykonywanie operacji w sposób wektorowy. Działa to w następujący sposób:\n",
    " * gdy w obu szeregach jest zawarty ten sam klucz, to są sumowane ich wartości;\n",
    " * w przeciwnym przypadku wartość klucza w wynikowym szeregu to `pd.NaN`.  \n",
    " * Równoważnie możemy wykorzystać metodę `pandas.Series.add`. W tym przypadku możemy podać domyślną wartość w przypadku braku klucza."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "August       880599.0\n",
       "July         973827.0\n",
       "June         908505.0\n",
       "May          830656.0\n",
       "October           NaN\n",
       "September    814282.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011,  'August': 673790, 'July': 779511,\n",
    "'September': 673790, 'October': 444177})\n",
    "\n",
    "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
    "'September': 140492})\n",
    "\n",
    "all_data = members + occasionals\n",
    "# Równoważnie\n",
    "all_data = members.add(occasionals)\n",
    "all_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Możemy wykonać operacje arytmetyczne na szeregu: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "May          683758\n",
       "June         738011\n",
       "July         780511\n",
       "August       674790\n",
       "September    674790\n",
       "October      445177\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
    "'September': 673790, 'October': 444177})\n",
    "\n",
    "members += 1000\n",
    "\n",
    "members"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Podsumowanie\n",
    " * Szeregi działają podobnie do słowników, z tą różnicą, że wartości muszą być jednorodne (tego samego typu).\n",
    " * Odwołanie do poszczególnych elementów odbywa się poprzez nawiasy `[]` i podanie klucza.\n",
    " * W przeciwieństwie do słowników, możemy w prosty sposób wykonywać operacje arytmetyczne."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie 1\n",
    " * Stwórz szereg `n`, który będzie zawierać liczby od 0 do 10 (włącznie).\n",
    " * Stwórz szereg `n2`, który będzie zawierać kwadraty liczb od 0 do 10 (włącznie).\n",
    " * Następnie stwórz szereg `trojkatne`, który będzie sumą powyższych szeregów podzieloną przez 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### [Ramka danych](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) (`pd.DataFrame`)\n",
    "\n",
    "Ramka danych jest podstawową strukturą danych w bibliotece `pandas`, która pozwala na trzymanie i reprezentowanie danych tabelarycznych (dwuwymiarowych).\n",
    " * Posiada kolumny (cechy) i wiersze (obserwacje, przykłady).\n",
    " * Możemy też patrzeć na nią jak na słownik, którego wartościami są szeregi.\n",
    "\n",
    "```\n",
    "class pandas.DataFrame(data=None, index=None, columns=None, dtype=None)\n",
    "```\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Ramkę danych możemy stworzyć na różne sposoby.\n",
    "\n",
    "Pierwszy z nich (\"kolumnowy\") polega na zdefiniowaniu ramki poprzez podanie szeregów jako kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      members  occasionals\n",
       "May    682758       147898\n",
       "June   737011       171494\n",
       "July   779511       194316"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
    "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
    "\n",
    "df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Drugim popularnym sposobem jest przekazanie listy słowników. Wtedy `pandas` zinterpretuje to jako listę przykładów:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   members  occasionals\n",
       "0   682758       147898\n",
       "1   737011       171494\n",
       "2   779511       194316"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [\n",
    "    {'members': 682758, 'occasionals': 147898},\n",
    "    {'occasionals': 171494,'members': 737011},\n",
    "    {'members': 779511, 'occasionals': 194316},\n",
    "]\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Możemy też wykorzystać metodę `from_dict` ([doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html)), która pozwala zdefiniować czy podane dane są w podane w postaci kolumnowej lub wierszowej:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "index\n",
      "       members  occasionals\n",
      "May    682758       147898\n",
      "June   737011       171494\n",
      "July   779511       194316\n",
      "\n",
      "columns\n",
      "                 May    June    July\n",
      "members      682758  737011  779511\n",
      "occasionals  147898  171494  194316\n"
     ]
    }
   ],
   "source": [
    "data = {\n",
    "    'May': {'members': 682758, 'occasionals': 147898},\n",
    "    'June': {'members': 737011, 'occasionals': 171494},\n",
    "    'July': {'members': 779511, 'occasionals': 194316}\n",
    "}\n",
    "\n",
    "df = pd.DataFrame.from_dict(data, orient='index')\n",
    "print('index\\n', df)\n",
    "print()\n",
    "df = pd.DataFrame.from_dict(data, orient='columns')\n",
    "print('columns\\n', df)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Wczytywanie danych"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Biblioteka `pandas` pozwala na wczytanie i zapis danych z różnych formatów:\n",
    " * formaty tekstowe, np. `csv`, `json`\n",
    " * pliki arkuszy kalkulacyjnych: Excel (xls, xlsx)\n",
    " * bazy danych\n",
    " * inne: `sas` `spss`\n",
    "\n",
    "\n",
    "Efektem wczytania danych jest odpowiednio stworzona ramka danych (`DataFrame`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Jednym z najprostszych formatów danych jest format `csv`, gdzie kolejne wartości są rozdzielone przecinkiem.\n",
    "\n",
    "Żeby wczytać dane w takim formacie należy użyć funkcji `pandas.read_csv`.\n",
    "\n",
    "Pandas pozwala na ustawienie wielu parametrów (np. separator, cudzysłowy). Więcej na ten temat w [dokumentacji](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>female_BMI</th>\n",
       "      <th>male_BMI</th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>under5mortality</th>\n",
       "      <th>life_expectancy</th>\n",
       "      <th>fertility</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>21.07402</td>\n",
       "      <td>20.62058</td>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>110.4</td>\n",
       "      <td>52.8</td>\n",
       "      <td>6.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Albania</td>\n",
       "      <td>25.65726</td>\n",
       "      <td>26.44657</td>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>17.9</td>\n",
       "      <td>76.8</td>\n",
       "      <td>1.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Algeria</td>\n",
       "      <td>26.36841</td>\n",
       "      <td>24.59620</td>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>29.5</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Angola</td>\n",
       "      <td>23.48431</td>\n",
       "      <td>22.25083</td>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>56.7</td>\n",
       "      <td>6.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Antigua and Barbuda</td>\n",
       "      <td>27.50545</td>\n",
       "      <td>25.76602</td>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>10.9</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>170</th>\n",
       "      <td>Venezuela</td>\n",
       "      <td>28.13408</td>\n",
       "      <td>27.44500</td>\n",
       "      <td>17911.0</td>\n",
       "      <td>28116716.0</td>\n",
       "      <td>17.1</td>\n",
       "      <td>74.2</td>\n",
       "      <td>2.53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>171</th>\n",
       "      <td>Vietnam</td>\n",
       "      <td>21.06500</td>\n",
       "      <td>20.91630</td>\n",
       "      <td>4085.0</td>\n",
       "      <td>86589342.0</td>\n",
       "      <td>26.2</td>\n",
       "      <td>74.1</td>\n",
       "      <td>1.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172</th>\n",
       "      <td>Palestine</td>\n",
       "      <td>29.02643</td>\n",
       "      <td>26.57750</td>\n",
       "      <td>3564.0</td>\n",
       "      <td>3854667.0</td>\n",
       "      <td>24.7</td>\n",
       "      <td>74.1</td>\n",
       "      <td>4.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173</th>\n",
       "      <td>Zambia</td>\n",
       "      <td>23.05436</td>\n",
       "      <td>20.68321</td>\n",
       "      <td>3039.0</td>\n",
       "      <td>13114579.0</td>\n",
       "      <td>94.9</td>\n",
       "      <td>51.1</td>\n",
       "      <td>5.88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>174</th>\n",
       "      <td>Zimbabwe</td>\n",
       "      <td>24.64522</td>\n",
       "      <td>22.02660</td>\n",
       "      <td>1286.0</td>\n",
       "      <td>13495462.0</td>\n",
       "      <td>98.3</td>\n",
       "      <td>47.3</td>\n",
       "      <td>3.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>175 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                 Country  female_BMI  male_BMI      gdp  population  \\\n",
       "0            Afghanistan    21.07402  20.62058   1311.0  26528741.0   \n",
       "1                Albania    25.65726  26.44657   8644.0   2968026.0   \n",
       "2                Algeria    26.36841  24.59620  12314.0  34811059.0   \n",
       "3                 Angola    23.48431  22.25083   7103.0  19842251.0   \n",
       "4    Antigua and Barbuda    27.50545  25.76602  25736.0     85350.0   \n",
       "..                   ...         ...       ...      ...         ...   \n",
       "170            Venezuela    28.13408  27.44500  17911.0  28116716.0   \n",
       "171              Vietnam    21.06500  20.91630   4085.0  86589342.0   \n",
       "172            Palestine    29.02643  26.57750   3564.0   3854667.0   \n",
       "173               Zambia    23.05436  20.68321   3039.0  13114579.0   \n",
       "174             Zimbabwe    24.64522  22.02660   1286.0  13495462.0   \n",
       "\n",
       "     under5mortality  life_expectancy  fertility  \n",
       "0              110.4             52.8       6.20  \n",
       "1               17.9             76.8       1.76  \n",
       "2               29.5             75.5       2.73  \n",
       "3              192.0             56.7       6.43  \n",
       "4               10.9             75.5       2.16  \n",
       "..               ...              ...        ...  \n",
       "170             17.1             74.2       2.53  \n",
       "171             26.2             74.1       1.86  \n",
       "172             24.7             74.1       4.38  \n",
       "173             94.9             51.1       5.88  \n",
       "174             98.3             47.3       3.85  \n",
       "\n",
       "[175 rows x 8 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('gapminder.csv')\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen\\t Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen\\t Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "3                   1       3   \n",
       "4                   1       1   \n",
       "5                   0       3   \n",
       "\n",
       "                                                          Name     Sex  Age  \\\n",
       "PassengerId                                                                   \n",
       "1                                     Braund\\t Mr. Owen Harris    male   22   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female   38   \n",
       "3                                      Heikkinen\\t Miss. Laina  female   26   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female   35   \n",
       "5                                    Allen\\t Mr. William Henry    male   35   \n",
       "\n",
       "             SibSp  Parch            Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                          \n",
       "1                1      0         A/5 21171   7.2500   NaN        S  \n",
       "2                1      0          PC 17599  71.2833   C85        C  \n",
       "3                0      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "4                1      0            113803  53.1000  C123        S  \n",
       "5                0      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_train.tsv', delimiter='\\t', index_col=0, nrows=5)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Do wczytania danych z arkusza kalkulacyjnego służy funkcja `pandas.read_excel`. Do otworzenia pliku `xlsx` może być koniecnze ustawienie parametru: `engine='openpyxl`. Więcej opcji w [dokumentacji](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>start_date</th>\n",
       "      <th>start_station_code</th>\n",
       "      <th>end_date</th>\n",
       "      <th>end_station_code</th>\n",
       "      <th>duration_sec</th>\n",
       "      <th>is_member</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2019-04-14 07:55:22</td>\n",
       "      <td>6001</td>\n",
       "      <td>2019-04-14 08:07:16</td>\n",
       "      <td>6132</td>\n",
       "      <td>713</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2019-04-14 07:59:31</td>\n",
       "      <td>6411</td>\n",
       "      <td>2019-04-14 08:09:18</td>\n",
       "      <td>6411</td>\n",
       "      <td>587</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2019-04-14 07:59:55</td>\n",
       "      <td>6097</td>\n",
       "      <td>2019-04-14 08:12:11</td>\n",
       "      <td>6036</td>\n",
       "      <td>736</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2019-04-14 07:59:57</td>\n",
       "      <td>6310</td>\n",
       "      <td>2019-04-14 08:27:58</td>\n",
       "      <td>6345</td>\n",
       "      <td>1680</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2019-04-14 08:00:37</td>\n",
       "      <td>7029</td>\n",
       "      <td>2019-04-14 08:14:12</td>\n",
       "      <td>6250</td>\n",
       "      <td>814</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           start_date  start_station_code            end_date  \\\n",
       "0 2019-04-14 07:55:22                6001 2019-04-14 08:07:16   \n",
       "1 2019-04-14 07:59:31                6411 2019-04-14 08:09:18   \n",
       "2 2019-04-14 07:59:55                6097 2019-04-14 08:12:11   \n",
       "3 2019-04-14 07:59:57                6310 2019-04-14 08:27:58   \n",
       "4 2019-04-14 08:00:37                7029 2019-04-14 08:14:12   \n",
       "\n",
       "   end_station_code  duration_sec  is_member  \n",
       "0              6132           713          1  \n",
       "1              6411           587          1  \n",
       "2              6036           736          1  \n",
       "3              6345          1680          1  \n",
       "4              6250           814          0  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_excel('./bikes.xlsx', engine='openpyxl', nrows=5)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Innym ważnym źródłem informacji są bazy danych. Pandas potrafi komunikować się z bazą danych za pomocą biblioteki [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) i dostarcza odpowiedną funkcję:\n",
    " * `pandas.read_sql` - wczytanie całej tabeli lub zapytania do bazy danych"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>ArtistId</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AlbumId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>For Those About To Rock We Salute You</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Balls to the Wall</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Restless and Wild</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Let There Be Rock</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Big Ones</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>343</th>\n",
       "      <td>Respighi:Pines of Rome</td>\n",
       "      <td>226</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>344</th>\n",
       "      <td>Schubert: The Late String Quartets &amp; String Qu...</td>\n",
       "      <td>272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345</th>\n",
       "      <td>Monteverdi: L'Orfeo</td>\n",
       "      <td>273</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346</th>\n",
       "      <td>Mozart: Chamber Music</td>\n",
       "      <td>274</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>347</th>\n",
       "      <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
       "      <td>275</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>347 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                     Title  ArtistId\n",
       "AlbumId                                                             \n",
       "1                    For Those About To Rock We Salute You         1\n",
       "2                                        Balls to the Wall         2\n",
       "3                                        Restless and Wild         2\n",
       "4                                        Let There Be Rock         1\n",
       "5                                                 Big Ones         3\n",
       "...                                                    ...       ...\n",
       "343                                 Respighi:Pines of Rome       226\n",
       "344      Schubert: The Late String Quartets & String Qu...       272\n",
       "345                                    Monteverdi: L'Orfeo       273\n",
       "346                                  Mozart: Chamber Music       274\n",
       "347      Koyaanisqatsi (Soundtrack from the Motion Pict...       275\n",
       "\n",
       "[347 rows x 2 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_sql('Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2023-11-17 15:53:28,542 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1\n",
      "2023-11-17 15:53:28,543 INFO sqlalchemy.engine.base.Engine ()\n",
      "2023-11-17 15:53:28,544 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1\n",
      "2023-11-17 15:53:28,545 INFO sqlalchemy.engine.base.Engine ()\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>ArtistId</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AlbumId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>For Those About To Rock We Salute You</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Balls to the Wall</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Restless and Wild</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Let There Be Rock</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Big Ones</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>343</th>\n",
       "      <td>Respighi:Pines of Rome</td>\n",
       "      <td>226</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>344</th>\n",
       "      <td>Schubert: The Late String Quartets &amp; String Qu...</td>\n",
       "      <td>272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345</th>\n",
       "      <td>Monteverdi: L'Orfeo</td>\n",
       "      <td>273</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346</th>\n",
       "      <td>Mozart: Chamber Music</td>\n",
       "      <td>274</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>347</th>\n",
       "      <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
       "      <td>275</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>347 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                     Title  ArtistId\n",
       "AlbumId                                                             \n",
       "1                    For Those About To Rock We Salute You         1\n",
       "2                                        Balls to the Wall         2\n",
       "3                                        Restless and Wild         2\n",
       "4                                        Let There Be Rock         1\n",
       "5                                                 Big Ones         3\n",
       "...                                                    ...       ...\n",
       "343                                 Respighi:Pines of Rome       226\n",
       "344      Schubert: The Late String Quartets & String Qu...       272\n",
       "345                                    Monteverdi: L'Orfeo       273\n",
       "346                                  Mozart: Chamber Music       274\n",
       "347      Koyaanisqatsi (Soundtrack from the Motion Pict...       275\n",
       "\n",
       "[347 rows x 2 columns]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import sqlalchemy\n",
    "\n",
    "engine = sqlalchemy.create_engine('sqlite:///Chinook.sqlite', echo=True)\n",
    "connection  = engine.raw_connection()\n",
    "\n",
    "df = pd.read_sql('SELECT * FROM Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Podsumowanie\n",
    "\n",
    "\n",
    " * Biblioteka `pandas` wspiera pobieranie danych z różnych formatów i źródeł.\n",
    " * Każda funkcja ma listę argumentów, które pozwalają na ustawić poszczególne  parametry (np. [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zapis i eksport danych"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Pandas pozwala w prosty sposób na zapisywanie ramki danych do pliku. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
    "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
    "\n",
    "df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# zapis do formatu CSV\n",
    "df.to_csv('tmp.csv')\n",
    "# zapis do arkusza kalkulacyjnego \n",
    "df.to_excel('tmp.xlsx')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Ponadto możemy przekonwertować ramkę danych do JSONa lub Pythonowego słownika:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\"members\":{\"May\":682758,\"June\":737011,\"July\":779511},\"occasionals\":{\"May\":147898,\"June\":171494,\"July\":194316}}\n"
     ]
    }
   ],
   "source": [
    "print(df.to_json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'members': {'May': 682758, 'June': 737011, 'July': 779511}, 'occasionals': {'May': 147898, 'June': 171494, 'July': 194316}}\n"
     ]
    }
   ],
   "source": [
    "print(df.to_dict())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Lub przekopiować dane do schowka:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.to_clipboard()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie\n",
    "\n",
    "\n",
    "\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    " * Przekonwertuj tabele `Customer` z bazy `Chinook.sqlite` do arkusza kalkulacyjnego. Plik wynikowy nazwij `customers.xlsx`.\n",
    " * Tabela `Employee` zawiera informacje o pracownikach firmy Chinook. Wyswietl dane na ekranie i podaj miasta, w których mieszkają pracownicy.\n",
    " * Tabela `Invoice` zawiera informacje o fakturach. Przekonwertuj kolumnę `BillingCountry` do pythonowego słownika, a następnie podaj najcześciej występującą wartość. Ile razy pojawiła się?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Ramka danych - podstawy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "#### Kolumny\n",
    "\n",
    "Na ramkę danych możemy patrzeć jak na swego rodzaju słownik, którego wartościami są szeregi. Pozwoli to na uzyskanie lepszej intuicji.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>life_expectancy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>52.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>14646.0</td>\n",
       "      <td>40381860.0</td>\n",
       "      <td>75.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Armenia</th>\n",
       "      <td>7383.0</td>\n",
       "      <td>2975029.0</td>\n",
       "      <td>72.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>41312.0</td>\n",
       "      <td>21370348.0</td>\n",
       "      <td>81.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         gdp  population  life_expectancy\n",
       "Country                                                  \n",
       "Afghanistan           1311.0  26528741.0             52.8\n",
       "Albania               8644.0   2968026.0             76.8\n",
       "Algeria              12314.0  34811059.0             75.5\n",
       "Angola                7103.0  19842251.0             56.7\n",
       "Antigua and Barbuda  25736.0     85350.0             75.5\n",
       "Argentina            14646.0  40381860.0             75.4\n",
       "Armenia               7383.0   2975029.0             72.3\n",
       "Australia            41312.0  21370348.0             81.6"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=8, usecols=['Country', 'gdp', 'population','life_expectancy'])\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Dostęp do poszczególnej kolumny możemy uzystać na dwa sposoby:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "Afghanistan            26528741.0\n",
       "Albania                 2968026.0\n",
       "Algeria                34811059.0\n",
       "Angola                 19842251.0\n",
       "Antigua and Barbuda       85350.0\n",
       "Argentina              40381860.0\n",
       "Armenia                 2975029.0\n",
       "Australia              21370348.0\n",
       "Name: population, dtype: float64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# notacja z kropką\n",
    "df.population"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "Afghanistan            26528741.0\n",
       "Albania                 2968026.0\n",
       "Algeria                34811059.0\n",
       "Angola                 19842251.0\n",
       "Antigua and Barbuda       85350.0\n",
       "Argentina              40381860.0\n",
       "Armenia                 2975029.0\n",
       "Australia              21370348.0\n",
       "Name: population, dtype: float64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Operator []\n",
    "df['population']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Do operatora `[]` możemy też podać listę nazw kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>14646.0</td>\n",
       "      <td>40381860.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Armenia</th>\n",
       "      <td>7383.0</td>\n",
       "      <td>2975029.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>41312.0</td>\n",
       "      <td>21370348.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         gdp  population\n",
       "Country                                 \n",
       "Afghanistan           1311.0  26528741.0\n",
       "Albania               8644.0   2968026.0\n",
       "Algeria              12314.0  34811059.0\n",
       "Angola                7103.0  19842251.0\n",
       "Antigua and Barbuda  25736.0     85350.0\n",
       "Argentina            14646.0  40381860.0\n",
       "Armenia               7383.0   2975029.0\n",
       "Australia            41312.0  21370348.0"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['gdp','population']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Listę kolumn możemy pobrać za pomocą:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['gdp', 'population', 'life_expectancy'], dtype='object')"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PKB</th>\n",
       "      <th>Populacja</th>\n",
       "      <th>ODŻ</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>52.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>14646.0</td>\n",
       "      <td>40381860.0</td>\n",
       "      <td>75.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Armenia</th>\n",
       "      <td>7383.0</td>\n",
       "      <td>2975029.0</td>\n",
       "      <td>72.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>41312.0</td>\n",
       "      <td>21370348.0</td>\n",
       "      <td>81.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         PKB   Populacja   ODŻ\n",
       "Country                                       \n",
       "Afghanistan           1311.0  26528741.0  52.8\n",
       "Albania               8644.0   2968026.0  76.8\n",
       "Algeria              12314.0  34811059.0  75.5\n",
       "Angola                7103.0  19842251.0  56.7\n",
       "Antigua and Barbuda  25736.0     85350.0  75.5\n",
       "Argentina            14646.0  40381860.0  75.4\n",
       "Armenia               7383.0   2975029.0  72.3\n",
       "Australia            41312.0  21370348.0  81.6"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns = ['PKB', 'Populacja', 'ODŻ']\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Żeby odwołać się do poszczególnych wierszy należy wykorzystać metodę `loc`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PKB             14646.0\n",
       "Populacja    40381860.0\n",
       "ODŻ                75.4\n",
       "Name: Argentina, dtype: float64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['Argentina']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Metoda `loc` również może przyjąć listę wierszy: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PKB</th>\n",
       "      <th>Populacja</th>\n",
       "      <th>ODŻ</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            PKB   Populacja   ODŻ\n",
       "Country                          \n",
       "Albania  8644.0   2968026.0  76.8\n",
       "Angola   7103.0  19842251.0  56.7"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[['Albania', 'Angola']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Możemy również podać drugi parametr: nazwy kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PKB</th>\n",
       "      <th>Populacja</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            PKB   Populacja\n",
       "Country                    \n",
       "Albania  8644.0   2968026.0\n",
       "Angola   7103.0  19842251.0"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = df.loc[['Albania', 'Angola'], ['PKB', 'Populacja']]\n",
    "\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Albo wykorzystać tzw. _slicing_, cyzli operator `:`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PKB</th>\n",
       "      <th>Populacja</th>\n",
       "      <th>ODŻ</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             PKB   Populacja   ODŻ\n",
       "Country                           \n",
       "Albania   8644.0   2968026.0  76.8\n",
       "Algeria  12314.0  34811059.0  75.5\n",
       "Angola    7103.0  19842251.0  56.7"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['Albania': 'Angola', 'PKB': 'ODŻ']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Żeby odwołać się do pojedyńczej wartości możemy użyć metody `at`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7103.0"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.at['Angola', 'PKB']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Dostęp do indeksu:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Antigua and Barbuda',\n",
       "       'Argentina', 'Armenia', 'Australia'],\n",
       "      dtype='object', name='Country')"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Podstawowe metody `pd.Series` i `pd.DataFrame`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>August</th>\n",
       "      <td>673790</td>\n",
       "      <td>206809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>September</th>\n",
       "      <td>673790</td>\n",
       "      <td>140492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>October</th>\n",
       "      <td>444177</td>\n",
       "      <td>53596</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           members  occasionals\n",
       "May         682758       147898\n",
       "June        737011       171494\n",
       "July        779511       194316\n",
       "August      673790       206809\n",
       "September   673790       140492\n",
       "October     444177        53596"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
    "'September': 673790, 'October': 444177})\n",
    "\n",
    "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
    "'September': 140492, 'October': 53596})\n",
    "\n",
    "df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `head` pozwala tworzy nową ramkę danych z pierwszymi 5 przykładami:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>August</th>\n",
       "      <td>673790</td>\n",
       "      <td>206809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>September</th>\n",
       "      <td>673790</td>\n",
       "      <td>140492</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           members  occasionals\n",
       "May         682758       147898\n",
       "June        737011       171494\n",
       "July        779511       194316\n",
       "August      673790       206809\n",
       "September   673790       140492"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `tail` robi to samo, ale z 5 ostatnymi przykładami:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>August</th>\n",
       "      <td>673790</td>\n",
       "      <td>206809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>September</th>\n",
       "      <td>673790</td>\n",
       "      <td>140492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>October</th>\n",
       "      <td>444177</td>\n",
       "      <td>53596</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           members  occasionals\n",
       "June        737011       171494\n",
       "July        779511       194316\n",
       "August      673790       206809\n",
       "September   673790       140492\n",
       "October     444177        53596"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `sample` pozwala na stworzenie nowej ramki danych z wylosowanymi `n` przykładami:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>October</th>\n",
       "      <td>444177</td>\n",
       "      <td>53596</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         members  occasionals\n",
       "October   444177        53596\n",
       "June      737011       171494\n",
       "May       682758       147898"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sample(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `describe` zwraca podstawowe statystyki m.in.: liczebność, średnią, wartości skrajne: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>6.000000</td>\n",
       "      <td>6.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>665172.833333</td>\n",
       "      <td>152434.166667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>116216.045456</td>\n",
       "      <td>54783.506738</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>444177.000000</td>\n",
       "      <td>53596.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>673790.000000</td>\n",
       "      <td>142343.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>678274.000000</td>\n",
       "      <td>159696.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>723447.750000</td>\n",
       "      <td>188610.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>779511.000000</td>\n",
       "      <td>206809.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             members    occasionals\n",
       "count       6.000000       6.000000\n",
       "mean   665172.833333  152434.166667\n",
       "std    116216.045456   54783.506738\n",
       "min    444177.000000   53596.000000\n",
       "25%    673790.000000  142343.500000\n",
       "50%    678274.000000  159696.000000\n",
       "75%    723447.750000  188610.500000\n",
       "max    779511.000000  206809.000000"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `info` zwraca informacje techniczne o kolumnach: np. typ danych:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 6 entries, May to October\n",
      "Data columns (total 2 columns):\n",
      " #   Column       Non-Null Count  Dtype\n",
      "---  ------       --------------  -----\n",
      " 0   members      6 non-null      int64\n",
      " 1   occasionals  6 non-null      int64\n",
      "dtypes: int64(2)\n",
      "memory usage: 144.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Podstawową informacją o ramce danych to liczba przykładów w ramce danych. Możemy wykorzystać to tego funkcję `len`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Natomiast atrybut `shape` zwraca nam krotkę z liczbą przykładów i liczbą kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6, 2)"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Operacja arytmetyczne\n",
    "\n",
    " * `max`, `idxmax`\n",
    " * `min`, `idxmin`\n",
    " * `mean`\n",
    " * `count`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "members        665172.833333\n",
       "occasionals    152434.166667\n",
       "dtype: float64"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Zbiór wartości i zliczanie wartości:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 3 2]\n",
      "3    4\n",
      "1    3\n",
      "2    3\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
    "\n",
    "print(dane.unique())\n",
    "\n",
    "dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
    "\n",
    "print(dane.value_counts())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Sprawdzanie czy brakuje danych:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1      False\n",
       "2      False\n",
       "3      False\n",
       "4      False\n",
       "5      False\n",
       "       ...  \n",
       "887    False\n",
       "888    False\n",
       "889     True\n",
       "890    False\n",
       "891    False\n",
       "Name: Age, Length: 891, dtype: bool"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
    "df.Age.isnull()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Dodawanie i modyfikowanie danych"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>female_BMI</th>\n",
       "      <th>male_BMI</th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>under5mortality</th>\n",
       "      <th>life_expectancy</th>\n",
       "      <th>fertility</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>21.07402</td>\n",
       "      <td>20.62058</td>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>110.4</td>\n",
       "      <td>52.8</td>\n",
       "      <td>6.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>25.65726</td>\n",
       "      <td>26.44657</td>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>17.9</td>\n",
       "      <td>76.8</td>\n",
       "      <td>1.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>26.36841</td>\n",
       "      <td>24.59620</td>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>29.5</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>23.48431</td>\n",
       "      <td>22.25083</td>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>56.7</td>\n",
       "      <td>6.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>27.50545</td>\n",
       "      <td>25.76602</td>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>10.9</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.16</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     female_BMI  male_BMI      gdp  population  \\\n",
       "Country                                                          \n",
       "Afghanistan            21.07402  20.62058   1311.0  26528741.0   \n",
       "Albania                25.65726  26.44657   8644.0   2968026.0   \n",
       "Algeria                26.36841  24.59620  12314.0  34811059.0   \n",
       "Angola                 23.48431  22.25083   7103.0  19842251.0   \n",
       "Antigua and Barbuda    27.50545  25.76602  25736.0     85350.0   \n",
       "\n",
       "                     under5mortality  life_expectancy  fertility  \n",
       "Country                                                           \n",
       "Afghanistan                    110.4             52.8       6.20  \n",
       "Albania                         17.9             76.8       1.76  \n",
       "Algeria                         29.5             75.5       2.73  \n",
       "Angola                         192.0             56.7       6.43  \n",
       "Antigua and Barbuda             10.9             75.5       2.16  "
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>female_BMI</th>\n",
       "      <th>male_BMI</th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>under5mortality</th>\n",
       "      <th>life_expectancy</th>\n",
       "      <th>fertility</th>\n",
       "      <th>continent</th>\n",
       "      <th>tmp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>21.07402</td>\n",
       "      <td>20.62058</td>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>110.4</td>\n",
       "      <td>52.8</td>\n",
       "      <td>6.20</td>\n",
       "      <td>Asia</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>25.65726</td>\n",
       "      <td>26.44657</td>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>17.9</td>\n",
       "      <td>76.8</td>\n",
       "      <td>1.76</td>\n",
       "      <td>Europe</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>26.36841</td>\n",
       "      <td>24.59620</td>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>29.5</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.73</td>\n",
       "      <td>Africa</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>23.48431</td>\n",
       "      <td>22.25083</td>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>56.7</td>\n",
       "      <td>6.43</td>\n",
       "      <td>Africa</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>27.50545</td>\n",
       "      <td>25.76602</td>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>10.9</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.16</td>\n",
       "      <td>Americas</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     female_BMI  male_BMI      gdp  population  \\\n",
       "Country                                                          \n",
       "Afghanistan            21.07402  20.62058   1311.0  26528741.0   \n",
       "Albania                25.65726  26.44657   8644.0   2968026.0   \n",
       "Algeria                26.36841  24.59620  12314.0  34811059.0   \n",
       "Angola                 23.48431  22.25083   7103.0  19842251.0   \n",
       "Antigua and Barbuda    27.50545  25.76602  25736.0     85350.0   \n",
       "\n",
       "                     under5mortality  life_expectancy  fertility continent  \\\n",
       "Country                                                                      \n",
       "Afghanistan                    110.4             52.8       6.20      Asia   \n",
       "Albania                         17.9             76.8       1.76    Europe   \n",
       "Algeria                         29.5             75.5       2.73    Africa   \n",
       "Angola                         192.0             56.7       6.43    Africa   \n",
       "Antigua and Barbuda             10.9             75.5       2.16  Americas   \n",
       "\n",
       "                     tmp  \n",
       "Country                   \n",
       "Afghanistan            1  \n",
       "Albania                1  \n",
       "Algeria                1  \n",
       "Angola                 1  \n",
       "Antigua and Barbuda    1  "
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "conts = pd.Series({\n",
    "    'Afghanistan': 'Asia', 'Albania': 'Europe', 'Algeria':' Africa', 'Angola': 'Africa', 'Antigua and Barbuda': 'Americas'})\n",
    "\n",
    "df['continent'] = conts\n",
    "\n",
    "df['tmp'] = 1\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>female_BMI</th>\n",
       "      <th>male_BMI</th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>under5mortality</th>\n",
       "      <th>life_expectancy</th>\n",
       "      <th>fertility</th>\n",
       "      <th>continent</th>\n",
       "      <th>tmp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>21.07402</td>\n",
       "      <td>20.62058</td>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>110.4</td>\n",
       "      <td>52.8</td>\n",
       "      <td>6.20</td>\n",
       "      <td>Asia</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>25.65726</td>\n",
       "      <td>26.44657</td>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>17.9</td>\n",
       "      <td>76.8</td>\n",
       "      <td>1.76</td>\n",
       "      <td>Europe</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>26.36841</td>\n",
       "      <td>24.59620</td>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>29.5</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.73</td>\n",
       "      <td>Africa</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>23.48431</td>\n",
       "      <td>22.25083</td>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>56.7</td>\n",
       "      <td>6.43</td>\n",
       "      <td>Africa</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>27.50545</td>\n",
       "      <td>25.76602</td>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>10.9</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.16</td>\n",
       "      <td>Americas</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>27.46523</td>\n",
       "      <td>27.50170</td>\n",
       "      <td>14646.0</td>\n",
       "      <td>40381860.0</td>\n",
       "      <td>15.4</td>\n",
       "      <td>75.4</td>\n",
       "      <td>2.24</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     female_BMI  male_BMI      gdp  population  \\\n",
       "Country                                                          \n",
       "Afghanistan            21.07402  20.62058   1311.0  26528741.0   \n",
       "Albania                25.65726  26.44657   8644.0   2968026.0   \n",
       "Algeria                26.36841  24.59620  12314.0  34811059.0   \n",
       "Angola                 23.48431  22.25083   7103.0  19842251.0   \n",
       "Antigua and Barbuda    27.50545  25.76602  25736.0     85350.0   \n",
       "Argentina              27.46523  27.50170  14646.0  40381860.0   \n",
       "\n",
       "                     under5mortality  life_expectancy  fertility continent  \\\n",
       "Country                                                                      \n",
       "Afghanistan                    110.4             52.8       6.20      Asia   \n",
       "Albania                         17.9             76.8       1.76    Europe   \n",
       "Algeria                         29.5             75.5       2.73    Africa   \n",
       "Angola                         192.0             56.7       6.43    Africa   \n",
       "Antigua and Barbuda             10.9             75.5       2.16  Americas   \n",
       "Argentina                       15.4             75.4       2.24       NaN   \n",
       "\n",
       "                     tmp  \n",
       "Country                   \n",
       "Afghanistan          1.0  \n",
       "Albania              1.0  \n",
       "Algeria              1.0  \n",
       "Angola               1.0  \n",
       "Antigua and Barbuda  1.0  \n",
       "Argentina            NaN  "
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['Argentina'] = {\n",
    "    'female_BMI': 27.46523,\n",
    "    'male_BMI': 27.5017,\n",
    "    'gdp': 14646.0,\n",
    "    'population': 40381860.0,\n",
    "    'under5mortality': 15.4,\n",
    "    'life_expectancy': 75.4,\n",
    "    'fertility': 2.24\n",
    "}\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>female_BMI</th>\n",
       "      <th>male_BMI</th>\n",
       "      <th>population</th>\n",
       "      <th>under5mortality</th>\n",
       "      <th>life_expectancy</th>\n",
       "      <th>fertility</th>\n",
       "      <th>continent</th>\n",
       "      <th>tmp</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>21.07402</td>\n",
       "      <td>20.62058</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>110.4</td>\n",
       "      <td>52.8</td>\n",
       "      <td>6.20</td>\n",
       "      <td>Asia</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>25.65726</td>\n",
       "      <td>26.44657</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>17.9</td>\n",
       "      <td>76.8</td>\n",
       "      <td>1.76</td>\n",
       "      <td>Europe</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>26.36841</td>\n",
       "      <td>24.59620</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>29.5</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.73</td>\n",
       "      <td>Africa</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>23.48431</td>\n",
       "      <td>22.25083</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>56.7</td>\n",
       "      <td>6.43</td>\n",
       "      <td>Africa</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>27.50545</td>\n",
       "      <td>25.76602</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>10.9</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.16</td>\n",
       "      <td>Americas</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>27.46523</td>\n",
       "      <td>27.50170</td>\n",
       "      <td>40381860.0</td>\n",
       "      <td>15.4</td>\n",
       "      <td>75.4</td>\n",
       "      <td>2.24</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     female_BMI  male_BMI  population  under5mortality  \\\n",
       "Country                                                                  \n",
       "Afghanistan            21.07402  20.62058  26528741.0            110.4   \n",
       "Albania                25.65726  26.44657   2968026.0             17.9   \n",
       "Algeria                26.36841  24.59620  34811059.0             29.5   \n",
       "Angola                 23.48431  22.25083  19842251.0            192.0   \n",
       "Antigua and Barbuda    27.50545  25.76602     85350.0             10.9   \n",
       "Argentina              27.46523  27.50170  40381860.0             15.4   \n",
       "\n",
       "                     life_expectancy  fertility continent  tmp  \n",
       "Country                                                         \n",
       "Afghanistan                     52.8       6.20      Asia  1.0  \n",
       "Albania                         76.8       1.76    Europe  1.0  \n",
       "Algeria                         75.5       2.73    Africa  1.0  \n",
       "Angola                          56.7       6.43    Africa  1.0  \n",
       "Antigua and Barbuda             75.5       2.16  Americas  1.0  \n",
       "Argentina                       75.4       2.24       NaN  NaN  "
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.drop('gdp', axis='columns')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Filtrowanie danych"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Biblioteka pandas posiada 2 sposoby na filtrowanie danych zawartych w ramce danych:\n",
    " * operator `[]` -- najbardziej rozpowszechniony;\n",
    " * metoda `query()`.\n",
    "Oba sposoby mają różną składnię.\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen\\t Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen\\t Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "3                   1       3   \n",
       "4                   1       1   \n",
       "5                   0       3   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                     Braund\\t Mr. Owen Harris    male  22.0   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "3                                      Heikkinen\\t Miss. Laina  female  26.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "5                                    Allen\\t Mr. William Henry    male  35.0   \n",
       "\n",
       "             SibSp  Parch            Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                          \n",
       "1                1      0         A/5 21171   7.2500   NaN        S  \n",
       "2                1      0          PC 17599  71.2833   C85        C  \n",
       "3                0      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "4                1      0            113803  53.1000  C123        S  \n",
       "5                0      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1      0\n",
       "2      1\n",
       "3      1\n",
       "4      1\n",
       "5      0\n",
       "      ..\n",
       "887    0\n",
       "888    1\n",
       "889    0\n",
       "890    1\n",
       "891    0\n",
       "Name: Survived, Length: 891, dtype: int64"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Survived']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1      False\n",
       "2       True\n",
       "3       True\n",
       "4       True\n",
       "5      False\n",
       "       ...  \n",
       "887    False\n",
       "888     True\n",
       "889    False\n",
       "890     True\n",
       "891    False\n",
       "Name: Survived, Length: 891, dtype: bool"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Survived'] == 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>McCarthy\\t Mr. Timothy J</td>\n",
       "      <td>male</td>\n",
       "      <td>54.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17463</td>\n",
       "      <td>51.8625</td>\n",
       "      <td>E46</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bonnell\\t Miss. Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113783</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>C103</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Sloper\\t Mr. William Thompson</td>\n",
       "      <td>male</td>\n",
       "      <td>28.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113788</td>\n",
       "      <td>35.5000</td>\n",
       "      <td>A6</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>872</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
       "      <td>female</td>\n",
       "      <td>47.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>11751</td>\n",
       "      <td>52.5542</td>\n",
       "      <td>D35</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>873</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Carlsson\\t Mr. Frans Olof</td>\n",
       "      <td>male</td>\n",
       "      <td>33.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>695</td>\n",
       "      <td>5.0000</td>\n",
       "      <td>B51 B53 B55</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>880</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
       "      <td>female</td>\n",
       "      <td>56.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>11767</td>\n",
       "      <td>83.1583</td>\n",
       "      <td>C50</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Graham\\t Miss. Margaret Edith</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>112053</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>B42</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Behr\\t Mr. Karl Howell</td>\n",
       "      <td>male</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>111369</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>C148</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>216 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "7                   0       1   \n",
       "12                  1       1   \n",
       "24                  1       1   \n",
       "...               ...     ...   \n",
       "872                 1       1   \n",
       "873                 0       1   \n",
       "880                 1       1   \n",
       "888                 1       1   \n",
       "890                 1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "7                                     McCarthy\\t Mr. Timothy J    male  54.0   \n",
       "12                                   Bonnell\\t Miss. Elizabeth  female  58.0   \n",
       "24                               Sloper\\t Mr. William Thompson    male  28.0   \n",
       "...                                                        ...     ...   ...   \n",
       "872          Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)  female  47.0   \n",
       "873                                  Carlsson\\t Mr. Frans Olof    male  33.0   \n",
       "880             Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)  female  56.0   \n",
       "888                              Graham\\t Miss. Margaret Edith  female  19.0   \n",
       "890                                     Behr\\t Mr. Karl Howell    male  26.0   \n",
       "\n",
       "             SibSp  Parch    Ticket     Fare        Cabin Embarked  \n",
       "PassengerId                                                         \n",
       "2                1      0  PC 17599  71.2833          C85        C  \n",
       "4                1      0    113803  53.1000         C123        S  \n",
       "7                0      0     17463  51.8625          E46        S  \n",
       "12               0      0    113783  26.5500         C103        S  \n",
       "24               0      0    113788  35.5000           A6        S  \n",
       "...            ...    ...       ...      ...          ...      ...  \n",
       "872              1      1     11751  52.5542          D35        S  \n",
       "873              0      0       695   5.0000  B51 B53 B55        S  \n",
       "880              0      1     11767  83.1583          C50        C  \n",
       "888              0      0    112053  30.0000          B42        S  \n",
       "890              0      0    111369  30.0000         C148        C  \n",
       "\n",
       "[216 rows x 11 columns]"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['Pclass'] == 1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Operatory\n",
    "\n",
    "* `&` - koniukcja (i)\n",
    "* `|` - alternatywa (lub)\n",
    "* `~` - negacja (nie)\n",
    "* `()` - jeżeli mamy kilka warunków to warto je uporządkować w nawiasy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bonnell\\t Miss. Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113783</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>C103</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17569</td>\n",
       "      <td>146.5208</td>\n",
       "      <td>B78</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
       "      <td>female</td>\n",
       "      <td>49.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17572</td>\n",
       "      <td>76.7292</td>\n",
       "      <td>D33</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>857</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Wick\\t Mrs. George Dennick (Mary Hitchcock)</td>\n",
       "      <td>female</td>\n",
       "      <td>45.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>36928</td>\n",
       "      <td>164.8667</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>863</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Swift\\t Mrs. Frederick Joel (Margaret Welles B...</td>\n",
       "      <td>female</td>\n",
       "      <td>48.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17466</td>\n",
       "      <td>25.9292</td>\n",
       "      <td>D17</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>872</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
       "      <td>female</td>\n",
       "      <td>47.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>11751</td>\n",
       "      <td>52.5542</td>\n",
       "      <td>D35</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>880</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
       "      <td>female</td>\n",
       "      <td>56.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>11767</td>\n",
       "      <td>83.1583</td>\n",
       "      <td>C50</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Graham\\t Miss. Margaret Edith</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>112053</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>B42</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>94 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "12                  1       1   \n",
       "32                  1       1   \n",
       "53                  1       1   \n",
       "...               ...     ...   \n",
       "857                 1       1   \n",
       "863                 1       1   \n",
       "872                 1       1   \n",
       "880                 1       1   \n",
       "888                 1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "12                                   Bonnell\\t Miss. Elizabeth  female  58.0   \n",
       "32             Spencer\\t Mrs. William Augustus (Marie Eugenie)  female   NaN   \n",
       "53                   Harper\\t Mrs. Henry Sleeper (Myna Haxtun)  female  49.0   \n",
       "...                                                        ...     ...   ...   \n",
       "857                Wick\\t Mrs. George Dennick (Mary Hitchcock)  female  45.0   \n",
       "863          Swift\\t Mrs. Frederick Joel (Margaret Welles B...  female  48.0   \n",
       "872          Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)  female  47.0   \n",
       "880             Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)  female  56.0   \n",
       "888                              Graham\\t Miss. Margaret Edith  female  19.0   \n",
       "\n",
       "             SibSp  Parch    Ticket      Fare Cabin Embarked  \n",
       "PassengerId                                                   \n",
       "2                1      0  PC 17599   71.2833   C85        C  \n",
       "4                1      0    113803   53.1000  C123        S  \n",
       "12               0      0    113783   26.5500  C103        S  \n",
       "32               1      0  PC 17569  146.5208   B78        C  \n",
       "53               1      0  PC 17572   76.7292   D33        C  \n",
       "...            ...    ...       ...       ...   ...      ...  \n",
       "857              1      1     36928  164.8667   NaN        S  \n",
       "863              0      0     17466   25.9292   D17        S  \n",
       "872              1      1     11751   52.5542   D35        S  \n",
       "880              0      1     11767   83.1583   C50        C  \n",
       "888              0      0    112053   30.0000   B42        S  \n",
       "\n",
       "[94 rows x 11 columns]"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pierwsza_klasa = df['Pclass'] == 1\n",
    "kobiety = df['Sex'] == 'female'\n",
    "\n",
    "df[pierwsza_klasa & kobiety]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Palsson\\t Master. Gosta Leonard</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>349909</td>\n",
       "      <td>21.0750</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>237736</td>\n",
       "      <td>30.0708</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>861</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Hansen\\t Mr. Claus Peter</td>\n",
       "      <td>male</td>\n",
       "      <td>41.0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>350026</td>\n",
       "      <td>14.1083</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>862</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>Giles\\t Mr. Frederick Edward</td>\n",
       "      <td>male</td>\n",
       "      <td>21.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>28134</td>\n",
       "      <td>11.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>864</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>CA. 2343</td>\n",
       "      <td>69.5500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>867</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Duran y More\\t Miss. Asuncion</td>\n",
       "      <td>female</td>\n",
       "      <td>27.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>SC/PARIS 2149</td>\n",
       "      <td>13.8583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>875</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
       "      <td>female</td>\n",
       "      <td>28.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>P/PP 3381</td>\n",
       "      <td>24.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>192 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "8                   0       3   \n",
       "10                  1       2   \n",
       "...               ...     ...   \n",
       "861                 0       3   \n",
       "862                 0       2   \n",
       "864                 0       3   \n",
       "867                 1       2   \n",
       "875                 1       2   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                     Braund\\t Mr. Owen Harris    male  22.0   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "8                              Palsson\\t Master. Gosta Leonard    male   2.0   \n",
       "10                        Nasser\\t Mrs. Nicholas (Adele Achem)  female  14.0   \n",
       "...                                                        ...     ...   ...   \n",
       "861                                   Hansen\\t Mr. Claus Peter    male  41.0   \n",
       "862                               Giles\\t Mr. Frederick Edward    male  21.0   \n",
       "864                         Sage\\t Miss. Dorothy Edith \"Dolly\"  female   NaN   \n",
       "867                              Duran y More\\t Miss. Asuncion  female  27.0   \n",
       "875                     Abelson\\t Mrs. Samuel (Hannah Wizosky)  female  28.0   \n",
       "\n",
       "             SibSp  Parch         Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                       \n",
       "1                1      0      A/5 21171   7.2500   NaN        S  \n",
       "2                1      0       PC 17599  71.2833   C85        C  \n",
       "4                1      0         113803  53.1000  C123        S  \n",
       "8                3      1         349909  21.0750   NaN        S  \n",
       "10               1      0         237736  30.0708   NaN        C  \n",
       "...            ...    ...            ...      ...   ...      ...  \n",
       "861              2      0         350026  14.1083   NaN        S  \n",
       "862              1      0          28134  11.5000   NaN        S  \n",
       "864              8      2       CA. 2343  69.5500   NaN        S  \n",
       "867              1      0  SC/PARIS 2149  13.8583   NaN        C  \n",
       "875              1      0      P/PP 3381  24.0000   NaN        C  \n",
       "\n",
       "[192 rows x 11 columns]"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "df[df['SibSp'] > df['Parch']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### `pd.DataFrame.query`\n",
    "\n",
    "Innym sposobem na filtrowanie danych jest metoda `query`, która jako argument przyjmuje wyrażenie:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>McCarthy\\t Mr. Timothy J</td>\n",
       "      <td>male</td>\n",
       "      <td>54.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17463</td>\n",
       "      <td>51.8625</td>\n",
       "      <td>E46</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bonnell\\t Miss. Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113783</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>C103</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Sloper\\t Mr. William Thompson</td>\n",
       "      <td>male</td>\n",
       "      <td>28.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113788</td>\n",
       "      <td>35.5000</td>\n",
       "      <td>A6</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "7                   0       1   \n",
       "12                  1       1   \n",
       "24                  1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "7                                     McCarthy\\t Mr. Timothy J    male  54.0   \n",
       "12                                   Bonnell\\t Miss. Elizabeth  female  58.0   \n",
       "24                               Sloper\\t Mr. William Thompson    male  28.0   \n",
       "\n",
       "             SibSp  Parch    Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                  \n",
       "2                1      0  PC 17599  71.2833   C85        C  \n",
       "4                1      0    113803  53.1000  C123        S  \n",
       "7                0      0     17463  51.8625   E46        S  \n",
       "12               0      0    113783  26.5500  C103        S  \n",
       "24               0      0    113788  35.5000    A6        S  "
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.query('Pclass == 1').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bonnell\\t Miss. Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113783</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>C103</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17569</td>\n",
       "      <td>146.5208</td>\n",
       "      <td>B78</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
       "      <td>female</td>\n",
       "      <td>49.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17572</td>\n",
       "      <td>76.7292</td>\n",
       "      <td>D33</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "12                  1       1   \n",
       "32                  1       1   \n",
       "53                  1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "12                                   Bonnell\\t Miss. Elizabeth  female  58.0   \n",
       "32             Spencer\\t Mrs. William Augustus (Marie Eugenie)  female   NaN   \n",
       "53                   Harper\\t Mrs. Henry Sleeper (Myna Haxtun)  female  49.0   \n",
       "\n",
       "             SibSp  Parch    Ticket      Fare Cabin Embarked  \n",
       "PassengerId                                                   \n",
       "2                1      0  PC 17599   71.2833   C85        C  \n",
       "4                1      0    113803   53.1000  C123        S  \n",
       "12               0      0    113783   26.5500  C103        S  \n",
       "32               1      0  PC 17569  146.5208   B78        C  \n",
       "53               1      0  PC 17572   76.7292   D33        C  "
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.query('(Pclass == 1) and (Sex == \"female\")').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Palsson\\t Master. Gosta Leonard</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>349909</td>\n",
       "      <td>21.0750</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>237736</td>\n",
       "      <td>30.0708</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>861</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Hansen\\t Mr. Claus Peter</td>\n",
       "      <td>male</td>\n",
       "      <td>41.0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>350026</td>\n",
       "      <td>14.1083</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>862</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>Giles\\t Mr. Frederick Edward</td>\n",
       "      <td>male</td>\n",
       "      <td>21.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>28134</td>\n",
       "      <td>11.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>864</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>CA. 2343</td>\n",
       "      <td>69.5500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>867</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Duran y More\\t Miss. Asuncion</td>\n",
       "      <td>female</td>\n",
       "      <td>27.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>SC/PARIS 2149</td>\n",
       "      <td>13.8583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>875</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
       "      <td>female</td>\n",
       "      <td>28.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>P/PP 3381</td>\n",
       "      <td>24.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>192 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "8                   0       3   \n",
       "10                  1       2   \n",
       "...               ...     ...   \n",
       "861                 0       3   \n",
       "862                 0       2   \n",
       "864                 0       3   \n",
       "867                 1       2   \n",
       "875                 1       2   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                     Braund\\t Mr. Owen Harris    male  22.0   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "8                              Palsson\\t Master. Gosta Leonard    male   2.0   \n",
       "10                        Nasser\\t Mrs. Nicholas (Adele Achem)  female  14.0   \n",
       "...                                                        ...     ...   ...   \n",
       "861                                   Hansen\\t Mr. Claus Peter    male  41.0   \n",
       "862                               Giles\\t Mr. Frederick Edward    male  21.0   \n",
       "864                         Sage\\t Miss. Dorothy Edith \"Dolly\"  female   NaN   \n",
       "867                              Duran y More\\t Miss. Asuncion  female  27.0   \n",
       "875                     Abelson\\t Mrs. Samuel (Hannah Wizosky)  female  28.0   \n",
       "\n",
       "             SibSp  Parch         Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                       \n",
       "1                1      0      A/5 21171   7.2500   NaN        S  \n",
       "2                1      0       PC 17599  71.2833   C85        C  \n",
       "4                1      0         113803  53.1000  C123        S  \n",
       "8                3      1         349909  21.0750   NaN        S  \n",
       "10               1      0         237736  30.0708   NaN        C  \n",
       "...            ...    ...            ...      ...   ...      ...  \n",
       "861              2      0         350026  14.1083   NaN        S  \n",
       "862              1      0          28134  11.5000   NaN        S  \n",
       "864              8      2       CA. 2343  69.5500   NaN        S  \n",
       "867              1      0  SC/PARIS 2149  13.8583   NaN        C  \n",
       "875              1      0      P/PP 3381  24.0000   NaN        C  \n",
       "\n",
       "[192 rows x 11 columns]"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.query('SibSp > Parch')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(113, 11)"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "young = 18\n",
    "df.query('Age < @young').shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Operacje na wierszach i kolumnach"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>female_BMI</th>\n",
       "      <th>male_BMI</th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>under5mortality</th>\n",
       "      <th>life_expectancy</th>\n",
       "      <th>fertility</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>21.07402</td>\n",
       "      <td>20.62058</td>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>110.4</td>\n",
       "      <td>52.8</td>\n",
       "      <td>6.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>25.65726</td>\n",
       "      <td>26.44657</td>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>17.9</td>\n",
       "      <td>76.8</td>\n",
       "      <td>1.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>26.36841</td>\n",
       "      <td>24.59620</td>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>29.5</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>23.48431</td>\n",
       "      <td>22.25083</td>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>56.7</td>\n",
       "      <td>6.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>27.50545</td>\n",
       "      <td>25.76602</td>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>10.9</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.16</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     female_BMI  male_BMI      gdp  population  \\\n",
       "Country                                                          \n",
       "Afghanistan            21.07402  20.62058   1311.0  26528741.0   \n",
       "Albania                25.65726  26.44657   8644.0   2968026.0   \n",
       "Algeria                26.36841  24.59620  12314.0  34811059.0   \n",
       "Angola                 23.48431  22.25083   7103.0  19842251.0   \n",
       "Antigua and Barbuda    27.50545  25.76602  25736.0     85350.0   \n",
       "\n",
       "                     under5mortality  life_expectancy  fertility  \n",
       "Country                                                           \n",
       "Afghanistan                    110.4             52.8       6.20  \n",
       "Albania                         17.9             76.8       1.76  \n",
       "Algeria                         29.5             75.5       2.73  \n",
       "Angola                         192.0             56.7       6.43  \n",
       "Antigua and Barbuda             10.9             75.5       2.16  "
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Iterowanie po ramce danych oznacza oznacza przejście po nazwach kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "female_BMI\n",
      "male_BMI\n",
      "gdp\n",
      "population\n",
      "under5mortality\n",
      "life_expectancy\n",
      "fertility\n"
     ]
    }
   ],
   "source": [
    "for column_name in df:\n",
    "    print(column_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "female_BMI Country\n",
      "Afghanistan            21.07402\n",
      "Albania                25.65726\n",
      "Algeria                26.36841\n",
      "Angola                 23.48431\n",
      "Antigua and Barbuda    27.50545\n",
      "Name: female_BMI, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "for col_name, series in df.items():\n",
    "    print(col_name, series)\n",
    "    break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Afghanistan \n",
      " female_BMI         2.107402e+01\n",
      "male_BMI           2.062058e+01\n",
      "gdp                1.311000e+03\n",
      "population         2.652874e+07\n",
      "under5mortality    1.104000e+02\n",
      "life_expectancy    5.280000e+01\n",
      "fertility          6.200000e+00\n",
      "Name: Afghanistan, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "for idx, row in df.iterrows():\n",
    "    print(idx, '\\n', row)\n",
    "    break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "Afghanistan                normal\n",
       "Albania                overweight\n",
       "Algeria                    normal\n",
       "Angola                     normal\n",
       "Antigua and Barbuda    overweight\n",
       "Name: male_BMI, dtype: object"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def bmi_level(bmi):\n",
    "    if bmi <= 18.5:\n",
    "        level =  'underweight'\n",
    "    elif bmi < 25:\n",
    "        level =  'normal'\n",
    "    elif bmi < 30:\n",
    "        level =  'overweight'\n",
    "    else:\n",
    "        level = 'obese'\n",
    "    return level\n",
    "\n",
    "s = df['male_BMI'].map(bmi_level)\n",
    "    \n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "Afghanistan                normal\n",
       "Albania                overweight\n",
       "Algeria                    normal\n",
       "Angola                     normal\n",
       "Antigua and Barbuda    overweight\n",
       "dtype: object"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def bmi_level(row_data):\n",
    "    bmi = row_data['male_BMI']\n",
    "    if bmi <= 18.5:\n",
    "        return 'underweight'\n",
    "    elif bmi < 25:\n",
    "        return 'normal'\n",
    "    elif bmi < 30:\n",
    "        return 'overweight'\n",
    "    return  'obese'\n",
    "\n",
    "df.apply(bmi_level, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Country</th>\n",
       "      <th>Afghanistan</th>\n",
       "      <th>Albania</th>\n",
       "      <th>Algeria</th>\n",
       "      <th>Angola</th>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female_BMI</th>\n",
       "      <td>2.107402e+01</td>\n",
       "      <td>2.565726e+01</td>\n",
       "      <td>2.636841e+01</td>\n",
       "      <td>2.348431e+01</td>\n",
       "      <td>27.50545</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male_BMI</th>\n",
       "      <td>2.062058e+01</td>\n",
       "      <td>2.644657e+01</td>\n",
       "      <td>2.459620e+01</td>\n",
       "      <td>2.225083e+01</td>\n",
       "      <td>25.76602</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gdp</th>\n",
       "      <td>1.311000e+03</td>\n",
       "      <td>8.644000e+03</td>\n",
       "      <td>1.231400e+04</td>\n",
       "      <td>7.103000e+03</td>\n",
       "      <td>25736.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>population</th>\n",
       "      <td>2.652874e+07</td>\n",
       "      <td>2.968026e+06</td>\n",
       "      <td>3.481106e+07</td>\n",
       "      <td>1.984225e+07</td>\n",
       "      <td>85350.00000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>under5mortality</th>\n",
       "      <td>1.104000e+02</td>\n",
       "      <td>1.790000e+01</td>\n",
       "      <td>2.950000e+01</td>\n",
       "      <td>1.920000e+02</td>\n",
       "      <td>10.90000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>life_expectancy</th>\n",
       "      <td>5.280000e+01</td>\n",
       "      <td>7.680000e+01</td>\n",
       "      <td>7.550000e+01</td>\n",
       "      <td>5.670000e+01</td>\n",
       "      <td>75.50000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fertility</th>\n",
       "      <td>6.200000e+00</td>\n",
       "      <td>1.760000e+00</td>\n",
       "      <td>2.730000e+00</td>\n",
       "      <td>6.430000e+00</td>\n",
       "      <td>2.16000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Country           Afghanistan       Albania       Algeria        Angola  \\\n",
       "female_BMI       2.107402e+01  2.565726e+01  2.636841e+01  2.348431e+01   \n",
       "male_BMI         2.062058e+01  2.644657e+01  2.459620e+01  2.225083e+01   \n",
       "gdp              1.311000e+03  8.644000e+03  1.231400e+04  7.103000e+03   \n",
       "population       2.652874e+07  2.968026e+06  3.481106e+07  1.984225e+07   \n",
       "under5mortality  1.104000e+02  1.790000e+01  2.950000e+01  1.920000e+02   \n",
       "life_expectancy  5.280000e+01  7.680000e+01  7.550000e+01  5.670000e+01   \n",
       "fertility        6.200000e+00  1.760000e+00  2.730000e+00  6.430000e+00   \n",
       "\n",
       "Country          Antigua and Barbuda  \n",
       "female_BMI                  27.50545  \n",
       "male_BMI                    25.76602  \n",
       "gdp                      25736.00000  \n",
       "population               85350.00000  \n",
       "under5mortality             10.90000  \n",
       "life_expectancy             75.50000  \n",
       "fertility                    2.16000  "
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.transpose()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grupowanie (`groupby`)\n",
    "\n",
    "Często zdarza się, gdy potrzebujemy podzielić dane ze względu na wartości w zadanej kolumnie, a następnie obliczenie zebranie danych w każdej z grup. Do tego służy metody `groupby`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Team</th>\n",
       "      <th>Number</th>\n",
       "      <th>Position</th>\n",
       "      <th>Age</th>\n",
       "      <th>Height</th>\n",
       "      <th>Weight</th>\n",
       "      <th>College</th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>401</th>\n",
       "      <td>Tyus Jones</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "      <td>1.0</td>\n",
       "      <td>PG</td>\n",
       "      <td>20.0</td>\n",
       "      <td>6-2</td>\n",
       "      <td>195.0</td>\n",
       "      <td>Duke</td>\n",
       "      <td>1282080.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>342</th>\n",
       "      <td>Gerald Green</td>\n",
       "      <td>Miami Heat</td>\n",
       "      <td>14.0</td>\n",
       "      <td>SF</td>\n",
       "      <td>30.0</td>\n",
       "      <td>6-7</td>\n",
       "      <td>205.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>947276.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>143</th>\n",
       "      <td>DeMarcus Cousins</td>\n",
       "      <td>Sacramento Kings</td>\n",
       "      <td>15.0</td>\n",
       "      <td>C</td>\n",
       "      <td>25.0</td>\n",
       "      <td>6-11</td>\n",
       "      <td>270.0</td>\n",
       "      <td>Kentucky</td>\n",
       "      <td>15851950.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>267</th>\n",
       "      <td>P.J. Hairston</td>\n",
       "      <td>Memphis Grizzlies</td>\n",
       "      <td>19.0</td>\n",
       "      <td>SF</td>\n",
       "      <td>23.0</td>\n",
       "      <td>6-6</td>\n",
       "      <td>230.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>1201440.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>335</th>\n",
       "      <td>Jeremy Lin</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "      <td>7.0</td>\n",
       "      <td>PG</td>\n",
       "      <td>27.0</td>\n",
       "      <td>6-3</td>\n",
       "      <td>200.0</td>\n",
       "      <td>Harvard</td>\n",
       "      <td>2139000.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 Name                    Team  Number Position   Age Height  \\\n",
       "401        Tyus Jones  Minnesota Timberwolves     1.0       PG  20.0    6-2   \n",
       "342      Gerald Green              Miami Heat    14.0       SF  30.0    6-7   \n",
       "143  DeMarcus Cousins        Sacramento Kings    15.0        C  25.0   6-11   \n",
       "267     P.J. Hairston       Memphis Grizzlies    19.0       SF  23.0    6-6   \n",
       "335        Jeremy Lin       Charlotte Hornets     7.0       PG  27.0    6-3   \n",
       "\n",
       "     Weight         College      Salary  \n",
       "401   195.0            Duke   1282080.0  \n",
       "342   205.0             NaN    947276.0  \n",
       "143   270.0        Kentucky  15851950.0  \n",
       "267   230.0  North Carolina   1201440.0  \n",
       "335   200.0         Harvard   2139000.0  "
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv('./nba.csv')\n",
    "\n",
    "df.sample(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "_Przykład_: chcemy obliczyć średnią wypłatę dla każdej z drużyn."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Team</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Atlanta Hawks</th>\n",
       "      <td>4.860197e+06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boston Celtics</th>\n",
       "      <td>4.181505e+06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Brooklyn Nets</th>\n",
       "      <td>3.501898e+06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Charlotte Hornets</th>\n",
       "      <td>5.222728e+06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chicago Bulls</th>\n",
       "      <td>5.785559e+06</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         Salary\n",
       "Team                           \n",
       "Atlanta Hawks      4.860197e+06\n",
       "Boston Celtics     4.181505e+06\n",
       "Brooklyn Nets      3.501898e+06\n",
       "Charlotte Hornets  5.222728e+06\n",
       "Chicago Bulls      5.785559e+06"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Team', 'Salary']].groupby('Team').mean().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Możemy też podać listę nazw kolumn. Wtedy wartości zostaną obliczone dla każdej z wytworzonych grup:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Team                Position\n",
       "Atlanta Hawks       C           7.585417e+06\n",
       "                    PF          5.988067e+06\n",
       "                    PG          4.881700e+06\n",
       "                    SF          3.000000e+06\n",
       "                    SG          2.607758e+06\n",
       "                                    ...     \n",
       "Washington Wizards  C           8.163476e+06\n",
       "                    PF          5.650000e+06\n",
       "                    PG          9.011208e+06\n",
       "                    SF          2.789700e+06\n",
       "                    SG          2.839248e+06\n",
       "Name: Salary, Length: 149, dtype: float64"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['Team', 'Position'])['Salary'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    " * `sum()`\n",
    " * `min()`\n",
    " * `max()`\n",
    " * `mean()`\n",
    " * `size()`\n",
    " * `describe()`\n",
    " * `first()`\n",
    " * `last()`\n",
    " * `count()`\n",
    " * `std()`\n",
    " * `var()`\n",
    " * `sem()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">Salary</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Position</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>5.967052e+06</td>\n",
       "      <td>5.787989e+06</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PF</th>\n",
       "      <td>4.562483e+06</td>\n",
       "      <td>4.800054e+06</td>\n",
       "      <td>97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PG</th>\n",
       "      <td>5.077829e+06</td>\n",
       "      <td>5.051809e+06</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SF</th>\n",
       "      <td>4.857393e+06</td>\n",
       "      <td>6.011889e+06</td>\n",
       "      <td>84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SG</th>\n",
       "      <td>4.009861e+06</td>\n",
       "      <td>4.491609e+06</td>\n",
       "      <td>99</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                Salary                    \n",
       "                  mean           std count\n",
       "Position                                  \n",
       "C         5.967052e+06  5.787989e+06    78\n",
       "PF        4.562483e+06  4.800054e+06    97\n",
       "PG        5.077829e+06  5.051809e+06    88\n",
       "SF        4.857393e+06  6.011889e+06    84\n",
       "SG        4.009861e+06  4.491609e+06    99"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Position', 'Salary']].groupby('Position').agg(['mean', 'std', 'count'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Position</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>22275967.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PF</th>\n",
       "      <td>22081286.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PG</th>\n",
       "      <td>21412973.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SF</th>\n",
       "      <td>24969112.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SG</th>\n",
       "      <td>19944278.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Salary\n",
       "Position            \n",
       "C         22275967.0\n",
       "PF        22081286.0\n",
       "PG        21412973.0\n",
       "SF        24969112.0\n",
       "SG        19944278.0"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def group_range(x):\n",
    "    return x.max() - x.min()\n",
    "\n",
    "df[['Position', 'Salary']].groupby('Position').apply(group_range)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Liczba grup: 5\n",
      "dict_keys(['C', 'PF', 'PG', 'SF', 'SG'])\n",
      "               Name            Team  Number Position   Age Height  Weight  \\\n",
      "7      Kelly Olynyk  Boston Celtics    41.0        C  25.0    7-0   238.0   \n",
      "10  Jared Sullinger  Boston Celtics     7.0        C  24.0    6-9   260.0   \n",
      "14     Tyler Zeller  Boston Celtics    44.0        C  26.0    7-0   253.0   \n",
      "23      Brook Lopez   Brooklyn Nets    11.0        C  28.0    7-0   275.0   \n",
      "27       Henry Sims   Brooklyn Nets    14.0        C  26.0   6-10   248.0   \n",
      "\n",
      "           College      Salary  \n",
      "7          Gonzaga   2165160.0  \n",
      "10      Ohio State   2569260.0  \n",
      "14  North Carolina   2616975.0  \n",
      "23        Stanford  19689000.0  \n",
      "27      Georgetown    947276.0  \n"
     ]
    }
   ],
   "source": [
    "gb = df.groupby(['Position'])\n",
    "\n",
    "print('Liczba grup:', gb.ngroups)\n",
    "print(gb.groups.keys())\n",
    "\n",
    "print(gb.get_group('C').head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      15.36\n",
       "1      15.36\n",
       "2      15.36\n",
       "3      15.36\n",
       "4      15.36\n",
       "       ...  \n",
       "453    15.36\n",
       "454    15.36\n",
       "455    17.92\n",
       "456    17.92\n",
       "457     <NA>\n",
       "Name: Height, Length: 458, dtype: Float64"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "df.Height.str.split('-').str[0].astype('Int64') * 2.56"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Pivot\n",
    "Metoda `pivot` pozwala na stworzenie nowej ramki danych, gdzie indeks i nazwy kolumn są wartościami początkowej ranki danych. \n",
    "\n",
    "_Przykład_: zobaczmy na poniższą ramkę danych, która zawiera informacje o jakości tłumaczenia dla pary językowej hausa-angielski. Kolumna `system` zawiera nazwę systemu, kolumna `metric` - nazwę metryki, zaś kolumna `score`- wartość metryki. Chcemy przedstawić te dane w następujący sposób: jako klucz chcemy mieć nazwę systemu, zaś jako kolumny -  metryki. Możemy wykorzystać do tego metodę `pivot`, gdzie musimy podać 3 argumenty:\n",
    " * `index`: nazwę kolumny, na podstawie której zostanie stworzony indeks;\n",
    " * `columns`: nazwa kolumny, które zawiera nazwy kolumn dla nowej ramki danych;\n",
    " * `values`: nazwa kolumny, która zawiera interesujące nas dane."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pair</th>\n",
       "      <th>system</th>\n",
       "      <th>id</th>\n",
       "      <th>is_constrained</th>\n",
       "      <th>metric</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1214</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>NiuTrans</td>\n",
       "      <td>382</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>16.512243</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1215</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>NiuTrans</td>\n",
       "      <td>382</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>44.724766</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1216</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>NiuTrans</td>\n",
       "      <td>382</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>16.512243</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1217</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>NiuTrans</td>\n",
       "      <td>382</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>44.724766</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1218</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Facebook-AI</td>\n",
       "      <td>181</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>20.982704</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1219</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Facebook-AI</td>\n",
       "      <td>181</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>48.653770</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1220</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Facebook-AI</td>\n",
       "      <td>181</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>20.982704</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1221</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Facebook-AI</td>\n",
       "      <td>181</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>48.653770</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1222</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>TRANSSION</td>\n",
       "      <td>336</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>18.834851</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1223</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>TRANSSION</td>\n",
       "      <td>336</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>47.238279</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1224</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>TRANSSION</td>\n",
       "      <td>336</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>18.834851</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1225</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>TRANSSION</td>\n",
       "      <td>336</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>47.238279</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1226</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>AMU</td>\n",
       "      <td>628</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>14.132845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1227</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>AMU</td>\n",
       "      <td>628</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>41.256570</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1228</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>AMU</td>\n",
       "      <td>628</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>14.132845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1229</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>AMU</td>\n",
       "      <td>628</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>41.256570</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1230</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>P3AI</td>\n",
       "      <td>715</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>17.793617</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1231</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>P3AI</td>\n",
       "      <td>715</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>46.307402</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1232</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>P3AI</td>\n",
       "      <td>715</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>17.793617</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1233</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>P3AI</td>\n",
       "      <td>715</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>46.307402</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1234</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Online-B</td>\n",
       "      <td>1356</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>18.655658</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1235</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Online-B</td>\n",
       "      <td>1356</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>46.658216</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1236</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Online-B</td>\n",
       "      <td>1356</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>18.655658</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1237</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Online-B</td>\n",
       "      <td>1356</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>46.658216</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1238</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>TWB</td>\n",
       "      <td>1335</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>12.326443</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1239</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>TWB</td>\n",
       "      <td>1335</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>40.282629</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1240</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>TWB</td>\n",
       "      <td>1335</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>12.326443</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1241</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>TWB</td>\n",
       "      <td>1335</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>40.282629</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1242</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>ZMT</td>\n",
       "      <td>553</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>18.837023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1243</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>ZMT</td>\n",
       "      <td>553</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>47.231474</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1244</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>ZMT</td>\n",
       "      <td>553</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>18.837023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1245</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>ZMT</td>\n",
       "      <td>553</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>47.231474</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1246</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Manifold</td>\n",
       "      <td>437</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>16.943915</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1247</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Manifold</td>\n",
       "      <td>437</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>45.638356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1248</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Manifold</td>\n",
       "      <td>437</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>16.943915</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1249</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Manifold</td>\n",
       "      <td>437</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>45.638356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1250</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Online-Y</td>\n",
       "      <td>1374</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>13.898531</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1251</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Online-Y</td>\n",
       "      <td>1374</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>44.842874</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1252</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Online-Y</td>\n",
       "      <td>1374</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>13.898531</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1253</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>Online-Y</td>\n",
       "      <td>1374</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>44.842874</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1254</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>HuaweiTSC</td>\n",
       "      <td>758</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>17.492440</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1255</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>HuaweiTSC</td>\n",
       "      <td>758</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>46.795737</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1256</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>HuaweiTSC</td>\n",
       "      <td>758</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>17.492440</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1257</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>HuaweiTSC</td>\n",
       "      <td>758</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>46.795737</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1258</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>MS-EgDC</td>\n",
       "      <td>896</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>17.133350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1259</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>MS-EgDC</td>\n",
       "      <td>896</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>45.266274</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1260</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>MS-EgDC</td>\n",
       "      <td>896</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>17.133350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1261</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>MS-EgDC</td>\n",
       "      <td>896</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>45.266274</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1262</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>GTCOM</td>\n",
       "      <td>1298</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>17.794272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1263</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>GTCOM</td>\n",
       "      <td>1298</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>46.714831</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1264</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>GTCOM</td>\n",
       "      <td>1298</td>\n",
       "      <td>False</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>17.794272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1265</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>GTCOM</td>\n",
       "      <td>1298</td>\n",
       "      <td>False</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>46.714831</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1266</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>UEdin</td>\n",
       "      <td>1149</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-all</td>\n",
       "      <td>14.887836</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1267</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>UEdin</td>\n",
       "      <td>1149</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-all</td>\n",
       "      <td>42.247415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1268</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>UEdin</td>\n",
       "      <td>1149</td>\n",
       "      <td>True</td>\n",
       "      <td>bleu-A</td>\n",
       "      <td>14.887836</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1269</th>\n",
       "      <td>ha-en</td>\n",
       "      <td>UEdin</td>\n",
       "      <td>1149</td>\n",
       "      <td>True</td>\n",
       "      <td>chrf-A</td>\n",
       "      <td>42.247415</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       pair       system    id  is_constrained    metric      score\n",
       "1214  ha-en     NiuTrans   382            True  bleu-all  16.512243\n",
       "1215  ha-en     NiuTrans   382            True  chrf-all  44.724766\n",
       "1216  ha-en     NiuTrans   382            True    bleu-A  16.512243\n",
       "1217  ha-en     NiuTrans   382            True    chrf-A  44.724766\n",
       "1218  ha-en  Facebook-AI   181           False  bleu-all  20.982704\n",
       "1219  ha-en  Facebook-AI   181           False  chrf-all  48.653770\n",
       "1220  ha-en  Facebook-AI   181           False    bleu-A  20.982704\n",
       "1221  ha-en  Facebook-AI   181           False    chrf-A  48.653770\n",
       "1222  ha-en    TRANSSION   336           False  bleu-all  18.834851\n",
       "1223  ha-en    TRANSSION   336           False  chrf-all  47.238279\n",
       "1224  ha-en    TRANSSION   336           False    bleu-A  18.834851\n",
       "1225  ha-en    TRANSSION   336           False    chrf-A  47.238279\n",
       "1226  ha-en          AMU   628            True  bleu-all  14.132845\n",
       "1227  ha-en          AMU   628            True  chrf-all  41.256570\n",
       "1228  ha-en          AMU   628            True    bleu-A  14.132845\n",
       "1229  ha-en          AMU   628            True    chrf-A  41.256570\n",
       "1230  ha-en         P3AI   715            True  bleu-all  17.793617\n",
       "1231  ha-en         P3AI   715            True  chrf-all  46.307402\n",
       "1232  ha-en         P3AI   715            True    bleu-A  17.793617\n",
       "1233  ha-en         P3AI   715            True    chrf-A  46.307402\n",
       "1234  ha-en     Online-B  1356           False  bleu-all  18.655658\n",
       "1235  ha-en     Online-B  1356           False  chrf-all  46.658216\n",
       "1236  ha-en     Online-B  1356           False    bleu-A  18.655658\n",
       "1237  ha-en     Online-B  1356           False    chrf-A  46.658216\n",
       "1238  ha-en          TWB  1335           False  bleu-all  12.326443\n",
       "1239  ha-en          TWB  1335           False  chrf-all  40.282629\n",
       "1240  ha-en          TWB  1335           False    bleu-A  12.326443\n",
       "1241  ha-en          TWB  1335           False    chrf-A  40.282629\n",
       "1242  ha-en          ZMT   553           False  bleu-all  18.837023\n",
       "1243  ha-en          ZMT   553           False  chrf-all  47.231474\n",
       "1244  ha-en          ZMT   553           False    bleu-A  18.837023\n",
       "1245  ha-en          ZMT   553           False    chrf-A  47.231474\n",
       "1246  ha-en     Manifold   437            True  bleu-all  16.943915\n",
       "1247  ha-en     Manifold   437            True  chrf-all  45.638356\n",
       "1248  ha-en     Manifold   437            True    bleu-A  16.943915\n",
       "1249  ha-en     Manifold   437            True    chrf-A  45.638356\n",
       "1250  ha-en     Online-Y  1374           False  bleu-all  13.898531\n",
       "1251  ha-en     Online-Y  1374           False  chrf-all  44.842874\n",
       "1252  ha-en     Online-Y  1374           False    bleu-A  13.898531\n",
       "1253  ha-en     Online-Y  1374           False    chrf-A  44.842874\n",
       "1254  ha-en    HuaweiTSC   758            True  bleu-all  17.492440\n",
       "1255  ha-en    HuaweiTSC   758            True  chrf-all  46.795737\n",
       "1256  ha-en    HuaweiTSC   758            True    bleu-A  17.492440\n",
       "1257  ha-en    HuaweiTSC   758            True    chrf-A  46.795737\n",
       "1258  ha-en      MS-EgDC   896            True  bleu-all  17.133350\n",
       "1259  ha-en      MS-EgDC   896            True  chrf-all  45.266274\n",
       "1260  ha-en      MS-EgDC   896            True    bleu-A  17.133350\n",
       "1261  ha-en      MS-EgDC   896            True    chrf-A  45.266274\n",
       "1262  ha-en        GTCOM  1298           False  bleu-all  17.794272\n",
       "1263  ha-en        GTCOM  1298           False  chrf-all  46.714831\n",
       "1264  ha-en        GTCOM  1298           False    bleu-A  17.794272\n",
       "1265  ha-en        GTCOM  1298           False    chrf-A  46.714831\n",
       "1266  ha-en        UEdin  1149            True  bleu-all  14.887836\n",
       "1267  ha-en        UEdin  1149            True  chrf-all  42.247415\n",
       "1268  ha-en        UEdin  1149            True    bleu-A  14.887836\n",
       "1269  ha-en        UEdin  1149            True    chrf-A  42.247415"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('https://raw.githubusercontent.com/wmt-conference/wmt21-news-systems/main/scores/automatic-scores.tsv', sep='\\t')\n",
    "df = df[df.pair == 'ha-en']\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>metric</th>\n",
       "      <th>bleu-A</th>\n",
       "      <th>bleu-all</th>\n",
       "      <th>chrf-A</th>\n",
       "      <th>chrf-all</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>system</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AMU</th>\n",
       "      <td>14.132845</td>\n",
       "      <td>14.132845</td>\n",
       "      <td>41.256570</td>\n",
       "      <td>41.256570</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Facebook-AI</th>\n",
       "      <td>20.982704</td>\n",
       "      <td>20.982704</td>\n",
       "      <td>48.653770</td>\n",
       "      <td>48.653770</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>GTCOM</th>\n",
       "      <td>17.794272</td>\n",
       "      <td>17.794272</td>\n",
       "      <td>46.714831</td>\n",
       "      <td>46.714831</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>HuaweiTSC</th>\n",
       "      <td>17.492440</td>\n",
       "      <td>17.492440</td>\n",
       "      <td>46.795737</td>\n",
       "      <td>46.795737</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MS-EgDC</th>\n",
       "      <td>17.133350</td>\n",
       "      <td>17.133350</td>\n",
       "      <td>45.266274</td>\n",
       "      <td>45.266274</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Manifold</th>\n",
       "      <td>16.943915</td>\n",
       "      <td>16.943915</td>\n",
       "      <td>45.638356</td>\n",
       "      <td>45.638356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NiuTrans</th>\n",
       "      <td>16.512243</td>\n",
       "      <td>16.512243</td>\n",
       "      <td>44.724766</td>\n",
       "      <td>44.724766</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Online-B</th>\n",
       "      <td>18.655658</td>\n",
       "      <td>18.655658</td>\n",
       "      <td>46.658216</td>\n",
       "      <td>46.658216</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Online-Y</th>\n",
       "      <td>13.898531</td>\n",
       "      <td>13.898531</td>\n",
       "      <td>44.842874</td>\n",
       "      <td>44.842874</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>P3AI</th>\n",
       "      <td>17.793617</td>\n",
       "      <td>17.793617</td>\n",
       "      <td>46.307402</td>\n",
       "      <td>46.307402</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>TRANSSION</th>\n",
       "      <td>18.834851</td>\n",
       "      <td>18.834851</td>\n",
       "      <td>47.238279</td>\n",
       "      <td>47.238279</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>TWB</th>\n",
       "      <td>12.326443</td>\n",
       "      <td>12.326443</td>\n",
       "      <td>40.282629</td>\n",
       "      <td>40.282629</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>UEdin</th>\n",
       "      <td>14.887836</td>\n",
       "      <td>14.887836</td>\n",
       "      <td>42.247415</td>\n",
       "      <td>42.247415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZMT</th>\n",
       "      <td>18.837023</td>\n",
       "      <td>18.837023</td>\n",
       "      <td>47.231474</td>\n",
       "      <td>47.231474</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "metric          bleu-A   bleu-all     chrf-A   chrf-all\n",
       "system                                                 \n",
       "AMU          14.132845  14.132845  41.256570  41.256570\n",
       "Facebook-AI  20.982704  20.982704  48.653770  48.653770\n",
       "GTCOM        17.794272  17.794272  46.714831  46.714831\n",
       "HuaweiTSC    17.492440  17.492440  46.795737  46.795737\n",
       "MS-EgDC      17.133350  17.133350  45.266274  45.266274\n",
       "Manifold     16.943915  16.943915  45.638356  45.638356\n",
       "NiuTrans     16.512243  16.512243  44.724766  44.724766\n",
       "Online-B     18.655658  18.655658  46.658216  46.658216\n",
       "Online-Y     13.898531  13.898531  44.842874  44.842874\n",
       "P3AI         17.793617  17.793617  46.307402  46.307402\n",
       "TRANSSION    18.834851  18.834851  47.238279  47.238279\n",
       "TWB          12.326443  12.326443  40.282629  40.282629\n",
       "UEdin        14.887836  14.887836  42.247415  42.247415\n",
       "ZMT          18.837023  18.837023  47.231474  47.231474"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot(index='system', columns='metric', values='score')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Dane tekstowe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "`pandas` posiada udogodnienia do pracy z wartościami tekstowymi:\n",
    " * dostęp następuje przez atrybut `str`;\n",
    " * funkcje:\n",
    "    * formatujące: `lower()`, `upper()`;\n",
    "    * wyrażenia regularne: `contains()`, `match()`;\n",
    "    * inne: `split()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen\\t Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen\\t Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "3                   1       3   \n",
       "4                   1       1   \n",
       "5                   0       3   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                     Braund\\t Mr. Owen Harris    male  22.0   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "3                                      Heikkinen\\t Miss. Laina  female  26.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "5                                    Allen\\t Mr. William Henry    male  35.0   \n",
       "\n",
       "             SibSp  Parch            Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                          \n",
       "1                1      0         A/5 21171   7.2500   NaN        S  \n",
       "2                1      0          PC 17599  71.2833   C85        C  \n",
       "3                0      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "4                1      0            113803  53.1000  C123        S  \n",
       "5                0      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1                               BRAUND\\t MR. OWEN HARRIS\n",
       "2      CUMINGS\\t MRS. JOHN BRADLEY (FLORENCE BRIGGS T...\n",
       "3                                HEIKKINEN\\t MISS. LAINA\n",
       "4          FUTRELLE\\t MRS. JACQUES HEATH (LILY MAY PEEL)\n",
       "5                              ALLEN\\t MR. WILLIAM HENRY\n",
       "                             ...                        \n",
       "887                               MONTVILA\\t REV. JUOZAS\n",
       "888                        GRAHAM\\t MISS. MARGARET EDITH\n",
       "889            JOHNSTON\\t MISS. CATHERINE HELEN \"CARRIE\"\n",
       "890                               BEHR\\t MR. KARL HOWELL\n",
       "891                                 DOOLEY\\t MR. PATRICK\n",
       "Name: Name, Length: 891, dtype: object"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.Name.str.upper()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PassengerId\n",
      "1                             Braund\\t Mr. Owen Harris\n",
      "2    Cumings\\t Mrs. John Bradley (Florence Briggs T...\n",
      "3                              Heikkinen\\t Miss. Laina\n",
      "4        Futrelle\\t Mrs. Jacques Heath (Lily May Peel)\n",
      "5                            Allen\\t Mr. William Henry\n",
      "Name: Name, dtype: object\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1    False\n",
       "2     True\n",
       "3     True\n",
       "4     True\n",
       "5    False\n",
       "Name: Name, dtype: bool"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(df.Name.head())\n",
    "df.Name.str.contains('Miss|Mrs').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Braund</td>\n",
       "      <td>Mr. Owen Harris</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Cumings</td>\n",
       "      <td>Mrs. John Bradley (Florence Briggs Thayer)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Heikkinen</td>\n",
       "      <td>Miss. Laina</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Futrelle</td>\n",
       "      <td>Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Allen</td>\n",
       "      <td>Mr. William Henry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>887</th>\n",
       "      <td>Montvila</td>\n",
       "      <td>Rev. Juozas</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>Graham</td>\n",
       "      <td>Miss. Margaret Edith</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>889</th>\n",
       "      <td>Johnston</td>\n",
       "      <td>Miss. Catherine Helen \"Carrie\"</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>Behr</td>\n",
       "      <td>Mr. Karl Howell</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>891</th>\n",
       "      <td>Dooley</td>\n",
       "      <td>Mr. Patrick</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>891 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                     0                                            1\n",
       "PassengerId                                                        \n",
       "1               Braund                              Mr. Owen Harris\n",
       "2              Cumings   Mrs. John Bradley (Florence Briggs Thayer)\n",
       "3            Heikkinen                                  Miss. Laina\n",
       "4             Futrelle           Mrs. Jacques Heath (Lily May Peel)\n",
       "5                Allen                            Mr. William Henry\n",
       "...                ...                                          ...\n",
       "887           Montvila                                  Rev. Juozas\n",
       "888             Graham                         Miss. Margaret Edith\n",
       "889           Johnston               Miss. Catherine Helen \"Carrie\"\n",
       "890               Behr                              Mr. Karl Howell\n",
       "891             Dooley                                  Mr. Patrick\n",
       "\n",
       "[891 rows x 2 columns]"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.Name.str.split('\\t', expand=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1                             [Braund,  Mr. Owen Harris]\n",
       "2      [Cumings,  Mrs. John Bradley (Florence Briggs ...\n",
       "3                              [Heikkinen,  Miss. Laina]\n",
       "4        [Futrelle,  Mrs. Jacques Heath (Lily May Peel)]\n",
       "5                            [Allen,  Mr. William Henry]\n",
       "                             ...                        \n",
       "887                             [Montvila,  Rev. Juozas]\n",
       "888                      [Graham,  Miss. Margaret Edith]\n",
       "889          [Johnston,  Miss. Catherine Helen \"Carrie\"]\n",
       "890                             [Behr,  Mr. Karl Howell]\n",
       "891                               [Dooley,  Mr. Patrick]\n",
       "Name: Name, Length: 891, dtype: object"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.Name.str.split('\\t')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1                                  Mr. Owen Harris\n",
       "2       Mrs. John Bradley (Florence Briggs Thayer)\n",
       "3                                      Miss. Laina\n",
       "4               Mrs. Jacques Heath (Lily May Peel)\n",
       "5                                Mr. William Henry\n",
       "                          ...                     \n",
       "887                                    Rev. Juozas\n",
       "888                           Miss. Margaret Edith\n",
       "889                 Miss. Catherine Helen \"Carrie\"\n",
       "890                                Mr. Karl Howell\n",
       "891                                    Mr. Patrick\n",
       "Name: Name, Length: 891, dtype: object"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.Name.str.split('\\t').str[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1        Mr.\n",
       "2       Mrs.\n",
       "3      Miss.\n",
       "4       Mrs.\n",
       "5        Mr.\n",
       "       ...  \n",
       "887     Rev.\n",
       "888    Miss.\n",
       "889    Miss.\n",
       "890      Mr.\n",
       "891      Mr.\n",
       "Name: Name, Length: 891, dtype: object"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.Name.str.split('\\t').str[1].str.strip().str.split(' ').str[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie\n",
    "Zestaw `nba.csv` zawiera informaję o wysokości zawodników. Oblicz wzrost każdego z zawodników w systemie metrycznym przyjmując, że stop to `30.48` cm., a cal to `2.54` cm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "interpreter": {
   "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}