{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#  Analiza Danych w Pythonie: `pandas`\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### `pandas`\n",
    "Biblioteka `pandas` jest podstawowym narzędziem w ekosystemie Pythona do analizy danych:\n",
    " * dostarcza dwa podstawowe typy danych: \n",
    "   * `Series` (szereg, 1D)\n",
    "   * `DataFrame` (ramka danych, 2D)\n",
    " * operacje na tych obiektach: obsługa brakujących wartości, łączenie danych;\n",
    " * obsługuje dane różnego typu, np. szeregi czasowe;\n",
    " * biblioteka bazuje na `numpy` -- bibliotece do obliczeń numerycznych;\n",
    " * pozwala też na prostą wizualizację danych;\n",
    " * ETL: extract, transform, load."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Żeby zaimportowąc bibliotekę `pandas` wystarczy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### __Zadanie 0__: sprawdź, czy masz zainstalowaną bibliotekę `pandas`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### [Szeregi](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) (`pd.Series`)\n",
    "\n",
    " Szereg reprezentuje jednorodne dane jednowymiarowe - jest odpowiednikiem wektora w R.\n",
    "  * Szeregi możemy tworzyć na różne sposoby (więcej za chwilę), np. z obiektów tj. listy i słowniki.\n",
    "  * Dane muszą być jednorodne. W przeciwnym przypadku nastąpi automatyczna konwersja.\n",
    "  * Podczas tworzenia szeregu musimy podać jeden obowiązkowy argument `data` - dane.\n",
    "  * Ponadto możemy podać też indeks (`index`), typ danych (`dtype`) lub nazwę (`name`).\n",
    "  \n",
    "  \n",
    "  ```\n",
    "  class pandas.Series(data=None, index=None, dtype=None, name=None)\n",
    "  ```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Podczas tworzenie szeregu mozemy podać dane w formacie listy lub słownika.\n",
    "\n",
    "Poniżej jest przykład przedstawiający tworzenie szeregu z danych, które są zawarte w liście:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    211819\n",
       "1    682758\n",
       "2    737011\n",
       "3    779511\n",
       "4    673790\n",
       "5    673790\n",
       "6    444177\n",
       "7    136791\n",
       "dtype: int64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "data = [211819, 682758, 737011, 779511, 673790, 673790, 444177, 136791]\n",
    "\n",
    "s = pd.Series(data)\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "W przypadku, gdy dane pochodzą z listy i nie podaliśmy indeksu, pandas doda automatyczny indeks liczbowy zaczynający się od 0."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "W przypadku przekazania słownika jako danych do szeregu, pandas wykorzysta klucze do stworzenia indeksu:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "April    211819\n",
       "May      682758\n",
       "June     737011\n",
       "July     779511\n",
       "dtype: int64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
    "\n",
    "s = pd.Series(members)\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Podczas tworzenia szeregu możemy zdefiniować indeks, jak i nazwę szeregu:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "April    211819.0\n",
       "May      682758.0\n",
       "June     737011.0\n",
       "July     779511.0\n",
       "Name: Rides, dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "months = ['April', 'May', 'June', 'July']\n",
    "\n",
    "data = [211819, 682758, 737011, 779511]\n",
    "\n",
    "s = pd.Series(data=data, index=months, dtype=float, name='Rides')\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Odwołanie się do poszczególnego elementu odbywa się przy pomocy klucza z indeksu."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "211819\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "April     211819\n",
       "May       682758\n",
       "June      737011\n",
       "July      779511\n",
       "August    673790\n",
       "dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
    "\n",
    "s = pd.Series(members)\n",
    "\n",
    "print(s['April'])\n",
    "\n",
    "s['August'] = 673790\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Dodanie elementu do szeregu odbywa się poprzez definiowanie nowego klucza:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "April     211819\n",
       "May       682758\n",
       "June      737011\n",
       "July      779511\n",
       "August    673790\n",
       "dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
    "\n",
    "s = pd.Series(members)\n",
    "\n",
    "s['August'] = 673790\n",
    "\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Więcej nt. indeksowania w szeregach w dalszej części kursu."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Podstawowa cechą szeregu jest wykonywanie operacji w sposób wektorowy. Działa to w następujący sposób:\n",
    " * gdy w obu szeregach jest zawarty ten sam klucz, to są sumowane ich wartości;\n",
    " * w przeciwnym przypadku wartość klucza w wynikowym szeregu to `pd.NaN`.  \n",
    " * Równoważnie możemy wykorzystać metodę `pandas.Series.add`. W tym przypadku możemy podać domyślną wartość w przypadku braku klucza."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "August       880599.0\n",
       "July         973827.0\n",
       "June         908505.0\n",
       "May          830656.0\n",
       "October           NaN\n",
       "September    814282.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011,  'August': 673790, 'July': 779511,\n",
    "'September': 673790, 'October': 444177})\n",
    "\n",
    "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
    "'September': 140492})\n",
    "\n",
    "all_data = members + occasionals\n",
    "# Równoważnie\n",
    "all_data = members.add(occasionals)\n",
    "all_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Możemy wykonać operacje arytmetyczne na szeregu: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "May          683758\n",
       "June         738011\n",
       "July         780511\n",
       "August       674790\n",
       "September    674790\n",
       "October      445177\n",
       "dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
    "'September': 673790, 'October': 444177})\n",
    "\n",
    "members += 1000\n",
    "\n",
    "members"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "May          683758\n",
       "June         738011\n",
       "July         780511\n",
       "August       674790\n",
       "September    674790\n",
       "October      445177\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "May          10000000000683758\n",
       "June         10000000000738011\n",
       "July         10000000000780511\n",
       "August       10000000000674790\n",
       "September    10000000000674790\n",
       "October      10000000000445177\n",
       "dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members + 10000000000000000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Podsumowanie\n",
    " * Szeregi działają podobnie do słowników, z tą różnicą, że wartości muszą być jednorodne (tego samego typu).\n",
    " * Odwołanie do poszczególnych elementów odbywa się poprzez nawiasy `[]` i podanie klucza.\n",
    " * W przeciwieństwie do słowników, możemy w prosty sposób wykonywać operacje arytmetyczne."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie 1\n",
    " * Stwórz szereg `n`, który będzie zawierać liczby od 0 do 10 (włącznie).\n",
    " * Stwórz szereg `n2`, który będzie zawierać kwadraty liczb od 0 do 10 (włącznie).\n",
    " * Następnie stwórz szereg `trojkatne`, który będzie sumą powyższych szeregów podzieloną przez 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "n= list(range(10+1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = pd.Series(n)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      0\n",
       "1      1\n",
       "2      2\n",
       "3      3\n",
       "4      4\n",
       "5      5\n",
       "6      6\n",
       "7      7\n",
       "8      8\n",
       "9      9\n",
       "10    10\n",
       "dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "n2 = n**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       0\n",
       "1       1\n",
       "2       4\n",
       "3       9\n",
       "4      16\n",
       "5      25\n",
       "6      36\n",
       "7      49\n",
       "8      64\n",
       "9      81\n",
       "10    100\n",
       "dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "n2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "trojkatne = ( n + n2 ) / 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      0.0\n",
       "1      1.0\n",
       "2      3.0\n",
       "3      6.0\n",
       "4     10.0\n",
       "5     15.0\n",
       "6     21.0\n",
       "7     28.0\n",
       "8     36.0\n",
       "9     45.0\n",
       "10    55.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trojkatne"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### [Ramka danych](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) (`pd.DataFrame`)\n",
    "\n",
    "Ramka danych jest podstawową strukturą danych w bibliotece `pandas`, która pozwala na trzymanie i reprezentowanie danych tabelarycznych (dwuwymiarowych).\n",
    " * Posiada kolumny (cechy) i wiersze (obserwacje, przykłady).\n",
    " * Możemy też patrzeć na nią jak na słownik, którego wartościami są szeregi.\n",
    "\n",
    "```\n",
    "class pandas.DataFrame(data=None, index=None, columns=None, dtype=None)\n",
    "```\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Ramkę danych możemy stworzyć na różne sposoby.\n",
    "\n",
    "Pierwszy z nich (\"kolumnowy\") polega na zdefiniowaniu ramki poprzez podanie szeregów jako kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "May     682758\n",
       "June    737011\n",
       "July    779511\n",
       "dtype: int64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "May     147898\n",
       "June    171494\n",
       "July    194316\n",
       "dtype: int64"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "occasionals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511.0</td>\n",
       "      <td>194316.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011.0</td>\n",
       "      <td>171494.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>NaN</td>\n",
       "      <td>147898.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Maydfdsgfdg</th>\n",
       "      <td>682758.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              members  occasionals\n",
       "July         779511.0     194316.0\n",
       "June         737011.0     171494.0\n",
       "May               NaN     147898.0\n",
       "Maydfdsgfdg  682758.0          NaN"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = pd.Series({'Maydfdsgfdg': 682758, 'June': 737011, 'July': 779511})\n",
    "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
    "\n",
    "df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Drugim popularnym sposobem jest przekazanie listy słowników. Wtedy `pandas` zinterpretuje to jako listę przykładów:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   members  occasionals\n",
       "0   682758       147898\n",
       "1   737011       171494\n",
       "2   779511       194316"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = [\n",
    "    {'members': 682758, 'occasionals': 147898},\n",
    "    {'occasionals': 171494,'members': 737011},\n",
    "    {'members': 779511, 'occasionals': 194316},\n",
    "]\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Możemy też wykorzystać metodę `from_dict` ([doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html)), która pozwala zdefiniować czy podane dane są w podane w postaci kolumnowej lub wierszowej:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "index\n",
      "       members  occasionals\n",
      "May    682758       147898\n",
      "June   737011       171494\n",
      "July   779511       194316\n",
      "\n",
      "columns\n",
      "                 May    June    July\n",
      "members      682758  737011  779511\n",
      "occasionals  147898  171494  194316\n"
     ]
    }
   ],
   "source": [
    "data = {\n",
    "    'May': {'members': 682758, 'occasionals': 147898},\n",
    "    'June': {'members': 737011, 'occasionals': 171494},\n",
    "    'July': {'members': 779511, 'occasionals': 194316}\n",
    "}\n",
    "\n",
    "df = pd.DataFrame.from_dict(data, orient='index')\n",
    "print('index\\n', df)\n",
    "print()\n",
    "df = pd.DataFrame.from_dict(data, orient='columns')\n",
    "print('columns\\n', df)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Wczytywanie danych"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Biblioteka `pandas` pozwala na wczytanie i zapis danych z różnych formatów:\n",
    " * formaty tekstowe, np. `csv`, `json`\n",
    " * pliki arkuszy kalkulacyjnych: Excel (xls, xlsx)\n",
    " * bazy danych\n",
    " * inne: `sas` `spss`\n",
    "\n",
    "\n",
    "Efektem wczytania danych jest odpowiednio stworzona ramka danych (`DataFrame`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Jednym z najprostszych formatów danych jest format `csv`, gdzie kolejne wartości są rozdzielone przecinkiem.\n",
    "\n",
    "Żeby wczytać dane w takim formacie należy użyć funkcji `pandas.read_csv`.\n",
    "\n",
    "Pandas pozwala na ustawienie wielu parametrów (np. separator, cudzysłowy). Więcej na ten temat w [dokumentacji](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>female_BMI</th>\n",
       "      <th>male_BMI</th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>under5mortality</th>\n",
       "      <th>life_expectancy</th>\n",
       "      <th>fertility</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>21.07402</td>\n",
       "      <td>20.62058</td>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>110.4</td>\n",
       "      <td>52.8</td>\n",
       "      <td>6.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Albania</td>\n",
       "      <td>25.65726</td>\n",
       "      <td>26.44657</td>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>17.9</td>\n",
       "      <td>76.8</td>\n",
       "      <td>1.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Algeria</td>\n",
       "      <td>26.36841</td>\n",
       "      <td>24.59620</td>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>29.5</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Angola</td>\n",
       "      <td>23.48431</td>\n",
       "      <td>22.25083</td>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>192.0</td>\n",
       "      <td>56.7</td>\n",
       "      <td>6.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Antigua and Barbuda</td>\n",
       "      <td>27.50545</td>\n",
       "      <td>25.76602</td>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>10.9</td>\n",
       "      <td>75.5</td>\n",
       "      <td>2.16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>170</th>\n",
       "      <td>Venezuela</td>\n",
       "      <td>28.13408</td>\n",
       "      <td>27.44500</td>\n",
       "      <td>17911.0</td>\n",
       "      <td>28116716.0</td>\n",
       "      <td>17.1</td>\n",
       "      <td>74.2</td>\n",
       "      <td>2.53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>171</th>\n",
       "      <td>Vietnam</td>\n",
       "      <td>21.06500</td>\n",
       "      <td>20.91630</td>\n",
       "      <td>4085.0</td>\n",
       "      <td>86589342.0</td>\n",
       "      <td>26.2</td>\n",
       "      <td>74.1</td>\n",
       "      <td>1.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172</th>\n",
       "      <td>Palestine</td>\n",
       "      <td>29.02643</td>\n",
       "      <td>26.57750</td>\n",
       "      <td>3564.0</td>\n",
       "      <td>3854667.0</td>\n",
       "      <td>24.7</td>\n",
       "      <td>74.1</td>\n",
       "      <td>4.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173</th>\n",
       "      <td>Zambia</td>\n",
       "      <td>23.05436</td>\n",
       "      <td>20.68321</td>\n",
       "      <td>3039.0</td>\n",
       "      <td>13114579.0</td>\n",
       "      <td>94.9</td>\n",
       "      <td>51.1</td>\n",
       "      <td>5.88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>174</th>\n",
       "      <td>Zimbabwe</td>\n",
       "      <td>24.64522</td>\n",
       "      <td>22.02660</td>\n",
       "      <td>1286.0</td>\n",
       "      <td>13495462.0</td>\n",
       "      <td>98.3</td>\n",
       "      <td>47.3</td>\n",
       "      <td>3.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>175 rows × 8 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                 Country  female_BMI  male_BMI      gdp  population  \\\n",
       "0            Afghanistan    21.07402  20.62058   1311.0  26528741.0   \n",
       "1                Albania    25.65726  26.44657   8644.0   2968026.0   \n",
       "2                Algeria    26.36841  24.59620  12314.0  34811059.0   \n",
       "3                 Angola    23.48431  22.25083   7103.0  19842251.0   \n",
       "4    Antigua and Barbuda    27.50545  25.76602  25736.0     85350.0   \n",
       "..                   ...         ...       ...      ...         ...   \n",
       "170            Venezuela    28.13408  27.44500  17911.0  28116716.0   \n",
       "171              Vietnam    21.06500  20.91630   4085.0  86589342.0   \n",
       "172            Palestine    29.02643  26.57750   3564.0   3854667.0   \n",
       "173               Zambia    23.05436  20.68321   3039.0  13114579.0   \n",
       "174             Zimbabwe    24.64522  22.02660   1286.0  13495462.0   \n",
       "\n",
       "     under5mortality  life_expectancy  fertility  \n",
       "0              110.4             52.8       6.20  \n",
       "1               17.9             76.8       1.76  \n",
       "2               29.5             75.5       2.73  \n",
       "3              192.0             56.7       6.43  \n",
       "4               10.9             75.5       2.16  \n",
       "..               ...              ...        ...  \n",
       "170             17.1             74.2       2.53  \n",
       "171             26.2             74.1       1.86  \n",
       "172             24.7             74.1       4.38  \n",
       "173             94.9             51.1       5.88  \n",
       "174             98.3             47.3       3.85  \n",
       "\n",
       "[175 rows x 8 columns]"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('gapminder.csv')\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen\\t Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen\\t Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "3                   1       3   \n",
       "4                   1       1   \n",
       "5                   0       3   \n",
       "\n",
       "                                                          Name     Sex  Age  \\\n",
       "PassengerId                                                                   \n",
       "1                                     Braund\\t Mr. Owen Harris    male   22   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female   38   \n",
       "3                                      Heikkinen\\t Miss. Laina  female   26   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female   35   \n",
       "5                                    Allen\\t Mr. William Henry    male   35   \n",
       "\n",
       "             SibSp  Parch            Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                          \n",
       "1                1      0         A/5 21171   7.2500   NaN        S  \n",
       "2                1      0          PC 17599  71.2833   C85        C  \n",
       "3                0      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "4                1      0            113803  53.1000  C123        S  \n",
       "5                0      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_train.tsv', delimiter='\\t', index_col=0, nrows=5)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Do wczytania danych z arkusza kalkulacyjnego służy funkcja `pandas.read_excel`. Do otworzenia pliku `xlsx` może być koniecnze ustawienie parametru: `engine='openpyxl`. Więcej opcji w [dokumentacji](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_excel('./bikes.xlsx', engine='openpyxl', nrows=5)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Innym ważnym źródłem informacji są bazy danych. Pandas potrafi komunikować się z bazą danych za pomocą biblioteki [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) i dostarcza odpowiedną funkcję:\n",
    " * `pandas.read_sql` - wczytanie całej tabeli lub zapytania do bazy danych"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>ArtistId</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AlbumId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>For Those About To Rock We Salute You</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Balls to the Wall</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Restless and Wild</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Let There Be Rock</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Big Ones</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>343</th>\n",
       "      <td>Respighi:Pines of Rome</td>\n",
       "      <td>226</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>344</th>\n",
       "      <td>Schubert: The Late String Quartets &amp; String Qu...</td>\n",
       "      <td>272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345</th>\n",
       "      <td>Monteverdi: L'Orfeo</td>\n",
       "      <td>273</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346</th>\n",
       "      <td>Mozart: Chamber Music</td>\n",
       "      <td>274</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>347</th>\n",
       "      <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
       "      <td>275</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>347 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                     Title  ArtistId\n",
       "AlbumId                                                             \n",
       "1                    For Those About To Rock We Salute You         1\n",
       "2                                        Balls to the Wall         2\n",
       "3                                        Restless and Wild         2\n",
       "4                                        Let There Be Rock         1\n",
       "5                                                 Big Ones         3\n",
       "...                                                    ...       ...\n",
       "343                                 Respighi:Pines of Rome       226\n",
       "344      Schubert: The Late String Quartets & String Qu...       272\n",
       "345                                    Monteverdi: L'Orfeo       273\n",
       "346                                  Mozart: Chamber Music       274\n",
       "347      Koyaanisqatsi (Soundtrack from the Motion Pict...       275\n",
       "\n",
       "[347 rows x 2 columns]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_sql('Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>ArtistId</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AlbumId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>For Those About To Rock We Salute You</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Balls to the Wall</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Restless and Wild</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Let There Be Rock</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Big Ones</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>343</th>\n",
       "      <td>Respighi:Pines of Rome</td>\n",
       "      <td>226</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>344</th>\n",
       "      <td>Schubert: The Late String Quartets &amp; String Qu...</td>\n",
       "      <td>272</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>345</th>\n",
       "      <td>Monteverdi: L'Orfeo</td>\n",
       "      <td>273</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>346</th>\n",
       "      <td>Mozart: Chamber Music</td>\n",
       "      <td>274</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>347</th>\n",
       "      <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
       "      <td>275</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>347 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                     Title  ArtistId\n",
       "AlbumId                                                             \n",
       "1                    For Those About To Rock We Salute You         1\n",
       "2                                        Balls to the Wall         2\n",
       "3                                        Restless and Wild         2\n",
       "4                                        Let There Be Rock         1\n",
       "5                                                 Big Ones         3\n",
       "...                                                    ...       ...\n",
       "343                                 Respighi:Pines of Rome       226\n",
       "344      Schubert: The Late String Quartets & String Qu...       272\n",
       "345                                    Monteverdi: L'Orfeo       273\n",
       "346                                  Mozart: Chamber Music       274\n",
       "347      Koyaanisqatsi (Soundtrack from the Motion Pict...       275\n",
       "\n",
       "[347 rows x 2 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import sqlalchemy\n",
    "\n",
    "engine = sqlalchemy.create_engine('sqlite:///Chinook.sqlite', echo=True)\n",
    "connection  = engine.raw_connection()\n",
    "\n",
    "df = pd.read_sql('SELECT * FROM Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Podsumowanie\n",
    "\n",
    "\n",
    " * Biblioteka `pandas` wspiera pobieranie danych z różnych formatów i źródeł.\n",
    " * Każda funkcja ma listę argumentów, które pozwalają na ustawić poszczególne  parametry (np. [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv))."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zapis i eksport danych"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Pandas pozwala w prosty sposób na zapisywanie ramki danych do pliku. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
    "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
    "\n",
    "df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# zapis do formatu CSV\n",
    "df.to_csv('tmp.csv')\n",
    "# zapis do arkusza kalkulacyjnego \n",
    "df.to_excel('tmp.xlsx')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Ponadto możemy przekonwertować ramkę danych do JSONa lub Pythonowego słownika:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\"members\":{\"May\":682758,\"June\":737011,\"July\":779511},\"occasionals\":{\"May\":147898,\"June\":171494,\"July\":194316}}\n"
     ]
    }
   ],
   "source": [
    "print(df.to_json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'members': {'May': 682758, 'June': 737011, 'July': 779511}, 'occasionals': {'May': 147898, 'June': 171494, 'July': 194316}}\n"
     ]
    }
   ],
   "source": [
    "print(df.to_dict())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Lub przekopiować dane do schowka:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.to_clipboard()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie\n",
    "\n",
    "\n",
    "\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    " * Przekonwertuj tabele `Customer` z bazy `Chinook.sqlite` do arkusza kalkulacyjnego. Plik wynikowy nazwij `customers.xlsx`.\n",
    " * Tabela `Employee` zawiera informacje o pracownikach firmy Chinook. Wyswietl dane na ekranie i podaj miasta, w których mieszkają pracownicy.\n",
    " * Tabela `Invoice` zawiera informacje o fakturach. Przekonwertuj kolumnę `BillingCountry` do pythonowego słownika, a następnie podaj najcześciej występującą wartość. Ile razy pojawiła się?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Ramka danych - podstawy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "#### Kolumny\n",
    "\n",
    "Na ramkę danych możemy patrzeć jak na swego rodzaju słownik, którego wartościami są szeregi. Pozwoli to na uzyskanie lepszej intuicji.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>life_expectancy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>52.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>14646.0</td>\n",
       "      <td>40381860.0</td>\n",
       "      <td>75.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Armenia</th>\n",
       "      <td>7383.0</td>\n",
       "      <td>2975029.0</td>\n",
       "      <td>72.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>41312.0</td>\n",
       "      <td>21370348.0</td>\n",
       "      <td>81.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         gdp  population  life_expectancy\n",
       "Country                                                  \n",
       "Afghanistan           1311.0  26528741.0             52.8\n",
       "Albania               8644.0   2968026.0             76.8\n",
       "Algeria              12314.0  34811059.0             75.5\n",
       "Angola                7103.0  19842251.0             56.7\n",
       "Antigua and Barbuda  25736.0     85350.0             75.5\n",
       "Argentina            14646.0  40381860.0             75.4\n",
       "Armenia               7383.0   2975029.0             72.3\n",
       "Australia            41312.0  21370348.0             81.6"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=8, usecols=['Country', 'gdp', 'population','life_expectancy'])\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Dostęp do poszczególnej kolumny możemy uzystać na dwa sposoby:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "Afghanistan            26528741.0\n",
       "Albania                 2968026.0\n",
       "Algeria                34811059.0\n",
       "Angola                 19842251.0\n",
       "Antigua and Barbuda       85350.0\n",
       "Argentina              40381860.0\n",
       "Armenia                 2975029.0\n",
       "Australia              21370348.0\n",
       "Name: population, dtype: float64"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# notacja z kropką\n",
    "df.population"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Country\n",
       "Afghanistan            26528741.0\n",
       "Albania                 2968026.0\n",
       "Algeria                34811059.0\n",
       "Angola                 19842251.0\n",
       "Antigua and Barbuda       85350.0\n",
       "Argentina              40381860.0\n",
       "Armenia                 2975029.0\n",
       "Australia              21370348.0\n",
       "Name: population, dtype: float64"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Operator []\n",
    "df['population']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Do operatora `[]` możemy też podać listę nazw kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>14646.0</td>\n",
       "      <td>40381860.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Armenia</th>\n",
       "      <td>7383.0</td>\n",
       "      <td>2975029.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>41312.0</td>\n",
       "      <td>21370348.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         gdp  population\n",
       "Country                                 \n",
       "Afghanistan           1311.0  26528741.0\n",
       "Albania               8644.0   2968026.0\n",
       "Algeria              12314.0  34811059.0\n",
       "Angola                7103.0  19842251.0\n",
       "Antigua and Barbuda  25736.0     85350.0\n",
       "Argentina            14646.0  40381860.0\n",
       "Armenia               7383.0   2975029.0\n",
       "Australia            41312.0  21370348.0"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['gdp','population']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Listę kolumn możemy pobrać za pomocą:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['gdp', 'population', 'life_expectancy'], dtype='object')"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gdp</th>\n",
       "      <th>population</th>\n",
       "      <th>life_expectancy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>52.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>14646.0</td>\n",
       "      <td>40381860.0</td>\n",
       "      <td>75.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Armenia</th>\n",
       "      <td>7383.0</td>\n",
       "      <td>2975029.0</td>\n",
       "      <td>72.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>41312.0</td>\n",
       "      <td>21370348.0</td>\n",
       "      <td>81.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         gdp  population  life_expectancy\n",
       "Country                                                  \n",
       "Afghanistan           1311.0  26528741.0             52.8\n",
       "Albania               8644.0   2968026.0             76.8\n",
       "Algeria              12314.0  34811059.0             75.5\n",
       "Angola                7103.0  19842251.0             56.7\n",
       "Antigua and Barbuda  25736.0     85350.0             75.5\n",
       "Argentina            14646.0  40381860.0             75.4\n",
       "Armenia               7383.0   2975029.0             72.3\n",
       "Australia            41312.0  21370348.0             81.6"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PKB</th>\n",
       "      <th>Populacja</th>\n",
       "      <th>ODŻ</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Afghanistan</th>\n",
       "      <td>1311.0</td>\n",
       "      <td>26528741.0</td>\n",
       "      <td>52.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Antigua and Barbuda</th>\n",
       "      <td>25736.0</td>\n",
       "      <td>85350.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Argentina</th>\n",
       "      <td>14646.0</td>\n",
       "      <td>40381860.0</td>\n",
       "      <td>75.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Armenia</th>\n",
       "      <td>7383.0</td>\n",
       "      <td>2975029.0</td>\n",
       "      <td>72.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Australia</th>\n",
       "      <td>41312.0</td>\n",
       "      <td>21370348.0</td>\n",
       "      <td>81.6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         PKB   Populacja   ODŻ\n",
       "Country                                       \n",
       "Afghanistan           1311.0  26528741.0  52.8\n",
       "Albania               8644.0   2968026.0  76.8\n",
       "Algeria              12314.0  34811059.0  75.5\n",
       "Angola                7103.0  19842251.0  56.7\n",
       "Antigua and Barbuda  25736.0     85350.0  75.5\n",
       "Argentina            14646.0  40381860.0  75.4\n",
       "Armenia               7383.0   2975029.0  72.3\n",
       "Australia            41312.0  21370348.0  81.6"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns = ['PKB', 'Populacja', 'ODŻ']\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Żeby odwołać się do poszczególnych wierszy należy wykorzystać metodę `loc`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PKB             14646.0\n",
       "Populacja    40381860.0\n",
       "ODŻ                75.4\n",
       "Name: Argentina, dtype: float64"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['Argentina']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Metoda `loc` również może przyjąć listę wierszy: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PKB</th>\n",
       "      <th>Populacja</th>\n",
       "      <th>ODŻ</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            PKB   Populacja   ODŻ\n",
       "Country                          \n",
       "Albania  8644.0   2968026.0  76.8\n",
       "Angola   7103.0  19842251.0  56.7"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[['Albania', 'Angola']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Możemy również podać drugi parametr: nazwy kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PKB</th>\n",
       "      <th>Populacja</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            PKB   Populacja\n",
       "Country                    \n",
       "Albania  8644.0   2968026.0\n",
       "Angola   7103.0  19842251.0"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = df.loc[['Albania', 'Angola'], ['PKB', 'Populacja']]\n",
    "\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Albo wykorzystać tzw. _slicing_, cyzli operator `:`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PKB</th>\n",
       "      <th>Populacja</th>\n",
       "      <th>ODŻ</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Country</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Albania</th>\n",
       "      <td>8644.0</td>\n",
       "      <td>2968026.0</td>\n",
       "      <td>76.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Algeria</th>\n",
       "      <td>12314.0</td>\n",
       "      <td>34811059.0</td>\n",
       "      <td>75.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Angola</th>\n",
       "      <td>7103.0</td>\n",
       "      <td>19842251.0</td>\n",
       "      <td>56.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             PKB   Populacja   ODŻ\n",
       "Country                           \n",
       "Albania   8644.0   2968026.0  76.8\n",
       "Algeria  12314.0  34811059.0  75.5\n",
       "Angola    7103.0  19842251.0  56.7"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['Albania': 'Angola', 'PKB': 'ODŻ']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Żeby odwołać się do pojedyńczej wartości możemy użyć metody `at`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.at['Angola', 'PKB']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Dostęp do indeksu:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Antigua and Barbuda',\n",
       "       'Argentina', 'Armenia', 'Australia'],\n",
       "      dtype='object', name='Country')"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Podstawowe metody `pd.Series` i `pd.DataFrame`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>August</th>\n",
       "      <td>673790</td>\n",
       "      <td>206809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>September</th>\n",
       "      <td>673790</td>\n",
       "      <td>140492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>October</th>\n",
       "      <td>444177</td>\n",
       "      <td>53596</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           members  occasionals\n",
       "May         682758       147898\n",
       "June        737011       171494\n",
       "July        779511       194316\n",
       "August      673790       206809\n",
       "September   673790       140492\n",
       "October     444177        53596"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
    "'September': 673790, 'October': 444177})\n",
    "\n",
    "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
    "'September': 140492, 'October': 53596})\n",
    "\n",
    "df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `head` pozwala tworzy nową ramkę danych z pierwszymi 5 przykładami:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      members  occasionals\n",
       "May    682758       147898\n",
       "June   737011       171494"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `tail` robi to samo, ale z 5 ostatnymi przykładami:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>August</th>\n",
       "      <td>673790</td>\n",
       "      <td>206809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>September</th>\n",
       "      <td>673790</td>\n",
       "      <td>140492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>October</th>\n",
       "      <td>444177</td>\n",
       "      <td>53596</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           members  occasionals\n",
       "June        737011       171494\n",
       "July        779511       194316\n",
       "August      673790       206809\n",
       "September   673790       140492\n",
       "October     444177        53596"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `sample` pozwala na stworzenie nowej ramki danych z wylosowanymi `n` przykładami:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>September</th>\n",
       "      <td>673790</td>\n",
       "      <td>140492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           members  occasionals\n",
       "September   673790       140492\n",
       "May         682758       147898\n",
       "June        737011       171494"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sample(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `describe` zwraca podstawowe statystyki m.in.: liczebność, średnią, wartości skrajne: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>August</th>\n",
       "      <td>673790</td>\n",
       "      <td>206809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>September</th>\n",
       "      <td>673790</td>\n",
       "      <td>140492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>October</th>\n",
       "      <td>444177</td>\n",
       "      <td>53596</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           members  occasionals\n",
       "May         682758       147898\n",
       "June        737011       171494\n",
       "July        779511       194316\n",
       "August      673790       206809\n",
       "September   673790       140492\n",
       "October     444177        53596"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>6.000000</td>\n",
       "      <td>6.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>665172.833333</td>\n",
       "      <td>152434.166667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>116216.045456</td>\n",
       "      <td>54783.506738</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>444177.000000</td>\n",
       "      <td>53596.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>673790.000000</td>\n",
       "      <td>142343.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>678274.000000</td>\n",
       "      <td>159696.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>723447.750000</td>\n",
       "      <td>188610.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>779511.000000</td>\n",
       "      <td>206809.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             members    occasionals\n",
       "count       6.000000       6.000000\n",
       "mean   665172.833333  152434.166667\n",
       "std    116216.045456   54783.506738\n",
       "min    444177.000000   53596.000000\n",
       "25%    673790.000000  142343.500000\n",
       "50%    678274.000000  159696.000000\n",
       "75%    723447.750000  188610.500000\n",
       "max    779511.000000  206809.000000"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metoda `info` zwraca informacje techniczne o kolumnach: np. typ danych:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Index: 6 entries, May to October\n",
      "Data columns (total 2 columns):\n",
      " #   Column       Non-Null Count  Dtype\n",
      "---  ------       --------------  -----\n",
      " 0   members      6 non-null      int64\n",
      " 1   occasionals  6 non-null      int64\n",
      "dtypes: int64(2)\n",
      "memory usage: 144.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Podstawową informacją o ramce danych to liczba przykładów w ramce danych. Możemy wykorzystać to tego funkcję `len`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Natomiast atrybut `shape` zwraca nam krotkę z liczbą przykładów i liczbą kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6, 2)"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Operacja arytmetyczne\n",
    "\n",
    " * `max`, `idxmax`\n",
    " * `min`, `idxmin`\n",
    " * `mean`\n",
    " * `count`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>members</th>\n",
       "      <th>occasionals</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>May</th>\n",
       "      <td>682758</td>\n",
       "      <td>147898</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>June</th>\n",
       "      <td>737011</td>\n",
       "      <td>171494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>July</th>\n",
       "      <td>779511</td>\n",
       "      <td>194316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>August</th>\n",
       "      <td>673790</td>\n",
       "      <td>206809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>September</th>\n",
       "      <td>673790</td>\n",
       "      <td>140492</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>October</th>\n",
       "      <td>444177</td>\n",
       "      <td>53596</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           members  occasionals\n",
       "May         682758       147898\n",
       "June        737011       171494\n",
       "July        779511       194316\n",
       "August      673790       206809\n",
       "September   673790       140492\n",
       "October     444177        53596"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Zbiór wartości i zliczanie wartości:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 3 2]\n",
      "3    4\n",
      "1    3\n",
      "2    3\n",
      "Name: count, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
    "\n",
    "print(dane.unique())\n",
    "\n",
    "dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
    "\n",
    "print(dane.value_counts())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    3\n",
       "2    2\n",
       "3    3\n",
       "4    1\n",
       "5    1\n",
       "6    2\n",
       "7    3\n",
       "8    2\n",
       "9    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dane"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3    4\n",
      "1    3\n",
      "2    3\n",
      "Name: count, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print(dane.value_counts())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Sprawdzanie czy brakuje danych:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1      False\n",
       "2      False\n",
       "3      False\n",
       "4      False\n",
       "5      False\n",
       "       ...  \n",
       "887    False\n",
       "888    False\n",
       "889     True\n",
       "890    False\n",
       "891    False\n",
       "Name: Age, Length: 891, dtype: bool"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
    "df.Age.isnull()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Dodawanie i modyfikowanie danych"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "conts = pd.Series({\n",
    "    'Afghanistan': 'Asia', 'Albania': 'Europe', 'Algeria':' Africa', 'Angola': 'Africa', 'Antigua and Barbuda': 'Americas'})\n",
    "\n",
    "df['continent'] = conts\n",
    "\n",
    "df['tmp'] = 1\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.loc['Argentina'] = {\n",
    "    'female_BMI': 27.46523,\n",
    "    'male_BMI': 27.5017,\n",
    "    'gdp': 14646.0,\n",
    "    'population': 40381860.0,\n",
    "    'under5mortality': 15.4,\n",
    "    'life_expectancy': 75.4,\n",
    "    'fertility': 2.24\n",
    "}\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.drop('gdp', axis='columns')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Filtrowanie danych"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Biblioteka pandas posiada 2 sposoby na filtrowanie danych zawartych w ramce danych:\n",
    " * operator `[]` -- najbardziej rozpowszechniony;\n",
    " * metoda `query()`.\n",
    "Oba sposoby mają różną składnię.\n",
    " "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen\\t Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen\\t Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "3                   1       3   \n",
       "4                   1       1   \n",
       "5                   0       3   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                     Braund\\t Mr. Owen Harris    male  22.0   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "3                                      Heikkinen\\t Miss. Laina  female  26.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "5                                    Allen\\t Mr. William Henry    male  35.0   \n",
       "\n",
       "             SibSp  Parch            Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                          \n",
       "1                1      0         A/5 21171   7.2500   NaN        S  \n",
       "2                1      0          PC 17599  71.2833   C85        C  \n",
       "3                0      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "4                1      0            113803  53.1000  C123        S  \n",
       "5                0      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1      0\n",
       "2      1\n",
       "3      1\n",
       "4      1\n",
       "5      0\n",
       "      ..\n",
       "887    0\n",
       "888    1\n",
       "889    0\n",
       "890    1\n",
       "891    0\n",
       "Name: Survived, Length: 891, dtype: int64"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Survived']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df['Survived']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId\n",
       "1      False\n",
       "2       True\n",
       "3       True\n",
       "4       True\n",
       "5      False\n",
       "       ...  \n",
       "887    False\n",
       "888     True\n",
       "889    False\n",
       "890     True\n",
       "891    False\n",
       "Name: Survived, Length: 891, dtype: bool"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Survived'] == 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df_survived = df[df['Pclass'] == 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "891"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "216"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df_survived)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>McCarthy\\t Mr. Timothy J</td>\n",
       "      <td>male</td>\n",
       "      <td>54.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17463</td>\n",
       "      <td>51.8625</td>\n",
       "      <td>E46</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bonnell\\t Miss. Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113783</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>C103</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Sloper\\t Mr. William Thompson</td>\n",
       "      <td>male</td>\n",
       "      <td>28.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113788</td>\n",
       "      <td>35.5000</td>\n",
       "      <td>A6</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>872</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
       "      <td>female</td>\n",
       "      <td>47.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>11751</td>\n",
       "      <td>52.5542</td>\n",
       "      <td>D35</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>873</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Carlsson\\t Mr. Frans Olof</td>\n",
       "      <td>male</td>\n",
       "      <td>33.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>695</td>\n",
       "      <td>5.0000</td>\n",
       "      <td>B51 B53 B55</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>880</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
       "      <td>female</td>\n",
       "      <td>56.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>11767</td>\n",
       "      <td>83.1583</td>\n",
       "      <td>C50</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Graham\\t Miss. Margaret Edith</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>112053</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>B42</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Behr\\t Mr. Karl Howell</td>\n",
       "      <td>male</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>111369</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>C148</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>216 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "7                   0       1   \n",
       "12                  1       1   \n",
       "24                  1       1   \n",
       "...               ...     ...   \n",
       "872                 1       1   \n",
       "873                 0       1   \n",
       "880                 1       1   \n",
       "888                 1       1   \n",
       "890                 1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "7                                     McCarthy\\t Mr. Timothy J    male  54.0   \n",
       "12                                   Bonnell\\t Miss. Elizabeth  female  58.0   \n",
       "24                               Sloper\\t Mr. William Thompson    male  28.0   \n",
       "...                                                        ...     ...   ...   \n",
       "872          Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)  female  47.0   \n",
       "873                                  Carlsson\\t Mr. Frans Olof    male  33.0   \n",
       "880             Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)  female  56.0   \n",
       "888                              Graham\\t Miss. Margaret Edith  female  19.0   \n",
       "890                                     Behr\\t Mr. Karl Howell    male  26.0   \n",
       "\n",
       "             SibSp  Parch    Ticket     Fare        Cabin Embarked  \n",
       "PassengerId                                                         \n",
       "2                1      0  PC 17599  71.2833          C85        C  \n",
       "4                1      0    113803  53.1000         C123        S  \n",
       "7                0      0     17463  51.8625          E46        S  \n",
       "12               0      0    113783  26.5500         C103        S  \n",
       "24               0      0    113788  35.5000           A6        S  \n",
       "...            ...    ...       ...      ...          ...      ...  \n",
       "872              1      1     11751  52.5542          D35        S  \n",
       "873              0      0       695   5.0000  B51 B53 B55        S  \n",
       "880              0      1     11767  83.1583          C50        C  \n",
       "888              0      0    112053  30.0000          B42        S  \n",
       "890              0      0    111369  30.0000         C148        C  \n",
       "\n",
       "[216 rows x 11 columns]"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_survived"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Operatory\n",
    "\n",
    "* `&` - koniukcja (i)\n",
    "* `|` - alternatywa (lub)\n",
    "* `~` - negacja (nie)\n",
    "* `()` - jeżeli mamy kilka warunków to warto je uporządkować w nawiasy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bonnell\\t Miss. Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113783</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>C103</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17569</td>\n",
       "      <td>146.5208</td>\n",
       "      <td>B78</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
       "      <td>female</td>\n",
       "      <td>49.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17572</td>\n",
       "      <td>76.7292</td>\n",
       "      <td>D33</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>857</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Wick\\t Mrs. George Dennick (Mary Hitchcock)</td>\n",
       "      <td>female</td>\n",
       "      <td>45.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>36928</td>\n",
       "      <td>164.8667</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>863</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Swift\\t Mrs. Frederick Joel (Margaret Welles B...</td>\n",
       "      <td>female</td>\n",
       "      <td>48.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17466</td>\n",
       "      <td>25.9292</td>\n",
       "      <td>D17</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>872</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
       "      <td>female</td>\n",
       "      <td>47.0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>11751</td>\n",
       "      <td>52.5542</td>\n",
       "      <td>D35</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>880</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
       "      <td>female</td>\n",
       "      <td>56.0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>11767</td>\n",
       "      <td>83.1583</td>\n",
       "      <td>C50</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Graham\\t Miss. Margaret Edith</td>\n",
       "      <td>female</td>\n",
       "      <td>19.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>112053</td>\n",
       "      <td>30.0000</td>\n",
       "      <td>B42</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>94 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "12                  1       1   \n",
       "32                  1       1   \n",
       "53                  1       1   \n",
       "...               ...     ...   \n",
       "857                 1       1   \n",
       "863                 1       1   \n",
       "872                 1       1   \n",
       "880                 1       1   \n",
       "888                 1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "12                                   Bonnell\\t Miss. Elizabeth  female  58.0   \n",
       "32             Spencer\\t Mrs. William Augustus (Marie Eugenie)  female   NaN   \n",
       "53                   Harper\\t Mrs. Henry Sleeper (Myna Haxtun)  female  49.0   \n",
       "...                                                        ...     ...   ...   \n",
       "857                Wick\\t Mrs. George Dennick (Mary Hitchcock)  female  45.0   \n",
       "863          Swift\\t Mrs. Frederick Joel (Margaret Welles B...  female  48.0   \n",
       "872          Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)  female  47.0   \n",
       "880             Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)  female  56.0   \n",
       "888                              Graham\\t Miss. Margaret Edith  female  19.0   \n",
       "\n",
       "             SibSp  Parch    Ticket      Fare Cabin Embarked  \n",
       "PassengerId                                                   \n",
       "2                1      0  PC 17599   71.2833   C85        C  \n",
       "4                1      0    113803   53.1000  C123        S  \n",
       "12               0      0    113783   26.5500  C103        S  \n",
       "32               1      0  PC 17569  146.5208   B78        C  \n",
       "53               1      0  PC 17572   76.7292   D33        C  \n",
       "...            ...    ...       ...       ...   ...      ...  \n",
       "857              1      1     36928  164.8667   NaN        S  \n",
       "863              0      0     17466   25.9292   D17        S  \n",
       "872              1      1     11751   52.5542   D35        S  \n",
       "880              0      1     11767   83.1583   C50        C  \n",
       "888              0      0    112053   30.0000   B42        S  \n",
       "\n",
       "[94 rows x 11 columns]"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pierwsza_klasa = df['Pclass'] == 1\n",
    "kobiety = df['Sex'] == 'female'\n",
    "\n",
    "df[pierwsza_klasa & kobiety]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Palsson\\t Master. Gosta Leonard</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>349909</td>\n",
       "      <td>21.0750</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>237736</td>\n",
       "      <td>30.0708</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>861</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Hansen\\t Mr. Claus Peter</td>\n",
       "      <td>male</td>\n",
       "      <td>41.0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>350026</td>\n",
       "      <td>14.1083</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>862</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>Giles\\t Mr. Frederick Edward</td>\n",
       "      <td>male</td>\n",
       "      <td>21.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>28134</td>\n",
       "      <td>11.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>864</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>CA. 2343</td>\n",
       "      <td>69.5500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>867</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Duran y More\\t Miss. Asuncion</td>\n",
       "      <td>female</td>\n",
       "      <td>27.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>SC/PARIS 2149</td>\n",
       "      <td>13.8583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>875</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
       "      <td>female</td>\n",
       "      <td>28.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>P/PP 3381</td>\n",
       "      <td>24.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>192 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "8                   0       3   \n",
       "10                  1       2   \n",
       "...               ...     ...   \n",
       "861                 0       3   \n",
       "862                 0       2   \n",
       "864                 0       3   \n",
       "867                 1       2   \n",
       "875                 1       2   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                     Braund\\t Mr. Owen Harris    male  22.0   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "8                              Palsson\\t Master. Gosta Leonard    male   2.0   \n",
       "10                        Nasser\\t Mrs. Nicholas (Adele Achem)  female  14.0   \n",
       "...                                                        ...     ...   ...   \n",
       "861                                   Hansen\\t Mr. Claus Peter    male  41.0   \n",
       "862                               Giles\\t Mr. Frederick Edward    male  21.0   \n",
       "864                         Sage\\t Miss. Dorothy Edith \"Dolly\"  female   NaN   \n",
       "867                              Duran y More\\t Miss. Asuncion  female  27.0   \n",
       "875                     Abelson\\t Mrs. Samuel (Hannah Wizosky)  female  28.0   \n",
       "\n",
       "             SibSp  Parch         Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                       \n",
       "1                1      0      A/5 21171   7.2500   NaN        S  \n",
       "2                1      0       PC 17599  71.2833   C85        C  \n",
       "4                1      0         113803  53.1000  C123        S  \n",
       "8                3      1         349909  21.0750   NaN        S  \n",
       "10               1      0         237736  30.0708   NaN        C  \n",
       "...            ...    ...            ...      ...   ...      ...  \n",
       "861              2      0         350026  14.1083   NaN        S  \n",
       "862              1      0          28134  11.5000   NaN        S  \n",
       "864              8      2       CA. 2343  69.5500   NaN        S  \n",
       "867              1      0  SC/PARIS 2149  13.8583   NaN        C  \n",
       "875              1      0      P/PP 3381  24.0000   NaN        C  \n",
       "\n",
       "[192 rows x 11 columns]"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['SibSp'] > df['Parch']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### `pd.DataFrame.query`\n",
    "\n",
    "Innym sposobem na filtrowanie danych jest metoda `query`, która jako argument przyjmuje wyrażenie:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>McCarthy\\t Mr. Timothy J</td>\n",
       "      <td>male</td>\n",
       "      <td>54.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17463</td>\n",
       "      <td>51.8625</td>\n",
       "      <td>E46</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bonnell\\t Miss. Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113783</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>C103</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Sloper\\t Mr. William Thompson</td>\n",
       "      <td>male</td>\n",
       "      <td>28.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113788</td>\n",
       "      <td>35.5000</td>\n",
       "      <td>A6</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "7                   0       1   \n",
       "12                  1       1   \n",
       "24                  1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "7                                     McCarthy\\t Mr. Timothy J    male  54.0   \n",
       "12                                   Bonnell\\t Miss. Elizabeth  female  58.0   \n",
       "24                               Sloper\\t Mr. William Thompson    male  28.0   \n",
       "\n",
       "             SibSp  Parch    Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                  \n",
       "2                1      0  PC 17599  71.2833   C85        C  \n",
       "4                1      0    113803  53.1000  C123        S  \n",
       "7                0      0     17463  51.8625   E46        S  \n",
       "12               0      0    113783  26.5500  C103        S  \n",
       "24               0      0    113788  35.5000    A6        S  "
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.query('Pclass == 1').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Bonnell\\t Miss. Elizabeth</td>\n",
       "      <td>female</td>\n",
       "      <td>58.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>113783</td>\n",
       "      <td>26.5500</td>\n",
       "      <td>C103</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17569</td>\n",
       "      <td>146.5208</td>\n",
       "      <td>B78</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>53</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
       "      <td>female</td>\n",
       "      <td>49.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17572</td>\n",
       "      <td>76.7292</td>\n",
       "      <td>D33</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "12                  1       1   \n",
       "32                  1       1   \n",
       "53                  1       1   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "12                                   Bonnell\\t Miss. Elizabeth  female  58.0   \n",
       "32             Spencer\\t Mrs. William Augustus (Marie Eugenie)  female   NaN   \n",
       "53                   Harper\\t Mrs. Henry Sleeper (Myna Haxtun)  female  49.0   \n",
       "\n",
       "             SibSp  Parch    Ticket      Fare Cabin Embarked  \n",
       "PassengerId                                                   \n",
       "2                1      0  PC 17599   71.2833   C85        C  \n",
       "4                1      0    113803   53.1000  C123        S  \n",
       "12               0      0    113783   26.5500  C103        S  \n",
       "32               1      0  PC 17569  146.5208   B78        C  \n",
       "53               1      0  PC 17572   76.7292   D33        C  "
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.query('(Pclass == 1) and (Sex == \"female\")').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PassengerId</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund\\t Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Palsson\\t Master. Gosta Leonard</td>\n",
       "      <td>male</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>349909</td>\n",
       "      <td>21.0750</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
       "      <td>female</td>\n",
       "      <td>14.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>237736</td>\n",
       "      <td>30.0708</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>861</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Hansen\\t Mr. Claus Peter</td>\n",
       "      <td>male</td>\n",
       "      <td>41.0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>350026</td>\n",
       "      <td>14.1083</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>862</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>Giles\\t Mr. Frederick Edward</td>\n",
       "      <td>male</td>\n",
       "      <td>21.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>28134</td>\n",
       "      <td>11.5000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>864</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
       "      <td>female</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>CA. 2343</td>\n",
       "      <td>69.5500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>867</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Duran y More\\t Miss. Asuncion</td>\n",
       "      <td>female</td>\n",
       "      <td>27.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>SC/PARIS 2149</td>\n",
       "      <td>13.8583</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>875</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
       "      <td>female</td>\n",
       "      <td>28.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>P/PP 3381</td>\n",
       "      <td>24.0000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>192 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             Survived  Pclass  \\\n",
       "PassengerId                     \n",
       "1                   0       3   \n",
       "2                   1       1   \n",
       "4                   1       1   \n",
       "8                   0       3   \n",
       "10                  1       2   \n",
       "...               ...     ...   \n",
       "861                 0       3   \n",
       "862                 0       2   \n",
       "864                 0       3   \n",
       "867                 1       2   \n",
       "875                 1       2   \n",
       "\n",
       "                                                          Name     Sex   Age  \\\n",
       "PassengerId                                                                    \n",
       "1                                     Braund\\t Mr. Owen Harris    male  22.0   \n",
       "2            Cumings\\t Mrs. John Bradley (Florence Briggs T...  female  38.0   \n",
       "4                Futrelle\\t Mrs. Jacques Heath (Lily May Peel)  female  35.0   \n",
       "8                              Palsson\\t Master. Gosta Leonard    male   2.0   \n",
       "10                        Nasser\\t Mrs. Nicholas (Adele Achem)  female  14.0   \n",
       "...                                                        ...     ...   ...   \n",
       "861                                   Hansen\\t Mr. Claus Peter    male  41.0   \n",
       "862                               Giles\\t Mr. Frederick Edward    male  21.0   \n",
       "864                         Sage\\t Miss. Dorothy Edith \"Dolly\"  female   NaN   \n",
       "867                              Duran y More\\t Miss. Asuncion  female  27.0   \n",
       "875                     Abelson\\t Mrs. Samuel (Hannah Wizosky)  female  28.0   \n",
       "\n",
       "             SibSp  Parch         Ticket     Fare Cabin Embarked  \n",
       "PassengerId                                                       \n",
       "1                1      0      A/5 21171   7.2500   NaN        S  \n",
       "2                1      0       PC 17599  71.2833   C85        C  \n",
       "4                1      0         113803  53.1000  C123        S  \n",
       "8                3      1         349909  21.0750   NaN        S  \n",
       "10               1      0         237736  30.0708   NaN        C  \n",
       "...            ...    ...            ...      ...   ...      ...  \n",
       "861              2      0         350026  14.1083   NaN        S  \n",
       "862              1      0          28134  11.5000   NaN        S  \n",
       "864              8      2       CA. 2343  69.5500   NaN        S  \n",
       "867              1      0  SC/PARIS 2149  13.8583   NaN        C  \n",
       "875              1      0      P/PP 3381  24.0000   NaN        C  \n",
       "\n",
       "[192 rows x 11 columns]"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.query('SibSp > Parch')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(113, 11)"
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "young = 18\n",
    "df.query('Age < @young').shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Operacje na wierszach i kolumnach"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Iterowanie po ramce danych oznacza oznacza przejście po nazwach kolumn:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "for column_name in df:\n",
    "    print(column_name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for col_name, series in df.items():\n",
    "    print(col_name, series)\n",
    "    break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "for idx, row in df.iterrows():\n",
    "    print(idx, '\\n', row)\n",
    "    break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "def bmi_level(bmi):\n",
    "    if bmi <= 18.5:\n",
    "        level =  'underweight'\n",
    "    elif bmi < 25:\n",
    "        level =  'normal'\n",
    "    elif bmi < 30:\n",
    "        level =  'overweight'\n",
    "    else:\n",
    "        level = 'obese'\n",
    "    return level\n",
    "\n",
    "s = df['male_BMI'].map(bmi_level)\n",
    "    \n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "def bmi_level(row_data):\n",
    "    bmi = row_data['male_BMI']\n",
    "    if bmi <= 18.5:\n",
    "        return 'underweight'\n",
    "    elif bmi < 25:\n",
    "        return 'normal'\n",
    "    elif bmi < 30:\n",
    "        return 'overweight'\n",
    "    return  'obese'\n",
    "\n",
    "df.apply(bmi_level, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.transpose()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grupowanie (`groupby`)\n",
    "\n",
    "Często zdarza się, gdy potrzebujemy podzielić dane ze względu na wartości w zadanej kolumnie, a następnie obliczenie zebranie danych w każdej z grup. Do tego służy metody `groupby`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv('./nba.csv')\n",
    "\n",
    "#df.sample(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Team</th>\n",
       "      <th>Number</th>\n",
       "      <th>Position</th>\n",
       "      <th>Age</th>\n",
       "      <th>Height</th>\n",
       "      <th>Weight</th>\n",
       "      <th>College</th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Avery Bradley</td>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>0.0</td>\n",
       "      <td>PG</td>\n",
       "      <td>25.0</td>\n",
       "      <td>6-2</td>\n",
       "      <td>180.0</td>\n",
       "      <td>Texas</td>\n",
       "      <td>7730337.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Jae Crowder</td>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>99.0</td>\n",
       "      <td>SF</td>\n",
       "      <td>25.0</td>\n",
       "      <td>6-6</td>\n",
       "      <td>235.0</td>\n",
       "      <td>Marquette</td>\n",
       "      <td>6796117.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>John Holland</td>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>30.0</td>\n",
       "      <td>SG</td>\n",
       "      <td>27.0</td>\n",
       "      <td>6-5</td>\n",
       "      <td>205.0</td>\n",
       "      <td>Boston University</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>R.J. Hunter</td>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>28.0</td>\n",
       "      <td>SG</td>\n",
       "      <td>22.0</td>\n",
       "      <td>6-5</td>\n",
       "      <td>185.0</td>\n",
       "      <td>Georgia State</td>\n",
       "      <td>1148640.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jonas Jerebko</td>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>8.0</td>\n",
       "      <td>PF</td>\n",
       "      <td>29.0</td>\n",
       "      <td>6-10</td>\n",
       "      <td>231.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5000000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>453</th>\n",
       "      <td>Shelvin Mack</td>\n",
       "      <td>Utah Jazz</td>\n",
       "      <td>8.0</td>\n",
       "      <td>PG</td>\n",
       "      <td>26.0</td>\n",
       "      <td>6-3</td>\n",
       "      <td>203.0</td>\n",
       "      <td>Butler</td>\n",
       "      <td>2433333.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>454</th>\n",
       "      <td>Raul Neto</td>\n",
       "      <td>Utah Jazz</td>\n",
       "      <td>25.0</td>\n",
       "      <td>PG</td>\n",
       "      <td>24.0</td>\n",
       "      <td>6-1</td>\n",
       "      <td>179.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>900000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>455</th>\n",
       "      <td>Tibor Pleiss</td>\n",
       "      <td>Utah Jazz</td>\n",
       "      <td>21.0</td>\n",
       "      <td>C</td>\n",
       "      <td>26.0</td>\n",
       "      <td>7-3</td>\n",
       "      <td>256.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2900000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>456</th>\n",
       "      <td>Jeff Withey</td>\n",
       "      <td>Utah Jazz</td>\n",
       "      <td>24.0</td>\n",
       "      <td>C</td>\n",
       "      <td>26.0</td>\n",
       "      <td>7-0</td>\n",
       "      <td>231.0</td>\n",
       "      <td>Kansas</td>\n",
       "      <td>947276.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>457</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>458 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "              Name            Team  Number Position   Age Height  Weight  \\\n",
       "0    Avery Bradley  Boston Celtics     0.0       PG  25.0    6-2   180.0   \n",
       "1      Jae Crowder  Boston Celtics    99.0       SF  25.0    6-6   235.0   \n",
       "2     John Holland  Boston Celtics    30.0       SG  27.0    6-5   205.0   \n",
       "3      R.J. Hunter  Boston Celtics    28.0       SG  22.0    6-5   185.0   \n",
       "4    Jonas Jerebko  Boston Celtics     8.0       PF  29.0   6-10   231.0   \n",
       "..             ...             ...     ...      ...   ...    ...     ...   \n",
       "453   Shelvin Mack       Utah Jazz     8.0       PG  26.0    6-3   203.0   \n",
       "454      Raul Neto       Utah Jazz    25.0       PG  24.0    6-1   179.0   \n",
       "455   Tibor Pleiss       Utah Jazz    21.0        C  26.0    7-3   256.0   \n",
       "456    Jeff Withey       Utah Jazz    24.0        C  26.0    7-0   231.0   \n",
       "457            NaN             NaN     NaN      NaN   NaN    NaN     NaN   \n",
       "\n",
       "               College     Salary  \n",
       "0                Texas  7730337.0  \n",
       "1            Marquette  6796117.0  \n",
       "2    Boston University        NaN  \n",
       "3        Georgia State  1148640.0  \n",
       "4                  NaN  5000000.0  \n",
       "..                 ...        ...  \n",
       "453             Butler  2433333.0  \n",
       "454                NaN   900000.0  \n",
       "455                NaN  2900000.0  \n",
       "456             Kansas   947276.0  \n",
       "457                NaN        NaN  \n",
       "\n",
       "[458 rows x 9 columns]"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "_Przykład_: chcemy obliczyć średnią wypłatę dla każdej z drużyn."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Team</th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>7730337.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>6796117.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>1148640.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>5000000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>453</th>\n",
       "      <td>Utah Jazz</td>\n",
       "      <td>2433333.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>454</th>\n",
       "      <td>Utah Jazz</td>\n",
       "      <td>900000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>455</th>\n",
       "      <td>Utah Jazz</td>\n",
       "      <td>2900000.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>456</th>\n",
       "      <td>Utah Jazz</td>\n",
       "      <td>947276.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>457</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>458 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               Team     Salary\n",
       "0    Boston Celtics  7730337.0\n",
       "1    Boston Celtics  6796117.0\n",
       "2    Boston Celtics        NaN\n",
       "3    Boston Celtics  1148640.0\n",
       "4    Boston Celtics  5000000.0\n",
       "..              ...        ...\n",
       "453       Utah Jazz  2433333.0\n",
       "454       Utah Jazz   900000.0\n",
       "455       Utah Jazz  2900000.0\n",
       "456       Utah Jazz   947276.0\n",
       "457             NaN        NaN\n",
       "\n",
       "[458 rows x 2 columns]"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Team', 'Salary']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Team</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Atlanta Hawks</th>\n",
       "      <td>2854940.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Boston Celtics</th>\n",
       "      <td>3021242.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Brooklyn Nets</th>\n",
       "      <td>1335480.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Charlotte Hornets</th>\n",
       "      <td>4204200.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Chicago Bulls</th>\n",
       "      <td>2380440.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                      Salary\n",
       "Team                        \n",
       "Atlanta Hawks      2854940.0\n",
       "Boston Celtics     3021242.5\n",
       "Brooklyn Nets      1335480.0\n",
       "Charlotte Hornets  4204200.0\n",
       "Chicago Bulls      2380440.0"
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Team', 'Salary']].groupby('Team').median().h"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Możemy też podać listę nazw kolumn. Wtedy wartości zostaną obliczone dla każdej z wytworzonych grup:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.groupby(['Team', 'Position'])['Salary'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    " * `sum()`\n",
    " * `min()`\n",
    " * `max()`\n",
    " * `mean()`\n",
    " * `size()`\n",
    " * `describe()`\n",
    " * `first()`\n",
    " * `last()`\n",
    " * `count()`\n",
    " * `std()`\n",
    " * `var()`\n",
    " * `sem()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df[['Position', 'Salary']].groupby('Position').agg(['mean', 'std', 'count'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "def group_range(x):\n",
    "    return x.max() - x.min()\n",
    "\n",
    "df[['Position', 'Salary']].groupby('Position').apply(group_range)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "gb = df.groupby(['Position'])\n",
    "\n",
    "print('Liczba grup:', gb.ngroups)\n",
    "print(gb.groups.keys())\n",
    "\n",
    "print(gb.get_group('C').head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "df.Height.str.split('-').str[0].astype('Int64') * 2.56"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Pivot\n",
    "Metoda `pivot` pozwala na stworzenie nowej ramki danych, gdzie indeks i nazwy kolumn są wartościami początkowej ranki danych. \n",
    "\n",
    "_Przykład_: zobaczmy na poniższą ramkę danych, która zawiera informacje o jakości tłumaczenia dla pary językowej hausa-angielski. Kolumna `system` zawiera nazwę systemu, kolumna `metric` - nazwę metryki, zaś kolumna `score`- wartość metryki. Chcemy przedstawić te dane w następujący sposób: jako klucz chcemy mieć nazwę systemu, zaś jako kolumny -  metryki. Możemy wykorzystać do tego metodę `pivot`, gdzie musimy podać 3 argumenty:\n",
    " * `index`: nazwę kolumny, na podstawie której zostanie stworzony indeks;\n",
    " * `columns`: nazwa kolumny, które zawiera nazwy kolumn dla nowej ramki danych;\n",
    " * `values`: nazwa kolumny, która zawiera interesujące nas dane."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('https://raw.githubusercontent.com/wmt-conference/wmt21-news-systems/main/scores/automatic-scores.tsv', sep='\\t')\n",
    "df = df[df.pair == 'ha-en']\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.pivot(index='system', columns='metric', values='score')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Dane tekstowe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "`pandas` posiada udogodnienia do pracy z wartościami tekstowymi:\n",
    " * dostęp następuje przez atrybut `str`;\n",
    " * funkcje:\n",
    "    * formatujące: `lower()`, `upper()`;\n",
    "    * wyrażenia regularne: `contains()`, `match()`;\n",
    "    * inne: `split()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.Name.str.upper()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "print(df.Name.head())\n",
    "df.Name.str.contains('Miss|Mrs').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.Name.str.split('\\t', expand=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.Name.str.split('\\t')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.Name.str.split('\\t').str[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df.Name.str.split('\\t').str[1].str.strip().str.split(' ').str[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Zadanie\n",
    "Zestaw `nba.csv` zawiera informaję o wysokości zawodników. Oblicz wzrost każdego z zawodników w systemie metrycznym przyjmując, że stop to `30.48` cm., a cal to `2.54` cm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "interpreter": {
   "hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}