{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"# Data Analysis in Python: `pandas`\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### `pandas`\n",
"The `pandas` library is the basic data-analysis tool in the Python ecosystem:\n",
" * it provides two basic data types: \n",
" * `Series` (series, 1D)\n",
" * `DataFrame` (data frame, 2D)\n",
" * operations on these objects: handling missing values, combining data;\n",
" * it handles various kinds of data, e.g. time series;\n",
" * the library is built on `numpy`, a library for numerical computing;\n",
" * it also supports simple data visualization;\n",
" * ETL: extract, transform, load."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"To import the `pandas` library, it is enough to write:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pandas as pd"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"#### __Exercise 0__: check whether the `pandas` library is installed."
|
||
]
|
||
},
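{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"One possible approach to Exercise 0, shown as a minimal sketch rather than the only solution: the standard-library function `importlib.util.find_spec` reports whether a package can be imported without importing it. The variable name `spec` is only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import importlib.util\n",
"\n",
"# find_spec returns None when the package cannot be found\n",
"spec = importlib.util.find_spec('pandas')\n",
"\n",
"if spec is None:\n",
"    print('pandas is not installed')\n",
"else:\n",
"    import pandas as pd\n",
"    print('pandas version:', pd.__version__)"
]
},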
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) (`pd.Series`)\n",
"\n",
" A series represents homogeneous one-dimensional data; it is the counterpart of a vector in R.\n",
" * Series can be created in several ways (more on this shortly), e.g. from objects such as lists and dictionaries.\n",
" * The data must be homogeneous. Otherwise an automatic conversion takes place.\n",
" * When creating a series we normally pass the `data` argument with the values (it defaults to `None`, which yields an empty series).\n",
" * We can additionally pass an index (`index`), a data type (`dtype`) or a name (`name`).\n",
|
||
" \n",
|
||
" \n",
|
||
" ```\n",
|
||
" class pandas.Series(data=None, index=None, dtype=None, name=None)\n",
|
||
" ```"
|
||
]
|
||
},
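{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The automatic conversion mentioned above can be observed through the `dtype` attribute. The sketch below uses made-up values and the illustrative names `mixed_numbers` and `mixed_types`: integers mixed with a float should end up as `float64`, while numbers mixed with text fall back to the generic `object` type."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Integers mixed with a float are upcast to a common numeric dtype\n",
"mixed_numbers = pd.Series([1, 2.5, 3])\n",
"print(mixed_numbers.dtype)\n",
"\n",
"# Numbers mixed with text are stored as generic Python objects\n",
"mixed_types = pd.Series([1, 'two', 3.0])\n",
"print(mixed_types.dtype)"
]
},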
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"When creating a series we can pass the data as a list or a dictionary.\n",
"\n",
"The example below creates a series from data stored in a list:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 211819\n",
|
||
"1 682758\n",
|
||
"2 737011\n",
|
||
"3 779511\n",
|
||
"4 673790\n",
|
||
"5 673790\n",
|
||
"6 444177\n",
|
||
"7 136791\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"\n",
|
||
"data = [211819, 682758, 737011, 779511, 673790, 673790, 444177, 136791]\n",
|
||
"\n",
|
||
"s = pd.Series(data)\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"When the data comes from a list and no index is given, pandas adds an automatic integer index starting at 0."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"When a dictionary is passed as the data, pandas uses its keys to build the index:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"When creating a series we can set both its index and its name:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819.0\n",
|
||
"May 682758.0\n",
|
||
"June 737011.0\n",
|
||
"July 779511.0\n",
|
||
"Name: Rides, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"months = ['April', 'May', 'June', 'July']\n",
|
||
"\n",
|
||
"data = [211819, 682758, 737011, 779511]\n",
|
||
"\n",
|
||
"s = pd.Series(data=data, index=months, dtype=float, name='Rides')\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"An individual element is accessed using its key from the index."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"211819\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"August 673790\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"print(s['April'])\n",
|
||
"\n",
|
||
"s['August'] = 673790\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"An element is added to a series by assigning a value to a new key:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"August 673790\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"s['August'] = 673790\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"More on indexing series later in the course."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"A basic feature of series is that operations are vectorized and aligned on the index. For addition this works as follows:\n",
" * when both series contain the same key, their values are added;\n",
" * otherwise the value for that key in the resulting series is `NaN`. \n",
" * Equivalently, we can use the `pandas.Series.add` method, which additionally accepts `fill_value`, a default used when a key is missing from one of the series."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"August 880599.0\n",
|
||
"July 973827.0\n",
|
||
"June 908505.0\n",
|
||
"May 830656.0\n",
|
||
"October NaN\n",
|
||
"September 814282.0\n",
|
||
"dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'August': 673790, 'July': 779511,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
|
||
"'September': 140492})\n",
|
||
"\n",
|
||
"all_data = members + occasionals\n",
"# Equivalently\n",
|
||
"all_data = members.add(occasionals)\n",
|
||
"all_data"
|
||
]
|
||
},
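{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The default value for missing keys is passed to `add` through its `fill_value` parameter. The sketch below uses a shortened version of the data above to compare plain addition with `fill_value=0`; choosing 0 as the default is an assumption that a missing month means no rides."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"members = pd.Series({'September': 673790, 'October': 444177})\n",
"occasionals = pd.Series({'September': 140492})\n",
"\n",
"# Plain addition leaves NaN for October, which is missing in occasionals\n",
"print(members + occasionals)\n",
"\n",
"# fill_value=0 treats the missing entry as 0, so October keeps 444177\n",
"print(members.add(occasionals, fill_value=0))"
]
},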
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"We can also perform arithmetic operations on a series:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"May 683758\n",
|
||
"June 738011\n",
|
||
"July 780511\n",
|
||
"August 674790\n",
|
||
"September 674790\n",
|
||
"October 445177\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"members += 1000\n",
|
||
"\n",
|
||
"members"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### Summary\n",
" * Series work much like dictionaries, except that the values must be homogeneous (of the same type).\n",
" * Individual elements are accessed with square brackets `[]` and a key.\n",
" * Unlike dictionaries, series support arithmetic operations directly."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### Exercise 1\n",
" * Create a series `n` containing the numbers from 0 to 10 (inclusive).\n",
" * Create a series `n2` containing the squares of the numbers from 0 to 10 (inclusive).\n",
" * Then create a series `trojkatne` (triangular numbers) equal to the sum of the two series above divided by 2."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"0 0\n",
|
||
"1 1\n",
|
||
"2 2\n",
|
||
"3 3\n",
|
||
"4 4\n",
|
||
"5 5\n",
|
||
"6 6\n",
|
||
"7 7\n",
|
||
"8 8\n",
|
||
"9 9\n",
|
||
"10 10\n",
|
||
"dtype: int64 0 0\n",
|
||
"1 1\n",
|
||
"2 4\n",
|
||
"3 9\n",
|
||
"4 16\n",
|
||
"5 25\n",
|
||
"6 36\n",
|
||
"7 49\n",
|
||
"8 64\n",
|
||
"9 81\n",
|
||
"10 100\n",
"dtype: int64 0 0.0\n",
"1 1.0\n",
"2 3.0\n",
"3 6.0\n",
"4 10.0\n",
"5 15.0\n",
"6 21.0\n",
"7 28.0\n",
"8 36.0\n",
"9 45.0\n",
"10 55.0\n",
"dtype: float64\n"
]
}
],
"source": [
"n = pd.Series(range(0, 11))\n",
"n2 = n * n\n",
"# Parentheses matter: without them only n2 would be divided by 2\n",
"trojkatne = (n + n2) / 2\n",
|
||
"print(n, n2, trojkatne)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### [Data frame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) (`pd.DataFrame`)\n",
"\n",
"The data frame is the basic data structure in the `pandas` library; it stores and represents tabular (two-dimensional) data.\n",
" * It has columns (features) and rows (observations, examples).\n",
" * It can also be viewed as a dictionary whose values are series.\n",
|
||
"\n",
|
||
"```\n",
|
||
"class pandas.DataFrame(data=None, index=None, columns=None, dtype=None)\n",
|
||
"```\n",
|
||
" "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"A data frame can be created in several ways.\n",
"\n",
"The first one (\"column-wise\") defines the frame by passing series as columns:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316"
|
||
]
|
||
},
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
|
||
"df"
|
||
]
|
||
},
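{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The view of a data frame as a dictionary of series can be checked directly. In this minimal sketch, selecting a column by name returns a `Series`, and the column names behave like dictionary keys; the variable name `column` is only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({\n",
"    'members': pd.Series({'May': 682758, 'June': 737011}),\n",
"    'occasionals': pd.Series({'May': 147898, 'June': 171494}),\n",
"})\n",
"\n",
"# Selecting a column by name returns a Series, like a dictionary lookup\n",
"column = df['members']\n",
"print(type(column))\n",
"print(column['May'])\n",
"\n",
"# Column names play the role of dictionary keys\n",
"print(list(df.columns))"
]
},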
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"Another common way is to pass a list of dictionaries. `pandas` then interprets it as a list of examples (rows):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"0 682758 147898\n",
|
||
"1 737011 171494\n",
|
||
"2 779511 194316"
|
||
]
|
||
},
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"data = [\n",
|
||
" {'members': 682758, 'occasionals': 147898},\n",
|
||
" {'occasionals': 171494,'members': 737011},\n",
|
||
" {'members': 779511, 'occasionals': 194316},\n",
|
||
"]\n",
|
||
"\n",
|
||
"df = pd.DataFrame(data)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
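{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"With a list of dictionaries the rows do not need to share exactly the same keys. The sketch below, using the illustrative name `rows`, drops one key on purpose: pandas fills the gap with `NaN`, which also turns that column into a floating-point column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"rows = [\n",
"    {'members': 682758, 'occasionals': 147898},\n",
"    {'members': 737011},  # no 'occasionals' key in this row\n",
"]\n",
"\n",
"df = pd.DataFrame(rows)\n",
"\n",
"# The missing value is stored as NaN, so 'occasionals' becomes float64\n",
"print(df)\n",
"print(df.dtypes)"
]
},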
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"We can also use the `from_dict` method ([doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html)), which lets us specify whether the data is given in column-wise or row-wise form:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"index\n",
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"\n",
|
||
"columns\n",
|
||
" May June July\n",
|
||
"members 682758 737011 779511\n",
|
||
"occasionals 147898 171494 194316\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"data = {\n",
|
||
" 'May': {'members': 682758, 'occasionals': 147898},\n",
|
||
" 'June': {'members': 737011, 'occasionals': 171494},\n",
|
||
" 'July': {'members': 779511, 'occasionals': 194316}\n",
|
||
"}\n",
|
||
"\n",
|
||
"df = pd.DataFrame.from_dict(data, orient='index')\n",
|
||
"print('index\\n', df)\n",
|
||
"print()\n",
|
||
"df = pd.DataFrame.from_dict(data, orient='columns')\n",
|
||
"print('columns\\n', df)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### Loading data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"The `pandas` library can read and write data in various formats:\n",
" * text formats, e.g. `csv`, `json`\n",
" * spreadsheet files: Excel (xls, xlsx)\n",
" * databases\n",
" * other: `sas`, `spss` (reading only)\n",
"\n",
"\n",
"Loading data produces a data frame (`DataFrame`)."
|
||
]
|
||
},
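{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Reading and writing can be tried without touching the disk. The sketch below writes a small made-up frame to CSV text with `to_csv` and reads it back with `read_csv` from an in-memory buffer (`io.StringIO`); the column names and the variable `restored` are illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'month': ['May', 'June'], 'members': [682758, 737011]})\n",
"\n",
"# to_csv without a path returns the CSV text instead of writing a file\n",
"csv_text = df.to_csv(index=False)\n",
"print(csv_text)\n",
"\n",
"# read_csv accepts any file-like object, here an in-memory buffer\n",
"restored = pd.read_csv(io.StringIO(csv_text))\n",
"print(restored)"
]
},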
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"One of the simplest data formats is `csv`, in which consecutive values are separated by commas.\n",
"\n",
"To load data in this format, use the `pandas.read_csv` function.\n",
"\n",
"It accepts many parameters (e.g. the separator and the quote character). More on this in the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Country</th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>Afghanistan</td>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Albania</td>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Algeria</td>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Angola</td>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Antigua and Barbuda</td>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>170</th>\n",
|
||
" <td>Venezuela</td>\n",
|
||
" <td>28.13408</td>\n",
|
||
" <td>27.44500</td>\n",
|
||
" <td>17911.0</td>\n",
|
||
" <td>28116716.0</td>\n",
|
||
" <td>17.1</td>\n",
|
||
" <td>74.2</td>\n",
|
||
" <td>2.53</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>171</th>\n",
|
||
" <td>Vietnam</td>\n",
|
||
" <td>21.06500</td>\n",
|
||
" <td>20.91630</td>\n",
|
||
" <td>4085.0</td>\n",
|
||
" <td>86589342.0</td>\n",
|
||
" <td>26.2</td>\n",
|
||
" <td>74.1</td>\n",
|
||
" <td>1.86</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>172</th>\n",
|
||
" <td>Palestine</td>\n",
|
||
" <td>29.02643</td>\n",
|
||
" <td>26.57750</td>\n",
|
||
" <td>3564.0</td>\n",
|
||
" <td>3854667.0</td>\n",
|
||
" <td>24.7</td>\n",
|
||
" <td>74.1</td>\n",
|
||
" <td>4.38</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>173</th>\n",
|
||
" <td>Zambia</td>\n",
|
||
" <td>23.05436</td>\n",
|
||
" <td>20.68321</td>\n",
|
||
" <td>3039.0</td>\n",
|
||
" <td>13114579.0</td>\n",
|
||
" <td>94.9</td>\n",
|
||
" <td>51.1</td>\n",
|
||
" <td>5.88</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>174</th>\n",
|
||
" <td>Zimbabwe</td>\n",
|
||
" <td>24.64522</td>\n",
|
||
" <td>22.02660</td>\n",
|
||
" <td>1286.0</td>\n",
|
||
" <td>13495462.0</td>\n",
|
||
" <td>98.3</td>\n",
|
||
" <td>47.3</td>\n",
|
||
" <td>3.85</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>175 rows × 8 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Country female_BMI male_BMI gdp population \\\n",
|
||
"0 Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"1 Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"2 Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"3 Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"4 Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
".. ... ... ... ... ... \n",
|
||
"170 Venezuela 28.13408 27.44500 17911.0 28116716.0 \n",
|
||
"171 Vietnam 21.06500 20.91630 4085.0 86589342.0 \n",
|
||
"172 Palestine 29.02643 26.57750 3564.0 3854667.0 \n",
|
||
"173 Zambia 23.05436 20.68321 3039.0 13114579.0 \n",
|
||
"174 Zimbabwe 24.64522 22.02660 1286.0 13495462.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility \n",
|
||
"0 110.4 52.8 6.20 \n",
|
||
"1 17.9 76.8 1.76 \n",
|
||
"2 29.5 75.5 2.73 \n",
|
||
"3 192.0 56.7 6.43 \n",
|
||
"4 10.9 75.5 2.16 \n",
|
||
".. ... ... ... \n",
|
||
"170 17.1 74.2 2.53 \n",
|
||
"171 26.2 74.1 1.86 \n",
|
||
"172 24.7 74.1 4.38 \n",
|
||
"173 94.9 51.1 5.88 \n",
|
||
"174 98.3 47.3 3.85 \n",
|
||
"\n",
|
||
"[175 rows x 8 columns]"
|
||
]
|
||
},
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('gapminder.csv')\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Heikkinen\\t Miss. Laina</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>26</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>STON/O2. 3101282</td>\n",
|
||
" <td>7.9250</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Allen\\t Mr. William Henry</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>373450</td>\n",
|
||
" <td>8.0500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"3 1 3 \n",
|
||
"4 1 1 \n",
|
||
"5 0 3 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38 \n",
|
||
"3 Heikkinen\\t Miss. Laina female 26 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35 \n",
|
||
"5 Allen\\t Mr. William Henry male 35 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"5 0 0 373450 8.0500 NaN S "
|
||
]
|
||
},
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', delimiter='\\t', index_col=0, nrows=5)\n",
|
||
"df"
|
||
]
|
||
},
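{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The `read_csv` parameters can also be tried on in-memory text. The sketch below uses hypothetical data with semicolon-separated fields and decimal commas; `sep` and `decimal` tell `read_csv` how to interpret them, so the `bmi` column should be parsed as numbers rather than text."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data: semicolon-separated fields with a decimal comma\n",
"text = 'country;bmi\\nPoland;25,9\\nNorway;26,1\\n'\n",
"\n",
"df = pd.read_csv(io.StringIO(text), sep=';', decimal=',')\n",
"print(df)\n",
"print(df.dtypes)"
]
},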
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"To load data from a spreadsheet, use the `pandas.read_excel` function. Opening an `xlsx` file may require setting the parameter `engine='openpyxl'`, which in turn requires the optional `openpyxl` package to be installed. More options in the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)."
|
||
]
|
||
},
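{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Because `openpyxl` is an optional dependency, it may help to check for it before calling `read_excel`. This is a minimal sketch: it assumes a file named `./bikes.xlsx` exists in the working directory and uses the standard-library `importlib.util.find_spec` for the check."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import importlib.util\n",
"import pandas as pd\n",
"\n",
"# openpyxl is an optional dependency of pandas, so check for it first\n",
"if importlib.util.find_spec('openpyxl') is None:\n",
"    print('openpyxl is missing; install it with pip or conda')\n",
"else:\n",
"    # Assumes ./bikes.xlsx is present in the working directory\n",
"    df = pd.read_excel('./bikes.xlsx', engine='openpyxl', nrows=5)\n",
"    print(df)"
]
},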
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"ename": "ImportError",
|
||
"evalue": "Missing optional dependency 'openpyxl'. Use pip or conda to install openpyxl.",
|
||
"output_type": "error",
|
||
"traceback": [
|
||
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
|
||
"\u001b[1;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\site-packages\\pandas\\compat\\_optional.py:142\u001b[0m, in \u001b[0;36mimport_optional_dependency\u001b[1;34m(name, extra, errors, min_version)\u001b[0m\n\u001b[0;32m 141\u001b[0m \u001b[39mtry\u001b[39;00m:\n\u001b[1;32m--> 142\u001b[0m module \u001b[39m=\u001b[39m importlib\u001b[39m.\u001b[39;49mimport_module(name)\n\u001b[0;32m 143\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mImportError\u001b[39;00m:\n",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\importlib\\__init__.py:126\u001b[0m, in \u001b[0;36mimport_module\u001b[1;34m(name, package)\u001b[0m\n\u001b[0;32m 125\u001b[0m level \u001b[39m+\u001b[39m\u001b[39m=\u001b[39m \u001b[39m1\u001b[39m\n\u001b[1;32m--> 126\u001b[0m \u001b[39mreturn\u001b[39;00m _bootstrap\u001b[39m.\u001b[39;49m_gcd_import(name[level:], package, level)\n",
|
||
"File \u001b[1;32m<frozen importlib._bootstrap>:1050\u001b[0m, in \u001b[0;36m_gcd_import\u001b[1;34m(name, package, level)\u001b[0m\n",
|
||
"File \u001b[1;32m<frozen importlib._bootstrap>:1027\u001b[0m, in \u001b[0;36m_find_and_load\u001b[1;34m(name, import_)\u001b[0m\n",
|
||
"File \u001b[1;32m<frozen importlib._bootstrap>:1004\u001b[0m, in \u001b[0;36m_find_and_load_unlocked\u001b[1;34m(name, import_)\u001b[0m\n",
|
||
"\u001b[1;31mModuleNotFoundError\u001b[0m: No module named 'openpyxl'",
|
||
"\nDuring handling of the above exception, another exception occurred:\n",
|
||
"\u001b[1;31mImportError\u001b[0m Traceback (most recent call last)",
|
||
"\u001b[1;32mj:\\Python\\2023-programowanie-w-pythonie\\zajecia2\\data_analysis.ipynb Cell 39\u001b[0m line \u001b[0;36m1\n\u001b[1;32m----> <a href='vscode-notebook-cell:/j%3A/Python/2023-programowanie-w-pythonie/zajecia2/data_analysis.ipynb#X53sZmlsZQ%3D%3D?line=0'>1</a>\u001b[0m df \u001b[39m=\u001b[39m pd\u001b[39m.\u001b[39;49mread_excel(\u001b[39m'\u001b[39;49m\u001b[39m./bikes.xlsx\u001b[39;49m\u001b[39m'\u001b[39;49m, engine\u001b[39m=\u001b[39;49m\u001b[39m'\u001b[39;49m\u001b[39mopenpyxl\u001b[39;49m\u001b[39m'\u001b[39;49m, nrows\u001b[39m=\u001b[39;49m\u001b[39m5\u001b[39;49m)\n\u001b[0;32m <a href='vscode-notebook-cell:/j%3A/Python/2023-programowanie-w-pythonie/zajecia2/data_analysis.ipynb#X53sZmlsZQ%3D%3D?line=1'>2</a>\u001b[0m df\n",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\site-packages\\pandas\\io\\excel\\_base.py:478\u001b[0m, in \u001b[0;36mread_excel\u001b[1;34m(io, sheet_name, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, date_format, thousands, decimal, comment, skipfooter, storage_options, dtype_backend)\u001b[0m\n\u001b[0;32m 476\u001b[0m \u001b[39mif\u001b[39;00m \u001b[39mnot\u001b[39;00m \u001b[39misinstance\u001b[39m(io, ExcelFile):\n\u001b[0;32m 477\u001b[0m should_close \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[1;32m--> 478\u001b[0m io \u001b[39m=\u001b[39m ExcelFile(io, storage_options\u001b[39m=\u001b[39;49mstorage_options, engine\u001b[39m=\u001b[39;49mengine)\n\u001b[0;32m 479\u001b[0m \u001b[39melif\u001b[39;00m engine \u001b[39mand\u001b[39;00m engine \u001b[39m!=\u001b[39m io\u001b[39m.\u001b[39mengine:\n\u001b[0;32m 480\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\n\u001b[0;32m 481\u001b[0m \u001b[39m\"\u001b[39m\u001b[39mEngine should not be specified when passing \u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m 482\u001b[0m \u001b[39m\"\u001b[39m\u001b[39man ExcelFile - ExcelFile already has the engine set\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m 483\u001b[0m )\n",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\site-packages\\pandas\\io\\excel\\_base.py:1513\u001b[0m, in \u001b[0;36mExcelFile.__init__\u001b[1;34m(self, path_or_buffer, engine, storage_options)\u001b[0m\n\u001b[0;32m 1510\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mengine \u001b[39m=\u001b[39m engine\n\u001b[0;32m 1511\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39mstorage_options \u001b[39m=\u001b[39m storage_options\n\u001b[1;32m-> 1513\u001b[0m \u001b[39mself\u001b[39m\u001b[39m.\u001b[39m_reader \u001b[39m=\u001b[39m \u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_engines[engine](\u001b[39mself\u001b[39;49m\u001b[39m.\u001b[39;49m_io, storage_options\u001b[39m=\u001b[39;49mstorage_options)\n",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\site-packages\\pandas\\io\\excel\\_openpyxl.py:548\u001b[0m, in \u001b[0;36mOpenpyxlReader.__init__\u001b[1;34m(self, filepath_or_buffer, storage_options)\u001b[0m\n\u001b[0;32m 533\u001b[0m \u001b[39m@doc\u001b[39m(storage_options\u001b[39m=\u001b[39m_shared_docs[\u001b[39m\"\u001b[39m\u001b[39mstorage_options\u001b[39m\u001b[39m\"\u001b[39m])\n\u001b[0;32m 534\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m__init__\u001b[39m(\n\u001b[0;32m 535\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[0;32m 536\u001b[0m filepath_or_buffer: FilePath \u001b[39m|\u001b[39m ReadBuffer[\u001b[39mbytes\u001b[39m],\n\u001b[0;32m 537\u001b[0m storage_options: StorageOptions \u001b[39m=\u001b[39m \u001b[39mNone\u001b[39;00m,\n\u001b[0;32m 538\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m \u001b[39mNone\u001b[39;00m:\n\u001b[0;32m 539\u001b[0m \u001b[39m \u001b[39m\u001b[39m\"\"\"\u001b[39;00m\n\u001b[0;32m 540\u001b[0m \u001b[39m Reader using openpyxl engine.\u001b[39;00m\n\u001b[0;32m 541\u001b[0m \n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 546\u001b[0m \u001b[39m {storage_options}\u001b[39;00m\n\u001b[0;32m 547\u001b[0m \u001b[39m \"\"\"\u001b[39;00m\n\u001b[1;32m--> 548\u001b[0m import_optional_dependency(\u001b[39m\"\u001b[39;49m\u001b[39mopenpyxl\u001b[39;49m\u001b[39m\"\u001b[39;49m)\n\u001b[0;32m 549\u001b[0m \u001b[39msuper\u001b[39m()\u001b[39m.\u001b[39m\u001b[39m__init__\u001b[39m(filepath_or_buffer, storage_options\u001b[39m=\u001b[39mstorage_options)\n",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\site-packages\\pandas\\compat\\_optional.py:145\u001b[0m, in \u001b[0;36mimport_optional_dependency\u001b[1;34m(name, extra, errors, min_version)\u001b[0m\n\u001b[0;32m 143\u001b[0m \u001b[39mexcept\u001b[39;00m \u001b[39mImportError\u001b[39;00m:\n\u001b[0;32m 144\u001b[0m \u001b[39mif\u001b[39;00m errors \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mraise\u001b[39m\u001b[39m\"\u001b[39m:\n\u001b[1;32m--> 145\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mImportError\u001b[39;00m(msg)\n\u001b[0;32m 146\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mNone\u001b[39;00m\n\u001b[0;32m 148\u001b[0m \u001b[39m# Handle submodules: if we have submodule, grab parent module from sys.modules\u001b[39;00m\n",
|
||
"\u001b[1;31mImportError\u001b[0m: Missing optional dependency 'openpyxl'. Use pip or conda to install openpyxl."
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_excel('./bikes.xlsx', engine='openpyxl', nrows=5)\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Innym ważnym źródłem informacji są bazy danych. Pandas potrafi komunikować się z bazą danych za pomocą biblioteki [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) i dostarcza odpowiedną funkcję:\n",
|
||
" * `pandas.read_sql` - wczytanie całej tabeli lub zapytania do bazy danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Title</th>\n",
|
||
" <th>ArtistId</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>AlbumId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>For Those About To Rock We Salute You</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Balls to the Wall</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Restless and Wild</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Let There Be Rock</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>Big Ones</td>\n",
|
||
" <td>3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>343</th>\n",
|
||
" <td>Respighi:Pines of Rome</td>\n",
|
||
" <td>226</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>344</th>\n",
|
||
" <td>Schubert: The Late String Quartets & String Qu...</td>\n",
|
||
" <td>272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>345</th>\n",
|
||
" <td>Monteverdi: L'Orfeo</td>\n",
|
||
" <td>273</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>346</th>\n",
|
||
" <td>Mozart: Chamber Music</td>\n",
|
||
" <td>274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>347</th>\n",
|
||
" <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
|
||
" <td>275</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>347 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Title ArtistId\n",
|
||
"AlbumId \n",
|
||
"1 For Those About To Rock We Salute You 1\n",
|
||
"2 Balls to the Wall 2\n",
|
||
"3 Restless and Wild 2\n",
|
||
"4 Let There Be Rock 1\n",
|
||
"5 Big Ones 3\n",
|
||
"... ... ...\n",
|
||
"343 Respighi:Pines of Rome 226\n",
|
||
"344 Schubert: The Late String Quartets & String Qu... 272\n",
|
||
"345 Monteverdi: L'Orfeo 273\n",
|
||
"346 Mozart: Chamber Music 274\n",
|
||
"347 Koyaanisqatsi (Soundtrack from the Motion Pict... 275\n",
|
||
"\n",
|
||
"[347 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_sql('Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Title</th>\n",
|
||
" <th>ArtistId</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>AlbumId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>For Those About To Rock We Salute You</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Balls to the Wall</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Restless and Wild</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Let There Be Rock</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>Big Ones</td>\n",
|
||
" <td>3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>343</th>\n",
|
||
" <td>Respighi:Pines of Rome</td>\n",
|
||
" <td>226</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>344</th>\n",
|
||
" <td>Schubert: The Late String Quartets & String Qu...</td>\n",
|
||
" <td>272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>345</th>\n",
|
||
" <td>Monteverdi: L'Orfeo</td>\n",
|
||
" <td>273</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>346</th>\n",
|
||
" <td>Mozart: Chamber Music</td>\n",
|
||
" <td>274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>347</th>\n",
|
||
" <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
|
||
" <td>275</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>347 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Title ArtistId\n",
|
||
"AlbumId \n",
|
||
"1 For Those About To Rock We Salute You 1\n",
|
||
"2 Balls to the Wall 2\n",
|
||
"3 Restless and Wild 2\n",
|
||
"4 Let There Be Rock 1\n",
|
||
"5 Big Ones 3\n",
|
||
"... ... ...\n",
|
||
"343 Respighi:Pines of Rome 226\n",
|
||
"344 Schubert: The Late String Quartets & String Qu... 272\n",
|
||
"345 Monteverdi: L'Orfeo 273\n",
|
||
"346 Mozart: Chamber Music 274\n",
|
||
"347 Koyaanisqatsi (Soundtrack from the Motion Pict... 275\n",
|
||
"\n",
|
||
"[347 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 21,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import sqlalchemy\n",
|
||
"\n",
|
||
"engine = sqlalchemy.create_engine('sqlite:///Chinook.sqlite', echo=True)\n",
|
||
"connection = engine.raw_connection()\n",
|
||
"\n",
|
||
"df = pd.read_sql('SELECT * FROM Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
|
||
"df"
|
||
]
|
||
},
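{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"A query can also take parameters. The sketch below is an illustration rather than part of the course data: it builds a small in-memory SQLite table (a hypothetical stand-in for `Album`) and passes the `ArtistId` value through the `params` argument of `pandas.read_sql` instead of pasting it into the SQL text. It uses a plain `sqlite3` connection, which pandas also accepts."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import sqlite3\n",
"import pandas as pd\n",
"\n",
"# hypothetical in-memory table standing in for Chinook's Album table\n",
"con = sqlite3.connect(':memory:')\n",
"con.execute('CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)')\n",
"con.executemany(\n",
"    'INSERT INTO Album VALUES (?, ?, ?)',\n",
"    [(1, 'For Those About To Rock We Salute You', 1),\n",
"     (2, 'Balls to the Wall', 2),\n",
"     (4, 'Let There Be Rock', 1)],\n",
")\n",
"\n",
"# the ? placeholder is filled from params, so the value never becomes part of the SQL text\n",
"albums_by_artist = pd.read_sql(\n",
"    'SELECT * FROM Album WHERE ArtistId = ?',\n",
"    con=con,\n",
"    params=(1,),\n",
"    index_col='AlbumId',\n",
")\n",
"con.close()\n",
"albums_by_artist"
]
},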
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Podsumowanie\n",
|
||
"\n",
|
||
"\n",
|
||
" * Biblioteka `pandas` wspiera pobieranie danych z różnych formatów i źródeł.\n",
|
||
" * Każda funkcja ma listę argumentów, które pozwalają na ustawić poszczególne parametry (np. [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv))."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zapis i eksport danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Pandas pozwala w prosty sposób na zapisywanie ramki danych do pliku. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"ename": "ModuleNotFoundError",
|
||
"evalue": "No module named 'openpyxl'",
|
||
"output_type": "error",
|
||
"traceback": [
|
||
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
|
||
"\u001b[1;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
|
||
"\u001b[1;32mj:\\Python\\2023-programowanie-w-pythonie\\zajecia2\\data_analysis.ipynb Cell 47\u001b[0m line \u001b[0;36m4\n\u001b[0;32m <a href='vscode-notebook-cell:/j%3A/Python/2023-programowanie-w-pythonie/zajecia2/data_analysis.ipynb#X64sZmlsZQ%3D%3D?line=1'>2</a>\u001b[0m df\u001b[39m.\u001b[39mto_csv(\u001b[39m'\u001b[39m\u001b[39mtmp.csv\u001b[39m\u001b[39m'\u001b[39m)\n\u001b[0;32m <a href='vscode-notebook-cell:/j%3A/Python/2023-programowanie-w-pythonie/zajecia2/data_analysis.ipynb#X64sZmlsZQ%3D%3D?line=2'>3</a>\u001b[0m \u001b[39m# zapis do arkusza kalkulacyjnego \u001b[39;00m\n\u001b[1;32m----> <a href='vscode-notebook-cell:/j%3A/Python/2023-programowanie-w-pythonie/zajecia2/data_analysis.ipynb#X64sZmlsZQ%3D%3D?line=3'>4</a>\u001b[0m df\u001b[39m.\u001b[39;49mto_excel(\u001b[39m'\u001b[39;49m\u001b[39mtmp.xlsx\u001b[39;49m\u001b[39m'\u001b[39;49m)\n",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\site-packages\\pandas\\core\\generic.py:2252\u001b[0m, in \u001b[0;36mNDFrame.to_excel\u001b[1;34m(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, inf_rep, freeze_panes, storage_options)\u001b[0m\n\u001b[0;32m 2239\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mpandas\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39mio\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39mformats\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39mexcel\u001b[39;00m \u001b[39mimport\u001b[39;00m ExcelFormatter\n\u001b[0;32m 2241\u001b[0m formatter \u001b[39m=\u001b[39m ExcelFormatter(\n\u001b[0;32m 2242\u001b[0m df,\n\u001b[0;32m 2243\u001b[0m na_rep\u001b[39m=\u001b[39mna_rep,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 2250\u001b[0m inf_rep\u001b[39m=\u001b[39minf_rep,\n\u001b[0;32m 2251\u001b[0m )\n\u001b[1;32m-> 2252\u001b[0m formatter\u001b[39m.\u001b[39;49mwrite(\n\u001b[0;32m 2253\u001b[0m excel_writer,\n\u001b[0;32m 2254\u001b[0m sheet_name\u001b[39m=\u001b[39;49msheet_name,\n\u001b[0;32m 2255\u001b[0m startrow\u001b[39m=\u001b[39;49mstartrow,\n\u001b[0;32m 2256\u001b[0m startcol\u001b[39m=\u001b[39;49mstartcol,\n\u001b[0;32m 2257\u001b[0m freeze_panes\u001b[39m=\u001b[39;49mfreeze_panes,\n\u001b[0;32m 2258\u001b[0m engine\u001b[39m=\u001b[39;49mengine,\n\u001b[0;32m 2259\u001b[0m storage_options\u001b[39m=\u001b[39;49mstorage_options,\n\u001b[0;32m 2260\u001b[0m )\n",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\site-packages\\pandas\\io\\formats\\excel.py:934\u001b[0m, in \u001b[0;36mExcelFormatter.write\u001b[1;34m(self, writer, sheet_name, startrow, startcol, freeze_panes, engine, storage_options)\u001b[0m\n\u001b[0;32m 930\u001b[0m need_save \u001b[39m=\u001b[39m \u001b[39mFalse\u001b[39;00m\n\u001b[0;32m 931\u001b[0m \u001b[39melse\u001b[39;00m:\n\u001b[0;32m 932\u001b[0m \u001b[39m# error: Cannot instantiate abstract class 'ExcelWriter' with abstract\u001b[39;00m\n\u001b[0;32m 933\u001b[0m \u001b[39m# attributes 'engine', 'save', 'supported_extensions' and 'write_cells'\u001b[39;00m\n\u001b[1;32m--> 934\u001b[0m writer \u001b[39m=\u001b[39m ExcelWriter( \u001b[39m# type: ignore[abstract]\u001b[39;49;00m\n\u001b[0;32m 935\u001b[0m writer, engine\u001b[39m=\u001b[39;49mengine, storage_options\u001b[39m=\u001b[39;49mstorage_options\n\u001b[0;32m 936\u001b[0m )\n\u001b[0;32m 937\u001b[0m need_save \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m 939\u001b[0m \u001b[39mtry\u001b[39;00m:\n",
|
||
"File \u001b[1;32mc:\\software\\python3\\lib\\site-packages\\pandas\\io\\excel\\_openpyxl.py:56\u001b[0m, in \u001b[0;36mOpenpyxlWriter.__init__\u001b[1;34m(self, path, engine, date_format, datetime_format, mode, storage_options, if_sheet_exists, engine_kwargs, **kwargs)\u001b[0m\n\u001b[0;32m 43\u001b[0m \u001b[39mdef\u001b[39;00m \u001b[39m__init__\u001b[39m(\n\u001b[0;32m 44\u001b[0m \u001b[39mself\u001b[39m,\n\u001b[0;32m 45\u001b[0m path: FilePath \u001b[39m|\u001b[39m WriteExcelBuffer \u001b[39m|\u001b[39m ExcelWriter,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 54\u001b[0m ) \u001b[39m-\u001b[39m\u001b[39m>\u001b[39m \u001b[39mNone\u001b[39;00m:\n\u001b[0;32m 55\u001b[0m \u001b[39m# Use the openpyxl module as the Excel writer.\u001b[39;00m\n\u001b[1;32m---> 56\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39mopenpyxl\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39mworkbook\u001b[39;00m \u001b[39mimport\u001b[39;00m Workbook\n\u001b[0;32m 58\u001b[0m engine_kwargs \u001b[39m=\u001b[39m combine_kwargs(engine_kwargs, kwargs)\n\u001b[0;32m 60\u001b[0m \u001b[39msuper\u001b[39m()\u001b[39m.\u001b[39m\u001b[39m__init__\u001b[39m(\n\u001b[0;32m 61\u001b[0m path,\n\u001b[0;32m 62\u001b[0m mode\u001b[39m=\u001b[39mmode,\n\u001b[1;32m (...)\u001b[0m\n\u001b[0;32m 65\u001b[0m engine_kwargs\u001b[39m=\u001b[39mengine_kwargs,\n\u001b[0;32m 66\u001b[0m )\n",
|
||
"\u001b[1;31mModuleNotFoundError\u001b[0m: No module named 'openpyxl'"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"\n",
|
||
"# zapis do formatu CSV\n",
|
||
"df.to_csv('tmp.csv')\n",
|
||
"# zapis do arkusza kalkulacyjnego \n",
|
||
"df.to_excel('tmp.xlsx')"
|
||
]
|
||
},
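{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"A quick way to check an export is to read it back. The sketch below assumes nothing beyond pandas itself: it writes a small frame (a hypothetical two-row copy of the data above) to an in-memory text buffer instead of a file and reloads it with `pd.read_csv`, using `index_col=0` to restore the index that `to_csv` wrote as the first column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# hypothetical two-row frame used only for this illustration\n",
"bikes = pd.DataFrame({\n",
"    'members': {'May': 682758, 'June': 737011},\n",
"    'occasionals': {'May': 147898, 'June': 171494},\n",
"})\n",
"\n",
"buffer = io.StringIO()\n",
"bikes.to_csv(buffer)  # index labels go into the first, unnamed column\n",
"buffer.seek(0)        # rewind so the buffer can be read from the start\n",
"\n",
"restored = pd.read_csv(buffer, index_col=0)\n",
"restored"
]
},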
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Ponadto możemy przekonwertować ramkę danych do JSONa lub Pythonowego słownika:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"{\"members\":{\"May\":682758,\"June\":737011,\"July\":779511},\"occasionals\":{\"May\":147898,\"June\":171494,\"July\":194316}}\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(df.to_json())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"{'members': {'May': 682758, 'June': 737011, 'July': 779511}, 'occasionals': {'May': 147898, 'June': 171494, 'July': 194316}}\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(df.to_dict())\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Lub przekopiować dane do schowka:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.to_clipboard()"
|
||
]
|
||
},
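{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"Both `to_dict` and `to_json` accept an `orient` argument that controls the layout of the result. A minimal sketch, using a hypothetical two-row frame: `orient='records'` yields one dictionary per row (the index labels are dropped), while `orient='index'` keys the outer level by index label."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical two-row frame used only for this illustration\n",
"bikes = pd.DataFrame({\n",
"    'members': {'May': 682758, 'June': 737011},\n",
"    'occasionals': {'May': 147898, 'June': 171494},\n",
"})\n",
"\n",
"# one dict per row; the index labels are not included\n",
"print(bikes.to_dict(orient='records'))\n",
"# outer keys are the index labels, inner keys are the column names\n",
"print(bikes.to_json(orient='index'))"
]
},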
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
" "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
" * Przekonwertuj tabele `Customer` z bazy `Chinook.sqlite` do arkusza kalkulacyjnego. Plik wynikowy nazwij `customers.xlsx`.\n",
|
||
" * Tabela `Employee` zawiera informacje o pracownikach firmy Chinook. Wyswietl dane na ekranie i podaj miasta, w których mieszkają pracownicy.\n",
|
||
" * Tabela `Invoice` zawiera informacje o fakturach. Przekonwertuj kolumnę `BillingCountry` do pythonowego słownika, a następnie podaj najcześciej występującą wartość. Ile razy pojawiła się?\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"{0: 'Germany',\n",
|
||
" 1: 'Norway',\n",
|
||
" 2: 'Belgium',\n",
|
||
" 3: 'Canada',\n",
|
||
" 4: 'USA',\n",
|
||
" 5: 'Germany',\n",
|
||
" 6: 'Germany',\n",
|
||
" 7: 'France',\n",
|
||
" 8: 'France',\n",
|
||
" 9: 'Ireland',\n",
|
||
" 10: 'United Kingdom',\n",
|
||
" 11: 'Germany',\n",
|
||
" 12: 'USA',\n",
|
||
" 13: 'USA',\n",
|
||
" 14: 'USA',\n",
|
||
" 15: 'USA',\n",
|
||
" 16: 'USA',\n",
|
||
" 17: 'Canada',\n",
|
||
" 18: 'France',\n",
|
||
" 19: 'United Kingdom',\n",
|
||
" 20: 'Australia',\n",
|
||
" 21: 'Chile',\n",
|
||
" 22: 'India',\n",
|
||
" 23: 'Norway',\n",
|
||
" 24: 'Brazil',\n",
|
||
" 25: 'USA',\n",
|
||
" 26: 'Canada',\n",
|
||
" 27: 'Portugal',\n",
|
||
" 28: 'Germany',\n",
|
||
" 29: 'Germany',\n",
|
||
" 30: 'France',\n",
|
||
" 31: 'Netherlands',\n",
|
||
" 32: 'Chile',\n",
|
||
" 33: 'Brazil',\n",
|
||
" 34: 'Brazil',\n",
|
||
" 35: 'Canada',\n",
|
||
" 36: 'USA',\n",
|
||
" 37: 'USA',\n",
|
||
" 38: 'USA',\n",
|
||
" 39: 'Germany',\n",
|
||
" 40: 'Spain',\n",
|
||
" 41: 'Sweden',\n",
|
||
" 42: 'United Kingdom',\n",
|
||
" 43: 'Australia',\n",
|
||
" 44: 'India',\n",
|
||
" 45: 'Czech Republic',\n",
|
||
" 46: 'Canada',\n",
|
||
" 47: 'Canada',\n",
|
||
" 48: 'Canada',\n",
|
||
" 49: 'Canada',\n",
|
||
" 50: 'Portugal',\n",
|
||
" 51: 'Germany',\n",
|
||
" 52: 'Finland',\n",
|
||
" 53: 'United Kingdom',\n",
|
||
" 54: 'Belgium',\n",
|
||
" 55: 'Denmark',\n",
|
||
" 56: 'Brazil',\n",
|
||
" 57: 'Brazil',\n",
|
||
" 58: 'USA',\n",
|
||
" 59: 'USA',\n",
|
||
" 60: 'Canada',\n",
|
||
" 61: 'Ireland',\n",
|
||
" 62: 'Italy',\n",
|
||
" 63: 'Poland',\n",
|
||
" 64: 'Sweden',\n",
|
||
" 65: 'Australia',\n",
|
||
" 66: 'Germany',\n",
|
||
" 67: 'Brazil',\n",
|
||
" 68: 'USA',\n",
|
||
" 69: 'USA',\n",
|
||
" 70: 'USA',\n",
|
||
" 71: 'Canada',\n",
|
||
" 72: 'Portugal',\n",
|
||
" 73: 'France',\n",
|
||
" 74: 'Poland',\n",
|
||
" 75: 'Norway',\n",
|
||
" 76: 'Czech Republic',\n",
|
||
" 77: 'Austria',\n",
|
||
" 78: 'Denmark',\n",
|
||
" 79: 'Brazil',\n",
|
||
" 80: 'USA',\n",
|
||
" 81: 'USA',\n",
|
||
" 82: 'France',\n",
|
||
" 83: 'France',\n",
|
||
" 84: 'Hungary',\n",
|
||
" 85: 'Italy',\n",
|
||
" 86: 'Sweden',\n",
|
||
" 87: 'Chile',\n",
|
||
" 88: 'Austria',\n",
|
||
" 89: 'USA',\n",
|
||
" 90: 'USA',\n",
|
||
" 91: 'USA',\n",
|
||
" 92: 'USA',\n",
|
||
" 93: 'Canada',\n",
|
||
" 94: 'Germany',\n",
|
||
" 95: 'Hungary',\n",
|
||
" 96: 'India',\n",
|
||
" 97: 'Brazil',\n",
|
||
" 98: 'Canada',\n",
|
||
" 99: 'Czech Republic',\n",
|
||
" 100: 'Denmark',\n",
|
||
" 101: 'Canada',\n",
|
||
" 102: 'USA',\n",
|
||
" 103: 'Germany',\n",
|
||
" 104: 'France',\n",
|
||
" 105: 'France',\n",
|
||
" 106: 'France',\n",
|
||
" 107: 'Italy',\n",
|
||
" 108: 'United Kingdom',\n",
|
||
" 109: 'Canada',\n",
|
||
" 110: 'USA',\n",
|
||
" 111: 'USA',\n",
|
||
" 112: 'USA',\n",
|
||
" 113: 'USA',\n",
|
||
" 114: 'USA',\n",
|
||
" 115: 'Canada',\n",
|
||
" 116: 'France',\n",
|
||
" 117: 'Australia',\n",
|
||
" 118: 'Argentina',\n",
|
||
" 119: 'India',\n",
|
||
" 120: 'Brazil',\n",
|
||
" 121: 'Czech Republic',\n",
|
||
" 122: 'Brazil',\n",
|
||
" 123: 'USA',\n",
|
||
" 124: 'Portugal',\n",
|
||
" 125: 'Portugal',\n",
|
||
" 126: 'Germany',\n",
|
||
" 127: 'France',\n",
|
||
" 128: 'France',\n",
|
||
" 129: 'Poland',\n",
|
||
" 130: 'India',\n",
|
||
" 131: 'Brazil',\n",
|
||
" 132: 'Canada',\n",
|
||
" 133: 'USA',\n",
|
||
" 134: 'USA',\n",
|
||
" 135: 'USA',\n",
|
||
" 136: 'USA',\n",
|
||
" 137: 'Germany',\n",
|
||
" 138: 'Sweden',\n",
|
||
" 139: 'United Kingdom',\n",
|
||
" 140: 'United Kingdom',\n",
|
||
" 141: 'Argentina',\n",
|
||
" 142: 'Brazil',\n",
|
||
" 143: 'Austria',\n",
|
||
" 144: 'USA',\n",
|
||
" 145: 'Canada',\n",
|
||
" 146: 'Canada',\n",
|
||
" 147: 'Canada',\n",
|
||
" 148: 'Portugal',\n",
|
||
" 149: 'France',\n",
|
||
" 150: 'Hungary',\n",
|
||
" 151: 'United Kingdom',\n",
|
||
" 152: 'Denmark',\n",
|
||
" 153: 'Brazil',\n",
|
||
" 154: 'Brazil',\n",
|
||
" 155: 'Canada',\n",
|
||
" 156: 'USA',\n",
|
||
" 157: 'USA',\n",
|
||
" 158: 'Canada',\n",
|
||
" 159: 'Italy',\n",
|
||
" 160: 'Netherlands',\n",
|
||
" 161: 'Spain',\n",
|
||
" 162: 'United Kingdom',\n",
|
||
" 163: 'Argentina',\n",
|
||
" 164: 'Canada',\n",
|
||
" 165: 'Brazil',\n",
|
||
" 166: 'USA',\n",
|
||
" 167: 'USA',\n",
|
||
" 168: 'Canada',\n",
|
||
" 169: 'Canada',\n",
|
||
" 170: 'Portugal',\n",
|
||
" 171: 'France',\n",
|
||
" 172: 'Spain',\n",
|
||
" 173: 'Czech Republic',\n",
|
||
" 174: 'Czech Republic',\n",
|
||
" 175: 'Belgium',\n",
|
||
" 176: 'Brazil',\n",
|
||
" 177: 'Canada',\n",
|
||
" 178: 'USA',\n",
|
||
" 179: 'Canada',\n",
|
||
" 180: 'France',\n",
|
||
" 181: 'Finland',\n",
|
||
" 182: 'Ireland',\n",
|
||
" 183: 'Netherlands',\n",
|
||
" 184: 'United Kingdom',\n",
|
||
" 185: 'India',\n",
|
||
" 186: 'Belgium',\n",
|
||
" 187: 'USA',\n",
|
||
" 188: 'USA',\n",
|
||
" 189: 'USA',\n",
|
||
" 190: 'USA',\n",
|
||
" 191: 'Canada',\n",
|
||
" 192: 'Germany',\n",
|
||
" 193: 'Ireland',\n",
|
||
" 194: 'Brazil',\n",
|
||
" 195: 'Germany',\n",
|
||
" 196: 'Norway',\n",
|
||
" 197: 'Czech Republic',\n",
|
||
" 198: 'Brazil',\n",
|
||
" 199: 'USA',\n",
|
||
" 200: 'USA',\n",
|
||
" 201: 'France',\n",
|
||
" 202: 'France',\n",
|
||
" 203: 'France',\n",
|
||
" 204: 'Finland',\n",
|
||
" 205: 'Netherlands',\n",
|
||
" 206: 'United Kingdom',\n",
|
||
" 207: 'Norway',\n",
|
||
" 208: 'USA',\n",
|
||
" 209: 'USA',\n",
|
||
" 210: 'USA',\n",
|
||
" 211: 'USA',\n",
|
||
" 212: 'USA',\n",
|
||
" 213: 'Canada',\n",
|
||
" 214: 'France',\n",
|
||
" 215: 'Argentina',\n",
|
||
" 216: 'Chile',\n",
|
||
" 217: 'India',\n",
|
||
" 218: 'Germany',\n",
|
||
" 219: 'Czech Republic',\n",
|
||
" 220: 'Brazil',\n",
|
||
" 221: 'USA',\n",
|
||
" 222: 'Portugal',\n",
|
||
" 223: 'Germany',\n",
|
||
" 224: 'Germany',\n",
|
||
" 225: 'France',\n",
|
||
" 226: 'Finland',\n",
|
||
" 227: 'Spain',\n",
|
||
" 228: 'India',\n",
|
||
" 229: 'Canada',\n",
|
||
" 230: 'Canada',\n",
|
||
" 231: 'USA',\n",
|
||
" 232: 'USA',\n",
|
||
" 233: 'USA',\n",
|
||
" 234: 'Canada',\n",
|
||
" 235: 'Germany',\n",
|
||
" 236: 'United Kingdom',\n",
|
||
" 237: 'United Kingdom',\n",
|
||
" 238: 'Australia',\n",
|
||
" 239: 'Chile',\n",
|
||
" 240: 'Germany',\n",
|
||
" 241: 'Belgium',\n",
|
||
" 242: 'USA',\n",
|
||
" 243: 'Canada',\n",
|
||
" 244: 'Canada',\n",
|
||
" 245: 'Portugal',\n",
|
||
" 246: 'Germany',\n",
|
||
" 247: 'France',\n",
|
||
" 248: 'Ireland',\n",
|
||
" 249: 'Australia',\n",
|
||
" 250: 'Brazil',\n",
|
||
" 251: 'Brazil',\n",
|
||
" 252: 'Brazil',\n",
|
||
" 253: 'Canada',\n",
|
||
" 254: 'USA',\n",
|
||
" 255: 'USA',\n",
|
||
" 256: 'Portugal',\n",
|
||
" 257: 'Netherlands',\n",
|
||
" 258: 'Poland',\n",
|
||
" 259: 'Sweden',\n",
|
||
" 260: 'United Kingdom',\n",
|
||
" 261: 'Chile',\n",
|
||
" 262: 'Norway',\n",
|
||
" 263: 'Brazil',\n",
|
||
" 264: 'USA',\n",
|
||
" 265: 'USA',\n",
|
||
" 266: 'Canada',\n",
|
||
" 267: 'Canada',\n",
|
||
" 268: 'Germany',\n",
|
||
" 269: 'France',\n",
|
||
" 270: 'Sweden',\n",
|
||
" 271: 'Czech Republic',\n",
|
||
" 272: 'Austria',\n",
|
||
" 273: 'Denmark',\n",
|
||
" 274: 'Brazil',\n",
|
||
" 275: 'Canada',\n",
|
||
" 276: 'USA',\n",
|
||
" 277: 'Canada',\n",
|
||
" 278: 'Finland',\n",
|
||
" 279: 'Hungary',\n",
|
||
" 280: 'Italy',\n",
|
||
" 281: 'Poland',\n",
|
||
" 282: 'United Kingdom',\n",
|
||
" 283: 'India',\n",
|
||
" 284: 'Denmark',\n",
|
||
" 285: 'USA',\n",
|
||
" 286: 'USA',\n",
|
||
" 287: 'USA',\n",
|
||
" 288: 'USA',\n",
|
||
" 289: 'Canada',\n",
|
||
" 290: 'Germany',\n",
|
||
" 291: 'Italy',\n",
|
||
" 292: 'Germany',\n",
|
||
" 293: 'Canada',\n",
|
||
" 294: 'Czech Republic',\n",
|
||
" 295: 'Austria',\n",
|
||
" 296: 'Brazil',\n",
|
||
" 297: 'USA',\n",
|
||
" 298: 'USA',\n",
|
||
" 299: 'France',\n",
|
||
" 300: 'France',\n",
|
||
" 301: 'France',\n",
|
||
" 302: 'Hungary',\n",
|
||
" 303: 'Poland',\n",
|
||
" 304: 'Australia',\n",
|
||
" 305: 'Czech Republic',\n",
|
||
" 306: 'USA',\n",
|
||
" 307: 'USA',\n",
|
||
" 308: 'USA',\n",
|
||
" 309: 'USA',\n",
|
||
" 310: 'USA',\n",
|
||
" 311: 'Portugal',\n",
|
||
" 312: 'France',\n",
|
||
" 313: 'Chile',\n",
|
||
" 314: 'India',\n",
|
||
" 315: 'Brazil',\n",
|
||
" 316: 'Canada',\n",
|
||
" 317: 'Austria',\n",
|
||
" 318: 'Brazil',\n",
|
||
" 319: 'USA',\n",
|
||
" 320: 'Germany',\n",
|
||
" 321: 'Germany',\n",
|
||
" 322: 'France',\n",
|
||
" 323: 'France',\n",
|
||
" 324: 'Hungary',\n",
|
||
" 325: 'Sweden',\n",
|
||
" 326: 'Brazil',\n",
|
||
" 327: 'Canada',\n",
|
||
" 328: 'USA',\n",
|
||
" 329: 'USA',\n",
|
||
" 330: 'USA',\n",
|
||
" 331: 'USA',\n",
|
||
" 332: 'Canada',\n",
|
||
" 333: 'France',\n",
|
||
" 334: 'United Kingdom',\n",
|
||
" 335: 'United Kingdom',\n",
|
||
" 336: 'Argentina',\n",
|
||
" 337: 'India',\n",
|
||
" 338: 'Canada',\n",
|
||
" 339: 'Denmark',\n",
|
||
" 340: 'USA',\n",
|
||
" 341: 'Canada',\n",
|
||
" 342: 'Canada',\n",
|
||
" 343: 'Portugal',\n",
|
||
" 344: 'Germany',\n",
|
||
" 345: 'France',\n",
|
||
" 346: 'Italy',\n",
|
||
" 347: 'Argentina',\n",
|
||
" 348: 'Brazil',\n",
|
||
" 349: 'Brazil',\n",
|
||
" 350: 'Canada',\n",
|
||
" 351: 'USA',\n",
|
||
" 352: 'USA',\n",
|
||
" 353: 'USA',\n",
|
||
" 354: 'Portugal',\n",
|
||
" 355: 'Poland',\n",
|
||
" 356: 'Spain',\n",
|
||
" 357: 'United Kingdom',\n",
|
||
" 358: 'United Kingdom',\n",
|
||
" 359: 'India',\n",
|
||
" 360: 'Czech Republic',\n",
|
||
" 361: 'Canada',\n",
|
||
" 362: 'USA',\n",
|
||
" 363: 'Canada',\n",
|
||
" 364: 'Canada',\n",
|
||
" 365: 'Canada',\n",
|
||
" 366: 'Germany',\n",
|
||
" 367: 'France',\n",
|
||
" 368: 'United Kingdom',\n",
|
||
" 369: 'Austria',\n",
|
||
" 370: 'Belgium',\n",
|
||
" 371: 'Brazil',\n",
|
||
" 372: 'Brazil',\n",
|
||
" 373: 'USA',\n",
|
||
" 374: 'USA',\n",
|
||
" 375: 'Canada',\n",
|
||
" 376: 'Hungary',\n",
|
||
" 377: 'Ireland',\n",
|
||
" 378: 'Netherlands',\n",
|
||
" 379: 'Spain',\n",
|
||
" 380: 'United Kingdom',\n",
|
||
" 381: 'Brazil',\n",
|
||
" 382: 'Brazil',\n",
|
||
" 383: 'USA',\n",
|
||
" 384: 'USA',\n",
|
||
" 385: 'USA',\n",
|
||
" 386: 'Canada',\n",
|
||
" 387: 'Canada',\n",
|
||
" 388: 'France',\n",
|
||
" 389: 'Netherlands',\n",
|
||
" 390: 'Canada',\n",
|
||
" 391: 'Norway',\n",
|
||
" 392: 'Czech Republic',\n",
|
||
" 393: 'Belgium',\n",
|
||
" 394: 'Brazil',\n",
|
||
" 395: 'USA',\n",
|
||
" 396: 'USA',\n",
|
||
" 397: 'France',\n",
|
||
" 398: 'France',\n",
|
||
" 399: 'Finland',\n",
|
||
" 400: 'Ireland',\n",
|
||
" 401: 'Spain',\n",
|
||
" 402: 'Argentina',\n",
|
||
" 403: 'Czech Republic',\n",
|
||
" 404: 'USA',\n",
|
||
" 405: 'USA',\n",
|
||
" 406: 'USA',\n",
|
||
" 407: 'USA',\n",
|
||
" 408: 'Canada',\n",
|
||
" 409: 'Portugal',\n",
|
||
" 410: 'Finland',\n",
|
||
" 411: 'India'}"
|
||
]
|
||
},
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import sqlalchemy\n",
|
||
"\n",
|
||
"engine = sqlalchemy.create_engine('sqlite:///Chinook.sqlite', echo=True)\n",
|
||
"connection = engine.raw_connection()\n",
|
||
"\n",
|
||
"df = pd.read_sql('SELECT * FROM Customer', con='sqlite:///Chinook.sqlite')\n",
|
||
"df.to_csv('customers.csv')\n",
|
||
"\n",
|
||
"df = pd.read_sql('SELECT * FROM Employee', con='sqlite:///Chinook.sqlite')\n",
|
||
"# df['City'].drop_duplicates()\n",
|
||
"\n",
|
||
"df = pd.read_sql('SELECT * FROM Invoice', con='sqlite:///Chinook.sqlite')\n",
|
||
"df['BillingCountry'].drop_duplicates().to_dict()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Ramka danych - podstawy"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Kolumny\n",
|
||
"\n",
|
||
"Na ramkę danych możemy patrzeć jak na swego rodzaju słownik, którego wartościami są szeregi. Pozwoli to na uzyskanie lepszej intuicji.\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" <td>72.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" <td>81.6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" gdp population life_expectancy\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0 52.8\n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7\n",
|
||
"Antigua and Barbuda 25736.0 85350.0 75.5\n",
|
||
"Argentina 14646.0 40381860.0 75.4\n",
|
||
"Armenia 7383.0 2975029.0 72.3\n",
|
||
"Australia 41312.0 21370348.0 81.6"
|
||
]
|
||
},
|
||
"execution_count": 22,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=8, usecols=['Country', 'gdp', 'population','life_expectancy'])\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Dostęp do poszczególnej kolumny możemy uzystać na dwa sposoby:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan 26528741.0\n",
|
||
"Albania 2968026.0\n",
|
||
"Algeria 34811059.0\n",
|
||
"Angola 19842251.0\n",
|
||
"Antigua and Barbuda 85350.0\n",
|
||
"Argentina 40381860.0\n",
|
||
"Armenia 2975029.0\n",
|
||
"Australia 21370348.0\n",
|
||
"Name: population, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 23,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# notacja z kropką\n",
|
||
"df.population"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan 26528741.0\n",
|
||
"Albania 2968026.0\n",
|
||
"Algeria 34811059.0\n",
|
||
"Angola 19842251.0\n",
|
||
"Antigua and Barbuda 85350.0\n",
|
||
"Argentina 40381860.0\n",
|
||
"Armenia 2975029.0\n",
|
||
"Australia 21370348.0\n",
|
||
"Name: population, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 24,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Operator []\n",
|
||
"df['population']"
|
||
]
|
||
},
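The two access styles shown above are not fully interchangeable. The following is a minimal sketch on a small hypothetical frame (the column names and values are made up for illustration): dot notation only works when the column name is a valid Python identifier and does not clash with an existing `DataFrame` attribute, while `[]` works for any column name.

```python
import pandas as pd

# Hypothetical data, for illustration only.
frame = pd.DataFrame({
    "population": [10, 20],
    "life expectancy": [70.5, 81.2],  # name contains a space
    "count": [1, 2],                  # name clashes with DataFrame.count
})

# For a plain identifier, both styles return the same Series.
same = frame.population.equals(frame["population"])

# A column name with a space can only be reached with [].
life = frame["life expectancy"]

# frame.count is the DataFrame.count method, not the column ...
count_is_method = callable(frame.count)
# ... but [] still reaches the column.
count_column = frame["count"]

print(same, count_is_method, list(count_column))
```

For this reason `[]` is usually the safer choice in scripts, and dot notation is mostly a convenience for interactive work.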
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Do operatora `[]` możemy też podać listę nazw kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" gdp population\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0\n",
|
||
"Albania 8644.0 2968026.0\n",
|
||
"Algeria 12314.0 34811059.0\n",
|
||
"Angola 7103.0 19842251.0\n",
|
||
"Antigua and Barbuda 25736.0 85350.0\n",
|
||
"Argentina 14646.0 40381860.0\n",
|
||
"Armenia 7383.0 2975029.0\n",
|
||
"Australia 41312.0 21370348.0"
|
||
]
|
||
},
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['gdp','population']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Listę kolumn możemy pobrać za pomocą:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Index(['gdp', 'population', 'life_expectancy'], dtype='object')"
|
||
]
|
||
},
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.columns"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" <td>72.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" <td>81.6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0 52.8\n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7\n",
|
||
"Antigua and Barbuda 25736.0 85350.0 75.5\n",
|
||
"Argentina 14646.0 40381860.0 75.4\n",
|
||
"Armenia 7383.0 2975029.0 72.3\n",
|
||
"Australia 41312.0 21370348.0 81.6"
|
||
]
|
||
},
|
||
"execution_count": 27,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.columns = ['PKB', 'Populacja', 'ODŻ']\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Żeby odwołać się do poszczególnych wierszy należy wykorzystać metodę `loc`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PKB 14646.0\n",
|
||
"Populacja 40381860.0\n",
|
||
"ODŻ 75.4\n",
|
||
"Name: Argentina, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc['Argentina']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Metoda `loc` również może przyjąć listę wierszy: "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Angola 7103.0 19842251.0 56.7"
|
||
]
|
||
},
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[['Albania', 'Angola']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Możemy również podać drugi parametr: nazwy kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0\n",
|
||
"Angola 7103.0 19842251.0"
|
||
]
|
||
},
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df2 = df.loc[['Albania', 'Angola'], ['PKB', 'Populacja']]\n",
|
||
"\n",
|
||
"df2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Albo wykorzystać tzw. _slicing_, cyzli operator `:`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7"
|
||
]
|
||
},
|
||
"execution_count": 31,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc['Albania': 'Angola', 'PKB': 'ODŻ']"
|
||
]
|
||
},
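Label-based slicing with `loc` behaves differently from ordinary Python slicing: the end label is part of the result. The sketch below contrasts it with positional slicing through `iloc`, which is not used elsewhere in this section and is brought in here only for comparison. The frame reuses a few GDP values from the table above.

```python
import pandas as pd

frame = pd.DataFrame(
    {"gdp": [1311.0, 8644.0, 12314.0, 7103.0]},
    index=["Afghanistan", "Albania", "Algeria", "Angola"],
)

# Label slice: both "Albania" and "Angola" are included.
by_label = frame.loc["Albania":"Angola"]

# Positional slice: position 3 is excluded, as in a Python list.
by_position = frame.iloc[1:3]

print(list(by_label.index))
print(list(by_position.index))
```

Here the `loc` slice contains three rows and the `iloc` slice only two, even though both start at the same row.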
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Żeby odwołać się do pojedyńczej wartości możemy użyć metody `at`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"7103.0"
|
||
]
|
||
},
|
||
"execution_count": 32,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.at['Angola', 'PKB']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Dostęp do indeksu:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Index(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Antigua and Barbuda',\n",
|
||
" 'Argentina', 'Armenia', 'Australia'],\n",
|
||
" dtype='object', name='Country')"
|
||
]
|
||
},
|
||
"execution_count": 33,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.index"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Podstawowe metody `pd.Series` i `pd.DataFrame`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492\n",
|
||
"October 444177 53596"
|
||
]
|
||
},
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
|
||
"'September': 140492, 'October': 53596})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `head` pozwala tworzy nową ramkę danych z pierwszymi 5 przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492"
|
||
]
|
||
},
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `tail` robi to samo, ale z 5 ostatnymi przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492\n",
|
||
"October 444177 53596"
|
||
]
|
||
},
|
||
"execution_count": 36,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.tail()"
|
||
]
|
||
},
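Both `head` and `tail` take an optional number of rows, and they work on a `pd.Series` as well as on a data frame. A short sketch using the `members` series defined earlier:

```python
import pandas as pd

members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511,
                     'August': 673790, 'September': 673790, 'October': 444177})

first_two = members.head(2)    # first 2 entries
last_three = members.tail(3)   # last 3 entries

print(first_two)
print(last_three)
```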
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `sample` pozwala na stworzenie nowej ramki danych z wylosowanymi `n` przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"October 444177 53596\n",
|
||
"June 737011 171494\n",
|
||
"May 682758 147898"
|
||
]
|
||
},
|
||
"execution_count": 37,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.sample(3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `describe` zwraca podstawowe statystyki m.in.: liczebność, średnią, wartości skrajne: "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>count</th>\n",
|
||
" <td>6.000000</td>\n",
|
||
" <td>6.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>mean</th>\n",
|
||
" <td>665172.833333</td>\n",
|
||
" <td>152434.166667</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>std</th>\n",
|
||
" <td>116216.045456</td>\n",
|
||
" <td>54783.506738</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>min</th>\n",
|
||
" <td>444177.000000</td>\n",
|
||
" <td>53596.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>25%</th>\n",
|
||
" <td>673790.000000</td>\n",
|
||
" <td>142343.500000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>50%</th>\n",
|
||
" <td>678274.000000</td>\n",
|
||
" <td>159696.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>75%</th>\n",
|
||
" <td>723447.750000</td>\n",
|
||
" <td>188610.500000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>max</th>\n",
|
||
" <td>779511.000000</td>\n",
|
||
" <td>206809.000000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"count 6.000000 6.000000\n",
|
||
"mean 665172.833333 152434.166667\n",
|
||
"std 116216.045456 54783.506738\n",
|
||
"min 444177.000000 53596.000000\n",
|
||
"25% 673790.000000 142343.500000\n",
|
||
"50% 678274.000000 159696.000000\n",
|
||
"75% 723447.750000 188610.500000\n",
|
||
"max 779511.000000 206809.000000"
|
||
]
|
||
},
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.describe()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `info` zwraca informacje techniczne o kolumnach: np. typ danych:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
"Index: 6 entries, May to October\n",
|
||
"Data columns (total 2 columns):\n",
|
||
" # Column Non-Null Count Dtype\n",
|
||
"--- ------ -------------- -----\n",
|
||
" 0 members 6 non-null int64\n",
|
||
" 1 occasionals 6 non-null int64\n",
|
||
"dtypes: int64(2)\n",
|
||
"memory usage: 144.0+ bytes\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"df.info()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Podstawową informacją o ramce danych to liczba przykładów w ramce danych. Możemy wykorzystać to tego funkcję `len`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"6"
|
||
]
|
||
},
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(df)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Natomiast atrybut `shape` zwraca nam krotkę z liczbą przykładów i liczbą kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(6, 2)"
|
||
]
|
||
},
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Operacja arytmetyczne\n",
|
||
"\n",
|
||
" * `max`, `idxmax`\n",
|
||
" * `min`, `idxmin`\n",
|
||
" * `mean`\n",
|
||
" * `count`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 42,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"members 665172.833333\n",
|
||
"occasionals 152434.166667\n",
|
||
"dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 42,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Zbiór wartości i zliczanie wartości:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"[1 3 2]\n",
|
||
"3 4\n",
|
||
"1 3\n",
|
||
"2 3\n",
|
||
"dtype: int64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
|
||
"\n",
|
||
"print(dane.unique())\n",
|
||
"\n",
|
||
"dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
|
||
"\n",
|
||
"print(dane.value_counts())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Sprawdzanie czy brakuje danych:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 False\n",
|
||
"2 False\n",
|
||
"3 False\n",
|
||
"4 False\n",
|
||
"5 False\n",
|
||
" ... \n",
|
||
"887 False\n",
|
||
"888 False\n",
|
||
"889 True\n",
|
||
"890 False\n",
|
||
"891 False\n",
|
||
"Name: Age, Length: 891, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 44,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"df.Age.isnull()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Dodawanie i modyfikowanie danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI gdp population \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility \n",
|
||
"Country \n",
|
||
"Afghanistan 110.4 52.8 6.20 \n",
|
||
"Albania 17.9 76.8 1.76 \n",
|
||
"Algeria 29.5 75.5 2.73 \n",
|
||
"Angola 192.0 56.7 6.43 \n",
|
||
"Antigua and Barbuda 10.9 75.5 2.16 "
|
||
]
|
||
},
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" <th>continent</th>\n",
|
||
" <th>tmp</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" <td>Asia</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" <td>Europe</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" <td>Americas</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI gdp population \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility continent \\\n",
|
||
"Country \n",
|
||
"Afghanistan 110.4 52.8 6.20 Asia \n",
|
||
"Albania 17.9 76.8 1.76 Europe \n",
|
||
"Algeria 29.5 75.5 2.73 Africa \n",
|
||
"Angola 192.0 56.7 6.43 Africa \n",
|
||
"Antigua and Barbuda 10.9 75.5 2.16 Americas \n",
|
||
"\n",
|
||
" tmp \n",
|
||
"Country \n",
|
||
"Afghanistan 1 \n",
|
||
"Albania 1 \n",
|
||
"Algeria 1 \n",
|
||
"Angola 1 \n",
|
||
"Antigua and Barbuda 1 "
|
||
]
|
||
},
|
||
"execution_count": 46,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"conts = pd.Series({\n",
|
||
" 'Afghanistan': 'Asia', 'Albania': 'Europe', 'Algeria':' Africa', 'Angola': 'Africa', 'Antigua and Barbuda': 'Americas'})\n",
|
||
"\n",
|
||
"df['continent'] = conts\n",
|
||
"\n",
|
||
"df['tmp'] = 1\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 47,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" <th>continent</th>\n",
|
||
" <th>tmp</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" <td>Asia</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" <td>Europe</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" <td>Americas</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>27.46523</td>\n",
|
||
" <td>27.50170</td>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>15.4</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" <td>2.24</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI gdp population \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
"Argentina 27.46523 27.50170 14646.0 40381860.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility continent \\\n",
|
||
"Country \n",
|
||
"Afghanistan 110.4 52.8 6.20 Asia \n",
|
||
"Albania 17.9 76.8 1.76 Europe \n",
|
||
"Algeria 29.5 75.5 2.73 Africa \n",
|
||
"Angola 192.0 56.7 6.43 Africa \n",
|
||
"Antigua and Barbuda 10.9 75.5 2.16 Americas \n",
|
||
"Argentina 15.4 75.4 2.24 NaN \n",
|
||
"\n",
|
||
" tmp \n",
|
||
"Country \n",
|
||
"Afghanistan 1.0 \n",
|
||
"Albania 1.0 \n",
|
||
"Algeria 1.0 \n",
|
||
"Angola 1.0 \n",
|
||
"Antigua and Barbuda 1.0 \n",
|
||
"Argentina NaN "
|
||
]
|
||
},
|
||
"execution_count": 47,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc['Argentina'] = {\n",
|
||
" 'female_BMI': 27.46523,\n",
|
||
" 'male_BMI': 27.5017,\n",
|
||
" 'gdp': 14646.0,\n",
|
||
" 'population': 40381860.0,\n",
|
||
" 'under5mortality': 15.4,\n",
|
||
" 'life_expectancy': 75.4,\n",
|
||
" 'fertility': 2.24\n",
|
||
"}\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 48,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" <th>continent</th>\n",
|
||
" <th>tmp</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" <td>Asia</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" <td>Europe</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" <td>Americas</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>27.46523</td>\n",
|
||
" <td>27.50170</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>15.4</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" <td>2.24</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI population under5mortality \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 26528741.0 110.4 \n",
|
||
"Albania 25.65726 26.44657 2968026.0 17.9 \n",
|
||
"Algeria 26.36841 24.59620 34811059.0 29.5 \n",
|
||
"Angola 23.48431 22.25083 19842251.0 192.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 85350.0 10.9 \n",
|
||
"Argentina 27.46523 27.50170 40381860.0 15.4 \n",
|
||
"\n",
|
||
" life_expectancy fertility continent tmp \n",
|
||
"Country \n",
|
||
"Afghanistan 52.8 6.20 Asia 1.0 \n",
|
||
"Albania 76.8 1.76 Europe 1.0 \n",
|
||
"Algeria 75.5 2.73 Africa 1.0 \n",
|
||
"Angola 56.7 6.43 Africa 1.0 \n",
|
||
"Antigua and Barbuda 75.5 2.16 Americas 1.0 \n",
|
||
"Argentina 75.4 2.24 NaN NaN "
|
||
]
|
||
},
|
||
"execution_count": 48,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.drop('gdp', axis='columns')\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Filtrowanie danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Biblioteka pandas posiada 2 sposoby na filtrowanie danych zawartych w ramce danych:\n",
|
||
" * operator `[]` -- najbardziej rozpowszechniony;\n",
|
||
" * metoda `query()`.\n",
|
||
"Oba sposoby mają różną składnię.\n",
|
||
" "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 49,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Heikkinen\\t Miss. Laina</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>STON/O2. 3101282</td>\n",
|
||
" <td>7.9250</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Allen\\t Mr. William Henry</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>373450</td>\n",
|
||
" <td>8.0500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"3 1 3 \n",
|
||
"4 1 1 \n",
|
||
"5 0 3 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"3 Heikkinen\\t Miss. Laina female 26.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"5 Allen\\t Mr. William Henry male 35.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"5 0 0 373450 8.0500 NaN S "
|
||
]
|
||
},
|
||
"execution_count": 49,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"\n",
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 0\n",
|
||
"2 1\n",
|
||
"3 1\n",
|
||
"4 1\n",
|
||
"5 0\n",
|
||
" ..\n",
|
||
"887 0\n",
|
||
"888 1\n",
|
||
"889 0\n",
|
||
"890 1\n",
|
||
"891 0\n",
|
||
"Name: Survived, Length: 891, dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 50,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df['Survived']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 51,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 False\n",
|
||
"2 True\n",
|
||
"3 True\n",
|
||
"4 True\n",
|
||
"5 False\n",
|
||
" ... \n",
|
||
"887 False\n",
|
||
"888 True\n",
|
||
"889 False\n",
|
||
"890 True\n",
|
||
"891 False\n",
|
||
"Name: Survived, Length: 891, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 51,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df['Survived'] == 1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>McCarthy\\t Mr. Timothy J</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>54.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17463</td>\n",
|
||
" <td>51.8625</td>\n",
|
||
" <td>E46</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>24</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Sloper\\t Mr. William Thompson</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113788</td>\n",
|
||
" <td>35.5000</td>\n",
|
||
" <td>A6</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>872</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>47.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11751</td>\n",
|
||
" <td>52.5542</td>\n",
|
||
" <td>D35</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>873</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Carlsson\\t Mr. Frans Olof</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>33.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>695</td>\n",
|
||
" <td>5.0000</td>\n",
|
||
" <td>B51 B53 B55</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>880</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>56.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11767</td>\n",
|
||
" <td>83.1583</td>\n",
|
||
" <td>C50</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>888</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Graham\\t Miss. Margaret Edith</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>19.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>112053</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>B42</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>890</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Behr\\t Mr. Karl Howell</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>111369</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>C148</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>216 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"7 0 1 \n",
|
||
"12 1 1 \n",
|
||
"24 1 1 \n",
|
||
"... ... ... \n",
|
||
"872 1 1 \n",
|
||
"873 0 1 \n",
|
||
"880 1 1 \n",
|
||
"888 1 1 \n",
|
||
"890 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"7 McCarthy\\t Mr. Timothy J male 54.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"24 Sloper\\t Mr. William Thompson male 28.0 \n",
|
||
"... ... ... ... \n",
|
||
"872 Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny) female 47.0 \n",
|
||
"873 Carlsson\\t Mr. Frans Olof male 33.0 \n",
|
||
"880 Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 \n",
|
||
"888 Graham\\t Miss. Margaret Edith female 19.0 \n",
|
||
"890 Behr\\t Mr. Karl Howell male 26.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"7 0 0 17463 51.8625 E46 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"24 0 0 113788 35.5000 A6 S \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"872 1 1 11751 52.5542 D35 S \n",
|
||
"873 0 0 695 5.0000 B51 B53 B55 S \n",
|
||
"880 0 1 11767 83.1583 C50 C \n",
|
||
"888 0 0 112053 30.0000 B42 S \n",
|
||
"890 0 0 111369 30.0000 C148 C \n",
|
||
"\n",
|
||
"[216 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[df['Pclass'] == 1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Operatory\n",
|
||
"\n",
|
||
"* `&` - koniukcja (i)\n",
|
||
"* `|` - alternatywa (lub)\n",
|
||
"* `~` - negacja (nie)\n",
|
||
"* `()` - jeżeli mamy kilka warunków to warto je uporządkować w nawiasy"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 53,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>32</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17569</td>\n",
|
||
" <td>146.5208</td>\n",
|
||
" <td>B78</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>53</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>49.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17572</td>\n",
|
||
" <td>76.7292</td>\n",
|
||
" <td>D33</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>857</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wick\\t Mrs. George Dennick (Mary Hitchcock)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>45.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>36928</td>\n",
|
||
" <td>164.8667</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>863</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Swift\\t Mrs. Frederick Joel (Margaret Welles B...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>48.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17466</td>\n",
|
||
" <td>25.9292</td>\n",
|
||
" <td>D17</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>872</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>47.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11751</td>\n",
|
||
" <td>52.5542</td>\n",
|
||
" <td>D35</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>880</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>56.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11767</td>\n",
|
||
" <td>83.1583</td>\n",
|
||
" <td>C50</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>888</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Graham\\t Miss. Margaret Edith</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>19.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>112053</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>B42</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>94 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"12 1 1 \n",
|
||
"32 1 1 \n",
|
||
"53 1 1 \n",
|
||
"... ... ... \n",
|
||
"857 1 1 \n",
|
||
"863 1 1 \n",
|
||
"872 1 1 \n",
|
||
"880 1 1 \n",
|
||
"888 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"32 Spencer\\t Mrs. William Augustus (Marie Eugenie) female NaN \n",
|
||
"53 Harper\\t Mrs. Henry Sleeper (Myna Haxtun) female 49.0 \n",
|
||
"... ... ... ... \n",
|
||
"857 Wick\\t Mrs. George Dennick (Mary Hitchcock) female 45.0 \n",
|
||
"863 Swift\\t Mrs. Frederick Joel (Margaret Welles B... female 48.0 \n",
|
||
"872 Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny) female 47.0 \n",
|
||
"880 Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 \n",
|
||
"888 Graham\\t Miss. Margaret Edith female 19.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"32 1 0 PC 17569 146.5208 B78 C \n",
|
||
"53 1 0 PC 17572 76.7292 D33 C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"857 1 1 36928 164.8667 NaN S \n",
|
||
"863 0 0 17466 25.9292 D17 S \n",
|
||
"872 1 1 11751 52.5542 D35 S \n",
|
||
"880 0 1 11767 83.1583 C50 C \n",
|
||
"888 0 0 112053 30.0000 B42 S \n",
|
||
"\n",
|
||
"[94 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 53,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pierwsza_klasa = df['Pclass'] == 1\n",
|
||
"kobiety = df['Sex'] == 'female'\n",
|
||
"\n",
|
||
"df[pierwsza_klasa & kobiety]\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 54,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Palsson\\t Master. Gosta Leonard</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>349909</td>\n",
|
||
" <td>21.0750</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>14.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>237736</td>\n",
|
||
" <td>30.0708</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>861</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Hansen\\t Mr. Claus Peter</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>41.0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>350026</td>\n",
|
||
" <td>14.1083</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>862</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Giles\\t Mr. Frederick Edward</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>21.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>28134</td>\n",
|
||
" <td>11.5000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>864</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>CA. 2343</td>\n",
|
||
" <td>69.5500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>867</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Duran y More\\t Miss. Asuncion</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>27.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>SC/PARIS 2149</td>\n",
|
||
" <td>13.8583</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>875</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>P/PP 3381</td>\n",
|
||
" <td>24.0000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>192 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"8 0 3 \n",
|
||
"10 1 2 \n",
|
||
"... ... ... \n",
|
||
"861 0 3 \n",
|
||
"862 0 2 \n",
|
||
"864 0 3 \n",
|
||
"867 1 2 \n",
|
||
"875 1 2 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"8 Palsson\\t Master. Gosta Leonard male 2.0 \n",
|
||
"10 Nasser\\t Mrs. Nicholas (Adele Achem) female 14.0 \n",
|
||
"... ... ... ... \n",
|
||
"861 Hansen\\t Mr. Claus Peter male 41.0 \n",
|
||
"862 Giles\\t Mr. Frederick Edward male 21.0 \n",
|
||
"864 Sage\\t Miss. Dorothy Edith \"Dolly\" female NaN \n",
|
||
"867 Duran y More\\t Miss. Asuncion female 27.0 \n",
|
||
"875 Abelson\\t Mrs. Samuel (Hannah Wizosky) female 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"8 3 1 349909 21.0750 NaN S \n",
|
||
"10 1 0 237736 30.0708 NaN C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"861 2 0 350026 14.1083 NaN S \n",
|
||
"862 1 0 28134 11.5000 NaN S \n",
|
||
"864 8 2 CA. 2343 69.5500 NaN S \n",
|
||
"867 1 0 SC/PARIS 2149 13.8583 NaN C \n",
|
||
"875 1 0 P/PP 3381 24.0000 NaN C \n",
|
||
"\n",
|
||
"[192 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 54,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"\n",
|
||
"df[df['SibSp'] > df['Parch']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### `pd.DataFrame.query`\n",
|
||
"\n",
|
||
"Innym sposobem na filtrowanie danych jest metoda `query`, która jako argument przyjmuje wyrażenie:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>McCarthy\\t Mr. Timothy J</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>54.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17463</td>\n",
|
||
" <td>51.8625</td>\n",
|
||
" <td>E46</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>24</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Sloper\\t Mr. William Thompson</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113788</td>\n",
|
||
" <td>35.5000</td>\n",
|
||
" <td>A6</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"7 0 1 \n",
|
||
"12 1 1 \n",
|
||
"24 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"7 McCarthy\\t Mr. Timothy J male 54.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"24 Sloper\\t Mr. William Thompson male 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"7 0 0 17463 51.8625 E46 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"24 0 0 113788 35.5000 A6 S "
|
||
]
|
||
},
|
||
"execution_count": 55,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('Pclass == 1').head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>32</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17569</td>\n",
|
||
" <td>146.5208</td>\n",
|
||
" <td>B78</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>53</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>49.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17572</td>\n",
|
||
" <td>76.7292</td>\n",
|
||
" <td>D33</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"12 1 1 \n",
|
||
"32 1 1 \n",
|
||
"53 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"32 Spencer\\t Mrs. William Augustus (Marie Eugenie) female NaN \n",
|
||
"53 Harper\\t Mrs. Henry Sleeper (Myna Haxtun) female 49.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"32 1 0 PC 17569 146.5208 B78 C \n",
|
||
"53 1 0 PC 17572 76.7292 D33 C "
|
||
]
|
||
},
|
||
"execution_count": 56,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('(Pclass == 1) and (Sex == \"female\")').head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 57,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Palsson\\t Master. Gosta Leonard</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>349909</td>\n",
|
||
" <td>21.0750</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>14.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>237736</td>\n",
|
||
" <td>30.0708</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>861</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Hansen\\t Mr. Claus Peter</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>41.0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>350026</td>\n",
|
||
" <td>14.1083</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>862</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Giles\\t Mr. Frederick Edward</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>21.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>28134</td>\n",
|
||
" <td>11.5000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>864</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>CA. 2343</td>\n",
|
||
" <td>69.5500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>867</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Duran y More\\t Miss. Asuncion</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>27.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>SC/PARIS 2149</td>\n",
|
||
" <td>13.8583</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>875</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>P/PP 3381</td>\n",
|
||
" <td>24.0000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>192 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"8 0 3 \n",
|
||
"10 1 2 \n",
|
||
"... ... ... \n",
|
||
"861 0 3 \n",
|
||
"862 0 2 \n",
|
||
"864 0 3 \n",
|
||
"867 1 2 \n",
|
||
"875 1 2 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"8 Palsson\\t Master. Gosta Leonard male 2.0 \n",
|
||
"10 Nasser\\t Mrs. Nicholas (Adele Achem) female 14.0 \n",
|
||
"... ... ... ... \n",
|
||
"861 Hansen\\t Mr. Claus Peter male 41.0 \n",
|
||
"862 Giles\\t Mr. Frederick Edward male 21.0 \n",
|
||
"864 Sage\\t Miss. Dorothy Edith \"Dolly\" female NaN \n",
|
||
"867 Duran y More\\t Miss. Asuncion female 27.0 \n",
|
||
"875 Abelson\\t Mrs. Samuel (Hannah Wizosky) female 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"8 3 1 349909 21.0750 NaN S \n",
|
||
"10 1 0 237736 30.0708 NaN C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"861 2 0 350026 14.1083 NaN S \n",
|
||
"862 1 0 28134 11.5000 NaN S \n",
|
||
"864 8 2 CA. 2343 69.5500 NaN S \n",
|
||
"867 1 0 SC/PARIS 2149 13.8583 NaN C \n",
|
||
"875 1 0 P/PP 3381 24.0000 NaN C \n",
|
||
"\n",
|
||
"[192 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 57,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('SibSp > Parch')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(113, 11)"
|
||
]
|
||
},
|
||
"execution_count": 58,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"young = 18\n",
|
||
"df.query('Age < @young').shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Operacje na wierszach i kolumnach"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 59,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI gdp population \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility \n",
|
||
"Country \n",
|
||
"Afghanistan 110.4 52.8 6.20 \n",
|
||
"Albania 17.9 76.8 1.76 \n",
|
||
"Algeria 29.5 75.5 2.73 \n",
|
||
"Angola 192.0 56.7 6.43 \n",
|
||
"Antigua and Barbuda 10.9 75.5 2.16 "
|
||
]
|
||
},
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Iterating over a data frame yields its column names:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 60,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"female_BMI\n",
|
||
"male_BMI\n",
|
||
"gdp\n",
|
||
"population\n",
|
||
"under5mortality\n",
|
||
"life_expectancy\n",
|
||
"fertility\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for column_name in df:\n",
|
||
" print(column_name)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 61,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"female_BMI Country\n",
|
||
"Afghanistan 21.07402\n",
|
||
"Albania 25.65726\n",
|
||
"Algeria 26.36841\n",
|
||
"Angola 23.48431\n",
|
||
"Antigua and Barbuda 27.50545\n",
|
||
"Name: female_BMI, dtype: float64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for col_name, series in df.items():\n",
|
||
" print(col_name, series)\n",
|
||
" break"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 62,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Afghanistan \n",
|
||
" female_BMI 2.107402e+01\n",
|
||
"male_BMI 2.062058e+01\n",
|
||
"gdp 1.311000e+03\n",
|
||
"population 2.652874e+07\n",
|
||
"under5mortality 1.104000e+02\n",
|
||
"life_expectancy 5.280000e+01\n",
|
||
"fertility 6.200000e+00\n",
|
||
"Name: Afghanistan, dtype: float64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for idx, row in df.iterrows():\n",
|
||
" print(idx, '\\n', row)\n",
|
||
" break"
|
||
]
|
||
},
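{
"cell_type": "markdown",
"metadata": {},
"source": [
"The iteration styles shown above can be compared on a small frame. This is only a sketch: the `toy` frame is hypothetical, and `itertuples` is shown as one more option alongside `items` and `iterrows`.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy frame; the values are illustrative only.\n",
"toy = pd.DataFrame(\n",
"    {'gdp': [1311.0, 8644.0], 'fertility': [6.20, 1.76]},\n",
"    index=pd.Index(['Afghanistan', 'Albania'], name='Country'),\n",
")\n",
"\n",
"# Iterating over the frame itself yields the column names.\n",
"column_names = [name for name in toy]\n",
"\n",
"# itertuples() yields one namedtuple per row; the index label is in .Index.\n",
"rows = [(row.Index, row.gdp) for row in toy.itertuples()]\n",
"\n",
"print(column_names)\n",
"print(rows)\n",
"```\n",
"\n",
"Unlike `iterrows`, `itertuples` does not build a `Series` for every row, which usually makes it the lighter choice for plain loops."
]
},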
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 63,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan normal\n",
|
||
"Albania overweight\n",
|
||
"Algeria normal\n",
|
||
"Angola normal\n",
|
||
"Antigua and Barbuda overweight\n",
|
||
"Name: male_BMI, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 63,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def bmi_level(bmi):\n",
|
||
" if bmi <= 18.5:\n",
|
||
" level = 'underweight'\n",
|
||
" elif bmi < 25:\n",
|
||
" level = 'normal'\n",
|
||
" elif bmi < 30:\n",
|
||
" level = 'overweight'\n",
|
||
" else:\n",
|
||
" level = 'obese'\n",
|
||
" return level\n",
|
||
"\n",
|
||
"s = df['male_BMI'].map(bmi_level)\n",
|
||
" \n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 64,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan normal\n",
|
||
"Albania overweight\n",
|
||
"Algeria normal\n",
|
||
"Angola normal\n",
|
||
"Antigua and Barbuda overweight\n",
|
||
"dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 64,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def bmi_level(row_data):\n",
|
||
" bmi = row_data['male_BMI']\n",
|
||
" if bmi <= 18.5:\n",
|
||
" return 'underweight'\n",
|
||
" elif bmi < 25:\n",
|
||
" return 'normal'\n",
|
||
" elif bmi < 30:\n",
|
||
" return 'overweight'\n",
|
||
" return 'obese'\n",
|
||
"\n",
|
||
"df.apply(bmi_level, axis=1)"
|
||
]
|
||
},
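{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same binning can be sketched without a Python-level function by using `pd.cut`. The bin edges below are an assumption that mirrors the thresholds in `bmi_level`; with `right=False` a value of exactly 18.5 lands in `normal`, whereas `bmi_level` puts it in `underweight`.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical BMI values, one per category.\n",
"bmi = pd.Series([17.0, 20.62058, 26.44657, 31.5])\n",
"\n",
"# Left-closed bins: [0, 18.5), [18.5, 25), [25, 30), [30, inf).\n",
"levels = pd.cut(\n",
"    bmi,\n",
"    bins=[0, 18.5, 25, 30, float('inf')],\n",
"    labels=['underweight', 'normal', 'overweight', 'obese'],\n",
"    right=False,\n",
")\n",
"\n",
"print(levels)\n",
"```\n",
"\n",
"The result is a categorical `Series`, which keeps the order of the levels."
]
},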
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 65,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th>Country</th>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <th>Albania</th>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <th>Angola</th>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <td>2.107402e+01</td>\n",
|
||
" <td>2.565726e+01</td>\n",
|
||
" <td>2.636841e+01</td>\n",
|
||
" <td>2.348431e+01</td>\n",
|
||
" <td>27.50545</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <td>2.062058e+01</td>\n",
|
||
" <td>2.644657e+01</td>\n",
|
||
" <td>2.459620e+01</td>\n",
|
||
" <td>2.225083e+01</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>gdp</th>\n",
|
||
" <td>1.311000e+03</td>\n",
|
||
" <td>8.644000e+03</td>\n",
|
||
" <td>1.231400e+04</td>\n",
|
||
" <td>7.103000e+03</td>\n",
|
||
" <td>25736.00000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>population</th>\n",
|
||
" <td>2.652874e+07</td>\n",
|
||
" <td>2.968026e+06</td>\n",
|
||
" <td>3.481106e+07</td>\n",
|
||
" <td>1.984225e+07</td>\n",
|
||
" <td>85350.00000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <td>1.104000e+02</td>\n",
|
||
" <td>1.790000e+01</td>\n",
|
||
" <td>2.950000e+01</td>\n",
|
||
" <td>1.920000e+02</td>\n",
|
||
" <td>10.90000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <td>5.280000e+01</td>\n",
|
||
" <td>7.680000e+01</td>\n",
|
||
" <td>7.550000e+01</td>\n",
|
||
" <td>5.670000e+01</td>\n",
|
||
" <td>75.50000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>fertility</th>\n",
|
||
" <td>6.200000e+00</td>\n",
|
||
" <td>1.760000e+00</td>\n",
|
||
" <td>2.730000e+00</td>\n",
|
||
" <td>6.430000e+00</td>\n",
|
||
" <td>2.16000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
"Country Afghanistan Albania Algeria Angola \\\n",
|
||
"female_BMI 2.107402e+01 2.565726e+01 2.636841e+01 2.348431e+01 \n",
|
||
"male_BMI 2.062058e+01 2.644657e+01 2.459620e+01 2.225083e+01 \n",
|
||
"gdp 1.311000e+03 8.644000e+03 1.231400e+04 7.103000e+03 \n",
|
||
"population 2.652874e+07 2.968026e+06 3.481106e+07 1.984225e+07 \n",
|
||
"under5mortality 1.104000e+02 1.790000e+01 2.950000e+01 1.920000e+02 \n",
|
||
"life_expectancy 5.280000e+01 7.680000e+01 7.550000e+01 5.670000e+01 \n",
|
||
"fertility 6.200000e+00 1.760000e+00 2.730000e+00 6.430000e+00 \n",
|
||
"\n",
|
||
"Country Antigua and Barbuda \n",
|
||
"female_BMI 27.50545 \n",
|
||
"male_BMI 25.76602 \n",
|
||
"gdp 25736.00000 \n",
|
||
"population 85350.00000 \n",
|
||
"under5mortality 10.90000 \n",
|
||
"life_expectancy 75.50000 \n",
|
||
"fertility 2.16000 "
|
||
]
|
||
},
|
||
"execution_count": 65,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.transpose()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Exercise"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Grouping (`groupby`)\n",
|
||
"\n",
|
||
"We often need to split the data by the values in a given column and then compute a summary for each of the resulting groups. The `groupby` method does exactly that."
|
||
]
|
||
},
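{
"cell_type": "markdown",
"metadata": {},
"source": [
"The split-then-summarise idea can be sketched on a few rows. The `toy` frame is hypothetical; the column names are chosen only to resemble the NBA data used in this section.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: two teams with made-up salaries.\n",
"toy = pd.DataFrame({\n",
"    'Team': ['A', 'A', 'B', 'B', 'B'],\n",
"    'Salary': [10.0, 20.0, 5.0, 7.0, 9.0],\n",
"})\n",
"\n",
"# Split the rows by Team, then compute the mean Salary within each group.\n",
"mean_salary = toy.groupby('Team')['Salary'].mean()\n",
"\n",
"print(mean_salary)\n",
"```\n",
"\n",
"The group keys become the index of the result."
]
},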
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 66,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Team</th>\n",
|
||
" <th>Number</th>\n",
|
||
" <th>Position</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>Height</th>\n",
|
||
" <th>Weight</th>\n",
|
||
" <th>College</th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>401</th>\n",
|
||
" <td>Tyus Jones</td>\n",
|
||
" <td>Minnesota Timberwolves</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>PG</td>\n",
|
||
" <td>20.0</td>\n",
|
||
" <td>6-2</td>\n",
|
||
" <td>195.0</td>\n",
|
||
" <td>Duke</td>\n",
|
||
" <td>1282080.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>342</th>\n",
|
||
" <td>Gerald Green</td>\n",
|
||
" <td>Miami Heat</td>\n",
|
||
" <td>14.0</td>\n",
|
||
" <td>SF</td>\n",
|
||
" <td>30.0</td>\n",
|
||
" <td>6-7</td>\n",
|
||
" <td>205.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>947276.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>143</th>\n",
|
||
" <td>DeMarcus Cousins</td>\n",
|
||
" <td>Sacramento Kings</td>\n",
|
||
" <td>15.0</td>\n",
|
||
" <td>C</td>\n",
|
||
" <td>25.0</td>\n",
|
||
" <td>6-11</td>\n",
|
||
" <td>270.0</td>\n",
|
||
" <td>Kentucky</td>\n",
|
||
" <td>15851950.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>267</th>\n",
|
||
" <td>P.J. Hairston</td>\n",
|
||
" <td>Memphis Grizzlies</td>\n",
|
||
" <td>19.0</td>\n",
|
||
" <td>SF</td>\n",
|
||
" <td>23.0</td>\n",
|
||
" <td>6-6</td>\n",
|
||
" <td>230.0</td>\n",
|
||
" <td>North Carolina</td>\n",
|
||
" <td>1201440.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>335</th>\n",
|
||
" <td>Jeremy Lin</td>\n",
|
||
" <td>Charlotte Hornets</td>\n",
|
||
" <td>7.0</td>\n",
|
||
" <td>PG</td>\n",
|
||
" <td>27.0</td>\n",
|
||
" <td>6-3</td>\n",
|
||
" <td>200.0</td>\n",
|
||
" <td>Harvard</td>\n",
|
||
" <td>2139000.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Name Team Number Position Age Height \\\n",
|
||
"401 Tyus Jones Minnesota Timberwolves 1.0 PG 20.0 6-2 \n",
|
||
"342 Gerald Green Miami Heat 14.0 SF 30.0 6-7 \n",
|
||
"143 DeMarcus Cousins Sacramento Kings 15.0 C 25.0 6-11 \n",
|
||
"267 P.J. Hairston Memphis Grizzlies 19.0 SF 23.0 6-6 \n",
|
||
"335 Jeremy Lin Charlotte Hornets 7.0 PG 27.0 6-3 \n",
|
||
"\n",
|
||
" Weight College Salary \n",
|
||
"401 195.0 Duke 1282080.0 \n",
|
||
"342 205.0 NaN 947276.0 \n",
|
||
"143 270.0 Kentucky 15851950.0 \n",
|
||
"267 230.0 North Carolina 1201440.0 \n",
|
||
"335 200.0 Harvard 2139000.0 "
|
||
]
|
||
},
|
||
"execution_count": 66,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"\n",
|
||
"df = pd.read_csv('./nba.csv')\n",
|
||
"\n",
|
||
"df.sample(5)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"_Example_: we want to compute the mean salary for each team."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 67,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Team</th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Atlanta Hawks</th>\n",
|
||
" <td>4.860197e+06</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Boston Celtics</th>\n",
|
||
" <td>4.181505e+06</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Brooklyn Nets</th>\n",
|
||
" <td>3.501898e+06</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Charlotte Hornets</th>\n",
|
||
" <td>5.222728e+06</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Chicago Bulls</th>\n",
|
||
" <td>5.785559e+06</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Salary\n",
|
||
"Team \n",
|
||
"Atlanta Hawks 4.860197e+06\n",
|
||
"Boston Celtics 4.181505e+06\n",
|
||
"Brooklyn Nets 3.501898e+06\n",
|
||
"Charlotte Hornets 5.222728e+06\n",
|
||
"Chicago Bulls 5.785559e+06"
|
||
]
|
||
},
|
||
"execution_count": 67,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['Team', 'Salary']].groupby('Team').mean().head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can also pass a list of column names. The values are then computed for every group formed by a combination of values in those columns:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 68,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Team Position\n",
|
||
"Atlanta Hawks C 7.585417e+06\n",
|
||
" PF 5.988067e+06\n",
|
||
" PG 4.881700e+06\n",
|
||
" SF 3.000000e+06\n",
|
||
" SG 2.607758e+06\n",
|
||
" ... \n",
|
||
"Washington Wizards C 8.163476e+06\n",
|
||
" PF 5.650000e+06\n",
|
||
" PG 9.011208e+06\n",
|
||
" SF 2.789700e+06\n",
|
||
" SG 2.839248e+06\n",
|
||
"Name: Salary, Length: 149, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 68,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.groupby(['Team', 'Position'])['Salary'].mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"The following methods can be applied to a grouped object:\n",
"\n",
" * `sum()`\n",
|
||
" * `min()`\n",
|
||
" * `max()`\n",
|
||
" * `mean()`\n",
|
||
" * `size()`\n",
|
||
" * `describe()`\n",
|
||
" * `first()`\n",
|
||
" * `last()`\n",
|
||
" * `count()`\n",
|
||
" * `std()`\n",
|
||
" * `var()`\n",
|
||
" * `sem()`"
|
||
]
|
||
},
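{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several of these aggregations can be requested at once. The sketch below uses named aggregation on a hypothetical `toy` frame; the output column names (`mean_salary`, `max_salary`, `non_missing`) are illustrative choices. It also shows that `count()` skips missing values while `size()` counts every row.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with one missing salary.\n",
"toy = pd.DataFrame({\n",
"    'Position': ['C', 'C', 'PG', 'PG'],\n",
"    'Salary': [4.0, 6.0, None, 3.0],\n",
"})\n",
"\n",
"# Named aggregation: new_column=(source_column, aggregation).\n",
"summary = toy.groupby('Position').agg(\n",
"    mean_salary=('Salary', 'mean'),\n",
"    max_salary=('Salary', 'max'),\n",
"    non_missing=('Salary', 'count'),\n",
")\n",
"\n",
"# size() counts all rows in a group, including rows with missing values.\n",
"group_sizes = toy.groupby('Position').size()\n",
"\n",
"print(summary)\n",
"print(group_sizes)\n",
"```\n",
"\n",
"Named aggregation yields flat column names, which can be easier to work with than the two-level header produced by passing a list to `agg`."
]
},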
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 69,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead tr th {\n",
|
||
" text-align: left;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead tr:last-of-type th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr>\n",
|
||
" <th></th>\n",
|
||
" <th colspan=\"3\" halign=\"left\">Salary</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th></th>\n",
|
||
" <th>mean</th>\n",
|
||
" <th>std</th>\n",
|
||
" <th>count</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Position</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>C</th>\n",
|
||
" <td>5.967052e+06</td>\n",
|
||
" <td>5.787989e+06</td>\n",
|
||
" <td>78</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PF</th>\n",
|
||
" <td>4.562483e+06</td>\n",
|
||
" <td>4.800054e+06</td>\n",
|
||
" <td>97</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PG</th>\n",
|
||
" <td>5.077829e+06</td>\n",
|
||
" <td>5.051809e+06</td>\n",
|
||
" <td>88</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>SF</th>\n",
|
||
" <td>4.857393e+06</td>\n",
|
||
" <td>6.011889e+06</td>\n",
|
||
" <td>84</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>SG</th>\n",
|
||
" <td>4.009861e+06</td>\n",
|
||
" <td>4.491609e+06</td>\n",
|
||
" <td>99</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Salary \n",
|
||
" mean std count\n",
|
||
"Position \n",
|
||
"C 5.967052e+06 5.787989e+06 78\n",
|
||
"PF 4.562483e+06 4.800054e+06 97\n",
|
||
"PG 5.077829e+06 5.051809e+06 88\n",
|
||
"SF 4.857393e+06 6.011889e+06 84\n",
|
||
"SG 4.009861e+06 4.491609e+06 99"
|
||
]
|
||
},
|
||
"execution_count": 69,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['Position', 'Salary']].groupby('Position').agg(['mean', 'std', 'count'])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 70,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Position</th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>C</th>\n",
|
||
" <td>22275967.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PF</th>\n",
|
||
" <td>22081286.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PG</th>\n",
|
||
" <td>21412973.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>SF</th>\n",
|
||
" <td>24969112.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>SG</th>\n",
|
||
" <td>19944278.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Salary\n",
|
||
"Position \n",
|
||
"C 22275967.0\n",
|
||
"PF 22081286.0\n",
|
||
"PG 21412973.0\n",
|
||
"SF 24969112.0\n",
|
||
"SG 19944278.0"
|
||
]
|
||
},
|
||
"execution_count": 70,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def group_range(x):\n",
|
||
" return x.max() - x.min()\n",
|
||
"\n",
|
||
"df[['Position', 'Salary']].groupby('Position').apply(group_range)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 71,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Liczba grup: 5\n",
|
||
"dict_keys(['C', 'PF', 'PG', 'SF', 'SG'])\n",
|
||
" Name Team Number Position Age Height Weight \\\n",
|
||
"7 Kelly Olynyk Boston Celtics 41.0 C 25.0 7-0 238.0 \n",
|
||
"10 Jared Sullinger Boston Celtics 7.0 C 24.0 6-9 260.0 \n",
|
||
"14 Tyler Zeller Boston Celtics 44.0 C 26.0 7-0 253.0 \n",
|
||
"23 Brook Lopez Brooklyn Nets 11.0 C 28.0 7-0 275.0 \n",
|
||
"27 Henry Sims Brooklyn Nets 14.0 C 26.0 6-10 248.0 \n",
|
||
"\n",
|
||
" College Salary \n",
|
||
"7 Gonzaga 2165160.0 \n",
|
||
"10 Ohio State 2569260.0 \n",
|
||
"14 North Carolina 2616975.0 \n",
|
||
"23 Stanford 19689000.0 \n",
|
||
"27 Georgetown 947276.0 \n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"gb = df.groupby(['Position'])\n",
|
||
"\n",
|
||
"print('Liczba grup:', gb.ngroups)\n",
|
||
"print(gb.groups.keys())\n",
|
||
"\n",
|
||
"print(gb.get_group('C').head())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 72,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 15.36\n",
|
||
"1 15.36\n",
|
||
"2 15.36\n",
|
||
"3 15.36\n",
|
||
"4 15.36\n",
|
||
" ... \n",
|
||
"453 15.36\n",
|
||
"454 15.36\n",
|
||
"455 17.92\n",
|
||
"456 17.92\n",
|
||
"457 <NA>\n",
|
||
"Name: Height, Length: 458, dtype: Float64"
|
||
]
|
||
},
|
||
"execution_count": 72,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"\n",
|
||
"df.Height.str.split('-').str[0].astype('Int64') * 2.56"
|
||
]
|
||
},
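{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above uses only the number before the hyphen. If the goal is a height in centimetres, a fuller conversion might look like the sketch below. It assumes that `Height` is encoded as `feet-inches` and applies the standard factors (1 ft = 30.48 cm, 1 in = 2.54 cm); the `height` series is hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical heights in 'feet-inches' form, with one missing value.\n",
"height = pd.Series(['6-2', '6-11', None])\n",
"\n",
"# Split into two columns: 0 holds the feet, 1 holds the inches.\n",
"parts = height.str.split('-', expand=True)\n",
"feet = pd.to_numeric(parts[0])\n",
"inches = pd.to_numeric(parts[1])\n",
"\n",
"# Convert both parts to centimetres; missing values stay missing.\n",
"height_cm = feet * 30.48 + inches * 2.54\n",
"\n",
"print(height_cm)\n",
"```\n",
"\n",
"`pd.to_numeric` turns the missing entry into `NaN`, so the arithmetic works without special-casing it."
]
},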
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Pivot\n",
|
||
"The `pivot` method creates a new data frame whose index and column names are taken from the values of the original data frame. \n",
|
||
"\n",
|
||
"_Example_: consider the data frame below, which contains translation-quality scores for the Hausa-English language pair. The `system` column holds the system name, the `metric` column holds the metric name, and the `score` column holds the metric value. We want to reshape the data so that each system becomes a row (the index) and each metric becomes a column. The `pivot` method does this; it takes three arguments:\n",
|
||
" * `index`: the name of the column used to build the index;\n",
|
||
" * `columns`: the name of the column whose values become the column names of the new data frame;\n",
|
||
" * `values`: the name of the column that holds the data of interest."
|
||
]
|
||
},
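{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reshaping described above can be sketched on a tiny long-format frame. The system names and scores below are hypothetical; only the column names (`system`, `metric`, `score`) follow the data used in this section.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical long-format data: one row per (system, metric) pair.\n",
"long_df = pd.DataFrame({\n",
"    'system': ['SysA', 'SysA', 'SysB', 'SysB'],\n",
"    'metric': ['bleu', 'chrf', 'bleu', 'chrf'],\n",
"    'score': [16.5, 44.7, 21.0, 48.7],\n",
"})\n",
"\n",
"# One row per system, one column per metric.\n",
"wide = long_df.pivot(index='system', columns='metric', values='score')\n",
"\n",
"print(wide)\n",
"```\n",
"\n",
"`pivot` raises an error when an (`index`, `columns`) pair occurs more than once; in that case `pivot_table`, which aggregates the duplicates, is the usual alternative."
]
},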
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 73,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>pair</th>\n",
|
||
" <th>system</th>\n",
|
||
" <th>id</th>\n",
|
||
" <th>is_constrained</th>\n",
|
||
" <th>metric</th>\n",
|
||
" <th>score</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1214</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>NiuTrans</td>\n",
|
||
" <td>382</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>16.512243</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1215</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>NiuTrans</td>\n",
|
||
" <td>382</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>44.724766</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1216</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>NiuTrans</td>\n",
|
||
" <td>382</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>16.512243</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1217</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>NiuTrans</td>\n",
|
||
" <td>382</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>44.724766</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1218</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Facebook-AI</td>\n",
|
||
" <td>181</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>20.982704</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1219</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Facebook-AI</td>\n",
|
||
" <td>181</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>48.653770</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1220</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Facebook-AI</td>\n",
|
||
" <td>181</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>20.982704</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1221</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Facebook-AI</td>\n",
|
||
" <td>181</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>48.653770</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1222</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TRANSSION</td>\n",
|
||
" <td>336</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>18.834851</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1223</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TRANSSION</td>\n",
|
||
" <td>336</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>47.238279</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1224</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TRANSSION</td>\n",
|
||
" <td>336</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>18.834851</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1225</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TRANSSION</td>\n",
|
||
" <td>336</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>47.238279</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1226</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>AMU</td>\n",
|
||
" <td>628</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>14.132845</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1227</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>AMU</td>\n",
|
||
" <td>628</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>41.256570</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1228</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>AMU</td>\n",
|
||
" <td>628</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>14.132845</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1229</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>AMU</td>\n",
|
||
" <td>628</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>41.256570</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1230</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>P3AI</td>\n",
|
||
" <td>715</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>17.793617</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1231</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>P3AI</td>\n",
|
||
" <td>715</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>46.307402</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1232</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>P3AI</td>\n",
|
||
" <td>715</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>17.793617</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1233</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>P3AI</td>\n",
|
||
" <td>715</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>46.307402</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1234</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-B</td>\n",
|
||
" <td>1356</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>18.655658</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1235</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-B</td>\n",
|
||
" <td>1356</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>46.658216</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1236</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-B</td>\n",
|
||
" <td>1356</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>18.655658</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1237</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-B</td>\n",
|
||
" <td>1356</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>46.658216</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1238</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TWB</td>\n",
|
||
" <td>1335</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>12.326443</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1239</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TWB</td>\n",
|
||
" <td>1335</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>40.282629</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1240</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TWB</td>\n",
|
||
" <td>1335</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>12.326443</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1241</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TWB</td>\n",
|
||
" <td>1335</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>40.282629</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1242</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>ZMT</td>\n",
|
||
" <td>553</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>18.837023</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1243</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>ZMT</td>\n",
|
||
" <td>553</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>47.231474</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1244</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>ZMT</td>\n",
|
||
" <td>553</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>18.837023</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1245</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>ZMT</td>\n",
|
||
" <td>553</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>47.231474</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1246</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Manifold</td>\n",
|
||
" <td>437</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>16.943915</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1247</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Manifold</td>\n",
|
||
" <td>437</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>45.638356</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1248</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Manifold</td>\n",
|
||
" <td>437</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>16.943915</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1249</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Manifold</td>\n",
|
||
" <td>437</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>45.638356</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1250</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-Y</td>\n",
|
||
" <td>1374</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>13.898531</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1251</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-Y</td>\n",
|
||
" <td>1374</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>44.842874</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1252</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-Y</td>\n",
|
||
" <td>1374</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>13.898531</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1253</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-Y</td>\n",
|
||
" <td>1374</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>44.842874</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1254</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>HuaweiTSC</td>\n",
|
||
" <td>758</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>17.492440</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1255</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>HuaweiTSC</td>\n",
|
||
" <td>758</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>46.795737</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1256</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>HuaweiTSC</td>\n",
|
||
" <td>758</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>17.492440</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1257</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>HuaweiTSC</td>\n",
|
||
" <td>758</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>46.795737</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1258</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>MS-EgDC</td>\n",
|
||
" <td>896</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>17.133350</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1259</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>MS-EgDC</td>\n",
|
||
" <td>896</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>45.266274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1260</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>MS-EgDC</td>\n",
|
||
" <td>896</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>17.133350</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1261</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>MS-EgDC</td>\n",
|
||
" <td>896</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>45.266274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1262</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>GTCOM</td>\n",
|
||
" <td>1298</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>17.794272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1263</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>GTCOM</td>\n",
|
||
" <td>1298</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>46.714831</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1264</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>GTCOM</td>\n",
|
||
" <td>1298</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>17.794272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1265</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>GTCOM</td>\n",
|
||
" <td>1298</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>46.714831</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1266</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>UEdin</td>\n",
|
||
" <td>1149</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>14.887836</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1267</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>UEdin</td>\n",
|
||
" <td>1149</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>42.247415</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1268</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>UEdin</td>\n",
|
||
" <td>1149</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>14.887836</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1269</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>UEdin</td>\n",
|
||
" <td>1149</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>42.247415</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" pair system id is_constrained metric score\n",
|
||
"1214 ha-en NiuTrans 382 True bleu-all 16.512243\n",
|
||
"1215 ha-en NiuTrans 382 True chrf-all 44.724766\n",
|
||
"1216 ha-en NiuTrans 382 True bleu-A 16.512243\n",
|
||
"1217 ha-en NiuTrans 382 True chrf-A 44.724766\n",
|
||
"1218 ha-en Facebook-AI 181 False bleu-all 20.982704\n",
|
||
"1219 ha-en Facebook-AI 181 False chrf-all 48.653770\n",
|
||
"1220 ha-en Facebook-AI 181 False bleu-A 20.982704\n",
|
||
"1221 ha-en Facebook-AI 181 False chrf-A 48.653770\n",
|
||
"1222 ha-en TRANSSION 336 False bleu-all 18.834851\n",
|
||
"1223 ha-en TRANSSION 336 False chrf-all 47.238279\n",
|
||
"1224 ha-en TRANSSION 336 False bleu-A 18.834851\n",
|
||
"1225 ha-en TRANSSION 336 False chrf-A 47.238279\n",
|
||
"1226 ha-en AMU 628 True bleu-all 14.132845\n",
|
||
"1227 ha-en AMU 628 True chrf-all 41.256570\n",
|
||
"1228 ha-en AMU 628 True bleu-A 14.132845\n",
|
||
"1229 ha-en AMU 628 True chrf-A 41.256570\n",
|
||
"1230 ha-en P3AI 715 True bleu-all 17.793617\n",
|
||
"1231 ha-en P3AI 715 True chrf-all 46.307402\n",
|
||
"1232 ha-en P3AI 715 True bleu-A 17.793617\n",
|
||
"1233 ha-en P3AI 715 True chrf-A 46.307402\n",
|
||
"1234 ha-en Online-B 1356 False bleu-all 18.655658\n",
|
||
"1235 ha-en Online-B 1356 False chrf-all 46.658216\n",
|
||
"1236 ha-en Online-B 1356 False bleu-A 18.655658\n",
|
||
"1237 ha-en Online-B 1356 False chrf-A 46.658216\n",
|
||
"1238 ha-en TWB 1335 False bleu-all 12.326443\n",
|
||
"1239 ha-en TWB 1335 False chrf-all 40.282629\n",
|
||
"1240 ha-en TWB 1335 False bleu-A 12.326443\n",
|
||
"1241 ha-en TWB 1335 False chrf-A 40.282629\n",
|
||
"1242 ha-en ZMT 553 False bleu-all 18.837023\n",
|
||
"1243 ha-en ZMT 553 False chrf-all 47.231474\n",
|
||
"1244 ha-en ZMT 553 False bleu-A 18.837023\n",
|
||
"1245 ha-en ZMT 553 False chrf-A 47.231474\n",
|
||
"1246 ha-en Manifold 437 True bleu-all 16.943915\n",
|
||
"1247 ha-en Manifold 437 True chrf-all 45.638356\n",
|
||
"1248 ha-en Manifold 437 True bleu-A 16.943915\n",
|
||
"1249 ha-en Manifold 437 True chrf-A 45.638356\n",
|
||
"1250 ha-en Online-Y 1374 False bleu-all 13.898531\n",
|
||
"1251 ha-en Online-Y 1374 False chrf-all 44.842874\n",
|
||
"1252 ha-en Online-Y 1374 False bleu-A 13.898531\n",
|
||
"1253 ha-en Online-Y 1374 False chrf-A 44.842874\n",
|
||
"1254 ha-en HuaweiTSC 758 True bleu-all 17.492440\n",
|
||
"1255 ha-en HuaweiTSC 758 True chrf-all 46.795737\n",
|
||
"1256 ha-en HuaweiTSC 758 True bleu-A 17.492440\n",
|
||
"1257 ha-en HuaweiTSC 758 True chrf-A 46.795737\n",
|
||
"1258 ha-en MS-EgDC 896 True bleu-all 17.133350\n",
|
||
"1259 ha-en MS-EgDC 896 True chrf-all 45.266274\n",
|
||
"1260 ha-en MS-EgDC 896 True bleu-A 17.133350\n",
|
||
"1261 ha-en MS-EgDC 896 True chrf-A 45.266274\n",
|
||
"1262 ha-en GTCOM 1298 False bleu-all 17.794272\n",
|
||
"1263 ha-en GTCOM 1298 False chrf-all 46.714831\n",
|
||
"1264 ha-en GTCOM 1298 False bleu-A 17.794272\n",
|
||
"1265 ha-en GTCOM 1298 False chrf-A 46.714831\n",
|
||
"1266 ha-en UEdin 1149 True bleu-all 14.887836\n",
|
||
"1267 ha-en UEdin 1149 True chrf-all 42.247415\n",
|
||
"1268 ha-en UEdin 1149 True bleu-A 14.887836\n",
|
||
"1269 ha-en UEdin 1149 True chrf-A 42.247415"
|
||
]
|
||
},
|
||
"execution_count": 73,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
    "# Load the WMT21 automatic scores (TSV); the next line keeps only the 'ha-en' pair\n",
    "df = pd.read_csv('https://raw.githubusercontent.com/wmt-conference/wmt21-news-systems/main/scores/automatic-scores.tsv', sep='\\t')\n",
|
||
"df = df[df.pair == 'ha-en']\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 74,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th>metric</th>\n",
|
||
" <th>bleu-A</th>\n",
|
||
" <th>bleu-all</th>\n",
|
||
" <th>chrf-A</th>\n",
|
||
" <th>chrf-all</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>system</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>AMU</th>\n",
|
||
" <td>14.132845</td>\n",
|
||
" <td>14.132845</td>\n",
|
||
" <td>41.256570</td>\n",
|
||
" <td>41.256570</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Facebook-AI</th>\n",
|
||
" <td>20.982704</td>\n",
|
||
" <td>20.982704</td>\n",
|
||
" <td>48.653770</td>\n",
|
||
" <td>48.653770</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>GTCOM</th>\n",
|
||
" <td>17.794272</td>\n",
|
||
" <td>17.794272</td>\n",
|
||
" <td>46.714831</td>\n",
|
||
" <td>46.714831</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>HuaweiTSC</th>\n",
|
||
" <td>17.492440</td>\n",
|
||
" <td>17.492440</td>\n",
|
||
" <td>46.795737</td>\n",
|
||
" <td>46.795737</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>MS-EgDC</th>\n",
|
||
" <td>17.133350</td>\n",
|
||
" <td>17.133350</td>\n",
|
||
" <td>45.266274</td>\n",
|
||
" <td>45.266274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Manifold</th>\n",
|
||
" <td>16.943915</td>\n",
|
||
" <td>16.943915</td>\n",
|
||
" <td>45.638356</td>\n",
|
||
" <td>45.638356</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>NiuTrans</th>\n",
|
||
" <td>16.512243</td>\n",
|
||
" <td>16.512243</td>\n",
|
||
" <td>44.724766</td>\n",
|
||
" <td>44.724766</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Online-B</th>\n",
|
||
" <td>18.655658</td>\n",
|
||
" <td>18.655658</td>\n",
|
||
" <td>46.658216</td>\n",
|
||
" <td>46.658216</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Online-Y</th>\n",
|
||
" <td>13.898531</td>\n",
|
||
" <td>13.898531</td>\n",
|
||
" <td>44.842874</td>\n",
|
||
" <td>44.842874</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>P3AI</th>\n",
|
||
" <td>17.793617</td>\n",
|
||
" <td>17.793617</td>\n",
|
||
" <td>46.307402</td>\n",
|
||
" <td>46.307402</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>TRANSSION</th>\n",
|
||
" <td>18.834851</td>\n",
|
||
" <td>18.834851</td>\n",
|
||
" <td>47.238279</td>\n",
|
||
" <td>47.238279</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>TWB</th>\n",
|
||
" <td>12.326443</td>\n",
|
||
" <td>12.326443</td>\n",
|
||
" <td>40.282629</td>\n",
|
||
" <td>40.282629</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>UEdin</th>\n",
|
||
" <td>14.887836</td>\n",
|
||
" <td>14.887836</td>\n",
|
||
" <td>42.247415</td>\n",
|
||
" <td>42.247415</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>ZMT</th>\n",
|
||
" <td>18.837023</td>\n",
|
||
" <td>18.837023</td>\n",
|
||
" <td>47.231474</td>\n",
|
||
" <td>47.231474</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
"metric bleu-A bleu-all chrf-A chrf-all\n",
|
||
"system \n",
|
||
"AMU 14.132845 14.132845 41.256570 41.256570\n",
|
||
"Facebook-AI 20.982704 20.982704 48.653770 48.653770\n",
|
||
"GTCOM 17.794272 17.794272 46.714831 46.714831\n",
|
||
"HuaweiTSC 17.492440 17.492440 46.795737 46.795737\n",
|
||
"MS-EgDC 17.133350 17.133350 45.266274 45.266274\n",
|
||
"Manifold 16.943915 16.943915 45.638356 45.638356\n",
|
||
"NiuTrans 16.512243 16.512243 44.724766 44.724766\n",
|
||
"Online-B 18.655658 18.655658 46.658216 46.658216\n",
|
||
"Online-Y 13.898531 13.898531 44.842874 44.842874\n",
|
||
"P3AI 17.793617 17.793617 46.307402 46.307402\n",
|
||
"TRANSSION 18.834851 18.834851 47.238279 47.238279\n",
|
||
"TWB 12.326443 12.326443 40.282629 40.282629\n",
|
||
"UEdin 14.887836 14.887836 42.247415 42.247415\n",
|
||
"ZMT 18.837023 18.837023 47.231474 47.231474"
|
||
]
|
||
},
|
||
"execution_count": 74,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
    "# Reshape from long to wide: one row per system, one column per metric\n",
    "df.pivot(index='system', columns='metric', values='score')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
    "### Exercise"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
    "## Text data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
    "`pandas` provides conveniences for working with text values:\n",
    " * they are accessed through the `str` attribute of a `Series`;\n",
    " * available functions include:\n",
    "   * formatting: `lower()`, `upper()`;\n",
    "   * regular expressions: `contains()`, `match()`;\n",
    "   * other: `split()`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 75,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Heikkinen\\t Miss. Laina</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>STON/O2. 3101282</td>\n",
|
||
" <td>7.9250</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Allen\\t Mr. William Henry</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>373450</td>\n",
|
||
" <td>8.0500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"3 1 3 \n",
|
||
"4 1 1 \n",
|
||
"5 0 3 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"3 Heikkinen\\t Miss. Laina female 26.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"5 Allen\\t Mr. William Henry male 35.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"5 0 0 373450 8.0500 NaN S "
|
||
]
|
||
},
|
||
"execution_count": 75,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"\n",
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 76,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 BRAUND\\t MR. OWEN HARRIS\n",
|
||
"2 CUMINGS\\t MRS. JOHN BRADLEY (FLORENCE BRIGGS T...\n",
|
||
"3 HEIKKINEN\\t MISS. LAINA\n",
|
||
"4 FUTRELLE\\t MRS. JACQUES HEATH (LILY MAY PEEL)\n",
|
||
"5 ALLEN\\t MR. WILLIAM HENRY\n",
|
||
" ... \n",
|
||
"887 MONTVILA\\t REV. JUOZAS\n",
|
||
"888 GRAHAM\\t MISS. MARGARET EDITH\n",
|
||
"889 JOHNSTON\\t MISS. CATHERINE HELEN \"CARRIE\"\n",
|
||
"890 BEHR\\t MR. KARL HOWELL\n",
|
||
"891 DOOLEY\\t MR. PATRICK\n",
|
||
"Name: Name, Length: 891, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 76,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.upper()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 77,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"PassengerId\n",
|
||
"1 Braund\\t Mr. Owen Harris\n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T...\n",
|
||
"3 Heikkinen\\t Miss. Laina\n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel)\n",
|
||
"5 Allen\\t Mr. William Henry\n",
|
||
"Name: Name, dtype: object\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 False\n",
|
||
"2 True\n",
|
||
"3 True\n",
|
||
"4 True\n",
|
||
"5 False\n",
|
||
"Name: Name, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 77,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"print(df.Name.head())\n",
|
||
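    "# The pattern is treated as a regular expression: True where the name contains 'Miss' or 'Mrs'\n",
    "df.Name.str.contains('Miss|Mrs').head()"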
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 78,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>0</th>\n",
|
||
" <th>1</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Braund</td>\n",
|
||
" <td>Mr. Owen Harris</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Cumings</td>\n",
|
||
" <td>Mrs. John Bradley (Florence Briggs Thayer)</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Heikkinen</td>\n",
|
||
" <td>Miss. Laina</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Futrelle</td>\n",
|
||
" <td>Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>Allen</td>\n",
|
||
" <td>Mr. William Henry</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>887</th>\n",
|
||
" <td>Montvila</td>\n",
|
||
" <td>Rev. Juozas</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>888</th>\n",
|
||
" <td>Graham</td>\n",
|
||
" <td>Miss. Margaret Edith</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>889</th>\n",
|
||
" <td>Johnston</td>\n",
|
||
" <td>Miss. Catherine Helen \"Carrie\"</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>890</th>\n",
|
||
" <td>Behr</td>\n",
|
||
" <td>Mr. Karl Howell</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>891</th>\n",
|
||
" <td>Dooley</td>\n",
|
||
" <td>Mr. Patrick</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>891 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" 0 1\n",
|
||
"PassengerId \n",
|
||
"1 Braund Mr. Owen Harris\n",
|
||
"2 Cumings Mrs. John Bradley (Florence Briggs Thayer)\n",
|
||
"3 Heikkinen Miss. Laina\n",
|
||
"4 Futrelle Mrs. Jacques Heath (Lily May Peel)\n",
|
||
"5 Allen Mr. William Henry\n",
|
||
"... ... ...\n",
|
||
"887 Montvila Rev. Juozas\n",
|
||
"888 Graham Miss. Margaret Edith\n",
|
||
"889 Johnston Miss. Catherine Helen \"Carrie\"\n",
|
||
"890 Behr Mr. Karl Howell\n",
|
||
"891 Dooley Mr. Patrick\n",
|
||
"\n",
|
||
"[891 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 78,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.split('\\t', expand=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 79,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 [Braund, Mr. Owen Harris]\n",
|
||
"2 [Cumings, Mrs. John Bradley (Florence Briggs ...\n",
|
||
"3 [Heikkinen, Miss. Laina]\n",
|
||
"4 [Futrelle, Mrs. Jacques Heath (Lily May Peel)]\n",
|
||
"5 [Allen, Mr. William Henry]\n",
|
||
" ... \n",
|
||
"887 [Montvila, Rev. Juozas]\n",
|
||
"888 [Graham, Miss. Margaret Edith]\n",
|
||
"889 [Johnston, Miss. Catherine Helen \"Carrie\"]\n",
|
||
"890 [Behr, Mr. Karl Howell]\n",
|
||
"891 [Dooley, Mr. Patrick]\n",
|
||
"Name: Name, Length: 891, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 79,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.split('\\t')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 80,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 Mr. Owen Harris\n",
|
||
"2 Mrs. John Bradley (Florence Briggs Thayer)\n",
|
||
"3 Miss. Laina\n",
|
||
"4 Mrs. Jacques Heath (Lily May Peel)\n",
|
||
"5 Mr. William Henry\n",
|
||
" ... \n",
|
||
"887 Rev. Juozas\n",
|
||
"888 Miss. Margaret Edith\n",
|
||
"889 Miss. Catherine Helen \"Carrie\"\n",
|
||
"890 Mr. Karl Howell\n",
|
||
"891 Mr. Patrick\n",
|
||
"Name: Name, Length: 891, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 80,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.split('\\t').str[1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 81,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 Mr.\n",
|
||
"2 Mrs.\n",
|
||
"3 Miss.\n",
|
||
"4 Mrs.\n",
|
||
"5 Mr.\n",
|
||
" ... \n",
|
||
"887 Rev.\n",
|
||
"888 Miss.\n",
|
||
"889 Miss.\n",
|
||
"890 Mr.\n",
|
||
"891 Mr.\n",
|
||
"Name: Name, Length: 891, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 81,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
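    "# Take the part after the tab, trim whitespace, and keep its first word (the title, e.g. 'Mr.')\n",
    "df.Name.str.split('\\t').str[1].str.strip().str.split(' ').str[0]"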
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
    "### Exercise\n",
    "The `nba.csv` dataset contains information about players' heights. Compute each player's height in metric units, assuming that one foot is `30.48` cm and one inch is `2.54` cm."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"celltoolbar": "Slideshow",
|
||
"interpreter": {
|
||
"hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
|
||
},
|
||
"kernelspec": {
|
||
"display_name": "Python 3",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.10.11"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 4
|
||
}
|