{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"# Analiza Danych w Pythonie: `pandas`\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### `pandas`\n",
|
||
"Biblioteka `pandas` jest podstawowym narzędziem w ekosystemie Pythona do analizy danych:\n",
|
||
" * dostarcza dwa podstawowe typy danych: \n",
|
||
" * `Series` (szereg, 1D)\n",
|
||
" * `DataFrame` (ramka danych, 2D)\n",
|
||
" * operacje na tych obiektach: obsługa brakujących wartości, łączenie danych;\n",
|
||
" * obsługuje dane różnego typu, np. szeregi czasowe;\n",
|
||
" * biblioteka bazuje na `numpy` -- bibliotece do obliczeń numerycznych;\n",
|
||
" * pozwala też na prostą wizualizację danych;\n",
|
||
" * ETL: extract, transform, load."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Żeby zaimportowąc bibliotekę `pandas` wystarczy:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pandas as pd"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### __Zadanie 0__: sprawdź, czy masz zainstalowaną bibliotekę `pandas`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### [Szeregi](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) (`pd.Series`)\n",
|
||
"\n",
|
||
" Szereg reprezentuje jednorodne dane jednowymiarowe - jest odpowiednikiem wektora w R.\n",
|
||
" * Szeregi możemy tworzyć na różne sposoby (więcej za chwilę), np. z obiektów tj. listy i słowniki.\n",
|
||
" * Dane muszą być jednorodne. W przeciwnym przypadku nastąpi automatyczna konwersja.\n",
|
||
" * Podczas tworzenia szeregu musimy podać jeden obowiązkowy argument `data` - dane.\n",
|
||
" * Ponadto możemy podać też indeks (`index`), typ danych (`dtype`) lub nazwę (`name`).\n",
|
||
" \n",
|
||
" \n",
|
||
" ```\n",
|
||
" class pandas.Series(data=None, index=None, dtype=None, name=None)\n",
|
||
" ```"
|
||
]
|
||
},
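The homogeneity rule for series can be illustrated with a minimal sketch. The values below are made up for illustration and do not come from the course data; the point is only that pandas picks one common `dtype` when the inputs are mixed.

```python
import pandas as pd

# Integers mixed with a float: every value is upcast to float64.
mixed_numbers = pd.Series([1, 2, 3.5])
print(mixed_numbers.dtype)

# Numbers mixed with a string: pandas falls back to the generic object dtype.
mixed_types = pd.Series([1, 'two', 3.0])
print(mixed_types.dtype)
```

Checking `.dtype` right after building a series is a cheap way to notice an unintended conversion.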
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Podczas tworzenie szeregu mozemy podać dane w formacie listy lub słownika.\n",
|
||
"\n",
|
||
"Poniżej jest przykład przedstawiający tworzenie szeregu z danych, które są zawarte w liście:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 211819\n",
|
||
"1 682758\n",
|
||
"2 737011\n",
|
||
"3 779511\n",
|
||
"4 673790\n",
|
||
"5 673790\n",
|
||
"6 444177\n",
|
||
"7 136791\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"\n",
|
||
"data = [211819, 682758, 737011, 779511, 673790, 673790, 444177, 136791]\n",
|
||
"\n",
|
||
"s = pd.Series(data)\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"W przypadku, gdy dane pochodzą z listy i nie podaliśmy indeksu, pandas doda automatyczny indeks liczbowy zaczynający się od 0."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"W przypadku przekazania słownika jako danych do szeregu, pandas wykorzysta klucze do stworzenia indeksu:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Podczas tworzenia szeregu możemy zdefiniować indeks, jak i nazwę szeregu:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819.0\n",
|
||
"May 682758.0\n",
|
||
"June 737011.0\n",
|
||
"July 779511.0\n",
|
||
"Name: Rides, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"months = ['April', 'May', 'June', 'July']\n",
|
||
"\n",
|
||
"data = [211819, 682758, 737011, 779511]\n",
|
||
"\n",
|
||
"s = pd.Series(data=data, index=months, dtype=float, name='Rides')\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Odwołanie się do poszczególnego elementu odbywa się przy pomocy klucza z indeksu."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"211819\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"August 673790\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"print(s['April'])\n",
|
||
"\n",
|
||
"s['August'] = 673790\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Dodanie elementu do szeregu odbywa się poprzez definiowanie nowego klucza:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"August 673790\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"s['August'] = 673790\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Więcej nt. indeksowania w szeregach w dalszej części kursu."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Podstawowa cechą szeregu jest wykonywanie operacji w sposób wektorowy. Działa to w następujący sposób:\n",
|
||
" * gdy w obu szeregach jest zawarty ten sam klucz, to są sumowane ich wartości;\n",
|
||
" * w przeciwnym przypadku wartość klucza w wynikowym szeregu to `pd.NaN`. \n",
|
||
" * Równoważnie możemy wykorzystać metodę `pandas.Series.add`. W tym przypadku możemy podać domyślną wartość w przypadku braku klucza."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"August 880599.0\n",
|
||
"July 973827.0\n",
|
||
"June 908505.0\n",
|
||
"May 830656.0\n",
|
||
"October NaN\n",
|
||
"September 814282.0\n",
|
||
"dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'August': 673790, 'July': 779511,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
|
||
"'September': 140492})\n",
|
||
"\n",
|
||
"all_data = members + occasionals\n",
|
||
"# Równoważnie\n",
|
||
"all_data = members.add(occasionals)\n",
|
||
"all_data"
|
||
]
|
||
},
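The default value mentioned for `pandas.Series.add` can be sketched as follows. This is a minimal illustration that reuses a few of the figures from the cell above; choosing `0` as the `fill_value` is an assumption that a missing month means no rides.

```python
import pandas as pd

members = pd.Series({'May': 682758, 'June': 737011, 'October': 444177})
occasionals = pd.Series({'May': 147898, 'June': 171494})

# Plain addition: 'October' is missing from `occasionals`, so the result is NaN.
plain_sum = members + occasionals

# With fill_value=0 the missing entry is treated as 0 before adding.
filled_sum = members.add(occasionals, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Whether `0` is the right substitute depends on the data: if a missing month means "not measured" rather than "no rides", keeping `NaN` is the safer choice.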
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Możemy wykonać operacje arytmetyczne na szeregu: "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"May 683758\n",
|
||
"June 738011\n",
|
||
"July 780511\n",
|
||
"August 674790\n",
|
||
"September 674790\n",
|
||
"October 445177\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"members += 1000\n",
|
||
"\n",
|
||
"members"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Podsumowanie\n",
|
||
" * Szeregi działają podobnie do słowników, z tą różnicą, że wartości muszą być jednorodne (tego samego typu).\n",
|
||
" * Odwołanie do poszczególnych elementów odbywa się poprzez nawiasy `[]` i podanie klucza.\n",
|
||
" * W przeciwieństwie do słowników, możemy w prosty sposób wykonywać operacje arytmetyczne."
|
||
]
|
||
},
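The comparison with dictionaries can be made concrete with a short sketch. The series below is a hypothetical two-month excerpt; `in` and `.get` behave as they do for a `dict`, while multiplication is something a plain dictionary does not offer.

```python
import pandas as pd

s = pd.Series({'April': 211819, 'May': 682758})

# Membership is checked against the index, just like dictionary keys.
has_april = 'April' in s

# .get returns the given default when the key is missing.
june_rides = s.get('June', 0)

# Arithmetic is applied to every value at once.
doubled = s * 2

print(has_april, june_rides, doubled['May'])
```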
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie 1\n",
|
||
" * Stwórz szereg `n`, który będzie zawierać liczby od 0 do 10 (włącznie).\n",
|
||
" * Stwórz szereg `n2`, który będzie zawierać kwadraty liczb od 0 do 10 (włącznie).\n",
|
||
" * Następnie stwórz szereg `trojkatne`, który będzie sumą powyższych szeregów podzieloną przez 2."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### [Ramka danych](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) (`pd.DataFrame`)\n",
|
||
"\n",
|
||
"Ramka danych jest podstawową strukturą danych w bibliotece `pandas`, która pozwala na trzymanie i reprezentowanie danych tabelarycznych (dwuwymiarowych).\n",
|
||
" * Posiada kolumny (cechy) i wiersze (obserwacje, przykłady).\n",
|
||
" * Możemy też patrzeć na nią jak na słownik, którego wartościami są szeregi.\n",
|
||
"\n",
|
||
"```\n",
|
||
"class pandas.DataFrame(data=None, index=None, columns=None, dtype=None)\n",
|
||
"```\n",
|
||
" "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Ramkę danych możemy stworzyć na różne sposoby.\n",
|
||
"\n",
|
||
"Pierwszy z nich (\"kolumnowy\") polega na zdefiniowaniu ramki poprzez podanie szeregów jako kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316"
|
||
]
|
||
},
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
|
||
"df"
|
||
]
|
||
},
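The view of a data frame as a dictionary of series can be checked directly. The sketch below rebuilds the same small frame and looks up one column; the variable name `column` is introduced here only for illustration.

```python
import pandas as pd

members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})
occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})
df = pd.DataFrame({'members': members, 'occasionals': occasionals})

# Looking up a column by name returns a Series indexed by the row labels.
column = df['members']
print(type(column))

# The "keys" of the frame are its column names.
print(list(df.keys()))
```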
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Drugim popularnym sposobem jest przekazanie listy słowników. Wtedy `pandas` zinterpretuje to jako listę przykładów:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"0 682758 147898\n",
|
||
"1 737011 171494\n",
|
||
"2 779511 194316"
|
||
]
|
||
},
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"data = [\n",
|
||
" {'members': 682758, 'occasionals': 147898},\n",
|
||
" {'occasionals': 171494,'members': 737011},\n",
|
||
" {'members': 779511, 'occasionals': 194316},\n",
|
||
"]\n",
|
||
"\n",
|
||
"df = pd.DataFrame(data)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Możemy też wykorzystać metodę `from_dict` ([doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html)), która pozwala zdefiniować czy podane dane są w podane w postaci kolumnowej lub wierszowej:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"index\n",
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"\n",
|
||
"columns\n",
|
||
" May June July\n",
|
||
"members 682758 737011 779511\n",
|
||
"occasionals 147898 171494 194316\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"data = {\n",
|
||
" 'May': {'members': 682758, 'occasionals': 147898},\n",
|
||
" 'June': {'members': 737011, 'occasionals': 171494},\n",
|
||
" 'July': {'members': 779511, 'occasionals': 194316}\n",
|
||
"}\n",
|
||
"\n",
|
||
"df = pd.DataFrame.from_dict(data, orient='index')\n",
|
||
"print('index\\n', df)\n",
|
||
"print()\n",
|
||
"df = pd.DataFrame.from_dict(data, orient='columns')\n",
|
||
"print('columns\\n', df)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Wczytywanie danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Biblioteka `pandas` pozwala na wczytanie i zapis danych z różnych formatów:\n",
|
||
" * formaty tekstowe, np. `csv`, `json`\n",
|
||
" * pliki arkuszy kalkulacyjnych: Excel (xls, xlsx)\n",
|
||
" * bazy danych\n",
|
||
" * inne: `sas` `spss`\n",
|
||
"\n",
|
||
"\n",
|
||
"Efektem wczytania danych jest odpowiednio stworzona ramka danych (`DataFrame`)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Jednym z najprostszych formatów danych jest format `csv`, gdzie kolejne wartości są rozdzielone przecinkiem.\n",
|
||
"\n",
|
||
"Żeby wczytać dane w takim formacie należy użyć funkcji `pandas.read_csv`.\n",
|
||
"\n",
|
||
"Pandas pozwala na ustawienie wielu parametrów (np. separator, cudzysłowy). Więcej na ten temat w [dokumentacji](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Country</th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>Afghanistan</td>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Albania</td>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Algeria</td>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Angola</td>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Antigua and Barbuda</td>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>170</th>\n",
|
||
" <td>Venezuela</td>\n",
|
||
" <td>28.13408</td>\n",
|
||
" <td>27.44500</td>\n",
|
||
" <td>17911.0</td>\n",
|
||
" <td>28116716.0</td>\n",
|
||
" <td>17.1</td>\n",
|
||
" <td>74.2</td>\n",
|
||
" <td>2.53</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>171</th>\n",
|
||
" <td>Vietnam</td>\n",
|
||
" <td>21.06500</td>\n",
|
||
" <td>20.91630</td>\n",
|
||
" <td>4085.0</td>\n",
|
||
" <td>86589342.0</td>\n",
|
||
" <td>26.2</td>\n",
|
||
" <td>74.1</td>\n",
|
||
" <td>1.86</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>172</th>\n",
|
||
" <td>Palestine</td>\n",
|
||
" <td>29.02643</td>\n",
|
||
" <td>26.57750</td>\n",
|
||
" <td>3564.0</td>\n",
|
||
" <td>3854667.0</td>\n",
|
||
" <td>24.7</td>\n",
|
||
" <td>74.1</td>\n",
|
||
" <td>4.38</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>173</th>\n",
|
||
" <td>Zambia</td>\n",
|
||
" <td>23.05436</td>\n",
|
||
" <td>20.68321</td>\n",
|
||
" <td>3039.0</td>\n",
|
||
" <td>13114579.0</td>\n",
|
||
" <td>94.9</td>\n",
|
||
" <td>51.1</td>\n",
|
||
" <td>5.88</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>174</th>\n",
|
||
" <td>Zimbabwe</td>\n",
|
||
" <td>24.64522</td>\n",
|
||
" <td>22.02660</td>\n",
|
||
" <td>1286.0</td>\n",
|
||
" <td>13495462.0</td>\n",
|
||
" <td>98.3</td>\n",
|
||
" <td>47.3</td>\n",
|
||
" <td>3.85</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>175 rows × 8 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Country female_BMI male_BMI gdp population \\\n",
|
||
"0 Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"1 Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"2 Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"3 Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"4 Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
".. ... ... ... ... ... \n",
|
||
"170 Venezuela 28.13408 27.44500 17911.0 28116716.0 \n",
|
||
"171 Vietnam 21.06500 20.91630 4085.0 86589342.0 \n",
|
||
"172 Palestine 29.02643 26.57750 3564.0 3854667.0 \n",
|
||
"173 Zambia 23.05436 20.68321 3039.0 13114579.0 \n",
|
||
"174 Zimbabwe 24.64522 22.02660 1286.0 13495462.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility \n",
|
||
"0 110.4 52.8 6.20 \n",
|
||
"1 17.9 76.8 1.76 \n",
|
||
"2 29.5 75.5 2.73 \n",
|
||
"3 192.0 56.7 6.43 \n",
|
||
"4 10.9 75.5 2.16 \n",
|
||
".. ... ... ... \n",
|
||
"170 17.1 74.2 2.53 \n",
|
||
"171 26.2 74.1 1.86 \n",
|
||
"172 24.7 74.1 4.38 \n",
|
||
"173 94.9 51.1 5.88 \n",
|
||
"174 98.3 47.3 3.85 \n",
|
||
"\n",
|
||
"[175 rows x 8 columns]"
|
||
]
|
||
},
|
||
"execution_count": 12,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('gapminder.csv')\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Heikkinen\\t Miss. Laina</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>26</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>STON/O2. 3101282</td>\n",
|
||
" <td>7.9250</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Allen\\t Mr. William Henry</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>373450</td>\n",
|
||
" <td>8.0500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"3 1 3 \n",
|
||
"4 1 1 \n",
|
||
"5 0 3 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38 \n",
|
||
"3 Heikkinen\\t Miss. Laina female 26 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35 \n",
|
||
"5 Allen\\t Mr. William Henry male 35 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"5 0 0 373450 8.0500 NaN S "
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', delimiter='\\t', index_col=0, nrows=5)\n",
|
||
"df"
|
||
]
|
||
},
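The separator and quote-character parameters of `pandas.read_csv` can be tried without any file on disk. The sketch below is only an illustration: the semicolon-separated text is hypothetical, and `io.StringIO` stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data; single quotes protect the embedded semicolon.
raw = "city;rides\n'Montreal; QC';211819\n'Quebec';682758\n"

# sep sets the field separator, quotechar the character that wraps quoted fields.
df = pd.read_csv(io.StringIO(raw), sep=';', quotechar="'")
print(df)
```

The same keyword arguments apply when a file path is passed instead of the in-memory buffer.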
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Do wczytania danych z arkusza kalkulacyjnego służy funkcja `pandas.read_excel`. Do otworzenia pliku `xlsx` może być koniecnze ustawienie parametru: `engine='openpyxl`. Więcej opcji w [dokumentacji](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>start_date</th>\n",
|
||
" <th>start_station_code</th>\n",
|
||
" <th>end_date</th>\n",
|
||
" <th>end_station_code</th>\n",
|
||
" <th>duration_sec</th>\n",
|
||
" <th>is_member</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>2019-04-14 07:55:22</td>\n",
|
||
" <td>6001</td>\n",
|
||
" <td>2019-04-14 08:07:16</td>\n",
|
||
" <td>6132</td>\n",
|
||
" <td>713</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>2019-04-14 07:59:31</td>\n",
|
||
" <td>6411</td>\n",
|
||
" <td>2019-04-14 08:09:18</td>\n",
|
||
" <td>6411</td>\n",
|
||
" <td>587</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>2019-04-14 07:59:55</td>\n",
|
||
" <td>6097</td>\n",
|
||
" <td>2019-04-14 08:12:11</td>\n",
|
||
" <td>6036</td>\n",
|
||
" <td>736</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>2019-04-14 07:59:57</td>\n",
|
||
" <td>6310</td>\n",
|
||
" <td>2019-04-14 08:27:58</td>\n",
|
||
" <td>6345</td>\n",
|
||
" <td>1680</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>2019-04-14 08:00:37</td>\n",
|
||
" <td>7029</td>\n",
|
||
" <td>2019-04-14 08:14:12</td>\n",
|
||
" <td>6250</td>\n",
|
||
" <td>814</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" start_date start_station_code end_date \\\n",
|
||
"0 2019-04-14 07:55:22 6001 2019-04-14 08:07:16 \n",
|
||
"1 2019-04-14 07:59:31 6411 2019-04-14 08:09:18 \n",
|
||
"2 2019-04-14 07:59:55 6097 2019-04-14 08:12:11 \n",
|
||
"3 2019-04-14 07:59:57 6310 2019-04-14 08:27:58 \n",
|
||
"4 2019-04-14 08:00:37 7029 2019-04-14 08:14:12 \n",
|
||
"\n",
|
||
" end_station_code duration_sec is_member \n",
|
||
"0 6132 713 1 \n",
|
||
"1 6411 587 1 \n",
|
||
"2 6036 736 1 \n",
|
||
"3 6345 1680 1 \n",
|
||
"4 6250 814 0 "
|
||
]
|
||
},
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_excel('./bikes.xlsx', engine='openpyxl', nrows=5)\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Innym ważnym źródłem informacji są bazy danych. Pandas potrafi komunikować się z bazą danych za pomocą biblioteki [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) i dostarcza odpowiedną funkcję:\n",
|
||
" * `pandas.read_sql` - wczytanie całej tabeli lub zapytania do bazy danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Title</th>\n",
|
||
" <th>ArtistId</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>AlbumId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>For Those About To Rock We Salute You</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Balls to the Wall</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Restless and Wild</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Let There Be Rock</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>Big Ones</td>\n",
|
||
" <td>3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>343</th>\n",
|
||
" <td>Respighi:Pines of Rome</td>\n",
|
||
" <td>226</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>344</th>\n",
|
||
" <td>Schubert: The Late String Quartets & String Qu...</td>\n",
|
||
" <td>272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>345</th>\n",
|
||
" <td>Monteverdi: L'Orfeo</td>\n",
|
||
" <td>273</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>346</th>\n",
|
||
" <td>Mozart: Chamber Music</td>\n",
|
||
" <td>274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>347</th>\n",
|
||
" <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
|
||
" <td>275</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>347 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Title ArtistId\n",
|
||
"AlbumId \n",
|
||
"1 For Those About To Rock We Salute You 1\n",
|
||
"2 Balls to the Wall 2\n",
|
||
"3 Restless and Wild 2\n",
|
||
"4 Let There Be Rock 1\n",
|
||
"5 Big Ones 3\n",
|
||
"... ... ...\n",
|
||
"343 Respighi:Pines of Rome 226\n",
|
||
"344 Schubert: The Late String Quartets & String Qu... 272\n",
|
||
"345 Monteverdi: L'Orfeo 273\n",
|
||
"346 Mozart: Chamber Music 274\n",
|
||
"347 Koyaanisqatsi (Soundtrack from the Motion Pict... 275\n",
|
||
"\n",
|
||
"[347 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_sql('Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Title</th>\n",
|
||
" <th>ArtistId</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>AlbumId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>For Those About To Rock We Salute You</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Balls to the Wall</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Restless and Wild</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Let There Be Rock</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>Big Ones</td>\n",
|
||
" <td>3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>343</th>\n",
|
||
" <td>Respighi:Pines of Rome</td>\n",
|
||
" <td>226</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>344</th>\n",
|
||
" <td>Schubert: The Late String Quartets & String Qu...</td>\n",
|
||
" <td>272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>345</th>\n",
|
||
" <td>Monteverdi: L'Orfeo</td>\n",
|
||
" <td>273</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>346</th>\n",
|
||
" <td>Mozart: Chamber Music</td>\n",
|
||
" <td>274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>347</th>\n",
|
||
" <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
|
||
" <td>275</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>347 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Title ArtistId\n",
|
||
"AlbumId \n",
|
||
"1 For Those About To Rock We Salute You 1\n",
|
||
"2 Balls to the Wall 2\n",
|
||
"3 Restless and Wild 2\n",
|
||
"4 Let There Be Rock 1\n",
|
||
"5 Big Ones 3\n",
|
||
"... ... ...\n",
|
||
"343 Respighi:Pines of Rome 226\n",
|
||
"344 Schubert: The Late String Quartets & String Qu... 272\n",
|
||
"345 Monteverdi: L'Orfeo 273\n",
|
||
"346 Mozart: Chamber Music 274\n",
|
||
"347 Koyaanisqatsi (Soundtrack from the Motion Pict... 275\n",
|
||
"\n",
|
||
"[347 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import sqlalchemy\n",
|
||
"\n",
|
||
"engine = sqlalchemy.create_engine('sqlite:///Chinook.sqlite', echo=True)\n",
|
||
"connection = engine.raw_connection()\n",
|
||
"\n",
|
||
"df = pd.read_sql('SELECT * FROM Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
|
||
"df"
|
||
]
|
||
},
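{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cells above rely on the `Chinook.sqlite` file. As a minimal, self-contained sketch of the same idea, the snippet below builds a small in-memory SQLite database with the standard `sqlite3` module and reads a query result with `pd.read_sql`. The `album` table, its columns and its rows are made up for illustration and are not part of Chinook.\n",
"\n",
"```python\n",
"import sqlite3\n",
"\n",
"import pandas as pd\n",
"\n",
"# hypothetical in-memory database, used only for this illustration\n",
"conn = sqlite3.connect(':memory:')\n",
"conn.execute('CREATE TABLE album (album_id INTEGER PRIMARY KEY, title TEXT, artist_id INTEGER)')\n",
"conn.executemany(\n",
"    'INSERT INTO album VALUES (?, ?, ?)',\n",
"    [(1, 'First', 10), (2, 'Second', 10), (3, 'Third', 20)],\n",
")\n",
"conn.commit()\n",
"\n",
"# params fills the ? placeholder, so the value is never pasted into the SQL text\n",
"albums = pd.read_sql(\n",
"    'SELECT * FROM album WHERE artist_id = ?',\n",
"    con=conn,\n",
"    params=(10,),\n",
"    index_col='album_id',\n",
")\n",
"conn.close()\n",
"print(albums)\n",
"```\n",
"\n",
"Passing the value through `params` rather than formatting it into the query string is the usual way to avoid quoting problems."
]
},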
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Podsumowanie\n",
|
||
"\n",
|
||
"\n",
|
||
" * Biblioteka `pandas` wspiera pobieranie danych z różnych formatów i źródeł.\n",
|
||
" * Każda funkcja ma listę argumentów, które pozwalają na ustawić poszczególne parametry (np. [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv))."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zapis i eksport danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Pandas pozwala w prosty sposób na zapisywanie ramki danych do pliku. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# zapis do formatu CSV\n",
|
||
"df.to_csv('tmp.csv')\n",
|
||
"# zapis do arkusza kalkulacyjnego \n",
|
||
"df.to_excel('tmp.xlsx')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Ponadto możemy przekonwertować ramkę danych do JSONa lub Pythonowego słownika:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"{\"members\":{\"May\":682758,\"June\":737011,\"July\":779511},\"occasionals\":{\"May\":147898,\"June\":171494,\"July\":194316}}\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(df.to_json())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"{'members': {'May': 682758, 'June': 737011, 'July': 779511}, 'occasionals': {'May': 147898, 'June': 171494, 'July': 194316}}\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(df.to_dict())\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Lub przekopiować dane do schowka:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.to_clipboard()"
|
||
]
|
||
},
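{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of a write/read round trip, assuming we want to avoid creating files on disk: `to_csv` writes into an in-memory `io.StringIO` buffer and `read_csv` reads it back. The variable names (`buffer`, `restored`, `records`) are only illustrative. The last step shows `to_dict` with `orient='records'`, which produces one dictionary per row instead of one per column.\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame(\n",
"    {'members': [682758, 737011], 'occasionals': [147898, 171494]},\n",
"    index=['May', 'June'],\n",
")\n",
"\n",
"# write to an in-memory text buffer instead of a file on disk\n",
"buffer = io.StringIO()\n",
"df.to_csv(buffer)\n",
"buffer.seek(0)\n",
"\n",
"# index_col=0 restores the row labels that to_csv wrote as the first column\n",
"restored = pd.read_csv(buffer, index_col=0)\n",
"\n",
"# orient='records' gives one dictionary per row rather than one per column\n",
"records = df.to_dict(orient='records')\n",
"\n",
"print(restored.equals(df))\n",
"print(records)\n",
"```\n",
"\n",
"Note that the `records` orientation drops the row labels; keep the default orientation when the index matters."
]
},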
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
" "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
" * Przekonwertuj tabele `Customer` z bazy `Chinook.sqlite` do arkusza kalkulacyjnego. Plik wynikowy nazwij `customers.xlsx`.\n",
|
||
" * Tabela `Employee` zawiera informacje o pracownikach firmy Chinook. Wyswietl dane na ekranie i podaj miasta, w których mieszkają pracownicy.\n",
|
||
" * Tabela `Invoice` zawiera informacje o fakturach. Przekonwertuj kolumnę `BillingCountry` do pythonowego słownika, a następnie podaj najcześciej występującą wartość. Ile razy pojawiła się?\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Ramka danych - podstawy"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Kolumny\n",
|
||
"\n",
|
||
"Na ramkę danych możemy patrzeć jak na swego rodzaju słownik, którego wartościami są szeregi. Pozwoli to na uzyskanie lepszej intuicji.\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" <td>72.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" <td>81.6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" gdp population life_expectancy\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0 52.8\n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7\n",
|
||
"Antigua and Barbuda 25736.0 85350.0 75.5\n",
|
||
"Argentina 14646.0 40381860.0 75.4\n",
|
||
"Armenia 7383.0 2975029.0 72.3\n",
|
||
"Australia 41312.0 21370348.0 81.6"
|
||
]
|
||
},
|
||
"execution_count": 22,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=8, usecols=['Country', 'gdp', 'population','life_expectancy'])\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Dostęp do poszczególnej kolumny możemy uzystać na dwa sposoby:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan 26528741.0\n",
|
||
"Albania 2968026.0\n",
|
||
"Algeria 34811059.0\n",
|
||
"Angola 19842251.0\n",
|
||
"Antigua and Barbuda 85350.0\n",
|
||
"Argentina 40381860.0\n",
|
||
"Armenia 2975029.0\n",
|
||
"Australia 21370348.0\n",
|
||
"Name: population, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 23,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# notacja z kropką\n",
|
||
"df.population"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan 26528741.0\n",
|
||
"Albania 2968026.0\n",
|
||
"Algeria 34811059.0\n",
|
||
"Angola 19842251.0\n",
|
||
"Antigua and Barbuda 85350.0\n",
|
||
"Argentina 40381860.0\n",
|
||
"Armenia 2975029.0\n",
|
||
"Australia 21370348.0\n",
|
||
"Name: population, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 24,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Operator []\n",
|
||
"df['population']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Do operatora `[]` możemy też podać listę nazw kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" gdp population\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0\n",
|
||
"Albania 8644.0 2968026.0\n",
|
||
"Algeria 12314.0 34811059.0\n",
|
||
"Angola 7103.0 19842251.0\n",
|
||
"Antigua and Barbuda 25736.0 85350.0\n",
|
||
"Argentina 14646.0 40381860.0\n",
|
||
"Armenia 7383.0 2975029.0\n",
|
||
"Australia 41312.0 21370348.0"
|
||
]
|
||
},
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['gdp','population']]"
|
||
]
|
||
},
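{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two access styles are not fully interchangeable. The sketch below uses two made-up column names, `count` and `life expectancy`, to show cases where dot notation does not reach the column while the `[]` operator does.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# hypothetical columns chosen to clash with dot notation\n",
"df = pd.DataFrame({'count': [3, 5], 'life expectancy': [52.8, 76.8]})\n",
"\n",
"# df.count is the DataFrame.count method, not the 'count' column\n",
"print(callable(df.count))\n",
"\n",
"# the [] operator always looks the name up among the columns\n",
"print(df['count'].tolist())\n",
"\n",
"# a name containing a space cannot be written with a dot at all\n",
"print(df['life expectancy'].tolist())\n",
"```\n",
"\n",
"For this reason the `[]` operator is the safer default in code that handles arbitrary column names."
]
},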
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Listę kolumn możemy pobrać za pomocą:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Index(['gdp', 'population', 'life_expectancy'], dtype='object')"
|
||
]
|
||
},
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.columns"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" <td>72.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" <td>81.6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0 52.8\n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7\n",
|
||
"Antigua and Barbuda 25736.0 85350.0 75.5\n",
|
||
"Argentina 14646.0 40381860.0 75.4\n",
|
||
"Armenia 7383.0 2975029.0 72.3\n",
|
||
"Australia 41312.0 21370348.0 81.6"
|
||
]
|
||
},
|
||
"execution_count": 27,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.columns = ['PKB', 'Populacja', 'ODŻ']\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
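{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assigning to `df.columns` replaces all names at once and changes the frame in place. As an alternative sketch, `DataFrame.rename` takes a mapping from old to new names, touches only the listed columns and returns a new frame. The one-row frame below is a cut-down stand-in for the Gapminder data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'gdp': [1311.0], 'population': [26528741.0], 'life_expectancy': [52.8]})\n",
"\n",
"# rename returns a new frame and only touches the columns listed in the mapping\n",
"renamed = df.rename(columns={'gdp': 'PKB', 'population': 'Populacja'})\n",
"\n",
"print(list(renamed.columns))\n",
"print(list(df.columns))\n",
"```\n",
"\n",
"The original `df` keeps its names, which makes `rename` convenient inside longer method chains."
]
},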
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Żeby odwołać się do poszczególnych wierszy należy wykorzystać metodę `loc`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PKB 14646.0\n",
|
||
"Populacja 40381860.0\n",
|
||
"ODŻ 75.4\n",
|
||
"Name: Argentina, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc['Argentina']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Metoda `loc` również może przyjąć listę wierszy: "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Angola 7103.0 19842251.0 56.7"
|
||
]
|
||
},
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[['Albania', 'Angola']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Możemy również podać drugi parametr: nazwy kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0\n",
|
||
"Angola 7103.0 19842251.0"
|
||
]
|
||
},
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df2 = df.loc[['Albania', 'Angola'], ['PKB', 'Populacja']]\n",
|
||
"\n",
|
||
"df2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Albo wykorzystać tzw. _slicing_, cyzli operator `:`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7"
|
||
]
|
||
},
|
||
"execution_count": 31,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc['Albania': 'Angola', 'PKB': 'ODŻ']"
|
||
]
|
||
},
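{
"cell_type": "markdown",
"metadata": {},
"source": [
"Label slices behave differently from ordinary Python slices. The sketch below contrasts `loc`, which slices by label and includes the end label, with `iloc`, which slices by position and excludes the end position. The small frame is a cut-down copy of the data above, and `by_label` and `by_position` are illustrative names.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame(\n",
"    {'PKB': [1311.0, 8644.0, 12314.0, 7103.0]},\n",
"    index=['Afghanistan', 'Albania', 'Algeria', 'Angola'],\n",
")\n",
"\n",
"# label slice: both 'Albania' and 'Angola' are included\n",
"by_label = df.loc['Albania':'Angola']\n",
"\n",
"# position slice: rows at positions 1 and 2 only, position 3 is excluded\n",
"by_position = df.iloc[1:3]\n",
"\n",
"print(list(by_label.index))\n",
"print(list(by_position.index))\n",
"```"
]
},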
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Żeby odwołać się do pojedyńczej wartości możemy użyć metody `at`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"7103.0"
|
||
]
|
||
},
|
||
"execution_count": 32,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.at['Angola', 'PKB']"
|
||
]
|
||
},
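{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `at` accessor can also be used for assignment. A minimal sketch, using a two-row copy of the data and a made-up replacement value:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'PKB': [8644.0, 7103.0]}, index=['Albania', 'Angola'])\n",
"\n",
"# read a single cell by row label and column name\n",
"before = df.at['Angola', 'PKB']\n",
"\n",
"# assign to a single cell in place (the new value is made up)\n",
"df.at['Angola', 'PKB'] = 7200.0\n",
"\n",
"print(before, df.at['Angola', 'PKB'])\n",
"```\n",
"\n",
"`at` accepts exactly one row label and one column label; for lists or slices use `loc`."
]
},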
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Dostęp do indeksu:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Index(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Antigua and Barbuda',\n",
|
||
" 'Argentina', 'Armenia', 'Australia'],\n",
|
||
" dtype='object', name='Country')"
|
||
]
|
||
},
|
||
"execution_count": 33,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.index"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Podstawowe metody `pd.Series` i `pd.DataFrame`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492\n",
|
||
"October 444177 53596"
|
||
]
|
||
},
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
|
||
"'September': 140492, 'October': 53596})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `head` pozwala tworzy nową ramkę danych z pierwszymi 5 przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492"
|
||
]
|
||
},
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `tail` robi to samo, ale z 5 ostatnymi przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492\n",
|
||
"October 444177 53596"
|
||
]
|
||
},
|
||
"execution_count": 36,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.tail()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `sample` pozwala na stworzenie nowej ramki danych z wylosowanymi `n` przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"September 673790 140492\n",
|
||
"August 673790 206809\n",
|
||
"May 682758 147898"
|
||
]
|
||
},
|
||
"execution_count": 37,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.sample(3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `describe` zwraca podstawowe statystyki m.in.: liczebność, średnią, wartości skrajne: "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>count</th>\n",
|
||
" <td>6.000000</td>\n",
|
||
" <td>6.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>mean</th>\n",
|
||
" <td>665172.833333</td>\n",
|
||
" <td>152434.166667</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>std</th>\n",
|
||
" <td>116216.045456</td>\n",
|
||
" <td>54783.506738</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>min</th>\n",
|
||
" <td>444177.000000</td>\n",
|
||
" <td>53596.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>25%</th>\n",
|
||
" <td>673790.000000</td>\n",
|
||
" <td>142343.500000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>50%</th>\n",
|
||
" <td>678274.000000</td>\n",
|
||
" <td>159696.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>75%</th>\n",
|
||
" <td>723447.750000</td>\n",
|
||
" <td>188610.500000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>max</th>\n",
|
||
" <td>779511.000000</td>\n",
|
||
" <td>206809.000000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"count 6.000000 6.000000\n",
|
||
"mean 665172.833333 152434.166667\n",
|
||
"std 116216.045456 54783.506738\n",
|
||
"min 444177.000000 53596.000000\n",
|
||
"25% 673790.000000 142343.500000\n",
|
||
"50% 678274.000000 159696.000000\n",
|
||
"75% 723447.750000 188610.500000\n",
|
||
"max 779511.000000 206809.000000"
|
||
]
|
||
},
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.describe()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `info` zwraca informacje techniczne o kolumnach: np. typ danych:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
"Index: 6 entries, May to October\n",
|
||
"Data columns (total 2 columns):\n",
|
||
" # Column Non-Null Count Dtype\n",
|
||
"--- ------ -------------- -----\n",
|
||
" 0 members 6 non-null int64\n",
|
||
" 1 occasionals 6 non-null int64\n",
|
||
"dtypes: int64(2)\n",
|
||
"memory usage: 144.0+ bytes\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"df.info()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Podstawową informacją o ramce danych to liczba przykładów w ramce danych. Możemy wykorzystać to tego funkcję `len`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"6"
|
||
]
|
||
},
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(df)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Natomiast atrybut `shape` zwraca nam krotkę z liczbą przykładów i liczbą kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(6, 2)"
|
||
]
|
||
},
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Operacja arytmetyczne\n",
|
||
"\n",
|
||
" * `max`, `idxmax`\n",
|
||
" * `min`, `idxmin`\n",
|
||
" * `mean`\n",
|
||
" * `count`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 42,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"members 665172.833333\n",
|
||
"occasionals 152434.166667\n",
|
||
"dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 42,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Zbiór wartości i zliczanie wartości:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"[1 3 2]\n",
|
||
"3 4\n",
|
||
"1 3\n",
|
||
"2 3\n",
|
||
"Name: count, dtype: int64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
|
||
"\n",
|
||
"print(dane.unique())\n",
|
||
"\n",
|
||
"dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
|
||
"\n",
|
||
"print(dane.value_counts())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Sprawdzanie czy brakuje danych:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 False\n",
|
||
"2 False\n",
|
||
"3 False\n",
|
||
"4 False\n",
|
||
"5 False\n",
|
||
" ... \n",
|
||
"887 False\n",
|
||
"888 False\n",
|
||
"889 True\n",
|
||
"890 False\n",
|
||
"891 False\n",
|
||
"Name: Age, Length: 891, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 44,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"df.Age.isnull()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Dodawanie i modyfikowanie danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI gdp population \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility \n",
|
||
"Country \n",
|
||
"Afghanistan 110.4 52.8 6.20 \n",
|
||
"Albania 17.9 76.8 1.76 \n",
|
||
"Algeria 29.5 75.5 2.73 \n",
|
||
"Angola 192.0 56.7 6.43 \n",
|
||
"Antigua and Barbuda 10.9 75.5 2.16 "
|
||
]
|
||
},
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" <th>continent</th>\n",
|
||
" <th>tmp</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" <td>Asia</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" <td>Europe</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" <td>Americas</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI gdp population \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility continent \\\n",
|
||
"Country \n",
|
||
"Afghanistan 110.4 52.8 6.20 Asia \n",
|
||
"Albania 17.9 76.8 1.76 Europe \n",
|
||
"Algeria 29.5 75.5 2.73 Africa \n",
|
||
"Angola 192.0 56.7 6.43 Africa \n",
|
||
"Antigua and Barbuda 10.9 75.5 2.16 Americas \n",
|
||
"\n",
|
||
" tmp \n",
|
||
"Country \n",
|
||
"Afghanistan 1 \n",
|
||
"Albania 1 \n",
|
||
"Algeria 1 \n",
|
||
"Angola 1 \n",
|
||
"Antigua and Barbuda 1 "
|
||
]
|
||
},
|
||
"execution_count": 46,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"conts = pd.Series({\n",
|
||
" 'Afghanistan': 'Asia', 'Albania': 'Europe', 'Algeria':' Africa', 'Angola': 'Africa', 'Antigua and Barbuda': 'Americas'})\n",
|
||
"\n",
|
||
"df['continent'] = conts\n",
|
||
"\n",
|
||
"df['tmp'] = 1\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 47,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" <th>continent</th>\n",
|
||
" <th>tmp</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" <td>Asia</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" <td>Europe</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" <td>Americas</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>27.46523</td>\n",
|
||
" <td>27.50170</td>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>15.4</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" <td>2.24</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI gdp population \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
"Argentina 27.46523 27.50170 14646.0 40381860.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility continent \\\n",
|
||
"Country \n",
|
||
"Afghanistan 110.4 52.8 6.20 Asia \n",
|
||
"Albania 17.9 76.8 1.76 Europe \n",
|
||
"Algeria 29.5 75.5 2.73 Africa \n",
|
||
"Angola 192.0 56.7 6.43 Africa \n",
|
||
"Antigua and Barbuda 10.9 75.5 2.16 Americas \n",
|
||
"Argentina 15.4 75.4 2.24 NaN \n",
|
||
"\n",
|
||
" tmp \n",
|
||
"Country \n",
|
||
"Afghanistan 1.0 \n",
|
||
"Albania 1.0 \n",
|
||
"Algeria 1.0 \n",
|
||
"Angola 1.0 \n",
|
||
"Antigua and Barbuda 1.0 \n",
|
||
"Argentina NaN "
|
||
]
|
||
},
|
||
"execution_count": 47,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc['Argentina'] = {\n",
|
||
" 'female_BMI': 27.46523,\n",
|
||
" 'male_BMI': 27.5017,\n",
|
||
" 'gdp': 14646.0,\n",
|
||
" 'population': 40381860.0,\n",
|
||
" 'under5mortality': 15.4,\n",
|
||
" 'life_expectancy': 75.4,\n",
|
||
" 'fertility': 2.24\n",
|
||
"}\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 48,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" <th>continent</th>\n",
|
||
" <th>tmp</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" <td>Asia</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" <td>Europe</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" <td>Africa</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" <td>Americas</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>27.46523</td>\n",
|
||
" <td>27.50170</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>15.4</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" <td>2.24</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI population under5mortality \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 26528741.0 110.4 \n",
|
||
"Albania 25.65726 26.44657 2968026.0 17.9 \n",
|
||
"Algeria 26.36841 24.59620 34811059.0 29.5 \n",
|
||
"Angola 23.48431 22.25083 19842251.0 192.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 85350.0 10.9 \n",
|
||
"Argentina 27.46523 27.50170 40381860.0 15.4 \n",
|
||
"\n",
|
||
" life_expectancy fertility continent tmp \n",
|
||
"Country \n",
|
||
"Afghanistan 52.8 6.20 Asia 1.0 \n",
|
||
"Albania 76.8 1.76 Europe 1.0 \n",
|
||
"Algeria 75.5 2.73 Africa 1.0 \n",
|
||
"Angola 56.7 6.43 Africa 1.0 \n",
|
||
"Antigua and Barbuda 75.5 2.16 Americas 1.0 \n",
|
||
"Argentina 75.4 2.24 NaN NaN "
|
||
]
|
||
},
|
||
"execution_count": 48,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.drop('gdp', axis='columns')\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Filtrowanie danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Biblioteka pandas posiada 2 sposoby na filtrowanie danych zawartych w ramce danych:\n",
|
||
" * operator `[]` -- najbardziej rozpowszechniony;\n",
|
||
" * metoda `query()`.\n",
|
||
"Oba sposoby mają różną składnię.\n",
|
||
" "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 49,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Heikkinen\\t Miss. Laina</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>STON/O2. 3101282</td>\n",
|
||
" <td>7.9250</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Allen\\t Mr. William Henry</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>373450</td>\n",
|
||
" <td>8.0500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"3 1 3 \n",
|
||
"4 1 1 \n",
|
||
"5 0 3 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"3 Heikkinen\\t Miss. Laina female 26.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"5 Allen\\t Mr. William Henry male 35.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"5 0 0 373450 8.0500 NaN S "
|
||
]
|
||
},
|
||
"execution_count": 49,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"\n",
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 0\n",
|
||
"2 1\n",
|
||
"3 1\n",
|
||
"4 1\n",
|
||
"5 0\n",
|
||
" ..\n",
|
||
"887 0\n",
|
||
"888 1\n",
|
||
"889 0\n",
|
||
"890 1\n",
|
||
"891 0\n",
|
||
"Name: Survived, Length: 891, dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 50,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df['Survived']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 51,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 False\n",
|
||
"2 True\n",
|
||
"3 True\n",
|
||
"4 True\n",
|
||
"5 False\n",
|
||
" ... \n",
|
||
"887 False\n",
|
||
"888 True\n",
|
||
"889 False\n",
|
||
"890 True\n",
|
||
"891 False\n",
|
||
"Name: Survived, Length: 891, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 51,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df['Survived'] == 1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>McCarthy\\t Mr. Timothy J</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>54.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17463</td>\n",
|
||
" <td>51.8625</td>\n",
|
||
" <td>E46</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>24</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Sloper\\t Mr. William Thompson</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113788</td>\n",
|
||
" <td>35.5000</td>\n",
|
||
" <td>A6</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>872</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>47.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11751</td>\n",
|
||
" <td>52.5542</td>\n",
|
||
" <td>D35</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>873</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Carlsson\\t Mr. Frans Olof</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>33.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>695</td>\n",
|
||
" <td>5.0000</td>\n",
|
||
" <td>B51 B53 B55</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>880</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>56.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11767</td>\n",
|
||
" <td>83.1583</td>\n",
|
||
" <td>C50</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>888</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Graham\\t Miss. Margaret Edith</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>19.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>112053</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>B42</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>890</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Behr\\t Mr. Karl Howell</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>111369</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>C148</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>216 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"7 0 1 \n",
|
||
"12 1 1 \n",
|
||
"24 1 1 \n",
|
||
"... ... ... \n",
|
||
"872 1 1 \n",
|
||
"873 0 1 \n",
|
||
"880 1 1 \n",
|
||
"888 1 1 \n",
|
||
"890 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"7 McCarthy\\t Mr. Timothy J male 54.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"24 Sloper\\t Mr. William Thompson male 28.0 \n",
|
||
"... ... ... ... \n",
|
||
"872 Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny) female 47.0 \n",
|
||
"873 Carlsson\\t Mr. Frans Olof male 33.0 \n",
|
||
"880 Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 \n",
|
||
"888 Graham\\t Miss. Margaret Edith female 19.0 \n",
|
||
"890 Behr\\t Mr. Karl Howell male 26.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"7 0 0 17463 51.8625 E46 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"24 0 0 113788 35.5000 A6 S \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"872 1 1 11751 52.5542 D35 S \n",
|
||
"873 0 0 695 5.0000 B51 B53 B55 S \n",
|
||
"880 0 1 11767 83.1583 C50 C \n",
|
||
"888 0 0 112053 30.0000 B42 S \n",
|
||
"890 0 0 111369 30.0000 C148 C \n",
|
||
"\n",
|
||
"[216 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[df['Pclass'] == 1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Operatory\n",
|
||
"\n",
|
||
"* `&` - koniukcja (i)\n",
|
||
"* `|` - alternatywa (lub)\n",
|
||
"* `~` - negacja (nie)\n",
|
||
"* `()` - jeżeli mamy kilka warunków to warto je uporządkować w nawiasy"
|
||
]
|
||
},
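{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A minimal sketch of `|`, `~` and parentheses in boolean filtering. The frame `toy` and the names `first_class`, `women`, `either`, `neither` and `young_women` are made up for illustration; only the column names mirror the Titanic data used in this notebook.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy data invented for this sketch (not rows from the Titanic file).\n",
"toy = pd.DataFrame(\n",
"    {\n",
"        'Pclass': [1, 3, 1, 2],\n",
"        'Sex': ['female', 'male', 'male', 'female'],\n",
"        'Age': [38.0, 22.0, 54.0, 14.0],\n",
"    },\n",
"    index=pd.Index([1, 2, 3, 4], name='PassengerId'),\n",
")\n",
"\n",
"first_class = toy['Pclass'] == 1   # boolean mask, one value per row\n",
"women = toy['Sex'] == 'female'     # boolean mask, one value per row\n",
"\n",
"either = toy[first_class | women]      # first class OR female\n",
"neither = toy[~(first_class | women)]  # NOT (first class OR female)\n",
"\n",
"# Inline comparisons need their own parentheses,\n",
"# because & binds more tightly than == and <.\n",
"young_women = toy[(toy['Sex'] == 'female') & (toy['Age'] < 18)]\n",
"```\n",
"\n",
"Naming the masks first, as in the sketch, keeps the final filter readable and avoids most precedence mistakes."
]
},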
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 53,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>32</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17569</td>\n",
|
||
" <td>146.5208</td>\n",
|
||
" <td>B78</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>53</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>49.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17572</td>\n",
|
||
" <td>76.7292</td>\n",
|
||
" <td>D33</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>857</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wick\\t Mrs. George Dennick (Mary Hitchcock)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>45.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>36928</td>\n",
|
||
" <td>164.8667</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>863</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Swift\\t Mrs. Frederick Joel (Margaret Welles B...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>48.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17466</td>\n",
|
||
" <td>25.9292</td>\n",
|
||
" <td>D17</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>872</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>47.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11751</td>\n",
|
||
" <td>52.5542</td>\n",
|
||
" <td>D35</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>880</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>56.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11767</td>\n",
|
||
" <td>83.1583</td>\n",
|
||
" <td>C50</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>888</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Graham\\t Miss. Margaret Edith</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>19.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>112053</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>B42</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>94 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"12 1 1 \n",
|
||
"32 1 1 \n",
|
||
"53 1 1 \n",
|
||
"... ... ... \n",
|
||
"857 1 1 \n",
|
||
"863 1 1 \n",
|
||
"872 1 1 \n",
|
||
"880 1 1 \n",
|
||
"888 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"32 Spencer\\t Mrs. William Augustus (Marie Eugenie) female NaN \n",
|
||
"53 Harper\\t Mrs. Henry Sleeper (Myna Haxtun) female 49.0 \n",
|
||
"... ... ... ... \n",
|
||
"857 Wick\\t Mrs. George Dennick (Mary Hitchcock) female 45.0 \n",
|
||
"863 Swift\\t Mrs. Frederick Joel (Margaret Welles B... female 48.0 \n",
|
||
"872 Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny) female 47.0 \n",
|
||
"880 Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 \n",
|
||
"888 Graham\\t Miss. Margaret Edith female 19.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"32 1 0 PC 17569 146.5208 B78 C \n",
|
||
"53 1 0 PC 17572 76.7292 D33 C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"857 1 1 36928 164.8667 NaN S \n",
|
||
"863 0 0 17466 25.9292 D17 S \n",
|
||
"872 1 1 11751 52.5542 D35 S \n",
|
||
"880 0 1 11767 83.1583 C50 C \n",
|
||
"888 0 0 112053 30.0000 B42 S \n",
|
||
"\n",
|
||
"[94 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 53,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pierwsza_klasa = df['Pclass'] == 1\n",
|
||
"kobiety = df['Sex'] == 'female'\n",
|
||
"\n",
|
||
"df[pierwsza_klasa & kobiety]\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 54,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Palsson\\t Master. Gosta Leonard</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>349909</td>\n",
|
||
" <td>21.0750</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>14.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>237736</td>\n",
|
||
" <td>30.0708</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>861</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Hansen\\t Mr. Claus Peter</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>41.0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>350026</td>\n",
|
||
" <td>14.1083</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>862</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Giles\\t Mr. Frederick Edward</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>21.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>28134</td>\n",
|
||
" <td>11.5000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>864</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>CA. 2343</td>\n",
|
||
" <td>69.5500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>867</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Duran y More\\t Miss. Asuncion</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>27.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>SC/PARIS 2149</td>\n",
|
||
" <td>13.8583</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>875</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>P/PP 3381</td>\n",
|
||
" <td>24.0000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>192 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"8 0 3 \n",
|
||
"10 1 2 \n",
|
||
"... ... ... \n",
|
||
"861 0 3 \n",
|
||
"862 0 2 \n",
|
||
"864 0 3 \n",
|
||
"867 1 2 \n",
|
||
"875 1 2 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"8 Palsson\\t Master. Gosta Leonard male 2.0 \n",
|
||
"10 Nasser\\t Mrs. Nicholas (Adele Achem) female 14.0 \n",
|
||
"... ... ... ... \n",
|
||
"861 Hansen\\t Mr. Claus Peter male 41.0 \n",
|
||
"862 Giles\\t Mr. Frederick Edward male 21.0 \n",
|
||
"864 Sage\\t Miss. Dorothy Edith \"Dolly\" female NaN \n",
|
||
"867 Duran y More\\t Miss. Asuncion female 27.0 \n",
|
||
"875 Abelson\\t Mrs. Samuel (Hannah Wizosky) female 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"8 3 1 349909 21.0750 NaN S \n",
|
||
"10 1 0 237736 30.0708 NaN C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"861 2 0 350026 14.1083 NaN S \n",
|
||
"862 1 0 28134 11.5000 NaN S \n",
|
||
"864 8 2 CA. 2343 69.5500 NaN S \n",
|
||
"867 1 0 SC/PARIS 2149 13.8583 NaN C \n",
|
||
"875 1 0 P/PP 3381 24.0000 NaN C \n",
|
||
"\n",
|
||
"[192 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 54,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"\n",
|
||
"df[df['SibSp'] > df['Parch']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### `pd.DataFrame.query`\n",
|
||
"\n",
|
||
"Innym sposobem na filtrowanie danych jest metoda `query`, która jako argument przyjmuje wyrażenie:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>McCarthy\\t Mr. Timothy J</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>54.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17463</td>\n",
|
||
" <td>51.8625</td>\n",
|
||
" <td>E46</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>24</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Sloper\\t Mr. William Thompson</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113788</td>\n",
|
||
" <td>35.5000</td>\n",
|
||
" <td>A6</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"7 0 1 \n",
|
||
"12 1 1 \n",
|
||
"24 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"7 McCarthy\\t Mr. Timothy J male 54.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"24 Sloper\\t Mr. William Thompson male 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"7 0 0 17463 51.8625 E46 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"24 0 0 113788 35.5000 A6 S "
|
||
]
|
||
},
|
||
"execution_count": 55,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('Pclass == 1').head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>32</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17569</td>\n",
|
||
" <td>146.5208</td>\n",
|
||
" <td>B78</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>53</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>49.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17572</td>\n",
|
||
" <td>76.7292</td>\n",
|
||
" <td>D33</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"12 1 1 \n",
|
||
"32 1 1 \n",
|
||
"53 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"32 Spencer\\t Mrs. William Augustus (Marie Eugenie) female NaN \n",
|
||
"53 Harper\\t Mrs. Henry Sleeper (Myna Haxtun) female 49.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"32 1 0 PC 17569 146.5208 B78 C \n",
|
||
"53 1 0 PC 17572 76.7292 D33 C "
|
||
]
|
||
},
|
||
"execution_count": 56,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('(Pclass == 1) and (Sex == \"female\")').head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 57,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Palsson\\t Master. Gosta Leonard</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>349909</td>\n",
|
||
" <td>21.0750</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>14.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>237736</td>\n",
|
||
" <td>30.0708</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>861</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Hansen\\t Mr. Claus Peter</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>41.0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>350026</td>\n",
|
||
" <td>14.1083</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>862</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Giles\\t Mr. Frederick Edward</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>21.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>28134</td>\n",
|
||
" <td>11.5000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>864</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>CA. 2343</td>\n",
|
||
" <td>69.5500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>867</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Duran y More\\t Miss. Asuncion</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>27.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>SC/PARIS 2149</td>\n",
|
||
" <td>13.8583</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>875</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>P/PP 3381</td>\n",
|
||
" <td>24.0000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>192 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"8 0 3 \n",
|
||
"10 1 2 \n",
|
||
"... ... ... \n",
|
||
"861 0 3 \n",
|
||
"862 0 2 \n",
|
||
"864 0 3 \n",
|
||
"867 1 2 \n",
|
||
"875 1 2 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"8 Palsson\\t Master. Gosta Leonard male 2.0 \n",
|
||
"10 Nasser\\t Mrs. Nicholas (Adele Achem) female 14.0 \n",
|
||
"... ... ... ... \n",
|
||
"861 Hansen\\t Mr. Claus Peter male 41.0 \n",
|
||
"862 Giles\\t Mr. Frederick Edward male 21.0 \n",
|
||
"864 Sage\\t Miss. Dorothy Edith \"Dolly\" female NaN \n",
|
||
"867 Duran y More\\t Miss. Asuncion female 27.0 \n",
|
||
"875 Abelson\\t Mrs. Samuel (Hannah Wizosky) female 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"8 3 1 349909 21.0750 NaN S \n",
|
||
"10 1 0 237736 30.0708 NaN C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"861 2 0 350026 14.1083 NaN S \n",
|
||
"862 1 0 28134 11.5000 NaN S \n",
|
||
"864 8 2 CA. 2343 69.5500 NaN S \n",
|
||
"867 1 0 SC/PARIS 2149 13.8583 NaN C \n",
|
||
"875 1 0 P/PP 3381 24.0000 NaN C \n",
|
||
"\n",
|
||
"[192 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 57,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('SibSp > Parch')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(113, 11)"
|
||
]
|
||
},
|
||
"execution_count": 58,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"young = 18\n",
|
||
"df.query('Age < @young').shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Operacje na wierszach i kolumnach"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 59,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" female_BMI male_BMI gdp population \\\n",
|
||
"Country \n",
|
||
"Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility \n",
|
||
"Country \n",
|
||
"Afghanistan 110.4 52.8 6.20 \n",
|
||
"Albania 17.9 76.8 1.76 \n",
|
||
"Algeria 29.5 75.5 2.73 \n",
|
||
"Angola 192.0 56.7 6.43 \n",
|
||
"Antigua and Barbuda 10.9 75.5 2.16 "
|
||
]
|
||
},
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Iterowanie po ramce danych oznacza oznacza przejście po nazwach kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 60,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"female_BMI\n",
|
||
"male_BMI\n",
|
||
"gdp\n",
|
||
"population\n",
|
||
"under5mortality\n",
|
||
"life_expectancy\n",
|
||
"fertility\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for column_name in df:\n",
|
||
" print(column_name)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 61,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"female_BMI Country\n",
|
||
"Afghanistan 21.07402\n",
|
||
"Albania 25.65726\n",
|
||
"Algeria 26.36841\n",
|
||
"Angola 23.48431\n",
|
||
"Antigua and Barbuda 27.50545\n",
|
||
"Name: female_BMI, dtype: float64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for col_name, series in df.items():\n",
|
||
" print(col_name, series)\n",
|
||
" break"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 62,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Afghanistan \n",
|
||
" female_BMI 2.107402e+01\n",
|
||
"male_BMI 2.062058e+01\n",
|
||
"gdp 1.311000e+03\n",
|
||
"population 2.652874e+07\n",
|
||
"under5mortality 1.104000e+02\n",
|
||
"life_expectancy 5.280000e+01\n",
|
||
"fertility 6.200000e+00\n",
|
||
"Name: Afghanistan, dtype: float64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"for idx, row in df.iterrows():\n",
|
||
" print(idx, '\\n', row)\n",
|
||
" break"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 63,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan normal\n",
|
||
"Albania overweight\n",
|
||
"Algeria normal\n",
|
||
"Angola normal\n",
|
||
"Antigua and Barbuda overweight\n",
|
||
"Name: male_BMI, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 63,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def bmi_level(bmi):\n",
|
||
" if bmi <= 18.5:\n",
|
||
" level = 'underweight'\n",
|
||
" elif bmi < 25:\n",
|
||
" level = 'normal'\n",
|
||
" elif bmi < 30:\n",
|
||
" level = 'overweight'\n",
|
||
" else:\n",
|
||
" level = 'obese'\n",
|
||
" return level\n",
|
||
"\n",
|
||
"s = df['male_BMI'].map(bmi_level)\n",
|
||
" \n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 64,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan normal\n",
|
||
"Albania overweight\n",
|
||
"Algeria normal\n",
|
||
"Angola normal\n",
|
||
"Antigua and Barbuda overweight\n",
|
||
"dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 64,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def bmi_level(row_data):\n",
|
||
" bmi = row_data['male_BMI']\n",
|
||
" if bmi <= 18.5:\n",
|
||
" return 'underweight'\n",
|
||
" elif bmi < 25:\n",
|
||
" return 'normal'\n",
|
||
" elif bmi < 30:\n",
|
||
" return 'overweight'\n",
|
||
" return 'obese'\n",
|
||
"\n",
|
||
"df.apply(bmi_level, axis=1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 65,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th>Country</th>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <th>Albania</th>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <th>Angola</th>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <td>2.107402e+01</td>\n",
|
||
" <td>2.565726e+01</td>\n",
|
||
" <td>2.636841e+01</td>\n",
|
||
" <td>2.348431e+01</td>\n",
|
||
" <td>27.50545</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <td>2.062058e+01</td>\n",
|
||
" <td>2.644657e+01</td>\n",
|
||
" <td>2.459620e+01</td>\n",
|
||
" <td>2.225083e+01</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>gdp</th>\n",
|
||
" <td>1.311000e+03</td>\n",
|
||
" <td>8.644000e+03</td>\n",
|
||
" <td>1.231400e+04</td>\n",
|
||
" <td>7.103000e+03</td>\n",
|
||
" <td>25736.00000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>population</th>\n",
|
||
" <td>2.652874e+07</td>\n",
|
||
" <td>2.968026e+06</td>\n",
|
||
" <td>3.481106e+07</td>\n",
|
||
" <td>1.984225e+07</td>\n",
|
||
" <td>85350.00000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <td>1.104000e+02</td>\n",
|
||
" <td>1.790000e+01</td>\n",
|
||
" <td>2.950000e+01</td>\n",
|
||
" <td>1.920000e+02</td>\n",
|
||
" <td>10.90000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <td>5.280000e+01</td>\n",
|
||
" <td>7.680000e+01</td>\n",
|
||
" <td>7.550000e+01</td>\n",
|
||
" <td>5.670000e+01</td>\n",
|
||
" <td>75.50000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>fertility</th>\n",
|
||
" <td>6.200000e+00</td>\n",
|
||
" <td>1.760000e+00</td>\n",
|
||
" <td>2.730000e+00</td>\n",
|
||
" <td>6.430000e+00</td>\n",
|
||
" <td>2.16000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
"Country Afghanistan Albania Algeria Angola \\\n",
|
||
"female_BMI 2.107402e+01 2.565726e+01 2.636841e+01 2.348431e+01 \n",
|
||
"male_BMI 2.062058e+01 2.644657e+01 2.459620e+01 2.225083e+01 \n",
|
||
"gdp 1.311000e+03 8.644000e+03 1.231400e+04 7.103000e+03 \n",
|
||
"population 2.652874e+07 2.968026e+06 3.481106e+07 1.984225e+07 \n",
|
||
"under5mortality 1.104000e+02 1.790000e+01 2.950000e+01 1.920000e+02 \n",
|
||
"life_expectancy 5.280000e+01 7.680000e+01 7.550000e+01 5.670000e+01 \n",
|
||
"fertility 6.200000e+00 1.760000e+00 2.730000e+00 6.430000e+00 \n",
|
||
"\n",
|
||
"Country Antigua and Barbuda \n",
|
||
"female_BMI 27.50545 \n",
|
||
"male_BMI 25.76602 \n",
|
||
"gdp 25736.00000 \n",
|
||
"population 85350.00000 \n",
|
||
"under5mortality 10.90000 \n",
|
||
"life_expectancy 75.50000 \n",
|
||
"fertility 2.16000 "
|
||
]
|
||
},
|
||
"execution_count": 65,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.transpose()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Grupowanie (`groupby`)\n",
|
||
"\n",
|
||
"Często zdarza się, gdy potrzebujemy podzielić dane ze względu na wartości w zadanej kolumnie, a następnie obliczenie zebranie danych w każdej z grup. Do tego służy metody `groupby`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 66,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Team</th>\n",
|
||
" <th>Number</th>\n",
|
||
" <th>Position</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>Height</th>\n",
|
||
" <th>Weight</th>\n",
|
||
" <th>College</th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>202</th>\n",
|
||
" <td>Solomon Hill</td>\n",
|
||
" <td>Indiana Pacers</td>\n",
|
||
" <td>44.0</td>\n",
|
||
" <td>SF</td>\n",
|
||
" <td>25.0</td>\n",
|
||
" <td>6-7</td>\n",
|
||
" <td>225.0</td>\n",
|
||
" <td>Arizona</td>\n",
|
||
" <td>1358880.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>286</th>\n",
|
||
" <td>Tim Frazier</td>\n",
|
||
" <td>New Orleans Pelicans</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>PG</td>\n",
|
||
" <td>25.0</td>\n",
|
||
" <td>6-1</td>\n",
|
||
" <td>170.0</td>\n",
|
||
" <td>Penn State</td>\n",
|
||
" <td>845059.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>210</th>\n",
|
||
" <td>Joe Young</td>\n",
|
||
" <td>Indiana Pacers</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" <td>PG</td>\n",
|
||
" <td>23.0</td>\n",
|
||
" <td>6-2</td>\n",
|
||
" <td>180.0</td>\n",
|
||
" <td>Oregon</td>\n",
|
||
" <td>1007026.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>420</th>\n",
|
||
" <td>Nazr Mohammed</td>\n",
|
||
" <td>Oklahoma City Thunder</td>\n",
|
||
" <td>13.0</td>\n",
|
||
" <td>C</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>6-10</td>\n",
|
||
" <td>250.0</td>\n",
|
||
" <td>Kentucky</td>\n",
|
||
" <td>222888.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>258</th>\n",
|
||
" <td>Tony Allen</td>\n",
|
||
" <td>Memphis Grizzlies</td>\n",
|
||
" <td>9.0</td>\n",
|
||
" <td>SG</td>\n",
|
||
" <td>34.0</td>\n",
|
||
" <td>6-4</td>\n",
|
||
" <td>213.0</td>\n",
|
||
" <td>Oklahoma State</td>\n",
|
||
" <td>5158539.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Name Team Number Position Age Height \\\n",
|
||
"202 Solomon Hill Indiana Pacers 44.0 SF 25.0 6-7 \n",
|
||
"286 Tim Frazier New Orleans Pelicans 2.0 PG 25.0 6-1 \n",
|
||
"210 Joe Young Indiana Pacers 1.0 PG 23.0 6-2 \n",
|
||
"420 Nazr Mohammed Oklahoma City Thunder 13.0 C 38.0 6-10 \n",
|
||
"258 Tony Allen Memphis Grizzlies 9.0 SG 34.0 6-4 \n",
|
||
"\n",
|
||
" Weight College Salary \n",
|
||
"202 225.0 Arizona 1358880.0 \n",
|
||
"286 170.0 Penn State 845059.0 \n",
|
||
"210 180.0 Oregon 1007026.0 \n",
|
||
"420 250.0 Kentucky 222888.0 \n",
|
||
"258 213.0 Oklahoma State 5158539.0 "
|
||
]
|
||
},
|
||
"execution_count": 66,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"\n",
|
||
"df = pd.read_csv('./nba.csv')\n",
|
||
"\n",
|
||
"df.sample(5)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"_Przykład_: chcemy obliczyć średnią wypłatę dla każdej z drużyn."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 67,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Team</th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Atlanta Hawks</th>\n",
|
||
" <td>4.860197e+06</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Boston Celtics</th>\n",
|
||
" <td>4.181505e+06</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Brooklyn Nets</th>\n",
|
||
" <td>3.501898e+06</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Charlotte Hornets</th>\n",
|
||
" <td>5.222728e+06</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Chicago Bulls</th>\n",
|
||
" <td>5.785559e+06</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Salary\n",
|
||
"Team \n",
|
||
"Atlanta Hawks 4.860197e+06\n",
|
||
"Boston Celtics 4.181505e+06\n",
|
||
"Brooklyn Nets 3.501898e+06\n",
|
||
"Charlotte Hornets 5.222728e+06\n",
|
||
"Chicago Bulls 5.785559e+06"
|
||
]
|
||
},
|
||
"execution_count": 67,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['Team', 'Salary']].groupby('Team').mean().head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Możemy też podać listę nazw kolumn. Wtedy wartości zostaną obliczone dla każdej z wytworzonych grup:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 68,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Team Position\n",
|
||
"Atlanta Hawks C 7.585417e+06\n",
|
||
" PF 5.988067e+06\n",
|
||
" PG 4.881700e+06\n",
|
||
" SF 3.000000e+06\n",
|
||
" SG 2.607758e+06\n",
|
||
" ... \n",
|
||
"Washington Wizards C 8.163476e+06\n",
|
||
" PF 5.650000e+06\n",
|
||
" PG 9.011208e+06\n",
|
||
" SF 2.789700e+06\n",
|
||
" SG 2.839248e+06\n",
|
||
"Name: Salary, Length: 149, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 68,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.groupby(['Team', 'Position'])['Salary'].mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
" * `sum()`\n",
|
||
" * `min()`\n",
|
||
" * `max()`\n",
|
||
" * `mean()`\n",
|
||
" * `size()`\n",
|
||
" * `describe()`\n",
|
||
" * `first()`\n",
|
||
" * `last()`\n",
|
||
" * `count()`\n",
|
||
" * `std()`\n",
|
||
" * `var()`\n",
|
||
" * `sem()`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 69,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead tr th {\n",
|
||
" text-align: left;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead tr:last-of-type th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr>\n",
|
||
" <th></th>\n",
|
||
" <th colspan=\"3\" halign=\"left\">Salary</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th></th>\n",
|
||
" <th>mean</th>\n",
|
||
" <th>std</th>\n",
|
||
" <th>count</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Position</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>C</th>\n",
|
||
" <td>5.967052e+06</td>\n",
|
||
" <td>5.787989e+06</td>\n",
|
||
" <td>78</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PF</th>\n",
|
||
" <td>4.562483e+06</td>\n",
|
||
" <td>4.800054e+06</td>\n",
|
||
" <td>97</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PG</th>\n",
|
||
" <td>5.077829e+06</td>\n",
|
||
" <td>5.051809e+06</td>\n",
|
||
" <td>88</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>SF</th>\n",
|
||
" <td>4.857393e+06</td>\n",
|
||
" <td>6.011889e+06</td>\n",
|
||
" <td>84</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>SG</th>\n",
|
||
" <td>4.009861e+06</td>\n",
|
||
" <td>4.491609e+06</td>\n",
|
||
" <td>99</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Salary \n",
|
||
" mean std count\n",
|
||
"Position \n",
|
||
"C 5.967052e+06 5.787989e+06 78\n",
|
||
"PF 4.562483e+06 4.800054e+06 97\n",
|
||
"PG 5.077829e+06 5.051809e+06 88\n",
|
||
"SF 4.857393e+06 6.011889e+06 84\n",
|
||
"SG 4.009861e+06 4.491609e+06 99"
|
||
]
|
||
},
|
||
"execution_count": 69,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['Position', 'Salary']].groupby('Position').agg(['mean', 'std', 'count'])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 70,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Position</th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>C</th>\n",
|
||
" <td>22275967.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PF</th>\n",
|
||
" <td>22081286.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PG</th>\n",
|
||
" <td>21412973.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>SF</th>\n",
|
||
" <td>24969112.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>SG</th>\n",
|
||
" <td>19944278.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Salary\n",
|
||
"Position \n",
|
||
"C 22275967.0\n",
|
||
"PF 22081286.0\n",
|
||
"PG 21412973.0\n",
|
||
"SF 24969112.0\n",
|
||
"SG 19944278.0"
|
||
]
|
||
},
|
||
"execution_count": 70,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def group_range(x):\n",
|
||
" return x.max() - x.min()\n",
|
||
"\n",
|
||
"df[['Position', 'Salary']].groupby('Position').apply(group_range)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 71,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Liczba grup: 5\n",
|
||
"dict_keys(['C', 'PF', 'PG', 'SF', 'SG'])\n",
|
||
" Name Team Number Position Age Height Weight \\\n",
|
||
"7 Kelly Olynyk Boston Celtics 41.0 C 25.0 7-0 238.0 \n",
|
||
"10 Jared Sullinger Boston Celtics 7.0 C 24.0 6-9 260.0 \n",
|
||
"14 Tyler Zeller Boston Celtics 44.0 C 26.0 7-0 253.0 \n",
|
||
"23 Brook Lopez Brooklyn Nets 11.0 C 28.0 7-0 275.0 \n",
|
||
"27 Henry Sims Brooklyn Nets 14.0 C 26.0 6-10 248.0 \n",
|
||
"\n",
|
||
" College Salary \n",
|
||
"7 Gonzaga 2165160.0 \n",
|
||
"10 Ohio State 2569260.0 \n",
|
||
"14 North Carolina 2616975.0 \n",
|
||
"23 Stanford 19689000.0 \n",
|
||
"27 Georgetown 947276.0 \n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"gb = df.groupby('Position')\n",
|
||
"\n",
|
||
"print('Liczba grup:', gb.ngroups)\n",
|
||
"print(gb.groups.keys())\n",
|
||
"\n",
|
||
"print(gb.get_group('C').head())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 72,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 15.36\n",
|
||
"1 15.36\n",
|
||
"2 15.36\n",
|
||
"3 15.36\n",
|
||
"4 15.36\n",
|
||
" ... \n",
|
||
"453 15.36\n",
|
||
"454 15.36\n",
|
||
"455 17.92\n",
|
||
"456 17.92\n",
|
||
"457 <NA>\n",
|
||
"Name: Height, Length: 458, dtype: Float64"
|
||
]
|
||
},
|
||
"execution_count": 72,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"\n",
|
||
"df.Height.str.split('-').str[0].astype('Int64') * 2.56"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Pivot\n",
|
||
"The `pivot` method creates a new data frame whose index and column names are taken from the values of the original data frame. \n",
|
||
"\n",
|
||
"_Example_: consider the data frame below, which holds translation-quality scores for the Hausa-English language pair. The `system` column holds the system name, the `metric` column the metric name, and the `score` column the metric value. We want to reshape the data so that each row is keyed by the system name and each column is a metric. The `pivot` method does this; here we pass it 3 arguments:\n",
|
||
" * `index`: the name of the column used to build the index;\n",
|
||
" * `columns`: the name of the column whose values become the column names of the new data frame;\n",
|
||
" * `values`: the name of the column that holds the data of interest."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 73,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>pair</th>\n",
|
||
" <th>system</th>\n",
|
||
" <th>id</th>\n",
|
||
" <th>is_constrained</th>\n",
|
||
" <th>metric</th>\n",
|
||
" <th>score</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1214</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>NiuTrans</td>\n",
|
||
" <td>382</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>16.512243</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1215</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>NiuTrans</td>\n",
|
||
" <td>382</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>44.724766</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1216</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>NiuTrans</td>\n",
|
||
" <td>382</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>16.512243</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1217</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>NiuTrans</td>\n",
|
||
" <td>382</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>44.724766</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1218</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Facebook-AI</td>\n",
|
||
" <td>181</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>20.982704</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1219</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Facebook-AI</td>\n",
|
||
" <td>181</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>48.653770</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1220</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Facebook-AI</td>\n",
|
||
" <td>181</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>20.982704</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1221</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Facebook-AI</td>\n",
|
||
" <td>181</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>48.653770</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1222</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TRANSSION</td>\n",
|
||
" <td>336</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>18.834851</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1223</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TRANSSION</td>\n",
|
||
" <td>336</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>47.238279</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1224</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TRANSSION</td>\n",
|
||
" <td>336</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>18.834851</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1225</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TRANSSION</td>\n",
|
||
" <td>336</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>47.238279</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1226</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>AMU</td>\n",
|
||
" <td>628</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>14.132845</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1227</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>AMU</td>\n",
|
||
" <td>628</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>41.256570</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1228</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>AMU</td>\n",
|
||
" <td>628</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>14.132845</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1229</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>AMU</td>\n",
|
||
" <td>628</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>41.256570</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1230</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>P3AI</td>\n",
|
||
" <td>715</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>17.793617</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1231</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>P3AI</td>\n",
|
||
" <td>715</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>46.307402</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1232</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>P3AI</td>\n",
|
||
" <td>715</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>17.793617</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1233</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>P3AI</td>\n",
|
||
" <td>715</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>46.307402</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1234</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-B</td>\n",
|
||
" <td>1356</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>18.655658</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1235</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-B</td>\n",
|
||
" <td>1356</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>46.658216</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1236</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-B</td>\n",
|
||
" <td>1356</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>18.655658</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1237</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-B</td>\n",
|
||
" <td>1356</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>46.658216</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1238</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TWB</td>\n",
|
||
" <td>1335</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>12.326443</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1239</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TWB</td>\n",
|
||
" <td>1335</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>40.282629</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1240</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TWB</td>\n",
|
||
" <td>1335</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>12.326443</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1241</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>TWB</td>\n",
|
||
" <td>1335</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>40.282629</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1242</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>ZMT</td>\n",
|
||
" <td>553</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>18.837023</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1243</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>ZMT</td>\n",
|
||
" <td>553</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>47.231474</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1244</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>ZMT</td>\n",
|
||
" <td>553</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>18.837023</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1245</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>ZMT</td>\n",
|
||
" <td>553</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>47.231474</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1246</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Manifold</td>\n",
|
||
" <td>437</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>16.943915</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1247</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Manifold</td>\n",
|
||
" <td>437</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>45.638356</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1248</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Manifold</td>\n",
|
||
" <td>437</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>16.943915</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1249</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Manifold</td>\n",
|
||
" <td>437</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>45.638356</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1250</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-Y</td>\n",
|
||
" <td>1374</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>13.898531</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1251</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-Y</td>\n",
|
||
" <td>1374</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>44.842874</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1252</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-Y</td>\n",
|
||
" <td>1374</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>13.898531</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1253</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>Online-Y</td>\n",
|
||
" <td>1374</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>44.842874</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1254</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>HuaweiTSC</td>\n",
|
||
" <td>758</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>17.492440</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1255</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>HuaweiTSC</td>\n",
|
||
" <td>758</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>46.795737</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1256</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>HuaweiTSC</td>\n",
|
||
" <td>758</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>17.492440</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1257</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>HuaweiTSC</td>\n",
|
||
" <td>758</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>46.795737</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1258</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>MS-EgDC</td>\n",
|
||
" <td>896</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>17.133350</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1259</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>MS-EgDC</td>\n",
|
||
" <td>896</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>45.266274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1260</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>MS-EgDC</td>\n",
|
||
" <td>896</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>17.133350</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1261</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>MS-EgDC</td>\n",
|
||
" <td>896</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>45.266274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1262</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>GTCOM</td>\n",
|
||
" <td>1298</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>17.794272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1263</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>GTCOM</td>\n",
|
||
" <td>1298</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>46.714831</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1264</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>GTCOM</td>\n",
|
||
" <td>1298</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>17.794272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1265</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>GTCOM</td>\n",
|
||
" <td>1298</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>46.714831</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1266</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>UEdin</td>\n",
|
||
" <td>1149</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-all</td>\n",
|
||
" <td>14.887836</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1267</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>UEdin</td>\n",
|
||
" <td>1149</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-all</td>\n",
|
||
" <td>42.247415</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1268</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>UEdin</td>\n",
|
||
" <td>1149</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>bleu-A</td>\n",
|
||
" <td>14.887836</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1269</th>\n",
|
||
" <td>ha-en</td>\n",
|
||
" <td>UEdin</td>\n",
|
||
" <td>1149</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>chrf-A</td>\n",
|
||
" <td>42.247415</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" pair system id is_constrained metric score\n",
|
||
"1214 ha-en NiuTrans 382 True bleu-all 16.512243\n",
|
||
"1215 ha-en NiuTrans 382 True chrf-all 44.724766\n",
|
||
"1216 ha-en NiuTrans 382 True bleu-A 16.512243\n",
|
||
"1217 ha-en NiuTrans 382 True chrf-A 44.724766\n",
|
||
"1218 ha-en Facebook-AI 181 False bleu-all 20.982704\n",
|
||
"1219 ha-en Facebook-AI 181 False chrf-all 48.653770\n",
|
||
"1220 ha-en Facebook-AI 181 False bleu-A 20.982704\n",
|
||
"1221 ha-en Facebook-AI 181 False chrf-A 48.653770\n",
|
||
"1222 ha-en TRANSSION 336 False bleu-all 18.834851\n",
|
||
"1223 ha-en TRANSSION 336 False chrf-all 47.238279\n",
|
||
"1224 ha-en TRANSSION 336 False bleu-A 18.834851\n",
|
||
"1225 ha-en TRANSSION 336 False chrf-A 47.238279\n",
|
||
"1226 ha-en AMU 628 True bleu-all 14.132845\n",
|
||
"1227 ha-en AMU 628 True chrf-all 41.256570\n",
|
||
"1228 ha-en AMU 628 True bleu-A 14.132845\n",
|
||
"1229 ha-en AMU 628 True chrf-A 41.256570\n",
|
||
"1230 ha-en P3AI 715 True bleu-all 17.793617\n",
|
||
"1231 ha-en P3AI 715 True chrf-all 46.307402\n",
|
||
"1232 ha-en P3AI 715 True bleu-A 17.793617\n",
|
||
"1233 ha-en P3AI 715 True chrf-A 46.307402\n",
|
||
"1234 ha-en Online-B 1356 False bleu-all 18.655658\n",
|
||
"1235 ha-en Online-B 1356 False chrf-all 46.658216\n",
|
||
"1236 ha-en Online-B 1356 False bleu-A 18.655658\n",
|
||
"1237 ha-en Online-B 1356 False chrf-A 46.658216\n",
|
||
"1238 ha-en TWB 1335 False bleu-all 12.326443\n",
|
||
"1239 ha-en TWB 1335 False chrf-all 40.282629\n",
|
||
"1240 ha-en TWB 1335 False bleu-A 12.326443\n",
|
||
"1241 ha-en TWB 1335 False chrf-A 40.282629\n",
|
||
"1242 ha-en ZMT 553 False bleu-all 18.837023\n",
|
||
"1243 ha-en ZMT 553 False chrf-all 47.231474\n",
|
||
"1244 ha-en ZMT 553 False bleu-A 18.837023\n",
|
||
"1245 ha-en ZMT 553 False chrf-A 47.231474\n",
|
||
"1246 ha-en Manifold 437 True bleu-all 16.943915\n",
|
||
"1247 ha-en Manifold 437 True chrf-all 45.638356\n",
|
||
"1248 ha-en Manifold 437 True bleu-A 16.943915\n",
|
||
"1249 ha-en Manifold 437 True chrf-A 45.638356\n",
|
||
"1250 ha-en Online-Y 1374 False bleu-all 13.898531\n",
|
||
"1251 ha-en Online-Y 1374 False chrf-all 44.842874\n",
|
||
"1252 ha-en Online-Y 1374 False bleu-A 13.898531\n",
|
||
"1253 ha-en Online-Y 1374 False chrf-A 44.842874\n",
|
||
"1254 ha-en HuaweiTSC 758 True bleu-all 17.492440\n",
|
||
"1255 ha-en HuaweiTSC 758 True chrf-all 46.795737\n",
|
||
"1256 ha-en HuaweiTSC 758 True bleu-A 17.492440\n",
|
||
"1257 ha-en HuaweiTSC 758 True chrf-A 46.795737\n",
|
||
"1258 ha-en MS-EgDC 896 True bleu-all 17.133350\n",
|
||
"1259 ha-en MS-EgDC 896 True chrf-all 45.266274\n",
|
||
"1260 ha-en MS-EgDC 896 True bleu-A 17.133350\n",
|
||
"1261 ha-en MS-EgDC 896 True chrf-A 45.266274\n",
|
||
"1262 ha-en GTCOM 1298 False bleu-all 17.794272\n",
|
||
"1263 ha-en GTCOM 1298 False chrf-all 46.714831\n",
|
||
"1264 ha-en GTCOM 1298 False bleu-A 17.794272\n",
|
||
"1265 ha-en GTCOM 1298 False chrf-A 46.714831\n",
|
||
"1266 ha-en UEdin 1149 True bleu-all 14.887836\n",
|
||
"1267 ha-en UEdin 1149 True chrf-all 42.247415\n",
|
||
"1268 ha-en UEdin 1149 True bleu-A 14.887836\n",
|
||
"1269 ha-en UEdin 1149 True chrf-A 42.247415"
|
||
]
|
||
},
|
||
"execution_count": 73,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('https://raw.githubusercontent.com/wmt-conference/wmt21-news-systems/main/scores/automatic-scores.tsv', sep='\\t')\n",
|
||
"df = df[df.pair == 'ha-en']\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 74,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th>metric</th>\n",
|
||
" <th>bleu-A</th>\n",
|
||
" <th>bleu-all</th>\n",
|
||
" <th>chrf-A</th>\n",
|
||
" <th>chrf-all</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>system</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>AMU</th>\n",
|
||
" <td>14.132845</td>\n",
|
||
" <td>14.132845</td>\n",
|
||
" <td>41.256570</td>\n",
|
||
" <td>41.256570</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Facebook-AI</th>\n",
|
||
" <td>20.982704</td>\n",
|
||
" <td>20.982704</td>\n",
|
||
" <td>48.653770</td>\n",
|
||
" <td>48.653770</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>GTCOM</th>\n",
|
||
" <td>17.794272</td>\n",
|
||
" <td>17.794272</td>\n",
|
||
" <td>46.714831</td>\n",
|
||
" <td>46.714831</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>HuaweiTSC</th>\n",
|
||
" <td>17.492440</td>\n",
|
||
" <td>17.492440</td>\n",
|
||
" <td>46.795737</td>\n",
|
||
" <td>46.795737</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>MS-EgDC</th>\n",
|
||
" <td>17.133350</td>\n",
|
||
" <td>17.133350</td>\n",
|
||
" <td>45.266274</td>\n",
|
||
" <td>45.266274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Manifold</th>\n",
|
||
" <td>16.943915</td>\n",
|
||
" <td>16.943915</td>\n",
|
||
" <td>45.638356</td>\n",
|
||
" <td>45.638356</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>NiuTrans</th>\n",
|
||
" <td>16.512243</td>\n",
|
||
" <td>16.512243</td>\n",
|
||
" <td>44.724766</td>\n",
|
||
" <td>44.724766</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Online-B</th>\n",
|
||
" <td>18.655658</td>\n",
|
||
" <td>18.655658</td>\n",
|
||
" <td>46.658216</td>\n",
|
||
" <td>46.658216</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Online-Y</th>\n",
|
||
" <td>13.898531</td>\n",
|
||
" <td>13.898531</td>\n",
|
||
" <td>44.842874</td>\n",
|
||
" <td>44.842874</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>P3AI</th>\n",
|
||
" <td>17.793617</td>\n",
|
||
" <td>17.793617</td>\n",
|
||
" <td>46.307402</td>\n",
|
||
" <td>46.307402</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>TRANSSION</th>\n",
|
||
" <td>18.834851</td>\n",
|
||
" <td>18.834851</td>\n",
|
||
" <td>47.238279</td>\n",
|
||
" <td>47.238279</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>TWB</th>\n",
|
||
" <td>12.326443</td>\n",
|
||
" <td>12.326443</td>\n",
|
||
" <td>40.282629</td>\n",
|
||
" <td>40.282629</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>UEdin</th>\n",
|
||
" <td>14.887836</td>\n",
|
||
" <td>14.887836</td>\n",
|
||
" <td>42.247415</td>\n",
|
||
" <td>42.247415</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>ZMT</th>\n",
|
||
" <td>18.837023</td>\n",
|
||
" <td>18.837023</td>\n",
|
||
" <td>47.231474</td>\n",
|
||
" <td>47.231474</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
"metric bleu-A bleu-all chrf-A chrf-all\n",
|
||
"system \n",
|
||
"AMU 14.132845 14.132845 41.256570 41.256570\n",
|
||
"Facebook-AI 20.982704 20.982704 48.653770 48.653770\n",
|
||
"GTCOM 17.794272 17.794272 46.714831 46.714831\n",
|
||
"HuaweiTSC 17.492440 17.492440 46.795737 46.795737\n",
|
||
"MS-EgDC 17.133350 17.133350 45.266274 45.266274\n",
|
||
"Manifold 16.943915 16.943915 45.638356 45.638356\n",
|
||
"NiuTrans 16.512243 16.512243 44.724766 44.724766\n",
|
||
"Online-B 18.655658 18.655658 46.658216 46.658216\n",
|
||
"Online-Y 13.898531 13.898531 44.842874 44.842874\n",
|
||
"P3AI 17.793617 17.793617 46.307402 46.307402\n",
|
||
"TRANSSION 18.834851 18.834851 47.238279 47.238279\n",
|
||
"TWB 12.326443 12.326443 40.282629 40.282629\n",
|
||
"UEdin 14.887836 14.887836 42.247415 42.247415\n",
|
||
"ZMT 18.837023 18.837023 47.231474 47.231474"
|
||
]
|
||
},
|
||
"execution_count": 74,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.pivot(index='system', columns='metric', values='score')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Exercise"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Text data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"`pandas` provides conveniences for working with text values:\n",
|
||
" * they are accessed through the `str` accessor;\n",
|
||
" * functions:\n",
|
||
"    * formatting: `lower()`, `upper()`;\n",
|
||
"    * regular expressions: `contains()`, `match()`;\n",
|
||
"    * other: `split()`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 75,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Heikkinen\\t Miss. Laina</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>STON/O2. 3101282</td>\n",
|
||
" <td>7.9250</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Allen\\t Mr. William Henry</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>373450</td>\n",
|
||
" <td>8.0500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"3 1 3 \n",
|
||
"4 1 1 \n",
|
||
"5 0 3 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"3 Heikkinen\\t Miss. Laina female 26.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"5 Allen\\t Mr. William Henry male 35.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"5 0 0 373450 8.0500 NaN S "
|
||
]
|
||
},
|
||
"execution_count": 75,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"\n",
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 76,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 BRAUND\\t MR. OWEN HARRIS\n",
|
||
"2 CUMINGS\\t MRS. JOHN BRADLEY (FLORENCE BRIGGS T...\n",
|
||
"3 HEIKKINEN\\t MISS. LAINA\n",
|
||
"4 FUTRELLE\\t MRS. JACQUES HEATH (LILY MAY PEEL)\n",
|
||
"5 ALLEN\\t MR. WILLIAM HENRY\n",
|
||
" ... \n",
|
||
"887 MONTVILA\\t REV. JUOZAS\n",
|
||
"888 GRAHAM\\t MISS. MARGARET EDITH\n",
|
||
"889 JOHNSTON\\t MISS. CATHERINE HELEN \"CARRIE\"\n",
|
||
"890 BEHR\\t MR. KARL HOWELL\n",
|
||
"891 DOOLEY\\t MR. PATRICK\n",
|
||
"Name: Name, Length: 891, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 76,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.upper()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 77,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"PassengerId\n",
|
||
"1 Braund\\t Mr. Owen Harris\n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T...\n",
|
||
"3 Heikkinen\\t Miss. Laina\n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel)\n",
|
||
"5 Allen\\t Mr. William Henry\n",
|
||
"Name: Name, dtype: object\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 False\n",
|
||
"2 True\n",
|
||
"3 True\n",
|
||
"4 True\n",
|
||
"5 False\n",
|
||
"Name: Name, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 77,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"print(df.Name.head())\n",
|
||
"df.Name.str.contains('Miss|Mrs').head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 78,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>0</th>\n",
|
||
" <th>1</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Braund</td>\n",
|
||
" <td>Mr. Owen Harris</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Cumings</td>\n",
|
||
" <td>Mrs. John Bradley (Florence Briggs Thayer)</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Heikkinen</td>\n",
|
||
" <td>Miss. Laina</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Futrelle</td>\n",
|
||
" <td>Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>Allen</td>\n",
|
||
" <td>Mr. William Henry</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>887</th>\n",
|
||
" <td>Montvila</td>\n",
|
||
" <td>Rev. Juozas</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>888</th>\n",
|
||
" <td>Graham</td>\n",
|
||
" <td>Miss. Margaret Edith</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>889</th>\n",
|
||
" <td>Johnston</td>\n",
|
||
" <td>Miss. Catherine Helen \"Carrie\"</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>890</th>\n",
|
||
" <td>Behr</td>\n",
|
||
" <td>Mr. Karl Howell</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>891</th>\n",
|
||
" <td>Dooley</td>\n",
|
||
" <td>Mr. Patrick</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>891 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" 0 1\n",
|
||
"PassengerId \n",
|
||
"1 Braund Mr. Owen Harris\n",
|
||
"2 Cumings Mrs. John Bradley (Florence Briggs Thayer)\n",
|
||
"3 Heikkinen Miss. Laina\n",
|
||
"4 Futrelle Mrs. Jacques Heath (Lily May Peel)\n",
|
||
"5 Allen Mr. William Henry\n",
|
||
"... ... ...\n",
|
||
"887 Montvila Rev. Juozas\n",
|
||
"888 Graham Miss. Margaret Edith\n",
|
||
"889 Johnston Miss. Catherine Helen \"Carrie\"\n",
|
||
"890 Behr Mr. Karl Howell\n",
|
||
"891 Dooley Mr. Patrick\n",
|
||
"\n",
|
||
"[891 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 78,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.split('\\t', expand=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 79,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 [Braund, Mr. Owen Harris]\n",
|
||
"2 [Cumings, Mrs. John Bradley (Florence Briggs ...\n",
|
||
"3 [Heikkinen, Miss. Laina]\n",
|
||
"4 [Futrelle, Mrs. Jacques Heath (Lily May Peel)]\n",
|
||
"5 [Allen, Mr. William Henry]\n",
|
||
" ... \n",
|
||
"887 [Montvila, Rev. Juozas]\n",
|
||
"888 [Graham, Miss. Margaret Edith]\n",
|
||
"889 [Johnston, Miss. Catherine Helen \"Carrie\"]\n",
|
||
"890 [Behr, Mr. Karl Howell]\n",
|
||
"891 [Dooley, Mr. Patrick]\n",
|
||
"Name: Name, Length: 891, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 79,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.split('\\t')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 80,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 Mr. Owen Harris\n",
|
||
"2 Mrs. John Bradley (Florence Briggs Thayer)\n",
|
||
"3 Miss. Laina\n",
|
||
"4 Mrs. Jacques Heath (Lily May Peel)\n",
|
||
"5 Mr. William Henry\n",
|
||
" ... \n",
|
||
"887 Rev. Juozas\n",
|
||
"888 Miss. Margaret Edith\n",
|
||
"889 Miss. Catherine Helen \"Carrie\"\n",
|
||
"890 Mr. Karl Howell\n",
|
||
"891 Mr. Patrick\n",
|
||
"Name: Name, Length: 891, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 80,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.split('\\t').str[1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 81,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 Mr.\n",
|
||
"2 Mrs.\n",
|
||
"3 Miss.\n",
|
||
"4 Mrs.\n",
|
||
"5 Mr.\n",
|
||
" ... \n",
|
||
"887 Rev.\n",
|
||
"888 Miss.\n",
|
||
"889 Miss.\n",
|
||
"890 Mr.\n",
|
||
"891 Mr.\n",
|
||
"Name: Name, Length: 891, dtype: object"
|
||
]
|
||
},
|
||
"execution_count": 81,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.Name.str.split('\\t').str[1].str.strip().str.split(' ').str[0]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Exercise\n",
|
||
"The `nba.csv` dataset contains information about the players' heights. Compute each player's height in metric units, assuming that one foot is `30.48` cm and one inch is `2.54` cm."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"celltoolbar": "Slideshow",
|
||
"interpreter": {
|
||
"hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
|
||
},
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.11.7"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 4
|
||
}
|