{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"# Data Analysis in Python: `pandas`\n",
"\n"
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### `pandas`\n",
"The `pandas` library is the core data-analysis tool in the Python ecosystem:\n",
" * it provides two basic data types:\n",
"   * `Series` (1D)\n",
"   * `DataFrame` (2D)\n",
" * it offers operations on these objects, such as handling missing values and combining data;\n",
" * it handles data of various kinds, e.g. time series;\n",
" * it is built on `numpy`, a library for numerical computing;\n",
" * it also supports simple data visualization;\n",
" * ETL: extract, transform, load."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"To import the `pandas` library, it is enough to run:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pandas as pd"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"pd"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"#### __Exercise 0__: check whether you have the `pandas` library installed."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) (`pd.Series`)\n",
"\n",
" A Series represents homogeneous one-dimensional data; it is the counterpart of a vector in R.\n",
" * A Series can be created in several ways (more on this shortly), e.g. from objects such as lists and dictionaries.\n",
" * The data must be homogeneous. Otherwise the values are automatically converted to a common type.\n",
" * When creating a Series we normally pass the `data` argument (the values).\n",
" * We can also pass an index (`index`), a data type (`dtype`) or a name (`name`).\n",
" \n",
" \n",
" ```\n",
" class pandas.Series(data=None, index=None, dtype=None, name=None)\n",
" ```"
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"When creating a Series, we can pass the data as a list or as a dictionary.\n",
"\n",
"The example below creates a Series from data held in a list:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 211819\n",
|
||
"1 682758\n",
|
||
"2 737011\n",
|
||
"3 779511\n",
|
||
"4 673790\n",
|
||
"5 673790\n",
|
||
"6 444177\n",
|
||
"7 136791\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"\n",
|
||
"data = [211819, 682758, 737011, 779511, 673790, 673790, 444177, 136791]\n",
|
||
"\n",
|
||
"s = pd.Series(data)\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"When the data comes from a list and no index is given, pandas adds an automatic integer index starting at 0."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"When a dictionary is passed as the data, pandas uses its keys to build the index:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"When creating a Series we can set both the index and the name of the Series:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819.0\n",
|
||
"May 682758.0\n",
|
||
"June 737011.0\n",
|
||
"July 779511.0\n",
|
||
"Name: Rides, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"months = ['April', 'May', 'June', 'July']\n",
|
||
"\n",
|
||
"data = [211819, 682758, 737011, 779511]\n",
|
||
"\n",
|
||
"s = pd.Series(data=data, index=months, dtype=float, name='Rides')\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
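{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"A minimal sketch of the automatic conversion that happens when the data is not homogeneous. The values below are made up for illustration and are not part of the course data. The expectation is that integers mixed with a float are stored as floats, while numbers mixed with text fall back to the generic `object` dtype:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Illustrative values only (an assumption, not taken from the course data).\n",
"mixed_numbers = pd.Series([1, 2, 3.5])\n",
"mixed_types = pd.Series([1, 'two', 3.0])\n",
"\n",
"# Integers next to a float are upcast to a floating-point dtype.\n",
"print(mixed_numbers.dtype)\n",
"\n",
"# Numbers next to text are kept as generic Python objects.\n",
"print(mixed_types.dtype)"
]
},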
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"An individual element is accessed using its key from the index."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"211819\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"August 673790\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"print(s['April'])\n",
|
||
"\n",
|
||
"s['August'] = 673790\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"An element is added to a Series by assigning a value to a new key:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"April 211819\n",
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"August 673790\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = {'April': 211819,'May': 682758, 'June': 737011, 'July': 779511}\n",
|
||
"\n",
|
||
"s = pd.Series(members)\n",
|
||
"\n",
|
||
"s['August'] = 673790\n",
|
||
"\n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"More on indexing Series later in the course."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"A key feature of a Series is that operations are vectorized. For the addition of two Series this works as follows:\n",
" * when both Series contain the same key, their values are added;\n",
" * otherwise the value for that key in the resulting Series is `NaN` (a missing value).\n",
" * Equivalently, we can use the `pandas.Series.add` method. It lets us pass a default value (`fill_value`) to use when a key is missing."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"August 880599.0\n",
|
||
"July 973827.0\n",
|
||
"June 908505.0\n",
|
||
"May 830656.0\n",
|
||
"October NaN\n",
|
||
"September 814282.0\n",
|
||
"dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'August': 673790, 'July': 779511,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
|
||
"'September': 140492})\n",
|
||
"\n",
|
||
"all_data = members + occasionals\n",
"# Equivalently\n",
"all_data = members.add(occasionals)\n",
|
||
"all_data"
|
||
]
|
||
},
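{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The default value mentioned above can be sketched as follows. This is a shortened, illustrative version of the two Series, not the full data set; `fill_value` is the parameter of `Series.add` that stands in for a missing key:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Shortened, illustrative versions of the two Series.\n",
"members = pd.Series({'September': 673790, 'October': 444177})\n",
"occasionals = pd.Series({'September': 140492})\n",
"\n",
"# Plain addition: 'October' is missing from occasionals, so the result there is NaN.\n",
"print(members + occasionals)\n",
"\n",
"# With fill_value=0 the missing entry is treated as 0 before adding.\n",
"print(members.add(occasionals, fill_value=0))"
]
},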
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"We can perform arithmetic operations on a Series: "
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"metadata": {
|
||
"scrolled": true,
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"May 683758\n",
|
||
"June 738011\n",
|
||
"July 780511\n",
|
||
"August 674790\n",
|
||
"September 674790\n",
|
||
"October 445177\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"members += 1000\n",
|
||
"\n",
|
||
"members"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"May 683758\n",
|
||
"June 738011\n",
|
||
"July 780511\n",
|
||
"August 674790\n",
|
||
"September 674790\n",
|
||
"October 445177\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"May 10000000000683758\n",
|
||
"June 10000000000738011\n",
|
||
"July 10000000000780511\n",
|
||
"August 10000000000674790\n",
|
||
"September 10000000000674790\n",
|
||
"October 10000000000445177\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members + 10000000000000000"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### Summary\n",
" * Series work much like dictionaries, except that the values must be homogeneous (of the same type).\n",
" * Individual elements are accessed with square brackets `[]` and a key.\n",
" * Unlike dictionaries, Series make arithmetic operations straightforward."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### Exercise 1\n",
" * Create a Series `n` containing the numbers from 0 to 10 (inclusive).\n",
" * Create a Series `n2` containing the squares of the numbers from 0 to 10 (inclusive).\n",
" * Then create a Series `trojkatne` (triangular numbers) equal to the sum of the two Series above divided by 2."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
"n = list(range(10 + 1))"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]"
|
||
]
|
||
},
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n = pd.Series(n)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 0\n",
|
||
"1 1\n",
|
||
"2 2\n",
|
||
"3 3\n",
|
||
"4 4\n",
|
||
"5 5\n",
|
||
"6 6\n",
|
||
"7 7\n",
|
||
"8 8\n",
|
||
"9 9\n",
|
||
"10 10\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n2 = n**2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 0\n",
|
||
"1 1\n",
|
||
"2 4\n",
|
||
"3 9\n",
|
||
"4 16\n",
|
||
"5 25\n",
|
||
"6 36\n",
|
||
"7 49\n",
|
||
"8 64\n",
|
||
"9 81\n",
|
||
"10 100\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 23,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"n2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"trojkatne = ( n + n2 ) / 2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 0.0\n",
|
||
"1 1.0\n",
|
||
"2 3.0\n",
|
||
"3 6.0\n",
|
||
"4 10.0\n",
|
||
"5 15.0\n",
|
||
"6 21.0\n",
|
||
"7 28.0\n",
|
||
"8 36.0\n",
|
||
"9 45.0\n",
|
||
"10 55.0\n",
|
||
"dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"trojkatne"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) (`pd.DataFrame`)\n",
"\n",
"The DataFrame is the core data structure of the `pandas` library; it stores and represents tabular (two-dimensional) data.\n",
" * It has columns (features) and rows (observations, examples).\n",
" * We can also view it as a dictionary whose values are Series.\n",
"\n",
"```\n",
"class pandas.DataFrame(data=None, index=None, columns=None, dtype=None)\n",
"```\n",
" "
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"A DataFrame can be created in several ways.\n",
"\n",
"The first (\"column-wise\") approach defines the DataFrame by passing Series as its columns:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"May 682758\n",
|
||
"June 737011\n",
|
||
"July 779511\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 32,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"May 147898\n",
|
||
"June 171494\n",
|
||
"July 194316\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 33,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"occasionals"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511.0</td>\n",
|
||
" <td>194316.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011.0</td>\n",
|
||
" <td>171494.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>147898.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Maydfdsgfdg</th>\n",
|
||
" <td>682758.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"July 779511.0 194316.0\n",
|
||
"June 737011.0 171494.0\n",
|
||
"May NaN 147898.0\n",
|
||
"Maydfdsgfdg 682758.0 NaN"
|
||
]
|
||
},
|
||
"execution_count": 35,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'Maydfdsgfdg': 682758, 'June': 737011, 'July': 779511})\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
|
||
"df"
|
||
]
|
||
},
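{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"The view of a DataFrame as a dictionary whose values are Series can be sketched as below. This is only an illustration built on the monthly figures used earlier: selecting a column by name should give back a `pd.Series`, which can then be indexed by row label:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
"\n",
"# Select a column by its name, just like a dictionary key.\n",
"column = df['members']\n",
"\n",
"# The column is itself a Series...\n",
"print(type(column))\n",
"\n",
"# ...so it can be indexed by the row label.\n",
"print(column['June'])"
]
},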
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"Another common approach is to pass a list of dictionaries. `pandas` then interprets it as a list of examples (rows):"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"0 682758 147898\n",
|
||
"1 737011 171494\n",
|
||
"2 779511 194316"
|
||
]
|
||
},
|
||
"execution_count": 37,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"data = [\n",
|
||
" {'members': 682758, 'occasionals': 147898},\n",
|
||
" {'occasionals': 171494,'members': 737011},\n",
|
||
" {'members': 779511, 'occasionals': 194316},\n",
|
||
"]\n",
|
||
"\n",
|
||
"df = pd.DataFrame(data)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
"We can also use the `from_dict` method ([doc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html)), which lets us specify whether the data is given in column-wise or row-wise form:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"index\n",
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"\n",
|
||
"columns\n",
|
||
" May June July\n",
|
||
"members 682758 737011 779511\n",
|
||
"occasionals 147898 171494 194316\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"data = {\n",
|
||
" 'May': {'members': 682758, 'occasionals': 147898},\n",
|
||
" 'June': {'members': 737011, 'occasionals': 171494},\n",
|
||
" 'July': {'members': 779511, 'occasionals': 194316}\n",
|
||
"}\n",
|
||
"\n",
|
||
"df = pd.DataFrame.from_dict(data, orient='index')\n",
|
||
"print('index\\n', df)\n",
|
||
"print()\n",
|
||
"df = pd.DataFrame.from_dict(data, orient='columns')\n",
|
||
"print('columns\\n', df)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"### Loading data"
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"The `pandas` library can read and write data in various formats:\n",
" * text formats, e.g. `csv`, `json`\n",
" * spreadsheet files: Excel (xls, xlsx)\n",
" * databases\n",
" * others: `sas`, `spss`\n",
"\n",
"\n",
"Reading the data produces a corresponding `DataFrame`."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
"One of the simplest data formats is `csv`, in which consecutive values are separated by commas.\n",
"\n",
"To read data in this format, use the `pandas.read_csv` function.\n",
"\n",
"Pandas lets you set many parameters (e.g. the separator and quoting). More on this in the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Country</th>\n",
|
||
" <th>female_BMI</th>\n",
|
||
" <th>male_BMI</th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>under5mortality</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" <th>fertility</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>Afghanistan</td>\n",
|
||
" <td>21.07402</td>\n",
|
||
" <td>20.62058</td>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>110.4</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" <td>6.20</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Albania</td>\n",
|
||
" <td>25.65726</td>\n",
|
||
" <td>26.44657</td>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>17.9</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" <td>1.76</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Algeria</td>\n",
|
||
" <td>26.36841</td>\n",
|
||
" <td>24.59620</td>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>29.5</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.73</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Angola</td>\n",
|
||
" <td>23.48431</td>\n",
|
||
" <td>22.25083</td>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>192.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" <td>6.43</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Antigua and Barbuda</td>\n",
|
||
" <td>27.50545</td>\n",
|
||
" <td>25.76602</td>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>10.9</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" <td>2.16</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>170</th>\n",
|
||
" <td>Venezuela</td>\n",
|
||
" <td>28.13408</td>\n",
|
||
" <td>27.44500</td>\n",
|
||
" <td>17911.0</td>\n",
|
||
" <td>28116716.0</td>\n",
|
||
" <td>17.1</td>\n",
|
||
" <td>74.2</td>\n",
|
||
" <td>2.53</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>171</th>\n",
|
||
" <td>Vietnam</td>\n",
|
||
" <td>21.06500</td>\n",
|
||
" <td>20.91630</td>\n",
|
||
" <td>4085.0</td>\n",
|
||
" <td>86589342.0</td>\n",
|
||
" <td>26.2</td>\n",
|
||
" <td>74.1</td>\n",
|
||
" <td>1.86</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>172</th>\n",
|
||
" <td>Palestine</td>\n",
|
||
" <td>29.02643</td>\n",
|
||
" <td>26.57750</td>\n",
|
||
" <td>3564.0</td>\n",
|
||
" <td>3854667.0</td>\n",
|
||
" <td>24.7</td>\n",
|
||
" <td>74.1</td>\n",
|
||
" <td>4.38</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>173</th>\n",
|
||
" <td>Zambia</td>\n",
|
||
" <td>23.05436</td>\n",
|
||
" <td>20.68321</td>\n",
|
||
" <td>3039.0</td>\n",
|
||
" <td>13114579.0</td>\n",
|
||
" <td>94.9</td>\n",
|
||
" <td>51.1</td>\n",
|
||
" <td>5.88</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>174</th>\n",
|
||
" <td>Zimbabwe</td>\n",
|
||
" <td>24.64522</td>\n",
|
||
" <td>22.02660</td>\n",
|
||
" <td>1286.0</td>\n",
|
||
" <td>13495462.0</td>\n",
|
||
" <td>98.3</td>\n",
|
||
" <td>47.3</td>\n",
|
||
" <td>3.85</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>175 rows × 8 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Country female_BMI male_BMI gdp population \\\n",
|
||
"0 Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n",
|
||
"1 Albania 25.65726 26.44657 8644.0 2968026.0 \n",
|
||
"2 Algeria 26.36841 24.59620 12314.0 34811059.0 \n",
|
||
"3 Angola 23.48431 22.25083 7103.0 19842251.0 \n",
|
||
"4 Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n",
|
||
".. ... ... ... ... ... \n",
|
||
"170 Venezuela 28.13408 27.44500 17911.0 28116716.0 \n",
|
||
"171 Vietnam 21.06500 20.91630 4085.0 86589342.0 \n",
|
||
"172 Palestine 29.02643 26.57750 3564.0 3854667.0 \n",
|
||
"173 Zambia 23.05436 20.68321 3039.0 13114579.0 \n",
|
||
"174 Zimbabwe 24.64522 22.02660 1286.0 13495462.0 \n",
|
||
"\n",
|
||
" under5mortality life_expectancy fertility \n",
|
||
"0 110.4 52.8 6.20 \n",
|
||
"1 17.9 76.8 1.76 \n",
|
||
"2 29.5 75.5 2.73 \n",
|
||
"3 192.0 56.7 6.43 \n",
|
||
"4 10.9 75.5 2.16 \n",
|
||
".. ... ... ... \n",
|
||
"170 17.1 74.2 2.53 \n",
|
||
"171 26.2 74.1 1.86 \n",
|
||
"172 24.7 74.1 4.38 \n",
|
||
"173 94.9 51.1 5.88 \n",
|
||
"174 98.3 47.3 3.85 \n",
|
||
"\n",
|
||
"[175 rows x 8 columns]"
|
||
]
|
||
},
|
||
"execution_count": 39,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('gapminder.csv')\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Heikkinen\\t Miss. Laina</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>26</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>STON/O2. 3101282</td>\n",
|
||
" <td>7.9250</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Allen\\t Mr. William Henry</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>373450</td>\n",
|
||
" <td>8.0500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"3 1 3 \n",
|
||
"4 1 1 \n",
|
||
"5 0 3 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38 \n",
|
||
"3 Heikkinen\\t Miss. Laina female 26 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35 \n",
|
||
"5 Allen\\t Mr. William Henry male 35 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"5 0 0 373450 8.0500 NaN S "
|
||
]
|
||
},
|
||
"execution_count": 40,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', delimiter='\\t', index_col=0, nrows=5)\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Use the `pandas.read_excel` function to load data from a spreadsheet. Opening an `xlsx` file may require setting the parameter `engine='openpyxl'`. More options are described in the [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df = pd.read_excel('./bikes.xlsx', engine='openpyxl', nrows=5)\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Databases are another important source of data. Pandas can communicate with a database through the [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) library and provides a dedicated function:\n",
|
||
" * `pandas.read_sql` - loads an entire table or the result of a database query"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 42,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Title</th>\n",
|
||
" <th>ArtistId</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>AlbumId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>For Those About To Rock We Salute You</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Balls to the Wall</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Restless and Wild</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Let There Be Rock</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>Big Ones</td>\n",
|
||
" <td>3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>343</th>\n",
|
||
" <td>Respighi:Pines of Rome</td>\n",
|
||
" <td>226</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>344</th>\n",
|
||
" <td>Schubert: The Late String Quartets & String Qu...</td>\n",
|
||
" <td>272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>345</th>\n",
|
||
" <td>Monteverdi: L'Orfeo</td>\n",
|
||
" <td>273</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>346</th>\n",
|
||
" <td>Mozart: Chamber Music</td>\n",
|
||
" <td>274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>347</th>\n",
|
||
" <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
|
||
" <td>275</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>347 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Title ArtistId\n",
|
||
"AlbumId \n",
|
||
"1 For Those About To Rock We Salute You 1\n",
|
||
"2 Balls to the Wall 2\n",
|
||
"3 Restless and Wild 2\n",
|
||
"4 Let There Be Rock 1\n",
|
||
"5 Big Ones 3\n",
|
||
"... ... ...\n",
|
||
"343 Respighi:Pines of Rome 226\n",
|
||
"344 Schubert: The Late String Quartets & String Qu... 272\n",
|
||
"345 Monteverdi: L'Orfeo 273\n",
|
||
"346 Mozart: Chamber Music 274\n",
|
||
"347 Koyaanisqatsi (Soundtrack from the Motion Pict... 275\n",
|
||
"\n",
|
||
"[347 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 42,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_sql('Album', con='sqlite:///Chinook.sqlite', index_col='AlbumId')\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Title</th>\n",
|
||
" <th>ArtistId</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>AlbumId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>For Those About To Rock We Salute You</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Balls to the Wall</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Restless and Wild</td>\n",
|
||
" <td>2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Let There Be Rock</td>\n",
|
||
" <td>1</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>Big Ones</td>\n",
|
||
" <td>3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>343</th>\n",
|
||
" <td>Respighi:Pines of Rome</td>\n",
|
||
" <td>226</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>344</th>\n",
|
||
" <td>Schubert: The Late String Quartets & String Qu...</td>\n",
|
||
" <td>272</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>345</th>\n",
|
||
" <td>Monteverdi: L'Orfeo</td>\n",
|
||
" <td>273</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>346</th>\n",
|
||
" <td>Mozart: Chamber Music</td>\n",
|
||
" <td>274</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>347</th>\n",
|
||
" <td>Koyaanisqatsi (Soundtrack from the Motion Pict...</td>\n",
|
||
" <td>275</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>347 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Title ArtistId\n",
|
||
"AlbumId \n",
|
||
"1 For Those About To Rock We Salute You 1\n",
|
||
"2 Balls to the Wall 2\n",
|
||
"3 Restless and Wild 2\n",
|
||
"4 Let There Be Rock 1\n",
|
||
"5 Big Ones 3\n",
|
||
"... ... ...\n",
|
||
"343 Respighi:Pines of Rome 226\n",
|
||
"344 Schubert: The Late String Quartets & String Qu... 272\n",
|
||
"345 Monteverdi: L'Orfeo 273\n",
|
||
"346 Mozart: Chamber Music 274\n",
|
||
"347 Koyaanisqatsi (Soundtrack from the Motion Pict... 275\n",
|
||
"\n",
|
||
"[347 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 43,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import sqlalchemy\n",
|
||
"\n",
|
||
"# create the engine once so it can be reused for subsequent queries\n",
|
||
"engine = sqlalchemy.create_engine('sqlite:///Chinook.sqlite')\n",
|
||
"\n",
|
||
"df = pd.read_sql('SELECT * FROM Album', con=engine, index_col='AlbumId')\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Summary\n",
|
||
"\n",
|
||
" * The `pandas` library supports loading data from many formats and sources.\n",
|
||
" * Each function takes a list of arguments that let you set individual options (e.g. [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv))."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Saving and exporting data"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Pandas makes it easy to save a data frame to a file."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# save to CSV format\n",
|
||
"df.to_csv('tmp.csv')\n",
|
||
"# save to a spreadsheet\n",
|
||
"df.to_excel('tmp.xlsx')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"We can also convert a data frame to JSON or to a Python dictionary:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"{\"members\":{\"May\":682758,\"June\":737011,\"July\":779511},\"occasionals\":{\"May\":147898,\"June\":171494,\"July\":194316}}\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(df.to_json())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 47,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"{'members': {'May': 682758, 'June': 737011, 'July': 779511}, 'occasionals': {'May': 147898, 'June': 171494, 'July': 194316}}\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(df.to_dict())\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Or copy the data to the clipboard:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 51,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.to_clipboard()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Exercise"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
" * Convert the `Customer` table from the `Chinook.sqlite` database to a spreadsheet. Name the output file `customers.xlsx`.\n",
|
||
" * The `Employee` table contains information about the employees of the Chinook company. Display the data and list the cities in which the employees live.\n",
|
||
" * The `Invoice` table contains information about invoices. Convert the `BillingCountry` column to a Python dictionary, then find the most frequent value. How many times does it occur?\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Data frame - basics"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Columns\n",
|
||
"\n",
|
||
"A data frame can be viewed as a kind of dictionary whose values are series, one `Series` per column. Thinking of it this way helps build intuition.\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" <td>72.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" <td>81.6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" gdp population life_expectancy\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0 52.8\n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7\n",
|
||
"Antigua and Barbuda 25736.0 85350.0 75.5\n",
|
||
"Argentina 14646.0 40381860.0 75.4\n",
|
||
"Armenia 7383.0 2975029.0 72.3\n",
|
||
"Australia 41312.0 21370348.0 81.6"
|
||
]
|
||
},
|
||
"execution_count": 58,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=8, usecols=['Country', 'gdp', 'population','life_expectancy'])\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"A single column can be accessed in two ways:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 59,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan 26528741.0\n",
|
||
"Albania 2968026.0\n",
|
||
"Algeria 34811059.0\n",
|
||
"Angola 19842251.0\n",
|
||
"Antigua and Barbuda 85350.0\n",
|
||
"Argentina 40381860.0\n",
|
||
"Armenia 2975029.0\n",
|
||
"Australia 21370348.0\n",
|
||
"Name: population, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# dot notation\n",
|
||
"df.population"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 60,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Country\n",
|
||
"Afghanistan 26528741.0\n",
|
||
"Albania 2968026.0\n",
|
||
"Algeria 34811059.0\n",
|
||
"Angola 19842251.0\n",
|
||
"Antigua and Barbuda 85350.0\n",
|
||
"Argentina 40381860.0\n",
|
||
"Armenia 2975029.0\n",
|
||
"Australia 21370348.0\n",
|
||
"Name: population, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 60,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# the [] operator\n",
|
||
"df['population']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"The `[]` operator also accepts a list of column names:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 65,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" gdp population\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0\n",
|
||
"Albania 8644.0 2968026.0\n",
|
||
"Algeria 12314.0 34811059.0\n",
|
||
"Angola 7103.0 19842251.0\n",
|
||
"Antigua and Barbuda 25736.0 85350.0\n",
|
||
"Argentina 14646.0 40381860.0\n",
|
||
"Armenia 7383.0 2975029.0\n",
|
||
"Australia 41312.0 21370348.0"
|
||
]
|
||
},
|
||
"execution_count": 65,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['gdp','population']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"The list of columns can be retrieved with:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 66,
|
||
"metadata": {
|
||
"scrolled": true,
|
||
"slideshow": {
|
||
"slide_type": "fragment"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Index(['gdp', 'population', 'life_expectancy'], dtype='object')"
|
||
]
|
||
},
|
||
"execution_count": 66,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.columns"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 67,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>gdp</th>\n",
|
||
" <th>population</th>\n",
|
||
" <th>life_expectancy</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" <td>72.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" <td>81.6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" gdp population life_expectancy\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0 52.8\n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7\n",
|
||
"Antigua and Barbuda 25736.0 85350.0 75.5\n",
|
||
"Argentina 14646.0 40381860.0 75.4\n",
|
||
"Armenia 7383.0 2975029.0 72.3\n",
|
||
"Australia 41312.0 21370348.0 81.6"
|
||
]
|
||
},
|
||
"execution_count": 67,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 68,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Afghanistan</th>\n",
|
||
" <td>1311.0</td>\n",
|
||
" <td>26528741.0</td>\n",
|
||
" <td>52.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Antigua and Barbuda</th>\n",
|
||
" <td>25736.0</td>\n",
|
||
" <td>85350.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Argentina</th>\n",
|
||
" <td>14646.0</td>\n",
|
||
" <td>40381860.0</td>\n",
|
||
" <td>75.4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Armenia</th>\n",
|
||
" <td>7383.0</td>\n",
|
||
" <td>2975029.0</td>\n",
|
||
" <td>72.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Australia</th>\n",
|
||
" <td>41312.0</td>\n",
|
||
" <td>21370348.0</td>\n",
|
||
" <td>81.6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Afghanistan 1311.0 26528741.0 52.8\n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7\n",
|
||
"Antigua and Barbuda 25736.0 85350.0 75.5\n",
|
||
"Argentina 14646.0 40381860.0 75.4\n",
|
||
"Armenia 7383.0 2975029.0 72.3\n",
|
||
"Australia 41312.0 21370348.0 81.6"
|
||
]
|
||
},
|
||
"execution_count": 68,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.columns = ['PKB', 'Populacja', 'ODŻ']\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"To refer to individual rows, use the `loc` indexer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 69,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PKB 14646.0\n",
|
||
"Populacja 40381860.0\n",
|
||
"ODŻ 75.4\n",
|
||
"Name: Argentina, dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 69,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc['Argentina']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"`loc` also accepts a list of row labels:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 70,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Angola 7103.0 19842251.0 56.7"
|
||
]
|
||
},
|
||
"execution_count": 70,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc[['Albania', 'Angola']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"We can also pass a second argument, the column names:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 71,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0\n",
|
||
"Angola 7103.0 19842251.0"
|
||
]
|
||
},
|
||
"execution_count": 71,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df2 = df.loc[['Albania', 'Angola'], ['PKB', 'Populacja']]\n",
|
||
"\n",
|
||
"df2"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Or use _slicing_, i.e. the `:` operator. Note that with `loc` both endpoints of a label slice are included:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 72,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>PKB</th>\n",
|
||
" <th>Populacja</th>\n",
|
||
" <th>ODŻ</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Country</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Albania</th>\n",
|
||
" <td>8644.0</td>\n",
|
||
" <td>2968026.0</td>\n",
|
||
" <td>76.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Algeria</th>\n",
|
||
" <td>12314.0</td>\n",
|
||
" <td>34811059.0</td>\n",
|
||
" <td>75.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Angola</th>\n",
|
||
" <td>7103.0</td>\n",
|
||
" <td>19842251.0</td>\n",
|
||
" <td>56.7</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" PKB Populacja ODŻ\n",
|
||
"Country \n",
|
||
"Albania 8644.0 2968026.0 76.8\n",
|
||
"Algeria 12314.0 34811059.0 75.5\n",
|
||
"Angola 7103.0 19842251.0 56.7"
|
||
]
|
||
},
|
||
"execution_count": 72,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.loc['Albania': 'Angola', 'PKB': 'ODŻ']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Żeby odwołać się do pojedyńczej wartości możemy użyć metody `at`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.at['Angola', 'PKB']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "notes"
|
||
}
|
||
},
|
||
"source": [
|
||
"Dostęp do indeksu:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 73,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Index(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Antigua and Barbuda',\n",
|
||
" 'Argentina', 'Armenia', 'Australia'],\n",
|
||
" dtype='object', name='Country')"
|
||
]
|
||
},
|
||
"execution_count": 73,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.index"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Podstawowe metody `pd.Series` i `pd.DataFrame`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 74,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492\n",
|
||
"October 444177 53596"
|
||
]
|
||
},
|
||
"execution_count": 74,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511, 'August': 673790,\n",
|
||
"'September': 673790, 'October': 444177})\n",
|
||
"\n",
|
||
"occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316, 'August': 206809,\n",
|
||
"'September': 140492, 'October': 53596})\n",
|
||
"\n",
|
||
"df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `head` pozwala tworzy nową ramkę danych z pierwszymi 5 przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 76,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494"
|
||
]
|
||
},
|
||
"execution_count": 76,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.head(2)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `tail` robi to samo, ale z 5 ostatnymi przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 77,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492\n",
|
||
"October 444177 53596"
|
||
]
|
||
},
|
||
"execution_count": 77,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.tail()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `sample` pozwala na stworzenie nowej ramki danych z wylosowanymi `n` przykładami:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 78,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"September 673790 140492\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494"
|
||
]
|
||
},
|
||
"execution_count": 78,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.sample(3)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `describe` zwraca podstawowe statystyki m.in.: liczebność, średnią, wartości skrajne: "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 79,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492\n",
|
||
"October 444177 53596"
|
||
]
|
||
},
|
||
"execution_count": 79,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 80,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>count</th>\n",
|
||
" <td>6.000000</td>\n",
|
||
" <td>6.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>mean</th>\n",
|
||
" <td>665172.833333</td>\n",
|
||
" <td>152434.166667</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>std</th>\n",
|
||
" <td>116216.045456</td>\n",
|
||
" <td>54783.506738</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>min</th>\n",
|
||
" <td>444177.000000</td>\n",
|
||
" <td>53596.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>25%</th>\n",
|
||
" <td>673790.000000</td>\n",
|
||
" <td>142343.500000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>50%</th>\n",
|
||
" <td>678274.000000</td>\n",
|
||
" <td>159696.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>75%</th>\n",
|
||
" <td>723447.750000</td>\n",
|
||
" <td>188610.500000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>max</th>\n",
|
||
" <td>779511.000000</td>\n",
|
||
" <td>206809.000000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"count 6.000000 6.000000\n",
|
||
"mean 665172.833333 152434.166667\n",
|
||
"std 116216.045456 54783.506738\n",
|
||
"min 444177.000000 53596.000000\n",
|
||
"25% 673790.000000 142343.500000\n",
|
||
"50% 678274.000000 159696.000000\n",
|
||
"75% 723447.750000 188610.500000\n",
|
||
"max 779511.000000 206809.000000"
|
||
]
|
||
},
|
||
"execution_count": 80,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.describe()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Metoda `info` zwraca informacje techniczne o kolumnach: np. typ danych:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 81,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
"Index: 6 entries, May to October\n",
|
||
"Data columns (total 2 columns):\n",
|
||
" # Column Non-Null Count Dtype\n",
|
||
"--- ------ -------------- -----\n",
|
||
" 0 members 6 non-null int64\n",
|
||
" 1 occasionals 6 non-null int64\n",
|
||
"dtypes: int64(2)\n",
|
||
"memory usage: 144.0+ bytes\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"df.info()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Podstawową informacją o ramce danych to liczba przykładów w ramce danych. Możemy wykorzystać to tego funkcję `len`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 82,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"6"
|
||
]
|
||
},
|
||
"execution_count": 82,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(df)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Natomiast atrybut `shape` zwraca nam krotkę z liczbą przykładów i liczbą kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 84,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(6, 2)"
|
||
]
|
||
},
|
||
"execution_count": 84,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Operacja arytmetyczne\n",
|
||
"\n",
|
||
" * `max`, `idxmax`\n",
|
||
" * `min`, `idxmin`\n",
|
||
" * `mean`\n",
|
||
" * `count`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 86,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>members</th>\n",
|
||
" <th>occasionals</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>May</th>\n",
|
||
" <td>682758</td>\n",
|
||
" <td>147898</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>June</th>\n",
|
||
" <td>737011</td>\n",
|
||
" <td>171494</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>July</th>\n",
|
||
" <td>779511</td>\n",
|
||
" <td>194316</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>August</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>206809</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>September</th>\n",
|
||
" <td>673790</td>\n",
|
||
" <td>140492</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>October</th>\n",
|
||
" <td>444177</td>\n",
|
||
" <td>53596</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" members occasionals\n",
|
||
"May 682758 147898\n",
|
||
"June 737011 171494\n",
|
||
"July 779511 194316\n",
|
||
"August 673790 206809\n",
|
||
"September 673790 140492\n",
|
||
"October 444177 53596"
|
||
]
|
||
},
|
||
"execution_count": 86,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Zbiór wartości i zliczanie wartości:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 90,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"[1 3 2]\n",
|
||
"3 4\n",
|
||
"1 3\n",
|
||
"2 3\n",
|
||
"Name: count, dtype: int64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
|
||
"\n",
|
||
"print(dane.unique())\n",
|
||
"\n",
|
||
"dane = pd.Series([1, 3, 2, 3, 1, 1, 2, 3, 2, 3])\n",
|
||
"\n",
|
||
"print(dane.value_counts())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 91,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 1\n",
|
||
"1 3\n",
|
||
"2 2\n",
|
||
"3 3\n",
|
||
"4 1\n",
|
||
"5 1\n",
|
||
"6 2\n",
|
||
"7 3\n",
|
||
"8 2\n",
|
||
"9 3\n",
|
||
"dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 91,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"dane"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 92,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"3 4\n",
|
||
"1 3\n",
|
||
"2 3\n",
|
||
"Name: count, dtype: int64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
|
||
"print(dane.value_counts())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Sprawdzanie czy brakuje danych:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 93,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 False\n",
|
||
"2 False\n",
|
||
"3 False\n",
|
||
"4 False\n",
|
||
"5 False\n",
|
||
" ... \n",
|
||
"887 False\n",
|
||
"888 False\n",
|
||
"889 True\n",
|
||
"890 False\n",
|
||
"891 False\n",
|
||
"Name: Age, Length: 891, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 93,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"df.Age.isnull()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Dodawanie i modyfikowanie danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"conts = pd.Series({\n",
|
||
" 'Afghanistan': 'Asia', 'Albania': 'Europe', 'Algeria':' Africa', 'Angola': 'Africa', 'Antigua and Barbuda': 'Americas'})\n",
|
||
"\n",
|
||
"df['continent'] = conts\n",
|
||
"\n",
|
||
"df['tmp'] = 1\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.loc['Argentina'] = {\n",
|
||
" 'female_BMI': 27.46523,\n",
|
||
" 'male_BMI': 27.5017,\n",
|
||
" 'gdp': 14646.0,\n",
|
||
" 'population': 40381860.0,\n",
|
||
" 'under5mortality': 15.4,\n",
|
||
" 'life_expectancy': 75.4,\n",
|
||
" 'fertility': 2.24\n",
|
||
"}\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.drop('gdp', axis='columns')\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Filtrowanie danych"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Biblioteka pandas posiada 2 sposoby na filtrowanie danych zawartych w ramce danych:\n",
|
||
" * operator `[]` -- najbardziej rozpowszechniony;\n",
|
||
" * metoda `query()`.\n",
|
||
"Oba sposoby mają różną składnię.\n",
|
||
" "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 94,
|
||
"metadata": {
|
||
"scrolled": true,
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Heikkinen\\t Miss. Laina</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>STON/O2. 3101282</td>\n",
|
||
" <td>7.9250</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Allen\\t Mr. William Henry</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>373450</td>\n",
|
||
" <td>8.0500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"3 1 3 \n",
|
||
"4 1 1 \n",
|
||
"5 0 3 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"3 Heikkinen\\t Miss. Laina female 26.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"5 Allen\\t Mr. William Henry male 35.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"3 0 0 STON/O2. 3101282 7.9250 NaN S \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"5 0 0 373450 8.0500 NaN S "
|
||
]
|
||
},
|
||
"execution_count": 94,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"\n",
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 95,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 0\n",
|
||
"2 1\n",
|
||
"3 1\n",
|
||
"4 1\n",
|
||
"5 0\n",
|
||
" ..\n",
|
||
"887 0\n",
|
||
"888 1\n",
|
||
"889 0\n",
|
||
"890 1\n",
|
||
"891 0\n",
|
||
"Name: Survived, Length: 891, dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 95,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df['Survived']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df['Survived']"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 97,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"PassengerId\n",
|
||
"1 False\n",
|
||
"2 True\n",
|
||
"3 True\n",
|
||
"4 True\n",
|
||
"5 False\n",
|
||
" ... \n",
|
||
"887 False\n",
|
||
"888 True\n",
|
||
"889 False\n",
|
||
"890 True\n",
|
||
"891 False\n",
|
||
"Name: Survived, Length: 891, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 97,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df['Survived'] == 1"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 100,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df_survived = df[df['Pclass'] == 1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 101,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"891"
|
||
]
|
||
},
|
||
"execution_count": 101,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(df)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 102,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"216"
|
||
]
|
||
},
|
||
"execution_count": 102,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"len(df_survived)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 103,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>McCarthy\\t Mr. Timothy J</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>54.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17463</td>\n",
|
||
" <td>51.8625</td>\n",
|
||
" <td>E46</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>24</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Sloper\\t Mr. William Thompson</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113788</td>\n",
|
||
" <td>35.5000</td>\n",
|
||
" <td>A6</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>872</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>47.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11751</td>\n",
|
||
" <td>52.5542</td>\n",
|
||
" <td>D35</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>873</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Carlsson\\t Mr. Frans Olof</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>33.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>695</td>\n",
|
||
" <td>5.0000</td>\n",
|
||
" <td>B51 B53 B55</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>880</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>56.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11767</td>\n",
|
||
" <td>83.1583</td>\n",
|
||
" <td>C50</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>888</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Graham\\t Miss. Margaret Edith</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>19.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>112053</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>B42</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>890</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Behr\\t Mr. Karl Howell</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>111369</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>C148</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>216 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"7 0 1 \n",
|
||
"12 1 1 \n",
|
||
"24 1 1 \n",
|
||
"... ... ... \n",
|
||
"872 1 1 \n",
|
||
"873 0 1 \n",
|
||
"880 1 1 \n",
|
||
"888 1 1 \n",
|
||
"890 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"7 McCarthy\\t Mr. Timothy J male 54.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"24 Sloper\\t Mr. William Thompson male 28.0 \n",
|
||
"... ... ... ... \n",
|
||
"872 Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny) female 47.0 \n",
|
||
"873 Carlsson\\t Mr. Frans Olof male 33.0 \n",
|
||
"880 Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 \n",
|
||
"888 Graham\\t Miss. Margaret Edith female 19.0 \n",
|
||
"890 Behr\\t Mr. Karl Howell male 26.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"7 0 0 17463 51.8625 E46 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"24 0 0 113788 35.5000 A6 S \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"872 1 1 11751 52.5542 D35 S \n",
|
||
"873 0 0 695 5.0000 B51 B53 B55 S \n",
|
||
"880 0 1 11767 83.1583 C50 C \n",
|
||
"888 0 0 112053 30.0000 B42 S \n",
|
||
"890 0 0 111369 30.0000 C148 C \n",
|
||
"\n",
|
||
"[216 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 103,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df_survived"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
"#### Operators\n",
"\n",
"* `&` - conjunction (and)\n",
"* `|` - disjunction (or)\n",
"* `~` - negation (not)\n",
"* `()` - when combining several conditions, wrap each one in parentheses, because `&`, `|` and `~` bind more tightly than comparisons such as `==`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 104,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>32</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17569</td>\n",
|
||
" <td>146.5208</td>\n",
|
||
" <td>B78</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>53</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>49.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17572</td>\n",
|
||
" <td>76.7292</td>\n",
|
||
" <td>D33</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>857</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Wick\\t Mrs. George Dennick (Mary Hitchcock)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>45.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>36928</td>\n",
|
||
" <td>164.8667</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>863</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Swift\\t Mrs. Frederick Joel (Margaret Welles B...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>48.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17466</td>\n",
|
||
" <td>25.9292</td>\n",
|
||
" <td>D17</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>872</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>47.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11751</td>\n",
|
||
" <td>52.5542</td>\n",
|
||
" <td>D35</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>880</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>56.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11767</td>\n",
|
||
" <td>83.1583</td>\n",
|
||
" <td>C50</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>888</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Graham\\t Miss. Margaret Edith</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>19.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>112053</td>\n",
|
||
" <td>30.0000</td>\n",
|
||
" <td>B42</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>94 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"12 1 1 \n",
|
||
"32 1 1 \n",
|
||
"53 1 1 \n",
|
||
"... ... ... \n",
|
||
"857 1 1 \n",
|
||
"863 1 1 \n",
|
||
"872 1 1 \n",
|
||
"880 1 1 \n",
|
||
"888 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"32 Spencer\\t Mrs. William Augustus (Marie Eugenie) female NaN \n",
|
||
"53 Harper\\t Mrs. Henry Sleeper (Myna Haxtun) female 49.0 \n",
|
||
"... ... ... ... \n",
|
||
"857 Wick\\t Mrs. George Dennick (Mary Hitchcock) female 45.0 \n",
|
||
"863 Swift\\t Mrs. Frederick Joel (Margaret Welles B... female 48.0 \n",
|
||
"872 Beckwith\\t Mrs. Richard Leonard (Sallie Monypeny) female 47.0 \n",
|
||
"880 Potter\\t Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 \n",
|
||
"888 Graham\\t Miss. Margaret Edith female 19.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"32 1 0 PC 17569 146.5208 B78 C \n",
|
||
"53 1 0 PC 17572 76.7292 D33 C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"857 1 1 36928 164.8667 NaN S \n",
|
||
"863 0 0 17466 25.9292 D17 S \n",
|
||
"872 1 1 11751 52.5542 D35 S \n",
|
||
"880 0 1 11767 83.1583 C50 C \n",
|
||
"888 0 0 112053 30.0000 B42 S \n",
|
||
"\n",
|
||
"[94 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 104,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
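"# Boolean masks: one True/False value per passenger\n",
"pierwsza_klasa = df['Pclass'] == 1  # first class\n",
"kobiety = df['Sex'] == 'female'  # women\n",
"\n",
"# Keep only the rows where both masks are True\n",
"df[pierwsza_klasa & kobiety]\n"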
|
||
]
|
||
},
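{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `|` and `~` operators combine masks in the same way as `&`. The cell below is a minimal sketch: the `demo` frame and its three rows are made up for illustration and are not part of the Titanic data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical three-row frame, used only for illustration\n",
"demo = pd.DataFrame({'Pclass': [1, 3, 2], 'Sex': ['female', 'male', 'female']})\n",
"\n",
"first_class = demo['Pclass'] == 1\n",
"women = demo['Sex'] == 'female'\n",
"\n",
"# `|` keeps rows where at least one mask is True\n",
"either = demo[first_class | women]\n",
"\n",
"# `~` inverts a mask: here, everyone outside first class\n",
"not_first = demo[~first_class]\n",
"\n",
"# Written inline, each comparison needs its own parentheses\n",
"same_as_either = demo[(demo['Pclass'] == 1) | (demo['Sex'] == 'female')]\n",
"\n",
"either"
]
},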
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 105,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Palsson\\t Master. Gosta Leonard</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>349909</td>\n",
|
||
" <td>21.0750</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>14.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>237736</td>\n",
|
||
" <td>30.0708</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>861</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Hansen\\t Mr. Claus Peter</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>41.0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>350026</td>\n",
|
||
" <td>14.1083</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>862</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Giles\\t Mr. Frederick Edward</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>21.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>28134</td>\n",
|
||
" <td>11.5000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>864</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>CA. 2343</td>\n",
|
||
" <td>69.5500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>867</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Duran y More\\t Miss. Asuncion</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>27.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>SC/PARIS 2149</td>\n",
|
||
" <td>13.8583</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>875</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>P/PP 3381</td>\n",
|
||
" <td>24.0000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>192 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"8 0 3 \n",
|
||
"10 1 2 \n",
|
||
"... ... ... \n",
|
||
"861 0 3 \n",
|
||
"862 0 2 \n",
|
||
"864 0 3 \n",
|
||
"867 1 2 \n",
|
||
"875 1 2 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"8 Palsson\\t Master. Gosta Leonard male 2.0 \n",
|
||
"10 Nasser\\t Mrs. Nicholas (Adele Achem) female 14.0 \n",
|
||
"... ... ... ... \n",
|
||
"861 Hansen\\t Mr. Claus Peter male 41.0 \n",
|
||
"862 Giles\\t Mr. Frederick Edward male 21.0 \n",
|
||
"864 Sage\\t Miss. Dorothy Edith \"Dolly\" female NaN \n",
|
||
"867 Duran y More\\t Miss. Asuncion female 27.0 \n",
|
||
"875 Abelson\\t Mrs. Samuel (Hannah Wizosky) female 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"8 3 1 349909 21.0750 NaN S \n",
|
||
"10 1 0 237736 30.0708 NaN C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"861 2 0 350026 14.1083 NaN S \n",
|
||
"862 1 0 28134 11.5000 NaN S \n",
|
||
"864 8 2 CA. 2343 69.5500 NaN S \n",
|
||
"867 1 0 SC/PARIS 2149 13.8583 NaN C \n",
|
||
"875 1 0 P/PP 3381 24.0000 NaN C \n",
|
||
"\n",
|
||
"[192 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 105,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
"# Passengers with more siblings/spouses aboard (SibSp) than parents/children (Parch)\n",
"df[df['SibSp'] > df['Parch']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
"#### `pd.DataFrame.query`\n",
"\n",
"Another way to filter data is the `query` method, which takes the filter condition as a string expression:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 106,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>McCarthy\\t Mr. Timothy J</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>54.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>17463</td>\n",
|
||
" <td>51.8625</td>\n",
|
||
" <td>E46</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>24</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Sloper\\t Mr. William Thompson</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113788</td>\n",
|
||
" <td>35.5000</td>\n",
|
||
" <td>A6</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"7 0 1 \n",
|
||
"12 1 1 \n",
|
||
"24 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"7 McCarthy\\t Mr. Timothy J male 54.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"24 Sloper\\t Mr. William Thompson male 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"7 0 0 17463 51.8625 E46 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"24 0 0 113788 35.5000 A6 S "
|
||
]
|
||
},
|
||
"execution_count": 106,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('Pclass == 1').head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 107,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Bonnell\\t Miss. Elizabeth</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>58.0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113783</td>\n",
|
||
" <td>26.5500</td>\n",
|
||
" <td>C103</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>32</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Spencer\\t Mrs. William Augustus (Marie Eugenie)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17569</td>\n",
|
||
" <td>146.5208</td>\n",
|
||
" <td>B78</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>53</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Harper\\t Mrs. Henry Sleeper (Myna Haxtun)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>49.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17572</td>\n",
|
||
" <td>76.7292</td>\n",
|
||
" <td>D33</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"12 1 1 \n",
|
||
"32 1 1 \n",
|
||
"53 1 1 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"12 Bonnell\\t Miss. Elizabeth female 58.0 \n",
|
||
"32 Spencer\\t Mrs. William Augustus (Marie Eugenie) female NaN \n",
|
||
"53 Harper\\t Mrs. Henry Sleeper (Myna Haxtun) female 49.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"12 0 0 113783 26.5500 C103 S \n",
|
||
"32 1 0 PC 17569 146.5208 B78 C \n",
|
||
"53 1 0 PC 17572 76.7292 D33 C "
|
||
]
|
||
},
|
||
"execution_count": 107,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('(Pclass == 1) and (Sex == \"female\")').head()"
|
||
]
|
||
},
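{
"cell_type": "markdown",
"metadata": {},
"source": [
"A `query` expression can also refer to an ordinary Python variable by prefixing its name with `@`. The cell below is a minimal sketch: the `demo` frame and the `wanted_class` variable are hypothetical names introduced only for this example. It also checks that the result matches the equivalent boolean-mask filter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical three-row frame, used only for illustration\n",
"demo = pd.DataFrame({'Pclass': [1, 3, 1], 'Sex': ['female', 'male', 'male']})\n",
"\n",
"wanted_class = 1\n",
"\n",
"# `@wanted_class` is looked up in the surrounding Python scope\n",
"by_query = demo.query('Pclass == @wanted_class and Sex == \"female\"')\n",
"\n",
"# The same filter written with boolean masks\n",
"by_mask = demo[(demo['Pclass'] == wanted_class) & (demo['Sex'] == 'female')]\n",
"\n",
"by_query.equals(by_mask)"
]
},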
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 108,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Survived</th>\n",
|
||
" <th>Pclass</th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Sex</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>SibSp</th>\n",
|
||
" <th>Parch</th>\n",
|
||
" <th>Ticket</th>\n",
|
||
" <th>Fare</th>\n",
|
||
" <th>Cabin</th>\n",
|
||
" <th>Embarked</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>PassengerId</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Braund\\t Mr. Owen Harris</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>A/5 21171</td>\n",
|
||
" <td>7.2500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Cumings\\t Mrs. John Bradley (Florence Briggs T...</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>38.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>PC 17599</td>\n",
|
||
" <td>71.2833</td>\n",
|
||
" <td>C85</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>Futrelle\\t Mrs. Jacques Heath (Lily May Peel)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>35.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>113803</td>\n",
|
||
" <td>53.1000</td>\n",
|
||
" <td>C123</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Palsson\\t Master. Gosta Leonard</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>349909</td>\n",
|
||
" <td>21.0750</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Nasser\\t Mrs. Nicholas (Adele Achem)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>14.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>237736</td>\n",
|
||
" <td>30.0708</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>861</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Hansen\\t Mr. Claus Peter</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>41.0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>350026</td>\n",
|
||
" <td>14.1083</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>862</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Giles\\t Mr. Frederick Edward</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>21.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>28134</td>\n",
|
||
" <td>11.5000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>864</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>Sage\\t Miss. Dorothy Edith \"Dolly\"</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>CA. 2343</td>\n",
|
||
" <td>69.5500</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>S</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>867</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Duran y More\\t Miss. Asuncion</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>27.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>SC/PARIS 2149</td>\n",
|
||
" <td>13.8583</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>875</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>Abelson\\t Mrs. Samuel (Hannah Wizosky)</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>P/PP 3381</td>\n",
|
||
" <td>24.0000</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>C</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>192 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Survived Pclass \\\n",
|
||
"PassengerId \n",
|
||
"1 0 3 \n",
|
||
"2 1 1 \n",
|
||
"4 1 1 \n",
|
||
"8 0 3 \n",
|
||
"10 1 2 \n",
|
||
"... ... ... \n",
|
||
"861 0 3 \n",
|
||
"862 0 2 \n",
|
||
"864 0 3 \n",
|
||
"867 1 2 \n",
|
||
"875 1 2 \n",
|
||
"\n",
|
||
" Name Sex Age \\\n",
|
||
"PassengerId \n",
|
||
"1 Braund\\t Mr. Owen Harris male 22.0 \n",
|
||
"2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n",
|
||
"4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n",
|
||
"8 Palsson\\t Master. Gosta Leonard male 2.0 \n",
|
||
"10 Nasser\\t Mrs. Nicholas (Adele Achem) female 14.0 \n",
|
||
"... ... ... ... \n",
|
||
"861 Hansen\\t Mr. Claus Peter male 41.0 \n",
|
||
"862 Giles\\t Mr. Frederick Edward male 21.0 \n",
|
||
"864 Sage\\t Miss. Dorothy Edith \"Dolly\" female NaN \n",
|
||
"867 Duran y More\\t Miss. Asuncion female 27.0 \n",
|
||
"875 Abelson\\t Mrs. Samuel (Hannah Wizosky) female 28.0 \n",
|
||
"\n",
|
||
" SibSp Parch Ticket Fare Cabin Embarked \n",
|
||
"PassengerId \n",
|
||
"1 1 0 A/5 21171 7.2500 NaN S \n",
|
||
"2 1 0 PC 17599 71.2833 C85 C \n",
|
||
"4 1 0 113803 53.1000 C123 S \n",
|
||
"8 3 1 349909 21.0750 NaN S \n",
|
||
"10 1 0 237736 30.0708 NaN C \n",
|
||
"... ... ... ... ... ... ... \n",
|
||
"861 2 0 350026 14.1083 NaN S \n",
|
||
"862 1 0 28134 11.5000 NaN S \n",
|
||
"864 8 2 CA. 2343 69.5500 NaN S \n",
|
||
"867 1 0 SC/PARIS 2149 13.8583 NaN C \n",
|
||
"875 1 0 P/PP 3381 24.0000 NaN C \n",
|
||
"\n",
|
||
"[192 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 108,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df.query('SibSp > Parch')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 109,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"(113, 11)"
|
||
]
|
||
},
|
||
"execution_count": 109,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"young = 18\n",
|
||
"df.query('Age < @young').shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"#### Operacje na wierszach i kolumnach"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n",
|
||
"\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"Iterowanie po ramce danych oznacza oznacza przejście po nazwach kolumn:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"for column_name in df:\n",
|
||
" print(column_name)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"for col_name, series in df.items():\n",
|
||
" print(col_name, series)\n",
|
||
" break"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"for idx, row in df.iterrows():\n",
|
||
" print(idx, '\\n', row)\n",
|
||
" break"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def bmi_level(bmi):\n",
|
||
" if bmi <= 18.5:\n",
|
||
" level = 'underweight'\n",
|
||
" elif bmi < 25:\n",
|
||
" level = 'normal'\n",
|
||
" elif bmi < 30:\n",
|
||
" level = 'overweight'\n",
|
||
" else:\n",
|
||
" level = 'obese'\n",
|
||
" return level\n",
|
||
"\n",
|
||
"s = df['male_BMI'].map(bmi_level)\n",
|
||
" \n",
|
||
"s"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def bmi_level(row_data):\n",
|
||
" bmi = row_data['male_BMI']\n",
|
||
" if bmi <= 18.5:\n",
|
||
" return 'underweight'\n",
|
||
" elif bmi < 25:\n",
|
||
" return 'normal'\n",
|
||
" elif bmi < 30:\n",
|
||
" return 'overweight'\n",
|
||
" return 'obese'\n",
|
||
"\n",
|
||
"df.apply(bmi_level, axis=1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.transpose()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Grupowanie (`groupby`)\n",
|
||
"\n",
|
||
"Często zdarza się, gdy potrzebujemy podzielić dane ze względu na wartości w zadanej kolumnie, a następnie obliczenie zebranie danych w każdej z grup. Do tego służy metody `groupby`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 117,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pandas as pd\n",
|
||
"\n",
|
||
"df = pd.read_csv('./nba.csv')\n",
|
||
"\n",
|
||
"#df.sample(5)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 118,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>Team</th>\n",
|
||
" <th>Number</th>\n",
|
||
" <th>Position</th>\n",
|
||
" <th>Age</th>\n",
|
||
" <th>Height</th>\n",
|
||
" <th>Weight</th>\n",
|
||
" <th>College</th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>Avery Bradley</td>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>PG</td>\n",
|
||
" <td>25.0</td>\n",
|
||
" <td>6-2</td>\n",
|
||
" <td>180.0</td>\n",
|
||
" <td>Texas</td>\n",
|
||
" <td>7730337.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Jae Crowder</td>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>99.0</td>\n",
|
||
" <td>SF</td>\n",
|
||
" <td>25.0</td>\n",
|
||
" <td>6-6</td>\n",
|
||
" <td>235.0</td>\n",
|
||
" <td>Marquette</td>\n",
|
||
" <td>6796117.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>John Holland</td>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>30.0</td>\n",
|
||
" <td>SG</td>\n",
|
||
" <td>27.0</td>\n",
|
||
" <td>6-5</td>\n",
|
||
" <td>205.0</td>\n",
|
||
" <td>Boston University</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>R.J. Hunter</td>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>28.0</td>\n",
|
||
" <td>SG</td>\n",
|
||
" <td>22.0</td>\n",
|
||
" <td>6-5</td>\n",
|
||
" <td>185.0</td>\n",
|
||
" <td>Georgia State</td>\n",
|
||
" <td>1148640.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Jonas Jerebko</td>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>8.0</td>\n",
|
||
" <td>PF</td>\n",
|
||
" <td>29.0</td>\n",
|
||
" <td>6-10</td>\n",
|
||
" <td>231.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>5000000.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>453</th>\n",
|
||
" <td>Shelvin Mack</td>\n",
|
||
" <td>Utah Jazz</td>\n",
|
||
" <td>8.0</td>\n",
|
||
" <td>PG</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>6-3</td>\n",
|
||
" <td>203.0</td>\n",
|
||
" <td>Butler</td>\n",
|
||
" <td>2433333.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>454</th>\n",
|
||
" <td>Raul Neto</td>\n",
|
||
" <td>Utah Jazz</td>\n",
|
||
" <td>25.0</td>\n",
|
||
" <td>PG</td>\n",
|
||
" <td>24.0</td>\n",
|
||
" <td>6-1</td>\n",
|
||
" <td>179.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>900000.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>455</th>\n",
|
||
" <td>Tibor Pleiss</td>\n",
|
||
" <td>Utah Jazz</td>\n",
|
||
" <td>21.0</td>\n",
|
||
" <td>C</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>7-3</td>\n",
|
||
" <td>256.0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2900000.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>456</th>\n",
|
||
" <td>Jeff Withey</td>\n",
|
||
" <td>Utah Jazz</td>\n",
|
||
" <td>24.0</td>\n",
|
||
" <td>C</td>\n",
|
||
" <td>26.0</td>\n",
|
||
" <td>7-0</td>\n",
|
||
" <td>231.0</td>\n",
|
||
" <td>Kansas</td>\n",
|
||
" <td>947276.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>457</th>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>458 rows × 9 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Name Team Number Position Age Height Weight \\\n",
|
||
"0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0 \n",
|
||
"1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0 \n",
|
||
"2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0 \n",
|
||
"3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0 \n",
|
||
"4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0 \n",
|
||
".. ... ... ... ... ... ... ... \n",
|
||
"453 Shelvin Mack Utah Jazz 8.0 PG 26.0 6-3 203.0 \n",
|
||
"454 Raul Neto Utah Jazz 25.0 PG 24.0 6-1 179.0 \n",
|
||
"455 Tibor Pleiss Utah Jazz 21.0 C 26.0 7-3 256.0 \n",
|
||
"456 Jeff Withey Utah Jazz 24.0 C 26.0 7-0 231.0 \n",
|
||
"457 NaN NaN NaN NaN NaN NaN NaN \n",
|
||
"\n",
|
||
" College Salary \n",
|
||
"0 Texas 7730337.0 \n",
|
||
"1 Marquette 6796117.0 \n",
|
||
"2 Boston University NaN \n",
|
||
"3 Georgia State 1148640.0 \n",
|
||
"4 NaN 5000000.0 \n",
|
||
".. ... ... \n",
|
||
"453 Butler 2433333.0 \n",
|
||
"454 NaN 900000.0 \n",
|
||
"455 NaN 2900000.0 \n",
|
||
"456 Kansas 947276.0 \n",
|
||
"457 NaN NaN \n",
|
||
"\n",
|
||
"[458 rows x 9 columns]"
|
||
]
|
||
},
|
||
"execution_count": 118,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"_Przykład_: chcemy obliczyć średnią wypłatę dla każdej z drużyn."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 119,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Team</th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>7730337.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>6796117.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>1148640.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Boston Celtics</td>\n",
|
||
" <td>5000000.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>453</th>\n",
|
||
" <td>Utah Jazz</td>\n",
|
||
" <td>2433333.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>454</th>\n",
|
||
" <td>Utah Jazz</td>\n",
|
||
" <td>900000.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>455</th>\n",
|
||
" <td>Utah Jazz</td>\n",
|
||
" <td>2900000.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>456</th>\n",
|
||
" <td>Utah Jazz</td>\n",
|
||
" <td>947276.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>457</th>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>458 rows × 2 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Team Salary\n",
|
||
"0 Boston Celtics 7730337.0\n",
|
||
"1 Boston Celtics 6796117.0\n",
|
||
"2 Boston Celtics NaN\n",
|
||
"3 Boston Celtics 1148640.0\n",
|
||
"4 Boston Celtics 5000000.0\n",
|
||
".. ... ...\n",
|
||
"453 Utah Jazz 2433333.0\n",
|
||
"454 Utah Jazz 900000.0\n",
|
||
"455 Utah Jazz 2900000.0\n",
|
||
"456 Utah Jazz 947276.0\n",
|
||
"457 NaN NaN\n",
|
||
"\n",
|
||
"[458 rows x 2 columns]"
|
||
]
|
||
},
|
||
"execution_count": 119,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['Team', 'Salary']]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 120,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Salary</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Team</th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>Atlanta Hawks</th>\n",
|
||
" <td>2854940.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Boston Celtics</th>\n",
|
||
" <td>3021242.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Brooklyn Nets</th>\n",
|
||
" <td>1335480.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Charlotte Hornets</th>\n",
|
||
" <td>4204200.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Chicago Bulls</th>\n",
|
||
" <td>2380440.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" Salary\n",
|
||
"Team \n",
|
||
"Atlanta Hawks 2854940.0\n",
|
||
"Boston Celtics 3021242.5\n",
|
||
"Brooklyn Nets 1335480.0\n",
|
||
"Charlotte Hornets 4204200.0\n",
|
||
"Chicago Bulls 2380440.0"
|
||
]
|
||
},
|
||
"execution_count": 120,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"df[['Team', 'Salary']].groupby('Team').median().h"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Możemy też podać listę nazw kolumn. Wtedy wartości zostaną obliczone dla każdej z wytworzonych grup:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.groupby(['Team', 'Position'])['Salary'].mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
" * `sum()`\n",
|
||
" * `min()`\n",
|
||
" * `max()`\n",
|
||
" * `mean()`\n",
|
||
" * `size()`\n",
|
||
" * `describe()`\n",
|
||
" * `first()`\n",
|
||
" * `last()`\n",
|
||
" * `count()`\n",
|
||
" * `std()`\n",
|
||
" * `var()`\n",
|
||
" * `sem()`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df[['Position', 'Salary']].groupby('Position').agg(['mean', 'std', 'count'])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def group_range(x):\n",
|
||
" return x.max() - x.min()\n",
|
||
"\n",
|
||
"df[['Position', 'Salary']].groupby('Position').apply(group_range)\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"gb = df.groupby(['Position'])\n",
|
||
"\n",
|
||
"print('Liczba grup:', gb.ngroups)\n",
|
||
"print(gb.groups.keys())\n",
|
||
"\n",
|
||
"print(gb.get_group('C').head())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"df.Height.str.split('-').str[0].astype('Int64') * 2.56"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Pivot\n",
|
||
"Metoda `pivot` pozwala na stworzenie nowej ramki danych, gdzie indeks i nazwy kolumn są wartościami początkowej ranki danych. \n",
|
||
"\n",
|
||
"_Przykład_: zobaczmy na poniższą ramkę danych, która zawiera informacje o jakości tłumaczenia dla pary językowej hausa-angielski. Kolumna `system` zawiera nazwę systemu, kolumna `metric` - nazwę metryki, zaś kolumna `score`- wartość metryki. Chcemy przedstawić te dane w następujący sposób: jako klucz chcemy mieć nazwę systemu, zaś jako kolumny - metryki. Możemy wykorzystać do tego metodę `pivot`, gdzie musimy podać 3 argumenty:\n",
|
||
" * `index`: nazwę kolumny, na podstawie której zostanie stworzony indeks;\n",
|
||
" * `columns`: nazwa kolumny, które zawiera nazwy kolumn dla nowej ramki danych;\n",
|
||
" * `values`: nazwa kolumny, która zawiera interesujące nas dane."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df = pd.read_csv('https://raw.githubusercontent.com/wmt-conference/wmt21-news-systems/main/scores/automatic-scores.tsv', sep='\\t')\n",
|
||
"df = df[df.pair == 'ha-en']\n",
|
||
"df"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.pivot(index='system', columns='metric', values='score')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"## Dane tekstowe"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"`pandas` posiada udogodnienia do pracy z wartościami tekstowymi:\n",
|
||
" * dostęp następuje przez atrybut `str`;\n",
|
||
" * funkcje:\n",
|
||
" * formatujące: `lower()`, `upper()`;\n",
|
||
" * wyrażenia regularne: `contains()`, `match()`;\n",
|
||
" * inne: `split()`"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n",
|
||
"\n",
|
||
"df.head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.Name.str.upper()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"print(df.Name.head())\n",
|
||
"df.Name.str.contains('Miss|Mrs').head()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.Name.str.split('\\t', expand=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.Name.str.split('\\t')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.Name.str.split('\\t').str[1]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"df.Name.str.split('\\t').str[1].str.strip().str.split(' ').str[0]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"source": [
|
||
"### Zadanie\n",
|
||
"Zestaw `nba.csv` zawiera informaję o wysokości zawodników. Oblicz wzrost każdego z zawodników w systemie metrycznym przyjmując, że stop to `30.48` cm., a cal to `2.54` cm."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"slideshow": {
|
||
"slide_type": "slide"
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"celltoolbar": "Slideshow",
|
||
"interpreter": {
|
||
"hash": "d4d1e4263499bec80672ea0156c357c1ee493ec2b1c70f0acce89fc37c4a6abe"
|
||
},
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.11.7"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 4
|
||
}
|