diff --git a/README.md b/README.md index 4858705..54783e2 100644 --- a/README.md +++ b/README.md @@ -30,13 +30,14 @@ Do nauki można wykorzystać wiele tutoriali internetowych python (w wersji pyth - Zajęcia 2 - Wprowadzenie do python 2/2 - Zajęcia 3 - pandas - Zajęcia 4 - numpy -- Zajęcia 5 - scikit-learn -- Zajęcia 6 - przetwarzanie tekstu w python -- Zajęcia 7 - przetwarzanie obrazów w python -- Zajęcia 8 - zajęcia z analizy wizualizacji danych -- Zajęcia 9 - zajęcia z analizy wizualizacji danych +- Zajęcia 5 - scikit-learn 1 +- Zajęcia 6 - scikit-learn 2 +- Zajęcia 7 - przetwarzanie tekstu w python +- Zajęcia 8 - przetwarzanie obrazów w python +- Zajęcia 9 - zajęcia z analizy wizualizacji danych - Zajęcia 10 - zajęcia z analizy wizualizacji danych -- Zajęcia 11 - Zaliczenie +- Zajęcia 11 - zajęcia z analizy wizualizacji danych +- Zaliczenie - Zaliczenie przedmiotu 8 luty 14:30-16.45 ## Zaliczenie przedmiotu diff --git a/zajecia1/2_podstawy.ipynb b/zajecia1/2_podstawy.ipynb index 9aa02b8..23f663a 100644 --- a/zajecia1/2_podstawy.ipynb +++ b/zajecia1/2_podstawy.ipynb @@ -15,7 +15,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 18, "metadata": { "slideshow": { "slide_type": "slide" @@ -73,7 +73,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 19, "metadata": { "slideshow": { "slide_type": "slide" @@ -87,6 +87,52 @@ "email = user + '@' + mail_domain" ] }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "jakub.pokrywka@amu.edu.pl\n" + ] + } + ], + "source": [ + "print(email)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "x = 'a'" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a'" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" 
+ } + ], + "source": [ + "x" + ] + }, { "cell_type": "markdown", "metadata": { @@ -199,7 +245,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 25, "metadata": { "slideshow": { "slide_type": "slide" @@ -213,7 +259,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 26, "metadata": { "slideshow": { "slide_type": "slide" @@ -257,7 +303,67 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "15 // 4" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a15 % 4" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1000000" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "100**3" + ] + }, + { + "cell_type": "code", + "execution_count": 33, "metadata": { "slideshow": { "slide_type": "slide" @@ -268,12 +374,12 @@ "name": "stdout", "output_type": "stream", "text": [ - "Dziś upłynęło 36840 sekund.\n" + "Dziś upłynęło 15240 sekund.\n" ] } ], "source": [ - "hour = 10\n", + "hour = 4\n", "minutes = 14\n", "seconds = ((60 * 60 * hour) + 60 * minutes)\n", "print(\"Dziś upłynęło\", seconds, \"sekund.\")" @@ -281,7 +387,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 34, "metadata": { "slideshow": { "slide_type": "slide" @@ -311,6 +417,73 @@ "### Operacje na zmiennej\n" ] }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [], + "source": [ + "x = 5" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + 
"outputs": [], + "source": [ + "x = x + 5" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "10" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "x += 5" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "15" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, { "cell_type": "code", "execution_count": 11, @@ -358,7 +531,130 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "3 " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "'3'" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a'" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'a'" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "3 + int('1')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "-3" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"int(-3.9)" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "float" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(float(-3))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 47, "metadata": { "slideshow": { "slide_type": "fragment" @@ -386,7 +682,88 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.3333333333333333" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "4/3" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6.8999999999999995" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "3 * 2.3" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "3 * int(2.3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 55, "metadata": { "slideshow": { "slide_type": "slide" @@ -409,6 +786,53 @@ "print(int('42') / 6)" ] }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 63, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type('a')" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "str" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type('aaaa')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "metadata": { @@ -462,7 +886,94 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "True" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 66, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "False" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "True and False" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 69, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "True or False" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 77, "metadata": { "slideshow": { "slide_type": "slide" @@ -489,6 +1000,89 @@ "print(not False)" ] }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + 
"execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bool(0.0)" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 76, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bool(5.2)" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "metadata": {}, + "outputs": [], + "source": [ + "a =3 " + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "4 >= 4" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "metadata": { @@ -508,7 +1102,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 83, "metadata": { "slideshow": { "slide_type": "slide" @@ -535,6 +1129,23 @@ "print(0 != 0.0)" ] }, + { + "cell_type": "code", + "execution_count": 86, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "False\n" + ] + } + ], + "source": [ + "print(1 != 1.0)" + ] + }, { "cell_type": "markdown", "metadata": { @@ -588,7 +1199,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 88, "metadata": { "slideshow": { "slide_type": "slide" @@ -625,6 +1236,238 @@ "### Czas na pierwsze zadanie (1a i 1b)." ] }, + { + "cell_type": "code", + "execution_count": 89, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'\\n * stwórz zmienną o nazwie `pi` i o wartości 3.14.\\n * stwórz zmienną o nazwie `promien` i o wartości 12.\\n * oblicz pole koła i przypisz wynik do zmniennej `pole`. 
P = pi * r ** 2\\n * wyświetl wynik na ekran.\\n'" + ] + }, + "execution_count": 89, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "\"\"\"\n", + " * stwórz zmienną o nazwie `pi` i o wartości 3.14.\n", + " * stwórz zmienną o nazwie `promien` i o wartości 12.\n", + " * oblicz pole koła i przypisz wynik do zmniennej `pole`. P = pi * r ** 2\n", + " * wyświetl wynik na ekran.\n", + "\"\"\" \n" + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "metadata": {}, + "outputs": [], + "source": [ + "pi = 3.14" + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "metadata": {}, + "outputs": [], + "source": [ + "promien=12" + ] + }, + { + "cell_type": "code", + "execution_count": 109, + "metadata": {}, + "outputs": [], + "source": [ + "P = pi * promien **2" + ] + }, + { + "cell_type": "code", + "execution_count": 110, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "452.16\n" + ] + } + ], + "source": [ + "print(P)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 139, + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "Zamień typ zmiennych `a`, `b` i `c` na typy liczbowe (int lub float) i oblicz ich sumę.\n", + "Wynik zapisz do zmiennej `wynik` i wyświetl go na ekranie\n", + "\"\"\" \n", + "\n", + "# zmienne do zadania\n", + "a = \"12\"\n", + "b = \"35.5\"\n", + "c = True" + ] + }, + { + "cell_type": "code", + "execution_count": 147, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "float" + ] + }, + "execution_count": 147, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(4.2)" + ] + }, + { + "cell_type": "code", + "execution_count": 150, + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "could not convert string to float: '4.2a'", + "output_type": "error", + 
"traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[150], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mfloat\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m4.2a\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n", + "\u001b[0;31mValueError\u001b[0m: could not convert string to float: '4.2a'" + ] + } + ], + "source": [ + "type(\"4.2\")" + ] + }, + { + "cell_type": "code", + "execution_count": 151, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 151, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "4.2 == \"4.2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 140, + "metadata": {}, + "outputs": [], + "source": [ + "a = float(a)" + ] + }, + { + "cell_type": "code", + "execution_count": 141, + "metadata": {}, + "outputs": [], + "source": [ + "b = float(b)" + ] + }, + { + "cell_type": "code", + "execution_count": 142, + "metadata": {}, + "outputs": [], + "source": [ + "c = float(c)" + ] + }, + { + "cell_type": "code", + "execution_count": 143, + "metadata": {}, + "outputs": [], + "source": [ + "wynik = a + b + c" + ] + }, + { + "cell_type": "code", + "execution_count": 144, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "48.5" + ] + }, + "execution_count": 144, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "wynik" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "metadata": { @@ -641,7 +1484,7 @@ }, { "cell_type": "code", - "execution_count": 
19, + "execution_count": 153, "metadata": { "slideshow": { "slide_type": "fragment" @@ -698,7 +1541,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 154, "metadata": { "slideshow": { "slide_type": "slide" @@ -714,7 +1557,47 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 156, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'2'" + ] + }, + "execution_count": 156, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "str(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 160, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'sfdsf\"sd\"fdsfs\"dfsd'" + ] + }, + "execution_count": 160, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'sfdsf\"sd\"fdsfs\"dfsd'" + ] + }, + { + "cell_type": "code", + "execution_count": 161, "metadata": { "slideshow": { "slide_type": "slide" @@ -730,7 +1613,7 @@ } ], "source": [ - "sent = \"It's fine.\"\n", + "sent = \"It's f' ' ' 'ine.\"\n", "\n", "sent = 'It\\'s fine.'\n", "\n", @@ -739,7 +1622,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 163, "metadata": { "slideshow": { "slide_type": "slide" @@ -766,7 +1649,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 164, "metadata": { "slideshow": { "slide_type": "slide" @@ -778,6 +1661,26 @@ "x = ''" ] }, + { + "cell_type": "code", + "execution_count": 165, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "''" + ] + }, + "execution_count": 165, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, { "cell_type": "markdown", "metadata": { @@ -793,7 +1696,27 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 168, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'23'" + ] + }, + "execution_count": 168, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'2' + 
'3'" + ] + }, + { + "cell_type": "code", + "execution_count": 169, "metadata": { "slideshow": { "slide_type": "slide" @@ -817,7 +1740,27 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 171, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'OOOOOOOO'" + ] + }, + "execution_count": 171, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'O' * 8" + ] + }, + { + "cell_type": "code", + "execution_count": 170, "metadata": { "slideshow": { "slide_type": "slide" @@ -839,7 +1782,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 173, "metadata": { "slideshow": { "slide_type": "slide" @@ -877,7 +1820,106 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 175, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'pythofdsfdsn'" + ] + }, + "execution_count": 175, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'PythoFDSFDSn'.lower()" + ] + }, + { + "cell_type": "code", + "execution_count": 178, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'mickiewicz'" + ] + }, + "execution_count": 178, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "user" + ] + }, + { + "cell_type": "code", + "execution_count": 180, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'mXckXewXcz'" + ] + }, + "execution_count": 180, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "user.replace('i', 'X')" + ] + }, + { + "cell_type": "code", + "execution_count": 181, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "' 2021 '" + ] + }, + "execution_count": 181, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "' 2021 '\n" + ] + }, + { + "cell_type": "code", + "execution_count": 182, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": 
[ + "2021\n" + ] + } + ], + "source": [ + "print(' 2021 '.strip())\n" + ] + }, + { + "cell_type": "code", + "execution_count": 177, "metadata": { "slideshow": { "slide_type": "slide" @@ -911,7 +1953,76 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 184, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'ab'" + ] + }, + "execution_count": 184, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'a' + 'b'" + ] + }, + { + "cell_type": "code", + "execution_count": 187, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Słowo {user} ma {len(user)} liter.'" + ] + }, + "execution_count": 187, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "\n", + "'Słowo {user} ma {len(user)} liter.'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 183, "metadata": { "slideshow": { "slide_type": "slide" @@ -949,6 +2060,86 @@ "### Czas na zadanie (1c)." ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + " * Stwórz 2 zmiennie: firstname i surname, które będą zawierać Twoje imię i nazwisko.\n", + " * Połącz te zmiennie w takim sposób, żeby było rozdzielone spacją i zapisz wynik do zmiennej fullname.\n", + " * Wykorzystaj f-string i wyświetl na ekran zawartość zmiennej fullname, w taki sposób, żeby zawartość zmiennej była poprzedzona słowami \"Nazywam się \".\n", + " * Wyświetl sumaryczną długość zmiennych firstname i surname. 
\n", + "\"\"\"\n", + "\n", + "firstname = \"Jakub\"\n", + "surname = \"Pokrywka\"\n", + "\n", + "print(f\"Nazywam się {firstname} {surname}.\")\n", + "\n", + "print(firstname.lower())\n", + "\n", + "print(\"Nazywam się %s %s\" % (firstname, surname))\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 188, + "metadata": {}, + "outputs": [], + "source": [ + "firstname = \"Jakub\"\n", + "surname = \"Pokrywka\"" + ] + }, + { + "cell_type": "code", + "execution_count": 192, + "metadata": {}, + "outputs": [], + "source": [ + "fullname = firstname + ' ' + surname" + ] + }, + { + "cell_type": "code", + "execution_count": 193, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Jakub Pokrywka'" + ] + }, + "execution_count": 193, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fullname" + ] + }, + { + "cell_type": "code", + "execution_count": 196, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Nazywam sie Jakub Pokrywka\n" + ] + } + ], + "source": [ + "print(f\"Nazywam sie {fullname}\")" + ] + }, { "cell_type": "markdown", "metadata": { @@ -966,7 +2157,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 200, "metadata": { "slideshow": { "slide_type": "slide" @@ -979,12 +2170,33 @@ "oceny = [5, 4, 3, 5, 5]\n", "misc = [3.14, \"pi\", [\"pi\"], 3]\n", "\n", - "list_0_9 = list(range(10))\n" + "list_0_9 = list(range(10))" ] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 203, "metadata": { "slideshow": { "slide_type": "slide" @@ -1004,6 +2216,13 @@ "print('Liczba elementów:', len(numbers))" ] }, 
+ { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": 31, @@ -1044,7 +2263,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 205, "metadata": { "slideshow": { "slide_type": "slide" @@ -1093,7 +2312,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 206, "metadata": { "slideshow": { "slide_type": "slide" @@ -1159,7 +2378,27 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 208, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 208, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(liczby)" + ] + }, + { + "cell_type": "code", + "execution_count": 207, "metadata": { "slideshow": { "slide_type": "slide" @@ -1197,7 +2436,27 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 214, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 3, 2, 3, 1]" + ] + }, + "execution_count": 214, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "oceny[:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 209, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1298,7 +2557,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 219, "metadata": { "slideshow": { "slide_type": "slide" @@ -1311,20 +2570,43 @@ "7" ] }, - "execution_count": 39, + "execution_count": 219, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "numbers = (4, 5, 7)\n", + "numbers = [4, 5, 7]\n", "\n", "numbers[2]" ] }, { "cell_type": "code", - "execution_count": 40, + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 220, + "metadata": {}, + "outputs": [], + "source": [ + "numbers[2] = 10" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": 
{}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 216, "metadata": { "slideshow": { "slide_type": "slide" @@ -1394,7 +2676,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 221, "metadata": { "slideshow": { "slide_type": "fragment" @@ -1417,7 +2699,7 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 222, "metadata": { "slideshow": { "slide_type": "slide" @@ -1634,13 +2916,22 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 225, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Zdobyłeś wystarczającą liczbę punktów.\n", + "------\n" + ] + } + ], "source": [ "score_theory = 40\n", "score_practical = 45\n", @@ -1676,7 +2967,7 @@ }, { "cell_type": "code", - "execution_count": 51, + "execution_count": 227, "metadata": { "slideshow": { "slide_type": "slide" @@ -1687,13 +2978,14 @@ "name": "stdout", "output_type": "stream", "text": [ - "Nie zdobyłeś wystarczającej liczby punktów.\n" + "Zdobyłeś wystarczającą liczbę punktów.\n", + "------\n" ] } ], "source": [ "\n", - "score_theory = 40\n", + "score_theory = 140\n", "score_practical = 45\n", "\n", "if score_theory + score_practical > 100:\n", @@ -1757,7 +3049,7 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 228, "metadata": { "slideshow": { "slide_type": "slide" @@ -1811,7 +3103,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 229, "metadata": { "slideshow": { "slide_type": "slide" @@ -1846,7 +3138,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": 230, "metadata": { "slideshow": { "slide_type": "slide" @@ -1881,7 +3173,7 @@ }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 233, "metadata": { "slideshow": { "slide_type": "slide" @@ -1899,7 +3191,7 @@ "source": [ "my_programming_lang = \"Python\"\n", 
"\n", - "if \"Pyt\" in my_programming_lang:\n", + "if \"on\" in my_programming_lang:\n", " print('Yes!')" ] }, @@ -2027,7 +3319,54 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 236, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]" + ] + }, + "execution_count": 236, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(range(10))" + ] + }, + { + "cell_type": "code", + "execution_count": 238, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "16\n", + "36\n", + "49\n" + ] + } + ], + "source": [ + "for i in [4,6,7]:\n", + " print(i**2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": 266, "metadata": { "slideshow": { "slide_type": "slide" @@ -2133,7 +3472,7 @@ }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 268, "metadata": { "slideshow": { "slide_type": "slide" @@ -2169,7 +3508,7 @@ }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 269, "metadata": { "slideshow": { "slide_type": "slide" @@ -2205,7 +3544,27 @@ }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 271, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'y'" + ] + }, + "execution_count": 271, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'Python'[1]" + ] + }, + { + "cell_type": "code", + "execution_count": 272, "metadata": { "slideshow": { "slide_type": "slide" @@ -2243,7 +3602,7 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 273, "metadata": { "slideshow": { "slide_type": "slide" @@ -2286,7 +3645,7 @@ }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 283, "metadata": { "slideshow": { "slide_type": "slide" @@ -2297,19 +3656,27 @@ "name": "stdout", "output_type": 
"stream", "text": [ + "0\n", + "10\n", + "11\n", + "aaa\n", "1\n", + "11\n", + "12\n", + "aaa\n", "2\n", - "4\n", - "3\n", - "6\n", - "9\n" + "12\n", + "13\n", + "aaa\n" ] } ], "source": [ "for i in range(3):\n", - " for j in range(i+1):\n", - " print((i + 1) * (j + 1))\n" + " print(i)\n", + " for j in range(10,12):\n", + " print(i+j)\n", + " print('aaa')" ] }, { @@ -2350,6 +3717,26 @@ " * następnie wcięty blok to będzie kod funkcji." ] }, + { + "cell_type": "code", + "execution_count": 285, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 285, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(numbers)" + ] + }, { "cell_type": "code", "execution_count": 67, @@ -2398,7 +3785,7 @@ }, { "cell_type": "code", - "execution_count": 68, + "execution_count": 286, "metadata": { "slideshow": { "slide_type": "slide" @@ -2512,7 +3899,7 @@ }, { "cell_type": "code", - "execution_count": 71, + "execution_count": 288, "metadata": { "slideshow": { "slide_type": "slide" @@ -2541,7 +3928,7 @@ }, { "cell_type": "code", - "execution_count": 72, + "execution_count": 289, "metadata": { "slideshow": { "slide_type": "slide" @@ -2554,7 +3941,7 @@ "3" ] }, - "execution_count": 72, + "execution_count": 289, "metadata": {}, "output_type": "execute_result" } @@ -2604,6 +3991,64 @@ "Przykłady:" ] }, + { + "cell_type": "code", + "execution_count": 291, + "metadata": {}, + "outputs": [], + "source": [ + "import os" + ] + }, + { + "cell_type": "code", + "execution_count": 298, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'kubapok'" + ] + }, + "execution_count": 298, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "os.getenv(\"USER\")" + ] + }, + { + "cell_type": "code", + "execution_count": 299, + "metadata": {}, + "outputs": [], + "source": [ + "from os import getenv" + ] + }, + { + "cell_type": "code", + "execution_count": 300, + "metadata": {}, + "outputs": [ 
+ { + "data": { + "text/plain": [ + "'kubapok'" + ] + }, + "execution_count": 300, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "getenv(\"USER\")" + ] + }, { "cell_type": "code", "execution_count": 73, @@ -2691,7 +4136,7 @@ }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 301, "metadata": { "slideshow": { "slide_type": "slide" @@ -2755,7 +4200,7 @@ }, { "cell_type": "code", - "execution_count": 76, + "execution_count": 302, "metadata": { "slideshow": { "slide_type": "slide" @@ -2782,7 +4227,25 @@ }, { "cell_type": "code", - "execution_count": 77, + "execution_count": 303, + "metadata": {}, + "outputs": [], + "source": [ + "zen_file = open('./zen_of_python.txt')\n" + ] + }, + { + "cell_type": "code", + "execution_count": 308, + "metadata": {}, + "outputs": [], + "source": [ + "zen_file.close()" + ] + }, + { + "cell_type": "code", + "execution_count": 309, "metadata": {}, "outputs": [ { @@ -2809,7 +4272,45 @@ }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 313, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Beautiful is better than ugly.\\n',\n", + " 'Explicit is better than implicit.\\n',\n", + " 'Simple is better than complex.\\n',\n", + " 'Complex is better than complicated.\\n',\n", + " 'Flat is better than nested.\\n',\n", + " 'Sparse is better than dense.\\n',\n", + " 'Readability counts.\\n',\n", + " \"Special cases aren't special enough to break the rules.\\n\",\n", + " 'Although practicality beats purity.\\n',\n", + " 'Errors should never pass silently.\\n',\n", + " 'Unless explicitly silenced.\\n',\n", + " 'In the face of ambiguity, refuse the temptation to guess.\\n',\n", + " 'There should be one-- and preferably only one --obvious way to do it.\\n',\n", + " \"Although that way may not be obvious at first unless you're Dutch.\\n\",\n", + " 'Now is better than never.\\n',\n", + " 'Although never is often better than *right* now.\\n',\n", + " \"If the 
implementation is hard to explain, it's a bad idea.\\n\",\n", + " 'If the implementation is easy to explain, it may be a good idea.\\n',\n", + " \"Namespaces are one honking great idea -- let's do more of those!\\n\"]" + ] + }, + "execution_count": 313, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "zen_lines\n" + ] + }, + { + "cell_type": "code", + "execution_count": 315, "metadata": { "slideshow": { "slide_type": "slide" @@ -2833,7 +4334,24 @@ }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 318, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1_wprowadzenie_do_python.ipynb\t2_podstawy.ipynb zadania zen_of_python.txt\n" + ] + } + ], + "source": [ + "!ls" + ] + }, + { + "cell_type": "code", + "execution_count": 319, "metadata": { "slideshow": { "slide_type": "slide" @@ -2851,6 +4369,42 @@ " plik.write(country + ',' + str(num_trees) + '\\n')\n" ] }, + { + "cell_type": "code", + "execution_count": 320, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1_wprowadzenie_do_python.ipynb\tzadania\t\tzen_of_python.txt\n", + "2_podstawy.ipynb\t\tzalesienie.txt\n" + ] + } + ], + "source": [ + "!ls" + ] + }, + { + "cell_type": "code", + "execution_count": 321, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Brazil,39542\n", + "Bulgaria,24987\n" + ] + } + ], + "source": [ + "!cat zalesienie.txt" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -2884,6 +4438,46 @@ "print(today.month) # atrybut\n", "print(today.weekday()) # metoda" ] + }, + { + "cell_type": "code", + "execution_count": 324, + "metadata": {}, + "outputs": [], + "source": [ + "class Item():\n", + " def __init__(self, a):\n", + " self.aaa = a + 10" + ] + }, + { + "cell_type": "code", + "execution_count": 326, + "metadata": {}, + "outputs": [], + "source": [ + "item = Item(x)" + ] + }, + { + "cell_type": 
"code", + "execution_count": 327, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "15" + ] + }, + "execution_count": 327, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "item.aaa" + ] } ], "metadata": { diff --git a/zajecia2/data_analysis.ipynb b/zajecia2/data_analysis.ipynb index 4104ac0..802510e 100644 --- a/zajecia2/data_analysis.ipynb +++ b/zajecia2/data_analysis.ipynb @@ -56,6 +56,15 @@ "import pandas as pd" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pd" + ] + }, { "cell_type": "markdown", "metadata": { @@ -206,7 +215,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": { "slideshow": { "slide_type": "slide" @@ -223,7 +232,7 @@ "Name: Rides, dtype: float64" ] }, - "execution_count": 4, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } @@ -251,7 +260,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": { "slideshow": { "slide_type": "slide" @@ -276,7 +285,7 @@ "dtype: int64" ] }, - "execution_count": 5, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -305,7 +314,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": { "slideshow": { "slide_type": "slide" @@ -323,7 +332,7 @@ "dtype: int64" ] }, - "execution_count": 6, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -365,7 +374,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "metadata": { "slideshow": { "slide_type": "slide" @@ -384,7 +393,7 @@ "dtype: float64" ] }, - "execution_count": 7, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -415,8 +424,9 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 9, "metadata": { + "scrolled": true, "slideshow": { "slide_type": "slide" } @@ -434,7 +444,7 @@ "dtype: int64" ] }, - 
"execution_count": 8, + "execution_count": 9, "metadata": {}, "output_type": "execute_result" } @@ -448,6 +458,58 @@ "members" ] }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "May 683758\n", + "June 738011\n", + "July 780511\n", + "August 674790\n", + "September 674790\n", + "October 445177\n", + "dtype: int64" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "members" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "May 10000000000683758\n", + "June 10000000000738011\n", + "July 10000000000780511\n", + "August 10000000000674790\n", + "September 10000000000674790\n", + "October 10000000000445177\n", + "dtype: int64" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "members + 10000000000000000" + ] + }, { "cell_type": "markdown", "metadata": { @@ -478,14 +540,156 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], - "source": [] + "source": [ + "n= list(range(10+1))" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "n = pd.Series(n)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0\n", + "1 1\n", + "2 2\n", + "3 3\n", + "4 4\n", + "5 5\n", + "6 6\n", + "7 7\n", + "8 8\n", + "9 9\n", + "10 10\n", + "dtype: int64" + ] + }, + "execution_count": 20, + "metadata": {}, + 
"output_type": "execute_result" + } + ], + "source": [ + "n" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "n2 = n**2" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0\n", + "1 1\n", + "2 4\n", + "3 9\n", + "4 16\n", + "5 25\n", + "6 36\n", + "7 49\n", + "8 64\n", + "9 81\n", + "10 100\n", + "dtype: int64" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n2" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "trojkatne = ( n + n2 ) / 2" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.0\n", + "1 1.0\n", + "2 3.0\n", + "3 6.0\n", + "4 10.0\n", + "5 15.0\n", + "6 21.0\n", + "7 28.0\n", + "8 36.0\n", + "9 45.0\n", + "10 55.0\n", + "dtype: float64" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trojkatne" + ] }, { "cell_type": "markdown", @@ -522,7 +726,53 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "May 682758\n", + "June 737011\n", + "July 779511\n", + "dtype: int64" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "members" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "May 147898\n", + "June 171494\n", + "July 194316\n", + "dtype: int64" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "occasionals" + ] + }, + { + "cell_type": "code", + "execution_count": 35, "metadata": { "slideshow": { "slide_type": "slide" @@ -556,38 +806,44 @@ " \n", " 
\n", " \n", - " May\n", - " 682758\n", - " 147898\n", + " July\n", + " 779511.0\n", + " 194316.0\n", " \n", " \n", " June\n", - " 737011\n", - " 171494\n", + " 737011.0\n", + " 171494.0\n", " \n", " \n", - " July\n", - " 779511\n", - " 194316\n", + " May\n", + " NaN\n", + " 147898.0\n", + " \n", + " \n", + " Maydfdsgfdg\n", + " 682758.0\n", + " NaN\n", " \n", " \n", "\n", "" ], "text/plain": [ - " members occasionals\n", - "May 682758 147898\n", - "June 737011 171494\n", - "July 779511 194316" + " members occasionals\n", + "July 779511.0 194316.0\n", + "June 737011.0 171494.0\n", + "May NaN 147898.0\n", + "Maydfdsgfdg 682758.0 NaN" ] }, - "execution_count": 9, + "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "members = pd.Series({'May': 682758, 'June': 737011, 'July': 779511})\n", + "members = pd.Series({'Maydfdsgfdg': 682758, 'June': 737011, 'July': 779511})\n", "occasionals = pd.Series({'May': 147898, 'June': 171494, 'July': 194316})\n", "\n", "df = pd.DataFrame({'members': members, 'occasionals': occasionals})\n", @@ -607,7 +863,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 37, "metadata": { "slideshow": { "slide_type": "slide" @@ -666,7 +922,7 @@ "2 779511 194316" ] }, - "execution_count": 10, + "execution_count": 37, "metadata": {}, "output_type": "execute_result" } @@ -696,7 +952,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 38, "metadata": { "slideshow": { "slide_type": "slide" @@ -780,7 +1036,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 39, "metadata": { "slideshow": { "slide_type": "slide" @@ -975,7 +1231,7 @@ "[175 rows x 8 columns]" ] }, - "execution_count": 12, + "execution_count": 39, "metadata": {}, "output_type": "execute_result" } @@ -988,7 +1244,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 40, "metadata": { "slideshow": { "slide_type": "slide" @@ -1144,7 +1400,7 @@ "5 0 0 373450 
8.0500 NaN S " ] }, - "execution_count": 13, + "execution_count": 40, "metadata": {}, "output_type": "execute_result" } @@ -1167,113 +1423,13 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " start_date start_station_code end_date \\\n", - "0 2019-04-14 07:55:22 6001 2019-04-14 08:07:16 \n", - "1 2019-04-14 07:59:31 6411 2019-04-14 08:09:18 \n", - "2 2019-04-14 07:59:55 6097 2019-04-14 08:12:11 \n", - "3 2019-04-14 07:59:57 6310 2019-04-14 08:27:58 \n", - "4 2019-04-14 08:00:37 7029 2019-04-14 08:14:12 \n", - "\n", - " end_station_code duration_sec is_member \n", - "0 6132 713 1 \n", - "1 6411 587 1 \n", - "2 6036 736 1 \n", - "3 6345 1680 1 \n", - "4 6250 814 0 " - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df = pd.read_excel('./bikes.xlsx', engine='openpyxl', nrows=5)\n", "df" @@ -1293,7 +1449,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 42, "metadata": { "slideshow": { "slide_type": "slide" @@ -1409,7 +1565,7 @@ "[347 rows x 2 columns]" ] }, - "execution_count": 15, + "execution_count": 42, "metadata": {}, "output_type": "execute_result" } @@ -1422,7 +1578,7 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 43, "metadata": { "slideshow": { "slide_type": "slide" @@ -1538,7 +1694,7 @@ "[347 rows x 2 columns]" ] }, - "execution_count": 16, + "execution_count": 43, "metadata": {}, "output_type": "execute_result" } @@ -1592,7 +1748,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 44, "metadata": { "slideshow": { "slide_type": "slide" @@ -1608,7 +1764,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 45, "metadata": { "slideshow": { "slide_type": "slide" @@ -1635,7 +1791,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 46, "metadata": { "slideshow": { "slide_type": "slide" @@ -1656,7 +1812,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 47, "metadata": { "slideshow": { "slide_type": "slide" @@ -1688,7 +1844,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 
51, "metadata": { "slideshow": { "slide_type": "slide" @@ -1727,17 +1883,6 @@ " * Tabela `Invoice` zawiera informacje o fakturach. Przekonwertuj kolumnę `BillingCountry` do pythonowego słownika, a następnie podaj najcześciej występującą wartość. Ile razy pojawiła się?\n" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": { @@ -1765,7 +1910,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 58, "metadata": { "slideshow": { "slide_type": "slide" @@ -1870,7 +2015,7 @@ "Australia 41312.0 21370348.0 81.6" ] }, - "execution_count": 22, + "execution_count": 58, "metadata": {}, "output_type": "execute_result" } @@ -1894,7 +2039,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 59, "metadata": { "slideshow": { "slide_type": "slide" @@ -1916,7 +2061,7 @@ "Name: population, dtype: float64" ] }, - "execution_count": 23, + "execution_count": 59, "metadata": {}, "output_type": "execute_result" } @@ -1928,7 +2073,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 60, "metadata": { "slideshow": { "slide_type": "slide" @@ -1950,7 +2095,7 @@ "Name: population, dtype: float64" ] }, - "execution_count": 24, + "execution_count": 60, "metadata": {}, "output_type": "execute_result" } @@ -1973,7 +2118,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 65, "metadata": { "slideshow": { "slide_type": "slide" @@ -2068,7 +2213,7 @@ "Australia 41312.0 21370348.0" ] }, - "execution_count": 25, + "execution_count": 65, "metadata": {}, "output_type": "execute_result" } @@ -2090,8 +2235,9 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 66, "metadata": { + "scrolled": true, "slideshow": { "slide_type": "fragment" } @@ -2103,7 +2249,7 @@ "Index(['gdp', 'population', 'life_expectancy'], dtype='object')" ] }, - 
"execution_count": 26, + "execution_count": 66, "metadata": {}, "output_type": "execute_result" } @@ -2114,7 +2260,119 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " gdp population life_expectancy\n", + "Country \n", + "Afghanistan 1311.0 26528741.0 52.8\n", + "Albania 8644.0 2968026.0 76.8\n", + "Algeria 12314.0 34811059.0 75.5\n", + "Angola 7103.0 19842251.0 56.7\n", + "Antigua and Barbuda 25736.0 85350.0 75.5\n", + "Argentina 14646.0 40381860.0 75.4\n", + "Armenia 7383.0 2975029.0 72.3\n", + "Australia 41312.0 21370348.0 81.6" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 68, "metadata": { "slideshow": { "slide_type": "slide" @@ -2219,7 +2477,7 @@ "Australia 41312.0 21370348.0 81.6" ] }, - "execution_count": 27, + "execution_count": 68, "metadata": {}, "output_type": "execute_result" } @@ -2243,7 +2501,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 69, "metadata": { "slideshow": { "slide_type": "slide" @@ -2259,7 +2517,7 @@ "Name: Argentina, dtype: float64" ] }, - "execution_count": 28, + "execution_count": 69, "metadata": {}, "output_type": "execute_result" } @@ -2281,7 +2539,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 70, "metadata": { "slideshow": { "slide_type": "slide" @@ -2344,7 +2602,7 @@ "Angola 7103.0 19842251.0 56.7" ] }, - "execution_count": 29, + "execution_count": 70, "metadata": {}, "output_type": "execute_result" } @@ -2366,7 +2624,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 71, "metadata": { "slideshow": { "slide_type": "slide" @@ -2425,7 +2683,7 @@ "Angola 7103.0 19842251.0" ] }, - "execution_count": 30, + "execution_count": 71, "metadata": {}, "output_type": "execute_result" } @@ -2449,7 +2707,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 72, "metadata": { "slideshow": { "slide_type": "slide" @@ -2519,7 +2777,7 @@ "Angola 7103.0 19842251.0 56.7" ] }, - "execution_count": 31, + "execution_count": 72, "metadata": {}, "output_type": 
"execute_result" } @@ -2541,24 +2799,13 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/plain": [ - "7103.0" - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.at['Angola', 'PKB']" ] @@ -2576,7 +2823,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 73, "metadata": { "slideshow": { "slide_type": "slide" @@ -2591,7 +2838,7 @@ " dtype='object', name='Country')" ] }, - "execution_count": 33, + "execution_count": 73, "metadata": {}, "output_type": "execute_result" } @@ -2613,7 +2860,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 74, "metadata": { "slideshow": { "slide_type": "slide" @@ -2690,7 +2937,7 @@ "October 444177 53596" ] }, - "execution_count": 34, + "execution_count": 74, "metadata": {}, "output_type": "execute_result" } @@ -2716,7 +2963,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 76, "metadata": { "slideshow": { "slide_type": "slide" @@ -2759,41 +3006,23 @@ " 737011\n", " 171494\n", " \n", - " \n", - " July\n", - " 779511\n", - " 194316\n", - " \n", - " \n", - " August\n", - " 673790\n", - " 206809\n", - " \n", - " \n", - " September\n", - " 673790\n", - " 140492\n", - " \n", " \n", "\n", "" ], "text/plain": [ - " members occasionals\n", - "May 682758 147898\n", - "June 737011 171494\n", - "July 779511 194316\n", - "August 673790 206809\n", - "September 673790 140492" + " members occasionals\n", + "May 682758 147898\n", + "June 737011 171494" ] }, - "execution_count": 35, + "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "df.head()" + "df.head(2)" ] }, { @@ -2805,7 +3034,7 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 77, "metadata": { "slideshow": { "slide_type": "slide" @@ -2876,7 +3105,7 @@ 
"October 444177 53596" ] }, - "execution_count": 36, + "execution_count": 77, "metadata": {}, "output_type": "execute_result" } @@ -2894,7 +3123,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 78, "metadata": { "slideshow": { "slide_type": "slide" @@ -2933,15 +3162,15 @@ " 140492\n", " \n", " \n", - " August\n", - " 673790\n", - " 206809\n", - " \n", - " \n", " May\n", " 682758\n", " 147898\n", " \n", + " \n", + " June\n", + " 737011\n", + " 171494\n", + " \n", " \n", "\n", "" @@ -2949,11 +3178,11 @@ "text/plain": [ " members occasionals\n", "September 673790 140492\n", - "August 673790 206809\n", - "May 682758 147898" + "May 682758 147898\n", + "June 737011 171494" ] }, - "execution_count": 37, + "execution_count": 78, "metadata": {}, "output_type": "execute_result" } @@ -2971,7 +3200,91 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 79, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " members occasionals\n", + "May 682758 147898\n", + "June 737011 171494\n", + "July 779511 194316\n", + "August 673790 206809\n", + "September 673790 140492\n", + "October 444177 53596" + ] + }, + "execution_count": 79, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 80, "metadata": { "slideshow": { "slide_type": "slide" @@ -3060,7 +3373,7 @@ "max 779511.000000 206809.000000" ] }, - "execution_count": 38, + "execution_count": 80, "metadata": {}, "output_type": "execute_result" } @@ -3078,7 +3391,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 81, "metadata": { "slideshow": { "slide_type": "slide" @@ -3114,7 +3427,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 82, "metadata": { "slideshow": { "slide_type": "slide" @@ -3127,7 +3440,7 @@ "6" ] }, - "execution_count": 40, + "execution_count": 82, "metadata": {}, "output_type": "execute_result" } @@ -3145,7 +3458,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 84, "metadata": { "slideshow": { "slide_type": "slide" @@ -3158,7 +3471,7 @@ "(6, 2)" ] }, - "execution_count": 41, + "execution_count": 84, "metadata": {}, "output_type": "execute_result" } @@ -3185,28 +3498,86 @@ }, { "cell_type": "code", - "execution_count": 42, - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, + "execution_count": 86, + "metadata": {}, "outputs": [ { "data": { + "text/html": [ + "
" + ], "text/plain": [ - "members 665172.833333\n", - "occasionals 152434.166667\n", - "dtype: float64" + " members occasionals\n", + "May 682758 147898\n", + "June 737011 171494\n", + "July 779511 194316\n", + "August 673790 206809\n", + "September 673790 140492\n", + "October 444177 53596" ] }, - "execution_count": 42, + "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "df.mean()" + "df" ] }, { @@ -3222,7 +3593,7 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 90, "metadata": { "slideshow": { "slide_type": "slide" @@ -3251,6 +3622,64 @@ "print(dane.value_counts())" ] }, + { + "cell_type": "code", + "execution_count": 91, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 1\n", + "1 3\n", + "2 2\n", + "3 3\n", + "4 1\n", + "5 1\n", + "6 2\n", + "7 3\n", + "8 2\n", + "9 3\n", + "dtype: int64" + ] + }, + "execution_count": 91, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dane" + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3 4\n", + "1 3\n", + "2 3\n", + "Name: count, dtype: int64\n" + ] + } + ], + "source": [ + "print(dane.value_counts())" + ] + }, { "cell_type": "markdown", "metadata": { @@ -3264,7 +3693,7 @@ }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 93, "metadata": { "slideshow": { "slide_type": "slide" @@ -3289,7 +3718,7 @@ "Name: Age, Length: 891, dtype: bool" ] }, - "execution_count": 44, + "execution_count": 93, "metadata": {}, "output_type": "execute_result" } @@ -3312,131 +3741,13 @@ }, { "cell_type": "code", - "execution_count": 45, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " female_BMI male_BMI gdp population \\\n", - "Country \n", - "Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n", - "Albania 25.65726 26.44657 8644.0 2968026.0 \n", - "Algeria 26.36841 24.59620 12314.0 34811059.0 \n", - "Angola 23.48431 22.25083 7103.0 19842251.0 \n", - "Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n", - "\n", - " under5mortality life_expectancy fertility \n", - "Country \n", - "Afghanistan 110.4 52.8 6.20 \n", - "Albania 17.9 76.8 1.76 \n", - "Algeria 29.5 75.5 2.73 \n", - "Angola 192.0 56.7 6.43 \n", - "Antigua and Barbuda 10.9 75.5 2.16 " - ] - }, - "execution_count": 45, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n", "\n", @@ -3445,153 +3756,13 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " female_BMI male_BMI gdp population \\\n", - "Country \n", - "Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n", - "Albania 25.65726 26.44657 8644.0 2968026.0 \n", - "Algeria 26.36841 24.59620 12314.0 34811059.0 \n", - "Angola 23.48431 22.25083 7103.0 19842251.0 \n", - "Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n", - "\n", - " under5mortality life_expectancy fertility continent \\\n", - "Country \n", - "Afghanistan 110.4 52.8 6.20 Asia \n", - "Albania 17.9 76.8 1.76 Europe \n", - "Algeria 29.5 75.5 2.73 Africa \n", - "Angola 192.0 56.7 6.43 Africa \n", - "Antigua and Barbuda 10.9 75.5 2.16 Americas \n", - "\n", - " tmp \n", - "Country \n", - "Afghanistan 1 \n", - "Albania 1 \n", - "Algeria 1 \n", - "Angola 1 \n", - "Antigua and Barbuda 1 " - ] - }, - "execution_count": 46, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "conts = pd.Series({\n", " 'Afghanistan': 'Asia', 'Albania': 'Europe', 'Algeria':' Africa', 'Angola': 'Africa', 'Antigua and Barbuda': 'Americas'})\n", @@ -3605,168 +3776,13 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " female_BMI male_BMI gdp population \\\n", - "Country \n", - "Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n", - "Albania 25.65726 26.44657 8644.0 2968026.0 \n", - "Algeria 26.36841 24.59620 12314.0 34811059.0 \n", - "Angola 23.48431 22.25083 7103.0 19842251.0 \n", - "Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n", - "Argentina 27.46523 27.50170 14646.0 40381860.0 \n", - "\n", - " under5mortality life_expectancy fertility continent \\\n", - "Country \n", - "Afghanistan 110.4 52.8 6.20 Asia \n", - "Albania 17.9 76.8 1.76 Europe \n", - "Algeria 29.5 75.5 2.73 Africa \n", - "Angola 192.0 56.7 6.43 Africa \n", - "Antigua and Barbuda 10.9 75.5 2.16 Americas \n", - "Argentina 15.4 75.4 2.24 NaN \n", - "\n", - " tmp \n", - "Country \n", - "Afghanistan 1.0 \n", - "Albania 1.0 \n", - "Algeria 1.0 \n", - "Angola 1.0 \n", - "Antigua and Barbuda 1.0 \n", - "Argentina NaN " - ] - }, - "execution_count": 47, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.loc['Argentina'] = {\n", " 'female_BMI': 27.46523,\n", @@ -3782,151 +3798,13 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - " female_BMI male_BMI population under5mortality \\\n", - "Country \n", - "Afghanistan 21.07402 20.62058 26528741.0 110.4 \n", - "Albania 25.65726 26.44657 2968026.0 17.9 \n", - "Algeria 26.36841 24.59620 34811059.0 29.5 \n", - "Angola 23.48431 22.25083 19842251.0 192.0 \n", - "Antigua and Barbuda 27.50545 25.76602 85350.0 10.9 \n", - "Argentina 27.46523 27.50170 40381860.0 15.4 \n", - "\n", - " life_expectancy fertility continent tmp \n", - "Country \n", - "Afghanistan 52.8 6.20 Asia 1.0 \n", - "Albania 76.8 1.76 Europe 1.0 \n", - "Algeria 75.5 2.73 Africa 1.0 \n", - "Angola 56.7 6.43 Africa 1.0 \n", - "Antigua and Barbuda 75.5 2.16 Americas 1.0 \n", - "Argentina 75.4 2.24 NaN NaN " - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.drop('gdp', axis='columns')\n" ] @@ -3959,8 +3837,9 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 94, "metadata": { + "scrolled": true, "slideshow": { "slide_type": "slide" } @@ -4115,7 +3994,7 @@ "5 0 0 373450 8.0500 NaN S " ] }, - "execution_count": 49, + "execution_count": 94, "metadata": {}, "output_type": "execute_result" } @@ -4128,12 +4007,8 @@ }, { "cell_type": "code", - "execution_count": 50, - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, + "execution_count": 95, + "metadata": {}, "outputs": [ { "data": { @@ -4153,7 +4028,7 @@ "Name: Survived, Length: 891, dtype: int64" ] }, - "execution_count": 50, + "execution_count": 95, "metadata": {}, "output_type": "execute_result" } @@ -4164,7 +4039,20 @@ }, { "cell_type": "code", - "execution_count": 51, + "execution_count": null, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "df['Survived']" + ] + }, + { + "cell_type": "code", + "execution_count": 97, "metadata": { "slideshow": { "slide_type": "slide" @@ -4189,7 +4077,7 @@ "Name: Survived, Length: 891, dtype: bool" ] }, - 
"execution_count": 51, + "execution_count": 97, "metadata": {}, "output_type": "execute_result" } @@ -4200,12 +4088,65 @@ }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 100, "metadata": { "slideshow": { "slide_type": "slide" } }, + "outputs": [], + "source": [ + "df_survived = df[df['Pclass'] == 1]" + ] + }, + { + "cell_type": "code", + "execution_count": 101, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "891" + ] + }, + "execution_count": 101, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(df)" + ] + }, + { + "cell_type": "code", + "execution_count": 102, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "216" + ] + }, + "execution_count": 102, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(df_survived)" + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "metadata": {}, "outputs": [ { "data": { @@ -4461,13 +4402,13 @@ "[216 rows x 11 columns]" ] }, - "execution_count": 52, + "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "df[df['Pclass'] == 1]" + "df_survived" ] }, { @@ -4488,7 +4429,7 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 104, "metadata": { "slideshow": { "slide_type": "slide" @@ -4749,7 +4690,7 @@ "[94 rows x 11 columns]" ] }, - "execution_count": 53, + "execution_count": 104, "metadata": {}, "output_type": "execute_result" } @@ -4763,7 +4704,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 105, "metadata": { "slideshow": { "slide_type": "slide" @@ -5024,13 +4965,12 @@ "[192 rows x 11 columns]" ] }, - "execution_count": 54, + "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "\n", "df[df['SibSp'] > df['Parch']]" ] }, @@ -5049,7 +4989,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": 106, 
"metadata": { "slideshow": { "slide_type": "slide" @@ -5205,7 +5145,7 @@ "24 0 0 113788 35.5000 A6 S " ] }, - "execution_count": 55, + "execution_count": 106, "metadata": {}, "output_type": "execute_result" } @@ -5216,7 +5156,7 @@ }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 107, "metadata": { "slideshow": { "slide_type": "slide" @@ -5372,7 +5312,7 @@ "53 1 0 PC 17572 76.7292 D33 C " ] }, - "execution_count": 56, + "execution_count": 107, "metadata": {}, "output_type": "execute_result" } @@ -5383,7 +5323,7 @@ }, { "cell_type": "code", - "execution_count": 57, + "execution_count": 108, "metadata": { "slideshow": { "slide_type": "slide" @@ -5644,7 +5584,7 @@ "[192 rows x 11 columns]" ] }, - "execution_count": 57, + "execution_count": 108, "metadata": {}, "output_type": "execute_result" } @@ -5655,7 +5595,7 @@ }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 109, "metadata": { "slideshow": { "slide_type": "slide" @@ -5668,7 +5608,7 @@ "(113, 11)" ] }, - "execution_count": 58, + "execution_count": 109, "metadata": {}, "output_type": "execute_result" } @@ -5702,131 +5642,13 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - " female_BMI male_BMI gdp population \\\n", - "Country \n", - "Afghanistan 21.07402 20.62058 1311.0 26528741.0 \n", - "Albania 25.65726 26.44657 8644.0 2968026.0 \n", - "Algeria 26.36841 24.59620 12314.0 34811059.0 \n", - "Angola 23.48431 22.25083 7103.0 19842251.0 \n", - "Antigua and Barbuda 27.50545 25.76602 25736.0 85350.0 \n", - "\n", - " under5mortality life_expectancy fertility \n", - "Country \n", - "Afghanistan 110.4 52.8 6.20 \n", - "Albania 17.9 76.8 1.76 \n", - "Algeria 29.5 75.5 2.73 \n", - "Angola 192.0 56.7 6.43 \n", - "Antigua and Barbuda 10.9 75.5 2.16 " - ] - }, - "execution_count": 59, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df = pd.read_csv('./gapminder.csv', index_col='Country', nrows=5)\n", "\n", @@ -5846,27 +5668,13 @@ }, { "cell_type": "code", - "execution_count": 60, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "female_BMI\n", - "male_BMI\n", - "gdp\n", - "population\n", - "under5mortality\n", - "life_expectancy\n", - "fertility\n" - ] - } - ], + "outputs": [], "source": [ "for column_name in df:\n", " print(column_name)" @@ -5874,23 +5682,9 @@ }, { "cell_type": "code", - "execution_count": 61, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "female_BMI Country\n", - "Afghanistan 21.07402\n", - "Albania 25.65726\n", - "Algeria 26.36841\n", - "Angola 23.48431\n", - "Antigua and Barbuda 27.50545\n", - "Name: female_BMI, dtype: float64\n" - ] - } - ], + "outputs": [], "source": [ "for col_name, series in df.items():\n", " print(col_name, series)\n", @@ -5899,29 +5693,13 @@ }, { "cell_type": "code", - "execution_count": 62, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": 
[ - "Afghanistan \n", - " female_BMI 2.107402e+01\n", - "male_BMI 2.062058e+01\n", - "gdp 1.311000e+03\n", - "population 2.652874e+07\n", - "under5mortality 1.104000e+02\n", - "life_expectancy 5.280000e+01\n", - "fertility 6.200000e+00\n", - "Name: Afghanistan, dtype: float64\n" - ] - } - ], + "outputs": [], "source": [ "for idx, row in df.iterrows():\n", " print(idx, '\\n', row)\n", @@ -5930,30 +5708,13 @@ }, { "cell_type": "code", - "execution_count": 63, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/plain": [ - "Country\n", - "Afghanistan normal\n", - "Albania overweight\n", - "Algeria normal\n", - "Angola normal\n", - "Antigua and Barbuda overweight\n", - "Name: male_BMI, dtype: object" - ] - }, - "execution_count": 63, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "def bmi_level(bmi):\n", " if bmi <= 18.5:\n", @@ -5973,30 +5734,13 @@ }, { "cell_type": "code", - "execution_count": 64, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/plain": [ - "Country\n", - "Afghanistan normal\n", - "Albania overweight\n", - "Algeria normal\n", - "Angola normal\n", - "Antigua and Barbuda overweight\n", - "dtype: object" - ] - }, - "execution_count": 64, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "def bmi_level(row_data):\n", " bmi = row_data['male_BMI']\n", @@ -6013,127 +5757,13 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - "Country Afghanistan Albania Algeria Angola \\\n", - "female_BMI 2.107402e+01 2.565726e+01 2.636841e+01 2.348431e+01 \n", - "male_BMI 2.062058e+01 2.644657e+01 2.459620e+01 2.225083e+01 \n", - "gdp 1.311000e+03 8.644000e+03 1.231400e+04 7.103000e+03 \n", - "population 2.652874e+07 2.968026e+06 3.481106e+07 1.984225e+07 \n", - "under5mortality 1.104000e+02 1.790000e+01 2.950000e+01 1.920000e+02 \n", - "life_expectancy 5.280000e+01 7.680000e+01 7.550000e+01 5.670000e+01 \n", - "fertility 6.200000e+00 1.760000e+00 2.730000e+00 6.430000e+00 \n", - "\n", - "Country Antigua and Barbuda \n", - "female_BMI 27.50545 \n", - "male_BMI 25.76602 \n", - "gdp 25736.00000 \n", - "population 85350.00000 \n", - "under5mortality 10.90000 \n", - "life_expectancy 75.50000 \n", - "fertility 2.16000 " - ] - }, - "execution_count": 65, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.transpose()" ] @@ -6160,12 +5790,25 @@ }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 117, "metadata": { "slideshow": { "slide_type": "slide" } }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "df = pd.read_csv('./nba.csv')\n", + "\n", + "#df.sample(5)" + ] + }, + { + "cell_type": "code", + "execution_count": 118, + "metadata": {}, "outputs": [ { "data": { @@ -6201,96 +5844,179 @@ " \n", " \n", " \n", - " 202\n", - " Solomon Hill\n", - " Indiana Pacers\n", - " 44.0\n", - " SF\n", - " 25.0\n", - " 6-7\n", - " 225.0\n", - " Arizona\n", - " 1358880.0\n", - " \n", - " \n", - " 286\n", - " Tim Frazier\n", - " New Orleans Pelicans\n", - " 2.0\n", + " 0\n", + " Avery Bradley\n", + " Boston Celtics\n", + " 0.0\n", " PG\n", " 25.0\n", - " 6-1\n", - " 170.0\n", - " Penn State\n", - " 845059.0\n", - " \n", - " \n", - " 210\n", - " Joe Young\n", - " Indiana Pacers\n", - " 1.0\n", - " PG\n", - " 23.0\n", " 6-2\n", " 180.0\n", - " Oregon\n", - " 1007026.0\n", + " Texas\n", + " 7730337.0\n", " \n", " \n", - 
" 420\n", - " Nazr Mohammed\n", - " Oklahoma City Thunder\n", - " 13.0\n", - " C\n", - " 38.0\n", - " 6-10\n", - " 250.0\n", - " Kentucky\n", - " 222888.0\n", + " 1\n", + " Jae Crowder\n", + " Boston Celtics\n", + " 99.0\n", + " SF\n", + " 25.0\n", + " 6-6\n", + " 235.0\n", + " Marquette\n", + " 6796117.0\n", " \n", " \n", - " 258\n", - " Tony Allen\n", - " Memphis Grizzlies\n", - " 9.0\n", + " 2\n", + " John Holland\n", + " Boston Celtics\n", + " 30.0\n", " SG\n", - " 34.0\n", - " 6-4\n", - " 213.0\n", - " Oklahoma State\n", - " 5158539.0\n", + " 27.0\n", + " 6-5\n", + " 205.0\n", + " Boston University\n", + " NaN\n", + " \n", + " \n", + " 3\n", + " R.J. Hunter\n", + " Boston Celtics\n", + " 28.0\n", + " SG\n", + " 22.0\n", + " 6-5\n", + " 185.0\n", + " Georgia State\n", + " 1148640.0\n", + " \n", + " \n", + " 4\n", + " Jonas Jerebko\n", + " Boston Celtics\n", + " 8.0\n", + " PF\n", + " 29.0\n", + " 6-10\n", + " 231.0\n", + " NaN\n", + " 5000000.0\n", + " \n", + " \n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " ...\n", + " \n", + " \n", + " 453\n", + " Shelvin Mack\n", + " Utah Jazz\n", + " 8.0\n", + " PG\n", + " 26.0\n", + " 6-3\n", + " 203.0\n", + " Butler\n", + " 2433333.0\n", + " \n", + " \n", + " 454\n", + " Raul Neto\n", + " Utah Jazz\n", + " 25.0\n", + " PG\n", + " 24.0\n", + " 6-1\n", + " 179.0\n", + " NaN\n", + " 900000.0\n", + " \n", + " \n", + " 455\n", + " Tibor Pleiss\n", + " Utah Jazz\n", + " 21.0\n", + " C\n", + " 26.0\n", + " 7-3\n", + " 256.0\n", + " NaN\n", + " 2900000.0\n", + " \n", + " \n", + " 456\n", + " Jeff Withey\n", + " Utah Jazz\n", + " 24.0\n", + " C\n", + " 26.0\n", + " 7-0\n", + " 231.0\n", + " Kansas\n", + " 947276.0\n", + " \n", + " \n", + " 457\n", + " NaN\n", + " NaN\n", + " NaN\n", + " NaN\n", + " NaN\n", + " NaN\n", + " NaN\n", + " NaN\n", + " NaN\n", " \n", " \n", "\n", + "

458 rows × 9 columns

\n", "" ], "text/plain": [ - " Name Team Number Position Age Height \\\n", - "202 Solomon Hill Indiana Pacers 44.0 SF 25.0 6-7 \n", - "286 Tim Frazier New Orleans Pelicans 2.0 PG 25.0 6-1 \n", - "210 Joe Young Indiana Pacers 1.0 PG 23.0 6-2 \n", - "420 Nazr Mohammed Oklahoma City Thunder 13.0 C 38.0 6-10 \n", - "258 Tony Allen Memphis Grizzlies 9.0 SG 34.0 6-4 \n", + " Name Team Number Position Age Height Weight \\\n", + "0 Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0 \n", + "1 Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0 \n", + "2 John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0 \n", + "3 R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0 \n", + "4 Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0 \n", + ".. ... ... ... ... ... ... ... \n", + "453 Shelvin Mack Utah Jazz 8.0 PG 26.0 6-3 203.0 \n", + "454 Raul Neto Utah Jazz 25.0 PG 24.0 6-1 179.0 \n", + "455 Tibor Pleiss Utah Jazz 21.0 C 26.0 7-3 256.0 \n", + "456 Jeff Withey Utah Jazz 24.0 C 26.0 7-0 231.0 \n", + "457 NaN NaN NaN NaN NaN NaN NaN \n", "\n", - " Weight College Salary \n", - "202 225.0 Arizona 1358880.0 \n", - "286 170.0 Penn State 845059.0 \n", - "210 180.0 Oregon 1007026.0 \n", - "420 250.0 Kentucky 222888.0 \n", - "258 213.0 Oklahoma State 5158539.0 " + " College Salary \n", + "0 Texas 7730337.0 \n", + "1 Marquette 6796117.0 \n", + "2 Boston University NaN \n", + "3 Georgia State 1148640.0 \n", + "4 NaN 5000000.0 \n", + ".. ... ... 
\n", + "453 Butler 2433333.0 \n", + "454 NaN 900000.0 \n", + "455 NaN 2900000.0 \n", + "456 Kansas 947276.0 \n", + "457 NaN NaN \n", + "\n", + "[458 rows x 9 columns]" ] }, - "execution_count": 66, + "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "import pandas as pd\n", - "\n", - "df = pd.read_csv('./nba.csv')\n", - "\n", - "df.sample(5)" + "df" ] }, { @@ -6306,7 +6032,124 @@ }, { "cell_type": "code", - "execution_count": 67, + "execution_count": 119, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "

458 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " Team Salary\n", + "0 Boston Celtics 7730337.0\n", + "1 Boston Celtics 6796117.0\n", + "2 Boston Celtics NaN\n", + "3 Boston Celtics 1148640.0\n", + "4 Boston Celtics 5000000.0\n", + ".. ... ...\n", + "453 Utah Jazz 2433333.0\n", + "454 Utah Jazz 900000.0\n", + "455 Utah Jazz 2900000.0\n", + "456 Utah Jazz 947276.0\n", + "457 NaN NaN\n", + "\n", + "[458 rows x 2 columns]" + ] + }, + "execution_count": 119, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[['Team', 'Salary']]" + ] + }, + { + "cell_type": "code", + "execution_count": 120, "metadata": { "slideshow": { "slide_type": "slide" @@ -6344,45 +6187,45 @@ " \n", " \n", " Atlanta Hawks\n", - " 4.860197e+06\n", + " 2854940.0\n", " \n", " \n", " Boston Celtics\n", - " 4.181505e+06\n", + " 3021242.5\n", " \n", " \n", " Brooklyn Nets\n", - " 3.501898e+06\n", + " 1335480.0\n", " \n", " \n", " Charlotte Hornets\n", - " 5.222728e+06\n", + " 4204200.0\n", " \n", " \n", " Chicago Bulls\n", - " 5.785559e+06\n", + " 2380440.0\n", " \n", " \n", "\n", "" ], "text/plain": [ - " Salary\n", - "Team \n", - "Atlanta Hawks 4.860197e+06\n", - "Boston Celtics 4.181505e+06\n", - "Brooklyn Nets 3.501898e+06\n", - "Charlotte Hornets 5.222728e+06\n", - "Chicago Bulls 5.785559e+06" + " Salary\n", + "Team \n", + "Atlanta Hawks 2854940.0\n", + "Boston Celtics 3021242.5\n", + "Brooklyn Nets 1335480.0\n", + "Charlotte Hornets 4204200.0\n", + "Chicago Bulls 2380440.0" ] }, - "execution_count": 67, + "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "df[['Team', 'Salary']].groupby('Team').mean().head()" + "df[['Team', 'Salary']].groupby('Team').median().h" ] }, { @@ -6394,36 +6237,13 @@ }, { "cell_type": "code", - "execution_count": 68, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/plain": [ - "Team Position\n", - "Atlanta Hawks C 7.585417e+06\n", - " PF 
5.988067e+06\n", - " PG 4.881700e+06\n", - " SF 3.000000e+06\n", - " SG 2.607758e+06\n", - " ... \n", - "Washington Wizards C 8.163476e+06\n", - " PF 5.650000e+06\n", - " PG 9.011208e+06\n", - " SF 2.789700e+06\n", - " SG 2.839248e+06\n", - "Name: Salary, Length: 149, dtype: float64" - ] - }, - "execution_count": 68, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.groupby(['Team', 'Position'])['Salary'].mean()" ] @@ -6452,185 +6272,26 @@ }, { "cell_type": "code", - "execution_count": 69, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - " Salary \n", - " mean std count\n", - "Position \n", - "C 5.967052e+06 5.787989e+06 78\n", - "PF 4.562483e+06 4.800054e+06 97\n", - "PG 5.077829e+06 5.051809e+06 88\n", - "SF 4.857393e+06 6.011889e+06 84\n", - "SG 4.009861e+06 4.491609e+06 99" - ] - }, - "execution_count": 69, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df[['Position', 'Salary']].groupby('Position').agg(['mean', 'std', 'count'])" ] }, { "cell_type": "code", - "execution_count": 70, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - " Salary\n", - "Position \n", - "C 22275967.0\n", - "PF 22081286.0\n", - "PG 21412973.0\n", - "SF 24969112.0\n", - "SG 19944278.0" - ] - }, - "execution_count": 70, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "def group_range(x):\n", " return x.max() - x.min()\n", @@ -6640,35 +6301,13 @@ }, { "cell_type": "code", - "execution_count": 71, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Liczba grup: 5\n", - "dict_keys(['C', 'PF', 'PG', 'SF', 'SG'])\n", - " Name Team Number Position Age Height Weight \\\n", - "7 Kelly Olynyk Boston Celtics 41.0 C 25.0 7-0 238.0 \n", - "10 Jared Sullinger Boston Celtics 7.0 C 24.0 6-9 260.0 \n", - "14 Tyler Zeller Boston Celtics 44.0 C 26.0 7-0 253.0 \n", - "23 Brook Lopez Brooklyn Nets 11.0 C 28.0 7-0 275.0 \n", - "27 Henry Sims Brooklyn Nets 14.0 C 26.0 6-10 248.0 \n", - "\n", - " College Salary \n", - "7 Gonzaga 2165160.0 \n", - "10 Ohio State 2569260.0 \n", - "14 North Carolina 2616975.0 \n", - "23 Stanford 19689000.0 \n", - "27 Georgetown 947276.0 \n" - ] - } - ], + "outputs": [], "source": [ "gb = df.groupby(['Position'])\n", "\n", @@ -6680,31 +6319,9 @@ }, { "cell_type": "code", - "execution_count": 72, + "execution_count": null, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0 15.36\n", - "1 15.36\n", - "2 15.36\n", - "3 15.36\n", - "4 15.36\n", - " ... 
\n", - "453 15.36\n", - "454 15.36\n", - "455 17.92\n", - "456 17.92\n", - "457 \n", - "Name: Height, Length: 458, dtype: Float64" - ] - }, - "execution_count": 72, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "\n", "df.Height.str.split('-').str[0].astype('Int64') * 2.56" @@ -6729,616 +6346,13 @@ }, { "cell_type": "code", - "execution_count": 73, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " 
\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - " pair system id is_constrained metric score\n", - "1214 ha-en NiuTrans 382 True bleu-all 16.512243\n", - "1215 ha-en NiuTrans 382 True chrf-all 44.724766\n", - "1216 ha-en NiuTrans 382 True bleu-A 16.512243\n", - "1217 ha-en NiuTrans 382 True chrf-A 44.724766\n", - "1218 ha-en Facebook-AI 181 False bleu-all 20.982704\n", - "1219 ha-en Facebook-AI 181 False chrf-all 48.653770\n", - "1220 ha-en Facebook-AI 181 False bleu-A 20.982704\n", - "1221 ha-en Facebook-AI 181 False chrf-A 48.653770\n", - "1222 ha-en TRANSSION 336 False bleu-all 18.834851\n", - "1223 ha-en TRANSSION 336 False chrf-all 47.238279\n", - "1224 ha-en TRANSSION 336 False bleu-A 18.834851\n", - "1225 ha-en TRANSSION 336 False chrf-A 47.238279\n", - "1226 ha-en AMU 628 True bleu-all 14.132845\n", - "1227 ha-en AMU 628 True chrf-all 41.256570\n", - "1228 ha-en AMU 628 True bleu-A 14.132845\n", - "1229 ha-en AMU 628 True chrf-A 41.256570\n", - "1230 ha-en P3AI 715 True bleu-all 17.793617\n", - "1231 ha-en P3AI 715 True chrf-all 46.307402\n", - "1232 ha-en P3AI 715 True bleu-A 17.793617\n", - "1233 ha-en P3AI 715 True chrf-A 46.307402\n", - "1234 ha-en Online-B 1356 False bleu-all 18.655658\n", - "1235 ha-en Online-B 1356 False chrf-all 46.658216\n", - "1236 ha-en Online-B 1356 False bleu-A 18.655658\n", - "1237 ha-en Online-B 1356 False chrf-A 46.658216\n", - "1238 ha-en TWB 1335 False bleu-all 12.326443\n", - "1239 ha-en TWB 1335 False chrf-all 40.282629\n", - "1240 ha-en TWB 1335 False bleu-A 12.326443\n", - "1241 ha-en TWB 1335 False chrf-A 40.282629\n", - "1242 ha-en ZMT 553 False bleu-all 18.837023\n", - "1243 ha-en ZMT 553 False chrf-all 47.231474\n", - "1244 ha-en ZMT 553 False bleu-A 18.837023\n", - "1245 ha-en ZMT 553 False chrf-A 47.231474\n", - "1246 ha-en Manifold 437 True bleu-all 16.943915\n", - "1247 ha-en Manifold 437 True chrf-all 45.638356\n", - "1248 ha-en Manifold 437 True bleu-A 16.943915\n", - "1249 ha-en Manifold 437 True chrf-A 45.638356\n", - "1250 
ha-en Online-Y 1374 False bleu-all 13.898531\n", - "1251 ha-en Online-Y 1374 False chrf-all 44.842874\n", - "1252 ha-en Online-Y 1374 False bleu-A 13.898531\n", - "1253 ha-en Online-Y 1374 False chrf-A 44.842874\n", - "1254 ha-en HuaweiTSC 758 True bleu-all 17.492440\n", - "1255 ha-en HuaweiTSC 758 True chrf-all 46.795737\n", - "1256 ha-en HuaweiTSC 758 True bleu-A 17.492440\n", - "1257 ha-en HuaweiTSC 758 True chrf-A 46.795737\n", - "1258 ha-en MS-EgDC 896 True bleu-all 17.133350\n", - "1259 ha-en MS-EgDC 896 True chrf-all 45.266274\n", - "1260 ha-en MS-EgDC 896 True bleu-A 17.133350\n", - "1261 ha-en MS-EgDC 896 True chrf-A 45.266274\n", - "1262 ha-en GTCOM 1298 False bleu-all 17.794272\n", - "1263 ha-en GTCOM 1298 False chrf-all 46.714831\n", - "1264 ha-en GTCOM 1298 False bleu-A 17.794272\n", - "1265 ha-en GTCOM 1298 False chrf-A 46.714831\n", - "1266 ha-en UEdin 1149 True bleu-all 14.887836\n", - "1267 ha-en UEdin 1149 True chrf-all 42.247415\n", - "1268 ha-en UEdin 1149 True bleu-A 14.887836\n", - "1269 ha-en UEdin 1149 True chrf-A 42.247415" - ] - }, - "execution_count": 73, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df = pd.read_csv('https://raw.githubusercontent.com/wmt-conference/wmt21-news-systems/main/scores/automatic-scores.tsv', sep='\\t')\n", "df = df[df.pair == 'ha-en']\n", @@ -7347,174 +6361,13 @@ }, { "cell_type": "code", - "execution_count": 74, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
metricbleu-Ableu-allchrf-Achrf-all
system
AMU14.13284514.13284541.25657041.256570
Facebook-AI20.98270420.98270448.65377048.653770
GTCOM17.79427217.79427246.71483146.714831
HuaweiTSC17.49244017.49244046.79573746.795737
MS-EgDC17.13335017.13335045.26627445.266274
Manifold16.94391516.94391545.63835645.638356
NiuTrans16.51224316.51224344.72476644.724766
Online-B18.65565818.65565846.65821646.658216
Online-Y13.89853113.89853144.84287444.842874
P3AI17.79361717.79361746.30740246.307402
TRANSSION18.83485118.83485147.23827947.238279
TWB12.32644312.32644340.28262940.282629
UEdin14.88783614.88783642.24741542.247415
ZMT18.83702318.83702347.23147447.231474
\n", - "
" - ], - "text/plain": [ - "metric bleu-A bleu-all chrf-A chrf-all\n", - "system \n", - "AMU 14.132845 14.132845 41.256570 41.256570\n", - "Facebook-AI 20.982704 20.982704 48.653770 48.653770\n", - "GTCOM 17.794272 17.794272 46.714831 46.714831\n", - "HuaweiTSC 17.492440 17.492440 46.795737 46.795737\n", - "MS-EgDC 17.133350 17.133350 45.266274 45.266274\n", - "Manifold 16.943915 16.943915 45.638356 45.638356\n", - "NiuTrans 16.512243 16.512243 44.724766 44.724766\n", - "Online-B 18.655658 18.655658 46.658216 46.658216\n", - "Online-Y 13.898531 13.898531 44.842874 44.842874\n", - "P3AI 17.793617 17.793617 46.307402 46.307402\n", - "TRANSSION 18.834851 18.834851 47.238279 47.238279\n", - "TWB 12.326443 12.326443 40.282629 40.282629\n", - "UEdin 14.887836 14.887836 42.247415 42.247415\n", - "ZMT 18.837023 18.837023 47.231474 47.231474" - ] - }, - "execution_count": 74, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.pivot(index='system', columns='metric', values='score')" ] @@ -7559,167 +6412,13 @@ }, { "cell_type": "code", - "execution_count": 75, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
103Braund\\t Mr. Owen Harrismale22.010A/5 211717.2500NaNS
211Cumings\\t Mrs. John Bradley (Florence Briggs T...female38.010PC 1759971.2833C85C
313Heikkinen\\t Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
411Futrelle\\t Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
503Allen\\t Mr. William Henrymale35.0003734508.0500NaNS
\n", - "
" - ], - "text/plain": [ - " Survived Pclass \\\n", - "PassengerId \n", - "1 0 3 \n", - "2 1 1 \n", - "3 1 3 \n", - "4 1 1 \n", - "5 0 3 \n", - "\n", - " Name Sex Age \\\n", - "PassengerId \n", - "1 Braund\\t Mr. Owen Harris male 22.0 \n", - "2 Cumings\\t Mrs. John Bradley (Florence Briggs T... female 38.0 \n", - "3 Heikkinen\\t Miss. Laina female 26.0 \n", - "4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel) female 35.0 \n", - "5 Allen\\t Mr. William Henry male 35.0 \n", - "\n", - " SibSp Parch Ticket Fare Cabin Embarked \n", - "PassengerId \n", - "1 1 0 A/5 21171 7.2500 NaN S \n", - "2 1 0 PC 17599 71.2833 C85 C \n", - "3 0 0 STON/O2. 3101282 7.9250 NaN S \n", - "4 1 0 113803 53.1000 C123 S \n", - "5 0 0 373450 8.0500 NaN S " - ] - }, - "execution_count": 75, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df = pd.read_csv('./titanic_train.tsv', sep='\\t', index_col='PassengerId')\n", "\n", @@ -7728,79 +6427,26 @@ }, { "cell_type": "code", - "execution_count": 76, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/plain": [ - "PassengerId\n", - "1 BRAUND\\t MR. OWEN HARRIS\n", - "2 CUMINGS\\t MRS. JOHN BRADLEY (FLORENCE BRIGGS T...\n", - "3 HEIKKINEN\\t MISS. LAINA\n", - "4 FUTRELLE\\t MRS. JACQUES HEATH (LILY MAY PEEL)\n", - "5 ALLEN\\t MR. WILLIAM HENRY\n", - " ... \n", - "887 MONTVILA\\t REV. JUOZAS\n", - "888 GRAHAM\\t MISS. MARGARET EDITH\n", - "889 JOHNSTON\\t MISS. CATHERINE HELEN \"CARRIE\"\n", - "890 BEHR\\t MR. KARL HOWELL\n", - "891 DOOLEY\\t MR. 
PATRICK\n", - "Name: Name, Length: 891, dtype: object" - ] - }, - "execution_count": 76, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.Name.str.upper()" ] }, { "cell_type": "code", - "execution_count": 77, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "PassengerId\n", - "1 Braund\\t Mr. Owen Harris\n", - "2 Cumings\\t Mrs. John Bradley (Florence Briggs T...\n", - "3 Heikkinen\\t Miss. Laina\n", - "4 Futrelle\\t Mrs. Jacques Heath (Lily May Peel)\n", - "5 Allen\\t Mr. William Henry\n", - "Name: Name, dtype: object\n" - ] - }, - { - "data": { - "text/plain": [ - "PassengerId\n", - "1 False\n", - "2 True\n", - "3 True\n", - "4 True\n", - "5 False\n", - "Name: Name, dtype: bool" - ] - }, - "execution_count": 77, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "print(df.Name.head())\n", "df.Name.str.contains('Miss|Mrs').head()" @@ -7808,235 +6454,52 @@ }, { "cell_type": "code", - "execution_count": 78, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
01
PassengerId
1BraundMr. Owen Harris
2CumingsMrs. John Bradley (Florence Briggs Thayer)
3HeikkinenMiss. Laina
4FutrelleMrs. Jacques Heath (Lily May Peel)
5AllenMr. William Henry
.........
887MontvilaRev. Juozas
888GrahamMiss. Margaret Edith
889JohnstonMiss. Catherine Helen \"Carrie\"
890BehrMr. Karl Howell
891DooleyMr. Patrick
\n", - "

891 rows × 2 columns

\n", - "
" - ], - "text/plain": [ - " 0 1\n", - "PassengerId \n", - "1 Braund Mr. Owen Harris\n", - "2 Cumings Mrs. John Bradley (Florence Briggs Thayer)\n", - "3 Heikkinen Miss. Laina\n", - "4 Futrelle Mrs. Jacques Heath (Lily May Peel)\n", - "5 Allen Mr. William Henry\n", - "... ... ...\n", - "887 Montvila Rev. Juozas\n", - "888 Graham Miss. Margaret Edith\n", - "889 Johnston Miss. Catherine Helen \"Carrie\"\n", - "890 Behr Mr. Karl Howell\n", - "891 Dooley Mr. Patrick\n", - "\n", - "[891 rows x 2 columns]" - ] - }, - "execution_count": 78, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.Name.str.split('\\t', expand=True)" ] }, { "cell_type": "code", - "execution_count": 79, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/plain": [ - "PassengerId\n", - "1 [Braund, Mr. Owen Harris]\n", - "2 [Cumings, Mrs. John Bradley (Florence Briggs ...\n", - "3 [Heikkinen, Miss. Laina]\n", - "4 [Futrelle, Mrs. Jacques Heath (Lily May Peel)]\n", - "5 [Allen, Mr. William Henry]\n", - " ... \n", - "887 [Montvila, Rev. Juozas]\n", - "888 [Graham, Miss. Margaret Edith]\n", - "889 [Johnston, Miss. Catherine Helen \"Carrie\"]\n", - "890 [Behr, Mr. Karl Howell]\n", - "891 [Dooley, Mr. Patrick]\n", - "Name: Name, Length: 891, dtype: object" - ] - }, - "execution_count": 79, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.Name.str.split('\\t')" ] }, { "cell_type": "code", - "execution_count": 80, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/plain": [ - "PassengerId\n", - "1 Mr. Owen Harris\n", - "2 Mrs. John Bradley (Florence Briggs Thayer)\n", - "3 Miss. Laina\n", - "4 Mrs. Jacques Heath (Lily May Peel)\n", - "5 Mr. William Henry\n", - " ... \n", - "887 Rev. Juozas\n", - "888 Miss. Margaret Edith\n", - "889 Miss. Catherine Helen \"Carrie\"\n", - "890 Mr. 
Karl Howell\n", - "891 Mr. Patrick\n", - "Name: Name, Length: 891, dtype: object" - ] - }, - "execution_count": 80, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.Name.str.split('\\t').str[1]" ] }, { "cell_type": "code", - "execution_count": 81, + "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, - "outputs": [ - { - "data": { - "text/plain": [ - "PassengerId\n", - "1 Mr.\n", - "2 Mrs.\n", - "3 Miss.\n", - "4 Mrs.\n", - "5 Mr.\n", - " ... \n", - "887 Rev.\n", - "888 Miss.\n", - "889 Miss.\n", - "890 Mr.\n", - "891 Mr.\n", - "Name: Name, Length: 891, dtype: object" - ] - }, - "execution_count": 81, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "df.Name.str.split('\\t').str[1].str.strip().str.split(' ').str[0]" ] diff --git a/zajecia2/tmp.xlsx b/zajecia2/tmp.xlsx index 8376404..59151ee 100644 Binary files a/zajecia2/tmp.xlsx and b/zajecia2/tmp.xlsx differ diff --git a/zajecia3/1.ipynb b/zajecia3/1.ipynb new file mode 100644 index 0000000..f2d77db --- /dev/null +++ b/zajecia3/1.ipynb @@ -0,0 +1,2280 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "56b06287-d1ba-409a-a207-2125edc31719", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "id": "fa9d3b45-d381-4cfa-91b3-701a7c2578f7", + "metadata": {}, + "source": [ + "# NumPy" + ] + }, + { + "cell_type": "markdown", + "id": "7d008b7b-b668-42bd-993e-64cc2de4ae17", + "metadata": {}, + "source": [ + "## wymiary" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "cb38fb13-3671-4a7b-89e7-247d8db91a13", + "metadata": {}, + "outputs": [], + "source": [ + "arr = np.array([1, 2, 3, 4, 5, 6, 7])" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "445ebe15-088f-4811-96c6-cc0043fb5ab4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1 
2 3 4 5 6 7]\n" + ] + } + ], + "source": [ + "print(arr)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9428e371-82e5-4f94-b6f6-e54808a7b72c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "numpy.ndarray" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(arr)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "bea5efad-20cc-4284-8575-bd32becaddd7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "numpy.ndarray" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(arr)" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "cb12cac9-cf3b-4924-90e3-a82f5ca7274c", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "()" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.array(123).shape" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "7e589615-ab19-4786-b2bc-49f46c706adc", + "metadata": {}, + "outputs": [], + "source": [ + "arr = np.array([[1, 2, 3], [4, 5, 6]])" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "a0925cb2-1586-48a7-9256-2d07228547e1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 3)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "10b857a9-0bf0-4ee4-9f86-dabef5452d59", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[1 2 3]\n", + " [4 5 6]]\n" + ] + } + ], + "source": [ + "print(arr)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "0029c4f5-135b-4a37-bf95-7f8e9afed454", + "metadata": {}, + "outputs": [], + "source": [ + "arr = 
np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "6fe7be82-a5d9-493f-8410-c1201d49606d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 2, 3)" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "61686743-b034-4fb7-a891-e4aed1d7bd66", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[[1 2 3]\n", + " [4 5 6]]\n", + "\n", + " [[1 2 3]\n", + " [4 5 6]]]\n" + ] + } + ], + "source": [ + "print(arr)" + ] + }, + { + "cell_type": "markdown", + "id": "ae9cd324-0297-42bc-8010-4ea0fa7074cc", + "metadata": {}, + "source": [ + "### zadanie 1\n", + "\n", + "1. Utwórz jednowymiarową tablicę zawierającą liczby od 10 do 20 włącznie. Wyświetl:\n", + " - Tablicę,\n", + " - Jej kształt (`shape`),\n", + " - Jej typ (`type`).\n", + "\n", + "2. 
Utwórz macierz 3x2 o elementach (10,20,30,40,50,60) i wyświetl:\n", + " - Tablicę,\n", + " - Jej kształt (`shape`),\n", + " - Jej typ (`type`).\n" + ] + }, + { + "cell_type": "markdown", + "id": "d38ccaec-ac11-4bef-8d48-7087931f4157", + "metadata": {}, + "source": [ + "## dostęp do elementów" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "a1cf151f-9128-414f-b065-2060b8a8a411", + "metadata": {}, + "outputs": [], + "source": [ + "arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "89975e71-f996-48af-a50d-3dd5f86c6775", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[ 1 2 3 4 5]\n", + " [ 6 7 8 9 10]]\n" + ] + } + ], + "source": [ + "print(arr)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3f3b6ab3-7c1f-491c-9b9d-648073e65378", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr[1,2]" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "66527b7d-b6a6-435a-b330-2890792ca56e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "9" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr[1,-2]" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "2bb01400-e860-4c75-9fd1-ce44268463b6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([8, 9])" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr[1,2:4]" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "ed9309c6-9666-41fe-9310-0597408740f7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 8, 9, 10])" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": 
"execute_result" + } + ], + "source": [ + "arr[1,2:]" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "8b1a9f33-5e91-4ffc-ac7a-d79286dcf1da", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 6, 7, 8, 9, 10])" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr[1,:]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "d9ec87f1-7c83-44f4-a71b-c9f291a51644", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 2, 3, 4, 5])" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr[0,:]" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "825d6cf5-4302-4b94-b00a-dd8b5c61f058", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([3, 4])" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr[0,2:4]" + ] + }, + { + "cell_type": "markdown", + "id": "0cb14fd3-88aa-43a8-a11d-a71fd9f8aeb6", + "metadata": {}, + "source": [ + "### zadanie 2\n", + "1. Utwórz dwuwymiarową tablicę NumPy o wymiarach (3,3) zawierającą liczby od 1 do 9.\n", + "2. 
Wykonaj następujące operacje na tablicy:\n", + " - Wyświetl element znajdujący się w drugim wierszu\n", + " - Wyświetl wszystkie elementy znajdujące się w drugim wierszu\n", + " - Wyświetl wszystkie elementy znajdujące się w drugiej kolumnie\n", + " - Wyświetl macierz, ale bez pierwszego wiersza i bez pierwszej kolumny\n" + ] + }, + { + "cell_type": "markdown", + "id": "4ad7bed0-d812-4683-a03b-033db0911197", + "metadata": {}, + "source": [ + "## Typy danych" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "c529565a-fc38-47ae-8649-352cd0525178", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "int64\n" + ] + } + ], + "source": [ + "arr = np.array([1, 2, 3, 4])\n", + "\n", + "print(arr.dtype)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "64996026-ef86-4a43-bdb0-1dcb64e3228b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 1\u001b[0m arr[\u001b[38;5;241m3\u001b[39m]\n", + "\u001b[0;31mIndexError\u001b[0m: index 3 is out of bounds for axis 0 with size 1" + ] + } + ], + "source": [ + "arr[3]" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "2563e811-a4e4-4f6d-bef7-379ff24b1624", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 2, 3, 4])" + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr.squeeze()" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "id": "9ca46070-ed05-4f09-8a51-bdd2ed121af9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr.squeeze()[3]" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "id": "ff357a34-afb3-4cb1-b898-33708b1ad9e0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 2, 3, 4]])" + ] + 
}, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "id": "63318611-23d1-413a-bda7-d09cdf240ae2", + "metadata": {}, + "outputs": [], + "source": [ + "arr = np.array([1, 2, 3, 4])" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "id": "245ae6b2-ee99-4537-bea6-3fbca70636ca", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([1, 2, 3, 4])" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "arr" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "id": "6a2c2e36-2b01-473e-b912-c1d042eaf4fe", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1, 2, 3, 4]])" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.expand_dims(arr, axis=0)" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "id": "dab2689b-fefb-4495-a958-61472d562ff1", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1],\n", + " [2],\n", + " [3],\n", + " [4]])" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.expand_dims(arr, axis=1)" + ] + }, + { + "cell_type": "markdown", + "id": "ed9bae4d-900f-4f09-b2eb-09155bc3ef9e", + "metadata": {}, + "source": [ + "### Zadanie 4" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "id": "0c15a6c3-55a0-482f-b102-a2c3ab20e8bd", + "metadata": {}, + "outputs": [ + { + "ename": "SyntaxError", + "evalue": "invalid syntax (1725909133.py, line 4)", + "output_type": "error", + "traceback": [ + "\u001b[0;36m Cell \u001b[0;32mIn[63], line 4\u001b[0;36m\u001b[0m\n\u001b[0;31m 1. Utwórz tablicę NumPy zawierającą liczby [1, 2, 3, 4, 5]. Przypisz ją do zmiennej `x` i zmień pierwszy element tablicy na 50. 
Wyświetl:\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" + ] + } + ], + "source": [ + "### Zadanie 4\n", + "\n", + "\n", + "1. Utwórz tablicę NumPy zawierającą liczby [1, 2, 3, 4, 5]. Przypisz ją do zmiennej `x` i zmień pierwszy element tablicy na 50. Wyświetl:\n", + " - Tablicę `arr`,\n", + " - Tablicę `x`.\n", + "\n", + "2. Zrób kopię tablicy `arr` za pomocą metody `.copy()` i zmień pierwszy element tablicy `arr` na 50. Wyświetl:\n", + " - Tablicę `arr`,\n", + " - Tablicę `x` (po kopii).\n", + "\n", + "3. Zrób kopię tablicy `arr` przy użyciu modułu `copy.deepcopy` i zmień pierwszy element tablicy `arr` na 50. Wyświetl:\n", + " - Tablicę `arr`,\n", + " - Tablicę `x` (po głębokiej kopii).\n", + "\n", + "4. Utwórz tablicę NumPy o wymiarach (1,4) zawierającą elementy \\( [1, 2, 3, 4, 5] \\) i wykonaj następujące operacje:\n", + " - Wyświetl tablicę.\n", + " - Zmień tablicę, aby była jednowymiarowa za pomocą metody `.squeeze()`\n", + "\n", + "\n", + "# Punkt 1 - Przypisanie do zmiennej\n", + "arr = np.array([1, 2, 3, 4, 5])\n", + "x = arr\n", + "arr[0] = 42\n", + "print(arr) # Tablica arr po zmianie\n", + "print(x) # Tablica x po zmianie\n", + "\n", + "# Punkt 2 - Kopia tablicy\n", + "arr = np.array([1, 2, 3, 4, 5])\n", + "x = arr.copy()\n", + "arr[0] = 42\n", + "print(arr) # Tablica arr po zmianie\n", + "print(x) # Tablica x po kopii\n", + "\n", + "# Punkt 3 - Głęboka kopia\n", + "arr = np.array([1, 2, 3, 4, 5])\n", + "x = copy.deepcopy(arr)\n", + "arr[0] = 42\n", + "print(arr) # Tablica arr po zmianie\n", + "\n", + "print(x) # Tablica x po głębokiej kopii\n" + ] + }, + { + "cell_type": "markdown", + "id": "4562728a-8fea-4643-ac20-4880ee264e5c", + "metadata": {}, + "source": [ + "### Reshape jeszcze raz" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "id": "6b10a6a1-2f81-41a4-8466-222990ac904c", + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [ + { + "name": 
"stdout", + "output_type": "stream", + "text": [ + "[ 1 2 3 4 5 6 7 8 9 10 11 12]\n", + "[[ 1 2 3]\n", + " [ 4 5 6]\n", + " [ 7 8 9]\n", + " [10 11 12]]\n" + ] + } + ], + "source": [ + "arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])\n", + "print(arr)\n", + "newarr = arr.reshape(4, 3)\n", + "print(newarr)" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "id": "8c982360-8838-4036-ac05-92069f789a30", + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "cannot reshape array of size 12 into shape (4,2)", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[65], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m newarr \u001b[38;5;241m=\u001b[39m arr\u001b[38;5;241m.\u001b[39mreshape(\u001b[38;5;241m4\u001b[39m, \u001b[38;5;241m2\u001b[39m)\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28mprint\u001b[39m(newarr)\n", + "\u001b[0;31mValueError\u001b[0m: cannot reshape array of size 12 into shape (4,2)" + ] + } + ], + "source": [ + "newarr = arr.reshape(4, 2)\n", + "print(newarr)" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "id": "5ff60614-82d0-4421-9ebf-7b0598cda361", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[ 1 2 3]\n", + " [ 4 5 6]\n", + " [ 7 8 9]\n", + " [10 11 12]]\n" + ] + } + ], + "source": [ + "newarr = arr.reshape(4, -1)\n", + "print(newarr)" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "e951ca90-8d1e-4ca8-8196-f00db99cf866", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[ 1 2]\n", + " [ 3 4]\n", + " [ 5 6]\n", + " [ 7 8]\n", + " [ 9 10]\n", + " [11 12]]\n" + ] + } + ], + "source": [ + "newarr = arr.reshape(-1, 2)\n", + "print(newarr)" + ] + }, + { + "cell_type": "markdown", + "id": 
"d9ed66d4-9440-4076-8ee0-9855238762bf", + "metadata": {}, + "source": [ + "### Zadanie 5\n", + "\n", + "1. Utwórz tablicę NumPy zawierającą liczby [1, 2, 3, ..., 10] o wymiarach (2,5). Wyświetl:\n", + " - Tablicę (2,5)\n", + "\n", + "2. Zmień kształt tablicy na (5,2) za pomocą metody `reshape`. Wyświetl ją.\n", + "\n", + "3. Zmień kształt tablicy na (10,1) za pomocą metody `reshape`. Wyświetl ją\n", + " \n", + "4. Użyj wartości `-1` w jednym z wymiarów w metodzie `reshape`, aby automatycznie dostosować pozostałe wymiary na (5,-1)\n", + "\n", + "5. Utwórz tablicę o wymiarach (5,1,2,1) na podstawie powyższego przykładu" + ] + }, + { + "cell_type": "markdown", + "id": "7927210e-feb0-4e61-87a4-7bb9b8b76b17", + "metadata": {}, + "source": [ + "## Obliczenia wektorowe i macierzowe" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "id": "5b8fd161-d367-47b7-a558-2ac275a7a308", + "metadata": {}, + "outputs": [], + "source": [ + "x = np.array([1,2,4])\n", + "y = np.array([100,101,102])" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "id": "ef246bbf-66dc-42cd-b851-c56d477a4879", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([101, 103, 106])" + ] + }, + "execution_count": 69, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x+y" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "id": "3085a99f-9461-4f7d-a141-d051fd670866", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([101, 102, 104])" + ] + }, + "execution_count": 70, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x+100" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "id": "25d19f04-240b-4054-a8d0-e2cbf3056db1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([100, 200, 400])" + ] + }, + "execution_count": 71, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x*100" + ] + }, + { + "cell_type": 
"code", + "execution_count": 72, + "id": "6e71e5a0-427a-4d3d-93cf-f6e45cd69226", + "metadata": {}, + "outputs": [], + "source": [ + "x = np.array([[1,2,4], [10,11,12]])" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "id": "67e64578-3574-44c4-8f9d-d116412bc83e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 2, 4],\n", + " [10, 11, 12]])" + ] + }, + "execution_count": 73, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "markdown", + "id": "cdf9ffb5-6ce3-400a-bf8f-0f364f17fefc", + "metadata": {}, + "source": [ + "#### element-wise multiplication" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "id": "d182d4a9-d654-4f07-8577-b367a9892860", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 100, 200, 400],\n", + " [1000, 1100, 1200]])" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x*100" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "id": "db4ec9ff-7ca9-4d63-90cd-a513b3bb59ea", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 2, 4],\n", + " [10, 11, 12]])" + ] + }, + "execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "id": "505e245c-d4e6-45ea-b932-0e4de1327fc4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([100, 101, 102])" + ] + }, + "execution_count": 76, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "id": "7c80b3f2-4dc9-4662-9036-1d736ecb6af1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 4, 16],\n", + " [100, 121, 144]])" + ] + }, + "execution_count": 77, + "metadata": {}, + "output_type": "execute_result" + } + ], + 
"source": [ + "x*x" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "id": "a49e39b2-f64e-4cbe-9fe2-cac29f217637", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 100, 202, 408],\n", + " [1000, 1111, 1224]])" + ] + }, + "execution_count": 78, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x*y" + ] + }, + { + "cell_type": "markdown", + "id": "75af9e4a-2d8f-4c77-8dac-f25c09c549f5", + "metadata": {}, + "source": [ + "#### mnożenie macierzy, iloczny skalarny\n", + "dot product- iloczyn skaralny dla wektorów \n", + "mnożenie macierzowe - dla macierzy" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "id": "79413af7-c8b2-4cae-9979-992782c5ad74", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([100, 101, 102])" + ] + }, + "execution_count": 79, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "id": "6bc6192b-fbce-4eb0-95de-e6e6bd6729a2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "30605" + ] + }, + "execution_count": 80, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y.dot(y)" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "8d2f6687-f4cc-450a-bd5c-ec9ddf220cb6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "30605" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y@y" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "id": "fc5fd3fe-6b34-409f-ad34-25828988f0b2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "30605" + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.matmul(y,y)" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "id": "640e1c27-4ab9-4cb7-bd00-a895fdcc5476", + "metadata": {}, 
+ "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 2, 4],\n", + " [10, 11, 12]])" + ] + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "id": "49d9ff43-3e93-4836-b87f-7095b2667854", + "metadata": {}, + "outputs": [ + { + "ename": "ValueError", + "evalue": "shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[84], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m x\u001b[38;5;241m.\u001b[39mdot(x)\n", + "\u001b[0;31mValueError\u001b[0m: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)" + ] + } + ], + "source": [ + "x.dot(x)" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "id": "9e5a2c8f-d398-4246-b215-8c5fb5e1da00", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(2, 3)" + ] + }, + "execution_count": 85, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "id": "c31c73f0-4848-43fa-97a7-fb5522a87862", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(3,)" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "id": "47e8851e-47a2-4453-aee5-c1e3bdd01541", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 710, 3335])" + ] + }, + "execution_count": 87, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x.dot(y)" + ] + }, + { + "cell_type": "code", + "execution_count": 88, + "id": "3b5ed3a6-591a-4181-b4eb-efe0c85a852c", + "metadata": {}, + "outputs": [ + 
{ + "data": { + "text/plain": [ + "array([ 710, 3335])" + ] + }, + "execution_count": 88, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x @ y" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "id": "a1f68122-808c-40df-948f-08fa3993f336", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 710, 3335])" + ] + }, + "execution_count": 89, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.matmul(x,y)" + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "id": "6d4ed5fa-6537-418e-b648-a64c14ae40cf", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 2, 4],\n", + " [10, 11, 12]])" + ] + }, + "execution_count": 90, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "id": "13ed3203-b5b8-47dc-a471-0eda865a8eed", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 2, 4],\n", + " [11, 12]])" + ] + }, + "execution_count": 91, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x[:,1:]" + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "id": "d7991eb9-2774-4424-aa19-373456fe18bb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 48, 56],\n", + " [154, 188]])" + ] + }, + "execution_count": 92, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.matmul(x[:,1:] , x[:,1:])" + ] + }, + { + "cell_type": "code", + "execution_count": 93, + "id": "74d2da83-8cf3-46f4-859d-6694fce7f194", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 2, 4],\n", + " [10, 11, 12]])" + ] + }, + "execution_count": 93, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x" + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "id": "06981ab2-4b03-4dcd-87d2-f4f689e75f2d", + "metadata": 
{}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[ 1, 10],\n", + " [ 2, 11],\n", + " [ 4, 12]])" + ] + }, + "execution_count": 94, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x.T" + ] + }, + { + "cell_type": "markdown", + "id": "aaf38f60-7ef3-47c9-922c-64016e35196e", + "metadata": {}, + "source": [ + "### Zadanie 6\n", + "\n", + "\n", + "1. Utwórz dwie tablice jednowymiarowe `x` i `y`:\n", + " - `x` zawiera liczby \\( [3, 5, 7] \\),\n", + " - `y` zawiera liczby \\( [50, 60, 70] \\).\n", + "\n", + "2. Wykonaj następujące operacje i wyświetl wyniki:\n", + " - Dodaj tablicę `x` i `y`.\n", + " - Dodaj do każdego elementu tablicy `x` liczbę 10.\n", + " - Pomnóż każdy element tablicy `x` przez 5.\n", + "\n", + "3. Utwórz dwuwymiarową tablicę `z` zawierającą:\n", + "\n", + " \\begin{bmatrix}\n", + " 3 & 5 & 7 \\\\\n", + " 8 & 10 & 12 \\\\\n", + "\n", + " 10 & 10 & 10 \\\\\n", + " \\end{bmatrix}\n", + "\n", + " Następnie wykonaj następujące operacje:\n", + " - Pomnóż macierzowo x i z\n", + " - Oblicz iloczyn skalrny x i y\n" + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "id": "eb34ddb2-6f92-4444-87dd-00fb392d416e", + "metadata": {}, + "outputs": [], + "source": [ + "## Tworzenie macierzy zerowych, jednostkowych, itp" + ] + }, + { + "cell_type": "code", + "execution_count": 96, + "id": "de2ca8df-31fe-4fef-ba7f-bd2fa2bfa766", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])" + ] + }, + 
"execution_count": 96, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.zeros((10,10))" + ] + }, + { + "cell_type": "code", + "execution_count": 97, + "id": "e5850227-6736-4867-ba93-6d56685d6d64", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],\n", + " [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])" + ] + }, + "execution_count": 97, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.ones((10,10))" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "id": "69f3a9e8-5ffa-4eb6-916f-d89bab25438a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.],\n", + " [123., 123., 123., 123., 123., 123., 123., 123., 123., 123.]])" + ] + }, + "execution_count": 98, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.ones((10,10)) * 123" + ] + }, + { + "cell_type": "code", + 
"execution_count": 99, + "id": "7f9568c6-cb77-452c-9298-cfa4229157fc", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],\n", + " [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])" + ] + }, + "execution_count": 99, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "np.eye(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 100, + "id": "6fda600b-9422-45b2-a593-8afe96738a7c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\n", + "2\n", + "3\n", + "4\n", + "5\n", + "6\n" + ] + } + ], + "source": [ + "#\n", + "arr = np.array([[1, 2, 3], [4, 5, 6]])\n", + "\n", + "for x in arr:\n", + " for y in x:\n", + " print(y)" + ] + }, + { + "cell_type": "code", + "execution_count": 101, + "id": "af0eb0d3-4ec1-41c4-9860-63cdd54b818b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1\n", + "2\n", + "3\n", + "4\n", + "5\n", + "6\n" + ] + } + ], + "source": [ + "#\n", + "arr = np.array([[1, 2, 3], [4, 5, 6]])\n", + "\n", + "for i in range(len(arr)):\n", + " for j in range(len(arr[i])):\n", + " print(arr[i,j])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", 
+ "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/zajecia3/1_odpowiedzi.ipynb b/zajecia3/1_odpowiedzi.ipynb new file mode 100644 index 0000000..6859e1f --- /dev/null +++ b/zajecia3/1_odpowiedzi.ipynb @@ -0,0 +1,311 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "23ed41a0-7a05-493e-a640-4bfb10c42164", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "fa3799c5-d3a0-4967-98d4-a340d19dbfc6", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[10 11 12 13 14 15 16 17 18 19 20]\n", + "(11,)\n", + "\n" + ] + } + ], + "source": [ + "#Zadanie 1.1\n", + "# Tworzenie tablicy jednowymiarowej\n", + "arr = np.array([10,11,12,13,14,15,16,17,18,19,20])\n", + "print(arr)\n", + "print(arr.shape)\n", + "print(type(arr))\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "b6b4fa7d-7ee5-416c-8060-39057b49d77b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[10 20]\n", + " [30 40]\n", + " [50 60]]\n", + "(3, 2)\n", + "\n" + ] + } + ], + "source": [ + "# Zadanie 1.2\n", + "arr = np.array([[10, 20], [30, 40], [50, 60]])\n", + "\n", + "print(arr)\n", + "\n", + "print(arr.shape)\n", + "\n", + "print(type(arr))" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "f96f774c-d6cd-440f-b6bf-a2d373404de3", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[[1 2 3]\n", + " [4 5 6]\n", + " [7 8 9]]\n", + "8\n", + "[[7 8 9]]\n", + "[3 6 9]\n", + "[[5 6]\n", + " [8 9]]\n" + ] + } + ], + "source": [ + "# Zadanie 2\n", + "# Tworzenie dwuwymiarowej tablicy\n", + "arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n", + "\n", + "print(arr)\n", + "\n", + "\n", + "print(arr[2, 1])\n", + "\n", + "print(arr[2:])\n", + "\n", + "\n", + "print(arr[:,2])\n", + "\n", + 
"print(arr[1:,1:])\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "43216855-9d5d-4d03-9512-557f4d228571", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[10 20 30 40]\n", + "int64\n", + "[10. 20. 30. 40.]\n", + "float32\n", + "['Python' 'NumPy' 'Coding']\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Country | female_BMI | male_BMI | gdp | population | under5mortality | life_expectancy | fertility
Japan | 21.87088 | 23.50004 | 34800.0 | 127317900.0 | 3.4 | 82.5 | 1.34
\n", + "" + ], + "text/plain": [ + " female_BMI male_BMI gdp population under5mortality \\\n", + "Country \n", + "Japan 21.87088 23.50004 34800.0 127317900.0 3.4 \n", + "\n", + " life_expectancy fertility \n", + "Country \n", + "Japan 82.5 1.34 " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[df['life_expectancy'].max() == df['life_expectancy']]" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "175" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**zad. 2** Create a column `gdp_log` by applying the `log` function (logarithm) to the `gdp` column. \n", + "\n", + "Hint 1: Use the `apply` function (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html#pandas.Series.apply).\n", + "\n", + "Hint 2: Use the `log` function from the `np` package." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "df['gdp_log'] = df['gdp'].apply(np.log)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our task is to estimate life expectancy (the `life_expectancy` column) from the remaining variables. To start, we apply single-variable regression on `fertility`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Y shape: (175,)\n", + "X shape: (175,)\n" + ] + } + ], + "source": [ + "y = df['life_expectancy'].values\n", + "X = df['fertility'].values\n", + "\n", + "print(\"Y shape:\", y.shape)\n", + "print(\"X shape:\", X.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will use the ready-made linear regression implementation from the sklearn package. To use it, we first need to reshape the data to two dimensions." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Y shape: (175, 1)\n", + "X shape: (175, 1)\n" + ] + } + ], + "source": [ + "y = y.reshape(-1, 1)\n", + "X = X.reshape(-1, 1)\n", + "\n", + "print(\"Y shape:\", y.shape)\n", + "print(\"X shape:\", X.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before the actual analysis, let's draw a plot and check whether there is a \"visual\" relationship between the columns." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "df.plot.scatter('fertility', 'life_expectancy')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**zad. 3** Zaimportuj `LinearRegression` z pakietu `sklearn.linear_model`." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LinearRegression" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Tworzymy obiekt modelu regresji liniowej." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "model = LinearRegression()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Trening modelu ogranicza się do wywołania metodu `fit`, która przyjmuje dwa argumenty:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
LinearRegression()
" + ], + "text/plain": [ + "LinearRegression()" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model.fit(X,y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Współczynniki modelu:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Wyraz wolny (bias): [83.2025629]\n", + "Współczynniki cech: [[-4.41400624]]\n" + ] + } + ], + "source": [ + "print(\"Wyraz wolny (bias):\", model.intercept_)\n", + "print(\"Współczynniki cech:\", model.coef_)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**zad. 4** Wytrenuj nowy model `model2`, który będzie jako X przyjmie kolumnę `gdp_log`. Wyświetl parametry nowego modelu." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Y shape: (175,)\n", + "X shape: (175,)\n", + "Y shape: (175, 1)\n", + "X shape: (175, 1)\n" + ] + } + ], + "source": [ + "y = df['gdp_log'].values\n", + "X = df['fertility'].values\n", + "\n", + "print(\"Y shape:\", y.shape)\n", + "print(\"X shape:\", X.shape)\n", + "\n", + "y = y.reshape(-1, 1)\n", + "X = X.reshape(-1, 1)\n", + "\n", + "print(\"Y shape:\", y.shape)\n", + "print(\"X shape:\", X.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "df.plot.scatter('gdp_log', 'life_expectancy')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
LinearRegression()
" + ], + "text/plain": [ + "LinearRegression()" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model2 = LinearRegression()\n", + "model2.fit(X, y)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Wyraz wolny (bias): [10.97412729]\n", + "Współczynniki cech: [[-0.63200209]]\n" + ] + } + ], + "source": [ + "print(\"Wyraz wolny (bias):\", model2.intercept_)\n", + "print(\"Współczynniki cech:\", model2.coef_)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Mając wytrenowany model możemy wykorzystać go do predykcji. Wystarczy wywołać metodę `predict`." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "input: 6.2\t predicted: 55.83572421482946\t expected: 7.1785454837637\n", + "input: 1.76\t predicted: 75.43391191760766\t expected: 9.064620717626777\n", + "input: 2.73\t predicted: 71.15232586542413\t expected: 9.418492105471156\n", + "input: 6.43\t predicted: 54.82050277977564\t expected: 8.86827250899781\n", + "input: 2.16\t predicted: 73.66830942186188\t expected: 10.155646068918863\n" + ] + } + ], + "source": [ + "X_test = X[:5,:]\n", + "y_test = y[:5,:]\n", + "output = model.predict(X_test)\n", + "\n", + "for i in range(5):\n", + " print(\"input: {}\\t predicted: {}\\t expected: {}\".format(X_test[i,0], output[i,0], y_test[i,0]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Sprawdzenie jakości modelu - metryki: $MSE$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Istnieją 3 metryki, które określają jak dobry jest nasz model:\n", + " * $MSE$: [błąd średnio-kwadratowy](https://pl.wikipedia.org/wiki/B%C5%82%C4%85d_%C5%9Bredniokwadratowy) \n", + " * $RMSE = \\sqrt{MSE}$" + ] + }, + { + "cell_type": 
"code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Root Mean Squared Error: 61.20258121223673\n" + ] + } + ], + "source": [ + "from sklearn.metrics import mean_squared_error\n", + "\n", + "rmse = np.sqrt(mean_squared_error(y, model.predict(X)))\n", + "print(\"Root Mean Squared Error: {}\".format(rmse))" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Root Mean Squared Error: 0.8330994741525843\n" + ] + } + ], + "source": [ + "# Import necessary modules\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# Create training and test sets\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=42)\n", + "\n", + "# Create the regressor: reg_all\n", + "reg_all = LinearRegression()\n", + "\n", + "# Fit the regressor to the training data\n", + "reg_all.fit(X_train, y_train)\n", + "\n", + "# Predict on the test data: y_pred\n", + "y_pred = reg_all.predict(X_test)\n", + "\n", + "# Compute and print RMSE\n", + "rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n", + "print(\"Root Mean Squared Error: {}\".format(rmse))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Regresja wielu zmiennych" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Model regresji liniowej wielu zmiennych nie różni się istotnie od modelu jednej zmiennej. Np. 
chcąc zbudować model oparty o dwie kolumny: `fertility` i `gdp` wystarczy zmienić X (cechy wejściowe):" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(175, 2)\n", + "Wyraz wolny (bias): [9.47431285]\n", + "Współczynniki cech: [[-3.58540438e-01 4.05443491e-05]]\n", + "Root Mean Squared Error: 0.5039206253337853\n" + ] + } + ], + "source": [ + "X = df[['fertility', 'gdp']]\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=42)\n", + "\n", + "print(X.shape)\n", + "\n", + "model_mv = LinearRegression()\n", + "model_mv.fit(X_train, y_train)\n", + "\n", + "print(\"Wyraz wolny (bias):\", model_mv.intercept_)\n", + "print(\"Współczynniki cech:\", model_mv.coef_)\n", + "\n", + "y_pred = model_mv.predict(X_test)\n", + "\n", + "rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n", + "print(\"Root Mean Squared Error: {}\".format(rmse))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**zad. 
6** \n", + " * Build a linear regression model that estimates the value of the `life_expectancy` column from the remaining columns.\n", + "* Display the model coefficients.\n", + "* Compute the RMSE metric on the training set.\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['female_BMI', 'male_BMI', 'gdp', 'population', 'under5mortality',\n", + " 'life_expectancy', 'fertility', 'gdp_log'],\n", + " dtype='object')" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(175, 7)\n", + "Wyraz wolny (bias): [-2.48689958e-14]\n", + "Współczynniki cech: [[-4.53155263e-16 4.57243814e-16 5.81045637e-19 3.74348839e-26\n", + " 4.40441174e-16 -1.32227302e-16 1.00000000e+00]]\n", + "Root Mean Squared Error: 1.854651242181147e-14\n" + ] + } + ], + "source": [ + "X = df[['female_BMI', 'male_BMI', 'gdp', 'population', 'under5mortality', 'fertility', 'gdp_log']]\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=42)\n", + "\n", + "print(X.shape)\n", + "\n", + "model_mv = LinearRegression()\n", + "model_mv.fit(X_train, y_train)\n", + "\n", + "print(\"Wyraz wolny (bias):\", model_mv.intercept_)\n", + "print(\"Współczynniki cech:\", model_mv.coef_)\n", + "\n", + "y_pred = model_mv.predict(X_test)\n", + "\n", + "rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n", + "print(\"Root Mean Squared Error: {}\".format(rmse))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**zad. 7**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " Implement the $RMSE$ metric as a function `rmse` (template below). 
The `rmse` function takes two parameters of type list and should return the value of the $RMSE$ metric." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "5.234841906276239\n", + "5.234841906276239\n" + ] + } + ], + "source": [ + "def rmse(expected, predicted):\n", + " \"\"\"\n", + " arguments:\n", + " expected (type: list): correct values\n", + " predicted (type: list): estimates from the model\n", + " \"\"\"\n", + " return np.sqrt(sum([(e-p)**2 for e,p in zip(expected,predicted)])/len(expected))\n", + " \n", + "\n", + "y = df['life_expectancy'].values\n", + "X = df[['fertility', 'gdp']].values\n", + "\n", + "test_model = LinearRegression()\n", + "test_model.fit(X, y)\n", + "\n", + "predicted = list(test_model.predict(X))\n", + "expected = list(y)\n", + "\n", + "print(rmse(expected, predicted))\n", + "print(np.sqrt(mean_squared_error(expected, predicted)))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/zajecia4/sklearn cz. 1.ipynb b/zajecia4/sklearn cz. 1.ipynb new file mode 100644 index 0000000..c8bbf32 --- /dev/null +++ b/zajecia4/sklearn cz. 1.ipynb @@ -0,0 +1,430 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The next part of the course is an introduction to `sklearn`, a widely used Python library. The class takes the form of a case study interleaved with exercises. Let's start by loading the required libraries." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "# ! pip install matplotlib" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's start by loading the data. In today's class we will use data from [gapminder.org](https://www.gapminder.org/data/)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.read_csv('gapminder.csv', index_col=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The data contains various information on most countries of the world (from 2008). The columns are described below:\n", + " * female_BMI - mean BMI of women\n", + " * male_BMI - mean BMI of men\n", + " * gdp - GDP per capita\n", + " * population - population size\n", + " * under5mortality - mortality rate of children under 5 years of age (per 1000 children born)\n", + " * life_expectancy - average life expectancy\n", + " * fertility - fertility rate" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**zad. 1**\n", + "Based on the data in `df`, answer the following questions:\n", + " * What was the fertility rate in Poland in 2008?\n", + " * In which country do people live the longest?\n", + " * How many countries does the data cover?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**zad. 2** Create a column `gdp_log` by applying the `log` function (logarithm) to the `gdp` column. 
\n", + "\n", + "Hint 1: Use the `apply` function (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html#pandas.Series.apply).\n", + "\n", + "Hint 2: Use the `log` function from the `np` package." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Our goal is to estimate life expectancy (the `life_expectancy` column) from the remaining variables. We start with a single-variable regression on `fertility`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = df['life_expectancy'].values\n", + "X = df['fertility'].values\n", + "\n", + "print(\"Y shape:\", y.shape)\n", + "print(\"X shape:\", X.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will use the ready-made linear regression implementation from the sklearn package. To use it, we first need to reshape the arrays to two dimensions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "y = y.reshape(-1, 1)\n", + "X = X.reshape(-1, 1)\n", + "\n", + "print(\"Y shape:\", y.shape)\n", + "print(\"X shape:\", X.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Before the actual analysis, let's draw a plot and see whether there is a \"visual\" relationship between the columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.plot.scatter('fertility', 'life_expectancy')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Task 3** Import `LinearRegression` from the `sklearn.linear_model` package." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We create a linear regression model object." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = LinearRegression()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Training the model comes down to calling the `fit` method, which takes two arguments:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model.fit(X, y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Model coefficients:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"Intercept (bias):\", model.intercept_)\n", + "print(\"Feature coefficients:\", model.coef_)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Task 4** Train a new model `model2` that takes the `gdp_log` column as X. Display the parameters of the new model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With a trained model we can use it for prediction. It is enough to call the `predict` method." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X_test = X[:5,:]\n", + "y_test = y[:5,:]\n", + "output = model.predict(X_test)\n", + "\n", + "for i in range(5):\n", + " print(\"input: {}\\t predicted: {}\\t expected: {}\".format(X_test[i,0], output[i,0], y_test[i,0]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Checking model quality - metrics: $MSE$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will use 3 metrics that describe how good our model is:\n", + " * $MSE$: [mean squared error](https://pl.wikipedia.org/wiki/B%C5%82%C4%85d_%C5%9Bredniokwadratowy) \n", + " * $RMSE = \\sqrt{MSE}$\n", + " * $R^2$: coefficient of determination (returned by the model's `score` method)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.metrics import mean_squared_error\n", + "\n", + "rmse = np.sqrt(mean_squared_error(y, model.predict(X)))\n", + "print(\"Root Mean Squared Error: {}\".format(rmse))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Import necessary modules\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# Create training and test sets\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=42)\n", + "\n", + "# Create the regressor: reg_all\n", + "reg_all = LinearRegression()\n", + "\n", + "# Fit the regressor to the training data\n", + "reg_all.fit(X_train, y_train)\n", + "\n", + "# Predict on the test data: y_pred\n", + "y_pred = reg_all.predict(X_test)\n", + "\n", + "# Compute and print R^2 and RMSE\n", + "print(\"R^2: {}\".format(reg_all.score(X_test, y_test)))\n", + "rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n", + "print(\"Root Mean Squared Error: {}\".format(rmse))\n" + ] + 
}, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Regression with multiple variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A linear regression model with multiple variables does not differ substantially from a single-variable model. For example, to build a model based on two columns, `fertility` and `gdp`, it is enough to change X (the input features):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X = df[['fertility', 'gdp']]\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=42)\n", + "\n", + "print(X.shape)\n", + "\n", + "model_mv = LinearRegression()\n", + "model_mv.fit(X_train, y_train)\n", + "\n", + "print(\"Intercept (bias):\", model_mv.intercept_)\n", + "print(\"Feature coefficients:\", model_mv.coef_)\n", + "\n", + "y_pred = model_mv.predict(X_test)\n", + "\n", + "rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n", + "print(\"Root Mean Squared Error: {}\".format(rmse))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Task 6** \n", + " * Build a linear regression model that estimates the value of the `life_expectancy` column from the remaining columns.\n", + " * Display the model coefficients.\n", + " * Compute the value of the rmse metric on the training set.\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Task 7**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " Implement the $RMSE$ metric as the function rmse (template below). The function rmse takes two parameters of type list and should return the value of the $RMSE$ metric." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def rmse(expected, predicted):\n", + " \"\"\"\n", + " arguments:\n", + " expected (type: list): ground-truth values\n", + " predicted (type: list): model estimates\n", + " \"\"\"\n", + " pass\n", + " \n", + "\n", + "y = df['life_expectancy'].values\n", + "X = df[['fertility', 'gdp']].values\n", + "\n", + "test_model = LinearRegression()\n", + "test_model.fit(X, y)\n", + "\n", + "predicted = list(test_model.predict(X))\n", + "expected = list(y)\n", + "\n", + "print(rmse(expected, predicted))\n", + "print(np.sqrt(mean_squared_error(expected, predicted)))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}
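
The `rmse` template in the notebook above asks for a function over two lists. One way it could be filled in is sketched below in plain Python, without numpy. This is only an illustration under stated assumptions, not the course's official solution: it assumes both arguments are equal-length, non-empty sequences of numbers, and the input checks are an addition that the task itself does not require.

```python
import math


def rmse(expected, predicted):
    """Root mean squared error of two equal-length sequences of numbers."""
    # Input validation is an assumption added for this sketch.
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if len(expected) == 0:
        raise ValueError("inputs must not be empty")
    # Squared difference for each (expected, predicted) pair.
    squared_errors = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    # Mean of the squared errors, then the square root.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Small usage example on made-up numbers (not taken from gapminder.csv).
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Because the squared difference is symmetric, swapping the two arguments gives the same value, so the result can be checked against `np.sqrt(mean_squared_error(...))` with either argument order.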