2023-programowanie-w-pythonie/zajecia3/sklearn cz. 1.ipynb
Jakub Pokrywka cd07404449 add sklearn
2023-11-19 12:33:16 +01:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next part of the course is an introduction to another widely used Python library: `sklearn` (scikit-learn). The class takes the form of a case study interleaved with exercises. Let's start by loading the required libraries."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's load the data. Today we will use data from the [gapminder.org](https://www.gapminder.org/data/) portal."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('gapminder.csv', index_col=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data contains various indicators for most countries of the world (from 2008). The columns are:\n",
" * female_BMI - average BMI of women\n",
" * male_BMI - average BMI of men\n",
" * gdp - GDP per capita\n",
" * population - population size\n",
" * under5mortality - mortality rate of children under 5 (per 1000 live births)\n",
" * life_expectancy - average life expectancy (in years)\n",
" * fertility - fertility rate (average number of children per woman)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 1**\n",
"Using the data in `df`, answer the following questions:\n",
" * What was the fertility rate in Poland in 2008?\n",
" * In which country do people live the longest?\n",
" * How many countries does the data cover?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
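{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of pandas operations that can help with the questions above, shown on a small hypothetical DataFrame (the country names and values are made up). It assumes, as `index_col=0` suggests, that the index of `df` holds country names; check this with `df.head()` first."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical mini-dataset; the numbers are invented for illustration only\n",
"toy_df = pd.DataFrame(\n",
"    {'fertility': [1.4, 2.1, 1.8], 'life_expectancy': [76.0, 79.5, 81.2]},\n",
"    index=['Country A', 'Country B', 'Country C'],\n",
")\n",
"\n",
"# One value, selected by row label and column name\n",
"toy_fert_b = toy_df.loc['Country B', 'fertility']\n",
"\n",
"# Row label of the largest value in a column\n",
"toy_longest = toy_df['life_expectancy'].idxmax()\n",
"\n",
"# Number of rows (here: countries)\n",
"toy_n = len(toy_df)\n",
"\n",
"print(toy_fert_b, toy_longest, toy_n)"
]
},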
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 2** Create a column `gdp_log` by applying the `log` (logarithm) function to the `gdp` column.\n",
"\n",
"Hint 1: Use the `apply` method (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html#pandas.Series.apply).\n",
"\n",
"Hint 2: Use the `log` function from the `np` (NumPy) package."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
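{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of two ways to take the logarithm of a column, on a hypothetical `Series` of GDP values: `Series.apply` calls an arbitrary Python function (here a `lambda` with `math.log`) once per element, while `np.log` can be applied to the whole Series at once. Both give the natural logarithm here; keep in mind that the logarithm is defined only for positive values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical GDP-per-capita values, made up for illustration\n",
"toy_gdp = pd.Series([1000.0, 10000.0, 50000.0], index=['A', 'B', 'C'])\n",
"\n",
"# apply() calls the Python function once for every element\n",
"toy_log_apply = toy_gdp.apply(lambda v: math.log(v))\n",
"\n",
"# np.log is vectorized and works on the whole Series at once\n",
"toy_log_vec = np.log(toy_gdp)\n",
"\n",
"print(toy_log_apply.round(3).tolist())\n",
"print(np.allclose(toy_log_apply, toy_log_vec))"
]
},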
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our goal is to predict life expectancy (the `life_expectancy` column) from the other variables. We start with simple (one-variable) linear regression on `fertility`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Y shape: (175,)\n",
"X shape: (175,)\n"
]
}
],
"source": [
"y = df['life_expectancy'].values\n",
"X = df['fertility'].values\n",
"\n",
"print(\"Y shape:\", y.shape)\n",
"print(\"X shape:\", X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
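"We will use the ready-made linear regression implementation from sklearn. It expects the features `X` as a two-dimensional array of shape `(n_samples, n_features)`, so we first reshape the data into a single column. We reshape `y` the same way, which is why `predict` will later also return a two-dimensional array."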
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Y shape: (175, 1)\n",
"X shape: (175, 1)\n"
]
}
],
"source": [
"y = y.reshape(-1, 1)\n",
"X = X.reshape(-1, 1)\n",
"\n",
"print(\"Y shape:\", y.shape)\n",
"print(\"X shape:\", X.shape)"
]
},
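{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch, on made-up numbers, of why the reshape is needed: `fit` expects `X` of shape `(n_samples, n_features)` and raises a `ValueError` for a 1D array, while after `reshape(-1, 1)` (where `-1` means: infer this dimension) the same data fits. It imports the `linear_model` module rather than `LinearRegression` itself, so the import exercise below still makes sense."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn import linear_model\n",
"\n",
"demo_x = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,)\n",
"demo_y = np.array([3.0, 5.0, 7.0, 9.0])   # exactly 2x + 1\n",
"\n",
"try:\n",
"    linear_model.LinearRegression().fit(demo_x, demo_y)\n",
"    demo_error = False\n",
"except ValueError:\n",
"    # sklearn rejects 1D feature arrays (Expected 2D array ...)\n",
"    demo_error = True\n",
"\n",
"demo_x2 = demo_x.reshape(-1, 1)   # -1: let NumPy infer the number of rows\n",
"demo_model = linear_model.LinearRegression().fit(demo_x2, demo_y)\n",
"\n",
"print(demo_error, demo_x2.shape)   # True (4, 1)\n",
"print(demo_model.coef_, demo_model.intercept_)"
]
},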
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before the actual analysis, let's plot the data and see whether there is a visible relationship between the two columns."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7fc2c92991f0>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAEGCAYAAACNaZVuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de5wU5ZXw8d/pngtXheAlXETW9ZJXWJmYWYni+lGIWaOGJG+Im6jBmE3URIyJGjFXNLzvrqhJ1sRsskjiarwkiqsguq6u4LvRJLiDDgioK3FVGFCQFRSEuXSf94+uZnq6q6ereqq6q6bO9/PhM3RNV/fTw3DqqfM8z3lEVTHGGJMsqXo3wBhjTO1Z8DfGmASy4G+MMQlkwd8YYxLIgr8xxiRQQ70b4NVBBx2kkyZNqnczjDEmVlavXv2Wqh5cfDw2wX/SpEm0tbXVuxnGGBMrIvKa23FL+xhjTAJZ8DfGmASy4G+MMQlkwd8YYxLIgr8xxiRQIoP/jt2drNm0kx27O0M9xxhjoio2Uz2DsrS9g3n3r6UxlaI7m+WGTx/HrJbxgZ8Tth27O9n89l4mjB7KmBHNdW2LMSZ+EhX8d+zuZN79a9nXnWUfWQCuvn8t0488qGwAreacsEXxYmSMiZdEpX02v72XxlTfj9yYSrH57b1Vn1PrdFDhxejdzh72dWe5+v61lo4yxviSqJ7/hNFD6c5m+xzrzmaZMHpoVefUoweevxjl70Kg92Jk6R9jjFeJ6vmPGdHMDZ8+jiGNKUY2NzCkMcUNnz6u36Dpds73zj6W9Vve4eol4fbA3e4qqrmAGWNMsUT1/AFmtYxn+pEH+RosLTxnXccuFizfQAqhs6dvEA6yB17uriJ/Mbq66HvW6zfG+JG44A+53rzfYJl//t8s+gP7urOuzwmqB15pkLmaC5gxxhRKZPCvllu+HWBYU5qsqmsPvJopmeu3vEMK6XOs+K6imguYMcbkWfD3wS3fDnDBSYfzpZOPKAnGhambrkyWuacdybnTJvYbtJe2d3D1krUlKaW93T2W1zfGBCZRA77VKBx0HTOimSs+cnTJc257+lXX8wqnZHb2ZPnh4//FSdc/wbL2jrLvNe/+0sAPkNUBf5TQ2SpoY+LDev79KB50Pad1Ar95ZlPJ89wGesuliDp7tOwisXLnQC74r9/yDqccXbIhTyTYwjNj4sV6/mW4Laa64w+v05Up7YLv7c4wvCnd51i5FBHkLhbrt7zjaRpnX73vHaVeti08MyZ+Qg/+IvINEVkvIutE5B4RGSIi7xORx0XkZefr6LDb4Zfbyt5yerLKmT/5HXet6t0tLT8ls7mh9DX2dvfw5TvaOH/xKqYvXLE/DdTfOQ0pYdyBuZz/0vYOpi9cUXJ+vVSzctoYU1+hBn8RGQ98DWhV1SlAGvgscA3whKoeBTzhPI6Uyr3wvroyynceWMddf+y9AMxqGc/vr5nBlacfTXODMLK5geYGQSS3RsCtlzz9yIO4dU4rXzn1CJrSQlM690+UFjj7lqe464+vRa6XbQvPjImfWqR9GoChItIADAO2AJ8Abne+fzvwyRq0w5filb3NDcWTL91d99D6PoF4zIhmLpt5FL+/ZiZ3fmkat85pZUhD3xRRSoQ//GkH1//rC3z47/+dr9y5mtuefpWrPnoM+VRPZ0bZ153luofW05BynwZaL9WsnDbG1FeoA76q2iEiNwGvA3uBx1T1MRE5VFW3Os/ZKiKHuJ0vIhcBFwFMnDgxzKa6KlxMtWtvN1+5czV7ujL9ntOYdl/lm5+Xv2N3Z0kv+b2uDHPveW7/4+5M7j1ueuwlmhpSdGV637MxnSoZd4hCL9sWnhkTL2GnfUaT6+X/GTAOGC4i53s9X1UXqWqrqrYefHB1s1wGOjA6ZkQzUw8bxab/ea8k8KddbgUyqv0G4v7y+sUaUlIS6DOqzP/4sZHsZed/VlFoiz
Gmf2FP9fwI8N+quh1ARP4FOAl4U0TGOr3+scC2MN48qOmHO3Z3suDhDSXHf/CJKUAu1dOYTpEps8q32KyW8Ywa1sQlv17Ne93l7yR6ssr8WZNZsHxDyWc4Y/L7rZdtjKla2MH/deDDIjKMXNpnJtAG7AEuAK53vi4N+o2D3ITFbf798OY0U8YfyNTDRnHGFP+BePK4A8jS/8qt+R+fzHnTDncN9FbewRgzEKGmfVR1FbAEeBZ43nm/ReSC/uki8jJwuvM4UEFOP3SbzZLJKsOb0qzZtBPAd7qjcJC02SV/lBYYOaRh/3MtnWKMCVLoK3xVdT4wv+hwJ7m7gNAEOf3QrYzyOR+awNm3PDWglFLhIGl3T4bP3fpH8gVDMxrsdpG2568xptCgLe8QdN37wkA9vCnN2bc8FUhKKZ++WbNpJ+l0quSCVTxzqJogXuvSC3ahMSb6Bm3wh+CnHxYG6qC3UhzelC7ZJ2Bfd7ZP2YhqgrifsY8ggrbV+DEmHgZ18IdwBkbDWNG6pytDc1roLJja2ZyW/dNLKwXxcoHb656/QQTtIAfZjTHhGvTBPwyVUkrV9KAnjB6KpCSX7HdISvZfUNyCeDolrHxxG509WRY8XDodNP+6lS5UQQVt21zemPiw4O/CS/AuTikBrNm0M7fHb5lA3J9KFxS3IL6nM8P8ZevY05U7ng+631zSG7i9jH0EFbStxo8x8WHBv4if9Ec+uObPSUtvmqaaHnR/YxSFQTydEvZ05t4nH/gLdfZkuXvV61w286iKrwvuQbszkxtv8HMXY5vLGxMfohqDLaKA1tZWbWtrC/U9duzuZPrCFX0GXoc0pnh63oyyAcztnEIjmxu480vTmHrYqMDauPLFbcxftr7fOkPNDcLvr5npOfAua+/g6vvXArmB5ua0kAVUlaGNDb7uYmy2jzHRISKrVbW1+Lht5lKgmoVhler+59MeQW2+MmZEMy2HjaK7wr6OTem0rwVts1rGs3zuyWSzvVVEuzNKT5b9paOvum8NG99811Mba7koLUob2xgTF5b2KVBNzrpc3f/hzWky2Vytn6c2vlWSSvI7BTXfm86PKYhzxzakMYUq9GSyhWPFVeXa93RlaG5I05Xpcf1+V0Y586dPcdPs6EzftKmlxlTHgn+BanLWbud876xjmTL+wP3BN58Wyo8DXHnfGlKS6517CVj5ANeQEnZ39k31ZLPKI1/7KzZsfWfAuXYvG9h09WQjM33TppYaUz0L/kWqWRjW3zluC8K6nS56Z0+uh91fwCoMcG6aG9Ls6coEsqCt+EK2rydDNqsUb1sclembNrXUmOpZ8HfhdWFY8cCm2zleetP9BSy3AFeoML3jpd2VBmOLLyJv7+nizJ8+RVdPwcXLZZ1APQZ4bWqpMdWz4F8lr7nm4t50VybrKz9f7uIxrDFFT1b53lnHBl7jp/AiMmZEMzfNLp8KqzbnHsQFw6aWGlM9m+pZhWqnhN616nV+tvJlRGT/dEpJScWAmZ+GmQ9ws6aO48H2LTSlhR5nUNlLjR+/bS4+vzhYV/uaQQ/S2tRSY8orN9XTev5VqDbX/I9PbqSzR8lvyq4iPDz3ZI48dGTJcwsDmltF0a6eLF3OpBwvg5x+2+yW0ip+XjU/hzAGaW1jG2P8s+DvUWEwrCbX7BYom9Mp14Va5XrG5SqKpkRYv2UXpxx9SNke+q69XX02gu+vzW7v7zaYHNTPwQZpjak9C/4euAVDv7nm/gJlYcAG+u0Zu73Oe10ZvnxHG3/Tehj3rt7cp03qvF5jKkVWoSFFnxW7bmWdi9+/3NTUanLuNkhrTDSEmvMXkWOA3xYcOgL4PjAK+DKw3Tn+bVV9pL/XqlfOv7+8NuAr11ycuy8Ozt3ZLJeeeiSL/uMV3u3sXWhVXCJiWXsH31yyls6e/mcRNTcIIH2e19yQ4tY5rUwed4Brm9ds2sn5i1f1ef9ixXl9vzl3t5+DLcwyJhx1yfmr6ktAi9OANNABPA
BcCPxYVW8K8/2DsPntvaSl7x67+TSF3xIGbpVAixeA3bLyZaDv+xX3jGe1jGfUsCYu+fVq3usuX98nLanil6KzJ8uaTTs55eiDXc+pZmqq35x70JvsGGP8q2Vtn5nAn1T1tRq+54Ct69hVkpcfSJqisO6NW12gpnSauacdyZDGFCObGxjSmHJNpUwedwBZ+r9ry2iWjEsNoFtWvly2Dk7hxvIjmxtobkjRUPRbEkSaxjalN6a+apnz/yxwT8HjuSIyB2gDrlTVt4tPEJGLgIsAJk6cWJNGFtqxu5MFD28oOV44t34g0wzL5b/PnTaRc6dNdB24LTxWsql86wTubeub839tx3v88PH/6vMe+aJvXvcquPnf/4s7/vj6/u+f0zrBgrYxMVeT4C8iTcAs4FvOoZ8DC8jNeVwA/BD4YvF5qroIWAS5nH8t2lrIbWbK8OY0U8Yf2GfefrkaPZUuDJUGTL1ss1icPrl85tF9Hu/Y3cktK192ppjmeFmhm0/l7Njdyb2rN/dp971tm7l85tF2ATAmxmrV8/8Y8KyqvgmQ/wogIrcCy2vUDl/ceuaZrLKuYxfn/NMf9g+kutXo8bqQKR/A1295B1Amjzuw5DmV5sYXb/pSfKfw/bMnl+wu5nWFbhBTM5O4CCuJn9nES62C/+coSPmIyFhV3eo8/BSwrkbt8KV496zujHLFR45mwcMbXGfapEX219D3s5DJreTzQANwcVD/3tnHMmXcgSVppErtHOjUzCSWXE7iZzbxE/qAr4gMA04H/qXg8A0i8ryIrAVOA74RdjuqNatlPN8761i6e7I0poSbHnup7HP3dGVYt2WXr01hCgNwftOUq+9f22dA1m8AdnvNBcs3lPRCvbSzeAC43AC013YUf7bBJomf2cRT6D1/VX0PGFN07PNhv29Q8oO+XRntXSFbXOO4wILlG1g+92TPwdpLr97vYqpyO3gV3yl4vahUOzUziat5k/iZTTzZCt8K3MsyCCpCg0jJPPuUCFt27fUcrMMIwMOb0iX1//d15zZkh775aK/trKZ+ThJX8ybxM5t4suBfgdt/ZknlCrK9+MY7XHHvWroyvd/Pl1q4cfZUnp43o2KwLhlX6Mly4UmTyj7XSwDe05WhOS10FtyhNKeFPV0Z13y0l3ZWI4kll5P4mU08WUlnD77/4PN95rnPOXEiHzr8fcy7fy2a1T5BNq8pLTzytb9yrdjp5q4/vsb3l60jfx1pSMGPzmmpaqBw45vvcuZPfkdXQbuGNKZYPvdkzr7lqarLOlcriTNfkviZTTSVK+9QyxW+seQ2z/23/7mZq5esYV931jXwQ+9m58vaOzy9xw+Wb6DgBoKeLHxzyRrfA4VL2zs4+5anSKVydR2a07J/kHZPV8bzQHSQkriaN4mf2cSLBf8K3GbEpFOSq5tTQX6z80oBfPPbe0mnpOR4WvwF5j5TN53evYqwfO7JzGoZb/loY8x+FvwrKLfQq/hYWiDt8tP00rOeMHqoaw2ejPoLzG4XqsI9AwYybdNUZ8fuTtZs2mlTPU3k2IBvBcUDePt6MmSy2f1XzeYGIZNVRISmtLCny3/PesyIZm6cfRxX3reGbieNlBa4cfZUX4HZS8/eKmrWji32MlFmPX8PZrWM5+l5M/jZeR8kJbl8fD7Gd/YoqtCd0T6Bf3hz2lfPelbLeK79+GQa0zC0IUVDujQNVInXnr3lo8Nni71M1FnP36MxI5o5cGgTDakUnfSd21885ju8Kc11H5/MaR84pN8Au/HNd2nftJOWw0YxengTCx7eQHcGugewt22ce/aDaYaMLfYyUWfB34cJo4fSnclWfF5GtWLgL54+etaU9wcWLOK4oflgS5HY4LqJOkv7+DBmRDPzPz655HhjWmhuEM+DqBvffLdP4Ad4eN0b+6uD5iUlWAzGFIkNrpuos56/T+d9+HAQuO6hDTSmc4O9N3z6OF+plvZNO12Pf+ZDE7n/uc2JWxnqJ0VSr9RQNe
8b5xScGfws+FfhvGmHc8bk97tugOJFi7MRe7ELp0/iio8enbhg4TVFUq/U0EDeN44pOJMMlvap0kBmzBx56EjmnNh3W8o5J07kyENHBjITJ25zy72kSOqVGhqMKSljwHr+dfODT/wFcz48af9sH681gCrx20sNKo0y0NeplCKp1+wZm7VjBisL/gGoNvAdeejIwIJ+vh1+dhAr2e3rrGOZMv5A358jqHRMfymSes2esVk7ZrCy4D9AUZqiWKmXWniRgtKtJr/z4DpGNKfpcQaxvXwOtwvOVfet4dixBwR6YatlqeTii7mVaDaDUajBX0SOAX5bcOgI4PvAHc7xScCrwDmq+naYbQnajt2drN/yDlcvWUtnj7eedtj666UWX6QuPfXIkgsFwO7O3AI2r5/D7YKTr2h60+xgL4S1mD1T7mJus3bMYON5wFdEbhKR0knu/VDVl1S1RVVbgA8B7wEPANcAT6jqUcATzuPYWNrewfSFK7jk16tLNnKvRYnkcsoNnAIlg5a3rNzYuy2lC6+fw+2CA94rmuZ5HaQOszRFf4O7VhLDDDZ+ev4vAotEpAG4DbhHVXf5OH8m8CdVfU1EPgGc6hy/HXgSmOfjteqmMEC4qXc+2K2XumbTzpLeeUNK+GTLBJY8u4mGVG/lzzyvnyN/wbnqvjV9No8B7wOjYaTOqhmHscFdkySee/6qulhVpwNzyKVr1orI3SJymseX+Cxwj/P3Q1V1q/O6W4FD3E4QkYtEpE1E2rZv3+61qaFyK5sMMKzJXyG3MBX3Ut1653u6MjzY3oEqzDnxcP7vp6ZUvRp1Vst4HvnaX9HU0Pfn4uUCEsZUyvyd2fmLVzF94QpPG+qADe6aZPE1z19E0sAHnD9vAWuAK0TkNxXOawJmAff5eT9VXaSqraraevDBB/s5NTRuAaK5QfjF+cfz9LwZgQ/2BjFnvzAdlN/EHXIXgK6M8vP/9wooPD1vBnd+aVpVn+PIQ0dy02z/5QzcLqYDSZ0N5GJiJRlMknhO+4jIj8gF8CeAv1PVZ5xvLRSRlyqc/jHgWVV903n8poiMVdWtIjIW2Oa34fVSbvbHKUe73rwMSJDpkHw6aOWL25i/bH1Jmue6h9ZzxpT3M7XM6mM/7+En3RJ0b3ugqRsb3DVJ4Sfnvw74rqq+5/K9Eyqc+zl6Uz4Ay4ALgOudr0t9tKPuahEg/M7Z92LMiGZO+8AhfPvBdSXfa0wHk9v2W84g6KmUQVxMrCSDSQI/wf9toDH/QERGAaeq6oP9DfyKyDDgdODigsPXA/eKyN8CrwOf8dXqCAg7QIQ1+JirTHos33mg7wUgo1q33HaQF1Obl2+MN36C/3xVfSD/QFV3ish84MH+TnLuFMYUHdtBbvaPKSPMwcfzph0Omkv1NKZTZFTrHiCDvJha6saYyvwEf7fBYVshHJKge7DFUx/P+/DhnDGltDJptaK2C5elbozpn5/g3eYM+v4MUOAyYHUorTJAcD3YcgPHQQXIKJW4MMZ442eq52VAF7myDPcB+4BLw2hU3IRZQnmgK0vDLkkchZLHcSthHZSkfm4TDM89f1XdQ8zKMIShOL0R9V5v2KtW670qNuo//7Ak9XOb4PiZ5380cBW51b37z1PVGcE3K5pKSiCffSwLlm8IdDpm0MJetVrPVbFhTIeNg6R+bhMsP2mf+4DngO8C3yz4kwhu6Y3rHtpAWqTP8+pZ2M1N2KtW67kqNujVwXGR1M9tguVnwLdHVX8eWksizjW9kRa6e6JfC6bcwHGuLPUuQJg87oCqA3a9plYmtRZPf587arOuTHT5Cf4PichXyZVk3j/CpKr/E3irIsjtP1wmq8z/+GQWPLwh8guKimf2LG3v4Kr71tDtVOJsSMGPzmmpOm9cj6mVSV3QVe5zP7XxLRsHMJ6JqlZ+FiAi/+1yWFX1iGCb5K61tVXb2tpq8VZlLWvvKPkPN6tlfOx6Wzt2d3LS9StK9iJobhB+f83MWHyGQn
H7+QeleGe26QtX9Ck1PqQxxdPzZiTqZ2JKichqVW0tPu5nts+fBduk+CmX3ojbgqLNb+8lnZKS42mJZ+36KP38a3khKvzcbns22F4Epj++VuiKyBTgWGBI/piq3hF0o6IsSoGmWhNGDyWTLb3jy+jgz5eHqZ7TL5M6/mGq52cbx/nAT50/pwE3kCvxbGJmzIhmbpx9HI3p3t5/QwpunD019he2eqn3Yjfbi8D45afnPxuYCjynqheKyKHA4nCaZcKWT2EFMdunkiTk5N1mg6VFWPniNk77wCE1+dxW0M744Sf471XVrIj0iMgB5DZgqclgrwnHmBHNoWxCUygpK1HLbZV57UPr+e7SdTX73IMhLWlqw88irzanhv+t5Aq6PQs80/8pJsnqnQqppXJbZe7uzAzqz90fqz0UbX5m+3zV+esvRORR4ABVXRtOs8xgUO+6P7VWuFXmtQ+tZ3dn71aZ1X7uuKbMknLHF2d+BnyfyP9dVV9V1bWFx4wplsQZKPmtMnuKZlNV87mXtncwfeEKzl+8iukLV7CsvSPIpoYmSXd8cVYx+IvIEBF5H3CQiIwWkfc5fyYB4zycP0pElojIiyLygoicKCLXikiHiLQ7f84c+EcxYarmFj6pM1CC+NxxDqBWeygevKR9Lga+Ti7Qrwby8wPfIbexSyU3A4+q6mwRaQKGAX8N/FhVb/LfZFONgaQPlrZ3cPWStaRTQiar3Djb+y18UmegDPRzxzlllsQ7vjiqGPxV9WbgZhG5TFV/6ufFnVlBpwBfcF6rC+gSKV1dasJTTf41f7EY3pTuUwMI4Mr71vgqH5zUGSgD+dxxDqBJrbkUN36memZFZJSq7gQQkdHA51T1H/s55whgO3CbiEwld+dwufO9uSIyB2gDrlTVt4tPFpGLgIsAJk6c6KOpJq+a2u+FF4vOnkyfwA/QnVHWb9kV+jTRJIt7AE3qHV+c+An+X1bV/WkeVX1bRL4M9Bf8G4DjgctUdZWI3ExuN7BbgAXk9gJeAPwQ+GLxyaq6CFgEucJuPtpqHH7TB24XC3d29xa2uAfQpN7xxYWfef4pKcjXiEgaaKpwzmZgs6quch4vAY5X1TdVNaOqWXLrBk7w02jjnd/0gdtgXbGGFEwed0BgbRyowTyfvJo9nAfzz8MEx0/P/9+Ae0XkF+R67JcAj/Z3gqq+ISKbROQYVX0JmAlsEJGxqrrVedqngHVVtN144Dd94HaxaEwLgtKQSpPRbKRqANl88r7s52G88lPPP0Vu5s9Mcvf8jwGLVTVT4bwWcjWAmoBXgAuBnwAt5C4irwIXF1wMXEWhnn+c+Znt47ZvQRTTDzt2d1oN+wL28zBugqjnnxWRfwZWOL14r+e1A8Vv/Hmv55tg+Mm/9rdvQZTEeTpkGNx+HimE9Vve4ZSjD65jy0wU+VnhOwtox0n1iEiLiCwLq2GmvqrJNdeaW4qqK5OJxXTIMLj9PN7rzvDlO9piszrY1I6fAd/55AZmd8L+Hv2kENpkjCf58YyGgt/irMLTG9+qX6PqKP/zaG7oOxOrsyc+q4NN7fgJ/j2quiu0lhhThelHHkS6YHZSd0YTHehmtYzn1jmtDCuoLApWXsGU8hP814nIuUBaRI4SkZ8Cvw+pXcZ4svntvTSlrY5MocnjDiSrAy8sZwY3P8H/MmAy0AncQ662z9fDaJQxXsW5DEJYklpQz/jjearn/hNy9XpUVd8Np0nubKqnKcdtamo957ZHpQZ/VNph6mvAUz1F5C+BXwEjnce7gC+q6urAWmlMFaJUBiHoRVYDCeBWXsH0x88K318CX1XV3wGIyMnAbcBxYTTMGD9qGejKBeRqiuj1x1brmjD5Cf7v5gM/gKo+JSI1Tf0YU40g0x9uATl/17Frb1dgi86CvpAYU8xP8H9GRP6J3GCvAn8DPCkixwOo6rMhtM8MMrXOQwfZe3YLyFfet4aUQFM6TVcmQ9HujVUPPtvqZRM2P8G/xfk6v+j4Se
QuBjMCaZEZtGqdxgi69+wWkPN7HXT29AC5iqfNDSma0gOrwW+zmEzY/NT2OS3MhpjBrR5pjKB7z24BudjQxgZ+dt7xHDi0cUB3N3HfzMVEn5/ZPr8G5uZX+YrI4cCvVHVmWI0zg0eQgdhr6ijo3nNxQO7KZMlks/QUvEV3NsvkcQcEEqSjNIvJDD5+0j5PAatE5ApgPPBN4MpQWmUGnaACsZ/UURi95+KA/PTGt0Lrnds8fRMmX4u8nOmdK4G3gA+q6hthNayYLfKKv4Euxqq2Xn3YQTSM17dpniYoQSzy+jzwPWAOubn9j4jIhaq6JrhmmsFsoGmMalNHYa8BCPr1ozrN0+5EBhc/aZ9PAyer6jbgHhF5ALid3llAxlQ0kECZlBkwUZzmaXcig4/nwm6q+klV3SYiw53Hz+Bh43URGSUiS0TkRRF5QUROFJH3icjjIvKy83X0AD6DSYikFCwL+iKX39B945vvVrWxe+GdyLudPezrtv0BBgM/aZ8TyZV4GAFMFJGp5Pb0/WqFU28GHlXV2SLSBAwDvg08oarXi8g1wDXAvGo+gEmWJMyACXKgOt9j16zSmVGa0oII3Dh7queeexTvRMzA+Un7/APw18AyAFVdIyKn9HeCUwH0FOALzjldQJeIfAI41Xna7cCTWPA3HiWhYFkQF7nCHntel7Mo7Ru/bWfUsCZP01KTkm5LGj/1/FHVTUWHMhVOOQLYDtwmIs+JyGInbXSoqm51XnMrcIjbySJykYi0iUjb9u3b/TR1UMrfvtvtdjIMdB/lfI/dTUbh4jvamL5wRcX9fZOSbksaPz3/TSJyEqBO+uZrwAseXv944DJVXSUiN5NL8XiiqouARZCb6umjrYOODbgZvyqtSN7b430mUZjpNptFVB9+ev6XAJeSW+C1mdwsn0srnLMZ2Kyqq5zHS8hdDN4UkbEAztdtfhqdNEkdcLM7nYEpt6F7Ma/bXg70TsTN0vYOpi9cwfmLV3m6C/HKfncq81Pb5y3gvHLfF5FvqerfF53zhohsEpFjVPUlYCawwflzAXC983VpNY1PiiQOuNmdTjDyPfa7V73OLSs3khLY2x2N/H1Y6xnsd8cbP2mfSj4D/L3L8cuAu5xU0SvAhY+m6GQAAA6pSURBVOTuOO4Vkb8FXnfONWUkbcAtqouc4qQ4lXLZzKM4d9pENr+9l3Udu1jw8Ia6F4wLo1NjvzveBRn8Xe8tVbUdKFlaTO4uwHiQtAqPUbnTiWsuulzPNz9Lauphozhjyvvr/tnC6NQE8bsT1393v4IM/okekA1bEua350XhTieuqQOvPd8oTJcNo1Mz0N+duP67VyP0nr8JThT+w9ZCve904pw6iMpdk1dBd2oG8rsT53/3agQZ/O8L8LVMwtXzTiduAbRQFO6a6q3a350o/ruHmYLyU97haODn5BZoTRGR44BZqvp/AFT17wJtmUm8et3pxDmA1vuuya+w0izV/O5E7d897BSUn3n+twLfAroBVHUt8NnAWmJMRER5RauX+euzWsbz9LwZ3PmlaTw9b0a/AaOe8+HDWL8ykM8TpX/3Wqzt8ZP2Gaaqz4j0Se33BNYSYyIkigPsfncxq9Tmeg9uBp1mCeLzROXfvRYpKD89/7dE5M9xZvWIyGxgayCtMCaCwljRWq2ge4Jur/fNJWtqegcQZJolyJ9PFP7da5GC8hP8LwX+CfiAiHQAXydX8sEYEzK3Im1eyzJ4fb3OHuXuVa9X3Ua/gkyzBP3zqbdapKAqpn1E5HJVvRkYq6ofcapyplT13cBaYYzpV9A9wQmjh9KVKS3Ke8vKjZw7bWLNer1BpVlynyc6g7VBCDsF5aXnf6Hz9acAqrrHAr8xtTXQnmDxQOiYEc3MPe2okuc1pWvfWw4izfLUxrfIFFwcG9MSmUH6gQgzBeVlwPcFEXkVOFhE1hYcF0BV9bjAW2WMKVFtT7DcQOi50yZyy8qNdPb0Bs049pbz+f6Cj0FKYPqRB9
WvUTFQMfir6udE5P3AvwGzwm+SMaYcv/PXK61avXF2fNYElOM2M6YpnY7Forx68jTVU1XfAKaG3BZjTMAqTRmMytTGauRXvw5vSkdqcVZceBnwvVdVzxGR5+lbvM3SPsZEnJeB4jjWjCpOZZ3TOoF72zbH+g6m1rz0/C93vp4dZkOMMcGLW7kHL9xSWfe2bWb53JPZ05WJ3R1MvXjJ+ec3Wn8t/OYYY7zwU/ArzqkdN+VSWXu6Mkw9bFQdWxYvXtI+7+Jeqz+f9jkg8FYZY8qqpoxBHFM75UStAFtcVZznr6ojVfUAlz8jvQR+EXlVRJ4XkXYRaXOOXSsiHc6xdhE5M4gPY8xgV4uCX1EXpQJscRZkPf/+nOZsAF/ox6p6U43e35hBIYo15+thsKWy6qFWwd8YEwBLefQaTKmsevBT2K1aCjwmIqtF5KKC43NFZK2I/EpERrudKCIXiUibiLRt3769Bk01Jtos5WGCIqrh7rsuIuNUdYuIHAI8DlwGvAS8Re7CsIBc0bgv9vc6ra2t2tbWFmpbjYmLMLf3S7rB9rMVkdWq2lp8PPS0j6pucb5uE5EHgBNU9T8KGnYrsDzsdhgzmNQj5THYgqKbem9wU0uhBv/C8s/O3z8K/EBExubXDwCfAtaF2Q5jzMAkIShWqoM02ITd8z8UeMDZ+rEBuFtVHxWRX4tIC7m0z6vAxSG3wxhTpaQExaTNpAo1+KvqK7gUhFPVz4f5vsaYHLdUjd/0TVKCYtJmUtlUT5OIXG4SuaVqFHynb5ISFAdjHaT+hD7bJyg22yccA83l2oUjmnbs7mT6whXs6+4N2s0NKUDp7On9Pz+kMcXT82ZU/Ldb1t5REhQHW84/b7D9Ttdtto+JroHmcpMwCBhXbqmadEpABejdu9dr+iZJK2qTsnisFou8TETlA0ShfDCoxGrMRJtbqiaTVTJaffomzP1k+1O8/7AJhgX/BBtILncgFw4TPreVwDfOPo4bZ0+N1ergpe0dTF+4gvMXr2L6whUsa++od5P6iPOFydI+CTaQAa6kDALGWblUTS3SN0HkzaM+xTTuaU8L/glXbS43aTMj4sotfx12TjuooBjlKaZRvzB5YcHfVB0MkjQIaLwJMihG+e4yyhcmryznbwakXoOAJpqCHAuKcgXTKF+YvLKevzEmMEEHxajeXQ6GtKcFf2NMYMIIilGddx/VC5NXFvyNMYGKe1D0I6oXJi8s+BtjAlccFAdbyYTBwIK/MSZUcZ8PP1jZbB9jTGisDEh0WfA3xoTGyoBElwV/Y0xoBsN8+MEq9OAvIq+KyPMi0i4ibc6x94nI4yLysvN1dNjtMMbUXpQXaiVdrQZ8T1PVtwoeXwM8oarXi8g1zuN5NWqLMaaGkjT1M07qNdvnE8Cpzt9vB57Egr8xg1ac58MPVrXI+SvwmIisFpGLnGOHqupWAOfrIW4nishFItImIm3bt2+vQVONMSYZatHzn66qW0TkEOBxEXnR64mqughYBLk9fMNqoDHGJE3oPX9V3eJ83QY8AJwAvCkiYwGcr9vCbocxxpheoQZ/ERkuIiPzfwc+CqwDlgEXOE+7AFgaZjuMMcb0FXba51DgARHJv9fdqvqoiPwncK+I/C3wOvCZkNthjDGmQKjBX1VfAaa6HN8BzAzzvY0xxpRnK3yNMSaBLPgbY0wCWfA3xpgEsuBvjDEJZMHfGGMSyIK/SZwduztZs2mnbShiEs22cTSJYlsKGpNjPX+TGLaloDG9LPibxLAtBY3pZcHfJIZtKWhMLwv+JjFsS0FjetmAr0kU21LQmBwL/iZxbEtBYyztY4wxiWTB3xhjEsiCvzHGJJAFf2OMSaCaBH8RSYvIcyKy3Hl8rYh0iEi78+fMWrTDJIfV7zGmf7Wa7XM58AJwQMGxH6vqTTV6f5MgVr/HmMpC7/mLyATgLGBx2O9ljNXvMcabWqR9/gG4GsgWHZ8rImtF5FciMtrtRBG5SETaRKRt+/btoT
fUREe1aRur32OMN6EGfxE5G9imqquLvvVz4M+BFmAr8EO381V1kaq2qmrrwQcfHGZTTYQsbe9g+sIVnL94FdMXrmBZe4fnc61+jzHehN3znw7MEpFXgd8AM0TkTlV9U1UzqpoFbgVOCLkdJiYGmrax+j3GeBPqgK+qfgv4FoCInApcparni8hYVd3qPO1TwLow22HiI5+22VeQJcynbbwGcKvfY0xl9artc4OItAAKvApcXKd2mIgJKm1j9XuM6V/Ngr+qPgk86fz987V6XxMv+bTN1UVTNS2QGxMsq+ppIsfSNsaEz4K/iSRL2xgTLqvtY4wxCWTB3xhjEsiCvzHGJJAFf2OMSSAL/sYYk0CiqvVugycish14LYCXOgh4K4DXqbU4tjuObQZrd63Fsd1xavPhqlpSHC02wT8oItKmqq31bodfcWx3HNsM1u5ai2O749jmYpb2McaYBLLgb4wxCZTE4L+o3g2oUhzbHcc2g7W71uLY7ji2uY/E5fyNMcYks+dvjDGJZ8HfGGMSKDHB39kofpuIxGbXMBE5TERWisgLIrJeRC6vd5u8EJEhIvKMiKxx2n1dvdvklYikReQ5EVle77b4ISKvisjzItIuIm31bo8XIjJKRJaIyIvO7/iJ9W5TJSJyjPMzzv95R0S+Xu92VSMxOX8ROQXYDdyhqlPq3R4vRGQsMFZVnxWRkcBq4JOquqHOTeuXiAgwXFV3i0gj8BRwuar+sc5Nq0hErgBagQNU9ex6t8crZ5/sVlWNy8IjROR24HequlhEmoBhqrqz3u3ySkTSQAcwTVWDWIBaU4np+avqfwD/U+92+KGqW1X1Wefv7wIvAOPr26rKNGe387DR+RP5XoaITADOAhbXuy2DnYgcAJwC/BJAVbviFPgdM4E/xTHwQ4KCf9yJyCTgg8Cq+rbEGyd90g5sAx5X1Ti0+x+Aq4FspSdGkAKPichqEbmo3o3x4AhgO3Cbk2ZbLCLD690onz4L3FPvRlTLgn8MiMgI4H7g66r6Tr3b44WqZlS1BZgAnCAikU61icjZwDZVXV3vtlRpuqoeD3wMuNRJc0ZZA3A88HNV/SCwB7imvk3yzklTzQLuq3dbqmXBP+KcnPn9wF2q+i/1bo9fzq38k8AZdW5KJdOBWU7u/DfADBG5s75N8k5VtzhftwEPACfUt0UVbQY2F9wRLiF3MYiLjwHPquqb9W5ItSz4R5gzcPpL4AVV/VG92+OViBwsIqOcvw8FPgK8WN9W9U9Vv6WqE1R1Ernb+RWqen6dm+WJiAx3JgTgpE4+CkR6VpuqvgFsEpFjnEMzgUhPZCjyOWKc8oEEbeAuIvcApwIHichmYL6q/rK+rapoOvB54Hknfw7wbVV9pI5t8mIscLszGyIF3KuqsZo6GTOHAg/k+go0AHer6qP1bZInlwF3OSmUV4AL69weT0RkGHA6cHG92zIQiZnqaYwxppelfYwxJoEs+BtjTAJZ8DfGmASy4G+MMQlkwd8YYxLIgr9JNBH5mlNR8i6Pz58kIucWPG4VkZ84f/+CiNzi/P0SEZlTcHxcGO03plqJmedvTBlfBT6mqv9d6Yki0gBMAs4F7gZQ1TagpISyqv6i4OEXyC262jLw5hoTDAv+JrFE5BfkCowtE5HfAH8O/AW5/xfXqupSEfkCuUqfQ4DhwDDgfzmL7m4HngOuKi7/LCLXkish/iq5EtF3iche4DvAl1T1U87zTge+oqr/O9xPa0xflvYxiaWql5DrjZ9GLrCvUNW/dB7fWFBl8kTgAlWdQa742O9UtUVVf+zhPZaQuzM4zyl09wi5i8fBzlMuBG4L8nMZ44UFf2NyPgpc4/TonyTX05/ofO9xVQ1kLwjNLan/NXC+U//oROBfg3htY/ywtI8xOQJ8WlVf6nNQZBq5csNBug14CNgH3KeqPQG/vjEVWc/fmJx/Ay5zKqkiIh8s87x3gZE+X7vPOU755S3Ad4F/9t1SYwJgwd+YnAXktp
tcKyLrnMdu1gI9zub03/D42v8M/MLZ8Huoc+wuYFPU92M2g5dV9TSmDpz1AM/FoKy4GaQs+BtTYyKymtw4wumq2lnv9phksuBvjDEJZDl/Y4xJIAv+xhiTQBb8jTEmgSz4G2NMAlnwN8aYBPr/82nrsEdahvEAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df.plot.scatter('fertility', 'life_expectancy')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 3** Import `LinearRegression` from the `sklearn.linear_model` module."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a linear regression model object."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"model = LinearRegression()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Training the model comes down to calling the `fit` method, which takes two arguments: the features `X` and the target values `y`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LinearRegression()"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.fit(X, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fitted parameters: the intercept (`intercept_`) and the feature coefficients (`coef_`):"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Intercept (bias): [83.2025629]\n",
"Feature coefficients: [[-4.41400624]]\n"
]
}
],
"source": [
"print(\"Intercept (bias):\", model.intercept_)\n",
"print(\"Feature coefficients:\", model.coef_)"
]
},
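{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the hood, `LinearRegression.fit` solves an ordinary least squares problem: it chooses the intercept and coefficients that minimize the sum of squared differences between predictions and true values. A sketch on synthetic data (the line 3 - 0.5x and the noise level are assumptions made for illustration) that recovers the same parameters with NumPy's least-squares solver, using a column of ones for the intercept:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"ols_rng = np.random.default_rng(0)\n",
"ols_x = ols_rng.uniform(1, 7, size=(50, 1))                    # 50 samples, 1 feature\n",
"ols_y = 3.0 - 0.5 * ols_x[:, 0] + ols_rng.normal(0, 0.3, 50)   # noisy line (assumed)\n",
"\n",
"ols_model = LinearRegression().fit(ols_x, ols_y)\n",
"\n",
"# Design matrix [1, x]: the column of ones models the intercept\n",
"ols_design = np.hstack([np.ones((len(ols_x), 1)), ols_x])\n",
"ols_beta, *_ = np.linalg.lstsq(ols_design, ols_y, rcond=None)\n",
"\n",
"print('sklearn:', ols_model.intercept_, ols_model.coef_)\n",
"print('lstsq:  ', ols_beta[0], ols_beta[1:])\n",
"print(np.allclose([ols_model.intercept_, *ols_model.coef_], ols_beta))"
]
},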
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 4** Train a new model, `model2`, that takes the `gdp_log` column as `X`. Display the parameters of the new model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
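"With a trained model, we can make predictions by calling the `predict` method. Below we predict the first five rows (note that they were also part of the training data) and compare the predictions with the true values."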
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input: 6.2\t predicted: 55.835724214829476\t expected: 52.8\n",
"input: 1.76\t predicted: 75.43391191760767\t expected: 76.8\n",
"input: 2.73\t predicted: 71.15232586542415\t expected: 75.5\n",
"input: 6.43\t predicted: 54.82050277977565\t expected: 56.7\n",
"input: 2.16\t predicted: 73.66830942186188\t expected: 75.5\n"
]
}
],
"source": [
"X_test = X[:5,:]\n",
"y_test = y[:5,:]\n",
"output = model.predict(X_test)\n",
"\n",
"for i in range(5):\n",
" print(\"input: {}\\t predicted: {}\\t expected: {}\".format(X_test[i,0], output[i,0], y_test[i,0]))"
]
},
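{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a linear model, `predict` computes the intercept plus the dot product of the features with the coefficients. A sketch on a tiny made-up dataset that checks this by hand (the `pred_*` names are illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"pred_x = np.array([[1.0], [2.0], [3.0]])\n",
"pred_y = np.array([80.0, 76.0, 72.0])   # exactly 84 - 4x\n",
"\n",
"pred_model = LinearRegression().fit(pred_x, pred_y)\n",
"\n",
"pred_new = np.array([[2.5], [6.0]])\n",
"pred_by_sklearn = pred_model.predict(pred_new)\n",
"# The same prediction by hand: intercept + X @ coefficients\n",
"pred_by_hand = pred_model.intercept_ + pred_new @ pred_model.coef_\n",
"\n",
"print(pred_by_sklearn)\n",
"print(np.allclose(pred_by_sklearn, pred_by_hand))"
]
},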
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluating the model: the $R^2$ and $MSE$ metrics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use three metrics to assess how good our model is:\n",
" * $R^2$: [coefficient of determination](https://pl.wikipedia.org/wiki/Wsp%C3%B3%C5%82czynnik_determinacji)\n",
" * $MSE$: [mean squared error](https://pl.wikipedia.org/wiki/B%C5%82%C4%85d_%C5%9Bredniokwadratowy)\n",
" * $RMSE = \\sqrt{MSE}$ (root mean squared error)"
]
},
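{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, with true values $y_i$, predictions $\\hat{y}_i$ and the mean $\\bar{y}$ of the $n$ true values, the standard definitions are:\n",
"\n",
"$$MSE = \\frac{1}{n}\\sum_{i=1}^{n}(y_i - \\hat{y}_i)^2, \\qquad RMSE = \\sqrt{MSE}$$\n",
"\n",
"$$R^2 = 1 - \\frac{\\sum_{i=1}^{n}(y_i - \\hat{y}_i)^2}{\\sum_{i=1}^{n}(y_i - \\bar{y})^2}$$\n",
"\n",
"$R^2 = 1$ means a perfect fit, while for MSE and RMSE lower is better; RMSE is expressed in the same units as the target (here: years)."
]
},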
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R^2: 0.5955222955385271\n",
"Root Mean Squared Error: 5.682032704570357\n"
]
}
],
"source": [
"from sklearn.metrics import mean_squared_error\n",
"\n",
"print(\"R^2: {}\".format(model.score(X, y)))\n",
"rmse = np.sqrt(mean_squared_error(y, model.predict(X)))\n",
"print(\"Root Mean Squared Error: {}\".format(rmse))"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R^2: 0.6742169953190864\n",
"Root Mean Squared Error: 4.874062378427941\n"
]
}
],
"source": [
"# Import necessary modules\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Create training and test sets\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=42)\n",
"\n",
"# Create the regressor: reg_all\n",
"reg_all = LinearRegression()\n",
"\n",
"# Fit the regressor to the training data\n",
"reg_all.fit(X_train, y_train)\n",
"\n",
"# Predict on the test data: y_pred\n",
"y_pred = reg_all.predict(X_test)\n",
"\n",
"# Compute and print R^2 and RMSE\n",
"print(\"R^2: {}\".format(reg_all.score(X_test, y_test)))\n",
"rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n",
"print(\"Root Mean Squared Error: {}\".format(rmse))\n"
]
},
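{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above evaluates the model on data it did not see during training. A sketch of what `train_test_split` does, on 20 made-up samples: with `test_size=0.25`, 5 rows go to the test set and 15 to the training set, the rows are shuffled first, features and labels stay aligned, and a fixed `random_state` makes the split reproducible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"split_X = np.arange(40).reshape(20, 2)   # 20 samples, 2 features; row k is [2k, 2k+1]\n",
"split_y = np.arange(20)                  # label k belongs to row k\n",
"\n",
"Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(split_X, split_y, test_size=0.25, random_state=0)\n",
"Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(split_X, split_y, test_size=0.25, random_state=0)\n",
"\n",
"print(Xa_tr.shape, Xa_te.shape)                   # (15, 2) (5, 2)\n",
"print(np.array_equal(ya_te, yb_te))               # True: same random_state, same split\n",
"print(np.array_equal(Xa_te[:, 0] // 2, ya_te))    # True: rows and labels stay aligned"
]
},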
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multiple linear regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
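"A multiple linear regression model does not differ substantially from the one-variable model. For example, to build a model based on two columns, `fertility` and `gdp`, it is enough to change `X` (the input features). Note that double brackets, `df[['fertility']]`, select a two-dimensional DataFrame, whereas `df['fertility']` returns a one-dimensional Series; the two `print` calls below show the difference in shape."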
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(175,)\n",
"(175, 1)\n"
]
},
{
"data": {
"text/plain": [
"0.6566838211706549"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = df[['fertility', 'gdp']]\n",
"print(df['fertility'].shape)\n",
"print(df[['fertility']].shape)\n",
"\n",
"model_mv = LinearRegression()\n",
"model_mv.fit(X, y)\n",
"\n",
"model_mv.score(X, y)"
]
},
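{
"cell_type": "markdown",
"metadata": {},
"source": [
"With several features, `coef_` holds one coefficient per column of `X`, in column order (here `y` is 1D, so `coef_` has shape `(2,)`; with the 2D `y` used above it would be `(1, 2)`). A sketch on synthetic data generated as y = 1 + 2 x1 - 3 x2 (an assumption for illustration, with no noise, so the fit is exact):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"mv_rng = np.random.default_rng(1)\n",
"mv_X = pd.DataFrame({'x1': mv_rng.normal(size=30), 'x2': mv_rng.normal(size=30)})\n",
"mv_y = 1 + 2 * mv_X['x1'] - 3 * mv_X['x2']\n",
"\n",
"mv_model = LinearRegression().fit(mv_X, mv_y)\n",
"\n",
"# Pair each coefficient with the name of its column\n",
"for mv_name, mv_coef in zip(mv_X.columns, mv_model.coef_):\n",
"    print(mv_name, round(mv_coef, 3))\n",
"print('intercept', round(mv_model.intercept_, 3))\n",
"print('R^2', mv_model.score(mv_X, mv_y))"
]
},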
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 5** Which combination of two columns gives the best result in terms of $R^2$? As before, we are predicting the `life_expectancy` column.\n",
"\n",
"Note: `life_expectancy` itself must be excluded from the candidate columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
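{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to approach the exercise above: loop over all pairs of candidate columns with `itertools.combinations` and compare the training $R^2$ of each pair. A sketch on a small synthetic DataFrame (the columns `a`, `b`, `c` and the target `target = a + b` are made up), where the pair `('a', 'b')` should come out on top:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import itertools\n",
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"pair_rng = np.random.default_rng(2)\n",
"pair_df = pd.DataFrame(pair_rng.normal(size=(40, 3)), columns=['a', 'b', 'c'])\n",
"pair_df['target'] = pair_df['a'] + pair_df['b']\n",
"\n",
"# Exclude the target itself from the candidate features\n",
"pair_candidates = [col for col in pair_df.columns if col != 'target']\n",
"pair_scores = {}\n",
"for pair in itertools.combinations(pair_candidates, 2):\n",
"    cols = list(pair)\n",
"    pair_model = LinearRegression().fit(pair_df[cols], pair_df['target'])\n",
"    pair_scores[pair] = pair_model.score(pair_df[cols], pair_df['target'])\n",
"\n",
"best_pair = max(pair_scores, key=pair_scores.get)\n",
"print(best_pair, pair_scores[best_pair])"
]
},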
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 6**\n",
" * Build a linear regression model that predicts `life_expectancy` from all the other columns.\n",
" * Display the model coefficients. For which features are the coefficients close to 0? Why?\n",
" * Compute the values of both metrics ($R^2$ and RMSE) on the training set.\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
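{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hint for the question about coefficients close to 0, as a sketch on synthetic data: the size of a coefficient depends on the units of its feature. Below, the same feature is used as is and multiplied by 1000 (a hypothetical change of units); the coefficient shrinks by the same factor while the predictions stay the same. Features with very large values, such as `population` or `gdp`, can therefore get coefficients close to 0, so a small coefficient alone does not show that a feature is unimportant."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"scale_rng = np.random.default_rng(3)\n",
"scale_feat = scale_rng.uniform(0, 1, size=(25, 1))\n",
"scale_target = 3.0 * scale_feat[:, 0] + 5.0   # exact linear relation (assumed)\n",
"\n",
"small_units = LinearRegression().fit(scale_feat, scale_target)\n",
"# Same feature, values 1000 times larger\n",
"large_units = LinearRegression().fit(scale_feat * 1000, scale_target)\n",
"\n",
"print(small_units.coef_, large_units.coef_)   # about [3.] and [0.003]\n",
"print(np.allclose(small_units.predict(scale_feat), large_units.predict(scale_feat * 1000)))"
]
},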
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 7**\n",
"Complete one of the exercises 7.1 or 7.2."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 7.1** Implement the $R^2$ metric as a function `r2` (template below). The `r2` function takes two parameters of type *list* and should return the value of the $R^2$ metric."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Real R2: 0.6566838211706549\n",
"Calculated: None\n"
]
}
],
"source": [
"def r2(expected, predicted):\n",
" \"\"\"\n",
" Arguments:\n",
" expected (type: list): the true values\n",
" predicted (type: list): the model's predictions\n",
" \"\"\"\n",
" pass\n",
"\n",
"y = df['life_expectancy'].values\n",
"X = df[['fertility', 'gdp']].values\n",
"\n",
"test_model = LinearRegression()\n",
"test_model.fit(X, y)\n",
"\n",
"print(\"Real R2:\", test_model.score(X, y))\n",
"\n",
"predicted = list(test_model.predict(X))\n",
"expected = list(y)\n",
"\n",
"print(\"Calculated:\", r2(expected, predicted))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ex. 7.2** Implement the $RMSE$ metric as a function `rmse` (template below). The `rmse` function takes two parameters of type *list* and should return the value of the $RMSE$ metric."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Real RMSE: 5.234841906276239\n",
"Calculated: None\n"
]
}
],
"source": [
"def rmse(expected, predicted):\n",
" \"\"\"\n",
" Arguments:\n",
" expected (type: list): the true values\n",
" predicted (type: list): the model's predictions\n",
" \"\"\"\n",
" pass\n",
"\n",
"y = df['life_expectancy'].values\n",
"X = df[['fertility', 'gdp']].values\n",
"\n",
"test_model = LinearRegression()\n",
"test_model.fit(X, y)\n",
"\n",
"print(\"Real RMSE:\", np.sqrt(mean_squared_error(y, test_model.predict(X))))\n",
"\n",
"predicted = list(test_model.predict(X))\n",
"expected = list(y)\n",
"\n",
"print(\"Calculated:\", rmse(expected, predicted))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}