Python2019/labs05/sklearn cz. 1.ipynb
2019-02-10 08:39:58 +01:00


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Kkolejna część zajęć będzie wprowadzeniem do drugiej, szeroko używanej biblioteki w Pythonie: `sklearn`. Zajęcia będą miały charaktere case-study poprzeplatane zadaniami do wykonania. Zacznijmy od załadowania odpowiednich bibliotek."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Zacznijmy od załadowania danych. Na dzisiejszych zajęciach będziemy korzystać z danych z portalu [gapminder.org](https://www.gapminder.org/data/)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('gapminder.csv', index_col=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dane zawierają różne informacje z większość państw świata (z roku 2008). Poniżej znajduje się opis kolumn:\n",
" * female_BMI - średnie BMI u kobiet\n",
" * male_BMI - średnie BMI u mężczyzn\n",
" * gdp - PKB na obywatela\n",
" * population - wielkość populacji\n",
" * under5mortality - wskaźnik śmiertelności dzieni pon. 5 roku życia (na 1000 urodzonych dzieci)\n",
" * life_expectancy - średnia długość życia\n",
" * fertility - wskaźnik dzietności"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**zad. 1**\n",
"Na podstawie danych zawartych w `df` odpowiedz na następujące pytania:\n",
" * Jaki był współczynniki dzietności w Polsce w 2018?\n",
" * W którym kraju ludzie żyją najdłużej?\n",
" * Z ilu krajów zostały zebrane dane?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**zad. 2** Stwórz kolumnę `gdp_log`, która powstanie z kolumny `gdp` poprzez zastowanie funkcji `log` (logarytm). \n",
"\n",
"Hint 1: Wykorzystaj funkcję `apply` (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html#pandas.Series.apply).\n",
"\n",
"Hint 2: Wykorzystaj fukcję `log` z pakietu `np`."
]
},
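{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged illustration of how `apply` behaves, here is a minimal sketch on a made-up toy `Series` (not the lab data), using `np.sqrt` rather than the function the exercise asks for. The names `toy_series` and `toy_sqrt` are assumptions introduced only for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, unrelated to gapminder.csv\n",
"toy_series = pd.Series([1.0, 4.0, 9.0])\n",
"\n",
"# apply runs the given function over the values and returns a new Series\n",
"# of the same length; the original Series is left unchanged\n",
"toy_sqrt = toy_series.apply(np.sqrt)\n",
"print(toy_sqrt.tolist())"
]
},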
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Naszym zadaniem będzie oszacowanie długości życia (kolumna `life_expectancy`) na podstawie pozostałych zmiennych. Na samym początku, zastosujemy regresje jednowymiarową na `fertility`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Y shape: (176,)\n",
"X shape: (176,)\n"
]
}
],
"source": [
"y = df['life_expectancy'].values\n",
"X = df['fertility'].values\n",
"\n",
"print(\"Y shape:\", y.shape)\n",
"print(\"X shape:\", X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Będziemy korzystać z gotowej implementacji regreji liniowej z pakietu sklearn. Żeby móc wykorzystać, musimy napierw zmienić shape na dwuwymiarowy."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Y shape: (176, 1)\n",
"X shape: (176, 1)\n"
]
}
],
"source": [
"y = y.reshape(-1, 1)\n",
"X = X.reshape(-1, 1)\n",
"\n",
"print(\"Y shape:\", y.shape)\n",
"print(\"X shape:\", X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Jeszcze przed właściwą analizą, narysujmy wykres i zobaczny czy istnieje \"wizualny\" związek pomiędzy kolumnami."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f10a1ee1978>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAEKCAYAAADw2zkCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3XuYHFWZ+PHv2z0znZBAEgPEXAioCPwID4nsLFHDspKIi6BR18gqIMrKxWcl3kCC10XZ/f0IYXVVvCzguiCgQpBNuIgigV1BjTuBSUi4rNEFkgmXJBsCE4aeme7390dXJz091dNV3VVdVV3v53nyZKbT1X16kpy3znvec46oKsYYY9ItE3UDjDHGRM+CgTHGGAsGxhhjLBgYY4zBgoExxhgsGBhjjMGCgTHGGCwYGGOMwYKBMcYYoCPqBnh14IEH6mGHHRZ1M4wxJlHWrVu3Q1UPqve8xASDww47jJ6enqibYYwxiSIiT3t5nqWJjDHGWDAwxhhjwcAYYwwWDIwxxmDBwBhjDCkNBjv786zf8iI7+/Mtuc4YY+IuMaWlQVnV28ey2zbQmckwVCxy5fuPZfG8maFdF5ad/Xm27hpg1pTxTJ2Yi6wdxpj2kKpgsLM/z7LbNvDqUJFXKQJwyW0bWHD4gWN2qI1eF5a4BSZjTPKlKk20ddcAnZmRH7kzk2HrroGmrmtl+qgyML2cH+bVoSKX3LbBUlfGmKakamQwa8p4horFEY8NFYvMmjK+4etafZdeDkzlEQrsC0yWLjLGNCpVI4OpE3Nc+f5jGdeZYf9cB+M6M1z5/mPrdqLV1+U6MnzibYeza89gqHfpbiOORgOaMcaMRVQ16jZ40t3drUHtTdTo5OvO/jw3rX2G79z/B7qyWfLDBTIZ4dWhfZ3z/rkObjx3PnMPmdxUG8cacazu7eMSmzMwxnggIutUtbve81KVJiqbOjHXcErluw9sJj+s5IeHSw8URgbTIO7S601YL543kwWHH2jVRMaYwKQyGDTKLV/flRUQIZfdd5de2Tk3MgrZtO0lMsiIx6rnBZoJaMYYU82CgQ9u+frBgvKFU49k/uumjurwK1M9g4UiF550OGfMnz1mJ76qt49LVm4gPzzyfQaGhm1ewBgTmlRNIDeichJ36sQcn337EaOe8/V7/3tUIKguAc0PF/mne/+bt15xH6t7+2q+17LbRgcCgGICpnZshbYxyWUjgzFUT+Ke3j2Ln/x+y6jnuZV2uqWUAPLDWnPBWq1roBQMNm17iROPqHtgUSRsIZwxyWYjgxrcFnfd8NtnGCyMvkUfGCowoSs74jG3lFJZZybDpm0veSobHWnfe8fpLtwWwhmTfKEHAxH5jIhsEpGNIvJjERknIq8TkbUisllEfioiXWG3wy+3Vce1DBeV0779IN++7w97O8Dy2oRcx+jXGBga5rwbejjrurUsWL5mb9porGs6MsKMSaU5g1W9fSxYvmbU9VFpdGW3MSY+Qg0GIjIT+CTQrarHAFngg8By4BuqejiwC/hYmO1oRP279JHc5gQWz5vJby5dyEUnH0GuQ5wFa4KIkB92v4tePG8mdy09gTOPn01XVujKlv6KsgLvuvpBbvrd07G7C7eFcMYkXyvSRB3AeBHpAPYDngUWAiudP78eeG8L2uGL26pjqX/Z3jmByhHC0kVv5DeXLuLGc+dz7dndjOsYmVLKZoQ71m/jzvV9LP/545z27V+zev02QCloqZPNF5RXh4p89Y5NdGTcy06j0ujKbmNMfIQ6gayqfSJyFfAMMAD8ElgHvKiqzqottgKuM40icj5wPsDs2bPDbKqrysVduweG+MRND/NyfrjudW4TyuV1ATv786PuovfkC1x2x2MjHtu7qK36tbOZUfMWcbgLt4VwxiRb2GmiKcB7gNcBM4AJwCler1fVa1S1W1W7DzqosSqaZidap07MMfeQycyZcQADQyM7aI
G9aZxKY3XOlXfRE3JZ1+eMpaDK37/76FjehZd/VnFoizHGn7BLS98O/I+qbgcQkZ8BC4DJItLhjA5mAaHMgAZd7igiVFb0dGSFuz95Aj/f+BxXO/sVua1Crla+i77/iRf48r9vZMBlXcHe98hANpOhKzvyM5wy57V2F26MCUzYweAZ4M0ish+lNNEioAe4H1gC/AT4CLAq6DcO+kCarbsGGNeRZaiwb3QwriPLnsECSxe9kTPmz/bVOU+dmOOkow6mSO3VZLkOYcWSua7pF9uOwhgTpFDTRKq6ltJE8cPAo877XQMsAz4rIpuBqcAPgn7voMsd3beiKLB7YHDv6mS/KZKpE3OsWDIXl0pSsgJfefccFs+baekXY0zo2nYL6539eRYsXzNie+lxnRkeWraw4U61cuvogaFhRKQ0WmgyBbWzP89v/7iTT//0ESozRs22t/o9LK1kTPp43cK6bVcgh1HuuHjeTB5atpDvnHkc2UyGoYIGUus/dWKOQ16zHx0uk9HVI5lGJsRbvUgtTqujjTHetPXeRGGUO06dmGPS+E66spkRG8o1e/TkhK7siFEMwKtDxRHbXDQyIe5n7iSI0YPtUWRMMrV1MIBwJlrDWHG7Z7BALivkK9YQ5LLCnsECUL9Tr9WRez0zOYhOPOhJe2NM67R9MAhDOQVVffRkucNr5A571pTxSEZGnJwmGdkbYGp16pu27aZ3y+69R3FWd+ReAldQnbjXwGOMiR8LBi68dObVKSiA9VteZGPfbi6/6zHfd9j1Aoxbp/7qcIFzr+/ZuyK5vGr5cyv3deT1XheC68RtjyJjksuCQRU/6ZJyZ1u+Jiv70jqN3GGPNcdR3akPFooUikWGXNar5YeL3Lz2GZYuemPd1wX3TjxfKM1X+BnleAk8xph4atvS0kY0Uo7qdk2l/XMd3HjufOYeMjmwNnrZKynXIfzm0kWeO+Jy2SyUJq5zWaEIqCrjOzt8jXKsjNWY+Eh9aWkjGlmoVu/cg8o0SRAll1Mn5va+3mCh9jYWXdmsrwV2i+fN5M4LT6DonK+ZLyhDBWW4yN7y2YtvXc/m51/21MZWLpKzUlZjmmdpogqN5LxrnXswIZelUNS9aZLq9NOX33U0x8yY5OvueWd/npvWPrN3srhQLNKZFbo6MuzJF3y1282ewQK5jiyDBffRxmBBOfXbD3LVkviUi1opqzHBsGBQoZGct9s1Xz7taI6Zua+jd6vW+eLtG5nQlaWg6qkDW9XbxyUrN+xd21CeLM51wPfOPI4t/zswauLa7525lwN9BoeLsSkXtVJWY4JjwaBKIwvV6l1T66D78mRzvQ6s3OnlXXY37cpmmTS+ixPffDCnHNPcTqbVge3V4QLFolJ97HNcykWtlNWY4FgwcOF1oVr1RGmta+rdcdfrwGoFExiZDvLS7nqTu9WBbdeeQU799oMMVgQit3UKUUwYWymrMcGxYNAgvyWo5TvujAivDPrL79cKJp1Z+PJpR3vugL22uTKoTJ2Y46oltVNnjebsgwggVspqTHCstLQBje6IetPvnuard2xCpHRWci4rSEY8daCVO6a+MujsmNqZYbjobc6h2V1c3TrvRl8z6ElfK2U1pjavpaU2MmhAI7nqnf15Lr/rsRHnF6sId114AodP27/mNeVOrpy+2bTtJc67oYf8cJH+vLc5h0ba7JYC8zIX4uXnEPSkrx30Y0zzLBh4VNk5NpKrdus4c9nM3knkarXunt12TM2IsGnbbk484uCad/C7BwYZLHhLT7m9t9sEeVA/B5v0NSZ6Fgw8cOsc/eaqx+o4qzvwse6e3V7nlcEC593Qw990H8It67aOaJPC3rYXtXSmcuWKYrdtrKvf+6Jb15MRRm2E10jO3iZ9jYmnUOcMRORI4KcVD70e+AowGTgP2O48/gVVvXus14pqzmCsvDjgK1ddmfd366zLjx06dQJnXbd2xFYTldtarO7t43Mr3UtNK+U6BJARz+vKCl8/fR
5vecNU1zav3/LiqPeuVj0v4Ddn7/ZzsIVixoQjFnMGqvokMM9pUBboA24HzgG+oapXhfn+Qdi6a4CsyIjHymkNv1suuO10Wg40lSOAOy88Ycy758XzZjJ5vy4+/qN1vDLknmYCyEoGRjadwYLy2Vt6ueoDc107YC8Lz6rTOn5z9mEcOmSMaU4r9yZaBPxRVZ9u4Xs2bWPf7lF5/WbSGpX79tTaC2nPYKHukZ1zZhxAkbFHdQUtUiiOfs5gQWse01l9XGiuI0NH1b+SINI6rd6/yBgztlbOGXwQ+HHF9xeKyNlAD3CRqu6qvkBEzgfOB5g9e3ZLGlmpXAFUrbq2v9HSxrHy53MPmTzq7rn6farz9ad3z+KWnpFzBgAXr9wwYtEYjD1pW33n/s1f/Tc3/O6ZvX9+evcs68SNaTMtWWcgIl3ANmCOqj4vItOAHYAClwPTVfVvx3qNKOYM3PLnE3JZbj73zcyaMp6tuwbGPMzGS5Dwmj+vVV3kNvlc/Z6bn3+ZU7/16xFlrZV5/7Ha2ez6BGNMtGIxZ1DhncDDqvo8QPl3ABG5FrizRe3wxe3OvVBUNvbt5m+u+S0dGdlb619d9fPg5h2eFlZVrh8AZc6MSaOeU682v/oQHLeJ3as+MNe16qfeArAgSkHTuCgsjZ/ZJFurgsGHqEgRich0VX3W+fZ9wMYWtcOXylRMNiMMFZTPvv0ILr/rsZqH2ZTPJfazsKpe4GikQ3br5B9attBzCWszawnqtaPdK4fS+JlN8oU+gSwiE4CTgZ9VPHyliDwqIhuAk4DPhN2ORi2eN5Mvn3Y0Q8NFOjPCVb98csznDwwNA+L5kJzKDrl8iEz15K7fDrnWawIjJm29HOZTPaHsNpldi5fP1m7S+JlNewh9ZKCqe4CpVY99OOz3DUrlNhJ7V/BW7+lcQUSYMWmc587by12/38VdtU44qx5JeA0yjZaCpnG1cRo/s2kPtgK5DvdtJAQVoUNkVJ1/ZzbDtt0DnjvvMDrkCV3ZUWmsV4dKB9zDyHy213Y2sv9PGlcbp/Ezm/ZgwaAO10VYAncvPYEnnnuJz96yYcRZxOWtIVYsmTsqR+9m1LzEcJFz3npYzed66ZD3DBbIZYV8xQgmlxX2DBY8zSUEJY1bTKfxM5v2YFtYe7C6t4/P3tJLuVS/Myt86PhDuKVnK1rUEZ1uWa5DuPbsbubMmOSpI7jpd0/zldUbKceVjgx8/fR5DU087uzP89Yr7iM/PLKU9M4LT+BdVz/Y8jLRNFbWpPEzm3iKW2lpoi04/ECymQzDzghhqKDc8NtnxrwmP6x8/MaHKXo443hnf56v3fkYFQMMhovwuZXrG9ra+cHNO6hceNyRgSvffyx7BguR5LPTuMV0Gj+zSbZWbkeRWFt3DdCV9f+jemWw4KmaZOuuAbIZGfV4VtwrkMZSrmYZqhitZDOZmjueWj7bGAMWDDzxsnlbVkrpo3HVG/lQu6y08vXd9hAqqP+O2q1ctCu77+6/0TJR05id/XnWb3nRSktN7FmayIPKSUEoVeZkgQKluYFCURERch2lQ2c6szLizrze3ffUiTlWLDmWi25dv/e6rMCKJXMDr+CxHUNbxxafmSSxkYFHi+fN5M4LT6Do3MGXC0rzw4pqaR6hP19gqKCoKrkOf3ffi+fN5LJ3z6EzC+M7MnRkR6eNvPBy9287hobPFp+ZpLGRgQ97Bgt0ZjOjjo+sLiYa39nBd848jknjO8e8+97Zn9+7J9GMSeO5/K7HGCrAUJNnAyf17r+dKnBs8ZlJGgsGPsyaMp6hwthzB1BKy8yZccCY/+lX9fZxUUW5akdGqB4MNNN5JK2apd1SKjZZb5LG0kQ+TJ2Y4+/fPWfU451ZIdchntNCO/vzXLJyPcMjSklHr1dIS+fRjikVm6w3SWMjA5/OfPOhIPDVOx6jM1uaPL7y/cf6SsuUjtLMsG/moaRUvqrkOrKpWrnqJ6USVSqpkfdNarrOpJMFgwacOf
9QTpnz2lH/yb3+Z581ZTwFHZ1uEoG7lv4FewYLqeo8vKZUokolNfO+SUvXmfSyNFGDmqnIKZWSzh1xtnBnVlix5FgOn7Z/IJU+Sapv95JSiSqV1I4pLGPc2MggIm4nnAV1B+nnTjaotEuzr1MvpRJVdY5VBZm0sGAQkEY6w6kTc5x4xEGBt8PrKWvVQePLpx3NMTMn+e7Qg0rfjJVSiao6x6qCTFpYMAhAnMoi693JloPWhK7sqKDxxX/fyMRcluFi/c31ytyCz8W3rufo6Qdw+LT9A/tcrdwaujqw25bUJg08BwMR+RnwA+Dnqi6zn+7XHAn8tOKh1wNfAW5wHj8MeAo4XVV3eW1LHIzVqTa6WCwIY93JVgatfKGIuGxf3p8vVTh5/QxuwWewoJz67Qe5akmwQbEV1Tm1ArtVBZl252cC+bvAGcAfROQKp6Mfk6o+qarzVHUe8GfAK8DtwKXAfar6RuA+5/vEWNXbx4LlazjrurWc+q1fj/rzehvThanWZCwwYiJ0cLjoeg5DmdfPUGsTv8FhfxOtXie8w9xKY6zJYtvCIzmSVDwRJ55HBqr6K+BXIjIJ+JDz9RbgWuBGVR2q8xKLgD+q6tMi8h7gbc7j1wMPAMt8tj0SbmmR6v0oos4pu93Jrt/y4qg7+K6soKp0dWTZMzhyzYPXz1AOPhffup7Bqp+D14nWMNJsjczh2GRx8sUpZZs0vkpLRWQq8FHgXOAR4JvAccC9Hi7/IPBj5+tpqvqs8/VzwLQa73e+iPSISM/27dv9NDU0bltE57JCl8+N6cJWfSfrdgc/WFAQ4ey3HMo/vu+YhlfLLp43k7s/+Rd0VW3f7SWghFG6WTlyW7B8Dat7+zxdZ5PFyWZlwM3xHAxE5Hbg18B+wLtVdbGq/lRVlwIT61zbBSwGbq3+My2du+mar1DVa1S1W1W7Dzoo2KqbRrl1GJIR7l56AjeeO5+Hli0M9E4kqCFv+Q4+1zFyA6ShgvK9//gTKDy0bGHDn+Hwaftz1RL/2y+4Bddm0mzNdAi2hUSyBf1vKW38VBN9S1Xvd/sDD+drvhN4WFWfd75/XkSmq+qzIjIdeMFHOyJVq7okyMqZsqCHvIvnzWTyfp1c8KN1DAyNDGhfvWMTpxzzWuYeMrmp1/c70Rr03XizqR6bLE4uG9k1x08wOFpEHlHVFwFEZArwIVX9rodrP8S+FBHAauAjwBXO76t8tCNyregw/KwX8GPOjEmjttwG6MwGkxv3u/1C0KWbQXQItoVEMlkZcHP8BIPzVPU75W9UdZeInEepyqgmEZkAnAxcUPHwFcAtIvIx4GngdB/tiIWwO4ywJjNLO68ezRdv3zji8YJqZHdQQQZX6xDSzUZ2jfMTDLIiIk6OHxHJAl31LlLVPcDUqsd2UqouMjWEOeQ9c/6hoKXUUGc2Q0E18g4zyOBqHUK62ciuMX6CwT3AT0XkX5zvL3AeMyEI+g63utTyzDcfyinHjN55tVFxO6XMOgRj/BF1WYXq+kSRDKUAUL6jvxe4TlULta8KTnd3t/b09LTirWIliE427Nprq+02Jr5EZJ2HIh/vwSBqcQ8GcbszLtvZn2fB8jW8WlE9NK4zw0PLFgbSzrBf32sb4vizD1taP7fxx2sw8LM30QLgMuBQ5zqhtEzg9Y02Monc/gPG+c447FW1Ua/ajfPPPkxp/dwmPH7mDH4AfAZYR/V5jSnh9h9wweEHxmqjumph115HWdsdVvlt3KX1c5tw+dmOYreq/lxVX1DVneVfobUsZmqtbN207aVYr3oMe1VtlKt207riNK2f24TLz8jgfhFZAfwM2Lu2X1UfDrxVMVQrHQIa+1WPtUotd/bn2bRtNyDMmXFAwx14VKWcaV1xOtbntnkE0yg/wWC+83vlRIQCC4NrTnzV+g84Z8akRCxyqi61XNXbx8W3rmfIWY7ckYGvnz6v4bxzFKWcaV1gVutzP7h5h80jmIZZNZEPq3v7Rv0HLP9nS9
Id2c7+PG+9Yg354ZHBLdch/ObSRbFvf7Uk/eyDVPm5gcirukw8BV5N5LzoacAcYFz5MVX9mv/mJdNY6ZAkLXLaumuAbEZGPZ6VZO7dH6effSsDU+Xndjuvws5iMH74KS39PqXtq08CrgOWAL8PqV2xFaeOp1GzpoynUBw9Iixo++fbwxRluWda509McPxUE71VVc8GdqnqV4G3AEeE0ywTpqkTc6xYciyd2X2jg44MrFgyN/GBLipRH6xiZzGYZvlJE5Xr1l4RkRnATmB68E0yrVBOeQVRTVRPGnL6btVmWRHuf+IFTjrq4JZ8btugzzTDTzC4U0QmAyuAhylVEl0XSqtMS0ydmOPEIw4O9T3SslLWLU2zZ7DAZXds4kurNrbsc7dDGtNEw0+a6EpVfVFVb6O0JcVRwD+E0yzTDqJOnbRSZZpmQld27+P9+UJbf+5agjqu1bSOn5HBb4HjAFQ1D+RF5OHyY8ZUi3rfolYrp2nuf+IFLrtjE/35fbu2NPq5k5hiS8tosN3UDQYi8lpgJjBeRN5EaYM6gAMoVRcZ4yqNFS5TJ+Y46aiD+dKqkSfJNfK5k9ip2r5JyeUlTfRXwFXALOCfKn59BvhCvYtFZLKIrBSRJ0TkcRF5i4hcJiJ9ItLr/Dq1mQ9hWsPv0D+tFS5BfO6kpths36TkqjsyUNXrgetF5P3OfIFf3wTuUdUlItJFaTTxV8A3VPWqBl7PNKDZdMOq3j4uWbmBbEYoFJUVS7zdpaa1wqXZz53UFFsaR4Ptws+cwZ+JyH2q+iKAiEwBLlLVL9W6QEQmAScCHwVQ1UFgUGT06lcTnkbSDdVbHVTuYwRw0a3rPQ/901rh0sznTmqnmtb9otqBn2DwTlXdmxZS1V1OeqdmMABeB2wHfigicymdhfAp588uFJGzgR5KQWVX9cUicj5wPsDs2bN9NNWUNZLDrQ4e57z1sBGBAGCooGzatjv00tS0SnKnmtbRYNL5CQZZEck5lUSIyHig3t9yB6Vqo6WqulZEvglcClwNXE5prcLllOYg/rb6YlW9BrgGShvV+WircfhNN7gFjx88+FSNV7cRXpiS3KmmdTSYZH6CwU3AfSLyQ+f7c4Dr61yzFdiqqmud71cCl6rq8+UniMi1wJ0+2mF88JtucA0eHUKhCJWDg44MzJlxQCht9iuJ5ZdeNdKptvPPw4THczBQ1eUish54u/PQ5ar6izrXPCciW0TkSFV9ElgEPCYi01X1Wedp7wM21n4V0wy/6Qa34FEoKl97zzF87c5NZCVDQYux2ccoieWXYbKfh2mUr/MMRORQ4I2q+isR2Q/IqurLda6ZR2nbii7gT5RGFN8C5lFKEz0FXFARHFzF4TyDJPNzt1jr3Ia43XHu7M/bHv4V7Odh3AR+noGInEdpMvc1wBsoLUT7PqW7/ZpUtZeRp6MBfNjr+5pg+Ek31MpVxy0PnNTyy7C4/TwyCJu2vcSJRxwUYctMEvjZm+gTwALgJQBV/QNgpSRtaurEHHMPmRzrTtUtpTVYKMS+/DIsbj+PV4YKnHdDD6t7+yJqlUkKP8Eg76wTAEBEOiileYyJRHk+pKPiX3FR4aHNO6JrVITKP49cx8gqr/xwMlYvm2j5CQb/ISJfoLRH0cnArcAd4TTLGG8WHH4g2YrtD4YKmuqOb/G8mVx7djf7VeycCrYlhKnPTzC4lNICskeBC4C7GXvBmTGh27prgK6s7YVTac6MSRSrCkOSsHrZRMtPaWlRRK4H1lJKDz2pfkqRjAlBUrdtCFOSVy+b6PipJjqNUvXQHyktPX2diFygqj8Pq3HG1BPHji8OJbhJXr1souFnBfI/ASep6mYAEXkDcBdgwcBEKk4dX9CLvpoJLHErBTbx5icYvFwOBI4/AWMuODOmVVrZ8dXqoIM+2MVWE5tW8hMMekTkbuAWSnMGHwD+S0T+GkBVfxZC+4xpWpBpG7cOujwq2T0wGNgiODsxzLSan2AwDnge+Evn++
3AeODdlIKDBQNTV6vz6UHeXbt10Bfdup6MQFc2y2ChQLGqpKLRyWxbXW1azU810TnVj4lIV+VCNGPG0uq0R9B3124ddPmch/zwMFDazTXXkaEr29xktlVJmVbzvM5ARB4QkcMqvv9z4L9CaJNpQ1Gc6Rv0ebxuHXS18Z0dXHt2NzeeO5+Hli1sONil9fxoEx0/aaL/B9wjIt+itEndqZR2IDWmriDTHl5TTUHfXVeXsQ4WihSKRYYr3mKoWGTOjAMC6bTjVCVl2p+fNNEvROTjwL3ADuBNqvpcaC0zbSWojtlPqimMNQjVHfRDm3eEtsYhDusVTHp4Ps9ARL4MnE5pG+tjgc9QOrv4rvCat4+dZ5B8tc5J8KrR/frD7lTDeH0rKzVBCfw8A2AqcLyqDgC/FZF7KB1a05JgYJKv2bRHo6mmsNcgBP36cSwrtVFK+/OTJvo0gIjsp6qvqOrTwMmhtcy0pWY6zrRU2MStrNRGKengp5roLSLyGPCE8/1cEfmuh+smi8hKEXlCRB53Xuc1InKviPzB+X1KE5/BpERaKmyCDno7+/Os3/Iim59/mfVbXvRVwRVFFZiJhp800T8DfwWsBlDV9SJyoofrvgnco6pLRKQL2A/4AnCfql4hIpdS2h57mb+mmzRKQ4VNkBPf5bt6LSr5gtKVFURgxZK5nu7u4zZKMeHxEwxQ1S0iI05RKoz1fBGZBJwIfNS5fhAYFJH3AG9znnY98AAWDIxHadiALYigV3lXXzboLJL7zE97mbxfV90y2LSk5oy/w222iMhbARWRThG5GHi8zjWvo7RtxQ9F5BERuU5EJgDTVPVZ5znPAdPcLhaR80WkR0R6tm/f7qOp7ak83Lchejo0ew6126K7soLCBTf0sGD5mjHPR05Las74Ky09kFLK5+2UzjP4JfApVd05xjXdwO+ABaq6VkS+CbwELFXVyRXP26WqY84bpL201CbxjF9upbhuoizPtSql8HktLfU8MlDVHap6pqpOU9WDVfWsykAgIp93uWwrsFVV1zrfrwSOA54XkenOddOBF7y2I43SOolnI6HmlO/qcx0y5vO8bNHR7CjFzarePhYsX8NZ162tO0Lxw/7dNMbXnEEdH6C0ZcVeqvoaVNV8AAAPB0lEQVSciGwRkSNV9UlgEfCY8+sjwBXO76sCbEfbSeMkno2EglGee7h57TNcff9mMgIDQ9HPAYS1lsL+3TQuyGBQ6/ZjKXCTU0n0J0r7GWWAW0TkY8DTlFY2mxrSNokXx0VXSVOdflm66I2cMX82W3cNsLFvN5ff9Vikx4SGcYNj/26aE2QwcJ18UNVewC1ftSjA925rcTznN0xxGQklNZ9d6+64XIU195DJnHLMayP9bGHc4AT17yapf+/NasXIwAQgDfX1ZXEYCSU13eD17jjq8twwbnCC+HeT1L/3IAQZDG4N8LWMi6j/A7dK1COhJKcb4jKq8iLoG5xm/90k+e89CJ6DgYgcAXyP0hqBY0TkWGCxqv4DgKr+35DaaFIoypFQkjrUanEYVflR/nmWq5ma/fk28+8mjn/vrUxZ+RkZXAt8DvgXAFXdICI3A/8QRsOMiWoklLQOtVLUoyq/wkjLNPrvJm5/761OWflZgbyfqv6+6rHhIBtjTBzEedWtlxr6xfNm8tCyhZ6O3oyyJj+M9TPNfJ44/b1HsbbIz8hgh4i8AadqSESWAM+OfYkxyRTHCXu/p7zVa3PUk6VBp2WC+Dxx+XuPImXlJxh8ArgGOEpE+oD/Ac4MpVXGxECcJuyDntx0e73PrVzf0snSINMyQf584vD3HkXKqm6aSEQ+5Xw5XVXfDhwEHKWqJzgH3BhjQua26ZyXbST8vF5+WLl57TMNt9GvINMyQf98ohZFysrLyOAcShvUfRs4TlX3hNYaY4yroO8UZ00Zz2Bh9A70V9+/mTPmz27ZnXFQaZnS54nP5G8QWp2y8jKB/LiI/AE4UkQ2VPx6VEQ2hNo6YwzQ/J
1i9cTq1Ik5LjzpjaOe15Vt/d10EJvgPbh5B4WKYNmZldhM+jcjjA0Ca6k7MlDVD4nIa4FfAItDb5ExxlWjd4q1JlbPmD+bq+/fTH54XyeaxLvp8nxBxccgI7Dg8AOja1QCeSotVdXnVHWuqj5d/SvsBhpj9vF7pzhWieLUiTlWLIlHKWUz3OYLurLZxM4XRKXuyEBEblHV00XkUUZuRieAquqxobXOGNOUeiWKcSmlbER5de6ErmysFosllZcJ5HI10bvCbIgxJnheJp7jUErpV3Xq6/TuWdzSszURq67jysucwbPO75YSMiZhkrY9hRduawpu6dnKnReewJ7BQuJGOHHhJU30Mu5nFZTTRAcE3ipjzJj8bGCW5FSQm1qprz2DBeYeMnmMK81YvIwM9m9FQ4wx3jSy7UISU0G1xG1DuXbhZ6O6hojIU86ahF4R6XEeu0xE+pzHekXk1LDbYUw7iGIDs7iJ04Zy7STIw23GcpKq7qh67BuqelWL3t+YthDHPfej0G6przhoVTAwxgTAUiT7tFPqKw5CTxNRmnz+pYisE5HzKx6/0NnW4l9FZIrbhSJyvoj0iEjP9u3bW9BUY+LNUiQmLKLqVigU4BuIzFTVPhE5GLgXWAo8CeygFCgup7Qj6t+O9Trd3d3a09MTaluNSYpWHoeYNu32sxWRdaraXe95oaeJVLXP+f0FEbkdOF5V/7P85yJyLXBn2O0wpp20OkXSbh1kLVEf+BOlUIOBiEwAMqr6svP1O4Cvicj08mI24H3AxjDbYYxpXFo6yKAPEEqasEcG04DbRaT8Xjer6j0i8iMRmUcpTfQUcEHI7TDGNCBNHWTaK7VCDQaq+idgrsvjHw7zfY0xJW7pHT8pnzR1kGmv1LLSUpOafHDauKV3FHylfNLUQbbjPk5+hF5NFBSrJgpHs/lgCyTxtLM/z4Lla3h1aF9HnuvIAEp+eN//+XGdGR5atnDMv7vVvX2jOsh2nDMoa7d/07GpJjLx1Ww+OC0Ti0nklt7JZgRUgH1nH3tJ+aRttW9aF7O1YtGZiSm3E6LKnUM9tkdOvLmldwpFpaCNpXxaeRZvterzm004LBikWDP54GYCiQmf20rlFUuOZcWSuYlavbyqt48Fy9dw1nVrWbB8Dat7+6Ju0gjtFKgsTZRizUyYpWliMalqpXfCTvkElXOPe1lru6VJLRikXKP54LRXXiSFW/47zJx4kB1knMta4x6oGmHBwDTcOaRtYtGMLegOMs6jzzgHqkbZnIFpSpQTiyZegp5HivMOrXEOVI2ykYExJhBhdJBxHX22Y5rUgoExJhBhdZBxrfuPa6BqlAUDY0xg2q2DrCeugaoRFgyMMYFy6yDbbYuHdmTBwBgTqnarx29XVk1kjAmNbVuSHBYMjDGhsW1LksOCgTEmNO1Yj9+uQg8GIvKUiDwqIr0i0uM89hoRuVdE/uD8PiXsdhhjWi/OC8fMSK2aQD5JVXdUfH8pcJ+qXiEilzrfL2tRW4wxLZS2ctOkiqqa6D3A25yvrwcewIKBMW2rnerx21Ur5gwU+KWIrBOR853Hpqnqs87XzwHT3C4UkfNFpEdEerZv396CphpjTDq1YmRwgqr2icjBwL0i8kTlH6qqiojrQcyqeg1wDZTOQA6/qcYYk06hjwxUtc/5/QXgduB44HkRmQ7g/P5C2O0wxhhTW6jBQEQmiMj+5a+BdwAbgdXAR5ynfQRYFWY7jDHGjC3sNNE04HYRKb/Xzap6j4j8F3CLiHwMeBo4PeR2GGOMGUOowUBV/wTMdXl8J7AozPc2xhjjna1ANsYYY8HAGGOMBQNjjDFYMDDGGIMFA2OMMVgwMCm0sz/P+i0v2gErxlSwYy9NqtgRjMa4s5GBSQ07gtGY2iwYmNSwIxiNqc2CgUkNO4LRmNosGJjUsCMYjanNJpBNqtgRjMa4s2BgUseOYDRmNEsTGWOMsWBgjDHGgoExxhgsGBhjjKFFwUBEsiLyiIjc6X
z/byLyPyLS6/ya14p2mPSw/YeM8adV1USfAh4HDqh47HOqurJF729SxPYfMsa/0EcGIjILOA24Luz3Msb2HzKmMa1IE/0zcAlQrHr8H0Vkg4h8Q0Rci75F5HwR6RGRnu3bt4feUBMfjaZ5bP8hYxoTajAQkXcBL6jquqo/+jxwFPDnwGuAZW7Xq+o1qtqtqt0HHXRQmE01MbKqt48Fy9dw1nVrWbB8Dat7+zxfa/sPGdOYsEcGC4DFIvIU8BNgoYjcqKrPakke+CFwfMjtMAnRbJrH9h8ypjGhTiCr6ucpjQIQkbcBF6vqWSIyXVWfFREB3gtsDLMdJjnKaZ5XK7KK5TSP1w7d9h8yxr+o9ia6SUQOAgToBT4eUTtMzASV5rH9h4zxp2XBQFUfAB5wvl7Yqvc1yVJO81xSVRpqHbsx4bJdS03sWJrHmNazYGBiydI8xrSW7U1kjDHGgoExxhgLBsYYY7BgYIwxBgsGxhhjAFHVqNvgiYhsB54O4KUOBHYE8DqtlsR2J7HNYO1utSS2O0ltPlRV627ulphgEBQR6VHV7qjb4VcS253ENoO1u9WS2O4ktrkeSxMZY4yxYGCMMSadweCaqBvQoCS2O4ltBmt3qyWx3Uls85hSN2dgjDFmtDSODIwxxlRJTTAQkX8VkRdEJDEH6YjIISJyv4g8JiKbRORTUbfJCxEZJyK/F5H1Tru/GnWbvBKRrIg8IiJ3Rt0Wr0TkKRF5VER6RaQn6vZ4JSKTRWSliDwhIo+LyFuiblM9InKk83Mu/3pJRD4ddbuCkJo0kYicCPQDN6jqMVG3xwsRmQ5MV9WHRWR/YB3wXlV9LOKmjck5wW6CqvaLSCfwIPApVf1dxE2rS0Q+C3QDB6jqu6JujxfOsbLdqpqUuncAROR64Neqep2IdAH7qeqLUbfLKxHJAn3AfFUNYg1UpFIzMlDV/wT+N+p2+OGcFf2w8/XLwOPAzGhbVZ9zvnW/822n8yv2dx0iMgs4Dbgu6ra0OxGZBJwI/ABAVQeTFAgci4A/tkMggBQFg6QTkcOANwFro22JN066pRd4AbhXVZPQ7n8GLgGK9Z4YMwr8UkTWicj5UTfGo9cB24EfOmm560RkQtSN8umDwI+jbkRQLBgkgIhMBG4DPq2qL0XdHi9UtaCq84BZwPEiEuvUnIi8C3hBVddF3ZYGnKCqxwHvBD7hpETjrgM4Dvieqr4J2ANcGm2TvHPSWouBW6NuS1AsGMSck3O/DbhJVX8WdXv8cob+9wOnRN2WOhYAi538+0+AhSJyY7RN8kZV+5zfXwBuB46PtkWebAW2VowYV1IKDknxTuBhVX0+6oYExYJBjDkTsT8AHlfVr0fdHq9E5CARmex8PR44GXgi2laNTVU/r6qzVPUwSsP/Nap6VsTNqktEJjjFBThplncAsa+YU9XngC0icqTz0CIg1oURVT5EG6WIIEVnIIvIj4G3AQeKyFbg71X1B9G2qq4FwIeBR538O8AXVPXuCNvkxXTgeqfaIgPcoqqJKdVMmGnA7aX7BjqAm1X1nmib5NlS4CYn5fIn4JyI2+OJE3RPBi6Iui1BSk1pqTHGmNosTWSMMcaCgTHGGAsGxhhjsGBgjDEGCwbGGGOwYGBSTkQ+6eyYeZPH5x8mImdUfN8tIt9yvv6oiFztfP1xETm74vEZYbTfmKCkZp2BMTX8HfB2Vd1a74ki0gEcBpwB3Aygqj3AqG2jVfX7Fd9+lNJCsG3NN9eYcFgwMKklIt8HXg/8XER+ArwBOIbSLquXqeoqEfko8NfARCAL5ID/4ywCvB54BLi4ertrEbmM0pbpT1HaEvsmERkAvgicp6rvdZ53MvB3qvq+cD+tMWOzNJFJLVX9OKW79ZOACZS2oDje+X5FxS6axwFLVPUvKW2m9mtVnaeq3/DwHispjRzOdDbuuxs4SkQOcp5yDvCvQX4uYxphwcCYkncAlzp3/A8A44DZzp/dq6qBnIWhpSX/PwLOcv
Zvegvw8yBe25hmWJrImBIB3q+qT454UGQ+pe2Vg/RD4A7gVeBWVR0O+PWN8c1GBsaU/AJY6uwUi4i8qcbzXgb29/naI65R1W2U0lNfohQYjImcBQNjSi6nNHG8QUQ2Od+72QAURGS9iHzG42v/G/B95wD18c5jNwFbVPXxZhptTFBs11JjIuCsR3gkAduom5SwYGBMi4nIOkrzECeraj7q9hgDFgyMMcZgcwbGGGOwYGCMMQYLBsYYY7BgYIwxBgsGxhhjsGBgjDEG+P/sX2GfauxyHQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df.plot.scatter('fertility', 'life_expectancy')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**zad. 3** Zaimportuj `LinearRegression` z pakietu `sklearn.linear_model`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tworzymy obiekt modelu regresji liniowej."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'LinearRegression' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-8-28d4d21f64c7>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmodel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLinearRegression\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mNameError\u001b[0m: name 'LinearRegression' is not defined"
]
}
],
"source": [
"model = LinearRegression()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Trening modelu ogranicza się do wywołania metodu `fit`, która przyjmuje dwa argumenty:"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.fit(X, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Współczynniki modelu:"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Wyraz wolny (bias): [82.92548061]\n",
"Współczynniki cech: [[-4.29315763]]\n"
]
}
],
"source": [
"print(\"Wyraz wolny (bias):\", model.intercept_)\n",
"print(\"Współczynniki cech:\", model.coef_)"
]
},
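{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feel for what `intercept_` and `coef_` mean, here is a minimal sketch on synthetic data (an assumption made for illustration, not the gapminder data). The points are generated exactly from $y = 3x + 2$, so the fitted model should recover a slope close to 3 and an intercept close to 2. The names `toy_X`, `toy_y` and `toy_model` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Synthetic data lying exactly on the line y = 3x + 2\n",
"toy_X = np.arange(10, dtype=float).reshape(-1, 1)  # shape (10, 1)\n",
"toy_y = 3.0 * toy_X[:, 0] + 2.0                    # shape (10,)\n",
"\n",
"toy_model = LinearRegression()\n",
"toy_model.fit(toy_X, toy_y)\n",
"\n",
"# coef_ holds one slope per feature, intercept_ holds the constant term\n",
"print('slope:', toy_model.coef_)\n",
"print('intercept:', toy_model.intercept_)"
]
},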
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**zad. 4** Wytrenuj nowy model `model2`, który będzie jako X przyjmie kolumnę `gdp_log`. Wyświetl parametry nowego modelu."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mając wytrenowany model możemy wykorzystać go do predykcji. Wystarczy wywołać metodę `predict`."
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"input: 6.2\t predicted: 56.30790328839987\t expected: 52.8\n",
"input: 1.76\t predicted: 75.36952317631349\t expected: 76.8\n",
"input: 2.73\t predicted: 71.2051602728729\t expected: 75.5\n",
"input: 6.43\t predicted: 55.32047703294488\t expected: 56.7\n",
"input: 2.16\t predicted: 73.6522601233483\t expected: 75.5\n"
]
}
],
"source": [
"X_test = X[:5,:]\n",
"y_test = y[:5,:]\n",
"output = model.predict(X_test)\n",
"\n",
"for i in range(5):\n",
" print(\"input: {}\\t predicted: {}\\t expected: {}\".format(X_test[i,0], output[i,0], y_test[i,0]))"
]
},
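{
"cell_type": "markdown",
"metadata": {},
"source": [
"`predict` also accepts values that were not part of the training data, as long as the input is a 2-D array with one column per feature. A minimal sketch, again on synthetic data with hypothetical names (`toy_X`, `toy_y`, `toy_model`, `new_points`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Synthetic training data on the line y = 2x + 1\n",
"toy_X = np.array([[0.0], [1.0], [2.0], [3.0]])\n",
"toy_y = 2.0 * toy_X[:, 0] + 1.0\n",
"\n",
"toy_model = LinearRegression().fit(toy_X, toy_y)\n",
"\n",
"# Two unseen inputs; the array has shape (2, 1): two samples, one feature\n",
"new_points = np.array([[4.0], [10.0]])\n",
"print(toy_model.predict(new_points))"
]
},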
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sprawdzenie jakości modelu - metryki: $R^2$ i $MSE$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Istnieją 3 metryki, które określają jak dobry jest nasz model:\n",
" * $R^2$: [Współczynnik determinacji](https://pl.wikipedia.org/wiki/Wsp%C3%B3%C5%82czynnik_determinacji)\n",
" * $MSE$: [błąd średnio-kwadratowy](https://pl.wikipedia.org/wiki/B%C5%82%C4%85d_%C5%9Bredniokwadratowy) \n",
" * $RMSE = \\sqrt{MSE}$"
]
},
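{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the usual textbook definitions are given below. The notation is an assumption introduced here and does not come from the original notebook: $y_i$ are the true values, $\\hat{y}_i$ the model predictions, $\\bar{y}$ the mean of the true values and $n$ the number of samples.\n",
"\n",
"$$MSE = \\frac{1}{n}\\sum_{i=1}^{n}(y_i - \\hat{y}_i)^2$$\n",
"\n",
"$$RMSE = \\sqrt{MSE}$$\n",
"\n",
"$$R^2 = 1 - \\frac{\\sum_{i=1}^{n}(y_i - \\hat{y}_i)^2}{\\sum_{i=1}^{n}(y_i - \\bar{y})^2}$$\n",
"\n",
"Lower $MSE$ and $RMSE$ mean a better fit, while $R^2$ equals 1 for a perfect fit and 0 for a model that always predicts $\\bar{y}$."
]
},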
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R^2: 0.5793180095847132\n",
"Root Mean Squared Error: 5.77824860865276\n"
]
}
],
"source": [
"from sklearn.metrics import mean_squared_error\n",
"\n",
"print(\"R^2: {}\".format(model.score(X, y)))\n",
"rmse = np.sqrt(mean_squared_error(y, model.predict(X)))\n",
"print(\"Root Mean Squared Error: {}\".format(rmse))"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"R^2: 0.4881890528165924\n",
"Root Mean Squared Error: 5.889906845544505\n"
]
}
],
"source": [
"# Import necessary modules\n",
"from sklearn.linear_model import LinearRegression\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Create training and test sets\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state=42)\n",
"\n",
"# Create the regressor: reg_all\n",
"reg_all = LinearRegression()\n",
"\n",
"# Fit the regressor to the training data\n",
"reg_all.fit(X_train, y_train)\n",
"\n",
"# Predict on the test data: y_pred\n",
"y_pred = reg_all.predict(X_test)\n",
"\n",
"# Compute and print R^2 and RMSE\n",
"print(\"R^2: {}\".format(reg_all.score(X_test, y_test)))\n",
"rmse = np.sqrt(mean_squared_error(y_test, y_pred))\n",
"print(\"Root Mean Squared Error: {}\".format(rmse))\n"
]
},
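{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above evaluates the model on rows it did not see during training. As a minimal sketch of what `train_test_split` does to the array shapes (the toy data and names such as `toy_X` are assumptions for illustration): with `test_size=0.30`, roughly 30% of the rows go to the test set and the rest to the training set, and `random_state` makes the shuffle reproducible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic data: 20 samples with one feature each\n",
"toy_X = np.arange(20, dtype=float).reshape(-1, 1)\n",
"toy_y = 2.0 * toy_X[:, 0]\n",
"\n",
"toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(\n",
"    toy_X, toy_y, test_size=0.30, random_state=42\n",
")\n",
"\n",
"# Features and targets are split along the same rows\n",
"print('train:', toy_X_train.shape, toy_y_train.shape)\n",
"print('test:', toy_X_test.shape, toy_y_test.shape)"
]
},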
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Regresja wielu zmiennych"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Model regresji liniowej wielu zmiennych nie różni się istotnie od modelu jednej zmiennej. Np. chcąc zbudować model oparty o dwie kolumny: `fertility` i `gdp` wystarczy zmienić X (cechy wejściowe):"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(176,)\n",
"(176, 1)\n"
]
},
{
"data": {
"text/plain": [
"0.6421567875738732"
]
},
"execution_count": 149,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = df[['fertility', 'gdp']]\n",
"print(df['fertility'].shape)\n",
"print(df[['fertility']].shape)\n",
"\n",
"model_mv = LinearRegression()\n",
"model_mv.fit(X, y)\n",
"\n",
"model_mv.score(X, y)"
]
},
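{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the same idea on synthetic data (hypothetical names, not the gapminder columns): the target is built as $y = 1 + 2x_1 - 3x_2$, so a two-feature model should return one coefficient per column of `toy_X`, close to 2 and -3, and an intercept close to 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Synthetic data: 50 samples, 2 features, noise-free linear target\n",
"rng = np.random.RandomState(0)\n",
"toy_X = rng.rand(50, 2)\n",
"toy_y = 1.0 + 2.0 * toy_X[:, 0] - 3.0 * toy_X[:, 1]\n",
"\n",
"toy_model_mv = LinearRegression().fit(toy_X, toy_y)\n",
"\n",
"print('coefficients:', toy_model_mv.coef_)\n",
"print('intercept:', toy_model_mv.intercept_)\n",
"print('R^2:', toy_model_mv.score(toy_X, toy_y))"
]
},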
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**zad. 6** Która kombinacja dwóch kolumn daje najlepszy wynik w metryce $R^2$? Tak jak poprzednio, próbujemy przewidzieć zawartosć kolumny `life_expectancy`.\n",
"\n",
"Uwaga: Należy wyłączyć kolumnę `life_expectancy` spośród szukanych."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**zad. 7** \n",
" * Zbuduj model regresji liniowej, która oszacuje wartność kolumny `life_expectancy` na podstawie pozostałych kolumn.\n",
" * Wyświetl współczynniki modelu? Dla jakich cech współczynniki modelu są bliskie 0? Dlaczego?\n",
" * Oblicz wartości obu metryk na zbiorze trenującym.\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**zad. 6**\n",
"Wykonaj jedno z zadań 6.1 lub 6.2."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**zad. 6.1** Zaimplementuj metrykę $R^2$ jako fukcję `r2` (szablon poniżej). Fukcja `r2` przyjmuje dwa parametry typu *list* i ma zwrócić wartość metryki $R^2$."
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Real R2: 0.6421567875738732\n",
"Calculated: None\n"
]
}
],
"source": [
"def r2(expected, predicted):\n",
" \"\"\"\n",
" argumenty:\n",
" expected (type: list): poprawne wartości\n",
" predicted (type: list): oszacowanie z modelu\n",
" \"\"\"\n",
" pass\n",
"\n",
"y = df['life_expectancy'].values\n",
"X = df[['fertility', 'gdp']].values\n",
"\n",
"test_model = LinearRegression()\n",
"test_model.fit(X, y)\n",
"\n",
"print(\"Real R2:\", test_model.score(X, y))\n",
"\n",
"predicted = list(test_model.predict(X))\n",
"expected = list(y)\n",
"\n",
"print(\"Calculated:\", r2(expected, predicted))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Zaimplementuj metrykę $RMSE$ jako fukcję rmse (szablon poniżej). Fukcja rmse przyjmuje dwa parametry typu list i ma zwrócić wartość metryki $RMSE$ ."
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Real R2: 5.329244618514514\n",
"Calculated: None\n"
]
}
],
"source": [
"def rmse(expected, predicted):\n",
" \"\"\"\n",
" argumenty:\n",
" expected (type: list): poprawne wartości\n",
" predicted (type: list): oszacowanie z modelu\n",
" \"\"\"\n",
" pass\n",
"\n",
"y = df['life_expectancy'].values\n",
"X = df[['fertility', 'gdp']].values\n",
"\n",
"test_model = LinearRegression()\n",
"test_model.fit(X, y)\n",
"\n",
"print(\"Real R2:\", np.sqrt(mean_squared_error(y, test_model.predict(X))))\n",
"\n",
"predicted = list(test_model.predict(X))\n",
"expected = list(y)\n",
"\n",
"print(\"Calculated:\", r2(expected, predicted))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}