Up linear regression

This commit is contained in:
Filip Gralinski 2021-05-10 13:36:40 +02:00
parent 9866eb875e
commit 91c4d13617
1 changed files with 106 additions and 18 deletions

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "continent-intermediate",
"id": "cathedral-newark",
"metadata": {},
"source": [
"# Regresja liniowa\n",
@ -23,7 +23,7 @@
},
{
"cell_type": "markdown",
"id": "original-speed",
"id": "heard-clinton",
"metadata": {},
"source": [
"Regresje liniowa (jednej zmiennej) jest łatwa do rozwiązania — wystarczy podstawić do wzoru!\n",
@ -43,7 +43,7 @@
},
{
"cell_type": "markdown",
"id": "significant-relaxation",
"id": "preceding-impression",
"metadata": {},
"source": [
"## Regresja wielu zmiennych\n",
@ -53,31 +53,45 @@
"\n",
"Cena mieszkań może być prognozowana na podstawie:\n",
"\n",
"* powierzchni ($x_1 = 32.3$) \n",
"* powierzchni w m$^2$ ($x_1 = 32.3$) $w_1 = 7000$\n",
"\n",
"* liczby pokoi ($x_2 = 3$)\n",
"* liczby pokoi ($x_2 = 3$) $w_2 = -30000$\n",
" \n",
"* piętra ($x_3 = 4$)\n",
"* nr piętra ($x_3 = 4$) \n",
"\n",
"* wieku ($x_4 = 13$)\n",
"* wieku ($x_4 = 13$) $w_3 = -1000$\n",
"\n",
"* odległości od Dworca Centralnego w Warszawie ($x_5 = 371.3$)\n",
"\n",
"* wielkość miasta\n",
"\n",
"* gęstość zaludnienia\n",
"\n",
"* cech zerojedynkowych:\n",
"\n",
" * czy wielka płyta? ($x_6 = 0$)\n",
"\n",
" * czy jest jacuzzi? ($x_7 = 1$)\n",
"\n",
" * czy jest grzyb? ($x_8 = 0$)\n",
" * czy jest jacuzzi? ($x_7 = 1$) $w_7 = 5000$\n",
"\n",
" * czy jest grzyb? ($x_8 = 0$) $w_8 = -40000$\n",
" \n",
" * czy to Kielce? ($x_9 = 1$)\n",
" \n",
" * czy to Kraków ($x_{10} = 0$)\n",
" \n",
" * czy to Katowice ($x_{11} = 0$)\n",
" \n",
" * czy obok budynku jest parking \n",
" \n",
" * czy w budynku jest parking\n",
"\n",
"* ...\n",
"* zakodowany opis \n",
"\n",
" * $(x_{12}, x_{|V|+12})$ - wektor tf-idf \n",
"\n",
"... więc uogólniamy na wiele ($k$) wymiarów:\n",
"\n",
"$$ y = w_0 + w_1x_1 + \\ldots + w_kx_k = w_0 + \\sum_{j=1}^{k} w_jx_j $$\n",
"$$ y = w_0 + w_1x_1 + \\ldots + w_kx_k = w_0 + \\sum_{j=1}^{k} w_jx_j = w_0 + \\vec{w}\\vec{x}$$\n",
"\n",
"gdzie:\n",
"\n",
@ -97,10 +111,10 @@
},
{
"cell_type": "markdown",
"id": "ordinary-appendix",
"id": "confused-increase",
"metadata": {},
"source": [
"## Kilka sporzeżeń\n",
"## Kilka spostrzeżeń\n",
"\n",
"Regresja liniowa to najprostszy możliwy model:\n",
"\n",
@ -114,7 +128,7 @@
},
{
"cell_type": "markdown",
"id": "egyptian-austria",
"id": "freelance-controversy",
"metadata": {},
"source": [
"## Uczenie\n",
@ -148,7 +162,7 @@
},
{
"cell_type": "markdown",
"id": "exact-train",
"id": "divine-medium",
"metadata": {},
"source": [
"## Ewaluacja regresji\n",
@ -171,7 +185,7 @@
},
{
"cell_type": "markdown",
"id": "selective-agriculture",
"id": "supreme-tennessee",
"metadata": {},
"source": [
"## Regresja liniowa dla tekstu\n",
@ -181,10 +195,84 @@
"![schemat regresji liniowej](08_files/regresja-liniowa-tekst.png)\n"
]
},
{
"cell_type": "markdown",
"id": "seasonal-syndication",
"metadata": {},
"source": [
"### Przykład \n",
"\n",
"Wyzwanie RetroC2 - odgadywanie roku dla krótkiego tekstu (1814-2013), <https://gonito.net/challenge/retroc2>.\n",
" \n",
"Lista słów (obcięta do 7 znaków) z największą/najmniejszymi wagami. \n",
"\n",
"```\n",
"wzbudze -0.08071490\n",
"paczka -0.08000180\n",
"szarpi -0.05906200\n",
"spadoch -0.05784140\n",
"rzymsko -0.05466660\n",
"sosnowy -0.05162170\n",
"dębowyc -0.04778910\n",
"nawinię -0.04649400\n",
"odmówie -0.04522140\n",
"zacisko -0.04480620\n",
"funkcją -0.04479500\n",
"werben -0.04423350\n",
"nieumyś -0.04415200\n",
"wodomie -0.04351570\n",
"szczote -0.04313390\n",
"exekucy -0.04297940\n",
"listew -0.04214090\n",
"daley -0.04145400\n",
"metro -0.04080110\n",
"wyjąwsz -0.04078060\n",
"salda -0.04042050\n",
"tkach -0.04020180\n",
"cetnar -0.03999050\n",
"zgóry -0.03855980\n",
"belek -0.03833100\n",
"formier -0.03805890\n",
"wekslu -0.03796510\n",
"odmową -0.03753760\n",
"\n",
"odwadni 0.04662140\n",
"dozując 0.04672770\n",
"wyników 0.04744650\n",
"sprawst 0.04746330\n",
"jakub 0.04750710\n",
"ścieran 0.04791070\n",
"wrodzon 0.04799800\n",
"koryguj 0.04843560\n",
"odnotow 0.04854360\n",
"tłumiąc 0.04917320\n",
"leasing 0.04963200\n",
"ecznej 0.04994810\n",
"2013r 0.05009500\n",
"kompens 0.05049060\n",
"comarch 0.05058620\n",
"pojazde 0.05078540\n",
"badanyc 0.05340480\n",
"kontakc 0.05377990\n",
"sygnali 0.05601120\n",
"piasta 0.05658670\n",
"2000r 0.05716820\n",
"stropni 0.06123470\n",
"oszone 0.06124600\n",
"zamonto 0.06424310\n",
"……….. 0.06498500\n",
"kumulat 0.06596770\n",
"faktura 0.07313080\n",
"wielost 0.09677770\n",
"wielomi 0.12307300\n",
"```\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "numerous-limitation",
"id": "encouraging-martial",
"metadata": {},
"outputs": [],
"source": []