Merge branch 'master' of git.wmi.amu.edu.pl:filipg/aitech-eks

kubapok 2021-05-11 12:27:18 +02:00
commit 36155ad5b4
1 changed file with 106 additions and 18 deletions

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "continent-intermediate",
"id": "cathedral-newark",
"metadata": {},
"source": [
"# Linear regression\n",
@ -23,7 +23,7 @@
},
{
"cell_type": "markdown",
"id": "original-speed",
"id": "heard-clinton",
"metadata": {},
"source": [
"Linear regression (with a single variable) is easy to solve: just plug the values into the formula!\n",
@ -43,7 +43,7 @@
},
{
"cell_type": "markdown",
"id": "significant-relaxation",
"id": "preceding-impression",
"metadata": {},
"source": [
"## Regression with multiple variables\n",
@ -53,31 +53,45 @@
"\n",
"Flat prices can be predicted from:\n",
"\n",
"* area ($x_1 = 32.3$) \n",
"* area in m$^2$ ($x_1 = 32.3$) $w_1 = 7000$\n",
"\n",
"* number of rooms ($x_2 = 3$)\n",
"* number of rooms ($x_2 = 3$) $w_2 = -30000$\n",
" \n",
"* floor ($x_3 = 4$)\n",
"* floor number ($x_3 = 4$) \n",
"\n",
"* age ($x_4 = 13$)\n",
"* age ($x_4 = 13$) $w_4 = -1000$\n",
"\n",
"* distance from Dworzec Centralny (the central railway station) in Warsaw ($x_5 = 371.3$)\n",
"\n",
"* city size\n",
"\n",
"* population density\n",
"\n",
"* binary (0/1) features:\n",
"\n",
" * is it a large-panel prefab block (wielka płyta)? ($x_6 = 0$)\n",
"\n",
" * is there a jacuzzi? ($x_7 = 1$)\n",
"\n",
" * is there mould? ($x_8 = 0$)\n",
" * is there a jacuzzi? ($x_7 = 1$) $w_7 = 5000$\n",
"\n",
" * is there mould? ($x_8 = 0$) $w_8 = -40000$\n",
" \n",
" * is it Kielce? ($x_9 = 1$)\n",
" \n",
" * is it Kraków? ($x_{10} = 0$)\n",
" \n",
" * is it Katowice? ($x_{11} = 0$)\n",
" \n",
" * is there parking next to the building? \n",
" \n",
" * is there parking inside the building?\n",
"\n",
"* ...\n",
"* encoded description \n",
"\n",
" * $(x_{12}, \\ldots, x_{|V|+12})$: a tf-idf vector \n",
"\n",
"... so we generalize to many ($k$) dimensions:\n",
"\n",
"$$ y = w_0 + w_1x_1 + \\ldots + w_kx_k = w_0 + \\sum_{j=1}^{k} w_jx_j $$\n",
"$$ y = w_0 + w_1x_1 + \\ldots + w_kx_k = w_0 + \\sum_{j=1}^{k} w_jx_j = w_0 + \\vec{w}\\vec{x}$$\n",
"\n",
"where:\n",
"\n",
@ -97,10 +111,10 @@
},
{
"cell_type": "markdown",
"id": "ordinary-appendix",
"id": "confused-increase",
"metadata": {},
"source": [
"## Kilka sporzeżeń\n",
"## A few observations\n",
"\n",
"Linear regression is the simplest possible model:\n",
"\n",
@ -114,7 +128,7 @@
},
{
"cell_type": "markdown",
"id": "egyptian-austria",
"id": "freelance-controversy",
"metadata": {},
"source": [
"## Training\n",
@ -148,7 +162,7 @@
},
{
"cell_type": "markdown",
"id": "exact-train",
"id": "divine-medium",
"metadata": {},
"source": [
"## Evaluating regression\n",
@ -171,7 +185,7 @@
},
{
"cell_type": "markdown",
"id": "selective-agriculture",
"id": "supreme-tennessee",
"metadata": {},
"source": [
"## Linear regression for text\n",
@ -181,10 +195,84 @@
"![linear regression diagram](08_files/regresja-liniowa-tekst.png)\n"
]
},
{
"cell_type": "markdown",
"id": "seasonal-syndication",
"metadata": {},
"source": [
"### Example \n",
"\n",
"The RetroC2 challenge: guessing the year of a short text (1814-2013), <https://gonito.net/challenge/retroc2>.\n",
" \n",
"Words (truncated to 7 characters) with the lowest and highest weights. \n",
"\n",
"```\n",
"wzbudze -0.08071490\n",
"paczka -0.08000180\n",
"szarpi -0.05906200\n",
"spadoch -0.05784140\n",
"rzymsko -0.05466660\n",
"sosnowy -0.05162170\n",
"dębowyc -0.04778910\n",
"nawinię -0.04649400\n",
"odmówie -0.04522140\n",
"zacisko -0.04480620\n",
"funkcją -0.04479500\n",
"werben -0.04423350\n",
"nieumyś -0.04415200\n",
"wodomie -0.04351570\n",
"szczote -0.04313390\n",
"exekucy -0.04297940\n",
"listew -0.04214090\n",
"daley -0.04145400\n",
"metro -0.04080110\n",
"wyjąwsz -0.04078060\n",
"salda -0.04042050\n",
"tkach -0.04020180\n",
"cetnar -0.03999050\n",
"zgóry -0.03855980\n",
"belek -0.03833100\n",
"formier -0.03805890\n",
"wekslu -0.03796510\n",
"odmową -0.03753760\n",
"\n",
"odwadni 0.04662140\n",
"dozując 0.04672770\n",
"wyników 0.04744650\n",
"sprawst 0.04746330\n",
"jakub 0.04750710\n",
"ścieran 0.04791070\n",
"wrodzon 0.04799800\n",
"koryguj 0.04843560\n",
"odnotow 0.04854360\n",
"tłumiąc 0.04917320\n",
"leasing 0.04963200\n",
"ecznej 0.04994810\n",
"2013r 0.05009500\n",
"kompens 0.05049060\n",
"comarch 0.05058620\n",
"pojazde 0.05078540\n",
"badanyc 0.05340480\n",
"kontakc 0.05377990\n",
"sygnali 0.05601120\n",
"piasta 0.05658670\n",
"2000r 0.05716820\n",
"stropni 0.06123470\n",
"oszone 0.06124600\n",
"zamonto 0.06424310\n",
"……….. 0.06498500\n",
"kumulat 0.06596770\n",
"faktura 0.07313080\n",
"wielost 0.09677770\n",
"wielomi 0.12307300\n",
"```\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "numerous-limitation",
"id": "encouraging-martial",
"metadata": {},
"outputs": [],
"source": []