Up linear regression

Filip Gralinski 2021-05-10 13:36:40 +02:00
parent 9866eb875e
commit 91c4d13617


@@ -2,7 +2,7 @@
"cells": [ "cells": [
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "continent-intermediate", "id": "cathedral-newark",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Linear regression\n", "# Linear regression\n",
@@ -23,7 +23,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "original-speed", "id": "heard-clinton",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Linear regression (in one variable) is easy to solve: just plug into the formula!\n", "Linear regression (in one variable) is easy to solve: just plug into the formula!\n",
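The formula the cell refers to lies outside this hunk. As a sketch, assuming it is the standard least-squares solution for one variable, w_1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and w_0 = ȳ − w_1·x̄, plugging in looks like this in plain Python (the function name `fit_simple_linear_regression` is hypothetical):

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit of y = w0 + w1 * x for one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of (x - mean_x)(y - mean_y); denominator: sum of (x - mean_x)^2.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


if __name__ == "__main__":
    # Hypothetical toy data lying exactly on y = 1 + 2x.
    w0, w1 = fit_simple_linear_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    print(w0, w1)  # 1.0 2.0
```

No iteration is involved: two means and two sums give both weights directly, which is why the one-variable case is "easy to solve".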
@@ -43,7 +43,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "significant-relaxation", "id": "preceding-impression",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Regression with multiple variables\n", "## Regression with multiple variables\n",
@@ -53,31 +53,45 @@
"\n", "\n",
"Apartment prices can be predicted from:\n", "Apartment prices can be predicted from:\n",
"\n", "\n",
"* area ($x_1 = 32.3$)\n", "* area in m$^2$ ($x_1 = 32.3$), $w_1 = 7000$\n",
"\n", "\n",
"* number of rooms ($x_2 = 3$)\n", "* number of rooms ($x_2 = 3$), $w_2 = -30000$\n",
" \n", " \n",
"* floor ($x_3 = 4$)\n", "* floor number ($x_3 = 4$)\n",
"\n", "\n",
"* age ($x_4 = 13$)\n", "* age ($x_4 = 13$), $w_4 = -1000$\n",
"\n", "\n",
"* distance from Warsaw Central Station ($x_5 = 371.3$)\n", "* distance from Warsaw Central Station ($x_5 = 371.3$)\n",
"\n", "\n",
"* city size\n",
"\n",
"* population density\n",
"\n",
"* binary (0/1) features:\n", "* binary (0/1) features:\n",
"\n", "\n",
" * is it a prefab large-panel building? ($x_6 = 0$)\n", " * is it a prefab large-panel building? ($x_6 = 0$)\n",
"\n", "\n",
" * is there a jacuzzi? ($x_7 = 1$)\n", " * is there a jacuzzi? ($x_7 = 1$), $w_7 = 5000$\n",
"\n",
" * is there mold? ($x_8 = 0$)\n",
"\n", "\n",
" * is there mold? ($x_8 = 0$), $w_8 = -40000$\n",
" \n",
" * is it Kielce? ($x_9 = 1$)\n", " * is it Kielce? ($x_9 = 1$)\n",
" \n",
" * is it Kraków? ($x_{10} = 0$)\n",
" \n",
" * is it Katowice? ($x_{11} = 0$)\n",
" \n",
" * is there parking next to the building?\n",
" \n",
" * is there parking inside the building?\n",
"\n", "\n",
"* ...\n", "* encoded description\n",
"\n",
" * $(x_{12}, \\ldots, x_{|V|+11})$: tf-idf vector\n",
"\n", "\n",
"... so we generalize to many ($k$) dimensions:\n", "... so we generalize to many ($k$) dimensions:\n",
"\n", "\n",
"$$ y = w_0 + w_1x_1 + \\ldots + w_kx_k = w_0 + \\sum_{j=1}^{k} w_jx_j $$\n", "$$ y = w_0 + w_1x_1 + \\ldots + w_kx_k = w_0 + \\sum_{j=1}^{k} w_jx_j = w_0 + \\vec{w} \\cdot \\vec{x}$$\n",
"\n", "\n",
"where:\n", "where:\n",
"\n", "\n",
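Plugging the example into y = w_0 + Σ w_j x_j: the quoted weights give 7000·32.3 − 30000·3 − 1000·13 + 5000·1 − 40000·0 = 128100, so the prediction is w_0 + 128100 if every other weight were 0. The notebook gives no value for w_0 or for the remaining weights, so this sketch sets them to 0 as placeholders, and the feature names are hypothetical. It computes the same number term by term and as a dot product of two aligned vectors, which is all the vector form w_0 + w·x means.

```python
# Feature values of the example apartment, as listed in the notebook.
# The dictionary keys are hypothetical names chosen for this sketch.
EXAMPLE_FEATURES = {
    "area_m2": 32.3,
    "rooms": 3,
    "floor": 4,
    "age": 13,
    "distance_to_warsaw_central": 371.3,
    "prefab_panel_building": 0,
    "jacuzzi": 1,
    "mold": 0,
    "kielce": 1,
}

# Weights quoted in the notebook. The bias w0 and every weight not listed
# here are unknown; they are treated as 0 (an assumption).
EXAMPLE_WEIGHTS = {
    "area_m2": 7000,
    "rooms": -30000,
    "age": -1000,
    "jacuzzi": 5000,
    "mold": -40000,
}


def predict_sum(w0, weights, features):
    """y = w0 + sum_j w_j * x_j, looping over the named features."""
    return w0 + sum(weights.get(name, 0) * x for name, x in features.items())


def predict_dot(w0, weights, features):
    """The same model written as w0 + w . x over two aligned vectors."""
    names = sorted(features)  # one fixed order shared by both vectors
    w = [weights.get(name, 0) for name in names]
    x = [features[name] for name in names]
    return w0 + sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    w0 = 0.0  # placeholder bias
    # Both print approximately 128100 (up to floating-point rounding).
    print(predict_sum(w0, EXAMPLE_WEIGHTS, EXAMPLE_FEATURES))
    print(predict_dot(w0, EXAMPLE_WEIGHTS, EXAMPLE_FEATURES))
```

Keying weights by feature name keeps the sketch readable; a real implementation would store both as numeric arrays in a fixed order.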
@@ -97,10 +111,10 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "ordinary-appendix", "id": "confused-increase",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## A few obsevations\n", "## A few observations\n",
"\n", "\n",
"Linear regression is the simplest possible model:\n", "Linear regression is the simplest possible model:\n",
"\n", "\n",
@@ -114,7 +128,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "egyptian-austria", "id": "freelance-controversy",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Training\n", "## Training\n",
@@ -148,7 +162,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "exact-train", "id": "divine-medium",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Evaluating regression\n", "## Evaluating regression\n",
@@ -171,7 +185,7 @@
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"id": "selective-agriculture", "id": "supreme-tennessee",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Linear regression for text\n", "## Linear regression for text\n",
@@ -181,10 +195,84 @@
"![linear regression diagram](08_files/regresja-liniowa-tekst.png)\n" "![linear regression diagram](08_files/regresja-liniowa-tekst.png)\n"
] ]
}, },
{
"cell_type": "markdown",
"id": "seasonal-syndication",
"metadata": {},
"source": [
"### Example\n",
"\n",
"The RetroC2 challenge: guessing the year (1814-2013) of a short text, <https://gonito.net/challenge/retroc2>.\n",
" \n",
"Words (truncated to 7 characters) with the most negative and the most positive weights:\n",
"\n",
"```\n",
"wzbudze -0.08071490\n",
"paczka -0.08000180\n",
"szarpi -0.05906200\n",
"spadoch -0.05784140\n",
"rzymsko -0.05466660\n",
"sosnowy -0.05162170\n",
"dębowyc -0.04778910\n",
"nawinię -0.04649400\n",
"odmówie -0.04522140\n",
"zacisko -0.04480620\n",
"funkcją -0.04479500\n",
"werben -0.04423350\n",
"nieumyś -0.04415200\n",
"wodomie -0.04351570\n",
"szczote -0.04313390\n",
"exekucy -0.04297940\n",
"listew -0.04214090\n",
"daley -0.04145400\n",
"metro -0.04080110\n",
"wyjąwsz -0.04078060\n",
"salda -0.04042050\n",
"tkach -0.04020180\n",
"cetnar -0.03999050\n",
"zgóry -0.03855980\n",
"belek -0.03833100\n",
"formier -0.03805890\n",
"wekslu -0.03796510\n",
"odmową -0.03753760\n",
"\n",
"odwadni 0.04662140\n",
"dozując 0.04672770\n",
"wyników 0.04744650\n",
"sprawst 0.04746330\n",
"jakub 0.04750710\n",
"ścieran 0.04791070\n",
"wrodzon 0.04799800\n",
"koryguj 0.04843560\n",
"odnotow 0.04854360\n",
"tłumiąc 0.04917320\n",
"leasing 0.04963200\n",
"ecznej 0.04994810\n",
"2013r 0.05009500\n",
"kompens 0.05049060\n",
"comarch 0.05058620\n",
"pojazde 0.05078540\n",
"badanyc 0.05340480\n",
"kontakc 0.05377990\n",
"sygnali 0.05601120\n",
"piasta 0.05658670\n",
"2000r 0.05716820\n",
"stropni 0.06123470\n",
"oszone 0.06124600\n",
"zamonto 0.06424310\n",
"……….. 0.06498500\n",
"kumulat 0.06596770\n",
"faktura 0.07313080\n",
"wielost 0.09677770\n",
"wielomi 0.12307300\n",
"```\n",
"\n"
]
},
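The notebook does not show how the RetroC2 weights were obtained. Below is a minimal, standard-library-only sketch of the general idea under several assumptions: a four-sentence toy corpus invented for illustration (not RetroC2 data), one common tf-idf variant (smoothed idf, L2-normalised vectors), plain stochastic gradient descent on squared error with the raw year as target, and truncation to 7 characters applied only when printing (whether the original truncated tokens before training is not stated). All function names are hypothetical. A negative weight pulls the predicted year earlier and a positive one pushes it later, which is how to read the list above (e.g. `2013r` and `leasing` sit on the positive side).

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased runs of word characters; in Python 3 this includes Polish letters.
    return re.findall(r"\w+", text.lower())


def fit_idf(docs):
    """Smoothed idf, log((1 + n) / (1 + df)) + 1, from a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens, idf):
    """Sparse, L2-normalised tf-idf vector {term: value}; unknown terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def predict(w0, w, x):
    """y = w0 + sum over terms of w_t * x_t (sparse dot product)."""
    return w0 + sum(w.get(t, 0.0) * v for t, v in x.items())


def train(vectors, years, epochs=200, lr=0.1):
    """Plain stochastic gradient descent on the squared error."""
    w0 = sum(years) / len(years)  # start the bias at the mean year
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, years):
            err = predict(w0, w, x) - y
            w0 -= lr * err
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * err * v
    return w0, w


def extreme_weights(w, k):
    """The k most negative and the k most positive weights, in ascending order."""
    ranked = sorted(w.items(), key=lambda item: item[1])
    return ranked[:k], ranked[-k:]


if __name__ == "__main__":
    # Hypothetical toy corpus (not RetroC2 data): (text, year).
    corpus = [
        ("cetnar zboża na targu", 1850),
        ("cetnar soli i zboża", 1860),
        ("umowa leasing samochodu", 2010),
        ("leasing maszyn dla firmy", 2012),
    ]
    token_lists = [tokenize(text) for text, _ in corpus]
    idf = fit_idf(token_lists)
    vectors = [tfidf(tokens, idf) for tokens in token_lists]
    w0, w = train(vectors, [year for _, year in corpus])
    lowest, highest = extreme_weights(w, 3)
    for term, weight in lowest + highest:
        print(f"{term[:7]:8} {weight:.8f}")  # truncated to 7 chars, as in the listing
```

Here the weights are in years because the target is the raw year. In practice scikit-learn's `TfidfVectorizer` together with `LinearRegression` (or `Ridge`, since there are far more words than texts) would replace the hand-written pieces.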
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"id": "numerous-limitation", "id": "encouraging-martial",
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [] "source": []