From 91c4d13617e48f2d39b2d73013ece3477e99aa06 Mon Sep 17 00:00:00 2001
From: Filip Gralinski
Date: Mon, 10 May 2021 13:36:40 +0200
Subject: [PATCH] Up linear regression

---
 wyk/08_Regresja_liniowa.ipynb | 124 +++++++++++++++++++++++++++++-----
 1 file changed, 106 insertions(+), 18 deletions(-)

diff --git a/wyk/08_Regresja_liniowa.ipynb b/wyk/08_Regresja_liniowa.ipynb
index 4e01af8..121951f 100644
--- a/wyk/08_Regresja_liniowa.ipynb
+++ b/wyk/08_Regresja_liniowa.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "continent-intermediate",
+   "id": "cathedral-newark",
    "metadata": {},
    "source": [
     "# Regresja liniowa\n",
@@ -23,7 +23,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "original-speed",
+   "id": "heard-clinton",
    "metadata": {},
    "source": [
     "Regresje liniowa (jednej zmiennej) jest łatwa do rozwiązania — wystarczy podstawić do wzoru!\n",
@@ -43,7 +43,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "significant-relaxation",
+   "id": "preceding-impression",
    "metadata": {},
    "source": [
     "## Regresja wielu zmiennych\n",
@@ -53,31 +53,45 @@
     "\n",
     "Cena mieszkań może być prognozowana na podstawie:\n",
     "\n",
-    "* powierzchni ($x_1 = 32.3$) \n",
+    "* powierzchni w m$^2$ ($x_1 = 32.3$) $w_1 = 7000$\n",
     "\n",
-    "* liczby pokoi ($x_2 = 3$)\n",
+    "* liczby pokoi ($x_2 = 3$) $w_2 = -30000$\n",
     " \n",
-    "* piętra ($x_3 = 4$)\n",
+    "* nr piętra ($x_3 = 4$) \n",
     "\n",
-    "* wieku ($x_4 = 13$)\n",
+    "* wieku ($x_4 = 13$) $w_3 = -1000$\n",
     "\n",
     "* odległości od Dworca Centralnego w Warszawie ($x_5 = 371.3$)\n",
     "\n",
+    "* wielkość miasta\n",
+    "\n",
+    "* gęstość zaludnienia\n",
+    "\n",
     "* cech zerojedynkowych:\n",
     "\n",
     " * czy wielka płyta? ($x_6 = 0$)\n",
     "\n",
-    " * czy jest jacuzzi? ($x_7 = 1$)\n",
-    "\n",
-    " * czy jest grzyb? ($x_8 = 0$)\n",
+    " * czy jest jacuzzi? ($x_7 = 1$) $w_7 = 5000$\n",
     "\n",
+    " * czy jest grzyb? ($x_8 = 0$) $w_8 = -40000$\n",
+    " \n",
     " * czy to Kielce? ($x_9 = 1$)\n",
+    " \n",
+    " * czy to Kraków ($x_{10} = 0$)\n",
+    " \n",
+    " * czy to Katowice ($x_{11} = 0$)\n",
+    " \n",
+    " * czy obok budynku jest parking \n",
+    " \n",
+    " * czy w budynku jest parking\n",
     "\n",
-    "* ...\n",
+    "* zakodowany opis \n",
+    "\n",
+    " * $(x_{12}, x_{|V|+12})$ - wektor tf-idf \n",
     "\n",
     "... więc uogólniamy na wiele ($k$) wymiarów:\n",
     "\n",
-    "$$ y = w_0 + w_1x_1 + \\ldots + w_kx_k = w_0 + \\sum_{j=1}^{k} w_jx_j $$\n",
+    "$$ y = w_0 + w_1x_1 + \\ldots + w_kx_k = w_0 + \\sum_{j=1}^{k} w_jx_j = w_0 + \\vec{w}\\vec{x}$$\n",
     "\n",
     "gdzie:\n",
     "\n",
@@ -97,10 +111,10 @@
   },
   {
    "cell_type": "markdown",
-   "id": "ordinary-appendix",
+   "id": "confused-increase",
    "metadata": {},
    "source": [
-    "## Kilka sporzeżeń\n",
+    "## Kilka spostrzeżeń\n",
     "\n",
     "Regresja liniowa to najprostszy możliwy model:\n",
     "\n",
@@ -114,7 +128,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "egyptian-austria",
+   "id": "freelance-controversy",
    "metadata": {},
    "source": [
     "## Uczenie\n",
@@ -148,7 +162,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "exact-train",
+   "id": "divine-medium",
    "metadata": {},
    "source": [
     "## Ewaluacja regresji\n",
@@ -171,7 +185,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "selective-agriculture",
+   "id": "supreme-tennessee",
    "metadata": {},
    "source": [
     "## Regresja liniowa dla tekstu\n",
@@ -181,10 +195,84 @@ "\n",
     "![schemat regresji liniowej](08_files/regresja-liniowa-tekst.png)\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "seasonal-syndication",
+   "metadata": {},
+   "source": [
+    "### Przykład \n",
+    "\n",
+    "Wyzwanie RetroC2 - odgadywanie roku dla krótkiego tekstu (1814-2013), .\n",
+    " \n",
+    "Lista słów (obcięta do 7 znaków) z największą/najmniejszymi wagami. \n",
+    "\n",
+    "```\n",
+    "wzbudze -0.08071490\n",
+    "paczka -0.08000180\n",
+    "szarpi -0.05906200\n",
+    "spadoch -0.05784140\n",
+    "rzymsko -0.05466660\n",
+    "sosnowy -0.05162170\n",
+    "dębowyc -0.04778910\n",
+    "nawinię -0.04649400\n",
+    "odmówie -0.04522140\n",
+    "zacisko -0.04480620\n",
+    "funkcją -0.04479500\n",
+    "werben -0.04423350\n",
+    "nieumyś -0.04415200\n",
+    "wodomie -0.04351570\n",
+    "szczote -0.04313390\n",
+    "exekucy -0.04297940\n",
+    "listew -0.04214090\n",
+    "daley -0.04145400\n",
+    "metro -0.04080110\n",
+    "wyjąwsz -0.04078060\n",
+    "salda -0.04042050\n",
+    "tkach -0.04020180\n",
+    "cetnar -0.03999050\n",
+    "zgóry -0.03855980\n",
+    "belek -0.03833100\n",
+    "formier -0.03805890\n",
+    "wekslu -0.03796510\n",
+    "odmową -0.03753760\n",
+    "\n",
+    "odwadni 0.04662140\n",
+    "dozując 0.04672770\n",
+    "wyników 0.04744650\n",
+    "sprawst 0.04746330\n",
+    "jakub 0.04750710\n",
+    "ścieran 0.04791070\n",
+    "wrodzon 0.04799800\n",
+    "koryguj 0.04843560\n",
+    "odnotow 0.04854360\n",
+    "tłumiąc 0.04917320\n",
+    "leasing 0.04963200\n",
+    "ecznej 0.04994810\n",
+    "2013r 0.05009500\n",
+    "kompens 0.05049060\n",
+    "comarch 0.05058620\n",
+    "pojazde 0.05078540\n",
+    "badanyc 0.05340480\n",
+    "kontakc 0.05377990\n",
+    "sygnali 0.05601120\n",
+    "piasta 0.05658670\n",
+    "2000r 0.05716820\n",
+    "stropni 0.06123470\n",
+    "oszone 0.06124600\n",
+    "zamonto 0.06424310\n",
+    "……….. 0.06498500\n",
+    "kumulat 0.06596770\n",
+    "faktura 0.07313080\n",
+    "wielost 0.09677770\n",
+    "wielomi 0.12307300\n",
+    "```\n",
+    "\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "numerous-limitation",
+   "id": "encouraging-martial",
    "metadata": {},
    "outputs": [],
    "source": []
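
Note on the formula added in this patch: the multi-variable model `y = w_0 + sum_j w_j x_j` can be checked by hand with the example weights the patch attaches to the flat-price features. The sketch below is only an illustration, not part of the notebook. The weights `w_1`, `w_2`, `w_7`, `w_8` and the feature values are taken from the patched text. Setting the bias `w_0` to 0 and treating every feature without a stated weight as having weight 0 are assumptions made here. The `w_3 = -1000` annotation sits next to `x_4` in the patch, so it is left out rather than guessed at. The helper name `predict` is hypothetical.

```python
# Minimal sketch of y = w_0 + sum_j w_j * x_j for the flat-price example.
# Assumptions (not in the patch): w_0 = 0, and features with no stated
# weight contribute nothing to the prediction.

def predict(w0, weights, features):
    """Return w0 plus the sum of w_j * x_j over features that have a weight."""
    return w0 + sum(weights[j] * x for j, x in features.items() if j in weights)

# Weights stated in the patch, keyed by feature index j.
weights = {1: 7000.0, 2: -30000.0, 7: 5000.0, 8: -40000.0}

# Feature values stated in the patch: area in m^2, number of rooms,
# jacuzzi present (1), mould present (0).
features = {1: 32.3, 2: 3, 7: 1, 8: 0}

price = predict(0.0, weights, features)
print(round(price, 2))
```

Under these assumptions the area term adds about 226100, the three rooms subtract 90000, and the jacuzzi adds 5000, which makes the sign of each weight easy to read as "raises" or "lowers" the predicted price.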