{ "cells": [ { "cell_type": "markdown", "id": "50d1198e", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Kernel regression

\n", "\n", "####
Karolina Oparczyk, Tomasz Grzybowski, Jan Nowak
\n" ] }, { "cell_type": "markdown", "id": "f792be04", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Kernel regression is a nonparametric regression technique in which a kernel serves as the weighting function. It gives some elements of the data set a greater \"weight\", which affects the final result. \n", "\n", "It can be compared to drawing a curve on a scatter plot so that it fits the points as closely as possible." ] }, { "cell_type": "markdown", "id": "ce35888e", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Properties of the kernel function:\n", "* it is symmetric - its maximum lies at the centre of the curve\n", "* the area under the curve equals 1\n", "* its value is never negative" ] }, { "cell_type": "markdown", "id": "dbe6165c", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Kernel regression can be implemented with many different kernels. The ones used in this project are:\n", "* the Gaussian kernel\n", "\\begin{equation}\n", "K(x) = \\frac{1}{h\\sqrt{2\\pi}}e^{-\\frac{1}{2}\\left(\\frac{x - x_i}{h}\\right)^2}\n", "\\end{equation}\n", "" ] }, { "cell_type": "markdown", "id": "f46e92e5", "metadata": {}, "source": [ "* the Epanechnikov kernel\n", "\\begin{equation}\n", "K(x) = \\frac{3}{4}\\left(1-\\left(\\frac{x - x_i}{h}\\right)^2\\right) \\text{ for } \\left|\\frac{x - x_i}{h}\\right|\\leq 1, \\text{ and } 0 \\text{ otherwise}\n", "\\end{equation}\n", "" ] }, { "cell_type": "markdown", "id": "6d60bbc1", "metadata": {}, "source": [ "The choice of the smoothing parameter, that is the window width (bandwidth), matters as much as the choice of kernel. The bandwidth determines which points are grouped around each evaluation point, and the function value is computed from each such group. If the window is too wide, the fitted function looks more like a straight line (under-fitting). If it is too narrow, the function \"jumps\" too much (over-fitting).\n", "\n", "The function value at x is the weighted average of the $y_{i}$\n", " for those $x_{i}$ that lie close to the x at which the value is being computed. 
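A minimal sketch of the weighted average described here, written in the style of the Nadaraya-Watson estimator. The helper names `gauss_kernel`, `epanechnikov_kernel` and `nadaraya_watson`, as well as the toy data, are assumptions made for illustration; this is not the project's `KernelRegression.ker_reg`, whose implementation is not shown in this notebook.

```python
import numpy as np

def gauss_kernel(u):
    # Standard Gaussian kernel evaluated at the scaled distance u = (x - x_i) / h.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def epanechnikov_kernel(u):
    # Epanechnikov kernel: 3/4 * (1 - u^2) inside [-1, 1], zero outside.
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def nadaraya_watson(x_train, y_train, x_query, h, kernel=gauss_kernel):
    # Hypothetical helper (an assumption of this sketch, not the project's ker_reg).
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    # Scaled distances between every query point and every training point.
    u = (x_query[:, None] - x_train[None, :]) / h
    k = kernel(u)
    # Normalise each row so that the weights for one query point sum to 1.
    # If no training point falls inside the window, the weights stay at 0.
    totals = k.sum(axis=1, keepdims=True)
    weights = np.divide(k, totals, out=np.zeros_like(k), where=totals > 0)
    # Each prediction is the weighted average of the training targets.
    return weights @ y_train, weights

x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_hat, w = nadaraya_watson(x_demo, y_demo, [3.0], h=1.0)
print(y_hat, w.sum(axis=1))
```

Because the weights in each row are normalised, the constant factors in front of the kernels cancel, so only the shape of the kernel and the bandwidth h influence the prediction.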
The weights on the $y_{i}$ for a given x sum to 1; they are larger when $x_{i}$ is closer to x and smaller otherwise." ] }, { "cell_type": "code", "execution_count": 10, "id": "4ae1bce9", "metadata": {}, "outputs": [], "source": [ "import ipywidgets as widgets\n", "import numpy as np\n", "import plotly.express as px\n", "import plotly.graph_objs as go\n", "import pandas as pd\n", "import KernelRegression\n", "\n", "# Sort the rows by x so that every y stays paired with its own x.\n", "fires_thefts = pd.read_csv('fires_thefts.csv', names=['x','y']).sort_values('x')\n", "XXX = fires_thefts.x.to_numpy()\n", "YYY = fires_thefts.y.to_numpy()" ] }, { "cell_type": "code", "execution_count": 11, "id": "loved-clinton", "metadata": {}, "outputs": [], "source": [ "\n", "dropdown_bw = widgets.Dropdown(options=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], value=1, description='Bandwidth')\n", "\n", "def interactive_kernel(bw_manual):\n", "    # First figure: raw points with vertical lines marking windows of width bw_manual.\n", "    fig = px.scatter(x=XXX, y=YYY)\n", "    for edge in np.arange(0, XXX.max() + bw_manual, bw_manual):\n", "        fig.add_vline(x=float(edge))\n", "    fig.show()\n", "\n", "    # Second figure: both kernel fits for the selected bandwidth.\n", "    Y_pred_gauss = KernelRegression.ker_reg(XXX, YYY, bw_manual, 'gauss')\n", "    Y_pred_epanechnikov = KernelRegression.ker_reg(XXX, YYY, bw_manual, 'epanechnikov')\n", "\n", "    fig = px.scatter(x=XXX, y=YYY)\n", "    fig.add_trace(go.Scatter(x=XXX, y=np.array(Y_pred_gauss), name='Gauss', mode='lines'))\n", "    fig.add_trace(go.Scatter(x=XXX, y=np.array(Y_pred_epanechnikov), name='Epanechnikov', mode='lines'))\n", "    fig.show()" ] }, { "cell_type": "code", "execution_count": 12, "id": "injured-english", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a38d1ceed30b42d19a56951cf88e16e6", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Dropdown(description='Bandwidth', options=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), value=1)…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"widgets.interact(interactive_kernel, bw_manual=dropdown_bw)" ] }, { "cell_type": "code", "execution_count": 10, "id": "0406c2c4", "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "hovertemplate": "x=%{x}
y=%{y}", "legendgroup": "", "marker": { "color": "#636efa", "symbol": "circle" }, "mode": "markers", "name": "", "orientation": "v", "showlegend": false, "type": "scatter", "x": [ 2, 2.2, 2.2, 2.5, 3.4, 3.6, 4, 4.8, 5, 5.4, 5.6, 5.7, 6.2, 6.9, 7.2, 7.3, 7.7, 8.6, 9, 9.5, 10.5, 10.5, 10.7, 10.8, 11, 11.3, 11.9, 12.2, 15.1, 15.1, 16.5, 17.4, 18.4, 18.5, 21.6, 21.8, 23.3, 28.6, 29.1, 34.1, 36.2, 39.7 ], "xaxis": "x", "y": [ 29, 44, 36, 37, 53, 68, 75, 18, 31, 25, 34, 14, 11, 11, 22, 16, 27, 9, 29, 30, 40, 32, 41, 147, 22, 29, 46, 23, 4, 31, 39, 15, 32, 27, 32, 34, 17, 46, 42, 43, 34, 19 ], "yaxis": "y" }, { "mode": "lines", "name": "Gauss", "type": "scatter", "x": [ 2, 2.2, 2.2, 2.5, 3.4, 3.6, 4, 4.8, 5, 5.4, 5.6, 5.7, 6.2, 6.9, 7.2, 7.3, 7.7, 8.6, 9, 9.5, 10.5, 10.5, 10.7, 10.8, 11, 11.3, 11.9, 12.2, 15.1, 15.1, 16.5, 17.4, 18.4, 18.5, 21.6, 21.8, 23.3, 28.6, 29.1, 34.1, 36.2, 39.7 ], "y": [ 45.35291990598444, 45.23209793727989, 45.23209793727989, 44.932976281560826, 41.48437130002368, 40.61042382588775, 38.925594250967386, 33.92879256965945, 32.551542513167796, 29.875849422931722, 28.67530224525044, 27.96885069817401, 24.365959289735812, 20.599211267605632, 20.235117327476594, 20.228581460674157, 20.6183234421365, 30.07350283166646, 33.83362339018447, 37.61082760150877, 42.77730756033398, 42.77730756033398, 43.36324537951877, 43.56181350782893, 43.99071649607236, 44.74597026115207, 45.96985294117647, 45.46339514066496, 22.453306066803, 22.453306066803, 24.66946711473836, 26.004702970297032, 27.43594646271511, 27.372644574398965, 28.936850851682596, 28.63091865641441, 26.441903019213175, 44.02816901408451, 43.971830985915496, 39.96026490066226, 37.03973509933775, 19 ] }, { "mode": "lines", "name": "Epanechnikov", "type": "scatter", "x": [ 2, 2.2, 2.2, 2.5, 3.4, 3.6, 4, 4.8, 5, 5.4, 5.6, 5.7, 6.2, 6.9, 7.2, 7.3, 7.7, 8.6, 9, 9.5, 10.5, 10.5, 10.7, 10.8, 11, 11.3, 11.9, 12.2, 15.1, 15.1, 16.5, 17.4, 18.4, 18.5, 21.6, 21.8, 23.3, 28.6, 29.1, 34.1, 36.2, 39.7 ], "y": [ 
37.66141754476209, 37.3722698204789, 37.3722698204789, 36.915184565586024, 35.41724225587412, 35.06966909119557, 34.37492744184658, 33.06269633456447, 32.76790651477745, 32.239639579969264, 32.012105919533475, 31.908723273813248, 31.508131351321847, 31.315775670225353, 31.374640004534676, 31.41314663999423, 31.65837167622411, 32.671979216749506, 33.26978316154742, 34.075347693814074, 35.607264384723926, 35.607264384723926, 35.862621777479305, 35.9805254118714, 36.19443866404054, 36.45409008366464, 36.716512230160795, 36.704017462854736, 32.068111095309085, 32.068111095309085, 29.22055160491219, 28.129973313900773, 27.61811669602926, 27.59859118807964, 28.014196400371894, 28.062604240971872, 28.74980594153628, 40.914928843639686, 41.565506725071714, 38.415431371593726, 34.175154720645594, 26.047688036744347 ] }, { "mode": "lines", "name": "Linear", "type": "scatter", "x": [ 2, 2.2, 2.2, 2.5, 3.4, 3.6, 4, 4.8, 5, 5.4, 5.6, 5.7, 6.2, 6.9, 7.2, 7.3, 7.7, 8.6, 9, 9.5, 10.5, 10.5, 10.7, 10.8, 11, 11.3, 11.9, 12.2, 15.1, 15.1, 16.5, 17.4, 18.4, 18.5, 21.6, 21.8, 23.3, 28.6, 29.1, 34.1, 36.2, 39.7 ], "y": [ 35.00605448426939, 34.98100247432091, 34.98100247432091, 34.94342445939819, 34.83069041463002, 34.80563840468154, 34.755534384784575, 34.655326344990655, 34.630274335042174, 34.58017031514521, 34.55511830519673, 34.54259230022249, 34.47996227535128, 34.3922802405316, 34.354702225608875, 34.342176220634634, 34.29207220073767, 34.17933815596951, 34.12923413607255, 34.06660411120134, 33.941344061458935, 33.941344061458935, 33.91629205151045, 33.90376604653621, 33.87871403658774, 33.841136021665015, 33.76597999181957, 33.72840197689685, 33.36514783264387, 33.36514783264387, 33.1897837630045, 33.07704971823634, 32.95178966849393, 32.93926366351969, 32.55095750931823, 32.52590549936975, 32.338015424756136, 31.674137161121386, 31.611507136250186, 30.985206887538155, 30.7221607830791, 30.28375060898068 ] } ], "layout": { "legend": { "tracegroupgap": 0 }, "margin": { "t": 60 }, 
"template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, 
"#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { 
"colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, 
"#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "title": { "text": "x" } }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "title": { "text": "y" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "# linear regression\n", "reg = LinearRegression().fit(XXX.reshape(-1, 1), YYY.reshape(-1, 1))\n", "Y_pred_linear = reg.predict(XXX.reshape(-1, 1))\n", "\n", "# kernel regression\n", "Y_pred_gauss = KernelRegression.ker_reg(XXX, YYY, bw_manual, 'gauss')\n", "Y_pred_epanechnikov = KernelRegression.ker_reg(XXX, YYY, bw_manual, 'epanechnikov')\n", "\n", "fig = px.scatter(x=XXX,y=YYY)\n", "fig.add_trace(go.Scatter(x=XXX, y=np.array(Y_pred_gauss), name='Gauss', mode='lines'))\n", "fig.add_trace(go.Scatter(x=XXX, y=np.array(Y_pred_epanechnikov), name='Epanechnikov', mode='lines'))\n", "fig.add_trace(go.Scatter(x=XXX, y=np.array(Y_pred_linear.flatten().tolist()), name='Linear', mode='lines'))" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }