PolynomialRegression/Polynomial Regression.ipynb
2021-06-24 13:03:10 +02:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Steepest descent algorithm for polynomial regression\n",
"We assume a dataset consisting of two features (x and y). We model the dependence of y on x with a polynomial function. The goal of the project is to implement the steepest descent method for this problem. We assume a quadratic loss function. The implementation should allow the user to specify the degree of the polynomial used for modeling. It should return the vector of estimated parameters and show visually how the value of the loss function changes as training progresses."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dataset: https://www.kaggle.com/varpit94/apple-stock-data-updated-till-22jun2021?select=AAPL.csv\n",
"This dataset provides daily historical data for Apple Inc. stock (AAPL). Prices are in USD."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"degree = 6\n",
"X_plot = np.linspace(0, 150, 1000)\n",
"initial_theta = np.zeros(degree + 1)  # NumPy array, so the vector update in gradient descent works"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Date</th>\n",
" <th>Open</th>\n",
" <th>High</th>\n",
" <th>Low</th>\n",
" <th>Close</th>\n",
" <th>Adj Close</th>\n",
" <th>Volume</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1980-12-12</td>\n",
" <td>0.128348</td>\n",
" <td>0.128906</td>\n",
" <td>0.128348</td>\n",
" <td>0.128348</td>\n",
" <td>0.100751</td>\n",
" <td>469033600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1980-12-15</td>\n",
" <td>0.122210</td>\n",
" <td>0.122210</td>\n",
" <td>0.121652</td>\n",
" <td>0.121652</td>\n",
" <td>0.095495</td>\n",
" <td>175884800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1980-12-16</td>\n",
" <td>0.113281</td>\n",
" <td>0.113281</td>\n",
" <td>0.112723</td>\n",
" <td>0.112723</td>\n",
" <td>0.088485</td>\n",
" <td>105728000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1980-12-17</td>\n",
" <td>0.115513</td>\n",
" <td>0.116071</td>\n",
" <td>0.115513</td>\n",
" <td>0.115513</td>\n",
" <td>0.090676</td>\n",
" <td>86441600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1980-12-18</td>\n",
" <td>0.118862</td>\n",
" <td>0.119420</td>\n",
" <td>0.118862</td>\n",
" <td>0.118862</td>\n",
" <td>0.093304</td>\n",
" <td>73449600</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"         Date      Open      High       Low     Close  Adj Close     Volume\n",
"0  1980-12-12  0.128348  0.128906  0.128348  0.128348   0.100751  469033600\n",
"1  1980-12-15  0.122210  0.122210  0.121652  0.121652   0.095495  175884800\n",
"2  1980-12-16  0.113281  0.113281  0.112723  0.112723   0.088485  105728000\n",
"3  1980-12-17  0.115513  0.116071  0.115513  0.115513   0.090676   86441600\n",
"4  1980-12-18  0.118862  0.119420  0.118862  0.118862   0.093304   73449600"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df = pd.read_csv('AAPL.csv')\n",
"X = df[['Low']]\n",
"Y = df['Volume']\n",
"display(df.head(5))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig = plt.figure(figsize=(10,5))\n",
"chart = fig.add_subplot()\n",
"chart.plot(X,Y ,\"go\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Functions for polynomial regression"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"11\n",
"441.0\n"
]
}
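],
"source": [
"def polynomial_regression(theta, x):\n",
"    # Evaluate theta[0] + theta[1]*x + ... + theta[d]*x**d\n",
"    value = 0\n",
"    for i in range(len(theta)):\n",
"        value += theta[i] * x**i\n",
"    return value\n",
"\n",
"def mean_squared_error(Y_predicted, Y):\n",
"    result = 0\n",
"    for i in range(len(Y)):\n",
"        result += (Y_predicted[i] - Y[i]) ** 2\n",
"    return result/len(Y)\n",
"\n",
"def gradient(theta, X, Y):\n",
"    # Gradient of the MSE: (2/n) * A^T (A theta - Y), where column i of A holds x**i\n",
"    A = np.vander(np.asarray(X, dtype=float).ravel(), len(theta), increasing=True)\n",
"    return 2.0 / len(Y) * A.T @ (A @ np.asarray(theta, dtype=float) - np.asarray(Y, dtype=float))\n",
"\n",
"def gradient_descent(X, Y, theta, cost_function = mean_squared_error, alpha=0.1, eps=0.001, max_steps = 1000000):\n",
"    # Work on flat float arrays so that DataFrame / Series inputs are handled too\n",
"    x = np.asarray(X, dtype=float).ravel()\n",
"    y = np.asarray(Y, dtype=float)\n",
"    theta = np.asarray(theta, dtype=float)\n",
"    cost = cost_function(polynomial_regression(theta, x), y)\n",
"    logs = [[cost, theta]]\n",
"\n",
"    # Note: with unscaled inputs and a high degree the updates may diverge;\n",
"    # scaling x or lowering alpha may be necessary.\n",
"    for i in range(max_steps):\n",
"        theta = theta - alpha * gradient(theta, x, y)\n",
"        next_cost = cost_function(polynomial_regression(theta, x), y)\n",
"        logs.append([next_cost, theta])\n",
"        # Stop once the loss changes by no more than eps between consecutive steps\n",
"        if abs(cost - next_cost) <= eps:\n",
"            break\n",
"        cost = next_cost\n",
"    return theta, logs\n",
"\n",
"\n",
"print(polynomial_regression([1,1,0,1], 2))\n",
"print(mean_squared_error([1,2,1,1],[1,2,43,1]))"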
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting functions"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def plot_polynomial_regression(theta):\n",
"    fig = plt.figure(figsize=(10,5))\n",
"    Y_plot = [polynomial_regression(theta, x) for x in X_plot]\n",
"    chart = fig.add_subplot()\n",
"    # A polynomial with len(theta) coefficients has degree len(theta) - 1\n",
"    chart.plot(X_plot, Y_plot, color=\"red\", lw=2, label=f\"degree {len(theta) - 1}\")\n",
"    chart.plot(X, Y, \"go\")\n",
"    chart.legend()\n",
"    plt.show()\n",
"\n",
"plot_polynomial_regression([1,100,0,1000])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results using an off-the-shelf library"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import PolynomialFeatures, StandardScaler\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.linear_model import Ridge, LinearRegression"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Pipeline(steps=[('polynomialfeatures', PolynomialFeatures(degree=6)),\n",
" ('linearregression', LinearRegression())])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=True), \n",
" LinearRegression())\n",
"model.fit(X,Y)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 720x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"Y_plot = model.predict([[x] for x in X_plot])\n",
"\n",
"fig = plt.figure(figsize=(10,5))\n",
"chart = fig.add_subplot()\n",
"chart.plot(X,Y ,\"go\")\n",
"chart.plot(X_plot, Y_plot, color=\"red\", lw=2, label=f\"degree {degree}\")\n",
"degree"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}