s434760/PolynomialRegression

Fork 0

patrycjalazna d11fb5e451 improved momentum parameters

2021-06-28 10:01:22 +02:00

386 KiB

Raw Blame History

Algorytm najszybszego spadku dla regresji wielomianowej.

Skład grupy:

Nowak Anna,
Łaźna Patrycja,
Bregier Damian

0. Podstawowe informacje o zbiorze danych

Ze względu na specyfikę wytycznych projektu zakładającego wykorzystanie jedynie dwóch cech: x i y oraz modelowanie zależności y od x za pomocą funkcji wielomianowej dobrany został przez nas odpowiedni dataset. Obejmuje on jedynie trzy kolumny, czyli: płeć, wzrost w calach oraz wagę w funtach, z czego ze względu na specyfikę projektu wykorzystywane są jedynie wzrost oraz waga. Każdy z parametrów zawiera po 10 tysięcy unikalnych wartości.

Dokładne informacje na temat tego zbioru, jego zawartości i samych danych można znaleźć pod adresem: https://www.kaggle.com/mustafaali96/weight-height.

import pandas as pd
import random
import matplotlib.pyplot as plt
import numpy as np
import ipywidgets as widgets
np.set_printoptions(suppress=True)

1. Wczytywanie i preprocessing danych

# Stopień wielomianu
degree = 4
X_plot = np.linspace(0, 100, 1000)
initial_theta = np.matrix([0] * (degree + 1)).reshape(degree + 1, 1)

# Wybór dwóch kolumn - dotyczących wzrostu i wagi
data = pd.read_csv('weight-height.csv')[["Height", "Weight"]]
# Czyszczenie tabeli i wartości pustych
data = data.dropna()
data_matrix = np.matrix(data)

m, n_plus_1 = data_matrix.shape
n = n_plus_1 - 1
X = (np.ones((m, 1)))

for i in range(1, degree + 1):
    Xn = np.power(data_matrix[:, 0:n], i)
    Xn /= np.amax(Xn, axis=0)
    X = np.concatenate((X, Xn), axis=1)

X = np.matrix(X).reshape(m, degree * n + 1)
Y = np.matrix(data_matrix[:, -1])

2. Metody regresji wielomianowej

def print_theta(theta):
    print("f(x) = ", end="")
    for i,x in enumerate(theta.tolist()[:-1]):
        x = x[0]
        print(f"{x}x^{i}", end=" + ")
    print(f"{theta.tolist()[-1][0]}^{len(theta) - 1}")

# Implementacja wzrosu na regresję wielomianową
def polynomial_regression(theta, x):
    x = x/data["Height"].max()
    return sum(theta * np.power(x, i) for i, theta in enumerate(theta.tolist()))

# Implementacja wzoru na RMSE, czyli pierwiastek z błędu średniokwadratowego
def mean_squared_error(theta, X, Y):
    J = 1.0 / (2.0 * m) * ((X * theta - Y).T * (X * theta - Y))
    return J.item()

3. Wybrane rodzaje gradientów

Metoda gradientu prostego to algorytm numeryczny mający na celu znalezienie minimum lokalnego dla zadanej funkcji celu. Jak sama nazwa wskazuje jest to jedna z prostszych metod, która opiera się na iteracyjnym poszukiwaniu minimum. Proces ten (dla pierwszych czterech kroków), został pokazany na poniższym rysunku.

Metoda gradientu prostego

# Wzór na gradient prosty
def gradient(theta, X, Y):
    return 1.0 / len(Y) * (X.T * (X * theta - Y))

Metoda najszybszego spadku jest iteracyjnym algorytmem wyszukiwania minimym zadanej funkcji celu i stanowi modyfikację gradientu prostego. Na samym początku algorytmu wybierany jest punkt startowy, w którym jest obliczany antygradient funkcji celu stanowiący kierunek poszukiwań w procedurze. Kolejny krok jest obliczany w oparciu o poprzedni i jeżeli nie spełnia on warunku stopu to obliczenia są powtarzane do skutku.

Podstawową różnicę w stosunku do gradientu prostego stanowi sposób wyszukiwania wartości, która jest dokonywana przez minimalizację kierunkową funkcji.

# Steepest descent
def steepest_descent(X, Y, theta, cost_function = mean_squared_error, 
                     single_step=0.01, epochs=30):
    cost = cost_function(theta, X, Y)
    logs = [[cost, theta]]
    for i in range(epochs):
        direction = gradient(theta, X, Y)
        j = 0
        while(True):
            j+=1
            next_theta = theta - single_step * direction
            next_cost = cost_function(next_theta, X, Y)
            if(next_cost >= cost or type(next_cost) != float):
                break
            cost = next_cost
            theta = next_theta
        logs.append([next_cost, theta])
    return theta, logs

Ponadto w toku prac projektowych przetestowane zostaly także inne algorytmy takie jak batch gradient descend (BGD), BGD z momentum, mini-batch gradient descend oraz stochastic gradient descend.

BGD to metoda w której błąd obliczany jest dla każdego parametru ze zbioru treningowego, ale sam model jest aktualizowany dopiero po ocenie wszystkich danych.

MBGD opiera się na podziale danch treningowych na niewielkie pakiety, które są potem wykorzystywane do obliczania wartości błędu i aktualizacji modelu.

# Batch gradient descend (BGD)
def BGD(X, Y, theta, cost_function = mean_squared_error, learning_rate=0.1, 
        eps=10**-5, max_steps = 10000000000):
    cost = cost_function(theta, X, Y)
    logs = [[cost, theta]]
    
    for i in range(max_steps):
        theta = theta - learning_rate * gradient(theta, X, Y)
        next_cost = cost_function(theta, X, Y)
        logs.append([next_cost, theta])
        if abs(cost - next_cost) <= eps:
            break
        cost = next_cost
    return theta, logs

# BGD z momentum
def momentum(X, Y, theta, cost_function = mean_squared_error, 
             learning_rate=0.1, momentum = 0.3, epochs=30):
    cost = cost_function(theta, X, Y)
    logs = [[cost, theta]]
    delta_history = [np.matrix([0]*(degree+1)).T]
    for i in range(epochs):
        delta_history.append(momentum * delta_history[-1] + 
                             learning_rate * gradient(theta, X, Y))
        theta = theta - delta_history[-1]
        next_cost = cost_function(theta, X, Y)
        logs.append([next_cost, theta])
        cost = next_cost
    return theta, logs

# Mini-batch gradient descend (MBGD)
def MBGD(X, Y, theta, cost_function = mean_squared_error, 
         learning_rate=0.1, epochs=5, batch_size=16):
    cost = cost_function(theta, X, Y)
    logs = [[cost, theta]]
    start, end = 0, batch_size
    
    steps = m / batch_size
    for i in range(epochs):
        zipped_XY = list(zip(X, Y))
        random.shuffle(zipped_XY)
        X_shuffled, Y_shuffled = zip(*zipped_XY)
        X_shuffled = np.concatenate(X_shuffled, axis=0) 
        Y_shuffled = np.concatenate(Y_shuffled, axis=0) 
        for j in range(int(steps)):
            batch =  X_shuffled[start:end,:], Y_shuffled[start:end,:]
            theta = theta - learning_rate * gradient(theta, batch[0], batch[1])
            cost = cost_function(theta, X, Y)
            logs.append([cost, theta])

            if start + batch_size < batch_size:
                start += batch_size
            else:
                start = 0
            end = min(start + batch_size, m)
    return theta, logs

# Stochastic gradient descend (SGD)
def SGD(X, Y, theta, cost_function = mean_squared_error, learning_rate=0.1,
        epochs=5):
    return MBGD(X, Y, theta, cost_function, learning_rate, epochs, 1)

4. Zestawienie wyników

final_theta_SGD, logs_3 = SGD(X, Y, initial_theta, epochs = 60)
print("SGD:")
print_theta(final_theta_SGD)
print(f"{len(logs_3)} updates\n")

final_theta_BGD, logs_1 = BGD(X, Y, initial_theta)
print("BGD:")
print_theta(final_theta_BGD)
print(f"{len(logs_1)} updates\n")

final_theta_MBGD, logs_2 = MBGD(X, Y, initial_theta, epochs = 60)
print("MBGD:")
print_theta(final_theta_MBGD)
print(f"{len(logs_2)} updates\n")


final_theta_momentum, logs_4 = momentum(X, Y, initial_theta, epochs = 60)
print("momentum:")
print_theta(final_theta_momentum)
print(f"{len(logs_4)} updates\n")

final_steepest_descent, logs_5 = steepest_descent(X, Y, initial_theta, epochs = 60)
print("steepest descent:")
print_theta(final_steepest_descent)
print(f"{len(logs_5)} updates\n")

SGD:
f(x) = 16.32876722693443x^0 + 34.45307103302153x^1 + 45.94300992401012x^2 + 52.620176422095824x^3 + 55.84736037491846^4
600001 updates

BGD:
f(x) = -65.64506277392503x^0 + 68.77648241255996x^1 + 114.75349304333463x^2 + 102.20675310280693x^3 + 52.80014652046723^4
82705 updates

MBGD:
f(x) = -43.73372717859179x^0 + 44.52908852914625x^1 + 88.75625965596099x^2 + 102.68565372634689x^3 + 96.42646314762048^4
37501 updates

momentum:
f(x) = 46.73334506478952x^0 + 46.04136957395274x^1 + 44.48349874963421x^2 + 42.3749456478724x^3 + 39.943624482130815^4
61 updates

steepest descent:
f(x) = 13.638648831593645x^0 + 39.092493619820296x^1 + 55.99541645170152x^2 + 66.58927282539155x^3 + 72.5680972400324^4
61 updates

Wniosek: Metoda najszybszego spadku oraz momentum cechują się najszybszą zbieżnością do minimum lokalnego danej funkcji. Przeciętnie, dla zadanych parametrów, minimum jest osiągane w ciągu 61 aktualizacji. Podczas gdy SGD wymaga ich ponad 600 tysięcy.

4. Reprezentacja graficzna regresji wielomianowej i funkcji kosztu

def plot_polynomial_regression(theta):
    fig = plt.figure(figsize=(10,5))
    Y_plot = polynomial_regression(theta, X_plot).tolist()
    chart = fig.add_subplot()
    chart.plot(data["Height"], Y ,"go")
    chart.plot(X_plot, Y_plot, color="red", lw=2, label=f"degree {len(theta)}")
    plt.ylim([0,250])
    plt.show()
    

def plot_cost_function(logs):
    no_iter = no_iter = [x for x in range(1, len(logs) + 1)]
    all_cost = [row[0] for row in logs]
    
    fig = plt.figure(figsize=(10,5))  
    plt.plot(no_iter, all_cost)
    plt.xlabel('No. of iterations')
    plt.ylabel('Cost')
    plt.show()
    

print("BGD:")
plot_polynomial_regression(final_theta_BGD)
plot_cost_function(logs_1)
print("MBGD:")
plot_polynomial_regression(final_theta_MBGD)
plot_cost_function(logs_2)
print("SGD:")
plot_polynomial_regression(final_theta_SGD)
plot_cost_function(logs_3)
print("Momentum:")
plot_polynomial_regression(final_theta_momentum)
plot_cost_function(logs_4)
print("Metoda najszybszego spadku:")
plot_polynomial_regression(final_steepest_descent)
plot_cost_function(logs_5)

BGD:

MBGD:

SGD:

Momentum:

Metoda najszybszego spadku:

Wizualizacja metody najszybszego spadku dla wielomianu drugiego stopnia

degree = 1
initial_theta = np.matrix([0] * (degree + 1)).reshape(degree + 1, 1)
m, n_plus_1 = data_matrix.shape
n = n_plus_1 - 1
X = (np.ones((m, 1)))

for i in range(1, degree + 1):
    Xn = np.power(data_matrix[:, 0:n], i)
    Xn /= np.amax(Xn, axis=0)
    X = np.concatenate((X, Xn), axis=1)

X = np.matrix(X).reshape(m, degree * n + 1)
Y = np.matrix(data_matrix[:, -1])


steepest_descent_deg_2, logs_deg_2 = steepest_descent(X, Y, initial_theta, epochs = 60)

cost_history = [row[0] for row in logs_deg_2]
all_thetas = [row[1] for row in logs_deg_2]
theta0_history = [row[0].item() for row in all_thetas]
theta1_history = [row[1].item() for row in all_thetas]

cost_history = np.array(cost_history)
theta0_history = np.array(theta0_history)
theta1_history = np.array(theta1_history)

theta0_vals = theta0_history
theta1_vals = theta1_history
J_vals = np.zeros((len(theta0_vals), len(theta1_vals)))

c1=0
c2=0
pom = 0
for i in theta0_vals:
    for j in theta1_vals:
        t = np.array([i, j])
        J_vals[c1][c2] = cost_history[pom]
        c2=c2+1
    c1=c1+1
    pom += 1
    c2=0

import plotly.graph_objects as go

fig = go.Figure(data=[go.Surface(x=theta0_vals, y=theta1_vals, z=J_vals)])
fig.update_layout(title='Loss function for different thetas', autosize=True,
                  width=600, height=600, xaxis_title='theta0', 
                 yaxis_title='theta1')
fig.show()

5. Regresja wielomianowa z wykorzystaniem gotowych bibliotek

from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import Ridge, LinearRegression

model = make_pipeline(PolynomialFeatures(degree=degree, include_bias=True), 
                       LinearRegression())
model.fit(data[["Height"]],Y)

Pipeline(steps=[('polynomialfeatures', PolynomialFeatures(degree=4)),
                ('linearregression', LinearRegression())])

Y_plot_2 = model.predict([[x] for x in X_plot])

fig = plt.figure(figsize=(10,5))
chart = fig.add_subplot()
chart.plot(data["Height"], Y ,"go")
chart.plot(X_plot, Y_plot_2, color="red", lw=2, label=f"degree {degree}")
plt.ylim([0,250])
plt.xlim([0,100])
degree

polyfit_theta = np.polyfit(data["Height"]/data["Height"].max(), Y, degree)
plot_polynomial_regression(polyfit_theta)

386 KiB Raw Blame History