pms/umz21

Fork 1

Paweł Skórzewski 8f796b51a3 Czytelniejszy motyw dla wykładu 6

2021-04-14 10:13:55 +02:00

448 KiB

Raw Blame History

Uczenie maszynowe – zastosowania

6. Naiwny klasyfikator bayesowski

Naiwny klasyfikator bayesowski jest algorytmem dla problemu klasyfikacji wieloklasowej.
Naszym celem jest znalezienie funkcji uczącej $f \colon x \mapsto y$, gdzie $y$ oznacza jedną ze zdefiniowanych wcześniej klas.
Klasyfikacja probabilistyczna polega na wskazaniu klasy o najwyższym prawdopodobieństwie: $$ \hat{y} = \mathop{\arg \max}_y P( y ,|, x ) $$
Naiwny klasyfikator bayesowski należy do rodziny klasyfikatorów probabilistycznych

Thomas Bayes (wymowa: /beɪz/) (1702–1761) – angielski matematyk i duchowny

Twierdzenie Bayesa – wzór ogólny

$$ P( Y ,|, X ) = \frac{ P( X ,|, Y ) \cdot P( Y ) }{ P ( X ) } $$

Twierdzenie Bayesa opisuje związek między prawdopodobieństwami warunkowymi dwóch zdarzeń warunkujących się nawzajem.

Twierdzenie Bayesa

(po zastosowaniu wzoru na prawdopodobieństwo całkowite)

$$ \underbrace{P( y_k ,|, x )}_\textrm{ prawd. a posteriori } = \frac{ \overbrace{ P( x ,|, y_k )}^\textrm{ model klasy } \cdot \overbrace{P( y_k )}^\textrm{ prawd. a priori } }{ \underbrace{\sum{i} P( x ,|, y_i ) , P( y_i )}_\textrm{wyrażenie normalizacyjne} } $$

W tym przypadku „zdarzenie $x$” oznacza, że cechy wejściowe danej obserwacji przyjmują wartości opisane wektorem $x$.
„Zdarzenie $y_k$” oznacza, że dana obserwacja należy do klasy $y_k$.
Model klasy $y_k$ opisuje rozkład prawdopodobieństwa cech obserwacji należących do tej klasy.
Prawdopodobieństwo _a priori to prawdopodobienstwo, że losowa obserwacja należy do klasy $y_k$.
Prawdopodobieństwo _a posteriori to prawdopodobieństwo, którego szukamy: że obserwacja opisana wektorem cech $x$ należy do klasy $y_k$.

Rola wyrażenia normalizacyjnego w twierdzeniu Bayesa

Wartość wyrażenia normalizacyjnego nie wpływa na wynik klasyfikacji.

_Przykład: obserwacja nietypowa ma małe prawdopodobieństwo względem dowolnej klasy, wyrażenie normalizacyjne sprawia, że to prawdopodobieństwo staje się porównywalne z prawdopodobieństwami typowych obserwacji, ale nie wpływa na klasyfikację!

Klasyfikatory dyskryminatywne a generatywne

Klasyfikatory generatywne tworzą model rozkładu prawdopodobieństwa dla każdej z klas.
Klasyfikatory dyskryminatywne wyznaczają granicę klas (_decision boundary) bezpośrednio.
Naiwny klasyfikator baywsowski jest klasyfikatorem generatywnym (ponieważ wyznacza $P( x ,|, y )$).
Wszystkie klasyfikatory generatywne są probabilistyczne, ale nie na odwrót.
Regresja logistyczna jest przykładem klasyfikatora dyskryminatywnego.

Założenie niezależności dla naiwnego klasyfikatora bayesowskiego

Naiwny klasyfikator bayesowski jest _naiwny, ponieważ zakłada, że poszczególne cechy są niezależne od siebie: $$ P( x_1, \ldots, x_n ,|, y ) ,=, \prod_{i=1}^n P( x_i ,|, x_1, \ldots, x_{i-1}, y ) ,=, \prod_{i=1}^n P( x_i ,|, y ) $$
To założenie jest bardzo przydatne ze względów obliczeniowych, ponieważ bardzo często mamy do czynienia z ogromną liczbą cech (bitmapy, słowniki itp.)

Naiwny klasyfikator bayesowski – przykład

# Przydtne importy

import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
import pandas

%matplotlib inline

# Wczytanie danych (gatunki kosaćców)

data_iris = pandas.read_csv('iris.csv')
data_iris_setosa = pandas.DataFrame()
data_iris_setosa['dł. płatka'] = data_iris['pl']  # "pl" oznacza "petal length"
data_iris_setosa['szer. płatka'] = data_iris['pw']  # "pw" oznacza "petal width"
data_iris_setosa['Iris setosa?'] = data_iris['Gatunek'].apply(lambda x: 1 if x=='Iris-setosa' else 0)

m, n_plus_1 = data_iris_setosa.values.shape
n = n_plus_1 - 1
Xn = data_iris_setosa.values[:, 0:n].reshape(m, n)

X = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)
Y = np.matrix(data_iris_setosa.values[:, 2]).reshape(m, 1)

classes = [0, 1]
count = [sum(1 if y == c else 0 for y in Y.T.tolist()[0]) for c in classes]
prior_prob = [float(count[c]) / float(Y.shape[0]) for c in classes]

print('liczba przykładów: ', {c: count[c] for c in classes})
print('prior probability:', {c: prior_prob[c] for c in classes})

liczba przykładów:  {0: 100, 1: 50}
prior probability: {0: 0.6666666666666666, 1: 0.3333333333333333}

# Wykres danych (wersja macierzowa)
def plot_data_for_classification(X, Y, xlabel, ylabel):    
    fig = plt.figure(figsize=(16*.6, 9*.6))
    ax = fig.add_subplot(111)
    fig.subplots_adjust(left=0.1, right=0.9, bottom=0.1, top=0.9)
    X = X.tolist()
    Y = Y.tolist()
    X1n = [x[1] for x, y in zip(X, Y) if y[0] == 0]
    X1p = [x[1] for x, y in zip(X, Y) if y[0] == 1]
    X2n = [x[2] for x, y in zip(X, Y) if y[0] == 0]
    X2p = [x[2] for x, y in zip(X, Y) if y[0] == 1]
    ax.scatter(X1n, X2n, c='r', marker='x', s=50, label='Dane')
    ax.scatter(X1p, X2p, c='g', marker='o', s=50, label='Dane')
    
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.margins(.05, .05)
    return fig

fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')

XY = np.column_stack((X, Y))
XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]
X_split = [XY_split[c][:,0:3] for c in classes]
Y_split = [XY_split[c][:,3] for c in classes]

X_mean = [np.mean(X_split[c], axis=0) for c in classes]
X_std = [np.std(X_split[c], axis=0) for c in classes]
print('średnia: ', X_mean) 
print('odchylenie standardowe: ', X_std)

print(X_std[0].shape)

średnia:  [matrix([[1.   , 4.906, 1.676]]), matrix([[1.   , 1.464, 0.244]])]
odchylenie standardowe:  [matrix([[0.        , 0.8214402 , 0.42263933]]), matrix([[0.        , 0.17176728, 0.10613199]])]
(1, 3)

# Rysowanie średnich
def draw_means(fig, means, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):
    class_color = {0: 'r', 1: 'g'}
    classes = range(len(means))
    ax = fig.axes[0]
    mean_x1 = [means[c].item(0, 1) for c in classes]
    mean_x2 = [means[c].item(0, 2) for c in classes]
    for c in classes:
        ax.plot([mean_x1[c], mean_x1[c]], [xmin, xmax],
                color=class_color.get(c, 'c'), linestyle='dashed')
        ax.plot([ymin, ymax], [mean_x2[c], mean_x2[c]],
                color=class_color.get(c, 'c'), linestyle='dashed')

from scipy.stats import norm

# Prawdopodobieństwo klasy dla pojedynczej cechy
# Uwaga: jeżeli odchylenie standardowe dla danej cechy jest równe 0, 
# to nie można określić prawdopodbieństwa klasy!
def prob(x, c, feature, mean, std):
    sd = std[c].item(0, feature)
    if sd == 0:
        print('Nie można określić prawdopodobieństwa klasy dla cechy {}.!'.format(feature))
    return norm(mean[c].item(0, feature), sd).pdf(x)

# Prawdopodobieństwo klasy
# Uwaga: tu bierzemy iloczyn dwóch cech (1. i 2.), w ogólności może być ich więcej
def class_prob(x, c, mean, std, features=[1, 2]):
    result = 1
    for feature in features:
        result *= prob(x[feature], c, feature, mean, std)
    return result

print(X_std[0].shape)
print(X_std)
print(X_mean)

X_prob_0=class_prob(X, 0, X_mean, X_std)
print(X_prob_0)

(1, 3)
[matrix([[0.        , 0.8214402 , 0.42263933]]), matrix([[0.        , 0.17176728, 0.10613199]])]
[matrix([[1.   , 4.906, 1.676]]), matrix([[1.   , 1.464, 0.244]])]
[[1.57003335e-06 1.61965173e-23 3.09005273e-08]]

# Wykres prawdopodobieństw klas
def plot_prob(fig, X_mean, X_std, classes, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):
    class_color = {0: 'r', 1: 'g'}
    ax = fig.axes[0]
    x1, x2 = np.meshgrid(np.arange(xmin, xmax, 0.02),
                         np.arange(xmin, xmax, 0.02))
    for c in classes:
        fun1 = lambda x: prob(x, c, 1, X_mean, X_std)
        fun2 = lambda x: prob(x, c, 2, X_mean, X_std)
        p = fun1(x1) * fun2(x2)
        plt.contour(x1, x2, p, levels=np.arange(0.0, 1.0, 0.1),
                    colors=class_color.get(c, 'c'), lw=3)

fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')
draw_means(fig, X_mean)
plot_prob(fig, X_mean, X_std, classes)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:12: UserWarning: The following kwargs were not used by contour: 'lw'
  if sys.path[0] == '':

# Prawdopodobieństwo a posteriori
def posterior_prob(x, c):
    normalizer = sum(class_prob(x, c, X_mean, X_std)
                     * prior_prob[c]
                    for c in classes)
    return (class_prob(x, c, X_mean, X_std) 
            * prior_prob[c]
            / normalizer)

Aby teraz przewidzieć klasę $y$ dla dowolnego zestawu cech $x$, wystarczy sprawdzić, dla której klasy prawdopodobieństwo _a posteriori jest większe:

# Funkcja klasyfikująca (funkcja predykcji)
def predict_class(x):
    p = [posterior_prob(x, c) for c in classes]
    if p[1] > p[0]:
        return 1
    else:
        return 0

x = [1, 2.0, 0.5]  # długość płatka: 2.0, szerokość płatka: 0.5
y = predict_class(x)
print(y)  # 1 – To prawdopodobnie jest Iris setosa

x = [1, 2.5, 1.0]  # długość płatka: 2.5, szerokość płatka: 1.0
y = predict_class(x)
print(y)  # 0 – To prawdopodobnie nie jest Iris setosa

1
0

Zobaczmy, jak to wygląda na wykresie. Narysujemy w tym celu granicę między klasą 1 a 0:

# Wykres granicy klas dla naiwnego Bayesa
def plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=0.0, xmax=7.0, ymin=0.0, ymax=7.0):
    ax = fig.axes[0]
    x1, x2 = np.meshgrid(np.arange(xmin, xmax, 0.02),
                         np.arange(ymin, ymax, 0.02))
    p = [posterior_prob([1, x1, x2], c) for c in classes]
    p_diff = p[1] - p[0]
    plt.contour(x1, x2, p_diff, levels=[0.0], colors='c', lw=3);

fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')
plot_decision_boundary_bayes(fig, X_mean, X_std)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: UserWarning: The following kwargs were not used by contour: 'lw'

Dla porównania: regresja logistyczna na tych samych danych

def powerme(x1,x2,n):
    X = []
    for m in range(n+1):
        for i in range(m+1):
            X.append(np.multiply(np.power(x1,i),np.power(x2,(m-i))))
    return np.hstack(X)

# Funkcja logistyczna
def safeSigmoid(x, eps=0):
    y = 1.0/(1.0 + np.exp(-x))
    if eps > 0:
        y[y < eps] = eps
        y[y > 1 - eps] = 1 - eps
    return y

# Funkcja hipotezy dla regresji logistycznej
def h(theta, X, eps=0.0):
    return safeSigmoid(X*theta, eps)

# Funkcja kosztu dla regresji logistycznej
def J(h,theta,X,y, lamb=0):
    m = len(y)
    f = h(theta, X, eps=10**-7)
    j = -np.sum(np.multiply(y, np.log(f)) + 
                np.multiply(1 - y, np.log(1 - f)), axis=0)/m
    if lamb > 0:
        j += lamb/(2*m) * np.sum(np.power(theta[1:],2))
    return j

# Gradient funkcji kosztu
def dJ(h,theta,X,y,lamb=0):
    g = 1.0/y.shape[0]*(X.T*(h(theta,X)-y))
    if lamb > 0:
        g[1:] += lamb/float(y.shape[0]) * theta[1:] 
    return g

# Funkcja klasyfikująca
def classifyBi(theta, X):
    prob = h(theta, X)
    return prob

# Przygotowanie danych dla wielomianowej regresji logistycznej

data = np.matrix(data_iris_setosa)

Xpl = powerme(data[:, 1], data[:, 0], n)
Ypl = np.matrix(data[:, 2]).reshape(m, 1)

# Metoda gradientu prostego dla regresji logistycznej
def GD(h, fJ, fdJ, theta, X, y, alpha=0.01, eps=10**-3, maxSteps=10000):
    errorCurr = fJ(h, theta, X, y)
    errors = [[errorCurr, theta]]
    while True:
        # oblicz nowe theta
        theta = theta - alpha * fdJ(h, theta, X, y)
        # raportuj poziom błędu
        errorCurr, errorPrev = fJ(h, theta, X, y), errorCurr
        # kryteria stopu
        if abs(errorPrev - errorCurr) <= eps:
            break
        if len(errors) > maxSteps:
            break
        errors.append([errorCurr, theta]) 
    return theta, errors

# Uruchomienie metody gradientu prostego dla regresji logistycznej
theta_start = np.matrix(np.zeros(Xpl.shape[1])).reshape(Xpl.shape[1], 1)
theta, errors = GD(h, J, dJ, theta_start, Xpl, Ypl, 
                       alpha=0.1, eps=10**-7, maxSteps=100000)
print(r'theta = {}'.format(theta))

theta = [[ 4.01960795]
 [ 3.89499137]
 [ 0.18747599]
 [-1.3524039 ]
 [-2.00123783]
 [-0.87625505]]

# Wykres granicy klas
def plot_decision_boundary(fig, theta, Xpl, xmin=0.0, xmax=7.0):
    ax = fig.axes[0]
    xx, yy = np.meshgrid(np.arange(xmin, xmax, 0.02),
                         np.arange(xmin, xmax, 0.02))
    l = len(xx.ravel())
    C = powerme(yy.reshape(l, 1), xx.reshape(l, 1), n)
    z = classifyBi(theta, C).reshape(int(np.sqrt(l)), int(np.sqrt(l)))

    plt.contour(xx, yy, z, levels=[0.5], colors='m', lw=3);

fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')
plot_decision_boundary(fig, theta, Xpl)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:10: UserWarning: The following kwargs were not used by contour: 'lw'
  # Remove the CWD from sys.path while we load stuff.

fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')
plot_decision_boundary(fig, theta, Xpl)
plot_decision_boundary_bayes(fig, X_mean, X_std)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:10: UserWarning: The following kwargs were not used by contour: 'lw'
  # Remove the CWD from sys.path while we load stuff.
/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: UserWarning: The following kwargs were not used by contour: 'lw'

Inny przykład

# Wczytanie danych (gatunki kosaćców)

data_iris = pandas.read_csv('iris.csv')
data_iris_versicolor = pandas.DataFrame()
data_iris_versicolor['dł. płatka'] = data_iris['pl']  # "pl" oznacza "petal length"
data_iris_versicolor['szer. płatka'] = data_iris['pw']  # "pw" oznacza "petal width"
data_iris_versicolor['Iris versicolor?'] = data_iris['Gatunek'].apply(lambda x: 1 if x=='Iris-versicolor' else 0)

m, n_plus_1 = data_iris_versicolor.values.shape
n = n_plus_1 - 1
Xn = data_iris_versicolor.values[:, 0:n].reshape(m, n)

X = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)
Y = np.matrix(data_iris_setosa.values[:, 2]).reshape(m, 1)

classes = [0, 1]
count = [sum(1 if y == c else 0 for y in Y.T.tolist()[0]) for c in classes]
prior_prob = [float(count[c]) / float(Y.shape[0]) for c in classes]

print('liczba przykładów: ', {c: count[c] for c in classes})
print('prior probability:', {c: prior_prob[c] for c in classes})

liczba przykładów:  {0: 100, 1: 50}
prior probability: {0: 0.6666666666666666, 1: 0.3333333333333333}

fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')

XY = np.column_stack((X, Y))
XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]
X_split = [XY_split[c][:,0:3] for c in classes]
Y_split = [XY_split[c][:,3] for c in classes]

X_mean = [np.mean(X_split[c], axis=0) for c in classes]
X_std = [np.std(X_split[c], axis=0) for c in classes]
print('średnia: ', X_mean) 
print('odchylenie standardowe: ', X_std)

średnia:  [matrix([[1.   , 4.906, 1.676]]), matrix([[1.   , 1.464, 0.244]])]
odchylenie standardowe:  [matrix([[0.        , 0.8214402 , 0.42263933]]), matrix([[0.        , 0.17176728, 0.10613199]])]

fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')
draw_means(fig, X_mean)
plot_prob(fig, X_mean, X_std, classes)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:12: UserWarning: The following kwargs were not used by contour: 'lw'
  if sys.path[0] == '':

fig = plot_data_for_classification(X, Y, xlabel=u'dł. płatka', ylabel=u'szer. płatka')
plot_decision_boundary_bayes(fig, X_mean, X_std)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: UserWarning: The following kwargs were not used by contour: 'lw'

# Przygotowanie danych dla wielomianowej regresji logistycznej

data = np.matrix(data_iris_versicolor)

Xpl = powerme(data[:, 1], data[:, 0], n)
Ypl = np.matrix(data[:, 2]).reshape(m, 1)

# Uruchomienie metody gradientu prostego dla regresji logistycznej
theta_start = np.matrix(np.zeros(Xpl.shape[1])).reshape(Xpl.shape[1], 1)
theta, errors = GD(h, J, dJ, theta_start, Xpl, Ypl, 
                       alpha=0.05, eps=10**-7, maxSteps=100000)
print(r'theta = {}'.format(theta))

theta = [[-10.68923095]
 [  5.52649967]
 [  5.83316957]
 [ -0.60707243]
 [ -0.46353729]
 [ -2.82974456]]

fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')
plot_decision_boundary(fig, theta, Xpl)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:10: UserWarning: The following kwargs were not used by contour: 'lw'
  # Remove the CWD from sys.path while we load stuff.

fig = plot_data_for_classification(Xpl, Ypl, xlabel=u'dł. płatka', ylabel=u'szer. płatka')
plot_decision_boundary(fig, theta, Xpl)
plot_decision_boundary_bayes(fig, X_mean, X_std)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:10: UserWarning: The following kwargs were not used by contour: 'lw'
  # Remove the CWD from sys.path while we load stuff.
/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: UserWarning: The following kwargs were not used by contour: 'lw'

Kiedy naiwny Bayes nie działa?

# Wczytanie danych
import pandas
import numpy as np

alldata = pandas.read_csv('bayes_nasty.tsv', sep='\t')
data = np.matrix(alldata)

m, n_plus_1 = data.shape
n = n_plus_1 - 1
Xn = data[:, 1:]

Xbn = np.matrix(np.concatenate((np.ones((m, 1)), Xn), axis=1)).reshape(m, n_plus_1)
Xbnp = powerme(data[:, 1], data[:, 2], n)
Ybn = np.matrix(data[:, 0]).reshape(m, 1)

fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')

classes = [0, 1]
count = [sum(1 if y == c else 0 for y in Ybn.T.tolist()[0]) for c in classes]
prior_prob = [float(count[c]) / float(Ybn.shape[0]) for c in classes]

print('liczba przykładów: ', {c: count[c] for c in classes})
print('prior probability:', {c: prior_prob[c] for c in classes})

liczba przykładów:  {0: 69, 1: 30}
prior probability: {0: 0.696969696969697, 1: 0.30303030303030304}

XY = np.column_stack((Xbn, Ybn))
XY_split = [XY[np.where(XY[:,3] == c)[0]] for c in classes]
X_split = [XY_split[c][:,0:3] for c in classes]
Y_split = [XY_split[c][:,3] for c in classes]

X_mean = [np.mean(X_split[c], axis=0) for c in classes]
X_std = [np.std(X_split[c], axis=0) for c in classes]
print('średnia: ', X_mean) 
print('odchylenie standardowe: ', X_std)

średnia:  [matrix([[1.        , 0.03949835, 0.02825019]]), matrix([[1.        , 0.09929617, 0.06206227]])]
odchylenie standardowe:  [matrix([[0.        , 0.52318432, 0.60106092]]), matrix([[0.        , 0.61370281, 0.6081128 ]])]

fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')
draw_means(fig, X_mean, xmin=-1.0, xmax=1.0, ymin=-1.0, ymax=1.0)
plot_prob(fig, X_mean, X_std, classes, xmin=-1.0, xmax=1.0, ymin=-1.0, ymax=1.0)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:12: UserWarning: The following kwargs were not used by contour: 'lw'
  if sys.path[0] == '':

fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')
plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=-4.0, xmax=4.0, ymin=-4.0, ymax=4.0)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: UserWarning: The following kwargs were not used by contour: 'lw'

# Uruchomienie metody gradientu prostego dla regresji logistycznej
theta_start = np.matrix(np.zeros(Xbnp.shape[1])).reshape(Xbnp.shape[1], 1)
theta, errors = GD(h, J, dJ, theta_start, Xbnp, Ybn, 
                       alpha=0.05, eps=10**-7, maxSteps=100000)
print(r'theta = {}'.format(theta))

theta = [[-0.31582268]
 [ 0.43496774]
 [-0.21840373]
 [-7.88802319]
 [22.73897346]
 [-4.43682364]]

fig = plot_data_for_classification(Xbnp, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')
plot_decision_boundary(fig, theta, Xbnp, xmin=-4.0, xmax=4.0)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:10: UserWarning: The following kwargs were not used by contour: 'lw'
  # Remove the CWD from sys.path while we load stuff.

fig = plot_data_for_classification(Xbn, Ybn, xlabel=r'$x_1$', ylabel=r'$x_2$')
plot_decision_boundary_bayes(fig, X_mean, X_std, xmin=-4.0, xmax=4.0, ymin=-4.0, ymax=4.0)
plot_decision_boundary(fig, theta, Xbnp, xmin=-4.0, xmax=4.0)

/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:8: UserWarning: The following kwargs were not used by contour: 'lw'
  
/home/pawel/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:10: UserWarning: The following kwargs were not used by contour: 'lw'
  # Remove the CWD from sys.path while we load stuff.

Naiwny klasyfikator Bayesa nie działa, jeżeli dane nie różnią się ani średnią, ani odchyleniem standardowym.

448 KiB Raw Blame History Unescape Escape