Filip Gralinski 2022-07-06 08:43:40 +02:00
parent 59b20b3de5
commit 00d84daae3
9 changed files with 98 additions and 410 deletions


@@ -1,5 +1,20 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n",
"<div class=\"alert alert-block alert-info\">\n",
"<h1> Modelowanie języka</h1>\n",
"<h2> 07. <i>Wygładzanie w n-gramowych modelach języka</i> [wykład]</h2> \n",
"<h3> Filip Graliński (2022)</h3>\n",
"</div>\n",
"\n",
"![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -54,7 +69,7 @@
"Rozpatrzmy przykład z 3 kolorami (wiemy, że w urnie mogą być kule\n",
"żółte, zielone i czerwone, tj. $m=3$) i 4 losowaniami ($T=4$):\n",
"\n",
"![img](./05_Wygladzanie/urna.drawio.png)\n",
"![img](./07_Wygladzanie/urna.drawio.png)\n",
"\n",
"Gdybyśmy w prosty sposób oszacowali prawdopodobieństwa, doszlibyśmy do\n",
"wniosku, że prawdopodobieństwo wylosowania kuli czerwonej wynosi 3/4, żółtej — 1/4,\n",
@@ -168,7 +183,7 @@
"$k_i$ to ile razy w zbiorze uczącym pojawił się $i$-ty wyraz słownika,\n",
"$T$ — długość zbioru uczącego.\n",
"\n",
"![img](./05_Wygladzanie/urna-wyrazy.drawio.png)\n",
"![img](./07_Wygladzanie/urna-wyrazy.drawio.png)\n",
"\n",
"A zatem przy użyciu wygładzania +1 w następujący sposób estymować\n",
"będziemy prawdopodobieństwo słowa $w$:\n",
@@ -303,113 +318,11 @@
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['<s>',\n",
" 'lubisz',\n",
" 'curry',\n",
" ',',\n",
" 'prawda',\n",
" '?',\n",
" '</s>',\n",
" '<s>',\n",
" 'nałożę',\n",
" 'ci',\n",
" 'więcej',\n",
" '.',\n",
" '</s>',\n",
" '<s>',\n",
" 'hey',\n",
" '!',\n",
" '</s>',\n",
" '<s>',\n",
" 'smakuje',\n",
" 'ci',\n",
" '?',\n",
" '</s>',\n",
" '<s>',\n",
" 'hey',\n",
" ',',\n",
" 'brzydalu',\n",
" '.',\n",
" '</s>',\n",
" '<s>',\n",
" 'spójrz',\n",
" 'na',\n",
" 'nią',\n",
" '.',\n",
" '</s>',\n",
" '<s>',\n",
" '-',\n",
" 'wariatka',\n",
" '.',\n",
" '</s>',\n",
" '<s>',\n",
" '-',\n",
" 'zadałam',\n",
" 'ci',\n",
" 'pytanie',\n",
" '!',\n",
" '</s>',\n",
" '<s>',\n",
" 'no',\n",
" ',',\n",
" 'tak',\n",
" 'lepiej',\n",
" '!',\n",
" '</s>',\n",
" '<s>',\n",
" '-',\n",
" 'wygląda',\n",
" 'dobrze',\n",
" '!',\n",
" '</s>',\n",
" '<s>',\n",
" '-',\n",
" 'tak',\n",
" 'lepiej',\n",
" '!',\n",
" '</s>',\n",
" '<s>',\n",
" 'pasuje',\n",
" 'jej',\n",
" '.',\n",
" '</s>',\n",
" '<s>',\n",
" '-',\n",
" 'hey',\n",
" '.',\n",
" '</s>',\n",
" '<s>',\n",
" '-',\n",
" 'co',\n",
" 'do',\n",
" '...?',\n",
" '</s>',\n",
" '<s>',\n",
" 'co',\n",
" 'do',\n",
" 'cholery',\n",
" 'robisz',\n",
" '?',\n",
" '</s>',\n",
" '<s>',\n",
" 'zejdź',\n",
" 'mi',\n",
" 'z',\n",
" 'oczu',\n",
" ',',\n",
" 'zdziro',\n",
" '.',\n",
" '</s>',\n",
" '<s>',\n",
" 'przestań',\n",
" 'dokuczać']"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"['<s>', 'lubisz', 'curry', ',', 'prawda', '?', '</s>', '<s>', 'nałożę', 'ci', 'więcej', '.', '</s>', '<s>', 'hey', '!', '</s>', '<s>', 'smakuje', 'ci', '?', '</s>', '<s>', 'hey', ',', 'brzydalu', '.', '</s>', '<s>', 'spójrz', 'na', 'nią', '.', '</s>', '<s>', '-', 'wariatka', '.', '</s>', '<s>', '-', 'zadałam', 'ci', 'pytanie', '!', '</s>', '<s>', 'no', ',', 'tak', 'lepiej', '!', '</s>', '<s>', '-', 'wygląda', 'dobrze', '!', '</s>', '<s>', '-', 'tak', 'lepiej', '!', '</s>', '<s>', 'pasuje', 'jej', '.', '</s>', '<s>', '-', 'hey', '.', '</s>', '<s>', '-', 'co', 'do', '...?', '</s>', '<s>', 'co', 'do', 'cholery', 'robisz', '?', '</s>', '<s>', 'zejdź', 'mi', 'z', 'oczu', ',', 'zdziro', '.', '</s>', '<s>', 'przestań', 'dokuczać']"
]
}
],
"source": [
@@ -448,7 +361,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -459,18 +372,15 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"48113"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"48113"
]
}
],
"source": [
@@ -479,7 +389,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -518,18 +428,15 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"926594"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"926594"
]
}
],
"source": [
@@ -553,128 +460,25 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>liczba tokenów</th>\n",
" <th>średnia częstość w części B</th>\n",
" <th>estymacje +1</th>\n",
" <th>estymacje +0.01</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>388334</td>\n",
" <td>1.900495</td>\n",
" <td>0.993586</td>\n",
" <td>0.009999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>403870</td>\n",
" <td>0.592770</td>\n",
" <td>1.987172</td>\n",
" <td>1.009935</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>117529</td>\n",
" <td>1.565809</td>\n",
" <td>2.980759</td>\n",
" <td>2.009870</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>62800</td>\n",
" <td>2.514268</td>\n",
" <td>3.974345</td>\n",
" <td>3.009806</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>40856</td>\n",
" <td>3.504944</td>\n",
" <td>4.967931</td>\n",
" <td>4.009741</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>29443</td>\n",
" <td>4.454098</td>\n",
" <td>5.961517</td>\n",
" <td>5.009677</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22709</td>\n",
" <td>5.232023</td>\n",
" <td>6.955103</td>\n",
" <td>6.009612</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>18255</td>\n",
" <td>6.157929</td>\n",
" <td>7.948689</td>\n",
" <td>7.009548</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>15076</td>\n",
" <td>7.308039</td>\n",
" <td>8.942276</td>\n",
" <td>8.009483</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>12859</td>\n",
" <td>8.045649</td>\n",
" <td>9.935862</td>\n",
" <td>9.009418</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" liczba tokenów średnia częstość w części B estymacje +1 estymacje +0.01\n",
"0 388334 1.900495 0.993586 0.009999\n",
"1 403870 0.592770 1.987172 1.009935\n",
"2 117529 1.565809 2.980759 2.009870\n",
"3 62800 2.514268 3.974345 3.009806\n",
"4 40856 3.504944 4.967931 4.009741\n",
"5 29443 4.454098 5.961517 5.009677\n",
"6 22709 5.232023 6.955103 6.009612\n",
"7 18255 6.157929 7.948689 7.009548\n",
"8 15076 7.308039 8.942276 8.009483\n",
"9 12859 8.045649 9.935862 9.009418"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"liczba tokenów średnia częstość w części B estymacje +1 estymacje +0.01\n",
"0 388334 1.900495 0.993586 0.009999\n",
"1 403870 0.592770 1.987172 1.009935\n",
"2 117529 1.565809 2.980759 2.009870\n",
"3 62800 2.514268 3.974345 3.009806\n",
"4 40856 3.504944 4.967931 4.009741\n",
"5 29443 4.454098 5.961517 5.009677\n",
"6 22709 5.232023 6.955103 6.009612\n",
"7 18255 6.157929 7.948689 7.009548\n",
"8 15076 7.308039 8.942276 8.009483\n",
"9 12859 8.045649 9.935862 9.009418"
]
}
],
"source": [
@@ -716,128 +520,25 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>liczba tokenów</th>\n",
" <th>średnia częstość w części B</th>\n",
" <th>estymacje +1</th>\n",
" <th>Good-Turing</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>388334</td>\n",
" <td>1.900495</td>\n",
" <td>0.993586</td>\n",
" <td>1.040007</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>403870</td>\n",
" <td>0.592770</td>\n",
" <td>1.987172</td>\n",
" <td>0.582014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>117529</td>\n",
" <td>1.565809</td>\n",
" <td>2.980759</td>\n",
" <td>1.603009</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>62800</td>\n",
" <td>2.514268</td>\n",
" <td>3.974345</td>\n",
" <td>2.602293</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>40856</td>\n",
" <td>3.504944</td>\n",
" <td>4.967931</td>\n",
" <td>3.603265</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>29443</td>\n",
" <td>4.454098</td>\n",
" <td>5.961517</td>\n",
" <td>4.627721</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>22709</td>\n",
" <td>5.232023</td>\n",
" <td>6.955103</td>\n",
" <td>5.627064</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>18255</td>\n",
" <td>6.157929</td>\n",
" <td>7.948689</td>\n",
" <td>6.606847</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>15076</td>\n",
" <td>7.308039</td>\n",
" <td>8.942276</td>\n",
" <td>7.676506</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>12859</td>\n",
" <td>8.045649</td>\n",
" <td>9.935862</td>\n",
" <td>8.557431</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" liczba tokenów średnia częstość w części B estymacje +1 Good-Turing\n",
"0 388334 1.900495 0.993586 1.040007\n",
"1 403870 0.592770 1.987172 0.582014\n",
"2 117529 1.565809 2.980759 1.603009\n",
"3 62800 2.514268 3.974345 2.602293\n",
"4 40856 3.504944 4.967931 3.603265\n",
"5 29443 4.454098 5.961517 4.627721\n",
"6 22709 5.232023 6.955103 5.627064\n",
"7 18255 6.157929 7.948689 6.606847\n",
"8 15076 7.308039 8.942276 7.676506\n",
"9 12859 8.045649 9.935862 8.557431"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"liczba tokenów średnia częstość w części B estymacje +1 Good-Turing\n",
"0 388334 1.900495 0.993586 1.040007\n",
"1 403870 0.592770 1.987172 0.582014\n",
"2 117529 1.565809 2.980759 1.603009\n",
"3 62800 2.514268 3.974345 2.602293\n",
"4 40856 3.504944 4.967931 3.603265\n",
"5 29443 4.454098 5.961517 4.627721\n",
"6 22709 5.232023 6.955103 5.627064\n",
"7 18255 6.157929 7.948689 6.606847\n",
"8 15076 7.308039 8.942276 7.676506\n",
"9 12859 8.045649 9.935862 8.557431"
]
}
],
"source": [
@@ -1008,18 +709,15 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('k', 'o', 't'), ('o', 't', 'e'), ('t', 'e', 'k')]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"[('k', 'o', 't'), ('o', 't', 'e'), ('t', 'e', 'k')]"
]
}
],
"source": [
@@ -1036,7 +734,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -1048,23 +746,13 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"321"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"len(histories['jork'])\n",
"len(histories['zielony'])"
"len(histories['zielony'])\n",
"histories['jork']"
]
},
{
@@ -1112,7 +800,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![img](./05_Wygladzanie/size-perplexity.gif \"Perplexity dla różnych rozmiarów zbioru testowego\")\n",
"![img](./07_Wygladzanie/size-perplexity.gif \"Perplexity dla różnych rozmiarów zbioru testowego\")\n",
"\n"
]
},
@@ -1128,7 +816,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![img](./05_Wygladzanie/size-perplexity2.gif \"Perplexity dla różnych rozmiarów zbioru uczącego\")\n",
"![img](./07_Wygladzanie/size-perplexity2.gif \"Perplexity dla różnych rozmiarów zbioru uczącego\")\n",
"\n"
]
},
@@ -1144,7 +832,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![img](./05_Wygladzanie/order-perplexity.gif \"Perplexity dla różnych wartości rządu modelu\")\n",
"![img](./07_Wygladzanie/order-perplexity.gif \"Perplexity dla różnych wartości rządu modelu\")\n",
"\n"
]
}
@@ -1165,7 +853,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
"version": "3.10.5"
},
"org": null
},


@@ -25,7 +25,7 @@ $$p_i = \frac{k_i}{T}.$$
Rozpatrzmy przykład z 3 kolorami (wiemy, że w urnie mogą być kule
żółte, zielone i czerwone, tj. $m=3$) i 4 losowaniami ($T=4$):
[[./05_Wygladzanie/urna.drawio.png]]
[[./07_Wygladzanie/urna.drawio.png]]
Gdybyśmy w prosty sposób oszacowali prawdopodobieństwa, doszlibyśmy do
wniosku, że prawdopodobieństwo wylosowania kuli czerwonej wynosi 3/4, żółtej — 1/4,
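(Editorial aside, not part of the diff: the naive estimate $p_i = k_i/T$ discussed above can be sketched on the urn example's hypothetical draws — three red, one yellow, $T=4$, with green never drawn.)

```python
from collections import Counter

# Hypothetical draws matching the urn example: three red balls, one yellow,
# green never drawn (m = 3 colours, T = 4 draws)
draws = ['czerwona', 'czerwona', 'żółta', 'czerwona']

def mle_probabilities(draws):
    """Naive estimate p_i = k_i / T; unseen outcomes get probability 0."""
    T = len(draws)
    return {ball: k / T for ball, k in Counter(draws).items()}

print(mle_probabilities(draws))
# 'zielona' (green) is absent entirely, i.e. p = 0 — exactly the problem
# smoothing is meant to fix
```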
@@ -85,7 +85,7 @@ losowania kul z urny: $m$ to liczba wszystkich wyrazów (czyli rozmiar słownika
$k_i$ to ile razy w zbiorze uczącym pojawił się $i$-ty wyraz słownika,
$T$ — długość zbioru uczącego.
[[./05_Wygladzanie/urna-wyrazy.drawio.png]]
[[./07_Wygladzanie/urna-wyrazy.drawio.png]]
A zatem przy użyciu wygładzania +1 w następujący sposób estymować
będziemy prawdopodobieństwo słowa $w$:
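(The formula itself falls outside this hunk; assuming the standard add-one form $p(w) = (k_w + 1)/(T + m)$, a minimal sketch:)

```python
def plus_one_probability(k_w, corpus_len, vocab_size):
    # Add-one (Laplace) estimate: p(w) = (k_w + 1) / (T + m),
    # where T is the training-corpus length and m the vocabulary size
    return (k_w + 1) / (corpus_len + vocab_size)

# An unseen word (k_w = 0) now gets a small but nonzero probability
print(plus_one_probability(0, 1_000_000, 50_000))
```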
@@ -173,7 +173,7 @@ Stwórzmy generator, który będzie wczytywał słowa z pliku, dodatkowo:
- dodamy specjalne tokeny na początek i koniec zdania (~<s>~ i ~</s>~).
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
from itertools import islice
import regex as re
import sys
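(The body of the generator is elided by the diff; a hypothetical stand-in consistent with the description — lowercased tokens, one `<s>`/`</s>` pair per sentence — might look like the following. The notebook imports the third-party `regex` module; plain `re` suffices for this sketch.)

```python
import re

def get_words_from_text(text):
    # Hypothetical stand-in for the elided get_words_from_file generator:
    # lowercase word/punctuation tokens, each line wrapped in <s> ... </s>
    for line in text.splitlines():
        yield '<s>'
        for token in re.findall(r'\w+|[^\w\s]', line):
            yield token.lower()
        yield '</s>'

print(list(get_words_from_text('Lubisz curry, prawda?')))
# → ['<s>', 'lubisz', 'curry', ',', 'prawda', '?', '</s>']
```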
@@ -200,7 +200,7 @@ Stwórzmy generator, który będzie wczytywał słowa z pliku, dodatkowo:
Zobaczmy, ile razy, średnio w drugiej połówce korpusu występują
wyrazy, które w pierwszej wystąpiły określoną liczbę razy.
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
from collections import Counter
counterA = Counter(get_words_from_file('opensubtitlesA.pl.txt'))
@@ -210,7 +210,7 @@ wyrazy, które w pierwszej wystąpiły określoną liczbę razy.
:results:
:end:
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
counterA['taki']
#+END_SRC
@@ -219,7 +219,7 @@ counterA['taki']
48113
:end:
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
max_r = 10
buckets = {}
@@ -251,7 +251,7 @@ counterA['taki']
Policzmy teraz jakiej liczby wystąpień byśmy oczekiwali, gdyby użyć wygładzania +1 bądź +0.01.
(Uwaga: zwracamy liczbę wystąpień, a nie względną częstość, stąd przemnażamy przez rozmiar całego korpusu).
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
def plus_alpha_smoothing(alpha, m, t, k):
return t*(k + alpha)/(t + alpha * m)
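(For orientation, the helper above can be exercised on toy numbers — illustrative values, not the corpus statistics:)

```python
def plus_alpha_smoothing(alpha, m, t, k):
    # Expected number of occurrences under +alpha smoothing, multiplied
    # back by the corpus length t (as the note above explains)
    return t * (k + alpha) / (t + alpha * m)

# With alpha = 1 an unseen word (k = 0) is credited ~t/(t + m) occurrences;
# with alpha = 0.01 the correction is far gentler
print(plus_alpha_smoothing(1, 1000, 10_000, 0))
print(plus_alpha_smoothing(0.01, 1000, 10_000, 0))
```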
@@ -275,7 +275,7 @@ Policzmy teraz jakiej liczby wystąpień byśmy oczekiwali, gdyby użyć wygład
926594
:end:
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
import pandas as pd
pd.DataFrame(data, columns=["liczba tokenów", "średnia częstość w części B", "estymacje +1", "estymacje +0.01"])
@@ -309,7 +309,7 @@ $$p(w) = \frac{\# w + 1}{|C|}\frac{N_{r+1}}{N_r}.$$
**** Przykład
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
good_turing_counts = [(ix+1)*nb_of_types[ix+1]/nb_of_types[ix] for ix in range(0, max_r)]
data2 = list(zip(nb_of_types, empirical_counts, plus_one_counts, good_turing_counts))
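(A self-contained restatement of the Good-Turing adjustment $r^* = (r+1)\,N_{r+1}/N_r$ on hypothetical counts — in the notebook, `nb_of_types` plays the role of $N_r$ and comes from the corpus:)

```python
from collections import Counter

# Hypothetical unigram counts standing in for the corpus statistics
word_counts = {'kot': 1, 'pies': 1, 'dom': 1, 'las': 2, 'rok': 2, 'tak': 3}

# N_r: number of word types occurring exactly r times
N = Counter(word_counts.values())

def good_turing(r):
    """Adjusted count r* = (r + 1) * N_{r+1} / N_r."""
    return (r + 1) * N[r + 1] / N[r]

print(good_turing(1))  # 2 * N_2 / N_1 = 2 * 2 / 3
```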
@@ -415,7 +415,7 @@ W metodzie Knesera-Neya w następujący sposób estymujemy prawdopodobieństwo u
$$P(w) = \frac{N_{1+}(\bullet w)}{\sum_{w_j} N_{1+}(\bullet w_j)}.$$
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
def ngrams(iter, size):
ngram = []
for item in iter:
@@ -433,7 +433,7 @@ $$P(w) = \frac{N_{1+}(\bullet w)}{\sum_{w_j} N_{1+}(\bullet w_j)}.$$
:end:
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
histories = { }
for prev_token, token in ngrams(get_words_from_file('opensubtitlesA.pl.txt'), 2):
histories.setdefault(token, set())
@@ -444,7 +444,7 @@ $$P(w) = \frac{N_{1+}(\bullet w)}{\sum_{w_j} N_{1+}(\bullet w_j)}.$$
:results:
:end:
#+BEGIN_SRC python :session mysession :exports both :results raw drawer
#+BEGIN_SRC ipython :session mysession :exports both :results raw drawer
len(histories['jork'])
len(histories['zielony'])
histories['jork']
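(The point of the `histories` structure above: under Kneser-Ney, a word's unigram weight comes from how many distinct left contexts it occurs in, not from its raw frequency. A self-contained sketch on hypothetical bigrams — 'jork' is frequent but only ever follows 'nowy':)

```python
# Hypothetical bigram stream; in the lecture this comes from the corpus file
bigrams = [('nowy', 'jork'), ('nowy', 'jork'), ('nowy', 'jork'),
           ('stary', 'dom'), ('nowy', 'dom')]

# N1+( . w ): the set of distinct left contexts observed for each word
histories = {}
for prev_token, token in bigrams:
    histories.setdefault(token, set()).add(prev_token)

total = sum(len(h) for h in histories.values())

def continuation_probability(w):
    # Kneser-Ney unigram estimate: N1+( . w) / sum_j N1+( . w_j)
    return len(histories.get(w, set())) / total

print(continuation_probability('jork'), continuation_probability('dom'))
# 'dom' outweighs the more frequent 'jork' because it has more distinct contexts
```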
@@ -472,15 +472,15 @@ Knesera-Neya połączone z *przycinaniem* słownika n-gramów (wszystkie
**** Zmiana perplexity przy zwiększaniu zbioru testowego
#+CAPTION: Perplexity dla różnych rozmiarów zbioru testowego
[[./05_Wygladzanie/size-perplexity.gif]]
[[./07_Wygladzanie/size-perplexity.gif]]
**** Zmiana perplexity przy zwiększaniu zbioru uczącego
#+CAPTION: Perplexity dla różnych rozmiarów zbioru uczącego
[[./05_Wygladzanie/size-perplexity2.gif]]
[[./07_Wygladzanie/size-perplexity2.gif]]
**** Zmiana perplexity przy zwiększaniu rządu modelu
#+CAPTION: Perplexity dla różnych wartości rządu modelu
[[./05_Wygladzanie/order-perplexity.gif]]
[[./07_Wygladzanie/order-perplexity.gif]]

(The remaining changed files shown are five image files, each unchanged in size: 4.1 KiB, 4.5 KiB, 4.8 KiB, 24 KiB, 17 KiB.)