Semantic parsing with machine learning techniques
Introduction
The problem of detecting slots and their values in user utterances can be formulated as a task of predicting, for each word, a label that indicates whether the word belongs to a slot and, if so, to which one.
i would like to book a table for tomorrow/day at twelve/hour forty/hour five/hour for five/size people
Slot boundaries are marked using a chosen labelling scheme.
The IOB scheme
Prefix | Meaning |
---|---|
I | inside a slot |
O | outside any slot |
B | beginning of a slot |
i would like to book a table for tomorrow/B-day at twelve/B-hour forty/I-hour five/I-hour for five/B-size people
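To make the correspondence concrete, the short sketch below (an illustrative helper, not part of the dataset tooling used later) decodes a sequence of IOB tags back into slot-value pairs:

# Decode a sequence of (word, IOB tag) pairs into (slot, value) pairs.
def iob_to_slots(words, tags):
    slots, current = [], None
    for word, tag in zip(words, tags):
        if tag.startswith('B-'):
            if current:
                slots.append(current)
            current = (tag[2:], [word])
        elif tag.startswith('I-') and current:
            current[1].append(word)
        else:  # 'O' closes any open slot
            if current:
                slots.append(current)
            current = None
    if current:
        slots.append(current)
    return [(name, ' '.join(ws)) for name, ws in slots]

words = 'book a table for tomorrow at twelve forty five for five people'.split()
tags = ['O', 'O', 'O', 'O', 'B-day', 'O', 'B-hour', 'I-hour', 'I-hour', 'O', 'B-size', 'O']
print(iob_to_slots(words, tags))
# [('day', 'tomorrow'), ('hour', 'twelve forty five'), ('size', 'five')]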
The IOBES scheme
Prefix | Meaning |
---|---|
I | inside a slot |
O | outside any slot |
B | beginning of a slot |
E | end of a slot |
S | single-word slot |
i would like to book a table for tomorrow/S-day at twelve/B-hour forty/I-hour five/E-hour for five/S-size people
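IOBES tags can be derived mechanically from IOB tags; a small illustrative sketch:

# Convert IOB tags to IOBES: single-token spans become S-, span-final tokens E-.
def iob_to_iobes(tags):
    iobes = []
    for i, tag in enumerate(tags):
        next_tag = tags[i + 1] if i + 1 < len(tags) else 'O'
        if tag.startswith('B-'):
            # A B- tag not followed by I- of the same slot is a single-token span.
            iobes.append(('S-' if next_tag != 'I-' + tag[2:] else 'B-') + tag[2:])
        elif tag.startswith('I-'):
            # An I- tag not followed by another I- of the same slot ends the span.
            iobes.append(('E-' if next_tag != tag else 'I-') + tag[2:])
        else:
            iobes.append(tag)
    return iobes

print(iob_to_iobes(['O', 'B-day', 'O', 'B-hour', 'I-hour', 'I-hour', 'O', 'B-size', 'O']))
# ['O', 'S-day', 'O', 'B-hour', 'I-hour', 'E-hour', 'O', 'S-size', 'O']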
If, for a task formulated in this way, we prepare a dataset of user utterances with annotated slots (a _training set_), we can apply (supervised) machine learning techniques to build a model that annotates user utterances with slot labels.
Such a model can be built using, among others:
- conditional random fields (Lafferty et al., 2001), illustrated with a short sketch after this list,
- recurrent neural networks, e.g. LSTM networks (Hochreiter and Schmidhuber, 1997),
- transformers (Vaswani et al., 2017).
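As a point of reference, the sketch below shows how a conditional random field could be trained for slot tagging; it assumes the third-party sklearn-crfsuite package, toy data, and a deliberately simple, hand-picked feature set. It is only an illustration: the rest of this notebook uses a flair-based model instead.

# A minimal CRF slot tagger (illustrative only; assumes the sklearn-crfsuite package).
import sklearn_crfsuite

def word2features(words, i):
    # A deliberately simple feature set: the word itself and its neighbours.
    return {
        'word.lower': words[i].lower(),
        'word.isdigit': words[i].isdigit(),
        'prev.lower': words[i - 1].lower() if i > 0 else '<BOS>',
        'next.lower': words[i + 1].lower() if i < len(words) - 1 else '<EOS>',
    }

def sent2features(words):
    return [word2features(words, i) for i in range(len(words))]

# Toy training data: tokenized utterances with IOB slot tags.
train_sents = [
    (['set', 'alarm', 'for', '20', 'minutes'],
     ['O', 'O', 'B-datetime', 'I-datetime', 'I-datetime']),
    (['wake', 'me', 'up', 'at', 'noon'],
     ['O', 'O', 'O', 'B-datetime', 'I-datetime']),
]

X_train = [sent2features(words) for words, _ in train_sents]
y_train = [tags for _, tags in train_sents]

crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict([sent2features(['set', 'alarm', 'for', '5', 'minutes'])]))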
Example
We will use the dataset prepared by Schuster et al. (2019).
!mkdir -p l07
%cd l07
!curl -L -C - https://fb.me/multilingual_task_oriented_data -o data.zip
!unzip data.zip
%cd ..
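If the unzip command is not available (as in a default Windows setup), the archive can be extracted with Python's standard zipfile module instead:

# Fallback for systems without the unzip command: extract with the standard library.
import zipfile

with zipfile.ZipFile('l07/data.zip') as archive:
    archive.extractall('l07')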
This dataset contains utterances in three languages, annotated with slots for twelve frames belonging to three domains: Alarm, Reminder, and Weather. We will load the data using the conllu library.
from conllu import parse_incr

fields = ['id', 'form', 'frame', 'slot']

# The corpus marks words outside any slot with 'NoLabel'; map it to the IOB tag 'O'.
def nolabel2o(line, i):
    return 'O' if line[i] == 'NoLabel' else line[i]

with open('l07/en/train-en.conllu') as trainfile:
    trainset = list(parse_incr(trainfile, fields=fields, field_parsers={'slot': nolabel2o}))
with open('l07/en/test-en.conllu') as testfile:
    testset = list(parse_incr(testfile, fields=fields, field_parsers={'slot': nolabel2o}))
Let's look at a few example utterances from this dataset.
from tabulate import tabulate
tabulate(trainset[0], tablefmt='html')
1 | tell | weather/find | O |
2 | me | weather/find | O |
3 | the | weather/find | O |
4 | weather | weather/find | B-weather/noun |
5 | report | weather/find | I-weather/noun |
6 | for | weather/find | O |
7 | half | weather/find | B-location |
8 | moon | weather/find | I-location |
9 | bay | weather/find | I-location |
tabulate(trainset[1000], tablefmt='html')
1 | remind | reminder/set_reminder | O |
2 | me | reminder/set_reminder | O |
3 | about | reminder/set_reminder | O |
4 | game | reminder/set_reminder | B-reminder/todo |
5 | night | reminder/set_reminder | I-reminder/todo |
6 | on | reminder/set_reminder | B-datetime |
7 | friday | reminder/set_reminder | I-datetime |
8 | at | reminder/set_reminder | I-datetime |
9 | 4pm | reminder/set_reminder | I-datetime |
tabulate(trainset[2000], tablefmt='html')
1 | set | alarm/set_alarm | O |
2 | alarm | alarm/set_alarm | O |
3 | for | alarm/set_alarm | B-datetime |
4 | 20 | alarm/set_alarm | I-datetime |
5 | minutes | alarm/set_alarm | I-datetime |
For the purposes of demonstrating the training process in a Jupyter notebook, we will restrict the dataset to its initial examples.
trainset = trainset[:300]
testset = testset[:300]
To build the model, we will use an architecture based on recurrent neural networks, as implemented in the flair library (Akbik et al., 2018).
from flair.data import Corpus, Sentence, Token
from flair.datasets import FlairDatapointDataset
from flair.embeddings import StackedEmbeddings
from flair.embeddings import WordEmbeddings
from flair.embeddings import CharacterEmbeddings
from flair.embeddings import FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
# make the computations deterministic
import random
import torch

random.seed(42)
torch.manual_seed(42)

if torch.cuda.is_available():
    torch.cuda.manual_seed(0)
    torch.cuda.manual_seed_all(0)
    torch.backends.cudnn.enabled = False
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
We will convert the data to the format used by flair with the following function.
def conllu2flair(sentences, label=None):
    fsentences = []
    for sentence in sentences:
        # Re-create the utterance as a flair Sentence (tokens are already split).
        fsentence = Sentence(' '.join(token['form'] for token in sentence), use_tokenizer=False)
        start_idx = None
        end_idx = None
        tag = None
        if label:
            for idx, (token, ftoken) in enumerate(zip(sentence, fsentence)):
                if token[label].startswith('B-'):
                    # A new span starts; close the previous one if it is still open.
                    if start_idx is not None:
                        fsentence[start_idx:end_idx+1].add_label(label, tag)
                    start_idx = idx
                    end_idx = idx
                    tag = token[label][2:]
                elif token[label].startswith('I-'):
                    end_idx = idx
                elif token[label] == 'O':
                    if start_idx is not None:
                        fsentence[start_idx:end_idx+1].add_label(label, tag)
                        start_idx = None
                        end_idx = None
                        tag = None
            # Close a span that reaches the end of the utterance.
            if start_idx is not None:
                fsentence[start_idx:end_idx+1].add_label(label, tag)
        fsentences.append(fsentence)
    return FlairDatapointDataset(fsentences)
corpus = Corpus(train=conllu2flair(trainset, 'slot'), test=conllu2flair(testset, 'slot'))
print(corpus)
tag_dictionary = corpus.make_label_dictionary(label_type='slot')
print(tag_dictionary)
Corpus: 270 train + 30 dev + 300 test sentences 2023-04-16 23:32:52,141 Computing label dictionary. Progress:
270it [00:00, 53998.76it/s]
2023-04-16 23:32:52,151 Dictionary created for label 'slot' with 5 values: datetime (seen 174 times), weather/attribute (seen 77 times), weather/noun (seen 66 times), location (seen 44 times) Dictionary with 5 tags: <unk>, datetime, weather/attribute, weather/noun, location
Our model will use vector representations of words (see Word Embeddings).
embedding_types = [
    WordEmbeddings('en'),
    FlairEmbeddings('en-forward'),
    FlairEmbeddings('en-backward'),
    CharacterEmbeddings(),
]
embeddings = StackedEmbeddings(embeddings=embedding_types)

tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type='slot', use_crf=True)
(On first use, flair downloads and caches the required embedding files.)
SequenceTagger predicts: Dictionary with 17 tags: O, S-datetime, B-datetime, E-datetime, I-datetime, S-weather/attribute, B-weather/attribute, E-weather/attribute, I-weather/attribute, S-weather/noun, B-weather/noun, E-weather/noun, I-weather/noun, S-location, B-location, E-location, I-location
Let's look at the architecture of the neural network that will be responsible for predicting slots in utterances.
print(tagger)
SequenceTagger( (embeddings): StackedEmbeddings( (list_embedding_0): WordEmbeddings( 'en' (embedding): Embedding(1000001, 300) ) (list_embedding_1): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.05, inplace=False) (encoder): Embedding(300, 100) (rnn): LSTM(100, 2048) ) ) (list_embedding_2): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.05, inplace=False) (encoder): Embedding(300, 100) (rnn): LSTM(100, 2048) ) ) (list_embedding_3): CharacterEmbeddings( (char_embedding): Embedding(275, 25) (char_rnn): LSTM(25, 25, bidirectional=True) ) ) (word_dropout): WordDropout(p=0.05) (locked_dropout): LockedDropout(p=0.5) (embedding2nn): Linear(in_features=4446, out_features=4446, bias=True) (rnn): LSTM(4446, 256, batch_first=True, bidirectional=True) (linear): Linear(in_features=512, out_features=19, bias=True) (loss_function): ViterbiLoss() (crf): CRF() )
We will run ten training iterations (epochs) and save the resulting model in the slot-model directory.
trainer = ModelTrainer(tagger, corpus)
trainer.train('slot-model',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=10,
              train_with_dev=False)
2023-04-16 23:34:18,357 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,358 Model: "SequenceTagger( (embeddings): StackedEmbeddings( (list_embedding_0): WordEmbeddings( 'en' (embedding): Embedding(1000001, 300) ) (list_embedding_1): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.05, inplace=False) (encoder): Embedding(300, 100) (rnn): LSTM(100, 2048) ) ) (list_embedding_2): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.05, inplace=False) (encoder): Embedding(300, 100) (rnn): LSTM(100, 2048) ) ) (list_embedding_3): CharacterEmbeddings( (char_embedding): Embedding(275, 25) (char_rnn): LSTM(25, 25, bidirectional=True) ) ) (word_dropout): WordDropout(p=0.05) (locked_dropout): LockedDropout(p=0.5) (embedding2nn): Linear(in_features=4446, out_features=4446, bias=True) (rnn): LSTM(4446, 256, batch_first=True, bidirectional=True) (linear): Linear(in_features=512, out_features=19, bias=True) (loss_function): ViterbiLoss() (crf): CRF() )" 2023-04-16 23:34:18,359 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,360 Corpus: "Corpus: 270 train + 30 dev + 300 test sentences" 2023-04-16 23:34:18,360 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,361 Parameters: 2023-04-16 23:34:18,361 - learning_rate: "0.100000" 2023-04-16 23:34:18,361 - mini_batch_size: "32" 2023-04-16 23:34:18,362 - patience: "3" 2023-04-16 23:34:18,363 - anneal_factor: "0.5" 2023-04-16 23:34:18,363 - max_epochs: "10" 2023-04-16 23:34:18,363 - shuffle: "True" 2023-04-16 23:34:18,364 - train_with_dev: "False" 2023-04-16 23:34:18,364 - batch_growth_annealing: "False" 2023-04-16 23:34:18,365 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,365 Model training base path: "slot-model" 2023-04-16 23:34:18,366 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,367 Device: cpu 2023-04-16 23:34:18,367 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,368 Embeddings storage mode: cpu 2023-04-16 23:34:18,368 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:20,279 epoch 1 - iter 1/9 - loss 3.46586463 - time (sec): 1.91 - samples/sec: 152.88 - lr: 0.100000 2023-04-16 23:34:22,063 epoch 1 - iter 2/9 - loss 3.09300251 - time (sec): 3.69 - samples/sec: 153.49 - lr: 0.100000 2023-04-16 23:34:23,066 epoch 1 - iter 3/9 - loss 3.09930113 - time (sec): 4.70 - samples/sec: 154.78 - lr: 0.100000 2023-04-16 23:34:24,090 epoch 1 - iter 4/9 - loss 2.95447677 - time (sec): 5.72 - samples/sec: 155.39 - lr: 0.100000 2023-04-16 23:34:24,981 epoch 1 - iter 5/9 - loss 2.81596711 - time (sec): 6.61 - samples/sec: 157.92 - lr: 0.100000 2023-04-16 23:34:25,968 epoch 1 - iter 6/9 - loss 2.61288632 - time (sec): 7.60 - samples/sec: 156.47 - lr: 0.100000 2023-04-16 23:34:26,968 epoch 1 - iter 7/9 - loss 2.37758500 - time (sec): 8.60 - samples/sec: 153.85 - lr: 0.100000 2023-04-16 23:34:27,959 epoch 1 - iter 8/9 - loss 2.29585029 - time (sec): 9.59 - samples/sec: 153.81 - lr: 0.100000 2023-04-16 23:34:28,676 epoch 1 - iter 9/9 - loss 2.27926901 - time (sec): 10.31 - samples/sec: 150.87 - lr: 0.100000 2023-04-16 23:34:28,678 
---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:28,678 EPOCH 1 done: loss 2.2793 - lr 0.100000
100%|██████████| 1/1 [00:01<00:00, 1.12s/it]
2023-04-16 23:34:29,797 Evaluating as a multi-label problem: False 2023-04-16 23:34:29,806 DEV : loss 1.2717913389205933 - f1-score (micro avg) 0.0357 2023-04-16 23:34:29,808 BAD EPOCHS (no improvement): 0 2023-04-16 23:34:29,809 saving best model
2023-04-16 23:34:41,288 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:42,050 epoch 2 - iter 1/9 - loss 1.43326204 - time (sec): 0.76 - samples/sec: 241.47 - lr: 0.100000 2023-04-16 23:34:42,846 epoch 2 - iter 2/9 - loss 1.40312603 - time (sec): 1.56 - samples/sec: 234.92 - lr: 0.100000 2023-04-16 23:34:43,571 epoch 2 - iter 3/9 - loss 1.39219711 - time (sec): 2.28 - samples/sec: 235.65 - lr: 0.100000 2023-04-16 23:34:44,338 epoch 2 - iter 4/9 - loss 1.38006975 - time (sec): 3.05 - samples/sec: 232.13 - lr: 0.100000 2023-04-16 23:34:45,066 epoch 2 - iter 5/9 - loss 1.34341906 - time (sec): 3.78 - samples/sec: 232.93 - lr: 0.100000 2023-04-16 23:34:45,850 epoch 2 - iter 6/9 - loss 1.31603716 - time (sec): 4.56 - samples/sec: 234.55 - lr: 0.100000 2023-04-16 23:34:46,612 epoch 2 - iter 7/9 - loss 1.28879818 - time (sec): 5.32 - samples/sec: 237.23 - lr: 0.100000 2023-04-16 23:34:47,400 epoch 2 - iter 8/9 - loss 1.27353642 - time (sec): 6.11 - samples/sec: 238.71 - lr: 0.100000 2023-04-16 23:34:47,821 epoch 2 - iter 9/9 - loss 1.27439781 - time (sec): 6.53 - samples/sec: 238.02 - lr: 0.100000 2023-04-16 23:34:47,823 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:47,823 EPOCH 2 done: loss 1.2744 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.76it/s]
2023-04-16 23:34:48,036 Evaluating as a multi-label problem: False 2023-04-16 23:34:48,044 DEV : loss 0.7916695475578308 - f1-score (micro avg) 0.0 2023-04-16 23:34:48,046 BAD EPOCHS (no improvement): 1 2023-04-16 23:34:48,047 ----------------------------------------------------------------------------------------------------
2023-04-16 23:34:48,750 epoch 3 - iter 1/9 - loss 0.95985486 - time (sec): 0.70 - samples/sec: 266.38 - lr: 0.100000 2023-04-16 23:34:49,421 epoch 3 - iter 2/9 - loss 0.93650848 - time (sec): 1.37 - samples/sec: 258.56 - lr: 0.100000 2023-04-16 23:34:50,208 epoch 3 - iter 3/9 - loss 0.91011594 - time (sec): 2.16 - samples/sec: 249.54 - lr: 0.100000 2023-04-16 23:34:50,909 epoch 3 - iter 4/9 - loss 0.89886124 - time (sec): 2.86 - samples/sec: 249.21 - lr: 0.100000 2023-04-16 23:34:51,580 epoch 3 - iter 5/9 - loss 0.87307849 - time (sec): 3.53 - samples/sec: 251.48 - lr: 0.100000 2023-04-16 23:34:52,375 epoch 3 - iter 6/9 - loss 0.85130603 - time (sec): 4.33 - samples/sec: 251.67 - lr: 0.100000 2023-04-16 23:34:53,161 epoch 3 - iter 7/9 - loss 0.83100431 - time (sec): 5.11 - samples/sec: 249.75 - lr: 0.100000 2023-04-16 23:34:53,966 epoch 3 - iter 8/9 - loss 0.79737653 - time (sec): 5.92 - samples/sec: 248.39 - lr: 0.100000 2023-04-16 23:34:54,372 epoch 3 - iter 9/9 - loss 0.78005277 - time (sec): 6.32 - samples/sec: 245.89 - lr: 0.100000 2023-04-16 23:34:54,373 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:54,374 EPOCH 3 done: loss 0.7801 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.50it/s]
2023-04-16 23:34:54,600 Evaluating as a multi-label problem: False 2023-04-16 23:34:54,608 DEV : loss 0.33898717164993286 - f1-score (micro avg) 0.7761 2023-04-16 23:34:54,610 BAD EPOCHS (no improvement): 0 2023-04-16 23:34:54,611 saving best model
2023-04-16 23:35:08,824 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:09,623 epoch 4 - iter 1/9 - loss 0.41515362 - time (sec): 0.80 - samples/sec: 244.67 - lr: 0.100000 2023-04-16 23:35:10,428 epoch 4 - iter 2/9 - loss 0.51239108 - time (sec): 1.60 - samples/sec: 243.44 - lr: 0.100000 2023-04-16 23:35:11,205 epoch 4 - iter 3/9 - loss 0.45658054 - time (sec): 2.38 - samples/sec: 232.97 - lr: 0.100000 2023-04-16 23:35:11,983 epoch 4 - iter 4/9 - loss 0.45578642 - time (sec): 3.16 - samples/sec: 231.55 - lr: 0.100000 2023-04-16 23:35:12,730 epoch 4 - iter 5/9 - loss 0.44140354 - time (sec): 3.90 - samples/sec: 236.42 - lr: 0.100000 2023-04-16 23:35:13,509 epoch 4 - iter 6/9 - loss 0.42847639 - time (sec): 4.68 - samples/sec: 235.10 - lr: 0.100000 2023-04-16 23:35:14,254 epoch 4 - iter 7/9 - loss 0.42680964 - time (sec): 5.43 - samples/sec: 238.02 - lr: 0.100000 2023-04-16 23:35:15,046 epoch 4 - iter 8/9 - loss 0.41317872 - time (sec): 6.22 - samples/sec: 238.91 - lr: 0.100000 2023-04-16 23:35:15,362 epoch 4 - iter 9/9 - loss 0.40072542 - time (sec): 6.54 - samples/sec: 237.91 - lr: 0.100000 2023-04-16 23:35:15,364 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:15,364 EPOCH 4 done: loss 0.4007 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.63it/s]
2023-04-16 23:35:15,583 Evaluating as a multi-label problem: False 2023-04-16 23:35:15,591 DEV : loss 0.16208426654338837 - f1-score (micro avg) 0.9231 2023-04-16 23:35:15,593 BAD EPOCHS (no improvement): 0 2023-04-16 23:35:15,595 saving best model
2023-04-16 23:35:26,934 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:27,683 epoch 5 - iter 1/9 - loss 0.23677549 - time (sec): 0.75 - samples/sec: 243.64 - lr: 0.100000 2023-04-16 23:35:28,430 epoch 5 - iter 2/9 - loss 0.21526806 - time (sec): 1.49 - samples/sec: 236.28 - lr: 0.100000 2023-04-16 23:35:29,218 epoch 5 - iter 3/9 - loss 0.18914680 - time (sec): 2.28 - samples/sec: 232.25 - lr: 0.100000 2023-04-16 23:35:30,001 epoch 5 - iter 4/9 - loss 0.21946464 - time (sec): 3.06 - samples/sec: 235.24 - lr: 0.100000 2023-04-16 23:35:30,749 epoch 5 - iter 5/9 - loss 0.20726706 - time (sec): 3.81 - samples/sec: 238.39 - lr: 0.100000 2023-04-16 23:35:31,440 epoch 5 - iter 6/9 - loss 0.20088127 - time (sec): 4.50 - samples/sec: 242.67 - lr: 0.100000 2023-04-16 23:35:32,199 epoch 5 - iter 7/9 - loss 0.21393993 - time (sec): 5.26 - samples/sec: 242.64 - lr: 0.100000 2023-04-16 23:35:32,995 epoch 5 - iter 8/9 - loss 0.20896170 - time (sec): 6.06 - samples/sec: 241.62 - lr: 0.100000 2023-04-16 23:35:33,438 epoch 5 - iter 9/9 - loss 0.22090499 - time (sec): 6.50 - samples/sec: 239.16 - lr: 0.100000 2023-04-16 23:35:33,440 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:33,440 EPOCH 5 done: loss 0.2209 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.67it/s]
2023-04-16 23:35:33,657 Evaluating as a multi-label problem: False 2023-04-16 23:35:33,664 DEV : loss 0.07899065315723419 - f1-score (micro avg) 0.9706 2023-04-16 23:35:33,667 BAD EPOCHS (no improvement): 0 2023-04-16 23:35:33,668 saving best model
2023-04-16 23:35:47,995 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:48,716 epoch 6 - iter 1/9 - loss 0.08489894 - time (sec): 0.72 - samples/sec: 265.65 - lr: 0.100000 2023-04-16 23:35:49,474 epoch 6 - iter 2/9 - loss 0.16483877 - time (sec): 1.48 - samples/sec: 262.02 - lr: 0.100000 2023-04-16 23:35:50,229 epoch 6 - iter 3/9 - loss 0.17961087 - time (sec): 2.23 - samples/sec: 261.20 - lr: 0.100000 2023-04-16 23:35:50,883 epoch 6 - iter 4/9 - loss 0.17008273 - time (sec): 2.89 - samples/sec: 266.11 - lr: 0.100000 2023-04-16 23:35:51,580 epoch 6 - iter 5/9 - loss 0.16165644 - time (sec): 3.58 - samples/sec: 259.35 - lr: 0.100000 2023-04-16 23:35:52,303 epoch 6 - iter 6/9 - loss 0.16796368 - time (sec): 4.31 - samples/sec: 257.37 - lr: 0.100000 2023-04-16 23:35:52,971 epoch 6 - iter 7/9 - loss 0.15208281 - time (sec): 4.97 - samples/sec: 254.12 - lr: 0.100000 2023-04-16 23:35:53,698 epoch 6 - iter 8/9 - loss 0.14079077 - time (sec): 5.70 - samples/sec: 255.74 - lr: 0.100000 2023-04-16 23:35:54,094 epoch 6 - iter 9/9 - loss 0.14176936 - time (sec): 6.10 - samples/sec: 255.04 - lr: 0.100000 2023-04-16 23:35:54,095 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:54,096 EPOCH 6 done: loss 0.1418 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 5.41it/s]
2023-04-16 23:35:54,284 Evaluating as a multi-label problem: False 2023-04-16 23:35:54,292 DEV : loss 0.05525948479771614 - f1-score (micro avg) 0.9706 2023-04-16 23:35:54,294 BAD EPOCHS (no improvement): 0 2023-04-16 23:35:54,295 ----------------------------------------------------------------------------------------------------
2023-04-16 23:35:55,007 epoch 7 - iter 1/9 - loss 0.06206851 - time (sec): 0.71 - samples/sec: 262.64 - lr: 0.100000 2023-04-16 23:35:55,659 epoch 7 - iter 2/9 - loss 0.07954220 - time (sec): 1.36 - samples/sec: 257.33 - lr: 0.100000 2023-04-16 23:35:56,426 epoch 7 - iter 3/9 - loss 0.07803471 - time (sec): 2.13 - samples/sec: 248.71 - lr: 0.100000 2023-04-16 23:35:57,138 epoch 7 - iter 4/9 - loss 0.09479619 - time (sec): 2.84 - samples/sec: 247.27 - lr: 0.100000 2023-04-16 23:35:57,921 epoch 7 - iter 5/9 - loss 0.12208708 - time (sec): 3.63 - samples/sec: 251.03 - lr: 0.100000 2023-04-16 23:35:58,664 epoch 7 - iter 6/9 - loss 0.11140702 - time (sec): 4.37 - samples/sec: 251.09 - lr: 0.100000 2023-04-16 23:35:59,452 epoch 7 - iter 7/9 - loss 0.10265536 - time (sec): 5.16 - samples/sec: 250.34 - lr: 0.100000 2023-04-16 23:36:00,146 epoch 7 - iter 8/9 - loss 0.09764915 - time (sec): 5.85 - samples/sec: 252.61 - lr: 0.100000 2023-04-16 23:36:00,536 epoch 7 - iter 9/9 - loss 0.09412069 - time (sec): 6.24 - samples/sec: 249.16 - lr: 0.100000 2023-04-16 23:36:00,537 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:00,538 EPOCH 7 done: loss 0.0941 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.76it/s]
2023-04-16 23:36:00,751 Evaluating as a multi-label problem: False 2023-04-16 23:36:00,759 DEV : loss 0.04532366245985031 - f1-score (micro avg) 0.9706 2023-04-16 23:36:00,761 BAD EPOCHS (no improvement): 0 2023-04-16 23:36:00,762 ----------------------------------------------------------------------------------------------------
2023-04-16 23:36:01,551 epoch 8 - iter 1/9 - loss 0.03479994 - time (sec): 0.79 - samples/sec: 234.47 - lr: 0.100000 2023-04-16 23:36:02,299 epoch 8 - iter 2/9 - loss 0.03457490 - time (sec): 1.54 - samples/sec: 235.52 - lr: 0.100000 2023-04-16 23:36:03,135 epoch 8 - iter 3/9 - loss 0.02756396 - time (sec): 2.37 - samples/sec: 232.19 - lr: 0.100000 2023-04-16 23:36:03,831 epoch 8 - iter 4/9 - loss 0.03832822 - time (sec): 3.07 - samples/sec: 235.58 - lr: 0.100000 2023-04-16 23:36:04,675 epoch 8 - iter 5/9 - loss 0.05709582 - time (sec): 3.91 - samples/sec: 236.64 - lr: 0.100000 2023-04-16 23:36:05,470 epoch 8 - iter 6/9 - loss 0.07284620 - time (sec): 4.71 - samples/sec: 237.68 - lr: 0.100000 2023-04-16 23:36:06,242 epoch 8 - iter 7/9 - loss 0.09056851 - time (sec): 5.48 - samples/sec: 238.50 - lr: 0.100000 2023-04-16 23:36:06,934 epoch 8 - iter 8/9 - loss 0.08531148 - time (sec): 6.17 - samples/sec: 238.50 - lr: 0.100000 2023-04-16 23:36:07,344 epoch 8 - iter 9/9 - loss 0.08690437 - time (sec): 6.58 - samples/sec: 236.25 - lr: 0.100000 2023-04-16 23:36:07,345 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:07,346 EPOCH 8 done: loss 0.0869 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 5.03it/s]
2023-04-16 23:36:07,548 Evaluating as a multi-label problem: False 2023-04-16 23:36:07,556 DEV : loss 0.024222934618592262 - f1-score (micro avg) 0.9706 2023-04-16 23:36:07,558 BAD EPOCHS (no improvement): 0 2023-04-16 23:36:07,559 ----------------------------------------------------------------------------------------------------
2023-04-16 23:36:08,281 epoch 9 - iter 1/9 - loss 0.04802091 - time (sec): 0.72 - samples/sec: 239.94 - lr: 0.100000 2023-04-16 23:36:09,019 epoch 9 - iter 2/9 - loss 0.04962141 - time (sec): 1.46 - samples/sec: 241.95 - lr: 0.100000 2023-04-16 23:36:09,715 epoch 9 - iter 3/9 - loss 0.04733611 - time (sec): 2.16 - samples/sec: 258.47 - lr: 0.100000 2023-04-16 23:36:10,377 epoch 9 - iter 4/9 - loss 0.05639010 - time (sec): 2.82 - samples/sec: 258.08 - lr: 0.100000 2023-04-16 23:36:11,111 epoch 9 - iter 5/9 - loss 0.11318641 - time (sec): 3.55 - samples/sec: 263.87 - lr: 0.100000 2023-04-16 23:36:11,812 epoch 9 - iter 6/9 - loss 0.10583430 - time (sec): 4.25 - samples/sec: 260.58 - lr: 0.100000 2023-04-16 23:36:12,486 epoch 9 - iter 7/9 - loss 0.09734085 - time (sec): 4.93 - samples/sec: 262.89 - lr: 0.100000 2023-04-16 23:36:13,213 epoch 9 - iter 8/9 - loss 0.09001355 - time (sec): 5.65 - samples/sec: 259.68 - lr: 0.100000 2023-04-16 23:36:13,582 epoch 9 - iter 9/9 - loss 0.08649074 - time (sec): 6.02 - samples/sec: 258.22 - lr: 0.100000 2023-04-16 23:36:13,583 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:13,584 EPOCH 9 done: loss 0.0865 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 5.41it/s]
2023-04-16 23:36:13,772 Evaluating as a multi-label problem: False 2023-04-16 23:36:13,779 DEV : loss 0.0387137271463871 - f1-score (micro avg) 0.9706 2023-04-16 23:36:13,781 BAD EPOCHS (no improvement): 1 2023-04-16 23:36:13,782 ----------------------------------------------------------------------------------------------------
2023-04-16 23:36:14,486 epoch 10 - iter 1/9 - loss 0.01475051 - time (sec): 0.70 - samples/sec: 251.42 - lr: 0.100000 2023-04-16 23:36:15,220 epoch 10 - iter 2/9 - loss 0.02797203 - time (sec): 1.44 - samples/sec: 246.17 - lr: 0.100000 2023-04-16 23:36:15,881 epoch 10 - iter 3/9 - loss 0.03366118 - time (sec): 2.10 - samples/sec: 250.12 - lr: 0.100000 2023-04-16 23:36:16,583 epoch 10 - iter 4/9 - loss 0.02913842 - time (sec): 2.80 - samples/sec: 256.33 - lr: 0.100000 2023-04-16 23:36:17,329 epoch 10 - iter 5/9 - loss 0.02953250 - time (sec): 3.55 - samples/sec: 258.53 - lr: 0.100000 2023-04-16 23:36:18,035 epoch 10 - iter 6/9 - loss 0.04606061 - time (sec): 4.25 - samples/sec: 259.82 - lr: 0.100000 2023-04-16 23:36:18,729 epoch 10 - iter 7/9 - loss 0.04505843 - time (sec): 4.95 - samples/sec: 260.56 - lr: 0.100000 2023-04-16 23:36:19,462 epoch 10 - iter 8/9 - loss 0.05644751 - time (sec): 5.68 - samples/sec: 258.63 - lr: 0.100000 2023-04-16 23:36:19,893 epoch 10 - iter 9/9 - loss 0.06249184 - time (sec): 6.11 - samples/sec: 254.46 - lr: 0.100000 2023-04-16 23:36:19,895 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:19,895 EPOCH 10 done: loss 0.0625 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 5.43it/s]
2023-04-16 23:36:20,082 Evaluating as a multi-label problem: False 2023-04-16 23:36:20,089 DEV : loss 0.03223632648587227 - f1-score (micro avg) 0.9275 2023-04-16 23:36:20,091 BAD EPOCHS (no improvement): 2
2023-04-16 23:36:31,813 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:35,403 SequenceTagger predicts: Dictionary with 19 tags: O, S-datetime, B-datetime, E-datetime, I-datetime, S-weather/attribute, B-weather/attribute, E-weather/attribute, I-weather/attribute, S-weather/noun, B-weather/noun, E-weather/noun, I-weather/noun, S-location, B-location, E-location, I-location, <START>, <STOP>
100%|██████████| 10/10 [00:13<00:00, 1.40s/it]
2023-04-16 23:36:50,601 Evaluating as a multi-label problem: False 2023-04-16 23:36:50,612 0.2111 0.1989 0.2048 0.1176 2023-04-16 23:36:50,613 Results: - F-score (micro) 0.2048 - F-score (macro) 0.1106 - Accuracy 0.1176 By class: precision recall f1-score support datetime 0.2136 0.2327 0.2227 202 weather/noun 0.2184 0.5588 0.3140 34 reminder/todo 0.0000 0.0000 0.0000 46 reminder/noun 0.0000 0.0000 0.0000 42 weather/attribute 0.1765 0.1667 0.1714 18 location 0.1765 0.1765 0.1765 17 reminder/recurring_period 0.0000 0.0000 0.0000 2 negation 0.0000 0.0000 0.0000 1 micro avg 0.2111 0.1989 0.2048 362 macro avg 0.0981 0.1418 0.1106 362 weighted avg 0.1568 0.1989 0.1706 362 2023-04-16 23:36:50,613 ----------------------------------------------------------------------------------------------------
{'test_score': 0.20483641536273117, 'dev_score_history': [0.03571428571428571, 0.0, 0.7761194029850745, 0.923076923076923, 0.9705882352941176, 0.9705882352941176, 0.9705882352941176, 0.9705882352941176, 0.9705882352941176, 0.9275362318840579], 'train_loss_history': [2.279269006857918, 1.2743978126256028, 0.7800527689157958, 0.40072541558857516, 0.22090499240102493, 0.14176936011605706, 0.09412068554059486, 0.08690436752662781, 0.08649074149668408, 0.06249183581189711], 'dev_loss_history': [1.2717913389205933, 0.7916695475578308, 0.33898717164993286, 0.16208426654338837, 0.07899065315723419, 0.05525948479771614, 0.04532366245985031, 0.024222934618592262, 0.0387137271463871, 0.03223632648587227]}
The quality of the trained model can be assessed using the metrics reported above, i.e.:
_tp_ (true positives)
the number of words labelled with tag $e$ in the test set that the model also labelled with $e$
_fp_ (false positives)
the number of words not labelled with tag $e$ in the test set that the model nevertheless labelled with $e$
_fn_ (false negatives)
the number of words labelled with tag $e$ in the test set that the model did not label with $e$
_precision_
$$\frac{tp}{tp + fp}$$
_recall_
$$\frac{tp}{tp + fn}$$
$F_1$
$$\frac{2 \cdot precision \cdot recall}{precision + recall}$$
_micro_ $F_1$
$F_1$ in which $tp$, $fp$, and $fn$ are counted jointly over all labels, i.e. $tp = \sum_{e}{{tp}_e}$, $fn = \sum_{e}{{fn}_e}$, $fp = \sum_{e}{{fp}_e}$
_macro_ $F_1$
the arithmetic mean of the $F_1$ scores computed separately for each label (a short numeric sketch follows below).
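A short numeric sketch (with made-up per-label counts, unrelated to the evaluation above) showing the difference between micro and macro averaging:

# Illustrative micro vs. macro F1 computation from per-label counts (toy numbers).
counts = {
    'datetime': {'tp': 40, 'fp': 10, 'fn': 10},
    'location': {'tp': 5, 'fp': 15, 'fn': 5},
}

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Macro F1: average the per-label F1 scores.
macro = sum(f1(**c) for c in counts.values()) / len(counts)

# Micro F1: sum the counts over all labels first, then compute a single F1.
tp = sum(c['tp'] for c in counts.values())
fp = sum(c['fp'] for c in counts.values())
fn = sum(c['fn'] for c in counts.values())
micro = f1(tp, fp, fn)

print(f'macro F1 = {macro:.3f}, micro F1 = {micro:.3f}')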
The trained model can be loaded from a file with the load method.
model = SequenceTagger.load('slot-model/final-model.pt')
2023-04-16 23:37:00,272 SequenceTagger predicts: Dictionary with 19 tags: O, S-datetime, B-datetime, E-datetime, I-datetime, S-weather/attribute, B-weather/attribute, E-weather/attribute, I-weather/attribute, S-weather/noun, B-weather/noun, E-weather/noun, I-weather/noun, S-location, B-location, E-location, I-location, <START>, <STOP>
The loaded model can be used to predict slots in user utterances by means of the predict function shown below.
def predict(model, sentence):
    # Wrap the raw tokens in the structure expected by conllu2flair.
    csentence = [{'form': word, 'slot': 'O'} for word in sentence]
    fsentence = conllu2flair([csentence])[0]
    model.predict(fsentence)
    # Copy the predicted spans back onto the tokens as IOB tags.
    for span in fsentence.get_spans('slot'):
        tag = span.get_label('slot').value
        csentence[span.tokens[0].idx - 1]['slot'] = f'B-{tag}'
        for token in span.tokens[1:]:
            csentence[token.idx - 1]['slot'] = f'I-{tag}'
    return csentence
tabulate(predict(model, 'set alarm for 20 minutes'.split()), tablefmt='html')
set | O |
alarm | O |
for | B-datetime |
20 | B-datetime |
minutes | I-datetime |
tabulate(predict(model, 'change my 3 pm alarm to the next day'.split()), tablefmt='html')
change | O |
my | O |
3 | O |
pm | O |
alarm | O |
to | O |
the | O |
next | O |
day | B-weather/noun |
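The last prediction is clearly wrong (day is not a weather/noun value), which is hardly surprising: the model was trained on only 300 examples, nearly all of which come from the weather domain (compare the label dictionary printed above).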
References
- Sebastian Schuster, Sonal Gupta, Rushin Shah, Mike Lewis, Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog. NAACL-HLT (1) 2019, pp. 3795-3805
- John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 282–289, https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (November 15, 1997), 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, Attention is All you Need, NIPS 2017, pp. 5998-6008, https://arxiv.org/abs/1706.03762
- Alan Akbik, Duncan Blythe, Roland Vollgraf, Contextual String Embeddings for Sequence Labeling, Proceedings of the 27th International Conference on Computational Linguistics, pp. 1638–1649, https://www.aclweb.org/anthology/C18-1139.pdf