Semantic parsing with machine learning techniques
Introduction
The problem of detecting slots and their values in user utterances can be formulated as a task of predicting, for each word, a label that indicates whether the word belongs to a slot and, if so, to which one.
i would like to book a table for tomorrow/day at twelve/hour forty/hour five/hour for five/size people
Slot boundaries are marked using a chosen labelling scheme.
The IOB scheme
Prefix | Meaning |
---|---|
I | inside a slot |
O | outside any slot |
B | beginning of a slot |
i would like to book a table for tomorrow/B-day at twelve/B-hour forty/I-hour five/I-hour for five/B-size people
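To make the correspondence concrete, the short sketch below (an illustrative helper, not part of the dataset tooling used later) decodes a sequence of IOB tags back into slot-value pairs:

# Decode a sequence of (word, IOB tag) pairs into (slot, value) pairs.
def iob_to_slots(words, tags):
    slots, current = [], None
    for word, tag in zip(words, tags):
        if tag.startswith('B-'):
            if current:
                slots.append(current)
            current = (tag[2:], [word])
        elif tag.startswith('I-') and current:
            current[1].append(word)
        else:  # 'O' closes any open slot
            if current:
                slots.append(current)
            current = None
    if current:
        slots.append(current)
    return [(name, ' '.join(ws)) for name, ws in slots]

words = 'book a table for tomorrow at twelve forty five for five people'.split()
tags = ['O', 'O', 'O', 'O', 'B-day', 'O', 'B-hour', 'I-hour', 'I-hour', 'O', 'B-size', 'O']
print(iob_to_slots(words, tags))
# [('day', 'tomorrow'), ('hour', 'twelve forty five'), ('size', 'five')]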
The IOBES scheme
Prefix | Meaning |
---|---|
I | inside a slot |
O | outside any slot |
B | beginning of a slot |
E | end of a slot |
S | single-word slot |
i would like to book a table for tomorrow/S-day at twelve/B-hour forty/I-hour five/E-hour for five/S-size people
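IOBES tags can be derived mechanically from IOB tags; a small illustrative sketch:

# Convert IOB tags to IOBES: single-token spans become S-, span-final tokens E-.
def iob_to_iobes(tags):
    iobes = []
    for i, tag in enumerate(tags):
        next_tag = tags[i + 1] if i + 1 < len(tags) else 'O'
        if tag.startswith('B-'):
            # A B- tag not followed by I- of the same slot is a single-token span.
            iobes.append(('S-' if next_tag != 'I-' + tag[2:] else 'B-') + tag[2:])
        elif tag.startswith('I-'):
            # An I- tag not followed by another I- of the same slot ends the span.
            iobes.append(('E-' if next_tag != tag else 'I-') + tag[2:])
        else:
            iobes.append(tag)
    return iobes

print(iob_to_iobes(['O', 'B-day', 'O', 'B-hour', 'I-hour', 'I-hour', 'O', 'B-size', 'O']))
# ['O', 'S-day', 'O', 'B-hour', 'I-hour', 'E-hour', 'O', 'S-size', 'O']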
If, for a task formulated in this way, we prepare a dataset of user utterances with annotated slots (a _training set_), we can apply (supervised) machine learning techniques to build a model that annotates user utterances with slot labels.
Such a model can be built using, among others:
- conditional random fields (Lafferty et al., 2001), illustrated with a short sketch after this list,
- recurrent neural networks, e.g. LSTM networks (Hochreiter and Schmidhuber, 1997),
- transformers (Vaswani et al., 2017).
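As a point of reference, the sketch below shows how a conditional random field could be trained for slot tagging; it assumes the third-party sklearn-crfsuite package, toy data, and a deliberately simple, hand-picked feature set. It is only an illustration: the rest of this notebook uses a flair-based model instead.

# A minimal CRF slot tagger (illustrative only; assumes the sklearn-crfsuite package).
import sklearn_crfsuite

def word2features(words, i):
    # A deliberately simple feature set: the word itself and its neighbours.
    return {
        'word.lower': words[i].lower(),
        'word.isdigit': words[i].isdigit(),
        'prev.lower': words[i - 1].lower() if i > 0 else '<BOS>',
        'next.lower': words[i + 1].lower() if i < len(words) - 1 else '<EOS>',
    }

def sent2features(words):
    return [word2features(words, i) for i in range(len(words))]

# Toy training data: tokenized utterances with IOB slot tags.
train_sents = [
    (['set', 'alarm', 'for', '20', 'minutes'],
     ['O', 'O', 'B-datetime', 'I-datetime', 'I-datetime']),
    (['wake', 'me', 'up', 'at', 'noon'],
     ['O', 'O', 'O', 'B-datetime', 'I-datetime']),
]

X_train = [sent2features(words) for words, _ in train_sents]
y_train = [tags for _, tags in train_sents]

crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict([sent2features(['set', 'alarm', 'for', '5', 'minutes'])]))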
Example
We will use the dataset prepared by Schuster et al. (2019).
!mkdir -p l07
%cd l07
!curl -L -C - https://fb.me/multilingual_task_oriented_data -o data.zip
!unzip data.zip
%cd ..
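If the unzip command is not available (as in a default Windows setup), the archive can be extracted with Python's standard zipfile module instead:

# Fallback for systems without the unzip command: extract with the standard library.
import zipfile

with zipfile.ZipFile('l07/data.zip') as archive:
    archive.extractall('l07')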
This dataset contains utterances in three languages, annotated with slots for twelve frames belonging to three domains: Alarm, Reminder, and Weather. We will load the data using the conllu library.
from conllu import parse_incr

fields = ['id', 'form', 'frame', 'slot']

# The corpus marks words outside any slot with 'NoLabel'; map it to the IOB tag 'O'.
def nolabel2o(line, i):
    return 'O' if line[i] == 'NoLabel' else line[i]

with open('l07/en/train-en.conllu') as trainfile:
    trainset = list(parse_incr(trainfile, fields=fields, field_parsers={'slot': nolabel2o}))
with open('l07/en/test-en.conllu') as testfile:
    testset = list(parse_incr(testfile, fields=fields, field_parsers={'slot': nolabel2o}))
Let's look at a few example utterances from this dataset.
from tabulate import tabulate
tabulate(trainset[0], tablefmt='html')
1 | tell | weather/find | O |
2 | me | weather/find | O |
3 | the | weather/find | O |
4 | weather | weather/find | B-weather/noun |
5 | report | weather/find | I-weather/noun |
6 | for | weather/find | O |
7 | half | weather/find | B-location |
8 | moon | weather/find | I-location |
9 | bay | weather/find | I-location |
tabulate(trainset[1000], tablefmt='html')
1 | remind | reminder/set_reminder | O |
2 | me | reminder/set_reminder | O |
3 | about | reminder/set_reminder | O |
4 | game | reminder/set_reminder | B-reminder/todo |
5 | night | reminder/set_reminder | I-reminder/todo |
6 | on | reminder/set_reminder | B-datetime |
7 | friday | reminder/set_reminder | I-datetime |
8 | at | reminder/set_reminder | I-datetime |
9 | 4pm | reminder/set_reminder | I-datetime |
tabulate(trainset[2000], tablefmt='html')
1 | set | alarm/set_alarm | O |
2 | alarm | alarm/set_alarm | O |
3 | for | alarm/set_alarm | B-datetime |
4 | 20 | alarm/set_alarm | I-datetime |
5 | minutes | alarm/set_alarm | I-datetime |
For the purposes of demonstrating the training process in a Jupyter notebook, we will restrict the dataset to its initial examples.
trainset = trainset[:300]
testset = testset[:300]
To build the model, we will use an architecture based on recurrent neural networks, as implemented in the flair library (Akbik et al., 2018).
from flair.data import Corpus, Sentence, Token
from flair.datasets import FlairDatapointDataset
from flair.embeddings import StackedEmbeddings
from flair.embeddings import WordEmbeddings
from flair.embeddings import CharacterEmbeddings
from flair.embeddings import FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
# make the computations deterministic
import random
import torch

random.seed(42)
torch.manual_seed(42)

if torch.cuda.is_available():
    torch.cuda.manual_seed(0)
    torch.cuda.manual_seed_all(0)
    torch.backends.cudnn.enabled = False
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
We will convert the data to the format used by flair with the following function.
def conllu2flair(sentences, label=None):
    fsentences = []
    for sentence in sentences:
        # Re-create the utterance as a flair Sentence (tokens are already split).
        fsentence = Sentence(' '.join(token['form'] for token in sentence), use_tokenizer=False)
        start_idx = None
        end_idx = None
        tag = None
        if label:
            for idx, (token, ftoken) in enumerate(zip(sentence, fsentence)):
                if token[label].startswith('B-'):
                    # A new span starts; close the previous one if it is still open.
                    if start_idx is not None:
                        fsentence[start_idx:end_idx+1].add_label(label, tag)
                    start_idx = idx
                    end_idx = idx
                    tag = token[label][2:]
                elif token[label].startswith('I-'):
                    end_idx = idx
                elif token[label] == 'O':
                    if start_idx is not None:
                        fsentence[start_idx:end_idx+1].add_label(label, tag)
                        start_idx = None
                        end_idx = None
                        tag = None
            # Close a span that reaches the end of the utterance.
            if start_idx is not None:
                fsentence[start_idx:end_idx+1].add_label(label, tag)
        fsentences.append(fsentence)
    return FlairDatapointDataset(fsentences)
corpus = Corpus(train=conllu2flair(trainset, 'slot'), test=conllu2flair(testset, 'slot'))
print(corpus)
tag_dictionary = corpus.make_label_dictionary(label_type='slot')
print(tag_dictionary)
Corpus: 270 train + 30 dev + 300 test sentences 2023-04-16 23:32:52,141 Computing label dictionary. Progress:
270it [00:00, 53998.76it/s]
2023-04-16 23:32:52,151 Dictionary created for label 'slot' with 5 values: datetime (seen 174 times), weather/attribute (seen 77 times), weather/noun (seen 66 times), location (seen 44 times) Dictionary with 5 tags: <unk>, datetime, weather/attribute, weather/noun, location
Our model will use vector representations of words (see Word Embeddings).
embedding_types = [
    WordEmbeddings('en'),
    FlairEmbeddings('en-forward'),
    FlairEmbeddings('en-backward'),
    CharacterEmbeddings(),
]
embeddings = StackedEmbeddings(embeddings=embedding_types)

tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type='slot', use_crf=True)
(On first use, flair downloads and caches the required embedding files.)
SequenceTagger predicts: Dictionary with 17 tags: O, S-datetime, B-datetime, E-datetime, I-datetime, S-weather/attribute, B-weather/attribute, E-weather/attribute, I-weather/attribute, S-weather/noun, B-weather/noun, E-weather/noun, I-weather/noun, S-location, B-location, E-location, I-location
Let's look at the architecture of the neural network that will be responsible for predicting slots in utterances.
print(tagger)
SequenceTagger( (embeddings): StackedEmbeddings( (list_embedding_0): WordEmbeddings( 'en' (embedding): Embedding(1000001, 300) ) (list_embedding_1): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.05, inplace=False) (encoder): Embedding(300, 100) (rnn): LSTM(100, 2048) ) ) (list_embedding_2): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.05, inplace=False) (encoder): Embedding(300, 100) (rnn): LSTM(100, 2048) ) ) (list_embedding_3): CharacterEmbeddings( (char_embedding): Embedding(275, 25) (char_rnn): LSTM(25, 25, bidirectional=True) ) ) (word_dropout): WordDropout(p=0.05) (locked_dropout): LockedDropout(p=0.5) (embedding2nn): Linear(in_features=4446, out_features=4446, bias=True) (rnn): LSTM(4446, 256, batch_first=True, bidirectional=True) (linear): Linear(in_features=512, out_features=19, bias=True) (loss_function): ViterbiLoss() (crf): CRF() )
We will run ten training iterations (epochs) and save the resulting model in the slot-model directory.
trainer = ModelTrainer(tagger, corpus)
trainer.train('slot-model',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=10,
              train_with_dev=False)
2023-04-16 23:34:18,357 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,358 Model: "SequenceTagger( (embeddings): StackedEmbeddings( (list_embedding_0): WordEmbeddings( 'en' (embedding): Embedding(1000001, 300) ) (list_embedding_1): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.05, inplace=False) (encoder): Embedding(300, 100) (rnn): LSTM(100, 2048) ) ) (list_embedding_2): FlairEmbeddings( (lm): LanguageModel( (drop): Dropout(p=0.05, inplace=False) (encoder): Embedding(300, 100) (rnn): LSTM(100, 2048) ) ) (list_embedding_3): CharacterEmbeddings( (char_embedding): Embedding(275, 25) (char_rnn): LSTM(25, 25, bidirectional=True) ) ) (word_dropout): WordDropout(p=0.05) (locked_dropout): LockedDropout(p=0.5) (embedding2nn): Linear(in_features=4446, out_features=4446, bias=True) (rnn): LSTM(4446, 256, batch_first=True, bidirectional=True) (linear): Linear(in_features=512, out_features=19, bias=True) (loss_function): ViterbiLoss() (crf): CRF() )" 2023-04-16 23:34:18,359 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,360 Corpus: "Corpus: 270 train + 30 dev + 300 test sentences" 2023-04-16 23:34:18,360 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,361 Parameters: 2023-04-16 23:34:18,361 - learning_rate: "0.100000" 2023-04-16 23:34:18,361 - mini_batch_size: "32" 2023-04-16 23:34:18,362 - patience: "3" 2023-04-16 23:34:18,363 - anneal_factor: "0.5" 2023-04-16 23:34:18,363 - max_epochs: "10" 2023-04-16 23:34:18,363 - shuffle: "True" 2023-04-16 23:34:18,364 - train_with_dev: "False" 2023-04-16 23:34:18,364 - batch_growth_annealing: "False" 2023-04-16 23:34:18,365 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,365 Model training base path: "slot-model" 2023-04-16 23:34:18,366 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,367 Device: cpu 2023-04-16 23:34:18,367 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:18,368 Embeddings storage mode: cpu 2023-04-16 23:34:18,368 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:20,279 epoch 1 - iter 1/9 - loss 3.46586463 - time (sec): 1.91 - samples/sec: 152.88 - lr: 0.100000 2023-04-16 23:34:22,063 epoch 1 - iter 2/9 - loss 3.09300251 - time (sec): 3.69 - samples/sec: 153.49 - lr: 0.100000 2023-04-16 23:34:23,066 epoch 1 - iter 3/9 - loss 3.09930113 - time (sec): 4.70 - samples/sec: 154.78 - lr: 0.100000 2023-04-16 23:34:24,090 epoch 1 - iter 4/9 - loss 2.95447677 - time (sec): 5.72 - samples/sec: 155.39 - lr: 0.100000 2023-04-16 23:34:24,981 epoch 1 - iter 5/9 - loss 2.81596711 - time (sec): 6.61 - samples/sec: 157.92 - lr: 0.100000 2023-04-16 23:34:25,968 epoch 1 - iter 6/9 - loss 2.61288632 - time (sec): 7.60 - samples/sec: 156.47 - lr: 0.100000 2023-04-16 23:34:26,968 epoch 1 - iter 7/9 - loss 2.37758500 - time (sec): 8.60 - samples/sec: 153.85 - lr: 0.100000 2023-04-16 23:34:27,959 epoch 1 - iter 8/9 - loss 2.29585029 - time (sec): 9.59 - samples/sec: 153.81 - lr: 0.100000 2023-04-16 23:34:28,676 epoch 1 - iter 9/9 - loss 2.27926901 - time (sec): 10.31 - samples/sec: 150.87 - lr: 0.100000 2023-04-16 23:34:28,678 
---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:28,678 EPOCH 1 done: loss 2.2793 - lr 0.100000
100%|██████████| 1/1 [00:01<00:00, 1.12s/it]
2023-04-16 23:34:29,797 Evaluating as a multi-label problem: False 2023-04-16 23:34:29,806 DEV : loss 1.2717913389205933 - f1-score (micro avg) 0.0357 2023-04-16 23:34:29,808 BAD EPOCHS (no improvement): 0 2023-04-16 23:34:29,809 saving best model
2023-04-16 23:34:41,288 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:42,050 epoch 2 - iter 1/9 - loss 1.43326204 - time (sec): 0.76 - samples/sec: 241.47 - lr: 0.100000 2023-04-16 23:34:42,846 epoch 2 - iter 2/9 - loss 1.40312603 - time (sec): 1.56 - samples/sec: 234.92 - lr: 0.100000 2023-04-16 23:34:43,571 epoch 2 - iter 3/9 - loss 1.39219711 - time (sec): 2.28 - samples/sec: 235.65 - lr: 0.100000 2023-04-16 23:34:44,338 epoch 2 - iter 4/9 - loss 1.38006975 - time (sec): 3.05 - samples/sec: 232.13 - lr: 0.100000 2023-04-16 23:34:45,066 epoch 2 - iter 5/9 - loss 1.34341906 - time (sec): 3.78 - samples/sec: 232.93 - lr: 0.100000 2023-04-16 23:34:45,850 epoch 2 - iter 6/9 - loss 1.31603716 - time (sec): 4.56 - samples/sec: 234.55 - lr: 0.100000 2023-04-16 23:34:46,612 epoch 2 - iter 7/9 - loss 1.28879818 - time (sec): 5.32 - samples/sec: 237.23 - lr: 0.100000 2023-04-16 23:34:47,400 epoch 2 - iter 8/9 - loss 1.27353642 - time (sec): 6.11 - samples/sec: 238.71 - lr: 0.100000 2023-04-16 23:34:47,821 epoch 2 - iter 9/9 - loss 1.27439781 - time (sec): 6.53 - samples/sec: 238.02 - lr: 0.100000 2023-04-16 23:34:47,823 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:47,823 EPOCH 2 done: loss 1.2744 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.76it/s]
2023-04-16 23:34:48,036 Evaluating as a multi-label problem: False 2023-04-16 23:34:48,044 DEV : loss 0.7916695475578308 - f1-score (micro avg) 0.0 2023-04-16 23:34:48,046 BAD EPOCHS (no improvement): 1 2023-04-16 23:34:48,047 ----------------------------------------------------------------------------------------------------
2023-04-16 23:34:48,750 epoch 3 - iter 1/9 - loss 0.95985486 - time (sec): 0.70 - samples/sec: 266.38 - lr: 0.100000 2023-04-16 23:34:49,421 epoch 3 - iter 2/9 - loss 0.93650848 - time (sec): 1.37 - samples/sec: 258.56 - lr: 0.100000 2023-04-16 23:34:50,208 epoch 3 - iter 3/9 - loss 0.91011594 - time (sec): 2.16 - samples/sec: 249.54 - lr: 0.100000 2023-04-16 23:34:50,909 epoch 3 - iter 4/9 - loss 0.89886124 - time (sec): 2.86 - samples/sec: 249.21 - lr: 0.100000 2023-04-16 23:34:51,580 epoch 3 - iter 5/9 - loss 0.87307849 - time (sec): 3.53 - samples/sec: 251.48 - lr: 0.100000 2023-04-16 23:34:52,375 epoch 3 - iter 6/9 - loss 0.85130603 - time (sec): 4.33 - samples/sec: 251.67 - lr: 0.100000 2023-04-16 23:34:53,161 epoch 3 - iter 7/9 - loss 0.83100431 - time (sec): 5.11 - samples/sec: 249.75 - lr: 0.100000 2023-04-16 23:34:53,966 epoch 3 - iter 8/9 - loss 0.79737653 - time (sec): 5.92 - samples/sec: 248.39 - lr: 0.100000 2023-04-16 23:34:54,372 epoch 3 - iter 9/9 - loss 0.78005277 - time (sec): 6.32 - samples/sec: 245.89 - lr: 0.100000 2023-04-16 23:34:54,373 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:34:54,374 EPOCH 3 done: loss 0.7801 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.50it/s]
2023-04-16 23:34:54,600 Evaluating as a multi-label problem: False 2023-04-16 23:34:54,608 DEV : loss 0.33898717164993286 - f1-score (micro avg) 0.7761 2023-04-16 23:34:54,610 BAD EPOCHS (no improvement): 0 2023-04-16 23:34:54,611 saving best model
2023-04-16 23:35:08,824 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:09,623 epoch 4 - iter 1/9 - loss 0.41515362 - time (sec): 0.80 - samples/sec: 244.67 - lr: 0.100000 2023-04-16 23:35:10,428 epoch 4 - iter 2/9 - loss 0.51239108 - time (sec): 1.60 - samples/sec: 243.44 - lr: 0.100000 2023-04-16 23:35:11,205 epoch 4 - iter 3/9 - loss 0.45658054 - time (sec): 2.38 - samples/sec: 232.97 - lr: 0.100000 2023-04-16 23:35:11,983 epoch 4 - iter 4/9 - loss 0.45578642 - time (sec): 3.16 - samples/sec: 231.55 - lr: 0.100000 2023-04-16 23:35:12,730 epoch 4 - iter 5/9 - loss 0.44140354 - time (sec): 3.90 - samples/sec: 236.42 - lr: 0.100000 2023-04-16 23:35:13,509 epoch 4 - iter 6/9 - loss 0.42847639 - time (sec): 4.68 - samples/sec: 235.10 - lr: 0.100000 2023-04-16 23:35:14,254 epoch 4 - iter 7/9 - loss 0.42680964 - time (sec): 5.43 - samples/sec: 238.02 - lr: 0.100000 2023-04-16 23:35:15,046 epoch 4 - iter 8/9 - loss 0.41317872 - time (sec): 6.22 - samples/sec: 238.91 - lr: 0.100000 2023-04-16 23:35:15,362 epoch 4 - iter 9/9 - loss 0.40072542 - time (sec): 6.54 - samples/sec: 237.91 - lr: 0.100000 2023-04-16 23:35:15,364 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:15,364 EPOCH 4 done: loss 0.4007 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.63it/s]
2023-04-16 23:35:15,583 Evaluating as a multi-label problem: False 2023-04-16 23:35:15,591 DEV : loss 0.16208426654338837 - f1-score (micro avg) 0.9231 2023-04-16 23:35:15,593 BAD EPOCHS (no improvement): 0 2023-04-16 23:35:15,595 saving best model
2023-04-16 23:35:26,934 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:27,683 epoch 5 - iter 1/9 - loss 0.23677549 - time (sec): 0.75 - samples/sec: 243.64 - lr: 0.100000 2023-04-16 23:35:28,430 epoch 5 - iter 2/9 - loss 0.21526806 - time (sec): 1.49 - samples/sec: 236.28 - lr: 0.100000 2023-04-16 23:35:29,218 epoch 5 - iter 3/9 - loss 0.18914680 - time (sec): 2.28 - samples/sec: 232.25 - lr: 0.100000 2023-04-16 23:35:30,001 epoch 5 - iter 4/9 - loss 0.21946464 - time (sec): 3.06 - samples/sec: 235.24 - lr: 0.100000 2023-04-16 23:35:30,749 epoch 5 - iter 5/9 - loss 0.20726706 - time (sec): 3.81 - samples/sec: 238.39 - lr: 0.100000 2023-04-16 23:35:31,440 epoch 5 - iter 6/9 - loss 0.20088127 - time (sec): 4.50 - samples/sec: 242.67 - lr: 0.100000 2023-04-16 23:35:32,199 epoch 5 - iter 7/9 - loss 0.21393993 - time (sec): 5.26 - samples/sec: 242.64 - lr: 0.100000 2023-04-16 23:35:32,995 epoch 5 - iter 8/9 - loss 0.20896170 - time (sec): 6.06 - samples/sec: 241.62 - lr: 0.100000 2023-04-16 23:35:33,438 epoch 5 - iter 9/9 - loss 0.22090499 - time (sec): 6.50 - samples/sec: 239.16 - lr: 0.100000 2023-04-16 23:35:33,440 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:33,440 EPOCH 5 done: loss 0.2209 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.67it/s]
2023-04-16 23:35:33,657 Evaluating as a multi-label problem: False 2023-04-16 23:35:33,664 DEV : loss 0.07899065315723419 - f1-score (micro avg) 0.9706 2023-04-16 23:35:33,667 BAD EPOCHS (no improvement): 0 2023-04-16 23:35:33,668 saving best model
2023-04-16 23:35:47,995 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:48,716 epoch 6 - iter 1/9 - loss 0.08489894 - time (sec): 0.72 - samples/sec: 265.65 - lr: 0.100000 2023-04-16 23:35:49,474 epoch 6 - iter 2/9 - loss 0.16483877 - time (sec): 1.48 - samples/sec: 262.02 - lr: 0.100000 2023-04-16 23:35:50,229 epoch 6 - iter 3/9 - loss 0.17961087 - time (sec): 2.23 - samples/sec: 261.20 - lr: 0.100000 2023-04-16 23:35:50,883 epoch 6 - iter 4/9 - loss 0.17008273 - time (sec): 2.89 - samples/sec: 266.11 - lr: 0.100000 2023-04-16 23:35:51,580 epoch 6 - iter 5/9 - loss 0.16165644 - time (sec): 3.58 - samples/sec: 259.35 - lr: 0.100000 2023-04-16 23:35:52,303 epoch 6 - iter 6/9 - loss 0.16796368 - time (sec): 4.31 - samples/sec: 257.37 - lr: 0.100000 2023-04-16 23:35:52,971 epoch 6 - iter 7/9 - loss 0.15208281 - time (sec): 4.97 - samples/sec: 254.12 - lr: 0.100000 2023-04-16 23:35:53,698 epoch 6 - iter 8/9 - loss 0.14079077 - time (sec): 5.70 - samples/sec: 255.74 - lr: 0.100000 2023-04-16 23:35:54,094 epoch 6 - iter 9/9 - loss 0.14176936 - time (sec): 6.10 - samples/sec: 255.04 - lr: 0.100000 2023-04-16 23:35:54,095 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:35:54,096 EPOCH 6 done: loss 0.1418 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 5.41it/s]
2023-04-16 23:35:54,284 Evaluating as a multi-label problem: False 2023-04-16 23:35:54,292 DEV : loss 0.05525948479771614 - f1-score (micro avg) 0.9706 2023-04-16 23:35:54,294 BAD EPOCHS (no improvement): 0 2023-04-16 23:35:54,295 ----------------------------------------------------------------------------------------------------
2023-04-16 23:35:55,007 epoch 7 - iter 1/9 - loss 0.06206851 - time (sec): 0.71 - samples/sec: 262.64 - lr: 0.100000 2023-04-16 23:35:55,659 epoch 7 - iter 2/9 - loss 0.07954220 - time (sec): 1.36 - samples/sec: 257.33 - lr: 0.100000 2023-04-16 23:35:56,426 epoch 7 - iter 3/9 - loss 0.07803471 - time (sec): 2.13 - samples/sec: 248.71 - lr: 0.100000 2023-04-16 23:35:57,138 epoch 7 - iter 4/9 - loss 0.09479619 - time (sec): 2.84 - samples/sec: 247.27 - lr: 0.100000 2023-04-16 23:35:57,921 epoch 7 - iter 5/9 - loss 0.12208708 - time (sec): 3.63 - samples/sec: 251.03 - lr: 0.100000 2023-04-16 23:35:58,664 epoch 7 - iter 6/9 - loss 0.11140702 - time (sec): 4.37 - samples/sec: 251.09 - lr: 0.100000 2023-04-16 23:35:59,452 epoch 7 - iter 7/9 - loss 0.10265536 - time (sec): 5.16 - samples/sec: 250.34 - lr: 0.100000 2023-04-16 23:36:00,146 epoch 7 - iter 8/9 - loss 0.09764915 - time (sec): 5.85 - samples/sec: 252.61 - lr: 0.100000 2023-04-16 23:36:00,536 epoch 7 - iter 9/9 - loss 0.09412069 - time (sec): 6.24 - samples/sec: 249.16 - lr: 0.100000 2023-04-16 23:36:00,537 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:00,538 EPOCH 7 done: loss 0.0941 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 4.76it/s]
2023-04-16 23:36:00,751 Evaluating as a multi-label problem: False 2023-04-16 23:36:00,759 DEV : loss 0.04532366245985031 - f1-score (micro avg) 0.9706 2023-04-16 23:36:00,761 BAD EPOCHS (no improvement): 0 2023-04-16 23:36:00,762 ----------------------------------------------------------------------------------------------------
2023-04-16 23:36:01,551 epoch 8 - iter 1/9 - loss 0.03479994 - time (sec): 0.79 - samples/sec: 234.47 - lr: 0.100000 2023-04-16 23:36:02,299 epoch 8 - iter 2/9 - loss 0.03457490 - time (sec): 1.54 - samples/sec: 235.52 - lr: 0.100000 2023-04-16 23:36:03,135 epoch 8 - iter 3/9 - loss 0.02756396 - time (sec): 2.37 - samples/sec: 232.19 - lr: 0.100000 2023-04-16 23:36:03,831 epoch 8 - iter 4/9 - loss 0.03832822 - time (sec): 3.07 - samples/sec: 235.58 - lr: 0.100000 2023-04-16 23:36:04,675 epoch 8 - iter 5/9 - loss 0.05709582 - time (sec): 3.91 - samples/sec: 236.64 - lr: 0.100000 2023-04-16 23:36:05,470 epoch 8 - iter 6/9 - loss 0.07284620 - time (sec): 4.71 - samples/sec: 237.68 - lr: 0.100000 2023-04-16 23:36:06,242 epoch 8 - iter 7/9 - loss 0.09056851 - time (sec): 5.48 - samples/sec: 238.50 - lr: 0.100000 2023-04-16 23:36:06,934 epoch 8 - iter 8/9 - loss 0.08531148 - time (sec): 6.17 - samples/sec: 238.50 - lr: 0.100000 2023-04-16 23:36:07,344 epoch 8 - iter 9/9 - loss 0.08690437 - time (sec): 6.58 - samples/sec: 236.25 - lr: 0.100000 2023-04-16 23:36:07,345 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:07,346 EPOCH 8 done: loss 0.0869 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 5.03it/s]
2023-04-16 23:36:07,548 Evaluating as a multi-label problem: False 2023-04-16 23:36:07,556 DEV : loss 0.024222934618592262 - f1-score (micro avg) 0.9706 2023-04-16 23:36:07,558 BAD EPOCHS (no improvement): 0 2023-04-16 23:36:07,559 ----------------------------------------------------------------------------------------------------
2023-04-16 23:36:08,281 epoch 9 - iter 1/9 - loss 0.04802091 - time (sec): 0.72 - samples/sec: 239.94 - lr: 0.100000 2023-04-16 23:36:09,019 epoch 9 - iter 2/9 - loss 0.04962141 - time (sec): 1.46 - samples/sec: 241.95 - lr: 0.100000 2023-04-16 23:36:09,715 epoch 9 - iter 3/9 - loss 0.04733611 - time (sec): 2.16 - samples/sec: 258.47 - lr: 0.100000 2023-04-16 23:36:10,377 epoch 9 - iter 4/9 - loss 0.05639010 - time (sec): 2.82 - samples/sec: 258.08 - lr: 0.100000 2023-04-16 23:36:11,111 epoch 9 - iter 5/9 - loss 0.11318641 - time (sec): 3.55 - samples/sec: 263.87 - lr: 0.100000 2023-04-16 23:36:11,812 epoch 9 - iter 6/9 - loss 0.10583430 - time (sec): 4.25 - samples/sec: 260.58 - lr: 0.100000 2023-04-16 23:36:12,486 epoch 9 - iter 7/9 - loss 0.09734085 - time (sec): 4.93 - samples/sec: 262.89 - lr: 0.100000 2023-04-16 23:36:13,213 epoch 9 - iter 8/9 - loss 0.09001355 - time (sec): 5.65 - samples/sec: 259.68 - lr: 0.100000 2023-04-16 23:36:13,582 epoch 9 - iter 9/9 - loss 0.08649074 - time (sec): 6.02 - samples/sec: 258.22 - lr: 0.100000 2023-04-16 23:36:13,583 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:13,584 EPOCH 9 done: loss 0.0865 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 5.41it/s]
2023-04-16 23:36:13,772 Evaluating as a multi-label problem: False 2023-04-16 23:36:13,779 DEV : loss 0.0387137271463871 - f1-score (micro avg) 0.9706 2023-04-16 23:36:13,781 BAD EPOCHS (no improvement): 1 2023-04-16 23:36:13,782 ----------------------------------------------------------------------------------------------------
2023-04-16 23:36:14,486 epoch 10 - iter 1/9 - loss 0.01475051 - time (sec): 0.70 - samples/sec: 251.42 - lr: 0.100000 2023-04-16 23:36:15,220 epoch 10 - iter 2/9 - loss 0.02797203 - time (sec): 1.44 - samples/sec: 246.17 - lr: 0.100000 2023-04-16 23:36:15,881 epoch 10 - iter 3/9 - loss 0.03366118 - time (sec): 2.10 - samples/sec: 250.12 - lr: 0.100000 2023-04-16 23:36:16,583 epoch 10 - iter 4/9 - loss 0.02913842 - time (sec): 2.80 - samples/sec: 256.33 - lr: 0.100000 2023-04-16 23:36:17,329 epoch 10 - iter 5/9 - loss 0.02953250 - time (sec): 3.55 - samples/sec: 258.53 - lr: 0.100000 2023-04-16 23:36:18,035 epoch 10 - iter 6/9 - loss 0.04606061 - time (sec): 4.25 - samples/sec: 259.82 - lr: 0.100000 2023-04-16 23:36:18,729 epoch 10 - iter 7/9 - loss 0.04505843 - time (sec): 4.95 - samples/sec: 260.56 - lr: 0.100000 2023-04-16 23:36:19,462 epoch 10 - iter 8/9 - loss 0.05644751 - time (sec): 5.68 - samples/sec: 258.63 - lr: 0.100000 2023-04-16 23:36:19,893 epoch 10 - iter 9/9 - loss 0.06249184 - time (sec): 6.11 - samples/sec: 254.46 - lr: 0.100000 2023-04-16 23:36:19,895 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:19,895 EPOCH 10 done: loss 0.0625 - lr 0.100000
100%|██████████| 1/1 [00:00<00:00, 5.43it/s]
2023-04-16 23:36:20,082 Evaluating as a multi-label problem: False 2023-04-16 23:36:20,089 DEV : loss 0.03223632648587227 - f1-score (micro avg) 0.9275 2023-04-16 23:36:20,091 BAD EPOCHS (no improvement): 2
2023-04-16 23:36:31,813 ---------------------------------------------------------------------------------------------------- 2023-04-16 23:36:35,403 SequenceTagger predicts: Dictionary with 19 tags: O, S-datetime, B-datetime, E-datetime, I-datetime, S-weather/attribute, B-weather/attribute, E-weather/attribute, I-weather/attribute, S-weather/noun, B-weather/noun, E-weather/noun, I-weather/noun, S-location, B-location, E-location, I-location, <START>, <STOP>
100%|██████████| 10/10 [00:13<00:00, 1.40s/it]
2023-04-16 23:36:50,601 Evaluating as a multi-label problem: False 2023-04-16 23:36:50,612 0.2111 0.1989 0.2048 0.1176 2023-04-16 23:36:50,613 Results: - F-score (micro) 0.2048 - F-score (macro) 0.1106 - Accuracy 0.1176 By class: precision recall f1-score support datetime 0.2136 0.2327 0.2227 202 weather/noun 0.2184 0.5588 0.3140 34 reminder/todo 0.0000 0.0000 0.0000 46 reminder/noun 0.0000 0.0000 0.0000 42 weather/attribute 0.1765 0.1667 0.1714 18 location 0.1765 0.1765 0.1765 17 reminder/recurring_period 0.0000 0.0000 0.0000 2 negation 0.0000 0.0000 0.0000 1 micro avg 0.2111 0.1989 0.2048 362 macro avg 0.0981 0.1418 0.1106 362 weighted avg 0.1568 0.1989 0.1706 362 2023-04-16 23:36:50,613 ----------------------------------------------------------------------------------------------------
{'test_score': 0.20483641536273117, 'dev_score_history': [0.03571428571428571, 0.0, 0.7761194029850745, 0.923076923076923, 0.9705882352941176, 0.9705882352941176, 0.9705882352941176, 0.9705882352941176, 0.9705882352941176, 0.9275362318840579], 'train_loss_history': [2.279269006857918, 1.2743978126256028, 0.7800527689157958, 0.40072541558857516, 0.22090499240102493, 0.14176936011605706, 0.09412068554059486, 0.08690436752662781, 0.08649074149668408, 0.06249183581189711], 'dev_loss_history': [1.2717913389205933, 0.7916695475578308, 0.33898717164993286, 0.16208426654338837, 0.07899065315723419, 0.05525948479771614, 0.04532366245985031, 0.024222934618592262, 0.0387137271463871, 0.03223632648587227]}
The quality of the trained model can be assessed using the metrics reported above, i.e.:
_tp_ (true positives)
the number of words labelled with tag $e$ in the test set that the model also labelled with $e$
_fp_ (false positives)
the number of words not labelled with tag $e$ in the test set that the model nevertheless labelled with $e$
_fn_ (false negatives)
the number of words labelled with tag $e$ in the test set that the model did not label with $e$
_precision_
$$\frac{tp}{tp + fp}$$
_recall_
$$\frac{tp}{tp + fn}$$
$F_1$
$$\frac{2 \cdot precision \cdot recall}{precision + recall}$$
_micro_ $F_1$
$F_1$ in which $tp$, $fp$, and $fn$ are counted jointly over all labels, i.e. $tp = \sum_{e}{{tp}_e}$, $fn = \sum_{e}{{fn}_e}$, $fp = \sum_{e}{{fp}_e}$
_macro_ $F_1$
the arithmetic mean of the $F_1$ scores computed separately for each label (a short numeric sketch follows below).
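A short numeric sketch (with made-up per-label counts, unrelated to the evaluation above) showing the difference between micro and macro averaging:

# Illustrative micro vs. macro F1 computation from per-label counts (toy numbers).
counts = {
    'datetime': {'tp': 40, 'fp': 10, 'fn': 10},
    'location': {'tp': 5, 'fp': 15, 'fn': 5},
}

def f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Macro F1: average the per-label F1 scores.
macro = sum(f1(**c) for c in counts.values()) / len(counts)

# Micro F1: sum the counts over all labels first, then compute a single F1.
tp = sum(c['tp'] for c in counts.values())
fp = sum(c['fp'] for c in counts.values())
fn = sum(c['fn'] for c in counts.values())
micro = f1(tp, fp, fn)

print(f'macro F1 = {macro:.3f}, micro F1 = {micro:.3f}')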
The trained model can be loaded from a file with the load method.
model = SequenceTagger.load('slot-model/final-model.pt')
2023-04-16 23:37:00,272 SequenceTagger predicts: Dictionary with 19 tags: O, S-datetime, B-datetime, E-datetime, I-datetime, S-weather/attribute, B-weather/attribute, E-weather/attribute, I-weather/attribute, S-weather/noun, B-weather/noun, E-weather/noun, I-weather/noun, S-location, B-location, E-location, I-location, <START>, <STOP>
The loaded model can be used to predict slots in user utterances by means of the predict function shown below.
def predict(model, sentence):
    # Wrap the raw tokens in the structure expected by conllu2flair.
    csentence = [{'form': word, 'slot': 'O'} for word in sentence]
    fsentence = conllu2flair([csentence])[0]
    model.predict(fsentence)
    # Copy the predicted spans back onto the tokens as IOB tags.
    for span in fsentence.get_spans('slot'):
        tag = span.get_label('slot').value
        csentence[span.tokens[0].idx - 1]['slot'] = f'B-{tag}'
        for token in span.tokens[1:]:
            csentence[token.idx - 1]['slot'] = f'I-{tag}'
    return csentence
tabulate(predict(model, 'set alarm for 20 minutes'.split()), tablefmt='html')
set | O |
alarm | O |
for | B-datetime |
20 | B-datetime |
minutes | I-datetime |
tabulate(predict(model, 'change my 3 pm alarm to the next day'.split()), tablefmt='html')
change | O |
my | O |
3 | O |
pm | O |
alarm | O |
to | O |
the | O |
next | O |
day | B-weather/noun |
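The last prediction is clearly wrong (day is not a weather/noun value), which is hardly surprising: the model was trained on only 300 examples, nearly all of which come from the weather domain (compare the label dictionary printed above).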
References
- Sebastian Schuster, Sonal Gupta, Rushin Shah, Mike Lewis, Cross-lingual Transfer Learning for Multilingual Task Oriented Dialog. NAACL-HLT (1) 2019, pp. 3795-3805
- John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 282–289, https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Comput. 9, 8 (November 15, 1997), 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, Attention is All you Need, NIPS 2017, pp. 5998-6008, https://arxiv.org/abs/1706.03762
- Alan Akbik, Duncan Blythe, Roland Vollgraf, Contextual String Embeddings for Sequence Labeling, Proceedings of the 27th International Conference on Computational Linguistics, pp. 1638–1649, https://www.aclweb.org/anthology/C18-1139.pdf