Paweł Skórzewski ccd30390b3 3_RNN.ipynb
2024-05-10 14:56:49 +02:00


{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deep Learning: Text Processing, Labs\n",
"# 3. RNN"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Softmax approach with embeddings, using NER as an example"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Defaulting to user installation because normal site-packages is not writeable\n",
"Requirement already satisfied: torch in /home/pawel/.local/lib/python3.10/site-packages (2.3.0)\n",
"Collecting torchtext\n",
" Downloading torchtext-0.18.0-cp310-cp310-manylinux1_x86_64.whl (2.0 MB)\n",
" 2.0/2.0 MB 9.6 MB/s eta 0:00:00\n",
"Requirement already satisfied: filelock in /home/pawel/.local/lib/python3.10/site-packages (from torch) (3.13.1)\n",
"Requirement already satisfied: fsspec in /home/pawel/.local/lib/python3.10/site-packages (from torch) (2024.2.0)\n",
"Requirement already satisfied: sympy in /home/pawel/.local/lib/python3.10/site-packages (from torch) (1.12)\n",
"Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (12.1.105)\n",
"Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (12.1.105)\n",
"Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (12.1.105)\n",
"Requirement already satisfied: typing-extensions>=4.8.0 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (4.10.0)\n",
"Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (8.9.2.26)\n",
"Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (10.3.2.106)\n",
"Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (12.1.0.106)\n",
"Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (12.1.105)\n",
"Requirement already satisfied: triton==2.3.0 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (2.3.0)\n",
"Requirement already satisfied: jinja2 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (3.1.3)\n",
"Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (2.20.5)\n",
"Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (12.1.3.1)\n",
"Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (11.0.2.54)\n",
"Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /home/pawel/.local/lib/python3.10/site-packages (from torch) (11.4.5.107)\n",
"Requirement already satisfied: networkx in /home/pawel/.local/lib/python3.10/site-packages (from torch) (3.3)\n",
"Requirement already satisfied: nvidia-nvjitlink-cu12 in /home/pawel/.local/lib/python3.10/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch) (12.4.127)\n",
"Requirement already satisfied: requests in /home/pawel/.local/lib/python3.10/site-packages (from torchtext) (2.31.0)\n",
"Requirement already satisfied: numpy in /home/pawel/.local/lib/python3.10/site-packages (from torchtext) (1.26.4)\n",
"Requirement already satisfied: tqdm in /home/pawel/.local/lib/python3.10/site-packages (from torchtext) (4.66.2)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /home/pawel/.local/lib/python3.10/site-packages (from jinja2->torch) (2.1.5)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /home/pawel/.local/lib/python3.10/site-packages (from requests->torchtext) (2024.2.2)\n",
"Requirement already satisfied: idna<4,>=2.5 in /home/pawel/.local/lib/python3.10/site-packages (from requests->torchtext) (3.6)\n",
"Requirement already satisfied: charset-normalizer<4,>=2 in /home/pawel/.local/lib/python3.10/site-packages (from requests->torchtext) (3.3.2)\n",
"Requirement already satisfied: urllib3<3,>=1.21.1 in /home/pawel/.local/lib/python3.10/site-packages (from requests->torchtext) (2.2.1)\n",
"Requirement already satisfied: mpmath>=0.19 in /home/pawel/.local/lib/python3.10/site-packages (from sympy->torch) (1.3.0)\n",
"Installing collected packages: torchtext\n",
"Successfully installed torchtext-0.18.0\n"
]
}
],
"source": [
"!pip install torch torchtext datasets"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"import torch\n",
"from datasets import load_dataset\n",
"from torchtext.vocab import vocab\n",
"from tqdm.notebook import tqdm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We load the `conll2003` dataset (https://huggingface.co/datasets/conll2003), whose texts are annotated with part-of-speech tags (*POS tags*), syntactic chunk tags and named-entity tags (*NER tags*). In this notebook we use the NER tags:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"dataset = load_dataset(\"conll2003\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a function that builds the vocabulary (https://pytorch.org/text/stable/vocab.html).\n",
"\n",
"The `specials` parameter defines the special symbols, which are placed at the start of the vocabulary (indices 0 to 3):\n",
"* `<unk>`: unknown token\n",
"* `<pad>`: padding\n",
"* `<bos>`: beginning of sentence\n",
"* `<eos>`: end of sentence"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"def build_vocab(dataset):\n",
"    counter = Counter()\n",
"    for document in dataset:\n",
"        counter.update(document)\n",
"    return vocab(counter, specials=['<unk>', '<pad>', '<bos>', '<eos>'])"
]
},
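{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small illustration of the resulting index layout, written as a pure-Python sketch rather than with torchtext itself: the four special symbols come first (indices 0 to 3), followed by the corpus tokens in the order in which `Counter` first encountered them. This assumes the defaults `special_first=True` and `min_freq=1` of `vocab`; `toy_itos` is a hypothetical helper that only mimics this behaviour."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"\n",
"def toy_itos(documents, specials=('<unk>', '<pad>', '<bos>', '<eos>')):\n",
"    # Hypothetical helper mimicking the index layout of torchtext's vocab()\n",
"    # with special_first=True and min_freq=1 (every token is kept).\n",
"    counter = Counter()\n",
"    for document in documents:\n",
"        counter.update(document)  # keys keep the order of first occurrence\n",
"    return list(specials) + [token for token in counter if token not in specials]\n",
"\n",
"\n",
"print(toy_itos([['EU', 'rejects', 'German', 'call'], ['German', 'lamb']]))\n",
"# ['<unk>', '<pad>', '<bos>', '<eos>', 'EU', 'rejects', 'German', 'call', 'lamb']"
]
},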
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"v = build_vocab(dataset['train']['tokens'])"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"itos = v.get_itos()  # mapping from indices to tokens"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"23627"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(itos)  # number of distinct tokens in the vocabulary (including the special symbols)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"21"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v['on']  # index of the token `on`"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"v[\"<unk>\"]  # index of the unknown token"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the analyzed text contains a token that is not in the vocabulary, it is represented by the *default index*. We set the default index to the index of the unknown token `<unk>` (without a default index, looking up an out-of-vocabulary token raises an error):"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"v.set_default_index(v[\"<unk>\"])"
]
},
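{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough pure-Python analogue of the lookup rule that `set_default_index` enables: known tokens map to their own index, and unknown ones fall back to the default index instead of triggering an error. The toy dictionary `stoi` and the function `lookup` are illustrative assumptions, not torchtext code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stoi = {'<unk>': 0, '<pad>': 1, '<bos>': 2, '<eos>': 3, 'EU': 4, 'rejects': 5}  # toy vocabulary\n",
"\n",
"\n",
"def lookup(token, stoi, default_index=None):\n",
"    # Illustrative stand-in for a vocabulary lookup with an optional default index\n",
"    if token in stoi:\n",
"        return stoi[token]\n",
"    if default_index is None:\n",
"        raise KeyError(f'{token!r} is not in the vocabulary and no default index is set')\n",
"    return default_index\n",
"\n",
"\n",
"print([lookup(t, stoi, default_index=stoi['<unk>']) for t in ['EU', 'Brexit']])  # [4, 0]"
]
},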
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"def data_process(dt):\n",
"    # Vectorize the documents: map each token to its index, wrapped in <bos> ... <eos>\n",
"    return [torch.tensor([v['<bos>']] + [v[token] for token in document] + [v['<eos>']], dtype=torch.long)\n",
"            for document in dt]"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"def labels_process(dt):\n",
"    # Vectorize the labels (NER tags); the extra 0 ('O' tag) at each end is aligned with <bos> and <eos>\n",
"    return [torch.tensor([0] + document + [0], dtype=torch.long) for document in dt]\n"
]
},
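{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both functions add one position at each end of the sentence, so token and label sequences stay aligned: `<bos>` and `<eos>` receive the label 0, which in `conll2003` is the `O` tag (outside any entity). The sketch below replays this on plain Python lists for the first training sentence, which is printed further down in this notebook; `stoi` is a hypothetical toy mapping standing in for `v`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical toy vocabulary standing in for v (indices match the first training sentence)\n",
"stoi = {'<unk>': 0, '<pad>': 1, '<bos>': 2, '<eos>': 3,\n",
"        'EU': 4, 'rejects': 5, 'German': 6, 'call': 7, 'to': 8,\n",
"        'boycott': 9, 'British': 10, 'lamb': 11, '.': 12}\n",
"\n",
"tokens = ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']\n",
"ner_tags = [3, 0, 7, 0, 0, 0, 7, 0, 0]\n",
"\n",
"# Same logic as data_process / labels_process, on plain lists\n",
"token_ids = [stoi['<bos>']] + [stoi.get(t, stoi['<unk>']) for t in tokens] + [stoi['<eos>']]\n",
"labels = [0] + ner_tags + [0]\n",
"\n",
"print(token_ids)  # [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 3]\n",
"print(labels)     # [0, 3, 0, 7, 0, 0, 0, 7, 0, 0, 0]"
]
},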
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we vectorize all the data:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"train_tokens_ids = data_process(dataset['train']['tokens'])"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"test_tokens_ids = data_process(dataset['test']['tokens'])"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"validation_tokens_ids = data_process(dataset['validation']['tokens'])"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"train_labels = labels_process(dataset['train']['ner_tags'])"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"validation_labels = labels_process(dataset['validation']['ner_tags'])"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"test_labels = labels_process(dataset['test']['ner_tags'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An example of what the data looks like after vectorization:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 3])"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_tokens_ids[0]"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': '0',\n",
" 'tokens': ['EU',\n",
" 'rejects',\n",
" 'German',\n",
" 'call',\n",
" 'to',\n",
" 'boycott',\n",
" 'British',\n",
" 'lamb',\n",
" '.'],\n",
" 'pos_tags': [22, 42, 16, 21, 35, 37, 16, 21, 7],\n",
" 'chunk_tags': [11, 21, 11, 12, 21, 22, 11, 12, 0],\n",
" 'ner_tags': [3, 0, 7, 0, 0, 0, 7, 0, 0]}"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset['train'][0]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([0, 3, 0, 7, 0, 0, 0, 7, 0, 0, 0])"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"train_labels[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function we will use for evaluation:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"def get_scores(y_true, y_pred):\n",
"    # Returns token-level precision, recall and F1; every tag > 0\n",
"    # (i.e. anything other than 'O') counts as a positive.\n",
"    acc_score = 0\n",
"    tp = 0\n",
"    fp = 0\n",
"    selected_items = 0\n",
"    relevant_items = 0\n",
"\n",
"    for p, t in zip(y_pred, y_true):\n",
"        if p == t:\n",
"            acc_score += 1\n",
"\n",
"        if p > 0 and p == t:\n",
"            tp += 1\n",
"\n",
"        if p > 0:\n",
"            selected_items += 1\n",
"\n",
"        if t > 0:\n",
"            relevant_items += 1\n",
"\n",
"    if selected_items == 0:\n",
"        precision = 1.0\n",
"    else:\n",
"        precision = tp / selected_items\n",
"\n",
"    if relevant_items == 0:\n",
"        recall = 1.0\n",
"    else:\n",
"        recall = tp / relevant_items\n",
"\n",
"    if precision + recall == 0.0:\n",
"        f1 = 0.0\n",
"    else:\n",
"        f1 = 2 * precision * recall / (precision + recall)\n",
"\n",
"    return precision, recall, f1"
]
},
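{
"cell_type": "markdown",
"metadata": {},
"source": [
"`get_scores` works on individual tokens rather than whole entities: any tag greater than 0 counts as a positive, and a true positive requires exactly the same tag. The compact restatement below (`token_prf` is a hypothetical name) follows the same counting rules on a toy example so the arithmetic can be checked by hand: of the two predicted non-`O` tags one is correct, and of the two gold non-`O` tags one is found, so precision, recall and F1 are all 0.5."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def token_prf(y_true, y_pred):\n",
"    # Same counting rules as get_scores: positives are tags > 0,\n",
"    # a true positive is a positive prediction equal to the gold tag.\n",
"    tp = sum(1 for p, t in zip(y_pred, y_true) if p > 0 and p == t)\n",
"    selected = sum(1 for p in y_pred if p > 0)\n",
"    relevant = sum(1 for t in y_true if t > 0)\n",
"    precision = tp / selected if selected else 1.0\n",
"    recall = tp / relevant if relevant else 1.0\n",
"    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)\n",
"    return precision, recall, f1\n",
"\n",
"\n",
"print(token_prf([0, 3, 4, 0], [0, 3, 0, 5]))  # (0.5, 0.5, 0.5)"
]
},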
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How many different NER tags are there?"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9\n"
]
}
],
"source": [
"num_tags = max(max(x) for x in dataset['train']['ner_tags']) + 1  # tags are numbered 0 .. num_tags - 1\n",
"print(num_tags)"
]
},
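{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 9 ids correspond to IOB2 entity labels. In the Hugging Face version of the dataset the names can be read with `dataset['train'].features['ner_tags'].feature.names`; the list below is hard-coded from the dataset card as an assumption, so it is worth comparing it with that call. Decoding the first training sentence with it tags `EU` as the beginning of an organization and `German` and `British` as miscellaneous entities."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumed label order for conll2003 (check against the dataset features)\n",
"NER_NAMES = ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']\n",
"\n",
"tokens = ['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']\n",
"ner_tags = [3, 0, 7, 0, 0, 0, 7, 0, 0]\n",
"\n",
"decoded = [(tok, NER_NAMES[tag]) for tok, tag in zip(tokens, ner_tags)]\n",
"print(decoded[:3])  # [('EU', 'B-ORG'), ('rejects', 'O'), ('German', 'B-MISC')]"
]
},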
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implementation of an LSTM recurrent neural network:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"class LSTM(torch.nn.Module):\n",
"\n",
"    def __init__(self):\n",
"        super(LSTM, self).__init__()\n",
"        # 100-dimensional embedding for every token in the vocabulary\n",
"        self.emb = torch.nn.Embedding(len(v.get_itos()), 100)\n",
"        # single-layer LSTM, hidden size 256, inputs shaped (batch, seq_len, features)\n",
"        self.rec = torch.nn.LSTM(100, 256, 1, batch_first=True)\n",
"        # maps each hidden state to one score per NER tag\n",
"        self.fc1 = torch.nn.Linear(256, num_tags)\n",
"\n",
"    def forward(self, x):\n",
"        emb = torch.relu(self.emb(x))\n",
"        lstm_output, (h_n, c_n) = self.rec(emb)\n",
"        out_weights = self.fc1(lstm_output)  # shape (batch, seq_len, num_tags)\n",
"        return out_weights"
]
},
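{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check on the size of this model, worked out by hand. A PyTorch `nn.LSTM` layer keeps input-to-hidden weights, hidden-to-hidden weights and two bias vectors (`bias_ih` and `bias_hh`) for its four gates, i.e. 4H(I + H) + 8H parameters for input size I and hidden size H. The sketch plugs in the sizes used above and a vocabulary of 23627 tokens (the value of `len(itos)` printed earlier); under these assumptions the total should agree with `sum(p.numel() for p in lstm.parameters())`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"vocab_size, emb_dim, hidden, num_tags = 23627, 100, 256, 9\n",
"\n",
"embedding = vocab_size * emb_dim  # one 100-d vector per token\n",
"lstm_layer = 4 * hidden * (emb_dim + hidden) + 2 * 4 * hidden  # weight_ih, weight_hh, bias_ih, bias_hh\n",
"linear = hidden * num_tags + num_tags  # weight matrix + bias\n",
"\n",
"total = embedding + lstm_layer + linear\n",
"print(embedding, lstm_layer, linear, total)  # 2362700 366592 2313 2731605"
]
},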
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating the model:"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"lstm = LSTM()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining the loss function:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"criterion = torch.nn.CrossEntropyLoss()"
]
},
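{
"cell_type": "markdown",
"metadata": {},
"source": [
"`CrossEntropyLoss` takes raw scores (logits) of shape (number of tokens, number of classes) together with integer targets; it applies log-softmax internally and, with the default `reduction='mean'`, averages the negative log-probability of the correct tag over all tokens. A pure-Python sketch of that computation (`token_cross_entropy` is a hypothetical name, not a PyTorch function):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"\n",
"def token_cross_entropy(logits, targets):\n",
"    # logits: one list of class scores per token; targets: gold class index per token\n",
"    losses = []\n",
"    for scores, target in zip(logits, targets):\n",
"        m = max(scores)  # subtract the maximum for numerical stability\n",
"        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))\n",
"        losses.append(log_sum_exp - scores[target])  # -log softmax(scores)[target]\n",
"    return sum(losses) / len(losses)  # mean over tokens\n",
"\n",
"\n",
"# Uniform scores over 3 classes give log(3) for every token\n",
"print(round(token_cross_entropy([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], [0, 2]), 4))  # 1.0986"
]
},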
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining the optimizer:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"optimizer = torch.optim.Adam(lstm.parameters())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A function for evaluating the model:"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"def eval_model(dataset_tokens, dataset_labels, model):\n",
"    Y_true = []\n",
"    Y_pred = []\n",
"    # no gradients are needed during evaluation\n",
"    with torch.no_grad():\n",
"        for i in tqdm(range(len(dataset_labels))):\n",
"            batch_tokens = dataset_tokens[i].unsqueeze(0)  # batch of one sentence\n",
"            tags = list(dataset_labels[i].numpy())\n",
"            Y_true += tags\n",
"\n",
"            Y_batch_pred_weights = model(batch_tokens).squeeze(0)\n",
"            Y_batch_pred = torch.argmax(Y_batch_pred_weights, 1)  # highest-scoring tag per token\n",
"            Y_pred += list(Y_batch_pred.numpy())\n",
"\n",
"    return get_scores(Y_true, Y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Training the model:"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"NUM_EPOCHS = 5"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7b88376fa1a6481b92da7e8308b581cb",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/14041 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9ec3025ecf6a4ed69a5f1df4d2e8099d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3250 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(0.49656056896350703, 0.4950598628385447, 0.49580908032596044)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "6cb176b6c465408bad2c2e7fed25dcd0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/14041 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "935c5d560d364b6b9930216f993caf2a",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3250 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(0.6289105835367207, 0.6589561780774148, 0.643582902877902)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1827fd779e5c478ebc6b512788898c8e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/14041 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "2a347fc594654853a6e6a2425a7a5c98",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3250 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(0.7031268719300348, 0.6822038823666163, 0.6925073746312684)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "788f646a32824270b7a9a5ef2b87662b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/14041 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8b44d34b37d844ac9d3075e9a3fd25d4",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3250 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(0.7354687113529558, 0.6912704870394049, 0.7126850020971898)\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8cab6377c9f54f3e8acf262273a4dbf0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/14041 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1ae82f803d424b83b7abdc3319282131",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3250 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(0.7134837896666285, 0.7239335115657329, 0.7186706669743826)\n"
]
}
],
"source": [
"for epoch in range(NUM_EPOCHS):\n",
"    lstm.train()\n",
"    # for i in tqdm(range(500)):  # shorter run on the first 500 sentences only\n",
"    for i in tqdm(range(len(train_labels))):\n",
"        batch_tokens = train_tokens_ids[i].unsqueeze(0)  # shape (1, seq_len)\n",
"        tags = train_labels[i]  # shape (seq_len,)\n",
"\n",
"        predicted_tags = lstm(batch_tokens)  # shape (1, seq_len, num_tags)\n",
"\n",
"        optimizer.zero_grad()\n",
"        loss = criterion(predicted_tags.squeeze(0), tags)\n",
"\n",
"        loss.backward()\n",
"        optimizer.step()\n",
"\n",
"    lstm.eval()\n",
"    print(eval_model(validation_tokens_ids, validation_labels, lstm))"
]
},
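{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above updates the weights after every single sentence (batch size 1), which is why the `<pad>` symbol is never actually used. Training on mini-batches would require padding sentences to a common length. A minimal sketch, assuming tokens are padded with the `<pad>` index (1 in the vocabulary built above) and labels with -100, the default `ignore_index` of `torch.nn.CrossEntropyLoss`, so padded positions would not contribute to the loss; `pad_batch` is a hypothetical helper. The padded lists could then be turned into tensors with `torch.tensor`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"PAD_ID = 1            # index of '<pad>' in the vocabulary built above\n",
"IGNORE_LABEL = -100   # default ignore_index of torch.nn.CrossEntropyLoss\n",
"\n",
"\n",
"def pad_batch(token_seqs, label_seqs, pad_id=PAD_ID, ignore_label=IGNORE_LABEL):\n",
"    # Right-pad every sequence in the batch to the length of the longest one\n",
"    max_len = max(len(seq) for seq in token_seqs)\n",
"    tokens = [seq + [pad_id] * (max_len - len(seq)) for seq in token_seqs]\n",
"    labels = [seq + [ignore_label] * (max_len - len(seq)) for seq in label_seqs]\n",
"    return tokens, labels\n",
"\n",
"\n",
"tokens, labels = pad_batch([[2, 4, 5, 3], [2, 6, 3]], [[0, 3, 0, 0], [0, 7, 0]])\n",
"print(tokens)  # [[2, 4, 5, 3], [2, 6, 3, 1]]\n",
"print(labels)  # [[0, 3, 0, 0], [0, 7, 0, -100]]"
]
},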
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Evaluation:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d494e8de77cc4597b07eb4bbaff1d241",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3250 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"(0.7134837896666285, 0.7239335115657329, 0.7186706669743826)"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"eval_model(validation_tokens_ids, validation_labels, lstm)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a221459483784b94bc2251b8dae6bbba",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3453 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"(0.6529463280370325, 0.6433678500986193, 0.6481217013349891)"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"eval_model(test_tokens_ids, test_labels, lstm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Task 3\n",
"\n",
"Clone the repository https://git.wmi.amu.edu.pl/kubapok/en-ner-conll-2003\n",
"\n",
"Build a *sequence labelling* model based on any recurrent neural network (you may follow the example from class).\n",
"\n",
"Write the predictions for dev-0/in.tsv and test-A/in.tsv to dev-0/out.tsv and test-A/out.tsv, respectively.\n",
"Use the GEval tool (https://gitlab.com/filipg/geval) for evaluation:\n",
"\n",
"    wget https://gonito.net/get/bin/geval\n",
"    chmod u+x geval\n",
"    ./geval --help\n",
"\n",
"The number of points for this task depends on the accuracy achieved on the `test-A` set (the result is rounded up):\n",
"\n",
"    points = math.ceil(accuracy * 7.0)\n",
"\n",
"⚠️ In Moodle, please attach the `test-A/out.tsv` file and a link to the repository with your solution.\n",
" "
]
}
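,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the scoring formula: any accuracy above 6/7 (about 0.857) already yields the maximum of 7 points, and each lower band of width 1/7 costs one point. A short sketch of the rule (the function name `points` is only for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"\n",
"def points(accuracy):\n",
"    # Scoring rule from the task description: accuracy * 7, rounded up\n",
"    return math.ceil(accuracy * 7.0)\n",
"\n",
"\n",
"for acc in (0.5, 0.8, 0.9, 1.0):\n",
"    print(acc, points(acc))\n",
"# 0.5 4\n",
"# 0.8 6\n",
"# 0.9 7\n",
"# 1.0 7"
]
}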
],
"metadata": {
"author": "Jakub Pokrywka",
"email": "kubapok@wmi.amu.edu.pl",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"lang": "pl",
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"subtitle": "11.NER RNN[exercises]",
"title": "Information Extraction",
"year": "2021"
},
"nbformat": 4,
"nbformat_minor": 4
}