Package installation
!pip install transformers datasets torch sentencepiece
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Successfully installed datasets-2.9.0 huggingface-hub-0.12.0 multiprocess-0.70.14 responses-0.18.0 sentencepiece-0.1.97 tokenizers-0.13.2 transformers-4.26.1 urllib3-1.26.14 xxhash-3.2.0
Loading the dataset
from datasets import load_dataset
dataset = load_dataset("sms_spam")
Downloading and preparing dataset sms_spam/plain_text to /root/.cache/huggingface/datasets/sms_spam/plain_text/1.0.0/53f051d3b5f62d99d61792c91acefe4f1577ad3e4c216fb0ad39e30b9f20019c...
Dataset sms_spam downloaded and prepared to /root/.cache/huggingface/datasets/sms_spam/plain_text/1.0.0/53f051d3b5f62d99d61792c91acefe4f1577ad3e4c216fb0ad39e30b9f20019c. Subsequent calls will reuse this data.
dataset['train'][0]
{'sms': 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...\n', 'label': 0}
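Before reshaping the examples it is worth checking the class balance, since the SMS Spam Collection is heavily skewed toward ham (label 0). A quick sanity check, not part of the original run:

from collections import Counter

# Count messages per class (0 = ham, 1 = spam); the corpus is strongly imbalanced.
print(Counter(row['label'] for row in dataset['train']))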
Dataset modification for classification
parsed_dataset = []
for row in dataset['train']:
    # Prepend the task prefix expected by T5 and strip trailing newlines.
    text = "binary classification: " + row['sms'].replace("\n", "")
    new_row = {}
    new_row['sms'] = text
    # T5 generates text, so the targets become the strings "0" (ham) and "1" (spam).
    if row['label'] == 0:
        new_row['label'] = "0"
    else:
        new_row['label'] = "1"
    parsed_dataset.append(new_row)
parsed_dataset[0]
{'sms': 'binary classification: Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...', 'label': '0'}
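The same preprocessing could also be expressed with the datasets library's own map, which keeps the data inside a Dataset object. An alternative sketch only; the rest of the notebook works with the plain Python list built above:

# Hypothetical alternative using datasets.map instead of a Python loop.
def to_t5_example(row):
    return {'sms': "binary classification: " + row['sms'].replace("\n", ""),
            'label': str(row['label'])}

mapped_dataset = dataset['train'].map(to_t5_example)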
Tokenizer T5
from transformers import T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained('t5-base')
/usr/local/lib/python3.8/dist-packages/transformers/models/t5/tokenization_t5.py:163: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5. For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`. - Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding. - If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding. - To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value. warnings.warn(
sms = parsed_dataset[0]['sms']
print('Original: ', sms)
print('Tokenized: ', tokenizer.tokenize(sms))
print('Token IDs: ', tokenizer.convert_tokens_to_ids(tokenizer.tokenize(sms)))
Original:  binary classification: Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
Tokenized:  ['▁binary', '▁classification', ':', '▁Go', '▁until', '▁jur', 'ong', '▁point', ',', '▁crazy', '.', '.', '▁Available', '▁only', '▁in', '▁bug', 'is', '▁', 'n', '▁great', '▁world', '▁la', '▁', 'e', '▁buffet', '...', '▁Cine', '▁there', '▁got', '▁', 'a', 'more', '▁wa', 't', '...']
Token IDs:  [14865, 13774, 10, 1263, 552, 10081, 2444, 500, 6, 6139, 5, 5, 8144, 163, 16, 8143, 159, 3, 29, 248, 296, 50, 3, 15, 15385, 233, 17270, 132, 530, 3, 9, 3706, 8036, 17, 233]
Check the maximum sentence length
max_len = 0
for sentence in parsed_dataset:
    input_ids = tokenizer.encode(sentence['sms'], add_special_tokens=True)
    max_len = max(max_len, len(input_ids))
print('Max sentence length: ', max_len)
Max sentence length: 341
max_label_len = 0
for sentence in parsed_dataset:
    input_ids = tokenizer.encode(sentence['label'], add_special_tokens=True)
    max_label_len = max(max_label_len, len(input_ids))
print('Max label length: ', max_label_len)
Max label length:  3
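The labels encode to three tokens because SentencePiece splits a bare digit into a whitespace marker plus the digit, and encode() appends the </s> EOS token; this matches the label tensor printed further below:

print(tokenizer.tokenize("0"))  # ['▁', '0']
print(tokenizer.encode("0"))    # [3, 632, 1] -- '▁', '0', </s>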
Tokenize the dataset before training
import torch

input_ids = []
target_ids = []
attention_masks = []
for sentence in parsed_dataset:
    # Encode the prompt, padding/truncating to the maximum sentence length found above.
    encoded_dict = tokenizer.encode_plus(
        sentence['sms'],
        add_special_tokens = True,
        max_length = 341,
        padding = 'max_length',
        truncation = True,
        return_attention_mask = True,
        return_tensors = 'pt',
    )
    # Encode the target label, padded to the maximum label length.
    encoded_target_dict = tokenizer.encode_plus(
        sentence['label'],
        add_special_tokens = True,
        max_length = 3,
        padding = 'max_length',
        truncation = True,
        return_attention_mask = True,
        return_tensors = 'pt',
    )
    input_ids.append(encoded_dict['input_ids'])
    target_ids.append(encoded_target_dict['input_ids'])
    attention_masks.append(encoded_dict['attention_mask'])

input_ids = torch.cat(input_ids, dim=0)
target_ids = torch.cat(target_ids, dim=0)
attention_masks = torch.cat(attention_masks, dim=0)
print('Original: ', parsed_dataset[0])
print('Token IDs:', input_ids[0])
print('Label token IDs:', target_ids[0])
Original:  {'sms': 'binary classification: Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...', 'label': '0'}
Token IDs: tensor([14865, 13774, 10, 1263, 552, 10081, 2444, 500, 6, 6139, 5, 5, 8144, 163, 16, 8143, 159, 3, 29, 248, 296, 50, 3, 15, 15385, 233, 17270, 132, 530, 3, 9, 3706, 8036, 17, 233, 1, 0, 0, 0, ..., 0])  (trailing padding zeros trimmed)
Label token IDs: tensor([3, 632, 1])
Split dataset
from torch.utils.data import TensorDataset, random_split
dataset = TensorDataset(input_ids, attention_masks, target_ids)
test_size = 1000
dataset_len = len(dataset)
train_size = int(0.9 * (dataset_len-test_size))
val_size = (dataset_len-test_size) - train_size
test_dataset, train_dataset, val_dataset = random_split(dataset, [test_size, train_size, val_size])
print('{:>5,} test samples'.format(test_size))
print('{:>5,} training samples'.format(train_size))
print('{:>5,} validation samples'.format(val_size))
1,000 test samples
4,116 training samples
  458 validation samples
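Note that random_split draws a fresh random permutation on every run, so the split above is not reproducible across sessions. A sketch of a seeded alternative, not used in the original run:

# Reproducible alternative: fix the split with a seeded torch.Generator.
g = torch.Generator().manual_seed(42)
test_dataset, train_dataset, val_dataset = random_split(
    dataset, [test_size, train_size, val_size], generator=g)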
Create train and validation loaders
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
batch_size = 16
train_dataloader = DataLoader(
    train_dataset,
    sampler = RandomSampler(train_dataset),
    batch_size = batch_size
)
validation_dataloader = DataLoader(
    val_dataset,
    sampler = SequentialSampler(val_dataset),
    batch_size = batch_size
)
Device check
if torch.cuda.is_available():
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
There are 1 GPU(s) available.
We will use the GPU: Tesla T4
Load T5 model
from transformers import T5ForConditionalGeneration
model = T5ForConditionalGeneration.from_pretrained('t5-base')
model.cuda()
Downloading (…)"pytorch_model.bin";: 0%| | 0.00/892M [00:00<?, ?B/s]
Downloading (…)neration_config.json: 0%| | 0.00/147 [00:00<?, ?B/s]
T5ForConditionalGeneration(...)
(full module printout trimmed: a shared Embedding(32128, 768); an encoder and a decoder of 12 T5Blocks each, where every block combines self-attention (plus cross-attention in the decoder) with a 768→3072→768 ReLU feed-forward, all with T5LayerNorm and Dropout(p=0.1); and a final lm_head Linear(in_features=768, out_features=32128, bias=False))
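A quick way to confirm what was loaded is to count the parameters; t5-base has roughly 220 million. A sketch, not part of the original notebook:

# Total number of trainable parameters (~2.2e8 for t5-base).
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'{n_params:,} trainable parameters')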
Helper functions
import datetime
import numpy as np
def calculate_accuracy(preds, target):
    # Fraction of decoded predictions that exactly match the decoded targets.
    results_ok = 0.0
    results_false = 0.0
    for idx, pred in enumerate(preds):
        if pred == target[idx]:
            results_ok += 1.0
        else:
            results_false += 1.0
    return results_ok / (results_ok + results_false)

def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    elapsed_rounded = int(round((elapsed)))
    return str(datetime.timedelta(seconds=elapsed_rounded))
Initialize training
optimizer = torch.optim.AdamW(model.parameters(),
                              lr = 3e-4,
                              eps = 1e-8
                              )
epochs = 4
total_steps = len(train_dataloader) * epochs
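total_steps is computed here but never used by the loop below; the run_glue.py recipe this notebook cites pairs AdamW with a linear learning-rate schedule. A sketch of how that could be added, not part of the original run:

from transformers import get_linear_schedule_with_warmup

# Optional: linear decay with no warmup, stepped once per batch
# (call scheduler.step() right after optimizer.step() in the training loop).
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=total_steps)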
Training
import random
import time

# This training code is based on the `run_glue.py` script here:
# https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L128
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

training_stats = []
total_t0 = time.time()

for epoch_i in range(0, epochs):
    # ========================================
    #               Training
    # ========================================
    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')
    t0 = time.time()
    total_train_loss = 0
    total_train_acc = 0
    model.train()

    for step, batch in enumerate(train_dataloader):
        if step % 40 == 0 and not step == 0:
            elapsed = format_time(time.time() - t0)
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        y = batch[2].to(device)

        # Teacher forcing: feed the target shifted left as decoder input and
        # mask padded label positions with -100 so the loss ignores them.
        y_ids = y[:, :-1].contiguous()
        lm_labels = y[:, 1:].clone().detach()
        lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100

        outputs = model(
            input_ids=b_input_ids,
            attention_mask=b_input_mask,
            decoder_input_ids=y_ids,
            labels=lm_labels
        )

        # Decode free-running generations to track training accuracy
        # (running generate() on every step slows training down noticeably).
        generated_ids = model.generate(
            input_ids = b_input_ids,
            attention_mask = b_input_mask,
            max_length=3,
            num_beams=2,
            repetition_penalty=2.5,
            length_penalty=1.0,
            early_stopping=True
        )
        preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
        target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True) for t in y]
        total_train_acc += calculate_accuracy(preds, target)

        loss = outputs[0]
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_train_loss += loss.item()

    avg_train_loss = total_train_loss / len(train_dataloader)
    avg_train_acc = total_train_acc / len(train_dataloader)
    training_time = format_time(time.time() - t0)
    print("")
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    print("  Average training acc: {0:.2f}".format(avg_train_acc))
    print("  Training epoch took: {:}".format(training_time))

    # ========================================
    #               Validation
    # ========================================
    print("")
    print("Running Validation...")
    t0 = time.time()
    model.eval()
    total_eval_loss = 0
    total_eval_accuracy = 0

    for batch in validation_dataloader:
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        y = batch[2].to(device)
        y_ids = y[:, :-1].contiguous()
        lm_labels = y[:, 1:].clone().detach()
        lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100

        with torch.no_grad():
            outputs = model(
                input_ids=b_input_ids,
                attention_mask=b_input_mask,
                decoder_input_ids=y_ids,
                labels=lm_labels
            )
        loss = outputs[0]
        total_eval_loss += loss.item()
        generated_ids = model.generate(
            input_ids = b_input_ids,
            attention_mask = b_input_mask,
            max_length=3,
            num_beams=2,
            repetition_penalty=2.5,
            length_penalty=1.0,
            early_stopping=True
        )
        preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
        target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True) for t in y]
        total_eval_accuracy += calculate_accuracy(preds, target)

    avg_val_loss = total_eval_loss / len(validation_dataloader)
    avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
    print("  Accuracy: {0:.2f}".format(avg_val_accuracy))
    validation_time = format_time(time.time() - t0)
    print("  Validation took: {:}".format(validation_time))
    print("  Validation Loss: {0:.2f}".format(avg_val_loss))

    training_stats.append(
        {
            'epoch': epoch_i + 1,
            'Training Loss': avg_train_loss,
            'Training Accur.': avg_train_acc,
            'Valid. Loss': avg_val_loss,
            'Valid. Accur.': avg_val_accuracy,
            'Training Time': training_time,
            'Validation Time': validation_time
        }
    )

print("")
print("Training complete!")
print("Total training took {:} (h:mm:ss)".format(format_time(time.time()-total_t0)))
======== Epoch 1 / 4 ========
Training...
  Batch    40  of    258.    Elapsed: 0:01:12.
  Batch    80  of    258.    Elapsed: 0:02:20.
  Batch   120  of    258.    Elapsed: 0:03:28.
  Batch   160  of    258.    Elapsed: 0:04:36.
  Batch   200  of    258.    Elapsed: 0:05:45.
  Batch   240  of    258.    Elapsed: 0:06:53.

  Average training loss: 0.09
  Average training acc: 0.81
  Training epoch took: 0:07:23

Running Validation...
  Accuracy: 0.83
  Validation took: 0:00:27
  Validation Loss: 0.00

======== Epoch 2 / 4 ========
Training...
  Batch    40  of    258.    Elapsed: 0:01:09.
  Batch    80  of    258.    Elapsed: 0:02:17.
  Batch   120  of    258.    Elapsed: 0:03:25.
  Batch   160  of    258.    Elapsed: 0:04:33.
  Batch   200  of    258.    Elapsed: 0:05:42.
  Batch   240  of    258.    Elapsed: 0:06:50.

  Average training loss: 0.00
  Average training acc: 0.86
  Training epoch took: 0:07:19

Running Validation...
  Accuracy: 0.83
  Validation took: 0:00:26
  Validation Loss: 0.00

======== Epoch 3 / 4 ========
Training...
  Batch    40  of    258.    Elapsed: 0:01:08.
  Batch    80  of    258.    Elapsed: 0:02:16.
  Batch   120  of    258.    Elapsed: 0:03:24.
  Batch   160  of    258.    Elapsed: 0:04:32.
  Batch   200  of    258.    Elapsed: 0:05:41.
  Batch   240  of    258.    Elapsed: 0:06:49.

  Average training loss: 0.00
  Average training acc: 0.85
  Training epoch took: 0:07:18

Running Validation...
  Accuracy: 0.83
  Validation took: 0:00:26
  Validation Loss: 0.00

======== Epoch 4 / 4 ========
Training...
  Batch    40  of    258.    Elapsed: 0:01:08.
  Batch    80  of    258.    Elapsed: 0:02:16.
  Batch   120  of    258.    Elapsed: 0:03:24.
  Batch   160  of    258.    Elapsed: 0:04:32.
  Batch   200  of    258.    Elapsed: 0:05:41.
  Batch   240  of    258.    Elapsed: 0:06:49.

  Average training loss: 0.00
  Average training acc: 0.86
  Training epoch took: 0:07:18

Running Validation...
  Accuracy: 0.83
  Validation took: 0:00:26
  Validation Loss: 0.00

Training complete!
Total training took 0:31:02 (h:mm:ss)
Training summary
import pandas as pd
pd.set_option('display.precision', 2)
df_stats = pd.DataFrame(data=training_stats)
df_stats = df_stats.set_index('epoch')
df_stats
epoch | Training Loss | Training Accur. | Valid. Loss | Valid. Accur. | Training Time | Validation Time
---|---|---|---|---|---|---
1 | 9.03e-02 | 0.81 | 9.89e-07 | 0.83 | 0:07:23 | 0:00:27
2 | 1.30e-05 | 0.86 | 2.26e-08 | 0.83 | 0:07:19 | 0:00:26
3 | 3.05e-06 | 0.85 | 0.00e+00 | 0.83 | 0:07:18 | 0:00:26
4 | 5.13e-06 | 0.86 | 0.00e+00 | 0.83 | 0:07:18 | 0:00:26
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(style='darkgrid')
sns.set(font_scale=1.5)
plt.rcParams["figure.figsize"] = (12,6)
plt.plot(df_stats['Training Loss'], 'b-o', label="Training")
plt.plot(df_stats['Valid. Loss'], 'g-o', label="Validation")
plt.title("Training & Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.xticks([1, 2, 3, 4])
plt.show()
Create test loader
prediction_dataloader = DataLoader(
    test_dataset,
    sampler = SequentialSampler(test_dataset),
    batch_size = batch_size
)
Evaluate on test dataset
print('Predicting labels for {:,} test sentences...'.format(len(test_dataset)))
model.eval()
predictions, true_labels = [], []
total_test_acc = 0
for batch in prediction_dataloader:
    b_input_ids = batch[0].to(device)
    b_input_mask = batch[1].to(device)
    y = batch[2].to(device)
    with torch.no_grad():
        generated_ids = model.generate(
            input_ids = b_input_ids,
            attention_mask = b_input_mask,
            max_length=3,
            num_beams=2,
            repetition_penalty=2.5,
            length_penalty=1.0,
            early_stopping=True
        )
    preds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]
    target = [tokenizer.decode(t, skip_special_tokens=True, clean_up_tokenization_spaces=True) for t in y]
    total_test_acc += calculate_accuracy(preds, target)
    predictions.append(preds)
    true_labels.append(target)
print('    DONE.')
Predicting labels for 1,000 test sentences...
    DONE.
avg_test_accuracy = total_test_acc / len(prediction_dataloader)
avg_test_accuracy
0.873015873015873
print("Sample prediction: {}, expected: {}".format(predictions[2][10], true_labels[2][10]))
Sample prediction: 0, expected: 0
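For classifying a single raw message, a small helper can wrap the same prefixing, tokenization, and generation steps. A sketch only; classify_sms is not part of the original notebook:

def classify_sms(text):
    # Hypothetical helper: apply the task prefix, encode, and beam-decode.
    prompt = "binary classification: " + text.replace("\n", "")
    enc = tokenizer(prompt, return_tensors='pt', truncation=True, max_length=341).to(device)
    out = model.generate(input_ids=enc['input_ids'],
                         attention_mask=enc['attention_mask'],
                         max_length=3, num_beams=2, early_stopping=True)
    return tokenizer.decode(out[0], skip_special_tokens=True)  # "0" = ham, "1" = spam

print(classify_sms("WINNER!! Claim your free prize now"))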
MCC Score
from sklearn.metrics import matthews_corrcoef

matthews_set = []
print('Calculating Matthews Corr. Coef. for each batch...')
for i in range(len(true_labels)):
    matthews = matthews_corrcoef(true_labels[i], predictions[i])
    matthews_set.append(matthews)
Calculating Matthews Corr. Coef. for each batch...
ax = sns.barplot(x=list(range(len(matthews_set))), y=matthews_set, ci=None)
plt.title('MCC Score per Batch')
plt.ylabel('MCC Score (-1 to +1)')
plt.xlabel('Batch #')
plt.show()
flat_predictions = np.concatenate(predictions, axis=0)
flat_true_labels = np.concatenate(true_labels, axis=0)
mcc = matthews_corrcoef(flat_true_labels, flat_predictions)
print('Total MCC: %.3f' % mcc)
Total MCC: 0.042
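An MCC this close to zero next to 87% accuracy suggests the model is mostly predicting the majority class, which accuracy alone hides on a ham-skewed corpus. A confusion matrix over the flattened arrays makes this visible (a diagnostic sketch, not in the original run):

from sklearn.metrics import confusion_matrix

# Rows: true labels, columns: predictions; labels ordered as ham ("0"), spam ("1").
print(confusion_matrix(flat_true_labels, flat_predictions, labels=["0", "1"]))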
Save model
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)
output_dir = '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/T5_Model'
print("Saving model to %s" % output_dir)
model_to_save = model.module if hasattr(model, 'module') else model
model_to_save.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
Mounted at /content/gdrive/
Saving model to /content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/T5_Model
('/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/T5_Model/tokenizer_config.json', '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/T5_Model/special_tokens_map.json', '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/T5_Model/spiece.model', '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/T5_Model/added_tokens.json')
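To reuse the fine-tuned checkpoint in a later session, it can be reloaded from the same directory. A sketch:

# Reload the fine-tuned model and tokenizer from Drive.
model = T5ForConditionalGeneration.from_pretrained(output_dir)
tokenizer = T5Tokenizer.from_pretrained(output_dir)
model.to(device)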