Package installation
!pip install transformers datasets torch sentencepiece
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: transformers in /usr/local/lib/python3.8/dist-packages (4.26.1)
Requirement already satisfied: datasets in /usr/local/lib/python3.8/dist-packages (2.9.0)
Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (1.13.1+cu116)
Requirement already satisfied: sentencepiece in /usr/local/lib/python3.8/dist-packages (0.1.97)
...
Loading the dataset
from datasets import load_dataset
dataset = load_dataset("sms_spam")
Downloading and preparing dataset sms_spam/plain_text to /root/.cache/huggingface/datasets/sms_spam/plain_text/1.0.0/53f051d3b5f62d99d61792c91acefe4f1577ad3e4c216fb0ad39e30b9f20019c...
Dataset sms_spam downloaded and prepared to /root/.cache/huggingface/datasets/sms_spam/plain_text/1.0.0/53f051d3b5f62d99d61792c91acefe4f1577ad3e4c216fb0ad39e30b9f20019c. Subsequent calls will reuse this data.
dataset['train'][0]
{'sms': 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...\n', 'label': 0}
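As a quick sanity check it is worth looking at the class balance of the train split. The sms_spam dataset is documented to contain 4827 ham and 747 spam messages (roughly 13% spam), so the expected output below is an assumption about this split, not a recorded result.
from collections import Counter

label_counts = Counter(dataset['train']['label'])
print(label_counts)  # expected roughly Counter({0: 4827, 1: 747}); 0 = ham, 1 = spam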
Dataset preparation
parsed_dataset = []
for row in dataset['train']:
    # Strip the trailing newline the raw dataset keeps on every message.
    text = row['sms'].replace("\n", "")
    new_row = {}
    new_row['sms'] = text
    # Map the integer labels to the literal strings the model will be asked
    # to generate: 0 (ham) -> "False", 1 (spam) -> "True".
    if row['label'] == 0:
        new_row['label'] = "False"
    else:
        new_row['label'] = "True"
    parsed_dataset.append(new_row)
parsed_dataset[0]
{'sms': 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...', 'label': 'False'}
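The same preprocessing can also be expressed with the datasets library's map, which keeps the data in an Arrow-backed Dataset. This is only an equivalent sketch; the column name label_str is ours (overwriting the original ClassLabel column with strings would require re-declaring its features), and the notebook itself continues with the plain-list version above.
cleaned = dataset['train'].map(
    lambda row: {'sms': row['sms'].replace('\n', ''),
                 'label_str': 'True' if row['label'] == 1 else 'False'})
print(cleaned[0]['sms'], cleaned[0]['label_str'])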
FLAN-T5 tokenizer
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('google/flan-t5-base')
sms = parsed_dataset[0]['sms']
print('Original: ', sms)
print('Tokenized: ', tokenizer.tokenize(sms))
print('Token IDs: ', tokenizer.convert_tokens_to_ids(tokenizer.tokenize(sms)))
Original:  Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
Tokenized:  ['▁Go', '▁until', '▁jur', 'ong', '▁point', ',', '▁crazy', '.', '.', '▁Available', '▁only', '▁in', '▁bug', 'is', '▁', 'n', '▁great', '▁world', '▁la', '▁', 'e', '▁buffet', '...', '▁Cine', '▁there', '▁got', '▁', 'a', 'more', '▁wa', 't', '...']
Token IDs:  [1263, 552, 10081, 2444, 500, 6, 6139, 5, 5, 8144, 163, 16, 8143, 159, 3, 29, 248, 296, 50, 3, 15, 15385, 233, 17270, 132, 530, 3, 9, 3706, 8036, 17, 233]
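One detail worth knowing: calling the tokenizer directly, as the prediction loop below does, additionally appends T5's end-of-sequence token </s>, which the tokenize() + convert_tokens_to_ids() pair above does not. A minimal check:
print(tokenizer.eos_token, tokenizer.eos_token_id)  # </s> 1
print(tokenizer(sms).input_ids[-1])                 # 1, the appended </s>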
Few-shot learning
print(parsed_dataset[0]) #0
print(parsed_dataset[123]) #1
print(parsed_dataset[2000]) #0
print(parsed_dataset[3002]) #1
{'sms': 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...', 'label': 'False'}
{'sms': 'Todays Voda numbers ending 7548 are selected to receive a $350 award. If you have a match please call 08712300220 quoting claim code 4041 standard rates app', 'label': 'True'}
{'sms': "LMAO where's your fish memory when I need it?", 'label': 'False'}
{'sms': 'This message is free. Welcome to the new & improved Sex & Dogging club! To unsubscribe from this service reply STOP. msgs@150p 18+only', 'label': 'True'}
non_spam_1 = "Content of the text message: " + parsed_dataset[0]['sms'] + "\nAnswer if this text message is spam or not: False\n\n"
spam_1 = "Content of the text message: " + parsed_dataset[123]['sms'] + "\nAnswer if this text message is spam or not: True\n\n"
non_spam_2 = "Content of the text message: " + parsed_dataset[2000]['sms'] + "\nAnswer if this text message is spam or not: False\n\n"
spam_2 = "Content of the text message: " + parsed_dataset[3002]['sms'] + "\nAnswer if this text message is spam or not: True\n\n"
few_shot_prefix = non_spam_1 + spam_1 + non_spam_2 + spam_2 + "Content of the text message: "
print(few_shot_prefix)
Content of the text message: Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
Answer if this text message is spam or not: False

Content of the text message: Todays Voda numbers ending 7548 are selected to receive a $350 award. If you have a match please call 08712300220 quoting claim code 4041 standard rates app
Answer if this text message is spam or not: True

Content of the text message: LMAO where's your fish memory when I need it?
Answer if this text message is spam or not: False

Content of the text message: This message is free. Welcome to the new & improved Sex & Dogging club! To unsubscribe from this service reply STOP. msgs@150p 18+only
Answer if this text message is spam or not: True

Content of the text message:
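Since the same prefix-plus-suffix wrapping is reused in the prediction loop below, it can be factored into a small helper. build_prompt is our name for this hypothetical wrapper, not something the notebook defines:
def build_prompt(sms):
    # Wrap a single message in the few-shot prefix built above.
    return few_shot_prefix + sms + "\nAnswer if this text message is spam or not: "

print(build_prompt(parsed_dataset[1]['sms']))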
Load FLAN-T5 model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base')
model.cuda()  # move the weights to the GPU; the returned module is echoed below
T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo): Linear(in_features=2048, out_features=768, bias=False)
              (dropout): Dropout(p=0.1, inplace=False)
              (act): NewGELUActivation()
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      (1)-(11): 11 more T5Block, identical to (0) except without relative_attention_bias
    )
    (final_layer_norm): T5LayerNorm()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (decoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0)-(11): 12 T5Block, each with T5LayerSelfAttention, T5LayerCrossAttention and T5LayerFF
                (block (0) additionally has relative_attention_bias: Embedding(32, 12))
    )
    (final_layer_norm): T5LayerNorm()
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (lm_head): Linear(in_features=768, out_features=32128, bias=False)
)
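Two optional checks after loading, sketched here rather than recorded from the run: counting parameters, which for flan-t5-base should come out at roughly 250M, and switching to eval mode so the Dropout layers listed above are disabled during inference.
n_params = sum(p.numel() for p in model.parameters())
print('%.0fM parameters' % (n_params / 1e6))  # roughly 250M for flan-t5-base
model.eval()  # inference only; disables dropout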
Helper functions
import torch

def calculate_accuracy(preds, target):
    # Fraction of predictions that exactly match the expected labels.
    results_ok = 0.0
    results_false = 0.0
    for idx, pred in enumerate(preds):
        if pred == target[idx]:
            results_ok += 1.0
        else:
            results_false += 1.0
    return results_ok / (results_ok + results_false)

if torch.cuda.is_available():
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
There are 1 GPU(s) available.
We will use the GPU: Tesla T4
Prediction
# Remove the four few-shot examples (indices 0, 123, 2000 and 3002) from the
# evaluation set so the model is not scored on messages it already saw in the prompt.
parsed_dataset = parsed_dataset[1:123] + parsed_dataset[124:2000] + parsed_dataset[2001:3002] + parsed_dataset[3003:]

predictions = []
expected = []
for row in parsed_dataset:
    input_text = few_shot_prefix + row['sms'] + "\nAnswer if this text message is spam or not: "
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
    generated_ids = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=400)
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
    predictions.append(generated_text)
    expected.append(row['label'])

acc = calculate_accuracy(predictions, expected)
print(acc)
0.503593244699964
print("Sample prediction: {}, expected: {}".format(predictions[101], expected[101]))
Sample prediction: False, expected: False
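An accuracy of about 0.50 is close to a coin flip and well below the roughly 0.87 that a constant "False" answer would score on this imbalanced data. Part of the noise comes from sampling (do_sample=True, temperature=0.9). A deterministic variant worth comparing, sketched below on the input_ids left over from the last loop iteration and not rerun here, uses greedy decoding and caps generation at a few tokens, which is plenty for a True/False answer:
generated_ids = model.generate(input_ids, do_sample=False, max_new_tokens=5)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))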
MCC Score
from sklearn.metrics import matthews_corrcoef
print('Calculating Matthews Corr. Coef. on all predictions...')
matthews = matthews_corrcoef(expected, predictions)
print('Total MCC: %.3f' % matthews)
Calculating Matthews Corr. Coef. on all predictions...
Total MCC: 0.010
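An MCC of 0.010 means the predictions are essentially uncorrelated with the true labels. A confusion matrix makes the failure mode visible, e.g. whether the model collapses onto a single answer. A sketch; with labels= given, any generated strings other than "False"/"True" are simply left out of the matrix:
from sklearn.metrics import confusion_matrix

print(confusion_matrix(expected, predictions, labels=['False', 'True']))
# rows: expected False/True; columns: predicted False/True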
Save model
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)
output_dir = '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/Ver_1_FLAN-T5_Model'
print("Saving model to %s" % output_dir)
model_to_save = model.module if hasattr(model, 'module') else model
model_to_save.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
Mounted at /content/gdrive/
Saving model to /content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/Ver_1_FLAN-T5_Model
('/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/Ver_1_FLAN-T5_Model/tokenizer_config.json', '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/Ver_1_FLAN-T5_Model/special_tokens_map.json', '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/Ver_1_FLAN-T5_Model/spiece.model', '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/Ver_1_FLAN-T5_Model/added_tokens.json', '/content/gdrive/My Drive/UAM/Przetwarzanie-tekstu/Ver_1_FLAN-T5_Model/tokenizer.json')
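The saved checkpoint can be loaded back like any local Hugging Face directory; a sketch for a later session, assuming output_dir points at the same path as above:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(output_dir)
tokenizer = AutoTokenizer.from_pretrained(output_dir)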