data_amount = 5000
!pip3 install transformers
Collecting transformers
  Downloading transformers-4.16.2-py3-none-any.whl (3.5 MB)
Collecting tokenizers!=0.11.3,>=0.10.1, sacremoses, huggingface-hub<1.0,>=0.1.0, pyyaml>=5.1 (remaining dependencies already satisfied)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers
Successfully installed huggingface-hub-0.4.0 pyyaml-6.0 sacremoses-0.0.47 tokenizers-0.11.5 transformers-4.16.2
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
import torch
from transformers import TrainingArguments, Trainer
from transformers import BertTokenizer, BertForSequenceClassification
from transformers import EarlyStoppingCallback
import matplotlib.pyplot as plt
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
data_path = 'drive/MyDrive/blogtext.csv'
data = pd.read_csv(data_path, error_bad_lines=False, engine='python')
data = data[:data_amount]
data.head()
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:2882: FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version.
Skipping line 16844: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead
(Several dozen further lines were skipped for NULL bytes or for fields larger than the 131072-byte field limit; the repeated warnings are omitted here.)
|   | id | gender | age | topic | sign | date | text |
|---|---|---|---|---|---|---|---|
| 0 | 2059027 | male | 15 | Student | Leo | 14,May,2004 | Info has been found (+/- 100 pages,... |
| 1 | 2059027 | male | 15 | Student | Leo | 13,May,2004 | These are the team members: Drewe... |
| 2 | 2059027 | male | 15 | Student | Leo | 12,May,2004 | In het kader van kernfusie op aarde... |
| 3 | 2059027 | male | 15 | Student | Leo | 12,May,2004 | testing!!! testing!!! |
| 4 | 3581210 | male | 33 | InvestmentBanking | Aquarius | 11,June,2004 | Thanks to Yahoo!'s Toolbar I can ... |
if torch.cuda.is_available():
    device = "cuda:0"
    torch.cuda.empty_cache()
else:
    device = "cpu"
Encoder model (BertForSequenceClassification)
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, problem_type="multi_label_classification", num_labels=4).to(device)
loading tokenizer and configuration files for bert-base-uncased from the Hugging Face cache (full BertConfig dump omitted)
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification (the 'cls.*' masked-LM and next-sentence heads) - this IS expected when initializing from the checkpoint of a model trained on another task.
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
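With problem_type="multi_label_classification", BertForSequenceClassification scores each of the 4 classes independently and trains with BCEWithLogitsLoss against float label vectors, which is why the labels below are built as one-hot lists of floats. A minimal sketch of the loss computed internally (per the transformers documentation):

# Sketch of the multi-label loss: sigmoid + binary cross-entropy per class.
logits = torch.randn(2, 4)                  # batch of 2 examples, 4 classes
targets = torch.tensor([[1., 0., 0., 0.],
                        [0., 0., 1., 0.]])  # float one-hot label vectors
loss = torch.nn.BCEWithLogitsLoss()(logits, targets)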
plt.figure(figsize=(10, 10), dpi=100)  # create the figure before plotting so the size takes effect
n, bins, patches = plt.hist(data['age'], 4, density=True, facecolor='b', alpha=0.75)
plt.title('Histogram of Age')
plt.grid(True)
plt.show()
[Figure: Histogram of Age]
"""
1 - 22 -> 1 klasa
23 - 31 -> 2 klasa
32 - 39 -> 3 klasa
40 - 48 -> 4 klasa
"""
def mapAgeToClass2(row: pd.Series) -> list:
    # One-hot encode the age bucket as a float vector (multi-label format).
    if row['age'] <= 22:
        return [1.0, 0.0, 0.0, 0.0]
    elif row['age'] <= 31:
        return [0.0, 1.0, 0.0, 0.0]
    elif row['age'] <= 39:
        return [0.0, 0.0, 1.0, 0.0]
    else:
        return [0.0, 0.0, 0.0, 1.0]
data['label'] = data.apply(lambda row: mapAgeToClass2(row), axis=1)
data.head()
|   | id | gender | age | topic | sign | date | text | label |
|---|---|---|---|---|---|---|---|---|
| 0 | 2059027 | male | 15 | Student | Leo | 14,May,2004 | Info has been found (+/- 100 pages,... | [1.0, 0.0, 0.0, 0.0] |
| 1 | 2059027 | male | 15 | Student | Leo | 13,May,2004 | These are the team members: Drewe... | [1.0, 0.0, 0.0, 0.0] |
| 2 | 2059027 | male | 15 | Student | Leo | 12,May,2004 | In het kader van kernfusie op aarde... | [1.0, 0.0, 0.0, 0.0] |
| 3 | 2059027 | male | 15 | Student | Leo | 12,May,2004 | testing!!! testing!!! | [1.0, 0.0, 0.0, 0.0] |
| 4 | 3581210 | male | 33 | InvestmentBanking | Aquarius | 11,June,2004 | Thanks to Yahoo!'s Toolbar I can ... | [0.0, 0.0, 1.0, 0.0] |
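As an aside, the same one-hot labels can be produced without a per-row apply; a minimal vectorized sketch, assuming the same right-closed bucket boundaries as mapAgeToClass2:

# pd.cut buckets ages into (0,22], (22,31], (31,39], (39,inf);
# np.eye turns each bucket index into a one-hot float row.
buckets = pd.cut(data['age'], bins=[0, 22, 31, 39, np.inf], labels=False)
data['label'] = list(np.eye(4)[buckets])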
X = list(data['text'])
Y = list(data['label'])
X_train, X_val, y_train, y_val = train_test_split(X, Y, test_size=0.2)
X_train_tokenized = tokenizer(X_train, padding=True, truncation=True, max_length=512)
X_val_tokenized = tokenizer(X_val, padding=True, truncation=True, max_length=512)
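For reference, the tokenizer returns a dict of parallel lists, one entry per input text; a quick sketch of the fields the Dataset below indexes into:

# Inspect what the BERT tokenizer produced for a single string.
example = tokenizer("testing!!! testing!!!", truncation=True, max_length=512)
print(example.keys())  # dict_keys(['input_ids', 'token_type_ids', 'attention_mask'])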
class Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # Wrap each tokenizer field (input_ids, attention_mask, ...) in a tensor.
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels:
            item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])
train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_val)
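A quick sanity check on what each dataset item contains (sketch; these are the keys the Trainer will feed to the model):

# Each item is a dict of 1-D tensors; 'labels' holds the float one-hot vector.
item = train_dataset[0]
print({key: tensor.shape for key, tensor in item.items()})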
def compute_metrics(p):
    pred, labels = p
    # Collapse logits and one-hot labels to class indices.
    pred = np.argmax(pred, axis=1)
    labels = np.argmax(labels, axis=1)
    accuracy = accuracy_score(y_true=labels, y_pred=pred)
    recall = recall_score(y_true=labels, y_pred=pred, average='micro')
    precision = precision_score(y_true=labels, y_pred=pred, average='micro')
    f1 = f1_score(y_true=labels, y_pred=pred, average='micro')
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",
    eval_steps=100,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    seed=0,
    load_best_model_at_end=True,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
PyTorch: setting up devices
trainer.train()
/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead.
***** Running training *****
Num examples = 4000, Num Epochs = 3, Instantaneous batch size per device = 8, Total train batch size (w. parallel, distributed & accumulation) = 8, Gradient Accumulation steps = 1, Total optimization steps = 1500
[1100/1500 12:04 < 04:23, 1.52 it/s, Epoch 2/3]
| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 100 | No log | 0.308495 | 0.721000 | 0.721000 | 0.721000 | 0.721000 |
| 200 | No log | 0.267907 | 0.793000 | 0.793000 | 0.793000 | 0.793000 |
| 300 | No log | 0.246032 | 0.786000 | 0.786000 | 0.786000 | 0.786000 |
| 400 | No log | 0.235976 | 0.796000 | 0.796000 | 0.796000 | 0.796000 |
| 500 | 0.297000 | 0.217070 | 0.830000 | 0.830000 | 0.830000 | 0.830000 |
| 600 | 0.297000 | 0.232244 | 0.828000 | 0.828000 | 0.828000 | 0.828000 |
| 700 | 0.297000 | 0.198891 | 0.853000 | 0.853000 | 0.853000 | 0.853000 |
| 800 | 0.297000 | 0.202887 | 0.851000 | 0.851000 | 0.851000 | 0.851000 |
| 900 | 0.297000 | 0.228751 | 0.847000 | 0.847000 | 0.847000 | 0.847000 |
| 1000 | 0.153700 | 0.221675 | 0.850000 | 0.850000 | 0.850000 | 0.850000 |
| 1100 | 0.153700 | 0.218299 | 0.866000 | 0.866000 | 0.866000 | 0.866000 |
***** Running Evaluation ***** Num examples = 1000, Batch size = 8 (repeated at each 100-step evaluation)
Saving model checkpoint to output/checkpoint-500 (config.json, pytorch_model.bin)
Saving model checkpoint to output/checkpoint-1000 (config.json, pytorch_model.bin)
Training completed. Do not forget to share your model on huggingface.co/models =)
Loading best model from output/checkpoint-500 (score: 0.21706973016262054).
TrainOutput(global_step=1100, training_loss=0.212534950429743, metrics={'train_runtime': 724.5874, 'train_samples_per_second': 16.561, 'train_steps_per_second': 2.07, 'total_flos': 2315418864844800.0, 'train_loss': 0.212534950429743, 'epoch': 2.2})
result = trainer.predict(val_dataset)
***** Running Prediction ***** Num examples = 1000 Batch size = 8
[125/125 00:19]
print(result.metrics)
{'test_loss': 0.21706973016262054, 'test_accuracy': 0.83, 'test_precision': 0.83, 'test_recall': 0.83, 'test_f1': 0.83, 'test_runtime': 19.3166, 'test_samples_per_second': 51.769, 'test_steps_per_second': 6.471}
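To turn the raw logits in result.predictions into predicted class indices (a sketch using the PredictionOutput returned above):

# One row of four logits per validation example.
pred_classes = np.argmax(result.predictions, axis=1)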
# save_model writes a checkpoint directory (config.json + pytorch_model.bin),
# so the .pkl suffix here is only a name, not a pickle file.
filename = 'model_encoder.pkl'
trainer.save_model(filename)
Saving model checkpoint to model_encoder.pkl Configuration saved in model_encoder.pkl/config.json Model weights saved in model_encoder.pkl/pytorch_model.bin
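Since save_model wrote an ordinary checkpoint directory, the fine-tuned model can be reloaded later with from_pretrained; a minimal sketch:

# Reload the fine-tuned encoder from the saved checkpoint directory.
reloaded = BertForSequenceClassification.from_pretrained('model_encoder.pkl')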
Decoder model
!pip install transformers
Requirement already satisfied: transformers in /usr/local/lib/python3.7/dist-packages (4.16.2) (all dependencies already satisfied)
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
import torch
from transformers import TrainingArguments, Trainer
from transformers import EarlyStoppingCallback
import matplotlib.pyplot as plt
from transformers import LongformerTokenizer, LongformerForSequenceClassification, LongformerConfig
model_name = "allenai/longformer-scico"
config = LongformerConfig(attention_window=32)
config.attention_window=32
tokenizer = LongformerTokenizer.from_pretrained(model_name)
model = LongformerForSequenceClassification(config).from_pretrained(model_name, problem_type="multi_label_classification")
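A quick way to verify the attention_window override before training (sketch):

# Inspect the effective attention window on the loaded config;
# expect 32 if the override was applied (the original run used the default 512).
print(model.config.attention_window)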
"""
1 - 22 -> 1 klasa
23 - 31 -> 2 klasa
32 - 39 -> 3 klasa
40 - 48 -> 4 klasa
"""
def mapAgeToClass2(row: pd.Series) -> list:
    # One-hot encode the age bucket as a float vector (multi-label format).
    if row['age'] <= 22:
        return [1.0, 0.0, 0.0, 0.0]
    elif row['age'] <= 31:
        return [0.0, 1.0, 0.0, 0.0]
    elif row['age'] <= 39:
        return [0.0, 0.0, 1.0, 0.0]
    else:
        return [0.0, 0.0, 0.0, 1.0]
data_path = 'drive/MyDrive/blogtext.csv'
data = pd.read_csv(data_path, error_bad_lines=False, engine='python')
data = data[:data_amount]
data['label'] = data.apply(lambda row: mapAgeToClass2(row), axis=1)
X = list(data['text'])
Y = list(data['label'])
if torch.cuda.is_available():
    device = "cuda:0"
    torch.cuda.empty_cache()
else:
    device = "cpu"
X_train, X_val, y_train, y_val = train_test_split(X, Y, test_size=0.2)
X_train_tokenized = tokenizer(X_train, padding=True, truncation=True, max_length=128)
X_val_tokenized = tokenizer(X_val, padding=True, truncation=True, max_length=128)
class Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # Wrap each tokenizer field (input_ids, attention_mask, ...) in a tensor.
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels:
            item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])
train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_val)
def compute_metrics(p):
    pred, labels = p
    # Collapse logits and one-hot labels to class indices.
    pred = np.argmax(pred, axis=1)
    labels = np.argmax(labels, axis=1)
    accuracy = accuracy_score(y_true=labels, y_pred=pred)
    recall = recall_score(y_true=labels, y_pred=pred, average='micro')
    precision = precision_score(y_true=labels, y_pred=pred, average='micro')
    f1 = f1_score(y_true=labels, y_pred=pred, average='micro')
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
loading tokenizer and configuration files for allenai/longformer-scico from the Hugging Face cache (full LongformerConfig dump omitted: attention_window = [512] x 12 layers, max_position_embeddings = 4098, id2label = {0: "not related", 1: "coref", 2: "parent", 3: "child"}, problem_type = "multi_label_classification")
Adding <m> to the vocabulary
Adding </m> to the vocabulary
loading weights file https://huggingface.co/allenai/longformer-scico/resolve/main/pytorch_model.bin from cache
All model checkpoint weights were used when initializing LongformerForSequenceClassification. All the weights of LongformerForSequenceClassification were initialized from the model checkpoint at allenai/longformer-scico. If your task is similar to the task the model of the checkpoint was trained on, you can already use LongformerForSequenceClassification for predictions without further training.
(The same FutureWarning for error_bad_lines and the same 'Skipping line ...' warnings as above were emitted while re-reading blogtext.csv.)
args = TrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",
    eval_steps=100,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    seed=0,
    load_best_model_at_end=True,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
PyTorch: setting up devices
trainer.train()
/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead.
***** Running training *****
Num examples = 4000, Num Epochs = 3, Instantaneous batch size per device = 8, Total train batch size (w. parallel, distributed & accumulation) = 8, Gradient Accumulation steps = 1, Total optimization steps = 1500
Initializing global attention on CLS token...
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512
[1500/1500 35:53, Epoch 3/3]
| Step | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 100 | No log | 0.407021 | 0.625000 | 0.625000 | 0.625000 | 0.625000 |
| 200 | No log | 0.333797 | 0.690000 | 0.690000 | 0.690000 | 0.690000 |
| 300 | No log | 0.403388 | 0.644000 | 0.644000 | 0.644000 | 0.644000 |
| 400 | No log | 0.296055 | 0.747000 | 0.747000 | 0.747000 | 0.747000 |
| 500 | 0.370100 | 0.318152 | 0.713000 | 0.713000 | 0.713000 | 0.713000 |
| 600 | 0.370100 | 0.301799 | 0.740000 | 0.740000 | 0.740000 | 0.740000 |
| 700 | 0.370100 | 0.295635 | 0.715000 | 0.715000 | 0.715000 | 0.715000 |
| 800 | 0.370100 | 0.268345 | 0.765000 | 0.765000 | 0.765000 | 0.765000 |
| 900 | 0.370100 | 0.282199 | 0.753000 | 0.753000 | 0.753000 | 0.753000 |
| 1000 | 0.294600 | 0.265310 | 0.788000 | 0.788000 | 0.788000 | 0.788000 |
| 1100 | 0.294600 | 0.268466 | 0.789000 | 0.789000 | 0.789000 | 0.789000 |
| 1200 | 0.294600 | 0.245028 | 0.804000 | 0.804000 | 0.804000 | 0.804000 |
| 1300 | 0.294600 | 0.260589 | 0.808000 | 0.808000 | 0.808000 | 0.808000 |
| 1400 | 0.294600 | 0.247587 | 0.807000 | 0.807000 | 0.807000 | 0.807000 |
| 1500 | 0.213700 | 0.242638 | 0.824000 | 0.824000 | 0.824000 | 0.824000 |
(Streamed output truncated to the last 5000 lines.)
Initializing global attention on CLS token...
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512
(The two messages above repeat for every training and evaluation batch; the repetitions are omitted here, along with the interleaved '***** Running Evaluation ***** Num examples = 1000, Batch size = 8' blocks.)
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Saving model checkpoint to output/checkpoint-500 Configuration saved in output/checkpoint-500/config.json Model weights saved in output/checkpoint-500/pytorch_model.bin Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 ***** Running Evaluation ***** Num examples = 1000 Batch size = 8 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512
Initializing global attention on CLS token...
[the two messages above repeat once per batch; duplicate lines truncated]
***** Running Evaluation ***** Num examples = 1000 Batch size = 8
[padding / global-attention messages repeat for each evaluation batch; duplicates truncated]
***** Running Evaluation ***** Num examples = 1000 Batch size = 8
[padding / global-attention messages repeat for each evaluation batch; duplicates truncated]
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 ***** Running Evaluation ***** Num examples = 1000 Batch size = 8 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 ***** Running Evaluation ***** Num examples = 1000 Batch size = 8 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512
Initializing global attention on CLS token...
(the two messages above repeat for every training and evaluation batch; the repetitions are omitted below)
Saving model checkpoint to output/checkpoint-1000
Configuration saved in output/checkpoint-1000/config.json
Model weights saved in output/checkpoint-1000/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 ***** Running Evaluation ***** Num examples = 1000 Batch size = 8 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512
Initializing global attention on CLS token...
[the two messages above repeat once per batch throughout training and evaluation; repetitions omitted]
***** Running Evaluation *****
  Num examples = 1000
  Batch size = 8
[a second, identical evaluation pass omitted]
Saving model checkpoint to output/checkpoint-1500
Configuration saved in output/checkpoint-1500/config.json
Model weights saved in output/checkpoint-1500/pytorch_model.bin
Training completed. Do not forget to share your model on huggingface.co/models =)
Loading best model from output/checkpoint-1500 (score: 0.24263811111450195).
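The repeated "Input ids are automatically padded..." and "Initializing global attention on CLS token..." lines are informational messages from the Longformer model: the inputs were tokenized to length 128, which is not a multiple of config.attention_window (512), so the model re-pads every batch at runtime. Two hedged ways to keep the log readable; the checkpoint name below is an assumption, since this section never names it:

from transformers.utils import logging as hf_logging
from transformers import LongformerTokenizerFast

# Option 1: lower the library log level so per-batch INFO messages disappear.
hf_logging.set_verbosity_error()

# Option 2: remove the cause by padding to a multiple of attention_window at
# tokenization time, so the model has nothing left to re-pad (note this makes
# every sequence 512 tokens long, at a corresponding memory/compute cost).
tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")  # assumed checkpoint
encodings = tokenizer(
    ["an example document"],
    truncation=True,
    padding="max_length",
    max_length=512,  # multiple of config.attention_window
)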
TrainOutput(global_step=1500, training_loss=0.29278671264648437, metrics={'train_runtime': 2154.3722, 'train_samples_per_second': 5.57, 'train_steps_per_second': 0.696, 'total_flos': 985291591680000.0, 'train_loss': 0.29278671264648437, 'epoch': 3.0})
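trainer.train() returns the TrainOutput summary above, and because the run reloads the best model at the end, the weights in memory are already those of output/checkpoint-1500. In a fresh session the same checkpoint can be restored from disk; a minimal sketch, assuming the Auto class resolves to the Longformer classifier used for training:

from transformers import AutoModelForSequenceClassification

best_model = AutoModelForSequenceClassification.from_pretrained("output/checkpoint-1500")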
result = trainer.predict(val_dataset)
***** Running Prediction *****
  Num examples = 1000
  Batch size = 8
[125/125 00:37]
[per-batch padding / global-attention messages omitted]
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... 
Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512 Initializing global attention on CLS token... Input ids are automatically padded from 128 to 512 to be a multiple of `config.attention_window`: 512
print(result.metrics)
{'test_loss': 0.24263811111450195, 'test_accuracy': 0.824, 'test_precision': 0.824, 'test_recall': 0.824, 'test_f1': 0.824, 'test_runtime': 38.0024, 'test_samples_per_second': 26.314, 'test_steps_per_second': 3.289}
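Note that test_accuracy, test_precision, test_recall and test_f1 all come out identical (0.824); this is exactly what micro-averaging yields for single-label multiclass predictions, where every false positive is simultaneously some other class's false negative. A minimal illustration (not the notebook's own metric code; it reuses the sklearn.metrics functions imported at the top):
# Illustrative only: with average='micro', precision, recall and F1
# all collapse to plain accuracy on single-label multiclass data.
y_true = [0, 1, 2, 2]
y_pred = [0, 1, 1, 2]
print(accuracy_score(y_true, y_pred))                    # 0.75
print(precision_score(y_true, y_pred, average='micro'))  # 0.75
print(recall_score(y_true, y_pred, average='micro'))     # 0.75
print(f1_score(y_true, y_pred, average='micro'))         # 0.75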
filename='model_decoder'
trainer.save_model(filename)
Saving model checkpoint to model_decoder Configuration saved in model_decoder/config.json Model weights saved in model_decoder/pytorch_model.bin
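Since trainer.save_model only writes config.json and pytorch_model.bin, reloading the classifier later might look like this (a sketch, assuming the saved config describes a sequence-classification architecture, which the Auto class reads from the checkpoint):
# Sketch (assumption): restore the saved classifier for inference.
from transformers import AutoModelForSequenceClassification
reloaded_model = AutoModelForSequenceClassification.from_pretrained('model_decoder')
reloaded_model.eval()  # disable dropout for prediction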
Encoder-decoder model
!pip install sentencepiece==0.1.91
!pip install transformers
Requirement already satisfied: sentencepiece==0.1.91 in /usr/local/lib/python3.7/dist-packages (0.1.91) Requirement already satisfied: transformers in /usr/local/lib/python3.7/dist-packages (4.16.2) (the remaining 'Requirement already satisfied' lines for the transformers dependencies are omitted)
from google.colab import drive
drive.mount('/content/drive')
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score
import torch
from transformers import TrainingArguments, Trainer
from transformers import EarlyStoppingCallback
import matplotlib.pyplot as plt
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers import EvalPrediction
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)
"""
1 - 22 -> 1 klasa
23 - 31 -> 2 klasa
32 - 39 -> 3 klasa
40 - 48 -> 4 klasa
"""
def mapAgeToClass2(row: pd.Series) -> str:
    # Bucket the author's age into one of four class-name strings.
    if row['age'] <= 22:
        return 'class1'
    elif row['age'] <= 31:
        return 'class2'
    elif row['age'] <= 39:
        return 'class3'
    else:
        return 'class4'
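A quick spot-check of the boundary ages (illustrative, not part of the original run):
# Hypothetical sanity check of the class boundaries.
boundary_ages = pd.DataFrame({'age': [22, 23, 31, 32, 39, 40]})
print(boundary_ages.apply(mapAgeToClass2, axis=1).tolist())
# ['class1', 'class2', 'class2', 'class3', 'class3', 'class4']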
data_path = 'drive/MyDrive/blogtext.csv'
# error_bad_lines=False makes the python engine skip malformed rows
# (deprecated in newer pandas in favour of on_bad_lines='skip';
# hence the FutureWarning in the output below).
data = pd.read_csv(data_path, error_bad_lines=False, engine='python')
data = data[:data_amount]
data['label'] = data.apply(mapAgeToClass2, axis=1)
X = list(data['text'])
Y = list(data['label'])
if torch.cuda.is_available():
    device = "cuda:0"
    torch.cuda.empty_cache()
X_train, X_val, y_train, y_val = train_test_split(X, Y, test_size=0.2)
X_train_tokenized = tokenizer(X_train, padding=True, truncation=True, max_length=1024)
X_val_tokenized = tokenizer(X_val, padding=True, truncation=True, max_length=1024)
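max_length=1024 here exceeds the n_positions=512 listed in the T5 config, but T5 uses relative position embeddings, so longer inputs are still encoded (possibly with degraded quality far beyond the pre-training length). A quick check of the resulting lengths (illustrative, not in the original notebook):
# Illustrative: with padding=True and truncation at max_length=1024,
# every encoded sequence is padded to the longest (capped) text.
lengths = [len(ids) for ids in X_train_tokenized['input_ids']]
print(min(lengths), max(lengths))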
class Dataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # Inputs come from the tokenizer; the target is the tokenized
        # class-name string, as expected by a seq2seq model.
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels:
            item["labels"] = torch.tensor(tokenizer(self.labels[idx])['input_ids'])
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])
train_dataset = Dataset(X_train_tokenized, y_train)
val_dataset = Dataset(X_val_tokenized, y_val)
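To confirm the dataset yields what Seq2SeqTrainer expects, inspecting one item helps (illustrative, not in the original notebook):
# Illustrative check: each item is a dict of tensors, and 'labels'
# holds the token ids of the class-name string.
item = train_dataset[0]
print({key: tensor.shape for key, tensor in item.items()})
print(tokenizer.decode(item['labels']))  # e.g. 'class2</s>'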
def compute_metrics(pred):
    # pred.predictions holds generated token ids (predict_with_generate=True
    # below), so both predictions and labels decode back to class strings.
    labels_ids = pred.label_ids
    pred_ids = pred.predictions
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)
    accuracy = sum(int(p == l) for p, l in zip(pred_str, label_str)) / len(pred_str)
    return {"accuracy": accuracy}
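The exact string match works because the class labels survive the tokenize/decode round trip unchanged. A small demo (illustrative):
# Illustrative: class-name strings round-trip through the tokenizer.
ids = tokenizer(['class1', 'class3'])['input_ids']
print(tokenizer.batch_decode(ids, skip_special_tokens=True))  # ['class1', 'class3']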
loading file https://huggingface.co/t5-small/resolve/main/spiece.model from cache (tokenizer and configuration files likewise loaded from cache). Model config T5Config for t5-small: d_model=512, d_ff=2048, 6 encoder and 6 decoder layers, 8 heads, relative_attention_num_buckets=32, vocab_size=32128, is_encoder_decoder=true (the full config dump, printed twice in the original output, is omitted). loading weights file https://huggingface.co/t5-small/resolve/main/pytorch_model.bin from cache. All model checkpoint weights were used when initializing T5ForConditionalGeneration. All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at t5-small. FutureWarning: The error_bad_lines argument has been deprecated and will be removed in a future version. Dozens of CSV-reader warnings of the form 'Skipping line N: NULL byte detected' and 'Skipping line N: field larger than field limit (131072)' are omitted.
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
args = Seq2SeqTrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",
    eval_steps=50,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    seed=0,
    load_best_model_at_end=True,
    predict_with_generate=True
)
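One caveat worth noting: load_best_model_at_end=True can only restore from steps where a checkpoint was actually written, and with the default save_steps=500 only steps 500, 1000 and 1500 are candidates even though evaluation runs every 50 steps. A configuration that aligns the two (an alternative sketch, not the settings used in this run):
# Sketch (hypothetical alternative): save at every evaluation step so
# any evaluated step can be chosen as the best model.
args_aligned = Seq2SeqTrainingArguments(
    output_dir="output",
    evaluation_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    metric_for_best_model="accuracy",
    load_best_model_at_end=True,
    predict_with_generate=True,
)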
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics
)
PyTorch: setting up devices The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
trainer.train()
/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use thePyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning FutureWarning, ***** Running training ***** Num examples = 4000 Num Epochs = 3 Instantaneous batch size per device = 8 Total train batch size (w. parallel, distributed & accumulation) = 8 Gradient Accumulation steps = 1 Total optimization steps = 1500
[1500/1500 32:13, Epoch 3/3]
Step | Training Loss | Validation Loss | Accuracy |
---|---|---|---|
50 | No log | 2.898511 | 0.000000 |
100 | No log | 0.437433 | 0.601000 |
150 | No log | 0.301548 | 0.645000 |
200 | No log | 0.278892 | 0.668000 |
250 | No log | 0.270245 | 0.686000 |
300 | No log | 0.286085 | 0.663000 |
350 | No log | 0.262066 | 0.708000 |
400 | No log | 0.257251 | 0.697000 |
450 | No log | 0.252848 | 0.710000 |
500 | 1.057600 | 0.248504 | 0.701000 |
550 | 1.057600 | 0.251563 | 0.721000 |
600 | 1.057600 | 0.239508 | 0.731000 |
650 | 1.057600 | 0.235462 | 0.738000 |
700 | 1.057600 | 0.246152 | 0.734000 |
750 | 1.057600 | 0.237433 | 0.733000 |
800 | 1.057600 | 0.234127 | 0.752000 |
850 | 1.057600 | 0.224785 | 0.760000 |
900 | 1.057600 | 0.222618 | 0.747000 |
950 | 1.057600 | 0.217110 | 0.770000 |
1000 | 0.266600 | 0.214305 | 0.765000 |
1050 | 0.266600 | 0.213813 | 0.771000 |
1100 | 0.266600 | 0.212208 | 0.774000 |
1150 | 0.266600 | 0.211007 | 0.772000 |
1200 | 0.266600 | 0.210451 | 0.768000 |
1250 | 0.266600 | 0.210460 | 0.768000 |
1300 | 0.266600 | 0.214561 | 0.769000 |
1350 | 0.266600 | 0.210450 | 0.767000 |
1400 | 0.266600 | 0.209276 | 0.767000 |
1450 | 0.266600 | 0.210069 | 0.769000 |
1500 | 0.244700 | 0.210056 | 0.766000 |
***** Running Evaluation ***** Num examples = 1000 Batch size = 8 (this block repeats for each of the 30 evaluations, one every 50 steps) Saving model checkpoint to output/checkpoint-500, output/checkpoint-1000 and output/checkpoint-1500 (configuration and model weights saved for each) Training completed. Do not forget to share your model on huggingface.co/models =) Loading best model from output/checkpoint-1500 (score: 0.2100560963153839).
TrainOutput(global_step=1500, training_loss=0.5229549509684245, metrics={'train_runtime': 1934.1295, 'train_samples_per_second': 6.204, 'train_steps_per_second': 0.776, 'total_flos': 3248203235328000.0, 'train_loss': 0.5229549509684245, 'epoch': 3.0})
result = trainer.predict(val_dataset)
***** Running Prediction ***** Num examples = 1000 Batch size = 8
[125/125 00:33]
print(result.metrics)
{'test_loss': 0.2100560963153839, 'test_accuracy': 0.766, 'test_runtime': 45.1374, 'test_samples_per_second': 22.155, 'test_steps_per_second': 2.769}
filename='model_encoder_decoder'
trainer.save_model(filename)
Saving model checkpoint to model_encoder_decoder Configuration saved in model_encoder_decoder/config.json Model weights saved in model_encoder_decoder/pytorch_model.bin
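A possible way to use the saved encoder-decoder checkpoint afterwards is to generate the class label directly (a sketch: the input text is hypothetical, and the tokenizer is reloaded from the base checkpoint because save_model did not store it):
# Sketch (assumption): reload the fine-tuned model and classify one text.
tok = T5Tokenizer.from_pretrained('t5-small')
clf = T5ForConditionalGeneration.from_pretrained('model_encoder_decoder')
text = "Example blog post text..."  # hypothetical input
enc = tok(text, return_tensors='pt', truncation=True, max_length=1024)
out = clf.generate(**enc, max_length=5)
print(tok.decode(out[0], skip_special_tokens=True))  # e.g. 'class2'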