! pip install datasets transformers torch scikit-learn evaluate
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting datasets
  Downloading datasets-2.9.0-py3-none-any.whl (462 kB)
Collecting transformers
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (1.13.1+cu116)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.8/dist-packages (1.0.2)
Collecting evaluate
  Downloading evaluate-0.4.0-py3-none-any.whl (81 kB)
Collecting huggingface-hub<1.0.0,>=0.2.0
  Downloading huggingface_hub-0.12.0-py3-none-any.whl (190 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting xxhash
  Downloading xxhash-3.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (213 kB)
Collecting multiprocess
  Downloading multiprocess-0.70.14-py38-none-any.whl (132 kB)
Collecting responses<0.19
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.14-py2.py3-none-any.whl (140 kB)
[... "Requirement already satisfied" lines for transitive dependencies omitted ...]
Installing collected packages: tokenizers, xxhash, urllib3, multiprocess, responses, huggingface-hub, transformers, datasets, evaluate
  Attempting uninstall: urllib3
    Found existing installation: urllib3 1.24.3
    Uninstalling urllib3-1.24.3:
      Successfully uninstalled urllib3-1.24.3
Successfully installed datasets-2.9.0 evaluate-0.4.0 huggingface-hub-0.12.0 multiprocess-0.70.14 responses-0.18.0 tokenizers-0.13.2 transformers-4.26.1 urllib3-1.26.14 xxhash-3.2.0
!wget 'https://git.wmi.amu.edu.pl/s444465/projekt-glebokie/raw/branch/master/run_glue.py' -O 'run_glue.py'
!wget 'https://git.wmi.amu.edu.pl/s444465/projekt-glebokie/raw/branch/master/roberta.py' -O 'roberta.py'
!wget 'https://git.wmi.amu.edu.pl/s444465/projekt-glebokie/raw/branch/master/gpt2.py' -O 'gpt2.py'
--2023-02-12 20:36:08--  https://git.wmi.amu.edu.pl/s444465/projekt-glebokie/raw/branch/master/run_glue.py
Resolving git.wmi.amu.edu.pl (git.wmi.amu.edu.pl)... 150.254.78.40
Connecting to git.wmi.amu.edu.pl (git.wmi.amu.edu.pl)|150.254.78.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 30601 (30K) [text/plain]
Saving to: ‘run_glue.py’
2023-02-12 20:36:11 (140 KB/s) - ‘run_glue.py’ saved [30601/30601]

--2023-02-12 20:36:11--  https://git.wmi.amu.edu.pl/s444465/projekt-glebokie/raw/branch/master/roberta.py
Resolving git.wmi.amu.edu.pl (git.wmi.amu.edu.pl)... 150.254.78.40
Connecting to git.wmi.amu.edu.pl (git.wmi.amu.edu.pl)|150.254.78.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12783 (12K) [text/plain]
Saving to: ‘roberta.py’
2023-02-12 20:36:13 (29.5 KB/s) - ‘roberta.py’ saved [12783/12783]

--2023-02-12 20:36:13--  https://git.wmi.amu.edu.pl/s444465/projekt-glebokie/raw/branch/master/gpt2.py
Resolving git.wmi.amu.edu.pl (git.wmi.amu.edu.pl)... 150.254.78.40
Connecting to git.wmi.amu.edu.pl (git.wmi.amu.edu.pl)|150.254.78.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7976 (7.8K) [text/plain]
Saving to: ‘gpt2.py’
2023-02-12 20:36:14 (1.23 GB/s) - ‘gpt2.py’ saved [7976/7976]
import json
from pathlib import Path
from typing import Dict, List
from datasets import load_dataset
loaded_data = load_dataset('emotion')
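# Optional sanity check (not part of the original notebook): 'emotion' comes as a DatasetDict
# with train/validation/test splits and a ClassLabel feature, so the label names can be listed.
print(loaded_data)
print(loaded_data['train'].features['label'].names)  # expected: ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']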
!mkdir -v -p data
train_path = Path('data/train.json')
valid_path = Path('data/valid.json')
test_path = Path('data/test.json')
data_train, data_valid, data_test = [], [], []
for source_data, dataset, max_size in [
    (loaded_data['train'], data_train, None),
    (loaded_data['test'], data_valid, None),
]:
    for i, data in enumerate(source_data):
        if max_size is not None and i >= max_size:
            break
        data_line = {
            'label': int(data['label']),
            'text': data['text'],
        }
        dataset.append(data_line)
print(f'Train: {len(data_train):6d}')
print(f'Valid: {len(data_valid):6d}')
data_class_1, data_class_2 = [], []
for data in data_valid:
    label = data['label']
    if label == 0:
        data_class_1.append(data)
    elif label == 1:
        data_class_2.append(data)
print(f'Label 1: {len(data_class_1):6d}')
print(f'Label 2: {len(data_class_2):6d}')
size_half_class_1 = int(len(data_class_1) / 2)
size_half_class_2 = int(len(data_class_2) / 2)
data_valid = data_class_1[:size_half_class_1] + data_class_2[:size_half_class_2]
data_test = data_class_1[size_half_class_1:] + data_class_2[size_half_class_2:]
print(f'Valid: {len(data_valid):6d}')
print(f'Test : {len(data_test):6d}')
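# Aside (a sketch, not used below): the same per-class 50/50 split could also be produced with
# scikit-learn's train_test_split and its 'stratify' argument.
from sklearn.model_selection import train_test_split

binary_data = data_class_1 + data_class_2
valid_alt, test_alt = train_test_split(
    binary_data,
    test_size=0.5,
    stratify=[d['label'] for d in binary_data],
    random_state=42,
)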
MAP_LABEL_TRANSLATION = {
    0: 'sadness',
    1: 'joy',
    2: 'love',
    3: 'anger',
    4: 'fear',
    5: 'surprise',
}
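# The hard-coded mapping mirrors the dataset's own label order; it could equally be derived
# from the ClassLabel feature (a sketch double-checking that the two agree):
assert MAP_LABEL_TRANSLATION == dict(enumerate(loaded_data['train'].features['label'].names))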
def save_as_translations(original_save_path: Path, data_to_save: List[Dict]) -> None:
    file_name = 's2s-' + original_save_path.name
    file_path = original_save_path.parent / file_name
    print(f'Saving into: {file_path}')
    with open(file_path, 'wt') as f_write:
        for data_line in data_to_save:
            label = data_line['label']
            new_label = MAP_LABEL_TRANSLATION[label]
            data_line['label'] = new_label
            data_line_str = json.dumps(data_line)
            f_write.write(f'{data_line_str}\n')
for file_path, data_to_save in [(train_path, data_train), (valid_path, data_valid), (test_path, data_test)]:
    print(f'Saving into: {file_path}')
    with open(file_path, 'wt') as f_write:
        for data_line in data_to_save:
            data_line_str = json.dumps(data_line)
            f_write.write(f'{data_line_str}\n')
    save_as_translations(file_path, data_to_save)
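# Quick verification (not part of the original run): read back the first line of one of the
# generated JSON Lines files to confirm the format and the translated labels.
with open('data/s2s-valid.json') as f_check:
    print(json.loads(f_check.readline()))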
WARNING:datasets.builder:No config specified, defaulting to: emotion/split
Downloading and preparing dataset emotion/split to /root/.cache/huggingface/datasets/emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd...
Generating train split: 16000 examples
Generating validation split: 2000 examples
Generating test split: 2000 examples
Dataset emotion downloaded and prepared to /root/.cache/huggingface/datasets/emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd. Subsequent calls will reuse this data.
mkdir: created directory 'data'
Train:  16000
Valid:   2000
Label 1:    581
Label 2:    695
Valid:    637
Test :    639
Saving into: data/train.json
Saving into: data/s2s-train.json
Saving into: data/valid.json
Saving into: data/s2s-valid.json
Saving into: data/test.json
Saving into: data/s2s-test.json
!head -n 4500 data/train.json > data/train-5k.json
!tail -n 2500 data/train.json >> data/train-5k.json
!wc -l data/train-5k.json
7000 data/train-5k.json
from pathlib import Path
for file_name in ["train", "valid", "test", "s2s-train", "s2s-valid", "s2s-test"]:
    print(f"=== {file_name} ===")
    all_text = Path(f"data/{file_name}.json").read_text().split('\n')
    text = all_text[:2500] + all_text[-2500:]
    Path(f"data/{file_name}-5k.json").write_text("\n".join(text))
=== train ===
=== valid ===
=== test ===
=== s2s-train ===
=== s2s-valid ===
=== s2s-test ===
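# Optional check (a sketch): confirm how many lines each truncated "-5k" file ended up with.
for file_name in ["train", "valid", "test"]:
    n_lines = len(Path(f"data/{file_name}-5k.json").read_text().splitlines())
    print(f"{file_name}-5k.json: {n_lines} lines")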
import os
os.environ['TOKENIZERS_PARALLELISM'] = 'true'
!python run_glue.py \
--cache_dir .cache_training \
--model_name_or_path gpt2 \
--custom_model gpt2_hidden \
--freeze_weights \
--train_file data/s2s-train.json \
--validation_file data/s2s-valid.json \
--test_file data/s2s-test.json \
--per_device_train_batch_size 24 \
--per_device_eval_batch_size 24 \
--do_train \
--do_eval \
--do_predict \
--max_seq_length 128 \
--learning_rate 2e-5 \
--num_train_epochs 5 \
--output_dir out/imdb-5k/gpt2
2023-02-12 20:36:36.029528: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-02-12 20:36:36.925834: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-12 20:36:36.925931: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-12 20:36:36.925949: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False INFO:__main__:Training/evaluation parameters TrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=<HUB_TOKEN>, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=-1, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=out/imdb-5k/gpt2/runs/Feb12_20-36-39_64266b139b25, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=500, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=5.0, optim=adamw_hf, optim_args=None, output_dir=out/imdb-5k/gpt2, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=24, per_device_train_batch_size=24, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=out/imdb-5k/gpt2, save_on_each_node=False, save_steps=500, save_strategy=steps, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, 
use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, xpu_backend=None, ) INFO:__main__:load a local file for train: data/s2s-train.json INFO:__main__:load a local file for validation: data/s2s-valid.json INFO:__main__:load a local file for test: data/s2s-test.json WARNING:datasets.builder:Using custom data configuration default-2f4908162fef247b INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json INFO:datasets.builder:Generating dataset json (/content/.cache_training/json/default-2f4908162fef247b/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51) Downloading and preparing dataset json/default to /content/.cache_training/json/default-2f4908162fef247b/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51... Downloading data files: 100% 3/3 [00:00<00:00, 12075.73it/s] INFO:datasets.download.download_manager:Downloading took 0.0 min INFO:datasets.download.download_manager:Checksum Computation took 0.0 min Extracting data files: 100% 3/3 [00:00<00:00, 2299.51it/s] INFO:datasets.utils.info_utils:Unable to verify checksums. INFO:datasets.builder:Generating train split INFO:datasets.builder:Generating validation split INFO:datasets.builder:Generating test split INFO:datasets.utils.info_utils:Unable to verify splits sizes. Dataset json downloaded and prepared to /content/.cache_training/json/default-2f4908162fef247b/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data. 100% 3/3 [00:00<00:00, 1048.23it/s] Downloading (…)lve/main/config.json: 100% 665/665 [00:00<00:00, 121kB/s] [INFO|configuration_utils.py:660] 2023-02-12 20:36:43,514 >> loading configuration file config.json from cache at .cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:712] 2023-02-12 20:36:43,515 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2", "3": "LABEL_3", "4": "LABEL_4", "5": "LABEL_5" }, "initializer_range": 0.02, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2, "LABEL_3": 3, "LABEL_4": 4, "LABEL_5": 5 }, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.26.1", "use_cache": true, "vocab_size": 50257 } [INFO|tokenization_auto.py:458] 2023-02-12 20:36:44,424 >> Could not locate the tokenizer configuration file, will try to use the model config instead. 
[INFO|configuration_utils.py:660] 2023-02-12 20:36:45,322 >> loading configuration file config.json from cache at .cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:712] 2023-02-12 20:36:45,323 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "initializer_range": 0.02, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.26.1", "use_cache": true, "vocab_size": 50257 } Downloading (…)olve/main/vocab.json: 100% 1.04M/1.04M [00:01<00:00, 940kB/s] Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:01<00:00, 413kB/s] Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:01<00:00, 1.01MB/s] [INFO|tokenization_utils_base.py:1802] 2023-02-12 20:36:57,333 >> loading file vocab.json from cache at .cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/vocab.json [INFO|tokenization_utils_base.py:1802] 2023-02-12 20:36:57,333 >> loading file merges.txt from cache at .cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/merges.txt [INFO|tokenization_utils_base.py:1802] 2023-02-12 20:36:57,334 >> loading file tokenizer.json from cache at .cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/tokenizer.json [INFO|tokenization_utils_base.py:1802] 2023-02-12 20:36:57,334 >> loading file added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1802] 2023-02-12 20:36:57,334 >> loading file special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1802] 2023-02-12 20:36:57,334 >> loading file tokenizer_config.json from cache at None [INFO|configuration_utils.py:660] 2023-02-12 20:36:57,334 >> loading configuration file config.json from cache at .cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:712] 2023-02-12 20:36:57,335 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "initializer_range": 0.02, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.26.1", "use_cache": true, "vocab_size": 50257 } INFO:__main__:Using hidden states in model: True INFO:__main__:Using implementation from class: GPT2ForSequenceClassificationCustom Downloading (…)"pytorch_model.bin";: 100% 548M/548M [00:02<00:00, 
261MB/s]
[INFO|modeling_utils.py:2275] 2023-02-12 20:37:00,438 >> loading weights file pytorch_model.bin from cache at .cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/pytorch_model.bin
[INFO|modeling_utils.py:2857] 2023-02-12 20:37:03,142 >> All model checkpoint weights were used when initializing GPT2ForSequenceClassificationCustom.
[WARNING|modeling_utils.py:2859] 2023-02-12 20:37:03,142 >> Some weights of GPT2ForSequenceClassificationCustom were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.dense_2.bias', 'score.out_proj.weight', 'score.dense_1_hidden.bias', 'score.dense_1_input.weight', 'score.dense_1_hidden.weight', 'score.dense_1_input.bias', 'score.dense_2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:__main__:Freezing encoder weights
INFO:__main__:Freezing layer 1
[... identical "Freezing layer N" messages for layers 2-39 ...]
INFO:__main__:Freezing layer 40
INFO:__main__:Ignoring layer 41
[... identical "Ignoring layer N" messages for layers 42-154 ...]
INFO:__main__:Ignoring layer 155
[ERROR|tokenization_utils_base.py:1042] 2023-02-12 20:37:03,155 >> Using pad_token, but it is not set yet.
INFO:__main__:Set PAD token to EOS: <|endoftext|> Running tokenizer on dataset: 0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/.cache_training/json/default-2f4908162fef247b/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-90ffc998baae6362.arrow Running tokenizer on dataset: 100% 16/16 [00:01<00:00, 10.56ba/s] Running tokenizer on dataset: 0% 0/1 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/.cache_training/json/default-2f4908162fef247b/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-63170cac390fffb1.arrow Running tokenizer on dataset: 100% 1/1 [00:00<00:00, 17.73ba/s] Running tokenizer on dataset: 0% 0/1 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/.cache_training/json/default-2f4908162fef247b/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-9f72bdf1d16c26a9.arrow Running tokenizer on dataset: 100% 1/1 [00:00<00:00, 17.93ba/s] INFO:__main__:Sample 10476 of the training set: {'label': 4, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [72, 466, 1064, 649, 2460, 1312, 285, 1016, 284, 1949, 3131, 1327, 284, 787, 606, 2652, 290, 611, 1312, 5409, 326, 1312, 836, 256, 765, 284, 1254, 5938, 757, 290, 655, 6594, 503, 262, 938, 614, 286, 1524, 319, 616, 898, 1312, 285, 1016, 284, 423, 284, 1949, 3131, 1327, 407, 284, 1337, 644, 661, 892, 286, 502, 852, 257, 300, 14491, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. 
INFO:__main__:Sample 1824 of the training set: {'label': 2, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [72, 1965, 606, 284, 4654, 502, 287, 4441, 257, 995, 810, 477, 614, 1468, 4813, 714, 1663, 510, 4203, 17836, 290, 3665, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. INFO:__main__:Sample 409 of the training set: {'label': 3, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [72, 1254, 618, 345, 389, 257, 18088, 1048, 345, 4729, 584, 18088, 661, 656, 534, 1204, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. Downloading builder script: 100% 4.20k/4.20k [00:00<00:00, 3.65MB/s] [INFO|trainer.py:710] 2023-02-12 20:37:12,738 >> The following columns in the training set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`, you can safely ignore this message. /usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( [INFO|trainer.py:1650] 2023-02-12 20:37:12,747 >> ***** Running training ***** [INFO|trainer.py:1651] 2023-02-12 20:37:12,747 >> Num examples = 16000 [INFO|trainer.py:1652] 2023-02-12 20:37:12,748 >> Num Epochs = 5 [INFO|trainer.py:1653] 2023-02-12 20:37:12,748 >> Instantaneous batch size per device = 24 [INFO|trainer.py:1654] 2023-02-12 20:37:12,748 >> Total train batch size (w. parallel, distributed & accumulation) = 24 [INFO|trainer.py:1655] 2023-02-12 20:37:12,748 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1656] 2023-02-12 20:37:12,748 >> Total optimization steps = 3335 [INFO|trainer.py:1657] 2023-02-12 20:37:12,749 >> Number of trainable parameters = 68517888 {'loss': 1.0593, 'learning_rate': 1.7001499250374815e-05, 'epoch': 0.75} 15% 500/3335 [03:54<22:04, 2.14it/s][INFO|trainer.py:2709] 2023-02-12 20:41:07,509 >> Saving model checkpoint to out/imdb-5k/gpt2/checkpoint-500 [INFO|configuration_utils.py:453] 2023-02-12 20:41:07,510 >> Configuration saved in out/imdb-5k/gpt2/checkpoint-500/config.json [INFO|modeling_utils.py:1704] 2023-02-12 20:41:09,283 >> Model weights saved in out/imdb-5k/gpt2/checkpoint-500/pytorch_model.bin [INFO|tokenization_utils_base.py:2160] 2023-02-12 20:41:09,284 >> tokenizer config file saved in out/imdb-5k/gpt2/checkpoint-500/tokenizer_config.json [INFO|tokenization_utils_base.py:2167] 2023-02-12 20:41:09,284 >> Special tokens file saved in out/imdb-5k/gpt2/checkpoint-500/special_tokens_map.json {'loss': 0.3829, 'learning_rate': 1.4002998500749626e-05, 'epoch': 1.5} 30% 1000/3335 [07:52<18:11, 2.14it/s][INFO|trainer.py:2709] 2023-02-12 20:45:05,515 >> Saving model checkpoint to out/imdb-5k/gpt2/checkpoint-1000 [INFO|configuration_utils.py:453] 2023-02-12 20:45:05,516 >> Configuration saved in out/imdb-5k/gpt2/checkpoint-1000/config.json [INFO|modeling_utils.py:1704] 2023-02-12 20:45:07,205 >> Model weights saved in out/imdb-5k/gpt2/checkpoint-1000/pytorch_model.bin [INFO|tokenization_utils_base.py:2160] 2023-02-12 20:45:07,205 >> tokenizer config file saved in out/imdb-5k/gpt2/checkpoint-1000/tokenizer_config.json [INFO|tokenization_utils_base.py:2167] 2023-02-12 20:45:07,205 >> Special tokens file saved in out/imdb-5k/gpt2/checkpoint-1000/special_tokens_map.json {'loss': 0.256, 'learning_rate': 1.100449775112444e-05, 'epoch': 2.25} 45% 1500/3335 [11:50<14:19, 2.13it/s][INFO|trainer.py:2709] 2023-02-12 20:49:03,661 >> Saving model checkpoint to out/imdb-5k/gpt2/checkpoint-1500 [INFO|configuration_utils.py:453] 2023-02-12 20:49:03,662 >> Configuration saved in out/imdb-5k/gpt2/checkpoint-1500/config.json [INFO|modeling_utils.py:1704] 2023-02-12 20:49:05,330 >> Model weights saved in out/imdb-5k/gpt2/checkpoint-1500/pytorch_model.bin [INFO|tokenization_utils_base.py:2160] 2023-02-12 20:49:05,331 >> tokenizer config file saved in out/imdb-5k/gpt2/checkpoint-1500/tokenizer_config.json [INFO|tokenization_utils_base.py:2167] 2023-02-12 20:49:05,331 >> Special tokens file saved in out/imdb-5k/gpt2/checkpoint-1500/special_tokens_map.json {'loss': 0.2101, 'learning_rate': 8.005997001499251e-06, 'epoch': 3.0} 60% 2000/3335 [15:49<10:24, 2.14it/s][INFO|trainer.py:2709] 2023-02-12 20:53:01,805 >> Saving model checkpoint to out/imdb-5k/gpt2/checkpoint-2000 [INFO|configuration_utils.py:453] 2023-02-12 20:53:01,806 >> Configuration saved in out/imdb-5k/gpt2/checkpoint-2000/config.json [INFO|modeling_utils.py:1704] 2023-02-12 
20:53:03,476 >> Model weights saved in out/imdb-5k/gpt2/checkpoint-2000/pytorch_model.bin [INFO|tokenization_utils_base.py:2160] 2023-02-12 20:53:03,476 >> tokenizer config file saved in out/imdb-5k/gpt2/checkpoint-2000/tokenizer_config.json [INFO|tokenization_utils_base.py:2167] 2023-02-12 20:53:03,476 >> Special tokens file saved in out/imdb-5k/gpt2/checkpoint-2000/special_tokens_map.json {'loss': 0.17, 'learning_rate': 5.0074962518740634e-06, 'epoch': 3.75} 75% 2500/3335 [19:47<06:30, 2.14it/s][INFO|trainer.py:2709] 2023-02-12 20:56:59,823 >> Saving model checkpoint to out/imdb-5k/gpt2/checkpoint-2500 [INFO|configuration_utils.py:453] 2023-02-12 20:56:59,824 >> Configuration saved in out/imdb-5k/gpt2/checkpoint-2500/config.json [INFO|modeling_utils.py:1704] 2023-02-12 20:57:01,504 >> Model weights saved in out/imdb-5k/gpt2/checkpoint-2500/pytorch_model.bin [INFO|tokenization_utils_base.py:2160] 2023-02-12 20:57:01,505 >> tokenizer config file saved in out/imdb-5k/gpt2/checkpoint-2500/tokenizer_config.json [INFO|tokenization_utils_base.py:2167] 2023-02-12 20:57:01,505 >> Special tokens file saved in out/imdb-5k/gpt2/checkpoint-2500/special_tokens_map.json {'loss': 0.1569, 'learning_rate': 2.008995502248876e-06, 'epoch': 4.5} 90% 3000/3335 [23:44<02:36, 2.14it/s][INFO|trainer.py:2709] 2023-02-12 21:00:57,746 >> Saving model checkpoint to out/imdb-5k/gpt2/checkpoint-3000 [INFO|configuration_utils.py:453] 2023-02-12 21:00:57,747 >> Configuration saved in out/imdb-5k/gpt2/checkpoint-3000/config.json [INFO|modeling_utils.py:1704] 2023-02-12 21:00:59,386 >> Model weights saved in out/imdb-5k/gpt2/checkpoint-3000/pytorch_model.bin [INFO|tokenization_utils_base.py:2160] 2023-02-12 21:00:59,387 >> tokenizer config file saved in out/imdb-5k/gpt2/checkpoint-3000/tokenizer_config.json [INFO|tokenization_utils_base.py:2167] 2023-02-12 21:00:59,387 >> Special tokens file saved in out/imdb-5k/gpt2/checkpoint-3000/special_tokens_map.json 100% 3335/3335 [26:25<00:00, 2.36it/s][INFO|trainer.py:1901] 2023-02-12 21:03:38,497 >> Training completed. Do not forget to share your model on huggingface.co/models =) {'train_runtime': 1585.7622, 'train_samples_per_second': 50.449, 'train_steps_per_second': 2.103, 'train_loss': 0.35007504373118614, 'epoch': 5.0} 100% 3335/3335 [26:25<00:00, 2.10it/s] [INFO|trainer.py:2709] 2023-02-12 21:03:38,514 >> Saving model checkpoint to out/imdb-5k/gpt2 [INFO|configuration_utils.py:453] 2023-02-12 21:03:38,515 >> Configuration saved in out/imdb-5k/gpt2/config.json [INFO|modeling_utils.py:1704] 2023-02-12 21:03:40,135 >> Model weights saved in out/imdb-5k/gpt2/pytorch_model.bin [INFO|tokenization_utils_base.py:2160] 2023-02-12 21:03:40,136 >> tokenizer config file saved in out/imdb-5k/gpt2/tokenizer_config.json [INFO|tokenization_utils_base.py:2167] 2023-02-12 21:03:40,136 >> Special tokens file saved in out/imdb-5k/gpt2/special_tokens_map.json ***** train metrics ***** epoch = 5.0 train_loss = 0.3501 train_runtime = 0:26:25.76 train_samples = 16000 train_samples_per_second = 50.449 train_steps_per_second = 2.103 INFO:__main__:*** Evaluate *** [INFO|trainer.py:710] 2023-02-12 21:03:40,251 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`, you can safely ignore this message. 
[INFO|trainer.py:2964] 2023-02-12 21:03:40,336 >> ***** Running Evaluation *****
[INFO|trainer.py:2966] 2023-02-12 21:03:40,336 >> Num examples = 637
[INFO|trainer.py:2969] 2023-02-12 21:03:40,337 >> Batch size = 24
100% 27/27 [00:04<00:00, 5.63it/s]
***** eval metrics *****
  epoch                   =        5.0
  eval_accuracy           =     0.9498
  eval_loss               =     0.1357
  eval_runtime            = 0:00:05.07
  eval_samples            =        637
  eval_samples_per_second =    125.597
  eval_steps_per_second   =      5.324
INFO:__main__:*** Predict ***
[INFO|trainer.py:710] 2023-02-12 21:03:45,413 >> The following columns in the test set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`, you can safely ignore this message.
[INFO|trainer.py:2964] 2023-02-12 21:03:45,415 >> ***** Running Prediction *****
[INFO|trainer.py:2966] 2023-02-12 21:03:45,415 >> Num examples = 639
[INFO|trainer.py:2969] 2023-02-12 21:03:45,415 >> Batch size = 24
100% 27/27 [00:04<00:00, 5.62it/s]
INFO:__main__:***** Predict results None *****
[INFO|modelcard.py:449] 2023-02-12 21:03:51,543 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9497645497322083}]}
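# Loading the fine-tuned checkpoint back for inference (a sketch; it assumes the downloaded
# gpt2.py exposes the GPT2ForSequenceClassificationCustom class reported in the training log).
import torch
from transformers import AutoTokenizer
from gpt2 import GPT2ForSequenceClassificationCustom  # assumption: defined in gpt2.py

model_dir = 'out/imdb-5k/gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = GPT2ForSequenceClassificationCustom.from_pretrained(model_dir)
model.eval()

inputs = tokenizer('i feel great today', return_tensors='pt')
with torch.no_grad():
    prediction = model(**inputs).logits.argmax(dim=-1).item()
print(model.config.id2label[prediction])  # e.g. 'joy'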