Deep Learning - project
The project uses the emotion dataset, which contains short texts annotated with specific emotions (a minimal loading sketch follows the label list).
Labels:
- 0 - sadness
- 1 - joy
- 2 - love
- 3 - anger
- 4 - fear
- 5 - surprise
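The mapping above corresponds to the dataset's ClassLabel feature. A minimal sketch of loading the dataset and checking the id-to-name mapping (a sketch, not project code; assumes the datasets library and access to the Hugging Face hub):
from datasets import load_dataset

# Minimal check of the emotion dataset and its label mapping (illustrative sketch).
emotion = load_dataset("emotion")
print(emotion["train"].features["label"].names)  # ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
print(emotion["train"][0])                       # {'text': 'i didnt feel humiliated', 'label': 0}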
REQUIREMENTS
!pip3 install transformers scikit-learn accelerate evaluate datasets torch sentencepiece torchvision
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: transformers in /usr/local/lib/python3.8/dist-packages (4.23.1)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.8/dist-packages (1.2.1)
Requirement already satisfied: accelerate in /usr/local/lib/python3.8/dist-packages (0.16.0)
Requirement already satisfied: evaluate in /usr/local/lib/python3.8/dist-packages (0.4.0)
Requirement already satisfied: datasets in /usr/local/lib/python3.8/dist-packages (2.9.0)
Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (1.13.1)
Requirement already satisfied: sentencepiece in /usr/local/lib/python3.8/dist-packages (0.1.97)
Requirement already satisfied: torchvision in /usr/local/lib/python3.8/dist-packages (0.14.1+cu116)
...
import os
import json
from pathlib import Path
from typing import Dict, List
from datasets import load_dataset
import torch
import pandas as pd
os.environ['TOKENIZERS_PARALLELISM'] = 'true'
DATA PREP
!mkdir -p data
!python data_prep.py
No config specified, defaulting to: emotion/split
Found cached dataset emotion (/root/.cache/huggingface/datasets/emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd)
100% 3/3 [00:00<00:00, 182.77it/s]
Saving into: data/train.json
Saving into: data/s2s-train.json
Saving into: data/valid.json
Saving into: data/s2s-valid.json
Saving into: data/test.json
Saving into: data/s2s-test.json
!head data/train.json
{"label": 0, "text": "i didnt feel humiliated"} {"label": 0, "text": "i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake"} {"label": 3, "text": "im grabbing a minute to post i feel greedy wrong"} {"label": 2, "text": "i am ever feeling nostalgic about the fireplace i will know that it is still on the property"} {"label": 3, "text": "i am feeling grouchy"} {"label": 0, "text": "ive been feeling a little burdened lately wasnt sure why that was"} {"label": 5, "text": "ive been taking or milligrams or times recommended amount and ive fallen asleep a lot faster but i also feel like so funny"} {"label": 4, "text": "i feel as confused about life as a teenager or as jaded as a year old man"} {"label": 1, "text": "i have been with petronas for years i feel that petronas has performed well and made a huge profit"} {"label": 2, "text": "i feel romantic too"}
!head data/s2s-train.json
{"label": "sadness", "text": "i didnt feel humiliated"} {"label": "sadness", "text": "i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake"} {"label": "anger", "text": "im grabbing a minute to post i feel greedy wrong"} {"label": "love", "text": "i am ever feeling nostalgic about the fireplace i will know that it is still on the property"} {"label": "anger", "text": "i am feeling grouchy"} {"label": "sadness", "text": "ive been feeling a little burdened lately wasnt sure why that was"} {"label": "surprise", "text": "ive been taking or milligrams or times recommended amount and ive fallen asleep a lot faster but i also feel like so funny"} {"label": "fear", "text": "i feel as confused about life as a teenager or as jaded as a year old man"} {"label": "joy", "text": "i have been with petronas for years i feel that petronas has performed well and made a huge profit"} {"label": "love", "text": "i feel romantic too"}
!wc -l data/*
 2000 data/s2s-test.json
16000 data/s2s-train.json
 2000 data/s2s-valid.json
 2000 data/test.json
16000 data/train.json
 2000 data/valid.json
40000 total
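data_prep.py itself is not included in the notebook; judging from the files it writes (train/valid/test JSON lines with integer labels, plus s2s-* variants with label names), an equivalent script could look roughly like this (a sketch under those assumptions, not the project's actual code):
import json
from pathlib import Path
from datasets import load_dataset

def save_jsonl(records, path):
    # One JSON object per line, matching the format shown by `head` above.
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def main():
    Path("data").mkdir(exist_ok=True)
    dataset = load_dataset("emotion")  # splits: train / validation / test
    label_names = dataset["train"].features["label"].names
    split_to_file = {"train": "train", "validation": "valid", "test": "test"}
    for split, stem in split_to_file.items():
        rows = dataset[split]
        # Numeric labels for the classification runs.
        save_jsonl(({"label": r["label"], "text": r["text"]} for r in rows),
                   f"data/{stem}.json")
        # Text labels for the sequence-to-sequence runs.
        save_jsonl(({"label": label_names[r["label"]], "text": r["text"]} for r in rows),
                   f"data/s2s-{stem}.json")

if __name__ == "__main__":
    main()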
ROBERTA
- full data
- model: roberta-base
- sequence length: 128
- training epochs: 1
!python run_glue.py \
--cache_dir roberta_training_cache \
--model_name_or_path roberta-base \
--train_file data/train.json \
--validation_file data/valid.json \
--test_file data/test.json \
--per_device_train_batch_size 24 \
--per_device_eval_batch_size 24 \
--do_train \
--do_eval \
--do_predict \
--max_seq_length 128 \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--output_dir out/emotion/roberta \
--overwrite_output_dir
2023-02-14 21:44:57.299984: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-02-14 21:44:57.452345: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2023-02-14 21:44:58.236913: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 21:44:58.237017: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 21:44:58.237058: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False INFO:__main__:Training/evaluation parameters TrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=<HUB_TOKEN>, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=-1, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=out/emotion/roberta/runs/Feb14_21-45-00_fc0011e45a00, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=500, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=1.0, optim=adamw_hf, output_dir=out/emotion/roberta, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=24, per_device_train_batch_size=24, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, 
run_name=out/emotion/roberta, save_on_each_node=False, save_steps=500, save_strategy=steps, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, tf32=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, xpu_backend=None, ) INFO:__main__:load a local file for train: data/train.json INFO:__main__:load a local file for validation: data/valid.json INFO:__main__:load a local file for test: data/test.json WARNING:datasets.builder:Using custom data configuration default-01aa9d8252a24a0d INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json INFO:datasets.builder:Generating dataset json (/content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51) Downloading and preparing dataset json/default to /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51... Downloading data files: 100% 3/3 [00:00<00:00, 11491.24it/s] INFO:datasets.download.download_manager:Downloading took 0.0 min INFO:datasets.download.download_manager:Checksum Computation took 0.0 min Extracting data files: 100% 3/3 [00:00<00:00, 1882.54it/s] INFO:datasets.utils.info_utils:Unable to verify checksums. INFO:datasets.builder:Generating train split INFO:datasets.builder:Generating validation split INFO:datasets.builder:Generating test split INFO:datasets.utils.info_utils:Unable to verify splits sizes. Dataset json downloaded and prepared to /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data. 100% 3/3 [00:00<00:00, 573.49it/s] Downloading (…)lve/main/config.json: 100% 481/481 [00:00<00:00, 83.8kB/s] [INFO|configuration_utils.py:653] 2023-02-14 21:45:01,575 >> loading configuration file config.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:45:01,576 >> Model config RobertaConfig { "_name_or_path": "roberta-base", "architectures": [ "RobertaForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2", "3": "LABEL_3", "4": "LABEL_4", "5": "LABEL_5" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2, "LABEL_3": 3, "LABEL_4": 4, "LABEL_5": 5 }, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.23.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 } [INFO|tokenization_auto.py:418] 2023-02-14 21:45:01,670 >> Could not locate the tokenizer configuration file, will try to use the model config instead. 
[INFO|configuration_utils.py:653] 2023-02-14 21:45:01,762 >> loading configuration file config.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:45:01,763 >> Model config RobertaConfig { "_name_or_path": "roberta-base", "architectures": [ "RobertaForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.23.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 } Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:00<00:00, 9.36MB/s] Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 4.95MB/s] Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 11.7MB/s] [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,975 >> loading file vocab.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/vocab.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file merges.txt from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/merges.txt [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file tokenizer.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/tokenizer.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file tokenizer_config.json from cache at None [INFO|configuration_utils.py:653] 2023-02-14 21:45:02,976 >> loading configuration file config.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:45:02,977 >> Model config RobertaConfig { "_name_or_path": "roberta-base", "architectures": [ "RobertaForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.23.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 } INFO:__main__:Using implementation from class: AutoModelForSequenceClassification Downloading (…)"pytorch_model.bin";: 100% 501M/501M [00:04<00:00, 105MB/s] [INFO|modeling_utils.py:2156] 2023-02-14 21:45:08,072 >> loading weights file pytorch_model.bin from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/pytorch_model.bin [WARNING|modeling_utils.py:2596] 2023-02-14 21:45:09,415 >> Some weights of the model checkpoint at roberta-base were not used when 
initializing RobertaForSequenceClassification: ['lm_head.layer_norm.weight', 'lm_head.bias', 'roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.decoder.weight'] - This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). [WARNING|modeling_utils.py:2608] 2023-02-14 21:45:09,415 >> Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.dense.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Frozen layers: [('roberta.encoder.layer.0.attention.self.query.weight', False), ('roberta.encoder.layer.0.attention.self.query.bias', False), ('roberta.encoder.layer.0.attention.self.key.weight', False), ('roberta.encoder.layer.0.attention.self.key.bias', False), ('roberta.encoder.layer.0.attention.self.value.weight', False), ('roberta.encoder.layer.0.attention.self.value.bias', False), ('roberta.encoder.layer.0.attention.output.dense.weight', False), ('roberta.encoder.layer.0.attention.output.dense.bias', False), ('roberta.encoder.layer.0.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.0.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.0.intermediate.dense.weight', False), ('roberta.encoder.layer.0.intermediate.dense.bias', False), ('roberta.encoder.layer.0.output.dense.weight', False), ('roberta.encoder.layer.0.output.dense.bias', False), ('roberta.encoder.layer.0.output.LayerNorm.weight', False), ('roberta.encoder.layer.0.output.LayerNorm.bias', False), ('roberta.encoder.layer.2.attention.self.query.weight', False), ('roberta.encoder.layer.2.attention.self.query.bias', False), ('roberta.encoder.layer.2.attention.self.key.weight', False), ('roberta.encoder.layer.2.attention.self.key.bias', False), ('roberta.encoder.layer.2.attention.self.value.weight', False), ('roberta.encoder.layer.2.attention.self.value.bias', False), ('roberta.encoder.layer.2.attention.output.dense.weight', False), ('roberta.encoder.layer.2.attention.output.dense.bias', False), ('roberta.encoder.layer.2.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.2.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.2.intermediate.dense.weight', False), ('roberta.encoder.layer.2.intermediate.dense.bias', False), ('roberta.encoder.layer.2.output.dense.weight', False), ('roberta.encoder.layer.2.output.dense.bias', False), ('roberta.encoder.layer.2.output.LayerNorm.weight', False), ('roberta.encoder.layer.2.output.LayerNorm.bias', False), ('roberta.encoder.layer.4.attention.self.query.weight', False), ('roberta.encoder.layer.4.attention.self.query.bias', False), ('roberta.encoder.layer.4.attention.self.key.weight', False), ('roberta.encoder.layer.4.attention.self.key.bias', False), ('roberta.encoder.layer.4.attention.self.value.weight', False), ('roberta.encoder.layer.4.attention.self.value.bias', False), 
('roberta.encoder.layer.4.attention.output.dense.weight', False), ('roberta.encoder.layer.4.attention.output.dense.bias', False), ('roberta.encoder.layer.4.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.4.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.4.intermediate.dense.weight', False), ('roberta.encoder.layer.4.intermediate.dense.bias', False), ('roberta.encoder.layer.4.output.dense.weight', False), ('roberta.encoder.layer.4.output.dense.bias', False), ('roberta.encoder.layer.4.output.LayerNorm.weight', False), ('roberta.encoder.layer.4.output.LayerNorm.bias', False), ('roberta.encoder.layer.6.attention.self.query.weight', False), ('roberta.encoder.layer.6.attention.self.query.bias', False), ('roberta.encoder.layer.6.attention.self.key.weight', False), ('roberta.encoder.layer.6.attention.self.key.bias', False), ('roberta.encoder.layer.6.attention.self.value.weight', False), ('roberta.encoder.layer.6.attention.self.value.bias', False), ('roberta.encoder.layer.6.attention.output.dense.weight', False), ('roberta.encoder.layer.6.attention.output.dense.bias', False), ('roberta.encoder.layer.6.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.6.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.6.intermediate.dense.weight', False), ('roberta.encoder.layer.6.intermediate.dense.bias', False), ('roberta.encoder.layer.6.output.dense.weight', False), ('roberta.encoder.layer.6.output.dense.bias', False), ('roberta.encoder.layer.6.output.LayerNorm.weight', False), ('roberta.encoder.layer.6.output.LayerNorm.bias', False), ('roberta.encoder.layer.8.attention.self.query.weight', False), ('roberta.encoder.layer.8.attention.self.query.bias', False), ('roberta.encoder.layer.8.attention.self.key.weight', False), ('roberta.encoder.layer.8.attention.self.key.bias', False), ('roberta.encoder.layer.8.attention.self.value.weight', False), ('roberta.encoder.layer.8.attention.self.value.bias', False), ('roberta.encoder.layer.8.attention.output.dense.weight', False), ('roberta.encoder.layer.8.attention.output.dense.bias', False), ('roberta.encoder.layer.8.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.8.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.8.intermediate.dense.weight', False), ('roberta.encoder.layer.8.intermediate.dense.bias', False), ('roberta.encoder.layer.8.output.dense.weight', False), ('roberta.encoder.layer.8.output.dense.bias', False), ('roberta.encoder.layer.8.output.LayerNorm.weight', False), ('roberta.encoder.layer.8.output.LayerNorm.bias', False), ('roberta.encoder.layer.10.attention.self.query.weight', False), ('roberta.encoder.layer.10.attention.self.query.bias', False), ('roberta.encoder.layer.10.attention.self.key.weight', False), ('roberta.encoder.layer.10.attention.self.key.bias', False), ('roberta.encoder.layer.10.attention.self.value.weight', False), ('roberta.encoder.layer.10.attention.self.value.bias', False), ('roberta.encoder.layer.10.attention.output.dense.weight', False), ('roberta.encoder.layer.10.attention.output.dense.bias', False), ('roberta.encoder.layer.10.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.10.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.10.intermediate.dense.weight', False), ('roberta.encoder.layer.10.intermediate.dense.bias', False), ('roberta.encoder.layer.10.output.dense.weight', False), ('roberta.encoder.layer.10.output.dense.bias', False), ('roberta.encoder.layer.10.output.LayerNorm.weight', False), 
('roberta.encoder.layer.10.output.LayerNorm.bias', False)] Running tokenizer on dataset: 0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-e62b2012f3f40cb2.arrow Running tokenizer on dataset: 100% 16/16 [00:00<00:00, 20.66ba/s] Running tokenizer on dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-cd497527f5c67ba7.arrow Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 7.58ba/s] Running tokenizer on dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-9c2deb15eb4326c1.arrow Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 20.81ba/s] INFO:__main__:Sample 10476 of the training set: {'label': 0, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [0, 118, 109, 465, 92, 964, 939, 475, 164, 7, 860, 1823, 543, 7, 146, 106, 1095, 8, 114, 939, 2845, 14, 939, 218, 326, 236, 7, 619, 2581, 456, 8, 95, 3068, 66, 5, 94, 76, 9, 334, 15, 127, 308, 939, 475, 164, 7, 33, 7, 860, 1823, 543, 45, 7, 575, 99, 82, 206, 9, 162, 145, 10, 784, 9604, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. INFO:__main__:Sample 1824 of the training set: {'label': 1, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [0, 118, 553, 106, 7, 1962, 162, 11, 2351, 10, 232, 147, 70, 76, 793, 1972, 115, 1733, 62, 2157, 7917, 8, 2247, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. 
INFO:__main__:Sample 409 of the training set: {'label': 2, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [0, 118, 619, 77, 47, 32, 10, 10837, 621, 47, 5696, 97, 10837, 82, 88, 110, 301, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. [INFO|trainer.py:725] 2023-02-14 21:45:13,102 >> The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message. /usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( [INFO|trainer.py:1607] 2023-02-14 21:45:13,109 >> ***** Running training ***** [INFO|trainer.py:1608] 2023-02-14 21:45:13,109 >> Num examples = 16000 [INFO|trainer.py:1609] 2023-02-14 21:45:13,109 >> Num Epochs = 1 [INFO|trainer.py:1610] 2023-02-14 21:45:13,109 >> Instantaneous batch size per device = 24 [INFO|trainer.py:1611] 2023-02-14 21:45:13,109 >> Total train batch size (w. parallel, distributed & accumulation) = 24 [INFO|trainer.py:1612] 2023-02-14 21:45:13,109 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1613] 2023-02-14 21:45:13,109 >> Total optimization steps = 667 {'loss': 0.8083, 'learning_rate': 5.0074962518740634e-06, 'epoch': 0.75} 75% 500/667 [00:58<00:19, 8.76it/s][INFO|trainer.py:2656] 2023-02-14 21:46:11,148 >> Saving model checkpoint to out/emotion/roberta/checkpoint-500 [INFO|configuration_utils.py:447] 2023-02-14 21:46:11,149 >> Configuration saved in out/emotion/roberta/checkpoint-500/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:46:12,047 >> Model weights saved in out/emotion/roberta/checkpoint-500/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:46:12,048 >> tokenizer config file saved in out/emotion/roberta/checkpoint-500/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:46:12,048 >> Special tokens file saved in out/emotion/roberta/checkpoint-500/special_tokens_map.json 100% 666/667 [01:19<00:00, 8.78it/s][INFO|trainer.py:1852] 2023-02-14 21:46:32,443 >> Training completed. 
Do not forget to share your model on huggingface.co/models =) {'train_runtime': 79.3341, 'train_samples_per_second': 201.679, 'train_steps_per_second': 8.407, 'train_loss': 0.7161429089227359, 'epoch': 1.0} 100% 667/667 [01:19<00:00, 8.41it/s] [INFO|trainer.py:2656] 2023-02-14 21:46:32,445 >> Saving model checkpoint to out/emotion/roberta [INFO|configuration_utils.py:447] 2023-02-14 21:46:32,446 >> Configuration saved in out/emotion/roberta/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:46:33,422 >> Model weights saved in out/emotion/roberta/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:46:33,422 >> tokenizer config file saved in out/emotion/roberta/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:46:33,423 >> Special tokens file saved in out/emotion/roberta/special_tokens_map.json ***** train metrics ***** epoch = 1.0 train_loss = 0.7161 train_runtime = 0:01:19.33 train_samples = 16000 train_samples_per_second = 201.679 train_steps_per_second = 8.407 INFO:__main__:*** Evaluate *** [INFO|trainer.py:725] 2023-02-14 21:46:33,524 >> The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message. [INFO|trainer.py:2907] 2023-02-14 21:46:33,526 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:46:33,526 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:46:33,526 >> Batch size = 24 100% 84/84 [00:03<00:00, 23.66it/s] ***** eval metrics ***** epoch = 1.0 eval_accuracy = 0.889 eval_loss = 0.3302 eval_runtime = 0:00:03.59 eval_samples = 2000 eval_samples_per_second = 556.411 eval_steps_per_second = 23.369 INFO:__main__:*** Predict *** [INFO|trainer.py:725] 2023-02-14 21:46:37,124 >> The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message. [INFO|trainer.py:2907] 2023-02-14 21:46:37,125 >> ***** Running Prediction ***** [INFO|trainer.py:2909] 2023-02-14 21:46:37,125 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:46:37,125 >> Batch size = 24 100% 84/84 [00:03<00:00, 23.68it/s] INFO:__main__:***** Predict results None ***** [INFO|modelcard.py:444] 2023-02-14 21:46:40,840 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.8889999985694885}]}
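The fine-tuned checkpoint is saved to out/emotion/roberta, so it can be reloaded for inference. A small sketch (the label order is taken from the dataset, since the saved config keeps the generic LABEL_0..LABEL_5 names):
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]

tokenizer = AutoTokenizer.from_pretrained("out/emotion/roberta")
model = AutoModelForSequenceClassification.from_pretrained("out/emotion/roberta")
model.eval()

text = "i feel romantic too"  # gold label in the training data: love
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()
print(labels[pred])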
- full data
- sequence length: 128
- LeakyReLU instead of ReLU
- every other encoder layer frozen
- custom classification head (see the sketch below)
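The custom model (RobertaForSequenceClassificationCustomAlternative in the log below) is defined inside the project's run_glue.py, which is not reproduced here. A rough sketch of the two ideas it combines, a LeakyReLU classification head and freezing every other encoder layer; the attribute names and sizes here are illustrative assumptions, not the project's exact code:
import torch.nn as nn
from transformers import RobertaForSequenceClassification

class LeakyReluHead(nn.Module):
    """Illustrative head with LeakyReLU; names loosely follow the training log."""
    def __init__(self, hidden_size=768, num_labels=6, dropout=0.1):
        super().__init__()
        self.dense_1_input = nn.Linear(hidden_size, hidden_size)
        self.dense_2 = nn.Linear(hidden_size, hidden_size)
        self.out_proj = nn.Linear(hidden_size, num_labels)
        self.dropout = nn.Dropout(dropout)
        self.act = nn.LeakyReLU()

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # representation of the <s> token
        x = self.act(self.dense_1_input(self.dropout(x)))
        x = self.act(self.dense_2(self.dropout(x)))
        return self.out_proj(self.dropout(x))

model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
model.classifier = LeakyReluHead()

# Freeze every other encoder layer (0, 2, 4, ..., 10), matching the "Frozen layers" log.
for i, layer in enumerate(model.roberta.encoder.layer):
    if i % 2 == 0:
        for param in layer.parameters():
            param.requires_grad = False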
!python run_glue.py \
--cache_dir roberta_custom_training_cache \
--model_name_or_path roberta-base \
--custom_model roberta_custom \
--train_file data/train.json \
--validation_file data/valid.json \
--test_file data/test.json \
--per_device_train_batch_size 24 \
--per_device_eval_batch_size 24 \
--do_train \
--do_eval \
--do_predict \
--max_seq_length 128 \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--output_dir out/emotion/roberta_custom \
--overwrite_output_dir
2023-02-14 21:47:02.722049: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-02-14 21:47:02.876002: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2023-02-14 21:47:03.659342: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 21:47:03.659451: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 21:47:03.659470: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False INFO:__main__:Training/evaluation parameters TrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=<HUB_TOKEN>, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=-1, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=out/emotion/roberta_custom/runs/Feb14_21-47-05_fc0011e45a00, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=500, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=1.0, optim=adamw_hf, output_dir=out/emotion/roberta_custom, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=24, per_device_train_batch_size=24, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], 
resume_from_checkpoint=None, run_name=out/emotion/roberta_custom, save_on_each_node=False, save_steps=500, save_strategy=steps, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, tf32=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, xpu_backend=None, ) INFO:__main__:load a local file for train: data/train.json INFO:__main__:load a local file for validation: data/valid.json INFO:__main__:load a local file for test: data/test.json WARNING:datasets.builder:Using custom data configuration default-01aa9d8252a24a0d INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json INFO:datasets.builder:Generating dataset json (/content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51) Downloading and preparing dataset json/default to /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51... Downloading data files: 100% 3/3 [00:00<00:00, 14463.12it/s] INFO:datasets.download.download_manager:Downloading took 0.0 min INFO:datasets.download.download_manager:Checksum Computation took 0.0 min Extracting data files: 100% 3/3 [00:00<00:00, 2119.76it/s] INFO:datasets.utils.info_utils:Unable to verify checksums. INFO:datasets.builder:Generating train split INFO:datasets.builder:Generating validation split INFO:datasets.builder:Generating test split INFO:datasets.utils.info_utils:Unable to verify splits sizes. Dataset json downloaded and prepared to /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data. 100% 3/3 [00:00<00:00, 657.14it/s] Downloading (…)lve/main/config.json: 100% 481/481 [00:00<00:00, 88.4kB/s] [INFO|configuration_utils.py:653] 2023-02-14 21:47:06,896 >> loading configuration file config.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:47:06,897 >> Model config RobertaConfig { "_name_or_path": "roberta-base", "architectures": [ "RobertaForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2", "3": "LABEL_3", "4": "LABEL_4", "5": "LABEL_5" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2, "LABEL_3": 3, "LABEL_4": 4, "LABEL_5": 5 }, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.23.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 } [INFO|tokenization_auto.py:418] 2023-02-14 21:47:06,989 >> Could not locate the tokenizer configuration file, will try to use the model config instead. 
[INFO|configuration_utils.py:653] 2023-02-14 21:47:07,079 >> loading configuration file config.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:47:07,080 >> Model config RobertaConfig { "_name_or_path": "roberta-base", "architectures": [ "RobertaForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.23.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 } Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:00<00:00, 9.35MB/s] Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 4.91MB/s] Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 10.3MB/s] [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file vocab.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/vocab.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file merges.txt from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/merges.txt [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file tokenizer.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/tokenizer.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file tokenizer_config.json from cache at None [INFO|configuration_utils.py:653] 2023-02-14 21:47:08,306 >> loading configuration file config.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:47:08,306 >> Model config RobertaConfig { "_name_or_path": "roberta-base", "architectures": [ "RobertaForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "classifier_dropout": null, "eos_token_id": 2, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.23.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 } INFO:__main__:Using hidden states in model: False INFO:__main__:Using implementation from class: RobertaForSequenceClassificationCustomAlternative Downloading (…)"pytorch_model.bin";: 100% 501M/501M [00:04<00:00, 106MB/s] [INFO|modeling_utils.py:2156] 2023-02-14 21:47:13,300 >> loading weights file pytorch_model.bin from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/pytorch_model.bin [WARNING|modeling_utils.py:2596] 
2023-02-14 21:47:15,772 >> Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassificationCustomAlternative: ['roberta.pooler.dense.bias', 'lm_head.dense.weight', 'roberta.pooler.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight'] - This IS expected if you are initializing RobertaForSequenceClassificationCustomAlternative from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing RobertaForSequenceClassificationCustomAlternative from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). [WARNING|modeling_utils.py:2608] 2023-02-14 21:47:15,772 >> Some weights of RobertaForSequenceClassificationCustomAlternative were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense_1_input.weight', 'classifier.dense_2.weight', 'classifier.out_proj.bias', 'classifier.dense_2.bias', 'classifier.dense_1_input.bias', 'classifier.dense_1_hidden.weight', 'classifier.dense_1_hidden.bias', 'classifier.out_proj.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Frozen layers: [('roberta.encoder.layer.0.attention.self.query.weight', False), ('roberta.encoder.layer.0.attention.self.query.bias', False), ('roberta.encoder.layer.0.attention.self.key.weight', False), ('roberta.encoder.layer.0.attention.self.key.bias', False), ('roberta.encoder.layer.0.attention.self.value.weight', False), ('roberta.encoder.layer.0.attention.self.value.bias', False), ('roberta.encoder.layer.0.attention.output.dense.weight', False), ('roberta.encoder.layer.0.attention.output.dense.bias', False), ('roberta.encoder.layer.0.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.0.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.0.intermediate.dense.weight', False), ('roberta.encoder.layer.0.intermediate.dense.bias', False), ('roberta.encoder.layer.0.output.dense.weight', False), ('roberta.encoder.layer.0.output.dense.bias', False), ('roberta.encoder.layer.0.output.LayerNorm.weight', False), ('roberta.encoder.layer.0.output.LayerNorm.bias', False), ('roberta.encoder.layer.2.attention.self.query.weight', False), ('roberta.encoder.layer.2.attention.self.query.bias', False), ('roberta.encoder.layer.2.attention.self.key.weight', False), ('roberta.encoder.layer.2.attention.self.key.bias', False), ('roberta.encoder.layer.2.attention.self.value.weight', False), ('roberta.encoder.layer.2.attention.self.value.bias', False), ('roberta.encoder.layer.2.attention.output.dense.weight', False), ('roberta.encoder.layer.2.attention.output.dense.bias', False), ('roberta.encoder.layer.2.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.2.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.2.intermediate.dense.weight', False), ('roberta.encoder.layer.2.intermediate.dense.bias', False), ('roberta.encoder.layer.2.output.dense.weight', False), ('roberta.encoder.layer.2.output.dense.bias', False), ('roberta.encoder.layer.2.output.LayerNorm.weight', False), ('roberta.encoder.layer.2.output.LayerNorm.bias', False), ('roberta.encoder.layer.4.attention.self.query.weight', False), 
('roberta.encoder.layer.4.attention.self.query.bias', False), ('roberta.encoder.layer.4.attention.self.key.weight', False), ('roberta.encoder.layer.4.attention.self.key.bias', False), ('roberta.encoder.layer.4.attention.self.value.weight', False), ('roberta.encoder.layer.4.attention.self.value.bias', False), ('roberta.encoder.layer.4.attention.output.dense.weight', False), ('roberta.encoder.layer.4.attention.output.dense.bias', False), ('roberta.encoder.layer.4.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.4.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.4.intermediate.dense.weight', False), ('roberta.encoder.layer.4.intermediate.dense.bias', False), ('roberta.encoder.layer.4.output.dense.weight', False), ('roberta.encoder.layer.4.output.dense.bias', False), ('roberta.encoder.layer.4.output.LayerNorm.weight', False), ('roberta.encoder.layer.4.output.LayerNorm.bias', False), ('roberta.encoder.layer.6.attention.self.query.weight', False), ('roberta.encoder.layer.6.attention.self.query.bias', False), ('roberta.encoder.layer.6.attention.self.key.weight', False), ('roberta.encoder.layer.6.attention.self.key.bias', False), ('roberta.encoder.layer.6.attention.self.value.weight', False), ('roberta.encoder.layer.6.attention.self.value.bias', False), ('roberta.encoder.layer.6.attention.output.dense.weight', False), ('roberta.encoder.layer.6.attention.output.dense.bias', False), ('roberta.encoder.layer.6.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.6.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.6.intermediate.dense.weight', False), ('roberta.encoder.layer.6.intermediate.dense.bias', False), ('roberta.encoder.layer.6.output.dense.weight', False), ('roberta.encoder.layer.6.output.dense.bias', False), ('roberta.encoder.layer.6.output.LayerNorm.weight', False), ('roberta.encoder.layer.6.output.LayerNorm.bias', False), ('roberta.encoder.layer.8.attention.self.query.weight', False), ('roberta.encoder.layer.8.attention.self.query.bias', False), ('roberta.encoder.layer.8.attention.self.key.weight', False), ('roberta.encoder.layer.8.attention.self.key.bias', False), ('roberta.encoder.layer.8.attention.self.value.weight', False), ('roberta.encoder.layer.8.attention.self.value.bias', False), ('roberta.encoder.layer.8.attention.output.dense.weight', False), ('roberta.encoder.layer.8.attention.output.dense.bias', False), ('roberta.encoder.layer.8.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.8.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.8.intermediate.dense.weight', False), ('roberta.encoder.layer.8.intermediate.dense.bias', False), ('roberta.encoder.layer.8.output.dense.weight', False), ('roberta.encoder.layer.8.output.dense.bias', False), ('roberta.encoder.layer.8.output.LayerNorm.weight', False), ('roberta.encoder.layer.8.output.LayerNorm.bias', False), ('roberta.encoder.layer.10.attention.self.query.weight', False), ('roberta.encoder.layer.10.attention.self.query.bias', False), ('roberta.encoder.layer.10.attention.self.key.weight', False), ('roberta.encoder.layer.10.attention.self.key.bias', False), ('roberta.encoder.layer.10.attention.self.value.weight', False), ('roberta.encoder.layer.10.attention.self.value.bias', False), ('roberta.encoder.layer.10.attention.output.dense.weight', False), ('roberta.encoder.layer.10.attention.output.dense.bias', False), ('roberta.encoder.layer.10.attention.output.LayerNorm.weight', False), 
('roberta.encoder.layer.10.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.10.intermediate.dense.weight', False), ('roberta.encoder.layer.10.intermediate.dense.bias', False), ('roberta.encoder.layer.10.output.dense.weight', False), ('roberta.encoder.layer.10.output.dense.bias', False), ('roberta.encoder.layer.10.output.LayerNorm.weight', False), ('roberta.encoder.layer.10.output.LayerNorm.bias', False)] Running tokenizer on dataset: 0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-e62b2012f3f40cb2.arrow Running tokenizer on dataset: 100% 16/16 [00:01<00:00, 15.42ba/s] Running tokenizer on dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-cd497527f5c67ba7.arrow Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 7.47ba/s] Running tokenizer on dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-9c2deb15eb4326c1.arrow Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 19.76ba/s] INFO:__main__:Sample 10476 of the training set: {'label': 0, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [0, 118, 109, 465, 92, 964, 939, 475, 164, 7, 860, 1823, 543, 7, 146, 106, 1095, 8, 114, 939, 2845, 14, 939, 218, 326, 236, 7, 619, 2581, 456, 8, 95, 3068, 66, 5, 94, 76, 9, 334, 15, 127, 308, 939, 475, 164, 7, 33, 7, 860, 1823, 543, 45, 7, 575, 99, 82, 206, 9, 162, 145, 10, 784, 9604, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. 
INFO:__main__:Sample 1824 of the training set: {'label': 1, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [0, 118, 553, 106, 7, 1962, 162, 11, 2351, 10, 232, 147, 70, 76, 793, 1972, 115, 1733, 62, 2157, 7917, 8, 2247, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. INFO:__main__:Sample 409 of the training set: {'label': 2, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [0, 118, 619, 77, 47, 32, 10, 10837, 621, 47, 5696, 97, 10837, 82, 88, 110, 301, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. [INFO|trainer.py:725] 2023-02-14 21:47:19,642 >> The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassificationCustomAlternative.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassificationCustomAlternative.forward`, you can safely ignore this message. /usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( [INFO|trainer.py:1607] 2023-02-14 21:47:19,649 >> ***** Running training ***** [INFO|trainer.py:1608] 2023-02-14 21:47:19,649 >> Num examples = 16000 [INFO|trainer.py:1609] 2023-02-14 21:47:19,649 >> Num Epochs = 1 [INFO|trainer.py:1610] 2023-02-14 21:47:19,649 >> Instantaneous batch size per device = 24 [INFO|trainer.py:1611] 2023-02-14 21:47:19,649 >> Total train batch size (w. 
parallel, distributed & accumulation) = 24 [INFO|trainer.py:1612] 2023-02-14 21:47:19,649 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1613] 2023-02-14 21:47:19,649 >> Total optimization steps = 667 {'loss': 0.8955, 'learning_rate': 5.0074962518740634e-06, 'epoch': 0.75} 75% 500/667 [00:58<00:19, 8.75it/s][INFO|trainer.py:2656] 2023-02-14 21:48:17,996 >> Saving model checkpoint to out/emotion/roberta_custom/checkpoint-500 [INFO|configuration_utils.py:447] 2023-02-14 21:48:17,997 >> Configuration saved in out/emotion/roberta_custom/checkpoint-500/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:48:19,015 >> Model weights saved in out/emotion/roberta_custom/checkpoint-500/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:48:19,016 >> tokenizer config file saved in out/emotion/roberta_custom/checkpoint-500/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:48:19,016 >> Special tokens file saved in out/emotion/roberta_custom/checkpoint-500/special_tokens_map.json 100% 666/667 [01:20<00:00, 8.66it/s][INFO|trainer.py:1852] 2023-02-14 21:48:40,745 >> Training completed. Do not forget to share your model on huggingface.co/models =) {'train_runtime': 81.0963, 'train_samples_per_second': 197.296, 'train_steps_per_second': 8.225, 'train_loss': 0.8004468377383573, 'epoch': 1.0} 100% 667/667 [01:21<00:00, 8.23it/s] [INFO|trainer.py:2656] 2023-02-14 21:48:40,747 >> Saving model checkpoint to out/emotion/roberta_custom [INFO|configuration_utils.py:447] 2023-02-14 21:48:40,748 >> Configuration saved in out/emotion/roberta_custom/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:48:41,796 >> Model weights saved in out/emotion/roberta_custom/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:48:41,797 >> tokenizer config file saved in out/emotion/roberta_custom/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:48:41,797 >> Special tokens file saved in out/emotion/roberta_custom/special_tokens_map.json ***** train metrics ***** epoch = 1.0 train_loss = 0.8004 train_runtime = 0:01:21.09 train_samples = 16000 train_samples_per_second = 197.296 train_steps_per_second = 8.225 INFO:__main__:*** Evaluate *** [INFO|trainer.py:725] 2023-02-14 21:48:41,898 >> The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassificationCustomAlternative.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassificationCustomAlternative.forward`, you can safely ignore this message. [INFO|trainer.py:2907] 2023-02-14 21:48:41,899 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:48:41,900 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:48:41,900 >> Batch size = 24 100% 84/84 [00:03<00:00, 23.62it/s] ***** eval metrics ***** epoch = 1.0 eval_accuracy = 0.867 eval_loss = 0.39 eval_runtime = 0:00:03.59 eval_samples = 2000 eval_samples_per_second = 555.583 eval_steps_per_second = 23.334 INFO:__main__:*** Predict *** [INFO|trainer.py:725] 2023-02-14 21:48:45,503 >> The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassificationCustomAlternative.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassificationCustomAlternative.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:48:45,504 >> ***** Running Prediction ***** [INFO|trainer.py:2909] 2023-02-14 21:48:45,504 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:48:45,504 >> Batch size = 24 100% 84/84 [00:03<00:00, 23.74it/s] INFO:__main__:***** Predict results None ***** [INFO|modelcard.py:444] 2023-02-14 21:48:49,211 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.8669999837875366}]}
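The (parameter name, requires_grad) pairs printed above show that whole encoder layers (4, 6, 8 and 10 among them) were frozen for this RoBERTa run, so only the remaining layers and the classification head were updated. A minimal sketch of how such freezing can be done with Transformers is shown below; the model name, label count and the exact frozen-layer set are illustrative assumptions, not the project's own code.

from transformers import AutoModelForSequenceClassification

# Minimal sketch (assumption, not the project's code): freeze selected encoder layers
# so only the remaining layers and the classification head receive gradient updates.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=6)

frozen_layers = {4, 6, 8, 10}  # layers that appear with requires_grad=False in the log above
for name, param in model.named_parameters():
    if any(f"encoder.layer.{i}." in name for i in frozen_layers):
        param.requires_grad = False

# Reproduce the kind of (name, requires_grad) listing printed above, frozen params only.
print([(n, p.requires_grad) for n, p in model.named_parameters() if not p.requires_grad][:5])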
GPT2
- full data
- model: GPT2
- sequence length: 128
- training epochs: 1 (with --max_steps 2500, which overrides the epoch count in the run below)
!python run_glue.py \
--cache_dir gtp_cache_training \
--model_name_or_path gpt2 \
--train_file data/train.json \
--validation_file data/valid.json \
--test_file data/test.json \
--per_device_train_batch_size 24 \
--per_device_eval_batch_size 24 \
--do_train \
--do_eval \
--do_predict \
--max_seq_length 128 \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--output_dir out/emotion/gpt2 \
--overwrite_output_dir \
--eval_steps 250 \
--evaluation_strategy steps \
--metric_for_best_model accuracy \
--logging_steps 100 \
--save_total_limit 5 \
--max_steps 2500 \
--load_best_model_at_end True
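GPT-2 ships without a padding token, so before examples can be padded to the 128-token sequence length the script reuses the EOS token as PAD; the log below confirms this with "Set PAD token to EOS: <|endoftext|>". An illustrative sketch of that handling (names and example text are assumptions, not taken from run_glue.py):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative sketch of the pad-token handling visible in the log below.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=6)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token           # reuse <|endoftext|> as PAD
    model.config.pad_token_id = tokenizer.eos_token_id  # classifier then reads the last non-pad token

batch = tokenizer(["i feel happy today"], padding="max_length", max_length=128,
                  truncation=True, return_tensors="pt")
print(batch["input_ids"].shape)  # torch.Size([1, 128])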
2023-02-14 21:48:52.605236: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-02-14 21:48:52.757779: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2023-02-14 21:48:53.540701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 21:48:53.540799: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 21:48:53.540819: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False INFO:__main__:Training/evaluation parameters TrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=250, evaluation_strategy=steps, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=False, greater_is_better=True, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=<HUB_TOKEN>, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=True, local_rank=-1, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=out/emotion/gpt2/runs/Feb14_21-48-55_fc0011e45a00, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=100, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=2500, metric_for_best_model=accuracy, mp_parameters=, no_cuda=False, num_train_epochs=1.0, optim=adamw_hf, output_dir=out/emotion/gpt2, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=24, per_device_train_batch_size=24, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, 
run_name=out/emotion/gpt2, save_on_each_node=False, save_steps=500, save_strategy=steps, save_total_limit=5, seed=42, sharded_ddp=[], skip_memory_metrics=True, tf32=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, xpu_backend=None, ) INFO:__main__:load a local file for train: data/train.json INFO:__main__:load a local file for validation: data/valid.json INFO:__main__:load a local file for test: data/test.json WARNING:datasets.builder:Using custom data configuration default-01aa9d8252a24a0d INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json INFO:datasets.builder:Generating dataset json (/content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51) Downloading and preparing dataset json/default to /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51... Downloading data files: 100% 3/3 [00:00<00:00, 12169.16it/s] INFO:datasets.download.download_manager:Downloading took 0.0 min INFO:datasets.download.download_manager:Checksum Computation took 0.0 min Extracting data files: 100% 3/3 [00:00<00:00, 2183.40it/s] INFO:datasets.utils.info_utils:Unable to verify checksums. INFO:datasets.builder:Generating train split INFO:datasets.builder:Generating validation split INFO:datasets.builder:Generating test split INFO:datasets.utils.info_utils:Unable to verify splits sizes. Dataset json downloaded and prepared to /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data. 100% 3/3 [00:00<00:00, 665.62it/s] Downloading (…)lve/main/config.json: 100% 665/665 [00:00<00:00, 125kB/s] [INFO|configuration_utils.py:653] 2023-02-14 21:48:57,052 >> loading configuration file config.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:48:57,053 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2", "3": "LABEL_3", "4": "LABEL_4", "5": "LABEL_5" }, "initializer_range": 0.02, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2, "LABEL_3": 3, "LABEL_4": 4, "LABEL_5": 5 }, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 50257 } [INFO|tokenization_auto.py:418] 2023-02-14 21:48:57,145 >> Could not locate the tokenizer configuration file, will try to use the model config instead. 
[INFO|configuration_utils.py:653] 2023-02-14 21:48:57,236 >> loading configuration file config.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:48:57,237 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "initializer_range": 0.02, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 50257 } Downloading (…)olve/main/vocab.json: 100% 1.04M/1.04M [00:00<00:00, 9.20MB/s] Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 6.19MB/s] Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 11.7MB/s] [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file vocab.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/vocab.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file merges.txt from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/merges.txt [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file tokenizer.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/tokenizer.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file tokenizer_config.json from cache at None [INFO|configuration_utils.py:653] 2023-02-14 21:48:58,447 >> loading configuration file config.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:48:58,448 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "initializer_range": 0.02, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 50257 } INFO:__main__:Using implementation from class: AutoModelForSequenceClassification Downloading (…)"pytorch_model.bin";: 100% 548M/548M [00:05<00:00, 108MB/s] [INFO|modeling_utils.py:2156] 
2023-02-14 21:49:03,784 >> loading weights file pytorch_model.bin from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/pytorch_model.bin [INFO|modeling_utils.py:2606] 2023-02-14 21:49:05,169 >> All model checkpoint weights were used when initializing GPT2ForSequenceClassification. [WARNING|modeling_utils.py:2608] 2023-02-14 21:49:05,169 >> Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. [ERROR|tokenization_utils_base.py:1019] 2023-02-14 21:49:05,177 >> Using pad_token, but it is not set yet. INFO:__main__:Set PAD token to EOS: <|endoftext|> Running tokenizer on dataset: 0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bb8faaac56c0b87e.arrow Running tokenizer on dataset: 100% 16/16 [00:00<00:00, 20.23ba/s] Running tokenizer on dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-7b339bb99d7c17a1.arrow Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 20.04ba/s] Running tokenizer on dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-82acdaa33d6aa0eb.arrow Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 20.92ba/s] INFO:__main__:Sample 10476 of the training set: {'label': 0, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [72, 466, 1064, 649, 2460, 1312, 285, 1016, 284, 1949, 3131, 1327, 284, 787, 606, 2652, 290, 611, 1312, 5409, 326, 1312, 836, 256, 765, 284, 1254, 5938, 757, 290, 655, 6594, 503, 262, 938, 614, 286, 1524, 319, 616, 898, 1312, 285, 1016, 284, 423, 284, 1949, 3131, 1327, 407, 284, 1337, 644, 661, 892, 286, 502, 852, 257, 300, 14491, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. 
INFO:__main__:Sample 1824 of the training set: {'label': 1, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [72, 1965, 606, 284, 4654, 502, 287, 4441, 257, 995, 810, 477, 614, 1468, 4813, 714, 1663, 510, 4203, 17836, 290, 3665, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. INFO:__main__:Sample 409 of the training set: {'label': 2, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [72, 1254, 618, 345, 389, 257, 18088, 1048, 345, 4729, 584, 18088, 661, 656, 534, 1204, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. [INFO|trainer.py:503] 2023-02-14 21:49:08,712 >> max_steps is given, it will override any value given in num_train_epochs [INFO|trainer.py:725] 2023-02-14 21:49:08,712 >> The following columns in the training set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. /usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( [INFO|trainer.py:1607] 2023-02-14 21:49:08,718 >> ***** Running training ***** [INFO|trainer.py:1608] 2023-02-14 21:49:08,718 >> Num examples = 16000 [INFO|trainer.py:1609] 2023-02-14 21:49:08,718 >> Num Epochs = 4 [INFO|trainer.py:1610] 2023-02-14 21:49:08,719 >> Instantaneous batch size per device = 24 [INFO|trainer.py:1611] 2023-02-14 21:49:08,719 >> Total train batch size (w. parallel, distributed & accumulation) = 24 [INFO|trainer.py:1612] 2023-02-14 21:49:08,719 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1613] 2023-02-14 21:49:08,719 >> Total optimization steps = 2500 {'loss': 2.3442, 'learning_rate': 1.9200000000000003e-05, 'epoch': 0.15} {'loss': 1.3126, 'learning_rate': 1.8400000000000003e-05, 'epoch': 0.3} 10% 250/2500 [00:37<05:31, 6.79it/s][INFO|trainer.py:725] 2023-02-14 21:49:46,426 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. [INFO|trainer.py:2907] 2023-02-14 21:49:46,428 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:49:46,428 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:49:46,428 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 4% 3/84 [00:00<00:02, 29.40it/s][A 7% 6/84 [00:00<00:03, 23.74it/s][A 11% 9/84 [00:00<00:03, 22.40it/s][A 14% 12/84 [00:00<00:03, 21.78it/s][A 18% 15/84 [00:00<00:03, 21.50it/s][A 21% 18/84 [00:00<00:03, 21.30it/s][A 25% 21/84 [00:00<00:02, 21.20it/s][A 29% 24/84 [00:01<00:02, 20.97it/s][A 32% 27/84 [00:01<00:02, 20.93it/s][A 36% 30/84 [00:01<00:02, 20.97it/s][A 39% 33/84 [00:01<00:02, 21.00it/s][A 43% 36/84 [00:01<00:02, 21.01it/s][A 46% 39/84 [00:01<00:02, 21.03it/s][A 50% 42/84 [00:01<00:01, 21.03it/s][A 54% 45/84 [00:02<00:01, 21.02it/s][A 57% 48/84 [00:02<00:01, 21.01it/s][A 61% 51/84 [00:02<00:01, 21.01it/s][A 64% 54/84 [00:02<00:01, 21.01it/s][A 68% 57/84 [00:02<00:01, 21.00it/s][A 71% 60/84 [00:02<00:01, 21.00it/s][A 75% 63/84 [00:02<00:00, 21.00it/s][A 79% 66/84 [00:03<00:00, 20.99it/s][A 82% 69/84 [00:03<00:00, 20.94it/s][A 86% 72/84 [00:03<00:00, 20.95it/s][A 89% 75/84 [00:03<00:00, 20.98it/s][A 93% 78/84 [00:03<00:00, 21.00it/s][A 96% 81/84 [00:03<00:00, 21.00it/s][A 100% 84/84 [00:03<00:00, 22.24it/s][A {'eval_loss': 0.7983964085578918, 'eval_accuracy': 0.7465000152587891, 'eval_runtime': 3.9877, 'eval_samples_per_second': 501.548, 'eval_steps_per_second': 21.065, 'epoch': 0.37} 10% 250/2500 [00:41<05:31, 6.79it/s] {'loss': 0.7216, 'learning_rate': 1.76e-05, 'epoch': 0.45} {'loss': 0.5032, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.6} {'loss': 0.3904, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.75} 20% 500/2500 [01:18<04:56, 6.74it/s][INFO|trainer.py:725] 2023-02-14 21:50:27,312 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:50:27,314 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:50:27,314 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:50:27,314 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 27.77it/s][A 8% 7/84 [00:00<00:03, 23.71it/s][A 12% 10/84 [00:00<00:03, 22.34it/s][A 15% 13/84 [00:00<00:03, 21.72it/s][A 19% 16/84 [00:00<00:03, 21.40it/s][A 23% 19/84 [00:00<00:03, 21.09it/s][A 26% 22/84 [00:01<00:02, 21.01it/s][A 30% 25/84 [00:01<00:02, 20.95it/s][A 33% 28/84 [00:01<00:02, 20.92it/s][A 37% 31/84 [00:01<00:02, 20.87it/s][A 40% 34/84 [00:01<00:02, 20.91it/s][A 44% 37/84 [00:01<00:02, 20.95it/s][A 48% 40/84 [00:01<00:02, 20.91it/s][A 51% 43/84 [00:02<00:01, 20.96it/s][A 55% 46/84 [00:02<00:01, 20.82it/s][A 58% 49/84 [00:02<00:01, 20.87it/s][A 62% 52/84 [00:02<00:01, 20.90it/s][A 65% 55/84 [00:02<00:01, 20.94it/s][A 69% 58/84 [00:02<00:01, 20.97it/s][A 73% 61/84 [00:02<00:01, 21.01it/s][A 76% 64/84 [00:03<00:00, 21.01it/s][A 80% 67/84 [00:03<00:00, 21.01it/s][A 83% 70/84 [00:03<00:00, 21.03it/s][A 87% 73/84 [00:03<00:00, 21.02it/s][A 90% 76/84 [00:03<00:00, 21.00it/s][A 94% 79/84 [00:03<00:00, 21.02it/s][A 98% 82/84 [00:03<00:00, 21.00it/s][A {'eval_loss': 0.29131895303726196, 'eval_accuracy': 0.9035000205039978, 'eval_runtime': 3.9922, 'eval_samples_per_second': 500.974, 'eval_steps_per_second': 21.041, 'epoch': 0.75} 20% 500/2500 [01:22<04:56, 6.74it/s] [A[INFO|trainer.py:2656] 2023-02-14 21:50:31,307 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-500 [INFO|configuration_utils.py:447] 2023-02-14 21:50:31,308 >> Configuration saved in out/emotion/gpt2/checkpoint-500/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:50:32,356 >> Model weights saved in out/emotion/gpt2/checkpoint-500/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:50:32,357 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-500/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:50:32,357 >> Special tokens file saved in out/emotion/gpt2/checkpoint-500/special_tokens_map.json {'loss': 0.3554, 'learning_rate': 1.5200000000000002e-05, 'epoch': 0.9} {'loss': 0.2871, 'learning_rate': 1.4400000000000001e-05, 'epoch': 1.05} 30% 750/2500 [02:02<04:19, 6.74it/s][INFO|trainer.py:725] 2023-02-14 21:51:11,104 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:51:11,106 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:51:11,106 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:51:11,106 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 27.92it/s][A 8% 7/84 [00:00<00:03, 23.90it/s][A 12% 10/84 [00:00<00:03, 22.57it/s][A 15% 13/84 [00:00<00:03, 21.98it/s][A 19% 16/84 [00:00<00:03, 21.63it/s][A 23% 19/84 [00:00<00:03, 21.40it/s][A 26% 22/84 [00:00<00:02, 21.31it/s][A 30% 25/84 [00:01<00:02, 21.22it/s][A 33% 28/84 [00:01<00:02, 21.17it/s][A 37% 31/84 [00:01<00:02, 21.12it/s][A 40% 34/84 [00:01<00:02, 21.03it/s][A 44% 37/84 [00:01<00:02, 21.03it/s][A 48% 40/84 [00:01<00:02, 21.02it/s][A 51% 43/84 [00:01<00:01, 21.04it/s][A 55% 46/84 [00:02<00:01, 21.04it/s][A 58% 49/84 [00:02<00:01, 21.07it/s][A 62% 52/84 [00:02<00:01, 21.07it/s][A 65% 55/84 [00:02<00:01, 21.00it/s][A 69% 58/84 [00:02<00:01, 21.03it/s][A 73% 61/84 [00:02<00:01, 21.03it/s][A 76% 64/84 [00:02<00:00, 21.04it/s][A 80% 67/84 [00:03<00:00, 21.04it/s][A 83% 70/84 [00:03<00:00, 21.06it/s][A 87% 73/84 [00:03<00:00, 21.04it/s][A 90% 76/84 [00:03<00:00, 21.04it/s][A 94% 79/84 [00:03<00:00, 21.05it/s][A 98% 82/84 [00:03<00:00, 21.06it/s][A {'eval_loss': 0.2168988287448883, 'eval_accuracy': 0.9235000014305115, 'eval_runtime': 3.9688, 'eval_samples_per_second': 503.925, 'eval_steps_per_second': 21.165, 'epoch': 1.12} 30% 750/2500 [02:06<04:19, 6.74it/s] {'loss': 0.2285, 'learning_rate': 1.3600000000000002e-05, 'epoch': 1.2} {'loss': 0.1888, 'learning_rate': 1.2800000000000001e-05, 'epoch': 1.35} {'loss': 0.2106, 'learning_rate': 1.2e-05, 'epoch': 1.5} 40% 1000/2500 [02:43<03:41, 6.78it/s][INFO|trainer.py:725] 2023-02-14 21:51:51,748 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:51:51,749 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:51:51,750 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:51:51,750 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 28.08it/s][A 8% 7/84 [00:00<00:03, 23.96it/s][A 12% 10/84 [00:00<00:03, 22.63it/s][A 15% 13/84 [00:00<00:03, 21.99it/s][A 19% 16/84 [00:00<00:03, 21.68it/s][A 23% 19/84 [00:00<00:03, 21.48it/s][A 26% 22/84 [00:00<00:02, 21.32it/s][A 30% 25/84 [00:01<00:02, 21.23it/s][A 33% 28/84 [00:01<00:02, 21.15it/s][A 37% 31/84 [00:01<00:02, 21.10it/s][A 40% 34/84 [00:01<00:02, 21.08it/s][A 44% 37/84 [00:01<00:02, 21.08it/s][A 48% 40/84 [00:01<00:02, 21.07it/s][A 51% 43/84 [00:01<00:01, 21.05it/s][A 55% 46/84 [00:02<00:01, 21.05it/s][A 58% 49/84 [00:02<00:01, 21.04it/s][A 62% 52/84 [00:02<00:01, 21.02it/s][A 65% 55/84 [00:02<00:01, 21.03it/s][A 69% 58/84 [00:02<00:01, 21.04it/s][A 73% 61/84 [00:02<00:01, 21.04it/s][A 76% 64/84 [00:02<00:00, 21.03it/s][A 80% 67/84 [00:03<00:00, 21.05it/s][A 83% 70/84 [00:03<00:00, 21.06it/s][A 87% 73/84 [00:03<00:00, 21.07it/s][A 90% 76/84 [00:03<00:00, 21.06it/s][A 94% 79/84 [00:03<00:00, 21.07it/s][A 98% 82/84 [00:03<00:00, 21.08it/s][A {'eval_loss': 0.19490236043930054, 'eval_accuracy': 0.9259999990463257, 'eval_runtime': 3.9658, 'eval_samples_per_second': 504.311, 'eval_steps_per_second': 21.181, 'epoch': 1.5} 40% 1000/2500 [02:46<03:41, 6.78it/s] [A[INFO|trainer.py:2656] 2023-02-14 21:51:55,716 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-1000 [INFO|configuration_utils.py:447] 2023-02-14 21:51:55,717 >> Configuration saved in out/emotion/gpt2/checkpoint-1000/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:51:56,708 >> Model weights saved in out/emotion/gpt2/checkpoint-1000/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:51:56,709 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-1000/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:51:56,709 >> Special tokens file saved in out/emotion/gpt2/checkpoint-1000/special_tokens_map.json {'loss': 0.1906, 'learning_rate': 1.1200000000000001e-05, 'epoch': 1.65} {'loss': 0.1793, 'learning_rate': 1.04e-05, 'epoch': 1.8} 50% 1250/2500 [03:26<03:04, 6.76it/s][INFO|trainer.py:725] 2023-02-14 21:52:35,220 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:52:35,222 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:52:35,222 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:52:35,222 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 27.99it/s][A 8% 7/84 [00:00<00:03, 23.91it/s][A 12% 10/84 [00:00<00:03, 22.61it/s][A 15% 13/84 [00:00<00:03, 22.00it/s][A 19% 16/84 [00:00<00:03, 21.66it/s][A 23% 19/84 [00:00<00:03, 21.45it/s][A 26% 22/84 [00:00<00:02, 21.34it/s][A 30% 25/84 [00:01<00:02, 21.26it/s][A 33% 28/84 [00:01<00:02, 21.21it/s][A 37% 31/84 [00:01<00:02, 21.17it/s][A 40% 34/84 [00:01<00:02, 21.14it/s][A 44% 37/84 [00:01<00:02, 21.11it/s][A 48% 40/84 [00:01<00:02, 21.12it/s][A 51% 43/84 [00:01<00:01, 21.11it/s][A 55% 46/84 [00:02<00:01, 21.10it/s][A 58% 49/84 [00:02<00:01, 21.09it/s][A 62% 52/84 [00:02<00:01, 21.10it/s][A 65% 55/84 [00:02<00:01, 21.09it/s][A 69% 58/84 [00:02<00:01, 21.06it/s][A 73% 61/84 [00:02<00:01, 21.08it/s][A 76% 64/84 [00:02<00:00, 21.09it/s][A 80% 67/84 [00:03<00:00, 21.09it/s][A 83% 70/84 [00:03<00:00, 21.04it/s][A 87% 73/84 [00:03<00:00, 21.06it/s][A 90% 76/84 [00:03<00:00, 21.08it/s][A 94% 79/84 [00:03<00:00, 21.07it/s][A 98% 82/84 [00:03<00:00, 21.08it/s][A {'eval_loss': 0.1607103943824768, 'eval_accuracy': 0.9319999814033508, 'eval_runtime': 3.9612, 'eval_samples_per_second': 504.895, 'eval_steps_per_second': 21.206, 'epoch': 1.87} 50% 1250/2500 [03:30<03:04, 6.76it/s] {'loss': 0.2116, 'learning_rate': 9.600000000000001e-06, 'epoch': 1.95} {'loss': 0.1536, 'learning_rate': 8.8e-06, 'epoch': 2.1} {'loss': 0.1518, 'learning_rate': 8.000000000000001e-06, 'epoch': 2.25} 60% 1500/2500 [04:07<02:26, 6.82it/s][INFO|trainer.py:725] 2023-02-14 21:53:15,831 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:53:15,833 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:53:15,833 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:53:15,833 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 28.10it/s][A 8% 7/84 [00:00<00:03, 23.90it/s][A 12% 10/84 [00:00<00:03, 22.58it/s][A 15% 13/84 [00:00<00:03, 21.85it/s][A 19% 16/84 [00:00<00:03, 21.53it/s][A 23% 19/84 [00:00<00:03, 21.37it/s][A 26% 22/84 [00:01<00:02, 21.27it/s][A 30% 25/84 [00:01<00:02, 21.19it/s][A 33% 28/84 [00:01<00:02, 21.13it/s][A 37% 31/84 [00:01<00:02, 21.11it/s][A 40% 34/84 [00:01<00:02, 21.04it/s][A 44% 37/84 [00:01<00:02, 20.94it/s][A 48% 40/84 [00:01<00:02, 20.94it/s][A 51% 43/84 [00:02<00:01, 20.94it/s][A 55% 46/84 [00:02<00:01, 20.97it/s][A 58% 49/84 [00:02<00:01, 20.97it/s][A 62% 52/84 [00:02<00:01, 20.98it/s][A 65% 55/84 [00:02<00:01, 20.93it/s][A 69% 58/84 [00:02<00:01, 20.94it/s][A 73% 61/84 [00:02<00:01, 20.98it/s][A 76% 64/84 [00:03<00:00, 20.97it/s][A 80% 67/84 [00:03<00:00, 20.99it/s][A 83% 70/84 [00:03<00:00, 21.02it/s][A 87% 73/84 [00:03<00:00, 21.05it/s][A 90% 76/84 [00:03<00:00, 21.04it/s][A 94% 79/84 [00:03<00:00, 21.05it/s][A 98% 82/84 [00:03<00:00, 21.04it/s][A {'eval_loss': 0.160899356007576, 'eval_accuracy': 0.9330000281333923, 'eval_runtime': 3.9773, 'eval_samples_per_second': 502.855, 'eval_steps_per_second': 21.12, 'epoch': 2.25} 60% 1500/2500 [04:11<02:26, 6.82it/s] [A[INFO|trainer.py:2656] 2023-02-14 21:53:19,811 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-1500 [INFO|configuration_utils.py:447] 2023-02-14 21:53:19,812 >> Configuration saved in out/emotion/gpt2/checkpoint-1500/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:53:21,455 >> Model weights saved in out/emotion/gpt2/checkpoint-1500/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:53:21,456 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-1500/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:53:21,456 >> Special tokens file saved in out/emotion/gpt2/checkpoint-1500/special_tokens_map.json {'loss': 0.157, 'learning_rate': 7.2000000000000005e-06, 'epoch': 2.4} {'loss': 0.141, 'learning_rate': 6.4000000000000006e-06, 'epoch': 2.55} 70% 1750/2500 [04:51<01:50, 6.80it/s][INFO|trainer.py:725] 2023-02-14 21:54:00,007 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:54:00,009 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:54:00,009 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:54:00,009 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 27.89it/s][A 8% 7/84 [00:00<00:03, 23.82it/s][A 12% 10/84 [00:00<00:03, 22.49it/s][A 15% 13/84 [00:00<00:03, 21.85it/s][A 19% 16/84 [00:00<00:03, 21.48it/s][A 23% 19/84 [00:00<00:03, 21.31it/s][A 26% 22/84 [00:01<00:02, 21.20it/s][A 30% 25/84 [00:01<00:02, 21.09it/s][A 33% 28/84 [00:01<00:02, 21.00it/s][A 37% 31/84 [00:01<00:02, 20.99it/s][A 40% 34/84 [00:01<00:02, 21.00it/s][A 44% 37/84 [00:01<00:02, 20.98it/s][A 48% 40/84 [00:01<00:02, 20.98it/s][A 51% 43/84 [00:02<00:01, 21.01it/s][A 55% 46/84 [00:02<00:01, 21.02it/s][A 58% 49/84 [00:02<00:01, 21.02it/s][A 62% 52/84 [00:02<00:01, 21.01it/s][A 65% 55/84 [00:02<00:01, 21.00it/s][A 69% 58/84 [00:02<00:01, 21.02it/s][A 73% 61/84 [00:02<00:01, 21.01it/s][A 76% 64/84 [00:03<00:00, 21.03it/s][A 80% 67/84 [00:03<00:00, 21.05it/s][A 83% 70/84 [00:03<00:00, 21.05it/s][A 87% 73/84 [00:03<00:00, 21.07it/s][A 90% 76/84 [00:03<00:00, 21.07it/s][A 94% 79/84 [00:03<00:00, 21.06it/s][A 98% 82/84 [00:03<00:00, 21.07it/s][A {'eval_loss': 0.15204769372940063, 'eval_accuracy': 0.9319999814033508, 'eval_runtime': 3.9769, 'eval_samples_per_second': 502.901, 'eval_steps_per_second': 21.122, 'epoch': 2.62} 70% 1750/2500 [04:55<01:50, 6.80it/s] {'loss': 0.1426, 'learning_rate': 5.600000000000001e-06, 'epoch': 2.7} {'loss': 0.1463, 'learning_rate': 4.800000000000001e-06, 'epoch': 2.85} {'loss': 0.1403, 'learning_rate': 4.000000000000001e-06, 'epoch': 3.0} 80% 2000/2500 [05:31<01:13, 6.82it/s][INFO|trainer.py:725] 2023-02-14 21:54:40,633 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:54:40,635 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:54:40,635 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:54:40,635 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 27.95it/s][A 8% 7/84 [00:00<00:03, 23.86it/s][A 12% 10/84 [00:00<00:03, 22.54it/s][A 15% 13/84 [00:00<00:03, 21.95it/s][A 19% 16/84 [00:00<00:03, 21.60it/s][A 23% 19/84 [00:00<00:03, 21.42it/s][A 26% 22/84 [00:00<00:02, 21.29it/s][A 30% 25/84 [00:01<00:02, 21.14it/s][A 33% 28/84 [00:01<00:02, 21.10it/s][A 37% 31/84 [00:01<00:02, 21.07it/s][A 40% 34/84 [00:01<00:02, 21.08it/s][A 44% 37/84 [00:01<00:02, 21.05it/s][A 48% 40/84 [00:01<00:02, 21.06it/s][A 51% 43/84 [00:01<00:01, 21.04it/s][A 55% 46/84 [00:02<00:01, 21.00it/s][A 58% 49/84 [00:02<00:01, 21.00it/s][A 62% 52/84 [00:02<00:01, 21.00it/s][A 65% 55/84 [00:02<00:01, 20.96it/s][A 69% 58/84 [00:02<00:01, 20.97it/s][A 73% 61/84 [00:02<00:01, 20.96it/s][A 76% 64/84 [00:03<00:00, 20.97it/s][A 80% 67/84 [00:03<00:00, 20.94it/s][A 83% 70/84 [00:03<00:00, 20.95it/s][A 87% 73/84 [00:03<00:00, 20.95it/s][A 90% 76/84 [00:03<00:00, 21.00it/s][A 94% 79/84 [00:03<00:00, 21.00it/s][A 98% 82/84 [00:03<00:00, 21.02it/s][A {'eval_loss': 0.14609387516975403, 'eval_accuracy': 0.9290000200271606, 'eval_runtime': 3.9774, 'eval_samples_per_second': 502.846, 'eval_steps_per_second': 21.12, 'epoch': 3.0} 80% 2000/2500 [05:35<01:13, 6.82it/s] [A[INFO|trainer.py:2656] 2023-02-14 21:54:44,614 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-2000 [INFO|configuration_utils.py:447] 2023-02-14 21:54:44,615 >> Configuration saved in out/emotion/gpt2/checkpoint-2000/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:54:46,838 >> Model weights saved in out/emotion/gpt2/checkpoint-2000/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:54:46,839 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-2000/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:54:46,839 >> Special tokens file saved in out/emotion/gpt2/checkpoint-2000/special_tokens_map.json {'loss': 0.1256, 'learning_rate': 3.2000000000000003e-06, 'epoch': 3.15} {'loss': 0.1246, 'learning_rate': 2.4000000000000003e-06, 'epoch': 3.3} 90% 2250/2500 [06:16<00:36, 6.76it/s][INFO|trainer.py:725] 2023-02-14 21:55:25,309 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:55:25,311 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:55:25,311 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:55:25,311 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 27.89it/s][A 8% 7/84 [00:00<00:03, 23.86it/s][A 12% 10/84 [00:00<00:03, 22.52it/s][A 15% 13/84 [00:00<00:03, 21.87it/s][A 19% 16/84 [00:00<00:03, 21.57it/s][A 23% 19/84 [00:00<00:03, 21.40it/s][A 26% 22/84 [00:01<00:02, 21.29it/s][A 30% 25/84 [00:01<00:02, 21.22it/s][A 33% 28/84 [00:01<00:02, 21.18it/s][A 37% 31/84 [00:01<00:02, 21.15it/s][A 40% 34/84 [00:01<00:02, 21.14it/s][A 44% 37/84 [00:01<00:02, 21.12it/s][A 48% 40/84 [00:01<00:02, 21.10it/s][A 51% 43/84 [00:01<00:01, 21.09it/s][A 55% 46/84 [00:02<00:01, 21.09it/s][A 58% 49/84 [00:02<00:01, 21.10it/s][A 62% 52/84 [00:02<00:01, 21.10it/s][A 65% 55/84 [00:02<00:01, 21.10it/s][A 69% 58/84 [00:02<00:01, 21.10it/s][A 73% 61/84 [00:02<00:01, 21.06it/s][A 76% 64/84 [00:02<00:00, 21.06it/s][A 80% 67/84 [00:03<00:00, 21.07it/s][A 83% 70/84 [00:03<00:00, 21.07it/s][A 87% 73/84 [00:03<00:00, 21.06it/s][A 90% 76/84 [00:03<00:00, 21.00it/s][A 94% 79/84 [00:03<00:00, 21.02it/s][A 98% 82/84 [00:03<00:00, 21.01it/s][A {'eval_loss': 0.15553689002990723, 'eval_accuracy': 0.9294999837875366, 'eval_runtime': 3.967, 'eval_samples_per_second': 504.158, 'eval_steps_per_second': 21.175, 'epoch': 3.37} 90% 2250/2500 [06:20<00:36, 6.76it/s] {'loss': 0.1174, 'learning_rate': 1.6000000000000001e-06, 'epoch': 3.45} {'loss': 0.1374, 'learning_rate': 8.000000000000001e-07, 'epoch': 3.6} {'loss': 0.1207, 'learning_rate': 0.0, 'epoch': 3.75} 100% 2500/2500 [06:57<00:00, 6.82it/s][INFO|trainer.py:725] 2023-02-14 21:56:05,969 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. 
[INFO|trainer.py:2907] 2023-02-14 21:56:05,971 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:56:05,971 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:56:05,971 >> Batch size = 24 0% 0/84 [00:00<?, ?it/s][A 5% 4/84 [00:00<00:02, 27.94it/s][A 8% 7/84 [00:00<00:03, 23.89it/s][A 12% 10/84 [00:00<00:03, 22.60it/s][A 15% 13/84 [00:00<00:03, 21.97it/s][A 19% 16/84 [00:00<00:03, 21.57it/s][A 23% 19/84 [00:00<00:03, 21.34it/s][A 26% 22/84 [00:01<00:02, 21.23it/s][A 30% 25/84 [00:01<00:02, 21.12it/s][A 33% 28/84 [00:01<00:02, 21.09it/s][A 37% 31/84 [00:01<00:02, 21.09it/s][A 40% 34/84 [00:01<00:02, 21.07it/s][A 44% 37/84 [00:01<00:02, 21.06it/s][A 48% 40/84 [00:01<00:02, 21.01it/s][A 51% 43/84 [00:02<00:01, 21.03it/s][A 55% 46/84 [00:02<00:01, 21.02it/s][A 58% 49/84 [00:02<00:01, 20.97it/s][A 62% 52/84 [00:02<00:01, 20.45it/s][A 65% 55/84 [00:02<00:01, 20.64it/s][A 69% 58/84 [00:02<00:01, 20.77it/s][A 73% 61/84 [00:02<00:01, 20.84it/s][A 76% 64/84 [00:03<00:00, 20.92it/s][A 80% 67/84 [00:03<00:00, 20.97it/s][A 83% 70/84 [00:03<00:00, 20.99it/s][A 87% 73/84 [00:03<00:00, 21.02it/s][A 90% 76/84 [00:03<00:00, 21.03it/s][A 94% 79/84 [00:03<00:00, 21.04it/s][A 98% 82/84 [00:03<00:00, 21.05it/s][A {'eval_loss': 0.15162073075771332, 'eval_accuracy': 0.9309999942779541, 'eval_runtime': 3.9841, 'eval_samples_per_second': 501.992, 'eval_steps_per_second': 21.084, 'epoch': 3.75} 100% 2500/2500 [07:01<00:00, 6.82it/s] [A[INFO|trainer.py:2656] 2023-02-14 21:56:09,956 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-2500 [INFO|configuration_utils.py:447] 2023-02-14 21:56:09,957 >> Configuration saved in out/emotion/gpt2/checkpoint-2500/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:56:10,953 >> Model weights saved in out/emotion/gpt2/checkpoint-2500/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:56:10,954 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-2500/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:56:10,954 >> Special tokens file saved in out/emotion/gpt2/checkpoint-2500/special_tokens_map.json [INFO|trainer.py:1852] 2023-02-14 21:56:12,777 >> Training completed. Do not forget to share your model on huggingface.co/models =) [INFO|trainer.py:1946] 2023-02-14 21:56:12,778 >> Loading best model from out/emotion/gpt2/checkpoint-1500 (score: 0.9330000281333923). 
{'train_runtime': 424.4983, 'train_samples_per_second': 141.343, 'train_steps_per_second': 5.889, 'train_loss': 0.351297896194458, 'epoch': 3.75} 100% 2500/2500 [07:04<00:00, 5.89it/s] [INFO|trainer.py:2656] 2023-02-14 21:56:13,218 >> Saving model checkpoint to out/emotion/gpt2 [INFO|configuration_utils.py:447] 2023-02-14 21:56:13,220 >> Configuration saved in out/emotion/gpt2/config.json [INFO|modeling_utils.py:1624] 2023-02-14 21:56:14,063 >> Model weights saved in out/emotion/gpt2/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 21:56:14,064 >> tokenizer config file saved in out/emotion/gpt2/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 21:56:14,064 >> Special tokens file saved in out/emotion/gpt2/special_tokens_map.json ***** train metrics ***** epoch = 3.75 train_loss = 0.3513 train_runtime = 0:07:04.49 train_samples = 16000 train_samples_per_second = 141.343 train_steps_per_second = 5.889 INFO:__main__:*** Evaluate *** [INFO|trainer.py:725] 2023-02-14 21:56:14,169 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. [INFO|trainer.py:2907] 2023-02-14 21:56:14,170 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 21:56:14,170 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:56:14,170 >> Batch size = 24 100% 84/84 [00:03<00:00, 21.20it/s] ***** eval metrics ***** epoch = 3.75 eval_accuracy = 0.933 eval_loss = 0.1609 eval_runtime = 0:00:04.02 eval_samples = 2000 eval_samples_per_second = 497.496 eval_steps_per_second = 20.895 INFO:__main__:*** Predict *** [INFO|trainer.py:725] 2023-02-14 21:56:18,194 >> The following columns in the test set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`, you can safely ignore this message. [INFO|trainer.py:2907] 2023-02-14 21:56:18,195 >> ***** Running Prediction ***** [INFO|trainer.py:2909] 2023-02-14 21:56:18,195 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 21:56:18,195 >> Batch size = 24 100% 84/84 [00:03<00:00, 21.40it/s] INFO:__main__:***** Predict results None ***** [INFO|modelcard.py:444] 2023-02-14 21:56:22,304 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9330000281333923}]}
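With the best checkpoint (eval accuracy ≈ 0.933) saved to out/emotion/gpt2, the model can be loaded back for prediction. A hypothetical smoke test is sketched below; the id-to-emotion mapping is taken from the dataset's label list (0 = sadness … 5 = surprise), since the saved config only stores the generic LABEL_0…LABEL_5 names.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical smoke test of the fine-tuned checkpoint; the label mapping is an
# assumption based on the dataset description, not read from the saved config.
labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]

tokenizer = AutoTokenizer.from_pretrained("out/emotion/gpt2")
model = AutoModelForSequenceClassification.from_pretrained("out/emotion/gpt2")
model.eval()

inputs = tokenizer("i feel hopeful and powerful", return_tensors="pt")
with torch.no_grad():
    pred = model(**inputs).logits.argmax(dim=-1).item()
print(labels[pred])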
- full dataset
- custom classification head (GPT2ForSequenceClassificationCustom)
!python run_glue.py \
--cache_dir gtp_custom_cache_training \
--model_name_or_path gpt2 \
--custom_model gpt2_custom \
--train_file data/train.json \
--validation_file data/valid.json \
--test_file data/test.json \
--per_device_train_batch_size 24 \
--per_device_eval_batch_size 24 \
--do_train \
--do_eval \
--do_predict \
--max_seq_length 128 \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--output_dir out/emotion/gpt2_custom \
--overwrite_output_dir \
--eval_steps 250 \
--evaluation_strategy steps \
--metric_for_best_model accuracy \
--logging_steps 100 \
--save_total_limit 5 \
--max_steps 2500 \
--load_best_model_at_end True
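Here --custom_model gpt2_custom selects GPT2ForSequenceClassificationCustom, which swaps the stock single-layer score projection for a deeper head. The exact definition lives in the project's run_glue.py; a rough sketch consistent with the parameter names reported in the log below (score.dense_1_input, score.dense_1_hidden, score.dense_2, score.out_proj) could look like the following, where the hidden size, activation and optional hidden-states branch are assumptions.

import torch.nn as nn

# Rough sketch only: a multi-layer classification head matching the parameter names
# logged below. The real GPT2ForSequenceClassificationCustom is defined in run_glue.py;
# sizes, activation and the extra hidden-states branch are assumptions.
class CustomClassificationHead(nn.Module):
    def __init__(self, hidden_size: int = 768, num_labels: int = 6):
        super().__init__()
        self.dense_1_input = nn.Linear(hidden_size, hidden_size)   # branch fed by the last hidden state
        self.dense_1_hidden = nn.Linear(hidden_size, hidden_size)  # branch for extra hidden states
        self.dense_2 = nn.Linear(hidden_size, hidden_size)
        self.out_proj = nn.Linear(hidden_size, num_labels)
        self.act = nn.ReLU()

    def forward(self, features, extra_hidden=None):
        x = self.act(self.dense_1_input(features))
        if extra_hidden is not None:  # log below reports "Using hidden states in model: False" for this run
            x = x + self.act(self.dense_1_hidden(extra_hidden))
        x = self.act(self.dense_2(x))
        return self.out_proj(x)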
2023-02-14 21:56:25.884599: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-02-14 21:56:26.040127: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2023-02-14 21:56:26.823479: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 21:56:26.823595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 21:56:26.823615: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False INFO:__main__:Training/evaluation parameters TrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=250, evaluation_strategy=steps, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=False, greater_is_better=True, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=<HUB_TOKEN>, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=True, local_rank=-1, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=out/emotion/gpt2_custom/runs/Feb14_21-56-28_fc0011e45a00, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=100, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=2500, metric_for_best_model=accuracy, mp_parameters=, no_cuda=False, num_train_epochs=1.0, optim=adamw_hf, output_dir=out/emotion/gpt2_custom, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=24, per_device_train_batch_size=24, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], 
resume_from_checkpoint=None, run_name=out/emotion/gpt2_custom, save_on_each_node=False, save_steps=500, save_strategy=steps, save_total_limit=5, seed=42, sharded_ddp=[], skip_memory_metrics=True, tf32=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, xpu_backend=None, ) INFO:__main__:load a local file for train: data/train.json INFO:__main__:load a local file for validation: data/valid.json INFO:__main__:load a local file for test: data/test.json WARNING:datasets.builder:Using custom data configuration default-01aa9d8252a24a0d INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json INFO:datasets.builder:Generating dataset json (/content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51) Downloading and preparing dataset json/default to /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51... Downloading data files: 100% 3/3 [00:00<00:00, 14138.10it/s] INFO:datasets.download.download_manager:Downloading took 0.0 min INFO:datasets.download.download_manager:Checksum Computation took 0.0 min Extracting data files: 100% 3/3 [00:00<00:00, 2175.09it/s] INFO:datasets.utils.info_utils:Unable to verify checksums. INFO:datasets.builder:Generating train split INFO:datasets.builder:Generating validation split INFO:datasets.builder:Generating test split INFO:datasets.utils.info_utils:Unable to verify splits sizes. Dataset json downloaded and prepared to /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data. 100% 3/3 [00:00<00:00, 672.49it/s] Downloading (…)lve/main/config.json: 100% 665/665 [00:00<00:00, 123kB/s] [INFO|configuration_utils.py:653] 2023-02-14 21:56:30,068 >> loading configuration file config.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:56:30,068 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2", "3": "LABEL_3", "4": "LABEL_4", "5": "LABEL_5" }, "initializer_range": 0.02, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2, "LABEL_3": 3, "LABEL_4": 4, "LABEL_5": 5 }, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 50257 } [INFO|tokenization_auto.py:418] 2023-02-14 21:56:30,162 >> Could not locate the tokenizer configuration file, will try to use the model config instead. 
[INFO|configuration_utils.py:653] 2023-02-14 21:56:30,251 >> loading configuration file config.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:56:30,252 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "initializer_range": 0.02, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 50257 } Downloading (…)olve/main/vocab.json: 100% 1.04M/1.04M [00:00<00:00, 9.18MB/s] Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 4.90MB/s] Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 14.3MB/s] [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file vocab.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/vocab.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file merges.txt from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/merges.txt [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file tokenizer.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/tokenizer.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file tokenizer_config.json from cache at None [INFO|configuration_utils.py:653] 2023-02-14 21:56:31,525 >> loading configuration file config.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json [INFO|configuration_utils.py:705] 2023-02-14 21:56:31,526 >> Model config GPT2Config { "_name_or_path": "gpt2", "activation_function": "gelu_new", "architectures": [ "GPT2LMHeadModel" ], "attn_pdrop": 0.1, "bos_token_id": 50256, "embd_pdrop": 0.1, "eos_token_id": 50256, "initializer_range": 0.02, "layer_norm_epsilon": 1e-05, "model_type": "gpt2", "n_ctx": 1024, "n_embd": 768, "n_head": 12, "n_inner": null, "n_layer": 12, "n_positions": 1024, "reorder_and_upcast_attn": false, "resid_pdrop": 0.1, "scale_attn_by_inverse_layer_idx": false, "scale_attn_weights": true, "summary_activation": null, "summary_first_dropout": 0.1, "summary_proj_to_labels": true, "summary_type": "cls_index", "summary_use_proj": true, "task_specific_params": { "text-generation": { "do_sample": true, "max_length": 50 } }, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 50257 } INFO:__main__:Using hidden states in model: False INFO:__main__:Using implementation from class: GPT2ForSequenceClassificationCustom Downloading 
(…)"pytorch_model.bin";: 100% 548M/548M [00:05<00:00, 108MB/s] [INFO|modeling_utils.py:2156] 2023-02-14 21:56:36,895 >> loading weights file pytorch_model.bin from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/pytorch_model.bin [INFO|modeling_utils.py:2606] 2023-02-14 21:56:39,410 >> All model checkpoint weights were used when initializing GPT2ForSequenceClassificationCustom. [WARNING|modeling_utils.py:2608] 2023-02-14 21:56:39,410 >> Some weights of GPT2ForSequenceClassificationCustom were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.dense_1_hidden.bias', 'score.dense_1_input.weight', 'score.dense_2.bias', 'score.dense_2.weight', 'score.out_proj.weight', 'score.dense_1_hidden.weight', 'score.dense_1_input.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. [ERROR|tokenization_utils_base.py:1019] 2023-02-14 21:56:39,418 >> Using pad_token, but it is not set yet. INFO:__main__:Set PAD token to EOS: <|endoftext|> Running tokenizer on dataset: 0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bb8faaac56c0b87e.arrow Running tokenizer on dataset: 100% 16/16 [00:00<00:00, 19.61ba/s] Running tokenizer on dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-7b339bb99d7c17a1.arrow Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 20.48ba/s] Running tokenizer on dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-82acdaa33d6aa0eb.arrow Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 7.71ba/s] INFO:__main__:Sample 10476 of the training set: {'label': 0, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [72, 466, 1064, 649, 2460, 1312, 285, 1016, 284, 1949, 3131, 1327, 284, 787, 606, 2652, 290, 611, 1312, 5409, 326, 1312, 836, 256, 765, 284, 1254, 5938, 757, 290, 655, 6594, 503, 262, 938, 614, 286, 1524, 319, 616, 898, 1312, 285, 1016, 284, 423, 284, 1949, 3131, 1327, 407, 284, 1337, 644, 661, 892, 286, 502, 852, 257, 300, 14491, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. INFO:__main__:Sample 1824 of the training set: {'label': 1, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [72, 1965, 606, 284, 4654, 502, 287, 4441, 257, 995, 810, 477, 614, 1468, 4813, 714, 1663, 510, 4203, 17836, 290, 3665, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. INFO:__main__:Sample 409 of the training set: {'label': 2, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [72, 1254, 618, 345, 389, 257, 18088, 1048, 345, 4729, 584, 18088, 661, 656, 534, 1204, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}. [INFO|trainer.py:503] 2023-02-14 21:56:42,941 >> max_steps is given, it will override any value given in num_train_epochs [INFO|trainer.py:725] 2023-02-14 21:56:42,941 >> The following columns in the training set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`, you can safely ignore this message. 
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn(
[INFO|trainer.py:1607] 2023-02-14 21:56:42,947 >> ***** Running training *****
[INFO|trainer.py:1608] 2023-02-14 21:56:42,947 >> Num examples = 16000
[INFO|trainer.py:1609] 2023-02-14 21:56:42,947 >> Num Epochs = 4
[INFO|trainer.py:1610] 2023-02-14 21:56:42,947 >> Instantaneous batch size per device = 24
[INFO|trainer.py:1611] 2023-02-14 21:56:42,947 >> Total train batch size (w. parallel, distributed & accumulation) = 24
[INFO|trainer.py:1612] 2023-02-14 21:56:42,947 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1613] 2023-02-14 21:56:42,947 >> Total optimization steps = 2500
{'loss': 1.6218, 'learning_rate': 1.9200000000000003e-05, 'epoch': 0.15}
{'loss': 1.1593, 'learning_rate': 1.8400000000000003e-05, 'epoch': 0.3}
[INFO|trainer.py:725] 2023-02-14 21:57:22,025 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`, you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:57:22,027 >> ***** Running Evaluation ***** Num examples = 2000, Batch size = 24
{'eval_loss': 0.6981180310249329, 'eval_accuracy': 0.7329999804496765, 'eval_runtime': 4.1201, 'eval_samples_per_second': 485.426, 'eval_steps_per_second': 20.388, 'epoch': 0.37}
{'loss': 0.8016, 'learning_rate': 1.76e-05, 'epoch': 0.45}
{'loss': 0.5481, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.6}
{'loss': 0.4045, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.75}
{'eval_loss': 0.29522550106048584, 'eval_accuracy': 0.9100000262260437, 'eval_runtime': 4.1309, 'eval_samples_per_second': 484.153, 'eval_steps_per_second': 20.334, 'epoch': 0.75}
[INFO|trainer.py:2656] 2023-02-14 21:58:08,380 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-500
{'loss': 0.356, 'learning_rate': 1.5200000000000002e-05, 'epoch': 0.9}
{'loss': 0.2714, 'learning_rate': 1.4400000000000001e-05, 'epoch': 1.05}
{'eval_loss': 0.22870442271232605, 'eval_accuracy': 0.9200000166893005, 'eval_runtime': 4.1118, 'eval_samples_per_second': 486.403, 'eval_steps_per_second': 20.429, 'epoch': 1.12}
{'loss': 0.2332, 'learning_rate': 1.3600000000000002e-05, 'epoch': 1.2}
{'loss': 0.2135, 'learning_rate': 1.2800000000000001e-05, 'epoch': 1.35}
{'loss': 0.2283, 'learning_rate': 1.2e-05, 'epoch': 1.5}
{'eval_loss': 0.16501356661319733, 'eval_accuracy': 0.9319999814033508, 'eval_runtime': 4.1217, 'eval_samples_per_second': 485.232, 'eval_steps_per_second': 20.38, 'epoch': 1.5}
[INFO|trainer.py:2656] 2023-02-14 21:59:36,293 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-1000
{'loss': 0.1836, 'learning_rate': 1.1200000000000001e-05, 'epoch': 1.65}
{'loss': 0.1844, 'learning_rate': 1.04e-05, 'epoch': 1.8}
{'eval_loss': 0.15909001231193542, 'eval_accuracy': 0.9355000257492065, 'eval_runtime': 4.1177, 'eval_samples_per_second': 485.712, 'eval_steps_per_second': 20.4, 'epoch': 1.87}
{'loss': 0.2181, 'learning_rate': 9.600000000000001e-06, 'epoch': 1.95}
{'loss': 0.1695, 'learning_rate': 8.8e-06, 'epoch': 2.1}
{'loss': 0.1683, 'learning_rate': 8.000000000000001e-06, 'epoch': 2.25}
{'eval_loss': 0.1472882628440857, 'eval_accuracy': 0.934499979019165, 'eval_runtime': 4.13, 'eval_samples_per_second': 484.258, 'eval_steps_per_second': 20.339, 'epoch': 2.25}
[INFO|trainer.py:2656] 2023-02-14 22:01:04,119 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-1500
{'loss': 0.1497, 'learning_rate': 7.2000000000000005e-06, 'epoch': 2.4}
{'loss': 0.1496, 'learning_rate': 6.4000000000000006e-06, 'epoch': 2.55}
{'eval_loss': 0.14743593335151672, 'eval_accuracy': 0.9359999895095825, 'eval_runtime': 4.1413, 'eval_samples_per_second': 482.944, 'eval_steps_per_second': 20.284, 'epoch': 2.62}
{'loss': 0.1465, 'learning_rate': 5.600000000000001e-06, 'epoch': 2.7}
{'loss': 0.1376, 'learning_rate': 4.800000000000001e-06, 'epoch': 2.85}
{'loss': 0.1444, 'learning_rate': 4.000000000000001e-06, 'epoch': 3.0}
{'eval_loss': 0.14364145696163177, 'eval_accuracy': 0.9365000128746033, 'eval_runtime': 4.1279, 'eval_samples_per_second': 484.505, 'eval_steps_per_second': 20.349, 'epoch': 3.0}
[INFO|trainer.py:2656] 2023-02-14 22:02:31,975 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-2000
{'loss': 0.104, 'learning_rate': 3.2000000000000003e-06, 'epoch': 3.15}
{'loss': 0.1206, 'learning_rate': 2.4000000000000003e-06, 'epoch': 3.3}
{'eval_loss': 0.15543130040168762, 'eval_accuracy': 0.9369999766349792, 'eval_runtime': 4.1171, 'eval_samples_per_second': 485.782, 'eval_steps_per_second': 20.403, 'epoch': 3.37}
{'loss': 0.1289, 'learning_rate': 1.6000000000000001e-06, 'epoch': 3.45}
{'loss': 0.1231, 'learning_rate': 8.000000000000001e-07, 'epoch': 3.6}
{'loss': 0.1179, 'learning_rate': 0.0, 'epoch': 3.75}
{'eval_loss': 0.14437170326709747, 'eval_accuracy': 0.9350000023841858, 'eval_runtime': 4.116, 'eval_samples_per_second': 485.915, 'eval_steps_per_second': 20.408, 'epoch': 3.75}
[INFO|trainer.py:2656] 2023-02-14 22:03:59,822 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-2500
[INFO|trainer.py:1852] 2023-02-14 22:04:02,582 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|trainer.py:1946] 2023-02-14 22:04:02,582 >> Loading best model from out/emotion/gpt2_custom/checkpoint-2000 (score: 0.9365000128746033).
{'train_runtime': 440.0758, 'train_samples_per_second': 136.34, 'train_steps_per_second': 5.681, 'train_loss': 0.32335229415893557, 'epoch': 3.75} 100% 2500/2500 [07:20<00:00, 5.68it/s] [INFO|trainer.py:2656] 2023-02-14 22:04:03,025 >> Saving model checkpoint to out/emotion/gpt2_custom [INFO|configuration_utils.py:447] 2023-02-14 22:04:03,026 >> Configuration saved in out/emotion/gpt2_custom/config.json [INFO|modeling_utils.py:1624] 2023-02-14 22:04:03,965 >> Model weights saved in out/emotion/gpt2_custom/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 22:04:03,966 >> tokenizer config file saved in out/emotion/gpt2_custom/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 22:04:03,966 >> Special tokens file saved in out/emotion/gpt2_custom/special_tokens_map.json ***** train metrics ***** epoch = 3.75 train_loss = 0.3234 train_runtime = 0:07:20.07 train_samples = 16000 train_samples_per_second = 136.34 train_steps_per_second = 5.681 INFO:__main__:*** Evaluate *** [INFO|trainer.py:725] 2023-02-14 22:04:04,068 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`, you can safely ignore this message. [INFO|trainer.py:2907] 2023-02-14 22:04:04,069 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:04:04,069 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:04:04,070 >> Batch size = 24 100% 84/84 [00:04<00:00, 20.35it/s] ***** eval metrics ***** epoch = 3.75 eval_accuracy = 0.9365 eval_loss = 0.1436 eval_runtime = 0:00:04.18 eval_samples = 2000 eval_samples_per_second = 477.778 eval_steps_per_second = 20.067 INFO:__main__:*** Predict *** [INFO|trainer.py:725] 2023-02-14 22:04:08,259 >> The following columns in the test set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`, you can safely ignore this message. [INFO|trainer.py:2907] 2023-02-14 22:04:08,260 >> ***** Running Prediction ***** [INFO|trainer.py:2909] 2023-02-14 22:04:08,260 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:04:08,260 >> Batch size = 24 100% 84/84 [00:04<00:00, 20.62it/s] INFO:__main__:***** Predict results None ***** [INFO|modelcard.py:444] 2023-02-14 22:04:12,537 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9365000128746033}]}
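The eval_accuracy values reported above come from an accuracy metric computed over argmax predictions. A minimal sketch of such a compute_metrics function, assuming the adapted run_glue-style script wires it into the Trainer (an illustration, not the project's verbatim code):

import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is (logits, labels); accuracy is computed on argmax class ids
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)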
T5
- full data
- model: T5 (google/t5-v1_1-small)
- sequence length: 128
- training epochs: 1
- first few layers frozen (see the sketch below)
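The "first few layers frozen" setting corresponds to the Frozen layers list printed later in the training log (encoder blocks 1-7 of google/t5-v1_1-small). A minimal sketch of how such freezing can be done by clearing requires_grad before training; an illustration under that assumption, not the project's exact code:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-small")

# Freeze encoder blocks 1..7; the embeddings, encoder block 0 and the decoder stay trainable.
for name, param in model.named_parameters():
    if any(name.startswith(f"encoder.block.{i}.") for i in range(1, 8)):
        param.requires_grad = False

# Mirrors the "Frozen layers: [...]" line that the training script logs.
frozen = [(name, param.requires_grad) for name, param in model.named_parameters() if not param.requires_grad]
print("Frozen layers:", frozen)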
!python run_translation.py \
--cache_dir t5_cache_training \
--model_name_or_path "google/t5-v1_1-small" \
--train_file data/s2s-train.json \
--validation_file data/s2s-valid.json \
--test_file data/s2s-test.json \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--source_lang "text" \
--target_lang "label" \
--source_prefix "emotion classification" \
--max_source_length 256 \
--max_target_length 128 \
--generation_max_length 128 \
--do_train \
--do_eval \
--do_predict \
--predict_with_generate \
--num_train_epochs 1 \
--output_dir out/emotion/t5_v1_1 \
--overwrite_output_dir \
--eval_steps 250 \
--evaluation_strategy steps \
--metric_for_best_model accuracy \
--logging_steps 100 \
--save_total_limit 5 \
--max_steps 2500 \
--load_best_model_at_end True
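With --source_lang "text", --target_lang "label" and the "emotion classification: " prefix reported in the training log below, each JSON record is presented to T5 as a prefixed source string whose target is the label text. A minimal preprocessing sketch under that assumption (the record below is hypothetical; the real label strings come from data/s2s-train.json):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-small")

record = {"text": "i feel when you are a caring person you attract other caring people into your life",
          "label": "love"}  # hypothetical label string

source = "emotion classification: " + record["text"]
model_inputs = tokenizer(source, max_length=256, truncation=True)
targets = tokenizer(text_target=record["label"], max_length=128, truncation=True)
model_inputs["labels"] = targets["input_ids"]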
2023-02-14 22:04:17.129470: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-02-14 22:04:17.281426: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2023-02-14 22:04:18.087509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 22:04:18.087605: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia 2023-02-14 22:04:18.087624: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly. WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False INFO:__main__:Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=250, evaluation_strategy=steps, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_max_length=128, generation_num_beams=None, gradient_accumulation_steps=1, gradient_checkpointing=False, greater_is_better=True, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=<HUB_TOKEN>, ignore_data_skip=False, include_inputs_for_metrics=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=True, local_rank=-1, log_level=passive, log_level_replica=passive, log_on_each_node=True, logging_dir=out/emotion/t5_v1_1/runs/Feb14_22-04-20_fc0011e45a00, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=100, logging_strategy=steps, lr_scheduler_type=linear, max_grad_norm=1.0, max_steps=2500, metric_for_best_model=accuracy, mp_parameters=, no_cuda=False, num_train_epochs=1.0, optim=adamw_hf, output_dir=out/emotion/t5_v1_1, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=8, predict_with_generate=True, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=<PUSH_TO_HUB_TOKEN>, ray_scope=last, 
remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=out/emotion/t5_v1_1, save_on_each_node=False, save_steps=500, save_strategy=steps, save_total_limit=5, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, xpu_backend=None, ) WARNING:datasets.builder:Using custom data configuration default-a82ca4164dba097e INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json INFO:datasets.builder:Generating dataset json (/content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51) Downloading and preparing dataset json/default to /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51... Downloading data files: 100% 3/3 [00:00<00:00, 11848.32it/s] INFO:datasets.download.download_manager:Downloading took 0.0 min INFO:datasets.download.download_manager:Checksum Computation took 0.0 min Extracting data files: 100% 3/3 [00:00<00:00, 2097.85it/s] INFO:datasets.utils.info_utils:Unable to verify checksums. INFO:datasets.builder:Generating train split INFO:datasets.builder:Generating validation split INFO:datasets.builder:Generating test split INFO:datasets.utils.info_utils:Unable to verify splits sizes. Dataset json downloaded and prepared to /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data. 100% 3/3 [00:00<00:00, 953.83it/s] Downloading (…)lve/main/config.json: 100% 537/537 [00:00<00:00, 97.0kB/s] [INFO|configuration_utils.py:653] 2023-02-14 22:04:20,972 >> loading configuration file config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json [INFO|configuration_utils.py:705] 2023-02-14 22:04:20,975 >> Model config T5Config { "_name_or_path": "google/t5-v1_1-small", "architectures": [ "T5ForConditionalGeneration" ], "d_ff": 1024, "d_kv": 64, "d_model": 512, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, "is_encoder_decoder": true, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 8, "num_heads": 6, "num_layers": 8, "output_past": true, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 32128 } Downloading (…)okenizer_config.json: 100% 1.86k/1.86k [00:00<00:00, 853kB/s] [INFO|configuration_utils.py:653] 2023-02-14 22:04:21,160 >> loading configuration file config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json [INFO|configuration_utils.py:705] 2023-02-14 22:04:21,160 >> Model config T5Config { "_name_or_path": "google/t5-v1_1-small", "architectures": [ "T5ForConditionalGeneration" ], "d_ff": 1024, "d_kv": 64, "d_model": 512, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, 
"is_encoder_decoder": true, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 8, "num_heads": 6, "num_layers": 8, "output_past": true, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 32128 } Downloading (…)ve/main/spiece.model: 100% 792k/792k [00:00<00:00, 10.2MB/s] Downloading (…)cial_tokens_map.json: 100% 1.79k/1.79k [00:00<00:00, 705kB/s] [INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file spiece.model from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/spiece.model [INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file tokenizer.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file special_tokens_map.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/special_tokens_map.json [INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file tokenizer_config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/tokenizer_config.json [INFO|configuration_utils.py:653] 2023-02-14 22:04:21,838 >> loading configuration file config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json [INFO|configuration_utils.py:705] 2023-02-14 22:04:21,838 >> Model config T5Config { "_name_or_path": "google/t5-v1_1-small", "architectures": [ "T5ForConditionalGeneration" ], "d_ff": 1024, "d_kv": 64, "d_model": 512, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, "is_encoder_decoder": true, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 8, "num_heads": 6, "num_layers": 8, "output_past": true, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 32128 } [INFO|configuration_utils.py:653] 2023-02-14 22:04:21,888 >> loading configuration file config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json [INFO|configuration_utils.py:705] 2023-02-14 22:04:21,889 >> Model config T5Config { "_name_or_path": "google/t5-v1_1-small", "architectures": [ "T5ForConditionalGeneration" ], "d_ff": 1024, "d_kv": 64, "d_model": 512, "decoder_start_token_id": 0, "dense_act_fn": "gelu_new", "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, "is_encoder_decoder": true, "is_gated_act": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 8, "num_heads": 6, "num_layers": 8, "output_past": true, "pad_token_id": 0, "relative_attention_max_distance": 128, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "transformers_version": "4.23.1", "use_cache": true, "vocab_size": 32128 } Downloading (…)"pytorch_model.bin";: 100% 308M/308M [00:03<00:00, 84.8MB/s] [INFO|modeling_utils.py:2156] 2023-02-14 22:04:26,050 >> 
loading weights file pytorch_model.bin from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/pytorch_model.bin [INFO|modeling_utils.py:2606] 2023-02-14 22:04:27,048 >> All model checkpoint weights were used when initializing T5ForConditionalGeneration. [INFO|modeling_utils.py:2614] 2023-02-14 22:04:27,048 >> All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at google/t5-v1_1-small. If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training. Frozen layers: [('encoder.block.1.layer.0.SelfAttention.q.weight', False), ('encoder.block.1.layer.0.SelfAttention.k.weight', False), ('encoder.block.1.layer.0.SelfAttention.v.weight', False), ('encoder.block.1.layer.0.SelfAttention.o.weight', False), ('encoder.block.1.layer.0.layer_norm.weight', False), ('encoder.block.1.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.1.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.1.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.1.layer.1.layer_norm.weight', False), ('encoder.block.2.layer.0.SelfAttention.q.weight', False), ('encoder.block.2.layer.0.SelfAttention.k.weight', False), ('encoder.block.2.layer.0.SelfAttention.v.weight', False), ('encoder.block.2.layer.0.SelfAttention.o.weight', False), ('encoder.block.2.layer.0.layer_norm.weight', False), ('encoder.block.2.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.2.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.2.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.2.layer.1.layer_norm.weight', False), ('encoder.block.3.layer.0.SelfAttention.q.weight', False), ('encoder.block.3.layer.0.SelfAttention.k.weight', False), ('encoder.block.3.layer.0.SelfAttention.v.weight', False), ('encoder.block.3.layer.0.SelfAttention.o.weight', False), ('encoder.block.3.layer.0.layer_norm.weight', False), ('encoder.block.3.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.3.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.3.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.3.layer.1.layer_norm.weight', False), ('encoder.block.4.layer.0.SelfAttention.q.weight', False), ('encoder.block.4.layer.0.SelfAttention.k.weight', False), ('encoder.block.4.layer.0.SelfAttention.v.weight', False), ('encoder.block.4.layer.0.SelfAttention.o.weight', False), ('encoder.block.4.layer.0.layer_norm.weight', False), ('encoder.block.4.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.4.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.4.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.4.layer.1.layer_norm.weight', False), ('encoder.block.5.layer.0.SelfAttention.q.weight', False), ('encoder.block.5.layer.0.SelfAttention.k.weight', False), ('encoder.block.5.layer.0.SelfAttention.v.weight', False), ('encoder.block.5.layer.0.SelfAttention.o.weight', False), ('encoder.block.5.layer.0.layer_norm.weight', False), ('encoder.block.5.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.5.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.5.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.5.layer.1.layer_norm.weight', False), ('encoder.block.6.layer.0.SelfAttention.q.weight', False), ('encoder.block.6.layer.0.SelfAttention.k.weight', False), ('encoder.block.6.layer.0.SelfAttention.v.weight', False), ('encoder.block.6.layer.0.SelfAttention.o.weight', 
False), ('encoder.block.6.layer.0.layer_norm.weight', False), ('encoder.block.6.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.6.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.6.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.6.layer.1.layer_norm.weight', False), ('encoder.block.7.layer.0.SelfAttention.q.weight', False), ('encoder.block.7.layer.0.SelfAttention.k.weight', False), ('encoder.block.7.layer.0.SelfAttention.v.weight', False), ('encoder.block.7.layer.0.SelfAttention.o.weight', False), ('encoder.block.7.layer.0.layer_norm.weight', False), ('encoder.block.7.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.7.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.7.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.7.layer.1.layer_norm.weight', False)] INFO:__main__:Using translation prefix: "emotion classification: " Running tokenizer on train dataset: 0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-fa17416eabe18767.arrow Running tokenizer on train dataset: 100% 16/16 [00:00<00:00, 23.64ba/s] Running tokenizer on validation dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-c6cebbf9290f7df0.arrow Running tokenizer on validation dataset: 100% 2/2 [00:00<00:00, 33.01ba/s] Running tokenizer on prediction dataset: 0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-988bff0993eee389.arrow Running tokenizer on prediction dataset: 100% 2/2 [00:00<00:00, 33.06ba/s] [INFO|trainer.py:503] 2023-02-14 22:04:30,902 >> max_steps is given, it will override any value given in num_train_epochs /usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( [INFO|trainer.py:1607] 2023-02-14 22:04:30,911 >> ***** Running training ***** [INFO|trainer.py:1608] 2023-02-14 22:04:30,911 >> Num examples = 16000 [INFO|trainer.py:1609] 2023-02-14 22:04:30,911 >> Num Epochs = 2 [INFO|trainer.py:1610] 2023-02-14 22:04:30,911 >> Instantaneous batch size per device = 8 [INFO|trainer.py:1611] 2023-02-14 22:04:30,911 >> Total train batch size (w. parallel, distributed & accumulation) = 8 [INFO|trainer.py:1612] 2023-02-14 22:04:30,911 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1613] 2023-02-14 22:04:30,911 >> Total optimization steps = 2500 0% 0/2500 [00:00<?, ?it/s][WARNING|logging.py:281] 2023-02-14 22:04:30,925 >> You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding. 
{'loss': 21.5908, 'learning_rate': 4.8e-05, 'epoch': 0.05}
{'loss': 14.8264, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.1}
[INFO|trainer.py:2907] 2023-02-14 22:04:55,366 >> ***** Running Evaluation ***** Num examples = 2000, Batch size = 8
{'eval_loss': 9.001160621643066, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2175, 'eval_samples_per_second': 116.161, 'eval_steps_per_second': 14.52, 'epoch': 0.12}
{'loss': 10.5792, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.15}
{'loss': 7.8113, 'learning_rate': 4.2e-05, 'epoch': 0.2}
{'loss': 5.2658, 'learning_rate': 4e-05, 'epoch': 0.25}
[INFO|trainer.py:2907] 2023-02-14 22:05:35,963 >> ***** Running Evaluation ***** Num examples = 2000, Batch size = 8
100% 250/250 [00:16<00:00,
14.79it/s][A {'eval_loss': 2.1697170734405518, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.1551, 'eval_samples_per_second': 116.584, 'eval_steps_per_second': 14.573, 'epoch': 0.25} 20% 500/2500 [01:22<03:04, 10.83it/s] [A[INFO|trainer.py:2656] 2023-02-14 22:05:53,119 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-500 [INFO|configuration_utils.py:447] 2023-02-14 22:05:53,120 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-500/config.json [INFO|modeling_utils.py:1624] 2023-02-14 22:05:53,749 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-500/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 22:05:53,750 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-500/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 22:05:53,750 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-500/special_tokens_map.json [INFO|tokenization_t5_fast.py:187] 2023-02-14 22:05:53,788 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-500/spiece.model {'loss': 3.7795, 'learning_rate': 3.8e-05, 'epoch': 0.3} {'loss': 2.9169, 'learning_rate': 3.6e-05, 'epoch': 0.35} 30% 749/2500 [01:47<02:43, 10.71it/s][INFO|trainer.py:2907] 2023-02-14 22:06:18,135 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:06:18,136 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:06:18,136 >> Batch size = 8 0% 0/250 [00:00<?, ?it/s][A 1% 3/250 [00:00<00:11, 21.21it/s][A 2% 6/250 [00:00<00:14, 16.54it/s][A 3% 8/250 [00:00<00:15, 15.62it/s][A 4% 10/250 [00:00<00:15, 15.04it/s][A 5% 12/250 [00:00<00:16, 14.78it/s][A 6% 14/250 [00:00<00:16, 14.60it/s][A 6% 16/250 [00:01<00:16, 14.53it/s][A 7% 18/250 [00:01<00:16, 14.44it/s][A 8% 20/250 [00:01<00:15, 14.51it/s][A 9% 22/250 [00:01<00:15, 14.57it/s][A 10% 24/250 [00:01<00:15, 14.56it/s][A 10% 26/250 [00:01<00:15, 14.65it/s][A 11% 28/250 [00:01<00:15, 14.64it/s][A 12% 30/250 [00:02<00:15, 14.66it/s][A 13% 32/250 [00:02<00:14, 14.63it/s][A 14% 34/250 [00:02<00:14, 14.67it/s][A 14% 36/250 [00:02<00:14, 14.64it/s][A 15% 38/250 [00:02<00:14, 14.60it/s][A 16% 40/250 [00:02<00:14, 14.58it/s][A 17% 42/250 [00:02<00:14, 14.59it/s][A 18% 44/250 [00:02<00:14, 14.65it/s][A 18% 46/250 [00:03<00:13, 14.69it/s][A 19% 48/250 [00:03<00:13, 14.78it/s][A 20% 50/250 [00:03<00:13, 14.85it/s][A 21% 52/250 [00:03<00:13, 14.84it/s][A 22% 54/250 [00:03<00:13, 14.80it/s][A 22% 56/250 [00:03<00:13, 14.77it/s][A 23% 58/250 [00:03<00:12, 14.77it/s][A 24% 60/250 [00:04<00:12, 14.81it/s][A 25% 62/250 [00:04<00:12, 14.78it/s][A 26% 64/250 [00:04<00:12, 14.76it/s][A 26% 66/250 [00:04<00:12, 14.71it/s][A 27% 68/250 [00:04<00:12, 14.73it/s][A 28% 70/250 [00:04<00:12, 14.66it/s][A 29% 72/250 [00:04<00:12, 14.69it/s][A 30% 74/250 [00:05<00:12, 14.64it/s][A 30% 76/250 [00:05<00:11, 14.70it/s][A 31% 78/250 [00:05<00:11, 14.70it/s][A 32% 80/250 [00:05<00:11, 14.76it/s][A 33% 82/250 [00:05<00:11, 14.76it/s][A 34% 84/250 [00:05<00:11, 14.71it/s][A 34% 86/250 [00:05<00:11, 14.74it/s][A 35% 88/250 [00:05<00:10, 14.76it/s][A 36% 90/250 [00:06<00:10, 14.69it/s][A 37% 92/250 [00:06<00:10, 14.71it/s][A 38% 94/250 [00:06<00:10, 14.75it/s][A 38% 96/250 [00:06<00:10, 14.72it/s][A 39% 98/250 [00:06<00:10, 14.70it/s][A 40% 100/250 [00:06<00:10, 14.68it/s][A 41% 102/250 [00:06<00:10, 14.69it/s][A 42% 104/250 [00:07<00:09, 14.72it/s][A 42% 106/250 [00:07<00:09, 14.65it/s][A 43% 108/250 [00:07<00:09, 14.66it/s][A 44% 110/250 [00:07<00:09, 14.70it/s][A 45% 112/250 [00:07<00:09, 14.69it/s][A 46% 
114/250 [00:07<00:09, 14.63it/s][A 46% 116/250 [00:07<00:09, 14.69it/s][A 47% 118/250 [00:07<00:08, 14.71it/s][A 48% 120/250 [00:08<00:08, 14.59it/s][A 49% 122/250 [00:08<00:08, 14.68it/s][A 50% 124/250 [00:08<00:08, 14.68it/s][A 50% 126/250 [00:08<00:08, 14.71it/s][A 51% 128/250 [00:08<00:08, 14.73it/s][A 52% 130/250 [00:08<00:08, 14.64it/s][A 53% 132/250 [00:08<00:08, 14.70it/s][A 54% 134/250 [00:09<00:07, 14.74it/s][A 54% 136/250 [00:09<00:07, 14.41it/s][A 55% 138/250 [00:09<00:07, 14.46it/s][A 56% 140/250 [00:09<00:07, 14.51it/s][A 57% 142/250 [00:09<00:07, 14.60it/s][A 58% 144/250 [00:09<00:07, 14.50it/s][A 58% 146/250 [00:09<00:07, 14.53it/s][A 59% 148/250 [00:10<00:07, 14.55it/s][A 60% 150/250 [00:10<00:06, 14.53it/s][A 61% 152/250 [00:10<00:06, 14.48it/s][A 62% 154/250 [00:10<00:06, 14.60it/s][A 62% 156/250 [00:10<00:06, 14.54it/s][A 63% 158/250 [00:10<00:06, 14.46it/s][A 64% 160/250 [00:10<00:06, 14.42it/s][A 65% 162/250 [00:11<00:06, 14.38it/s][A 66% 164/250 [00:11<00:05, 14.38it/s][A 66% 166/250 [00:11<00:05, 14.32it/s][A 67% 168/250 [00:11<00:05, 14.33it/s][A 68% 170/250 [00:11<00:05, 14.23it/s][A 69% 172/250 [00:11<00:05, 14.23it/s][A 70% 174/250 [00:11<00:05, 14.24it/s][A 70% 176/250 [00:12<00:05, 14.21it/s][A 71% 178/250 [00:12<00:05, 14.17it/s][A 72% 180/250 [00:12<00:04, 14.16it/s][A 30% 750/2500 [01:59<02:43, 10.71it/s] 74% 184/250 [00:12<00:04, 14.30it/s][A 74% 186/250 [00:12<00:04, 14.40it/s][A 75% 188/250 [00:12<00:04, 14.40it/s][A 76% 190/250 [00:12<00:04, 14.48it/s][A 77% 192/250 [00:13<00:03, 14.58it/s][A 78% 194/250 [00:13<00:03, 14.58it/s][A 78% 196/250 [00:13<00:03, 14.56it/s][A 79% 198/250 [00:13<00:03, 14.62it/s][A 80% 200/250 [00:13<00:03, 14.69it/s][A 81% 202/250 [00:13<00:03, 14.69it/s][A 82% 204/250 [00:13<00:03, 14.68it/s][A 82% 206/250 [00:14<00:02, 14.68it/s][A 83% 208/250 [00:14<00:02, 14.68it/s][A 84% 210/250 [00:14<00:02, 14.65it/s][A 85% 212/250 [00:14<00:02, 14.72it/s][A 86% 214/250 [00:14<00:02, 14.71it/s][A 86% 216/250 [00:14<00:02, 14.68it/s][A 87% 218/250 [00:14<00:02, 14.69it/s][A 88% 220/250 [00:15<00:02, 14.75it/s][A 89% 222/250 [00:15<00:01, 14.74it/s][A 90% 224/250 [00:15<00:01, 14.76it/s][A 90% 226/250 [00:15<00:01, 14.73it/s][A 91% 228/250 [00:15<00:01, 14.82it/s][A 92% 230/250 [00:15<00:01, 14.77it/s][A 93% 232/250 [00:15<00:01, 14.75it/s][A 94% 234/250 [00:15<00:01, 14.67it/s][A 94% 236/250 [00:16<00:00, 14.65it/s][A 95% 238/250 [00:16<00:00, 14.64it/s][A 96% 240/250 [00:16<00:00, 14.60it/s][A 97% 242/250 [00:16<00:00, 14.60it/s][A 98% 244/250 [00:16<00:00, 14.26it/s][A 98% 246/250 [00:16<00:00, 14.42it/s][A 99% 248/250 [00:16<00:00, 14.45it/s][A 100% 250/250 [00:17<00:00, 14.54it/s][A {'eval_loss': 1.4527522325515747, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2954, 'eval_samples_per_second': 115.638, 'eval_steps_per_second': 14.455, 'epoch': 0.38} 30% 750/2500 [02:04<02:43, 10.71it/s] {'loss': 2.4516, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.4} {'loss': 2.2293, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.45} {'loss': 2.0123, 'learning_rate': 3e-05, 'epoch': 0.5} 40% 1000/2500 [02:27<02:21, 10.63it/s][INFO|trainer.py:2907] 2023-02-14 22:06:58,636 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:06:58,636 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:06:58,636 >> Batch size = 8 0% 0/250 [00:00<?, ?it/s][A 1% 3/250 [00:00<00:12, 20.13it/s][A 2% 6/250 [00:00<00:15, 16.26it/s][A 3% 8/250 [00:00<00:15, 15.45it/s][A 4% 10/250 [00:00<00:15, 
15.09it/s][A 5% 12/250 [00:00<00:16, 14.85it/s][A 6% 14/250 [00:00<00:16, 14.66it/s][A 6% 16/250 [00:01<00:16, 14.56it/s][A 7% 18/250 [00:01<00:15, 14.65it/s][A 8% 20/250 [00:01<00:15, 14.77it/s][A 9% 22/250 [00:01<00:15, 14.88it/s][A 10% 24/250 [00:01<00:15, 14.83it/s][A 10% 26/250 [00:01<00:14, 14.94it/s][A 11% 28/250 [00:01<00:14, 14.94it/s][A 12% 30/250 [00:01<00:14, 14.96it/s][A 13% 32/250 [00:02<00:14, 14.80it/s][A 14% 34/250 [00:02<00:14, 14.82it/s][A 14% 36/250 [00:02<00:14, 14.73it/s][A 15% 38/250 [00:02<00:14, 14.59it/s][A 16% 40/250 [00:02<00:14, 14.47it/s][A 17% 42/250 [00:02<00:14, 14.47it/s][A 18% 44/250 [00:02<00:14, 14.53it/s][A 18% 46/250 [00:03<00:14, 14.19it/s][A 19% 48/250 [00:03<00:13, 14.44it/s][A 20% 50/250 [00:03<00:13, 14.54it/s][A 21% 52/250 [00:03<00:13, 14.56it/s][A 22% 54/250 [00:03<00:13, 14.64it/s][A 22% 56/250 [00:03<00:13, 14.70it/s][A 23% 58/250 [00:03<00:13, 14.71it/s][A 24% 60/250 [00:04<00:12, 14.77it/s][A 25% 62/250 [00:04<00:12, 14.80it/s][A 26% 64/250 [00:04<00:12, 14.79it/s][A 26% 66/250 [00:04<00:12, 14.79it/s][A 27% 68/250 [00:04<00:12, 14.83it/s][A 28% 70/250 [00:04<00:12, 14.89it/s][A 29% 72/250 [00:04<00:11, 14.88it/s][A 30% 74/250 [00:04<00:11, 14.83it/s][A 30% 76/250 [00:05<00:11, 14.83it/s][A 31% 78/250 [00:05<00:11, 14.83it/s][A 32% 80/250 [00:05<00:11, 14.81it/s][A 33% 82/250 [00:05<00:11, 14.78it/s][A 34% 84/250 [00:05<00:11, 14.78it/s][A 34% 86/250 [00:05<00:11, 14.85it/s][A 35% 88/250 [00:05<00:10, 14.79it/s][A 36% 90/250 [00:06<00:10, 14.68it/s][A 37% 92/250 [00:06<00:10, 14.71it/s][A 38% 94/250 [00:06<00:10, 14.76it/s][A 38% 96/250 [00:06<00:10, 14.70it/s][A 39% 98/250 [00:06<00:10, 14.74it/s][A 40% 100/250 [00:06<00:10, 14.72it/s][A 41% 102/250 [00:06<00:10, 14.76it/s][A 42% 104/250 [00:07<00:09, 14.79it/s][A 42% 106/250 [00:07<00:09, 14.72it/s][A 43% 108/250 [00:07<00:09, 14.81it/s][A 44% 110/250 [00:07<00:09, 14.84it/s][A 45% 112/250 [00:07<00:09, 14.83it/s][A 46% 114/250 [00:07<00:09, 14.82it/s][A 46% 116/250 [00:07<00:09, 14.85it/s][A 47% 118/250 [00:07<00:08, 14.85it/s][A 48% 120/250 [00:08<00:08, 14.80it/s][A 49% 122/250 [00:08<00:08, 14.85it/s][A 50% 124/250 [00:08<00:08, 14.87it/s][A 50% 126/250 [00:08<00:08, 14.88it/s][A 51% 128/250 [00:08<00:08, 14.78it/s][A 52% 130/250 [00:08<00:08, 14.78it/s][A 53% 132/250 [00:08<00:07, 14.81it/s][A 54% 134/250 [00:09<00:07, 14.79it/s][A 54% 136/250 [00:09<00:07, 14.77it/s][A 55% 138/250 [00:09<00:07, 14.77it/s][A 56% 140/250 [00:09<00:07, 14.81it/s][A 57% 142/250 [00:09<00:07, 14.84it/s][A 58% 144/250 [00:09<00:07, 14.84it/s][A 58% 146/250 [00:09<00:07, 14.83it/s][A 59% 148/250 [00:09<00:06, 14.83it/s][A 60% 150/250 [00:10<00:06, 14.74it/s][A 61% 152/250 [00:10<00:06, 14.68it/s][A 62% 154/250 [00:10<00:06, 14.76it/s][A 62% 156/250 [00:10<00:06, 14.77it/s][A 63% 158/250 [00:10<00:06, 14.77it/s][A 64% 160/250 [00:10<00:06, 14.80it/s][A 65% 162/250 [00:10<00:05, 14.70it/s][A 66% 164/250 [00:11<00:05, 14.68it/s][A 66% 166/250 [00:11<00:05, 14.62it/s][A 67% 168/250 [00:11<00:05, 14.69it/s][A 68% 170/250 [00:11<00:05, 14.75it/s][A 69% 172/250 [00:11<00:05, 14.82it/s][A 70% 174/250 [00:11<00:05, 14.87it/s][A 40% 1000/2500 [02:39<02:21, 10.63it/s] 71% 178/250 [00:12<00:04, 14.75it/s][A 72% 180/250 [00:12<00:04, 14.69it/s][A 73% 182/250 [00:12<00:04, 14.68it/s][A 74% 184/250 [00:12<00:04, 14.68it/s][A 74% 186/250 [00:12<00:04, 14.73it/s][A 75% 188/250 [00:12<00:04, 14.69it/s][A 76% 190/250 [00:12<00:04, 14.71it/s][A 77% 192/250 [00:12<00:03, 14.65it/s][A 78% 194/250 [00:13<00:03, 14.65it/s][A 78% 
196/250 [00:13<00:03, 14.61it/s][A 79% 198/250 [00:13<00:03, 14.66it/s][A 80% 200/250 [00:13<00:03, 14.63it/s][A 81% 202/250 [00:13<00:03, 14.65it/s][A 82% 204/250 [00:13<00:03, 14.66it/s][A 82% 206/250 [00:13<00:03, 14.58it/s][A 83% 208/250 [00:14<00:02, 14.63it/s][A 84% 210/250 [00:14<00:02, 14.68it/s][A 85% 212/250 [00:14<00:02, 14.65it/s][A 86% 214/250 [00:14<00:02, 14.69it/s][A 86% 216/250 [00:14<00:02, 14.72it/s][A 87% 218/250 [00:14<00:02, 14.67it/s][A 88% 220/250 [00:14<00:02, 14.74it/s][A 89% 222/250 [00:15<00:01, 14.70it/s][A 90% 224/250 [00:15<00:01, 14.64it/s][A 90% 226/250 [00:15<00:01, 14.67it/s][A 91% 228/250 [00:15<00:01, 14.70it/s][A 92% 230/250 [00:15<00:01, 14.69it/s][A 93% 232/250 [00:15<00:01, 14.76it/s][A 94% 234/250 [00:15<00:01, 14.76it/s][A 94% 236/250 [00:15<00:00, 14.73it/s][A 95% 238/250 [00:16<00:00, 14.82it/s][A 96% 240/250 [00:16<00:00, 14.87it/s][A 97% 242/250 [00:16<00:00, 14.88it/s][A 98% 244/250 [00:16<00:00, 14.90it/s][A 98% 246/250 [00:16<00:00, 14.91it/s][A 99% 248/250 [00:16<00:00, 14.90it/s][A 100% 250/250 [00:16<00:00, 14.92it/s][A {'eval_loss': 1.160749912261963, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.1471, 'eval_samples_per_second': 116.638, 'eval_steps_per_second': 14.58, 'epoch': 0.5} 40% 1000/2500 [02:44<02:21, 10.63it/s] [A[INFO|trainer.py:2656] 2023-02-14 22:07:15,784 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-1000 [INFO|configuration_utils.py:447] 2023-02-14 22:07:15,785 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-1000/config.json [INFO|modeling_utils.py:1624] 2023-02-14 22:07:16,414 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-1000/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 22:07:16,415 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-1000/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 22:07:16,416 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-1000/special_tokens_map.json [INFO|tokenization_t5_fast.py:187] 2023-02-14 22:07:16,453 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-1000/spiece.model {'loss': 1.9003, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.55} {'loss': 1.7884, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.6} 50% 1249/2500 [03:09<01:59, 10.49it/s][INFO|trainer.py:2907] 2023-02-14 22:07:40,879 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:07:40,879 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:07:40,879 >> Batch size = 8 0% 0/250 [00:00<?, ?it/s][A 1% 3/250 [00:00<00:11, 21.99it/s][A 2% 6/250 [00:00<00:14, 17.06it/s][A 3% 8/250 [00:00<00:15, 16.09it/s][A 4% 10/250 [00:00<00:15, 15.50it/s][A 5% 12/250 [00:00<00:15, 15.00it/s][A 6% 14/250 [00:00<00:15, 14.84it/s][A 6% 16/250 [00:01<00:15, 14.74it/s][A 7% 18/250 [00:01<00:15, 14.69it/s][A 8% 20/250 [00:01<00:15, 14.74it/s][A 9% 22/250 [00:01<00:15, 14.73it/s][A 10% 24/250 [00:01<00:15, 14.70it/s][A 10% 26/250 [00:01<00:15, 14.71it/s][A 11% 28/250 [00:01<00:15, 14.56it/s][A 12% 30/250 [00:01<00:15, 14.62it/s][A 13% 32/250 [00:02<00:14, 14.64it/s][A 14% 34/250 [00:02<00:14, 14.56it/s][A 14% 36/250 [00:02<00:14, 14.57it/s][A 15% 38/250 [00:02<00:14, 14.60it/s][A 16% 40/250 [00:02<00:14, 14.60it/s][A 17% 42/250 [00:02<00:14, 14.57it/s][A 18% 44/250 [00:02<00:14, 14.61it/s][A 18% 46/250 [00:03<00:13, 14.64it/s][A 19% 48/250 [00:03<00:13, 14.75it/s][A 20% 50/250 [00:03<00:13, 14.78it/s][A 21% 52/250 [00:03<00:13, 14.73it/s][A 22% 54/250 [00:03<00:13, 14.71it/s][A 
22% 56/250 [00:03<00:13, 14.68it/s][A 23% 58/250 [00:03<00:13, 14.63it/s][A 24% 60/250 [00:04<00:12, 14.74it/s][A 25% 62/250 [00:04<00:12, 14.73it/s][A 26% 64/250 [00:04<00:12, 14.68it/s][A 26% 66/250 [00:04<00:12, 14.64it/s][A 27% 68/250 [00:04<00:12, 14.65it/s][A 28% 70/250 [00:04<00:12, 14.68it/s][A 29% 72/250 [00:04<00:12, 14.29it/s][A 30% 74/250 [00:05<00:12, 14.38it/s][A 30% 76/250 [00:05<00:12, 14.47it/s][A 31% 78/250 [00:05<00:11, 14.52it/s][A 32% 80/250 [00:05<00:11, 14.64it/s][A 33% 82/250 [00:05<00:11, 14.66it/s][A 34% 84/250 [00:05<00:11, 14.64it/s][A 34% 86/250 [00:05<00:11, 14.66it/s][A 35% 88/250 [00:05<00:11, 14.72it/s][A 36% 90/250 [00:06<00:10, 14.73it/s][A 37% 92/250 [00:06<00:10, 14.69it/s][A 38% 94/250 [00:06<00:10, 14.75it/s][A 38% 96/250 [00:06<00:10, 14.69it/s][A 39% 98/250 [00:06<00:10, 14.64it/s][A 40% 100/250 [00:06<00:10, 14.67it/s][A 41% 102/250 [00:06<00:10, 14.71it/s][A 42% 104/250 [00:07<00:09, 14.75it/s][A 42% 106/250 [00:07<00:09, 14.71it/s][A 43% 108/250 [00:07<00:09, 14.80it/s][A 44% 110/250 [00:07<00:09, 14.84it/s][A 45% 112/250 [00:07<00:09, 14.73it/s][A 46% 114/250 [00:07<00:09, 14.73it/s][A 46% 116/250 [00:07<00:09, 14.67it/s][A 47% 118/250 [00:07<00:09, 14.50it/s][A 48% 120/250 [00:08<00:08, 14.51it/s][A 49% 122/250 [00:08<00:08, 14.63it/s][A 50% 124/250 [00:08<00:08, 14.69it/s][A 50% 126/250 [00:08<00:08, 14.67it/s][A 51% 128/250 [00:08<00:08, 14.62it/s][A 52% 130/250 [00:08<00:08, 14.60it/s][A 53% 132/250 [00:08<00:08, 14.59it/s][A 54% 134/250 [00:09<00:07, 14.64it/s][A 54% 136/250 [00:09<00:07, 14.65it/s][A 55% 138/250 [00:09<00:07, 14.71it/s][A 56% 140/250 [00:09<00:07, 14.67it/s][A 57% 142/250 [00:09<00:07, 14.70it/s][A 58% 144/250 [00:09<00:07, 14.67it/s][A 58% 146/250 [00:09<00:07, 14.62it/s][A 59% 148/250 [00:10<00:06, 14.65it/s][A 60% 150/250 [00:10<00:06, 14.58it/s][A 61% 152/250 [00:10<00:06, 14.55it/s][A 62% 154/250 [00:10<00:06, 14.58it/s][A 62% 156/250 [00:10<00:06, 14.57it/s][A 63% 158/250 [00:10<00:06, 14.59it/s][A 64% 160/250 [00:10<00:06, 14.66it/s][A 65% 162/250 [00:11<00:06, 14.53it/s][A 66% 164/250 [00:11<00:05, 14.72it/s][A 66% 166/250 [00:11<00:05, 14.60it/s][A 67% 168/250 [00:11<00:05, 14.52it/s][A 68% 170/250 [00:11<00:05, 14.50it/s][A 69% 172/250 [00:11<00:05, 14.49it/s][A 70% 174/250 [00:11<00:05, 14.47it/s][A 70% 176/250 [00:11<00:05, 14.37it/s][A 71% 178/250 [00:12<00:05, 14.29it/s][A 72% 180/250 [00:12<00:04, 14.27it/s][A 73% 182/250 [00:12<00:04, 14.25it/s][A 74% 184/250 [00:12<00:04, 14.27it/s][A 74% 186/250 [00:12<00:04, 14.24it/s][A 75% 188/250 [00:12<00:04, 14.18it/s][A 76% 190/250 [00:12<00:04, 14.22it/s][A 77% 192/250 [00:13<00:04, 14.16it/s][A 78% 194/250 [00:13<00:03, 14.21it/s][A 78% 196/250 [00:13<00:03, 14.22it/s][A 79% 198/250 [00:13<00:03, 14.27it/s][A 80% 200/250 [00:13<00:03, 14.28it/s][A 81% 202/250 [00:13<00:03, 14.16it/s][A 82% 204/250 [00:13<00:03, 14.06it/s][A 82% 206/250 [00:14<00:03, 14.05it/s][A 83% 208/250 [00:14<00:02, 14.06it/s][A 84% 210/250 [00:14<00:02, 14.06it/s][A 85% 212/250 [00:14<00:02, 13.87it/s][A 86% 214/250 [00:14<00:02, 14.01it/s][A 86% 216/250 [00:14<00:02, 14.22it/s][A 87% 218/250 [00:14<00:02, 14.28it/s][A 88% 220/250 [00:15<00:02, 14.42it/s][A 89% 222/250 [00:15<00:01, 14.39it/s][A 90% 224/250 [00:15<00:01, 14.35it/s][A 90% 226/250 [00:15<00:01, 14.49it/s][A 91% 228/250 [00:15<00:01, 14.57it/s][A 92% 230/250 [00:15<00:01, 14.65it/s][A 93% 232/250 [00:15<00:01, 14.74it/s][A 94% 234/250 [00:16<00:01, 14.73it/s][A 94% 236/250 [00:16<00:00, 14.74it/s][A 95% 238/250 [00:16<00:00, 
14.80it/s][A 96% 240/250 [00:16<00:00, 14.79it/s][A 97% 242/250 [00:16<00:00, 14.78it/s][A 98% 244/250 [00:16<00:00, 14.83it/s][A 98% 246/250 [00:16<00:00, 14.81it/s][A 99% 248/250 [00:16<00:00, 14.72it/s][A 100% 250/250 [00:17<00:00, 14.63it/s][A {'eval_loss': 1.0410572290420532, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.3319, 'eval_samples_per_second': 115.394, 'eval_steps_per_second': 14.424, 'epoch': 0.62} 50% 1250/2500 [03:27<01:59, 10.49it/s] {'loss': 1.7415, 'learning_rate': 2.4e-05, 'epoch': 0.65} {'loss': 1.6231, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.7} {'loss': 1.5278, 'learning_rate': 2e-05, 'epoch': 0.75} 60% 1500/2500 [03:50<01:33, 10.71it/s][INFO|trainer.py:2907] 2023-02-14 22:08:21,432 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:08:21,433 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:08:21,433 >> Batch size = 8 0% 0/250 [00:00<?, ?it/s][A 1% 3/250 [00:00<00:11, 21.79it/s][A 2% 6/250 [00:00<00:14, 16.88it/s][A 3% 8/250 [00:00<00:15, 15.94it/s][A 4% 10/250 [00:00<00:15, 15.36it/s][A 5% 12/250 [00:00<00:15, 14.98it/s][A 6% 14/250 [00:00<00:16, 14.72it/s][A 6% 16/250 [00:01<00:16, 14.47it/s][A 7% 18/250 [00:01<00:16, 14.40it/s][A 8% 20/250 [00:01<00:15, 14.48it/s][A 9% 22/250 [00:01<00:15, 14.54it/s][A 10% 24/250 [00:01<00:15, 14.57it/s][A 10% 26/250 [00:01<00:15, 14.56it/s][A 11% 28/250 [00:01<00:15, 14.52it/s][A 12% 30/250 [00:02<00:15, 14.53it/s][A 13% 32/250 [00:02<00:15, 14.51it/s][A 14% 34/250 [00:02<00:14, 14.51it/s][A 14% 36/250 [00:02<00:14, 14.53it/s][A 15% 38/250 [00:02<00:14, 14.51it/s][A 16% 40/250 [00:02<00:14, 14.52it/s][A 17% 42/250 [00:02<00:14, 14.52it/s][A 18% 44/250 [00:02<00:14, 14.53it/s][A 18% 46/250 [00:03<00:13, 14.62it/s][A 19% 48/250 [00:03<00:13, 14.58it/s][A 20% 50/250 [00:03<00:13, 14.66it/s][A 21% 52/250 [00:03<00:13, 14.70it/s][A 22% 54/250 [00:03<00:13, 14.75it/s][A 22% 56/250 [00:03<00:13, 14.69it/s][A 23% 58/250 [00:03<00:13, 14.72it/s][A 24% 60/250 [00:04<00:12, 14.72it/s][A 25% 62/250 [00:04<00:12, 14.72it/s][A 26% 64/250 [00:04<00:12, 14.66it/s][A 26% 66/250 [00:04<00:12, 14.65it/s][A 27% 68/250 [00:04<00:12, 14.72it/s][A 28% 70/250 [00:04<00:12, 14.80it/s][A 29% 72/250 [00:04<00:12, 14.80it/s][A 30% 74/250 [00:05<00:11, 14.74it/s][A 30% 76/250 [00:05<00:11, 14.77it/s][A 31% 78/250 [00:05<00:11, 14.59it/s][A 32% 80/250 [00:05<00:11, 14.69it/s][A 33% 82/250 [00:05<00:11, 14.69it/s][A 34% 84/250 [00:05<00:11, 14.67it/s][A 34% 86/250 [00:05<00:11, 14.75it/s][A 35% 88/250 [00:05<00:10, 14.80it/s][A 36% 90/250 [00:06<00:10, 14.82it/s][A 37% 92/250 [00:06<00:10, 14.80it/s][A 38% 94/250 [00:06<00:10, 14.81it/s][A 38% 96/250 [00:06<00:10, 14.78it/s][A 39% 98/250 [00:06<00:10, 14.78it/s][A 40% 100/250 [00:06<00:10, 14.73it/s][A 41% 102/250 [00:06<00:10, 14.73it/s][A 42% 104/250 [00:07<00:09, 14.81it/s][A 42% 106/250 [00:07<00:09, 14.73it/s][A 43% 108/250 [00:07<00:09, 14.74it/s][A 44% 110/250 [00:07<00:09, 14.78it/s][A 45% 112/250 [00:07<00:09, 14.73it/s][A 46% 114/250 [00:07<00:09, 14.75it/s][A 46% 116/250 [00:07<00:09, 14.80it/s][A 47% 118/250 [00:07<00:08, 14.80it/s][A 48% 120/250 [00:08<00:08, 14.79it/s][A 49% 122/250 [00:08<00:08, 14.81it/s][A 50% 124/250 [00:08<00:08, 14.76it/s][A 50% 126/250 [00:08<00:08, 14.80it/s][A 51% 128/250 [00:08<00:08, 14.80it/s][A 52% 130/250 [00:08<00:08, 14.81it/s][A 53% 132/250 [00:08<00:07, 14.82it/s][A 54% 134/250 [00:09<00:07, 14.82it/s][A 54% 136/250 [00:09<00:07, 14.70it/s][A 55% 138/250 [00:09<00:07, 14.70it/s][A 56% 
140/250 [00:09<00:07, 14.72it/s][A 57% 142/250 [00:09<00:07, 14.73it/s][A 58% 144/250 [00:09<00:07, 14.71it/s][A 58% 146/250 [00:09<00:07, 14.74it/s][A 59% 148/250 [00:10<00:06, 14.74it/s][A 60% 150/250 [00:10<00:06, 14.78it/s][A 61% 152/250 [00:10<00:06, 14.74it/s][A 62% 154/250 [00:10<00:06, 14.79it/s][A 62% 156/250 [00:10<00:06, 14.79it/s][A 63% 158/250 [00:10<00:06, 14.78it/s][A 64% 160/250 [00:10<00:06, 14.85it/s][A 65% 162/250 [00:10<00:05, 14.82it/s][A 66% 164/250 [00:11<00:05, 14.85it/s][A 66% 166/250 [00:11<00:05, 14.89it/s][A 67% 168/250 [00:11<00:05, 14.85it/s][A 68% 170/250 [00:11<00:05, 14.67it/s][A 69% 172/250 [00:11<00:05, 14.56it/s][A 70% 174/250 [00:11<00:05, 14.69it/s][A 70% 176/250 [00:11<00:05, 14.70it/s][A 71% 178/250 [00:12<00:04, 14.69it/s][A 72% 180/250 [00:12<00:04, 14.73it/s][A 73% 182/250 [00:12<00:04, 14.75it/s][A 74% 184/250 [00:12<00:04, 14.78it/s][A 74% 186/250 [00:12<00:04, 14.85it/s][A 75% 188/250 [00:12<00:04, 14.87it/s][A 76% 190/250 [00:12<00:04, 14.91it/s][A 77% 192/250 [00:12<00:03, 14.91it/s][A 78% 194/250 [00:13<00:03, 14.81it/s][A 78% 196/250 [00:13<00:03, 14.65it/s][A 79% 198/250 [00:13<00:03, 14.54it/s][A 80% 200/250 [00:13<00:03, 14.59it/s][A 81% 202/250 [00:13<00:03, 14.63it/s][A 82% 204/250 [00:13<00:03, 14.63it/s][A 82% 206/250 [00:13<00:03, 14.50it/s][A 83% 208/250 [00:14<00:02, 14.58it/s][A 84% 210/250 [00:14<00:02, 14.65it/s][A 85% 212/250 [00:14<00:02, 14.65it/s][A 86% 214/250 [00:14<00:02, 14.49it/s][A 86% 216/250 [00:14<00:02, 14.58it/s][A 87% 218/250 [00:14<00:02, 14.58it/s][A 88% 220/250 [00:14<00:02, 14.66it/s][A 89% 222/250 [00:15<00:01, 14.57it/s][A 90% 224/250 [00:15<00:01, 14.56it/s][A 90% 226/250 [00:15<00:01, 14.58it/s][A 91% 228/250 [00:15<00:01, 14.56it/s][A 92% 230/250 [00:15<00:01, 14.55it/s][A 93% 232/250 [00:15<00:01, 14.49it/s][A 94% 234/250 [00:15<00:01, 14.42it/s][A 94% 236/250 [00:16<00:00, 14.39it/s][A 95% 238/250 [00:16<00:00, 14.40it/s][A 96% 240/250 [00:16<00:00, 14.35it/s][A 97% 242/250 [00:16<00:00, 14.37it/s][A 98% 244/250 [00:16<00:00, 14.43it/s][A 98% 246/250 [00:16<00:00, 14.44it/s][A 99% 248/250 [00:16<00:00, 14.44it/s][A 100% 250/250 [00:16<00:00, 14.48it/s][A {'eval_loss': 0.9458380341529846, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.379, 'eval_samples_per_second': 115.081, 'eval_steps_per_second': 14.385, 'epoch': 0.75} 60% 1500/2500 [04:07<01:33, 10.71it/s] [A[INFO|trainer.py:2656] 2023-02-14 22:08:38,813 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-1500 [INFO|configuration_utils.py:447] 2023-02-14 22:08:38,814 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-1500/config.json [INFO|modeling_utils.py:1624] 2023-02-14 22:08:39,285 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-1500/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 22:08:39,286 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-1500/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 22:08:39,286 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-1500/special_tokens_map.json [INFO|tokenization_t5_fast.py:187] 2023-02-14 22:08:39,322 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-1500/spiece.model {'loss': 1.4835, 'learning_rate': 1.8e-05, 'epoch': 0.8} {'loss': 1.449, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.85} 70% 1749/2500 [04:32<01:10, 10.61it/s][INFO|trainer.py:2907] 2023-02-14 22:09:03,363 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:09:03,363 >> Num examples 
= 2000 [INFO|trainer.py:2912] 2023-02-14 22:09:03,363 >> Batch size = 8 0% 0/250 [00:00<?, ?it/s][A 1% 3/250 [00:00<00:11, 22.10it/s][A 2% 6/250 [00:00<00:14, 17.10it/s][A 3% 8/250 [00:00<00:14, 16.16it/s][A 4% 10/250 [00:00<00:15, 15.48it/s][A 5% 12/250 [00:00<00:15, 15.17it/s][A 6% 14/250 [00:00<00:15, 15.00it/s][A 6% 16/250 [00:01<00:15, 14.90it/s][A 7% 18/250 [00:01<00:15, 14.70it/s][A 8% 20/250 [00:01<00:15, 14.59it/s][A 9% 22/250 [00:01<00:15, 14.58it/s][A 10% 24/250 [00:01<00:15, 14.51it/s][A 10% 26/250 [00:01<00:15, 14.59it/s][A 11% 28/250 [00:01<00:15, 14.60it/s][A 12% 30/250 [00:01<00:15, 14.64it/s][A 13% 32/250 [00:02<00:15, 14.49it/s][A 14% 34/250 [00:02<00:14, 14.52it/s][A 14% 36/250 [00:02<00:14, 14.45it/s][A 15% 38/250 [00:02<00:14, 14.39it/s][A 16% 40/250 [00:02<00:14, 14.44it/s][A 17% 42/250 [00:02<00:14, 14.41it/s][A 18% 44/250 [00:02<00:14, 14.46it/s][A 18% 46/250 [00:03<00:14, 14.46it/s][A 19% 48/250 [00:03<00:13, 14.54it/s][A 20% 50/250 [00:03<00:13, 14.47it/s][A 21% 52/250 [00:03<00:13, 14.51it/s][A 22% 54/250 [00:03<00:13, 14.54it/s][A 22% 56/250 [00:03<00:13, 14.62it/s][A 23% 58/250 [00:03<00:13, 14.65it/s][A 24% 60/250 [00:04<00:12, 14.69it/s][A 25% 62/250 [00:04<00:12, 14.75it/s][A 26% 64/250 [00:04<00:12, 14.58it/s][A 26% 66/250 [00:04<00:12, 14.53it/s][A 27% 68/250 [00:04<00:12, 14.61it/s][A 28% 70/250 [00:04<00:12, 14.65it/s][A 29% 72/250 [00:04<00:12, 14.66it/s][A 30% 74/250 [00:05<00:11, 14.67it/s][A 30% 76/250 [00:05<00:11, 14.70it/s][A 31% 78/250 [00:05<00:11, 14.63it/s][A 32% 80/250 [00:05<00:11, 14.64it/s][A 33% 82/250 [00:05<00:11, 14.63it/s][A 34% 84/250 [00:05<00:11, 14.67it/s][A 34% 86/250 [00:05<00:11, 14.75it/s][A 35% 88/250 [00:05<00:10, 14.75it/s][A 36% 90/250 [00:06<00:10, 14.78it/s][A 37% 92/250 [00:06<00:10, 14.83it/s][A 38% 94/250 [00:06<00:10, 14.73it/s][A 38% 96/250 [00:06<00:10, 14.68it/s][A 39% 98/250 [00:06<00:10, 14.65it/s][A 40% 100/250 [00:06<00:10, 14.68it/s][A 41% 102/250 [00:06<00:10, 14.74it/s][A 42% 104/250 [00:07<00:09, 14.80it/s][A 42% 106/250 [00:07<00:09, 14.77it/s][A 43% 108/250 [00:07<00:09, 14.78it/s][A 44% 110/250 [00:07<00:09, 14.83it/s][A 45% 112/250 [00:07<00:09, 14.76it/s][A 46% 114/250 [00:07<00:09, 14.80it/s][A 46% 116/250 [00:07<00:09, 14.68it/s][A 47% 118/250 [00:08<00:08, 14.68it/s][A 48% 120/250 [00:08<00:08, 14.59it/s][A 49% 122/250 [00:08<00:08, 14.60it/s][A 50% 124/250 [00:08<00:08, 14.58it/s][A 50% 126/250 [00:08<00:08, 14.63it/s][A 51% 128/250 [00:08<00:08, 14.64it/s][A 52% 130/250 [00:08<00:08, 14.67it/s][A 53% 132/250 [00:08<00:08, 14.66it/s][A 54% 134/250 [00:09<00:07, 14.74it/s][A 54% 136/250 [00:09<00:07, 14.74it/s][A 55% 138/250 [00:09<00:07, 14.71it/s][A 56% 140/250 [00:09<00:07, 14.67it/s][A 57% 142/250 [00:09<00:07, 14.66it/s][A 58% 144/250 [00:09<00:07, 14.65it/s][A 58% 146/250 [00:09<00:07, 14.66it/s][A 59% 148/250 [00:10<00:06, 14.62it/s][A 60% 150/250 [00:10<00:06, 14.64it/s][A 61% 152/250 [00:10<00:06, 14.63it/s][A 62% 154/250 [00:10<00:06, 14.60it/s][A 62% 156/250 [00:10<00:06, 14.52it/s][A 63% 158/250 [00:10<00:06, 14.55it/s][A 64% 160/250 [00:10<00:06, 14.63it/s][A 65% 162/250 [00:11<00:06, 14.62it/s][A 66% 164/250 [00:11<00:05, 14.65it/s][A 66% 166/250 [00:11<00:05, 14.62it/s][A 67% 168/250 [00:11<00:05, 14.69it/s][A 68% 170/250 [00:11<00:05, 14.71it/s][A 69% 172/250 [00:11<00:05, 14.72it/s][A 70% 174/250 [00:11<00:05, 14.54it/s][A 70% 176/250 [00:11<00:05, 14.61it/s][A 71% 178/250 [00:12<00:04, 14.65it/s][A 72% 180/250 [00:12<00:04, 14.67it/s][A 73% 182/250 [00:12<00:04, 14.65it/s][A 74% 
184/250 [00:12<00:04, 14.65it/s][A 74% 186/250 [00:12<00:04, 14.66it/s][A 75% 188/250 [00:12<00:04, 14.62it/s][A 76% 190/250 [00:12<00:04, 14.69it/s][A 77% 192/250 [00:13<00:03, 14.74it/s][A 78% 194/250 [00:13<00:03, 14.81it/s][A 78% 196/250 [00:13<00:03, 14.77it/s][A 79% 198/250 [00:13<00:03, 14.79it/s][A 80% 200/250 [00:13<00:03, 14.50it/s][A 81% 202/250 [00:13<00:03, 14.42it/s][A 82% 204/250 [00:13<00:03, 14.48it/s][A 82% 206/250 [00:14<00:03, 14.52it/s][A 83% 208/250 [00:14<00:02, 14.56it/s][A 84% 210/250 [00:14<00:02, 14.53it/s][A 85% 212/250 [00:14<00:02, 14.58it/s][A 86% 214/250 [00:14<00:02, 14.61it/s][A 86% 216/250 [00:14<00:02, 14.69it/s][A 87% 218/250 [00:14<00:02, 14.70it/s][A 88% 220/250 [00:14<00:02, 14.75it/s][A 89% 222/250 [00:15<00:01, 14.70it/s][A 90% 224/250 [00:15<00:01, 14.75it/s][A 90% 226/250 [00:15<00:01, 14.71it/s][A 91% 228/250 [00:15<00:01, 14.74it/s][A 92% 230/250 [00:15<00:01, 14.68it/s][A 93% 232/250 [00:15<00:01, 14.72it/s][A 94% 234/250 [00:15<00:01, 14.71it/s][A 94% 236/250 [00:16<00:00, 14.62it/s][A 95% 238/250 [00:16<00:00, 14.59it/s][A 96% 240/250 [00:16<00:00, 14.57it/s][A 97% 242/250 [00:16<00:00, 14.61it/s][A 98% 244/250 [00:16<00:00, 14.67it/s][A 98% 246/250 [00:16<00:00, 14.60it/s][A 99% 248/250 [00:16<00:00, 14.63it/s][A 100% 250/250 [00:17<00:00, 14.66it/s][A {'eval_loss': 0.8559792637825012, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2321, 'eval_samples_per_second': 116.063, 'eval_steps_per_second': 14.508, 'epoch': 0.88} 70% 1750/2500 [04:49<01:10, 10.61it/s] {'loss': 1.4421, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.9} {'loss': 1.3835, 'learning_rate': 1.2e-05, 'epoch': 0.95} {'loss': 1.325, 'learning_rate': 1e-05, 'epoch': 1.0} 80% 2000/2500 [05:12<00:45, 10.89it/s][INFO|trainer.py:2907] 2023-02-14 22:09:43,863 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:09:43,863 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:09:43,863 >> Batch size = 8 0% 0/250 [00:00<?, ?it/s][A 1% 3/250 [00:00<00:11, 21.99it/s][A 2% 6/250 [00:00<00:14, 17.18it/s][A 3% 8/250 [00:00<00:14, 16.14it/s][A 4% 10/250 [00:00<00:15, 15.55it/s][A 5% 12/250 [00:00<00:15, 15.22it/s][A 6% 14/250 [00:00<00:15, 15.01it/s][A 6% 16/250 [00:01<00:15, 14.86it/s][A 7% 18/250 [00:01<00:15, 14.84it/s][A 8% 20/250 [00:01<00:15, 14.87it/s][A 9% 22/250 [00:01<00:15, 14.65it/s][A 10% 24/250 [00:01<00:15, 14.46it/s][A 10% 26/250 [00:01<00:15, 14.51it/s][A 11% 28/250 [00:01<00:15, 14.51it/s][A 12% 30/250 [00:01<00:15, 14.50it/s][A 13% 32/250 [00:02<00:14, 14.59it/s][A 14% 34/250 [00:02<00:14, 14.65it/s][A 14% 36/250 [00:02<00:14, 14.69it/s][A 15% 38/250 [00:02<00:14, 14.72it/s][A 16% 40/250 [00:02<00:14, 14.71it/s][A 17% 42/250 [00:02<00:14, 14.69it/s][A 18% 44/250 [00:02<00:14, 14.61it/s][A 18% 46/250 [00:03<00:14, 14.52it/s][A 19% 48/250 [00:03<00:13, 14.56it/s][A 20% 50/250 [00:03<00:13, 14.62it/s][A 21% 52/250 [00:03<00:13, 14.60it/s][A 22% 54/250 [00:03<00:13, 14.56it/s][A 22% 56/250 [00:03<00:13, 14.40it/s][A 23% 58/250 [00:03<00:13, 14.43it/s][A 24% 60/250 [00:04<00:13, 14.46it/s][A 25% 62/250 [00:04<00:12, 14.51it/s][A 26% 64/250 [00:04<00:12, 14.50it/s][A 26% 66/250 [00:04<00:12, 14.44it/s][A 27% 68/250 [00:04<00:12, 14.49it/s][A 28% 70/250 [00:04<00:12, 14.50it/s][A 29% 72/250 [00:04<00:12, 14.52it/s][A 30% 74/250 [00:05<00:12, 14.53it/s][A 30% 76/250 [00:05<00:12, 14.50it/s][A 31% 78/250 [00:05<00:12, 14.33it/s][A 32% 80/250 [00:05<00:11, 14.36it/s][A 33% 82/250 [00:05<00:11, 14.41it/s][A 34% 84/250 
[00:05<00:11, 14.37it/s][A 34% 86/250 [00:05<00:11, 14.42it/s][A 35% 88/250 [00:05<00:11, 14.52it/s][A 36% 90/250 [00:06<00:10, 14.55it/s][A 37% 92/250 [00:06<00:10, 14.57it/s][A 38% 94/250 [00:06<00:10, 14.63it/s][A 38% 96/250 [00:06<00:10, 14.64it/s][A 39% 98/250 [00:06<00:10, 14.57it/s][A 40% 100/250 [00:06<00:10, 14.51it/s][A 41% 102/250 [00:06<00:10, 14.60it/s][A 42% 104/250 [00:07<00:09, 14.63it/s][A 42% 106/250 [00:07<00:09, 14.57it/s][A 43% 108/250 [00:07<00:09, 14.67it/s][A 44% 110/250 [00:07<00:09, 14.68it/s][A 45% 112/250 [00:07<00:09, 14.65it/s][A 46% 114/250 [00:07<00:09, 14.65it/s][A 46% 116/250 [00:07<00:09, 14.52it/s][A 47% 118/250 [00:08<00:09, 14.52it/s][A 48% 120/250 [00:08<00:08, 14.51it/s][A 49% 122/250 [00:08<00:08, 14.61it/s][A 50% 124/250 [00:08<00:08, 14.70it/s][A 50% 126/250 [00:08<00:08, 14.75it/s][A 51% 128/250 [00:08<00:08, 14.71it/s][A 52% 130/250 [00:08<00:08, 14.72it/s][A 53% 132/250 [00:08<00:08, 14.71it/s][A 54% 134/250 [00:09<00:07, 14.76it/s][A 54% 136/250 [00:09<00:07, 14.77it/s][A 55% 138/250 [00:09<00:07, 14.81it/s][A 56% 140/250 [00:09<00:07, 14.87it/s][A 57% 142/250 [00:09<00:07, 14.90it/s][A 58% 144/250 [00:09<00:07, 14.69it/s][A 58% 146/250 [00:09<00:07, 14.69it/s][A 59% 148/250 [00:10<00:06, 14.70it/s][A 60% 150/250 [00:10<00:06, 14.62it/s][A 61% 152/250 [00:10<00:06, 14.63it/s][A 62% 154/250 [00:10<00:06, 14.73it/s][A 62% 156/250 [00:10<00:06, 14.71it/s][A 63% 158/250 [00:10<00:06, 14.64it/s][A 64% 160/250 [00:10<00:06, 14.64it/s][A 65% 162/250 [00:11<00:05, 14.70it/s][A 66% 164/250 [00:11<00:05, 14.78it/s][A 66% 166/250 [00:11<00:05, 14.78it/s][A 67% 168/250 [00:11<00:05, 14.82it/s][A 68% 170/250 [00:11<00:05, 14.86it/s][A 69% 172/250 [00:11<00:05, 14.87it/s][A 70% 174/250 [00:11<00:05, 14.91it/s][A 70% 176/250 [00:11<00:05, 14.53it/s][A 71% 178/250 [00:12<00:04, 14.56it/s][A 72% 180/250 [00:12<00:04, 14.57it/s][A 73% 182/250 [00:12<00:04, 14.63it/s][A 74% 184/250 [00:12<00:04, 14.69it/s][A 74% 186/250 [00:12<00:04, 14.75it/s][A 75% 188/250 [00:12<00:04, 14.69it/s][A 76% 190/250 [00:12<00:04, 14.65it/s][A 77% 192/250 [00:13<00:03, 14.69it/s][A 78% 194/250 [00:13<00:03, 14.64it/s][A 78% 196/250 [00:13<00:03, 14.67it/s][A 79% 198/250 [00:13<00:03, 14.72it/s][A 80% 200/250 [00:13<00:03, 14.70it/s][A 81% 202/250 [00:13<00:03, 14.61it/s][A 82% 204/250 [00:13<00:03, 14.59it/s][A 82% 206/250 [00:14<00:03, 14.53it/s][A 83% 208/250 [00:14<00:02, 14.63it/s][A 84% 210/250 [00:14<00:02, 14.70it/s][A 85% 212/250 [00:14<00:02, 14.68it/s][A 86% 214/250 [00:14<00:02, 14.67it/s][A 86% 216/250 [00:14<00:02, 14.73it/s][A 87% 218/250 [00:14<00:02, 14.76it/s][A 88% 220/250 [00:14<00:02, 14.74it/s][A 89% 222/250 [00:15<00:01, 14.74it/s][A 90% 224/250 [00:15<00:01, 14.75it/s][A 90% 226/250 [00:15<00:01, 14.74it/s][A 91% 228/250 [00:15<00:01, 14.70it/s][A 92% 230/250 [00:15<00:01, 14.60it/s][A 93% 232/250 [00:15<00:01, 14.68it/s][A 94% 234/250 [00:15<00:01, 14.47it/s][A 94% 236/250 [00:16<00:00, 14.53it/s][A 95% 238/250 [00:16<00:00, 14.60it/s][A 96% 240/250 [00:16<00:00, 14.61it/s][A 97% 242/250 [00:16<00:00, 14.66it/s][A 98% 244/250 [00:16<00:00, 14.70it/s][A 80% 2000/2500 [05:29<00:45, 10.89it/s] 99% 248/250 [00:16<00:00, 14.70it/s][A 100% 250/250 [00:17<00:00, 14.62it/s][A {'eval_loss': 0.8163257241249084, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2395, 'eval_samples_per_second': 116.013, 'eval_steps_per_second': 14.502, 'epoch': 1.0} 80% 2000/2500 [05:30<00:45, 10.89it/s] [A[INFO|trainer.py:2656] 2023-02-14 22:10:01,104 >> 
Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-2000 [INFO|configuration_utils.py:447] 2023-02-14 22:10:01,105 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-2000/config.json [INFO|modeling_utils.py:1624] 2023-02-14 22:10:01,585 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-2000/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 22:10:01,586 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-2000/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 22:10:01,586 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-2000/special_tokens_map.json [INFO|tokenization_t5_fast.py:187] 2023-02-14 22:10:01,623 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-2000/spiece.model {'loss': 1.2708, 'learning_rate': 8.000000000000001e-06, 'epoch': 1.05} {'loss': 1.3351, 'learning_rate': 6e-06, 'epoch': 1.1} 90% 2249/2500 [05:54<00:23, 10.80it/s][INFO|trainer.py:2907] 2023-02-14 22:10:25,736 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:10:25,736 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:10:25,736 >> Batch size = 8 0% 0/250 [00:00<?, ?it/s][A 1% 3/250 [00:00<00:11, 21.89it/s][A 2% 6/250 [00:00<00:14, 16.90it/s][A 3% 8/250 [00:00<00:15, 16.04it/s][A 4% 10/250 [00:00<00:15, 15.53it/s][A 5% 12/250 [00:00<00:15, 15.20it/s][A 6% 14/250 [00:00<00:15, 14.99it/s][A 6% 16/250 [00:01<00:15, 14.93it/s][A 7% 18/250 [00:01<00:15, 14.90it/s][A 8% 20/250 [00:01<00:15, 14.70it/s][A 9% 22/250 [00:01<00:15, 14.76it/s][A 10% 24/250 [00:01<00:15, 14.76it/s][A 10% 26/250 [00:01<00:15, 14.80it/s][A 11% 28/250 [00:01<00:15, 14.78it/s][A 12% 30/250 [00:01<00:15, 14.64it/s][A 13% 32/250 [00:02<00:14, 14.61it/s][A 14% 34/250 [00:02<00:14, 14.64it/s][A 14% 36/250 [00:02<00:14, 14.56it/s][A 15% 38/250 [00:02<00:14, 14.63it/s][A 16% 40/250 [00:02<00:14, 14.69it/s][A 17% 42/250 [00:02<00:14, 14.72it/s][A 18% 44/250 [00:02<00:14, 14.57it/s][A 18% 46/250 [00:03<00:14, 14.53it/s][A 19% 48/250 [00:03<00:13, 14.45it/s][A 20% 50/250 [00:03<00:13, 14.54it/s][A 21% 52/250 [00:03<00:13, 14.55it/s][A 22% 54/250 [00:03<00:13, 14.57it/s][A 22% 56/250 [00:03<00:13, 14.53it/s][A 23% 58/250 [00:03<00:13, 14.45it/s][A 24% 60/250 [00:04<00:13, 14.50it/s][A 25% 62/250 [00:04<00:12, 14.57it/s][A 26% 64/250 [00:04<00:12, 14.41it/s][A 26% 66/250 [00:04<00:12, 14.43it/s][A 27% 68/250 [00:04<00:12, 14.54it/s][A 28% 70/250 [00:04<00:12, 14.54it/s][A 29% 72/250 [00:04<00:12, 14.48it/s][A 30% 74/250 [00:05<00:12, 14.39it/s][A 30% 76/250 [00:05<00:11, 14.52it/s][A 31% 78/250 [00:05<00:11, 14.52it/s][A 32% 80/250 [00:05<00:11, 14.50it/s][A 33% 82/250 [00:05<00:11, 14.49it/s][A 34% 84/250 [00:05<00:11, 14.54it/s][A 34% 86/250 [00:05<00:11, 14.62it/s][A 35% 88/250 [00:05<00:11, 14.63it/s][A 36% 90/250 [00:06<00:10, 14.59it/s][A 37% 92/250 [00:06<00:10, 14.69it/s][A 38% 94/250 [00:06<00:10, 14.65it/s][A 38% 96/250 [00:06<00:10, 14.60it/s][A 39% 98/250 [00:06<00:10, 14.63it/s][A 40% 100/250 [00:06<00:10, 14.66it/s][A 41% 102/250 [00:06<00:10, 14.65it/s][A 42% 104/250 [00:07<00:09, 14.69it/s][A 42% 106/250 [00:07<00:09, 14.67it/s][A 43% 108/250 [00:07<00:09, 14.75it/s][A 44% 110/250 [00:07<00:09, 14.77it/s][A 45% 112/250 [00:07<00:09, 14.76it/s][A 46% 114/250 [00:07<00:09, 14.78it/s][A 46% 116/250 [00:07<00:09, 14.82it/s][A 47% 118/250 [00:08<00:08, 14.79it/s][A 48% 120/250 [00:08<00:08, 14.80it/s][A 49% 122/250 [00:08<00:08, 14.80it/s][A 50% 124/250 [00:08<00:08, 14.83it/s][A 50% 126/250 [00:08<00:08, 14.81it/s][A 51% 128/250 
[00:08<00:08, 14.78it/s][A 52% 130/250 [00:08<00:08, 14.77it/s][A 53% 132/250 [00:08<00:07, 14.80it/s][A 54% 134/250 [00:09<00:07, 14.70it/s][A 54% 136/250 [00:09<00:07, 14.65it/s][A 55% 138/250 [00:09<00:07, 14.65it/s][A 56% 140/250 [00:09<00:07, 14.61it/s][A 57% 142/250 [00:09<00:07, 14.69it/s][A 58% 144/250 [00:09<00:07, 14.75it/s][A 58% 146/250 [00:09<00:07, 14.72it/s][A 59% 148/250 [00:10<00:06, 14.69it/s][A 60% 150/250 [00:10<00:06, 14.65it/s][A 61% 152/250 [00:10<00:06, 14.62it/s][A 62% 154/250 [00:10<00:06, 14.60it/s][A 62% 156/250 [00:10<00:06, 14.64it/s][A 63% 158/250 [00:10<00:06, 14.63it/s][A 64% 160/250 [00:10<00:06, 14.71it/s][A 65% 162/250 [00:10<00:05, 14.69it/s][A 66% 164/250 [00:11<00:05, 14.77it/s][A 66% 166/250 [00:11<00:05, 14.78it/s][A 67% 168/250 [00:11<00:05, 14.79it/s][A 68% 170/250 [00:11<00:05, 14.73it/s][A 69% 172/250 [00:11<00:05, 14.73it/s][A 70% 174/250 [00:11<00:05, 14.79it/s][A 70% 176/250 [00:11<00:05, 14.80it/s][A 71% 178/250 [00:12<00:04, 14.70it/s][A 72% 180/250 [00:12<00:04, 14.66it/s][A 73% 182/250 [00:12<00:04, 14.67it/s][A 74% 184/250 [00:12<00:04, 14.71it/s][A 74% 186/250 [00:12<00:04, 14.76it/s][A 75% 188/250 [00:12<00:04, 14.73it/s][A 76% 190/250 [00:12<00:04, 14.79it/s][A 77% 192/250 [00:13<00:03, 14.72it/s][A 78% 194/250 [00:13<00:03, 14.64it/s][A 78% 196/250 [00:13<00:03, 14.67it/s][A 79% 198/250 [00:13<00:03, 14.68it/s][A 80% 200/250 [00:13<00:03, 14.74it/s][A 81% 202/250 [00:13<00:03, 14.74it/s][A 82% 204/250 [00:13<00:03, 14.67it/s][A 82% 206/250 [00:13<00:03, 14.65it/s][A 83% 208/250 [00:14<00:02, 14.31it/s][A 84% 210/250 [00:14<00:02, 14.45it/s][A 85% 212/250 [00:14<00:02, 14.61it/s][A 86% 214/250 [00:14<00:02, 14.60it/s][A 86% 216/250 [00:14<00:02, 14.70it/s][A 90% 2250/2500 [06:09<00:23, 10.80it/s] 88% 220/250 [00:14<00:02, 14.71it/s][A 89% 222/250 [00:15<00:01, 14.67it/s][A 90% 224/250 [00:15<00:01, 14.73it/s][A 90% 226/250 [00:15<00:01, 14.77it/s][A 91% 228/250 [00:15<00:01, 14.83it/s][A 92% 230/250 [00:15<00:01, 14.84it/s][A 93% 232/250 [00:15<00:01, 14.82it/s][A 94% 234/250 [00:15<00:01, 14.80it/s][A 94% 236/250 [00:16<00:00, 14.78it/s][A 95% 238/250 [00:16<00:00, 14.63it/s][A 96% 240/250 [00:16<00:00, 14.62it/s][A 97% 242/250 [00:16<00:00, 14.66it/s][A 98% 244/250 [00:16<00:00, 14.69it/s][A 98% 246/250 [00:16<00:00, 14.68it/s][A 99% 248/250 [00:16<00:00, 14.62it/s][A 100% 250/250 [00:16<00:00, 14.59it/s][A {'eval_loss': 0.8037287592887878, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2062, 'eval_samples_per_second': 116.237, 'eval_steps_per_second': 14.53, 'epoch': 1.12} 90% 2250/2500 [06:12<00:23, 10.80it/s] {'loss': 1.2308, 'learning_rate': 4.000000000000001e-06, 'epoch': 1.15} {'loss': 1.376, 'learning_rate': 2.0000000000000003e-06, 'epoch': 1.2} {'loss': 1.2416, 'learning_rate': 0.0, 'epoch': 1.25} 100% 2500/2500 [06:35<00:00, 10.84it/s][INFO|trainer.py:2907] 2023-02-14 22:11:06,282 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:11:06,283 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:11:06,283 >> Batch size = 8 0% 0/250 [00:00<?, ?it/s][A 1% 3/250 [00:00<00:11, 21.34it/s][A 2% 6/250 [00:00<00:14, 16.78it/s][A 3% 8/250 [00:00<00:15, 15.85it/s][A 4% 10/250 [00:00<00:15, 15.37it/s][A 5% 12/250 [00:00<00:15, 15.00it/s][A 6% 14/250 [00:00<00:15, 14.91it/s][A 6% 16/250 [00:01<00:15, 14.80it/s][A 7% 18/250 [00:01<00:15, 14.76it/s][A 8% 20/250 [00:01<00:15, 14.78it/s][A 9% 22/250 [00:01<00:15, 14.67it/s][A 10% 24/250 [00:01<00:15, 14.60it/s][A 10% 26/250 
[00:01<00:15, 14.65it/s][A 11% 28/250 [00:01<00:15, 14.63it/s][A 12% 30/250 [00:01<00:14, 14.67it/s][A 13% 32/250 [00:02<00:14, 14.64it/s][A 14% 34/250 [00:02<00:14, 14.68it/s][A 14% 36/250 [00:02<00:14, 14.62it/s][A 15% 38/250 [00:02<00:14, 14.53it/s][A 16% 40/250 [00:02<00:14, 14.59it/s][A 17% 42/250 [00:02<00:14, 14.63it/s][A 18% 44/250 [00:02<00:14, 14.57it/s][A 18% 46/250 [00:03<00:13, 14.67it/s][A 19% 48/250 [00:03<00:13, 14.73it/s][A 20% 50/250 [00:03<00:13, 14.82it/s][A 21% 52/250 [00:03<00:13, 14.79it/s][A 22% 54/250 [00:03<00:13, 14.71it/s][A 22% 56/250 [00:03<00:13, 14.70it/s][A 23% 58/250 [00:03<00:13, 14.59it/s][A 24% 60/250 [00:04<00:13, 14.53it/s][A 25% 62/250 [00:04<00:12, 14.46it/s][A 26% 64/250 [00:04<00:12, 14.47it/s][A 26% 66/250 [00:04<00:12, 14.48it/s][A 27% 68/250 [00:04<00:12, 14.65it/s][A 28% 70/250 [00:04<00:12, 14.77it/s][A 29% 72/250 [00:04<00:12, 14.74it/s][A 30% 74/250 [00:04<00:12, 14.66it/s][A 30% 76/250 [00:05<00:11, 14.67it/s][A 31% 78/250 [00:05<00:11, 14.68it/s][A 32% 80/250 [00:05<00:11, 14.70it/s][A 33% 82/250 [00:05<00:11, 14.66it/s][A 34% 84/250 [00:05<00:11, 14.61it/s][A 34% 86/250 [00:05<00:11, 14.62it/s][A 35% 88/250 [00:05<00:11, 14.56it/s][A 36% 90/250 [00:06<00:10, 14.59it/s][A 37% 92/250 [00:06<00:10, 14.51it/s][A 38% 94/250 [00:06<00:10, 14.38it/s][A 38% 96/250 [00:06<00:10, 14.33it/s][A 39% 98/250 [00:06<00:10, 14.30it/s][A 40% 100/250 [00:06<00:10, 14.35it/s][A 41% 102/250 [00:06<00:10, 14.40it/s][A 42% 104/250 [00:07<00:10, 14.40it/s][A 42% 106/250 [00:07<00:10, 14.36it/s][A 43% 108/250 [00:07<00:09, 14.27it/s][A 44% 110/250 [00:07<00:09, 14.36it/s][A 45% 112/250 [00:07<00:09, 14.34it/s][A 46% 114/250 [00:07<00:09, 14.33it/s][A 46% 116/250 [00:07<00:09, 14.31it/s][A 47% 118/250 [00:08<00:09, 14.35it/s][A 48% 120/250 [00:08<00:09, 14.41it/s][A 49% 122/250 [00:08<00:08, 14.47it/s][A 50% 124/250 [00:08<00:08, 14.50it/s][A 50% 126/250 [00:08<00:08, 14.59it/s][A 51% 128/250 [00:08<00:08, 14.56it/s][A 52% 130/250 [00:08<00:08, 14.59it/s][A 53% 132/250 [00:09<00:08, 14.59it/s][A 54% 134/250 [00:09<00:07, 14.67it/s][A 54% 136/250 [00:09<00:07, 14.62it/s][A 55% 138/250 [00:09<00:07, 14.57it/s][A 56% 140/250 [00:09<00:07, 14.65it/s][A 57% 142/250 [00:09<00:07, 14.69it/s][A 58% 144/250 [00:09<00:07, 14.76it/s][A 58% 146/250 [00:09<00:07, 14.65it/s][A 59% 148/250 [00:10<00:06, 14.67it/s][A 60% 150/250 [00:10<00:06, 14.75it/s][A 61% 152/250 [00:10<00:06, 14.59it/s][A 62% 154/250 [00:10<00:06, 14.68it/s][A 62% 156/250 [00:10<00:06, 14.72it/s][A 63% 158/250 [00:10<00:06, 14.66it/s][A 64% 160/250 [00:10<00:06, 14.72it/s][A 65% 162/250 [00:11<00:05, 14.67it/s][A 66% 164/250 [00:11<00:05, 14.69it/s][A 66% 166/250 [00:11<00:05, 14.70it/s][A 67% 168/250 [00:11<00:05, 14.67it/s][A 68% 170/250 [00:11<00:05, 14.65it/s][A 69% 172/250 [00:11<00:05, 14.71it/s][A 70% 174/250 [00:11<00:05, 14.72it/s][A 70% 176/250 [00:12<00:05, 14.71it/s][A 71% 178/250 [00:12<00:04, 14.68it/s][A 72% 180/250 [00:12<00:04, 14.56it/s][A 73% 182/250 [00:12<00:04, 14.55it/s][A 74% 184/250 [00:12<00:04, 14.62it/s][A 74% 186/250 [00:12<00:04, 14.63it/s][A 75% 188/250 [00:12<00:04, 14.64it/s][A 76% 190/250 [00:12<00:04, 14.71it/s][A 77% 192/250 [00:13<00:03, 14.64it/s][A 78% 194/250 [00:13<00:03, 14.71it/s][A 78% 196/250 [00:13<00:03, 14.66it/s][A 79% 198/250 [00:13<00:03, 14.67it/s][A 80% 200/250 [00:13<00:03, 14.73it/s][A 81% 202/250 [00:13<00:03, 14.69it/s][A 82% 204/250 [00:13<00:03, 14.60it/s][A 82% 206/250 [00:14<00:03, 14.59it/s][A 83% 208/250 [00:14<00:02, 14.49it/s][A 100% 2500/2500 
[06:49<00:00, 10.84it/s] 85% 212/250 [00:14<00:02, 14.53it/s][A 86% 214/250 [00:14<00:02, 14.51it/s][A 86% 216/250 [00:14<00:02, 14.54it/s][A 87% 218/250 [00:14<00:02, 14.56it/s][A 88% 220/250 [00:15<00:02, 14.67it/s][A 89% 222/250 [00:15<00:01, 14.66it/s][A 90% 224/250 [00:15<00:01, 14.68it/s][A 90% 226/250 [00:15<00:01, 14.68it/s][A 91% 228/250 [00:15<00:01, 14.78it/s][A 92% 230/250 [00:15<00:01, 14.83it/s][A 93% 232/250 [00:15<00:01, 14.82it/s][A 94% 234/250 [00:15<00:01, 14.74it/s][A 94% 236/250 [00:16<00:00, 14.72it/s][A 95% 238/250 [00:16<00:00, 14.71it/s][A 96% 240/250 [00:16<00:00, 14.70it/s][A 97% 242/250 [00:16<00:00, 14.73it/s][A 98% 244/250 [00:16<00:00, 14.74it/s][A 98% 246/250 [00:16<00:00, 14.65it/s][A 99% 248/250 [00:16<00:00, 14.69it/s][A 100% 250/250 [00:17<00:00, 14.64it/s][A {'eval_loss': 0.7921838760375977, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2721, 'eval_samples_per_second': 115.794, 'eval_steps_per_second': 14.474, 'epoch': 1.25} 100% 2500/2500 [06:52<00:00, 10.84it/s] [A[INFO|trainer.py:2656] 2023-02-14 22:11:23,556 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-2500 [INFO|configuration_utils.py:447] 2023-02-14 22:11:23,557 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-2500/config.json [INFO|modeling_utils.py:1624] 2023-02-14 22:11:24,033 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-2500/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 22:11:24,034 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-2500/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 22:11:24,034 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-2500/special_tokens_map.json [INFO|tokenization_t5_fast.py:187] 2023-02-14 22:11:24,070 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-2500/spiece.model [INFO|trainer.py:1852] 2023-02-14 22:11:24,853 >> Training completed. Do not forget to share your model on huggingface.co/models =) [INFO|trainer.py:1946] 2023-02-14 22:11:24,854 >> Loading best model from out/emotion/t5_v1_1/checkpoint-500 (score: 1.0). 
{'train_runtime': 414.2608, 'train_samples_per_second': 48.279, 'train_steps_per_second': 6.035, 'train_loss': 3.8232721221923827, 'epoch': 1.25} 100% 2500/2500 [06:54<00:00, 6.03it/s] [INFO|trainer.py:2656] 2023-02-14 22:11:25,173 >> Saving model checkpoint to out/emotion/t5_v1_1 [INFO|configuration_utils.py:447] 2023-02-14 22:11:25,174 >> Configuration saved in out/emotion/t5_v1_1/config.json [INFO|modeling_utils.py:1624] 2023-02-14 22:11:25,662 >> Model weights saved in out/emotion/t5_v1_1/pytorch_model.bin [INFO|tokenization_utils_base.py:2123] 2023-02-14 22:11:25,663 >> tokenizer config file saved in out/emotion/t5_v1_1/tokenizer_config.json [INFO|tokenization_utils_base.py:2130] 2023-02-14 22:11:25,663 >> Special tokens file saved in out/emotion/t5_v1_1/special_tokens_map.json [INFO|tokenization_t5_fast.py:187] 2023-02-14 22:11:25,703 >> Copy vocab file to out/emotion/t5_v1_1/spiece.model ***** train metrics ***** epoch = 1.25 train_loss = 3.8233 train_runtime = 0:06:54.26 train_samples = 16000 train_samples_per_second = 48.279 train_steps_per_second = 6.035 INFO:__main__:*** Evaluate *** [INFO|trainer.py:2907] 2023-02-14 22:11:25,713 >> ***** Running Evaluation ***** [INFO|trainer.py:2909] 2023-02-14 22:11:25,713 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:11:25,713 >> Batch size = 8 100% 250/250 [00:17<00:00, 14.50it/s] ***** eval metrics ***** epoch = 1.25 eval_accuracy = 1.0 eval_bleu = 0.0 eval_gen_len = 2.0 eval_loss = 2.1697 eval_runtime = 0:00:17.31 eval_samples = 2000 eval_samples_per_second = 115.494 eval_steps_per_second = 14.437 INFO:__main__:*** Predict *** [INFO|trainer.py:2907] 2023-02-14 22:11:43,033 >> ***** Running Prediction ***** [INFO|trainer.py:2909] 2023-02-14 22:11:43,033 >> Num examples = 2000 [INFO|trainer.py:2912] 2023-02-14 22:11:43,034 >> Batch size = 8 100% 250/250 [00:17<00:00, 14.58it/s] ***** predict metrics ***** predict_accuracy = 1.0 predict_bleu = 0.0 predict_gen_len = 2.0 predict_loss = 2.1029 predict_runtime = 0:00:17.21 predict_samples = 2000 predict_samples_per_second = 116.158 predict_steps_per_second = 14.52 [INFO|modelcard.py:444] 2023-02-14 22:12:00,417 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Translation', 'type': 'translation'}, 'metrics': [{'name': 'Bleu', 'type': 'bleu', 'value': 0.0}, {'name': 'Accuracy', 'type': 'accuracy', 'value': 1.0}]}
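The final model is exported to out/emotion/t5_v1_1, so it can also be queried directly, outside the Trainer. The snippet below is only an illustrative sketch, assuming that output directory and the "emotion classification: " prefix reported in the log above; the example sentence is invented and not taken from the dataset.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_dir = "out/emotion/t5_v1_1"  # directory the Trainer saved the final model to
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

text = "i feel utterly hopeless today"  # invented example input
inputs = tokenizer("emotion classification: " + text, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5)
# The model was trained to generate the label word itself (e.g. "sadness").
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))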
FLAN T5
import json
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# run the pipeline on GPU if one is available, otherwise on CPU
if torch.cuda.is_available():
    device = 0
else:
    device = -1

def perform_shot_learning(pipeline_type, model_name, test_file):
    # load the pretrained seq2seq model and its tokenizer
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float32)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    our_pipeline = pipeline(pipeline_type, model=model, tokenizer=tokenizer, device=device)

    correct = 0
    labels = "possible labels: sadness, joy, love, anger, fear, surprise"

    with open(test_file) as f:
        f_lines = f.readlines()

    # zero-shot prompting: list the allowed labels, show the text, and let the model generate the label
    for line in f_lines:
        ex = json.loads(line)
        prompt = ex['text']
        tmp = labels + '\n' + f'text: {prompt}' + '\n' + 'label: '
        predict = our_pipeline(tmp, do_sample=False)[0]['generated_text']
        if predict == ex['label']:
            correct += 1

    print(f'Accuracy: {correct/len(f_lines)}')
test_ds = 'data/s2s-test.json'
perform_shot_learning('text2text-generation', 'google/flan-t5-large', test_ds)
Downloading tokenizer_config.json, spiece.model, tokenizer.json and special_tokens_map.json for google/flan-t5-large ...
/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py:1043: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset warnings.warn(
Accuracy: 0.647
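The UserWarning above suggests feeding the pipeline a dataset instead of calling it once per example. A possible batched variant is sketched below; it assumes the same prompt format as perform_shot_learning and uses the transformers KeyDataset helper, with the function name evaluate_batched and batch_size chosen only for illustration.
import json
from datasets import Dataset
from transformers.pipelines.pt_utils import KeyDataset

def evaluate_batched(our_pipeline, test_file, labels):
    # build the same prompts as above, but hand them to the pipeline as a dataset
    examples = [json.loads(line) for line in open(test_file)]
    prompts = [labels + '\n' + f"text: {ex['text']}" + '\n' + 'label: ' for ex in examples]
    ds = Dataset.from_dict({'prompt': prompts})
    correct = 0
    # the pipeline streams results in order, batching the forward passes internally
    for ex, out in zip(examples, our_pipeline(KeyDataset(ds, 'prompt'), do_sample=False, batch_size=16)):
        if out[0]['generated_text'] == ex['label']:
            correct += 1
    print(f'Accuracy: {correct/len(examples)}')

# illustrative usage, reusing the pipeline and label string built in perform_shot_learning:
# evaluate_batched(our_pipeline, 'data/s2s-test.json', 'possible labels: sadness, joy, love, anger, fear, surprise')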
!zip -r /content/projekt.zip /content/
adding: content/ (stored 0%) ... (file-by-file zip output truncated: the archive adds the .config logs, the data/ splits (train/valid/test and their s2s-* variants), run_glue.py, run_translation.py, the Hugging Face caches for roberta-base, gpt2 and google/t5-v1_1-small, and the fine-tuned runs and checkpoints under out/emotion/gpt2_custom and out/emotion/gpt2) ... adding: content/out/emotion/gpt2/pytorch_model.bin zip error: Interrupted (aborting)
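The archive was interrupted before finishing, most likely because the model caches and checkpoints make it very large. An untested workaround sketch is to zip only the results and data, or to exclude the cache directories with zip's -x patterns:
!zip -r /content/projekt.zip /content/out /content/data /content/*.py
# or keep the original layout but skip the large cache folders:
!zip -r /content/projekt.zip /content/ -x "/content/*cache*/*"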