UG-final/all_models.ipynb

Deep Learning - project

The project uses the emotion dataset, which contains short texts labeled with specific emotions (a quick inspection sketch follows the label list below).


Labels:

  • 0 - sadness
  • 1 - joy
  • 2 - love
  • 3 - anger
  • 4 - fear
  • 5 - surprise
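
For reference, the label mapping can be inspected directly with the datasets library; a minimal sketch, assuming the standard emotion dataset from the Hugging Face Hub (the same source used by data_prep.py further down):

from datasets import load_dataset

# Load the emotion dataset and list its class names
dataset = load_dataset("emotion")
label_names = dataset["train"].features["label"].names
print(label_names)          # ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
print(dataset["train"][0])  # {'text': 'i didnt feel humiliated', 'label': 0}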

REQUIREMENTS

!pip3 install transformers scikit-learn accelerate evaluate datasets torch sentencepiece torchvision
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Requirement already satisfied: transformers in /usr/local/lib/python3.8/dist-packages (4.23.1)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.8/dist-packages (1.2.1)
Requirement already satisfied: accelerate in /usr/local/lib/python3.8/dist-packages (0.16.0)
Requirement already satisfied: evaluate in /usr/local/lib/python3.8/dist-packages (0.4.0)
Requirement already satisfied: datasets in /usr/local/lib/python3.8/dist-packages (2.9.0)
Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (1.13.1)
Requirement already satisfied: sentencepiece in /usr/local/lib/python3.8/dist-packages (0.1.97)
Requirement already satisfied: torchvision in /usr/local/lib/python3.8/dist-packages (0.14.1+cu116)
Requirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from transformers) (3.9.0)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.8/dist-packages (from transformers) (4.64.1)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.8/dist-packages (from transformers) (1.21.6)
Requirement already satisfied: huggingface-hub<1.0,>=0.10.0 in /usr/local/lib/python3.8/dist-packages (from transformers) (0.12.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.8/dist-packages (from transformers) (2022.6.2)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.8/dist-packages (from transformers) (0.13.2)
Requirement already satisfied: requests in /usr/local/lib/python3.8/dist-packages (from transformers) (2.25.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.8/dist-packages (from transformers) (6.0)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.8/dist-packages (from transformers) (23.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.8/dist-packages (from scikit-learn) (3.1.0)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from scikit-learn) (1.2.0)
Requirement already satisfied: scipy>=1.3.2 in /usr/local/lib/python3.8/dist-packages (from scikit-learn) (1.7.3)
Requirement already satisfied: psutil in /usr/local/lib/python3.8/dist-packages (from accelerate) (5.4.8)
Requirement already satisfied: dill in /usr/local/lib/python3.8/dist-packages (from evaluate) (0.3.6)
Requirement already satisfied: responses<0.19 in /usr/local/lib/python3.8/dist-packages (from evaluate) (0.18.0)
Requirement already satisfied: fsspec[http]>=2021.05.0 in /usr/local/lib/python3.8/dist-packages (from evaluate) (2023.1.0)
Requirement already satisfied: xxhash in /usr/local/lib/python3.8/dist-packages (from evaluate) (3.2.0)
Requirement already satisfied: pandas in /usr/local/lib/python3.8/dist-packages (from evaluate) (1.3.5)
Requirement already satisfied: multiprocess in /usr/local/lib/python3.8/dist-packages (from evaluate) (0.70.14)
Requirement already satisfied: pyarrow>=6.0.0 in /usr/local/lib/python3.8/dist-packages (from datasets) (9.0.0)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.8/dist-packages (from datasets) (3.8.3)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /usr/local/lib/python3.8/dist-packages (from torch) (11.10.3.66)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /usr/local/lib/python3.8/dist-packages (from torch) (11.7.99)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.8/dist-packages (from torch) (4.4.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /usr/local/lib/python3.8/dist-packages (from torch) (11.7.99)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /usr/local/lib/python3.8/dist-packages (from torch) (8.5.0.96)
Requirement already satisfied: wheel in /usr/local/lib/python3.8/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (0.38.4)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (57.4.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/local/lib/python3.8/dist-packages (from torchvision) (7.1.2)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.8/dist-packages (from aiohttp->datasets) (4.0.2)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from aiohttp->datasets) (1.8.2)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.8/dist-packages (from aiohttp->datasets) (6.0.4)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from aiohttp->datasets) (1.3.3)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.8/dist-packages (from aiohttp->datasets) (1.3.1)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.8/dist-packages (from aiohttp->datasets) (22.2.0)
Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.8/dist-packages (from aiohttp->datasets) (2.1.1)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests->transformers) (2022.12.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests->transformers) (1.26.14)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests->transformers) (2.10)
Requirement already satisfied: chardet<5,>=3.0.2 in /usr/local/lib/python3.8/dist-packages (from requests->transformers) (4.0.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas->evaluate) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas->evaluate) (2022.7.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.7.3->pandas->evaluate) (1.15.0)
import os
import json
from pathlib import Path
from typing import Dict, List
from datasets import load_dataset
import torch
import pandas as pd

os.environ['TOKENIZERS_PARALLELISM'] = 'true'

DATA PREP

!mkdir -p data
!python data_prep.py
No config specified, defaulting to: emotion/split
Found cached dataset emotion (/root/.cache/huggingface/datasets/emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd)

  0% 0/3 [00:00<?, ?it/s]
100% 3/3 [00:00<00:00, 182.77it/s]
Saving into: data/train.json
Saving into: data/s2s-train.json
Saving into: data/valid.json
Saving into: data/s2s-valid.json
Saving into: data/test.json
Saving into: data/s2s-test.json
!head data/train.json
{"label": 0, "text": "i didnt feel humiliated"}
{"label": 0, "text": "i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake"}
{"label": 3, "text": "im grabbing a minute to post i feel greedy wrong"}
{"label": 2, "text": "i am ever feeling nostalgic about the fireplace i will know that it is still on the property"}
{"label": 3, "text": "i am feeling grouchy"}
{"label": 0, "text": "ive been feeling a little burdened lately wasnt sure why that was"}
{"label": 5, "text": "ive been taking or milligrams or times recommended amount and ive fallen asleep a lot faster but i also feel like so funny"}
{"label": 4, "text": "i feel as confused about life as a teenager or as jaded as a year old man"}
{"label": 1, "text": "i have been with petronas for years i feel that petronas has performed well and made a huge profit"}
{"label": 2, "text": "i feel romantic too"}
!head data/s2s-train.json
{"label": "sadness", "text": "i didnt feel humiliated"}
{"label": "sadness", "text": "i can go from feeling so hopeless to so damned hopeful just from being around someone who cares and is awake"}
{"label": "anger", "text": "im grabbing a minute to post i feel greedy wrong"}
{"label": "love", "text": "i am ever feeling nostalgic about the fireplace i will know that it is still on the property"}
{"label": "anger", "text": "i am feeling grouchy"}
{"label": "sadness", "text": "ive been feeling a little burdened lately wasnt sure why that was"}
{"label": "surprise", "text": "ive been taking or milligrams or times recommended amount and ive fallen asleep a lot faster but i also feel like so funny"}
{"label": "fear", "text": "i feel as confused about life as a teenager or as jaded as a year old man"}
{"label": "joy", "text": "i have been with petronas for years i feel that petronas has performed well and made a huge profit"}
{"label": "love", "text": "i feel romantic too"}
!wc -l data/*
   2000 data/s2s-test.json
  16000 data/s2s-train.json
   2000 data/s2s-valid.json
   2000 data/test.json
  16000 data/train.json
   2000 data/valid.json
  40000 total
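
data_prep.py ships with the repository and its exact contents are not shown here; judging from the files it writes above, the preparation amounts to dumping each split twice, once with integer labels for classification and once with label names for the seq2seq models. A rough, hypothetical sketch of that step:

import json
from pathlib import Path
from datasets import load_dataset

LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]
SPLITS = {"train": "train", "validation": "valid", "test": "test"}

dataset = load_dataset("emotion")
Path("data").mkdir(exist_ok=True)

for split, name in SPLITS.items():
    for prefix, as_text in (("", False), ("s2s-", True)):
        path = Path("data") / f"{prefix}{name}.json"
        print("Saving into:", path)
        with open(path, "w") as f:
            for example in dataset[split]:
                # s2s-* files carry the label name, the plain files the integer id
                label = LABELS[example["label"]] if as_text else example["label"]
                f.write(json.dumps({"label": label, "text": example["text"]}) + "\n")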

ROBERTA

  • full data
  • model roberta-base
  • sequence length: 128
  • training epochs: 1
!python run_glue.py \
  --cache_dir roberta_training_cache \
  --model_name_or_path roberta-base \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 24 \
  --per_device_eval_batch_size 24 \
  --do_train \
  --do_eval \
  --do_predict \
  --max_seq_length 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 1 \
  --output_dir out/emotion/roberta  \
  --overwrite_output_dir
2023-02-14 21:44:57.299984: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-14 21:44:57.452345: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-14 21:44:58.236913: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 21:44:58.237017: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 21:44:58.237058: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
INFO:__main__:Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=out/emotion/roberta/runs/Feb14_21-45-00_fc0011e45a00,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_hf,
output_dir=out/emotion/roberta,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=24,
per_device_train_batch_size=24,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=out/emotion/roberta,
save_on_each_node=False,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
INFO:__main__:load a local file for train: data/train.json
INFO:__main__:load a local file for validation: data/valid.json
INFO:__main__:load a local file for test: data/test.json
WARNING:datasets.builder:Using custom data configuration default-01aa9d8252a24a0d
INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json
INFO:datasets.builder:Generating dataset json (/content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Downloading and preparing dataset json/default to /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100% 3/3 [00:00<00:00, 11491.24it/s]
INFO:datasets.download.download_manager:Downloading took 0.0 min
INFO:datasets.download.download_manager:Checksum Computation took 0.0 min
Extracting data files: 100% 3/3 [00:00<00:00, 1882.54it/s]
INFO:datasets.utils.info_utils:Unable to verify checksums.
INFO:datasets.builder:Generating train split
INFO:datasets.builder:Generating validation split
INFO:datasets.builder:Generating test split
INFO:datasets.utils.info_utils:Unable to verify splits sizes.
Dataset json downloaded and prepared to /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100% 3/3 [00:00<00:00, 573.49it/s]
Downloading (…)lve/main/config.json: 100% 481/481 [00:00<00:00, 83.8kB/s]
[INFO|configuration_utils.py:653] 2023-02-14 21:45:01,575 >> loading configuration file config.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:45:01,576 >> Model config RobertaConfig {
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.23.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

[INFO|tokenization_auto.py:418] 2023-02-14 21:45:01,670 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
[INFO|configuration_utils.py:653] 2023-02-14 21:45:01,762 >> loading configuration file config.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:45:01,763 >> Model config RobertaConfig {
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.23.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:00<00:00, 9.36MB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 4.95MB/s]
Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 11.7MB/s]
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,975 >> loading file vocab.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/vocab.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file merges.txt from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/merges.txt
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file tokenizer.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/tokenizer.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:45:02,976 >> loading file tokenizer_config.json from cache at None
[INFO|configuration_utils.py:653] 2023-02-14 21:45:02,976 >> loading configuration file config.json from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:45:02,977 >> Model config RobertaConfig {
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.23.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

INFO:__main__:Using implementation from class: AutoModelForSequenceClassification
Downloading (…)"pytorch_model.bin";: 100% 501M/501M [00:04<00:00, 105MB/s]
[INFO|modeling_utils.py:2156] 2023-02-14 21:45:08,072 >> loading weights file pytorch_model.bin from cache at roberta_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/pytorch_model.bin
[WARNING|modeling_utils.py:2596] 2023-02-14 21:45:09,415 >> Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.weight', 'lm_head.bias', 'roberta.pooler.dense.weight', 'roberta.pooler.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2608] 2023-02-14 21:45:09,415 >> Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Frozen layers:
[('roberta.encoder.layer.0.attention.self.query.weight', False), ('roberta.encoder.layer.0.attention.self.query.bias', False), ('roberta.encoder.layer.0.attention.self.key.weight', False), ('roberta.encoder.layer.0.attention.self.key.bias', False), ('roberta.encoder.layer.0.attention.self.value.weight', False), ('roberta.encoder.layer.0.attention.self.value.bias', False), ('roberta.encoder.layer.0.attention.output.dense.weight', False), ('roberta.encoder.layer.0.attention.output.dense.bias', False), ('roberta.encoder.layer.0.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.0.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.0.intermediate.dense.weight', False), ('roberta.encoder.layer.0.intermediate.dense.bias', False), ('roberta.encoder.layer.0.output.dense.weight', False), ('roberta.encoder.layer.0.output.dense.bias', False), ('roberta.encoder.layer.0.output.LayerNorm.weight', False), ('roberta.encoder.layer.0.output.LayerNorm.bias', False), ('roberta.encoder.layer.2.attention.self.query.weight', False), ('roberta.encoder.layer.2.attention.self.query.bias', False), ('roberta.encoder.layer.2.attention.self.key.weight', False), ('roberta.encoder.layer.2.attention.self.key.bias', False), ('roberta.encoder.layer.2.attention.self.value.weight', False), ('roberta.encoder.layer.2.attention.self.value.bias', False), ('roberta.encoder.layer.2.attention.output.dense.weight', False), ('roberta.encoder.layer.2.attention.output.dense.bias', False), ('roberta.encoder.layer.2.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.2.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.2.intermediate.dense.weight', False), ('roberta.encoder.layer.2.intermediate.dense.bias', False), ('roberta.encoder.layer.2.output.dense.weight', False), ('roberta.encoder.layer.2.output.dense.bias', False), ('roberta.encoder.layer.2.output.LayerNorm.weight', False), ('roberta.encoder.layer.2.output.LayerNorm.bias', False), ('roberta.encoder.layer.4.attention.self.query.weight', False), ('roberta.encoder.layer.4.attention.self.query.bias', False), ('roberta.encoder.layer.4.attention.self.key.weight', False), ('roberta.encoder.layer.4.attention.self.key.bias', False), ('roberta.encoder.layer.4.attention.self.value.weight', False), ('roberta.encoder.layer.4.attention.self.value.bias', False), ('roberta.encoder.layer.4.attention.output.dense.weight', False), ('roberta.encoder.layer.4.attention.output.dense.bias', False), ('roberta.encoder.layer.4.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.4.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.4.intermediate.dense.weight', False), ('roberta.encoder.layer.4.intermediate.dense.bias', False), ('roberta.encoder.layer.4.output.dense.weight', False), ('roberta.encoder.layer.4.output.dense.bias', False), ('roberta.encoder.layer.4.output.LayerNorm.weight', False), ('roberta.encoder.layer.4.output.LayerNorm.bias', False), ('roberta.encoder.layer.6.attention.self.query.weight', False), ('roberta.encoder.layer.6.attention.self.query.bias', False), ('roberta.encoder.layer.6.attention.self.key.weight', False), ('roberta.encoder.layer.6.attention.self.key.bias', False), ('roberta.encoder.layer.6.attention.self.value.weight', False), ('roberta.encoder.layer.6.attention.self.value.bias', False), ('roberta.encoder.layer.6.attention.output.dense.weight', False), ('roberta.encoder.layer.6.attention.output.dense.bias', False), ('roberta.encoder.layer.6.attention.output.LayerNorm.weight', False), 
('roberta.encoder.layer.6.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.6.intermediate.dense.weight', False), ('roberta.encoder.layer.6.intermediate.dense.bias', False), ('roberta.encoder.layer.6.output.dense.weight', False), ('roberta.encoder.layer.6.output.dense.bias', False), ('roberta.encoder.layer.6.output.LayerNorm.weight', False), ('roberta.encoder.layer.6.output.LayerNorm.bias', False), ('roberta.encoder.layer.8.attention.self.query.weight', False), ('roberta.encoder.layer.8.attention.self.query.bias', False), ('roberta.encoder.layer.8.attention.self.key.weight', False), ('roberta.encoder.layer.8.attention.self.key.bias', False), ('roberta.encoder.layer.8.attention.self.value.weight', False), ('roberta.encoder.layer.8.attention.self.value.bias', False), ('roberta.encoder.layer.8.attention.output.dense.weight', False), ('roberta.encoder.layer.8.attention.output.dense.bias', False), ('roberta.encoder.layer.8.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.8.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.8.intermediate.dense.weight', False), ('roberta.encoder.layer.8.intermediate.dense.bias', False), ('roberta.encoder.layer.8.output.dense.weight', False), ('roberta.encoder.layer.8.output.dense.bias', False), ('roberta.encoder.layer.8.output.LayerNorm.weight', False), ('roberta.encoder.layer.8.output.LayerNorm.bias', False), ('roberta.encoder.layer.10.attention.self.query.weight', False), ('roberta.encoder.layer.10.attention.self.query.bias', False), ('roberta.encoder.layer.10.attention.self.key.weight', False), ('roberta.encoder.layer.10.attention.self.key.bias', False), ('roberta.encoder.layer.10.attention.self.value.weight', False), ('roberta.encoder.layer.10.attention.self.value.bias', False), ('roberta.encoder.layer.10.attention.output.dense.weight', False), ('roberta.encoder.layer.10.attention.output.dense.bias', False), ('roberta.encoder.layer.10.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.10.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.10.intermediate.dense.weight', False), ('roberta.encoder.layer.10.intermediate.dense.bias', False), ('roberta.encoder.layer.10.output.dense.weight', False), ('roberta.encoder.layer.10.output.dense.bias', False), ('roberta.encoder.layer.10.output.LayerNorm.weight', False), ('roberta.encoder.layer.10.output.LayerNorm.bias', False)] 


Running tokenizer on dataset:   0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-e62b2012f3f40cb2.arrow
Running tokenizer on dataset: 100% 16/16 [00:00<00:00, 20.66ba/s]
Running tokenizer on dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-cd497527f5c67ba7.arrow
Running tokenizer on dataset: 100% 2/2 [00:00<00:00,  7.58ba/s]
Running tokenizer on dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-9c2deb15eb4326c1.arrow
Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 20.81ba/s]
INFO:__main__:Sample 10476 of the training set: {'label': 0, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [0, 118, 109, 465, 92, 964, 939, 475, 164, 7, 860, 1823, 543, 7, 146, 106, 1095, 8, 114, 939, 2845, 14, 939, 218, 326, 236, 7, 619, 2581, 456, 8, 95, 3068, 66, 5, 94, 76, 9, 334, 15, 127, 308, 939, 475, 164, 7, 33, 7, 860, 1823, 543, 45, 7, 575, 99, 82, 206, 9, 162, 145, 10, 784, 9604, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
INFO:__main__:Sample 1824 of the training set: {'label': 1, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [0, 118, 553, 106, 7, 1962, 162, 11, 2351, 10, 232, 147, 70, 76, 793, 1972, 115, 1733, 62, 2157, 7917, 8, 2247, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
INFO:__main__:Sample 409 of the training set: {'label': 2, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [0, 118, 619, 77, 47, 32, 10, 10837, 621, 47, 5696, 97, 10837, 82, 88, 110, 301, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
[INFO|trainer.py:725] 2023-02-14 21:45:13,102 >> The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[INFO|trainer.py:1607] 2023-02-14 21:45:13,109 >> ***** Running training *****
[INFO|trainer.py:1608] 2023-02-14 21:45:13,109 >>   Num examples = 16000
[INFO|trainer.py:1609] 2023-02-14 21:45:13,109 >>   Num Epochs = 1
[INFO|trainer.py:1610] 2023-02-14 21:45:13,109 >>   Instantaneous batch size per device = 24
[INFO|trainer.py:1611] 2023-02-14 21:45:13,109 >>   Total train batch size (w. parallel, distributed & accumulation) = 24
[INFO|trainer.py:1612] 2023-02-14 21:45:13,109 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1613] 2023-02-14 21:45:13,109 >>   Total optimization steps = 667
{'loss': 0.8083, 'learning_rate': 5.0074962518740634e-06, 'epoch': 0.75}
 75% 500/667 [00:58<00:19,  8.76it/s][INFO|trainer.py:2656] 2023-02-14 21:46:11,148 >> Saving model checkpoint to out/emotion/roberta/checkpoint-500
[INFO|configuration_utils.py:447] 2023-02-14 21:46:11,149 >> Configuration saved in out/emotion/roberta/checkpoint-500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:46:12,047 >> Model weights saved in out/emotion/roberta/checkpoint-500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:46:12,048 >> tokenizer config file saved in out/emotion/roberta/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:46:12,048 >> Special tokens file saved in out/emotion/roberta/checkpoint-500/special_tokens_map.json
100% 666/667 [01:19<00:00,  8.78it/s][INFO|trainer.py:1852] 2023-02-14 21:46:32,443 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 79.3341, 'train_samples_per_second': 201.679, 'train_steps_per_second': 8.407, 'train_loss': 0.7161429089227359, 'epoch': 1.0}
100% 667/667 [01:19<00:00,  8.41it/s]
[INFO|trainer.py:2656] 2023-02-14 21:46:32,445 >> Saving model checkpoint to out/emotion/roberta
[INFO|configuration_utils.py:447] 2023-02-14 21:46:32,446 >> Configuration saved in out/emotion/roberta/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:46:33,422 >> Model weights saved in out/emotion/roberta/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:46:33,422 >> tokenizer config file saved in out/emotion/roberta/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:46:33,423 >> Special tokens file saved in out/emotion/roberta/special_tokens_map.json
***** train metrics *****
  epoch                    =        1.0
  train_loss               =     0.7161
  train_runtime            = 0:01:19.33
  train_samples            =      16000
  train_samples_per_second =    201.679
  train_steps_per_second   =      8.407
INFO:__main__:*** Evaluate ***
[INFO|trainer.py:725] 2023-02-14 21:46:33,524 >> The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:46:33,526 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:46:33,526 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:46:33,526 >>   Batch size = 24
100% 84/84 [00:03<00:00, 23.66it/s]
***** eval metrics *****
  epoch                   =        1.0
  eval_accuracy           =      0.889
  eval_loss               =     0.3302
  eval_runtime            = 0:00:03.59
  eval_samples            =       2000
  eval_samples_per_second =    556.411
  eval_steps_per_second   =     23.369
INFO:__main__:*** Predict ***
[INFO|trainer.py:725] 2023-02-14 21:46:37,124 >> The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:46:37,125 >> ***** Running Prediction *****
[INFO|trainer.py:2909] 2023-02-14 21:46:37,125 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:46:37,125 >>   Batch size = 24
100% 84/84 [00:03<00:00, 23.68it/s]
INFO:__main__:***** Predict results None *****
[INFO|modelcard.py:444] 2023-02-14 21:46:40,840 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.8889999985694885}]}
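
The fine-tuned checkpoint written to out/emotion/roberta (weights, tokenizer and config, per the logs above) can be loaded straight into a transformers pipeline for quick predictions; a minimal sketch, where LABEL_0 .. LABEL_5 correspond to the emotions listed at the top:

from transformers import pipeline

classifier = pipeline("text-classification", model="out/emotion/roberta")
print(classifier("i feel romantic too"))  # e.g. [{'label': 'LABEL_2', 'score': ...}], i.e. love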
  • full data
  • sequence length: 128
  • LeakyReLU instead of ReLU
  • every other layer frozen
  • custom classification head (see the sketch after this list)
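
This run uses the repository's RobertaForSequenceClassificationCustomAlternative (selected via --custom_model roberta_custom); its exact code is not reproduced here, but the idea can be sketched roughly as follows (head layout, dropout, and the choice of frozen layers are assumptions read off the training logs below):

import torch.nn as nn
from transformers import RobertaForSequenceClassification

class LeakyReLUHead(nn.Module):
    # Classification head applied to the <s> token, LeakyReLU instead of tanh
    def __init__(self, hidden_size, num_labels, dropout=0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take the <s> (CLS-equivalent) token
        x = self.dropout(x)
        x = nn.functional.leaky_relu(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)

model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=6)
model.classifier = LeakyReLUHead(model.config.hidden_size, num_labels=6)

# Freeze every other encoder layer (layers 0, 2, 4, 6, 8, 10 in the logs below)
for i, layer in enumerate(model.roberta.encoder.layer):
    if i % 2 == 0:
        for param in layer.parameters():
            param.requires_grad = False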
!python run_glue.py \
  --cache_dir roberta_custom_training_cache \
  --model_name_or_path roberta-base \
  --custom_model roberta_custom \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json \
  --per_device_train_batch_size 24 \
  --per_device_eval_batch_size 24 \
  --do_train \
  --do_eval \
  --do_predict \
  --max_seq_length 128 \
  --learning_rate 2e-5 \
  --num_train_epochs 1 \
  --output_dir out/emotion/roberta_custom  \
  --overwrite_output_dir
2023-02-14 21:47:02.722049: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-14 21:47:02.876002: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-14 21:47:03.659342: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 21:47:03.659451: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 21:47:03.659470: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
INFO:__main__:Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=out/emotion/roberta_custom/runs/Feb14_21-47-05_fc0011e45a00,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_hf,
output_dir=out/emotion/roberta_custom,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=24,
per_device_train_batch_size=24,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=out/emotion/roberta_custom,
save_on_each_node=False,
save_steps=500,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
INFO:__main__:load a local file for train: data/train.json
INFO:__main__:load a local file for validation: data/valid.json
INFO:__main__:load a local file for test: data/test.json
WARNING:datasets.builder:Using custom data configuration default-01aa9d8252a24a0d
INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json
INFO:datasets.builder:Generating dataset json (/content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Downloading and preparing dataset json/default to /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100% 3/3 [00:00<00:00, 14463.12it/s]
INFO:datasets.download.download_manager:Downloading took 0.0 min
INFO:datasets.download.download_manager:Checksum Computation took 0.0 min
Extracting data files: 100% 3/3 [00:00<00:00, 2119.76it/s]
INFO:datasets.utils.info_utils:Unable to verify checksums.
INFO:datasets.builder:Generating train split
INFO:datasets.builder:Generating validation split
INFO:datasets.builder:Generating test split
INFO:datasets.utils.info_utils:Unable to verify splits sizes.
Dataset json downloaded and prepared to /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100% 3/3 [00:00<00:00, 657.14it/s]
Downloading (…)lve/main/config.json: 100% 481/481 [00:00<00:00, 88.4kB/s]
[INFO|configuration_utils.py:653] 2023-02-14 21:47:06,896 >> loading configuration file config.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:47:06,897 >> Model config RobertaConfig {
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.23.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

[INFO|tokenization_auto.py:418] 2023-02-14 21:47:06,989 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
[INFO|configuration_utils.py:653] 2023-02-14 21:47:07,079 >> loading configuration file config.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:47:07,080 >> Model config RobertaConfig {
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.23.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

Downloading (…)olve/main/vocab.json: 100% 899k/899k [00:00<00:00, 9.35MB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 4.91MB/s]
Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 10.3MB/s]
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file vocab.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/vocab.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file merges.txt from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/merges.txt
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file tokenizer.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/tokenizer.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:47:08,305 >> loading file tokenizer_config.json from cache at None
[INFO|configuration_utils.py:653] 2023-02-14 21:47:08,306 >> loading configuration file config.json from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:47:08,306 >> Model config RobertaConfig {
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.23.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

INFO:__main__:Using hidden states in model: False
INFO:__main__:Using implementation from class: RobertaForSequenceClassificationCustomAlternative
Downloading (…)"pytorch_model.bin";: 100% 501M/501M [00:04<00:00, 106MB/s]
[INFO|modeling_utils.py:2156] 2023-02-14 21:47:13,300 >> loading weights file pytorch_model.bin from cache at roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/pytorch_model.bin
[WARNING|modeling_utils.py:2596] 2023-02-14 21:47:15,772 >> Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassificationCustomAlternative: ['roberta.pooler.dense.bias', 'lm_head.dense.weight', 'roberta.pooler.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaForSequenceClassificationCustomAlternative from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassificationCustomAlternative from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:2608] 2023-02-14 21:47:15,772 >> Some weights of RobertaForSequenceClassificationCustomAlternative were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense_1_input.weight', 'classifier.dense_2.weight', 'classifier.out_proj.bias', 'classifier.dense_2.bias', 'classifier.dense_1_input.bias', 'classifier.dense_1_hidden.weight', 'classifier.dense_1_hidden.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Frozen layers:
[('roberta.encoder.layer.0.attention.self.query.weight', False), ('roberta.encoder.layer.0.attention.self.query.bias', False), ('roberta.encoder.layer.0.attention.self.key.weight', False), ('roberta.encoder.layer.0.attention.self.key.bias', False), ('roberta.encoder.layer.0.attention.self.value.weight', False), ('roberta.encoder.layer.0.attention.self.value.bias', False), ('roberta.encoder.layer.0.attention.output.dense.weight', False), ('roberta.encoder.layer.0.attention.output.dense.bias', False), ('roberta.encoder.layer.0.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.0.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.0.intermediate.dense.weight', False), ('roberta.encoder.layer.0.intermediate.dense.bias', False), ('roberta.encoder.layer.0.output.dense.weight', False), ('roberta.encoder.layer.0.output.dense.bias', False), ('roberta.encoder.layer.0.output.LayerNorm.weight', False), ('roberta.encoder.layer.0.output.LayerNorm.bias', False), ('roberta.encoder.layer.2.attention.self.query.weight', False), ('roberta.encoder.layer.2.attention.self.query.bias', False), ('roberta.encoder.layer.2.attention.self.key.weight', False), ('roberta.encoder.layer.2.attention.self.key.bias', False), ('roberta.encoder.layer.2.attention.self.value.weight', False), ('roberta.encoder.layer.2.attention.self.value.bias', False), ('roberta.encoder.layer.2.attention.output.dense.weight', False), ('roberta.encoder.layer.2.attention.output.dense.bias', False), ('roberta.encoder.layer.2.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.2.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.2.intermediate.dense.weight', False), ('roberta.encoder.layer.2.intermediate.dense.bias', False), ('roberta.encoder.layer.2.output.dense.weight', False), ('roberta.encoder.layer.2.output.dense.bias', False), ('roberta.encoder.layer.2.output.LayerNorm.weight', False), ('roberta.encoder.layer.2.output.LayerNorm.bias', False), ('roberta.encoder.layer.4.attention.self.query.weight', False), ('roberta.encoder.layer.4.attention.self.query.bias', False), ('roberta.encoder.layer.4.attention.self.key.weight', False), ('roberta.encoder.layer.4.attention.self.key.bias', False), ('roberta.encoder.layer.4.attention.self.value.weight', False), ('roberta.encoder.layer.4.attention.self.value.bias', False), ('roberta.encoder.layer.4.attention.output.dense.weight', False), ('roberta.encoder.layer.4.attention.output.dense.bias', False), ('roberta.encoder.layer.4.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.4.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.4.intermediate.dense.weight', False), ('roberta.encoder.layer.4.intermediate.dense.bias', False), ('roberta.encoder.layer.4.output.dense.weight', False), ('roberta.encoder.layer.4.output.dense.bias', False), ('roberta.encoder.layer.4.output.LayerNorm.weight', False), ('roberta.encoder.layer.4.output.LayerNorm.bias', False), ('roberta.encoder.layer.6.attention.self.query.weight', False), ('roberta.encoder.layer.6.attention.self.query.bias', False), ('roberta.encoder.layer.6.attention.self.key.weight', False), ('roberta.encoder.layer.6.attention.self.key.bias', False), ('roberta.encoder.layer.6.attention.self.value.weight', False), ('roberta.encoder.layer.6.attention.self.value.bias', False), ('roberta.encoder.layer.6.attention.output.dense.weight', False), ('roberta.encoder.layer.6.attention.output.dense.bias', False), ('roberta.encoder.layer.6.attention.output.LayerNorm.weight', False), 
('roberta.encoder.layer.6.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.6.intermediate.dense.weight', False), ('roberta.encoder.layer.6.intermediate.dense.bias', False), ('roberta.encoder.layer.6.output.dense.weight', False), ('roberta.encoder.layer.6.output.dense.bias', False), ('roberta.encoder.layer.6.output.LayerNorm.weight', False), ('roberta.encoder.layer.6.output.LayerNorm.bias', False), ('roberta.encoder.layer.8.attention.self.query.weight', False), ('roberta.encoder.layer.8.attention.self.query.bias', False), ('roberta.encoder.layer.8.attention.self.key.weight', False), ('roberta.encoder.layer.8.attention.self.key.bias', False), ('roberta.encoder.layer.8.attention.self.value.weight', False), ('roberta.encoder.layer.8.attention.self.value.bias', False), ('roberta.encoder.layer.8.attention.output.dense.weight', False), ('roberta.encoder.layer.8.attention.output.dense.bias', False), ('roberta.encoder.layer.8.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.8.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.8.intermediate.dense.weight', False), ('roberta.encoder.layer.8.intermediate.dense.bias', False), ('roberta.encoder.layer.8.output.dense.weight', False), ('roberta.encoder.layer.8.output.dense.bias', False), ('roberta.encoder.layer.8.output.LayerNorm.weight', False), ('roberta.encoder.layer.8.output.LayerNorm.bias', False), ('roberta.encoder.layer.10.attention.self.query.weight', False), ('roberta.encoder.layer.10.attention.self.query.bias', False), ('roberta.encoder.layer.10.attention.self.key.weight', False), ('roberta.encoder.layer.10.attention.self.key.bias', False), ('roberta.encoder.layer.10.attention.self.value.weight', False), ('roberta.encoder.layer.10.attention.self.value.bias', False), ('roberta.encoder.layer.10.attention.output.dense.weight', False), ('roberta.encoder.layer.10.attention.output.dense.bias', False), ('roberta.encoder.layer.10.attention.output.LayerNorm.weight', False), ('roberta.encoder.layer.10.attention.output.LayerNorm.bias', False), ('roberta.encoder.layer.10.intermediate.dense.weight', False), ('roberta.encoder.layer.10.intermediate.dense.bias', False), ('roberta.encoder.layer.10.output.dense.weight', False), ('roberta.encoder.layer.10.output.dense.bias', False), ('roberta.encoder.layer.10.output.LayerNorm.weight', False), ('roberta.encoder.layer.10.output.LayerNorm.bias', False)] 


Running tokenizer on dataset:   0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-e62b2012f3f40cb2.arrow
Running tokenizer on dataset: 100% 16/16 [00:01<00:00, 15.42ba/s]
Running tokenizer on dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-cd497527f5c67ba7.arrow
Running tokenizer on dataset: 100% 2/2 [00:00<00:00,  7.47ba/s]
Running tokenizer on dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-9c2deb15eb4326c1.arrow
Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 19.76ba/s]
INFO:__main__:Sample 10476 of the training set: {'label': 0, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [0, 118, 109, 465, 92, 964, 939, 475, 164, 7, 860, 1823, 543, 7, 146, 106, 1095, 8, 114, 939, 2845, 14, 939, 218, 326, 236, 7, 619, 2581, 456, 8, 95, 3068, 66, 5, 94, 76, 9, 334, 15, 127, 308, 939, 475, 164, 7, 33, 7, 860, 1823, 543, 45, 7, 575, 99, 82, 206, 9, 162, 145, 10, 784, 9604, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
INFO:__main__:Sample 1824 of the training set: {'label': 1, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [0, 118, 553, 106, 7, 1962, 162, 11, 2351, 10, 232, 147, 70, 76, 793, 1972, 115, 1733, 62, 2157, 7917, 8, 2247, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
INFO:__main__:Sample 409 of the training set: {'label': 2, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [0, 118, 619, 77, 47, 32, 10, 10837, 621, 47, 5696, 97, 10837, 82, 88, 110, 301, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
[INFO|trainer.py:725] 2023-02-14 21:47:19,642 >> The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassificationCustomAlternative.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassificationCustomAlternative.forward`,  you can safely ignore this message.
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[INFO|trainer.py:1607] 2023-02-14 21:47:19,649 >> ***** Running training *****
[INFO|trainer.py:1608] 2023-02-14 21:47:19,649 >>   Num examples = 16000
[INFO|trainer.py:1609] 2023-02-14 21:47:19,649 >>   Num Epochs = 1
[INFO|trainer.py:1610] 2023-02-14 21:47:19,649 >>   Instantaneous batch size per device = 24
[INFO|trainer.py:1611] 2023-02-14 21:47:19,649 >>   Total train batch size (w. parallel, distributed & accumulation) = 24
[INFO|trainer.py:1612] 2023-02-14 21:47:19,649 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1613] 2023-02-14 21:47:19,649 >>   Total optimization steps = 667
{'loss': 0.8955, 'learning_rate': 5.0074962518740634e-06, 'epoch': 0.75}
 75% 500/667 [00:58<00:19,  8.75it/s][INFO|trainer.py:2656] 2023-02-14 21:48:17,996 >> Saving model checkpoint to out/emotion/roberta_custom/checkpoint-500
[INFO|configuration_utils.py:447] 2023-02-14 21:48:17,997 >> Configuration saved in out/emotion/roberta_custom/checkpoint-500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:48:19,015 >> Model weights saved in out/emotion/roberta_custom/checkpoint-500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:48:19,016 >> tokenizer config file saved in out/emotion/roberta_custom/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:48:19,016 >> Special tokens file saved in out/emotion/roberta_custom/checkpoint-500/special_tokens_map.json
100% 666/667 [01:20<00:00,  8.66it/s][INFO|trainer.py:1852] 2023-02-14 21:48:40,745 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 81.0963, 'train_samples_per_second': 197.296, 'train_steps_per_second': 8.225, 'train_loss': 0.8004468377383573, 'epoch': 1.0}
100% 667/667 [01:21<00:00,  8.23it/s]
[INFO|trainer.py:2656] 2023-02-14 21:48:40,747 >> Saving model checkpoint to out/emotion/roberta_custom
[INFO|configuration_utils.py:447] 2023-02-14 21:48:40,748 >> Configuration saved in out/emotion/roberta_custom/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:48:41,796 >> Model weights saved in out/emotion/roberta_custom/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:48:41,797 >> tokenizer config file saved in out/emotion/roberta_custom/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:48:41,797 >> Special tokens file saved in out/emotion/roberta_custom/special_tokens_map.json
***** train metrics *****
  epoch                    =        1.0
  train_loss               =     0.8004
  train_runtime            = 0:01:21.09
  train_samples            =      16000
  train_samples_per_second =    197.296
  train_steps_per_second   =      8.225
INFO:__main__:*** Evaluate ***
[INFO|trainer.py:725] 2023-02-14 21:48:41,898 >> The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassificationCustomAlternative.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassificationCustomAlternative.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:48:41,899 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:48:41,900 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:48:41,900 >>   Batch size = 24
100% 84/84 [00:03<00:00, 23.62it/s]
***** eval metrics *****
  epoch                   =        1.0
  eval_accuracy           =      0.867
  eval_loss               =       0.39
  eval_runtime            = 0:00:03.59
  eval_samples            =       2000
  eval_samples_per_second =    555.583
  eval_steps_per_second   =     23.334
INFO:__main__:*** Predict ***
[INFO|trainer.py:725] 2023-02-14 21:48:45,503 >> The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassificationCustomAlternative.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassificationCustomAlternative.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:48:45,504 >> ***** Running Prediction *****
[INFO|trainer.py:2909] 2023-02-14 21:48:45,504 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:48:45,504 >>   Batch size = 24
100% 84/84 [00:03<00:00, 23.74it/s]
INFO:__main__:***** Predict results None *****
[INFO|modelcard.py:444] 2023-02-14 21:48:49,211 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.8669999837875366}]}
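For reference, the eval_accuracy values reported above come from a compute_metrics callback that run_glue.py hands to the Trainer. A minimal sketch of such a function using the evaluate package (an assumption; the script's exact implementation is not shown in this notebook):

# Hedged sketch of an accuracy-based compute_metrics for the Trainer;
# the actual function in run_glue.py may differ.
import numpy as np
import evaluate

accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return accuracy_metric.compute(predictions=predictions, references=labels)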

GPT2

  • full data
  • model GPT2
  • sequence length: 128
  • training epochs: 1 (overridden by --max_steps 2500; the run below trains for about 3.75 epochs)
!python run_glue.py \
  --cache_dir gtp_cache_training \
  --model_name_or_path gpt2 \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json  \
  --per_device_train_batch_size 24  \
  --per_device_eval_batch_size 24 \
  --do_train  \
  --do_eval \
  --do_predict  \
  --max_seq_length 128  \
  --learning_rate 2e-5  \
  --num_train_epochs 1  \
  --output_dir out/emotion/gpt2  \
  --overwrite_output_dir \
  --eval_steps 250 \
  --evaluation_strategy steps \
  --metric_for_best_model accuracy \
  --logging_steps 100 \
  --save_total_limit 5 \
  --max_steps 2500 \
  --load_best_model_at_end True 
2023-02-14 21:48:52.605236: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-14 21:48:52.757779: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-14 21:48:53.540701: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 21:48:53.540799: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 21:48:53.540819: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
INFO:__main__:Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=250,
evaluation_strategy=steps,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=out/emotion/gpt2/runs/Feb14_21-48-55_fc0011e45a00,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=100,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=2500,
metric_for_best_model=accuracy,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_hf,
output_dir=out/emotion/gpt2,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=24,
per_device_train_batch_size=24,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=out/emotion/gpt2,
save_on_each_node=False,
save_steps=500,
save_strategy=steps,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
INFO:__main__:load a local file for train: data/train.json
INFO:__main__:load a local file for validation: data/valid.json
INFO:__main__:load a local file for test: data/test.json
WARNING:datasets.builder:Using custom data configuration default-01aa9d8252a24a0d
INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json
INFO:datasets.builder:Generating dataset json (/content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Downloading and preparing dataset json/default to /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100% 3/3 [00:00<00:00, 12169.16it/s]
INFO:datasets.download.download_manager:Downloading took 0.0 min
INFO:datasets.download.download_manager:Checksum Computation took 0.0 min
Extracting data files: 100% 3/3 [00:00<00:00, 2183.40it/s]
INFO:datasets.utils.info_utils:Unable to verify checksums.
INFO:datasets.builder:Generating train split
INFO:datasets.builder:Generating validation split
INFO:datasets.builder:Generating test split
INFO:datasets.utils.info_utils:Unable to verify splits sizes.
Dataset json downloaded and prepared to /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100% 3/3 [00:00<00:00, 665.62it/s]
Downloading (…)lve/main/config.json: 100% 665/665 [00:00<00:00, 125kB/s]
[INFO|configuration_utils.py:653] 2023-02-14 21:48:57,052 >> loading configuration file config.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:48:57,053 >> Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 50257
}

[INFO|tokenization_auto.py:418] 2023-02-14 21:48:57,145 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
[INFO|configuration_utils.py:653] 2023-02-14 21:48:57,236 >> loading configuration file config.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:48:57,237 >> Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 50257
}

Downloading (…)olve/main/vocab.json: 100% 1.04M/1.04M [00:00<00:00, 9.20MB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 6.19MB/s]
Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 11.7MB/s]
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file vocab.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/vocab.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file merges.txt from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/merges.txt
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file tokenizer.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/tokenizer.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:48:58,447 >> loading file tokenizer_config.json from cache at None
[INFO|configuration_utils.py:653] 2023-02-14 21:48:58,447 >> loading configuration file config.json from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:48:58,448 >> Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 50257
}

INFO:__main__:Using implementation from class: AutoModelForSequenceClassification
Downloading (…)"pytorch_model.bin";: 100% 548M/548M [00:05<00:00, 108MB/s]
[INFO|modeling_utils.py:2156] 2023-02-14 21:49:03,784 >> loading weights file pytorch_model.bin from cache at gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/pytorch_model.bin
[INFO|modeling_utils.py:2606] 2023-02-14 21:49:05,169 >> All model checkpoint weights were used when initializing GPT2ForSequenceClassification.

[WARNING|modeling_utils.py:2608] 2023-02-14 21:49:05,169 >> Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[ERROR|tokenization_utils_base.py:1019] 2023-02-14 21:49:05,177 >> Using pad_token, but it is not set yet.
INFO:__main__:Set PAD token to EOS: <|endoftext|>
Running tokenizer on dataset:   0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bb8faaac56c0b87e.arrow
Running tokenizer on dataset: 100% 16/16 [00:00<00:00, 20.23ba/s]
Running tokenizer on dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-7b339bb99d7c17a1.arrow
Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 20.04ba/s]
Running tokenizer on dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-82acdaa33d6aa0eb.arrow
Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 20.92ba/s]
INFO:__main__:Sample 10476 of the training set: {'label': 0, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [72, 466, 1064, 649, 2460, 1312, 285, 1016, 284, 1949, 3131, 1327, 284, 787, 606, 2652, 290, 611, 1312, 5409, 326, 1312, 836, 256, 765, 284, 1254, 5938, 757, 290, 655, 6594, 503, 262, 938, 614, 286, 1524, 319, 616, 898, 1312, 285, 1016, 284, 423, 284, 1949, 3131, 1327, 407, 284, 1337, 644, 661, 892, 286, 502, 852, 257, 300, 14491, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
INFO:__main__:Sample 1824 of the training set: {'label': 1, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [72, 1965, 606, 284, 4654, 502, 287, 4441, 257, 995, 810, 477, 614, 1468, 4813, 714, 1663, 510, 4203, 17836, 290, 3665, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
INFO:__main__:Sample 409 of the training set: {'label': 2, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [72, 1254, 618, 345, 389, 257, 18088, 1048, 345, 4729, 584, 18088, 661, 656, 534, 1204, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
[INFO|trainer.py:503] 2023-02-14 21:49:08,712 >> max_steps is given, it will override any value given in num_train_epochs
[INFO|trainer.py:725] 2023-02-14 21:49:08,712 >> The following columns in the training set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[INFO|trainer.py:1607] 2023-02-14 21:49:08,718 >> ***** Running training *****
[INFO|trainer.py:1608] 2023-02-14 21:49:08,718 >>   Num examples = 16000
[INFO|trainer.py:1609] 2023-02-14 21:49:08,718 >>   Num Epochs = 4
[INFO|trainer.py:1610] 2023-02-14 21:49:08,719 >>   Instantaneous batch size per device = 24
[INFO|trainer.py:1611] 2023-02-14 21:49:08,719 >>   Total train batch size (w. parallel, distributed & accumulation) = 24
[INFO|trainer.py:1612] 2023-02-14 21:49:08,719 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1613] 2023-02-14 21:49:08,719 >>   Total optimization steps = 2500
{'loss': 2.3442, 'learning_rate': 1.9200000000000003e-05, 'epoch': 0.15}
{'loss': 1.3126, 'learning_rate': 1.8400000000000003e-05, 'epoch': 0.3}
 10% 250/2500 [00:37<05:31,  6.79it/s][INFO|trainer.py:725] 2023-02-14 21:49:46,426 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:49:46,428 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:49:46,428 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:49:46,428 >>   Batch size = 24

100% 84/84 [00:03<00:00, 22.24it/s]
{'eval_loss': 0.7983964085578918, 'eval_accuracy': 0.7465000152587891, 'eval_runtime': 3.9877, 'eval_samples_per_second': 501.548, 'eval_steps_per_second': 21.065, 'epoch': 0.37}

 10% 250/2500 [00:41<05:31,  6.79it/s]
{'loss': 0.7216, 'learning_rate': 1.76e-05, 'epoch': 0.45}
{'loss': 0.5032, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.6}
{'loss': 0.3904, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.75}
 20% 500/2500 [01:18<04:56,  6.74it/s][INFO|trainer.py:725] 2023-02-14 21:50:27,312 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:50:27,314 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:50:27,314 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:50:27,314 >>   Batch size = 24

{'eval_loss': 0.29131895303726196, 'eval_accuracy': 0.9035000205039978, 'eval_runtime': 3.9922, 'eval_samples_per_second': 500.974, 'eval_steps_per_second': 21.041, 'epoch': 0.75}

 20% 500/2500 [01:22<04:56,  6.74it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 21:50:31,307 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-500
[INFO|configuration_utils.py:447] 2023-02-14 21:50:31,308 >> Configuration saved in out/emotion/gpt2/checkpoint-500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:50:32,356 >> Model weights saved in out/emotion/gpt2/checkpoint-500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:50:32,357 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:50:32,357 >> Special tokens file saved in out/emotion/gpt2/checkpoint-500/special_tokens_map.json
{'loss': 0.3554, 'learning_rate': 1.5200000000000002e-05, 'epoch': 0.9}
{'loss': 0.2871, 'learning_rate': 1.4400000000000001e-05, 'epoch': 1.05}
 30% 750/2500 [02:02<04:19,  6.74it/s][INFO|trainer.py:725] 2023-02-14 21:51:11,104 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:51:11,106 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:51:11,106 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:51:11,106 >>   Batch size = 24

{'eval_loss': 0.2168988287448883, 'eval_accuracy': 0.9235000014305115, 'eval_runtime': 3.9688, 'eval_samples_per_second': 503.925, 'eval_steps_per_second': 21.165, 'epoch': 1.12}

 30% 750/2500 [02:06<04:19,  6.74it/s]
{'loss': 0.2285, 'learning_rate': 1.3600000000000002e-05, 'epoch': 1.2}
{'loss': 0.1888, 'learning_rate': 1.2800000000000001e-05, 'epoch': 1.35}
{'loss': 0.2106, 'learning_rate': 1.2e-05, 'epoch': 1.5}
 40% 1000/2500 [02:43<03:41,  6.78it/s][INFO|trainer.py:725] 2023-02-14 21:51:51,748 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:51:51,749 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:51:51,750 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:51:51,750 >>   Batch size = 24

{'eval_loss': 0.19490236043930054, 'eval_accuracy': 0.9259999990463257, 'eval_runtime': 3.9658, 'eval_samples_per_second': 504.311, 'eval_steps_per_second': 21.181, 'epoch': 1.5}

 40% 1000/2500 [02:46<03:41,  6.78it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 21:51:55,716 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-1000
[INFO|configuration_utils.py:447] 2023-02-14 21:51:55,717 >> Configuration saved in out/emotion/gpt2/checkpoint-1000/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:51:56,708 >> Model weights saved in out/emotion/gpt2/checkpoint-1000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:51:56,709 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-1000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:51:56,709 >> Special tokens file saved in out/emotion/gpt2/checkpoint-1000/special_tokens_map.json
{'loss': 0.1906, 'learning_rate': 1.1200000000000001e-05, 'epoch': 1.65}
{'loss': 0.1793, 'learning_rate': 1.04e-05, 'epoch': 1.8}
 50% 1250/2500 [03:26<03:04,  6.76it/s][INFO|trainer.py:725] 2023-02-14 21:52:35,220 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:52:35,222 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:52:35,222 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:52:35,222 >>   Batch size = 24

{'eval_loss': 0.1607103943824768, 'eval_accuracy': 0.9319999814033508, 'eval_runtime': 3.9612, 'eval_samples_per_second': 504.895, 'eval_steps_per_second': 21.206, 'epoch': 1.87}

 50% 1250/2500 [03:30<03:04,  6.76it/s]
{'loss': 0.2116, 'learning_rate': 9.600000000000001e-06, 'epoch': 1.95}
{'loss': 0.1536, 'learning_rate': 8.8e-06, 'epoch': 2.1}
{'loss': 0.1518, 'learning_rate': 8.000000000000001e-06, 'epoch': 2.25}
 60% 1500/2500 [04:07<02:26,  6.82it/s][INFO|trainer.py:725] 2023-02-14 21:53:15,831 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:53:15,833 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:53:15,833 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:53:15,833 >>   Batch size = 24

{'eval_loss': 0.160899356007576, 'eval_accuracy': 0.9330000281333923, 'eval_runtime': 3.9773, 'eval_samples_per_second': 502.855, 'eval_steps_per_second': 21.12, 'epoch': 2.25}

 60% 1500/2500 [04:11<02:26,  6.82it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 21:53:19,811 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-1500
[INFO|configuration_utils.py:447] 2023-02-14 21:53:19,812 >> Configuration saved in out/emotion/gpt2/checkpoint-1500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:53:21,455 >> Model weights saved in out/emotion/gpt2/checkpoint-1500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:53:21,456 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-1500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:53:21,456 >> Special tokens file saved in out/emotion/gpt2/checkpoint-1500/special_tokens_map.json
{'loss': 0.157, 'learning_rate': 7.2000000000000005e-06, 'epoch': 2.4}
{'loss': 0.141, 'learning_rate': 6.4000000000000006e-06, 'epoch': 2.55}
 70% 1750/2500 [04:51<01:50,  6.80it/s][INFO|trainer.py:725] 2023-02-14 21:54:00,007 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:54:00,009 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:54:00,009 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:54:00,009 >>   Batch size = 24

{'eval_loss': 0.15204769372940063, 'eval_accuracy': 0.9319999814033508, 'eval_runtime': 3.9769, 'eval_samples_per_second': 502.901, 'eval_steps_per_second': 21.122, 'epoch': 2.62}

 70% 1750/2500 [04:55<01:50,  6.80it/s]
{'loss': 0.1426, 'learning_rate': 5.600000000000001e-06, 'epoch': 2.7}
{'loss': 0.1463, 'learning_rate': 4.800000000000001e-06, 'epoch': 2.85}
{'loss': 0.1403, 'learning_rate': 4.000000000000001e-06, 'epoch': 3.0}
 80% 2000/2500 [05:31<01:13,  6.82it/s][INFO|trainer.py:725] 2023-02-14 21:54:40,633 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:54:40,635 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:54:40,635 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:54:40,635 >>   Batch size = 24

{'eval_loss': 0.14609387516975403, 'eval_accuracy': 0.9290000200271606, 'eval_runtime': 3.9774, 'eval_samples_per_second': 502.846, 'eval_steps_per_second': 21.12, 'epoch': 3.0}

 80% 2000/2500 [05:35<01:13,  6.82it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 21:54:44,614 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-2000
[INFO|configuration_utils.py:447] 2023-02-14 21:54:44,615 >> Configuration saved in out/emotion/gpt2/checkpoint-2000/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:54:46,838 >> Model weights saved in out/emotion/gpt2/checkpoint-2000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:54:46,839 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-2000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:54:46,839 >> Special tokens file saved in out/emotion/gpt2/checkpoint-2000/special_tokens_map.json
{'loss': 0.1256, 'learning_rate': 3.2000000000000003e-06, 'epoch': 3.15}
{'loss': 0.1246, 'learning_rate': 2.4000000000000003e-06, 'epoch': 3.3}
 90% 2250/2500 [06:16<00:36,  6.76it/s][INFO|trainer.py:725] 2023-02-14 21:55:25,309 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:55:25,311 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:55:25,311 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:55:25,311 >>   Batch size = 24

{'eval_loss': 0.15553689002990723, 'eval_accuracy': 0.9294999837875366, 'eval_runtime': 3.967, 'eval_samples_per_second': 504.158, 'eval_steps_per_second': 21.175, 'epoch': 3.37}

 90% 2250/2500 [06:20<00:36,  6.76it/s]
{'loss': 0.1174, 'learning_rate': 1.6000000000000001e-06, 'epoch': 3.45}
{'loss': 0.1374, 'learning_rate': 8.000000000000001e-07, 'epoch': 3.6}
{'loss': 0.1207, 'learning_rate': 0.0, 'epoch': 3.75}
100% 2500/2500 [06:57<00:00,  6.82it/s][INFO|trainer.py:725] 2023-02-14 21:56:05,969 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:56:05,971 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:56:05,971 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:56:05,971 >>   Batch size = 24

{'eval_loss': 0.15162073075771332, 'eval_accuracy': 0.9309999942779541, 'eval_runtime': 3.9841, 'eval_samples_per_second': 501.992, 'eval_steps_per_second': 21.084, 'epoch': 3.75}

100% 2500/2500 [07:01<00:00,  6.82it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 21:56:09,956 >> Saving model checkpoint to out/emotion/gpt2/checkpoint-2500
[INFO|configuration_utils.py:447] 2023-02-14 21:56:09,957 >> Configuration saved in out/emotion/gpt2/checkpoint-2500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:56:10,953 >> Model weights saved in out/emotion/gpt2/checkpoint-2500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:56:10,954 >> tokenizer config file saved in out/emotion/gpt2/checkpoint-2500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:56:10,954 >> Special tokens file saved in out/emotion/gpt2/checkpoint-2500/special_tokens_map.json
[INFO|trainer.py:1852] 2023-02-14 21:56:12,777 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1946] 2023-02-14 21:56:12,778 >> Loading best model from out/emotion/gpt2/checkpoint-1500 (score: 0.9330000281333923).
{'train_runtime': 424.4983, 'train_samples_per_second': 141.343, 'train_steps_per_second': 5.889, 'train_loss': 0.351297896194458, 'epoch': 3.75}
100% 2500/2500 [07:04<00:00,  5.89it/s]
[INFO|trainer.py:2656] 2023-02-14 21:56:13,218 >> Saving model checkpoint to out/emotion/gpt2
[INFO|configuration_utils.py:447] 2023-02-14 21:56:13,220 >> Configuration saved in out/emotion/gpt2/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:56:14,063 >> Model weights saved in out/emotion/gpt2/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:56:14,064 >> tokenizer config file saved in out/emotion/gpt2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:56:14,064 >> Special tokens file saved in out/emotion/gpt2/special_tokens_map.json
***** train metrics *****
  epoch                    =       3.75
  train_loss               =     0.3513
  train_runtime            = 0:07:04.49
  train_samples            =      16000
  train_samples_per_second =    141.343
  train_steps_per_second   =      5.889
INFO:__main__:*** Evaluate ***
[INFO|trainer.py:725] 2023-02-14 21:56:14,169 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:56:14,170 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:56:14,170 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:56:14,170 >>   Batch size = 24
100% 84/84 [00:03<00:00, 21.20it/s]
***** eval metrics *****
  epoch                   =       3.75
  eval_accuracy           =      0.933
  eval_loss               =     0.1609
  eval_runtime            = 0:00:04.02
  eval_samples            =       2000
  eval_samples_per_second =    497.496
  eval_steps_per_second   =     20.895
INFO:__main__:*** Predict ***
[INFO|trainer.py:725] 2023-02-14 21:56:18,194 >> The following columns in the test set don't have a corresponding argument in `GPT2ForSequenceClassification.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:56:18,195 >> ***** Running Prediction *****
[INFO|trainer.py:2909] 2023-02-14 21:56:18,195 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:56:18,195 >>   Batch size = 24
100% 84/84 [00:03<00:00, 21.40it/s]
INFO:__main__:***** Predict results None *****
[INFO|modelcard.py:444] 2023-02-14 21:56:22,304 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9330000281333923}]}
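The best GPT2 checkpoint (eval accuracy of about 0.933) is saved to out/emotion/gpt2, so it can be reloaded for inference with the standard Auto classes. A minimal sketch, assuming that output directory; the pad token is reused from EOS exactly as the training log did ("Set PAD token to EOS"):

# Minimal inference sketch (not part of the original run); assumes the
# fine-tuned checkpoint saved above in out/emotion/gpt2.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_dir = "out/emotion/gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)

# GPT-2 has no native pad token; reuse EOS, as during training
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.eos_token_id
model.eval()

batch = tokenizer(["i feel hopeful and powerful"], padding=True, truncation=True,
                  max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1).tolist())  # predicted label ids (0-5)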
  • full dataset
  • model: GPT2 with a custom classification head (gpt2_custom); a hypothetical sketch of such a head follows this list
  • sequence length: 128
  • max steps: 2500
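The gpt2_custom variant replaces the default single linear score layer with a custom classification head. Its definition lives next to run_glue.py and is not reproduced in this notebook; purely as a hypothetical illustration, such a head could be built like this:

# Hypothetical illustration only: GPT-2 classifier with a small MLP head in
# place of the default linear "score" layer; the real gpt2_custom class may differ.
import torch.nn as nn
from transformers import GPT2ForSequenceClassification

class GPT2WithMLPHead(GPT2ForSequenceClassification):
    def __init__(self, config):
        super().__init__(config)
        hidden = config.n_embd
        # swap the stock nn.Linear(hidden, num_labels) head for a two-layer MLP
        self.score = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, config.num_labels),
        )
        self.post_init()

# example: start from the pretrained LM weights, 6 emotion labels
model = GPT2WithMLPHead.from_pretrained("gpt2", num_labels=6)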
!python run_glue.py \
  --cache_dir gtp_custom_cache_training \
  --model_name_or_path gpt2 \
  --custom_model gpt2_custom  \
  --train_file data/train.json  \
  --validation_file data/valid.json \
  --test_file data/test.json  \
  --per_device_train_batch_size 24  \
  --per_device_eval_batch_size 24 \
  --do_train  \
  --do_eval \
  --do_predict  \
  --max_seq_length 128  \
  --learning_rate 2e-5  \
  --num_train_epochs 1  \
  --output_dir out/emotion/gpt2_custom  \
  --overwrite_output_dir \
  --eval_steps 250 \
  --evaluation_strategy steps \
  --metric_for_best_model accuracy \
  --logging_steps 100 \
  --save_total_limit 5 \
  --max_steps 2500 \
  --load_best_model_at_end True 
2023-02-14 21:56:25.884599: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-14 21:56:26.040127: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-14 21:56:26.823479: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 21:56:26.823595: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 21:56:26.823615: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
INFO:__main__:Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=250,
evaluation_strategy=steps,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=out/emotion/gpt2_custom/runs/Feb14_21-56-28_fc0011e45a00,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=100,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=2500,
metric_for_best_model=accuracy,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_hf,
output_dir=out/emotion/gpt2_custom,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=24,
per_device_train_batch_size=24,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=out/emotion/gpt2_custom,
save_on_each_node=False,
save_steps=500,
save_strategy=steps,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
INFO:__main__:load a local file for train: data/train.json
INFO:__main__:load a local file for validation: data/valid.json
INFO:__main__:load a local file for test: data/test.json
WARNING:datasets.builder:Using custom data configuration default-01aa9d8252a24a0d
INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json
INFO:datasets.builder:Generating dataset json (/content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Downloading and preparing dataset json/default to /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100% 3/3 [00:00<00:00, 14138.10it/s]
INFO:datasets.download.download_manager:Downloading took 0.0 min
INFO:datasets.download.download_manager:Checksum Computation took 0.0 min
Extracting data files: 100% 3/3 [00:00<00:00, 2175.09it/s]
INFO:datasets.utils.info_utils:Unable to verify checksums.
INFO:datasets.builder:Generating train split
INFO:datasets.builder:Generating validation split
INFO:datasets.builder:Generating test split
INFO:datasets.utils.info_utils:Unable to verify splits sizes.
Dataset json downloaded and prepared to /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100% 3/3 [00:00<00:00, 672.49it/s]
Downloading (…)lve/main/config.json: 100% 665/665 [00:00<00:00, 123kB/s]
[INFO|configuration_utils.py:653] 2023-02-14 21:56:30,068 >> loading configuration file config.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:56:30,068 >> Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 50257
}

[INFO|tokenization_auto.py:418] 2023-02-14 21:56:30,162 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
[INFO|configuration_utils.py:653] 2023-02-14 21:56:30,251 >> loading configuration file config.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:56:30,252 >> Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 50257
}

Downloading (…)olve/main/vocab.json: 100% 1.04M/1.04M [00:00<00:00, 9.18MB/s]
Downloading (…)olve/main/merges.txt: 100% 456k/456k [00:00<00:00, 4.90MB/s]
Downloading (…)/main/tokenizer.json: 100% 1.36M/1.36M [00:00<00:00, 14.3MB/s]
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file vocab.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/vocab.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file merges.txt from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/merges.txt
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file tokenizer.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/tokenizer.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 21:56:31,525 >> loading file tokenizer_config.json from cache at None
[INFO|configuration_utils.py:653] 2023-02-14 21:56:31,525 >> loading configuration file config.json from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json
[INFO|configuration_utils.py:705] 2023-02-14 21:56:31,526 >> Model config GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 50257
}

INFO:__main__:Using hidden states in model: False
INFO:__main__:Using implementation from class: GPT2ForSequenceClassificationCustom
Downloading (…)"pytorch_model.bin";: 100% 548M/548M [00:05<00:00, 108MB/s]
[INFO|modeling_utils.py:2156] 2023-02-14 21:56:36,895 >> loading weights file pytorch_model.bin from cache at gtp_custom_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/pytorch_model.bin
[INFO|modeling_utils.py:2606] 2023-02-14 21:56:39,410 >> All model checkpoint weights were used when initializing GPT2ForSequenceClassificationCustom.

[WARNING|modeling_utils.py:2608] 2023-02-14 21:56:39,410 >> Some weights of GPT2ForSequenceClassificationCustom were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.dense_1_hidden.bias', 'score.dense_1_input.weight', 'score.dense_2.bias', 'score.dense_2.weight', 'score.out_proj.weight', 'score.dense_1_hidden.weight', 'score.dense_1_input.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[ERROR|tokenization_utils_base.py:1019] 2023-02-14 21:56:39,418 >> Using pad_token, but it is not set yet.
INFO:__main__:Set PAD token to EOS: <|endoftext|>
Running tokenizer on dataset:   0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bb8faaac56c0b87e.arrow
Running tokenizer on dataset: 100% 16/16 [00:00<00:00, 19.61ba/s]
Running tokenizer on dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-7b339bb99d7c17a1.arrow
Running tokenizer on dataset: 100% 2/2 [00:00<00:00, 20.48ba/s]
Running tokenizer on dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/gtp_custom_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-82acdaa33d6aa0eb.arrow
Running tokenizer on dataset: 100% 2/2 [00:00<00:00,  7.71ba/s]
INFO:__main__:Sample 10476 of the training set: {'label': 0, 'text': 'i do find new friends i m going to try extra hard to make them stay and if i decide that i don t want to feel hurt again and just ride out the last year of school on my own i m going to have to try extra hard not to care what people think of me being a loner', 'input_ids': [72, 466, 1064, 649, 2460, 1312, 285, 1016, 284, 1949, 3131, 1327, 284, 787, 606, 2652, 290, 611, 1312, 5409, 326, 1312, 836, 256, 765, 284, 1254, 5938, 757, 290, 655, 6594, 503, 262, 938, 614, 286, 1524, 319, 616, 898, 1312, 285, 1016, 284, 423, 284, 1949, 3131, 1327, 407, 284, 1337, 644, 661, 892, 286, 502, 852, 257, 300, 14491, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
INFO:__main__:Sample 1824 of the training set: {'label': 1, 'text': 'i asked them to join me in creating a world where all year old girls could grow up feeling hopeful and powerful', 'input_ids': [72, 1965, 606, 284, 4654, 502, 287, 4441, 257, 995, 810, 477, 614, 1468, 4813, 714, 1663, 510, 4203, 17836, 290, 3665, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
INFO:__main__:Sample 409 of the training set: {'label': 2, 'text': 'i feel when you are a caring person you attract other caring people into your life', 'input_ids': [72, 1254, 618, 345, 389, 257, 18088, 1048, 345, 4729, 584, 18088, 661, 656, 534, 1204, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]}.
[INFO|trainer.py:503] 2023-02-14 21:56:42,941 >> max_steps is given, it will override any value given in num_train_epochs
[INFO|trainer.py:725] 2023-02-14 21:56:42,941 >> The following columns in the training set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[INFO|trainer.py:1607] 2023-02-14 21:56:42,947 >> ***** Running training *****
[INFO|trainer.py:1608] 2023-02-14 21:56:42,947 >>   Num examples = 16000
[INFO|trainer.py:1609] 2023-02-14 21:56:42,947 >>   Num Epochs = 4
[INFO|trainer.py:1610] 2023-02-14 21:56:42,947 >>   Instantaneous batch size per device = 24
[INFO|trainer.py:1611] 2023-02-14 21:56:42,947 >>   Total train batch size (w. parallel, distributed & accumulation) = 24
[INFO|trainer.py:1612] 2023-02-14 21:56:42,947 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1613] 2023-02-14 21:56:42,947 >>   Total optimization steps = 2500
{'loss': 1.6218, 'learning_rate': 1.9200000000000003e-05, 'epoch': 0.15}
{'loss': 1.1593, 'learning_rate': 1.8400000000000003e-05, 'epoch': 0.3}
 10% 250/2500 [00:39<05:43,  6.56it/s][INFO|trainer.py:725] 2023-02-14 21:57:22,025 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:57:22,027 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:57:22,027 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:57:22,027 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 26.97it/s]
  8% 7/84 [00:00<00:03, 22.99it/s]
 12% 10/84 [00:00<00:03, 21.78it/s]
 15% 13/84 [00:00<00:03, 21.18it/s]
 19% 16/84 [00:00<00:03, 20.86it/s]
 23% 19/84 [00:00<00:03, 20.66it/s]
 26% 22/84 [00:01<00:03, 20.55it/s]
 30% 25/84 [00:01<00:02, 20.44it/s]
 33% 28/84 [00:01<00:02, 20.32it/s]
 37% 31/84 [00:01<00:02, 20.32it/s]
 40% 34/84 [00:01<00:02, 20.31it/s]
 44% 37/84 [00:01<00:02, 20.30it/s]
 48% 40/84 [00:01<00:02, 20.31it/s]
 51% 43/84 [00:02<00:02, 20.32it/s]
 55% 46/84 [00:02<00:01, 20.29it/s]
 58% 49/84 [00:02<00:01, 20.28it/s]
 62% 52/84 [00:02<00:01, 20.28it/s]
 65% 55/84 [00:02<00:01, 20.28it/s]
 69% 58/84 [00:02<00:01, 20.28it/s]
 73% 61/84 [00:02<00:01, 20.27it/s]
 76% 64/84 [00:03<00:00, 20.27it/s]
 80% 67/84 [00:03<00:00, 20.25it/s]
 83% 70/84 [00:03<00:00, 20.26it/s]
 87% 73/84 [00:03<00:00, 20.23it/s]
 90% 76/84 [00:03<00:00, 20.22it/s]
 94% 79/84 [00:03<00:00, 20.23it/s]
 98% 82/84 [00:03<00:00, 20.22it/s]
{'eval_loss': 0.6981180310249329, 'eval_accuracy': 0.7329999804496765, 'eval_runtime': 4.1201, 'eval_samples_per_second': 485.426, 'eval_steps_per_second': 20.388, 'epoch': 0.37}

 10% 250/2500 [00:43<05:43,  6.56it/s]
{'loss': 0.8016, 'learning_rate': 1.76e-05, 'epoch': 0.45}
{'loss': 0.5481, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.6}
{'loss': 0.4045, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.75}
 20% 500/2500 [01:21<05:03,  6.58it/s][INFO|trainer.py:725] 2023-02-14 21:58:04,246 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:58:04,248 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:58:04,248 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:58:04,248 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 26.97it/s]
  8% 7/84 [00:00<00:03, 23.02it/s]
 12% 10/84 [00:00<00:03, 21.78it/s]
 15% 13/84 [00:00<00:03, 21.20it/s]
 19% 16/84 [00:00<00:03, 20.86it/s]
 23% 19/84 [00:00<00:03, 20.19it/s]
 26% 22/84 [00:01<00:03, 20.20it/s]
 30% 25/84 [00:01<00:02, 20.21it/s]
 33% 28/84 [00:01<00:02, 20.22it/s]
 37% 31/84 [00:01<00:02, 20.23it/s]
 40% 34/84 [00:01<00:02, 20.23it/s]
 44% 37/84 [00:01<00:02, 20.24it/s]
 48% 40/84 [00:01<00:02, 20.25it/s]
 51% 43/84 [00:02<00:02, 20.24it/s]
 55% 46/84 [00:02<00:01, 20.25it/s]
 58% 49/84 [00:02<00:01, 20.24it/s]
 62% 52/84 [00:02<00:01, 20.26it/s]
 65% 55/84 [00:02<00:01, 20.25it/s]
 69% 58/84 [00:02<00:01, 20.25it/s]
 73% 61/84 [00:02<00:01, 20.24it/s]
 76% 64/84 [00:03<00:00, 20.25it/s]
 80% 67/84 [00:03<00:00, 20.24it/s]
 83% 70/84 [00:03<00:00, 20.26it/s]
 87% 73/84 [00:03<00:00, 20.26it/s]
 90% 76/84 [00:03<00:00, 20.27it/s]
 94% 79/84 [00:03<00:00, 20.25it/s]
 98% 82/84 [00:04<00:00, 20.24it/s]
{'eval_loss': 0.29522550106048584, 'eval_accuracy': 0.9100000262260437, 'eval_runtime': 4.1309, 'eval_samples_per_second': 484.153, 'eval_steps_per_second': 20.334, 'epoch': 0.75}

 20% 500/2500 [01:25<05:03,  6.58it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 21:58:08,380 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-500
[INFO|configuration_utils.py:447] 2023-02-14 21:58:08,381 >> Configuration saved in out/emotion/gpt2_custom/checkpoint-500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:58:09,983 >> Model weights saved in out/emotion/gpt2_custom/checkpoint-500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:58:09,984 >> tokenizer config file saved in out/emotion/gpt2_custom/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:58:09,984 >> Special tokens file saved in out/emotion/gpt2_custom/checkpoint-500/special_tokens_map.json
{'loss': 0.356, 'learning_rate': 1.5200000000000002e-05, 'epoch': 0.9}
{'loss': 0.2714, 'learning_rate': 1.4400000000000001e-05, 'epoch': 1.05}
 30% 750/2500 [02:07<04:25,  6.59it/s][INFO|trainer.py:725] 2023-02-14 21:58:49,972 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:58:49,973 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:58:49,974 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:58:49,974 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 27.06it/s]
  8% 7/84 [00:00<00:03, 23.11it/s]
 12% 10/84 [00:00<00:03, 21.85it/s]
 15% 13/84 [00:00<00:03, 21.25it/s]
 19% 16/84 [00:00<00:03, 20.89it/s]
 23% 19/84 [00:00<00:03, 20.67it/s]
 26% 22/84 [00:01<00:03, 20.56it/s]
 30% 25/84 [00:01<00:02, 20.48it/s]
 33% 28/84 [00:01<00:02, 20.42it/s]
 37% 31/84 [00:01<00:02, 20.39it/s]
 40% 34/84 [00:01<00:02, 20.37it/s]
 44% 37/84 [00:01<00:02, 20.34it/s]
 48% 40/84 [00:01<00:02, 20.31it/s]
 51% 43/84 [00:02<00:02, 20.32it/s]
 55% 46/84 [00:02<00:01, 20.29it/s]
 58% 49/84 [00:02<00:01, 20.30it/s]
 62% 52/84 [00:02<00:01, 20.30it/s]
 65% 55/84 [00:02<00:01, 20.30it/s]
 69% 58/84 [00:02<00:01, 20.25it/s]
 73% 61/84 [00:02<00:01, 20.27it/s]
 76% 64/84 [00:03<00:00, 20.27it/s]
 80% 67/84 [00:03<00:00, 20.28it/s]
 83% 70/84 [00:03<00:00, 20.30it/s]
 87% 73/84 [00:03<00:00, 20.30it/s]
 90% 76/84 [00:03<00:00, 20.31it/s]
 94% 79/84 [00:03<00:00, 20.30it/s]
 98% 82/84 [00:03<00:00, 20.30it/s]
{'eval_loss': 0.22870442271232605, 'eval_accuracy': 0.9200000166893005, 'eval_runtime': 4.1118, 'eval_samples_per_second': 486.403, 'eval_steps_per_second': 20.429, 'epoch': 1.12}

 30% 750/2500 [02:11<04:25,  6.59it/s]
{'loss': 0.2332, 'learning_rate': 1.3600000000000002e-05, 'epoch': 1.2}
{'loss': 0.2135, 'learning_rate': 1.2800000000000001e-05, 'epoch': 1.35}
{'loss': 0.2283, 'learning_rate': 1.2e-05, 'epoch': 1.5}
 40% 1000/2500 [02:49<03:48,  6.57it/s][INFO|trainer.py:725] 2023-02-14 21:59:32,169 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 21:59:32,170 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 21:59:32,170 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 21:59:32,171 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 27.03it/s]
  8% 7/84 [00:00<00:03, 23.07it/s]
 12% 10/84 [00:00<00:03, 21.78it/s]
 15% 13/84 [00:00<00:03, 21.17it/s]
 19% 16/84 [00:00<00:03, 20.84it/s]
 23% 19/84 [00:00<00:03, 20.62it/s]
 26% 22/84 [00:01<00:03, 20.52it/s]
 30% 25/84 [00:01<00:02, 20.39it/s]
 33% 28/84 [00:01<00:02, 20.36it/s]
 37% 31/84 [00:01<00:02, 20.33it/s]
 40% 34/84 [00:01<00:02, 20.31it/s]
 44% 37/84 [00:01<00:02, 20.28it/s]
 48% 40/84 [00:01<00:02, 20.30it/s]
 51% 43/84 [00:02<00:02, 20.14it/s]
 55% 46/84 [00:02<00:01, 20.18it/s]
 58% 49/84 [00:02<00:01, 20.20it/s]
 62% 52/84 [00:02<00:01, 20.22it/s]
 65% 55/84 [00:02<00:01, 20.24it/s]
 69% 58/84 [00:02<00:01, 20.26it/s]
 73% 61/84 [00:02<00:01, 20.28it/s]
 76% 64/84 [00:03<00:00, 20.29it/s]
 80% 67/84 [00:03<00:00, 20.31it/s]
 83% 70/84 [00:03<00:00, 20.30it/s]
 87% 73/84 [00:03<00:00, 20.28it/s]
 90% 76/84 [00:03<00:00, 20.28it/s]
 94% 79/84 [00:03<00:00, 20.27it/s]
 98% 82/84 [00:04<00:00, 20.25it/s]
{'eval_loss': 0.16501356661319733, 'eval_accuracy': 0.9319999814033508, 'eval_runtime': 4.1217, 'eval_samples_per_second': 485.232, 'eval_steps_per_second': 20.38, 'epoch': 1.5}

 40% 1000/2500 [02:53<03:48,  6.57it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 21:59:36,293 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-1000
[INFO|configuration_utils.py:447] 2023-02-14 21:59:36,294 >> Configuration saved in out/emotion/gpt2_custom/checkpoint-1000/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 21:59:37,744 >> Model weights saved in out/emotion/gpt2_custom/checkpoint-1000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 21:59:37,744 >> tokenizer config file saved in out/emotion/gpt2_custom/checkpoint-1000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 21:59:37,744 >> Special tokens file saved in out/emotion/gpt2_custom/checkpoint-1000/special_tokens_map.json
{'loss': 0.1836, 'learning_rate': 1.1200000000000001e-05, 'epoch': 1.65}
{'loss': 0.1844, 'learning_rate': 1.04e-05, 'epoch': 1.8}
 50% 1250/2500 [03:34<03:09,  6.59it/s][INFO|trainer.py:725] 2023-02-14 22:00:17,827 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 22:00:17,829 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:00:17,829 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:00:17,829 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 27.06it/s]
  8% 7/84 [00:00<00:03, 23.06it/s]
 12% 10/84 [00:00<00:03, 21.79it/s]
 15% 13/84 [00:00<00:03, 21.21it/s]
 19% 16/84 [00:00<00:03, 20.88it/s]
 23% 19/84 [00:00<00:03, 20.65it/s]
 26% 22/84 [00:01<00:03, 20.55it/s]
 30% 25/84 [00:01<00:02, 20.47it/s]
 33% 28/84 [00:01<00:02, 20.34it/s]
 37% 31/84 [00:01<00:02, 20.30it/s]
 40% 34/84 [00:01<00:02, 20.27it/s]
 44% 37/84 [00:01<00:02, 20.28it/s]
 48% 40/84 [00:01<00:02, 20.26it/s]
 51% 43/84 [00:02<00:02, 20.26it/s]
 55% 46/84 [00:02<00:01, 20.28it/s]
 58% 49/84 [00:02<00:01, 20.28it/s]
 62% 52/84 [00:02<00:01, 20.29it/s]
 65% 55/84 [00:02<00:01, 20.29it/s]
 69% 58/84 [00:02<00:01, 20.30it/s]
 73% 61/84 [00:02<00:01, 20.30it/s]
 76% 64/84 [00:03<00:00, 20.30it/s]
 80% 67/84 [00:03<00:00, 20.30it/s]
 83% 70/84 [00:03<00:00, 20.28it/s]
 87% 73/84 [00:03<00:00, 20.25it/s]
 90% 76/84 [00:03<00:00, 20.25it/s]
 94% 79/84 [00:03<00:00, 20.25it/s]
 98% 82/84 [00:03<00:00, 20.26it/s]
{'eval_loss': 0.15909001231193542, 'eval_accuracy': 0.9355000257492065, 'eval_runtime': 4.1177, 'eval_samples_per_second': 485.712, 'eval_steps_per_second': 20.4, 'epoch': 1.87}

 50% 1250/2500 [03:38<03:09,  6.59it/s]
{'loss': 0.2181, 'learning_rate': 9.600000000000001e-06, 'epoch': 1.95}
{'loss': 0.1695, 'learning_rate': 8.8e-06, 'epoch': 2.1}
{'loss': 0.1683, 'learning_rate': 8.000000000000001e-06, 'epoch': 2.25}
 60% 1500/2500 [04:17<02:32,  6.55it/s][INFO|trainer.py:725] 2023-02-14 22:00:59,986 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 22:00:59,988 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:00:59,988 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:00:59,988 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 27.10it/s]
  8% 7/84 [00:00<00:03, 23.06it/s]
 12% 10/84 [00:00<00:03, 21.79it/s]
 15% 13/84 [00:00<00:03, 21.16it/s]
 19% 16/84 [00:00<00:03, 20.86it/s]
 23% 19/84 [00:00<00:03, 20.65it/s]
 26% 22/84 [00:01<00:03, 20.52it/s]
 30% 25/84 [00:01<00:02, 20.45it/s]
 33% 28/84 [00:01<00:02, 20.30it/s]
 37% 31/84 [00:01<00:02, 20.24it/s]
 40% 34/84 [00:01<00:02, 20.11it/s]
 44% 37/84 [00:01<00:02, 20.12it/s]
 48% 40/84 [00:01<00:02, 20.17it/s]
 51% 43/84 [00:02<00:02, 20.19it/s]
 55% 46/84 [00:02<00:01, 20.22it/s]
 58% 49/84 [00:02<00:01, 20.20it/s]
 62% 52/84 [00:02<00:01, 20.23it/s]
 65% 55/84 [00:02<00:01, 20.23it/s]
 69% 58/84 [00:02<00:01, 20.24it/s]
 73% 61/84 [00:02<00:01, 20.22it/s]
 76% 64/84 [00:03<00:00, 20.25it/s]
 80% 67/84 [00:03<00:00, 20.24it/s]
 83% 70/84 [00:03<00:00, 20.23it/s]
 87% 73/84 [00:03<00:00, 20.20it/s]
 90% 76/84 [00:03<00:00, 20.22it/s]
 94% 79/84 [00:03<00:00, 20.20it/s]
 98% 82/84 [00:04<00:00, 20.20it/s]
{'eval_loss': 0.1472882628440857, 'eval_accuracy': 0.934499979019165, 'eval_runtime': 4.13, 'eval_samples_per_second': 484.258, 'eval_steps_per_second': 20.339, 'epoch': 2.25}

 60% 1500/2500 [04:21<02:32,  6.55it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 22:01:04,119 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-1500
[INFO|configuration_utils.py:447] 2023-02-14 22:01:04,120 >> Configuration saved in out/emotion/gpt2_custom/checkpoint-1500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:01:05,576 >> Model weights saved in out/emotion/gpt2_custom/checkpoint-1500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:01:05,576 >> tokenizer config file saved in out/emotion/gpt2_custom/checkpoint-1500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:01:05,576 >> Special tokens file saved in out/emotion/gpt2_custom/checkpoint-1500/special_tokens_map.json
{'loss': 0.1497, 'learning_rate': 7.2000000000000005e-06, 'epoch': 2.4}
{'loss': 0.1496, 'learning_rate': 6.4000000000000006e-06, 'epoch': 2.55}
 70% 1750/2500 [05:02<01:54,  6.54it/s][INFO|trainer.py:725] 2023-02-14 22:01:45,617 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 22:01:45,618 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:01:45,619 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:01:45,619 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 26.78it/s]
  8% 7/84 [00:00<00:03, 22.79it/s]
 12% 10/84 [00:00<00:03, 21.58it/s]
 15% 13/84 [00:00<00:03, 21.03it/s]
 19% 16/84 [00:00<00:03, 20.70it/s]
 23% 19/84 [00:00<00:03, 20.49it/s]
 26% 22/84 [00:01<00:03, 20.30it/s]
 30% 25/84 [00:01<00:02, 20.22it/s]
 33% 28/84 [00:01<00:02, 20.19it/s]
 37% 31/84 [00:01<00:02, 20.16it/s]
 40% 34/84 [00:01<00:02, 20.15it/s]
 44% 37/84 [00:01<00:02, 20.14it/s]
 48% 40/84 [00:01<00:02, 20.12it/s]
 51% 43/84 [00:02<00:02, 20.09it/s]
 55% 46/84 [00:02<00:01, 20.08it/s]
 58% 49/84 [00:02<00:01, 20.10it/s]
 62% 52/84 [00:02<00:01, 20.13it/s]
 65% 55/84 [00:02<00:01, 20.19it/s]
 69% 58/84 [00:02<00:01, 20.20it/s]
 73% 61/84 [00:02<00:01, 20.22it/s]
 76% 64/84 [00:03<00:00, 20.21it/s]
 80% 67/84 [00:03<00:00, 20.22it/s]
 83% 70/84 [00:03<00:00, 20.25it/s]
 87% 73/84 [00:03<00:00, 20.27it/s]
 90% 76/84 [00:03<00:00, 20.28it/s]
 94% 79/84 [00:03<00:00, 20.27it/s]
 98% 82/84 [00:04<00:00, 20.25it/s]
{'eval_loss': 0.14743593335151672, 'eval_accuracy': 0.9359999895095825, 'eval_runtime': 4.1413, 'eval_samples_per_second': 482.944, 'eval_steps_per_second': 20.284, 'epoch': 2.62}

 70% 1750/2500 [05:06<01:54,  6.54it/s]
{'loss': 0.1465, 'learning_rate': 5.600000000000001e-06, 'epoch': 2.7}
{'loss': 0.1376, 'learning_rate': 4.800000000000001e-06, 'epoch': 2.85}
{'loss': 0.1444, 'learning_rate': 4.000000000000001e-06, 'epoch': 3.0}
 80% 2000/2500 [05:44<01:16,  6.57it/s][INFO|trainer.py:725] 2023-02-14 22:02:27,845 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 22:02:27,846 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:02:27,846 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:02:27,846 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 27.04it/s]
  8% 7/84 [00:00<00:03, 23.04it/s]
 12% 10/84 [00:00<00:03, 21.75it/s]
 15% 13/84 [00:00<00:03, 21.18it/s]
 19% 16/84 [00:00<00:03, 20.85it/s]
 23% 19/84 [00:00<00:03, 20.61it/s]
 26% 22/84 [00:01<00:03, 20.49it/s]
 30% 25/84 [00:01<00:02, 20.43it/s]
 33% 28/84 [00:01<00:02, 20.39it/s]
 37% 31/84 [00:01<00:02, 20.14it/s]
 40% 34/84 [00:01<00:02, 20.16it/s]
 44% 37/84 [00:01<00:02, 20.21it/s]
 48% 40/84 [00:01<00:02, 20.22it/s]
 51% 43/84 [00:02<00:02, 20.22it/s]
 55% 46/84 [00:02<00:01, 20.20it/s]
 58% 49/84 [00:02<00:01, 20.19it/s]
 62% 52/84 [00:02<00:01, 20.20it/s]
 65% 55/84 [00:02<00:01, 20.22it/s]
 69% 58/84 [00:02<00:01, 20.24it/s]
 73% 61/84 [00:02<00:01, 20.24it/s]
 76% 64/84 [00:03<00:00, 20.26it/s]
 80% 67/84 [00:03<00:00, 20.27it/s]
 83% 70/84 [00:03<00:00, 20.28it/s]
 87% 73/84 [00:03<00:00, 20.24it/s]
 90% 76/84 [00:03<00:00, 20.21it/s]
 94% 79/84 [00:03<00:00, 20.21it/s]
 98% 82/84 [00:04<00:00, 20.20it/s]
{'eval_loss': 0.14364145696163177, 'eval_accuracy': 0.9365000128746033, 'eval_runtime': 4.1279, 'eval_samples_per_second': 484.505, 'eval_steps_per_second': 20.349, 'epoch': 3.0}

 80% 2000/2500 [05:49<01:16,  6.57it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 22:02:31,975 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-2000
[INFO|configuration_utils.py:447] 2023-02-14 22:02:31,976 >> Configuration saved in out/emotion/gpt2_custom/checkpoint-2000/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:02:33,429 >> Model weights saved in out/emotion/gpt2_custom/checkpoint-2000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:02:33,430 >> tokenizer config file saved in out/emotion/gpt2_custom/checkpoint-2000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:02:33,430 >> Special tokens file saved in out/emotion/gpt2_custom/checkpoint-2000/special_tokens_map.json
{'loss': 0.104, 'learning_rate': 3.2000000000000003e-06, 'epoch': 3.15}
{'loss': 0.1206, 'learning_rate': 2.4000000000000003e-06, 'epoch': 3.3}
 90% 2250/2500 [06:30<00:38,  6.55it/s][INFO|trainer.py:725] 2023-02-14 22:03:13,484 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 22:03:13,486 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:03:13,486 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:03:13,486 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 27.11it/s]
  8% 7/84 [00:00<00:03, 23.10it/s]
 12% 10/84 [00:00<00:03, 21.81it/s]
 15% 13/84 [00:00<00:03, 21.22it/s]
 19% 16/84 [00:00<00:03, 20.88it/s]
 23% 19/84 [00:00<00:03, 20.68it/s]
 26% 22/84 [00:01<00:03, 20.56it/s]
 30% 25/84 [00:01<00:02, 20.47it/s]
 33% 28/84 [00:01<00:02, 20.41it/s]
 37% 31/84 [00:01<00:02, 20.38it/s]
 40% 34/84 [00:01<00:02, 20.34it/s]
 44% 37/84 [00:01<00:02, 20.34it/s]
 48% 40/84 [00:01<00:02, 20.33it/s]
 51% 43/84 [00:02<00:02, 20.26it/s]
 55% 46/84 [00:02<00:01, 20.26it/s]
 58% 49/84 [00:02<00:01, 20.17it/s]
 62% 52/84 [00:02<00:01, 20.21it/s]
 65% 55/84 [00:02<00:01, 20.21it/s]
 69% 58/84 [00:02<00:01, 20.23it/s]
 73% 61/84 [00:02<00:01, 20.25it/s]
 76% 64/84 [00:03<00:00, 20.26it/s]
 80% 67/84 [00:03<00:00, 20.26it/s]
 83% 70/84 [00:03<00:00, 20.28it/s]
 87% 73/84 [00:03<00:00, 20.29it/s]
 90% 76/84 [00:03<00:00, 20.26it/s]
 94% 79/84 [00:03<00:00, 20.27it/s]
 98% 82/84 [00:03<00:00, 20.27it/s]
{'eval_loss': 0.15543130040168762, 'eval_accuracy': 0.9369999766349792, 'eval_runtime': 4.1171, 'eval_samples_per_second': 485.782, 'eval_steps_per_second': 20.403, 'epoch': 3.37}

 90% 2250/2500 [06:34<00:38,  6.55it/s]
{'loss': 0.1289, 'learning_rate': 1.6000000000000001e-06, 'epoch': 3.45}
{'loss': 0.1231, 'learning_rate': 8.000000000000001e-07, 'epoch': 3.6}
{'loss': 0.1179, 'learning_rate': 0.0, 'epoch': 3.75}
100% 2500/2500 [07:12<00:00,  6.57it/s][INFO|trainer.py:725] 2023-02-14 22:03:55,704 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 22:03:55,705 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:03:55,705 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:03:55,706 >>   Batch size = 24

  0% 0/84 [00:00<?, ?it/s]
  5% 4/84 [00:00<00:02, 27.06it/s]
  8% 7/84 [00:00<00:03, 23.11it/s]
 12% 10/84 [00:00<00:03, 21.81it/s]
 15% 13/84 [00:00<00:03, 21.13it/s]
 19% 16/84 [00:00<00:03, 20.82it/s]
 23% 19/84 [00:00<00:03, 20.65it/s]
 26% 22/84 [00:01<00:03, 20.47it/s]
 30% 25/84 [00:01<00:02, 20.41it/s]
 33% 28/84 [00:01<00:02, 20.38it/s]
 37% 31/84 [00:01<00:02, 20.35it/s]
 40% 34/84 [00:01<00:02, 20.35it/s]
 44% 37/84 [00:01<00:02, 20.32it/s]
 48% 40/84 [00:01<00:02, 20.30it/s]
 51% 43/84 [00:02<00:02, 20.30it/s]
 55% 46/84 [00:02<00:01, 20.30it/s]
 58% 49/84 [00:02<00:01, 20.30it/s]
 62% 52/84 [00:02<00:01, 20.29it/s]
 65% 55/84 [00:02<00:01, 20.31it/s]
 69% 58/84 [00:02<00:01, 20.28it/s]
 73% 61/84 [00:02<00:01, 20.26it/s]
 76% 64/84 [00:03<00:00, 20.24it/s]
 80% 67/84 [00:03<00:00, 20.26it/s]
 83% 70/84 [00:03<00:00, 20.27it/s]
 87% 73/84 [00:03<00:00, 20.27it/s]
 90% 76/84 [00:03<00:00, 20.29it/s]
 94% 79/84 [00:03<00:00, 20.29it/s]
 98% 82/84 [00:03<00:00, 20.30it/s]
{'eval_loss': 0.14437170326709747, 'eval_accuracy': 0.9350000023841858, 'eval_runtime': 4.116, 'eval_samples_per_second': 485.915, 'eval_steps_per_second': 20.408, 'epoch': 3.75}

100% 2500/2500 [07:16<00:00,  6.57it/s]
                                   [INFO|trainer.py:2656] 2023-02-14 22:03:59,822 >> Saving model checkpoint to out/emotion/gpt2_custom/checkpoint-2500
[INFO|configuration_utils.py:447] 2023-02-14 22:03:59,823 >> Configuration saved in out/emotion/gpt2_custom/checkpoint-2500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:04:00,568 >> Model weights saved in out/emotion/gpt2_custom/checkpoint-2500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:04:00,569 >> tokenizer config file saved in out/emotion/gpt2_custom/checkpoint-2500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:04:00,569 >> Special tokens file saved in out/emotion/gpt2_custom/checkpoint-2500/special_tokens_map.json
[INFO|trainer.py:1852] 2023-02-14 22:04:02,582 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1946] 2023-02-14 22:04:02,582 >> Loading best model from out/emotion/gpt2_custom/checkpoint-2000 (score: 0.9365000128746033).
{'train_runtime': 440.0758, 'train_samples_per_second': 136.34, 'train_steps_per_second': 5.681, 'train_loss': 0.32335229415893557, 'epoch': 3.75}
100% 2500/2500 [07:20<00:00,  5.68it/s]
[INFO|trainer.py:2656] 2023-02-14 22:04:03,025 >> Saving model checkpoint to out/emotion/gpt2_custom
[INFO|configuration_utils.py:447] 2023-02-14 22:04:03,026 >> Configuration saved in out/emotion/gpt2_custom/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:04:03,965 >> Model weights saved in out/emotion/gpt2_custom/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:04:03,966 >> tokenizer config file saved in out/emotion/gpt2_custom/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:04:03,966 >> Special tokens file saved in out/emotion/gpt2_custom/special_tokens_map.json
***** train metrics *****
  epoch                    =       3.75
  train_loss               =     0.3234
  train_runtime            = 0:07:20.07
  train_samples            =      16000
  train_samples_per_second =     136.34
  train_steps_per_second   =      5.681
INFO:__main__:*** Evaluate ***
[INFO|trainer.py:725] 2023-02-14 22:04:04,068 >> The following columns in the evaluation set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 22:04:04,069 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:04:04,069 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:04:04,070 >>   Batch size = 24
100% 84/84 [00:04<00:00, 20.35it/s]
***** eval metrics *****
  epoch                   =       3.75
  eval_accuracy           =     0.9365
  eval_loss               =     0.1436
  eval_runtime            = 0:00:04.18
  eval_samples            =       2000
  eval_samples_per_second =    477.778
  eval_steps_per_second   =     20.067
INFO:__main__:*** Predict ***
[INFO|trainer.py:725] 2023-02-14 22:04:08,259 >> The following columns in the test set don't have a corresponding argument in `GPT2ForSequenceClassificationCustom.forward` and have been ignored: text. If text are not expected by `GPT2ForSequenceClassificationCustom.forward`,  you can safely ignore this message.
[INFO|trainer.py:2907] 2023-02-14 22:04:08,260 >> ***** Running Prediction *****
[INFO|trainer.py:2909] 2023-02-14 22:04:08,260 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:04:08,260 >>   Batch size = 24
100% 84/84 [00:04<00:00, 20.62it/s]
INFO:__main__:***** Predict results None *****
[INFO|modelcard.py:444] 2023-02-14 22:04:12,537 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Text Classification', 'type': 'text-classification'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.9365000128746033}]}
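
The warning earlier about newly initialized weights (score.dense_1_input, score.dense_1_hidden, score.dense_2, score.out_proj) hints at the shape of the custom classification head that replaces the single linear score layer of the stock GPT2ForSequenceClassification. The sketch below is only a hypothetical reconstruction from those parameter names; layer widths, activations and wiring are assumptions, not the project's actual GPT2ForSequenceClassificationCustom code (and since the log reports "Using hidden states in model: False", the dense_1_hidden branch is presumably unused in this run).

import torch
import torch.nn as nn

class CustomScoreHeadSketch(nn.Module):
    # Hypothetical reconstruction inferred only from the logged parameter names;
    # hidden_size/num_labels match the gpt2 config and the 6 emotion labels.
    def __init__(self, hidden_size=768, num_labels=6):
        super().__init__()
        self.dense_1_input = nn.Linear(hidden_size, hidden_size)   # fed with the last hidden state
        self.dense_1_hidden = nn.Linear(hidden_size, hidden_size)  # fed with extra hidden states when that option is enabled
        self.dense_2 = nn.Linear(hidden_size, hidden_size)
        self.out_proj = nn.Linear(hidden_size, num_labels, bias=False)  # no bias, like the stock score layer

    def forward(self, hidden_state):
        x = torch.relu(self.dense_1_input(hidden_state))
        x = torch.relu(self.dense_2(x))
        return self.out_proj(x)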

T5

  • full data
  • model T5
  • sequence length: 128
  • training epochs: 1
  • first few layers frozen (see the sketch below)
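The layer freezing is not controlled by a CLI flag; it is done inside the project's modified run_translation.py. A minimal sketch of the idea, assuming it simply flips requires_grad on the encoder parameters (the "Frozen layers:" listing further down shows encoder blocks 1-7 with requires_grad=False, while block 0, the embeddings and the decoder stay trainable), could look like this:

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-small")

# Freeze every encoder block except block 0; embeddings and the decoder remain trainable.
for name, param in model.named_parameters():
    if name.startswith("encoder.block.") and not name.startswith("encoder.block.0."):
        param.requires_grad = False

# The training script then reports the frozen parameters, as in the log below.
print([(n, p.requires_grad) for n, p in model.named_parameters() if not p.requires_grad][:3])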
!python run_translation.py \
  --cache_dir t5_cache_training \
  --model_name_or_path "google/t5-v1_1-small" \
  --train_file data/s2s-train.json \
  --validation_file data/s2s-valid.json \
  --test_file data/s2s-test.json \
  --per_device_train_batch_size 8 \
  --per_device_eval_batch_size 8 \
  --source_lang "text" \
  --target_lang "label" \
  --source_prefix "emotion classification" \
  --max_source_length 256 \
  --max_target_length 128 \
  --generation_max_length 128 \
  --do_train \
  --do_eval \
  --do_predict \
  --predict_with_generate \
  --num_train_epochs 1 \
  --output_dir out/emotion/t5_v1_1  \
  --overwrite_output_dir \
  --eval_steps 250 \
  --evaluation_strategy steps \
  --metric_for_best_model accuracy \
  --logging_steps 100 \
  --save_total_limit 5 \
  --max_steps 2500 \
  --load_best_model_at_end True 
2023-02-14 22:04:17.129470: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-14 22:04:17.281426: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-02-14 22:04:18.087509: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 22:04:18.087605: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 22:04:18.087624: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
WARNING:__main__:Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
INFO:__main__:Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=250,
evaluation_strategy=steps,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_max_length=128,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=passive,
log_level_replica=passive,
log_on_each_node=True,
logging_dir=out/emotion/t5_v1_1/runs/Feb14_22-04-20_fc0011e45a00,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=100,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=2500,
metric_for_best_model=accuracy,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=adamw_hf,
output_dir=out/emotion/t5_v1_1,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=8,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=out/emotion/t5_v1_1,
save_on_each_node=False,
save_steps=500,
save_strategy=steps,
save_total_limit=5,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
WARNING:datasets.builder:Using custom data configuration default-a82ca4164dba097e
INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.8/dist-packages/datasets/packaged_modules/json
INFO:datasets.builder:Generating dataset json (/content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Downloading and preparing dataset json/default to /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...
Downloading data files: 100% 3/3 [00:00<00:00, 11848.32it/s]
INFO:datasets.download.download_manager:Downloading took 0.0 min
INFO:datasets.download.download_manager:Checksum Computation took 0.0 min
Extracting data files: 100% 3/3 [00:00<00:00, 2097.85it/s]
INFO:datasets.utils.info_utils:Unable to verify checksums.
INFO:datasets.builder:Generating train split
INFO:datasets.builder:Generating validation split
INFO:datasets.builder:Generating test split
INFO:datasets.utils.info_utils:Unable to verify splits sizes.
Dataset json downloaded and prepared to /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.
100% 3/3 [00:00<00:00, 953.83it/s]
Downloading (…)lve/main/config.json: 100% 537/537 [00:00<00:00, 97.0kB/s]
[INFO|configuration_utils.py:653] 2023-02-14 22:04:20,972 >> loading configuration file config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json
[INFO|configuration_utils.py:705] 2023-02-14 22:04:20,975 >> Model config T5Config {
  "_name_or_path": "google/t5-v1_1-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 1024,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 8,
  "num_heads": 6,
  "num_layers": 8,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 32128
}

Downloading (…)okenizer_config.json: 100% 1.86k/1.86k [00:00<00:00, 853kB/s]
[INFO|configuration_utils.py:653] 2023-02-14 22:04:21,160 >> loading configuration file config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json
[INFO|configuration_utils.py:705] 2023-02-14 22:04:21,160 >> Model config T5Config {
  "_name_or_path": "google/t5-v1_1-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 1024,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 8,
  "num_heads": 6,
  "num_layers": 8,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 32128
}

Downloading (…)ve/main/spiece.model: 100% 792k/792k [00:00<00:00, 10.2MB/s]
Downloading (…)cial_tokens_map.json: 100% 1.79k/1.79k [00:00<00:00, 705kB/s]
[INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file spiece.model from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/spiece.model
[INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file tokenizer.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file special_tokens_map.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/special_tokens_map.json
[INFO|tokenization_utils_base.py:1773] 2023-02-14 22:04:21,837 >> loading file tokenizer_config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/tokenizer_config.json
[INFO|configuration_utils.py:653] 2023-02-14 22:04:21,838 >> loading configuration file config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json
[INFO|configuration_utils.py:705] 2023-02-14 22:04:21,838 >> Model config T5Config {
  "_name_or_path": "google/t5-v1_1-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 1024,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 8,
  "num_heads": 6,
  "num_layers": 8,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 32128
}

[INFO|configuration_utils.py:653] 2023-02-14 22:04:21,888 >> loading configuration file config.json from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json
[INFO|configuration_utils.py:705] 2023-02-14 22:04:21,889 >> Model config T5Config {
  "_name_or_path": "google/t5-v1_1-small",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 1024,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 8,
  "num_heads": 6,
  "num_layers": 8,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "transformers_version": "4.23.1",
  "use_cache": true,
  "vocab_size": 32128
}

Downloading (…)"pytorch_model.bin";: 100% 308M/308M [00:03<00:00, 84.8MB/s]
[INFO|modeling_utils.py:2156] 2023-02-14 22:04:26,050 >> loading weights file pytorch_model.bin from cache at t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/pytorch_model.bin
[INFO|modeling_utils.py:2606] 2023-02-14 22:04:27,048 >> All model checkpoint weights were used when initializing T5ForConditionalGeneration.

[INFO|modeling_utils.py:2614] 2023-02-14 22:04:27,048 >> All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at google/t5-v1_1-small.
If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.


Frozen layers:
[('encoder.block.1.layer.0.SelfAttention.q.weight', False), ('encoder.block.1.layer.0.SelfAttention.k.weight', False), ('encoder.block.1.layer.0.SelfAttention.v.weight', False), ('encoder.block.1.layer.0.SelfAttention.o.weight', False), ('encoder.block.1.layer.0.layer_norm.weight', False), ('encoder.block.1.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.1.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.1.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.1.layer.1.layer_norm.weight', False), ('encoder.block.2.layer.0.SelfAttention.q.weight', False), ('encoder.block.2.layer.0.SelfAttention.k.weight', False), ('encoder.block.2.layer.0.SelfAttention.v.weight', False), ('encoder.block.2.layer.0.SelfAttention.o.weight', False), ('encoder.block.2.layer.0.layer_norm.weight', False), ('encoder.block.2.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.2.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.2.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.2.layer.1.layer_norm.weight', False), ('encoder.block.3.layer.0.SelfAttention.q.weight', False), ('encoder.block.3.layer.0.SelfAttention.k.weight', False), ('encoder.block.3.layer.0.SelfAttention.v.weight', False), ('encoder.block.3.layer.0.SelfAttention.o.weight', False), ('encoder.block.3.layer.0.layer_norm.weight', False), ('encoder.block.3.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.3.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.3.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.3.layer.1.layer_norm.weight', False), ('encoder.block.4.layer.0.SelfAttention.q.weight', False), ('encoder.block.4.layer.0.SelfAttention.k.weight', False), ('encoder.block.4.layer.0.SelfAttention.v.weight', False), ('encoder.block.4.layer.0.SelfAttention.o.weight', False), ('encoder.block.4.layer.0.layer_norm.weight', False), ('encoder.block.4.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.4.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.4.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.4.layer.1.layer_norm.weight', False), ('encoder.block.5.layer.0.SelfAttention.q.weight', False), ('encoder.block.5.layer.0.SelfAttention.k.weight', False), ('encoder.block.5.layer.0.SelfAttention.v.weight', False), ('encoder.block.5.layer.0.SelfAttention.o.weight', False), ('encoder.block.5.layer.0.layer_norm.weight', False), ('encoder.block.5.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.5.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.5.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.5.layer.1.layer_norm.weight', False), ('encoder.block.6.layer.0.SelfAttention.q.weight', False), ('encoder.block.6.layer.0.SelfAttention.k.weight', False), ('encoder.block.6.layer.0.SelfAttention.v.weight', False), ('encoder.block.6.layer.0.SelfAttention.o.weight', False), ('encoder.block.6.layer.0.layer_norm.weight', False), ('encoder.block.6.layer.1.DenseReluDense.wi_0.weight', False), ('encoder.block.6.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.6.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.6.layer.1.layer_norm.weight', False), ('encoder.block.7.layer.0.SelfAttention.q.weight', False), ('encoder.block.7.layer.0.SelfAttention.k.weight', False), ('encoder.block.7.layer.0.SelfAttention.v.weight', False), ('encoder.block.7.layer.0.SelfAttention.o.weight', False), ('encoder.block.7.layer.0.layer_norm.weight', False), ('encoder.block.7.layer.1.DenseReluDense.wi_0.weight', False), 
('encoder.block.7.layer.1.DenseReluDense.wi_1.weight', False), ('encoder.block.7.layer.1.DenseReluDense.wo.weight', False), ('encoder.block.7.layer.1.layer_norm.weight', False)] 
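The list above enumerates the parameters whose `requires_grad` flag was set to False. A minimal sketch of how such a freeze could be applied (the exact checkpoint name is an assumption; only the freezing pattern is implied by the log):

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-small")  # assumed checkpoint

# freeze every parameter belonging to encoder blocks 1-7, matching the list above
for name, param in model.named_parameters():
    if any(name.startswith(f"encoder.block.{i}.") for i in range(1, 8)):
        param.requires_grad = False

print("Frozen layers:")
print([(n, p.requires_grad) for n, p in model.named_parameters() if not p.requires_grad])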


INFO:__main__:Using translation prefix: "emotion classification: "
Running tokenizer on train dataset:   0% 0/16 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-fa17416eabe18767.arrow
Running tokenizer on train dataset: 100% 16/16 [00:00<00:00, 23.64ba/s]
Running tokenizer on validation dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-c6cebbf9290f7df0.arrow
Running tokenizer on validation dataset: 100% 2/2 [00:00<00:00, 33.01ba/s]
Running tokenizer on prediction dataset:   0% 0/2 [00:00<?, ?ba/s]INFO:datasets.arrow_dataset:Caching processed dataset at /content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-988bff0993eee389.arrow
Running tokenizer on prediction dataset: 100% 2/2 [00:00<00:00, 33.06ba/s]
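The "emotion classification: " prefix turns the task into a text-to-text problem: the tweet is the source sequence and the label name is the target sequence. A hedged sketch of that preprocessing step (the column names "text"/"label" and the max lengths are assumptions, not taken from the original script):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-small")  # assumed checkpoint
prefix = "emotion classification: "
label_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def preprocess(batch):
    # prepend the task prefix to every input text
    model_inputs = tokenizer([prefix + t for t in batch["text"]], max_length=128, truncation=True)
    # the target is simply the label name ("sadness", "joy", ...)
    targets = tokenizer([label_names[i] for i in batch["label"]], max_length=8, truncation=True)
    model_inputs["labels"] = targets["input_ids"]
    return model_inputs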
[INFO|trainer.py:503] 2023-02-14 22:04:30,902 >> max_steps is given, it will override any value given in num_train_epochs
/usr/local/lib/python3.8/dist-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
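The FutureWarning above comes from transformers' own AdamW class, which this run evidently used. One way to avoid it (not what the run above did) is to point the Trainer at torch.optim.AdamW via the optim argument; the other values below mirror the log (output dir, batch size 8, 2500 steps, eval every 250 steps, initial learning rate 5e-5):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="out/emotion/t5_v1_1",
    optim="adamw_torch",                 # use torch.optim.AdamW, silencing the deprecation warning
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    max_steps=2500,
    evaluation_strategy="steps",
    eval_steps=250,
    predict_with_generate=True,
)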
[INFO|trainer.py:1607] 2023-02-14 22:04:30,911 >> ***** Running training *****
[INFO|trainer.py:1608] 2023-02-14 22:04:30,911 >>   Num examples = 16000
[INFO|trainer.py:1609] 2023-02-14 22:04:30,911 >>   Num Epochs = 2
[INFO|trainer.py:1610] 2023-02-14 22:04:30,911 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:1611] 2023-02-14 22:04:30,911 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1612] 2023-02-14 22:04:30,911 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1613] 2023-02-14 22:04:30,911 >>   Total optimization steps = 2500
  0% 0/2500 [00:00<?, ?it/s][WARNING|logging.py:281] 2023-02-14 22:04:30,925 >> You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
{'loss': 21.5908, 'learning_rate': 4.8e-05, 'epoch': 0.05}
{'loss': 14.8264, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.1}
 10% 249/2500 [00:24<03:31, 10.64it/s][INFO|trainer.py:2907] 2023-02-14 22:04:55,366 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:04:55,366 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:04:55,366 >>   Batch size = 8

100% 250/250 [00:16<00:00, 14.71it/s]
{'eval_loss': 9.001160621643066, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2175, 'eval_samples_per_second': 116.161, 'eval_steps_per_second': 14.52, 'epoch': 0.12}

 10% 250/2500 [00:41<03:31, 10.64it/s]
{'loss': 10.5792, 'learning_rate': 4.4000000000000006e-05, 'epoch': 0.15}
{'loss': 7.8113, 'learning_rate': 4.2e-05, 'epoch': 0.2}
{'loss': 5.2658, 'learning_rate': 4e-05, 'epoch': 0.25}
 20% 500/2500 [01:05<03:04, 10.83it/s][INFO|trainer.py:2907] 2023-02-14 22:05:35,963 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:05:35,963 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:05:35,963 >>   Batch size = 8

100% 250/250 [00:16<00:00, 14.79it/s]
{'eval_loss': 2.1697170734405518, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.1551, 'eval_samples_per_second': 116.584, 'eval_steps_per_second': 14.573, 'epoch': 0.25}

 20% 500/2500 [01:22<03:04, 10.83it/s]
[INFO|trainer.py:2656] 2023-02-14 22:05:53,119 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-500
[INFO|configuration_utils.py:447] 2023-02-14 22:05:53,120 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:05:53,749 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:05:53,750 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:05:53,750 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-500/special_tokens_map.json
[INFO|tokenization_t5_fast.py:187] 2023-02-14 22:05:53,788 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-500/spiece.model
{'loss': 3.7795, 'learning_rate': 3.8e-05, 'epoch': 0.3}
{'loss': 2.9169, 'learning_rate': 3.6e-05, 'epoch': 0.35}
 30% 749/2500 [01:47<02:43, 10.71it/s][INFO|trainer.py:2907] 2023-02-14 22:06:18,135 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:06:18,136 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:06:18,136 >>   Batch size = 8

100% 250/250 [00:17<00:00, 14.54it/s]
{'eval_loss': 1.4527522325515747, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2954, 'eval_samples_per_second': 115.638, 'eval_steps_per_second': 14.455, 'epoch': 0.38}

 30% 750/2500 [02:04<02:43, 10.71it/s]
{'loss': 2.4516, 'learning_rate': 3.4000000000000007e-05, 'epoch': 0.4}
{'loss': 2.2293, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.45}
{'loss': 2.0123, 'learning_rate': 3e-05, 'epoch': 0.5}
 40% 1000/2500 [02:27<02:21, 10.63it/s][INFO|trainer.py:2907] 2023-02-14 22:06:58,636 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:06:58,636 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:06:58,636 >>   Batch size = 8

100% 250/250 [00:16<00:00, 14.92it/s]
{'eval_loss': 1.160749912261963, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.1471, 'eval_samples_per_second': 116.638, 'eval_steps_per_second': 14.58, 'epoch': 0.5}

 40% 1000/2500 [02:44<02:21, 10.63it/s]
[INFO|trainer.py:2656] 2023-02-14 22:07:15,784 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-1000
[INFO|configuration_utils.py:447] 2023-02-14 22:07:15,785 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-1000/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:07:16,414 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-1000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:07:16,415 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-1000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:07:16,416 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-1000/special_tokens_map.json
[INFO|tokenization_t5_fast.py:187] 2023-02-14 22:07:16,453 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-1000/spiece.model
{'loss': 1.9003, 'learning_rate': 2.8000000000000003e-05, 'epoch': 0.55}
{'loss': 1.7884, 'learning_rate': 2.6000000000000002e-05, 'epoch': 0.6}
 50% 1249/2500 [03:09<01:59, 10.49it/s][INFO|trainer.py:2907] 2023-02-14 22:07:40,879 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:07:40,879 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:07:40,879 >>   Batch size = 8

100% 250/250 [00:17<00:00, 14.63it/s]
{'eval_loss': 1.0410572290420532, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.3319, 'eval_samples_per_second': 115.394, 'eval_steps_per_second': 14.424, 'epoch': 0.62}

 50% 1250/2500 [03:27<01:59, 10.49it/s]
{'loss': 1.7415, 'learning_rate': 2.4e-05, 'epoch': 0.65}
{'loss': 1.6231, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.7}
{'loss': 1.5278, 'learning_rate': 2e-05, 'epoch': 0.75}
 60% 1500/2500 [03:50<01:33, 10.71it/s][INFO|trainer.py:2907] 2023-02-14 22:08:21,432 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:08:21,433 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:08:21,433 >>   Batch size = 8

100% 250/250 [00:16<00:00, 14.48it/s]
{'eval_loss': 0.9458380341529846, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.379, 'eval_samples_per_second': 115.081, 'eval_steps_per_second': 14.385, 'epoch': 0.75}

 60% 1500/2500 [04:07<01:33, 10.71it/s]
[INFO|trainer.py:2656] 2023-02-14 22:08:38,813 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-1500
[INFO|configuration_utils.py:447] 2023-02-14 22:08:38,814 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-1500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:08:39,285 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-1500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:08:39,286 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-1500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:08:39,286 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-1500/special_tokens_map.json
[INFO|tokenization_t5_fast.py:187] 2023-02-14 22:08:39,322 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-1500/spiece.model
{'loss': 1.4835, 'learning_rate': 1.8e-05, 'epoch': 0.8}
{'loss': 1.449, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.85}
 70% 1749/2500 [04:32<01:10, 10.61it/s][INFO|trainer.py:2907] 2023-02-14 22:09:03,363 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:09:03,363 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:09:03,363 >>   Batch size = 8

100% 250/250 [00:17<00:00, 14.66it/s]
{'eval_loss': 0.8559792637825012, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2321, 'eval_samples_per_second': 116.063, 'eval_steps_per_second': 14.508, 'epoch': 0.88}

 70% 1750/2500 [04:49<01:10, 10.61it/s]
{'loss': 1.4421, 'learning_rate': 1.4000000000000001e-05, 'epoch': 0.9}
{'loss': 1.3835, 'learning_rate': 1.2e-05, 'epoch': 0.95}
{'loss': 1.325, 'learning_rate': 1e-05, 'epoch': 1.0}
 80% 2000/2500 [05:12<00:45, 10.89it/s][INFO|trainer.py:2907] 2023-02-14 22:09:43,863 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:09:43,863 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:09:43,863 >>   Batch size = 8

100% 250/250 [00:17<00:00, 14.62it/s]
{'eval_loss': 0.8163257241249084, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2395, 'eval_samples_per_second': 116.013, 'eval_steps_per_second': 14.502, 'epoch': 1.0}

 80% 2000/2500 [05:30<00:45, 10.89it/s]
[INFO|trainer.py:2656] 2023-02-14 22:10:01,104 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-2000
[INFO|configuration_utils.py:447] 2023-02-14 22:10:01,105 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-2000/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:10:01,585 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-2000/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:10:01,586 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-2000/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:10:01,586 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-2000/special_tokens_map.json
[INFO|tokenization_t5_fast.py:187] 2023-02-14 22:10:01,623 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-2000/spiece.model
{'loss': 1.2708, 'learning_rate': 8.000000000000001e-06, 'epoch': 1.05}
{'loss': 1.3351, 'learning_rate': 6e-06, 'epoch': 1.1}
 90% 2249/2500 [05:54<00:23, 10.80it/s][INFO|trainer.py:2907] 2023-02-14 22:10:25,736 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:10:25,736 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:10:25,736 >>   Batch size = 8

100% 250/250 [00:16<00:00, 14.59it/s]
{'eval_loss': 0.8037287592887878, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2062, 'eval_samples_per_second': 116.237, 'eval_steps_per_second': 14.53, 'epoch': 1.12}

 90% 2250/2500 [06:12<00:23, 10.80it/s]
{'loss': 1.2308, 'learning_rate': 4.000000000000001e-06, 'epoch': 1.15}
{'loss': 1.376, 'learning_rate': 2.0000000000000003e-06, 'epoch': 1.2}
{'loss': 1.2416, 'learning_rate': 0.0, 'epoch': 1.25}
100% 2500/2500 [06:35<00:00, 10.84it/s][INFO|trainer.py:2907] 2023-02-14 22:11:06,282 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:11:06,283 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:11:06,283 >>   Batch size = 8

100% 250/250 [00:17<00:00, 14.64it/s]
{'eval_loss': 0.7921838760375977, 'eval_bleu': 0.0, 'eval_accuracy': 1.0, 'eval_gen_len': 2.0, 'eval_runtime': 17.2721, 'eval_samples_per_second': 115.794, 'eval_steps_per_second': 14.474, 'epoch': 1.25}

100% 2500/2500 [06:52<00:00, 10.84it/s]
[INFO|trainer.py:2656] 2023-02-14 22:11:23,556 >> Saving model checkpoint to out/emotion/t5_v1_1/checkpoint-2500
[INFO|configuration_utils.py:447] 2023-02-14 22:11:23,557 >> Configuration saved in out/emotion/t5_v1_1/checkpoint-2500/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:11:24,033 >> Model weights saved in out/emotion/t5_v1_1/checkpoint-2500/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:11:24,034 >> tokenizer config file saved in out/emotion/t5_v1_1/checkpoint-2500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:11:24,034 >> Special tokens file saved in out/emotion/t5_v1_1/checkpoint-2500/special_tokens_map.json
[INFO|tokenization_t5_fast.py:187] 2023-02-14 22:11:24,070 >> Copy vocab file to out/emotion/t5_v1_1/checkpoint-2500/spiece.model
[INFO|trainer.py:1852] 2023-02-14 22:11:24,853 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1946] 2023-02-14 22:11:24,854 >> Loading best model from out/emotion/t5_v1_1/checkpoint-500 (score: 1.0).
{'train_runtime': 414.2608, 'train_samples_per_second': 48.279, 'train_steps_per_second': 6.035, 'train_loss': 3.8232721221923827, 'epoch': 1.25}
100% 2500/2500 [06:54<00:00,  6.03it/s]
[INFO|trainer.py:2656] 2023-02-14 22:11:25,173 >> Saving model checkpoint to out/emotion/t5_v1_1
[INFO|configuration_utils.py:447] 2023-02-14 22:11:25,174 >> Configuration saved in out/emotion/t5_v1_1/config.json
[INFO|modeling_utils.py:1624] 2023-02-14 22:11:25,662 >> Model weights saved in out/emotion/t5_v1_1/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2023-02-14 22:11:25,663 >> tokenizer config file saved in out/emotion/t5_v1_1/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2023-02-14 22:11:25,663 >> Special tokens file saved in out/emotion/t5_v1_1/special_tokens_map.json
[INFO|tokenization_t5_fast.py:187] 2023-02-14 22:11:25,703 >> Copy vocab file to out/emotion/t5_v1_1/spiece.model
***** train metrics *****
  epoch                    =       1.25
  train_loss               =     3.8233
  train_runtime            = 0:06:54.26
  train_samples            =      16000
  train_samples_per_second =     48.279
  train_steps_per_second   =      6.035
INFO:__main__:*** Evaluate ***
[INFO|trainer.py:2907] 2023-02-14 22:11:25,713 >> ***** Running Evaluation *****
[INFO|trainer.py:2909] 2023-02-14 22:11:25,713 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:11:25,713 >>   Batch size = 8
100% 250/250 [00:17<00:00, 14.50it/s]
***** eval metrics *****
  epoch                   =       1.25
  eval_accuracy           =        1.0
  eval_bleu               =        0.0
  eval_gen_len            =        2.0
  eval_loss               =     2.1697
  eval_runtime            = 0:00:17.31
  eval_samples            =       2000
  eval_samples_per_second =    115.494
  eval_steps_per_second   =     14.437
INFO:__main__:*** Predict ***
[INFO|trainer.py:2907] 2023-02-14 22:11:43,033 >> ***** Running Prediction *****
[INFO|trainer.py:2909] 2023-02-14 22:11:43,033 >>   Num examples = 2000
[INFO|trainer.py:2912] 2023-02-14 22:11:43,034 >>   Batch size = 8
100% 250/250 [00:17<00:00, 14.58it/s]
***** predict metrics *****
  predict_accuracy           =        1.0
  predict_bleu               =        0.0
  predict_gen_len            =        2.0
  predict_loss               =     2.1029
  predict_runtime            = 0:00:17.21
  predict_samples            =       2000
  predict_samples_per_second =    116.158
  predict_steps_per_second   =      14.52
[INFO|modelcard.py:444] 2023-02-14 22:12:00,417 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Translation', 'type': 'translation'}, 'metrics': [{'name': 'Bleu', 'type': 'bleu', 'value': 0.0}, {'name': 'Accuracy', 'type': 'accuracy', 'value': 1.0}]}
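
For the seq2seq runs above, accuracy is presumably an exact-match comparison of the generated label string against the reference, which would explain why eval_gen_len stays at 2 and why BLEU collapses to 0.0 on single-word targets. The snippet below is a minimal sketch of how such metrics can be derived from decoded predictions; it is not the exact compute_metrics used by run_translation.py, and the function name and tokenizer argument are illustrative.

import numpy as np
import evaluate

bleu_metric = evaluate.load("bleu")

def compute_label_metrics(eval_preds, tokenizer):
    preds, label_ids = eval_preds
    # -100 marks ignored positions in the labels; swap it for the pad token before decoding.
    label_ids = np.where(label_ids != -100, label_ids, tokenizer.pad_token_id)
    decoded_preds = [p.strip() for p in tokenizer.batch_decode(preds, skip_special_tokens=True)]
    decoded_labels = [l.strip() for l in tokenizer.batch_decode(label_ids, skip_special_tokens=True)]
    # Exact-match accuracy on the generated label strings.
    accuracy = sum(p == l for p, l in zip(decoded_preds, decoded_labels)) / len(decoded_labels)
    # BLEU over one-word hypotheses degenerates to 0.0, matching the logs above.
    bleu = bleu_metric.compute(predictions=decoded_preds,
                               references=[[l] for l in decoded_labels])["bleu"]
    return {"accuracy": accuracy, "bleu": bleu}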

FLAN T5

import json
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# Run the pipeline on GPU if one is available, otherwise on CPU.
device = 0 if torch.cuda.is_available() else -1

def perform_shot_learning(pipeline_type, model_name, test_file):
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float32)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    our_pipeline = pipeline(pipeline_type, model=model, tokenizer=tokenizer, device=device)

    correct = 0
    labels = "possible labels: sadness, joy, love, anger, fear, surprise"

    with open(test_file) as f:
        f_lines = f.readlines()

    for line in f_lines:
        ex = json.loads(line)
        # Build a zero-shot prompt: the label inventory, the input text, and a "label:" cue.
        prompt = labels + '\n' + f"text: {ex['text']}" + '\n' + 'label: '

        predict = our_pipeline(prompt, do_sample=False)[0]['generated_text']

        if predict == ex['label']:
            correct += 1

    print(f'Accuracy: {correct / len(f_lines)}')
test_ds = 'data/s2s-test.json'
perform_shot_learning('text2text-generation', 'google/flan-t5-large', test_ds)
Downloading (…)okenizer_config.json:   0%|          | 0.00/2.54k [00:00<?, ?B/s]
Downloading (…)"spiece.model";:   0%|          | 0.00/792k [00:00<?, ?B/s]
Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.42M [00:00<?, ?B/s]
Downloading (…)cial_tokens_map.json:   0%|          | 0.00/2.20k [00:00<?, ?B/s]
/usr/local/lib/python3.8/dist-packages/transformers/pipelines/base.py:1043: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
  warnings.warn(
Accuracy: 0.647
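The UserWarning above points out that calling the pipeline one example at a time on a GPU is inefficient. A possible variant, sketched below, streams the prompts through the pipeline from a Hugging Face Dataset via KeyDataset; the function name perform_shot_learning_batched and the batch_size value are illustrative, and it reuses device, pipeline, AutoTokenizer and AutoModelForSeq2SeqLM from the cell above.

from datasets import load_dataset
from transformers.pipelines.pt_utils import KeyDataset

def perform_shot_learning_batched(pipeline_type, model_name, test_file, batch_size=16):
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.float32)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    our_pipeline = pipeline(pipeline_type, model=model, tokenizer=tokenizer, device=device)

    labels = "possible labels: sadness, joy, love, anger, fear, surprise"
    ds = load_dataset("json", data_files={"test": test_file})["test"]
    ds = ds.map(lambda ex: {"prompt": labels + '\n' + f"text: {ex['text']}" + '\n' + 'label: '})

    correct = 0
    # Feeding a KeyDataset lets the pipeline batch prompts instead of running them one by one.
    outputs = our_pipeline(KeyDataset(ds, "prompt"), batch_size=batch_size, do_sample=False)
    for ex, out in zip(ds, outputs):
        if out[0]['generated_text'] == ex['label']:
            correct += 1

    print(f"Accuracy: {correct / len(ds)}")

On a GPU this mainly amortises per-call overhead; the predictions themselves should match the sequential version.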
!zip -r /content/projekt.zip /content/
  adding: content/ (stored 0%)
  adding: content/.config/ (stored 0%)
  adding: content/.config/config_sentinel (stored 0%)
  adding: content/.config/logs/ (stored 0%)
  adding: content/.config/logs/2023.02.10/ (stored 0%)
  adding: content/.config/logs/2023.02.10/14.32.38.026074.log (deflated 58%)
  adding: content/.config/logs/2023.02.10/14.33.38.691407.log (deflated 56%)
  adding: content/.config/logs/2023.02.10/14.33.11.427170.log (deflated 58%)
  adding: content/.config/logs/2023.02.10/14.33.37.863925.log (deflated 57%)
  adding: content/.config/logs/2023.02.10/14.32.12.281772.log (deflated 91%)
  adding: content/.config/logs/2023.02.10/14.33.03.230973.log (deflated 86%)
  adding: content/.config/gce (stored 0%)
  adding: content/.config/.last_survey_prompt.yaml (stored 0%)
  adding: content/.config/configurations/ (stored 0%)
  adding: content/.config/configurations/config_default (deflated 15%)
  adding: content/.config/active_config (stored 0%)
  adding: content/.config/.last_update_check.json (deflated 22%)
  adding: content/.config/.last_opt_in_prompt.yaml (stored 0%)
  adding: content/__pycache__/ (stored 0%)
  adding: content/__pycache__/roberta.cpython-38.pyc (deflated 62%)
  adding: content/__pycache__/gpt2.cpython-38.pyc (deflated 53%)
  adding: content/data/ (stored 0%)
  adding: content/data/.ipynb_checkpoints/ (stored 0%)
  adding: content/data/test.json (deflated 69%)
  adding: content/data/s2s-test.json (deflated 70%)
  adding: content/data/s2s-valid.json (deflated 70%)
  adding: content/data/valid.json (deflated 69%)
  adding: content/data/s2s-train.json (deflated 70%)
  adding: content/data/train.json (deflated 69%)
  adding: content/req.txt (deflated 30%)
  adding: content/.cache_training_roberta/ (stored 0%)
  adding: content/.cache_training_roberta/.cache_training_roberta_json_default-1808ac39383e9432_0.0.0_0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.lock (stored 0%)
  adding: content/.cache_training_roberta/json/ (stored 0%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/ (stored 0%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/ (stored 0%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.incomplete_info.lock (stored 0%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/ (stored 0%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-ff4234a2fb1a9582.arrow (deflated 88%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-test.arrow (deflated 64%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-6bbf8957e5f0cf7b.arrow (deflated 88%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-train.arrow (deflated 64%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/dataset_info.json (deflated 57%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-5efe26f1bca5cac0.arrow (deflated 88%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-validation.arrow (deflated 64%)
  adding: content/.cache_training_roberta/json/default-1808ac39383e9432/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51_builder.lock (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/ (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/blobs/ (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/blobs/226b0752cac7789c48f0cb3ec53eda48b7be36cc (deflated 53%)
  adding: content/.cache_training_roberta/models--roberta-base/blobs/5606f48548d99a9829d10a96cd364b816b02cd21 (deflated 63%)
  adding: content/.cache_training_roberta/models--roberta-base/blobs/ad0bcbeb288f0d1373d88e0762e66357f55b8311 (deflated 59%)
  adding: content/.cache_training_roberta/models--roberta-base/blobs/8db5e7ac5bfc9ec8b613b776009300fe3685d957 (deflated 47%)
  adding: content/.cache_training_roberta/models--roberta-base/blobs/278b7a95739c4392fae9b818bb5343dde20be1b89318f37a6d939e1e1b9e461b (deflated 41%)
  adding: content/.cache_training_roberta/models--roberta-base/refs/ (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/refs/main (deflated 3%)
  adding: content/.cache_training_roberta/models--roberta-base/.no_exist/ (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/.no_exist/ff46155979338ff8063cdad90908b498ab91b181/ (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/.no_exist/ff46155979338ff8063cdad90908b498ab91b181/tokenizer_config.json (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/.no_exist/ff46155979338ff8063cdad90908b498ab91b181/added_tokens.json (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/.no_exist/ff46155979338ff8063cdad90908b498ab91b181/special_tokens_map.json (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/snapshots/ (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/ (stored 0%)
  adding: content/.cache_training_roberta/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json (deflated 47%)
  adding: content/.cache_training_roberta/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/tokenizer.json (deflated 59%)
  adding: content/.cache_training_roberta/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/merges.txt (deflated 53%)
  adding: content/.cache_training_roberta/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/vocab.json (deflated 63%)
  adding: content/.cache_training_roberta/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/pytorch_model.bin (deflated 41%)
  adding: content/cache_training_t5/ (stored 0%)
  adding: content/cache_training_t5/cache_training_t5_json_default-25a5883a4a222bad_0.0.0_0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.lock (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/ (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/blobs/ (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/blobs/4e28ff6ebdf584f5372d9de68867399142435d9a (deflated 48%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/blobs/b114c318caf72f6e89ea92e0755c41327a453198 (deflated 82%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/blobs/07b81619b82546ab7f30e06c9615c7fca8fe3abd (deflated 44%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/blobs/881bdbffc06e471924ecea57f962bc5f8e2a9f21 (deflated 83%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/blobs/7c9a3e998a8c74b52484f3a1ccfdcc9767972ee6b34ae7a527cdf6f972a34163 (deflated 53%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/refs/ (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/refs/main (deflated 5%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/.no_exist/ (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/.no_exist/8a88af75516269158a3aa488d1abdfd3d5e4ee49/ (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/.no_exist/8a88af75516269158a3aa488d1abdfd3d5e4ee49/tokenizer.json (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/.no_exist/8a88af75516269158a3aa488d1abdfd3d5e4ee49/added_tokens.json (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/snapshots/ (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/ (stored 0%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json (deflated 44%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/tokenizer_config.json (deflated 82%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/spiece.model (deflated 48%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/special_tokens_map.json (deflated 83%)
  adding: content/cache_training_t5/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/pytorch_model.bin (deflated 53%)
  adding: content/cache_training_t5/json/ (stored 0%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/ (stored 0%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/ (stored 0%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.incomplete_info.lock (stored 0%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/ (stored 0%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bef49b953c77fdf0.arrow (deflated 74%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-105206b5fd478147.arrow (deflated 74%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-040b968aed3576f7.arrow (deflated 74%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-test.arrow (deflated 62%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-f37cf2f406b18541.arrow (deflated 74%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-train.arrow (deflated 62%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/dataset_info.json (deflated 58%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-b0aef076d30fe2f7.arrow (deflated 74%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-validation.arrow (deflated 62%)
  adding: content/cache_training_t5/json/default-25a5883a4a222bad/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51_builder.lock (stored 0%)
  adding: content/run_glue.py (deflated 73%)
  adding: content/run_translation.py (deflated 74%)
  adding: content/roberta_custom_training_cache/ (stored 0%)
  adding: content/roberta_custom_training_cache/roberta_custom_training_cache_json_default-01aa9d8252a24a0d_0.0.0_0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.lock (stored 0%)
  adding: content/roberta_custom_training_cache/json/ (stored 0%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/ (stored 0%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/ (stored 0%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.incomplete_info.lock (stored 0%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/ (stored 0%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-test.arrow (deflated 64%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-train.arrow (deflated 64%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/dataset_info.json (deflated 57%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-e62b2012f3f40cb2.arrow (deflated 88%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-cd497527f5c67ba7.arrow (deflated 88%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-9c2deb15eb4326c1.arrow (deflated 88%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-validation.arrow (deflated 64%)
  adding: content/roberta_custom_training_cache/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51_builder.lock (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/ (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/blobs/ (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/blobs/226b0752cac7789c48f0cb3ec53eda48b7be36cc (deflated 53%)
  adding: content/roberta_custom_training_cache/models--roberta-base/blobs/5606f48548d99a9829d10a96cd364b816b02cd21 (deflated 63%)
  adding: content/roberta_custom_training_cache/models--roberta-base/blobs/ad0bcbeb288f0d1373d88e0762e66357f55b8311 (deflated 59%)
  adding: content/roberta_custom_training_cache/models--roberta-base/blobs/8db5e7ac5bfc9ec8b613b776009300fe3685d957 (deflated 47%)
  adding: content/roberta_custom_training_cache/models--roberta-base/blobs/278b7a95739c4392fae9b818bb5343dde20be1b89318f37a6d939e1e1b9e461b (deflated 41%)
  adding: content/roberta_custom_training_cache/models--roberta-base/refs/ (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/refs/main (deflated 3%)
  adding: content/roberta_custom_training_cache/models--roberta-base/.no_exist/ (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/.no_exist/ff46155979338ff8063cdad90908b498ab91b181/ (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/.no_exist/ff46155979338ff8063cdad90908b498ab91b181/tokenizer_config.json (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/.no_exist/ff46155979338ff8063cdad90908b498ab91b181/added_tokens.json (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/.no_exist/ff46155979338ff8063cdad90908b498ab91b181/special_tokens_map.json (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/snapshots/ (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/ (stored 0%)
  adding: content/roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/config.json (deflated 47%)
  adding: content/roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/tokenizer.json (deflated 59%)
  adding: content/roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/merges.txt (deflated 53%)
  adding: content/roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/vocab.json (deflated 63%)
  adding: content/roberta_custom_training_cache/models--roberta-base/snapshots/ff46155979338ff8063cdad90908b498ab91b181/pytorch_model.bin (deflated 41%)
  adding: content/gtp_cache_training/ (stored 0%)
  adding: content/gtp_cache_training/json/ (stored 0%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/ (stored 0%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/ (stored 0%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.incomplete_info.lock (stored 0%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/ (stored 0%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-test.arrow (deflated 64%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-train.arrow (deflated 64%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-7b339bb99d7c17a1.arrow (deflated 88%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/dataset_info.json (deflated 57%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-82acdaa33d6aa0eb.arrow (deflated 88%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-validation.arrow (deflated 64%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bb8faaac56c0b87e.arrow (deflated 88%)
  adding: content/gtp_cache_training/json/default-01aa9d8252a24a0d/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51_builder.lock (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/ (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/blobs/ (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/blobs/226b0752cac7789c48f0cb3ec53eda48b7be36cc (deflated 53%)
  adding: content/gtp_cache_training/models--gpt2/blobs/7c5d3f4b8b76583b422fcb9189ad6c89d5d97a094541ce8932dce3ecabde1421 (deflated 16%)
  adding: content/gtp_cache_training/models--gpt2/blobs/1f1d9aaca301414e7f6c9396df506798ff4eb9a6 (deflated 67%)
  adding: content/gtp_cache_training/models--gpt2/blobs/10c66461e4c109db5a2196bff4bb59be30396ed8 (deflated 50%)
  adding: content/gtp_cache_training/models--gpt2/blobs/4b988bccc9dc5adacd403c00b4704976196548f8 (deflated 59%)
  adding: content/gtp_cache_training/models--gpt2/refs/ (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/refs/main (deflated 3%)
  adding: content/gtp_cache_training/models--gpt2/.no_exist/ (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/.no_exist/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/ (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/.no_exist/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/tokenizer_config.json (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/.no_exist/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/added_tokens.json (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/.no_exist/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/special_tokens_map.json (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/snapshots/ (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/ (stored 0%)
  adding: content/gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/config.json (deflated 50%)
  adding: content/gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/tokenizer.json (deflated 59%)
  adding: content/gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/merges.txt (deflated 53%)
  adding: content/gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/vocab.json (deflated 67%)
  adding: content/gtp_cache_training/models--gpt2/snapshots/e7da7f221d5bf496a48136c0cd264e630fe9fcc8/pytorch_model.bin (deflated 16%)
  adding: content/gtp_cache_training/gtp_cache_training_json_default-01aa9d8252a24a0d_0.0.0_0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.lock (stored 0%)
  adding: content/t5_cache_training/ (stored 0%)
  adding: content/t5_cache_training/t5_cache_training_json_default-a82ca4164dba097e_0.0.0_0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.lock (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/ (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/blobs/ (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/blobs/4e28ff6ebdf584f5372d9de68867399142435d9a (deflated 48%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/blobs/b114c318caf72f6e89ea92e0755c41327a453198 (deflated 82%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/blobs/07b81619b82546ab7f30e06c9615c7fca8fe3abd (deflated 44%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/blobs/881bdbffc06e471924ecea57f962bc5f8e2a9f21 (deflated 83%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/blobs/7c9a3e998a8c74b52484f3a1ccfdcc9767972ee6b34ae7a527cdf6f972a34163 (deflated 53%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/refs/ (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/refs/main (deflated 5%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/.no_exist/ (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/.no_exist/8a88af75516269158a3aa488d1abdfd3d5e4ee49/ (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/.no_exist/8a88af75516269158a3aa488d1abdfd3d5e4ee49/tokenizer.json (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/.no_exist/8a88af75516269158a3aa488d1abdfd3d5e4ee49/added_tokens.json (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/snapshots/ (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/ (stored 0%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/config.json (deflated 44%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/tokenizer_config.json (deflated 82%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/spiece.model (deflated 48%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/special_tokens_map.json (deflated 83%)
  adding: content/t5_cache_training/models--google--t5-v1_1-small/snapshots/8a88af75516269158a3aa488d1abdfd3d5e4ee49/pytorch_model.bin (deflated 53%)
  adding: content/t5_cache_training/json/ (stored 0%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/ (stored 0%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/ (stored 0%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51.incomplete_info.lock (stored 0%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/ (stored 0%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-test.arrow (deflated 62%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-988bff0993eee389.arrow (deflated 74%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-train.arrow (deflated 62%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/dataset_info.json (deflated 58%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-fa17416eabe18767.arrow (deflated 74%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-c6cebbf9290f7df0.arrow (deflated 74%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/json-validation.arrow (deflated 62%)
  adding: content/t5_cache_training/json/default-a82ca4164dba097e/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51_builder.lock (stored 0%)
  adding: content/out/ (stored 0%)
  adding: content/out/emotion/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/scheduler.pt (deflated 49%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/config.json (deflated 56%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/trainer_state.json (deflated 79%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/optimizer.pt (deflated 30%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2000/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2_custom/config.json (deflated 56%)
  adding: content/out/emotion/gpt2_custom/all_results.json (deflated 56%)
  adding: content/out/emotion/gpt2_custom/predict_results_None.txt (deflated 62%)
  adding: content/out/emotion/gpt2_custom/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2_custom/trainer_state.json (deflated 80%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/scheduler.pt (deflated 49%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/config.json (deflated 56%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/trainer_state.json (deflated 77%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/optimizer.pt (deflated 30%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1500/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2_custom/train_results.json (deflated 40%)
  adding: content/out/emotion/gpt2_custom/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2_custom/eval_results.json (deflated 41%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/scheduler.pt (deflated 50%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/config.json (deflated 56%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/trainer_state.json (deflated 80%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/optimizer.pt (deflated 30%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2_custom/checkpoint-2500/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2_custom/runs/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-11-35_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-11-35_fc0011e45a00/1676409101.551365/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-11-35_fc0011e45a00/1676409101.551365/events.out.tfevents.1676409101.fc0011e45a00.60473.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-11-35_fc0011e45a00/events.out.tfevents.1676409101.fc0011e45a00.60473.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-46-53_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-46-53_fc0011e45a00/events.out.tfevents.1676407620.fc0011e45a00.53924.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-46-53_fc0011e45a00/1676407620.269752/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-46-53_fc0011e45a00/1676407620.269752/events.out.tfevents.1676407620.fc0011e45a00.53924.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-56-28_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-56-28_fc0011e45a00/events.out.tfevents.1676411802.fc0011e45a00.72811.0 (deflated 63%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-56-28_fc0011e45a00/events.out.tfevents.1676412248.fc0011e45a00.72811.2 (deflated 28%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-56-28_fc0011e45a00/1676411802.9557116/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-56-28_fc0011e45a00/1676411802.9557116/events.out.tfevents.1676411802.fc0011e45a00.72811.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-13-12_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-13-12_fc0011e45a00/events.out.tfevents.1676409199.fc0011e45a00.60936.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-13-12_fc0011e45a00/1676409199.1303008/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-13-12_fc0011e45a00/1676409199.1303008/events.out.tfevents.1676409199.fc0011e45a00.60936.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-59-18_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-59-18_fc0011e45a00/1676408364.7675455/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-59-18_fc0011e45a00/1676408364.7675455/events.out.tfevents.1676408364.fc0011e45a00.57251.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-59-18_fc0011e45a00/events.out.tfevents.1676408364.fc0011e45a00.57251.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-14-48_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-14-48_fc0011e45a00/events.out.tfevents.1676409294.fc0011e45a00.61381.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-14-48_fc0011e45a00/1676409294.483754/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-14-48_fc0011e45a00/1676409294.483754/events.out.tfevents.1676409294.fc0011e45a00.61381.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-46-07_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-46-07_fc0011e45a00/events.out.tfevents.1676407574.fc0011e45a00.53675.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-46-07_fc0011e45a00/1676407574.5370467/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-46-07_fc0011e45a00/1676407574.5370467/events.out.tfevents.1676407574.fc0011e45a00.53675.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-15-57_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-15-57_fc0011e45a00/1676409363.3658211/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-15-57_fc0011e45a00/1676409363.3658211/events.out.tfevents.1676409363.fc0011e45a00.61724.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-15-57_fc0011e45a00/events.out.tfevents.1676409363.fc0011e45a00.61724.0 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-44-02_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-44-02_fc0011e45a00/events.out.tfevents.1676407449.fc0011e45a00.53094.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-44-02_fc0011e45a00/1676407449.3215246/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-44-02_fc0011e45a00/1676407449.3215246/events.out.tfevents.1676407449.fc0011e45a00.53094.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-09-03_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-09-03_fc0011e45a00/events.out.tfevents.1676408949.fc0011e45a00.59782.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-09-03_fc0011e45a00/1676408949.6798263/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-09-03_fc0011e45a00/1676408949.6798263/events.out.tfevents.1676408949.fc0011e45a00.59782.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-41-48_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-41-48_fc0011e45a00/events.out.tfevents.1676410915.fc0011e45a00.68705.0 (deflated 57%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-41-48_fc0011e45a00/1676410915.0364006/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_21-41-48_fc0011e45a00/1676410915.0364006/events.out.tfevents.1676410915.fc0011e45a00.68705.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-48-55_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-48-55_fc0011e45a00/events.out.tfevents.1676407741.fc0011e45a00.54546.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-48-55_fc0011e45a00/1676407741.3566854/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-48-55_fc0011e45a00/1676407741.3566854/events.out.tfevents.1676407741.fc0011e45a00.54546.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-47-46_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-47-46_fc0011e45a00/events.out.tfevents.1676407672.fc0011e45a00.54203.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-47-46_fc0011e45a00/1676407672.9366086/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-47-46_fc0011e45a00/1676407672.9366086/events.out.tfevents.1676407672.fc0011e45a00.54203.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-56-39_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-56-39_fc0011e45a00/events.out.tfevents.1676408205.fc0011e45a00.56536.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-56-39_fc0011e45a00/1676408205.8404686/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-56-39_fc0011e45a00/1676408205.8404686/events.out.tfevents.1676408205.fc0011e45a00.56536.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-55-46_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-55-46_fc0011e45a00/1676408153.0722597/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-55-46_fc0011e45a00/1676408153.0722597/events.out.tfevents.1676408153.fc0011e45a00.56263.1 (deflated 62%)
  adding: content/out/emotion/gpt2_custom/runs/Feb14_20-55-46_fc0011e45a00/events.out.tfevents.1676408153.fc0011e45a00.56263.0 (deflated 60%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/scheduler.pt (deflated 49%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/config.json (deflated 56%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/trainer_state.json (deflated 75%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/optimizer.pt (deflated 30%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2_custom/checkpoint-1000/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2_custom/README.md (deflated 54%)
  adding: content/out/emotion/gpt2_custom/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2_custom/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2_custom/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2_custom/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/ (stored 0%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/scheduler.pt (deflated 49%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/config.json (deflated 56%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/trainer_state.json (deflated 67%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/optimizer.pt (deflated 31%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2_custom/checkpoint-500/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2_custom/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2/ (stored 0%)
  adding: content/out/emotion/gpt2/checkpoint-2000/ (stored 0%)
  adding: content/out/emotion/gpt2/checkpoint-2000/scheduler.pt (deflated 49%)
  adding: content/out/emotion/gpt2/checkpoint-2000/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2/checkpoint-2000/config.json (deflated 56%)
  adding: content/out/emotion/gpt2/checkpoint-2000/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2/checkpoint-2000/trainer_state.json (deflated 80%)
  adding: content/out/emotion/gpt2/checkpoint-2000/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2/checkpoint-2000/optimizer.pt (deflated 29%)
  adding: content/out/emotion/gpt2/checkpoint-2000/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2/checkpoint-2000/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2/checkpoint-2000/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2/checkpoint-2000/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2/checkpoint-2000/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2/config.json (deflated 56%)
  adding: content/out/emotion/gpt2/all_results.json (deflated 55%)
  adding: content/out/emotion/gpt2/predict_results_None.txt (deflated 62%)
  adding: content/out/emotion/gpt2/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2/trainer_state.json (deflated 81%)
  adding: content/out/emotion/gpt2/checkpoint-1500/ (stored 0%)
  adding: content/out/emotion/gpt2/checkpoint-1500/scheduler.pt (deflated 49%)
  adding: content/out/emotion/gpt2/checkpoint-1500/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2/checkpoint-1500/config.json (deflated 56%)
  adding: content/out/emotion/gpt2/checkpoint-1500/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2/checkpoint-1500/trainer_state.json (deflated 78%)
  adding: content/out/emotion/gpt2/checkpoint-1500/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2/checkpoint-1500/optimizer.pt (deflated 29%)
  adding: content/out/emotion/gpt2/checkpoint-1500/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2/checkpoint-1500/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2/checkpoint-1500/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2/checkpoint-1500/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2/checkpoint-1500/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2/train_results.json (deflated 41%)
  adding: content/out/emotion/gpt2/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2/eval_results.json (deflated 41%)
  adding: content/out/emotion/gpt2/checkpoint-2500/ (stored 0%)
  adding: content/out/emotion/gpt2/checkpoint-2500/scheduler.pt (deflated 50%)
  adding: content/out/emotion/gpt2/checkpoint-2500/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2/checkpoint-2500/config.json (deflated 56%)
  adding: content/out/emotion/gpt2/checkpoint-2500/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2/checkpoint-2500/trainer_state.json (deflated 81%)
  adding: content/out/emotion/gpt2/checkpoint-2500/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2/checkpoint-2500/optimizer.pt (deflated 29%)
  adding: content/out/emotion/gpt2/checkpoint-2500/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2/checkpoint-2500/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2/checkpoint-2500/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2/checkpoint-2500/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2/checkpoint-2500/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2/runs/ (stored 0%)
  adding: content/out/emotion/gpt2/runs/Feb14_21-48-55_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2/runs/Feb14_21-48-55_fc0011e45a00/events.out.tfevents.1676411778.fc0011e45a00.70872.2 (deflated 28%)
  adding: content/out/emotion/gpt2/runs/Feb14_21-48-55_fc0011e45a00/1676411348.7268953/ (stored 0%)
  adding: content/out/emotion/gpt2/runs/Feb14_21-48-55_fc0011e45a00/1676411348.7268953/events.out.tfevents.1676411348.fc0011e45a00.70872.1 (deflated 62%)
  adding: content/out/emotion/gpt2/runs/Feb14_21-48-55_fc0011e45a00/events.out.tfevents.1676411348.fc0011e45a00.70872.0 (deflated 63%)
  adding: content/out/emotion/gpt2/runs/Feb14_20-34-05_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2/runs/Feb14_20-34-05_fc0011e45a00/events.out.tfevents.1676407272.fc0011e45a00.50524.2 (deflated 28%)
  adding: content/out/emotion/gpt2/runs/Feb14_20-34-05_fc0011e45a00/events.out.tfevents.1676406850.fc0011e45a00.50524.0 (deflated 63%)
  adding: content/out/emotion/gpt2/runs/Feb14_20-34-05_fc0011e45a00/1676406850.2390406/ (stored 0%)
  adding: content/out/emotion/gpt2/runs/Feb14_20-34-05_fc0011e45a00/1676406850.2390406/events.out.tfevents.1676406850.fc0011e45a00.50524.1 (deflated 62%)
  adding: content/out/emotion/gpt2/runs/Feb14_19-44-33_fc0011e45a00/ (stored 0%)
  adding: content/out/emotion/gpt2/runs/Feb14_19-44-33_fc0011e45a00/events.out.tfevents.1676403875.fc0011e45a00.37469.0 (deflated 60%)
  adding: content/out/emotion/gpt2/runs/Feb14_19-44-33_fc0011e45a00/1676403875.9091897/ (stored 0%)
  adding: content/out/emotion/gpt2/runs/Feb14_19-44-33_fc0011e45a00/1676403875.9091897/events.out.tfevents.1676403875.fc0011e45a00.37469.1 (deflated 62%)
  adding: content/out/emotion/gpt2/checkpoint-1000/ (stored 0%)
  adding: content/out/emotion/gpt2/checkpoint-1000/scheduler.pt (deflated 49%)
  adding: content/out/emotion/gpt2/checkpoint-1000/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2/checkpoint-1000/config.json (deflated 56%)
  adding: content/out/emotion/gpt2/checkpoint-1000/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2/checkpoint-1000/trainer_state.json (deflated 75%)
  adding: content/out/emotion/gpt2/checkpoint-1000/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2/checkpoint-1000/optimizer.pt (deflated 29%)
  adding: content/out/emotion/gpt2/checkpoint-1000/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2/checkpoint-1000/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2/checkpoint-1000/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2/checkpoint-1000/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2/checkpoint-1000/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2/README.md (deflated 54%)
  adding: content/out/emotion/gpt2/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2/checkpoint-500/ (stored 0%)
  adding: content/out/emotion/gpt2/checkpoint-500/scheduler.pt (deflated 49%)
  adding: content/out/emotion/gpt2/checkpoint-500/rng_state.pth (deflated 28%)
  adding: content/out/emotion/gpt2/checkpoint-500/config.json (deflated 56%)
  adding: content/out/emotion/gpt2/checkpoint-500/tokenizer_config.json (deflated 41%)
  adding: content/out/emotion/gpt2/checkpoint-500/trainer_state.json (deflated 67%)
  adding: content/out/emotion/gpt2/checkpoint-500/tokenizer.json (deflated 72%)
  adding: content/out/emotion/gpt2/checkpoint-500/optimizer.pt (deflated 30%)
  adding: content/out/emotion/gpt2/checkpoint-500/training_args.bin (deflated 48%)
  adding: content/out/emotion/gpt2/checkpoint-500/special_tokens_map.json (deflated 60%)
  adding: content/out/emotion/gpt2/checkpoint-500/merges.txt (deflated 53%)
  adding: content/out/emotion/gpt2/checkpoint-500/vocab.json (deflated 59%)
  adding: content/out/emotion/gpt2/checkpoint-500/pytorch_model.bin (deflated 9%)
  adding: content/out/emotion/gpt2/pytorch_model.bin


zip error: Interrupted (aborting)