Update README.md

Piotr Nawrot 2023-03-17 11:19:53 +01:00 committed by GitHub
parent 7dbfea19d2
commit 717796e8df


@@ -108,7 +108,7 @@ python -m nanoT5.main \
   optim.lr_scheduler={legacy,cosine}
 ```
-We recommend adding `model.compile=true` flag for pre-training, if you are able to install PyTorch 2.0. In our case it effects in 1.33x speedup.
+We recommend adding the `model.compile=true` flag for pre-training, if you are able to install PyTorch 2.0. In our case it results in a ~1.33x speedup.
 Suppose you don't have access to an 80GB GPU. In that case, you can increase the number of gradient accumulation steps with `optim.grad_acc=steps`, where `batch_size` has to be divisible by `steps`.
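For illustration, the two options discussed in this hunk could be combined in a single invocation, sketched below. The value `optim.grad_acc=2` is a hypothetical example, not a repository default; any value works as long as the configured `batch_size` is divisible by it.

```shell
# Sketch, not taken from the repository:
# enable model compilation (requires PyTorch 2.0) and accumulate gradients
# over 2 steps to fit pre-training on a smaller GPU.
# grad_acc=2 is an arbitrary choice; batch_size must be divisible by it.
python -m nanoT5.main \
    model.compile=true \
    optim.grad_acc=2
```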