From 717796e8df4e21927788c6fcb042ec4f8a9141ba Mon Sep 17 00:00:00 2001
From: Piotr Nawrot
Date: Fri, 17 Mar 2023 11:19:53 +0100
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index e25578c..6756537 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ python -m nanoT5.main \
     optim.lr_scheduler={legacy,cosine}
 ```
 
-We recommend adding `model.compile=true` flag for pre-training, if you are able to install PyTorch 2.0. In our case it effects in 1.33x speedup.
+We recommend adding `model.compile=true` flag for pre-training, if you are able to install PyTorch 2.0. In our case it results in ~1.33x speedup.
 
 Suppose you don't have access to an 80GB GPU. In that case, you can increase the number of gradient accumulation steps by `optim.grad_acc=steps`,
 where `batch_size` has to be divisible by `steps`.
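
For reference, a sketch of how the two recommendations touched by this hunk combine into a single run; the `optim.grad_acc=2` value is only an illustrative assumption and should be replaced by any value that divides your `batch_size`:

```
# Sketch only: enable torch.compile (PyTorch 2.0) and accumulate gradients on a smaller GPU.
# grad_acc=2 is an assumed example value; batch_size must be divisible by it.
python -m nanoT5.main \
    model.compile=true \
    optim.grad_acc=2
```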