nohup: ignoring input
Using SCRIPTS_ROOTDIR: /home/kasia/mosesdecoder/scripts
Using single-thread GIZA
using gzip

(1) preparing corpus @ Wed Feb 17 14:41:26 CET 2021
Executing: mkdir -p /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus
(1.0) selecting factors @ Wed Feb 17 14:41:26 CET 2021
(1.1) running mkcls @ Wed Feb 17 14:41:26 CET 2021
/home/kasia/mosesdecoder/tools/mkcls -c50 -n2 -p/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/corpus/train25k.clean.pl -V/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb.classes opt
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb.classes already in place, reusing
(1.1) running mkcls @ Wed Feb 17 14:41:26 CET 2021
/home/kasia/mosesdecoder/tools/mkcls -c50 -n2 -p/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/corpus/train25k.clean.en -V/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en.vcb.classes opt
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en.vcb.classes already in place, reusing
(1.2) creating vcb file /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb @ Wed Feb 17 14:41:26 CET 2021
line 17000 line 18000 line 19000 line 20000 line 21000 line 22000 line 23000 line 24000 END.

(2.1b) running giza pl-en @ Wed Feb 17 14:42:19 CET 2021
/home/kasia/mosesdecoder/tools/GIZA++ -CoocurrenceFile /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.pl-en/pl-en.cooc -c /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.pl-en/pl-en -onlyaldumps 1 -p0 0.999 -s /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en.vcb -t /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.pl-en/pl-en.A3.final.gz seems finished, reusing.
(2.1a) running snt2cooc en-pl @ Wed Feb 17 14:42:19 CET 2021
Executing: mkdir -p /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl
Executing: /home/kasia/mosesdecoder/tools/snt2cooc.out /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en.vcb /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en-pl-int-train.snt > /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl.cooc
line 1000 line 2000 line 3000 line 4000 line 5000 line 6000 line 7000 line 8000 line 9000 line 10000 line 11000 line 12000 line 13000 line 14000 line 15000 line 16000 line 17000 line 18000 line 19000 line 20000 line 21000 line 22000 line 23000 line 24000 END.
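The pl.vcb / en.vcb files built in step (1.2) map each surface token to an integer id plus its corpus frequency; the *-int-train.snt files then encode each sentence pair as sequences of those ids. A minimal sketch of building a vcb-style vocabulary in Python, assuming the common GIZA++/plain2snt convention of one "id token count" line per word with the lowest ids reserved (an assumption — check against your own vcb files):

    from collections import Counter

    def build_vcb(corpus_path, vcb_path, first_id=2):
        """Write a GIZA++-style vocabulary: one "<id> <token> <count>"
        line per word, most frequent first. first_id=2 mirrors the usual
        plain2snt convention (ids 0 and 1 are reserved -- assumed here)."""
        counts = Counter(tok for line in open(corpus_path, encoding="utf-8")
                         for tok in line.split())
        ranked = counts.most_common()
        with open(vcb_path, "w", encoding="utf-8") as out:
            for wid, (tok, n) in enumerate(ranked, start=first_id):
                out.write(f"{wid} {tok} {n}\n")
        return {tok: wid for wid, (tok, _) in enumerate(ranked, start=first_id)}

Run on train25k.clean.pl this would reproduce a file shaped like the pl.vcb the log reuses above.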
(2.1b) running giza en-pl @ Wed Feb 17 14:43:29 CET 2021
Executing: /home/kasia/mosesdecoder/tools/GIZA++ -CoocurrenceFile /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl.cooc -c /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en-pl-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl -onlyaldumps 1 -p0 0.999 -s /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb -t /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en.vcb
Parameter 'coocurrencefile' changed from '' to '/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl.cooc'
Parameter 'c' changed from '' to '/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en-pl-int-train.snt'
Parameter 'm3' changed from '5' to '3'
Parameter 'm4' changed from '5' to '3'
Parameter 'model1dumpfrequency' changed from '0' to '1'
Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
Parameter 'nodumps' changed from '0' to '1'
Parameter 'nsmooth' changed from '64' to '4'
Parameter 'o' changed from '2021-02-17.144329.kasia' to '/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl'
Parameter 'onlyaldumps' changed from '0' to '1'
Parameter 'p0' changed from '-1' to '0.999'
Parameter 's' changed from '' to '/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb'
Parameter 't' changed from '' to '/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en.vcb'

general parameters:
-------------------
ml = 101 (maximum sentence length)

No. of iterations:
-------------------
hmmiterations = 5 (mh)
model1iterations = 5 (number of iterations for Model 1)
model2iterations = 0 (number of iterations for Model 2)
model3iterations = 3 (number of iterations for Model 3)
model4iterations = 3 (number of iterations for Model 4)
model5iterations = 0 (number of iterations for Model 5)
model6iterations = 0 (number of iterations for Model 6)

parameters for various heuristics in GIZA++ for efficient training:
------------------------------------------------------------------
countincreasecutoff = 1e-06 (counts increment cutoff threshold)
countincreasecutoffal = 1e-05 (counts increment cutoff threshold for alignments in training of fertility models)
mincountincrease = 1e-07 (minimal count increase)
peggedcutoff = 0.03 (relative cutoff probability for alignment-centers in pegging)
probcutoff = 1e-07 (probability cutoff threshold for lexicon probabilities)
probsmooth = 1e-07 (probability smoothing (floor) value)

parameters describing the type and amount of output:
-----------------------------------------------------------
compactalignmentformat = 0 (0: detailed alignment format, 1: compact alignment format)
hmmdumpfrequency = 0 (dump frequency of HMM)
l = 2021-02-17.144329.kasia.log (log file name)
log = 0 (0: no logfile; 1: logfile)
model1dumpfrequency = 1 (dump frequency of Model 1)
model2dumpfrequency = 0 (dump frequency of Model 2)
model345dumpfrequency = 0 (dump frequency of Model 3/4/5)
nbestalignments = 0 (for printing the n best alignments)
nodumps = 1 (1: do not write any files)
o = /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl (output file prefix)
onlyaldumps = 1 (1: do not write any files)
outputpath = (output path)
transferdumpfrequency = 0 (output: dump of transfer from Model 2 to 3)
verbose = 0 (0: not verbose; 1: verbose)
verbosesentence = -10 (number of sentence for which a lot of information should be printed (negative: no output))

parameters describing input files:
----------------------------------
c = /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en-pl-int-train.snt (training corpus file name)
d = (dictionary file name)
s = /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb (source vocabulary file name)
t = /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en.vcb (target vocabulary file name)
tc = (test corpus file name)

smoothing parameters:
---------------------
emalsmooth = 0.2 (f-b-trn: smoothing factor for HMM alignment model (can be ignored by -emSmoothHMM))
model23smoothfactor = 0 (smoothing parameter for IBM-2/3 (interpolation with constant))
model4smoothfactor = 0.4 (smoothing parameter for alignment probabilities in Model 4)
model5smoothfactor = 0.1 (smoothing parameter for distortion probabilities in Model 5 (linear interpolation with constant))
nsmooth = 4 (smoothing for fertility parameters (good value: 64): weight for word-length-dependent fertility parameters)
nsmoothgeneral = 0 (smoothing for fertility parameters (default: 0): weight for word-independent fertility parameters)

parameters modifying the models:
--------------------------------
compactadtable = 1 (1: only 3-dimensional alignment table for IBM-2 and IBM-3)
deficientdistortionforemptyword = 0 (0: IBM-3/IBM-4 as described in (Brown et al. 1993); 1: distortion model of empty word is deficient; 2: distortion model of empty word is deficient (differently); setting this parameter also helps to avoid that during IBM-3 and IBM-4 training too many words are aligned with the empty word)
depm4 = 76 (d_{=1}: &1:l, &2:m, &4:F, &8:E; d_{>1}: &16:l, &32:m, &64:F, &128:E)
depm5 = 68 (d_{=1}: &1:l, &2:m, &4:F, &8:E; d_{>1}: &16:l, &32:m, &64:F, &128:E)
emalignmentdependencies = 2 (lextrain: dependencies in the HMM alignment model. &1: sentence length; &2: previous class; &4: previous position; &8: French position; &16: French class)
emprobforempty = 0.4 (f-b-trn: probability for empty word)

parameters modifying the EM algorithm:
--------------------------------------
m5p0 = -1 (fixed value for parameter p_0 in IBM-5 (if negative then it is determined in training))
manlexfactor1 = 0 ()
manlexfactor2 = 0 ()
manlexmaxmultiplicity = 20 ()
maxfertility = 10 (maximal fertility for fertility models)
p0 = 0.999 (fixed value for parameter p_0 in IBM-3/4 (if negative then it is determined in training))
pegging = 0 (0: no pegging; 1: do pegging)

reading vocabulary files
Reading vocabulary file from: /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/pl.vcb
Reading vocabulary file from: /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en.vcb
Source vocabulary list has 44236 unique tokens
Target vocabulary list has 19029 unique tokens
Calculating vocabulary frequencies from corpus /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en-pl-int-train.snt
Reading more sentence pairs into memory ...
Corpus fits in memory, corpus has: 24744 sentence pairs.
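The en-pl-int-train.snt file GIZA++ just loaded stores, in the usual plain2snt layout, three lines per sentence pair: a sentence weight (typically 1), the source sentence as vcb ids, and the target sentence as vcb ids. A minimal reader under that assumption (the 3-lines-per-pair format is the common GIZA++ convention; verify against your own files):

    def read_snt(snt_path):
        """Yield (weight, source_ids, target_ids) triples from a GIZA++
        .snt file, assuming each sentence pair occupies three consecutive
        lines: weight, source word ids, target word ids."""
        with open(snt_path, encoding="utf-8") as f:
            lines = [ln.strip() for ln in f]
        for i in range(0, len(lines) - 2, 3):
            weight = float(lines[i])
            src = [int(w) for w in lines[i + 1].split()]
            tgt = [int(w) for w in lines[i + 2].split()]
            yield weight, src, tgt

    # pairs = list(read_snt("/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/corpus/en-pl-int-train.snt"))
    # len(pairs) should match the 24744 sentence pairs reported above.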
Train total # sentence pairs (weighted): 24744
Size of source portion of the training corpus: 564554 tokens
Size of the target portion of the training corpus: 654662 tokens
In source portion of the training corpus, only 44235 unique tokens appeared
In target portion of the training corpus, only 19027 unique tokens appeared
lambda for PP calculation in IBM-1, IBM-2, HMM: 654662/(589298-24744) = 1.15961
There are 5299035 entries in table

==========================================================
Model1 Training Started at: Wed Feb 17 14:43:51 2021
-----------
Model1: Iteration 1
Model1: (1) TRAIN CROSS-ENTROPY 14.3728 PERPLEXITY 21214.8
Model1: (1) VITERBI TRAIN CROSS-ENTROPY inf PERPLEXITY inf
Model 1 Iteration: 1 took: 42 seconds
-----------
Model1: Iteration 2
Model1: (2) TRAIN CROSS-ENTROPY 6.99257 PERPLEXITY 127.343
Model1: (2) VITERBI TRAIN CROSS-ENTROPY 9.63962 PERPLEXITY 797.652
Model 1 Iteration: 2 took: 30 seconds
-----------
Model1: Iteration 3
Model1: (3) TRAIN CROSS-ENTROPY 6.21065 PERPLEXITY 74.0614
Model1: (3) VITERBI TRAIN CROSS-ENTROPY 8.25816 PERPLEXITY 306.163
Model 1 Iteration: 3 took: 34 seconds
-----------
Model1: Iteration 4
Model1: (4) TRAIN CROSS-ENTROPY 5.86592 PERPLEXITY 58.3199
Model1: (4) VITERBI TRAIN CROSS-ENTROPY 7.5277 PERPLEXITY 184.528
Model 1 Iteration: 4 took: 36 seconds
-----------
Model1: Iteration 5
Model1: (5) TRAIN CROSS-ENTROPY 5.72968 PERPLEXITY 53.0646
Model1: (5) VITERBI TRAIN CROSS-ENTROPY 7.16976 PERPLEXITY 143.983
Model 1 Iteration: 5 took: 36 seconds
Entire Model1 Training took: 178 seconds
NOTE: I am doing iterations with the HMM model!
Read classes: #words: 44235 #classes: 51
Read classes: #words: 19028 #classes: 51

==========================================================
Hmm Training Started at: Wed Feb 17 14:46:51 2021
-----------
Hmm: Iteration 1
A/D table contains 224673 parameters.
Hmm: (1) TRAIN CROSS-ENTROPY 5.55273 PERPLEXITY 46.9394
Hmm: (1) VITERBI TRAIN CROSS-ENTROPY 6.96819 PERPLEXITY 125.209
Hmm Iteration: 1 took: 349 seconds
-----------
Hmm: Iteration 2
A/D table contains 224673 parameters.
Hmm: (2) TRAIN CROSS-ENTROPY 5.2041 PERPLEXITY 36.863
Hmm: (2) VITERBI TRAIN CROSS-ENTROPY 5.8443 PERPLEXITY 57.4528
Hmm Iteration: 2 took: 326 seconds
-----------
Hmm: Iteration 3
A/D table contains 224673 parameters.
Hmm: (3) TRAIN CROSS-ENTROPY 4.65738 PERPLEXITY 25.2355
Hmm: (3) VITERBI TRAIN CROSS-ENTROPY 5.01702 PERPLEXITY 32.3796
Hmm Iteration: 3 took: 310 seconds
-----------
Hmm: Iteration 4
A/D table contains 224673 parameters.
Hmm: (4) TRAIN CROSS-ENTROPY 4.34099 PERPLEXITY 20.266
Hmm: (4) VITERBI TRAIN CROSS-ENTROPY 4.58726 PERPLEXITY 24.0382
Hmm Iteration: 4 took: 357 seconds
-----------
Hmm: Iteration 5
A/D table contains 224673 parameters.
Hmm: (5) TRAIN CROSS-ENTROPY 4.19589 PERPLEXITY 18.3268
Hmm: (5) VITERBI TRAIN CROSS-ENTROPY 4.39346 PERPLEXITY 21.0166
Hmm Iteration: 5 took: 384 seconds
Entire Hmm Training took: 1726 seconds
==========================================================
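Each iteration above reports TRAIN CROSS-ENTROPY and PERPLEXITY, related by PERPLEXITY = 2^CROSS-ENTROPY (e.g. 2^14.3728 ≈ 21215, matching Model 1 iteration 1). For intuition, a toy sketch of one IBM Model 1 EM iteration with the cross-entropy in bits per target word — an illustration of the idea only, not GIZA++'s implementation, which additionally handles the NULL word, smoothing (probsmooth above) and count cutoffs:

    import math
    from collections import defaultdict

    def model1_em_step(pairs, t):
        """One EM iteration of IBM Model 1. pairs: list of (src_tokens,
        tgt_tokens); t: translation table mapping (src, tgt) -> prob,
        e.g. defaultdict(lambda: 1.0 / target_vocab_size) initially.
        Returns the updated table and the cross-entropy in bits/target word."""
        counts = defaultdict(float)   # expected counts c(src, tgt)
        totals = defaultdict(float)   # expected counts c(src)
        log2_lik, n_tgt = 0.0, 0
        for src, tgt in pairs:
            for f in tgt:
                # P(f | src) under Model 1's uniform alignment assumption
                z = sum(t[(e, f)] for e in src)
                log2_lik += math.log2(z / len(src))
                n_tgt += 1
                for e in src:                    # E-step: fractional counts
                    c = t[(e, f)] / z
                    counts[(e, f)] += c
                    totals[e] += c
        t_new = {ef: c / totals[ef[0]] for ef, c in counts.items()}  # M-step
        return t_new, -log2_lik / n_tgt

    # perplexity = 2 ** cross_entropy, matching the PERPLEXITY column above.

Calling this five times from a uniform table mirrors the -m1 5 schedule in the GIZA++ command.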
Read classes: #words: 44235 #classes: 51
Read classes: #words: 19028 #classes: 51
Read classes: #words: 44235 #classes: 51
Read classes: #words: 19028 #classes: 51

==========================================================
Starting H333444: Viterbi Training
H333444 Training Started at: Wed Feb 17 15:15:40 2021
---------------------
THTo3: Iteration 1
10000 20000
#centers(pre/hillclimbed/real): 1 1 1 #al: 1196.14 #alsophisticatedcountcollection: 0 #hcsteps: 0 #peggingImprovements: 0
A/D table contains 224673 parameters.
A/D table contains 200260 parameters.
NTable contains 442360 parameters.
p0_count is 481744 and p1 is 86458.7; p0 is 0.999 p1: 0.001
THTo3: TRAIN CROSS-ENTROPY 4.07617 PERPLEXITY 16.8675
THTo3: (1) TRAIN VITERBI CROSS-ENTROPY 4.1553 PERPLEXITY 17.8185
THTo3 Viterbi Iteration : 1 took: 316 seconds
---------------------
Model3: Iteration 2
10000 20000
#centers(pre/hillclimbed/real): 1 1 1 #al: 1202.69 #alsophisticatedcountcollection: 0 #hcsteps: 4.44475 #peggingImprovements: 0
A/D table contains 224673 parameters.
A/D table contains 200260 parameters.
NTable contains 442360 parameters.
p0_count is 578971 and p1 is 37845.7; p0 is 0.999 p1: 0.001
Model3: TRAIN CROSS-ENTROPY 5.90798 PERPLEXITY 60.0455
Model3: (2) TRAIN VITERBI CROSS-ENTROPY 5.99155 PERPLEXITY 63.626
Model3 Viterbi Iteration : 2 took: 227 seconds
---------------------
Model3: Iteration 3
10000 20000
#centers(pre/hillclimbed/real): 1 1 1 #al: 1203.15 #alsophisticatedcountcollection: 0 #hcsteps: 4.5232 #peggingImprovements: 0
A/D table contains 224673 parameters.
A/D table contains 200260 parameters.
NTable contains 442360 parameters.
p0_count is 602707 and p1 is 25977.5; p0 is 0.999 p1: 0.001
Model3: TRAIN CROSS-ENTROPY 5.70902 PERPLEXITY 52.3104
Model3: (3) TRAIN VITERBI CROSS-ENTROPY 5.78006 PERPLEXITY 54.9506
Model3 Viterbi Iteration : 3 took: 121 seconds
---------------------
T3To4: Iteration 4
10000 20000
#centers(pre/hillclimbed/real): 1 1 1 #al: 1203.25 #alsophisticatedcountcollection: 68.6223 #hcsteps: 4.50291 #peggingImprovements: 0
D4 table contains 527191 parameters.
A/D table contains 224673 parameters.
A/D table contains 200260 parameters.
NTable contains 442360 parameters.
p0_count is 610028 and p1 is 22316.8; p0 is 0.999 p1: 0.001
T3To4: TRAIN CROSS-ENTROPY 5.65874 PERPLEXITY 50.5185
T3To4: (4) TRAIN VITERBI CROSS-ENTROPY 5.72488 PERPLEXITY 52.8884
T3To4 Viterbi Iteration : 4 took: 118 seconds
---------------------
Model4: Iteration 5
10000 20000
#centers(pre/hillclimbed/real): 1 1 1 #al: 1203.02 #alsophisticatedcountcollection: 54.8872 #hcsteps: 3.70651 #peggingImprovements: 0
D4 table contains 527191 parameters.
A/D table contains 224673 parameters.
A/D table contains 200565 parameters.
NTable contains 442360 parameters.
p0_count is 599062 and p1 is 27799.8; p0 is 0.999 p1: 0.001
Model4: TRAIN CROSS-ENTROPY 5.26987 PERPLEXITY 38.5824
Model4: (5) TRAIN VITERBI CROSS-ENTROPY 5.31993 PERPLEXITY 39.9445
Model4 Viterbi Iteration : 5 took: 370 seconds
---------------------
Model4: Iteration 6
10000 20000
#centers(pre/hillclimbed/real): 1 1 1 #al: 1202.95 #alsophisticatedcountcollection: 47.6773 #hcsteps: 3.65204 #peggingImprovements: 0
D4 table contains 527191 parameters.
A/D table contains 224673 parameters.
A/D table contains 200565 parameters.
NTable contains 442360 parameters.
p0_count is 599681 and p1 is 27490.4; p0 is 0.999 p1: 0.001
Model4: TRAIN CROSS-ENTROPY 5.08712 PERPLEXITY 33.9919
Model4: (6) TRAIN VITERBI CROSS-ENTROPY 5.13153 PERPLEXITY 35.0545
Model4 Viterbi Iteration : 6 took: 362 seconds
H333444 Training Finished at: Wed Feb 17 15:40:54 2021
Entire Viterbi H333444 Training took: 1514 seconds
==========================================================
Entire Training took: 3445 seconds
Program Finished at: Wed Feb 17 15:40:54 2021
==========================================================
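The -m1 5 -m2 0 -m3 3 -m4 3 settings produced the schedule just completed: 5 Model 1 and 5 HMM iterations, then the H333444 block of three Model 3 and three Model 4 Viterbi iterations. The NTable reported in each fertility iteration holds the fertility probabilities n(φ|e): how many target words a source word e tends to generate. A toy estimator by relative frequency from fixed (e.g. Viterbi) alignments — the real training uses expected counts plus the nsmooth = 4 smoothing above; names here are illustrative:

    from collections import Counter, defaultdict

    def fertility_table(corpus, max_fertility=10):
        """corpus: iterable of (src_tokens, alignment) pairs, where
        alignment is a set of (i, j) links from source position i to
        target position j. Returns n[e][phi]: the relative frequency
        with which source word e generates exactly phi target words,
        capped at max_fertility (cf. maxfertility = 10 above)."""
        counts = defaultdict(Counter)
        for src, alignment in corpus:
            links_per_src = Counter(i for i, _ in alignment)
            for i, e in enumerate(src):
                phi = min(links_per_src.get(i, 0), max_fertility)
                counts[e][phi] += 1
        return {e: {phi: c / sum(phis.values()) for phi, c in phis.items()}
                for e, phis in counts.items()}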
Executing: rm -f /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl.A3.final.gz
Executing: gzip /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl.A3.final

(3) generate word alignment @ Wed Feb 17 15:41:01 CET 2021
Combining forward and inverted alignment from files:
  /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.pl-en/pl-en.A3.final.{bz2,gz}
  /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl.A3.final.{bz2,gz}
Executing: mkdir -p /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model
Executing: /home/kasia/mosesdecoder/scripts/training/giza2bal.pl -d "gzip -cd /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.en-pl/en-pl.A3.final.gz" -i "gzip -cd /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/giza.pl-en/pl-en.A3.final.gz" | /home/kasia/mosesdecoder/scripts/../bin/symal -alignment="grow" -diagonal="yes" -final="yes" -both="yes" > /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/aligned.grow-diag-final-and
symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1)
skip=<0> counts=<24744>
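symal merges the two GIZA++ directions into aligned.grow-diag-final-and, one symmetrized alignment per sentence pair (counts=<24744>). A compact Python sketch of the textbook grow-diag-final-and heuristic that the symal options (-alignment="grow" -diagonal="yes" -final="yes" -both="yes") select — a simplified rendition of the published algorithm, not symal's actual code; alignments are sets of (source index, target index) pairs:

    def grow_diag_final_and(e2f, f2e):
        """Symmetrize two directional word alignments (sets of (i, j)
        index pairs) with the grow-diag-final-and heuristic."""
        neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                     (-1, -1), (-1, 1), (1, -1), (1, 1)]  # incl. diagonal
        alignment = set(e2f & f2e)        # start from the intersection
        union = e2f | f2e
        def src_unaligned(i):
            return all(i != si for si, _ in alignment)
        def tgt_unaligned(j):
            return all(j != tj for _, tj in alignment)
        added = True                       # grow-diag: adopt union points
        while added:                       # next to current ones that cover
            added = False                  # a still-unaligned word
            for i, j in sorted(alignment):
                for di, dj in neighbors:
                    p = (i + di, j + dj)
                    if p in union and p not in alignment and \
                            (src_unaligned(p[0]) or tgt_unaligned(p[1])):
                        alignment.add(p)
                        added = True
        # final-and: adopt union points where BOTH words are still unaligned
        for i, j in sorted(union):
            if src_unaligned(i) and tgt_unaligned(j):
                alignment.add((i, j))
        return alignment

Starting from the high-precision intersection and growing toward the high-recall union is what makes this symmetrization a good basis for phrase extraction.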
(4) generate lexical translation table 0-0 @ Wed Feb 17 15:41:23 CET 2021
(/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/corpus/train25k.clean.pl, /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/corpus/train25k.clean.en, /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex)
reusing: /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex.f2e and /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex.e2f

(5) extract phrases @ Wed Feb 17 15:41:23 CET 2021
Executing: /home/kasia/mosesdecoder/scripts/generic/extract-parallel.perl 8 split "sort " /home/kasia/mosesdecoder/scripts/../bin/extract /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/corpus/train25k.clean.en /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/corpus/train25k.clean.pl /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/aligned.grow-diag-final-and /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/extract 7 orientation --model wbe-msd --GZOutput
MAX 7 1 0
Started Wed Feb 17 15:41:23 2021
using gzip
isBSDSplit=0
Executing: mkdir -p /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994; ls -l /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994
total=24744 line-per-split=3094
split -d -l 3094 -a 7 /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/corpus/train25k.clean.en /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/target.
split -d -l 3094 -a 7 /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/corpus/train25k.clean.pl /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/source.
split -d -l 3094 -a 7 /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/aligned.grow-diag-final-and /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/align.
merging extract / extract.inv
gunzip -c /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000000.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000001.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000002.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000003.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000004.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000005.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000006.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000007.gz | LC_ALL=C sort -T /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994 2>> /dev/stderr | gzip -c > /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/extract.sorted.gz 2>> /dev/stderr
gunzip -c /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000000.inv.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000001.inv.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000002.inv.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000003.inv.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000004.inv.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000005.inv.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000006.inv.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000007.inv.gz | LC_ALL=C sort -T /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994 2>> /dev/stderr | gzip -c > /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/extract.inv.sorted.gz 2>> /dev/stderr
gunzip -c /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000000.o.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000001.o.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000002.o.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000003.o.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000004.o.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000005.o.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000006.o.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994/extract.0000007.o.gz | LC_ALL=C sort -T /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.12994 2>> /dev/stderr | gzip -c > /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/extract.o.sorted.gz 2>> /dev/stderr
Finished Wed Feb 17 15:42:43 2021
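bin/extract enumerates, per sentence pair, all phrase pairs of up to 7 words (MAX 7 above) that are consistent with the symmetrized alignment, writing them to the extract / extract.inv / extract.o shards merged above. A toy version of the consistency check — the real extractor also grows phrases over unaligned boundary words, which this sketch omits:

    def extract_phrases(src, tgt, alignment, max_len=7):
        """Enumerate phrase pairs consistent with a word alignment
        (set of (src index, tgt index) links): every link touching the
        source span must land inside the target span and vice versa."""
        pairs = []
        for i1 in range(len(src)):
            for i2 in range(i1, min(i1 + max_len, len(src))):
                # target span covered by links from src[i1..i2]
                js = [j for i, j in alignment if i1 <= i <= i2]
                if not js:
                    continue
                j1, j2 = min(js), max(js)
                if j2 - j1 + 1 > max_len:
                    continue
                # consistency: no link leaves the box in either direction
                if all(i1 <= i <= i2 for i, j in alignment if j1 <= j <= j2):
                    pairs.append((" ".join(src[i1:i2 + 1]),
                                  " ".join(tgt[j1:j2 + 1])))
        return pairs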
(6) score phrases @ Wed Feb 17 15:42:43 CET 2021
(6.1) creating table half /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.f2e @ Wed Feb 17 15:42:43 CET 2021
Executing: /home/kasia/mosesdecoder/scripts/generic/score-parallel.perl 8 "sort " /home/kasia/mosesdecoder/scripts/../bin/score /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/extract.sorted.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex.f2e /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.f2e.gz 0
using gzip
Started Wed Feb 17 15:42:43 2021
/home/kasia/mosesdecoder/scripts/../bin/score /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/extract.0.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex.f2e /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/phrase-table.half.0000000.gz 2>> /dev/stderr
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/run.0.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/run.1.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/run.2.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/run.3.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/run.4.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/run.5.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/run.6.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/run.7.sh
mv /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112/phrase-table.half.0000000.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.f2e.gz
rm -rf /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13112
Finished Wed Feb 17 15:46:08 2021
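bin/score turns each sorted extract half into phrase translation scores: the direct half from extract.sorted.gz with lex.f2e here, the inverse half from extract.inv.sorted.gz with lex.e2f below. The core statistic is the relative frequency of a phrase pair among all extractions of its source phrase (the scorer additionally computes lexical weights from the lex tables, omitted in this toy sketch):

    from collections import Counter

    def score_phrases(extracted):
        """extracted: list of (src_phrase, tgt_phrase) extraction events.
        Returns phi[(src, tgt)] = count(src, tgt) / count(src), the
        direct relative-frequency estimate; scoring the swapped pairs
        yields the inverse half that consolidate merges later."""
        pair_counts = Counter(extracted)
        src_counts = Counter(src for src, _ in extracted)
        return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}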
(6.3) creating table half /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.e2f @ Wed Feb 17 15:46:08 CET 2021
Executing: /home/kasia/mosesdecoder/scripts/generic/score-parallel.perl 8 "sort " /home/kasia/mosesdecoder/scripts/../bin/score /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/extract.inv.sorted.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex.e2f /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.e2f.gz --Inverse 1
using gzip
Started Wed Feb 17 15:46:08 2021
/home/kasia/mosesdecoder/scripts/../bin/score /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/extract.0.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex.e2f /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/phrase-table.half.0000000.gz --Inverse 2>> /dev/stderr
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/run.0.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/run.1.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/run.2.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/run.3.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/run.4.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/run.5.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/run.6.sh
/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/run.7.sh
gunzip -c /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381/phrase-table.half.*.gz 2>> /dev/stderr | LC_ALL=C sort -T /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381 | gzip -c > /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.e2f.gz 2>> /dev/stderr
rm -rf /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/tmp.13381
Finished Wed Feb 17 15:50:05 2021

(6.6) consolidating the two halves @ Wed Feb 17 15:50:05 CET 2021
Executing: /home/kasia/mosesdecoder/scripts/../bin/consolidate /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.f2e.gz /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.e2f.gz /dev/stdout | gzip -c > /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.gz
Consolidate v2.0 written by Philipp Koehn
consolidating direct and indirect rule tables
.................
Executing: rm -f /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.half.*

(7) learn reordering model @ Wed Feb 17 15:51:14 CET 2021
(7.1) [no factors] learn reordering model @ Wed Feb 17 15:51:14 CET 2021
(7.2) building tables @ Wed Feb 17 15:51:14 CET 2021
Executing: /home/kasia/mosesdecoder/scripts/../bin/lexical-reordering-score /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/extract.o.sorted.gz 0.5 /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/reordering-table. --model "wbe msd wbe-msd-bidirectional-fe"
Lexical Reordering Scorer
scores lexical reordering models of several types (hierarchical, phrase-based and word-based-extraction)

(8) learn generation model @ Wed Feb 17 15:52:27 CET 2021
no generation model requested, skipping step

(9) create moses.ini @ Wed Feb 17 15:52:27 CET 2021
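The wbe-msd-bidirectional-fe model scored in step (7) classifies, for every extracted phrase pair, its orientation with respect to the neighbouring phrase as monotone, swap or discontinuous, using single alignment points at the phrase corners (word-based extraction). A simplified sketch of that classification rule — the textbook heuristic, not the scorer's exact code; the 0.5 in the command above is the smoothing constant added to these orientation counts:

    def msd_orientation(alignment, i1, i2, j1, j2):
        """Word-based-extraction MSD classification for a phrase pair
        covering source span [i1, i2] and target span [j1, j2], with
        alignment as a set of (src, tgt) index links: monotone if the
        word diagonally before the phrase is linked, swap if the word
        after the source span links to the word before the target span,
        discontinuous otherwise."""
        if (i1 - 1, j1 - 1) in alignment:
            return "mono"
        if (i2 + 1, j1 - 1) in alignment:
            return "swap"
        return "other"  # discontinuous

The bidirectional-fe variant gathers these counts in both directions, conditioned on the full (f, e) phrase pair, before smoothing and normalizing into the reordering table.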
PhraseExtract v1.5, written by Philipp Koehn et al.
phrase extraction from an aligned parallel corpus
..
Score v2.1 -- scoring methods for extracted rules
Loading lexical translation table from /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex.f2e.
......................
Score v2.1 -- scoring methods for extracted rules
using inverse mode
Loading lexical translation table from /home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/lex.e2f.
......................
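The net result of steps (5)-(6) is model/phrase-table.gz, one " ||| "-separated record per phrase pair: source phrase, target phrase, the scores computed above, word alignment, and counts. A small inspection helper; the exact order of the scores within the third field depends on the consolidate version, so they are printed unparsed here:

    import gzip

    def show_phrase_table(path, limit=5):
        """Print the first entries of a Moses phrase table, whose fields
        are separated by ' ||| ' (source, target, scores, alignment,
        counts in this pipeline's consolidate output)."""
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for n, line in enumerate(f):
                if n >= limit:
                    break
                fields = [x.strip() for x in line.split("|||")]
                src, tgt, scores = fields[0], fields[1], fields[2]
                print(f"{src} -> {tgt}  [{scores}]")

    # show_phrase_table("/home/kasia/Pulpit/TAU/wmt-2020-pl-en/train/working/train/model/phrase-table.gz")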