Completed the configs

Fixed handling of the weights option in gue
tre now finds its library (previously it could not)



git-svn-id: svn://atos.wmid.amu.edu.pl/utt@51 e293616e-ec6a-49c2-aa92-f4a8b91c5d16
obrebski 2008-05-15 19:55:02 +00:00
parent 6ac84d8bba
commit 19760efd7b
7 changed files with 41 additions and 36 deletions

View File

@ -1,6 +1,7 @@
BARDZO WAZNE: BARDZO WAZNE:
* przemyslec sposob wybierania jezyka / slownika po zainstalowaniu roznych dystrybucji [PK, TO] * przemyslec sposob wybierania jezyka / slownika po zainstalowaniu roznych dystrybucji [PK, TO]
* gue nie sortuje wynikow, opcja weights dziala na odwrot * gue nie sortuje wynikow, opcja weights dziala na odwrot
* kor nie wykonuje zamian <jednalitera> -> <dwielitery>, np. ż rz
WAZNE: WAZNE:
* zamienic kota na lepszego (Kubis) [TO] * zamienic kota na lepszego (Kubis) [TO]

View File

@ -13,3 +13,4 @@
dictionary-home = PATH_PREFIX/share/utt dictionary-home = PATH_PREFIX/share/utt
weights = PATH_PREFIX/share/utt/weights.kor weights = PATH_PREFIX/share/utt/weights.kor
threshold = 1.0 threshold = 1.0
process=W

View File

@ -12,3 +12,4 @@
# #
macros = PATH_PREFIX/lib/utt/terms.m4 macros = PATH_PREFIX/lib/utt/terms.m4
flex-template = PATH_PREFIX/lib/utt/ser.l.template flex-template = PATH_PREFIX/lib/utt/ser.l.template
tags=uam

View File

@ -10,7 +10,7 @@
@copying @copying
This manual is for UAM Text Tools (version 0.90, November, 2007) This manual is for UAM Text Tools (version 0.90, November, 2007)
Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka. Copyright @copyright{} 2005, 2007 Tomasz Obrêbski, Micha³ Stolarski, Justyna Walkowska, Pawe³ Konieczka.
Permission is granted to copy, distribute and/or modify this document Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 under the terms of the GNU Free Documentation License, Version 1.2
@ -127,6 +127,7 @@ List of contributors:
@item Michal Stolarski @item Michal Stolarski
@item Marcin Walas @item Marcin Walas
@item Justyna Walkowska @item Justyna Walkowska
@item Paweł Wereński
@end itemize @end itemize
@c ---------------------------------------------------------------------- @c ----------------------------------------------------------------------
@ -248,7 +249,7 @@ sentence: @samp{Piszemy dobre progrumy.}
@example @example
0000 00 BOS * 0000 00 BOS *
0000 07 W Piszemy lem:pisać,V 0000 07 W Piszemy lem:pisaæ,V
0007 01 S _ 0007 01 S _
0008 05 W dobre lem:dobry,ADJ 0008 05 W dobre lem:dobry,ADJ
0013 01 S _ 0013 01 S _
@ -259,7 +260,7 @@ sentence: @samp{Piszemy dobre progrumy.}
0024 00 BOS * 0024 00 BOS *
0024 11 W Warszawiacy lem:Warszawiak,N 0024 11 W Warszawiacy lem:Warszawiak,N
0035 01 S _ 0035 01 S _
0036 03 W też 0036 03 W te¿
0039 01 P . 0039 01 P .
0040 00 EOS * 0040 00 EOS *
@ -267,7 +268,7 @@ sentence: @samp{Piszemy dobre progrumy.}
@example @example
0000 BOS * 0000 BOS *
0000 W Piszemy lem:pisać,V 0000 W Piszemy lem:pisaæ,V
0007 S _ 0007 S _
0008 W dobre lem:dobry,ADJ 0008 W dobre lem:dobry,ADJ
0013 S _ 0013 S _
@ -280,7 +281,7 @@ Posion information may be provided only for some types of segments:
@example @example
0000 BOS * 0000 BOS *
W Piszemy lem:pisać,V W Piszemy lem:pisaæ,V
S _ S _
W dobre lem:dobry,ADJ W dobre lem:dobry,ADJ
S _ S _
@ -291,7 +292,7 @@ S _
0024 BOS * 0024 BOS *
W Warszawiacy lem:Warszawiak,N W Warszawiacy lem:Warszawiak,N
S _ S _
W też W te¿
P . P .
EOS * EOS *
@end example @end example
@ -405,7 +406,7 @@ as ISO, ANSI, DOS, UTF-8 (probably: not tested yet).
@c @end table @c @end table
@c [JAK UZYSKAĆ POLSKIE CZCIONKI W DVI???] @c [JAK UZYSKAÆ POLSKIE CZCIONKI W DVI???]
@macro parhelp @macro parhelp
@item @b{@minus{}@minus{}help}, @b{@minus{}h} @item @b{@minus{}@minus{}help}, @b{@minus{}h}
@ -718,7 +719,7 @@ Sinks: programs which read UTT data and produce output in another format
@c ---------------------------------------- @c ----------------------------------------
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski @item @strong{Authors:} @tab Tomasz Obrêbski
@item @strong{Component category:} @tab source @item @strong{Component category:} @tab source
@end multitable @end multitable
@ -820,7 +821,7 @@ Output:
@c @node sen - sentencizer @c @node sen - sentencizer
@c @chapter sen - sentencizer @c @chapter sen - sentencizer
@c Authors: Tomasz Obrębski @c Authors: Tomasz Obrêbski
@c --------------------------------------------------------------------- @c ---------------------------------------------------------------------
@c LEM @c LEM
@ -831,7 +832,7 @@ Output:
@section lem - morphological analyzer @section lem - morphological analyzer
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski @item @strong{Authors:} @tab Tomasz Obrêbski, Micha³ Stolarski
@item @strong{Component category:} @tab filter @item @strong{Component category:} @tab filter
@end multitable @end multitable
@ -932,7 +933,7 @@ Input:
Output (default): Output (default):
@example @example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 0000 07 W Piszemy lem:pisaæ,V/AiVpMdTrfNpP1
0007 01 B _ 0007 01 B _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn 0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn
0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn 0008 05 W dobre lem:dobry,ADJ/DpNsCnavGn
@ -947,7 +948,7 @@ Output (default):
Output (@option{--one-line} option): Output (@option{--one-line} option):
@example @example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 0000 07 W Piszemy lem:pisaæ,V/AiVpMdTrfNpP1
0007 01 S _ 0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn 0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn lem:dobry,ADJ/DpNsCnavGn
0013 01 S _ 0013 01 S _
@ -959,7 +960,7 @@ Output (@option{--one-line} option):
Output (@option{--one-field} option): Output (@option{--one-field} option):
@example @example
0000 07 W Piszemy lem:pisać,V/AiVpMdTrfNpP1 0000 07 W Piszemy lem:pisaæ,V/AiVpMdTrfNpP1
0007 01 S _ 0007 01 S _
0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn 0008 05 W dobre lem:dobry,ADJ/DpNpCnavGaifn,ADJ/DpNsCnavGn
0013 01 S _ 0013 01 S _
@ -993,7 +994,7 @@ Dictionary entries have the following structure:
meaning: replace prefix of length @code{<cut1>} with meaning: replace prefix of length @code{<cut1>} with
string @code{<add1>}, replace suffix of length @code{<cut2>} with string string @code{<add1>}, replace suffix of length @code{<cut2>} with string
@code{<add2>}. For example @code{3t} transforms @samp{kocie} into @code{<add2>}. For example @code{3t} transforms @samp{kocie} into
@samp{kot}, @code{3-4ały} transforms @samp{najbielsi} into @samp{biały} @samp{kot}, @code{3-4a³y} transforms @samp{najbielsi} into @samp{bia³y}
Each dictionary entry must be written in one line and must not contain blank characters. Each dictionary entry must be written in one line and must not contain blank characters.
@ -1004,8 +1005,8 @@ kota;1,N/GaNsCg;1,N/GaNsCa
kotu;1,N/GaNsCd kotu;1,N/GaNsCd
kotem;2,N/GaNsCi kotem;2,N/GaNsCi
kocie;3t,N/GaNsCl;3t,N/GaNsCv kocie;3t,N/GaNsCl;3t,N/GaNsCv
najbielsi;3-4ały,ADJ/DsNpCnGp najbielsi;3-4a³y,ADJ/DsNpCnGp
najbielsze;3-5ały,ADJ/DsNpCnGaifn najbielsze;3-5a³y,ADJ/DsNpCnGaifn
najlepsi;dobry,ADJ/DsNpCnGp najlepsi;dobry,ADJ/DsNpCnGp
najlepsze;dobry,ADJ/DsNpCnGaifn najlepsze;dobry,ADJ/DsNpCnGaifn
@end example @end example
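
The cut/add rule described in this hunk can be sketched as follows. This is a minimal illustration, not UTT code: `apply_rule` and `self_check` are hypothetical names, the rule is passed as already-parsed fields instead of being parsed from the `3-4ały` notation, and lengths are counted in bytes, which matches character counts only when the cut parts are ASCII.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of UTT): applies one cut/add rule.
// Removes cut1 bytes from the front and cut2 bytes from the back of
// `form`, then attaches add1 in front and add2 at the end.
std::string apply_rule(const std::string& form,
                       std::size_t cut1, const std::string& add1,
                       std::size_t cut2, const std::string& add2)
{
    if (cut1 + cut2 > form.size())
        return form;  // rule does not fit this form; leave it unchanged
    const std::string stem = form.substr(cut1, form.size() - cut1 - cut2);
    return add1 + stem + add2;
}

// Checks the two examples quoted in the manual text.
void self_check()
{
    // "3t": cut 3 bytes from the end, append "t"
    assert(apply_rule("kocie", 0, "", 3, "t") == "kot");
    // "3-4ały": cut prefix of 3, cut suffix of 4, append "ały"
    assert(apply_rule("najbielsi", 3, "", 4, "ały") == "biały");
}
```

Entries without a leading number, such as `najlepsi;dobry`, are not covered by this sketch.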
@ -1064,7 +1065,7 @@ located by default in:
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Michał Stolarski, Tomasz Obrębski @item @strong{Authors:} @tab Micha³ Stolarski, Tomasz Obrêbski
@item @strong{Component category:} @tab filter @item @strong{Component category:} @tab filter
@end multitable @end multitable
@ -1155,7 +1156,7 @@ string @var{add1}, replace suffix of length @var{cat2} with string
@var{add2}. @var{add2}.
Example: @code{3-4ały} transforms @i{najbielsi} into @i{biały} Example: @code{3-4a³y} transforms @i{najbielsi} into @i{bia³y}
@var{description} contains the part of speech and morphosyntactic information (@xref{PMDBF dictionary}.). @var{description} contains the part of speech and morphosyntactic information (@xref{PMDBF dictionary}.).
@ -1164,8 +1165,8 @@ Example: @code{3-4a
likelihood of the guess. likelihood of the guess.
@example @example
*łkę;1a,N/GfNsCa *³kê;1a,N/GfNsCa
naj*elszy;3-4ały,ADJ/...:... naj*elszy;3-4a³y,ADJ/...:...
@end example @end example
@ -1178,7 +1179,7 @@ naj*elszy;3-4a
@section cor - spelling corrector @section cor - spelling corrector
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski @item @strong{Authors:} @tab Tomasz Obrêbski, Micha³ Stolarski
@item @strong{Component category:} @tab filter @item @strong{Component category:} @tab filter
@end multitable @end multitable
@ -1247,7 +1248,7 @@ odludek
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski @item @strong{Authors:} @tab Tomasz Obrêbski
@item @strong{Component category:} @tab filter @item @strong{Component category:} @tab filter
@end multitable @end multitable
@ -1267,7 +1268,7 @@ odludek
command: sen command: sen
input: input:
0000 05 W Cześć 0000 05 W Cze¶æ
0005 01 P ! 0005 01 P !
0006 01 S _ 0006 01 S _
0007 02 W To 0007 02 W To
@ -1278,7 +1279,7 @@ input:
output: output:
0000 00 BOS * 0000 00 BOS *
0000 05 W Cześć 0000 05 W Cze¶æ
0005 01 P ! 0005 01 P !
0006 00 EOS * 0006 00 EOS *
0006 00 BOS * 0006 00 BOS *
@ -1299,7 +1300,7 @@ output:
@c @node gph - graphizer @c @node gph - graphizer
@c @chapter gph - graphizer @c @chapter gph - graphizer
@c Authors: Tomasz Obrębski @c Authors: Tomasz Obrêbski
@ -1312,7 +1313,7 @@ output:
@section ser - pattern search tool @section ser - pattern search tool
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski @item @strong{Authors:} @tab Tomasz Obrêbski
@item @strong{Component category:} @tab filter @item @strong{Component category:} @tab filter
@end multitable @end multitable
@ -1540,7 +1541,7 @@ installed in the system:
@section grp - pattern search tool @section grp - pattern search tool
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski @item @strong{Authors:} @tab Tomasz Obrêbski
@item @strong{Component category:} @tab filter @item @strong{Component category:} @tab filter
@end multitable @end multitable
@ -1634,7 +1635,7 @@ lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
@node kot @node kot
@section kot - untokenizer @section kot - untokenizer
Authors: Tomasz Obrębski Authors: Tomasz Obrêbski
@command{kot} is the opposite of @command{tok}. It changes UTT-formatted text into plain text. @command{kot} is the opposite of @command{tok}. It changes UTT-formatted text into plain text.
@ -1849,7 +1850,7 @@ termination of the program.
@section fla - the UTT file flattener @section fla - the UTT file flattener
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski @item @strong{Authors:} @tab Tomasz Obrêbski
@item @strong{Component category:} @tab filter @item @strong{Component category:} @tab filter
@end multitable @end multitable
@c @c
@ -1888,7 +1889,7 @@ default, segments containing a field @code{BOS} are seeked.
@section unfla - the UTT file unflattener @section unfla - the UTT file unflattener
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa} @multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski @item @strong{Authors:} @tab Tomasz Obrêbski
@item @strong{Component category:} @tab filter @item @strong{Component category:} @tab filter
@end multitable @end multitable

View File

@ -1,6 +1,10 @@
#!/usr/bin/ruby -I /usr/local/lib/utt -I $HOME/.local/lib/utt #!/usr/bin/ruby -I /usr/local/lib/utt -I $HOME/.local/lib/utt
$: << "#{ENV['HOME']}/.local/lib/utt"
$: << "/usr/local/lib/utt"
require 'getoptlong' require 'getoptlong'
require 'seg.rb'
opts = GetoptLong.new( opts = GetoptLong.new(
[ '--help', '-h', GetoptLong::NO_ARGUMENT ], [ '--help', '-h', GetoptLong::NO_ARGUMENT ],
@ -60,9 +64,6 @@ if $INFO=='DEFAULT'
end end
end end
#require File.expand_path(File.dirname(__FILE__) + "../lib/utt/seg.rb")
require 'seg.rb'
$dgpsep=';' $dgpsep=';'
def tre(input) def tre(input)

View File

@ -7,6 +7,6 @@ option "cut-off" - "Do not display answers with less weight than cut-off" int d
option "dictionary-home" - "dh" string typestr="FILENAME" no hidden option "dictionary-home" - "dh" string typestr="FILENAME" no hidden
option "dictionary" d "File with dictionary information" string typestr="filename" default="gue.bin" no option "dictionary" d "File with dictionary information" string typestr="filename" default="gue.bin" no
option "per-info" v "Display performance information" flag off option "per-info" v "Display performance information" flag off
option "weights" w "Print weights" flag off hidden option "weights" w "Print weights" flag off
option "no-uppercase" - "Do not process form containing uppercase letters" flag off option "no-uppercase" - "Do not process form containing uppercase letters" flag off

View File

@ -7,7 +7,7 @@ double delta=0.1;
int cut_off=100; int cut_off=100;
char dictionary[255]; char dictionary[255];
bool per_info=false; bool per_info=false;
bool weights=true; bool weights=false;
void process_guess_options(gengetopt_args_info* args) void process_guess_options(gengetopt_args_info* args)
{ {
@ -55,6 +55,6 @@ void process_guess_options(gengetopt_args_info* args)
per_info=args->per_info_flag; per_info=args->per_info_flag;
if(args->weights_given) if(args->weights_given)
weights=false; weights=true;
} }
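
The corrected option handling in the last hunk amounts to this: printing of weights is off by default and is switched on only when the option was given on the command line. A minimal sketch, assuming a stand-in struct `fake_args_info` in place of the gengetopt-generated `gengetopt_args_info` (only the `weights_given` field is mimicked) and a hypothetical helper `resolve_weights`:

```cpp
// Stand-in for the gengetopt-generated struct; only the field used
// in the hunk above is reproduced here (assumption for illustration).
struct fake_args_info {
    unsigned int weights_given;  // non-zero when --weights / -w was passed
};

// Hypothetical helper: returns whether weights should be printed.
// Mirrors the fixed logic: default false, true only if the option was given.
bool resolve_weights(const fake_args_info& args)
{
    bool weights = false;        // new default after this commit
    if (args.weights_given)
        weights = true;          // previously this branch set it to false
    return weights;
}
```

Before the commit the default and the branch were both inverted, so passing the option suppressed the weights instead of printing them.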