utt.texinfo
git-svn-id: svn://atos.wmid.amu.edu.pl/utt@60 e293616e-ec6a-49c2-aa92-f4a8b91c5d16
parent 839a0d50e2
commit 261bf629fb
@@ -8,15 +8,16 @@
@c %**end of header

@copying

This manual is for UAM Text Tools (version 0.90, November, 2007)
This manual is for UAM Text Tools (version 0.90, October, 2008)

Copyright @copyright{} 2005, 2007 Tomasz Obrębski, Michał Stolarski, Justyna Walkowska, Paweł Konieczka.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
copy of the license is included in the section entitled GNU Free
Documentation License,,GNU Free Documentation License.

@c @quotation
@c Permission is granted to ...
@@ -357,12 +358,33 @@ but not
0005 02 W km
@end example

because in the latter example the first segment (starting at position
0000, 2 characters long) ends at position @var{n}=0001 which is
covered by the second segment and no segment starts at position
@var{n+2}=0002.


@section Flattened UTT file

The UTT file format has two variants: regular and flattened. The regular
format was described above. In the flattened format some of the
end-of-line characters are replaced with line-feed characters.

The flattened format is mainly used to represent whole sentences as
single lines of the input file (all intrasentential end-of-line
characters are replaced with line-feed characters).

This technical trick makes it possible to perform certain text
processing operations on entire sentences with tools such as
@command{grep} (see the @command{grp} component) or @command{sed} (see the @command{mar} component).

The conversion between the two formats is performed by the tools
@command{fla} and @command{unfla}.
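For illustration, a minimal pipeline sketch (the file name and the search
pattern are placeholders, and it is assumed here that the input has been
sentence-annotated with @command{sen} before flattening):

@example
cat text.txt | tok | sen | fla | grep 'pattern' | unfla
@end example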
@section Character encoding

The UTT component programs accept only 1-byte character encodings, such
as ISO, ANSI, DOS, UTF-8 (probably: not tested yet).
as ISO, ANSI, DOS.


@c @section Formats
@@ -525,99 +547,6 @@ This option is useful when working with @command{kot} or @command{con}.
@end macro


@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------

@c @node Common command line options
@c @chapter Common command line options

@c @table @code

@c @parhelp

@c @item @b{@minus{}@minus{}help}, @b{@minus{}h}
@c Print help.

@c @item @b{@minus{}@minus{}version}, @b{@minus{}v}
@c Print version information.

@c @item @b{@minus{}@minus{}file=@var{filename}, @minus{}f @var{filename}}
@c Input file name.
@c If this option is absent or equal to '@minus{}', the program
@c reads from the standard input.

@c @item @b{@minus{}@minus{}output=@var{filename}, @minus{}o @var{filename}}
@c Regular output file name. To regular output the program sends segments
@c which it successfully processed and copies those which were not
@c subject to processing. If this option is absent or equal to
@c '@minus{}', standard output is used.

@c @item @b{@minus{}@minus{}fail=@var{filename}, @minus{}e @var{filename}}
@c Fail output file name. To fail output the program copies the segments
@c it failed to process. If this option is absent or equal to
@c '@minus{}', standard output is used.

@c @item @b{@minus{}@minus{}only-fail}
@c Discard segments which would normally be sent to regular
@c output. Print only segments the program failed to process.

@c @item @b{@minus{}@minus{}no-fail}
@c Discard segments the program failed to process.
@c (This and the previous option are functionally equivalent to,
@c respectively, @option{-o /dev/null} and @option{-e /dev/null}, but
@c make the programs run faster.)

@c @item @b{@minus{}@minus{}input-field=@var{fieldname}, @minus{}I @var{fieldname}}
@c The field containing the input to the program. The default is usually
@c the @var{form} field (unless otherwise stated in the program
@c description). The fields @var{position}, @var{length}, @var{tag}, and
@c @var{form} are referred to as @code{1}, @code{2}, @code{3}, @code{4},
@c respectively.

@c @item @b{@minus{}@minus{}output-field=@var{fieldname}, @minus{}O @var{fieldname}}
@c The name of the field added by the program. The default is the name of
@c the program.

@c @c @item @b{@minus{}@minus{}copy, @minus{}c}
@c @c Copy processed segments to regular output.

@c @item @b{@minus{}@minus{}dictionary=@var{filename}, @minus{}d @var{filename}}
@c Dictionary file name.
@c (This option is used by programs which use dictionary data.)

@c @item @b{@minus{}@minus{}process=@var{tag}, @minus{}p @var{tag}}
@c Process segments with the specified value in the @var{tag} field.
@c Multiple occurrences of this option are allowed and are interpreted as
@c disjunction. If this option is absent, all segments are processed.

@c @item @b{@minus{}@minus{}select=@var{fieldname}, @minus{}s @var{fieldname}}
@c Select for processing only segments in which the field named
@c @var{fieldname} is present. Multiple occurrences of this option are
@c allowed and are interpreted as conjunction of conditions. If this
@c option is absent, all segments are processed.

@c @item @b{@minus{}@minus{}unselect=@var{fieldname}, @minus{}S @var{fieldname}}
@c Select for processing only segments in which the field @var{fieldname}
@c is absent. Multiple occurrences of this option are allowed and are
@c interpreted as conjunction of conditions. If this option is absent,
@c all segments are processed.

@c @item @b{@minus{}@minus{}interactive @minus{}i}
@c This option toggles interactive mode, which is by default off. In the
@c interactive mode the program does not buffer the output.

@c @item @b{@minus{}@minus{}config=@var{filename}}
@c Read configuration from file @file{@var{filename}}.

@c @item @b{@minus{}@minus{}one @minus{}1}
@c This option makes the program print ambiguous annotation in one output
@c segment. By default when
@c ambiguous new annotation is being produced for a segment, the segment
@c is duplicated and each of the annotations is added to a separate copy
@c of the segment.

@c @end table

@c ---------------------------------------------------------------------
@c CONFIGURATION FILES
@c ---------------------------------------------------------------------
@@ -694,14 +623,16 @@ in UTT format
* tok:: a tokenizer

Filters: programs which read and produce UTT-formatted data
@c * sen - the sentencizer::
* lem:: a morphological analyzer
* gue:: a morphological guesser
* cor:: a spelling corrector
* cor:: a simple spelling corrector
* kor:: a more elaborate spelling corrector
* sen:: a sentencizer
@c * gph - the graphizer::
* ser:: a pattern search tool (marks matches)
* mar:: a pattern search tool (introduces arbitrary markers into the text)
* grp:: a pattern search tool (selects sentences containing a match)
@c * gph:: a word-graph annotation tool::
@c * dgp:: a dependency parser

Sinks: programs which read UTT data and produce output in another format
* kot:: an untokenizer
@@ -721,6 +652,9 @@ Sinks: programs which read UTT data and produce output in another format
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Component category:} @tab source
@item @strong{Input format:} @tab raw text file
@item @strong{Output format:} @tab UTT regular
@item @strong{Required annotation:} @tab -
@end multitable
@@ -834,6 +768,9 @@ Output:
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
@item @strong{Component category:} @tab filter
@item @strong{Input format:} @tab UTT regular
@item @strong{Output format:} @tab UTT regular
@item @strong{Required annotation:} @tab tok
@end multitable

@menu
@@ -1031,28 +968,34 @@ A large-coverage morphological dictionary for Polish language, Polex/PMDBF, is i
the distribution as the default @emph{lem} dictionary. It is
located by default in:

@file{$HOME/.utt/pl/lem.bin}
@file{$HOME/.local/share/utt/pl_PL.ISO-8859-2/lem.bin}

in a local installation or in

@file{/usr/local/share/utt/pl_PL.ISO-8859-2/lem.bin}

in a system installation.

@node lem hints
@subsection Hints

@c @subsubheading Combining data from multiple dictionaries
@subsubheading Combining data from multiple dictionaries

@c @itemize
@itemize

@c @item Apply <dict1>, then apply <dict2> to words which were not annotated.
@item Apply <dict1>, then apply <dict2> to words which were not annotated.

@c @example
@c lem -d <dict1> | lem -S lem -d <dict2>
@c @end example
@example
lem -d <dict1> | lem -S lem -d <dict2>
@end example

@c @item Add annotations from two dictionaries <dict1> and <dict2>.
@item Add annotations from two dictionaries <dict1> and <dict2>.

@c @example
@c lem -c -d <dict1> | lem -S lem -d <dict2>
@c @end example
@example
lem -c -d <dict1> | lem -S lem -d <dict2>
@end example

@c @end itemize
@end itemize


@c ---------------------------------------------------------------------
@@ -1070,15 +1013,21 @@ located by default in:

@end multitable

@command{gue} guesses morphological descriptions of the form contained
in the @var{form} field.

@menu
* gue description::
* gue command line options::
* gue example::
* gue dictionaries::
@end menu


@node gue description
@subsection Description

@command{gue} guesses morphological descriptions of the form contained
in the @var{form} field.
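For illustration only, a sketch of a typical pipeline (it is assumed here
that @command{gue} accepts the common @option{-S} option, so that guessing
is applied only to forms which @command{lem} left unannotated):

@example
cat text.txt | tok | lem | gue -S lem
@end example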

@node gue command line options
@subsection Command line options
@@ -1181,24 +1130,27 @@ naj*elszy;3-4ały,ADJ/...:...
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski, Michał Stolarski
@item @strong{Component category:} @tab filter
@item @strong{Input format:} @tab UTT regular
@item @strong{Output format:} @tab UTT regular
@item @strong{Required annotation:} @tab tok
@end multitable

@menu
* cor description::
* cor command line options::
* cor dictionaries::
@end menu


@node cor description
@subsection Description

The spelling corrector applies Kemal Oflazer's dynamic programming
algorithm @cite{oflazer96} to the FSA representation of the set of
word forms of the Polex/PMDBF dictionary. Given an incorrect
word form, it returns all word forms present in the dictionary whose
edit distance is smaller than the threshold given as a parameter.

By default @code{cor} replaces the contents of the @var{form} field
with the new, corrected value, placing the old contents in the @code{cor}
field.
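As an illustration (a sketch, not part of the original manual text), a
simple invocation that allows corrections within an edit distance of 2
might look like:

@example
cat text.txt | tok | cor -n 2
@end example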
@menu
* cor command line options::
* cor dictionaries::
@end menu


@node cor command line options
@subsection Command line options
@@ -1224,6 +1176,10 @@ field.

@item @b{@minus{}@minus{}distance=@var{int}, @minus{}n @var{int}}
Maximum edit distance (default='1').

@c @item @b{@minus{}@minus{}replace, @minus{}r}
@c Replace original form with corrected form, place original form in the
@c cor field. This option has no effect in @option{--one-*} modes (default=off)


@end table
@@ -1242,6 +1198,29 @@ odlotowy
odludek
@end example

@subsubheading Binary format

The mandatory file name extension for a binary dictionary is @code{bin}. To
compile a text dictionary into binary format, write:

@example
compiledic <dictionaryname>.dic
@end example

@c ---------------------------------------------------------------------
@c KOR
@c ---------------------------------------------------------------------

@page
@node kor
@section kor - configurable spelling corrector

[TODO]

@c ---------------------------------------------------------------------
@c SEN
@c ---------------------------------------------------------------------

@page
@node sen
@section sen - a sentencizer
@@ -1250,17 +1229,25 @@ odludek

@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Component category:} @tab filter
@item @strong{Input format:} @tab UTT regular
@item @strong{Output format:} @tab UTT regular
@item @strong{Required annotation:} @tab tok

@end multitable

@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.

@menu
* sen description::
@c * sen input::
@c * sen output::
* sen example::
@end menu

@node sen description
@subsection Description

@command{sen} detects sentence boundaries in UTT-formatted texts and marks them with special zero-length segments, in which the @var{type} field may contain the BOS (beginning of sentence) or EOS (end of sentence) annotation.

@node sen example
@subsection Example
@@ -1304,8 +1291,8 @@ output:



@c SER
@c ---------------------------------------------------------------------
@c SER
@c ---------------------------------------------------------------------

@page
@@ -1315,11 +1302,13 @@ output:
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Component category:} @tab filter
@item @strong{Input format:} @tab UTT regular
@item @strong{Output format:} @tab UTT regular
@item @strong{Required annotation:} @tab tok, lem --one-field
@end multitable

@command{ser} looks for patterns in UTT-formatted texts.

@menu
* ser description::
* ser command line options::
* ser pattern::
* ser how ser works::
@@ -1329,6 +1318,12 @@
@end menu


@node ser description
@subsection Description

@command{ser} looks for patterns in UTT-formatted texts.


@c ---------------------------------------------------------------------
@node ser command line options
@subsection Command line options
@@ -1503,7 +1498,7 @@ occurrence of a relative pronoun
@c All predefined terms correspond to single segments,

@example
define(`verbseq', `(cat(V) (space cat(V)))')
define(`verbseq', `(cat(<V>) (space cat(<V>)))')
@end example
@@ -1514,7 +1509,7 @@ the term @code{cat()} may not be used as a ... of
@node ser limitations
@subsection Limitations

more than 3 attributes in <>.
Do not use more than 3 attributes in <>.

@node ser requirements
@subsection Requirements
@@ -1532,8 +1527,8 @@ installed in the system:
@end itemize


@c GRP
@c ---------------------------------------------------------------------
@c GRP
@c ---------------------------------------------------------------------

@page
@@ -1543,9 +1538,23 @@
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Component category:} @tab filter
@item @strong{Input format:} @tab UTT flattened
@item @strong{Output format:} @tab UTT flattened
@item @strong{Required annotation:} @tab tok, sen, lem --one-field
@end multitable


@menu
* grp description::
* grp command line options::
* grp pattern::
* grp hints::
@end menu


@node grp description
@subsection Description

@code{grp} selects sentences containing an expression matching a
pattern. The pattern format is exactly the same as that accepted by
@code{ser}.
@@ -1554,22 +1563,6 @@ pattern. The pattern format is exactly the same as that accepted by

It is extremely fast (processing speed is usually higher than the speed
of reading the corpus file from disk).
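For illustration, a sketch of a complete pipeline (the corpus name and the
pattern are placeholders; the options follow the conventions used elsewhere
in this manual, and the exact combination is an assumption):

@example
cat corpus.txt | tok | sen | lem -1 | fla | grp -e 'lexeme(dom)' | unfla
@end example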
@c @menu
@c * ser command line options::
@c * ser pattern::
@c * ser how ser works::
@c * ser customization::
@c * ser limitations::
@c * ser requirements::
@c @end menu
@menu
* grp command line options::
* grp pattern::
* grp hints::
@end menu

@node grp command line options
@subsection Command line options
@@ -1577,10 +1570,6 @@ of reading the corpus file from disk).

@parhelp
@parversion
@c @parfile
@c @paroutput
@c @parinputfield
@c @paroutputfield
@parprocess
@parinteractive
@@ -1626,24 +1615,51 @@ lzop -cd corpus.grp.lzo | grp -a gP -e @var{EXPR} | ser -e @var{EXPR}
@end example



@c ---------------------------------------------------------------------
@c kot
@c MAR
@c ---------------------------------------------------------------------

@page
@node mar
@section mar

@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Marcin Walas, Tomasz Obrębski
@item @strong{Component category:} @tab filter
@end multitable

[TODO]

@c ---------------------------------------------------------------------
@c KOT
@c ---------------------------------------------------------------------


@page
@node kot
@section kot - untokenizer

Authors: Tomasz Obrębski
@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Tomasz Obrębski
@item @strong{Component category:} @tab filter
@item @strong{Input format:} @tab UTT regular
@item @strong{Output format:} @tab text
@item @strong{Required annotation:} @tab tok
@end multitable

@command{kot} is the opposite of @command{tok}. It changes UTT-formatted text into plain text.

@menu
* kot description::
* kot command line options::
* kot usage examples::
@end menu

@node kot description
@subsection Description

@command{kot} transforms a UTT-formatted file back into raw text format.

@node kot command line options
@subsection Command line options
@@ -1683,28 +1699,38 @@ cat legia.txt | tok | kot
cat legia.txt | tok | lem -1 | kot
@end example

@c CON............................................................
@c ...............................................................
@c ...............................................................
@c ---------------------------------------------------------------
@c CON
@c ---------------------------------------------------------------


@page
@node con
@section con - concordance table generator

@command{con} generates a concordance table based on a pattern given to @command{ser}.

@multitable {aaaaaaaaaaaaaaaaaaaaaaaaa} {aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa}
@item @strong{Authors:} @tab Justyna Walkowska
@item @strong{Component category:} @tab sink
@item @strong{Input format:} @tab UTT regular
@item @strong{Output format:} @tab text
@item @strong{Required annotation:} @tab ser or mar
@end multitable
@c

@menu
* con description::
* con command line options::
* con usage example::
* con hints::
@end menu


@node con description
@subsection Description

@command{con} generates a concordance table based on a pattern given to @command{ser}.


@node con command line options
@subsection Command line options
@@ -1757,9 +1783,9 @@
Left column minimal width in characters (default = 0).
@item @b{@minus{}@minus{}ignore @minus{}i}
Ignore segment inconsistency in the input.
@item @b{@minus{}@minus{}bon}
@item @b{@minus{}@minus{}bom}
Beginning of selected segment (regex, default='[0-9]+ [0-9]+ BOM .*').
@item @b{@minus{}@minus{}eob}
@item @b{@minus{}@minus{}eom}
End of selected segment (regex, default='[0-9]+ [0-9]+ EOM .*').
@item @b{@minus{}@minus{}bod}
Selected segment beginning display string (default='[').
@@ -1773,7 +1799,7 @@ cat legia.txt | tok | lem -1 | kot
@node con usage example
@subsection Usage example
@example
cat file.txt | tok | lem -1 | ser -e 'lexeme(dom) | con'
cat file.txt | tok | lem -1 | ser -e 'lexeme(dom)' | con
@end example
@@ -1789,7 +1815,6 @@ sequence:
@end example



@c ---------------------------------------------------------------------
@c ---------------------------------------------------------------------