General information
*********************

UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:
* tokenization
* dictionary-based morphological analysis
* heuristic morphological analysis of unknown words
* spelling correction
* pattern search
* sentence splitting
* generation of concordance tables

The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.

Installation
**************

1) unpack the UTT tar archive
2) in the same directory, unpack the tar archives of all UTT dictionary modules you have
3) in the root directory of the installation, run
     make install
4) add the bin directory to the PATH variable

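The four steps above can be sketched as a pair of shell functions. This is only an illustration under stated assumptions: the archive names utt.tar and utt-dict-*.tar and the unpacked root directory ./utt are hypothetical placeholders, so substitute the names of the files you actually downloaded.

```shell
#!/bin/sh
# Sketch only. Hypothetical names: utt.tar, utt-dict-*.tar and the
# unpacked root directory ./utt are placeholders, not the real file names.

# Steps 1-3, run in a subshell so the caller's working directory is untouched.
install_utt() (
    cd "$1" || exit 1                      # directory holding the archives
    tar -xf utt.tar || exit 1              # step 1: unpack UTT itself
    for module in utt-dict-*.tar; do       # step 2: unpack each dictionary module
        if [ -e "$module" ]; then
            tar -xf "$module" || exit 1
        fi
    done
    cd utt && make install                 # step 3: run make install in the root
)

# Step 4: prepend a directory to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already present, nothing to do
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}
```

A call might look like `install_utt "$HOME/downloads"` followed by `add_to_path` with the full path of the bin directory. To keep the PATH change across sessions, the same assignment would normally go into your shell startup file.
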

Requirements
*************

* File::HomeDir

the Perl package File::HomeDir must be installed
(to install the package, run 'perl -MCPAN -e shell' and type
'install File::HomeDir' after the 'cpan>' prompt appears)

* flex

to run the ser component, flex must be installed on your system

* ruby

to run the tre component, ruby must be installed on your system

* locale pl_PL.iso-8859-2

the locale pl_PL.iso-8859-2 (pl_PL for short) must be installed
and set when using UTT with the Polish module. The text you
process with UTT must be encoded in iso-8859-2.
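
The requirements listed above could be checked before installation with a short script along the following lines. It is a sketch, not part of UTT: the function names are hypothetical, the locale test only matches the full name (a system that lists only the short name pl_PL will be reported as missing), and the conversion helper assumes your input text is UTF-8.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical and not part of UTT.

# Reads locale names on stdin (one per line, as printed by `locale -a`) and
# succeeds if pl_PL with the ISO-8859-2 charset is among them. Case and
# hyphens are ignored because systems spell the charset differently.
has_pl_latin2() {
    tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -q -x 'pl_pl\.iso88592'
}

# Prints one line per missing requirement; returns non-zero if any is missing.
check_requirements() {
    status=0
    perl -MFile::HomeDir -e 1 2>/dev/null ||
        { echo "missing: Perl module File::HomeDir"; status=1; }
    command -v flex >/dev/null 2>&1 ||
        { echo "missing: flex (needed by the ser component)"; status=1; }
    command -v ruby >/dev/null 2>&1 ||
        { echo "missing: ruby (needed by the tre component)"; status=1; }
    locale -a 2>/dev/null | has_pl_latin2 ||
        { echo "missing: locale pl_PL.iso-8859-2"; status=1; }
    return "$status"
}

# Converts text on stdin to ISO-8859-2, assuming the input is UTF-8.
to_latin2() {
    iconv -f UTF-8 -t ISO-8859-2
}
```

Running `check_requirements` prints nothing when everything is present. Setting the locale for a session is usually done by exporting LC_ALL with the locale name exactly as `locale -a` prints it on your system.
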