General information
*********************
UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:
* tokenization
* dictionary-based morphological analysis
* heuristic morphological analysis of unknown words
* spelling correction
* pattern search
* sentence splitting
* generation of concordance tables
The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.
Installation
**************
1) unpack the UTT tar archive
2) in the same directory, unpack the tar archives of all UTT dictionary modules you have
3) run 'make install' in the root directory of the installation
4) add the bin directory to the PATH variable
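
Step 4 can be sketched as the following shell fragment. It is only an
illustration: UTT_ROOT is a hypothetical variable name, and its default
value ($HOME/utt) is an assumption, so point it at the directory where you
actually unpacked UTT. The fragment adds the bin directory to PATH only if
it is not already listed.

```shell
# UTT_ROOT is a hypothetical name; set it to your UTT installation root.
UTT_ROOT="${UTT_ROOT:-$HOME/utt}"

# Prepend $UTT_ROOT/bin to PATH unless it is already there.
case ":$PATH:" in
  *":$UTT_ROOT/bin:"*) ;;                  # already present, nothing to do
  *) PATH="$UTT_ROOT/bin:$PATH" ;;
esac
export PATH
```

To make the setting permanent, the same lines can be placed in your shell
startup file (for example ~/.profile).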
Requirements
*************
* File::HomeDir
the Perl package File::HomeDir must be installed
(to install the package, run 'perl -MCPAN -e shell' and type
'install File::HomeDir' at the 'cpan>' prompt)
* flex
to run the ser component, flex must be installed on your system
* ruby
to run the tre component, ruby must be installed on your system
* locale pl_PL.iso-8859-2
the locale pl_PL.iso-8859-2 (pl_PL for short) must be installed
and set when using UTT with the Polish module. The text you
process with UTT must be encoded in iso-8859-2.
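
The requirements above can be checked before running UTT with a short
shell sketch like the one below. It is not part of UTT: the function names
(have_cmd, have_perl_module, have_polish_locale, to_latin2) are
hypothetical helpers, and the sketch assumes that your input text is in
UTF-8 and that the standard iconv and locale utilities are available. The
exact spelling of the locale name reported by 'locale -a' varies between
systems, so the pattern used here is a guess that may need adjusting.

```shell
# Succeeds if the named program is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if the named Perl module can be loaded.
have_perl_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1
}

# Succeeds if a Polish ISO-8859-2 locale appears in 'locale -a'
# (assumed spellings: pl_PL, pl_PL.iso88592, pl_PL.iso-8859-2, ...).
have_polish_locale() {
  locale -a 2>/dev/null | grep -Eqi '^pl_PL(\.iso-?8859-?2)?$'
}

# Convert UTF-8 text on standard input to ISO-8859-2 on standard output.
to_latin2() {
  iconv -f UTF-8 -t ISO-8859-2
}

# Report anything that appears to be missing.
for tool in flex ruby; do
  have_cmd "$tool" || echo "missing program: $tool"
done
have_perl_module File::HomeDir || echo "missing Perl module: File::HomeDir"
have_polish_locale || echo "missing locale: pl_PL.iso-8859-2"
```

A UTF-8 file could then be converted with
'to_latin2 < input-utf8.txt > input-latin2.txt' (both file names are
placeholders), and the locale set for the current shell session with
'export LC_ALL=pl_PL.iso-8859-2'.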