General information
*********************

UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:
* tokenization
* dictionary-based morphological analysis
* heuristic morphological analysis of unknown words
* spelling correction
* pattern search
* sentence splitting
* generation of concordance tables

The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.


Installation
**************

1) unpack the UTT tar archive
2) in the same directory, unpack the tar archives of all UTT dictionary modules you have
3) run
     make install
   in the root directory of the installation
4) add the bin directory to the PATH variable


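The four installation steps can be sketched as a pair of shell functions.
This is only an illustration under stated assumptions: the function names
install_utt and add_to_path are made up for this example, the archive names
and the root directory are supplied by the caller because this file does not
name them, and the bin directory is assumed to sit directly under the root
directory of the installation.

```shell
#!/bin/sh
# Sketch of the installation steps. All paths are caller-supplied
# placeholders; none of them are taken from the UTT distribution.

# Add a directory to PATH for the current shell, unless it is already there.
add_to_path() {
    dir=$(cd "$1" && pwd) || return 1      # absolute path; fails if missing
    case ":$PATH:" in
        *":$dir:"*) ;;                     # already on PATH, nothing to do
        *) PATH="$dir:$PATH"; export PATH ;;
    esac
}

# Usage: install_utt ROOT_DIR UTT_ARCHIVE [DICT_ARCHIVE...]
#   ROOT_DIR is the directory that contains the Makefile after unpacking.
install_utt() {
    root_dir=$1
    utt_archive=$2
    shift 2

    # 1) unpack the UTT tar archive into the current directory
    tar -xf "$utt_archive" || return 1

    # 2) unpack each dictionary module archive into the same directory
    for dict_archive in "$@"; do
        tar -xf "$dict_archive" || return 1
    done

    # 3) run make install in the root directory of the installation
    (cd "$root_dir" && make install) || return 1

    # 4) add the bin directory to PATH (assumed to be ROOT_DIR/bin)
    add_to_path "$root_dir/bin"
}
```

The PATH change made by add_to_path lasts only for the current shell
session. To keep it, put an equivalent export line in your shell startup
file.
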
Requirements
*************

* File::HomeDir

  the Perl package File::HomeDir must be installed
  (to install it, run 'perl -MCPAN -e shell' and type
  'install File::HomeDir' at the 'cpan>' prompt)

* flex

  to run the ser component, flex must be installed on your system

* ruby

  to run the tre component, ruby must be installed on your system

* locale pl_PL.iso-8859-2

  the locale pl_PL.iso-8859-2 (pl_PL for short) must be installed
  and set when using UTT with the Polish module. The text you
  process with UTT must be encoded in iso-8859-2.
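
The requirements above can be checked before installing with a short shell
sketch like the one below. It is not part of UTT: all function names are
invented for this example. It also assumes that the locale may be listed by
'locale -a' under slightly different spellings (for example pl_PL.iso88592
or pl_PL.ISO-8859-2), or as plain pl_PL, so hyphens and letter case are
ignored when matching.

```shell
#!/bin/sh
# Sketch: report which of the listed requirements appear to be missing.

# Succeeds if the given program is found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if the Perl module named in $1 can be loaded.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Reads locale names on stdin; succeeds if one of them is pl_PL, either
# bare or with an ISO 8859-2 suffix. Hyphens and case are ignored.
lists_polish_latin2() {
    sed 's/-//g' | tr '[:upper:]' '[:lower:]' |
        grep -E '^pl_pl(\.iso88592)?$' >/dev/null
}

# Prints one line per missing requirement; returns 0 only if none is missing.
check_utt_requirements() {
    missing=0
    have_perl_module File::HomeDir ||
        { echo "missing: Perl module File::HomeDir"; missing=1; }
    have_command flex ||
        { echo "missing: flex (needed by the ser component)"; missing=1; }
    have_command ruby ||
        { echo "missing: ruby (needed by the tre component)"; missing=1; }
    locale -a 2>/dev/null | lists_polish_latin2 ||
        { echo "missing: locale pl_PL.iso-8859-2 (needed by the Polish module)"; missing=1; }
    return "$missing"
}
```

Calling check_utt_requirements prints nothing when everything is found.
If your text is in another encoding, a standard tool such as iconv can
convert it, for example 'iconv -f UTF-8 -t ISO-8859-2 in.txt > out.txt',
where UTF-8 as the source encoding and both file names are placeholders.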