General information
*********************

UAM Text Tools (UTT) is a package of language processing tools
developed at Adam Mickiewicz University. Its functionality includes:
* tokenization
* dictionary-based morphological analysis
* heuristic morphological analysis of unknown words
* spelling correction
* pattern search
* sentence splitting
* generation of concordance tables
                     
The toolkit is intended for processing raw (unannotated),
unrestricted text for any purpose.
                        

Installation
**************

1) unpack the UTT tar archive
2) in the same directory, unpack the tar archives of all UTT dictionary modules you have
3) run
	make install
   in the root directory of the installation
4) add the bin directory to the PATH variable
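
The steps above can be sketched as a small shell function. This is only
an illustration: the function name utt_install is made up here, and the
archive and directory names must be supplied by the caller, because this
README does not state them.

```shell
# Sketch of installation steps 1-3. All names passed in are placeholders:
# this README does not give the archive or directory names.
# Usage: utt_install UTT_ARCHIVE ROOT_DIR [DICT_ARCHIVE...]
utt_install() {
    if [ "$#" -lt 2 ]; then
        echo "usage: utt_install UTT_ARCHIVE ROOT_DIR [DICT_ARCHIVE...]" >&2
        return 2
    fi
    utt_archive=$1
    root_dir=$2
    shift 2

    # Step 1: unpack the UTT archive into the current directory.
    tar -xf "$utt_archive" || return 1

    # Step 2: unpack every dictionary module archive in the same directory.
    for dict_archive in "$@"; do
        tar -xf "$dict_archive" || return 1
    done

    # Step 3: run "make install" in the root directory of the installation.
    (cd "$root_dir" && make install) || return 1
}
```

A call might look like 'utt_install utt.tar.gz utt utt-dict-pl.tar.gz'
(all three names are hypothetical). Step 4 would then be something like
'export PATH="$PWD/utt/bin:$PATH"', assuming the bin directory sits
directly under the root directory.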


Requirements
*************

* File::HomeDir

  the Perl package File::HomeDir must be installed
  (to install it, run 'perl -MCPAN -e shell' and type
   'install File::HomeDir' at the 'cpan>' prompt)
   
* flex

  to run the ser component, flex must be installed on your system

* ruby

  to run the tre component, ruby must be installed on your system

* locale pl_PL.iso-8859-2

  the locale pl_PL.iso-8859-2 (pl_PL for short) must be installed
  and active when using UTT with the Polish module. The text you
  process with UTT must be encoded in ISO-8859-2.
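
The requirements listed above can be checked before installation with a
few shell helpers. This is a rough sketch, not part of UTT: all function
names are invented for illustration, the locale pattern assumes the name
spellings commonly printed by 'locale -a', and the conversion helper
assumes the input text is UTF-8.

```shell
# Report whether a command is available on PATH.
check_command() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Report whether a Perl module can be loaded.
check_perl_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "ok: Perl module $1"
    else
        echo "missing: Perl module $1"
        return 1
    fi
}

# Report whether a Polish ISO-8859-2 locale is installed.
# Assumption: it is listed as pl_PL, pl_PL.iso88592 or a similar spelling.
check_polish_locale() {
    if locale -a 2>/dev/null | grep -Ei '^pl_PL(\.iso-?8859-?2)?$' >/dev/null; then
        echo "ok: locale pl_PL (ISO-8859-2)"
    else
        echo "missing: locale pl_PL (ISO-8859-2)"
        return 1
    fi
}

# Run all checks; return non-zero if anything is missing.
check_utt_requirements() {
    status=0
    check_perl_module File::HomeDir || status=1
    check_command flex || status=1    # needed by the ser component
    check_command ruby || status=1    # needed by the tre component
    check_polish_locale || status=1   # needed with the Polish module
    return "$status"
}

# Convert text to ISO-8859-2 (assumption: the input is UTF-8).
# Reads the named files, or standard input if none are given.
to_iso_8859_2() {
    iconv -f UTF-8 -t ISO-8859-2 "$@"
}
```

Running 'check_utt_requirements' prints one line per requirement. To
activate the locale for the current shell, something like
'export LC_ALL=pl_PL.ISO-8859-2' is typical, but the exact locale name
differs between systems, so use the spelling shown by 'locale -a'.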