
2020-01-09 11:29:38 +01:00
Text normalization for a TTS
============================
The task is to write a Thrax grammar for normalizing text for a
text-to-speech system, i.e. the text should be converted to a form
closer to speech (but without phonetic transcription).
For instance, "I bought 21 books from prof. Smith" should
be transformed into "i bought twenty-one books from professor smith".
You should:
- convert numbers 0-99 into words
- convert Roman numerals I-X into ordinal adjectives (the first, the second, etc.)
- expand abbreviations: "e.g." ("for example"), "prof." ("professor"), "dr." ("doctor"),
"p." ("page"), "pp." ("pages")
- remove punctuation (except for hyphens)
- lower-case everything
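The requirements above can be sketched as a small reference normalizer,
useful for generating expected outputs to check a grammar against. This is
not a Thrax grammar and not an official solution, only a Python sketch
under several assumptions that the task does not state: the names
normalize, number_to_words, ABBREVIATIONS and ROMAN_ORDINALS are
hypothetical; abbreviations are matched case-insensitively; a bare "I" is
left as the pronoun (the example in the task needs "I bought" to become
"i bought"), so only II-X are converted; numbers with three or more digits
are left untouched; runs of whitespace are collapsed.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Abbreviations listed in the task description.
ABBREVIATIONS = {
    "e.g.": "for example",
    "prof.": "professor",
    "dr.": "doctor",
    "p.": "page",
    "pp.": "pages",
}

# Assumption: "I" is omitted because it collides with the pronoun "I".
ROMAN_ORDINALS = {
    "II": "the second", "III": "the third", "IV": "the fourth",
    "V": "the fifth", "VI": "the sixth", "VII": "the seventh",
    "VIII": "the eighth", "IX": "the ninth", "X": "the tenth",
}

# Longest alternatives first, so "pp." is tried before "p.".
_ABBR_RE = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")",
    re.IGNORECASE,
)
# Upper-case standalone tokens only (not part of a word or hyphenated form).
_ROMAN_RE = re.compile(
    r"(?<![\w-])("
    + "|".join(sorted(ROMAN_ORDINALS, key=len, reverse=True))
    + r")(?![\w-])"
)
# One or two digits that are not part of a longer digit run.
_NUMBER_RE = re.compile(r"(?<!\d)\d{1,2}(?!\d)")


def number_to_words(n):
    """Spell out an integer in the range 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else TENS[tens] + "-" + ONES[ones]


def normalize(text):
    # Expand abbreviations first, while their dots are still present.
    text = _ABBR_RE.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)
    # Convert Roman numerals before lower-casing erases the distinction.
    text = _ROMAN_RE.sub(lambda m: ROMAN_ORDINALS[m.group(1)], text)
    text = _NUMBER_RE.sub(lambda m: number_to_words(int(m.group())), text)
    # Drop punctuation (including "_") but keep hyphens.
    text = re.sub(r"[^\w\s-]|_", "", text)
    # Assumption: collapse whitespace left behind by removed punctuation.
    return " ".join(text.split()).lower()


print(normalize("I bought 21 books from prof. Smith"))
# prints: i bought twenty-one books from professor smith
```

The order of the steps matters: abbreviations rely on their dots and Roman
numerals on upper case, so both are handled before punctuation removal and
lower-casing. A grammar would likely need a similar ordering of its rules.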
POINTS: 10
DEADLINE: 2020-01-28 23:59