Text normalization for a TTS
============================

The task is to write a Thrax grammar for normalizing text for a
text-to-speech system, i.e. the text should be converted to a form
closer to speech (but without phonetic transcription).

For instance, "I bought 21 books from prof. Smith" should
be transformed into "i bought twenty-one books from professor smith".

You should:

- convert numbers 0-99 into words
- convert Roman numerals I-X into ordinals (the first, the second, etc.)
- expand abbreviations: "e.g." ("for example"), "prof." ("professor"), "dr." ("doctor"),
  "p." ("page"), "pp." ("pages")
- remove punctuation (except for hyphens)
- lowercase everything
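To check a grammar's output, it may help to have a small reference for the expected behaviour. The Python sketch below is not a Thrax grammar and not a solution to the task; it only illustrates one possible reading of the rules above. All names (`normalize`, `number_to_words`, the lookup tables) are hypothetical. It assumes that a standalone "I" is the pronoun (as in the example sentence) rather than a Roman numeral, that Roman numerals are recognized only in upper case, that abbreviations are matched case-insensitively, and that numbers above 99 are left unchanged.

```python
import re

# Hypothetical lookup tables; the names are illustrative only.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Assumption: "I" is deliberately absent so the pronoun survives.
ORDINALS = {"II": "the second", "III": "the third", "IV": "the fourth",
            "V": "the fifth", "VI": "the sixth", "VII": "the seventh",
            "VIII": "the eighth", "IX": "the ninth", "X": "the tenth"}
ABBREVIATIONS = {"e.g.": "for example", "prof.": "professor",
                 "dr.": "doctor", "pp.": "pages", "p.": "page"}


def number_to_words(n):
    """Spell out an integer in the range 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spell_number(match):
    """Replace a digit run by words if it is at most 99, else keep it."""
    value = int(match.group())
    return number_to_words(value) if value <= 99 else match.group()


def normalize(text):
    # 1. Expand abbreviations first, while their periods are still present.
    #    The lookbehind keeps "p." from matching at the end of a longer word.
    for abbr, full in ABBREVIATIONS.items():
        pattern = r"(?<![A-Za-z])" + re.escape(abbr)
        text = re.sub(pattern, full, text, flags=re.IGNORECASE)
    # 2. Roman numerals (upper case only), before case information is lost.
    text = re.sub(r"\b[IVX]+\b",
                  lambda m: ORDINALS.get(m.group(), m.group()), text)
    # 3. Numbers 0-99.
    text = re.sub(r"\b\d+\b", spell_number, text)
    # 4. Drop everything except letters, digits, whitespace and hyphens.
    text = re.sub(r"[^\w\s-]|_", "", text)
    # 5. Lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


print(normalize("I bought 21 books from prof. Smith"))
```

With these assumptions the sketch prints "i bought twenty-one books from professor smith", matching the example above. The order of the steps matters: abbreviations have to be expanded before punctuation is removed, and Roman numerals have to be handled before lowercasing.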

POINTS: 10
DEADLINE: 2020-01-28 23:59