* Use standard text normalization. Fixes #2898. Fixes #409. Refs #650
Replaces the homegrown, ISO Latin-1-only character substitution
with the standard Java Normalizer (NFD form), followed by diacritic
removal and a few custom character expansions/replacements.
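A minimal sketch of that pipeline, for illustration only. It assumes lowercase output and Java 9+ (for Map.of). The class name AsciifySketch and the entries in EXPANSIONS are hypothetical and are not the actual table from this change. The sketch decomposes with java.text.Normalizer in NFD form, removes combining marks, then expands letters that have no canonical decomposition, such as sharp s and o with stroke:

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;

public class AsciifySketch {
    // Hypothetical expansion table: letters that NFD leaves intact because
    // they have no canonical decomposition, so mark stripping cannot fold them.
    private static final Map<Character, String> EXPANSIONS = Map.of(
            '\u00df', "ss", // sharp s
            '\u00e6', "ae", // ae ligature
            '\u0153', "oe", // oe ligature
            '\u00f8', "oe", // o with stroke, folded to "oe"
            '\u0142', "l",  // l with stroke
            '\u0111', "d"); // d with stroke

    static String asciify(String s) {
        // 1. Lowercase so the table only needs lowercase keys
        String lower = s.toLowerCase(Locale.ROOT);
        // 2. Canonical decomposition: e-acute becomes "e" + U+0301 COMBINING ACUTE ACCENT
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // 3. Remove all combining marks (Unicode general category M)
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // 4. Expand the remaining non-ASCII letters via the table
        StringBuilder out = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            out.append(EXPANSIONS.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }
}
```

For example, asciify("Crème Brûlée") returns "creme brulee", while asciify("Łódź") returns "lodz" only because of the table entry for ł; NFD decomposes the accented ó and ź but leaves ł unchanged.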
* Fix Mac build
* Improve compatibility with previous code
One intentional change is folding O with stroke to
oe instead of o.
- Use NFKD instead of NFD, so compatibility characters such as ligatures
also get decomposed
- Strip punctuation after decomposition, since decomposition can generate
new punctuation
- Add compatibility test for old asciify() method
- Add some graphically similar characters to substitution table
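A sketch of why the switch to NFKD and the punctuation ordering matter. The class and method names are hypothetical, and this is not the actual implementation. The fi ligature (U+FB01) and long s (U+017F) have only compatibility decompositions, so NFD leaves them alone while NFKD folds them. PARENTHESIZED DIGIT ONE (U+2474) is not itself punctuation, but NFKD turns it into "(1)", so a punctuation strip that runs before decomposition misses those parentheses:

```java
import java.text.Normalizer;

public class NfkdSketch {
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // NFKD also applies compatibility decompositions (ligatures, long s,
    // parenthesized forms), which NFD leaves untouched.
    static String nfkd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    // Decompose, drop combining marks, then drop punctuation, so punctuation
    // produced by decomposition is removed too.
    static String keyPunctuationAfter(String s) {
        return nfkd(s).replaceAll("\\p{M}", "").replaceAll("\\p{P}", "");
    }

    // Alternative order for comparison: punctuation is stripped first, so any
    // punctuation that NFKD introduces afterwards survives.
    static String keyPunctuationBefore(String s) {
        return nfkd(s.replaceAll("\\p{P}", "")).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String fiLigature = "\uFB01"; // LATIN SMALL LIGATURE FI
        String longS = "\u017F";      // LATIN SMALL LETTER LONG S
        String parenOne = "\u2474";   // PARENTHESIZED DIGIT ONE (category No)

        System.out.println(nfd(fiLigature).equals(fiLigature));            // true
        System.out.println(nfkd(fiLigature));                              // fi
        System.out.println(nfkd(longS));                                   // s
        System.out.println(keyPunctuationAfter(parenOne + " caf\u00e9"));  // 1 cafe
        System.out.println(keyPunctuationBefore(parenOne + " caf\u00e9")); // (1) cafe
    }
}
```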
* Add oe character/ligature & more long S forms
* More tests for ligatures and Latin Extended
* Add Latin-1 Supplement tests
Fixes #1161
This change parallels what was done in #12571da3c00 to fix
the FingerprintKeyer and moves the diacritic removal before
the deduping. Includes a test.
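A sketch of why the order matters, assuming a simplified fingerprint key (trim, lowercase, strip punctuation, then sort and dedupe whitespace-separated tokens). FingerprintSketch and its methods are hypothetical and do not reproduce the actual FingerprintKeyer. If tokens are deduped before diacritics are removed, "café" and "cafe" survive as two tokens that only become identical afterwards:

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

public class FingerprintSketch {
    // Decompose (NFKD) and drop combining marks, e.g. "caf\u00e9" -> "cafe"
    static String removeDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD).replaceAll("\\p{M}", "");
    }

    // Trim, lowercase and strip punctuation
    static String clean(String s) {
        return s.trim().toLowerCase(Locale.ROOT).replaceAll("\\p{P}", "");
    }

    // Sort and dedupe tokens; a TreeSet does both at once
    static String sortedUniqueTokens(String s) {
        TreeSet<String> tokens = new TreeSet<>(Arrays.asList(s.split("\\s+")));
        return String.join(" ", tokens);
    }

    // Old order: dedupe first, remove diacritics last
    static String keyDiacriticsLast(String s) {
        return removeDiacritics(sortedUniqueTokens(clean(s)));
    }

    // New order: remove diacritics before cleaning and deduping
    static String keyDiacriticsFirst(String s) {
        return sortedUniqueTokens(clean(removeDiacritics(s)));
    }
}
```

With the input "café cafe", the old order yields "cafe cafe" while the new order yields "cafe", so both spellings collapse to the same key.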