0562638ffa
* Use standard text normalization - fixes #2898

  Fixes #2898. Fixes #409. Refs #650

  Replaces homegrown ISO Latin-1 only character substitution with the standard Java Normalizer to NFD, followed by diacritic removal and a few custom character expansions/replacements.

* Fix Mac build

* Improve compatibility with previous code

  One intentional change is folding O with stroke to oe instead of o.

  - Use more powerful NFKD instead of NFD
  - Strip punctuation after decomposition since it can generate new punctuation
  - Add compatibility test for old asciify() method
  - Add some graphically similar characters to substitution table

* Add oe character/ligature & more long S forms

* More tests for ligatures and Latin Extended

* Add Latin-1 Supplement tests

| Name |
|---|
| binning |
| knn |
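
The approach described in the commit message above (NFKD decomposition, then diacritic removal, a small substitution table, and punctuation stripping afterwards) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the class name `AsciiFold`, the method `fold`, and the contents of the `EXPANSIONS` table are assumptions made for the example. Only `java.text.Normalizer` and the standard `Character` and `String` APIs are used.

```java
import java.text.Normalizer;
import java.util.Map;

class AsciiFold {
    // Hypothetical substitution table for letters that NFKD leaves untouched
    // because they have no Unicode decomposition.
    private static final Map<Character, String> EXPANSIONS = Map.of(
        '\u00F8', "oe",  // o with stroke (folded to "oe", as the commit describes)
        '\u00D8', "OE",  // O with stroke
        '\u0153', "oe",  // oe ligature
        '\u0152', "OE",  // OE ligature
        '\u00E6', "ae",  // ae ligature
        '\u00C6', "AE",  // AE ligature
        '\u00DF', "ss"   // sharp s
    );

    static String fold(String input) {
        // 1. Compatibility decomposition: splits accented letters into
        //    base letter + combining mark and expands ligatures such as U+FB01.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);

        StringBuilder sb = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            // 2. Drop combining diacritical marks left behind by decomposition.
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue;
            }
            // 3. Apply the custom expansions; copy everything else unchanged.
            String replacement = EXPANSIONS.get(c);
            sb.append(replacement != null ? replacement : String.valueOf(c));
        }

        // 4. Strip ASCII punctuation last, because decomposition can create
        //    new punctuation (for example U+2026 becomes three full stops).
        return sb.toString().replaceAll("\\p{Punct}", "");
    }
}
```

For instance, `AsciiFold.fold("caf\u00E9")` yields `cafe`, and `AsciiFold.fold("wait\u2026")` yields `wait`, since the ellipsis only turns into strippable punctuation after NFKD has run. That ordering is the reason the punctuation step comes last in this sketch.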