RandomSec/main
Tom Morris 0562638ffa
Use standard text normalization - fixes #2898 (#2900)
* Use standard text normalization - fixes #2898

Fixes #2898. Fixes #409. Refs #650

Replaces the homegrown, ISO Latin-1-only character substitution
with the standard Java Normalizer (NFD form), followed by diacritic
removal and a few custom character expansions/replacements.
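The core idea (decompose, then drop combining marks) can be sketched with java.text.Normalizer, which ships with the JDK. This is a minimal illustration and not the code from this commit. The class and method names are hypothetical, and the custom expansions are left out here.

```java
import java.text.Normalizer;

/**
 * Minimal sketch of "normalize to NFD, then remove diacritics".
 * DiacriticStrip and stripDiacritics() are hypothetical names.
 */
public class DiacriticStrip {
    static String stripDiacritics(String s) {
        // NFD splits precomposed letters: e-acute becomes "e" + U+0301 COMBINING ACUTE ACCENT
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches every combining mark (general categories Mn, Mc, Me)
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "Creme brulee a Zurich" with grave, circumflex, acute and umlaut accents,
        // written as escapes so the source compiles under any file encoding
        String input = "Cr\u00E8me br\u00FBl\u00E9e \u00E0 Z\u00FCrich";
        System.out.println(stripDiacritics(input)); // prints "Creme brulee a Zurich"
    }
}
```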

* Fix Mac build

* Improve compatibility with previous code

One intentional change is folding O with stroke to
oe instead of o.
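O with stroke needs an explicit rule because U+00F8 has no Unicode decomposition. Normalization followed by mark stripping leaves it unchanged, so only a table entry can fold it. The following is a hypothetical sketch. It assumes the table is applied after decomposition and that uppercase U+00D8 maps to "OE", although the commit only mentions oe.

```java
import java.text.Normalizer;

/**
 * Sketch: fold O with stroke to "oe" after decomposition.
 * StrokeFold and fold() are hypothetical names, not the project's API.
 */
public class StrokeFold {
    static String fold(String s) {
        // Decomposition plus mark removal handles letters like o-umlaut,
        // but O with stroke has no decomposition and passes through unchanged
        String stripped = Normalizer.normalize(s, Normalizer.Form.NFKD)
                .replaceAll("\\p{M}", "");
        StringBuilder out = new StringBuilder(stripped.length());
        for (int i = 0; i < stripped.length(); i++) {
            char c = stripped.charAt(i);
            switch (c) {
                case '\u00D8': out.append("OE"); break; // capital O with stroke (assumed mapping)
                case '\u00F8': out.append("oe"); break; // small o with stroke
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Normalization alone does not touch the letter
        System.out.println(Normalizer.normalize("\u00F8", Normalizer.Form.NFKD).equals("\u00F8")); // prints "true"
        System.out.println(fold("S\u00F8ren Kierkegaard")); // prints "Soeren Kierkegaard"
    }
}
```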

- Use the more aggressive NFKD (compatibility decomposition) instead of NFD
- Strip punctuation after decomposition, since decomposition can generate
  new punctuation
- Add compatibility test for old asciify() method
- Add some graphically similar characters to the substitution table
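The following sketch shows why the order in the list above matters. It assumes that "punctuation" means the Unicode P* general categories, since the commit does not specify the definition it uses, and all names are hypothetical. Compatibility characters such as a parenthesized digit are not punctuation, but NFKD expands them into characters that are.

```java
import java.text.Normalizer;

/**
 * Sketch of the ordering: NFKD, then mark removal, then punctuation removal.
 * NfkdPipeline, fold() and punctuationFirst() are hypothetical names.
 */
public class NfkdPipeline {
    // Decompose first and strip punctuation last, as the commit describes
    static String fold(String s) {
        String d = Normalizer.normalize(s, Normalizer.Form.NFKD);
        d = d.replaceAll("\\p{M}", "");
        return d.replaceAll("\\p{P}", "");
    }

    // Reversed order, for comparison: punctuation is removed before
    // NFKD has had a chance to produce any
    static String punctuationFirst(String s) {
        String d = s.replaceAll("\\p{P}", "");
        d = Normalizer.normalize(d, Normalizer.Form.NFKD);
        return d.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // U+2474 PARENTHESIZED DIGIT ONE and U+FB01 LATIN SMALL LIGATURE FI;
        // NFKD expands them to "(1)" and "fi"
        String input = "\u2474 \uFB01le";
        System.out.println(fold(input));             // prints "1 file"
        System.out.println(punctuationFirst(input)); // prints "(1) file"
        // NFD leaves both compatibility characters untouched
        System.out.println(Normalizer.normalize(input, Normalizer.Form.NFD).equals(input)); // prints "true"
    }
}
```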

* Add oe character/ligature & more long S forms
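This hypothetical sketch shows how these characters might be handled, assuming the same pipeline of NFKD followed by mark stripping. Several long S forms already fold to a plain s through compatibility decomposition. The oe ligature has no decomposition, however, so it needs a table entry. The names and table contents below are illustrative only, and the exact long S forms added by this commit are not listed here.

```java
import java.text.Normalizer;

/**
 * Sketch: long S forms fold through NFKD, while the oe ligature needs
 * an explicit table entry. LigatureFold and fold() are hypothetical names.
 */
public class LigatureFold {
    static String fold(String s) {
        String d = Normalizer.normalize(s, Normalizer.Form.NFKD).replaceAll("\\p{M}", "");
        // U+0153 and U+0152 (oe / OE ligatures) have no Unicode decomposition,
        // so they are expanded by hand
        return d.replace("\u0153", "oe").replace("\u0152", "OE");
    }

    public static void main(String[] args) {
        System.out.println(fold("Congre\u017Fs")); // long s, U+017F -> prints "Congress"
        System.out.println(fold("\uFB05rong"));    // long s + t ligature, U+FB05 -> prints "strong"
        System.out.println(fold("\u1E9B"));        // long s with dot above, U+1E9B -> prints "s"
        System.out.println(fold("\u0153uvre"));    // oe ligature -> prints "oeuvre"
    }
}
```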

* More tests for ligatures and Latin Extended

* Add Latin-1 Supplement tests
2020-07-07 21:35:41 +02:00
IDEs/eclipse Thad no longer supports Netbeans config (#2466) 2020-03-27 09:30:30 +01:00
src Use standard text normalization - fixes #2898 (#2900) 2020-07-07 21:35:41 +02:00
tests Use standard text normalization - fixes #2898 (#2900) 2020-07-07 21:35:41 +02:00
webapp Remove incorrect "dates" from guess data type label - fixes #2883 2020-07-06 19:55:23 -04:00
pom.xml Report HTTP error codes to the user when creating a project from a URL (#2870) 2020-07-07 11:58:47 +02:00