RandomSec/main/tests/server/src/com/google/refine
Tom Morris 0562638ffa
Use standard text normalization - fixes #2898 (#2900)
* Use standard text normalization - fixes #2898

Fixes #2898. Fixes #409. Refs #650

Replaces homegrown ISO Latin-1 only character substitution
with standard Java Normalizer normalization to NFD, followed by
diacritic removal and a few custom character expansions/replacements.

* Fix Mac build

* Improve compatibility with previous code

One intentional change is folding O with stroke to
oe instead of o.

- Use more powerful NFKD instead of NFD
- strip punctuation after decomposition since it can generate
  new punctuation
- Add compatibility test for old asciify() method
- Add some graphically similar characters to substitution table
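
A minimal sketch of the pipeline described above, assuming a hypothetical class TextNormalizationSketch and a tiny illustrative substitution table. It is not the code from this commit; the real substitution table and the order of steps may differ.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; names and table contents are assumptions.
final class TextNormalizationSketch {
    // Combining marks left behind by decomposition (accents, cedillas, ...).
    private static final Pattern DIACRITICS = Pattern.compile("\\p{M}+");
    // ASCII punctuation and control characters.
    private static final Pattern PUNCT_CTRL = Pattern.compile("[\\p{Punct}\\p{Cntrl}]+");
    // Characters that have no Unicode decomposition need explicit replacements.
    private static final Map<String, String> SUBSTITUTIONS = new LinkedHashMap<>();
    static {
        SUBSTITUTIONS.put("\u00f8", "oe"); // o with stroke
        SUBSTITUTIONS.put("\u00d8", "OE"); // O with stroke
        SUBSTITUTIONS.put("\u0153", "oe"); // oe ligature
        SUBSTITUTIONS.put("\u0152", "OE"); // OE ligature
    }

    public static String normalize(String s) {
        // 1. Compatibility decomposition: splits accents and expands ligatures.
        String result = Normalizer.normalize(s, Normalizer.Form.NFKD);
        // 2. Drop the combining marks produced by step 1.
        result = DIACRITICS.matcher(result).replaceAll("");
        // 3. Apply the custom expansions for characters NFKD leaves untouched.
        for (Map.Entry<String, String> e : SUBSTITUTIONS.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        // 4. Strip punctuation last, since decomposition can create new
        //    punctuation (for example, an ellipsis becomes three full stops).
        return PUNCT_CTRL.matcher(result).replaceAll("");
    }
}
```

Stripping punctuation before decomposition would miss characters such as U+2026 (horizontal ellipsis), which only turn into ASCII full stops once NFKD has been applied.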

* Add oe character/ligature & more long S forms

* More tests for ligatures and Latin Extended

* Add Latin-1 Supplement tests
2020-07-07 21:35:41 +02:00
browsing Fix imprecise facet statistics in records mode (#2607) 2020-06-15 19:38:50 +02:00
clustering Use standard text normalization - fixes #2898 (#2900) 2020-07-07 21:35:41 +02:00
commands remove unused imports (#2574) 2020-04-21 15:51:01 +02:00
exporters allow xlsx files to have more columns (#2602) 2020-04-26 17:07:26 +02:00
expr Unit test improvements (#2856) 2020-07-02 20:29:21 +02:00
grel Unit test improvements (#2856) 2020-07-02 20:29:21 +02:00
history Rename test packages to match tested ones, for #2133 2019-08-23 11:55:31 +01:00
importers Fix Open Office Spreadsheet (ODS) dates (#2843) 2020-07-04 08:42:33 +02:00
importing Report HTTP error codes to the user when creating a project from a URL (#2870) 2020-07-07 11:58:47 +02:00
io Save preferences JSON using UTF-8 encoding. Bulletproof prefs load. (#2657) 2020-06-06 10:00:01 +01:00
model Better error handling for reconciliation process - fixes #2590 (#2671) 2020-06-23 21:54:54 +02:00
operations Increase maximum wait for testInvalidUrl, follow-up for #2876 #2875 2020-07-03 21:48:43 +02:00
preference Rename test packages to match tested ones, for #2133 2019-08-23 11:55:31 +01:00
process Fix race in Process Manager (#2748) 2020-06-17 21:24:25 +02:00
sorting Rename test packages to match tested ones, for #2133 2019-08-23 11:55:31 +01:00
util Fix race in Process Manager (#2748) 2020-06-17 21:24:25 +02:00
HistoryEntryManagerStub.java Rename test packages to match tested ones, for #2133 2019-08-23 11:55:31 +01:00
ProjectManagerStub.java Replace Apache Ant with Commons Compress (#2691) 2020-06-11 16:39:51 +02:00
ProjectManagerTests.java Rename test packages to match tested ones, for #2133 2019-08-23 11:55:31 +01:00
RefineServletStub.java Rename test packages to match tested ones, for #2133 2019-08-23 11:55:31 +01:00
RefineServletTests.java consistent usage of Apache http status constants (#2432) 2020-03-18 06:40:52 +00:00
RefineTest.java Unit test improvements (#2856) 2020-07-02 20:29:21 +02:00