* Use standard text normalization - fixes #2898
Fixes #2898. Fixes #409. Refs #650
Replaces homegrown ISO Latin-1 only character substitution
with the standard Java Normalizer (NFD form), followed by diacritic
removal and a few custom character expansions/replacements.
* Fix Mac build
* Improve compatibility with previous code
One intentional change is folding O with stroke to
oe instead of o.
- Use more powerful NFKD instead of NFD
- strip punctuation after decomposition since it can generate
new punctuation
- Add compatibility test for old asciify() method
- Add some graphically similar characters to substitution table
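
A minimal sketch of the pipeline described above (NFKD decomposition, then diacritic removal, then punctuation stripping, then custom expansions). The class name `NormalizeSketch`, the method `fold`, and the small expansion table are hypothetical and are not the actual implementation; the sketch also assumes the input has already been lowercased.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class NormalizeSketch {
    // Combining marks (accents, cedillas, ...) left behind by decomposition.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");
    // ASCII punctuation; applied after decomposition because NFKD can
    // produce new punctuation (the ellipsis U+2026 becomes three dots).
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}+");

    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        String noMarks = MARKS.matcher(decomposed).replaceAll("");
        String noPunct = PUNCT.matcher(noMarks).replaceAll("");

        // Hypothetical expansion table for characters that have no
        // Unicode decomposition and so survive the steps above.
        StringBuilder out = new StringBuilder(noPunct.length());
        for (int i = 0; i < noPunct.length(); i++) {
            char c = noPunct.charAt(i);
            switch (c) {
                case '\u00F8': out.append("oe"); break; // o with stroke
                case '\u0153': out.append("oe"); break; // oe ligature
                case '\u00E6': out.append("ae"); break; // ae ligature
                case '\u00DF': out.append("ss"); break; // sharp s
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

NFKD also handles compatibility characters such as the "fi" ligature, which plain NFD would leave untouched.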
* Add oe character/ligature & more long S forms
* More tests for ligatures and Latin Extended
* Add Latin-1 Supplement tests
Fixes #1161
This change parallels what was done in #12571da3c00 to fix
the FingerprintKeyer and moves the diacritic removal before
the deduping. Includes a test.
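
Why the ordering matters can be sketched as follows. This is an illustration only, with a hypothetical `KeyOrderSketch` class and a simplified whitespace-token key rather than the real keyer code: if diacritics are stripped before the tokens are deduplicated, variants that differ only by an accent collapse into one token.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.TreeSet;

public class KeyOrderSketch {
    // Decompose, then drop the combining marks.
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
    }

    // Simplified key: lowercase, strip diacritics FIRST, then split
    // into tokens and let the TreeSet sort and deduplicate them.
    static String key(String s) {
        String cleaned = stripMarks(s.trim().toLowerCase(Locale.ROOT));
        TreeSet<String> tokens = new TreeSet<>();
        for (String token : cleaned.split("\\s+")) {
            tokens.add(token);
        }
        return String.join(" ", tokens);
    }
}
```

If the marks were stripped only after deduplication, an accented token and its unaccented twin would both survive as separate entries in the key.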
* Adjust Travis build environments - fixes #2861
Fixes #2861
- Only builds one each of JDK 11-14
- Fixes all validator warnings
- Switches default build environment to bionic
- Uses trusty for an Oracle JDK 8 build
- Adds OS X build
- Adds JDK 13 & 14 builds
- Adds placeholder for JDK 16 builds
(but Jacoco doesn't currently support it,
so commented out)
- Reorder build jobs so that most informative ones run first
- Split before_install into before_install and
before_script sections
* Drop redundant JDK 13 build
* Swap OS X to JDK 14 instead of JDK 13
This has nothing to do with JDK or OS X versions; it is about
the Travis CI build images. A bug in the Homebrew
support was only fixed in recent images, so we need to use
an xcode11 build, which implies macOS 10.14 or 10.15 and
JDK 14 or 14.0.1.
* Implemented RestrictedPosition Scrutinizer tests using mocks
Added RestrictedPositionConstraint class and updated test cases using mocks
* Tests updated & working fine
* Truncate any completely empty columns on the right
Fixes #565
The current versions of Open Office create default spreadsheets
with over 1000 empty columns. Keep track of the rightmost
non-empty column when importing and truncate everything else.
Also adds a basic ODS import test.
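
The truncation idea can be sketched like this, assuming rows are held as lists of strings and that a null or empty string counts as an empty cell. `TrimColumnsSketch` and `truncate` are hypothetical names; the real importer works on spreadsheet cells rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;

public class TrimColumnsSketch {
    static List<List<String>> truncate(List<List<String>> rows) {
        // Track the rightmost column index that holds any content.
        int rightmost = -1;
        for (List<String> row : rows) {
            // Only scan cells to the right of the current maximum.
            for (int i = row.size() - 1; i > rightmost; i--) {
                String cell = row.get(i);
                if (cell != null && !cell.isEmpty()) {
                    rightmost = i;
                    break;
                }
            }
        }
        // Drop everything to the right of that column.
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            int end = Math.min(row.size(), rightmost + 1);
            out.add(new ArrayList<>(row.subList(0, end)));
        }
        return out;
    }
}
```

Empty cells to the left of the rightmost populated column are kept, so column positions do not shift.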
* Fix dates in ODS spreadsheets
Fixes #2224
* Performance optimized version of ToNumber
Approximately 5x faster for floats (data dependent)
and about the same speed for integers.
- Instead of blindly trying to parse as Long, do a quick check
for obvious problems (e.g. decimal point).
- Don't trim. It's already done by called methods.
- Use valueOf() instead of parse() to avoid object creation
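
The approach in the list above might look roughly like the sketch below. `ToNumberSketch` and `looksLikeLong` are hypothetical names, and returning null for unparseable input is a simplification made for this example, not the behavior of the real function.

```java
public class ToNumberSketch {
    static Number toNumber(String s) {
        // Cheap pre-check: only attempt Long parsing when the string
        // contains nothing but an optional sign and digits.
        if (looksLikeLong(s)) {
            try {
                return Long.valueOf(s);
            } catch (NumberFormatException e) {
                // Too many digits for a long; fall through to Double.
            }
        }
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null; // simplification for this sketch
        }
    }

    static boolean looksLikeLong(String s) {
        int n = s.length();
        if (n == 0) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                continue;
            }
            // A leading sign is fine as long as digits follow it.
            if (i == 0 && (c == '-' || c == '+') && n > 1) {
                continue;
            }
            return false; // decimal point, exponent, letters, ...
        }
        return true;
    }
}
```

The pre-check avoids throwing and catching a NumberFormatException for every float, which is plausibly where most of the reported speedup for floats comes from.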
* Add Java Microbenchmark Harness
The shaded JAR is missing the OpenRefine classes, for a reason
that I haven't figured out, so requires openrefine-main.jar at runtime.
* Remove old implementations of ToNumber
* Remove unneeded dependencies from main project
* Clean up and reformat
Refs #2863
The tree importer sorts columns/column groups by how populated
they are, which is of arguable utility, but the tie-breaker
of ordering by shortest column name is completely silly.
This change removes that and, in conjunction with a stable sort
algorithm, will preserve the original order of the columns.
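
A small illustration of why a stable sort is enough here, using a hypothetical `Column` type with a `nonBlankCount` field (not the tree importer's real data structure). `List.sort` is documented as stable, so columns that are equally populated keep their original relative order once the name-length tie-breaker is gone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ColumnOrderSketch {
    // Hypothetical stand-in for an imported column.
    static final class Column {
        final String name;
        final int nonBlankCount;

        Column(String name, int nonBlankCount) {
            this.name = name;
            this.nonBlankCount = nonBlankCount;
        }
    }

    // Sort by how populated each column is, most populated first.
    // There is no secondary key, so ties stay in input order.
    static List<String> order(List<Column> columns) {
        List<Column> sorted = new ArrayList<>(columns);
        sorted.sort(Comparator
                .comparingInt((Column c) -> c.nonBlankCount)
                .reversed());
        List<String> names = new ArrayList<>();
        for (Column c : sorted) {
            names.add(c.name);
        }
        return names;
    }
}
```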
* Fix two deprecated method usages
* Test ToNumber conversions
* Test behavior of all functions when passed 0 or 8 arguments
There are 16 which fail currently on 0 args (return null or
False instead of EvalError), but have been whitelisted until
we can verify whether it's safe to change them without introducing
compatibility issues.
There are 19 which fail to return an error on too many (i.e. 8) args.
No issue.
- we don't support Excel95, but make sure that it generates an exception
- move the test data file into the appropriate directory
- for any normal test, consider exceptions a failure