5 Commits

Author SHA1 Message Date
Tom Morris
e61d50a1aa
Fix NGramFingerprintKeyer to ignore accents - fixes #1161 (#2899)
Fixes #1161
This change parallels what was done in #1257 (1da3c00) to fix
the FingerprintKeyer: it moves the diacritic removal before
the deduping. Includes a test.
2020-07-07 09:02:49 +02:00
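To illustrate why the ordering in the commit above matters, here is a minimal sketch of an n-gram fingerprint that removes diacritics before collecting the unique n-grams. It is not the OpenRefine implementation: the class name `NGramFingerprintSketch`, the helper `stripDiacritics`, and the use of `java.text.Normalizer` for accent removal are all assumptions made for this example.

```java
import java.text.Normalizer;
import java.util.TreeSet;

// Illustrative sketch only; names and the accent-stripping technique are assumptions.
class NGramFingerprintSketch {

    // Hypothetical helper: decompose to NFD, then drop the combining marks.
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Builds a key from the sorted, de-duplicated n-grams of the cleaned string.
    static String key(String s, int n) {
        // Diacritics are removed BEFORE the n-grams are collected, so that
        // "é" and "e" contribute the same n-grams to the set.
        String cleaned = stripDiacritics(s.toLowerCase())
                .replaceAll("[\\p{Punct}\\p{Cntrl}\\s]+", "");
        TreeSet<String> grams = new TreeSet<>(); // sorted and unique
        for (int i = 0; i + n <= cleaned.length(); i++) {
            grams.add(cleaned.substring(i, i + n));
        }
        return String.join("", grams);
    }
}
```

With this ordering, `NGramFingerprintSketch.key("caf\u00e9", 2)` and `NGramFingerprintSketch.key("cafe", 2)` produce the same key. If the accents were stripped only after deduping, accented and unaccented n-grams would be counted as distinct and could both survive in the key.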
Tom Morris
0e832e2d7c
Merge pull request #2889 from OpenRefine/dependabot/maven/org.apache.maven.plugins-maven-site-plugin-3.9.1
Bump maven-site-plugin from 3.3 to 3.9.1
2020-07-06 16:34:32 -04:00
dependabot-preview[bot]
4ddca58f97
Bump maven-site-plugin from 3.3 to 3.9.1
Bumps [maven-site-plugin](https://github.com/apache/maven-site-plugin) from 3.3 to 3.9.1.
- [Release notes](https://github.com/apache/maven-site-plugin/releases)
- [Commits](https://github.com/apache/maven-site-plugin/compare/maven-site-plugin-3.3...maven-site-plugin-3.9.1)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-06 08:35:49 +00:00
dependabot-preview[bot]
0db676ed3f
Bump maven-shade-plugin from 3.2.1 to 3.2.4
Bumps [maven-shade-plugin](https://github.com/apache/maven-shade-plugin) from 3.2.1 to 3.2.4.
- [Release notes](https://github.com/apache/maven-shade-plugin/releases)
- [Commits](https://github.com/apache/maven-shade-plugin/compare/maven-shade-plugin-3.2.1...maven-shade-plugin-3.2.4)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-06 08:35:16 +00:00
Tom Morris
df8d092132
Micro benchmark harness & ToNumber optimizations (#2859)
* Performance optimized version of ToNumber

Approximately 5x faster for floats (data dependent)
and about the same speed for integers.

- Instead of blindly trying to parse as Long, do a quick check
  for obvious problems (e.g. decimal point).
- Don't trim. It's already done by called methods.
- Use valueOf() instead of parse() to avoid object creation

* Add Java Microbenchmark Harness

The shaded JAR is missing the OpenRefine classes, for a reason
that I haven't figured out, so it requires openrefine-main.jar at runtime.

* Remove old implementations of ToNumber

* Remove unneeded dependencies from main project

* Clean up and reformat
2020-07-03 21:42:44 +02:00
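The ToNumber bullets in the commit above (quick check before parsing as Long, no explicit trim, `valueOf()` instead of `parse()`) can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the commit: the class name `ToNumberSketch`, the method signature, and returning `null` for unparseable input are all hypothetical.

```java
// Illustrative sketch only; not the OpenRefine ToNumber implementation.
class ToNumberSketch {

    // Returns a Long or a Double, or null (a hypothetical choice) if the input is not numeric.
    static Number toNumber(String s) {
        // Quick check: a decimal point rules out a Long, so skip the
        // attempt (and its exception) entirely for obvious floats.
        if (s.indexOf('.') < 0) {
            try {
                return Long.valueOf(s);
            } catch (NumberFormatException e) {
                // Not an integer (e.g. "1e3"); fall through and try Double.
            }
        }
        try {
            // No explicit trim here: Double.valueOf ignores leading and
            // trailing whitespace itself.
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

For example, `ToNumberSketch.toNumber("42")` yields a `Long`, `ToNumberSketch.toNumber("3.14")` yields a `Double` without ever attempting the Long parse, and `ToNumberSketch.toNumber("abc")` yields `null`.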