* Refactor HTTP code into a common utility class
Centralizes the six (slightly) different implementations to use
a common Apache HTTP Client 5 implementation which implements our
strategies for retries, timeouts, error handling, etc.
Apache HTTP Client 5 adds support for Retry-After headers, HTTP/2,
and a bunch of other stuff under the covers.
Moves request delay to a request interceptor and fixes calculation
of the delay (again). Increase retries from 1x to 3x and use delay*2
as the default retry interval, if no Retry-After header. Uses an
exponential backoff strategy for multiple retries.
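A minimal sketch of the retry interval described above, not the code in this change. The class name, method name, and the exact doubling formula are assumptions made for illustration; only "delay*2 as the default", "3 retries", and "Retry-After takes precedence" come from the description.

```java
import java.time.Duration;
import java.util.OptionalLong;

// Hypothetical helper; not part of the actual HTTP utility class.
final class RetryIntervalSketch {
    static final int MAX_RETRIES = 3;

    /**
     * @param attempt           1-based retry number (1..MAX_RETRIES)
     * @param requestDelay      the configured delay between requests
     * @param retryAfterSeconds value of a Retry-After header, if the server sent one
     */
    static Duration retryInterval(int attempt, Duration requestDelay, OptionalLong retryAfterSeconds) {
        // A server-supplied Retry-After wins over our own default.
        if (retryAfterSeconds.isPresent()) {
            return Duration.ofSeconds(retryAfterSeconds.getAsLong());
        }
        // Default interval is delay*2; doubling it per attempt is an assumed backoff factor.
        Duration base = requestDelay.multipliedBy(2);
        return base.multipliedBy(1L << (attempt - 1));
    }
}
```

With a 500 ms request delay and no Retry-After header, this sketch waits 1 s, 2 s, then 4 s.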
* Reuses HTTP client across requests
* Use IOException instead of Exception for HTTP errors
* Fix: Prevent adding empty cells that contain only whitespace when importing XML data, with tests : Issue #1095
* Chore: Using 'CharMatcher' to match whitespace pattern instead of using custom regex : Issue #1095
* add clarity for reinterpret docs
Helps fix #3292
* update reinterpret docs phrasing
We agreed to use "encoding" to be friendly to user exposed messaging instead of "encoder" and "decoder" that is used internally.
* fix serializeReinterpret() test json
* Bump git-commit-id-plugin from 4.0.0 to 4.0.2
Bumps git-commit-id-plugin from 4.0.0 to 4.0.2.
Signed-off-by: dependabot[bot] <support@github.com>
* Bump to maven-git-commit-id-plugin 4.0.3 for JSON bugfix
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tom Morris <tfmorris@gmail.com>
* changed to reflect the function's acceptance of either simple string or regex
* cast p into a Pattern
* cast p into a Pattern
* Changed test to reflect the new output from function.
* Add more keyer tests
- All forms of Unicode whitespace for both fingerprint & N-gram fingerprint
- additional N-gram fingerprint cases
* Improve fingerprint keyers
- Update N-gram fingerprint keyer to match (missed last time)
- refactor string normalization to reduce redundancy between two keyers
- add C1 controls to control characters that are stripped
- include all Unicode whitespace characters in splitting delimiter
and don't strip controls which are whitespace (HT, LF, VT, FF, CR,
NEL)
- minor cleanups, simplifications, and performance optimizations
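The normalization steps listed above can be sketched roughly as follows. This is an illustrative approximation, not the keyer itself: the class name, the order of the steps, and the use of NFKD with mark removal are assumptions, and the custom character substitution table is omitted.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch of a fingerprint keyer.
final class FingerprintSketch {
    // All Unicode whitespace (including NEL and NBSP) acts as a token delimiter.
    private static final Pattern WHITESPACE =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);
    // Controls (C0 and C1, general category Cc) and punctuation are stripped from tokens.
    private static final Pattern CONTROL_OR_PUNCT = Pattern.compile("[\\p{Cc}\\p{P}]");

    static String fingerprint(String s) {
        // Decompose, then drop combining marks (diacritics).
        String folded = Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFKD)
                .replaceAll("\\p{M}+", "");
        // TreeSet sorts and de-duplicates the tokens.
        TreeSet<String> tokens = new TreeSet<>();
        // Splitting first means whitespace controls (HT, LF, ...) delimit rather than vanish.
        for (String raw : WHITESPACE.split(folded)) {
            String token = CONTROL_OR_PUNCT.matcher(raw).replaceAll("");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return String.join(" ", tokens);
    }
}
```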
* Added translation using Weblate (Polish)
* Added translation using Weblate (Polish)
* Added translation using Weblate (Polish)
* Translated using Weblate (Polish)
Currently translated at 6.3% (49 of 769 strings)
Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/pl/
* Translated using Weblate (Polish)
Currently translated at 4.0% (9 of 225 strings)
Translation: OpenRefine/wikidata
Translate-URL: https://hosted.weblate.org/projects/openrefine/wikidata/pl/
* Translated using Weblate (Polish)
Currently translated at 19.3% (12 of 62 strings)
Translation: OpenRefine/database
Translate-URL: https://hosted.weblate.org/projects/openrefine/database/pl/
* Translated using Weblate (French)
Currently translated at 99.2% (763 of 769 strings)
Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/fr/
Co-authored-by: Włodzimierz Bartczak <wzbartczak@gmail.com>
Co-authored-by: Tom Morris <tfmorris@gmail.com>
* Visually center links box
* Add the Java runtime info to the About page - fixes #3240
- Add the Java runtime name & version to GetVersionCommand
- Add the returned information to the About page
* internationalised the label using $.i18n
* fix some problem
* make it more clear and understandable
* change prefix 'core-buttons/ to 'core-import-formats'
* formateNames to formateLabelKey
* fix spelling mistake
* add translation-en.json in pc-axis
* remove from previous file
* Add internationalized activation
* improvement in pc-axis langs
Co-authored-by: chetan <you@example.com>
* Fix for issue #3163, Calling the okbutton.click() on press of enter
* Fix for issue #3163, using form and bind on submit.
* Fix for issue #3163, submit using jQuery's submit method
* Clean up importer refactoring
Remove an extra copy of filename setting.
Revert some additional API changes (retaining both versions)
* Revert archive file name changes & mark as deprecated
* Add utility helpers to create array of comparable items
* Extend sort() to handle arrays with nulls
- Instead of NullPointerException on nulls, sort them last
- add JSON helpers to return Comparable[] in addition to Object[]
- Non-homogenous arrays or arrays with non-primitive
objects (array or object) are not sortable
- Add tests for both new and old sort functionality
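For illustration, sorting nulls last can be expressed with the JDK comparator helpers. This is a sketch under assumed names, not the GREL sort() implementation, which works on its own array types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper showing "nulls sort last instead of throwing NullPointerException".
final class NullsLastSortSketch {
    static <T extends Comparable<? super T>> List<T> sortNullsLast(List<T> items) {
        List<T> copy = new ArrayList<>(items);
        // nullsLast wraps the natural ordering so null elements compare greater than any value.
        copy.sort(Comparator.nullsLast(Comparator.naturalOrder()));
        return copy;
    }
}
```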
* Refactor GREL Get tests
- move helper up to RefineTest
- move tests to the correct module
* Extend forEach() to support JSON objects - fixes #3149
Also add tests for existing forEach forms in addition to the new one
* Add a couple more tests
* added class to List Facet
* added class to Timerange Facet
* added class to Range Facet
* added class to Text Filter Facet
* added class to Scatterplot Facet
* added base class
* added end line in facet.js
* fixed indentations facet.js
* fixed indentation again
* removed fields
* added suggested changes
* Migrate reconciliation calls to OkHTTP, for #2903
* Migrate to Apache HTTP Commons
* Migrate data extension to Apache HTTP client
* Deprecate HttpURLConnection in RefineServlet
* Use LaxRedirectStrategy, clean up imports
* Remove read and pool timeouts, only keep the connection timeout
* Adapt mocking of HTTP calls after migration
* Refactor test helper
Create a version of the assert that uses the standard parameter
order and deprecate the version that uses inverted order.
* Use consistent Assert class and parameter ordering
* Update jQuery UI from 1.10.3 to 1.12.1 and associated theme CSS
* Fix sidebar tab layout issue with new jQuery UI
* Update initialization jQuery UI Tabs widgets
selected is now active, but the first tab is selected by default
so we don't need to do it manually.
* Patch GData initialization error
Don't attempt to initialize if we get no docs back (ie unauthorized)
* Fix About page - fixes #3088
Update jQuery version
Include correct Javascript file to get version information
Fix version display
* Remove obsolete Freebase logo license reference
* Fix ToDate test failure - fixes #3026
Instead of computing offset from UTC at current
point in time, use the offset from the parsed
date so that we're not affected by crossing
a daylight savings time boundary.
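A small sketch of the idea, using java.time rather than the code touched by this fix (the class and method names are hypothetical): the offset is resolved from the zone rules at the parsed date, not at the current instant.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

// Hypothetical illustration; the real ToDate code path may differ.
final class ParsedDateOffsetSketch {
    static OffsetDateTime toOffsetDateTime(LocalDateTime parsed, ZoneId zone) {
        // atZone() applies the zone's offset in effect at 'parsed',
        // so the result does not change when "now" crosses a DST boundary.
        return parsed.atZone(zone).toOffsetDateTime();
    }
}
```

For America/New_York, a January date resolves to -05:00 and a July date to -04:00, regardless of when the test runs.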
* Fix date parsing with locale as first format string
Also refactors for simplicity, restores some dropped tests,
and restores previous behavior of considering a bad
format string an error instead of silently ignoring it.
It does NOT address another issue which was introduced
in May 2018 of treating date/times without timezone
information as UTC instead of local.
* Restore error checking and messages
* Save & restore default timezone for tests
Also add some ToDos for places where LocalDate is being misused.
* Refactor module wiring to reduce redundancy
* Update to jQuery 1.12.4 & jQuery Migrate 1.4.1 - fixes #2932
This updates to the latest jQuery 1.x and jQuery Migrate 1.x,
the first step in upgrading to a modern jQuery.
* Add a couple of bug fixes from Google Code SVN
This is an unreleased version from the Google Code freebase-site
repo which only has a few changes from the v4.3 release, but
one of them is removing the `browser.msie` reference that
jQuery Migrate is complaining about.
* Use prop() for 'checked' and 'disabled'
* Update jQuery 'value' property setting code to use val()
* Use prop() instead of attr() to set 'selected'
* Patch for jQuery >1.9
* Replace js string concatenation with i18n parameters
refs #1858
Remove Javascript string concatenation and use jquery i18n()
instead so that translators have the needed context and
flexibility to be able to do a good job. Also remove code-based
plurals conditionalization and replace with i18n.
* Update French translation so I can test non-English support
* Add missing localization
* Clean up formatting of service API link
Fixed self mistakes
modified line forVantage
Revert "modified line forVantage"
This reverts commit f252bde77cedf2f85fbfaf2059e551078ad62c2c.
modification in another line
Co-authored-by: chetan <you@example.com>
* Clustering dialog choices limit & performance improvements - fixes #695
- Caps the total number of choices displayed at 10,000 and warns when
over the limit. Users can use facets to tune which clusters are displayed.
- Doubles the performance of the Javascript processing
- Only displays count of rows for a choice if it's > 1, to reduce the number of DOM elements
- Adds internationalization for row count
For 41K clusters containing 118K choices, processing dropped from
3m20s to 1m20s, but with the 10K choice cap total time is ~10sec.
* Restore even/odd row class
* Updates from review feedback
* changes to rendering of rows
* some cell rendering improvements
* more render row improvements
* fixed jQuery methods on js elements
* added comment for nbsp
Bump to vicino 1.2 with bug fix and real POM.
Drop dependencies on secondstring and arithcode which are just
transitive dependencies from simile-vicino, now that it has a
proper POM. Fixes #2959.
* Make sure data directory is directory, not a file
* Add a test for zip archive import
Also tests the saving of the archive file name and source filename
* Add TODOs - no functional changes
* Cosmetic cleanups
* Revert importer API changes for archive file name parameter
Fixes #2963
- restore binary compatibility to the API
- hoist the handling of both fileSource and archiveFileName from
TabularImportingParserBase and TreeImportingParserBase to
ImportingParserBase so that there's only one copy. These 3 classes are
all part of the internal implementation, so there should be no
compatibility issue.
* Revert weird flow of control for import options metadata
This reverts the very convoluted control flow that was introduced
when adding the input options to the project metadata. Instead
the metadata is all handled in the importer framework rather than
having to change APIs or have individual importers worry about
it.
The feature never had test coverage, so that is still to be added.
* Add test for import options in project metadata & fix bug
Fixes bug where same options object was being reused and overwritten,
so all copies in the list ended up the same.
Fixes #2917
Update to Butterfly 1.0.4 which catches NoClassDefFound errors
for Butterfly modules (ie OpenRefine extensions) which are missing
Java dependencies (e.g. those built against earlier versions
of OpenRefine)
* Fix text guesser so it doesn't guess wikitext
Fixes #2850
- Add simple magic detector for zip & gzip files to keep
it from attempting to guess binary files
- Add a counter for C0 controls for the same reason
- Tighten wikitable counters to require marker at
beginning of the line, per the specification
- Refactor to use Apache Commons instead of private
counting methods
- Add tests for most TextGuesser formats
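As a rough illustration of the magic-number check and the control counter, here is a sketch with hypothetical names. Which control characters are excluded from the count is an assumption, and only the common zip local-file-header signature is checked.

```java
// Hypothetical sketch; not the TextGuesser code itself.
final class MagicNumberSketch {
    // gzip streams start with the bytes 0x1F 0x8B.
    static boolean isGzip(byte[] header) {
        return header.length >= 2
                && (header[0] & 0xFF) == 0x1F
                && (header[1] & 0xFF) == 0x8B;
    }

    // zip archives normally start with a local file header: 'P' 'K' 0x03 0x04.
    static boolean isZip(byte[] header) {
        return header.length >= 4
                && header[0] == 'P' && header[1] == 'K'
                && header[2] == 3 && header[3] == 4;
    }

    // Counts C0 controls other than HT, LF and CR (assumed exclusions);
    // a high count suggests binary content rather than text.
    static int countC0Controls(byte[] data) {
        int count = 0;
        for (byte b : data) {
            int c = b & 0xFF;
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                count++;
            }
        }
        return count;
    }
}
```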
* Remove misplaced duplicate test data file
* Fix LGTM warning + minor cleanups
* Use BoundedInputStream to prevent runaway lines
* Add utility functions to check/convert dates
* Add date tests and refactor to DRY up
* Fix date import - fixes #1908
Change from java.util.Date to OpenRefine 3.0+'s OffsetDateTime
Fixes #1908
* Centralize date conversion
* Moving utility methods to ParsingUtilities
* Fix tests
* Use standard text normalization - fixes #2898. Fixes #409. Refs #650
Replaces homegrown ISO Latin-1 only character substitution
with standard Java Normalize to NFD, followed by diacritic
removal and a few custom character expansions/replacements.
* Fix Mac build
* Improve compatibility with previous code
One intentional change is folding O with stroke to
oe instead of o.
- Use more powerful NFKD instead of NFD
- strip punctuation after decomposition since it can generate
new punctuation
- Add compatibility test for old asciify() method
- Add some graphically similar characters to substitution table
* Add oe character/ligature & more long S forms
* More tests for ligatures and Latin Extended
* Add Latin-1 Supplement tests
Fixes #1161
This change parallels what was done in #12571da3c00 to fix
the FingerprintKeyer and moves the diacritic removal before
the deduping. Includes a test.
* Truncate any completely empty columns on the right
Fixes #565
The current versions of Open Office create default spreadsheets
with over 1000 empty columns. Keep track of the rightmost
non-empty column when importing and truncate everything else.
Also adds a basic ODS import test.
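A minimal sketch of tracking the rightmost non-empty column, using plain lists of strings in place of the importer's spreadsheet model (all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "truncate completely empty columns on the right".
final class TrimEmptyColumnsSketch {
    static List<List<String>> truncateRight(List<List<String>> rows) {
        int rightmost = -1; // index of the rightmost non-empty column seen so far
        for (List<String> row : rows) {
            // Only cells to the right of the current maximum can raise it.
            for (int c = row.size() - 1; c > rightmost; c--) {
                String cell = row.get(c);
                if (cell != null && !cell.isEmpty()) {
                    rightmost = c;
                    break;
                }
            }
        }
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            out.add(new ArrayList<>(row.subList(0, Math.min(row.size(), rightmost + 1))));
        }
        return out;
    }
}
```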
* Fix dates in ODS spreadsheets
Fixes #2224
* Performance optimized version of ToNumber
Approximately 5x faster for floats (data dependent)
and about the same speed for integers.
- Instead of blindly trying to parse as Long, do a quick check
for obvious problems (e.g. decimal point).
- Don't trim. It's already done by called methods.
- Use valueOf() instead of parse() to avoid object creation
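The approach in the list above might look roughly like this. It is a sketch with assumed names and an assumed set of "obvious problem" characters, not the actual ToNumber function.

```java
// Hypothetical sketch of the quick pre-check before integer parsing.
final class ToNumberSketch {
    static Number toNumber(String s) {
        if (looksLikeInteger(s)) {
            try {
                return Long.valueOf(s);
            } catch (NumberFormatException e) {
                // Not a valid long after all (e.g. overflow); fall through to Double.
            }
        }
        // Throws NumberFormatException if the string is not numeric at all.
        return Double.valueOf(s);
    }

    // Cheap scan for characters that rule out a long (assumed: '.', 'e', 'E'),
    // so we skip the cost of a failed parse and its exception.
    private static boolean looksLikeInteger(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.' || c == 'e' || c == 'E') {
                return false;
            }
        }
        return !s.isEmpty();
    }
}
```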
* Add Java Microbenchmark Harness
The shaded JAR is missing the OpenRefine classes, for a reason
that I haven't figured out, so requires openrefine-main.jar at runtime.
* Remove old implementations of ToNumber
* Remove unneeded dependencies from main project
* Clean up and reformat
Refs #2863
The tree importer sorts columns/column groups by how populated
they are, which is of arguable utility, but the tie-breaker
of ordering by shortest column name is completely silly.
This change removes that and, in conjunction with a stable sort
algorithm, will preserve the original order of the columns.
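To illustrate why dropping the tie-breaker preserves the original order, here is a sketch with a hypothetical Column type. Collections.sort is documented as stable, so columns with equal counts keep their input order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; the tree importer's column model differs.
final class StableColumnOrderSketch {
    static final class Column {
        final String name;
        final int nonBlankCount;

        Column(String name, int nonBlankCount) {
            this.name = name;
            this.nonBlankCount = nonBlankCount;
        }
    }

    static List<String> order(List<Column> columns) {
        List<Column> sorted = new ArrayList<>(columns);
        // Most populated first; no secondary key, so ties fall back to input order.
        Collections.sort(sorted, Comparator.comparingInt((Column c) -> c.nonBlankCount).reversed());
        List<String> names = new ArrayList<>();
        for (Column c : sorted) {
            names.add(c.name);
        }
        return names;
    }
}
```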
* Fix two deprecated method usages
* Test ToNumber conversions
* Test behavior of all functions when passed 0 or 8 arguments
There are 16 which fail currently on 0 args (return null or
False instead of EvalError), but have been whitelisted until
we can verify whether it's safe to change them without introducing
compatibility issues.
There are 19 which fail to return an error on too many (ie 8) args.
No issue.
- we don't support Excel95, but make sure that it generates an exception
- move the test data file into the appropriate directory
- for any normal test, consider exceptions a failure
Fixes #2824
Versions up through 3.14.0 appear to work, but since odfdom bundles
Jena 3.9.0, we're going to be conservative and match that.
As an added bonus, includes a blank node test which will trigger
the failure.
* Fix charset encoding & MIME type handling
Character set (ie what we call "encoding") is part of the Content-Type,
*not* the Content-Encoding, which specifies compression (e.g. gzip).
This correctly sets the character set encoding as well as cleaning
the MIME type so that additional parsing doesn't need to be done
downstream (and removes that code).
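A simplified sketch of splitting a Content-Type header into a clean MIME type and a character set. The class is hypothetical and the parsing is deliberately naive (no handling of escaped quotes or other parameters).

```java
import java.util.Locale;

// Hypothetical illustration; not the importing code itself.
final class ContentTypeSketch {
    final String mimeType; // lower-cased, parameters removed
    final String charset;  // null when the header carries no charset parameter

    private ContentTypeSketch(String mimeType, String charset) {
        this.mimeType = mimeType;
        this.charset = charset;
    }

    static ContentTypeSketch parse(String header) {
        String[] parts = header.split(";");
        String mime = parts[0].trim().toLowerCase(Locale.ROOT);
        String charset = null;
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            // Case-insensitive match on the parameter name "charset=".
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                charset = param.substring(8).trim().replace("\"", "");
            }
        }
        return new ContentTypeSketch(mime, charset);
    }
}
```

Content-Encoding (e.g. gzip) is not consulted here, since it describes compression rather than the character set.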
* Use "text" instead of "text/line-based" as default fallback format
The TextLineBasedGuesser only tries a limited number of
formats (CSV, TSV, fixed), so we can't get out of that hole to
find JSON, XML, etc.
Start with a more general format instead to improve our
guessing odds.
* Support content type Structured Name Syntax Suffixes (+json +xml)
If we can't find a fully specified content type in our lookup,
fall back to just the suffix (which is registered with a leading +)
Fixes #2800. Fixes #2805
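The suffix fallback can be sketched as a two-step lookup. The table contents and format names below are made up for illustration; only the "+suffix" registration convention comes from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table; the real registry lives in the importing framework.
final class SuffixFallbackSketch {
    private static final Map<String, String> FORMATS = new HashMap<>();
    static {
        FORMATS.put("application/json", "text/json");
        FORMATS.put("+json", "text/json"); // suffix entries keep their leading '+'
        FORMATS.put("+xml", "text/xml");
    }

    static String formatFor(String mimeType) {
        String format = FORMATS.get(mimeType);
        if (format == null) {
            // No exact match: retry with just the structured syntax suffix.
            int plus = mimeType.lastIndexOf('+');
            if (plus >= 0) {
                format = FORMATS.get(mimeType.substring(plus));
            }
        }
        return format; // null when neither lookup succeeds
    }
}
```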
* Harden reconciliation - Fixes #2590
- check for non-JSON / unparseable JSON returns
- handle malformed results response with no name for candidates
- catch any Exception, not just IOExceptions
- call processManager.onFailedProcess() for cleanup on error
* Add default constructor for Jackson
Jackson complains about needing a default constructor for the
NON_DEFAULT annotation, but I'm not sure why this worked before.
* Clean up indentation and unused variable - no functional changes
Make indentation consistent throughout the module, changing recently
added lines to use the standard all spaces convention.
Remove unused count variable
* Simplify control flow
* Update limit parameter comment. No functional change.
* Replace ternary expression which is causing NPE
* Add reconciliation tests using mock HTTP server
* Fixes #486. Builds on code from Steffen Stundzig
- Switch from ICU4J to juniversalchardet
(Java port of Mozilla charset detector)
- Replace org.json code with Jackson
- Add tests
- Add TODO for multi-file character encoding mismatches
* Restore dependency lost in rebase
Co-authored-by: Steffen Stundzig <git@stundzig.de>