Commit Graph

1139 Commits

Author SHA1 Message Date
Jathin Sreenivas
4d106b118f
Templating exporter should have clear error reporting (#3945)
* Templating exporter should have clear error reporting

* Using i18n for message and removed the unnecessary null check

* Removed usage of bindings

* Adding tests for grel, renamed ParsetTests to TemplatingParserTests

* Regex to test the keys of the template exporter

* Cancel changes to the templating parser

Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
2021-11-25 21:19:10 +01:00
Antonin Delpeuch
3f5a4bd900
Fix date parsing to avoid offsets due to DST. (#4295)
This pull request comes without tests because there is no clean way
to set the system timezone in a Java unit test:
https://stackoverflow.com/questions/23466218/setting-timezone-for-maven-unit-tests-on-java-8

The problem came from the fact that we were interpreting all dates with the current
zone *offset*, which varies during the year depending on DST, meaning that during the winter
we would interpret summer dates in a winter timezone, which does not make sense.
This changes the date conversion mechanism to only rely on the current zone, making sure
the correct offset is used depending on the value of the converted date.
2021-11-14 10:29:46 +01:00
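The difference between converting with the current zone *offset* and converting with the current *zone* can be sketched with `java.time`. This is a hedged illustration, not the code from the commit: the class and method names are hypothetical, and the zone is passed in explicitly (`Europe/Paris`) so the DST switch is visible without touching the system timezone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class DstAwareConversion {
    // Problematic variant: applies the offset that is valid at `now` to every date,
    // so a summer date converted in winter gets the winter offset.
    static OffsetDateTime withCurrentOffset(LocalDateTime local, ZoneId zone, Instant now) {
        ZoneOffset offsetNow = zone.getRules().getOffset(now);
        return local.atOffset(offsetNow);
    }

    // Corrected variant: the zone rules pick the offset valid at the converted date itself.
    static OffsetDateTime withZoneRules(LocalDateTime local, ZoneId zone) {
        return local.atZone(zone).toOffsetDateTime();
    }

    public static void main(String[] args) {
        ZoneId paris = ZoneId.of("Europe/Paris");
        LocalDateTime summerDate = LocalDateTime.of(2021, 7, 1, 12, 0);
        Instant winterNow = Instant.parse("2021-11-14T10:00:00Z");
        System.out.println(withCurrentOffset(summerDate, paris, winterNow)); // 2021-07-01T12:00+01:00
        System.out.println(withZoneRules(summerDate, paris));                // 2021-07-01T12:00+02:00
    }
}
```

Passing the zone as a parameter is also one way around the testing problem mentioned above, since the conversion no longer depends on the JVM's default timezone.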
nikhilp3
c92d745af3
Display error for unsupported compression file type (#4286)
Closes #4260.
2021-11-13 13:03:49 +01:00
nmamoon
debc4e65c8
More explicit error messages for GREL functions with unexpected arguments (#4255)
Fixes #4193.
2021-11-09 19:13:17 +01:00
Antonin Delpeuch
30a0f6643d
Fix vulnerability in imported filename allocation. (#4237)
Follow up for https://github.com/OpenRefine/OpenRefine/pull/3048.
Code change suggested by https://github.com/Marcono1234.

Closes #3043.
2021-10-23 09:01:14 +02:00
Antonin Delpeuch
235b5957e2
Change the behaviour of failed reconciliations to not reconcile the cell at all (#4232)
* Change the behaviour of failed reconciliations to not reconcile the cell at all.

This makes it easier to tell apart cells which could not be reconciled due to
a network error and those which just did not have any reconciliation candidate.
This makes it possible to retry reconciling cells which have been left unreconciled
after a recon operation.

Closes #3369.

* Update StandardReconConfigTests with new behaviour for failed recons
2021-10-22 08:43:45 +02:00
Pablo Castellano
3bca46329c
Update wikidata reconciliation endpoint (#4206) 2021-10-20 14:45:57 +02:00
Antonin Delpeuch
c71dee891c
Fix Recon id generation. (#4217)
The previous method was prone to creating collisions when a lot of recon ids
were created around the same time.

Closes #3785.
2021-10-20 11:28:42 +02:00
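The commit message does not say how recon ids were generated before or after the fix. As a hedged illustration of why clock-derived ids collide when many are created at once, the sketch below contrasts a hypothetical timestamp-based generator with a random one; neither is claimed to be OpenRefine's implementation.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.LongSupplier;

class ReconIdSketch {
    // Hypothetical clock-derived id: every call within the same millisecond collides.
    static long timeBasedId() {
        return System.currentTimeMillis();
    }

    // Hypothetical random id drawn from [0, Long.MAX_VALUE); by the birthday bound,
    // collisions stay very unlikely even for millions of ids.
    static long randomId() {
        return ThreadLocalRandom.current().nextLong(Long.MAX_VALUE);
    }

    // Generate n ids and count how many distinct values came out.
    static int distinctCount(LongSupplier generator, int n) {
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(generator.getAsLong());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("time-based distinct: " + distinctCount(ReconIdSketch::timeBasedId, n));
        System.out.println("random distinct:     " + distinctCount(ReconIdSketch::randomId, n));
    }
}
```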
Antonin Delpeuch
6960b7db66
Increase socket timeout for reconciliation. Closes #3987 (#4068) 2021-10-16 09:08:56 +02:00
Warpeas
fed23ec7f6
fixes #3462 (#3921)
Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
2021-05-30 22:24:06 +02:00
Antonin Delpeuch
6b30a7e13c Set version to 3.6-SNAPSHOT 2021-05-29 13:06:04 +02:00
Antonin Delpeuch
1b9907e20c Set version to 3.5-beta1 2021-05-29 09:23:23 +02:00
Warpeas
3adc03c0db
Fix rotation parsing in scatterplot facet (#3926)
* fixes #3344

* Remove try/catch block no longer useful
2021-05-26 14:08:42 +02:00
Antonin Delpeuch
36589e5b1c
Make sure POST requests are retried even when connections are closed. (#3875)
By default, the Apache HTTP client does not retry POST requests in those cases
because they are not deemed idempotent. But for reconciliation they are, so they
should be retried.

Closes #3664.
2021-05-15 08:52:23 +02:00
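In Apache HttpClient 5 this decision belongs to the client's request retry strategy, which by default does not resend non-idempotent methods such as POST. The stdlib-only sketch below models just that decision and the retry loop around it. It is not the HttpClient API; `RetrySketch`, `mayRetry` and the `treatAllAsIdempotent` flag are hypothetical names.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.Callable;

class RetrySketch {
    // Methods that RFC 7231 defines as idempotent; a generic client retries only these.
    static final Set<String> IDEMPOTENT_METHODS =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    // May a request that failed with an I/O error be sent again?
    // treatAllAsIdempotent models the reconciliation case, where repeating a POST is harmless.
    static boolean mayRetry(String method, int retriesSoFar, int maxRetries, boolean treatAllAsIdempotent) {
        if (retriesSoFar >= maxRetries) {
            return false;
        }
        return treatAllAsIdempotent || IDEMPOTENT_METHODS.contains(method.toUpperCase(Locale.ROOT));
    }

    // Run `call`, re-running it after an IOException for as long as mayRetry allows.
    static <T> T execute(String method, Callable<T> call, int maxRetries, boolean treatAllAsIdempotent)
            throws Exception {
        int retries = 0;
        while (true) {
            try {
                return call.call();
            } catch (IOException e) {
                if (!mayRetry(method, retries, maxRetries, treatAllAsIdempotent)) {
                    throw e;
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulate a connection that is closed twice before the POST succeeds.
        String result = execute("POST", () -> {
            if (attempts[0]++ < 2) {
                throw new IOException("connection closed");
            }
            return "candidates";
        }, 3, true);
        System.out.println(result + " after " + attempts[0] + " attempts"); // candidates after 3 attempts
    }
}
```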
Antonin Delpeuch
e664712a70
Make schema evaluation null-proof; avoid adding null values in data extension. (#3723)
Null values in non-null cells should normally not happen, but sometimes they do and
we shouldn't fail miserably in those cases.

Closes #2880.

Co-authored-by: Tom Morris <tfmorris@gmail.com>
2021-05-15 08:51:26 +02:00
Antonin Delpeuch
787c272fe0
Fix date parsing for XLS and ODS files to avoid timezone-dependency. (#3825)
Closes #3740.
2021-05-15 08:51:04 +02:00
Antonin Delpeuch
bbec28e67d
Restore log levels after Jetty update (#3827)
* Restore log levels after Jetty update

* Merge the two log4j.properties files
2021-05-15 08:50:38 +02:00
Antonin Delpeuch
577caabec1
Add exponential backoff retries for reconciliation calls. (#3770)
* Update test to demonstrate retry mechanism for recon queries.

This was fixed earlier by #3237. Closes #3369.

* Ensure non-zero retry interval in HttpClient

* Restore original bound
2021-04-01 17:10:17 +02:00
gitonthescene
5bbc4ae304
Support export filenames with non-ASCII chars. (fixes #3724) (#3736)
Co-authored-by: Douglas Mennella <douglas.mennella@gmail.com>
2021-03-17 08:28:21 +01:00
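The commit does not spell out the mechanism. One standard way to carry a non-ASCII download filename over HTTP is the `filename*` parameter of `Content-Disposition` (RFC 6266, using the RFC 8187 encoding), sent alongside an ASCII fallback. The sketch below assumes that approach; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ContentDispositionSketch {
    // Percent-encode the UTF-8 bytes of s, leaving only RFC 3986 unreserved characters
    // (a conservative subset of the characters RFC 8187 allows unencoded).
    static String percentEncodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // ASCII-only `filename` for old clients plus a UTF-8 `filename*` for current ones.
    static String contentDisposition(String filename) {
        String asciiFallback = filename.replaceAll("[^\\x20-\\x7E]", "_")
                .replace("\"", "_").replace("\\", "_");
        return "attachment; filename=\"" + asciiFallback + "\"; filename*=UTF-8''"
                + percentEncodeUtf8(filename);
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("caf\u00E9 export.csv"));
        // attachment; filename="caf_ export.csv"; filename*=UTF-8''caf%C3%A9%20export.csv
    }
}
```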
sarahelshabrawy
cf58dc4cdc
Enable hyperlinks in in-tool Help field (#3648) 2021-02-20 12:56:39 +01:00
allanaaa
713a8f5b72
Edits to GREL Help text in-tool (#3649)
Improvements to GREL Help text in-tool and minor updates to GREL reference. Plus removal of pointless tests

Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
Co-authored-by: Owen Stephens <owen@ostephens.com>
2021-02-18 17:52:32 +00:00
Tom Morris
14f43dc2cc
Refactor HTTP code into common module & Improve Fetch URL - fixes #3129 (#3237)
* Refactor HTTP code into a common utility class 

Centralizes the six (slightly) different implementations to use
a common Apache HTTP Client 5 implementation which implements our
strategies for retries, timeouts, error handling, etc.

Apache HTTP Client 5 adds support for Retry-After headers, HTTP/2,
and a bunch of other stuff under the covers.

Moves request delay to a request interceptor and fixes calculation
of the delay (again). Increase retries from 1x to 3x and use delay*2
as the default retry interval, if no Retry-After header. Uses an 
exponential backoff strategy for multiple retries.

* Reuses HTTP client across requests
* Use IOException instead of Exception for HTTP errors
2020-12-07 00:38:36 -05:00
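The retry timing described above (`delay*2` as the default interval when there is no `Retry-After` header, growing exponentially over multiple retries) can be sketched as a pure function. This is an illustration under assumptions: the doubling factor, the 100 ms floor (echoing "Ensure non-zero retry interval" in #3770) and all names are hypothetical, and only the delta-seconds form of `Retry-After` is handled, not the HTTP-date form.

```java
import java.util.OptionalLong;

class BackoffSketch {
    // Hypothetical floor so that a configured request delay of 0 still yields a pause.
    static final long MIN_INTERVAL_MS = 100;

    // Wait before retry number `attempt` (1-based). A Retry-After value (delta-seconds form
    // only) wins; otherwise start from twice the request delay and double on each retry.
    static long retryDelayMillis(int attempt, long requestDelayMs, OptionalLong retryAfterSeconds) {
        if (retryAfterSeconds.isPresent()) {
            return retryAfterSeconds.getAsLong() * 1000;
        }
        long base = Math.max(requestDelayMs * 2, MIN_INTERVAL_MS);
        return base << (attempt - 1); // base, 2 * base, 4 * base, ...
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("retry " + attempt + ": "
                    + retryDelayMillis(attempt, 500, OptionalLong.empty()) + " ms");
        }
        // retry 1: 1000 ms, retry 2: 2000 ms, retry 3: 4000 ms
        System.out.println("with Retry-After: 7 -> "
                + retryDelayMillis(1, 500, OptionalLong.of(7)) + " ms"); // 7000 ms
    }
}
```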
Mahesh Jindal
4f97fd55a5
Fix: Prevent adding whitespace-only cells when importing XML data, with tests #1095 (#3357)
* Fix: Prevent adding whitespace-only cells when importing XML data, with tests: Issue #1095

* Chore: Use 'CharMatcher' to match whitespace instead of a custom regex: Issue #1095
2020-12-02 18:11:45 +01:00
Thad Guidry
3d30897b3b Add Wynn to fingerprint to support Old English texts 2020-11-28 21:40:49 -05:00
rachittiwari8562
990540ce10
Fix #3330: argument checking in phonetic GREL function (#3345)
* Fix for issue #3330 phonetic-function

* Update main/src/com/google/refine/expr/functions/strings/Phonetic.java

Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>

* Corrected Indentation

Corrected indentation as suggested.

* Added tests to check invalid parameters

* Added tests to check invalid parameters

Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
2020-11-22 09:23:06 +01:00
Thad Guidry
6c9ad3f31d
Improve documentation of reinterpret GREL function (#3315)
* add clarity for reinterpret docs

Helps fix #3292

* update reinterpret docs phrasing

We agreed to use "encoding" in user-facing messages instead of "encoder" and "decoder", which are used internally.

* fix serializeReinterpret() test json
2020-11-05 07:57:12 +01:00
Tom Morris
e809c707ff Fix up Javadoc 2020-11-03 15:44:07 -05:00
Tom Morris
fb96d22dec Revert "Fix Reinterpret missing documentation"
This reverts commit f7cece6103.
2020-11-03 14:49:00 -05:00
Thad Guidry
f7cece6103 Fix Reinterpret missing documentation 2020-11-03 09:42:33 -06:00
rubyAnne
352127558a
changed to reflect the function's acceptance of either simple string … (#3294)
* changed to reflect the function's acceptance of either simple string or regex

* cast p into a Pattern

* cast p into a Pattern

* Changed test to reflect the new output from function.
2020-11-01 08:52:36 +01:00
Tom Morris
c8220d687e
Improve fingerprint keyers - fixes #3282 (#3283)
* Add more keyer tests

- All forms of Unicode whitespace for both fingerprint & N-gram fingerprint
- additional N-gram fingerprint cases

* Improve fingerprint keyers

- Update N-gram fingerprint keyer to match (missed last time)
- refactor string normalization to reduce redundancy between two keyers
- add C1 controls to control characters that are stripped
- include all Unicode whitespace characters in splitting delimiter
  and don't strip controls which are whitespace (HT, LF, VT, FF, CR,
  NEL)
- minor cleanups, simplifications, and performance optimizations
2020-10-25 20:32:30 +01:00
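A minimal sketch of a fingerprint key along the lines described above, using only `java.text.Normalizer` and `java.util.regex`: it lowercases, decomposes and drops diacritics, removes punctuation and non-whitespace C0/C1 controls, splits on any Unicode whitespace, then sorts and de-duplicates the tokens. It is not OpenRefine's keyer; the class name and the exact order of steps are assumptions.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.TreeSet;

class FingerprintSketch {
    static String key(String value) {
        // Lowercase, then NFKD-decompose so accents become separate combining marks.
        String s = Normalizer.normalize(value.toLowerCase(Locale.ROOT), Normalizer.Form.NFKD);
        // Drop combining marks (diacritics) and punctuation.
        s = s.replaceAll("\\p{M}+", "").replaceAll("\\p{P}+", "");
        // Drop C0/C1 controls, except those that are whitespace (HT, LF, VT, FF, CR, NEL).
        s = s.replaceAll("[\\p{Cc}&&[^\\t\\n\\u000B\\f\\r\\u0085]]", "");
        // Split on any Unicode whitespace ((?U) makes \s match White_Space),
        // then sort and de-duplicate the tokens.
        TreeSet<String> tokens = new TreeSet<>();
        for (String token : s.split("(?U)\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(key("Caf\u00E9 de Paris"));  // cafe de paris
        System.out.println(key("paris\u00A0 CAFE, de")); // cafe de paris
    }
}
```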
Tom Morris
af1cf375f5
Add Java runtime name & version to About page - fixes #3240 (#3244)
* Visually center links box

* Add the Java runtime info to the About page - fixes #3240

- Add the Java runtime name & version to GetVersionCommand
- Add the returned information to the About page
2020-10-02 09:41:37 +02:00
Tom Morris
959200d141 Maintain order for uniques() - fixes #3235
Also add tests
2020-09-30 17:45:24 -04:00
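Order-preserving de-duplication of the kind `uniques()` needs can be sketched with `LinkedHashSet`. Whether the GREL function uses exactly this collection is an assumption; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class UniquesSketch {
    // LinkedHashSet drops duplicates but iterates in first-insertion order,
    // unlike HashSet, whose iteration order is unspecified.
    static <T> List<T> uniques(List<T> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        System.out.println(uniques(Arrays.asList("b", "a", "b", "c", "a"))); // [b, a, c]
    }
}
```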
Thad Guidry
3f6d1eabba
Adds new Jsoup wholeText() function and tests (#3181)
* Adds new Jsoup wholeText() function and tests
- Ref: https://github.com/jhy/jsoup/blob/master/CHANGES#L275
- Ref: https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#wholeText()

* update the description of function

* Update main/src/com/google/refine/expr/functions/xml/WholeText.java
2020-09-12 16:14:26 +02:00
Tom Morris
eaf881ced7
Importer refactoring cleanup (#2984)
* Clean up importer refactoring

Remove an extra copy of filename setting.
Revert some additional API changes (retaining both versions)

* Revert archive file name changes & mark as deprecated
2020-09-06 17:46:08 -04:00
Tom Morris
15a069d3d5
Improve exception reporting - refs #3145 (#3146)
Include the exception name with the message returned to the user.
2020-09-06 08:26:49 +02:00
Tom Morris
a86a6d4e3b
sort() handles nulls instead of throwing NPE - fixes #3152 (#3162)
* Add utility helpers to create array of comparable items

* Extend sort() to handle arrays with nulls

- Instead of NullPointerException on nulls, sort them last
- add JSON helpers to return Comparable[] in addition to Object[]
- Non-homogeneous arrays or arrays with non-primitive
  objects (array or object) are not sortable
- Add tests for both new and old sort functionality
2020-09-05 23:01:47 +02:00
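Sorting nulls last instead of throwing can be expressed with `Comparator.nullsLast`. The sketch below is a hedged illustration of the behaviour described above, not the GREL `sort()` implementation; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortSketch {
    // Sort a copy of the array in natural order, placing nulls after all other values
    // instead of letting the comparison throw a NullPointerException.
    static <T extends Comparable<? super T>> T[] sortNullsLast(T[] values) {
        T[] copy = Arrays.copyOf(values, values.length);
        Arrays.sort(copy, Comparator.nullsLast(Comparator.<T>naturalOrder()));
        return copy;
    }

    public static void main(String[] args) {
        Integer[] numbers = {3, null, 1, 2};
        System.out.println(Arrays.toString(sortNullsLast(numbers))); // [1, 2, 3, null]
    }
}
```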
Tom Morris
37ae9a3d51
Merge pull request #3156 from OpenRefine/3132-recon-read-timeout
Set read timeout to 60 sec for reconciliation.
2020-09-02 13:16:17 -04:00
Antonin Delpeuch
609e4bac4c Set read timeout to 60 sec for reconciliation. Closes #3132 2020-09-02 16:03:29 +02:00
Tom Morris
aa43445c99
Extend forEach() to support JSON objects (#3150)
* Refactor GREL Get tests

- move helper up to RefineTest
- move tests to the correct module

* Extend forEach() to support JSON objects - fixes #3149

Also add tests for existing forEach forms in addition to the new one

* Add a couple more tests
2020-08-30 08:40:17 +02:00
Tom Morris
95756bf11f Replace deprecated constant 2020-08-23 14:17:40 -04:00
Antonin Delpeuch
9ac54edbba
Migrate reconciliation calls to Apache HTTP client (#2906)
* Migrate reconciliation calls to OkHTTP, for #2903

* Migrate to Apache HTTP Commons

* Migrate data extension to Apache HTTP client

* Deprecate HttpURLConnection in RefineServlet

* Use LaxRedirectStrategy, clean up imports

* Remove read and pool timeouts, only keep the connection timeout

* Adapt mocking of HTTP calls after migration
2020-08-23 14:04:59 +02:00
Tom Morris
fc21d58ed1
Don't count TABs as control characters - fixes #3061 (#3068)
* Don't count TABs as control characters - fixes #3061

* Add TSV test. Replace info logging w/assert message
2020-08-16 10:35:25 +02:00
Tom Morris
9c403d59d2
Add separator to zip slip check - fixes #3043 (#3048) 2020-08-09 14:48:55 +02:00
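The "zip slip" check that #3048 adjusts can be sketched as follows. This is an illustration, not OpenRefine's code, and the class and method names are hypothetical. The key point is the trailing separator: a plain prefix test on canonical paths would accept a sibling directory whose name merely starts with the destination's name.

```java
import java.io.File;
import java.io.IOException;

class ZipSlipSketch {
    // Resolve an archive entry name against the destination directory and reject it
    // if the canonical result lies outside that directory.
    static File safeResolve(File destDir, String entryName) throws IOException {
        String destPath = destDir.getCanonicalPath();
        File target = new File(destDir, entryName);
        String targetPath = target.getCanonicalPath();
        // Appending the separator matters: without it, ".../out" would also
        // accept ".../out-evil/x", which shares the same string prefix.
        if (!targetPath.startsWith(destPath + File.separator)) {
            throw new IOException("Entry is outside of the target directory: " + entryName);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        File dest = new File(System.getProperty("java.io.tmpdir"), "out");
        System.out.println(safeResolve(dest, "data/rows.csv"));
        try {
            safeResolve(dest, "../out-evil/payload");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```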
Tom Morris
55edae2b7b
Fix ToDate test failure & inefficiency - fixes #3026 (#3027)
* Fix ToDate test failure - fixes #3026

Instead of computing offset from UTC at current
point in time, use the offset from the parsed
date so that we're not affected by crossing
a daylight savings time boundary.

* Fix date parsing with locale as first format string

Also refactors for simplicity, restores some dropped tests,
and restores previous behavior of considering a bad
format string an error instead of silently ignoring it.

It does NOT address another issue which was introduced
in May 2018 of treating date/times without timezone
information as UTC instead of local.

* Restore error checking and messages

* Save & restore default timezone for tests

Also add some ToDos for places where LocalDate is being misused.
2020-08-09 13:53:43 +02:00
Tom Morris
83ed9ffdaf
Refactor importer APIs - Fixes #2963 (#2978)
* Make sure data directory is directory, not a file

* Add a test for zip archive import

Also tests the saving of the archive file name and source filename

* Add TODOs - no functional changes

* Cosmetic cleanups

* Revert importer API changes for archive file name parameter

Fixes #2963
- restore binary compatibility to the API
- hoist the handling of both fileSource and archiveFileName from
TabularImportingParserBase and TreeImportingParserBase to
ImportingParserBase so that there's only one copy. These 3 classes are
all part of the internal implementation, so there should be no
compatibility issue.

* Revert weird flow of control for import options metadata

This reverts the very convoluted control flow that was introduced
when adding the input options to the project metadata. Instead
the metadata is all handled in the importer framework rather than
having to change APIs and have individual importers worry about
it.

The feature never had test coverage, so that is still to be added.

* Add test for import options in project metadata & fix bug

Fixes bug where same options object was being reused and overwritten,
so all copies in the list ended up the same.
2020-07-23 18:36:14 +02:00
Tom Morris
a3fab26cca
Fix the text format guesser so it doesn't inappropriately guess WikiText (#2924)
* Fix text guesser so it doesn't guess wikitext

Fixes #2850
- Add simple magic detector for zip & gzip files to keep
  it from attempting to guess binary files
- Add a counter for C0 controls for the same reason
- Tighten wikitable counters to require marker at
  beginning of the line, per the specification
- Refactor to use Apache Commons instead of private
  counting methods
- Add tests for most TextGuesser formats

* Remove misplaced duplicate test data file

* Fix LGTM warning + minor cleanups

* Use BoundedInputStream to prevent runaway lines
2020-07-15 08:56:00 +02:00
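#2924 mentions a magic-number check for zip and gzip files and a C0-control counter, and #3068 stops counting TAB as a control character. A hedged sketch of both checks follows; the names are hypothetical, and excluding LF and CR alongside TAB is an assumption about what counts as ordinary text.

```java
import java.nio.charset.StandardCharsets;

class TextGuessSketch {
    // Zip local file headers start with "PK\3\4"; gzip streams start with 0x1F 0x8B.
    static boolean looksLikeZipOrGzip(byte[] head) {
        boolean zip = head.length >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
        boolean gzip = head.length >= 2 && (head[0] & 0xFF) == 0x1F && (head[1] & 0xFF) == 0x8B;
        return zip || gzip;
    }

    // Count C0 control characters, skipping TAB, LF and CR, which occur in ordinary text such as TSV.
    static int countC0Controls(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] tsv = "name\tcity\r\nAda\tLondon\r\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeZipOrGzip(tsv));                                     // false
        System.out.println(countC0Controls(new String(tsv, StandardCharsets.UTF_8)));     // 0
        System.out.println(looksLikeZipOrGzip(new byte[] {(byte) 0x1F, (byte) 0x8B, 8})); // true
    }
}
```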
Tom Morris
ed68541988
Remove informational logging from tests that are passing (#2923)
* Change logging from info to debug

* Make tests less chatty when they're passing
2020-07-14 17:47:36 +02:00
Tom Morris
306b541c69
Fix Excel date import - Fixes #1908 (#2909)
* Add utility functions to check/convert dates

* Add date tests and refactor to DRY up

* Fix date import - fixes #1908

Change from java.util.Date to OpenRefine 3.0+'s OffsetDateTime
Fixes #1908

* Centralize date conversion

* Moving utility methods to ParsingUtilities

* Fix tests
2020-07-09 23:13:44 +02:00
Tom Morris
0562638ffa
Use standard text normalization - fixes #2898 (#2900)
* Use standard text normalization - fixes #2898

Fixes #2898. Fixes #409. Refs #650

Replaces homegrown ISO Latin-1 only character substitution
with standard Java Normalize to NFD, followed by diacritic
removal and a few custom character expansions/replacements.

* Fix Mac build

* Improve compatibility with previous code

One intentional change is folding O with stroke to
oe instead of o.

- Use more powerful NFKD instead of NFD
- strip punctuation after decomposition since it can generate
  new punctuation
- Add compatibility test for old asciify() method
- Add some graphically similar characters to substitution table

* Add oe character/ligature & more long S forms

* More tests for ligatures and Latin Extended

* Add Latin-1 Supplement tests
2020-07-07 21:35:41 +02:00
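The normalization strategy described above (NFKD, then diacritic removal, then a few custom expansions for letters that do not decompose) can be sketched with `java.text.Normalizer`. This is a minimal illustration, not OpenRefine's `asciify()`: the expansion table is a small hypothetical subset (including O with stroke folded to "oe"), and punctuation handling is omitted.

```java
import java.text.Normalizer;
import java.util.Map;

class AsciifySketch {
    // Letters that NFKD leaves intact; a small, illustrative expansion table.
    static final Map<Character, String> EXPANSIONS = Map.of(
            '\u00F8', "oe", '\u00D8', "Oe",  // o with stroke
            '\u0153', "oe", '\u0152', "Oe",  // oe ligature
            '\u00E6', "ae", '\u00C6', "Ae",  // ae ligature
            '\u00DF', "ss",                  // sharp s
            '\u0142', "l");                  // l with stroke

    static String asciify(String s) {
        // NFKD splits accented letters into base letter + combining marks and
        // replaces compatibility characters (e.g. the "fi" ligature U+FB01 -> "fi").
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
        StringBuilder sb = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop diacritics
            }
            String expansion = EXPANSIONS.get(c);
            sb.append(expansion != null ? expansion : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(asciify("Cr\u00E8me br\u00FBl\u00E9e, Stra\u00DFe, \u00F8re")); // Creme brulee, Strasse, oere
    }
}
```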