Commit Graph

684 Commits

Author SHA1 Message Date
dependabot[bot]
968378a39c
build(deps-dev): bump eslint in /main/tests/cypress (#3640)
Bumps [eslint](https://github.com/eslint/eslint) from 7.19.0 to 7.20.0.
- [Release notes](https://github.com/eslint/eslint/releases)
- [Changelog](https://github.com/eslint/eslint/blob/master/CHANGELOG.md)
- [Commits](https://github.com/eslint/eslint/compare/v7.19.0...v7.20.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
Co-authored-by: Kush Trivedi <44091822+kushthedude@users.noreply.github.com>
2021-02-19 08:44:37 +01:00
allanaaa
713a8f5b72
Edits to GREL Help text in-tool (#3649)
Improvements to GREL Help text in-tool and minor updates to GREL reference. Plus removal of pointless tests

Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
Co-authored-by: Owen Stephens <owen@ostephens.com>
2021-02-18 17:52:32 +00:00
Florian Giroud
c5f3d700a3
test: Added tests for transpose functionalities, #3426 (#3650)
* test: Added tests for transpose functionalities, #3426
2021-02-18 15:15:29 +01:00
Kush Trivedi
0c5742771c
feat: add tests for rows-records mode (#3606)
Added tests for the view as row / view as record modes in the grid header
2021-02-18 11:16:34 +01:00
dependabot[bot]
64e75f79ad
build(deps): bump cypress from 6.4.0 to 6.5.0 in /main/tests/cypress (#3644)
Bumps [cypress](https://github.com/cypress-io/cypress) from 6.4.0 to 6.5.0.
- [Release notes](https://github.com/cypress-io/cypress/releases)
- [Changelog](https://github.com/cypress-io/cypress/blob/develop/.releaserc.base.js)
- [Commits](https://github.com/cypress-io/cypress/compare/v6.4.0...v6.5.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-02-16 19:02:34 +01:00
Kush Trivedi
1b02446b5c
fix: accidental non-linted files in repo (#3635)
Signed-off-by: kushthedude <kushthedude@gmail.com>
2021-02-14 19:30:23 +01:00
S-Harshit
389fa6257d
Add UX Test for controls at the row level (#3601)
* Add UX Test for controls at the row level
2021-02-12 14:16:33 +01:00
Akshita Singh
7bccdd1bcf
UI support for multiple hyperlinks in the same cell (#3597)
* fix for style, hyperlink, tests added

* fixes #2519 hyperlink issue and added tests

* fixes #2519, span test, typo indent fixed
2021-02-11 19:54:36 +01:00
Kush Trivedi
f2d2be1356
CI: add eslint workflow in the CI (#3602)
Signed-off-by: kushthedude <kushthedude@gmail.com>
2021-02-10 14:18:49 +01:00
Florian Giroud
a1209f9702
Fix Flaky Cypress tests, Issue 3594 (#3595)
* Attempt to fix flaky tests
* Increased Cypress retries, removed retries for openMode
2021-02-08 14:47:52 +01:00
Florian Giroud
382c030a7c
fix: UX tests linting (#3593) 2021-02-07 21:27:12 +01:00
S-Harshit
fb248495f1
Add UX Test for column / Text filter (#3566)
* Add UX Test for column / Text filter

fixes #3421
2021-02-07 21:05:36 +01:00
dependabot[bot]
c1cbc2bdd7
Bump cypress from 6.2.1 to 6.4.0 in /main/tests/cypress (#3551)
Bumps [cypress](https://github.com/cypress-io/cypress) from 6.2.1 to 6.4.0.
- [Release notes](https://github.com/cypress-io/cypress/releases)
- [Changelog](https://github.com/cypress-io/cypress/blob/develop/.releaserc.base.js)
- [Commits](https://github.com/cypress-io/cypress/compare/v6.2.1...v6.4.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-02-07 20:56:09 +01:00
Florian Giroud
7003dd2d2d
doc: Enriched UX testing documentation, #3573 (#3583) 2021-02-07 19:07:54 +01:00
Kush Trivedi
859828a0f0
feat: add ESLint configuration for cypress-test-suite (#3564)
* feat: Initialise ES-Lint for cypress
2021-02-07 18:53:13 +01:00
Antonin Delpeuch
a0bea8f8c7 Revert "Merge pull request #3588 from singhakshita/hyperlink-issue-tests"
This reverts commit 79aa260442, reversing
changes made to d6edf5ddb0.

See discussion at:
https://github.com/OpenRefine/OpenRefine/pull/3588
2021-02-07 09:21:26 +01:00
singhakshita
44d019a014
Update main/tests/cypress/cypress/integration/project/grid/column/edit-cells/common-transforms/proper-display.spec.js
Co-authored-by: Thad Guidry <thadguidry@gmail.com>
2021-02-06 21:35:20 +05:30
akshitasingh
002d795c43 tests fixed 2021-02-06 11:41:02 +05:30
akshitasingh
86d64bfb15 lint fix 2021-02-06 08:16:30 +05:30
akshitasingh
d0802fa0ce tests for hyperlink 2021-02-05 22:41:29 +05:30
Kush Trivedi
3692257aa1
feat: add tests for pagination and pagesize (#3550)
* feat: add tests for pagination and pagesize
2021-02-04 15:42:57 +01:00
Kush Trivedi
6750a45d53
feat: add prettier lint scripts and workflow (#3546)
* feat: add lint scripts and workflow

Signed-off-by: kushthedude <kushthedude@gmail.com>

* fix lint

Signed-off-by: kushthedude <kushthedude@gmail.com>
2021-02-04 15:37:03 +01:00
Florian Giroud
a5db3774f9
test: Added tests for expression panels, #3498 (#3535)
* test: Added tests for expression panels, #3498

* Cosmetic changes, renaming tests

* Refactored unique expressions used in tests
2021-02-02 09:09:25 +01:00
Florian Giroud
23be710f2c
test: Refactored the facet test, added more cases (#3534)
* test: Refactored the facet test, added more cases, added utility methods, #3419

* Update main/tests/cypress/cypress/integration/project/grid/column/facet/facets.spec.js

Co-authored-by: Thad Guidry <thadguidry@gmail.com>

* Update main/tests/cypress/cypress/integration/project/grid/column/facet/facets.spec.js

Co-authored-by: Thad Guidry <thadguidry@gmail.com>

* Update main/tests/cypress/cypress/integration/project/grid/column/facet/facets.spec.js

Co-authored-by: Thad Guidry <thadguidry@gmail.com>

* Removed assertion from utility method editCell

Co-authored-by: Thad Guidry <thadguidry@gmail.com>
2021-02-02 09:07:40 +01:00
Florian Giroud
3ce9292b66
test: Added test cases for edit Cells, #3423 (#3541) 2021-02-01 14:43:27 +01:00
Kush Trivedi
b8982d1d6e
tests: add test for filtering project through tags (#3521)
* tests: add test for filtering project through tags

Signed-off-by: Kush Trivedi <kushthedude@gmail.com>
2021-02-01 13:49:04 +01:00
Kush Trivedi
0b748bcc37
feat: add tests for sorting column (#3519)
* feat: add tests for sorting column
2021-02-01 13:40:40 +01:00
Kush Trivedi
7e81f9500b
fix: properly migrate to cypress 6.2.1 (#3533)
Signed-off-by: Kush Trivedi <kushthedude@gmail.com>
2021-01-28 12:36:02 +01:00
Florian Giroud
8c50a18a85
fix: Reverted Cypress to 6.2.1
Reverted Cypress, because of too many stability issues with the latest 6.3
2021-01-28 12:22:23 +01:00
Kush Trivedi
900ff2db9a
tests: UX Test for project list deletion, sort, filter (#3480)
* tests: UX Test for project list deletion, sort, filter

Signed-off-by: Kush Trivedi <kushthedude@gmail.com>
2021-01-28 11:45:05 +01:00
dependabot[bot]
951175a1f5
Bump cypress-file-upload from 5.0.1 to 5.0.2 in /main/tests/cypress (#3516)
Bumps [cypress-file-upload](https://github.com/abramenal/cypress-file-upload) from 5.0.1 to 5.0.2.
- [Release notes](https://github.com/abramenal/cypress-file-upload/releases)
- [Commits](https://github.com/abramenal/cypress-file-upload/compare/v5.0.1...v5.0.2)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-27 09:15:30 +01:00
Florian Giroud
a421447a8f
test: Improved the way we handle fixtures, #3505 (#3506)
* test: Improved the way we handle fixtures, #3505

* Added a fix for an invalid call to loadAndVisitProject, added cypress download path to gitignore
2021-01-25 21:05:03 +01:00
dependabot[bot]
61fe0ed458
Bump cypress-file-upload from 5.0.0 to 5.0.1 in /main/tests/cypress (#3503)
Bumps [cypress-file-upload](https://github.com/abramenal/cypress-file-upload) from 5.0.0 to 5.0.1.
- [Release notes](https://github.com/abramenal/cypress-file-upload/releases)
- [Commits](https://github.com/abramenal/cypress-file-upload/compare/v5.0.0...v5.0.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-22 12:23:39 +01:00
Kush Trivedi
2f375664f2
feat: introduce prettier style formatting for cypress test suite (#3494)
Signed-off-by: Kush Trivedi <kushthedude@gmail.com>
2021-01-21 14:08:01 +01:00
dependabot[bot]
1a2a5082d7
Bump fs-extra from 9.0.1 to 9.1.0 in /main/tests/cypress (#3500)
Bumps [fs-extra](https://github.com/jprichardson/node-fs-extra) from 9.0.1 to 9.1.0.
- [Release notes](https://github.com/jprichardson/node-fs-extra/releases)
- [Changelog](https://github.com/jprichardson/node-fs-extra/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jprichardson/node-fs-extra/compare/9.0.1...9.1.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-20 17:38:00 +01:00
dependabot[bot]
00f31eb758
Bump cypress from 6.2.1 to 6.3.0 in /main/tests/cypress (#3501)
Bumps [cypress](https://github.com/cypress-io/cypress) from 6.2.1 to 6.3.0.
- [Release notes](https://github.com/cypress-io/cypress/releases)
- [Changelog](https://github.com/cypress-io/cypress/blob/develop/.releaserc.base.js)
- [Commits](https://github.com/cypress-io/cypress/compare/v6.2.1...v6.3.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-20 14:45:39 +01:00
dependabot[bot]
c6f15ed653
Bump cypress-file-upload from 4.1.1 to 5.0.0 in /main/tests/cypress (#3493)
Bumps [cypress-file-upload](https://github.com/abramenal/cypress-file-upload) from 4.1.1 to 5.0.0.
- [Release notes](https://github.com/abramenal/cypress-file-upload/releases)
- [Commits](https://github.com/abramenal/cypress-file-upload/compare/v4.1.1...v5.0.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-18 22:27:12 +01:00
Florian Giroud
e2361fee56
Issue 3453, organized the test suite (#3456)
* Moved and renamed Cypress tests, created folders skeleton

* Updated syntax for assertions in sorting test

* Added test folders

* fix: cache all UI paths for cypress (#3454)

* fix: cache all UI paths for cypress

* Update pull_request.yml

* chore: restore caching in pr workflow (#3457)

Signed-off-by: Kush Trivedi <kushthedude@gmail.com>

Co-authored-by: Kush Trivedi <44091822+kushthedude@users.noreply.github.com>
2021-01-12 15:24:53 +01:00
dependabot[bot]
a6b7855ede
Bump cypress from 6.0.1 to 6.2.1 in /main/tests/cypress (#3448)
Bumps [cypress](https://github.com/cypress-io/cypress) from 6.0.1 to 6.2.1.
- [Release notes](https://github.com/cypress-io/cypress/releases)
- [Changelog](https://github.com/cypress-io/cypress/blob/develop/.releaserc.base.js)
- [Commits](https://github.com/cypress-io/cypress/compare/v6.0.1...v6.2.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-07 22:41:42 +01:00
Kush Trivedi
0e08c50e07
feat: introduce dependabot for cypress suite (#3447)
Signed-off-by: Kush Trivedi <kushthedude@gmail.com>
2021-01-07 08:37:24 +01:00
Kush Trivedi
fe123129d2
chore: align test-suite name with npm standards (#3439)
* chore: align test-suite name with npm standards

Signed-off-by: Kush Trivedi <kushthedude@gmail.com>

* chore: rename ci to openrefine

Signed-off-by: Kush Trivedi <kushthedude@gmail.com>

* chore: make requested changes

Signed-off-by: Kush Trivedi <kushthedude@gmail.com>
2021-01-04 12:28:48 +01:00
Florian Giroud
4b6106a386
Run UI tests in continuous integration (#3393)
* Fixed flaky tests

* Refactored ui_test command-line, added documentation

* Attempt to build a workflow with cypress

* Fixed CI UX tests build

* Changed Cypress actions for pull-request

* Merged Cypress workflow into the regular PR target workflow

* Refactored Github workflows to include Cypress Tests

* Revert Ci build to pull_request_target
2020-12-15 20:34:15 +01:00
Tom Morris
14f43dc2cc
Refactor HTTP code into common module & Improve Fetch URL - fixes #3129 (#3237)
* Refactor HTTP code into a common utility class 

Centralizes the six (slightly) different implementations to use
a common Apache HTTP Client 5 implementation which implements our
strategies for retries, timeouts, error handling, etc.

Apache HTTP Client 5 adds support for Retry-After headers, HTTP/2,
and a bunch of other stuff under the covers.

Moves request delay to a request interceptor and fixes calculation
of the delay (again). Increase retries from 1x to 3x and use delay*2
as the default retry interval, if no Retry-After header. Uses an 
exponential backoff strategy for multiple retries.

* Reuses HTTP client across requests
* Use IOException instead of Exception for HTTP errors
2020-12-07 00:38:36 -05:00
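The retry policy described in the commit above (three retries, `delay*2` as the default interval when no Retry-After header is present, exponential backoff across retries) can be sketched as follows. This is an illustrative Python sketch, not the Java code from the commit; the function name `retry_intervals`, its parameters, and the choice to double the wait on every attempt are assumptions.

```python
from typing import List, Optional


def retry_intervals(delay_ms: int, max_retries: int = 3,
                    retry_after_ms: Optional[int] = None) -> List[int]:
    """Return the wait in milliseconds before each retry attempt.

    Hypothetical helper: the names and the doubling factor are assumptions
    made for illustration, not taken from the OpenRefine source.
    """
    if delay_ms < 0 or max_retries < 0:
        raise ValueError("delay_ms and max_retries must be non-negative")
    # A server-supplied Retry-After value takes precedence over the default;
    # without one, the first retry waits twice the configured request delay.
    base = retry_after_ms if retry_after_ms is not None else delay_ms * 2
    # Exponential backoff: each later attempt waits twice as long as the last.
    return [base * (2 ** attempt) for attempt in range(max_retries)]
```

With a 500 ms request delay and no Retry-After header, the sketch yields waits of 1000, 2000, and 4000 ms.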
Mahesh Jindal
4f97fd55a5
Fix: Preventing addition of any empty cells with whitespaces while importing Xml Data with Tests #1095 (#3357)
* Fix: Preventing addition of any empty cells with whitespaces while importing Xml data with Tests : Issue #1095

* Chore: Using 'CharMatcher' to match whitespace pattern instead of using custom regex : Issue #1095
2020-12-02 18:11:45 +01:00
Tom Morris
2872ceeb7a
Merge pull request #3349 from thadguidry/fingerprint-wynn-thadguidry
Add Wynn to fingerprint to support Old English texts
2020-11-28 22:08:00 -05:00
Thad Guidry
3d30897b3b Add Wynn to fingerprint to support Old English texts 2020-11-28 21:40:49 -05:00
Florian Giroud
7950d764ff
Architecture for front end browser-based UI tests, issue #733 (#3340)
* Added Cypress tests to OpenRefine
* Installed Cypress
* Added a few tests to cover basic OR features

* Enriched language tests

* Enriched project_create tests

* Refactored and enriched undo/redo tests, added extract & apply

* Upgraded Cypress to 5.6.0

* Removed the cypress-dot-env plugin, as Cypress now supports nice configuration capabilities

* Added UX tests documentation

* Improved functional tests documentation, added license and description to tests package.json
2020-11-23 18:18:12 +01:00
rachittiwari8562
990540ce10
Fix #3330: argument checking in phonetic GREL function (#3345)
* Fix for issue #3330 phonetic-function

* Update main/src/com/google/refine/expr/functions/strings/Phonetic.java

Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>

* Corrected Indentation

Corrected indentation as suggested.

* Added tests to check invalid parameters

* Added tests to check invalid parameters

Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
2020-11-22 09:23:06 +01:00
Thad Guidry
6c9ad3f31d
Improve documentation of reinterpret GREL function (#3315)
* add clarity for reinterpret docs

Helps fix #3292

* update reinterpret docs phrasing

We agreed to use "encoding" to be friendly to user exposed messaging instead of "encoder" and "decoder" that is used internally.

* fix serializeReinterpret() test json
2020-11-05 07:57:12 +01:00
rubyAnne
352127558a
changed to reflect the function's acceptance of either simple string … (#3294)
* changed to reflect the function's acceptance of either simple string or regex

* cast p into a Pattern

* cast p into a Pattern

* Changed test to reflect the new output from function.
2020-11-01 08:52:36 +01:00
Tom Morris
c8220d687e
Improve fingerprint keyers - fixes #3282 (#3283)
* Add more keyer tests

- All forms of Unicode whitespace for both fingerprint & N-gram fingerprint
- additional N-gram fingerprint cases

* Improve fingerprint keyers

- Update N-gram fingerprint keyer to match (missed last time)
- refactor string normalization to reduce redundancy between two keyers
- add C1 controls to control characters that are stripped
- include all Unicode whitespace characters in splitting delimiter
  and don't strip controls which are whitespace (HT, LF, VT, FF, CR,
NEL)
- minor cleanups, simplifications, and performance optimizations
2020-10-25 20:32:30 +01:00
Tom Morris
30d16c2077 Add Thad/Owen's test 2020-09-30 20:57:57 -04:00
Tom Morris
dbb8e530c8 Add tests for array reverse & join 2020-09-30 20:24:05 -04:00
Tom Morris
d6e42bf5d9 Annotate another missed test 2020-09-30 20:00:08 -04:00
Tom Morris
959200d141 Maintain order for uniques() - fixes #3235
Also add tests
2020-09-30 17:45:24 -04:00
Tom Morris
bb4fc50f17 Enable missed test 2020-09-30 17:44:05 -04:00
Tom Morris
c76e2b9a46
disable flaky readTableWithLinks test - refs #3128 (#3207) 2020-09-23 07:41:22 +02:00
Thad Guidry
3f6d1eabba
Adds new Jsoup wholeText() function and tests (#3181)
* Adds new Jsoup wholeText() function and tests
- Ref: https://github.com/jhy/jsoup/blob/master/CHANGES#L275
- Ref: https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#wholeText()

* update the description of function

* Update main/src/com/google/refine/expr/functions/xml/WholeText.java
2020-09-12 16:14:26 +02:00
Tom Morris
eaf881ced7
Importer refactoring cleanup (#2984)
* Clean up importer refactoring

Remove an extra copy of filename setting.
Revert some additional API changes (retaining both versions)

* Revert archive file name changes & mark as deprecated
2020-09-06 17:46:08 -04:00
Tom Morris
a86a6d4e3b
sort() handles nulls instead of throwing NPE - fixes #3152 (#3162)
* Add utility helpers to create array of comparable items

* Extend sort() to handle arrays with nulls

- Instead of NullPointerException on nulls, sort them last
- add JSON helpers to return Comparable[] in addition to Object[]
- Non-homogenous arrays or arrays with non-primitive
  objects (array or object) are not sortable
- Add tests for both new and old sort functionality
2020-09-05 23:01:47 +02:00
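The sort() behavior listed in the commit above (nulls sorted last, non-homogeneous or nested arrays rejected) could look roughly like this. It is a Python sketch of the idea only; `sort_nulls_last` is a hypothetical name, and treating any mix of Python types as non-homogeneous is an assumption rather than the rule GREL applies.

```python
from typing import Any, List


def sort_nulls_last(values: List[Any]) -> List[Any]:
    """Sort a list, placing None entries at the end instead of failing."""
    non_null = [v for v in values if v is not None]
    # Assumption: "homogeneous" means every non-null item has the same type.
    kinds = {type(v) for v in non_null}
    if len(kinds) > 1:
        raise TypeError("array with mixed types is not sortable")
    # Nested arrays or objects are rejected as well.
    if any(isinstance(v, (list, dict)) for v in non_null):
        raise TypeError("array with non-primitive items is not sortable")
    null_count = len(values) - len(non_null)
    return sorted(non_null) + [None] * null_count
```

For example, `sort_nulls_last([3, None, 1, 2])` returns `[1, 2, 3, None]`.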
Tom Morris
aa43445c99
Extend forEach() to support JSON objects (#3150)
* Refactor GREL Get tests

- move helper up to RefineTest
- move tests to the correct module

* Extend forEach() to support JSON objects - fixes #3149

Also add tests for existing forEach forms in addition to the new one

* Add a couple more tests
2020-08-30 08:40:17 +02:00
Tom Morris
441c069bc5 Add some string function tests
Including a test for Apache TEXT-149 behavior change
https://github.com/apache/commons-text/pull/119

Add some more string function tests
2020-08-23 14:17:40 -04:00
Tom Morris
b5aea3b780 Remove unused imports 2020-08-23 14:17:40 -04:00
Tom Morris
a50669800f Split up multifunction test modules
Distributes the tests to individual modules per function and
deletes the former multifunction test modules.
2020-08-23 14:11:49 -04:00
Tom Morris
392a64b25e Refactor tests to hoist common methods into RefineTest
Moves the invoke() method and the associated fields into
the super class and deletes the redundant implementations.
2020-08-23 14:11:49 -04:00
Antonin Delpeuch
9ac54edbba
Migrate reconciliation calls to Apache HTTP client (#2906)
* Migrate reconciliation calls to OkHTTP, for #2903

* Migrate to Apache HTTP Commons

* Migrate data extension to Apache HTTP client

* Deprecate HttpURLConnection in RefineServlet

* Use LaxRedirectStrategy, clean up imports

* Remove read and pool timeouts, only keep the connection timeout

* Adapt mocking of HTTP calls after migration
2020-08-23 14:04:59 +02:00
Tom Morris
259705ad5f
assertEqualAsJson test helper refactor (#3113)
* Refactor test helper

Create a version of the assert that uses the standard parameter
order and deprecate the version that uses inverted order.

* Use consistent Assert class and parameter ordering
2020-08-23 11:04:44 +02:00
Tom Morris
fc21d58ed1
Don't count TABs as control characters - fixes #3061 (#3068)
* Don't count TABs as control characters - fixes #3061

* Add TSV test. Replace info logging w/assert message
2020-08-16 10:35:25 +02:00
Tom Morris
b73b480d7d
Remove tests of third party code (#3050)
Neither of these tests are testing OpenRefine code
(and a negative NotEquals test is useless anyway)
2020-08-10 12:39:30 +02:00
Tom Morris
55edae2b7b
Fix ToDate test failure & inefficiency - fixes #3026 (#3027)
* Fix ToDate test failure - fixes #3026

Instead of computing offset from UTC at current
point in time, use the offset from the parsed
date so that we're not affected by crossing
a daylight savings time boundary.

* Fix date parsing with locale as first format string

Also refactors for simplicity, restores some dropped tests,
and restores previous behavior of considering a bad
format string an error instead of silently ignoring it.

It does NOT address another issue which was introduced
in May 2018 of treating date/times without timezone
information as UTC instead of local.

* Restore error checking and messages

* Save & restore default timezone for tests

Also add some ToDos for places where LocalDate is being misused.
2020-08-09 13:53:43 +02:00
Tom Morris
a0819acbd6 Move ProjectManager initialization to beforeMethod 2020-08-03 20:42:31 -04:00
Tom Morris
52194e1685 Add https for all TestNG DTDs 2020-08-03 12:27:58 -04:00
Tom Morris
83ed9ffdaf
Refactor importer APIs - Fixes #2963 (#2978)
* Make sure data directory is directory, not a file

* Add a test for zip archive import

Also tests the saving of the archive file name and source filename

* Add TODOs - no functional changes

* Cosmetic cleanups

* Revert importer API changes for archive file name parameter

Fixes #2963
- restore binary compatibility to the API
- hoist the handling of both fileSource and archiveFileName from
TabularImportingParserBase and TreeImportingParserBase to
ImportingParserBase so that there's only one copy. These 3 classes are
all part of the internal implementation, so there should be no
compatibility issue.

* Revert weird flow of control for import options metadata

This reverts the very convoluted control flow that was introduced
when adding the input options to the project metadata. Instead
the metadata is all handled in the importer framework rather than
having to change APIs or have individual importers worry about
it.

The feature never had test coverage, so that is still to be added.

* Add test for import options in project metadata & fix bug

Fixes bug where same options object was being reused and overwritten,
so all copies in the list ended up the same.
2020-07-23 18:36:14 +02:00
Tom Morris
d5abaac6df
Update marc4j to 2.9.1 - Fixes #2962 (#2977)
* Add a MARC import test

* Make sure data directory is directory, not a file

* Update to marc4j 2.9.1 - fixes #2962
2020-07-22 22:12:30 +02:00
Tom Morris
f2e61b6628
Add tests for wide XLS/XLSX export (#2945)
Refs #2122. Also reenable a couple of disabled tests
2020-07-16 10:01:17 +02:00
Tom Morris
a3fab26cca
Fix the text format guesser so it doesn't inappropriately guess WikiText (#2924)
* Fix text guesser so it doesn't guess wikitext

Fixes #2850
- Add simple magic detector for zip & gzip files to keep
  it from attempting to guess binary files
- Add a counter for C0 controls for the same reason
- Tighten wikitable counters to require marker at
  beginning of the line, per the specification
- Refactor to use Apache Commons instead of private
  counting methods
- Add tests for most TextGuesser formats

* Remove misplaced duplicate test data file

* Fix LGTM warning + minor cleanups

* Use BoundedInputStream to prevent runaway lines
2020-07-15 08:56:00 +02:00
Tom Morris
561619399c
Fix order dependent NPE in LoadLanguage test (#2922)
* Ensure ProjectManager is initialized before test - fixes #2895

* Fix indentation (detabify)
2020-07-14 18:06:04 +02:00
Tom Morris
ed68541988
Remove informational logging from tests that are passing (#2923)
* Change logging from info to debug

* Make tests less chatty when they're passing
2020-07-14 17:47:36 +02:00
Tom Morris
306b541c69
Fix Excel date import - Fixes #1908 (#2909)
* Add utility functions to check/convert dates

* Add date tests and refactor to DRY up

* Fix date import - fixes #1908

Change from java.util.Date to OpenRefine 3.0+'s OffsetDateTime
Fixes #1908

* Centralize date conversion

* Moving utility methods to ParsingUtilities

* Fix tests
2020-07-09 23:13:44 +02:00
Tom Morris
0562638ffa
Use standard text normalization - fixes #2898 (#2900)
* Use standard text normalization - fixes #2898

Fixes #2898. Fixes #409. Refs #650

Replaces homegrown ISO Latin-1 only character substitution
with standard Java Normalize to NFD, followed by diacritic
removal and a few custom character expansions/replacements.

* Fix Mac build

* Improve compatibility with previous code

One intentional change is folding O with stroke to
oe instead of o.

- Use more powerful NFKD instead of NFD
- strip punctuation after decomposition since it can generate
  new punctuation
- Add compatibility test for old asciify() method
- Add some graphically similar characters to substitution table

* Add oe character/ligature & more long S forms

* More tests for ligatures and Latin Extended

* Add Latin-1 Supplement tests
2020-07-07 21:35:41 +02:00
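The normalization approach described in the commit above (compatibility decomposition with NFKD, removal of diacritics, then a few custom replacements such as folding o with stroke to "oe") can be illustrated with Python's standard `unicodedata` module. This is a sketch of the technique, not a port of the Java change; `fold_to_ascii_sketch` and the one-entry `CUSTOM_FOLDS` table are hypothetical, and the punctuation stripping mentioned in the commit is left out.

```python
import unicodedata

# Hypothetical substitution table. The commit mentions folding o with stroke
# to "oe"; a real table would carry more entries.
CUSTOM_FOLDS = {"ø": "oe"}


def fold_to_ascii_sketch(text: str) -> str:
    """Decompose with NFKD, drop combining marks, apply custom folds."""
    # NFKD splits accented letters into base letter + combining mark and
    # expands compatibility characters such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks have a non-zero canonical combining class.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Characters with no Unicode decomposition need an explicit replacement.
    return "".join(CUSTOM_FOLDS.get(ch, ch) for ch in stripped)
```

The custom table is needed because some letters, o with stroke among them, have no Unicode decomposition and so pass through NFKD unchanged.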
Urvashi Gupta
f62f63706c
Report HTTP error codes to the user when creating a project from a URL (#2870)
* HTTP Error

* urlImportingTestCompleted
2020-07-07 11:58:47 +02:00
Tom Morris
e61d50a1aa
Fix NGramFingerprintKeyer to ignore accents - fixes #1161 (#2899)
Fixes #1161
This change parallels what was done in #1257 1da3c00 to fix
the FingerprintKeyer and moves the diacritic removal before
the deduping. Includes a test.
2020-07-07 09:02:49 +02:00
Tom Morris
3717111db8
Fix Open Office Spreadsheet (ODS) dates (#2843)
* Truncate any completely empty columns on the right

Fixes #565
The current versions of Open Office create default spreadsheets
with over 1000 empty columns. Keep track of the rightmost
non-empty column when importing and truncate everything else.

Also adds a basic ODS import test.

* Fix dates in ODS spreadsheets

Fixes #2224
2020-07-04 08:42:33 +02:00
Antonin Delpeuch
f4692de9e1 Increase maximum wait for testInvalidUrl, follow-up for #2876 #2875 2020-07-03 21:48:43 +02:00
Tom Morris
5d6af9cb6c
Merge pull request #2865 from tfmorris/2863-tree-column-ordering
Remove shortest-column-name ordering - fixes #2863
2020-07-03 15:23:36 -04:00
Tom Morris
f5786afa35
Increase test timeout - fixes #2875 (#2876) 2020-07-03 21:20:01 +02:00
Tom Morris
d3db73aa67 Remove shortest-column-name ordering
Refs #2863
The tree importer sorts columns/column groups by how populated
they are, which is of arguable utility, but the tie-breaker
of ordering by shortest column name is completely silly.

This change removes that and, in conjunction with a stable sort
algorithm, will preserve the original order of the columns.
2020-07-02 16:12:55 -04:00
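The effect of dropping the name-length tie-breaker in the commit above can be shown with a stable sort: columns that are equally populated keep their original relative order. The Python sketch below is illustrative only; `order_columns`, the `(name, count)` input shape, and the most-populated-first direction are assumptions.

```python
from typing import List, Tuple


def order_columns(columns: List[Tuple[str, int]]) -> List[str]:
    """Order columns by how populated they are, preserving ties.

    Each input item is a hypothetical (column name, non-blank cell count) pair.
    """
    # sorted() is guaranteed stable, so equally populated columns stay in
    # their original order rather than being reordered by name length.
    ranked = sorted(columns, key=lambda column: -column[1])
    return [name for name, _count in ranked]
```

With the input `[("title", 5), ("id", 5), ("notes", 2), ("author", 9)]`, `title` stays ahead of `id` even though `id` is the shorter name.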
Tom Morris
28a9f68236
Unit test improvements (#2856)
* Fix two deprecated methods usages

* Test ToNumber conversions

* Test behavior of all functions when passed 0 or 8 arguments

There are 16 which fail currently on 0 args (return null or
False instead of EvalError), but have been whitelisted until
we can verify whether it's safe to change them without introducing
compatibility issues.

There are 19 which fail to return an error on too many (ie 8) args.
2020-07-02 20:29:21 +02:00
Tom Morris
0f3a6006f3
Add Excel95 import test and improve other importer tests (#2844)
No issue.
- we don't support Excel95, but make sure that it generates an exception
- move the test data file into the appropriate directory
- for any normal test, consider exceptions a failure
2020-06-30 08:20:56 +02:00
Tom Morris
421974cc3d
Truncate any completely empty columns on the right (#2842)
Fixes #565
The current versions of Open Office create default spreadsheets
with over 1000 empty columns. Keep track of the rightmost
non-empty column when importing and truncate everything else.

Also adds a basic ODS import test.
2020-06-30 08:19:00 +02:00
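The truncation described in the commit above (track the rightmost non-empty column while importing, then drop everything to its right) amounts to the following. This is a minimal Python sketch, assuming rows are lists of optional strings and that `None` or an empty string counts as an empty cell; the function name is hypothetical.

```python
from typing import List, Optional

Row = List[Optional[str]]


def truncate_empty_right_columns(rows: List[Row]) -> List[Row]:
    """Drop columns to the right of the last column holding any data."""
    rightmost = -1  # index of the rightmost non-empty column seen so far
    for row in rows:
        for index, cell in enumerate(row):
            if cell is not None and cell != "":
                rightmost = max(rightmost, index)
    # Empty cells to the left of the rightmost populated column are kept.
    return [row[: rightmost + 1] for row in rows]
```

Only trailing columns are removed, so empty cells inside the populated range survive and column positions do not shift.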
Tom Morris
83f52d4ba5
Fall back to Apache Jena 3.9.0 (from 3.15.0) (#2826)
Fixes #2824
Versions up through 3.14.0 appear to work, but since odfdom bundles
Jena 3.9.0, we're going to be conservative and match that.

As an added bonus, includes a blank node test which will trigger
the failure.
2020-06-27 23:40:21 +02:00
Tom Morris
4b146acc6e
Create Project import improvements (#2806)
* Fix charset encoding & MIME type handling

Character set (ie what we call "encoding") is part of the Content-Type,
*not* the Content-Encoding, which specifies compression (e.g. gzip).

This correctly sets the character set encoding as well as cleaning
the MIME type so that additional parsing doesn't need to be done
downstream (and removes that code).

* Use "text" instead of "text/line-based" as default fallback format

The TextLineBasedGuesser only tries a limited number of
formats (CSV, TSV, fixed), so we can't get out of that hole to
find JSON, XML, etc.

Start with a more general format instead to improve our
guessing odds.

* Support content type Structured Name Syntax Suffixes (+json +xml)

If we can't find a fully specified content type in our lookup,
fall back to just the suffix (which is registered with a leading +)
Fixes #2800 Fixes #2805
2020-06-25 08:36:57 +02:00
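Two points from the commit above lend themselves to a short illustration: the character set is a parameter of the Content-Type header, and a content type with no exact match can fall back to its structured syntax suffix (`+json`, `+xml`). The Python sketch below is not the OpenRefine implementation; the lookup table, the format labels, and both function names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical lookup table; suffix entries are registered with a leading "+".
FORMATS_BY_MIME = {
    "text/csv": "csv",
    "application/json": "json",
    "application/xml": "xml",
    "+json": "json",
    "+xml": "xml",
}
DEFAULT_FORMAT = "text"  # general fallback, per the commit message


def parse_content_type(header: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type header into a clean MIME type and a charset."""
    parts = [part.strip() for part in header.split(";")]
    mime = parts[0].lower()
    charset = None
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value:
            charset = value.strip().strip('"')
    return mime, charset


def guess_format(mime: str) -> str:
    """Look up the full MIME type first, then only its +suffix."""
    if mime in FORMATS_BY_MIME:
        return FORMATS_BY_MIME[mime]
    plus = mime.rfind("+")
    if plus != -1:
        return FORMATS_BY_MIME.get(mime[plus:], DEFAULT_FORMAT)
    return DEFAULT_FORMAT
```

For `application/ld+json; charset=UTF-8`, the sketch yields the MIME type `application/ld+json`, the charset `UTF-8`, and, through the `+json` fallback, the format `json`.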
Tom Morris
1849e62234
Better error handling for reconciliation process - fixes #2590 (#2671)
* Harden reconciliation - Fixes #2590

- check for non-JSON / unparseable JSON returns
- handle malformed results response with no name for candidates
- catch any Exception, not just IOExceptions
- call processManager.onFailedProcess() for cleanup on error

* Add default constructor for Jackson

Jackson complains about needing a default constructor for the
NON_DEFAULT annotation, but I'm not sure why this worked before.

* Clean up indentation and unused variable - no functional changes

Make indentation consistent throughout the module, changing recently
added lines to use the standard all spaces convention.

Remove unused count variable

* Simplify control flow

* Update limit parameter comment. No functional change.

* Replace ternary expression which is causing NPE

* Add reconciliation tests using mock HTTP server
2020-06-23 21:54:54 +02:00
Tom Morris
e293602897
Restore character encoding guesser (#2755)
* Fixes #486. Builds on code from Steffen Stundzig

- Switch from ICU4J to juniversalchardet
  (Java port of Mozilla charset detector)
- Replace org.json code with Jackson
- Add tests
- Add TODO for multi-file character encoding mismatches

* Restore dependency lost in rebase

Co-authored-by: Steffen Stundzig <git@stundzig.de>
2020-06-22 06:04:51 +02:00
Tom Morris
77b858db18
Fix race in Process Manager (#2748)
* Remove redundant JSON diff logging

* Fix race in process manager test causing intermittent failure
2020-06-17 21:24:25 +02:00
Tom Morris
749704518c
Use Apache HTTP Commons for Fetch URL (#2692)
* Use mockwebserver instead of live network for tests

Fixes #2680. Fixes #1904.

* Remove use of deprecated methods

* Convert to use Apache HTTP Components client library

Fixes #1410 by virtue of redirect following being a built-in
capability of the library, along with retries with binary backoff,
built-in decompression, etc.

* Address review comments
2020-06-16 09:38:06 +02:00
james-cui
04055153a1
add archive column (#2573)
Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
2020-06-15 19:56:00 +02:00
Joanne Ong
d57d76f7df
Fix imprecise facet statistics in records mode (#2607)
* Fix bug in choice counts for records mode

* Add test for value grouper on records

* Refactor and comment code

* Count distinct instances of null/blank data

* Update test to check for blank data count in records

* Remove unnecessary import statement
2020-06-15 19:38:50 +02:00
Lisa Chandra
947356ddad
[FEAT]Adds new options for split (#2471)
* added options ui

* added definition for both separators

* added tests

* removed definitions from backend and added them to frontend

* added reverse order and handling for accented characters

* added tests for accented characters and reverse split

* fixed build errors

* unicode character ranges instead

* added examples
2020-06-15 19:30:18 +02:00
chuhao zeng
9b03ecae41
Convert illegal characters into legal ones. (#2431)
* Convert illegal characters into legal ones.

* Test tab in key & value string

Also fix up test that depended on previous TAB
related error message and clean up logging

Co-authored-by: Tom Morris <tfmorris@gmail.com>
2020-06-14 09:47:58 +02:00