6152 Commits

Author SHA1 Message Date
Tom Morris
3717111db8
Fix Open Office Spreadsheet (ODS) dates (#2843)
* Truncate any completely empty columns on the right

Fixes #565
The current versions of Open Office create default spreadsheets
with over 1000 empty columns. Keep track of the rightmost
non-empty column when importing and truncate everything else.

Also adds a basic ODS import test.

* Fix dates in ODS spreadsheets

Fixes #2224
2020-07-04 08:42:33 +02:00
Antonin Delpeuch
952447461f
Fix wikidata logout when credentials have expired. Fixes #2873 (#2878) 2020-07-04 08:38:17 +02:00
Antonin Delpeuch
f4692de9e1 Increase maximum wait for testInvalidUrl, follow-up for #2876 #2875 2020-07-03 21:48:43 +02:00
Tom Morris
df8d092132
Micro benchmark harness & ToNumber optimizations (#2859)
* Performance optimized version of ToNumber

Approximately 5x faster for floats (data dependent)
and about the same speed for integers.

- Instead of blindly trying to parse as Long, do a quick check
  for obvious problems (e.g. decimal point).
- Don't trim. It's already done by called methods.
- Use valueOf() instead of parse() to avoid object creation

* Add Java Microbenchmark Harness

The shaded JAR is missing the OpenRefine classes, for a reason
that I haven't figured out, so it requires openrefine-main.jar at runtime.

* Remove old implementations of ToNumber

* Remove unneeded dependencies from main project

* Clean up and reformat
2020-07-03 21:42:44 +02:00
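The pre-check idea described in df8d092132 above can be sketched roughly as follows. This is not the code from that commit: the class name `ToNumberSketch`, the helper `toNumber`, and the choice of characters treated as "obvious problems" (decimal point and exponent marker) are assumptions made for illustration.

```java
// Hypothetical sketch, not OpenRefine's ToNumber implementation.
class ToNumberSketch {

    /** Returns a Long, a Double, or null when the text is not numeric. */
    static Number toNumber(String s) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        // Cheap scan first: a decimal point or exponent marker means the text
        // cannot be a Long, so the failing integer parse is skipped entirely.
        boolean looksIntegral = true;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '.' || c == 'e' || c == 'E') {
                looksIntegral = false;
                break;
            }
        }
        if (looksIntegral) {
            try {
                // valueOf() may reuse cached boxed values for small numbers.
                return Long.valueOf(s);
            } catch (NumberFormatException e) {
                // Not an integer after all; fall through to the double attempt.
            }
        }
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

With this sketch, a string such as "3.14" never reaches the integer parser, which is where the claimed saving for floats would come from; integers still take the same path as before.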
Tom Morris
a88aeca304
Merge pull request #2854 from OpenRefine/dependabot/maven/com.google.apis-google-api-services-sheets-v4-rev20200616-1.30.9
Bump google-api-services-sheets from v4-rev20200508-1.30.9 to v4-rev20200616-1.30.9
2020-07-03 15:27:21 -04:00
Tom Morris
5d6af9cb6c
Merge pull request #2865 from tfmorris/2863-tree-column-ordering
Remove shortest-column-name ordering - fixes #2863
2020-07-03 15:23:36 -04:00
Tom Morris
f5786afa35
Increase test timeout - fixes #2875 (#2876) 2020-07-03 21:20:01 +02:00
Thad Guidry
49fd21759c
remove English sentence from French translation (#2871) 2020-07-03 16:12:43 +02:00
Tom Morris
de2c2aa778
Correct mimetype for Google Drive project exports (#2829)
Fixes #2797. Changes mimetype from zip to gzip
and adds .tar.gz extension to the name.
2020-07-03 14:24:25 +02:00
Tom Morris
139019f6e3
Internationalize clipboard default project name (#2814)
Fixes #2776
2020-07-03 14:22:44 +02:00
Ekta Mishra
c68047a614
Implemented QuantityScrutinizer tests using Mocks (#2862)
* Implemented QuantityScrutinizer tests using Mockito

Updated test cases and added AllowedUnitsConstraint class

* Test cases updated
2020-07-03 14:14:32 +02:00
Ekta Mishra
9edb1e514d
Implemented Difference-within-range Scrutinizer tests using mocks (#2864)
Updated all test cases and added DifferenceWithinRangeConstraint class.
2020-07-03 14:13:31 +02:00
Tom Morris
a4b7a00c70
Merge pull request #2867 from chetan-v/JsonFix
Fixed the guessing of JSON for .txt(2820)
2020-07-03 02:55:50 -04:00
chetan
3932b23eb6 Fixed the guessing of JSON for .txt(2820) 2020-07-03 10:46:07 +05:30
Tom Morris
d3db73aa67 Remove shortest-column-name ordering
Refs #2863
The tree importer sorts columns/column groups by how populated
they are, which is of arguable utility, but the tie-breaker
of ordering by shortest column name is completely silly.

This change removes that and, in conjunction with a stable sort
algorithm, will preserve the original order of the columns.
2020-07-02 16:12:55 -04:00
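The effect described in d3db73aa67 above, where a stable sort with no name-length tie-breaker keeps equally populated columns in their original order, might look like the sketch below. The `Column` class and `orderedNames` helper are hypothetical stand-ins, not the tree importer's actual types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the tree importer's code.
class StableColumnOrderSketch {

    /** Illustrative column: a name plus a count of non-blank cells. */
    static final class Column {
        final String name;
        final int nonBlankCount;

        Column(String name, int nonBlankCount) {
            this.name = name;
            this.nonBlankCount = nonBlankCount;
        }
    }

    /** Orders columns by how populated they are, most populated first. */
    static List<String> orderedNames(List<Column> columns) {
        List<Column> sorted = new ArrayList<>(columns);
        // The only sort key is the count. ArrayList.sort is a stable merge sort,
        // so columns with equal counts stay in their input order.
        sorted.sort(Comparator.comparingInt((Column c) -> c.nonBlankCount).reversed());
        List<String> names = new ArrayList<>();
        for (Column c : sorted) {
            names.add(c.name);
        }
        return names;
    }
}
```

Given columns `zebra_long_name` (3 cells), `a` (3 cells) and `mid` (5 cells) in that input order, the sketch yields `mid`, `zebra_long_name`, `a`: the tie between the first two is resolved by input position rather than by name length.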
Tom Morris
28a9f68236
Unit test improvements (#2856)
* Fix two deprecated methods usages

* Test ToNumber conversions

* Test behavior of all functions when passed 0 or 8 arguments

There are 16 which fail currently on 0 args (return null or
False instead of EvalError), but have been whitelisted until
we can verify whether it's safe to change them without introducing
compatibility issues.

There are 19 which fail to return an error on too many (i.e. 8) args.
2020-07-02 20:29:21 +02:00
Ekta Mishra
cd0ed11dad
Implemented Format Scrutinizer tests using Mockito (#2849)
* Implemented Format Scrutinizer tests using Mockito

Updated implementation of the scrutinizer & tests

* Testcases updated in FormatScrutinizerTest
2020-07-02 16:28:56 +02:00
Ekta Mishra
9dfb9114c4
Implemented QualifierCompatibility Scrutinizer tests using Mockito (#2860)
Updated test cases & added AllowedQualifierConstraint and MandatoryQualifierConstraint classes.
2020-07-02 14:22:50 +02:00
Ekta Mishra
67bc8581ce
Implemented InverseScrutinizer tests using Mocks (#2855)
* Implemented InverseScrutinizer tests using Mocks

updated testcases and added InverseConstraint Class

* Test cases updated & working fine
2020-07-01 20:49:15 +02:00
Tom Morris
2d1d740b44
Merge pull request #2853 from OpenRefine/dependabot/maven/com.google.http-client-google-http-client-jackson2-1.36.0
Bump google-http-client-jackson2 from 1.35.0 to 1.36.0
2020-07-01 08:58:19 -04:00
dependabot-preview[bot]
cd0d4bdda9
Bump google-api-services-sheets
Bumps google-api-services-sheets from v4-rev20200508-1.30.9 to v4-rev20200616-1.30.9.

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-01 08:26:36 +00:00
dependabot-preview[bot]
b9dedc4438
Bump google-http-client-jackson2 from 1.35.0 to 1.36.0
Bumps [google-http-client-jackson2](https://github.com/googleapis/google-http-java-client) from 1.35.0 to 1.36.0.
- [Release notes](https://github.com/googleapis/google-http-java-client/releases)
- [Changelog](https://github.com/googleapis/google-http-java-client/blob/master/CHANGELOG.md)
- [Commits](https://github.com/googleapis/google-http-java-client/compare/v1.35.0...v1.36.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-01 08:26:18 +00:00
Thad Guidry
b31d2457b0
Add Chan Zuckerberg Initiative to our backers file (#2852) 2020-07-01 08:08:11 +02:00
dependabot[bot]
3ec20eecb6
Bump xstream from 1.4.9 to 1.4.10-java7 in /packaging
Bumps [xstream](https://github.com/x-stream/xstream) from 1.4.9 to 1.4.10-java7.
- [Release notes](https://github.com/x-stream/xstream/releases)
- [Commits](https://github.com/x-stream/xstream/commits)

Signed-off-by: dependabot[bot] <support@github.com>
2020-06-30 23:27:53 +00:00
Ekta Mishra
cef2e84e7f
Implemented EntityTypeScrutinizer tests using mocks (#2839)
Updates all the testcases in EntityTypeScrutinizerTest
2020-06-30 22:59:43 +02:00
Tom Morris
54291ef441
Use Apache Commons IO IOUtils instead of home-rolled code (#2845)
Probably should remove the funky Gzip support with the
overloaded use of the encoding parameter, but this is
a start.
2020-06-30 13:49:47 +02:00
Chetan Verma
e2a2dd2a4e
Fix misstatement about supported formats in import project screen (#2841)
Closes #2753.
2020-06-30 08:25:15 +02:00
Tom Morris
b64cbfea4f
Fix i18n. Fixes #2805 (#2847)
Fix database extensions exporter which is corrupting the dictionary
name with the value of the language.
2020-06-30 08:22:12 +02:00
Tom Morris
0f3a6006f3
Add Excel95 import test and improve other importer tests (#2844)
No issue.
- we don't support Excel95, but make sure that it generates an exception
- move the test data file into the appropriate directory
- for any normal test, consider exceptions a failure
2020-06-30 08:20:56 +02:00
Tom Morris
421974cc3d
Truncate any completely empty columns on the right (#2842)
Fixes #565
The current versions of Open Office create default spreadsheets
with over 1000 empty columns. Keep track of the rightmost
non-empty column when importing and truncate everything else.

Also adds a basic ODS import test.
2020-06-30 08:19:00 +02:00
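The truncation described in 421974cc3d above, tracking the rightmost non-empty column and dropping everything to its right, could be sketched as below. The two-pass structure, the `trim` helper, and the representation of a sheet as lists of strings are assumptions; the actual importer may track the column while reading.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the ODS importer's code.
class TrailingColumnTrimSketch {

    /** Returns a copy of the rows with completely empty right-hand columns removed. */
    static List<List<String>> trim(List<List<String>> rows) {
        // Index of the rightmost column that holds a non-empty cell in any row.
        int rightmost = -1;
        for (List<String> row : rows) {
            // Only cells to the right of the current maximum can raise it.
            for (int i = row.size() - 1; i > rightmost; i--) {
                String cell = row.get(i);
                if (cell != null && !cell.isEmpty()) {
                    rightmost = i;
                    break;
                }
            }
        }
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : rows) {
            int end = Math.min(row.size(), rightmost + 1);
            result.add(new ArrayList<>(row.subList(0, end)));
        }
        return result;
    }
}
```

Blank cells to the left of the rightmost populated column are kept, so only the trailing run of empty columns disappears.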
Ekta Mishra
bc672047f6
Implemented DistinctValueScrutinizer tests using mockito (#2833)
* Implemented DistinctValueScrutinizer tests using Mockito

Added inner class to the scrutinizer and updated the tests using mocks.

* Tests updated-testNoIssue added

* all tests updated & working fine
2020-06-29 16:00:37 +02:00
Ekta Mishra
46c510b5e2
Implemented SingleValue Scrutinizer tests using mocks (#2818)
* Implemented SingleValue Scrutinizer tests using mocks

Updated test class & added inner class to the scrutinizer

* tests updated

* Updated SingleValueConstraint class
2020-06-29 15:59:53 +02:00
Thad Guidry
2a34c8b5e6
begin Docusaurus 2 migration (#2799)
* begin Docusaurus 2 migration

* Need help fixing the broken 'index'
* needs further customizing footer if we want

* fix README.md

* fixed Pages and Sidebar not loading

Yeah!

* Revert "fixed Pages and Sidebar not loading"

This reverts commit b1588387fc89d650b391c5a8883b6100c4714fbd.

* Revert "fix README.md"

This reverts commit a81509c3c62f11370df40096e55dfd544dad2f87.

* Revert "begin Docusaurus 2 migration"

This reverts commit 59d59c355b8d2a1a270a5655922d53a0577d6414.

* clean move the files for Antonin

* fix broken Navbar links

* fix wrong GitHub link pointing to Docusaurus href

* Fix the edit link for GitHub in top right corner

* Copy content from wiki into Technical Reference

* Copy pages from wiki for top level Architecture

* fix sidebar ordering for Tech

* Add colors from our logo into Infima color matrix

* add comment about colors

* shift primary color by 1 shade in matrix
2020-06-29 08:45:24 +02:00
Ekta Mishra
f32f6a6ea2
Change return type of getConstraintsByType method (#2838)
changed the return type of getConstraintsByType method from Stream<Statement> to List<Statement>
2020-06-29 08:43:38 +02:00
Tom Morris
bc540a880e
Fix update to deprecated Google Drive credential code (#2828)
No issue. Restore missing piece of commit 42354c0 so that Builder
has the method parameter that it needs.
2020-06-28 23:07:06 +02:00
Ekta Mishra
1b04927d12
Add constraint class (#2822)
* Add constraint class

* updated names
2020-06-28 10:20:18 +02:00
Tom Morris
83f52d4ba5
Fall back to Apache Jena 3.9.0 (from 3.15.0) (#2826)
Fixes #2824
Versions up through 3.14.0 appear to work, but since odfdom bundles
Jena 3.9.0, we're going to be conservative and match that.

As an added bonus, includes a blank node test which will trigger
the failure.
2020-06-27 23:40:21 +02:00
Antoine Beaubien
043e595ea0
Change pref name for ui.browsing.pageSize (#2817)
Rename the preference key ui.gridPaginationSize to ui.browsing.pageSize.
2020-06-27 21:58:48 +02:00
Ekta Mishra
7ac41b4609
Implemented ConflictsWithScrutinizer tests using Mockito (#2804)
updated test class by creating mocks for ConstraintFetcher

Implemented tests for conflicts-with scrutinizer using mocks

Added testcase for no statementList & multiple constraint.

Implemented tests using mock for conflicts-with scrutinizer

Added test case for multiple constraints
2020-06-27 17:17:20 +02:00
Ekta Mishra
8c1d8cdcb7
New implementation for Multivalue Scrutinizer (#2807)
Created inner class for Multivalue & mocks for unit tests

New implementation for multivalue scrutinizer

tests updated
2020-06-26 10:14:34 +02:00
Lisa Chandra
7b8f8486f6
Adds a default separator preference for split/join multi valued cells (#2520)
* default value for split/join

* using the new preference interface

* changed preference name to ui.cell.rowSplitDefaultSeparator
2020-06-25 14:35:53 +02:00
Tom Morris
cfa1038066
Remove commons-digester dependency (#2798) 2020-06-25 14:16:25 +02:00
dependabot-preview[bot]
c09e1d5baa
Bump jackson.version from 2.11.0 to 2.11.1 (#2811)
Bumps `jackson.version` from 2.11.0 to 2.11.1.

Updates `jackson-databind` from 2.11.0 to 2.11.1
- [Release notes](https://github.com/FasterXML/jackson/releases)
- [Commits](https://github.com/FasterXML/jackson/commits)

Updates `jackson-annotations` from 2.11.0 to 2.11.1
- [Release notes](https://github.com/FasterXML/jackson/releases)
- [Commits](https://github.com/FasterXML/jackson/commits)

Updates `jackson-core` from 2.11.0 to 2.11.1
- [Release notes](https://github.com/FasterXML/jackson-core/releases)
- [Commits](https://github.com/FasterXML/jackson-core/compare/jackson-core-2.11.0...jackson-core-2.11.1)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
2020-06-25 10:39:29 +02:00
Tom Morris
4b146acc6e
Create Project import improvements (#2806)
* Fix charset encoding & MIME type handling

Character set (ie what we call "encoding") is part of the Content-Type,
*not* the Content-Encoding, which specifies compression (e.g. gzip).

This correctly sets the character set encoding as well as cleaning
the MIME type so that additional parsing doesn't need to be done
downstream (and removes that code).

* Use "text" instead of "text/line-based" as default fallback format

The TextLineBasedGuesser only tries a limited number of
formats (CSV, TSV, fixed), so we can't get out of that hole to
find JSON, XML, etc.

Start with a more general format instead to improve our
guessing odds.

* Support content type Structured Name Syntax Suffixes (+json +xml)

If we can't find a fully specified content type in our lookup,
fall back to just the suffix (which is registered with a leading +)
Fixes #2800 Fixes #2805
2020-06-25 08:36:57 +02:00
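The Content-Type handling described in 4b146acc6e above (charset taken from the Content-Type header, a cleaned MIME type, and a fallback to the structured syntax suffix) can be illustrated with the sketch below. The `FORMATS` table, its format names, and all method names are hypothetical; the log does not show OpenRefine's actual lookup.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not OpenRefine's import code.
class ContentTypeSketch {

    // Illustrative lookup table; suffixes are registered with a leading '+'.
    static final Map<String, String> FORMATS = new HashMap<>();
    static {
        FORMATS.put("text/csv", "csv");
        FORMATS.put("application/json", "json");
        FORMATS.put("+json", "json");
        FORMATS.put("+xml", "xml");
    }

    /** Strips parameters such as charset and normalizes case. */
    static String mimeType(String contentType) {
        int semi = contentType.indexOf(';');
        String bare = semi < 0 ? contentType : contentType.substring(0, semi);
        return bare.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String charset(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String v = p.substring(8).trim();
                if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
                    v = v.substring(1, v.length() - 1);
                }
                return v.isEmpty() ? null : v;
            }
        }
        return null;
    }

    /** Looks up the full MIME type first, then falls back to its +suffix. */
    static String format(String contentType) {
        String mime = mimeType(contentType);
        String format = FORMATS.get(mime);
        if (format == null) {
            int plus = mime.lastIndexOf('+');
            if (plus >= 0) {
                format = FORMATS.get(mime.substring(plus));
            }
        }
        return format;
    }
}
```

For a header value like `application/ld+json; charset=UTF-8`, the sketch reports the charset `UTF-8`, the cleaned type `application/ld+json`, and, because that type is not in the table, the format registered under `+json`.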
Tom Morris
3aa610d6aa
Improve Google Sheets upload (#2784)
* Support more than 26 columns

Google Sheets default to just 26 columns (A-Z) and we need to
explicitly add more columns if we need them.

Fixes #2760

* Improve Google Sheets upload

- upload in chunks instead of serializing the entire document at once
- Free up resources as we go
- stop if an error occurs
- reduce batch size to try to stay within the 10 MB request size limit
  (but this probably needs a more dynamic approach for very wide
   sheets or sheets with large values)

* Add basic test and do some cleanup

- add test for columns > 26
- refactor to allow testing and not depend on unnecessary fields
- add i18n TODO for translating spreadsheet description

* Preserve cell data types

Fixes #2785
- integers and floats are sent as Doubles
- bools as Boolean
- DateTimes as Strings
- nulls as the empty string
- anything else as Strings using .toString()

* Fix LGTM-flagged potential null pointer dereference
2020-06-25 08:18:28 +02:00
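The cell type mapping listed under "Preserve cell data types" in 3aa610d6aa above could be expressed along the lines of the sketch below. The `toSheetValue` helper is hypothetical, and using `OffsetDateTime` as the date type is an assumption; the log only says "DateTimes".

```java
import java.time.OffsetDateTime;

// Hypothetical sketch, not the Google Sheets exporter's code.
class CellValueSketch {

    /** Maps a cell value to the type sent to the spreadsheet. */
    static Object toSheetValue(Object value) {
        if (value == null) {
            return "";                                  // nulls become the empty string
        }
        if (value instanceof Number) {
            return ((Number) value).doubleValue();      // integers and floats become Doubles
        }
        if (value instanceof Boolean) {
            return value;                               // booleans pass through unchanged
        }
        if (value instanceof OffsetDateTime) {
            return value.toString();                    // dates are sent as ISO-8601 text
        }
        return value.toString();                        // anything else via toString()
    }
}
```

The date branch is redundant with the final fallback in this sketch; it is kept separate only to mirror the list in the commit message.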
dependabot-preview[bot]
de309158c9
Bump plexus-archiver from 4.0.0 to 4.2.2 (#2736)
* Bump plexus-archiver from 4.0.0 to 4.2.2

Bumps [plexus-archiver](https://github.com/codehaus-plexus/plexus-archiver) from 4.0.0 to 4.2.2.
- [Release notes](https://github.com/codehaus-plexus/plexus-archiver/releases)
- [Changelog](https://github.com/codehaus-plexus/plexus-archiver/blob/master/ReleaseNotes.md)
- [Commits](https://github.com/codehaus-plexus/plexus-archiver/compare/plexus-archiver-4.0.0...plexus-archiver-4.2.2)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

* Add comment to explain dependency override

Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
2020-06-25 08:03:48 +02:00
Tom Morris
7f435bd3df
Remove obsolete Google API key reference (#2809)
This key was used for the Freebase APIs and is no longer
referenced anywhere.
2020-06-25 07:57:04 +02:00
Tom Morris
a24f2f3feb
Merge pull request #2802 from OpenRefine/dependabot/maven/com.google.apis-google-api-services-drive-v3-rev20200609-1.30.9
Bump google-api-services-drive from v3-rev20200413-1.30.9 to v3-rev20200609-1.30.9
2020-06-24 23:37:46 -04:00
dependabot-preview[bot]
3c4712fb43
Bump google-api-services-drive
Bumps google-api-services-drive from v3-rev20200413-1.30.9 to v3-rev20200609-1.30.9.

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-06-25 03:08:30 +00:00
Tom Morris
f9eb819b01
Merge pull request #2737 from OpenRefine/dependabot/maven/org.slf4j-slf4j-log4j12-1.7.30
Bump slf4j-log4j12 from 1.7.18 to 1.7.30
2020-06-24 16:00:22 -04:00