2555 Commits

Author SHA1 Message Date
Tom Morris
79399691a4
Translated using Weblate (Spanish)
Currently translated at 99.4% (749 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/es/
2020-08-08 02:20:32 +02:00
Antonin Delpeuch
858bd463a4
Revert "Bump git-commit-id-plugin from 4.0.0 to 4.0.1 (#2948)"
The dependency update broke the snapshot release process on GitHub Actions:
https://github.com/OpenRefine/OpenRefine/runs/876878262

This reverts commit 52bb2c4d38.
2020-07-16 11:07:54 +02:00
dependabot-preview[bot]
52bb2c4d38
Bump git-commit-id-plugin from 4.0.0 to 4.0.1 (#2948)
Bumps git-commit-id-plugin from 4.0.0 to 4.0.1.

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
2020-07-16 11:01:43 +02:00
Tom Morris
f2e61b6628
Add tests for wide XLS/XLSX export (#2945)
Refs #2122. Also reenable a couple of disabled tests
2020-07-16 10:01:17 +02:00
Tom Morris
a3fab26cca
Fix the text format guesser so it doesn't inappropriately guess WikiText (#2924)
* Fix text guesser so it doesn't guess wikitext

Fixes #2850
- Add simple magic detector for zip & gzip files to keep
  it from attempting to guess binary files
- Add a counter for C0 controls for the same reason
- Tighten wikitable counters to require marker at
  beginning of the line, per the specification
- Refactor to use Apache Commons instead of private
  counting methods
- Add tests for most TextGuesser formats

* Remove misplaced duplicate test data file

* Fix LGTM warning + minor cleanups

* Use BoundedInputStream to prevent runaway lines
2020-07-15 08:56:00 +02:00
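The binary-file checks listed in this commit (a magic-number detector for zip and gzip, plus a counter for C0 controls) could look roughly like the sketch below. The class and method names are hypothetical and not taken from the OpenRefine source, and skipping tab, LF and CR in the counter is an assumption of this sketch.

```java
final class BinarySniffSketch {

    // Zip archives begin with "PK" (0x50 0x4B); gzip streams begin with 0x1F 0x8B.
    static boolean looksCompressed(byte[] head) {
        if (head == null || head.length < 2) {
            return false;
        }
        int b0 = head[0] & 0xFF;
        int b1 = head[1] & 0xFF;
        return (b0 == 0x50 && b1 == 0x4B) || (b0 == 0x1F && b1 == 0x8B);
    }

    // Count C0 control characters (U+0000 to U+001F), skipping tab, LF and CR,
    // which are expected in ordinary text files (an assumption of this sketch).
    static int countC0Controls(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] gzipHead = {0x1F, (byte) 0x8B, 0x08};
        System.out.println(looksCompressed(gzipHead));
        System.out.println(countC0Controls("plain\ttext\n"));
    }
}
```

A guesser could refuse to classify input when either check fires, instead of trying to match text formats against binary data.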
Hosted Weblate
f9d6c2b93b
Merge branch 'origin/master' into Weblate.
2020-07-14 22:41:55 +02:00
Isao Matsunami
a2100f64f7
Translated using Weblate (Japanese)
Currently translated at 100.0% (753 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/ja/
2020-07-14 22:41:51 +02:00
Tom Morris
561619399c
Fix order dependent NPE in LoadLanguage test (#2922)
* Ensure ProjectManager is initialized before test - fixes #2895

* Fix indentation (detabify)
2020-07-14 18:06:04 +02:00
Tom Morris
ed68541988
Remove informational logging from tests that are passing (#2923)
* Change logging from info to debug

* Make tests less chatty when they're passing
2020-07-14 17:47:36 +02:00
Urvashi Gupta
f00129b852
fixes service panel toggling (#2915)
2020-07-14 16:46:10 +02:00
Tom Morris
233cb95289
Ignore events which don't change text input - fixes #1134 (#2846)
* Ignore events which don't change text input - fixes #1134

* Fix bind
2020-07-14 08:35:46 +02:00
dependabot-preview[bot]
396efc0d1b
Bump mockwebserver from 4.7.2 to 4.8.0
Bumps [mockwebserver](https://github.com/square/okhttp) from 4.7.2 to 4.8.0.
- [Release notes](https://github.com/square/okhttp/releases)
- [Changelog](https://github.com/square/okhttp/blob/master/CHANGELOG.md)
- [Commits](https://github.com/square/okhttp/compare/parent-4.7.2...parent-4.8.0)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
2020-07-13 08:39:23 +00:00
Allan Nordhøy
98b64b7a01
Translated using Weblate (Norwegian Bokmål)
Currently translated at 51.6% (389 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/nb_NO/
2020-07-12 13:41:50 +02:00
Rafael Fontenelle
f329f5085b
Translated using Weblate (Portuguese (Brazil))
Currently translated at 100.0% (753 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/pt_BR/
2020-07-12 13:41:49 +02:00
Hosted Weblate
6c69525545
Merge branch 'origin/master' into Weblate.
2020-07-11 12:55:18 +02:00
Allan Nordhøy
128a3089ed
Translated using Weblate (Norwegian Bokmål)
Currently translated at 51.1% (385 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/nb_NO/
2020-07-11 12:55:18 +02:00
Rafael Fontenelle
b0177e6e33
Translated using Weblate (Portuguese (Brazil))
Currently translated at 100.0% (753 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/pt_BR/
2020-07-11 12:55:17 +02:00
Isao Matsunami
1a7ae77431
Translated using Weblate (Japanese)
Currently translated at 100.0% (753 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/ja/
2020-07-11 12:55:17 +02:00
Tom Morris
306b541c69
Fix Excel date import - Fixes #1908 (#2909)
* Add utility functions to check/convert dates

* Add date tests and refactor to DRY up

* Fix date import - fixes #1908

Change from java.util.Date to OpenRefine 3.0+'s OffsetDateTime
Fixes #1908

* Centralize date conversion

* Moving utility methods to ParsingUtilities

* Fix tests
2020-07-09 23:13:44 +02:00
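As a rough illustration of the move from java.util.Date to OffsetDateTime described above, a conversion helper might look like this. The helper name is hypothetical, and using the system default time zone is an assumption of this sketch, not necessarily what the commit does.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.util.Date;

final class DateConversionSketch {

    // Convert a legacy java.util.Date to an OffsetDateTime.
    // The zone used here (system default) is an assumption of this sketch.
    static OffsetDateTime toOffsetDateTime(Date date) {
        return date.toInstant().atZone(ZoneId.systemDefault()).toOffsetDateTime();
    }

    public static void main(String[] args) {
        System.out.println(toOffsetDateTime(new Date()));
    }
}
```

Whatever zone is chosen, the instant is preserved; only the offset used to display it changes.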
Urvashi Gupta
a0f2d11255
addStandardServiceOnEnter (#2914)
2020-07-09 22:57:24 +02:00
dependabot-preview[bot]
380942d53f
Bump httpmime from 4.5.2 to 4.5.12 (#2904)
Bumps httpmime from 4.5.2 to 4.5.12.

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>

Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
2020-07-08 10:29:09 +02:00
Tom Morris
0562638ffa
Use standard text normalization - fixes #2898 (#2900)
* Use standard text normalization - fixes #2898

Fixes #2898. Fixes #409. Refs #650

Replaces homegrown ISO Latin-1 only character substitution
with standard Java Normalize to NFD, followed by diacritic
removal and a few custom character expansions/replacements.

* Fix Mac build

* Improve compatibility with previous code

One intentional change is folding O with stroke to
oe instead of o.

- Use more powerful NFKD instead of NFD
- strip punctuation after decomposition since it can generate
  new punctuation
- Add compatibility test for old asciify() method
- Add some graphically similar characters to substitution table

* Add oe character/ligature & more long S forms

* More tests for ligatures and Latin Extended

* Add Latin-1 Supplement tests
2020-07-07 21:35:41 +02:00
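The normalization pipeline described in this commit (custom expansions, NFKD decomposition, diacritic removal, then punctuation stripping) can be sketched with the standard java.text.Normalizer. The class and method names are hypothetical, and the substitution table is reduced to the single "o with stroke" case mentioned in the message.

```java
import java.text.Normalizer;

final class AsciifySketch {

    static String fold(String s) {
        // Custom expansion for a character with no Unicode decomposition:
        // the commit folds o with stroke (U+00F8) to "oe".
        String expanded = s.replace("\u00F8", "oe");
        // NFKD splits accented letters into base letter + combining mark
        // and also expands compatibility forms such as ligatures.
        String decomposed = Normalizer.normalize(expanded, Normalizer.Form.NFKD);
        // Remove the combining marks (the diacritics).
        String unaccented = decomposed.replaceAll("\\p{M}+", "");
        // Strip ASCII punctuation last, because decomposition can create it
        // (for example, the ellipsis character becomes three periods).
        return unaccented.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("Caf\u00E9 s\u00F8n"));
    }
}
```

Stripping punctuation after decomposition, not before, matches the ordering note in the commit message.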
Urvashi Gupta
f62f63706c
Report HTTP error codes to the user when creating a project from a URL (#2870)
* HTTP Error

* urlImportingTestCompleted
2020-07-07 11:58:47 +02:00
Tom Morris
e61d50a1aa
Fix NGramFingerprintKeyer to ignore accents - fixes #1161 (#2899)
Fixes #1161
This change parallels what was done in #1257 1da3c00 to fix
the FingerprintKeyer and moves the diacritic removal before
the deduping. Includes a test.
2020-07-07 09:02:49 +02:00
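The ordering fix described above, removing diacritics before the n-grams are deduplicated, can be illustrated as follows. This is a simplified sketch with hypothetical names, not the actual NGramFingerprintKeyer code.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.TreeSet;

final class NGramKeySketch {

    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    static String key(String s, int n) {
        // Diacritics are removed BEFORE n-grams are collected, so that
        // accented and unaccented grams collapse to the same entry.
        String cleaned = stripDiacritics(s.toLowerCase(Locale.ROOT))
                .replaceAll("[\\p{Punct}\\s\\p{Cntrl}]", "");
        // TreeSet both sorts and deduplicates the n-grams.
        TreeSet<String> grams = new TreeSet<>();
        for (int i = 0; i + n <= cleaned.length(); i++) {
            grams.add(cleaned.substring(i, i + n));
        }
        return String.join("", grams);
    }

    public static void main(String[] args) {
        System.out.println(key("r\u00E9sum\u00E9", 2));
    }
}
```

If the diacritics were stripped only after deduplication, grams that differ only by accent would survive as duplicates in the final key.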
morrme
66aeaa4409
Remove incorrect "dates" from guess data type label - fixes #2883
Fixes #2883.
2020-07-06 19:55:23 -04:00
Hosted Weblate
f421cfd76f
Merge branch 'origin/master' into Weblate.
2020-07-06 03:42:12 +02:00
Tijs De Schacht
dfcd5a0f25
Translated using Weblate (Dutch)
Currently translated at 54.4% (410 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/nl/
2020-07-06 03:42:08 +02:00
Mohamed El ouard Baouche
a6a5935585
Translated using Weblate (Arabic)
Currently translated at 5.8% (44 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/ar/
2020-07-06 03:41:49 +02:00
Isao Matsunami
948d1acae1
Translated using Weblate (Japanese)
Currently translated at 100.0% (753 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/ja/
2020-07-06 03:41:49 +02:00
Tom Morris
8a6171432d
Translated using Weblate (French)
Currently translated at 99.0% (746 of 753 strings)

Translation: OpenRefine/Translations
Translate-URL: https://hosted.weblate.org/projects/openrefine/translations/fr/
2020-07-06 03:41:48 +02:00
Tom Morris
3717111db8
Fix Open Office Spreadsheet (ODS) dates (#2843)
* Truncate any completely empty columns on the right

Fixes #565
The current versions of Open Office create default spreadsheets
with over 1000 empty columns. Keep track of the rightmost
non-empty column when importing and truncate everything else.

Also adds a basic ODS import test.

* Fix dates in ODS spreadsheets

Fixes #2224
2020-07-04 08:42:33 +02:00
Antonin Delpeuch
f4692de9e1
Increase maximum wait for testInvalidUrl, follow-up for #2876 #2875
2020-07-03 21:48:43 +02:00
Tom Morris
df8d092132
Micro benchmark harness & ToNumber optimizations (#2859)
* Performance optimized version of ToNumber

Approximately 5x faster for floats (data dependent)
and about the same speed for integers.

- Instead of blindly trying to parse as Long, do a quick check
  for obvious problems (e.g. decimal point).
- Don't trim. It's already done by called methods.
- Use valueOf() instead of parse() to avoid object creation

* Add Java Microbenchmark Harness

The shaded JAR is missing the OpenRefine classes, for a reason
that I haven't figured out, so requires openrefine-main.jar at runtime.

* Remove old implementations of ToNumber

* Remove unneeded dependencies from main project

* Clean up and reformat
2020-07-03 21:42:44 +02:00
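The optimization described in this commit (a cheap pre-check before attempting an integer parse, and valueOf() instead of parse()) might be sketched like this. The names are hypothetical, and returning null for unparseable input is an assumption of this sketch, not the behavior of OpenRefine's ToNumber.

```java
final class ToNumberSketch {

    static Number toNumber(String s) {
        if (s == null) {
            return null;
        }
        // Quick check for obvious signs that the value is not an integer,
        // so the failing Long parse (and its exception) is skipped.
        boolean maybeLong = s.indexOf('.') < 0 && s.indexOf('e') < 0 && s.indexOf('E') < 0;
        if (maybeLong) {
            try {
                return Long.valueOf(s);
            } catch (NumberFormatException e) {
                // Not an integer after all; fall through to the double parse.
            }
        }
        try {
            return Double.valueOf(s);
        } catch (NumberFormatException e) {
            return null; // assumption: signal failure with null
        }
    }

    public static void main(String[] args) {
        System.out.println(toNumber("42"));
        System.out.println(toNumber("3.5"));
    }
}
```

Avoiding the thrown-and-caught NumberFormatException on float input is one plausible source of the speedup the message reports for floats.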
Tom Morris
5d6af9cb6c
Merge pull request #2865 from tfmorris/2863-tree-column-ordering
Remove shortest-column-name ordering - fixes #2863
2020-07-03 15:23:36 -04:00
Tom Morris
f5786afa35
Increase test timeout - fixes #2875 (#2876)
2020-07-03 21:20:01 +02:00
Thad Guidry
49fd21759c
remove English sentence from French translation (#2871)
2020-07-03 16:12:43 +02:00
Tom Morris
139019f6e3
Internationalize clipboard default project name (#2814)
Fixes #2776
2020-07-03 14:22:44 +02:00
chetan
3932b23eb6
Fixed the guessing of JSON for .txt(2820)
2020-07-03 10:46:07 +05:30
Tom Morris
d3db73aa67
Remove shortest-column-name ordering
Refs #2863
The tree importer sorts columns/column groups by how populated
they are, which is of arguable utility, but the tie-breaker
of ordering by shortest column name is completely silly.

This change removes that and, in conjunction with a stable sort
algorithm, will preserve the original order of the columns.
2020-07-02 16:12:55 -04:00
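The effect of dropping the tie-breaker can be shown with a stable sort: when columns are ordered only by how populated they are, ties keep their original relative order. The Column type and its field names below are hypothetical stand-ins for the tree importer's structures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ColumnOrderSketch {

    static final class Column {
        final String name;
        final int nonBlankCells;

        Column(String name, int nonBlankCells) {
            this.name = name;
            this.nonBlankCells = nonBlankCells;
        }
    }

    static List<String> orderedNames(List<Column> columns) {
        List<Column> sorted = new ArrayList<>(columns);
        // List.sort is stable, so equally populated columns keep their
        // original order; no name-length tie-breaker is applied.
        sorted.sort(Comparator.comparingInt((Column c) -> c.nonBlankCells).reversed());
        List<String> names = new ArrayList<>();
        for (Column c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Column> cols = new ArrayList<>();
        cols.add(new Column("description", 5));
        cols.add(new Column("id", 5));
        cols.add(new Column("z", 9));
        System.out.println(orderedNames(cols));
    }
}
```

With the old shortest-name tie-breaker, "id" would have been moved ahead of "description" even though both are equally populated.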
Tom Morris
28a9f68236
Unit test improvements (#2856)
* Fix two deprecated methods usages

* Test ToNumber conversions

* Test behavior of all functions when passed 0 or 8 arguments

There are 16 which fail currently on 0 args (return null or
False instead of EvalError), but have been whitelisted until
we can verify whether it's safe to change them without introducing
compatibility issues.

There are 19 which fail to return an error on too many (ie 8) args.
2020-07-02 20:29:21 +02:00
Tom Morris
54291ef441
Use Apache IO Commons IOUtils instead of homerolled (#2845)
Probably should remove the funky Gzip support with the
overloaded use of the encoding parameter, but this is
a start.
2020-06-30 13:49:47 +02:00
Chetan Verma
e2a2dd2a4e
Fix misstatement about supported formats in import project screen (#2841)
Closes #2753.
2020-06-30 08:25:15 +02:00
Tom Morris
0f3a6006f3
Add Excel95 import test and improve other importer tests (#2844)
No issue.
- we don't support Excel95, but make sure that it generates an exception
- move the test data file into the appropriate directory
- for any normal test, consider exceptions a failure
2020-06-30 08:20:56 +02:00
Tom Morris
421974cc3d
Truncate any completely empty columns on the right (#2842)
Fixes #565
The current versions of Open Office create default spreadsheets
with over 1000 empty columns. Keep track of the rightmost
non-empty column when importing and truncate everything else.

Also adds a basic ODS import test.
2020-06-30 08:19:00 +02:00
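The approach described above, tracking the rightmost non-empty column and truncating everything past it, could be sketched as follows. Rows are modeled as plain lists of strings, which is a simplification for illustration and not the importer's actual data model.

```java
import java.util.ArrayList;
import java.util.List;

final class TrimColumnsSketch {

    static List<List<String>> truncateEmptyRightColumns(List<List<String>> rows) {
        // Index of the rightmost column that holds any non-empty cell.
        int rightmost = -1;
        for (List<String> row : rows) {
            // Only cells to the right of the current maximum can raise it.
            for (int i = row.size() - 1; i > rightmost; i--) {
                String cell = row.get(i);
                if (cell != null && !cell.isEmpty()) {
                    rightmost = i;
                    break;
                }
            }
        }
        // Cut every row down to columns 0..rightmost.
        List<List<String>> trimmed = new ArrayList<>();
        for (List<String> row : rows) {
            int end = Math.min(row.size(), rightmost + 1);
            trimmed.add(new ArrayList<>(row.subList(0, end)));
        }
        return trimmed;
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(List.of("a", "", ""));
        rows.add(List.of("b", "c", ""));
        System.out.println(truncateEmptyRightColumns(rows));
    }
}
```

Empty cells to the left of the rightmost populated column are kept, so only trailing empty columns are dropped.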
Tom Morris
83f52d4ba5
Fall back to Apache Jena 3.9.0 (from 3.15.0) (#2826)
Fixes #2824
Versions up through 3.14.0 appear to work, but since odfdom bundles
Jena 3.9.0, we're going to be conservative and match that.

As an added bonus, includes a blank node test which will trigger
the failure.
2020-06-27 23:40:21 +02:00
Antoine Beaubien
043e595ea0
Change pref name for ui.browsing.pageSize (#2817)
Rename the preference key ui.gridPaginationSize to ui.browsing.pageSize.
2020-06-27 21:58:48 +02:00
Lisa Chandra
7b8f8486f6
Adds a default separator preference for split/join multi valued cells (#2520)
* default value for split/join

* using the new preference interface

* changed preference name to ui.cell.rowSplitDefaultSeparator
2020-06-25 14:35:53 +02:00
Tom Morris
cfa1038066
Remove commons-digester dependency (#2798)
2020-06-25 14:16:25 +02:00
Tom Morris
4b146acc6e
Create Project import improvements (#2806)
* Fix charset encoding & MIME type handling

Character set (ie what we call "encoding") is part of the Content-Type,
*not* the Content-Encoding, which specifies compression (e.g. gzip).

This correctly sets the character set encoding as well as cleaning
the MIME type so that additional parsing doesn't need to be done
downstream (and removes that code).

* Use "text" instead of "text/line-based" as default fallback format

The TextLineBasedGuesser only tries a limited number of
formats (CSV, TSV, fixed), so we can't get out of that hole to
find JSON, XML, etc.

Start with a more general format instead to improve our
guessing odds.

* Support content type Structured Name Syntax Suffixes (+json +xml)

If we can't find a fully specified content type in our lookup,
fall back to just the suffix (which is registered with a leading +)
Fixes #2800 Fixes #2805
2020-06-25 08:36:57 +02:00
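The Content-Type handling described in this commit can be illustrated with a small sketch: the charset is a parameter of Content-Type, the MIME type is cleaned of its parameters, and an unknown type falls back to its structured syntax suffix. The class, method names and format identifiers used here are hypothetical, not taken from the OpenRefine source.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class ContentTypeSketch {

    // "application/ld+json; charset=UTF-8" -> "application/ld+json"
    static String mimeType(String contentType) {
        int semi = contentType.indexOf(';');
        String mime = semi < 0 ? contentType : contentType.substring(0, semi);
        return mime.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the charset parameter of a Content-Type value, or null if absent.
    static String charset(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    // Look up the full MIME type first, then fall back to its suffix,
    // which is registered with a leading '+' (for example "+json").
    static String format(String mime, Map<String, String> lookup) {
        String format = lookup.get(mime);
        int plus = mime.lastIndexOf('+');
        if (format == null && plus >= 0) {
            format = lookup.get(mime.substring(plus));
        }
        return format;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = new HashMap<>();
        lookup.put("+json", "text/json"); // hypothetical format identifier
        String header = "application/ld+json; charset=UTF-8";
        System.out.println(format(mimeType(header), lookup) + " " + charset(header));
    }
}
```

Content-Encoding is deliberately not consulted for the charset, since it describes compression such as gzip.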
Tom Morris
f9eb819b01
Merge pull request #2737 from OpenRefine/dependabot/maven/org.slf4j-slf4j-log4j12-1.7.30
Bump slf4j-log4j12 from 1.7.18 to 1.7.30
2020-06-24 16:00:22 -04:00