Commit Graph

685 Commits

Author SHA1 Message Date
Jacky
7c83746ee7 deal with empty string properly 2017-10-27 16:59:55 -04:00
Thad Guidry
d72a2de348 Revert "Extend cross() function to support multiple-value-cell-input" 2017-10-26 17:37:10 -05:00
Jacky
6a47482ea4 support metadata edit 2017-10-26 15:47:24 -04:00
Jacky
249fa4d8d5 support metadata edit 2017-10-26 15:45:58 -04:00
Thad Guidry
3d0e96a0ce Merge pull request #1290 from claussni/cross-func-split
Extend cross() function to support multiple-value-cell-input
2017-10-26 14:23:23 -05:00
Ralf Claussnitzer
0b107ec5e9 Add optional 4th parameter to cross() function
The cross function now accepts a 4th parameter defining a regular
expression separator for splitting multi-value field values when joining
projects.

See https://github.com/OpenRefine/OpenRefine/issues/1204#issuecomment-326320954
2017-10-26 19:50:02 +02:00
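Below is a minimal sketch of the multi-value lookup that this commit and 9aa168633f describe. It assumes the target project has already been indexed as a map from key-column value to row indices. The class `CrossSketch`, the method `lookup` and the index layout are hypothetical illustrations, not OpenRefine's implementation. The merged pull request was later reverted in d72a2de348.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CrossSketch {
    // Hypothetical index: key-column value in the target project -> matching row indices.
    static List<Integer> lookup(Map<String, List<Integer>> index, String cellValue, String separatorRegex) {
        // With no separator, the whole cell value is treated as a single key.
        String[] keys = (separatorRegex == null)
                ? new String[] { cellValue }
                : cellValue.split(separatorRegex);
        List<Integer> rows = new ArrayList<>();
        for (String key : keys) {
            // Keys with no match in the target project contribute nothing.
            rows.addAll(index.getOrDefault(key, List.of()));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = Map.of("a", List.of(0), "b", List.of(2, 5));
        System.out.println(lookup(index, "a; b; c", ";\\s*")); // [0, 2, 5]
    }
}
```

A separator pattern such as ;\s* also absorbs the whitespace after each separator, so the keys need no separate trimming step.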
Ralf Claussnitzer
9aa168633f Allow comma separated multi-value source in cross() function
Implements support for comma separated multiple-value keys for joining
another project using the cross() function.

See https://github.com/OpenRefine/OpenRefine/issues/1204#issuecomment-326320954
2017-10-26 19:50:02 +02:00
Antonin Delpeuch
88b10a2917 Merge pull request #1278 from ostephens/cell-split-regex
Cell split regex
2017-10-25 11:04:33 +01:00
Owen Stephens
224210625d Remove automatic trim of split values 2017-10-24 08:28:37 +01:00
Owen Stephens
46c3ec100e Remove unused local variables and imports 2017-10-23 08:36:08 +01:00
Antonin Delpeuch
23b643426a Fix Codacy warnings in MultiValuedCellSplitOperation 2017-10-23 08:41:14 +02:00
Owen Stephens
cccf1e55c9 Update split multi-valued cells to support split by regex and split by lengths 2017-10-22 23:54:18 +01:00
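The two new split modes can be sketched as follows. This is an illustration under stated assumptions, not the code of MultiValuedCellSplitOperation. The class and method names are hypothetical, and values are left untrimmed, in line with 224210625d. Any text beyond the last requested length is dropped, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitSketch {
    // Split on a regular expression; pieces are returned as-is, without trimming.
    static List<String> splitByRegex(String value, String regex) {
        return Arrays.asList(value.split(regex));
    }

    // Split into consecutive pieces of the given lengths.
    // Assumption: text beyond the sum of the lengths is discarded.
    static List<String> splitByLengths(String value, int... lengths) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int len : lengths) {
            if (start >= value.length()) {
                break; // nothing left to split
            }
            int end = Math.min(start + len, value.length());
            parts.add(value.substring(start, end));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitByRegex("a, b,c", ","));         // [a,  b, c]
        System.out.println(splitByLengths("20171022", 4, 2, 2)); // [2017, 10, 22]
    }
}
```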
Jacky
63c1714d0a add fields for metadata 2017-10-22 00:37:59 -04:00
Jacky
f1ab6b8cd6 Merge branch 'master' of https://github.com/OpenRefine/OpenRefine 2017-10-21 23:49:58 -04:00
Jacky
818e139b43 add the import options to metadata 2017-10-21 23:41:11 -04:00
Antonin Delpeuch
21f4d62474 Merge pull request #1275 from OpenRefine/wikitext-url-fix
Forbid pipe characters in URL references to ease parsing.
2017-10-20 16:41:00 +02:00
Antonin Delpeuch
e2a22a6994 Forbid pipe characters in URL references to ease parsing.
This is a temporary fix before we do full Wikitext parsing inside references
(this needs a change upstream). See https://github.com/sweble/sweble-wikitext/issues/67 .
2017-10-20 15:32:58 +01:00
Antonin Delpeuch
54acf10edf Change "topic" to "item" in the UI 2017-10-18 12:39:40 +01:00
Antonin Delpeuch
473b1b135d Merge pull request #1264 from OpenRefine/issue1262
Update Jackson to 2.9.1
2017-10-09 20:09:49 +02:00
Antonin Delpeuch
c9cc4fb262 Update Jackson to 2.9.1
Closes #1262
2017-10-09 17:38:09 +01:00
Owen Stephens
bb6b8378d3 Ensure _max is never less than _min 2017-10-09 17:13:43 +01:00
Antonin Delpeuch
1da3c00cb1 Perform ASCII normalization earlier in FingerprintKeyer.
This closes #1256.
2017-09-27 09:23:40 +01:00
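A simplified fingerprint keyer shows why the position of ASCII normalization matters. If accents are stripped only after tokens are de-duplicated, "Café cafe" yields "cafe cafe" instead of "cafe". The sketch below is a hypothetical simplification, not the actual FingerprintKeyer. It approximates ASCII normalization with Unicode NFD decomposition followed by removal of combining marks, which does not cover characters such as ø or ß.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

final class FingerprintSketch {
    // Approximate ASCII folding: decompose accented letters, then drop the combining marks.
    static String asciify(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    static String fingerprint(String s) {
        // Normalize to ASCII before tokenizing, so accented and plain spellings
        // become identical tokens and are merged by the de-duplication step.
        String cleaned = asciify(s.trim().toLowerCase(Locale.ROOT))
                .replaceAll("\\p{Punct}", "");
        // A TreeSet both sorts and de-duplicates the tokens.
        TreeSet<String> tokens = new TreeSet<>(Arrays.asList(cleaned.split("\\s+")));
        tokens.remove(""); // a leading separator yields an empty first token
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(fingerprint("Caf\u00e9  cafe, CAFE!")); // cafe
        System.out.println(fingerprint("Smith, John"));           // john smith
    }
}
```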
Antonin Delpeuch
cfc0b95cd1 Fix string comparison in Wikitext exporter 2017-09-23 23:13:18 +01:00
Antonin Delpeuch
a1b2c9b683 Add support for references in Wikitable importer.
Closes #1243.
2017-09-23 22:54:43 +01:00
Antonin Delpeuch
49564e8905 Fix bug when an extra column starts in the middle of the table 2017-09-19 13:54:27 +01:00
Antonin Delpeuch
00f8e4fc6b Merge pull request #1237.
Conflicts:
	.classpath
	main/webapp/modules/core/langs/translation-en.json
	main/webapp/modules/core/scripts/dialogs/extend-data-preview-dialog.js

Closes #363 and #56.
2017-08-28 16:25:50 +01:00
Antonin Delpeuch
c66e609b1d Cleanup wikitext PR for Codacy 2017-08-26 21:50:02 +01:00
Antonin Delpeuch
0a00fd9318 Add option to include raw templates as cells 2017-08-25 14:28:30 +01:00
Antonin Delpeuch
554b75fa7b Fix parsing of newlines in cells 2017-08-17 19:18:50 +01:00
Antonin Delpeuch
7989aacc58 Cleanup for Codacy 2017-08-17 12:40:56 +01:00
Antonin Delpeuch
637e69db9d Better error reporting and testing for Wikitext import 2017-08-16 10:30:51 +01:00
Antonin Delpeuch
3dcda5a42c Add reconciliation config in wikitext import. 2017-08-16 00:05:40 +01:00
Antonin Delpeuch
86dc240335 Support reconciliation via sitelinks.
Wikilinks are automatically reconciled at import time.

Related to #56.
2017-08-15 20:17:34 +01:00
Antonin Delpeuch
aa4517ba58 Add support for colspan and rowspan in Wikitext 2017-08-15 11:28:43 +01:00
Antonin Delpeuch
73f7fdc036 Update TextFormatGuesser to support wikitext 2017-08-14 15:58:27 +01:00
Antonin Delpeuch
e168c900e8 Add support for table headers 2017-08-13 20:14:48 +01:00
Jacky
c3e04010b1 Merge branch 'master' into master 2017-08-13 14:09:56 -04:00
Antonin Delpeuch
b8a781d366 Add support for links (unreconciled for now) 2017-08-13 12:57:46 +01:00
Antonin Delpeuch
e6406f56ef Initial version of the wikitext importer 2017-08-13 11:26:59 +01:00
Antonin Delpeuch
dbb071da30 Merge branch 'default-to-english' of https://github.com/RBGKew/OpenRefine into RBGKew-default-to-english 2017-08-09 14:07:22 +01:00
Jacky
275dac976e fix #137 2017-08-07 21:53:35 -04:00
Antonin Delpeuch
66eac0fae9 Ensure null values are not cached in URL fetching operation. Closes #1219. 2017-08-01 13:05:29 +01:00
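A plain-JDK cache can illustrate this fix. OpenRefine switched to Guava's cache in 32c232c2d6, so this ConcurrentHashMap version is a substitute technique rather than the project's code. It relies on the documented behaviour of computeIfAbsent, which records no mapping when the mapping function returns null. The fetcher is hypothetical and signals a failed fetch by returning null. Keys are Strings, as in a9c4b0af16. Using java.net.URL as a map key is generally discouraged because its equals and hashCode may perform host name resolution.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class FetchCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Hypothetical fetcher: returns the response body, or null if the fetch failed.
    private final Function<String, String> fetcher;

    FetchCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    String get(String url) {
        // computeIfAbsent stores nothing when the function returns null,
        // so a failed fetch is retried on the next call instead of being cached.
        return cache.computeIfAbsent(url, fetcher);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical fetcher: the first attempt fails (null), later attempts succeed.
        FetchCache cache = new FetchCache(url -> ++calls[0] == 1 ? null : "body of " + url);
        System.out.println(cache.get("http://example.org/")); // null (not cached)
        System.out.println(cache.get("http://example.org/")); // body of http://example.org/
        System.out.println(calls[0] + " fetches, " + cache.size() + " cached"); // 2 fetches, 1 cached
    }
}
```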
jackyq2015
53baa5a833 put the correct params description 2017-07-28 20:37:20 -04:00
jackyq2015
4950d29074 add backward compatibility for cross function 2017-07-23 19:19:58 -04:00
Thad Guidry
7f92251ed1 Merge pull request #1210 from wetneb/extend
Add data extension capabilities to the reconciliation API
2017-07-17 18:01:37 -05:00
Antonin Delpeuch
84c06821ee Data extension tests 2017-07-16 11:47:12 +01:00
Antonin Delpeuch
05873f283d Integration of constraints with service-defined forms 2017-07-14 22:17:40 +01:00
Antonin Delpeuch
3eadefe613 Do not add reconciliation statistics on columns without types 2017-07-14 12:53:54 +01:00
Antonin Delpeuch
6501c235e8 Pass the identifier and schema spaces along to create better ReconCandidates 2017-07-14 12:30:39 +01:00
Antonin Delpeuch
cc991cab21 Add nicer spinning gif while preview is loading.
Fix bug of multiple ColumnInfo being generated.
2017-07-14 11:30:17 +01:00
Antonin Delpeuch
d99128c330 Retrieve types from the extend service 2017-07-06 21:15:37 +02:00
Antonin Delpeuch
ad3a174abd Starting to migrate data extension to standard reconciliation services 2017-07-04 23:14:19 +02:00
jackyq2015
1ee339cbbd cross function test suite. #1204 2017-06-28 08:12:36 -04:00
jackyq2015
f03be76475 Extend cross() function to take either a cell or a value #1204 2017-06-25 21:04:00 -04:00
Felix Lohmeier
2557cc5419 bugfix for new option autosave period 2017-06-24 22:42:49 +02:00
Felix Lohmeier
e54199a6f1 added options for initial java heap space and autosave period 2017-06-22 12:27:55 +02:00
Adi Eyal
09c00c6a19 Fixes #1181 2017-05-05 23:38:37 +02:00
Bob Harper
909df1b6a7 xor can also accept 2+ params, rewrite tests to be consistent 2017-04-27 11:20:48 +01:00
Bob Harper
ef4e039998 allow more than 2 AND and OR conditions 2017-04-26 20:51:58 +01:00
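The variadic forms of and, or and xor from these two commits reduce to simple folds over the arguments. The class name and the two-argument minimum in this sketch are assumptions. It also assumes xor is folded pairwise, which makes it true when an odd number of arguments are true (not "exactly one").

```java
final class BooleanOps {
    // Assumption: each function still requires at least two operands.
    private static void requireAtLeastTwo(boolean[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("expects two or more boolean arguments");
        }
    }

    static boolean and(boolean... args) {
        requireAtLeastTwo(args);
        for (boolean b : args) {
            if (!b) return false; // stop at the first false
        }
        return true;
    }

    static boolean or(boolean... args) {
        requireAtLeastTwo(args);
        for (boolean b : args) {
            if (b) return true; // stop at the first true
        }
        return false;
    }

    // Pairwise fold a ^ b ^ c ...: true for an odd number of true arguments.
    static boolean xor(boolean... args) {
        requireAtLeastTwo(args);
        boolean result = false;
        for (boolean b : args) {
            result ^= b;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(and(true, true, false)); // false
        System.out.println(or(false, false, true)); // true
        System.out.println(xor(true, true, true));  // true
    }
}
```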
wangwenxiang
660df900d4 Fix bug: load wrong new value for RowStarChange 2017-03-15 12:54:01 +08:00
wangwenxiang
0314f49f36 Fix bug: load wrong new value for RowFlagChange 2017-03-15 10:39:33 +08:00
Jacky
912600f0bd Merge pull request #1178 from wetneb/url_caching
Add caching in URL fetching
2017-03-09 17:28:38 -05:00
Antonin Delpeuch
22124ac57e Add checkbox to disable caching 2017-03-09 00:21:34 +00:00
Antonin Delpeuch
32c232c2d6 Move to Guava's cache for ColumnAdditionByFetchingURLsOperation 2017-03-08 09:32:34 +00:00
Antonin Delpeuch
a9c4b0af16 Cache String, not URL, in ColumnAdditionByFetchingURLsOperation 2017-03-08 07:45:11 +00:00
Antonin Delpeuch
782a2f5b48 Add caching in URL fetching 2017-03-07 20:24:50 +00:00
Jacky
5aede573dc bump version to 2.7 2017-02-10 15:55:58 -05:00
Qi Cui
773151380e fix #1138. column transpose 2016-08-24 13:56:35 -04:00
Tom Morris
aa65bc5c18 Throw exception on error instead of logging to console 2016-05-17 15:10:09 -04:00
Tom Morris
6df822e5a6 Set ContentType to application/json 2016-05-17 15:10:09 -04:00
Tom Morris
5d45566455 Protect against NPE when content type is missing 2016-05-17 15:10:09 -04:00
Scott Wiedemann
16b0453b74 Update ToDate.java
Updating SimpleDateFormat api doc url for ToDate function.
2015-11-13 12:27:16 -07:00
Steffen Stundzig
7f5e58ef51 #1086 add support for quote character 2015-10-30 14:32:46 +01:00
Tom Morris
be7f880cbe Revert addition of synchronized methods 2015-10-16 19:33:15 -04:00
Tom Morris
e3858da843 Escape cell data for HTML - fixes #1049 2015-10-16 15:41:03 -04:00
Martin Magdinier
8b4a1d577a Merge pull request #1079 from RefinePro/issue-796
fixed issue #796 Columnize by key/value columns creates empty lines
2015-10-08 14:01:07 -04:00
jackyq2015
7a2a0eb52f fixed issue #796 Columnize by key/value columns creates empty lines 2015-09-29 20:12:05 -04:00
Tom Morris
48681e8877 Move assert where it belongs 2015-09-25 20:01:27 -04:00
Tom Morris
be936a86eb Clean up PR #1055 2015-09-25 19:01:16 -04:00
Tom Morris
de66afa512 Revert " Use new algorithm for levenshtein clustering" 2015-09-25 16:44:25 -04:00
Thad Guidry
175f4a5319 Merge pull request #1047 from lemmingapex/master
Fixed #1046 Combine xls and xlsx formats by inspecting file header information in ExcelImporter
2015-09-21 20:33:05 -05:00
Thad Guidry
94e219042e Merge pull request #1007 from lispc/master
Use new algorithm for levenshtein clustering
2015-09-21 20:23:45 -05:00
Thad Guidry
85ffce60d2 Merge pull request #1070 from RefinePro/issue-995
fix issue #995
2015-09-21 20:12:51 -05:00
jackyq2015
d671d7784b fix issue #995 2015-09-21 21:03:25 -04:00
magdmartin
ab56b73db9 Merge pull request #993 from RefinePro/OpenRefine-trunk
prevent the multiple sorting
2015-09-20 09:32:17 -04:00
magdmartin
b635f4e067 Merge pull request #1055 from RefinePro/issue-512
fix issue #512 to save the file location as a table column
2015-09-20 09:31:16 -04:00
magdmartin
ab6e2951e9 Merge pull request #1051 from RefinePro/issue-1015
Issue 1015. add the meta utf-8
2015-09-20 09:28:10 -04:00
jackyq2015
4e6f584cde fix issue #512 to save the file location as a table column 2015-08-27 15:13:20 -04:00
jackyq2015
dc7535c63e 1. take out of issue #1021 fix which was mistakenly put in
2. fix the expected value for JUNIT
2015-08-06 21:31:37 -04:00
Scott Wiedemann
5eab8893cc Fixed #1046 Combine xls and xlsx formats by inspecting file header information in ExcelImporter. 2015-07-30 16:19:26 -06:00
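The sketch below distinguishes the two formats from the first bytes of the file. Legacy .xls files are OLE2 compound documents, while .xlsx files are ZIP archives. A ZIP signature alone does not prove the file is a spreadsheet. The class name and the returned labels are hypothetical, and the pull request may have relied on Apache POI helpers instead. The sketch requires Java 11 for InputStream.readNBytes(int).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class ExcelFormatSniffer {
    // OLE2 compound-document signature (legacy .xls).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4" (OOXML .xlsx, but also any other ZIP file).
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    static String detect(InputStream in) {
        try {
            byte[] header = in.readNBytes(OLE2.length); // reads at most 8 bytes
            if (startsWith(header, OLE2)) return "xls";
            if (startsWith(header, ZIP)) return "xlsx";
            return "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] zipStart = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        System.out.println(detect(new ByteArrayInputStream(zipStart))); // xlsx
    }
}
```

In practice, wrap the stream in a BufferedInputStream and call mark/reset around the check. This way the parser still sees the header bytes.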
jackyq2015
819e1ba5c6 patch for issue #708. fix a few hanging UIs when importing files 2015-07-18 10:27:35 -04:00
lispc
43e441a4d0 Use new algorithm for levenshtein clustering 2015-06-01 20:35:21 +08:00
Jacky
ca862970a4 prevent the multiple sorting 2015-05-01 15:04:51 -04:00
magdmartin
383f8c5e50 Changed GREL to *General Refine Expression Language* as agreed in 2013 when drafting *Using OpenRefine* 2015-04-21 10:35:52 -04:00
Matthew Blissett
5cdc6d7b5a Fallback to English language to avoid need to maintain 'default' translation files. 2015-02-10 12:33:08 +00:00
QI CUI
495dcd7bd5 use LinkedHashMap instead of HashMap to preserve the retrieval order 2015-01-30 15:03:20 -05:00
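This commit relies on a difference between the two map types. HashMap makes no guarantee about iteration order, whereas LinkedHashMap iterates in insertion order. The class and key names below are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class InsertionOrderDemo {
    // Inserts keys in a fixed order and reports the order in which the map returns them.
    static String keyOrder(Map<String, Integer> map) {
        map.put("zeta", 1);
        map.put("alpha", 2);
        map.put("mid", 3);
        return String.join(",", map.keySet());
    }

    public static void main(String[] args) {
        // Always "zeta,alpha,mid"; with a HashMap the order would be unspecified.
        System.out.println(keyOrder(new LinkedHashMap<>()));
    }
}
```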
Tom Morris
83da996a36 Change to Java 5 loop syntax 2014-12-23 00:04:24 -05:00
Tom Morris
ddfaecb3e6 Merge pull request #914 from opendatatrentino/rev-masschange
Fix wrong revert order in MassChange
2014-12-22 23:50:31 -05:00
David Leoni
4d2b90ad60 added MassChangeTests 2014-12-22 12:23:49 +01:00
Tom Morris
ea723413cb Use StringUtils.toString() convenience method 2014-12-21 11:39:34 -05:00
Tom Morris
4eb6eb6eda Merge pull request #915 from opendatatrentino/fixNullCellToString
Fixes Cell.toString failing on null value
2014-12-21 11:13:34 -05:00
Matthew Blissett
f3e2b9622a Add charset=UTF-8 to HTTP Content-Type for reconciliation queries.
Fixes problem where non-ASCII characters would be URL encoded as UTF-8, but interpreted according to the whims of the server.
2014-11-28 10:45:22 +00:00
David Leoni
c3884c57f5 Fixes Cell.toString failing on null value 2014-11-27 18:45:01 +01:00
David Leoni
d29bf230b5 Fixes wrong revert order in MassChange 2014-11-27 18:12:54 +01:00
Thad Guidry
cdda1edcf0 Fixed issue with null cells after Fetch URL
Some websites do not set the charset= properly and use enclosing quotes.  Tested and Verified.
2014-08-13 21:39:30 -05:00
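A small Content-Type parser can tolerate a quoted charset parameter by stripping the enclosing quotes before calling Charset.forName. This is a hypothetical helper, not the code from cdda1edcf0. The fallback charset is an assumption supplied by the caller.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ContentTypeCharset {
    // Returns the charset named in a Content-Type header, or the fallback if absent or unusable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (!p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                continue;
            }
            String name = p.substring("charset=".length()).trim();
            // Some servers send charset="UTF-8" or charset='UTF-8'; drop the enclosing quotes.
            if (name.length() >= 2
                    && (name.charAt(0) == '"' || name.charAt(0) == '\'')
                    && name.charAt(name.length() - 1) == name.charAt(0)) {
                name = name.substring(1, name.length() - 1);
            }
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Covers IllegalCharsetNameException and UnsupportedCharsetException.
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=\"UTF-8\"", StandardCharsets.ISO_8859_1)); // UTF-8
    }
}
```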
Tom Morris
536493c5d3 Fix AbstractMethodError 500 - fixes #589 2014-08-05 14:55:45 -04:00
Tom Morris
2fa9cf11c8 Merge pull request #859 from Arcadelia/Job-lastTouched-fix
Initialized ImportingJob.lastTouched
2014-07-03 10:36:48 -04:00
Tom Morris
655e0b0dc1 Wrap conditional statement in block 2014-07-03 10:35:24 -04:00
Tom Morris
b21cb56149 Merge pull request #852 from Arcadelia/Duplicate-job-id-fix
Import job duplicate id fix
2014-07-03 10:34:29 -04:00
Tom Morris
4333b1b2e7 Merge pull request #881 from zsxwing/simple-date-format-bug
Put ISO8601_FORMAT into ThreadLocal to fix the concurrency issue
2014-07-03 10:15:03 -04:00
Tom Morris
d106d61b25 Improve error messages - fixes #878 2014-05-30 01:47:22 -04:00
Tom Morris
5799c3d92b Synchronize access to processes list - fixes #862 2014-05-30 01:47:21 -04:00
zsxwing
4ee8e079c9 Put ISO8601_FORMAT into ThreadLocal to fix the concurrency issue 2014-05-30 11:45:28 +08:00
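SimpleDateFormat holds mutable internal state and is not thread-safe. A single shared static instance can therefore produce corrupted output under concurrent use. Giving each thread its own instance via ThreadLocal avoids that, as the sketch below shows. The pattern string and the UTC time zone are assumptions for illustration, and ThreadLocal.withInitial needs Java 8.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class Iso8601Sketch {
    // One SimpleDateFormat per thread, created lazily on first use in that thread.
    private static final ThreadLocal<SimpleDateFormat> ISO8601_FORMAT =
            ThreadLocal.withInitial(() -> {
                // Assumed pattern: UTC timestamps with a literal 'Z' suffix.
                SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
                f.setTimeZone(TimeZone.getTimeZone("UTC"));
                return f;
            });

    static String format(Date date) {
        return ISO8601_FORMAT.get().format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // 1970-01-01T00:00:00Z
    }
}
```

On Java 8 and later, the immutable java.time.format.DateTimeFormatter.ISO_INSTANT is an alternative that needs no per-thread copies.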
Tom Morris
a4d03968a5 Merge pull request #867 from abhillman/exceloutput255bugfix
Report error to user when attempting to export >255 columns, rather than generic 500 ISE
2014-04-20 23:43:19 -04:00
Aryeh Hillman
2bf35e5f0d Fix when exporting to excel files
When exporting to excel, there cannot be more than 255 columns.
If there are more columns than that, we write "ERROR: TOO MANY
COLUMNS" to the 255th column. Formerly, OpenRefine reported
a 500 Server error.
2014-04-12 16:41:54 -07:00
Frank Wennerdahl
8c02a13429 Initialized ImportingJob.lastTouched
Prevents the CleaningTimerTask from disposing newly created
ImportingJobs which have not yet been touched.
2014-02-19 16:02:45 +01:00
Frank Wennerdahl
a0d4eb0058 Job id duplicate fix
Changed how job ids are created to avoid the same id being assigned to
two concurrent jobs.
2014-02-05 12:21:50 +01:00
Frank Wennerdahl
6dedae37a1 Fixed too frequent job cleanups
The ImportingManager cleans up jobs that have not been touched in 60 ms.
According to a comment this should be 60 minutes, but it was changed in
4529310237.
2014-02-05 11:07:41 +01:00
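The three importing-job fixes above can be sketched together: unique ids, an initialized lastTouched, and a 60-minute rather than 60 ms timeout. The class is hypothetical and is not the ImportingJob or ImportingManager code. In particular, an AtomicLong counter is just one way to guarantee distinct ids. The clock is passed in explicitly to keep the sketch testable.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class ImportingJobSketch {
    // Intended timeout per the commit message: 60 minutes, converted explicitly to milliseconds.
    static final long STALE_AFTER_MS = TimeUnit.MINUTES.toMillis(60);

    // Atomic counter: concurrent callers can never receive the same id.
    private static final AtomicLong NEXT_ID = new AtomicLong();

    final long id;
    private volatile long lastTouched;

    ImportingJobSketch(long nowMs) {
        this.id = NEXT_ID.incrementAndGet();
        // Initialize on creation so a job that has not been touched yet is not already stale.
        this.lastTouched = nowMs;
    }

    void touch(long nowMs) {
        lastTouched = nowMs;
    }

    boolean isStale(long nowMs) {
        return nowMs - lastTouched > STALE_AFTER_MS;
    }

    public static void main(String[] args) {
        ImportingJobSketch job = new ImportingJobSketch(System.currentTimeMillis());
        System.out.println(job.isStale(System.currentTimeMillis())); // false
    }
}
```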
Tom Morris
bc801546cc Remove references to obsolete splitIntoColumns option 2013-09-18 18:44:58 -04:00
Tom Morris
4f2ebed676 Make localization language list dynamic - fixes #807
- refactor LoadLanguageCommand so language loading can be reused
- add GetLanguagesCommand for the server
- change GUI to fetch language list and update selection list with it
2013-09-18 13:16:24 -04:00
Tom Morris
1261734f15 Partial solution for #816 plus improved conversion test coverage 2013-09-18 11:14:48 -04:00
Tom Morris
d84f897ae0 Improve help message to specify an integer is returned 2013-09-18 11:12:34 -04:00
Tom Morris
f344e3da1c Return "null" for toString(null) - fixes #783
- also fixed grammar in error message
2013-09-18 10:20:17 -04:00
Tom Morris
daed3bd90c Move MARC->XML conversion to earlier in process - issue #794
- functional now, but probably not good enough to release yet
2013-09-17 19:19:50 -04:00
Tom Morris
6bd6a5934b Start wiring up MARC importer - issue #794 2013-09-17 17:17:23 -04:00
Tom Morris
cce480ff38 Fix implementation for #466 to handle default empty string 2013-09-04 18:59:13 -04:00
Tom Morris
889245fdf4 Make the number of reconciliation results configurable - closes #466 2013-09-04 18:07:12 -04:00
Thad Guidry
f2c4e3ab48 Added ability to extract MILLISECOND to datePart (milliseconds,ms,S) 2013-08-30 09:09:54 -05:00
Tom Morris
c68c1bb2b1 Upgrade to Clojure 1.5.1 & switch to clojure-slim JAR - #792 2013-08-26 19:40:37 -04:00
Tom Morris
62b8c476f1 Use Java's built-in Number formatter instead of ICU4J which is
massive - #792
2013-08-26 15:47:12 -04:00
Tom Morris
4529310237 Switch from TimerTask to ScheduledExecutorService for more robustness 2013-08-18 11:31:03 -04:00
Tom Morris
e93bfa798e Use iterator when removing to avoid ConcurrentModificationException -
fixes #652
2013-08-17 13:45:22 -04:00
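This fix follows a general pattern. Removing elements from a collection while iterating it with a for-each loop can throw ConcurrentModificationException, whereas removing through the iterator itself is supported. The list-of-integers example is illustrative only. On Java 8 and later, Collection.removeIf expresses the same operation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class RemoveWhileIterating {
    // Returns a copy of the input with the even numbers removed.
    static List<Integer> withoutEvens(List<Integer> values) {
        List<Integer> result = new ArrayList<>(values);
        // Calling result.remove(...) inside a for-each over result can throw
        // ConcurrentModificationException; Iterator.remove() is the safe alternative.
        for (Iterator<Integer> it = result.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(withoutEvens(List.of(1, 2, 3, 4, 5))); // [1, 3, 5]
    }
}
```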
Tom Morris
3315136681 Allow reinitialization of ProjectManager singleton - fixes #787 2013-08-17 12:47:57 -04:00
Tom Morris
25f02dd9b9 Fix Java 6 incompatibility 2013-08-15 15:57:24 -04:00
Tom Morris
fa072df85c Add locale support to toDate() - fixes #729 2013-08-15 15:19:01 -04:00
Tom Morris
ab42df6ea3 Merge pull request #658 from Arcadelia/CSV_Multi-char-separator_support
Support for multi-char-separators in CSV
2013-08-14 07:29:45 -07:00
Tom Morris
37d8abc114 Minor improvement to recon error handling 2013-08-10 18:03:06 -04:00
Tom Morris
1d8784e059 Make workspace saving and loading more robust - fixes #528
- don't overwrite old files if we get an error writing new ones
- don't write unchanged data
- keep backup files around until next write rather than deleting
immediately
- attempt to recreate missing metadata as best as possible
2013-08-09 19:53:53 -04:00
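The first three points can be sketched as a write-to-temp-then-move sequence. The .temp and .old file names are hypothetical, and this is not OpenRefine's workspace code. Note also that REPLACE_EXISTING alone does not make the final move atomic on every file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

final class SafeSave {
    // Hypothetical layout: `target` holds the data, `target.temp` the write in progress,
    // `target.old` the previous version, kept until the next successful save.
    static void save(Path target, String content) throws IOException {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        // Don't write unchanged data.
        if (Files.exists(target) && Arrays.equals(Files.readAllBytes(target), bytes)) {
            return;
        }
        Path temp = target.resolveSibling(target.getFileName() + ".temp");
        Path old = target.resolveSibling(target.getFileName() + ".old");
        // If this write fails, the exception propagates and `target` is left untouched.
        Files.write(temp, bytes);
        // Keep the previous version as a backup instead of deleting it.
        if (Files.exists(target)) {
            Files.move(target, old, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("workspace");
        Path data = dir.resolve("data.json");
        save(data, "{\"v\":1}");
        save(data, "{\"v\":2}");
        System.out.println(Files.exists(dir.resolve("data.json.old"))); // true
    }
}
```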
Tom Morris
579d71b7eb Switch back to NUL character for quote now that OpenCSV handles it -
fixes #653
2013-08-07 17:07:17 -04:00
Tom Morris
7b5b549113 More project saving changes for #528
- reduce project retention in memory from 1 hr to 15 min.
- free all unmodified projects if we get an error on save (we could be
running low on memory)
- make sure exceptions propagate up to where they can be usefully
handled
2013-08-05 14:13:56 -04:00
Tom Morris
190a031a8a Comments only. No code changes. 2013-08-05 14:11:06 -04:00
Tom Morris
3500f20e47 Save all modified projects before importing new one - hopefully helps
#528
2013-08-05 14:10:26 -04:00
Tom Morris
57f5e9873d Add Javadoc. No code changes. 2013-08-05 13:08:30 -04:00
Tom Morris
c3cab0524a Narrow exceptions thrown and let them propagate up so we know
workspace file isn't valid - first step for #528
2013-08-05 13:08:02 -04:00
Tom Morris
a7273625d7 Add support for Basic Authentication over HTTPS - addresses #217 2013-08-02 19:15:24 -04:00
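Basic Authentication sends user:password base64-encoded in an Authorization header. That is encoding, not encryption, which is why it should only be used over HTTPS. The sketch builds the header value by hand as described in RFC 7617. It does not reflect how the commit wired credentials into the HTTP client, and encoding the credentials as UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuth {
    // Builds the value for an "Authorization" request header.
    static String headerValue(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Example credentials from RFC 7617.
        System.out.println(headerValue("Aladdin", "open sesame")); // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```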
Tom Morris
4f7da9d18e Switch to Apache HTTP client for downloads - fixes #748 2013-08-02 18:13:41 -04:00
Tom Morris
d7531bbbd8 Handle quoted fields with embedded new lines. Sort separators by score
rather than just standard deviation
2013-08-02 17:59:09 -04:00
Tom Morris
f4ff227340 Clean up localization - fixes #760, modifies pull request #755
- make all file loading relative to module base
- move core language files into appropriate place
- eliminate all SetLanguage commands and use SetPreference instead
- eliminate all LoadLanguage commands except for core's
- fix duplicate keys in JSON language files
- remove BOM from JSON language files

OPEN - task 760: Translations not being loaded from built kit 
http://github.com/OpenRefine/OpenRefine/issues/issue/760
2013-07-31 00:31:31 -04:00
Tom Morris
9450d483ce Fix up line endings 2013-07-29 15:49:20 -04:00
Tom Morris
3003c1a709 Make importers more robust to preview errors when someone selects the
wrong importer/parser
2013-07-27 13:35:12 -04:00