RandomSec

Author	SHA1	Message	Date
Urvashi Gupta	f62f63706c	Report HTTP error codes to the user when creating a project from a URL (#2870 ) * HTTP Error * urlImportingTestCompleted	2020-07-07 11:58:47 +02:00
Tom Morris	e61d50a1aa	Fix NGramFingerprintKeyer to ignore accents - fixes #1161 (#2899 ) Fixes #1161 This change parallels what was done in #1257 `1da3c00` to fix the FingerprintKeyer and moves the diacritic removal before the deduping. Includes a test.	2020-07-07 09:02:49 +02:00
Tom Morris	3717111db8	Fix Open Office Spreadsheet (ODS) dates (#2843 ) * Truncate any completely empty columns on the right Fixes #565 The current versions of Open Office create default spreadsheets with over 1000 empty columns. Keep track of the rightmost non-empty column when importing and truncate everything else. Also adds a basic ODS import test. * Fix dates in ODS spreadsheets Fixes #2224	2020-07-04 08:42:33 +02:00
Tom Morris	df8d092132	Micro benchmark harness & ToNumber optimizations (#2859 ) * Performance optimized version of ToNumber Approximately 5x faster for floats (data dependent) and about the same speed for integers. - Instead of blindly trying to parse as Long, do a quick check for obvious problems (e.g. decimal point). - Don't trim. It's already done by called methods. - Use valueOf() instead of parse() to avoid object creation * Add Java Microbenchmark Harness The shaded JAR is missing the OpenRefine classes, for a reason that I haven't figured out, so requires openrefine-main.jar at runtime. * Remove old implementations of ToNumber * Remove unneeded dependencies from main project * Clean up and reformat	2020-07-03 21:42:44 +02:00
Tom Morris	d3db73aa67	Remove shortest-column-name ordering Refs #2863 The tree importer sorts columns/column groups by how populated they are, which is of arguable utility, but the tie-breaker of ordering by shortest column name is completely silly. This change removes that and, in conjunction with a stable sort algorithm, will preserve the original order of the columns.	2020-07-02 16:12:55 -04:00
Tom Morris	54291ef441	Use Apache IO Commons IOUtils instead of homerolled (#2845 ) Probably should remove the funky Gzip support with the overloaded use of the encoding parameter, but this is a start.	2020-06-30 13:49:47 +02:00
Tom Morris	421974cc3d	Truncate any completely empty columns on the right (#2842 ) Fixes #565 The current versions of Open Office create default spreadsheets with over 1000 empty columns. Keep track of the rightmost non-empty column when importing and truncate everything else. Also adds a basic ODS import test.	2020-06-30 08:19:00 +02:00
Tom Morris	4b146acc6e	Create Project import improvements (#2806 ) * Fix charset encoding & MIME type handling Character set (ie what we call "encoding") is part of the Content-Type, not the Content-Encoding, which specifies compression (e.g. gzip). This correctly sets the character set encoding as well as cleaning the MIME type so that additional parsing doesn't need to be done downstream (and removes that code). * Use "text" instead of "text/line-based" as default fallback format The TextLineBasedGuesser only tries a limited number of formats (CSV, TSV, fixed), so we can't get out of that hole to find JSON, XML, etc. Start with a more general format instead to improve our guessing odds. * Support content type Structured Name Syntax Suffixes (+json +xml) If we can't find a fully specified content type in our lookup, fall back to just the suffix (which is registered with a leading +) Fixes #2800 Fixes #2805	2020-06-25 08:36:57 +02:00
Tom Morris	1849e62234	Better error handling for reconciliation process - fixes #2590 (#2671 ) * Harden reconciliation - Fixes #2590 - check for non-JSON / unparseable JSON returns - handle malformed results response with no name for candidates - catch any Exception, not just IOExceptions - call processManager.onFailedProcess() for cleanup on error * Add default constructor for Jackson Jackson complains about needing a default constructor for the NON_DEFAULT annotation, but I'm not sure why this worked before. * Clean up indentation and unused variable - no functional changes Make indentation consistent throughout the module, changing recently added lines to use the standard all spaces convention. Remove unused count variable * Simplify control flow * Update limit parameter comment. No functional change. * Replace ternary expression which is causing NPE * Add reconciliation tests using mock HTTP server	2020-06-23 21:54:54 +02:00
Tom Morris	e293602897	Restore character encoding guesser (#2755 ) * Fixes #486. Builds on code from Steffen Stundzig - Switch from ICU4J to juniversalchardet (Java port of Mozilla charset detector) - Replace org.json code with Jackson - Add tests - Add TODO for multi-file character encoding mismatches * Restore dependency lost in rebase Co-authored-by: Steffen Stundzig <git@stundzig.de>	2020-06-22 06:04:51 +02:00
Tom Morris	5d2c10b9d8	Merge pull request #2731 from tfmorris/jena-3.7.0 Bump Jena from 3.6.0 to 3.15.0	2020-06-16 17:08:01 -04:00
Tom Morris	5f368bc56d	Use ContentDisposition instead of ContentType to control download (#2722 ) * Use ContentDisposition instead of ContentType to control download Fixes #1197. Previously we were using a funky ContentType to attempt to force a file download rather than display in browser, but this conflicted with attempts to save UTF-8 which was outside the Basic Multilingual Plane (BMP). By switching to ContentDisposition: attachment, which has been the preferred method for a number of years, we can avoid this conflict. As part of this, switch to using the "preview" param consistently to control preview vs download rather than the content type. * Switch content type to text/plain Now that we don't need to use ContentType to control download behavior, we can use something more reasonable.	2020-06-16 15:46:07 +02:00
Tom Morris	749704518c	Use Apache HTTP Commons for Fetch URL (#2692 ) * Use mockwebserver instead of live network for tests Fixes #2680. Fixes #1904. * Remove use of deprecated methods * Convert to use Apache HTTP Components client library Fixes #1410 by virtue of redirect following being a built-in capability of the library, along with retries with binary backoff, built-in decompression, etc. * Address review comments	2020-06-16 09:38:06 +02:00
Tom Morris	559494b75d	Add TODOs for Jena RDF language names	2020-06-15 20:04:05 -04:00
Tom Morris	348d82d131	Merge pull request #2725 from OpenRefine/issue-2724-wikidata-endpoint Update URL of Wikidata reconciliation service	2020-06-15 17:20:05 -04:00
james-cui	04055153a1	add archive column (#2573 ) Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>	2020-06-15 19:56:00 +02:00
Joanne Ong	d57d76f7df	Fix imprecise facet statistics in records mode (#2607 ) * Fix bug in choice counts for records mode * Add test for value grouper on records * Refactor and comment code * Count distinct instances of null/blank data * Update test to check for blank data count in records * Remove unnecessary import statement	2020-06-15 19:38:50 +02:00
Lisa Chandra	947356ddad	[FEAT]Adds new options for split (#2471 ) * added options ui * added definition for both separators * added tests * removed definitions from backend and added them to frontend * added reverse order and handling for accented characters * added tests for accented characters and reverse split * fixed build errors * unicode character ranges instead * added examples	2020-06-15 19:30:18 +02:00
Antonin Delpeuch	1bb9e8a67e	Update URL of Wikidata reconciliation service. Closes #2724	2020-06-15 00:35:10 +02:00
Tom Morris	bf1c890cc3	Unused imports and other minor cleanups (#2723 ) * Two minor fixes - prevent invalid index error on empty strings (shouldn't normally happen) - update deprecated Apache Commons Lang method * Remove unused imports	2020-06-14 21:18:02 +02:00
chuhao zeng	9b03ecae41	Convert illegal characters into legal ones. (#2431 ) * Convert illegal characters into legal ones. * Test tab in key & value string Also fix up test that depended on previous TAB related error message and clean up logging Co-authored-by: Tom Morris <tfmorris@gmail.com>	2020-06-14 09:47:58 +02:00
Tom Morris	18c18e587e	Replace Apache Ant with Commons Compress (#2691 ) NOTE: Changes the public API where some of the old types were embedded which means that any extensions that extend these interfaces will have to be updated. Fixes #2690.	2020-06-11 16:39:51 +02:00
Tom Morris	e6ed8e5d62	Save preferences JSON using UTF-8 encoding. Bulletproof prefs load. (#2657 ) * Save preferences JSON using UTF-8 encoding. Bulletproof prefs load. Fixes #2543. Fixes #2627. Always use UTF-8 to write JSON because platform default encoding might not be legal JSON (e.g. ISO 8859-1). Also be more conservative about keeping backups if we fail to write. * Handle case where backup prefs is better than more recent * Recover from corrupted prefs with null starred list. Fixes #2544. Replaces null with an empty list. * Run tests with non-UTF-8 encoding Make sure that we don't depend on UTF-8 being the default encoding because it isn't true everywhere (e.g. Windows) * Add test for non-ASCII chars in workspace.json This depends on the default Java encoding being something other than UTF-8 to test properly.	2020-06-06 10:00:01 +01:00
Antoine Beaubien	3ca08f6ff1	Changed cell.error to cell.errorMessage & added help data. (#2628 ) * Changed cell.error to cell.errorMessage & added help data. Changed cell.error to cell.errorMessage and added the information to the internal help system. * FR Text correction * HU Fix text 3 instead of 2.	2020-05-23 14:05:25 +02:00
Lu Liu	e89eaf0ee2	support default project name and column name for cross() (#2518 )	2020-05-22 09:39:57 +02:00
Tom Morris	557ffad920	Merge pull request #2586 from OpenRefine/issue-2510-type-boolean Support "boolean" return for type() function. Closes #2510	2020-05-18 17:24:47 -04:00
Antoine2711	0e86619d86	Fix the true.type() == "boolean" Fix the true.type() == "boolean" instead of java.lang.Boolean. Remove all the references to "error" result in Type(). This will be addressed in: @ToDo fix this with issue #2562	2020-05-18 17:23:43 -04:00
Antonin Delpeuch	d7d567439e	Set version to 3.5-SNAPSHOT	2020-05-13 22:56:33 +02:00
Antonin Delpeuch	5597e1c942	Set version to 3.4-beta	2020-05-13 22:52:25 +02:00
Antonin Delpeuch	825e687b0b	Fix bug when both trim and autodetect are enabled in tabular parser. Closes #2584 (#2610 )	2020-05-05 14:00:17 +02:00
Thad Guidry	15710ace17	reduce object creation during JSON serialization (#2576 ) If a new {@code Double} instance is not required, this method should generally be used in preference to the constructor {@link #Double(double)}, as this method is likely to yield significantly better space and time performance by caching frequently requested values.	2020-05-05 10:07:54 +02:00
PJ Fanning	f047a88518	poi works better reading files directly (#2597 )	2020-04-26 21:27:09 +02:00
PJ Fanning	ab64303cbb	allow xlsx files to have more columns (#2602 )	2020-04-26 17:07:26 +02:00
PJ Fanning	1a0e187561	correct excel mime types (#2596 ) * correct excel mime types * address PR issue * remove use of wildcard	2020-04-26 14:36:37 +02:00
PJ Fanning	88f7fb2852	Use SXSSFWorkbook in XlsExporter to improve memory usage when exporting xlsx files (#2594 )	2020-04-26 12:26:05 +02:00
Thad Guidry	e5e2c8f665	remove Freebase AGENT_ID (#2575 ) * remove unused imports * remove unneeded Freebase AGENT_ID In the past, Freebase editors used Google Refine for making edits to its database and the internal identifier was "/en/google_refine" which equated to a Software Application type with attached metadata and also had ownership privileges for certain Freebase Apps. Since Freebase is no longer around, this identifier, only used by Freebase, can now be removed. (This is not a User-Agent header string but was an internal identifier for the Freebase database which no longer exists) * Revert "remove unused imports" This reverts commit 9f6a276f36a54245016bd445680067d2c8862fcb.	2020-04-21 18:32:39 +02:00
Thad Guidry	009c587437	remove unused imports (#2574 )	2020-04-21 15:51:01 +02:00
Lu Liu	bf84fc9cf1	use string representation for matching (#2571 )	2020-04-20 09:07:09 +02:00
Ekta Mishra	05b6a7b2ae	Provides more intuitive representation for arrays in GREL (#2488 ) Added test for same closes #2040	2020-04-01 10:59:25 +02:00
chuhao zeng	1f0111eaed	Fix silent error in JSON/XML importers (#2414 ) * Add error handler for parse error * Add test for parsing json with incorrect structure * Enable localization from front-end * Add methods to get localized error messages * Update returned exception message * Remove unused log and fix file diff issue * Test auto build * Refactor getOptions in newly created test * Use new exception to unwrap original message * Undo unexpected fix * Remove unused lines * Fix exception logic * Fix typo	2020-03-27 09:41:49 +01:00
Albin Larsson	72966af5b6	remove Freebase reconciliation from Excel Importer (#2470 )	2020-03-27 09:30:00 +01:00
Lu Liu	f2b06418da	Support lookup by numbers for GREL cross function (#2468 ) * support int & long argument for cross function * support any types of a cell value	2020-03-26 08:57:10 +01:00
chuhao zeng	70b4c6a6d0	Enable gzip compression (#2475 ) * Enable gzip compression * Add test for gzip parser	2020-03-26 08:42:55 +01:00
chuhao zeng	e484625adf	Fix: Data losses when importing multiple sheets from same Excel file (#2404 ) * Fix losing data when importing multiple sheets from same source Excel file * Add test for importing multi sheets with different column size * Fix space issues * Restore old tests and implement new test cases for the new feature * Restore unexpected delete * Refactor fix * Restore unexpected line delete * Add new unit test for new feature	2020-03-23 22:41:23 +01:00
Thad Guidry	63bef81980	Remove unused variable in JSONUtilities (#2464 )	2020-03-23 20:38:03 +01:00
Lu Liu	9ad3b1080f	Make cross() function work for all columns (#2456 ) * fix #1950 * migrate from join to lookup * reformat	2020-03-23 14:48:32 +01:00
Lisa Chandra	ef8ad85c3c	Adds trim whitespace option to separator based files (#2408 ) * added trim ui to csv importer * added trim functionality * trimStrings handler only for strings * added test for trimStrings option in csv/tsv files * made trim option enabled by default	2020-03-21 10:38:43 +00:00
Albin Larsson	9745bfe374	consistent usage of Apache http status constants (#2432 )	2020-03-18 06:40:52 +00:00
Lisa Chandra	a91691cb6b	[FIX] json/xml trim whitespace configuration option (#2415 ) * trimStrings condition * added test for trimString xml * added trimStrings check for json	2020-03-15 16:04:01 +00:00
zengchu2	c90fd31daf	Add cell.error field for error messages (#2363 ) * Add case for querying cell.error for error messages * Add testing file * Refactor test case for cell with error * Reformat spaces	2020-03-10 10:14:15 +00:00
Chris Parker	93d34d781a	Replaced some deprecated methods	2020-02-24 23:51:41 -06:00
Antonin Delpeuch	429f26c2ae	Set version to 3.4-SNAPSHOT	2020-01-31 19:06:56 +01:00
Antonin Delpeuch	58b839b9c5	Set version to 3.3	2020-01-31 18:22:18 +01:00
Antonin Delpeuch	faece760f6	Set version to 3.3-SNAPSHOT	2020-01-08 20:56:51 +01:00
jamessspanggg	5afd93e2d1	Standardise 'edit' cell dialogue with 'toNumber()' behavior	2020-01-07 10:09:28 +08:00
Antonin Delpeuch	e62bb7ac0e	Set version to 3.3-rc1	2020-01-06 13:30:39 +01:00
Antonin Delpeuch	904129d0f7	Fix other NPE in expression logging, for #2264	2020-01-06 06:30:56 +01:00
Antonin Delpeuch	14dd4c0112	Merge pull request #2264 from OpenRefine/issue-2086-expression-logging-npe Fix NPE in expression logging.	2019-12-30 21:52:58 +01:00
Antonin Delpeuch	60089ab716	Merge pull request #2263 from OpenRefine/issue-2213-xlsx-export-url More robust URI detection in tabular exporter.	2019-12-30 21:52:45 +01:00
Antonin Delpeuch	7593d5484d	Add Hyperlink to cell in Excel importer, with fallback to String, for #2213	2019-12-25 22:24:58 +01:00
Antonin Delpeuch	08e175dc66	Fix NPE in expression logging. Closes #2086 .	2019-12-25 12:33:42 +01:00
Antonin Delpeuch	0bd6a0fbd7	Merge pull request #2198 from viniciusbds/master Dealing with a possible null pointer dereference	2019-12-25 11:42:34 +01:00
Antonin Delpeuch	78853f8fb2	More robust URI detection in tabular exporter. Closes #2213 .	2019-12-25 11:33:03 +01:00
Antonin Delpeuch	726395620b	Merge pull request #2202 from viniciusbds/patch-1 Update SqlCreateBuilder.java	2019-12-16 08:18:20 +01:00
Antonin Delpeuch	cc5498a42a	Return best loaded language code in LoadLanguageCommand. (#2232 ) Closes #2227.	2019-11-27 15:35:18 +00:00
Antonin Delpeuch	efbfce29bb	Add server-side language fallback. This allows to keep the same Javascript calls to load languages, so it does not require any change for extensions to benefit from this. Closes #1350. Fixes #2209.	2019-11-07 17:23:02 +01:00
Vinicius Barbosa	d452e3040c	Update SqlCreateBuilder.java	2019-10-25 12:22:16 -03:00
Vinicius Barbosa	522641e84f	Update SetProjectTagsCommand.java	2019-10-25 11:03:41 -03:00
Antonin Delpeuch	c8eaaee39c	Set version to 3.3-beta	2019-10-21 10:31:24 +01:00
viniciusbds	790fc2ffaa	Dealing with a possible null pointer dereference	2019-10-18 00:23:26 -03:00
viniciusbds	5d89978000	Dealing with a possible null pointer dereference	2019-10-17 23:59:16 -03:00
Antonin Delpeuch	9ae6a7a581	Tie up CSRF tokens in the frontend	2019-10-15 12:07:14 +01:00
Antonin Delpeuch	5dc005749a	Add CSRF protection to remaining commands	2019-10-15 12:07:13 +01:00
Antonin Delpeuch	3559eeb11f	CSRF protection for project and recon commands	2019-10-15 12:07:12 +01:00
Antonin Delpeuch	a340c137d0	CSRF protection for OpenWorkspaceDirCommand and language loading	2019-10-15 12:07:04 +01:00
Antonin Delpeuch	91cead27f8	CSRF protection for ImportingController	2019-10-14 16:24:26 +01:00
Antonin Delpeuch	70e37b9085	Add CSRF protection to cell, history, column and expr commands	2019-10-14 16:24:26 +01:00
Antonin Delpeuch	51ddd27909	Require CSRF token in EditOneCellCommand	2019-10-14 16:24:26 +01:00
Antonin Delpeuch	21b841a089	Add CSRF token generation capabilities, for #2164	2019-10-14 16:24:26 +01:00
viniciusbds	496f1fd2d0	Fix bug when accessing empty list	2019-10-02 08:56:22 -03:00
viniciusbds	6743d5c878	Change strings comparison to use equals comparator	2019-10-01 23:05:24 -03:00
Antonin Delpeuch	bbb5766a33	Merge pull request #2155 from OpenRefine/issue-2152-lonely-clusters Fix clusters with single candidates.	2019-09-18 19:08:18 +01:00
Antonin Delpeuch	36150a874d	Fix scatterplot facet filtering	2019-09-12 11:52:28 +01:00
Antonin Delpeuch	573ba18e6d	Fix scatterplot drawing command, closes #2117	2019-09-12 10:43:12 +01:00
Antonin Delpeuch	95b063162d	Fix clusters with single candidates. Closes #2152 .	2019-09-11 12:12:32 +01:00
Antonin Delpeuch	8ab7653e0b	Set version to 3.3-SNAPSHOT	2019-07-26 15:52:00 +01:00
Antonin Delpeuch	e3417bff49	Set version to 3.2	2019-07-26 15:29:57 +01:00
Owen Stephens	ac7b5a0a19	Update Find and Tests	2019-07-21 13:34:18 +01:00
Owen Stephens	d6999de0da	Match only accepts regular expressions	2019-07-21 13:19:34 +01:00
Antonin Delpeuch	33ff7be18a	Fix NPE in StandardReconConfig. Closes #2076 .	2019-07-03 10:21:45 +02:00
Antonin Delpeuch	cde59a0dca	Merge pull request #2070 from OpenRefine/issue-2068-duplicate-json-key Remove duplicate JSON keys.	2019-07-02 10:19:16 +02:00
Antonin Delpeuch	8390d234b1	Merge pull request #2058 from OpenRefine/issue-1994-customMetadata Fix parsing and display of custom metadata	2019-06-14 14:53:19 +01:00
Antonin Delpeuch	9d76b04a1c	Remove duplicate JSON keys. Closes #2068 .	2019-06-14 11:38:24 +01:00
Antonin Delpeuch	ad9566502f	Merge pull request #2059 from OpenRefine/issue-1989-filenotfound Disable error message when workspace.json does not exist.	2019-06-06 20:57:31 +01:00
Antonin Delpeuch	afb787c845	Disable error message when workspace.json does not exist. Fixes #1989	2019-06-06 17:33:04 +01:00
Krzysztof 'impune-pl' Prorok	ae2f44f9d5	Fixed: issue 1998	2019-06-04 17:01:25 +02:00
Antonin Delpeuch	b9573d83e0	Add customMetadata to project metadata parsing test	2019-06-04 12:02:49 +01:00
s_tanaka	b8b9feac0c	Fix column removal in reorder leaves undeleted hidden cells.	2019-05-15 19:37:40 +09:00
Antonin Delpeuch	edfa7d8445	Skip unknown operations in ApplyOperationsCommand	2019-04-19 11:25:01 +01:00
Antonin Delpeuch	0332be312f	Fix JSON history corruption. Also adds new logic to preserve the JSON representation of unknown operations, to protect from version downgrading or removal of extensions. Closes #1990.	2019-04-18 20:31:41 +01:00

1139 Commits