RandomSec

Author	SHA1	Message	Date
Tom Morris	e61d50a1aa	Fix NGramFingerprintKeyer to ignore accents - fixes #1161 (#2899 ) Fixes #1161 This change parallels what was done in #1257 `1da3c00` to fix the FingerprintKeyer and moves the diacritic removal before the deduping. Includes a test.	2020-07-07 09:02:49 +02:00
Tom Morris	3717111db8	Fix Open Office Spreadsheet (ODS) dates (#2843 ) * Truncate any completely empty columns on the right Fixes #565 The current versions of Open Office create default spreadsheets with over 1000 empty columns. Keep track of the rightmost non-empty column when importing and truncate everything else. Also adds a basic ODS import test. * Fix dates in ODS spreadsheets Fixes #2224	2020-07-04 08:42:33 +02:00
Antonin Delpeuch	f4692de9e1	Increase maximum wait for testInvalidUrl, follow-up for #2876 #2875	2020-07-03 21:48:43 +02:00
Tom Morris	5d6af9cb6c	Merge pull request #2865 from tfmorris/2863-tree-column-ordering Remove shortest-column-name ordering - fixes #2863	2020-07-03 15:23:36 -04:00
Tom Morris	f5786afa35	Increase test timeout - fixes #2875 (#2876 )	2020-07-03 21:20:01 +02:00
Tom Morris	d3db73aa67	Remove shortest-column-name ordering Refs #2863 The tree importer sorts columns/column groups by how populated they are, which is of arguable utility, but the tie-breaker of ordering by shortest column name is completely silly. This change removes that and, in conjunction with a stable sort algorithm, will preserve the original order of the columns.	2020-07-02 16:12:55 -04:00
Tom Morris	28a9f68236	Unit test improvements (#2856 ) * Fix two deprecated methods usages * Test ToNumber conversions * Test behavior of all functions when passed 0 or 8 arguments There are 16 which fail currently on 0 args (return null or False instead of EvalError), but have been whitelisted until we can verify whether it's safe to change them without introducing compatibility issues. There are 19 which fail to return an error on too many (ie 8) args.	2020-07-02 20:29:21 +02:00
Tom Morris	0f3a6006f3	Add Excel95 import test and improve other importer tests (#2844 ) No issue. - we don't support Excel95, but make sure that it generates an exception - move the test data file into the appropriate directory - for any normal test, consider exceptions a failure	2020-06-30 08:20:56 +02:00
Tom Morris	421974cc3d	Truncate any completely empty columns on the right (#2842 ) Fixes #565 The current versions of Open Office create default spreadsheets with over 1000 empty columns. Keep track of the rightmost non-empty column when importing and truncate everything else. Also adds a basic ODS import test.	2020-06-30 08:19:00 +02:00
Tom Morris	83f52d4ba5	Fall back to Apache Jena 3.9.0 (from 3.15.0) (#2826 ) Fixes #2824 Versions up through 3.14.0 appear to work, but since odfdom bundles Jena 3.9.0, we're going to be conservative and match that. As an added bonus, includes a blank node test which will trigger the failure.	2020-06-27 23:40:21 +02:00
Tom Morris	4b146acc6e	Create Project import improvements (#2806 ) * Fix charset encoding & MIME type handling Character set (ie what we call "encoding") is part of the Content-Type, not the Content-Encoding, which specifies compression (e.g. gzip). This correctly sets the character set encoding as well as cleaning the MIME type so that additional parsing doesn't need to be done downstream (and removes that code). * Use "text" instead of "text/line-based" as default fallback format The TextLineBasedGuesser only tries a limited number of formats (CSV, TSV, fixed), so we can't get out of that hole to find JSON, XML, etc. Start with a more general format instead to improve our guessing odds. * Support content type Structured Name Syntax Suffixes (+json +xml) If we can't find a fully specified content type in our lookup, fall back to just the suffix (which is registered with a leading +) Fixes #2800 Fixes #2805	2020-06-25 08:36:57 +02:00
Tom Morris	1849e62234	Better error handling for reconciliation process - fixes #2590 (#2671 ) * Harden reconciliation - Fixes #2590 - check for non-JSON / unparseable JSON returns - handle malformed results response with no name for candidates - catch any Exception, not just IOExceptions - call processManager.onFailedProcess() for cleanup on error * Add default constructor for Jackson Jackson complains about needing a default constructor for the NON_DEFAULT annotation, but I'm not sure why this worked before. * Clean up indentation and unused variable - no functional changes Make indentation consistent throughout the module, changing recently added lines to use the standard all spaces convention. Remove unused count variable * Simplify control flow * Update limit parameter comment. No functional change. * Replace ternary expression which is causing NPE * Add reconciliation tests using mock HTTP server	2020-06-23 21:54:54 +02:00
Tom Morris	e293602897	Restore character encoding guesser (#2755 ) * Fixes #486. Builds on code from Steffen Stundzig - Switch from ICU4J to juniversalchardet (Java port of Mozilla charset detector) - Replace org.json code with Jackson - Add tests - Add TODO for multi-file character encoding mismatches * Restore dependency lost in rebase Co-authored-by: Steffen Stundzig <git@stundzig.de>	2020-06-22 06:04:51 +02:00
Tom Morris	77b858db18	Fix race in Process Manager (#2748 ) * Remove redundant JSON diff logging * Fix race in process manager test causing intermittent failure	2020-06-17 21:24:25 +02:00
Tom Morris	749704518c	Use Apache HTTP Commons for Fetch URL (#2692 ) * Use mockwebserver instead of live network for tests Fixes #2680. Fixes #1904. * Remove use of deprecated methods * Convert to use Apache HTTP Components client library Fixes #1410 by virtue of redirect following being a built-in capability of the library, along with retries with binary backoff, built-in decompression, etc. * Address review comments	2020-06-16 09:38:06 +02:00
james-cui	04055153a1	add archive column (#2573 ) Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>	2020-06-15 19:56:00 +02:00
Joanne Ong	d57d76f7df	Fix imprecise facet statistics in records mode (#2607 ) * Fix bug in choice counts for records mode * Add test for value grouper on records * Refactor and comment code * Count distinct instances of null/blank data * Update test to check for blank data count in records * Remove unnecessary import statement	2020-06-15 19:38:50 +02:00
Lisa Chandra	947356ddad	[FEAT]Adds new options for split (#2471 ) * added options ui * added definition for both separators * added tests * removed definitions from backend and added them to frontend * added reverse order and handling for accented characters * added tests for accented characters and reverse split * fixed build errors * unicode character ranges instead * added examples	2020-06-15 19:30:18 +02:00
chuhao zeng	9b03ecae41	Convert illegal characters into legal ones. (#2431 ) * Convert illegal characters into legal ones. * Test tab in key & value string Also fix up test that depended on previous TAB related error message and clean up logging Co-authored-by: Tom Morris <tfmorris@gmail.com>	2020-06-14 09:47:58 +02:00
Tom Morris	18c18e587e	Replace Apache Ant with Commons Compress (#2691 ) NOTE: Changes the public API where some of the old types were embedded which means that any extensions that extend these interfaces will have to be updated. Fixes #2690.	2020-06-11 16:39:51 +02:00
Tom Morris	e6ed8e5d62	Save preferences JSON using UTF-8 encoding. Bulletproof prefs load. (#2657 ) * Save preferences JSON using UTF-8 encoding. Bulletproof prefs load. Fixes #2543. Fixes #2627. Always use UTF-8 to write JSON because platform default encoding might not be legal JSON (e.g. ISO 8859-1). Also be more conservative about keeping backups if we fail to write. * Handle case where backup prefs is better than more recent * Recover from corrupted prefs with null starred list. Fixes #2544. Replaces null with an empty list. * Run tests with non-UTF-8 encoding Make sure that we don't depend on UTF-8 being the default encoding because it isn't true everywhere (e.g. Windows) * Add test for non-ASCII chars in workspace.json This depends on the default Java encoding being something other than UTF-8 to test properly.	2020-06-06 10:00:01 +01:00
Antoine Beaubien	3ca08f6ff1	Changed cell.error to cell.errorMessage & added help data. (#2628 ) * Changed cell.error to cell.errorMessage & added help data. Changed cell.error to cell.errorMessage and added the information to the internal help system. * FR Text correction * HU Fix text 3 instead of 2.	2020-05-23 14:05:25 +02:00
Lu Liu	e89eaf0ee2	support default project name and column name for cross() (#2518 )	2020-05-22 09:39:57 +02:00
Tom Morris	557ffad920	Merge pull request #2586 from OpenRefine/issue-2510-type-boolean Support "boolean" return for type() function. Closes #2510	2020-05-18 17:24:47 -04:00
Antoine2711	0e86619d86	Fix the true.type() == "boolean" Fix the true.type() == "boolean" instead of java.lang.Boolean. Remove all the references to "error" result in Type(). This will be addressed in: @ToDo fix this with issue #2562	2020-05-18 17:23:43 -04:00
Antonin Delpeuch	825e687b0b	Fix bug when both trim and autodetect are enabled in tabular parser. Closes #2584 (#2610 )	2020-05-05 14:00:17 +02:00
PJ Fanning	ab64303cbb	allow xlsx files to have more columns (#2602 )	2020-04-26 17:07:26 +02:00
PJ Fanning	fe7fcce94b	small improvement to xls tests (#2599 )	2020-04-26 16:02:20 +02:00
PJ Fanning	1a0e187561	correct excel mime types (#2596 ) * correct excel mime types * address PR issue * remove use of wildcard	2020-04-26 14:36:37 +02:00
Thad Guidry	009c587437	remove unused imports (#2574 )	2020-04-21 15:51:01 +02:00
Lu Liu	bf84fc9cf1	use string representation for matching (#2571 )	2020-04-20 09:07:09 +02:00
Ekta Mishra	05b6a7b2ae	Provides more intuitive representation for arrays in GREL (#2488 ) Added test for same closes #2040	2020-04-01 10:59:25 +02:00
chuhao zeng	1f0111eaed	Fix silent error in JSON/XML importers (#2414 ) * Add error handler for parse error * Add test for parsing json with incorrect structure * Enable localization from front-end * Add methods to get localized error messages * Update returned exception message * Remove unused log and fix file diff issue * Test auto build * Refactor getOptions in newly created test * Use new exception to unwrap original message * Undo unexpected fix * Remove unused lines * Fix exception logic * Fix typo	2020-03-27 09:41:49 +01:00
Lu Liu	f2b06418da	Support lookup by numbers for GREL cross function (#2468 ) * support int & long argument for cross function * support any types of a cell value	2020-03-26 08:57:10 +01:00
chuhao zeng	70b4c6a6d0	Enable gzip compression (#2475 ) * Enable gzip compression * Add test for gzip parser	2020-03-26 08:42:55 +01:00
chuhao zeng	e484625adf	Fix: Data losses when importing multiple sheets from same Excel file (#2404 ) * Fix losing data when importing multiple sheets from same source Excel file * Add test for importing multi sheets with different column size * Fix space issues * Restore old tests and implement new test cases for the new feature * Restore unexpected delete * Refactor fix * Restore unexpected line delete * Add new unit test for new feature	2020-03-23 22:41:23 +01:00
Lu Liu	9ad3b1080f	Make cross() function work for all columns (#2456 ) * fix #1950 * migrate from join to lookup * reformat	2020-03-23 14:48:32 +01:00
Lisa Chandra	ef8ad85c3c	Adds trim whitespace option to separator based files (#2408 ) * added trim ui to csv importer * added trim functionality * trimStrings handler only for strings * added test for trimStrings option in csv/tsv files * made trim option enabled by default	2020-03-21 10:38:43 +00:00
Albin Larsson	9745bfe374	consistent usage of Apache http status constants (#2432 )	2020-03-18 06:40:52 +00:00
Albin Larsson	0233e7186b	CSVExporter: add test case for quoteAll option (#2430 )	2020-03-18 04:39:32 +00:00
Lisa Chandra	a91691cb6b	[FIX] json/xml trim whitespace configuration option (#2415 ) * trimStrings condition * added test for trimString xml * added trimStrings check for json	2020-03-15 16:04:01 +00:00
Lu Liu	14ef45efb2	mock reconciliation service (#2410 )	2020-03-14 09:40:15 +00:00
zengchu2	c90fd31daf	Add cell.error field for error messages (#2363 ) * Add case for querying cell.error for error messages * Add testing file * Refactor test case for cell with error * Reformat spaces	2020-03-10 10:14:15 +00:00
jamessspanggg	67b62c5c16	EditOneCellCommandTests: Add number parsing tests	2020-01-08 09:55:51 +08:00
Antonin Delpeuch	904129d0f7	Fix other NPE in expression logging, for #2264	2020-01-06 06:30:56 +01:00
Antonin Delpeuch	14dd4c0112	Merge pull request #2264 from OpenRefine/issue-2086-expression-logging-npe Fix NPE in expression logging.	2019-12-30 21:52:58 +01:00
Antonin Delpeuch	08e175dc66	Fix NPE in expression logging. Closes #2086 .	2019-12-25 12:33:42 +01:00
Antonin Delpeuch	78853f8fb2	More robust URI detection in tabular exporter. Closes #2213 .	2019-12-25 11:33:03 +01:00
Antonin Delpeuch	cc5498a42a	Return best loaded language code in LoadLanguageCommand. (#2232 ) Closes #2227.	2019-11-27 15:35:18 +00:00
Antonin Delpeuch	85e40b8c45	Fix typo	2019-11-07 17:52:38 +01:00

553 Commits