Commit Graph

105 Commits

Author SHA1 Message Date
jackyq2015
819e1ba5c6 patch for issue #708. fix few hanging UIs when importing file 2015-07-18 10:27:35 -04:00
QI CUI
495dcd7bd5 use the LinkedHashMap instead of HashMap to make sure the retrive order 2015-01-30 15:03:20 -05:00
Tom Morris
bc801546cc Remove references to obsolete splitIntoColumns option 2013-09-18 18:44:58 -04:00
Tom Morris
daed3bd90c Move MARC->XML conversion to earlier in process - issue #794
- functional now, but probably not good enough to release yet
2013-09-17 19:19:50 -04:00
Tom Morris
6bd6a5934b Start wiring up MARC importer - issue #794 2013-09-17 17:17:23 -04:00
Tom Morris
ab42df6ea3 Merge pull request #658 from Arcadelia/CSV_Multi-char-separator_support
Support for multi-char-separators in CSV
2013-08-14 07:29:45 -07:00
Tom Morris
579d71b7eb Switch back to NUL character for quote now that OpenCSV handles it -
fixes #653
2013-08-07 17:07:17 -04:00
Tom Morris
d7531bbbd8 Handle quoted fields with embedded new lines. Sort separators by score
rather than just standard deviation
2013-08-02 17:59:09 -04:00
Tom Morris
3003c1a709 Make importers more robust to preview errors when someone selects the
wrong importer/parser
2013-07-27 13:35:12 -04:00
Tom Morris
57ca70132c Turn all import conversions off by default - fixes #478 2013-07-27 13:32:26 -04:00
Tom Morris
7edc550618 Give a reasonable error message on Excel 95 import failure - fixes #564 2013-07-26 16:24:56 -04:00
Tom Morris
1e5f89e84c Centralize handling of import job config object & synchronize to allow
multiple accessors
2013-07-25 15:41:08 -04:00
Tom Morris
567da6aa9f Normalize line endings
Add .gitattributes & do one-time normalization of line endings
2013-03-23 18:46:20 -04:00
Tom Morris
6a91b5d75b Use InputStream instead of Reader for JSON import - fixes #698 2013-03-23 18:36:05 -04:00
Tom Morris
6b3592982e Remove O(n^2) issue in tree importers - fixes #699
- Add sparse/based list implementation for ImportRecord
2013-03-23 12:02:51 -04:00
Tom Morris
f78dfadcf3 Clean up tree import utilities for #699
- lazy allocate objects
- conditionalize logging to prevent calls to StringBuilder & toString()

These are secondary issues, but still worth cleaning up.
2013-03-23 11:56:58 -04:00
Tom Morris
0a2ba1b1ae Switch from LinkedList to ArrayList
Just a simple list.  No need for extra overhead..
2013-03-23 08:16:23 -04:00
Tom Morris
bfa7c34d17 Merge pull request #659 - closes #659 2013-03-18 21:24:01 -04:00
Tom Morris
8a61cf731b Merge pull request #664 from Arcadelia/Preserve_Quotes
Quotes should not be removed from values
2013-03-18 18:12:51 -07:00
Tom Morris
a2a8f4af2e Patch applied - closed #315 2013-03-06 21:45:54 -05:00
Tom Morris
d8d82bf8b7 Clean up a couple more format guessing issues left over from #685 2013-03-06 20:39:39 -05:00
Tom Morris
369bfffb2f Don't guess field widths unless we have at least 3 lines
- Investigation of #685 showed that single line files were being guessed
as fixed field width
2013-03-04 17:47:06 -05:00
Tom Morris
c0347225b8 Switch escape character from NUL to DEL in hopes that it's rarer. 2013-02-01 17:12:07 -05:00
Frank Wennerdahl
1f7ab046c7 Quotes should not be removed from values
Leading quotation marks should not be removed from values. If they have
been left by the importing parser they should be considered part of the
value.
2013-01-24 09:04:17 +01:00
Frank Wennerdahl
ebdc40ad71 Added CSV quote options
Added two additional CSV options, one for parsing and one for export.

Specifying strict quotes when parsing will ignore all data not quoted.
Specifying quote all when exporting will enclose all values in quotes.

No front-end changes made, just added the support for the options in the
requests.
2013-01-21 08:21:16 +01:00
Frank Wennerdahl
f837643f1e Support for multi-char-separators in CSV
This change requires that the following patch is applied to OpenCSV:

http://sourceforge.net/tracker/index.php?func=detail&aid=3599477&group_id=148905&atid=773543
2013-01-18 16:28:27 +01:00
Tom Morris
b3f5fada95 FIXED - task 578 & 596: Clean up JSON importer
http://code.google.com/p/google-refine/issues/detail?id=578
http://code.google.com/p/google-refine/issues/detail?id=596

Extend tree parser framework to allow any Serializable instead of just Strings. Use this in JSON importer to: Import keywords null, true, false; Import empty strings and don't trim whitespace from strings on import;  Import numbers directly instead of importing them as text and then parsing them ourselves. Add tests to verify all this stuff

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2543 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-08 01:20:25 +00:00
Tom Morris
93d6e176d6 Task 478: Default "guess datatypes" to False so importers which don't specify it (e.g. gData & Excel) aren't effected
http://code.google.com/p/google-refine/issues/detail?id=478

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2541 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-07 21:17:34 +00:00
Tom Morris
4bf212c03d FIXED - task 154: Can't import RDF/XML Data
http://code.google.com/p/google-refine/issues/detail?id=154

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2526 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-05 16:31:41 +00:00
Stefano Mazzocchi
6e41f4ad91 make the latest eclipse happy (it triggers a warning)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2513 7d457c2a-affb-35e4-300a-418c747d4874
2012-07-12 01:55:11 +00:00
Tom Morris
a0812c5751 Be slightly more tolerant of weird spreadsheet data
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2501 7d457c2a-affb-35e4-300a-418c747d4874
2012-06-02 21:00:30 +00:00
Tom Morris
28ff2295fd Issue 490 - Handle separator guessing for CSVs with quoted fields containing commas
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2458 7d457c2a-affb-35e4-300a-418c747d4874
2012-03-08 15:53:55 +00:00
Tom Morris
190e817fb8 Protect against NullPointerException
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2444 7d457c2a-affb-35e4-300a-418c747d4874
2012-02-22 20:06:03 +00:00
Tom Morris
40183aa0ba Issue 513 - get rid of exception at end of import in JSON parser
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2435 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-27 17:05:45 +00:00
Tom Morris
fdac0c30cf Issue 524 - shorten __anonymous__ names for JSON importer
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2432 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-26 22:38:25 +00:00
Tom Morris
b409ef5670 Issue 491 - fix off-by-one error in column counts
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2405 7d457c2a-affb-35e4-300a-418c747d4874
2011-12-09 23:50:40 +00:00
David Huynh
5aec75696d Fixed Issue 477 in google-refine: Implement or remove the line separator option.
Also, fixed displaying bug in the fixed-width parser UI: previously, tab characters forced columns to be wider.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2364 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-06 20:13:05 +00:00
Tom Morris
85a37d23f9 Issue 474 - implement record limit for XML and JSON importers
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2359 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-05 16:38:19 +00:00
Tom Morris
a7c81880a8 Issue 475 - Support escaped custom separators
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2355 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 19:04:16 +00:00
Tom Morris
cacbedd352 Fix index out of bounds exception when separator is the empty string
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2354 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 17:31:51 +00:00
Stefano Mazzocchi
856ef6a65a commented out unused variables
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2352 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-01 21:47:24 +00:00
Tom Morris
71492c706c Just some TODOs
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2349 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:51:20 +00:00
Tom Morris
ad8705e299 Javadoc only
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2348 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:29:35 +00:00
Tom Morris
a870e782f5 Make sure out counts our current before attempting to use them for sorting
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2347 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:28:27 +00:00
Tom Morris
ab950689dd Add debugging info - mostly toString() methods for types missing them
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2343 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-21 16:46:55 +00:00
David Huynh
223074bb25 Xml importer should stop trying to skip over initial non-xml content after some number of characters.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2336 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-18 15:25:31 +00:00
Tom Morris
2d5125af1e Issue 462 - don't trim whitespace from string-valued cell contents on import
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2330 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-12 23:45:52 +00:00
Tom Morris
3bd84088da Rename OO/ODS importer with more generic name
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2325 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 21:01:45 +00:00
Tom Morris
ca17e1ef0a New importer for Open Document Format (ODF) spreadsheet files (.ods)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2323 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:27:40 +00:00
Tom Morris
5c856179cb Add TODO for suspicious code
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2320 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:14:57 +00:00