Commit Graph

162 Commits

Author SHA1 Message Date
Steffen Stundzig
7f5e58ef51 #1086 add support for quote character 2015-10-30 14:32:46 +01:00
Tom Morris
48681e8877 Move assert where it belongs 2015-09-25 20:01:27 -04:00
Tom Morris
be936a86eb Clean up PR #1055 2015-09-25 19:01:16 -04:00
Thad Guidry
175f4a5319 Merge pull request #1047 from lemmingapex/master
Fixed #1046 Combine xls and xlsx formats by inspecting file header information in ExcelImporter
2015-09-21 20:33:05 -05:00
magdmartin
b635f4e067 Merge pull request #1055 from RefinePro/issue-512
fix issue #512 to save the file location as a table column
2015-09-20 09:31:16 -04:00
jackyq2015
4e6f584cde fix issue #512 to save the file location as a table column 2015-08-27 15:13:20 -04:00
Scott Wiedemann
5eab8893cc Fixed #1046 Combine xls and xlsx formats by inspecting file header information in ExcelImporter. 2015-07-30 16:19:26 -06:00
jackyq2015
819e1ba5c6 patch for issue #708. fix few hanging UIs when importing file 2015-07-18 10:27:35 -04:00
QI CUI
495dcd7bd5 use the LinkedHashMap instead of HashMap to make sure the retrive order 2015-01-30 15:03:20 -05:00
Tom Morris
bc801546cc Remove references to obsolete splitIntoColumns option 2013-09-18 18:44:58 -04:00
Tom Morris
daed3bd90c Move MARC->XML conversion to earlier in process - issue #794
- functional now, but probably not good enough to release yet
2013-09-17 19:19:50 -04:00
Tom Morris
6bd6a5934b Start wiring up MARC importer - issue #794 2013-09-17 17:17:23 -04:00
Tom Morris
ab42df6ea3 Merge pull request #658 from Arcadelia/CSV_Multi-char-separator_support
Support for multi-char-separators in CSV
2013-08-14 07:29:45 -07:00
Tom Morris
579d71b7eb Switch back to NUL character for quote now that OpenCSV handles it -
fixes #653
2013-08-07 17:07:17 -04:00
Tom Morris
d7531bbbd8 Handle quoted fields with embedded new lines. Sort separators by score
rather than just standard deviation
2013-08-02 17:59:09 -04:00
Tom Morris
3003c1a709 Make importers more robust to preview errors when someone selects the
wrong importer/parser
2013-07-27 13:35:12 -04:00
Tom Morris
57ca70132c Turn all import conversions off by default - fixes #478 2013-07-27 13:32:26 -04:00
Tom Morris
7edc550618 Give a reasonable error message on Excel 95 import failure - fixes #564 2013-07-26 16:24:56 -04:00
Tom Morris
1e5f89e84c Centralize handling of import job config object & synchronize to allow
multiple accessors
2013-07-25 15:41:08 -04:00
Tom Morris
567da6aa9f Normalize line endings
Add .gitattributes & do one-time normalization of line endings
2013-03-23 18:46:20 -04:00
Tom Morris
6a91b5d75b Use InputStream instead of Reader for JSON import - fixes #698 2013-03-23 18:36:05 -04:00
Tom Morris
6b3592982e Remove O(n^2) issue in tree importers - fixes #699
- Add sparse/based list implementation for ImportRecord
2013-03-23 12:02:51 -04:00
Tom Morris
f78dfadcf3 Clean up tree import utilities for #699
- lazy allocate objects
- conditionalize logging to prevent calls to StringBuilder & toString()

These are secondary issues, but still worth cleaning up.
2013-03-23 11:56:58 -04:00
Tom Morris
0a2ba1b1ae Switch from LinkedList to ArrayList
Just a simple list.  No need for extra overhead..
2013-03-23 08:16:23 -04:00
Tom Morris
bfa7c34d17 Merge pull request #659 - closes #659 2013-03-18 21:24:01 -04:00
Tom Morris
8a61cf731b Merge pull request #664 from Arcadelia/Preserve_Quotes
Quotes should not be removed from values
2013-03-18 18:12:51 -07:00
Tom Morris
a2a8f4af2e Patch applied - closed #315 2013-03-06 21:45:54 -05:00
Tom Morris
d8d82bf8b7 Clean up a couple more format guessing issues left over from #685 2013-03-06 20:39:39 -05:00
Tom Morris
369bfffb2f Don't guess field widths unless we have at least 3 lines
- Investigation of #685 showed that single line files were being guessed
as fixed field width
2013-03-04 17:47:06 -05:00
Tom Morris
c0347225b8 Switch escape character from NUL to DEL in hopes that it's rarer. 2013-02-01 17:12:07 -05:00
Frank Wennerdahl
1f7ab046c7 Quotes should not be removed from values
Leading quotation marks should not be removed from values. If they have
been left by the importing parser they should be considered part of the
value.
2013-01-24 09:04:17 +01:00
Frank Wennerdahl
ebdc40ad71 Added CSV quote options
Added two additional CSV options, one for parsing and one for export.

Specifying strict quotes when parsing will ignore all data not quoted.
Specifying quote all when exporting will enclose all values in quotes.

No front-end changes made, just added the support for the options in the
requests.
2013-01-21 08:21:16 +01:00
Frank Wennerdahl
f837643f1e Support for multi-char-separators in CSV
This change requires that the following patch is applied to OpenCSV:

http://sourceforge.net/tracker/index.php?func=detail&aid=3599477&group_id=148905&atid=773543
2013-01-18 16:28:27 +01:00
Tom Morris
b3f5fada95 FIXED - task 578 & 596: Clean up JSON importer
http://code.google.com/p/google-refine/issues/detail?id=578
http://code.google.com/p/google-refine/issues/detail?id=596

Extend tree parser framework to allow any Serializable instead of just Strings. Use this in JSON importer to: Import keywords null, true, false; Import empty strings and don't trim whitespace from strings on import;  Import numbers directly instead of importing them as text and then parsing them ourselves. Add tests to verify all this stuff

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2543 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-08 01:20:25 +00:00
Tom Morris
93d6e176d6 Task 478: Default "guess datatypes" to False so importers which don't specify it (e.g. gData & Excel) aren't effected
http://code.google.com/p/google-refine/issues/detail?id=478

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2541 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-07 21:17:34 +00:00
Tom Morris
4bf212c03d FIXED - task 154: Can't import RDF/XML Data
http://code.google.com/p/google-refine/issues/detail?id=154

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2526 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-05 16:31:41 +00:00
Stefano Mazzocchi
6e41f4ad91 make the latest eclipse happy (it triggers a warning)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2513 7d457c2a-affb-35e4-300a-418c747d4874
2012-07-12 01:55:11 +00:00
Tom Morris
a0812c5751 Be slightly more tolerant of weird spreadsheet data
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2501 7d457c2a-affb-35e4-300a-418c747d4874
2012-06-02 21:00:30 +00:00
Tom Morris
28ff2295fd Issue 490 - Handle separator guessing for CSVs with quoted fields containing commas
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2458 7d457c2a-affb-35e4-300a-418c747d4874
2012-03-08 15:53:55 +00:00
Tom Morris
190e817fb8 Protect against NullPointerException
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2444 7d457c2a-affb-35e4-300a-418c747d4874
2012-02-22 20:06:03 +00:00
Tom Morris
40183aa0ba Issue 513 - get rid of exception at end of import in JSON parser
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2435 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-27 17:05:45 +00:00
Tom Morris
fdac0c30cf Issue 524 - shorten __anonymous__ names for JSON importer
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2432 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-26 22:38:25 +00:00
Tom Morris
b409ef5670 Issue 491 - fix off-by-one error in column counts
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2405 7d457c2a-affb-35e4-300a-418c747d4874
2011-12-09 23:50:40 +00:00
David Huynh
5aec75696d Fixed Issue 477 in google-refine: Implement or remove the line separator option.
Also, fixed displaying bug in the fixed-width parser UI: previously, tab characters forced columns to be wider.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2364 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-06 20:13:05 +00:00
Tom Morris
85a37d23f9 Issue 474 - implement record limit for XML and JSON importers
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2359 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-05 16:38:19 +00:00
Tom Morris
a7c81880a8 Issue 475 - Support escaped custom separators
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2355 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 19:04:16 +00:00
Tom Morris
cacbedd352 Fix index out of bounds exception when separator is the empty string
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2354 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 17:31:51 +00:00
Stefano Mazzocchi
856ef6a65a commented out unused variables
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2352 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-01 21:47:24 +00:00
Tom Morris
71492c706c Just some TODOs
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2349 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:51:20 +00:00
Tom Morris
ad8705e299 Javadoc only
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2348 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:29:35 +00:00
Tom Morris
a870e782f5 Make sure out counts our current before attempting to use them for sorting
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2347 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:28:27 +00:00
Tom Morris
ab950689dd Add debugging info - mostly toString() methods for types missing them
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2343 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-21 16:46:55 +00:00
David Huynh
223074bb25 Xml importer should stop trying to skip over initial non-xml content after some number of characters.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2336 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-18 15:25:31 +00:00
Tom Morris
2d5125af1e Issue 462 - don't trim whitespace from string-valued cell contents on import
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2330 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-12 23:45:52 +00:00
Tom Morris
3bd84088da Rename OO/ODS importer with more generic name
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2325 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 21:01:45 +00:00
Tom Morris
ca17e1ef0a New importer for Open Document Format (ODF) spreadsheet files (.ods)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2323 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:27:40 +00:00
Tom Morris
5c856179cb Add TODO for suspicious code
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2320 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:14:57 +00:00
Tom Morris
16421303cb Add Javadoc
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2318 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:12:24 +00:00
David Huynh
1a14d82393 For XML files, ignore not just leading whitespace but anything except <.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2313 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-10 20:51:00 +00:00
Stefano Mazzocchi
1f67866258 fixing a bunch of inconsistencies and potential bugs as indicated by findbugs, pmd and eclipse
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2301 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 21:23:23 +00:00
Tom Morris
31073d7712 Refactor importer interfaces to narrow exceptions thrown and handled
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2296 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 19:06:53 +00:00
Tom Morris
50927b33dc Javadoc
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2295 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 18:56:23 +00:00
Tom Morris
4a230abb44 Narrow exception handling
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2294 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 18:55:46 +00:00
Tom Morris
29cbc5af20 Remove some obsolete TODOs. No functional changes.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2290 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 17:29:30 +00:00
David Huynh
18f32ed7e8 Fixed up Rdf Triples importer, added a parser UI for it, and got its tests to pass.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2283 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-06 21:28:20 +00:00
David Huynh
1c5dc32b88 Fixed tsv/csv tests.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2276 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-06 06:22:30 +00:00
Tom Morris
ac4a0ca747 Store blank cells as nulls if that's what the user request
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2272 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-05 23:41:52 +00:00
David Huynh
7935dfd60e Stricter detection of json and xml formats on import, by checking for initial nonspace character.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2266 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-30 01:47:42 +00:00
David Huynh
d047acf1d1 Fixed Issue 452: Importing using Clipboard function does not guess structure correctly for XML or JSON
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2263 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-29 14:02:12 +00:00
David Huynh
5762efebf6 Fixed Issue 397: New UI Importer Branch - individual JSON record nodes do not preview well.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2258 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-28 03:38:23 +00:00
David Huynh
db3bbb5c86 Fixed xml parsing error due to whitespaces in front of <?xml>.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2246 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-19 09:06:36 +00:00
David Huynh
66cf0b6596 Fixed Issue 449: Uncaught exception from Excel importer.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2245 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-19 08:49:35 +00:00
David Huynh
4113a10b5b Catch/log exceptions in the importers a bit more carefully.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2215 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-22 21:47:15 +00:00
David Huynh
f023b922e1 Implemented encoding selectors in a few importing parser UIs.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2214 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-22 17:55:06 +00:00
Tom Morris
9d7b8a5279 Don't die if we get passed no candidates
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2210 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-18 17:39:18 +00:00
David Huynh
afb7953eac Fixed problem for importing from an archive file containing fixed width column files: we used to create totally new columns for each contained file, yielding too many columns.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2203 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-14 02:53:19 +00:00
David Huynh
33d99186ea Made fixed width column guessing slightly better.
Made sure fixed width parser UI take into account the File column.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2202 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-14 02:05:18 +00:00
David Huynh
823729776d Google spreadsheets can now be imported directly from within Refine.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2192 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-11 00:35:01 +00:00
David Huynh
c5078d1887 Fixed issue 428: Excel import sometimes drops last row of data.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2189 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-06 19:37:23 +00:00
Tom Morris
da7347e7b1 Make sure all conditionals and loops are in blocks (too bug-prone otherwise)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2183 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 22:21:47 +00:00
Tom Morris
97a0f2a33e Organize imports. com.google.refine last in a section of its own. Everything alphabetical in its section.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2180 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 21:10:22 +00:00
Tom Morris
5497fa4685 Remove unnecessary casts
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2173 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 20:33:57 +00:00
Tom Morris
123614539d Add missing @Override annotations
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2171 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 19:30:23 +00:00
David Huynh
78edff6f7f Merged new importer UI work from branch over.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2170 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 03:34:47 +00:00
David Huynh
53442c5ef2 Handle the case where an excel cell has a formula but the cached result of that formula is an error.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1962 7d457c2a-affb-35e4-300a-418c747d4874
2010-12-25 21:41:21 +00:00
Tom Morris
3a8f9306bd Add some toString() methods to help with debugging
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1941 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-29 06:24:50 +00:00
Tom Morris
af20157532 Fix indentation so indent levels match logical block levels. No code changes.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1940 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-28 17:46:57 +00:00
Tom Morris
748b5699b9 Issue 61 - Turn on text coalescing and XML entity reference replacement
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1939 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-27 22:07:15 +00:00
Tom Morris
e19148c375 Make sure we at least log an error if the import fails
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1938 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-27 22:05:45 +00:00
Iain Sproat
74e9288229 Additional error dialog for Issue 188
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1858 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-11 14:25:46 +00:00
Iain Sproat
2f564589f5 Adding a Fixed Width data importer (Issue 85) and associated tests.
Although this importer is 'wired up', it requires a property "fixed-column-widths" which is not (yet) implemented in the UI.  But the ImporterRegister.guessImporter method will probably select the CsvTsvImporter before the FixedWidthImporter anyway.  I suggest an improvement to the project creation UI and/or the guessImporter method will be required.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@1857 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-11 13:15:41 +00:00
David Huynh
5a17acfd70 Prepended license text to java source
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1613 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-20 20:45:52 +00:00
David Huynh
73042712ed Made csv/tsv importer not trim whitespace even if "guess cells' types" is checked (for cells that are strings).
Updated csv tests to expect un-trimmed cells.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@1557 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-15 05:30:15 +00:00
David Huynh
7cd5a47fbf We haven't been using non-split row parser, so we need to fix the trimming problem in the tsv/csv importer instead.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1467 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-12 23:24:16 +00:00
David Huynh
2d276fa1e6 Non split row parser shouldn't trim lines because whitespaces are significant
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1465 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-12 22:45:30 +00:00
Iain Sproat
142591a090 Added a mention of the new JsonImporter to CHANGES.txt
Corrected the logger name in JsonImporter.java

git-svn-id: http://google-refine.googlecode.com/svn/trunk@1455 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-08 07:58:59 +00:00
David Huynh
3ba8e63249 Register Json importer.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1426 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-04 18:53:41 +00:00
Iain Sproat
d977f42f51 Changed behaviour of the XmlImporter to make it more permissive, and allow arrays within mixed elements to be used as candidates for importing to Refine.
This change has also allowed the JsonImporter to pass all its unit tests without any further modification.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@1425 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-04 18:33:59 +00:00
Iain Sproat
ec9898ba92 Some tidying up of the XmlImporter which reduces the number of generic TreeParser tokens to a minimum - and should allow elements such as comments and CDATA to be ignored/skipped.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1422 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-04 15:02:09 +00:00
Iain Sproat
d3f223c196 The JsonImporter now passes all current unit tests.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1421 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-04 10:02:50 +00:00