Commit Graph

82 Commits

Author SHA1 Message Date
Tom Morris
d8d82bf8b7 Clean up a couple more format guessing issues left over from #685 2013-03-06 20:39:39 -05:00
Tom Morris
369bfffb2f Don't guess field widths unless we have at least 3 lines
- Investigation of #685 showed that single line files were being guessed
as fixed field width
2013-03-04 17:47:06 -05:00
Tom Morris
c0347225b8 Switch escape character from NUL to DEL in hopes that it's rarer. 2013-02-01 17:12:07 -05:00
Tom Morris
b3f5fada95 FIXED - task 578 & 596: Clean up JSON importer
http://code.google.com/p/google-refine/issues/detail?id=578
http://code.google.com/p/google-refine/issues/detail?id=596

Extend tree parser framework to allow any Serializable instead of just Strings. Use this in JSON importer to: Import keywords null, true, false; Import empty strings and don't trim whitespace from strings on import;  Import numbers directly instead of importing them as text and then parsing them ourselves. Add tests to verify all this stuff

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2543 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-08 01:20:25 +00:00
Tom Morris
93d6e176d6 Task 478: Default "guess datatypes" to False so importers which don't specify it (e.g. gData & Excel) aren't effected
http://code.google.com/p/google-refine/issues/detail?id=478

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2541 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-07 21:17:34 +00:00
Tom Morris
4bf212c03d FIXED - task 154: Can't import RDF/XML Data
http://code.google.com/p/google-refine/issues/detail?id=154

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2526 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-05 16:31:41 +00:00
Stefano Mazzocchi
6e41f4ad91 make the latest eclipse happy (it triggers a warning)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2513 7d457c2a-affb-35e4-300a-418c747d4874
2012-07-12 01:55:11 +00:00
Tom Morris
a0812c5751 Be slightly more tolerant of weird spreadsheet data
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2501 7d457c2a-affb-35e4-300a-418c747d4874
2012-06-02 21:00:30 +00:00
Tom Morris
28ff2295fd Issue 490 - Handle separator guessing for CSVs with quoted fields containing commas
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2458 7d457c2a-affb-35e4-300a-418c747d4874
2012-03-08 15:53:55 +00:00
Tom Morris
190e817fb8 Protect against NullPointerException
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2444 7d457c2a-affb-35e4-300a-418c747d4874
2012-02-22 20:06:03 +00:00
Tom Morris
40183aa0ba Issue 513 - get rid of exception at end of import in JSON parser
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2435 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-27 17:05:45 +00:00
Tom Morris
fdac0c30cf Issue 524 - shorten __anonymous__ names for JSON importer
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2432 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-26 22:38:25 +00:00
Tom Morris
b409ef5670 Issue 491 - fix off-by-one error in column counts
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2405 7d457c2a-affb-35e4-300a-418c747d4874
2011-12-09 23:50:40 +00:00
David Huynh
5aec75696d Fixed Issue 477 in google-refine: Implement or remove the line separator option.
Also, fixed displaying bug in the fixed-width parser UI: previously, tab characters forced columns to be wider.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2364 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-06 20:13:05 +00:00
Tom Morris
85a37d23f9 Issue 474 - implement record limit for XML and JSON importers
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2359 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-05 16:38:19 +00:00
Tom Morris
a7c81880a8 Issue 475 - Support escaped custom separators
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2355 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 19:04:16 +00:00
Tom Morris
cacbedd352 Fix index out of bounds exception when separator is the empty string
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2354 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 17:31:51 +00:00
Stefano Mazzocchi
856ef6a65a commented out unused variables
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2352 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-01 21:47:24 +00:00
Tom Morris
71492c706c Just some TODOs
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2349 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:51:20 +00:00
Tom Morris
ad8705e299 Javadoc only
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2348 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:29:35 +00:00
Tom Morris
a870e782f5 Make sure out counts our current before attempting to use them for sorting
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2347 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:28:27 +00:00
Tom Morris
ab950689dd Add debugging info - mostly toString() methods for types missing them
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2343 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-21 16:46:55 +00:00
David Huynh
223074bb25 Xml importer should stop trying to skip over initial non-xml content after some number of characters.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2336 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-18 15:25:31 +00:00
Tom Morris
2d5125af1e Issue 462 - don't trim whitespace from string-valued cell contents on import
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2330 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-12 23:45:52 +00:00
Tom Morris
3bd84088da Rename OO/ODS importer with more generic name
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2325 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 21:01:45 +00:00
Tom Morris
ca17e1ef0a New importer for Open Document Format (ODF) spreadsheet files (.ods)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2323 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:27:40 +00:00
Tom Morris
5c856179cb Add TODO for suspicious code
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2320 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:14:57 +00:00
Tom Morris
16421303cb Add Javadoc
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2318 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:12:24 +00:00
David Huynh
1a14d82393 For XML files, ignore not just leading whitespace but anything except <.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2313 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-10 20:51:00 +00:00
Stefano Mazzocchi
1f67866258 fixing a bunch of inconsistencies and potential bugs as indicated by findbugs, pmd and eclipse
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2301 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 21:23:23 +00:00
Tom Morris
31073d7712 Refactor importer interfaces to narrow exceptions thrown and handled
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2296 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 19:06:53 +00:00
Tom Morris
50927b33dc Javadoc
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2295 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 18:56:23 +00:00
Tom Morris
4a230abb44 Narrow exception handling
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2294 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 18:55:46 +00:00
Tom Morris
29cbc5af20 Remove some obsolete TODOs. No functional changes.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2290 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 17:29:30 +00:00
David Huynh
18f32ed7e8 Fixed up Rdf Triples importer, added a parser UI for it, and got its tests to pass.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2283 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-06 21:28:20 +00:00
David Huynh
1c5dc32b88 Fixed tsv/csv tests.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2276 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-06 06:22:30 +00:00
Tom Morris
ac4a0ca747 Store blank cells as nulls if that's what the user request
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2272 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-05 23:41:52 +00:00
David Huynh
7935dfd60e Stricter detection of json and xml formats on import, by checking for initial nonspace character.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2266 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-30 01:47:42 +00:00
David Huynh
d047acf1d1 Fixed Issue 452: Importing using Clipboard function does not guess structure correctly for XML or JSON
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2263 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-29 14:02:12 +00:00
David Huynh
5762efebf6 Fixed Issue 397: New UI Importer Branch - individual JSON record nodes do not preview well.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2258 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-28 03:38:23 +00:00
David Huynh
db3bbb5c86 Fixed xml parsing error due to whitespaces in front of <?xml>.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2246 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-19 09:06:36 +00:00
David Huynh
66cf0b6596 Fixed Issue 449: Uncaught exception from Excel importer.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2245 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-19 08:49:35 +00:00
David Huynh
4113a10b5b Catch/log exceptions in the importers a bit more carefully.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2215 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-22 21:47:15 +00:00
David Huynh
f023b922e1 Implemented encoding selectors in a few importing parser UIs.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2214 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-22 17:55:06 +00:00
Tom Morris
9d7b8a5279 Don't die if we get passed no candidates
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2210 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-18 17:39:18 +00:00
David Huynh
afb7953eac Fixed problem for importing from an archive file containing fixed width column files: we used to create totally new columns for each contained file, yielding too many columns.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2203 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-14 02:53:19 +00:00
David Huynh
33d99186ea Made fixed width column guessing slightly better.
Made sure fixed width parser UI take into account the File column.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2202 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-14 02:05:18 +00:00
David Huynh
823729776d Google spreadsheets can now be imported directly from within Refine.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2192 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-11 00:35:01 +00:00
David Huynh
c5078d1887 Fixed issue 428: Excel import sometimes drops last row of data.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2189 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-06 19:37:23 +00:00
Tom Morris
da7347e7b1 Make sure all conditionals and loops are in blocks (too bug-prone otherwise)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2183 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 22:21:47 +00:00