Tom Morris
567da6aa9f
Normalize line endings
...
Add .gitattributes & do one-time normalization of line endings
2013-03-23 18:46:20 -04:00
Tom Morris
6a91b5d75b
Use InputStream instead of Reader for JSON import - fixes #698
2013-03-23 18:36:05 -04:00
Tom Morris
6b3592982e
Remove O(n^2) issue in tree importers - fixes #699
...
- Add sparse/based list implementation for ImportRecord
2013-03-23 12:02:51 -04:00
Tom Morris
f78dfadcf3
Clean up tree import utilities for #699
...
- lazy allocate objects
- conditionalize logging to prevent calls to StringBuilder & toString()
These are secondary issues, but still worth cleaning up.
2013-03-23 11:56:58 -04:00
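The "conditionalize logging" item above can be illustrated with a minimal sketch. It assumes java.util.logging purely for self-containment (the logging library the project actually uses is not shown in this log), and the class, the `Node` type, and the counter are hypothetical names introduced for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger logger = Logger.getLogger("GuardedLogging");

    // Counts toString() calls so the cost of building a log message is visible.
    static int toStringCalls = 0;

    // Hypothetical stand-in for a tree-import object with a costly toString().
    static class Node {
        @Override
        public String toString() {
            toStringCalls++;
            return "Node";
        }
    }

    static void process(Node node) {
        // The guard skips string concatenation (and hence toString())
        // whenever the FINEST level is disabled.
        if (logger.isLoggable(Level.FINEST)) {
            logger.finest("processing " + node);
        }
    }

    public static void main(String[] args) {
        logger.setLevel(Level.INFO); // FINEST is disabled at this level
        process(new Node());
        System.out.println(toStringCalls);
    }
}
```

Without the guard, the message string would be built on every call even though it is then discarded.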
Tom Morris
0a2ba1b1ae
Switch from LinkedList to ArrayList
...
Just a simple list. No need for extra overhead.
2013-03-23 08:16:23 -04:00
Tom Morris
bfa7c34d17
Merge pull request #659 - closes #659
2013-03-18 21:24:01 -04:00
Tom Morris
8a61cf731b
Merge pull request #664 from Arcadelia/Preserve_Quotes
...
Quotes should not be removed from values
2013-03-18 18:12:51 -07:00
Tom Morris
a2a8f4af2e
Patch applied - closed #315
2013-03-06 21:45:54 -05:00
Tom Morris
d8d82bf8b7
Clean up a couple more format guessing issues left over from #685
2013-03-06 20:39:39 -05:00
Tom Morris
369bfffb2f
Don't guess field widths unless we have at least 3 lines
...
- Investigation of #685 showed that single line files were being guessed
as fixed field width
2013-03-04 17:47:06 -05:00
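The minimum-line rule in the commit above might look roughly like the sketch below. Only the threshold of 3 lines comes from the commit message; the equal-length check is a placeholder heuristic, not the project's actual guesser, and all names are hypothetical.

```java
import java.util.List;

public class FixedWidthGuess {
    // Threshold taken from the commit message: at least 3 lines are required.
    static final int MIN_LINES = 3;

    /**
     * Placeholder heuristic: treat the sample as fixed-width only if there
     * are enough lines and every line has the same length.
     */
    static boolean looksFixedWidth(List<String> lines) {
        if (lines.size() < MIN_LINES) {
            return false; // too little evidence, so do not guess at all
        }
        int length = lines.get(0).length();
        for (String line : lines) {
            if (line.length() != length) {
                return false;
            }
        }
        return true;
    }
}
```

A single-line file trivially has "all lines the same length", which is why a guard of this kind is needed before any width heuristic runs.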
Tom Morris
c0347225b8
Switch escape character from NUL to DEL in hopes that it's rarer.
2013-02-01 17:12:07 -05:00
Frank Wennerdahl
1f7ab046c7
Quotes should not be removed from values
...
Leading quotation marks should not be removed from values. If they have
been left by the importing parser they should be considered part of the
value.
2013-01-24 09:04:17 +01:00
Frank Wennerdahl
ebdc40ad71
Added CSV quote options
...
Added two additional CSV options, one for parsing and one for export.
Specifying strict quotes when parsing will ignore all data not quoted.
Specifying quote all when exporting will enclose all values in quotes.
No front-end changes made, just added the support for the options in the
requests.
2013-01-21 08:21:16 +01:00
Tom Morris
b3f5fada95
FIXED - task 578 & 596: Clean up JSON importer
...
http://code.google.com/p/google-refine/issues/detail?id=578
http://code.google.com/p/google-refine/issues/detail?id=596
Extend tree parser framework to allow any Serializable instead of just Strings. Use this in JSON importer to:
- Import keywords null, true, false
- Import empty strings and don't trim whitespace from strings on import
- Import numbers directly instead of importing them as text and then parsing them ourselves
Add tests to verify all this stuff
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2543 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-08 01:20:25 +00:00
Tom Morris
93d6e176d6
Task 478: Default "guess datatypes" to False so importers which don't specify it (e.g. gData & Excel) aren't affected
...
http://code.google.com/p/google-refine/issues/detail?id=478
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2541 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-07 21:17:34 +00:00
Tom Morris
4bf212c03d
FIXED - task 154: Can't import RDF/XML Data
...
http://code.google.com/p/google-refine/issues/detail?id=154
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2526 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-05 16:31:41 +00:00
Stefano Mazzocchi
6e41f4ad91
make the latest eclipse happy (it triggers a warning)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2513 7d457c2a-affb-35e4-300a-418c747d4874
2012-07-12 01:55:11 +00:00
Tom Morris
a0812c5751
Be slightly more tolerant of weird spreadsheet data
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2501 7d457c2a-affb-35e4-300a-418c747d4874
2012-06-02 21:00:30 +00:00
Tom Morris
28ff2295fd
Issue 490 - Handle separator guessing for CSVs with quoted fields containing commas
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2458 7d457c2a-affb-35e4-300a-418c747d4874
2012-03-08 15:53:55 +00:00
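One way to keep quoted commas from skewing separator guessing, as Issue 490 describes, is to count only separators that fall outside double quotes. The sketch below is an assumption about the general approach, not the project's implementation; the class and method names are hypothetical.

```java
public class SeparatorCount {
    /**
     * Counts occurrences of sep that are not inside a double-quoted field.
     * A doubled quote ("") toggles the state twice, so it leaves it unchanged.
     */
    static int countOutsideQuotes(String line, char sep) {
        boolean inQuotes = false;
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == sep && !inQuotes) {
                count++;
            }
        }
        return count;
    }
}
```

For the line `a,"b,c",d` this counts two commas rather than three, so the per-line counts stay consistent across rows with and without quoted commas.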
Tom Morris
190e817fb8
Protect against NullPointerException
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2444 7d457c2a-affb-35e4-300a-418c747d4874
2012-02-22 20:06:03 +00:00
Tom Morris
40183aa0ba
Issue 513 - get rid of exception at end of import in JSON parser
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2435 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-27 17:05:45 +00:00
Tom Morris
fdac0c30cf
Issue 524 - shorten __anonymous__ names for JSON importer
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2432 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-26 22:38:25 +00:00
Tom Morris
b409ef5670
Issue 491 - fix off-by-one error in column counts
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2405 7d457c2a-affb-35e4-300a-418c747d4874
2011-12-09 23:50:40 +00:00
David Huynh
5aec75696d
Fixed Issue 477 in google-refine: Implement or remove the line separator option.
...
Also, fixed displaying bug in the fixed-width parser UI: previously, tab characters forced columns to be wider.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2364 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-06 20:13:05 +00:00
Tom Morris
85a37d23f9
Issue 474 - implement record limit for XML and JSON importers
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2359 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-05 16:38:19 +00:00
Tom Morris
a7c81880a8
Issue 475 - Support escaped custom separators
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2355 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 19:04:16 +00:00
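Supporting escaped custom separators means turning user input such as `\t` into the actual tab character before splitting. The hand-rolled sketch below shows the idea under the assumption that only `\t`, `\n`, and `\\` need handling; the project may well rely on a library routine instead, and the names here are hypothetical.

```java
public class SeparatorUnescape {
    /** Converts the escape sequences \t, \n and \\ into the characters they denote. */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                switch (next) {
                    case 't':  out.append('\t'); break;
                    case 'n':  out.append('\n'); break;
                    case '\\': out.append('\\'); break;
                    // Unknown escape: keep both characters unchanged.
                    default:   out.append('\\').append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```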
Tom Morris
cacbedd352
Fix index out of bounds exception when separator is the empty string
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2354 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 17:31:51 +00:00
Stefano Mazzocchi
856ef6a65a
commented out unused variables
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2352 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-01 21:47:24 +00:00
Tom Morris
71492c706c
Just some TODOs
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2349 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:51:20 +00:00
Tom Morris
ad8705e299
Javadoc only
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2348 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:29:35 +00:00
Tom Morris
a870e782f5
Make sure our counts are current before attempting to use them for sorting
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2347 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:28:27 +00:00
Tom Morris
ab950689dd
Add debugging info - mostly toString() methods for types missing them
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2343 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-21 16:46:55 +00:00
David Huynh
223074bb25
XML importer should stop trying to skip over initial non-XML content after some number of characters.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2336 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-18 15:25:31 +00:00
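The bounded skip described in the commit above could be sketched as follows. The commit does not state the actual limit, so `limit` is a caller-supplied parameter here, and the class and method names are hypothetical.

```java
public class XmlStartFinder {
    /**
     * Returns the index of the first '<' within the first limit characters
     * of text, or -1 if none is found in that window.
     */
    static int findXmlStart(String text, int limit) {
        int end = Math.min(text.length(), limit);
        for (int i = 0; i < end; i++) {
            if (text.charAt(i) == '<') {
                return i;
            }
        }
        return -1; // give up rather than scanning the whole input
    }
}
```

Capping the scan keeps a large non-XML input from being read end to end just to discover that it contains no markup.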
Tom Morris
2d5125af1e
Issue 462 - don't trim whitespace from string-valued cell contents on import
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2330 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-12 23:45:52 +00:00
Tom Morris
3bd84088da
Rename OO/ODS importer with more generic name
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2325 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 21:01:45 +00:00
Tom Morris
ca17e1ef0a
New importer for Open Document Format (ODF) spreadsheet files (.ods)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2323 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:27:40 +00:00
Tom Morris
5c856179cb
Add TODO for suspicious code
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2320 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:14:57 +00:00
Tom Morris
16421303cb
Add Javadoc
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2318 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:12:24 +00:00
David Huynh
1a14d82393
For XML files, ignore not just leading whitespace but anything except <.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2313 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-10 20:51:00 +00:00
Stefano Mazzocchi
1f67866258
fixing a bunch of inconsistencies and potential bugs as indicated by findbugs, pmd and eclipse
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2301 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 21:23:23 +00:00
Tom Morris
31073d7712
Refactor importer interfaces to narrow exceptions thrown and handled
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2296 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 19:06:53 +00:00
Tom Morris
50927b33dc
Javadoc
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2295 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 18:56:23 +00:00
Tom Morris
4a230abb44
Narrow exception handling
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2294 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 18:55:46 +00:00
Tom Morris
29cbc5af20
Remove some obsolete TODOs. No functional changes.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2290 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 17:29:30 +00:00
David Huynh
18f32ed7e8
Fixed up Rdf Triples importer, added a parser UI for it, and got its tests to pass.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2283 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-06 21:28:20 +00:00
David Huynh
1c5dc32b88
Fixed tsv/csv tests.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2276 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-06 06:22:30 +00:00
Tom Morris
ac4a0ca747
Store blank cells as nulls if that's what the user requests
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2272 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-05 23:41:52 +00:00
David Huynh
7935dfd60e
Stricter detection of json and xml formats on import, by checking for initial nonspace character.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2266 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-30 01:47:42 +00:00
David Huynh
d047acf1d1
Fixed Issue 452: Importing using Clipboard function does not guess structure correctly for XML or JSON
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2263 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-29 14:02:12 +00:00
David Huynh
5762efebf6
Fixed Issue 397: New UI Importer Branch - individual JSON record nodes do not preview well.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2258 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-28 03:38:23 +00:00