Tom Morris
b3f5fada95
FIXED - task 578 & 596: Clean up JSON importer
...
http://code.google.com/p/google-refine/issues/detail?id=578
http://code.google.com/p/google-refine/issues/detail?id=596
Extend tree parser framework to allow any Serializable instead of just Strings. Use this in JSON importer to:
- Import keywords null, true, false
- Import empty strings and don't trim whitespace from strings on import
- Import numbers directly instead of importing them as text and then parsing them ourselves
Add tests to verify all this stuff
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2543 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-08 01:20:25 +00:00
Tom Morris
93d6e176d6
Task 478: Default "guess datatypes" to False so importers which don't specify it (e.g. gData & Excel) aren't affected
...
http://code.google.com/p/google-refine/issues/detail?id=478
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2541 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-07 21:17:34 +00:00
Tom Morris
4bf212c03d
FIXED - task 154: Can't import RDF/XML Data
...
http://code.google.com/p/google-refine/issues/detail?id=154
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2526 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-05 16:31:41 +00:00
Stefano Mazzocchi
6e41f4ad91
make the latest eclipse happy (it triggers a warning)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2513 7d457c2a-affb-35e4-300a-418c747d4874
2012-07-12 01:55:11 +00:00
Tom Morris
a0812c5751
Be slightly more tolerant of weird spreadsheet data
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2501 7d457c2a-affb-35e4-300a-418c747d4874
2012-06-02 21:00:30 +00:00
Tom Morris
28ff2295fd
Issue 490 - Handle separator guessing for CSVs with quoted fields containing commas
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2458 7d457c2a-affb-35e4-300a-418c747d4874
2012-03-08 15:53:55 +00:00
Tom Morris
190e817fb8
Protect against NullPointerException
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2444 7d457c2a-affb-35e4-300a-418c747d4874
2012-02-22 20:06:03 +00:00
Tom Morris
40183aa0ba
Issue 513 - get rid of exception at end of import in JSON parser
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2435 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-27 17:05:45 +00:00
Tom Morris
fdac0c30cf
Issue 524 - shorten __anonymous__ names for JSON importer
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2432 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-26 22:38:25 +00:00
Tom Morris
b409ef5670
Issue 491 - fix off-by-one error in column counts
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2405 7d457c2a-affb-35e4-300a-418c747d4874
2011-12-09 23:50:40 +00:00
David Huynh
5aec75696d
Fixed Issue 477 in google-refine: Implement or remove the line separator option.
...
Also, fixed displaying bug in the fixed-width parser UI: previously, tab characters forced columns to be wider.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2364 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-06 20:13:05 +00:00
Tom Morris
85a37d23f9
Issue 474 - implement record limit for XML and JSON importers
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2359 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-05 16:38:19 +00:00
Tom Morris
a7c81880a8
Issue 475 - Support escaped custom separators
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2355 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 19:04:16 +00:00
Tom Morris
cacbedd352
Fix index out of bounds exception when separator is the empty string
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2354 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-04 17:31:51 +00:00
Stefano Mazzocchi
856ef6a65a
commented out unused variables
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2352 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-01 21:47:24 +00:00
Tom Morris
71492c706c
Just some TODOs
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2349 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:51:20 +00:00
Tom Morris
ad8705e299
Javadoc only
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2348 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:29:35 +00:00
Tom Morris
a870e782f5
Make sure our counts are current before attempting to use them for sorting
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2347 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-28 17:28:27 +00:00
Tom Morris
ab950689dd
Add debugging info - mostly toString() methods for types missing them
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2343 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-21 16:46:55 +00:00
David Huynh
223074bb25
Xml importer should stop trying to skip over initial non-xml content after some number of characters.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2336 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-18 15:25:31 +00:00
Tom Morris
2d5125af1e
Issue 462 - don't trim whitespace from string-valued cell contents on import
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2330 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-12 23:45:52 +00:00
Tom Morris
3bd84088da
Rename OO/ODS importer with more generic name
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2325 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 21:01:45 +00:00
Tom Morris
ca17e1ef0a
New importer for Open Document Format (ODF) spreadsheet files (.ods)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2323 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:27:40 +00:00
Tom Morris
5c856179cb
Add TODO for suspicious code
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2320 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:14:57 +00:00
Tom Morris
16421303cb
Add Javadoc
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2318 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:12:24 +00:00
David Huynh
1a14d82393
For XML files, ignore not just leading whitespace but anything except <.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2313 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-10 20:51:00 +00:00
Stefano Mazzocchi
1f67866258
fixing a bunch of inconsistencies and potential bugs as indicated by findbugs, pmd and eclipse
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2301 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 21:23:23 +00:00
Tom Morris
31073d7712
Refactor importer interfaces to narrow exceptions thrown and handled
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2296 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 19:06:53 +00:00
Tom Morris
50927b33dc
Javadoc
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2295 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 18:56:23 +00:00
Tom Morris
4a230abb44
Narrow exception handling
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2294 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 18:55:46 +00:00
Tom Morris
29cbc5af20
Remove some obsolete TODOs. No functional changes.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2290 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 17:29:30 +00:00
David Huynh
18f32ed7e8
Fixed up Rdf Triples importer, added a parser UI for it, and got its tests to pass.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2283 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-06 21:28:20 +00:00
David Huynh
1c5dc32b88
Fixed tsv/csv tests.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2276 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-06 06:22:30 +00:00
Tom Morris
ac4a0ca747
Store blank cells as nulls if that's what the user requests
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2272 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-05 23:41:52 +00:00
David Huynh
7935dfd60e
Stricter detection of json and xml formats on import, by checking for initial nonspace character.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2266 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-30 01:47:42 +00:00
David Huynh
d047acf1d1
Fixed Issue 452: Importing using Clipboard function does not guess structure correctly for XML or JSON
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2263 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-29 14:02:12 +00:00
David Huynh
5762efebf6
Fixed Issue 397: New UI Importer Branch - individual JSON record nodes do not preview well.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2258 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-28 03:38:23 +00:00
David Huynh
db3bbb5c86
Fixed xml parsing error due to whitespaces in front of <?xml>.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2246 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-19 09:06:36 +00:00
David Huynh
66cf0b6596
Fixed Issue 449: Uncaught exception from Excel importer.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2245 7d457c2a-affb-35e4-300a-418c747d4874
2011-09-19 08:49:35 +00:00
David Huynh
4113a10b5b
Catch/log exceptions in the importers a bit more carefully.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2215 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-22 21:47:15 +00:00
David Huynh
f023b922e1
Implemented encoding selectors in a few importing parser UIs.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2214 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-22 17:55:06 +00:00
Tom Morris
9d7b8a5279
Don't die if we get passed no candidates
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2210 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-18 17:39:18 +00:00
David Huynh
afb7953eac
Fixed problem for importing from an archive file containing fixed width column files: we used to create totally new columns for each contained file, yielding too many columns.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2203 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-14 02:53:19 +00:00
David Huynh
33d99186ea
Made fixed width column guessing slightly better.
...
Made sure fixed width parser UI takes into account the File column.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2202 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-14 02:05:18 +00:00
David Huynh
823729776d
Google spreadsheets can now be imported directly from within Refine.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2192 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-11 00:35:01 +00:00
David Huynh
c5078d1887
Fixed issue 428: Excel import sometimes drops last row of data.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2189 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-06 19:37:23 +00:00
Tom Morris
da7347e7b1
Make sure all conditionals and loops are in blocks (too bug-prone otherwise)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2183 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 22:21:47 +00:00
Tom Morris
97a0f2a33e
Organize imports. com.google.refine last in a section of its own. Everything alphabetical in its section.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2180 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 21:10:22 +00:00
Tom Morris
5497fa4685
Remove unnecessary casts
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2173 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 20:33:57 +00:00
Tom Morris
123614539d
Add missing @Override annotations
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2171 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 19:30:23 +00:00
David Huynh
78edff6f7f
Merged new importer UI work from branch over.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2170 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-02 03:34:47 +00:00
David Huynh
53442c5ef2
Handle the case where an excel cell has a formula but the cached result of that formula is an error.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1962 7d457c2a-affb-35e4-300a-418c747d4874
2010-12-25 21:41:21 +00:00
Tom Morris
3a8f9306bd
Add some toString() methods to help with debugging
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1941 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-29 06:24:50 +00:00
Tom Morris
af20157532
Fix indentation so indent levels match logical block levels. No code changes.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1940 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-28 17:46:57 +00:00
Tom Morris
748b5699b9
Issue 61 - Turn on text coalescing and XML entity reference replacement
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1939 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-27 22:07:15 +00:00
Tom Morris
e19148c375
Make sure we at least log an error if the import fails
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1938 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-27 22:05:45 +00:00
Iain Sproat
74e9288229
Additional error dialog for Issue 188
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1858 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-11 14:25:46 +00:00
Iain Sproat
2f564589f5
Adding a Fixed Width data importer (Issue 85) and associated tests.
...
Although this importer is 'wired up', it requires a property "fixed-column-widths" which is not (yet) implemented in the UI. But the ImporterRegister.guessImporter method will probably select the CsvTsvImporter before the FixedWidthImporter anyway. I suggest that an improvement to the project creation UI and/or the guessImporter method will be required.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1857 7d457c2a-affb-35e4-300a-418c747d4874
2010-11-11 13:15:41 +00:00
David Huynh
5a17acfd70
Prepended license text to java source
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1613 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-20 20:45:52 +00:00
David Huynh
73042712ed
Made csv/tsv importer not trim whitespace even if "guess cells' types" is checked (for cells that are strings).
...
Updated csv tests to expect un-trimmed cells.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1557 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-15 05:30:15 +00:00
David Huynh
7cd5a47fbf
We haven't been using non-split row parser, so we need to fix the trimming problem in the tsv/csv importer instead.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1467 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-12 23:24:16 +00:00
David Huynh
2d276fa1e6
Non split row parser shouldn't trim lines because whitespaces are significant
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1465 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-12 22:45:30 +00:00
Iain Sproat
142591a090
Added a mention of the new JsonImporter to CHANGES.txt
...
Corrected the logger name in JsonImporter.java
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1455 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-08 07:58:59 +00:00
David Huynh
3ba8e63249
Register Json importer.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1426 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-04 18:53:41 +00:00
Iain Sproat
d977f42f51
Changed behaviour of the XmlImporter to make it more permissive, and allow arrays within mixed elements to be used as candidates for importing to Refine.
...
This change has also allowed the JsonImporter to pass all its unit tests without any further modification.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1425 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-04 18:33:59 +00:00
Iain Sproat
ec9898ba92
Some tidying up of the XmlImporter which reduces the number of generic TreeParser tokens to a minimum - and should allow elements such as comments and CDATA to be ignored/skipped.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1422 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-04 15:02:09 +00:00
Iain Sproat
d3f223c196
The JsonImporter now passes all current unit tests.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1421 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-04 10:02:50 +00:00
David Huynh
935355cb50
Comments in XML file caused the record detection code to fail. So we added ignorable element type that we can skip over.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1392 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-28 19:16:43 +00:00
Iain Sproat
bd3ded0828
Correcting JsonImporter to use the correct parser.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1388 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-28 14:19:19 +00:00
Iain Sproat
855df20481
XmlImportUtilities no longer relies on XMLStreamConstants, and is now independent of any specific type of tree data (Xml or otherwise).
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1378 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-28 10:46:33 +00:00
Iain Sproat
b21961be89
Another small step towards making XmlImportUtilities generic for all tree structured data, and less XML centric. Some calls to XMLStreamConstant in XmlImportUtilities are now working with a generic TreeParserToken, with methods to convert between TreeParserToken and XMLStreamConstant/JsonToken in the respective parsers.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1377 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-28 10:04:56 +00:00
David Huynh
f2ce1b7161
Fixed Issue 121: Importing attached file strips backslashes
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1369 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-28 03:35:42 +00:00
Iain Sproat
c3c23a87b0
The renaming of TreeImporter to TreeImportUtilities didn't seem to get committed last time. Trying again.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1362 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-27 22:57:26 +00:00
Iain Sproat
d285999da8
New JsonImporter, JsonParser and JsonImporterTests (copy of XmlImporterTests with syntax of the example data altered for Json).
...
Renaming of TreeImporter to TreeImportUtilities (as per the current convention with the XmlImporter and XmlImportUtilities).
NB the new JsonParser class does not work, and 5 of the new unit tests for JsonImporter currently fail. To be fixed in due course.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1361 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-27 22:53:17 +00:00
Iain Sproat
e5ddfa6fdc
All methods in XmlImportUtilities now use the TreeParser interface.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1323 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-27 17:59:53 +00:00
Iain Sproat
d71c563831
XmlImportUtilities.detectPathFromTag and XmlImportUtilities.detectRecordElement methods now use a generic TreeParser interface. A lightweight wrapper XmlParser wraps XMLStreamReader to provide parsing for xml data.
...
This is another small step towards a generic importer for tree structured data. My plan is to refactor more of XmlImportUtilities' methods to use the TreeParser interface so that XmlStreamReader is no longer called directly from XmlImportUtilities.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1322 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-27 17:40:51 +00:00
Iain Sproat
1bda46d40f
Methods which are generic to any tree structured data and don't rely on an XmlParser have been moved to a new TreeImporter class. This is a small step towards supporting importers for other tree structured data.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1321 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-27 16:09:44 +00:00
David Huynh
1367ce301e
More renaming, except for: client-side code, build scripts, anything to do with data loading and QA, workspace path. Refine can still run, and undo/redo on existing projects is working.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1290 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-22 18:36:33 +00:00
David Huynh
edb23eb263
Changed Java packages com.google.gridworks.* to com.google.refine.* and modified other code just enough to start grefine up without error. Much remains to be done. Do not check out the code just yet.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1288 7d457c2a-affb-35e4-300a-418c747d4874
2010-09-22 17:04:10 +00:00