Antonin Delpeuch
3dcda5a42c
Add reconciliation config in wikitext import.
2017-08-16 00:05:40 +01:00
Antonin Delpeuch
86dc240335
Support reconciliation via sitelinks.
...
Wikilinks are automatically reconciled at import time.
Related to #56.
2017-08-15 20:17:34 +01:00
Antonin Delpeuch
aa4517ba58
Add support for colspan and rowspan in Wikitext
2017-08-15 11:28:43 +01:00
Antonin Delpeuch
73f7fdc036
Update TextFormatGuesser to support wikitext
2017-08-14 15:58:27 +01:00
Antonin Delpeuch
e168c900e8
Add support for table headers
2017-08-13 20:14:48 +01:00
Antonin Delpeuch
b8a781d366
Add support for links (unreconciled for now)
2017-08-13 12:57:46 +01:00
Antonin Delpeuch
e6406f56ef
Initial version of the wikitext importer
2017-08-13 11:26:59 +01:00
Adi Eyal
09c00c6a19
Fixes #1181
2017-05-05 23:38:37 +02:00
Tom Morris
48681e8877
Move assert where it belongs
2015-09-25 20:01:27 -04:00
Tom Morris
be936a86eb
Clean up PR #1055
2015-09-25 19:01:16 -04:00
Thad Guidry
175f4a5319
Merge pull request #1047 from lemmingapex/master
...
Fixed #1046 Combine xls and xlsx formats by inspecting file header information in ExcelImporter
2015-09-21 20:33:05 -05:00
magdmartin
b635f4e067
Merge pull request #1055 from RefinePro/issue-512
...
fix issue #512 to save the file location as a table column
2015-09-20 09:31:16 -04:00
jackyq2015
4e6f584cde
fix issue #512 to save the file location as a table column
2015-08-27 15:13:20 -04:00
Scott Wiedemann
5eab8893cc
Fixed #1046 Combine xls and xlsx formats by inspecting file header information in ExcelImporter.
2015-07-30 16:19:26 -06:00
jackyq2015
819e1ba5c6
patch for issue #708: fix a few hanging UIs when importing a file
2015-07-18 10:27:35 -04:00
QI CUI
495dcd7bd5
use LinkedHashMap instead of HashMap to preserve the retrieval order
2015-01-30 15:03:20 -05:00
Tom Morris
bc801546cc
Remove references to obsolete splitIntoColumns option
2013-09-18 18:44:58 -04:00
Tom Morris
daed3bd90c
Move MARC->XML conversion to earlier in process - issue #794
...
- functional now, but probably not good enough to release yet
2013-09-17 19:19:50 -04:00
Tom Morris
6bd6a5934b
Start wiring up MARC importer - issue #794
2013-09-17 17:17:23 -04:00
Tom Morris
ab42df6ea3
Merge pull request #658 from Arcadelia/CSV_Multi-char-separator_support
...
Support for multi-char-separators in CSV
2013-08-14 07:29:45 -07:00
Tom Morris
579d71b7eb
Switch back to NUL character for quote now that OpenCSV handles it -
...
fixes #653
2013-08-07 17:07:17 -04:00
Tom Morris
d7531bbbd8
Handle quoted fields with embedded new lines. Sort separators by score
...
rather than just standard deviation
2013-08-02 17:59:09 -04:00
Tom Morris
3003c1a709
Make importers more robust to preview errors when someone selects the
...
wrong importer/parser
2013-07-27 13:35:12 -04:00
Tom Morris
57ca70132c
Turn all import conversions off by default - fixes #478
2013-07-27 13:32:26 -04:00
Tom Morris
7edc550618
Give a reasonable error message on Excel 95 import failure - fixes #564
2013-07-26 16:24:56 -04:00
Tom Morris
1e5f89e84c
Centralize handling of import job config object & synchronize to allow
...
multiple accessors
2013-07-25 15:41:08 -04:00
Tom Morris
567da6aa9f
Normalize line endings
...
Add .gitattributes & do one-time normalization of line endings
2013-03-23 18:46:20 -04:00
Tom Morris
6a91b5d75b
Use InputStream instead of Reader for JSON import - fixes #698
2013-03-23 18:36:05 -04:00
Tom Morris
6b3592982e
Remove O(n^2) issue in tree importers - fixes #699
...
- Add sparse/based list implementation for ImportRecord
2013-03-23 12:02:51 -04:00
Tom Morris
f78dfadcf3
Clean up tree import utilities for #699
...
- lazy allocate objects
- conditionalize logging to prevent calls to StringBuilder & toString()
These are secondary issues, but still worth cleaning up.
2013-03-23 11:56:58 -04:00
Tom Morris
0a2ba1b1ae
Switch from LinkedList to ArrayList
...
Just a simple list. No need for extra overhead.
2013-03-23 08:16:23 -04:00
Tom Morris
bfa7c34d17
Merge pull request #659 - closes #659
2013-03-18 21:24:01 -04:00
Tom Morris
8a61cf731b
Merge pull request #664 from Arcadelia/Preserve_Quotes
...
Quotes should not be removed from values
2013-03-18 18:12:51 -07:00
Tom Morris
a2a8f4af2e
Patch applied - closed #315
2013-03-06 21:45:54 -05:00
Tom Morris
d8d82bf8b7
Clean up a couple more format guessing issues left over from #685
2013-03-06 20:39:39 -05:00
Tom Morris
369bfffb2f
Don't guess field widths unless we have at least 3 lines
...
- Investigation of #685 showed that single line files were being guessed
as fixed field width
2013-03-04 17:47:06 -05:00
Tom Morris
c0347225b8
Switch escape character from NUL to DEL in hopes that it's rarer.
2013-02-01 17:12:07 -05:00
Frank Wennerdahl
1f7ab046c7
Quotes should not be removed from values
...
Leading quotation marks should not be removed from values. If they have
been left by the importing parser they should be considered part of the
value.
2013-01-24 09:04:17 +01:00
Frank Wennerdahl
ebdc40ad71
Added CSV quote options
...
Added two additional CSV options, one for parsing and one for export.
Specifying strict quotes when parsing will ignore all data not quoted.
Specifying quote all when exporting will enclose all values in quotes.
No front-end changes made, just added the support for the options in the
requests.
2013-01-21 08:21:16 +01:00
Frank Wennerdahl
f837643f1e
Support for multi-char-separators in CSV
...
This change requires that the following patch is applied to OpenCSV:
http://sourceforge.net/tracker/index.php?func=detail&aid=3599477&group_id=148905&atid=773543
2013-01-18 16:28:27 +01:00
Tom Morris
b3f5fada95
FIXED - task 578 & 596: Clean up JSON importer
...
http://code.google.com/p/google-refine/issues/detail?id=578
http://code.google.com/p/google-refine/issues/detail?id=596
Extend tree parser framework to allow any Serializable instead of just Strings. Use this in JSON importer to:
- Import keywords null, true, false
- Import empty strings and don't trim whitespace from strings on import
- Import numbers directly instead of importing them as text and then parsing them ourselves
Add tests to verify all this stuff.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2543 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-08 01:20:25 +00:00
Tom Morris
93d6e176d6
Task 478: Default "guess datatypes" to False so importers which don't specify it (e.g. gData & Excel) aren't affected
...
http://code.google.com/p/google-refine/issues/detail?id=478
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2541 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-07 21:17:34 +00:00
Tom Morris
4bf212c03d
FIXED - task 154: Can't import RDF/XML Data
...
http://code.google.com/p/google-refine/issues/detail?id=154
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2526 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-05 16:31:41 +00:00
Stefano Mazzocchi
6e41f4ad91
make the latest eclipse happy (it triggers a warning)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2513 7d457c2a-affb-35e4-300a-418c747d4874
2012-07-12 01:55:11 +00:00
Tom Morris
a0812c5751
Be slightly more tolerant of weird spreadsheet data
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2501 7d457c2a-affb-35e4-300a-418c747d4874
2012-06-02 21:00:30 +00:00
Tom Morris
28ff2295fd
Issue 490 - Handle separator guessing for CSVs with quoted fields containing commas
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2458 7d457c2a-affb-35e4-300a-418c747d4874
2012-03-08 15:53:55 +00:00
Tom Morris
190e817fb8
Protect against NullPointerException
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2444 7d457c2a-affb-35e4-300a-418c747d4874
2012-02-22 20:06:03 +00:00
Tom Morris
40183aa0ba
Issue 513 - get rid of exception at end of import in JSON parser
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2435 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-27 17:05:45 +00:00
Tom Morris
fdac0c30cf
Issue 524 - shorten __anonymous__ names for JSON importer
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2432 7d457c2a-affb-35e4-300a-418c747d4874
2012-01-26 22:38:25 +00:00
Tom Morris
b409ef5670
Issue 491 - fix off-by-one error in column counts
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2405 7d457c2a-affb-35e4-300a-418c747d4874
2011-12-09 23:50:40 +00:00