33 Commits

Author SHA1 Message Date
David Huynh
35da36b0e8 Fixed misspelling in clustering dialog.
Added option for not splitting lines into columns on import.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@508 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-20 07:26:07 +00:00
David Huynh
d85a0e1851 Retrieve dates correctly from Excel files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@507 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-20 04:43:39 +00:00
Stefano Mazzocchi
85d7ed6b89 cleanup
git-svn-id: http://google-refine.googlecode.com/svn/trunk@491 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-17 03:02:34 +00:00
David Huynh
9e73a4e68c Started to work on a MARC importer. It doesn't work properly yet.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@486 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-16 19:52:01 +00:00
David Huynh
a1a8758c37 Added options for specifying # lines the header columns take, and the # lines to skip processing entirely initially.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@468 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-13 21:23:41 +00:00
David Huynh
a2db5590ac Trim column names on import.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@461 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-13 06:28:13 +00:00
David Huynh
f7e830e709 Fixed bug in which editing a single cell and then starring the same row seemed to revert the cell back to its original content.
Added an option for not guessing cell value type during import.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@446 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-11 21:54:56 +00:00
David Huynh
5928a689e2 Use RowParser for parsing the header row, too.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@444 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-11 03:42:44 +00:00
David Huynh
70449cf7c8 Better error catching in toNumber function.
Watch out for the string "Infinity" while importing data sets: don't parse it into a double.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@438 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-09 21:59:50 +00:00
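
The "Infinity" guard described in this commit can be illustrated with a small sketch. This is not the project's code (the project is written in Java); it is a Python analogue, and the function name `guess_cell_value` is hypothetical. Like Java's `Double.parseDouble`, Python's `float` accepts the literal string "Infinity", so a naive type guesser would turn that text into a number. The commit only mentions "Infinity"; rejecting every non-finite result (including NaN) is an assumption made here.

```python
import math


def guess_cell_value(text):
    """Return a float when text is a finite number, otherwise the original string.

    Hypothetical helper: the name and the exact policy are assumptions.
    """
    try:
        number = float(text.strip())
    except ValueError:
        # Not numeric at all: keep the cell as text.
        return text
    if not math.isfinite(number):
        # "Infinity", "-inf", "nan" and overflowing literals parse without
        # error, but are kept as text rather than stored as a double.
        return text
    return number
```

With this guard, a cell containing the word "Infinity" stays a string, while "3.5" still becomes a number.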
Stefano Mazzocchi
d3d40d608a bunch of PMD-induced fixes
(now the PMD report is clean)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@430 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-09 00:14:11 +00:00
David Huynh
5320cc6587 Make duplicated column names unique during import by appending indices to them.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@392 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 17:55:36 +00:00
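
One way to make duplicated column names unique by appending indices is sketched below. The commit does not say what the suffix looks like or where numbering starts, so the format used here (a bare number starting at 2) and the name `make_unique` are assumptions, and the sketch is in Python rather than the project's Java.

```python
def make_unique(names):
    """Return column names in order, with an index appended to later duplicates.

    Hypothetical helper: the suffix format is an assumption.
    """
    seen = set()
    result = []
    for name in names:
        candidate = name
        index = 2
        # Keep bumping the index until the candidate collides with nothing,
        # including names that already end in a number.
        while candidate in seen:
            candidate = f"{name}{index}"
            index += 1
        seen.add(candidate)
        result.append(candidate)
    return result
```

Checking against the `seen` set rather than counting occurrences avoids producing a second "id2" when the input already contains a column with that name.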
Stefano Mazzocchi
798b2a36ca - archive and compressed file importer (supports zip, tar, gz, bz2, tar.gz and tar.bz2)
(works by loading the files that have the most common extensions in the archive)
- changed default max heap to 3Gb


git-svn-id: http://google-refine.googlecode.com/svn/trunk@381 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-04 07:48:47 +00:00
Stefano Mazzocchi
dced641599 - added the ability to specify the character separator for CSV or TSV files that don't use commas or tabs (this was needed to parse a dataset that we got from the BBC to try things out)
- used commons-lang split function instead of the java String.split one, this is necessary to avoid having to escape separators that might be confused for regexps


git-svn-id: http://google-refine.googlecode.com/svn/trunk@368 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 22:34:21 +00:00
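
The separator problem this commit mentions comes from Java's `String.split` treating its argument as a regular expression, so a separator such as `|` or `.` is not matched literally unless it is escaped. The sketch below reproduces the same pitfall in Python, where `re.split` plays the role of the regex-based split and `str.split` plays the role of a literal split. The function names are hypothetical and this is not the project's code.

```python
import re


def split_as_regex(line, separator):
    """Split treating the separator as a regular expression (the pitfall)."""
    return re.split(separator, line)


def split_literally(line, separator):
    """Split on the separator's exact characters, with no regex interpretation."""
    return line.split(separator)


line = "a|b|c"
literal_fields = split_literally(line, "|")           # ['a', 'b', 'c']
regex_fields = split_as_regex(line, "|")              # not the three fields
escaped_fields = split_as_regex(line, re.escape("|"))  # ['a', 'b', 'c']
```

A literal split removes the need to escape user-supplied separators at all, which is the motivation the commit message gives.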
Stefano Mazzocchi
7c132cfa53 clean eclipse warnings
git-svn-id: http://google-refine.googlecode.com/svn/trunk@357 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-29 20:51:18 +00:00
David Huynh
1d0e6abaf8 Got some work done on the plane:
- better detection of record XML elements in XML importer
- XML importer creates column groups and data table view renders them


git-svn-id: http://google-refine.googlecode.com/svn/trunk@356 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-27 05:23:09 +00:00
David Huynh
2a9fbd7d81 Made sure columns are named hierarchically in XML importer.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@355 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 23:15:10 +00:00
David Huynh
df7389876f First shot at XML import.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@354 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 23:08:08 +00:00
David Huynh
8dd0dea472 Try to roundtrip reconciled IDs as much as possible when importing/exporting Excel files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@322 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-19 00:32:52 +00:00
David Huynh
07945f9cde A more helpful error message when the excel importer fails.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@313 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-17 21:43:58 +00:00
David Huynh
7e2667ab45 Minor bug in Excel importer: we forgot to update the max cell index.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@281 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 00:23:01 +00:00
David Huynh
2bac6844e2 Fixed csv importer to handle escaped quotation marks ("").
git-svn-id: http://google-refine.googlecode.com/svn/trunk@257 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 19:10:55 +00:00
David Huynh
9d8b746121 Switched Cell.value from Object to Serializable.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@201 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 19:59:31 +00:00
David Huynh
b75f1faea8 Changed tabs to spaces. No functionality change.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@174 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 04:19:58 +00:00
David Huynh
c914aa6c16 Introduced EvalError objects as possible values returned by expressions.
Extracted function and control name mappings to ControlFunctionRegistry.


git-svn-id: http://google-refine.googlecode.com/svn/trunk@148 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-27 05:48:33 +00:00
David Huynh
bb83dcda1c Added support for specifying number of initial rows to skip when creating a new project.
Fixed the height of the histogram images in range facets to eliminate jitters.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@135 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-24 18:52:54 +00:00
David Huynh
254853b51d Added reverse and sort functions.
Support a limit on how many rows to load into a new project.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@134 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-23 23:22:02 +00:00
David Huynh
f8a1daba62 Handle formula cells in Excel files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@77 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-09 01:13:11 +00:00
David Huynh
cd376c7532 Added support for Excel 2007 XML file format.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@73 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-08 23:44:33 +00:00
Stefano Mazzocchi
1f5b27653e POI deprecated the use of short, good thing
git-svn-id: http://google-refine.googlecode.com/svn/trunk@69 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-08 19:54:09 +00:00
David Huynh
d3f97fea93 While importing data, use null for cells with empty text.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@63 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-07 07:16:39 +00:00
Stefano Mazzocchi
a61f35079a make eclipse happier by removing @Override annotations when really it's an interface method implementation
(no functional changes)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@62 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-07 06:47:52 +00:00
David Huynh
a025b272bd String.isEmpty() is no longer there (?!).
git-svn-id: http://google-refine.googlecode.com/svn/trunk@61 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-07 06:16:46 +00:00
David Huynh
16dda46a61 Refactored importers, adding support for Excel files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@47 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-05 19:19:38 +00:00