David Huynh
2bac6844e2
Fixed csv importer to handle escaped quotation marks ("").
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@257 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 19:10:55 +00:00
Stefano Mazzocchi
8ce21461cb
getting closer to the desired functionality... still way too slow though
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@256 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 17:28:50 +00:00
Stefano Mazzocchi
50e58fb863
ngram-blocking gives more expected results... but slow as hell, maybe bug in the vptree code?
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@255 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 09:05:20 +00:00
Stefano Mazzocchi
546f87a536
let's try with another knn method
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@254 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:09:35 +00:00
David Huynh
311d15f493
Re-organized column header popup menus and added a bunch of common facets and common cell edit transforms.
...
Added native syntax for regex in GEL and modified replace, split, partition, and rpartition functions to support regex. Removed function replaceRegex.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@249 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 06:57:08 +00:00
Stefano Mazzocchi
5b079b04b7
- moved from float to double to avoid excessive casting from secondstring
...
- added a few of the more powerful distances
- fixed a bug in the VPTree builder (although it is still not working as I expect it to)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@248 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 05:11:36 +00:00
David Huynh
4a4ae6bf27
Fixed toTitlecase to handle parentheses and other delimiters.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@240 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 19:40:51 +00:00
David Huynh
ac50b3c48b
Re-worked the cell editor popup.
...
Don't keep logging "Saved workspace."
git-svn-id: http://google-refine.googlecode.com/svn/trunk@235 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 06:30:47 +00:00
David Huynh
5d3a57eeeb
Implemented project import and export commands (from/to .tar files).
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@234 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 02:34:25 +00:00
David Huynh
a1ec0ea8df
When saving projects, save only modified ones.
...
Save projects and workspace periodically.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@232 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 00:37:06 +00:00
David Huynh
3388c3e09f
Still some old Serializable stuff to remove.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@228 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 23:02:57 +00:00
David Huynh
80e6111a92
Added options for omitting error and blank choices in list facets, and use them in the various recon facets.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@227 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 22:54:02 +00:00
David Huynh
694f09fb0a
Major refactoring: everything is now saved to disk using our own formats, mostly json-based, some inside zip files.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@226 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 22:37:26 +00:00
Stefano Mazzocchi
f7b0caa1b8
now kNN clustering is fully operational... not very practical though, needs more work and testing
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@225 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 08:27:13 +00:00
David Huynh
e06d8fe130
Better checking for null value in Cell.load.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@224 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 00:35:44 +00:00
David Huynh
e0d72c81e9
Renamed "facet-based edit" operation and command to "mass edit", because it's not just facet-based.
...
Added option "apply to other cells with same original content" to single cell edit popup, so it can be used like a find&replace operation.
Renamed "do-text-transform" operation and command to just "text-transform".
git-svn-id: http://google-refine.googlecode.com/svn/trunk@223 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 00:25:00 +00:00
David Huynh
253874b1a1
Got Clusterer to use Column.name rather than Column.headerLabel now.
...
Tried using Verdana instead of Tahoma as the common font.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@220 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 22:48:42 +00:00
Stefano Mazzocchi
976c1da5c7
much improved facet clustering dialog and functionality
...
NOTE: kNN clustering code operational but is not working as expected
git-svn-id: http://google-refine.googlecode.com/svn/trunk@219 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 10:17:58 +00:00
David Huynh
db824bffeb
Fixed bug in saving recon changes.
...
Fixed bug in discard recon judgment operation.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@218 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 08:03:29 +00:00
David Huynh
78b1eb7e73
Major refactoring:
...
- Made all Change classes save to and load from .zip files.
- Changed Column.headerLabel to Column.name.
- Save project's raw data to "raw-data" file for now. We'll make it save to a zip file next.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@217 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 07:43:45 +00:00
David Huynh
589b9cd936
Re-organized popup menus for row operations. Added filter row.starred.
...
Disabled rendering of key column and column groups for now.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@216 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 22:31:47 +00:00
David Huynh
5c845f06bf
Now we can delete a project even if it hasn't been saved to file yet.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@214 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 19:43:08 +00:00
David Huynh
b3ac945c33
Implemented single-cell editing.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@210 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 08:11:48 +00:00
David Huynh
40cdf5092b
Better display of Calendar objects in data table view and in expression preview dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@208 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 02:25:27 +00:00
David Huynh
87d20f3299
Fixed minor bug in numeric bin index where if a value was infinity, the bin count went negative.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@207 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 02:10:24 +00:00
Stefano Mazzocchi
37e37488ec
ability to delete a project from the front page
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@206 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 01:52:55 +00:00
David Huynh
1d6db8fa6e
Made recon process cause the client page to create facets when the recon process is done.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@203 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 01:13:59 +00:00
Stefano Mazzocchi
32c0bf08c9
adding now() and inc() functions to the gel
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@202 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 20:53:07 +00:00
David Huynh
9d8b746121
Switched Cell.value from Object to Serializable.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@201 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 19:59:31 +00:00
David Huynh
3e0ac50e17
Fixed date parsing bug in index.js introduced since last commit.
...
Removed debugging console.log() call in browsing-engine.js.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@200 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 19:38:23 +00:00
David Huynh
1dc3d4abbd
Save project metadata to disk as JSON now rather than through Java serialization API.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@199 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 19:15:46 +00:00
Stefano Mazzocchi
409b451085
started work on protocol buffers
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@197 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 08:46:33 +00:00
David Huynh
22f226358d
Added pre-canned facets isBlank(value) and isError(value).
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@196 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 07:32:30 +00:00
David Huynh
6811f54f31
Fixed quoting bug in tripleloader transposer.
...
Implemented tripleloader exporter.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@194 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 07:02:03 +00:00
Stefano Mazzocchi
8f01da0aa8
fixing the date parser
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@193 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 04:06:38 +00:00
Stefano Mazzocchi
1695e2f8f1
add the ngramFingerprint function
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@191 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 02:37:25 +00:00
Stefano Mazzocchi
5c3ca7723a
use a TreeSet to do both sorting and de-dupe of the split fragments
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@190 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 02:37:06 +00:00
David Huynh
70df6821a0
Made expression preview dialog for text transform operation also support repeat option.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@189 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 02:25:42 +00:00
Stefano Mazzocchi
cde6a02cbb
typo (needed to escape &apos; which is actually *not* an HTML entity, who knew)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@188 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 02:24:06 +00:00
David Huynh
87956be756
Minor bug: don't try to bind null cell value.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@187 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 02:14:40 +00:00
David Huynh
1f05954924
Fixed regex text search facet to handle errors better. Use .text() rather than .html() to render cell values, or & will not show up.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@184 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 01:47:58 +00:00
David Huynh
72d06fe65c
Added support for canceling running and pending processes.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@183 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 01:14:48 +00:00
David Huynh
eaef7b2394
Also let user decide what to do on expression evaluation error when creating a new column.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@182 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 00:32:54 +00:00
David Huynh
5a0a8bea4f
Added custom dialog box for create column operation (with a field for the new column name).
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@180 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 00:12:39 +00:00
David Huynh
2fe8f98e4e
Added repeat and repeatCount options for text transform operation. This lets us fix those & repeated encoding problems easily.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@179 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 00:00:46 +00:00
David Huynh
b4d2cef526
Added an option for what to do when a text transform errors out. Made a custom expression preview dialog for the text transform command in order to support that option.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@178 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 22:12:48 +00:00
David Huynh
c1498448e4
Implemented global and per-project expression histories and hooked them up to the expression preview dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@176 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 21:21:38 +00:00
David Huynh
b75f1faea8
Changed tabs to spaces. No functionality change.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@174 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 04:19:58 +00:00
Stefano Mazzocchi
2691ee50d7
adding OS-specific data paths
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@173 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 02:53:07 +00:00
David Huynh
ad7671508f
Added "cancel processes" command, not hooked up yet.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@171 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 00:30:39 +00:00