Commit Graph

166 Commits

Author SHA1 Message Date
Stefano Mazzocchi
d72c07b715 latest clustering fixes (the vptree is still too slow though, I'll probably abandon that approach for now)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@285 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 07:37:37 +00:00
David Huynh
58450555e9 Allow GEL identifiers to contain underscores.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@284 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 06:38:12 +00:00
David Huynh
025eccce4b Implemented "record" field for each row.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@283 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 06:33:03 +00:00
David Huynh
af3cb76056 Added support for including dependent rows in row visiting. Facets still don't count them, though.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@282 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 01:06:23 +00:00
David Huynh
7e2667ab45 Minor bug in Excel importer: we forgot to update the max cell index.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@281 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 00:23:01 +00:00
David Huynh
e760750b57 Fixed minor bug that prevented column details from getting passed on to recon service.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@280 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 21:55:32 +00:00
David Huynh
c3ebb5a9f4 Got Vishal's jython integration to work.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@277 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 19:56:43 +00:00
Vishal Talwar
4bb9e06772 adding extremely crude jython and clojure expression evaluation
git-svn-id: http://google-refine.googlecode.com/svn/trunk@276 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 23:43:21 +00:00
David Huynh
86a8e13d88 Added "numeric" choice to numeric range facets.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@272 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:48:04 +00:00
David Huynh
b1fca11342 Made recon use cells from context rows.
Fixed bug in menu left-right positioning.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@271 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:32:20 +00:00
Stefano Mazzocchi
a7d4951725 several improvements for clustering
- added a unicode ASCII-fiying addition to the fingerprinting functions
 - removed all distance functions for kNN that didn't seem to do anything useful
 - added the ability to indicate what value to use as cluster centroid by simply clicking on it
 (this is useful for those names that have non-ASCII chars that might not even be on your keyboard.. and cut/paste is error prone/cumbersome)
 - added a 10x multiplier to the PPM compression distance which makes it more aligned with the levenshtein ones
 - made sure that we construct a phonetic fingerprint for the whole string and not just the beginning subset
(performance is still not ideal but it's now reasonable)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@268 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:45:14 +00:00
David Huynh
6bf5418f9d Cell changes should also flush column precomputes.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@267 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:42:57 +00:00
David Huynh
e008332399 - make recon changes flush column precomputes
- fixed bug where recon features are not saved to file properly
- support selecting non-numeric, blank, and error choices in numeric range facets

git-svn-id: http://google-refine.googlecode.com/svn/trunk@265 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 06:02:36 +00:00
Stefano Mazzocchi
72b012971f more polish on the clustering dialog
git-svn-id: http://google-refine.googlecode.com/svn/trunk@264 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 01:18:41 +00:00
Stefano Mazzocchi
c03b223a78 fixed another NPE bug
git-svn-id: http://google-refine.googlecode.com/svn/trunk@262 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 22:43:49 +00:00
David Huynh
51b38a4eed Fixed minor bug in binning clusterer.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@261 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 21:28:40 +00:00
Stefano Mazzocchi
0c4b79c53a more polish on the clustering dialog
git-svn-id: http://google-refine.googlecode.com/svn/trunk@260 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:58:51 +00:00
David Huynh
6b3a20dc46 Check for null confidence string.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@259 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:57:46 +00:00
David Huynh
2bac6844e2 Fixed csv importer to handle escaped quotation marks ("").
git-svn-id: http://google-refine.googlecode.com/svn/trunk@257 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 19:10:55 +00:00
Stefano Mazzocchi
8ce21461cb getting closer to the desired functionality... still way too slow though
git-svn-id: http://google-refine.googlecode.com/svn/trunk@256 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 17:28:50 +00:00
Stefano Mazzocchi
50e58fb863 ngram-blocking gives more expected results... but slow as hell, maybe bug in the vptree code?
git-svn-id: http://google-refine.googlecode.com/svn/trunk@255 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 09:05:20 +00:00
Stefano Mazzocchi
546f87a536 let's try with another knn method
git-svn-id: http://google-refine.googlecode.com/svn/trunk@254 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:09:35 +00:00
David Huynh
311d15f493 Re-organized column header popup menus and added a bunch of common facets and common cell edit transforms.
Added native syntax for regex in GEL and modified replace, split, partition, and rpartition functions to support regex. Removed function replaceRegex.


git-svn-id: http://google-refine.googlecode.com/svn/trunk@249 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 06:57:08 +00:00
Stefano Mazzocchi
5b079b04b7 - moved from float to double to avoid excessive casting from secondstring
- added a few of the more powerful distances
- fixed a bug in the VPTree builder (although is still not working as I expect it to)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@248 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 05:11:36 +00:00
David Huynh
4a4ae6bf27 Fixed toTitlecase to handle parentheses and other delimiters.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@240 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 19:40:51 +00:00
David Huynh
ac50b3c48b Re-worked the cell editor popup.
Don't keep logging "Saved workspace."

git-svn-id: http://google-refine.googlecode.com/svn/trunk@235 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 06:30:47 +00:00
David Huynh
5d3a57eeeb Implemented project import and export commands (from/to .tar files).
git-svn-id: http://google-refine.googlecode.com/svn/trunk@234 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 02:34:25 +00:00
David Huynh
a1ec0ea8df When saving projects, save only modified ones.
Save projects and workspace periodically.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@232 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 00:37:06 +00:00
David Huynh
3388c3e09f Still some old Serializable stuff to remove.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@228 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 23:02:57 +00:00
David Huynh
80e6111a92 Added options for omitting error and blank choices in list facets, and use them in the various recon facets.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@227 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 22:54:02 +00:00
David Huynh
694f09fb0a Major refactoring: everything is now saved to disk using our own formats, mostly json-based, some inside zip files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@226 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 22:37:26 +00:00
Stefano Mazzocchi
f7b0caa1b8 now kNN clustering is fully operational... not very practical though, needs more work and testing
git-svn-id: http://google-refine.googlecode.com/svn/trunk@225 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 08:27:13 +00:00
David Huynh
e06d8fe130 Better checking for null value in Cell.load.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@224 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 00:35:44 +00:00
David Huynh
e0d72c81e9 Renamed "facet-based edit" operation and command to "mass edit", because it's not just facet-based.
Added option "apply to other cells with same original content" to single cell edit popup, so it can be used like a find&replace operation.
Renamed "do-text-transform" operation and command to just "text-transform".

git-svn-id: http://google-refine.googlecode.com/svn/trunk@223 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 00:25:00 +00:00
David Huynh
253874b1a1 Got Clusterer to use Column.name rather than Column.headerLabel now.
Tried using Verdana instead of Tahoma as the common font.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@220 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 22:48:42 +00:00
Stefano Mazzocchi
976c1da5c7 much improved facet clustering dialog and functionality
NOTE: kNN clustering code operational but is not working as expected


git-svn-id: http://google-refine.googlecode.com/svn/trunk@219 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 10:17:58 +00:00
David Huynh
db824bffeb Fixed bug in saving recon changes.
Fixed bug in discard recon judgment operation.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@218 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 08:03:29 +00:00
David Huynh
78b1eb7e73 Major refactoring:
- Made all Change classes save to and load from .zip files.
- Changed Column.headerLabel to Column.name.
- Save project's raw data to "raw-data" file for now. We'll make it save to a zip file next.


git-svn-id: http://google-refine.googlecode.com/svn/trunk@217 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 07:43:45 +00:00
David Huynh
589b9cd936 Re-organized popup menus for row operations. Added filter row.starred.
Disabled rendering of key column and column groups for now.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@216 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 22:31:47 +00:00
David Huynh
5c845f06bf Now we can delete a project even if it hasn't been saved to file yet.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@214 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 19:43:08 +00:00
David Huynh
b3ac945c33 Implemented single-cell editing.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@210 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 08:11:48 +00:00
David Huynh
40cdf5092b Better display of Calendar objects in data table view and in expression preview dialog.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@208 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 02:25:27 +00:00
David Huynh
87d20f3299 Fixed minor bug in numeric bin index where if a value was infinity, the bin count went negative.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@207 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 02:10:24 +00:00
Stefano Mazzocchi
37e37488ec ability to delete a project from the front page
git-svn-id: http://google-refine.googlecode.com/svn/trunk@206 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 01:52:55 +00:00
David Huynh
1d6db8fa6e Made recon process cause the client page to create facets when the recon process is done.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@203 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 01:13:59 +00:00
Stefano Mazzocchi
32c0bf08c9 adding now() and inc() functions to the gel
git-svn-id: http://google-refine.googlecode.com/svn/trunk@202 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 20:53:07 +00:00
David Huynh
9d8b746121 Switched Cell.value from Object to Serializable.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@201 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 19:59:31 +00:00
David Huynh
3e0ac50e17 Fixed date parsing bug in index.js introduced since last commit.
Removed debugging console.log() call in browsing-engine.js.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@200 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 19:38:23 +00:00
David Huynh
1dc3d4abbd Save project metadata to disk as JSON now rather than through Java serialization API.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@199 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 19:15:46 +00:00
Stefano Mazzocchi
409b451085 started work on protocol buffers
git-svn-id: http://google-refine.googlecode.com/svn/trunk@197 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 08:46:33 +00:00