Commit Graph

6146 Commits

Author SHA1 Message Date
David Huynh
d0f77a5ef8 Minor layout tweak in clustering dialog.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@298 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 18:48:39 +00:00
Stefano Mazzocchi
7137b4bdf6 make use of multiple cores when doing clustering (has a consistent performance speedup for 5000 rows or more so I enable it by default)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@297 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 07:45:30 +00:00
Stefano Mazzocchi
227b30c860 more optimizations for clustering
git-svn-id: http://google-refine.googlecode.com/svn/trunk@296 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 04:30:49 +00:00
David Huynh
a32273de70 More work on the extend data preview dialog. Results now look correct, but we still are not handling CVTs.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@295 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 20:55:57 +00:00
David Huynh
e35c4c3b94 Minor bug.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@294 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 05:15:23 +00:00
Stefano Mazzocchi
3495c417cd more fixes to the VPTree, this time it's working consistently for real
git-svn-id: http://google-refine.googlecode.com/svn/trunk@293 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 01:29:27 +00:00
Stefano Mazzocchi
f7ab7c9cf6 - incorporated Paolo Ciccarese's fixes for VPTrees in Vicino
- moved all clustering stuff in the vicino package space to simplify external collaboration on that code
- added "type" function to the GEL


git-svn-id: http://google-refine.googlecode.com/svn/trunk@292 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 09:34:17 +00:00
Stefano Mazzocchi
2946f2e8c3 - renamed facet-based-edit-dialog -> clustering-dialog
- added help in case the clustering dialog comes up with no clusters
- changed 'remove' -> (x) button for text facet


git-svn-id: http://google-refine.googlecode.com/svn/trunk@291 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 09:32:06 +00:00
David Huynh
99ae7dea29 More work on the extend data preview dialog. It's starting to render some results.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@290 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 08:08:25 +00:00
David Huynh
f34577ec85 Improved grid layout CSS rules.
Started working on extending data from Freebase feature.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@289 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 07:13:18 +00:00
David Huynh
c637df71c9 Factored out grid layout CSS rules.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@288 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 22:42:05 +00:00
David Huynh
67bac099f0 - Made removing facet a bit more interactive: the removed facet disappears right away.
- Made list facets limit themselves to only 2000 choices, so as not to overload the browser.
- Made list and range facets handle errors in expressions better.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@287 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 20:08:50 +00:00
Stefano Mazzocchi
00a81c5fc4 make the kNN clustering report the right counts for the facet values (and order them in the clusters by counts)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@286 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 19:10:22 +00:00
Stefano Mazzocchi
d72c07b715 latest clustering fixes (the vptree is still too slow though, I'll probably abandon that approach for now)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@285 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 07:37:37 +00:00
David Huynh
58450555e9 Allow GEL identifiers to contain underscores.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@284 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 06:38:12 +00:00
David Huynh
025eccce4b Implemented "record" field for each row.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@283 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 06:33:03 +00:00
David Huynh
af3cb76056 Added support for including dependent rows in row visiting. Facets still don't count them, though.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@282 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 01:06:23 +00:00
David Huynh
7e2667ab45 Minor bug in Excel importer: we forgot to update the max cell index.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@281 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 00:23:01 +00:00
David Huynh
e760750b57 Fixed minor bug that prevented column details from getting passed on to recon service.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@280 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 21:55:32 +00:00
David Huynh
7526e07e6d TOOL-153: property suggest now allows full property IDs.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@279 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 21:19:51 +00:00
David Huynh
4855f70d88 Fixed event wiring bug in index.html that prevented the project name validation from getting run, and prevented limit= and skip= params from getting sent to the server.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@278 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 20:58:42 +00:00
David Huynh
c3ebb5a9f4 Got Vishal's jython integration to work.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@277 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 19:56:43 +00:00
Vishal Talwar
4bb9e06772 adding extremely crude jython and clojure expression evaluation
git-svn-id: http://google-refine.googlecode.com/svn/trunk@276 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 23:43:21 +00:00
David Huynh
c81548529b Fixed process widget to scroll properly.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@275 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 21:18:12 +00:00
David Huynh
2094ded82c Updated site index.html to link to 1.0b files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@273 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:56:38 +00:00
David Huynh
86a8e13d88 Added "numeric" choice to numeric range facets.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@272 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:48:04 +00:00
David Huynh
b1fca11342 Made recon use cells from context rows.
Fixed bug in menu left-right positioning.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@271 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:32:20 +00:00
David Huynh
432e88a23b Fixed range facet to restore non-numeric, blank, and error selections from its ui state properly.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@270 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:15:01 +00:00
David Huynh
f02fd3f5c3 Try to keep the same scrolling position in the data table view when it gets re-rendered.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@269 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:51:34 +00:00
Stefano Mazzocchi
a7d4951725 several improvements for clustering
- added a unicode ASCII-fiying addition to the fingerprinting functions
- removed all distance functions for kNN that didn't seem to do anything useful
- added the ability to indicate what value to use as cluster centroid by simply clicking on it
  (this is useful for those names that have non-ASCII chars that might not even be on your keyboard.. and cut/paste is error prone/cumbersome)
- added a 10x multiplier to the PPM compression distance which makes it more aligned with the levenshtein ones
- made sure that we construct a phonetic fingerprint for the whole string and not just the beginning subset
  (performance is still not ideal but it's now reasonable)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@268 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:45:14 +00:00
David Huynh
6bf5418f9d Cell changes should also flush column precomputes.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@267 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:42:57 +00:00
David Huynh
0160b6841d Fixed data table view bugs: collapsed columns should now stay collapsed even if the column model changes.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@266 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:12:09 +00:00
David Huynh
e008332399 - make recon changes flush column precomputes
- fixed bug where recon features are not saved to file properly
- support selecting non-numeric, blank, and error choices in numeric range facets

git-svn-id: http://google-refine.googlecode.com/svn/trunk@265 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 06:02:36 +00:00
Stefano Mazzocchi
72b012971f more polish on the clustering dialog
git-svn-id: http://google-refine.googlecode.com/svn/trunk@264 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 01:18:41 +00:00
David Huynh
1000d63539 In the range facet, although we don't want to update the facets until the user stops dragging, we still want to update the selection indicators from-to during the dragging.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@263 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 22:53:53 +00:00
Stefano Mazzocchi
c03b223a78 fixed another NPE bug
git-svn-id: http://google-refine.googlecode.com/svn/trunk@262 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 22:43:49 +00:00
David Huynh
51b38a4eed Fixed minor bug in binning clusterer.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@261 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 21:28:40 +00:00
Stefano Mazzocchi
0c4b79c53a more polish on the clustering dialog
git-svn-id: http://google-refine.googlecode.com/svn/trunk@260 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:58:51 +00:00
David Huynh
6b3a20dc46 Check for null confidence string.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@259 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:57:46 +00:00
Stefano Mazzocchi
3f7397e467 fixing typo
git-svn-id: http://google-refine.googlecode.com/svn/trunk@258 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:16:04 +00:00
David Huynh
2bac6844e2 Fixed csv importer to handle escaped quotation marks ("").
git-svn-id: http://google-refine.googlecode.com/svn/trunk@257 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 19:10:55 +00:00
Stefano Mazzocchi
8ce21461cb getting closer to the desired functionality... still way too slow though
git-svn-id: http://google-refine.googlecode.com/svn/trunk@256 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 17:28:50 +00:00
Stefano Mazzocchi
50e58fb863 ngram-blocking gives more expected results... but slow as hell, maybe bug in the vptree code?
git-svn-id: http://google-refine.googlecode.com/svn/trunk@255 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 09:05:20 +00:00
Stefano Mazzocchi
546f87a536 let's try with another knn method
git-svn-id: http://google-refine.googlecode.com/svn/trunk@254 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:09:35 +00:00
Stefano Mazzocchi
358586ac8f adding minimal unit testing framework (type ./gridworks test to run)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@253 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:08:35 +00:00
Stefano Mazzocchi
0a7882bf6c moving testing data into its own folder
git-svn-id: http://google-refine.googlecode.com/svn/trunk@252 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 07:34:41 +00:00
Stefano Mazzocchi
5b0129b36f adding junit in preparation for testing
git-svn-id: http://google-refine.googlecode.com/svn/trunk@251 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 07:33:36 +00:00
David Huynh
977dbdb9ed Fixed numeric range facet:
- use X icon for facet remove button
- update facet only when user finishes dragging slider bracket

git-svn-id: http://google-refine.googlecode.com/svn/trunk@250 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 07:01:13 +00:00
David Huynh
311d15f493 Re-organized column header popup menus and added a bunch of common facets and common cell edit transforms.
Added native syntax for regex in GEL and modified replace, split, partition, and rpartition functions to support regex. Removed function replaceRegex.


git-svn-id: http://google-refine.googlecode.com/svn/trunk@249 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 06:57:08 +00:00
Stefano Mazzocchi
5b079b04b7 - moved from float to double to avoid excessive casting from secondstring
- added a few of the more powerful distances
- fixed a bug in the VPTree builder (although it is still not working as I expect it to)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@248 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 05:11:36 +00:00