David Huynh
2645c864ab
We can now suggest CVT properties.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@300 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 20:41:08 +00:00
David Huynh
c30a5126df
More work on the extend data preview dialog: columns can now be removed.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@299 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 19:44:33 +00:00
David Huynh
d0f77a5ef8
Minor layout tweak in clustering dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@298 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 18:48:39 +00:00
Stefano Mazzocchi
7137b4bdf6
make use of multiple cores when doing clustering (has a consistent performance speedup for 5000 rows or more so I enable it by default)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@297 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 07:45:30 +00:00
Stefano Mazzocchi
227b30c860
more optimizations for clustering
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@296 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 04:30:49 +00:00
David Huynh
a32273de70
More work on the extend data preview dialog. Results now look correct, but we are still not handling CVTs.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@295 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 20:55:57 +00:00
David Huynh
e35c4c3b94
Minor bug.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@294 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 05:15:23 +00:00
Stefano Mazzocchi
3495c417cd
more fixes to the VPTree, this time it's working consistently for real
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@293 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 01:29:27 +00:00
Stefano Mazzocchi
f7ab7c9cf6
- incorporated Paolo Ciccarese's fixes for VPTrees in Vicino
...
- moved all clustering stuff into the vicino package space to simplify external collaboration on that code
- added "type" function to the GEL
git-svn-id: http://google-refine.googlecode.com/svn/trunk@292 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 09:34:17 +00:00
Stefano Mazzocchi
2946f2e8c3
- renamed facet-based-edit-dialog -> clustering-dialog
...
- added help in case the clustering dialog comes up with no clusters
- changed 'remove' -> (x) button for text facet
git-svn-id: http://google-refine.googlecode.com/svn/trunk@291 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 09:32:06 +00:00
David Huynh
99ae7dea29
More work on the extend data preview dialog. It's starting to render some results.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@290 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 08:08:25 +00:00
David Huynh
f34577ec85
Improved grid layout CSS rules.
...
Started working on extending data from Freebase feature.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@289 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 07:13:18 +00:00
David Huynh
c637df71c9
Factored out grid layout CSS rules.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@288 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 22:42:05 +00:00
David Huynh
67bac099f0
- Made removing facet a bit more interactive: the removed facet disappears right away.
...
- Made list facets limit themselves to only 2000 choices, so as not to overload the browser.
- Made list and range facets handle errors in expressions better.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@287 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 20:08:50 +00:00
Stefano Mazzocchi
00a81c5fc4
make the kNN clustering report the right counts for the facet values (and order them in the clusters by counts)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@286 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 19:10:22 +00:00
Stefano Mazzocchi
d72c07b715
latest clustering fixes (the vptree is still too slow though, I'll probably abandon that approach for now)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@285 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 07:37:37 +00:00
David Huynh
58450555e9
Allow GEL identifiers to contain underscores.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@284 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 06:38:12 +00:00
David Huynh
025eccce4b
Implemented "record" field for each row.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@283 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 06:33:03 +00:00
David Huynh
af3cb76056
Added support for including dependent rows in row visiting. Facets still don't count them, though.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@282 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 01:06:23 +00:00
David Huynh
7e2667ab45
Minor bug in Excel importer: we forgot to update the max cell index.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@281 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 00:23:01 +00:00
David Huynh
e760750b57
Fixed minor bug that prevented column details from getting passed on to recon service.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@280 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 21:55:32 +00:00
David Huynh
7526e07e6d
TOOL-153: property suggest now allows full property IDs.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@279 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 21:19:51 +00:00
David Huynh
4855f70d88
Fixed event wiring bug in index.html that prevented the project name validation from getting run, and prevented limit= and skip= params from getting sent to the server.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@278 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 20:58:42 +00:00
David Huynh
c3ebb5a9f4
Got Vishal's jython integration to work.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@277 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 19:56:43 +00:00
Vishal Talwar
4bb9e06772
adding extremely crude jython and clojure expression evaluation
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@276 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 23:43:21 +00:00
David Huynh
c81548529b
Fixed process widget to scroll properly.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@275 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 21:18:12 +00:00
David Huynh
2094ded82c
Updated site index.html to link to 1.0b files.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@273 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:56:38 +00:00
David Huynh
86a8e13d88
Added "numeric" choice to numeric range facets.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@272 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:48:04 +00:00
David Huynh
b1fca11342
Made recon use cells from context rows.
...
Fixed bug in menu left-right positioning.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@271 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:32:20 +00:00
David Huynh
432e88a23b
Fixed range facet to restore non-numeric, blank, and error selections from its ui state properly.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@270 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:15:01 +00:00
David Huynh
f02fd3f5c3
Try to keep the same scrolling position in the data table view when it gets re-rendered.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@269 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:51:34 +00:00
Stefano Mazzocchi
a7d4951725
several improvements for clustering
...
- added a Unicode ASCII-fying step to the fingerprinting functions
- removed all distance functions for kNN that didn't seem to do anything useful
- added the ability to indicate what value to use as cluster centroid by simply clicking on it
(this is useful for names with non-ASCII chars that might not even be on your keyboard, where cut/paste is error-prone and cumbersome)
- added a 10x multiplier to the PPM compression distance, which makes it more aligned with the Levenshtein ones
- made sure that we construct a phonetic fingerprint for the whole string and not just the beginning subset
(performance is still not ideal but it's now reasonable)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@268 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:45:14 +00:00
David Huynh
6bf5418f9d
Cell changes should also flush column precomputes.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@267 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:42:57 +00:00
David Huynh
0160b6841d
Fixed data table view bugs: collapsed columns should now stay collapsed even if the column model changes.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@266 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:12:09 +00:00
David Huynh
e008332399
- make recon changes flush column precomputes
...
- fixed bug where recon features are not saved to file properly
- support selecting non-numeric, blank, and error choices in numeric range facets
git-svn-id: http://google-refine.googlecode.com/svn/trunk@265 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 06:02:36 +00:00
Stefano Mazzocchi
72b012971f
more polish on the clustering dialog
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@264 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 01:18:41 +00:00
David Huynh
1000d63539
In the range facet, although we don't want to update the facets until the user stops dragging, we still want to update the selection indicators from-to during the dragging.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@263 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 22:53:53 +00:00
Stefano Mazzocchi
c03b223a78
fixed another NPE bug
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@262 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 22:43:49 +00:00
David Huynh
51b38a4eed
Fixed minor bug in binning clusterer.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@261 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 21:28:40 +00:00
Stefano Mazzocchi
0c4b79c53a
more polish on the clustering dialog
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@260 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:58:51 +00:00
David Huynh
6b3a20dc46
Check for null confidence string.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@259 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:57:46 +00:00
Stefano Mazzocchi
3f7397e467
fixing typo
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@258 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:16:04 +00:00
David Huynh
2bac6844e2
Fixed CSV importer to handle escaped quotation marks ("").
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@257 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 19:10:55 +00:00
Stefano Mazzocchi
8ce21461cb
getting closer to the desired functionality... still way too slow though
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@256 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 17:28:50 +00:00
Stefano Mazzocchi
50e58fb863
ngram-blocking gives more expected results... but slow as hell, maybe bug in the vptree code?
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@255 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 09:05:20 +00:00
Stefano Mazzocchi
546f87a536
let's try with another knn method
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@254 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:09:35 +00:00
Stefano Mazzocchi
358586ac8f
adding minimal unit testing framework (type ./gridworks test to run)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@253 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:08:35 +00:00
Stefano Mazzocchi
0a7882bf6c
moving testing data into its own folder
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@252 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 07:34:41 +00:00
Stefano Mazzocchi
5b0129b36f
adding junit in preparation for testing
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@251 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 07:33:36 +00:00
David Huynh
977dbdb9ed
Fixed numeric range facet:
...
- use X icon for facet remove button
- update facet only when user finishes dragging slider bracket
git-svn-id: http://google-refine.googlecode.com/svn/trunk@250 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 07:01:13 +00:00