RandomSec/src
Stefano Mazzocchi a7d4951725 several improvements for clustering
- added a unicode ASCII-fiying addition to the fingerprinting functions
 - removed all distance functions for kNN that didn't seem to do anything useful
 - added the ability to indicate what value to use as cluster centroid by simply clicking on it
 (this is useful for those names that have non-ASCII chars that might not even be on your keyboard.. and cut/paste is error prone/cumbersome)
 - added a 10x multiplier to the PPM compression distance which makes it more aligned with the levenshtein ones
 - made sure that we construct a phonetic fingerprint for the whole string and not just the beginning subset
(performance is still not ideal but it's now reasonable)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@268 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:45:14 +00:00
..
graphics Customized brackets of slider widget so that they are asymmetric. 2010-03-09 00:14:01 +00:00
main several improvements for clustering 2010-03-10 07:45:14 +00:00
site Introduced EvalError objects as possible values returned by expressions. 2010-02-27 05:48:33 +00:00