a7d4951725
- added a Unicode ASCII-fying step to the fingerprinting functions
- removed all kNN distance functions that didn't seem to do anything useful
- added the ability to pick the value to use as the cluster centroid simply by clicking on it (useful for names with non-ASCII characters that might not even be on your keyboard, where cut/paste is error-prone and cumbersome)
- added a 10x multiplier to the PPM compression distance, which brings it more in line with the Levenshtein distances
- made sure the phonetic fingerprint is constructed for the whole string and not just its beginning (performance is still not ideal, but it's now reasonable)

git-svn-id: http://google-refine.googlecode.com/svn/trunk@268 7d457c2a-affb-35e4-300a-418c747d4874
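The commit message only summarizes the changes, so here is a minimal Java sketch of two of them, built on stated assumptions. The fingerprint assumes the usual key-collision steps (trim, lowercase, ASCII-fold, strip punctuation, sort and de-duplicate tokens). The distance assumes a symmetric compression-based formula scaled by 10. The names `ClusteringSketch`, `fingerprint`, `asciify`, `compressedSize` and `compressionDistance` are hypothetical and do not come from the Gridworks source. DEFLATE (`java.util.zip.Deflater`) stands in for PPM, which the JDK does not provide.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Locale;
import java.util.TreeSet;
import java.util.zip.Deflater;

// Hypothetical sketch, not Gridworks' actual classes or method names.
public class ClusteringSketch {

    // ASCII-fy by decomposing characters (NFD) and dropping combining marks,
    // so an accented "u" and a plain "u" end up identical. Characters without
    // a decomposition (such as the German sharp s) are left unchanged here.
    static String asciify(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Key-collision fingerprint (assumed steps): trim, lowercase, ASCII-fy,
    // strip ASCII punctuation, split on whitespace, sort and de-duplicate
    // the tokens, then re-join them with single spaces.
    static String fingerprint(String s) {
        String cleaned = asciify(s.trim().toLowerCase(Locale.ROOT))
                .replaceAll("\\p{Punct}", "");
        TreeSet<String> tokens = new TreeSet<>();
        for (String t : cleaned.split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return String.join(" ", tokens);
    }

    // Compressed size in bytes. DEFLATE stands in for PPM here because the
    // JDK ships no PPM compressor.
    static int compressedSize(String s) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(s.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        byte[] buf = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    // Symmetric compression distance scaled by 10 (the assumed role of the
    // "10x multiplier"): exactly 0 for identical strings, and typically
    // larger when the strings share less content the compressor can exploit.
    static double compressionDistance(String a, String b) {
        double cab = compressedSize(a + b);
        double cba = compressedSize(b + a);
        double caa = compressedSize(a + a);
        double cbb = compressedSize(b + b);
        return 10.0 * ((cab + cba) / (caa + cbb) - 1.0);
    }

    public static void main(String[] args) {
        System.out.println(fingerprint("  M\u00fcller, Hans "));
        System.out.println(fingerprint("Hans Muller"));
        System.out.println(compressionDistance("Gridworks", "Gridworks"));
    }
}
```

Running `main` prints `hans muller` twice and then `0.0`. The accented and unaccented spellings collide on the same key, which is the point of adding ASCII-folding to the fingerprint.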
.settings
lib
lib-src
licenses
src
tests
thirdparty
.classpath
.project
build.xml
gridworks
gridworks.bat
LICENSE.txt
README.txt
G r i d w o r k s
-------------------

What is this?
-------------

Gridworks is a tabular data exploration and manipulation tool.

[more soon]

 - o -

Thank you for your interest.