Stefano Mazzocchi
771810bc0d
avoid exception if there is only one extension in the whole archive
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@385 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-05 21:36:27 +00:00
Stefano Mazzocchi
2efbf0031f
- removed the 'thirdparty' directory (now the 'gridworks' script will download and install needed tools if they are not present in the system already)
...
- added 'findbugs' command that uses the findbugs static analyzer to look for problems in the code
- fixed a bunch of issues that findbugs found (a few methods will run a little faster, and a few NPEs will be avoided; nothing major but good to have)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@382 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-05 07:15:16 +00:00
Stefano Mazzocchi
798b2a36ca
- archive and compressed file importer (supports zip, tar, gz, bz2, tar.gz and tar.bz2)
...
(works by loading the files that have the most common extensions in the archive)
- changed default max heap to 3GB
git-svn-id: http://google-refine.googlecode.com/svn/trunk@381 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-04 07:48:47 +00:00
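The "most common extensions" heuristic mentioned in the commit above can be sketched roughly as follows. This is a minimal Python illustration, not the importer's actual code (Gridworks itself is written in Java); the function name `files_with_most_common_extension` and the choice to keep only the single most frequent extension are assumptions made for the example.

```python
from collections import Counter
import os


def files_with_most_common_extension(names):
    """Return the archive entries whose extension occurs most often.

    Hypothetical helper: the real importer may keep several extensions
    or break ties differently.
    """
    # Lower-case the extensions so that "A.CSV" and "b.csv" count together.
    extensions = [os.path.splitext(name)[1].lower() for name in names]
    counts = Counter(ext for ext in extensions if ext)
    if not counts:
        # No entry has an extension: nothing to select.
        return []
    # most_common(1) also works when the archive holds a single extension.
    top_extension, _ = counts.most_common(1)[0]
    return [name for name, ext in zip(names, extensions) if ext == top_extension]
```

With a listing such as `["a.csv", "b.csv", "readme.txt"]` the sketch keeps the two `.csv` entries, and it still returns a result when every entry shares one extension.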
Stefano Mazzocchi
c24ec94835
had to shuffle around a bunch of classes to separate the main server classloader from the context classloader and allow reloading to happen for real
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@377 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-03 01:01:11 +00:00
Stefano Mazzocchi
72203cd3d5
- moved all code that contained MIT IP outside ( http://code.google.com/p/simile-vicino/ )
...
- moved bzip2 and tar code from apache ant into their own jar files
- now the gridworks source contains only com.metaweb.* code; everything else is a jar dependency
- started to work on archive importer
git-svn-id: http://google-refine.googlecode.com/svn/trunk@376 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 23:40:12 +00:00
Stefano Mazzocchi
4eda7ae2c0
avoid an array out of bounds exception in case there are no columns in the dataset
...
(I know, it should not happen but when it does let's not barf)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@375 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 20:21:41 +00:00
Stefano Mazzocchi
62f5f21ca3
atom is handled as well by the XML importer
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@374 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 06:44:05 +00:00
Stefano Mazzocchi
83faee3aa9
add a frame-less menu item in macosx to be able to open another gridworks browser window/tab in case we closed it by mistake
...
(no idea how to do this on windows, though, since there is no frame-less menu concept there)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@373 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 06:39:41 +00:00
Stefano Mazzocchi
521acda025
- pass the svn revision as format version (for more detailed verification)
...
- add an 'autoreload' setting that makes Gridworks autoreload itself if a class gets changed
(this is useful to make development cycles faster when working on the java code with an autocompiling IDE like Eclipse or IDEA)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@372 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 00:52:38 +00:00
Stefano Mazzocchi
d1e72e7797
make the undo dialog closable
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@371 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-01 19:11:37 +00:00
Stefano Mazzocchi
988378c761
Hmm, String.split() bites us again: use the commons-lang one instead to avoid having to escape regexp values (this was preventing a user from splitting by "." in GEL)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@370 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-01 17:49:31 +00:00
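The pitfall behind the commit above is that Java's `String.split()` interprets its argument as a regular expression, so a separator like "." matches every character. The same distinction can be shown in Python as an analogy only (the commit itself concerns Java and commons-lang); the helper names `split_as_regex` and `split_as_literal` are made up for this sketch.

```python
import re


def split_as_regex(value, separator):
    """Split treating the separator as a regular expression."""
    return re.split(separator, value)


def split_as_literal(value, separator):
    """Split treating the separator as plain text."""
    return value.split(separator)


# "." as a regex matches any character, so no useful pieces survive.
regex_pieces = split_as_regex("a.b.c", ".")

# As a literal, "." splits where the user expects.
literal_pieces = split_as_literal("a.b.c", ".")

# Escaping the separator makes the regex version behave like the literal one.
escaped_pieces = split_as_regex("a.b.c", re.escape("."))
```

Switching to a literal split, as the commit does, removes the need for users to escape their separators.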
Stefano Mazzocchi
0e07ec7acc
crude, I know, but for now make Gridworks digest RDF/XML as if it were XML (works surprisingly well, btw)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@369 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-01 16:56:38 +00:00
Stefano Mazzocchi
dced641599
- added the ability to specify the character separator for CSV or TSV files that don't use commas or tabs (this was needed to parse a dataset that we got from the BBC to try things out)
...
- used the commons-lang split function instead of the java String.split one; this is necessary to avoid having to escape separators that might be confused for regexps
git-svn-id: http://google-refine.googlecode.com/svn/trunk@368 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 22:34:21 +00:00
Stefano Mazzocchi
77b452e87f
adding version information to the about page
...
NOTE: this shows up only in the packaged distribution
git-svn-id: http://google-refine.googlecode.com/svn/trunk@367 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 19:20:32 +00:00
Stefano Mazzocchi
3c9af6501e
more consistent naming and various polishing
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@364 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 01:07:52 +00:00
Stefano Mazzocchi
5884d257db
default to 'relevance' (which is faster) instead of recon. Change to recon if the user suggests schema hooks.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@363 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 01:07:17 +00:00
Stefano Mazzocchi
571f2c9ab3
- better README
...
- made the build system obtain and use svn revision info directly in version.js
- fixed launch4j initial memory usage
- added .ini support for .exe starting in windows
- more robust up-to-date logic that uses SVN revisions instead of dates
- connected to new freebase.com/labs/gridworks web site
git-svn-id: http://google-refine.googlecode.com/svn/trunk@362 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 00:17:40 +00:00
Stefano Mazzocchi
7c132cfa53
clean eclipse warnings
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@357 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-29 20:51:18 +00:00
David Huynh
1d0e6abaf8
Got some work done on the plane:
...
- better detection of record XML elements in XML importer
- XML importer creates column groups and data table view renders them
git-svn-id: http://google-refine.googlecode.com/svn/trunk@356 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-27 05:23:09 +00:00
David Huynh
2a9fbd7d81
Made sure columns are named hierarchically in XML importer.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@355 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 23:15:10 +00:00
David Huynh
df7389876f
First shot at XML import.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@354 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 23:08:08 +00:00
David Huynh
4e76155652
Use application/x-unknown when exporting TSV so that the browser just saves the result. This is good for large exports, which overload the browser if the browser tries to display them.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@353 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 17:51:27 +00:00
David Huynh
47cad64a3f
Properly unescape \t, \r, \n, \\.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@352 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 17:50:44 +00:00
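One way to unescape `\t`, `\r`, `\n` and `\\` correctly is a single left-to-right pass, so that an escaped backslash followed by `t` is not mistaken for a tab. The sketch below is a hedged Python illustration; the commit does not show the actual implementation, and the name `unescape` is hypothetical.

```python
# Escape letters handled by this sketch, mapped to the characters they stand for.
_ESCAPES = {"t": "\t", "r": "\r", "n": "\n", "\\": "\\"}


def unescape(text):
    """Replace \\t, \\r, \\n and \\\\ escape sequences in a single pass."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in _ESCAPES:
            # Consume the backslash and the escape letter together.
            out.append(_ESCAPES[text[i + 1]])
            i += 2
        else:
            # Unknown escapes and ordinary characters pass through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)
```

Chained string replacements tend to get the `\\t` case wrong, which is why the sketch scans character by character.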
David Huynh
30e3ca4965
Added splitByLengths function.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@351 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 04:30:31 +00:00
David Huynh
4df1c4107a
Fixed a bug introduced recently: recon candidates were not serializing their topic types for the data view, so in the data view we can't send back a candidate's types when the user wants to match the candidate to some cells. I need to figure out a better way to optimize this.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@350 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 03:58:52 +00:00
David Huynh
00cce1b99a
Styling tweaks.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@349 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 21:24:44 +00:00
David Huynh
c6cd48a6d2
Polished the path in the header pane.
...
Made dropdown menu graphics a little more subtle.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@348 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 21:12:13 +00:00
David Huynh
32e395d0e6
Updated version's date for another rolling release.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@347 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 19:43:06 +00:00
David Huynh
cbfa77dcaa
Polished history widget. Now there's a link to roll the widget up.
...
Made sure busy dialog also has rounded corners on Chrome.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@346 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 19:30:50 +00:00
David Huynh
5c97177efd
Added "reset" and "remove" links to facet panel.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@345 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 18:53:29 +00:00
David Huynh
85d1671d6e
Fixed minor bug: recon wasn't saving out its candidates if its judgment is Matched. So when a project is saved and reloaded, it loses all of the recon candidates except for the matches.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@344 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 18:02:08 +00:00
David Huynh
ac57dea9c7
Do our own positioning of the process widget rather than using margin:auto, so that the links on the top header panel don't get obscured.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@343 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 04:48:05 +00:00
David Huynh
455802bffb
Alert user of new version to download, if any.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@342 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 04:36:01 +00:00
David Huynh
2846d66261
Detect max cell index on load, just in case the max cell index we've stored previously was out of whack.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@341 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 03:19:17 +00:00
David Huynh
f8d30e9e8e
Don't send back recon candidate types for rendering cells.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@340 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 01:17:45 +00:00
David Huynh
c07ba83a36
Don't send back recon candidate types for rendering cells.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@339 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 01:17:32 +00:00
David Huynh
3dc4db020f
Support quick undo of the last operation (Ctrl-Z).
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@338 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 00:26:28 +00:00
David Huynh
6d8776953d
Added case sensitive and regex checkboxes to text search facets.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@337 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-22 23:01:53 +00:00
David Huynh
f5d270e35a
Fixed "off by one bucket" bug in range facet's binning algorithm.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@336 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-22 22:04:29 +00:00
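The commit above does not say what the "off by one bucket" error was. One common form in histogram binning is that the maximum value computes an index one past the last bucket. The Python sketch below shows that case under stated assumptions; `bucket_index` and its parameters are hypothetical and are not taken from the range facet code.

```python
def bucket_index(value, minimum, maximum, bucket_count):
    """Map a value in [minimum, maximum] to a bucket in [0, bucket_count - 1]."""
    if bucket_count <= 0 or maximum <= minimum:
        raise ValueError("need a positive bucket count and a non-empty range")
    if not minimum <= value <= maximum:
        raise ValueError("value lies outside the binned range")
    width = (maximum - minimum) / bucket_count
    index = int((value - minimum) / width)
    # Without this clamp, value == maximum would land in bucket `bucket_count`,
    # one past the end of the histogram.
    return min(index, bucket_count - 1)
```

For example, with a range of 0 to 10 split into 5 buckets, the value 10 belongs in the last bucket (index 4) rather than in a sixth, nonexistent one.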
David Huynh
19ba207d27
Re-ordered the "other" choices in numeric range facets to make better use of space.
...
Changed main layout of whole application so that the horizontal scrollbar of the data table is visible without scrolling.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@335 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-22 21:48:36 +00:00
David Huynh
1d20b33cf1
Documented the history package.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@334 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-22 18:19:31 +00:00
David Huynh
ec0110d65b
Documented gel.* packages.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@333 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-22 05:44:35 +00:00
David Huynh
60dd7eab82
Documented expr.* packages.
...
Converted some tabs into spaces.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@332 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-22 00:54:56 +00:00
David Huynh
60f60507f7
Fixed minor bug introduced recently into the Export Project menu command.
...
Documented the commands.* packages.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@331 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-21 20:26:35 +00:00
David Huynh
7648126a5e
Made facets deal with java.util.Collection rather than just Object[].
...
Documented the browsing.* packages.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@330 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-21 07:14:39 +00:00
David Huynh
d90e75dff1
Started a round of documentation.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@329 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-20 23:56:28 +00:00
David Huynh
a43b2a72c1
Made various GEL functions and the forEach control work with java.util.List and java.util.Collection in addition to just Object[].
...
Added field columnNames to row object.
Added 1-bounded numeric log facet.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@328 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-19 23:04:17 +00:00
Vishal Talwar
6fba7d1e7f
sped up jython evaluation by calling function directly instead of invoking parser on string representation of function call
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@327 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-19 21:45:01 +00:00
David Huynh
ff0049307e
Increased file upload size limit to 1GB.
...
Fixed charset detector to be more robust in trying more than one charset.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@326 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-19 19:53:11 +00:00
David Huynh
fd85be7816
Added licenses to about page; styled it a bit.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@325 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-19 18:59:37 +00:00
David Huynh
8dd0dea472
Try to roundtrip reconciled IDs as much as possible when importing/exporting Excel files.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@322 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-19 00:32:52 +00:00
David Huynh
5db6ee8ae5
When the user invokes the Export Tripleloader command, check if the protograph is null or not and alert accordingly.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@321 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-18 23:35:54 +00:00
David Huynh
b54f7162a8
Made histogram widget capable of highlighting the selected range.
...
Added value.log() common numeric facet.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@320 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-18 22:58:06 +00:00
David Huynh
91241539cf
Switched to a canvas-based implementation of histograms.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@319 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-18 20:45:52 +00:00
Stefano Mazzocchi
b9b4bb0ab4
better dropdown button that doesn't look disabled
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@318 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-18 01:47:03 +00:00
David Huynh
b7338e13f2
Tweaked column header menu dropdown icon.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@317 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-18 00:35:34 +00:00
David Huynh
d56bbc1208
Renamed Protograph Node dialog's title to Schema Skeleton Node.
...
Made deleting a protograph link update the previews.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@316 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-17 22:55:40 +00:00
David Huynh
124960e756
Made "search for match" dialog commit on fb-select event.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@315 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-17 22:42:59 +00:00
David Huynh
07cf85b2a5
Added make_cli option for generating a zip containing all files necessary to do 'gridworks run' at the command line. This excludes Java source files but contains pretty much everything else.
...
Added make_all option that makes dmg, exe, and cli.
Added html and xls exporters.
Made exported files named after project names rather than project IDs.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@314 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-17 22:30:16 +00:00
David Huynh
07945f9cde
A more helpful error message when the excel importer fails.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@313 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-17 21:43:58 +00:00
David Huynh
cd062cf028
Minor bug: recon candidate's "id" field should return id, not name.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@312 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-17 19:54:27 +00:00
David Huynh
b26160dc2b
Hopefully a more robust way to get the user data dir on Windows, especially on Windows Vista 64-bit, which jdatapath.dll isn't built for.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@311 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-17 03:51:58 +00:00
David Huynh
b8519e42d6
Use non-breaking hyphens for "re-match" links.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@309 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 21:01:27 +00:00
David Huynh
999c18cae7
Better date/time format for projects' last modified fields.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@308 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 20:55:01 +00:00
David Huynh
1a8a236cdd
Added an error page for when a project create operation fails.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@307 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 18:40:07 +00:00
David Huynh
798805edc5
More styling tweaks.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@306 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 18:11:08 +00:00
David Huynh
4e262f0e1d
Styling tweaks.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@305 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 17:38:38 +00:00
Stefano Mazzocchi
ad6e8c2e0c
add the ability to browse the values of a particular cluster
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@304 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 06:53:08 +00:00
David Huynh
084a6114d7
Track freebase types of columns added with data from Freebase, so that we can later add more data based on those columns. Fixed minor bug in serialization of data extension records.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@303 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 06:18:00 +00:00
Stefano Mazzocchi
cf95e5b5f6
freebase branding
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@302 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 01:46:32 +00:00
David Huynh
c6e7986206
Extend data operation is working.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@301 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 00:24:20 +00:00
David Huynh
2645c864ab
We can now suggest CVT properties.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@300 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 20:41:08 +00:00
David Huynh
c30a5126df
More work on the extend data preview dialog: columns can now be removed.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@299 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 19:44:33 +00:00
David Huynh
d0f77a5ef8
Minor layout tweak in clustering dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@298 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 18:48:39 +00:00
Stefano Mazzocchi
7137b4bdf6
make use of multiple cores when doing clustering (has a consistent performance speedup for 5000 rows or more so I enable it by default)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@297 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 07:45:30 +00:00
Stefano Mazzocchi
227b30c860
more optimizations for clustering
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@296 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-15 04:30:49 +00:00
David Huynh
a32273de70
More work on the extend data preview dialog. Results now look correct, but we are still not handling CVTs.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@295 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 20:55:57 +00:00
David Huynh
e35c4c3b94
Minor bug.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@294 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 05:15:23 +00:00
Stefano Mazzocchi
3495c417cd
more fixes to the VPTree, this time it's working consistently for real
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@293 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-14 01:29:27 +00:00
Stefano Mazzocchi
f7ab7c9cf6
- incorporated Paolo Ciccarese's fixes for VPTrees in Vicino
...
- moved all clustering stuff in the vicino package space to simplify external collaboration on that code
- added "type" function to the GEL
git-svn-id: http://google-refine.googlecode.com/svn/trunk@292 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 09:34:17 +00:00
Stefano Mazzocchi
2946f2e8c3
- renamed facet-based-edit-dialog -> clustering-dialog
...
- added help in case the clustering dialog comes up with no clusters
- changed 'remove' -> (x) button for text facet
git-svn-id: http://google-refine.googlecode.com/svn/trunk@291 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 09:32:06 +00:00
David Huynh
99ae7dea29
More work on the extend data preview dialog. It's starting to render some results.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@290 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 08:08:25 +00:00
David Huynh
f34577ec85
Improved grid layout CSS rules.
...
Started working on extending data from Freebase feature.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@289 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-13 07:13:18 +00:00
David Huynh
c637df71c9
Factored out grid layout CSS rules.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@288 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 22:42:05 +00:00
David Huynh
67bac099f0
- Made removing facet a bit more interactive: the removed facet disappears right away.
...
- Made list facets limit themselves to only 2000 choices, so as not to overload the browser.
- Made list and range facets handle errors in expressions better.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@287 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 20:08:50 +00:00
Stefano Mazzocchi
00a81c5fc4
make the kNN clustering report the right counts for the facet values (and order them in the clusters by counts)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@286 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 19:10:22 +00:00
Stefano Mazzocchi
d72c07b715
latest clustering fixes (the vptree is still too slow though, I'll probably abandon that approach for now)
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@285 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 07:37:37 +00:00
David Huynh
58450555e9
Allow GEL identifiers to contain underscores.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@284 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 06:38:12 +00:00
David Huynh
025eccce4b
Implemented "record" field for each row.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@283 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 06:33:03 +00:00
David Huynh
af3cb76056
Added support for including dependent rows in row visiting. Facets still don't count them, though.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@282 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 01:06:23 +00:00
David Huynh
7e2667ab45
Minor bug in Excel importer: we forgot to update the max cell index.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@281 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-12 00:23:01 +00:00
David Huynh
e760750b57
Fixed minor bug that prevented column details from getting passed on to recon service.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@280 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 21:55:32 +00:00
David Huynh
7526e07e6d
TOOL-153: property suggest now allows full property IDs.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@279 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 21:19:51 +00:00
David Huynh
4855f70d88
Fixed event wiring bug in index.html that prevented the project name validation from getting run, and prevented limit= and skip= params from getting sent to the server.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@278 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 20:58:42 +00:00
David Huynh
c3ebb5a9f4
Got Vishal's jython integration to work.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@277 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 19:56:43 +00:00
Vishal Talwar
4bb9e06772
adding extremely crude jython and clojure expression evaluation
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@276 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 23:43:21 +00:00
David Huynh
c81548529b
Fixed process widget to scroll properly.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@275 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 21:18:12 +00:00
David Huynh
86a8e13d88
Added "numeric" choice to numeric range facets.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@272 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:48:04 +00:00
David Huynh
b1fca11342
Made recon use cells from context rows.
...
Fixed bug in menu left-right positioning.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@271 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:32:20 +00:00
David Huynh
432e88a23b
Fixed range facet to restore non-numeric, blank, and error selections from its ui state properly.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@270 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 08:15:01 +00:00
David Huynh
f02fd3f5c3
Try to keep the same scrolling position in the data table view when it gets re-rendered.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@269 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:51:34 +00:00
Stefano Mazzocchi
a7d4951725
several improvements for clustering
...
- added a Unicode ASCII-fying step to the fingerprinting functions
- removed all distance functions for kNN that didn't seem to do anything useful
- added the ability to indicate what value to use as cluster centroid by simply clicking on it
(this is useful for those names that have non-ASCII chars that might not even be on your keyboard.. and cut/paste is error prone/cumbersome)
- added a 10x multiplier to the PPM compression distance which makes it more aligned with the Levenshtein ones
- made sure that we construct a phonetic fingerprint for the whole string and not just the beginning subset
(performance is still not ideal but it's now reasonable)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@268 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:45:14 +00:00
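The idea of ASCII-fying a value before fingerprinting it, as described in the commit above, might look something like the Python sketch below. The exact fingerprint steps here (trim, lower-case, strip punctuation, sort unique tokens) are an assumption for illustration and are not confirmed by the commit; `asciify` and `fingerprint` are hypothetical names.

```python
import string
import unicodedata


def asciify(text):
    """Drop combining marks so that accented letters become their base letters."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fingerprint(text):
    """Build a clustering key that ignores case, accents, punctuation and word order."""
    cleaned = asciify(text.strip().lower())
    # Remove ASCII punctuation so "Garcia, Jose" and "Garcia Jose" agree.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)
```

Under these assumptions "José  García" and "Garcia, Jose" share the key `garcia jose`, so they would end up in the same cluster.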
David Huynh
6bf5418f9d
Cell changes should also flush column precomputes.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@267 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:42:57 +00:00
David Huynh
0160b6841d
Fixed data table view bugs: collapsed columns should now stay collapsed even if the column model changes.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@266 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 07:12:09 +00:00
David Huynh
e008332399
- make recon changes flush column precomputes
...
- fixed bug where recon features are not saved to file properly
- support selecting non-numeric, blank, and error choices in numeric range facets
git-svn-id: http://google-refine.googlecode.com/svn/trunk@265 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 06:02:36 +00:00
Stefano Mazzocchi
72b012971f
more polish on the clustering dialog
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@264 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-10 01:18:41 +00:00
David Huynh
1000d63539
In the range facet, although we don't want to update the facets until the user stops dragging, we still want to update the selection indicators from-to during the dragging.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@263 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 22:53:53 +00:00
Stefano Mazzocchi
c03b223a78
fixed another NPE bug
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@262 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 22:43:49 +00:00
David Huynh
51b38a4eed
Fixed minor bug in binning clusterer.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@261 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 21:28:40 +00:00
Stefano Mazzocchi
0c4b79c53a
more polish on the clustering dialog
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@260 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:58:51 +00:00
David Huynh
6b3a20dc46
Check for null confidence string.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@259 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 20:57:46 +00:00
David Huynh
2bac6844e2
Fixed csv importer to handle escaped quotation marks ("").
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@257 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 19:10:55 +00:00
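Handling escaped quotation marks means treating a doubled `""` inside a quoted field as one literal quote rather than as the end of the field. The following Python sketch shows one way to do that for a single line; it is not the importer's code, and `parse_csv_line` and its `separator` parameter are names assumed for the example.

```python
def parse_csv_line(line, separator=","):
    """Split one CSV line into fields, honouring quotes and "" escapes."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    # A doubled quote inside a quoted field is a literal quote.
                    current.append('"')
                    i += 1
                else:
                    # A lone quote closes the quoted field.
                    in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
        elif ch == separator:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields
```

The `separator` argument also hints at how a non-comma separator could be supported, though multi-line quoted fields are outside the scope of this sketch.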
Stefano Mazzocchi
8ce21461cb
getting closer to the desired functionality... still way too slow though
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@256 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 17:28:50 +00:00
Stefano Mazzocchi
50e58fb863
ngram-blocking gives more expected results... but slow as hell, maybe bug in the vptree code?
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@255 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 09:05:20 +00:00
Stefano Mazzocchi
546f87a536
let's try with another knn method
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@254 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:09:35 +00:00
David Huynh
977dbdb9ed
Fixed numeric range facet:
...
- use X icon for facet remove button
- update facet only when user finishes dragging slider bracket
git-svn-id: http://google-refine.googlecode.com/svn/trunk@250 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 07:01:13 +00:00
David Huynh
311d15f493
Re-organized column header popup menus and added a bunch of common facets and common cell edit transforms.
...
Added native syntax for regex in GEL and modified replace, split, partition, and rpartition functions to support regex. Removed function replaceRegex.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@249 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 06:57:08 +00:00
Stefano Mazzocchi
5b079b04b7
- moved from float to double to avoid excessive casting from secondstring
...
- added a few of the more powerful distances
- fixed a bug in the VPTree builder (although it is still not working as I expect it to)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@248 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 05:11:36 +00:00
David Huynh
af9e9f590b
Fixed minor bug in facets of facet-based edit dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@247 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 00:41:33 +00:00
David Huynh
4cc1933065
Got facets in facet-based edit dialog to update only after the user has finished dragging the slider bracket.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@246 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 00:33:07 +00:00
David Huynh
562b9d67a2
Customized brackets of slider widget so that they are asymmetric.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@245 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 00:14:01 +00:00
David Huynh
a3bcfc1576
Implemented facets on cluster metrics in facet-based edit dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@244 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 23:05:44 +00:00
David Huynh
ff94de5900
Made dialogs draggable.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@243 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 21:50:53 +00:00
David Huynh
8633b20392
Fixed layout bug in expression preview dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@242 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 21:02:04 +00:00
David Huynh
6b421c2c75
In property suggest, bubble up properties of included types as well.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@241 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 20:56:04 +00:00
David Huynh
4a4ae6bf27
Fixed toTitlecase to handle parentheses and other delimiters.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@240 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 19:40:51 +00:00
David Huynh
c147837a3e
Cell and facet choice edit popups now allow Shift-Enter as a means to insert new lines. Plain Enter still applies the edit immediately.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@239 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 19:27:37 +00:00
David Huynh
0ef0aec0c5
Implemented list facet choice edit.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@238 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 19:07:16 +00:00
David Huynh
d731b89b4c
Back to Tahoma as the main font.
...
Tried more branding colors on dialog header background and tab widget header background.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@237 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 07:21:46 +00:00
David Huynh
6472e1f076
Re-layout when window is resized.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@236 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 06:48:12 +00:00
David Huynh
ac50b3c48b
Re-worked the cell editor popup.
...
Don't keep logging "Saved workspace."
git-svn-id: http://google-refine.googlecode.com/svn/trunk@235 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 06:30:47 +00:00
David Huynh
5d3a57eeeb
Implemented project import and export commands (from/to .tar files).
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@234 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 02:34:25 +00:00
David Huynh
12d5c6aba5
Fixed layout of extract operation dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@233 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 00:45:29 +00:00
David Huynh
a1ec0ea8df
When saving projects, save only modified ones.
...
Save projects and workspace periodically.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@232 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 00:37:06 +00:00
David Huynh
3388c3e09f
Still some old Serializable stuff to remove.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@228 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 23:02:57 +00:00
David Huynh
80e6111a92
Added options for omitting error and blank choices in list facets, and use them in the various recon facets.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@227 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 22:54:02 +00:00
David Huynh
694f09fb0a
Major refactoring: everything is now saved to disk using our own formats, mostly json-based, some inside zip files.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@226 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 22:37:26 +00:00
Stefano Mazzocchi
f7b0caa1b8
now kNN clustering is fully operational... not very practical though, needs more work and testing
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@225 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 08:27:13 +00:00
David Huynh
e06d8fe130
Better checking for null value in Cell.load.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@224 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 00:35:44 +00:00
David Huynh
e0d72c81e9
Renamed "facet-based edit" operation and command to "mass edit", because it's not just facet-based.
...
Added option "apply to other cells with same original content" to single cell edit popup, so it can be used like a find&replace operation.
Renamed "do-text-transform" operation and command to just "text-transform".
git-svn-id: http://google-refine.googlecode.com/svn/trunk@223 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 00:25:00 +00:00
David Huynh
b9308e4034
Added option to apply and recluster in the facet-based edit dialog.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@222 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 23:34:27 +00:00
David Huynh
02cd59a5c0
Display cluster sizes and number of clusters in facet-based edit dialog.
...
Added command to invoke that dialog from column popup menu, so you don't have to create a text facet first to get to it.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@221 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 23:01:46 +00:00
David Huynh
253874b1a1
Got Clusterer to use Column.name rather than Column.headerLabel now.
...
Tried using Verdana instead of Tahoma as the common font.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@220 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 22:48:42 +00:00
Stefano Mazzocchi
976c1da5c7
much improved facet clustering dialog and functionality
...
NOTE: kNN clustering code is operational but is not working as expected
git-svn-id: http://google-refine.googlecode.com/svn/trunk@219 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 10:17:58 +00:00
David Huynh
db824bffeb
Fixed bug in saving recon changes.
...
Fixed bug in discard recon judgment operation.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@218 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 08:03:29 +00:00
David Huynh
78b1eb7e73
Major refactoring:
...
- Made all Change classes save to and load from .zip files.
- Changed Column.headerLabel to Column.name.
- Save project's raw data to "raw-data" file for now. We'll make it save to a zip file next.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@217 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 07:43:45 +00:00
David Huynh
589b9cd936
Re-organized popup menus for row operations. Added filter row.starred.
...
Disabled rendering of key column and column groups for now.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@216 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 22:31:47 +00:00
David Huynh
fe78fb8e30
A bit of branding and re-laying out the front page.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@215 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 22:09:25 +00:00
David Huynh
5c845f06bf
Now we can delete a project even if it hasn't been saved to file yet.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@214 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 19:43:08 +00:00
David Huynh
676a189968
Re-organized the .css files to match the .js files.
...
git-svn-id: http://google-refine.googlecode.com/svn/trunk@213 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 19:36:55 +00:00