406 Commits

Author SHA1 Message Date
Stefano Mazzocchi
7690f932b7 more fixes
git-svn-id: http://google-refine.googlecode.com/svn/trunk@412 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 18:51:16 +00:00
Stefano Mazzocchi
f816a288af this command might fail in leopard (it's needed for snow-leopard) so avoid exiting if that happens
git-svn-id: http://google-refine.googlecode.com/svn/trunk@411 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 18:40:50 +00:00
Stefano Mazzocchi
4626f88836 fixing untar issue
git-svn-id: http://google-refine.googlecode.com/svn/trunk@410 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 18:34:10 +00:00
David Huynh
302c27687c "type" in freebase suggest results got dropped so we need to fetch the result's types ourselves.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@409 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 18:23:07 +00:00
Stefano Mazzocchi
c921fb703f syntax problems
git-svn-id: http://google-refine.googlecode.com/svn/trunk@408 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 17:40:55 +00:00
Stefano Mazzocchi
6674b69773 adding PMD and CPD support
git-svn-id: http://google-refine.googlecode.com/svn/trunk@407 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 17:31:25 +00:00
Stefano Mazzocchi
6dbe794658 enabled windmill-based UI testing (type ./gridworks test to try)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@406 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 08:28:53 +00:00
Stefano Mazzocchi
6114530723 make sure the junit tests still work
git-svn-id: http://google-refine.googlecode.com/svn/trunk@405 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 01:17:14 +00:00
Jeff Fry
341a8d3f8f Initial Windmill tests for Gridworks: text facet, split values, numeric facet. Some asserts but likely want more.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@404 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 00:48:00 +00:00
Stefano Mazzocchi
8f5c35799b making room for windmill tests
git-svn-id: http://google-refine.googlecode.com/svn/trunk@403 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 00:35:59 +00:00
Stefano Mazzocchi
0e6718388e a little more solid
git-svn-id: http://google-refine.googlecode.com/svn/trunk@401 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 00:18:26 +00:00
Stefano Mazzocchi
0978a8ec1d make things more solid for snow-leopard
git-svn-id: http://google-refine.googlecode.com/svn/trunk@400 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 00:12:40 +00:00
Stefano Mazzocchi
9e90e6f05c this time is for real
git-svn-id: http://google-refine.googlecode.com/svn/trunk@399 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 23:42:57 +00:00
Stefano Mazzocchi
008cc32d0d - fixing the curl redirect issue
- reduced cut/paste
- starting to load windmill


git-svn-id: http://google-refine.googlecode.com/svn/trunk@398 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 23:28:52 +00:00
Stefano Mazzocchi
1ad122ee47 fixes issues in case wget is not found in path
git-svn-id: http://google-refine.googlecode.com/svn/trunk@397 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 23:13:09 +00:00
Stefano Mazzocchi
5afd2e8c00 hmm, let's see if this fixes the problem that David is seeing but I can't repro
git-svn-id: http://google-refine.googlecode.com/svn/trunk@396 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 23:05:45 +00:00
Stefano Mazzocchi
687eae475d make sure to have the tools dir
git-svn-id: http://google-refine.googlecode.com/svn/trunk@395 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 22:41:33 +00:00
David Huynh
0996b9e1dd Gzip project export tar files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@394 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 22:28:30 +00:00
David Huynh
309d682fcb Added data set for testing inter-project joins.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@393 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 18:16:17 +00:00
David Huynh
5320cc6587 Make duplicated column names unique during import by appending indices to them.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@392 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 17:55:36 +00:00
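Note on 5320cc6587: the message describes de-duplicating column names by appending indices. A minimal sketch of that idea follows, not the project's actual code. The function name `make_unique` and the suffix scheme (a bare number starting at 2, no separator) are assumptions made for illustration only.

```python
def make_unique(names):
    """Return a copy of names in which repeated names get a numeric suffix.

    Assumption: the suffix format (bare index starting at 2) is hypothetical;
    the commit message only says that indices are appended.
    """
    seen = set()
    result = []
    for name in names:
        candidate = name
        index = 2
        # Keep trying higher indices until the candidate is unused, so a
        # generated name never collides with one that is already present.
        while candidate in seen:
            candidate = f"{name}{index}"
            index += 1
        seen.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["id", "name", "id", "id"]))  # ['id', 'name', 'id2', 'id3']
```

The `seen` set holds generated names as well as original ones, so a generated name such as `id2` cannot clash with a later column that is literally called `id2`.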
David Huynh
9d9329ca96 Implemented row remove command.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@391 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 07:47:44 +00:00
David Huynh
1fd85c62bf Implemented column rename command.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@390 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 07:15:34 +00:00
David Huynh
a28a8d1769 Fixed bug in collapse and expand all columns commands.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@389 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 06:18:43 +00:00
David Huynh
93d6f9fc54 Better error message for bad regular expressions in GEL.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@388 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 06:18:07 +00:00
David Huynh
f402db10af Implemented inter-project joins.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@387 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 05:35:48 +00:00
Stefano Mazzocchi
e2d92aa0b1 d'oh, placed the join in the wrong spot
git-svn-id: http://google-refine.googlecode.com/svn/trunk@386 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 01:13:29 +00:00
Stefano Mazzocchi
771810bc0d avoid exception if there is only one extension in the whole archive
git-svn-id: http://google-refine.googlecode.com/svn/trunk@385 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-05 21:36:27 +00:00
Stefano Mazzocchi
9dfdd1e351 make the thread join so that we can use ctrl-c to exit from the console
git-svn-id: http://google-refine.googlecode.com/svn/trunk@384 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-05 21:36:08 +00:00
Stefano Mazzocchi
7590e30454 revert back to 1Gb starting heap size (3Gb was too much)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@383 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-05 21:09:34 +00:00
Stefano Mazzocchi
2efbf0031f - removed the 'thirdparty' directory (now the 'gridworks' script will download and install needed tools if they are not present in the system already)
- added 'findbugs' command that uses the findbugs static analyzer to look for problems in the code
- fixed a bunch of issues that findbugs found (a few methods would go a little faster, and a few NPE will be avoided... nothing major but good to have)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@382 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-05 07:15:16 +00:00
Stefano Mazzocchi
798b2a36ca - archive and compressed file importer (supports zip, tar, gz, bz2, tar.gz and tar.bz2)
(works by loading the files that have the most common extensions in the archive)
- changed default max heap to 3Gb


git-svn-id: http://google-refine.googlecode.com/svn/trunk@381 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-04 07:48:47 +00:00
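Note on 798b2a36ca: the message says the archive importer loads the files that have the most common extensions in the archive. The sketch below shows one way such a selection could work, assuming a single most frequent extension is chosen. The function name and the tie-breaking behaviour are hypothetical and are not taken from the Gridworks source.

```python
import os
from collections import Counter


def files_with_most_common_extension(names):
    """Return the entries of names that share the most frequent extension.

    Assumptions: extensions are compared case-insensitively and entries
    without an extension are ignored. Both choices are illustrative only.
    """
    extensions = [os.path.splitext(name)[1].lower() for name in names]
    counts = Counter(ext for ext in extensions if ext)
    if not counts:
        # Empty archive, or no entry has an extension: nothing to load.
        return []
    top_extension, _ = counts.most_common(1)[0]
    return [name for name, ext in zip(names, extensions) if ext == top_extension]


print(files_with_most_common_extension(["a.csv", "b.csv", "readme.txt"]))
# ['a.csv', 'b.csv']
```

An archive in which every file shares one extension needs no special handling here, since `most_common(1)` still returns one entry. That is the situation the later commit 771810bc0d mentions.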
Stefano Mazzocchi
65c5aea079 set the internal version by hand, this is done to avoid NPE after reloads but also because it's only used to version the data dumps so the granularity of svn revisions was too high anyway
git-svn-id: http://google-refine.googlecode.com/svn/trunk@380 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-03 17:01:04 +00:00
Stefano Mazzocchi
7cf4f2e5e4 make autoreloading transparent (so that we don't forget to set it and later think it's not working... like I just did ;-)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@379 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-03 01:28:54 +00:00
Stefano Mazzocchi
d312a75b2f don't need this
git-svn-id: http://google-refine.googlecode.com/svn/trunk@378 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-03 01:01:43 +00:00
Stefano Mazzocchi
c24ec94835 had to shuffle around a bunch of classes to separate the main server classloader from the context classloader and allow reloading to happen for real
git-svn-id: http://google-refine.googlecode.com/svn/trunk@377 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-03 01:01:11 +00:00
Stefano Mazzocchi
72203cd3d5 - moved all code that contained MIT IP outside (http://code.google.com/p/simile-vicino/)
- moved bzip2 and tar code from apache ant into their own jar files
- now gridworks source contains only com.metaweb.* code everything else is a jar dependency
- started to work on archive importer


git-svn-id: http://google-refine.googlecode.com/svn/trunk@376 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 23:40:12 +00:00
Stefano Mazzocchi
4eda7ae2c0 avoid an array out of bounds exception in case there are no columns in the dataset
(I know, it should not happen but when it does let's not barf)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@375 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 20:21:41 +00:00
Stefano Mazzocchi
62f5f21ca3 atom is handled as well by the XML importer
git-svn-id: http://google-refine.googlecode.com/svn/trunk@374 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 06:44:05 +00:00
Stefano Mazzocchi
83faee3aa9 add a frame-less menu item in macosx to be able to open another gridworks browser window/tab in case we closed it by mistake
(no idea how to do this on windows, though, since there is no frame-less menu concept there)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@373 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 06:39:41 +00:00
Stefano Mazzocchi
521acda025 - pass the svn revision as format version (for more detailed verification)
- add an 'autoreload' setting that makes Gridworks autoreload itself if a class gets changed
(this is useful to make development cycles faster when working on the java code with autocompiling IDE like Eclipse or IDEA)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@372 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 00:52:38 +00:00
Stefano Mazzocchi
d1e72e7797 make the undo dialog closable
git-svn-id: http://google-refine.googlecode.com/svn/trunk@371 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-01 19:11:37 +00:00
Stefano Mazzocchi
988378c761 Hmm, String.split() bites us again: use the commons-lang one instead to avoid having to escape regexp values (this was preventing a user from splitting by "." in GEL)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@370 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-01 17:49:31 +00:00
Stefano Mazzocchi
0e07ec7acc crude, I know, but for now make Gridworks digest RDF/XML as if it were XML (works surprisingly well, btw)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@369 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-01 16:56:38 +00:00
Stefano Mazzocchi
dced641599 - added the ability to specify the character separator for CSV or TSV files that don't use commas or tabs (this was needed to parse a dataset that we got from the BBC to try things out)
- used commons-lang split function instead of the java String.split one, this is necessary to avoid having to escape separators that might be confused for regexps


git-svn-id: http://google-refine.googlecode.com/svn/trunk@368 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 22:34:21 +00:00
Stefano Mazzocchi
77b452e87f adding version information to the about page
NOTE: this shows up only in the packaged distribution


git-svn-id: http://google-refine.googlecode.com/svn/trunk@367 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 19:20:32 +00:00
Stefano Mazzocchi
9db9351a0b - condensing exe and cli distributions into one for windows (shipping both the .exe and the .bat in the same package)
- renamed the distributions based on their target OS
- more polishing here and there


git-svn-id: http://google-refine.googlecode.com/svn/trunk@365 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 02:11:53 +00:00
Stefano Mazzocchi
3c9af6501e more consistent naming and various polishing
git-svn-id: http://google-refine.googlecode.com/svn/trunk@364 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 01:07:52 +00:00
Stefano Mazzocchi
5884d257db default to 'relevance' instead of recon (which is faster). Change to recon if the user suggests schema hooks.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@363 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 01:07:17 +00:00
Stefano Mazzocchi
571f2c9ab3 - better README
- made the build system obtain and use svn revision info directly in version.js
- fixed launch4j initial memory usage
- added .ini support for .exe starting in windows
- more robust up-to-date logic that uses SVN revisions instead of dates
- connected to new freebase.com/labs/gridworks web site


git-svn-id: http://google-refine.googlecode.com/svn/trunk@362 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 00:17:40 +00:00
Stefano Mazzocchi
8508fbbd01 - adding the BSD license
- fixed a bunch of issues with cli builds and windows operations
- added the ability to persist init parameters in a gridworks.ini file (used on both windows and unix)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@359 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-30 21:24:04 +00:00