Commit Graph

88 Commits

Author SHA1 Message Date
Tom Morris
6289f80da5 Update Eclipse classpath for POI 3.7
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2064 7d457c2a-affb-35e4-300a-418c747d4874
2011-05-25 06:37:50 +00:00
Stefano Mazzocchi
610de0d33a adding Metaphone3 algorithm
Many thanks to Lawrence Philips for donating the code to us under the BSD license.


git-svn-id: http://google-refine.googlecode.com/svn/trunk@2029 7d457c2a-affb-35e4-300a-418c747d4874
2011-03-01 00:17:48 +00:00
Iain Sproat
f55f11cd0d Adding classes to make it possible to parse HTML in GREL. Uses a small subset of methods from the JSoup library, licensed under the MIT license.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1948 7d457c2a-affb-35e4-300a-418c747d4874
2010-12-06 23:15:24 +00:00
Tom Morris
b963fc2fc7 Allow top level directory to be imported as Eclipse project grefine-all
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1564 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-15 12:47:40 +00:00
Stefano Mazzocchi
2c8595098c Major refactor to separate the webapp part from the embedded servlet engine part
git-svn-id: http://google-refine.googlecode.com/svn/trunk@883 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-28 23:19:08 +00:00
Iain Sproat
25d3a9dfc1 Added a basic RDF triple importer plus unit tests. Some more work required - it's not plugged into the client and it creates a very sparse data structure (each triple is a new row). It uses JRDF library (Apache 1.1 license).
git-svn-id: http://google-refine.googlecode.com/svn/trunk@813 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-18 12:41:40 +00:00
Stefano Mazzocchi
2cf360b723 adding even runtime jars to the eclipse build path so that people running gw from IDEs don't get ClassNotFound messages at runtime
git-svn-id: http://google-refine.googlecode.com/svn/trunk@798 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-17 16:35:20 +00:00
Stefano Mazzocchi
f28e23e503 Committing patches by Iain:
- use OpenCSV parser instead of our own
- use TestNG instead of JUnit which is a lot more configurable in test selection (and allows us to do a much better job of leaving the tree green even while developing tests that are known to fail)
- integrated TestNG in './gridworks test'
- added Iain to the list of contributors in README.txt
- changed the Eclipse test launch file to use the TestNG launcher (unfortunately, this is not shipped by default in Eclipse, so you have to install it yourself from the http://beust.com/eclipse update file; I'll add this to the wiki shortly)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@782 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-16 18:42:52 +00:00
Stefano Mazzocchi
ea459aed07 Applied a bunch of patches from Tom Morris (Issue 25, 26 and 27)
- make java6 dependency explicit in eclipse project files
- avoid using NotImplementException especially the sun.* one
- avoid using internal sun signal handling and rely on standard java.* APIs
 (I tested this one and it seems to be working fine)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@756 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-13 21:02:19 +00:00
Stefano Mazzocchi
1f2531f303 uniform newlines and setting the proper svn controls for native line ending
(so that diffs from windows don't end up all screwed up)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@751 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-13 02:22:31 +00:00
Stefano Mazzocchi
86465c2d6f forgot these pieces for the previous commit
git-svn-id: http://google-refine.googlecode.com/svn/trunk@723 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-12 09:00:38 +00:00
Stefano Mazzocchi
8285083fb9 fixing classpath so that gw can be run directly from eclipse
git-svn-id: http://google-refine.googlecode.com/svn/trunk@712 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-12 00:25:39 +00:00
Stefano Mazzocchi
6990604981 implemented the full gridworks -> freebase conduit via delegated oauth and freeq/tripleloader
(still doesn't work as argus returns a 500 but the entire conduit is in place)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@519 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-23 08:25:52 +00:00
Stefano Mazzocchi
439474caeb Checkpoint for OAuth functionality in Gridworks
(doesn't work but since it's a substantial chunk of stuff, I want to get it in sooner rather than later)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@516 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-21 21:08:34 +00:00
Stefano Mazzocchi
7a716a4a1b - upgraded commons-codec to the latest version (needed for base64 encoding of data: uris)
- added the ability to embed the scatterplot inside the returned json data with data: uris (although it doesn't seem to work well)
- connected the selection logic to the scatterfacets (although it doesn't seem to filter the rows... and I'm puzzled as to why)
- reduced cut/paste and code overlap between the scatterplot generator and the scatterplot facet


git-svn-id: http://google-refine.googlecode.com/svn/trunk@490 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-17 03:00:38 +00:00
David Huynh
9e73a4e68c Started to work on a MARC importer. It doesn't work properly yet.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@486 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-16 19:52:01 +00:00
Stefano Mazzocchi
397861b612 - replace the 'cos' library with the apache 'commons-fileupload' for licensing reasons (the cos library had a weird arm-twisting license that forced you to buy an o'reilly book on servlets for each developer in your company... good thing I read it all)
- some tweaks on imgareaselect's look


git-svn-id: http://google-refine.googlecode.com/svn/trunk@483 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-15 18:42:41 +00:00
Stefano Mazzocchi
93a8f78192 - updated to latest jquery (1.4.2)
- removed commons-math which I don't use anymore
- added imgareaselect
- added a bunch of licenses for the javascript libraries dependencies


git-svn-id: http://google-refine.googlecode.com/svn/trunk@482 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-15 06:56:07 +00:00
David Huynh
4a06c49a9a Added streaming json parser for faster re-loading of existing projects.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@470 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-13 23:57:03 +00:00
Stefano Mazzocchi
60d61b7808 add commons-math library (I'm going to need this for more advanced facets)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@451 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-12 04:25:50 +00:00
Stefano Mazzocchi
6114530723 make sure the junit tests still work
git-svn-id: http://google-refine.googlecode.com/svn/trunk@405 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 01:17:14 +00:00
Stefano Mazzocchi
8f5c35799b making room for windmill tests
git-svn-id: http://google-refine.googlecode.com/svn/trunk@403 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 00:35:59 +00:00
Stefano Mazzocchi
c24ec94835 had to shuffle around a bunch of classes to separate the main server classloader from the context classloader and allow reloading to happen for real
git-svn-id: http://google-refine.googlecode.com/svn/trunk@377 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-03 01:01:11 +00:00
Stefano Mazzocchi
72203cd3d5 - moved all code that contained MIT IP outside (http://code.google.com/p/simile-vicino/)
- moved bzip2 and tar code from apache ant into their own jar files
- now gridworks source contains only com.metaweb.* code everything else is a jar dependency
- started to work on archive importer


git-svn-id: http://google-refine.googlecode.com/svn/trunk@376 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 23:40:12 +00:00
David Huynh
c3ebb5a9f4 Got Vishal's jython integration to work.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@277 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 19:56:43 +00:00
Stefano Mazzocchi
358586ac8f adding minimal unit testing framework (type ./gridworks test to run)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@253 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:08:35 +00:00
David Huynh
5d3a57eeeb Implemented project import and export commands (from/to .tar files).
git-svn-id: http://google-refine.googlecode.com/svn/trunk@234 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 02:34:25 +00:00
Stefano Mazzocchi
404883da92 forgot to remove from the eclipse build path
git-svn-id: http://google-refine.googlecode.com/svn/trunk@231 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 23:57:03 +00:00
Stefano Mazzocchi
a8177131b4 adding the protocol buffer library that is needed by the generated code
git-svn-id: http://google-refine.googlecode.com/svn/trunk@198 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 17:25:51 +00:00
Stefano Mazzocchi
c07431fb88 - cataloged all the licenses for the libraries Gridworks depends on
- added the secondstring library that contains all sorts of useful string distance functions
- added a java arithmetic coding library (used to implement a string distance based on PPM arithmetic coding)
- added the vicino kNN string clustering library (from MIT's SIMILE)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@181 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 00:29:52 +00:00
Stefano Mazzocchi
2691ee50d7 adding OS-specific data paths
git-svn-id: http://google-refine.googlecode.com/svn/trunk@173 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 02:53:07 +00:00
Stefano Mazzocchi
0c6590fe2c - added an encoding guesser
- fixed a bunch of encoding issues
- added a function to reinterpret cell content in another encoding
- added a 'phonetic' function to the expression language that supports metaphone and soundex
- updated the COS library to the latest released version
- added the IBM ICU4j library (that contains the encoding guesser)
- added examples with same content but different encodings


git-svn-id: http://google-refine.googlecode.com/svn/trunk@154 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-28 21:51:33 +00:00
Stefano Mazzocchi
f1923758e7 - add a bunch of new functions
- very lax date parser
- lots of new advanced string functions
- new version of commons-lang


git-svn-id: http://google-refine.googlecode.com/svn/trunk@152 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-27 08:56:04 +00:00
David Huynh
cd376c7532 Added support for Excel 2007 XML file format.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@73 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-08 23:44:33 +00:00
Stefano Mazzocchi
2b985bf45a moving json support in its own jar (code was taken today directly from json.org and compiled and packaged by me)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@70 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-08 20:46:27 +00:00
Stefano Mazzocchi
1343162a75 major rewrite of the foundation:
- de-mavenization (uses the same code that Acre uses to drive jetty directly)
- removed all dependencies on external javascript code (jquery and suggest) by making a local copy (this makes gridworks totally self-serving, meaning that you can use it even if you don't have any internet connectivity)
- fixed an NPE when the servlet is shut down before any project is loaded
- found a way to spawn a browser directly from the java code (untested in windows)
- added two ant tasks to generate windows and macosx stand-alone binaries (unused just yet)

To run, just type "./gridworks run" at the command line


git-svn-id: http://google-refine.googlecode.com/svn/trunk@65 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-07 23:15:50 +00:00
Stefano Mazzocchi
2077d3f094 adding unix and windows startup scripts
use maven to build the eclipse scripts instead of committing them in svn which makes them less portable
(do './gridworks eclipse' at the beginning to regenerate your eclipse project files, then reload in eclipse)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@59 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-07 05:25:44 +00:00
David Huynh
22040a8348 Initial import.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2 7d457c2a-affb-35e4-300a-418c747d4874
2010-01-24 21:09:50 +00:00