107 Commits

Author SHA1 Message Date
Tom Morris
b3f5fada95 FIXED - task 578 & 596: Clean up JSON importer
http://code.google.com/p/google-refine/issues/detail?id=578
http://code.google.com/p/google-refine/issues/detail?id=596

Extend tree parser framework to allow any Serializable instead of just Strings. Use this in JSON importer to: import the keywords null, true, false; import empty strings and don't trim whitespace from strings on import; import numbers directly instead of importing them as text and then parsing them ourselves. Add tests to verify all of this.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2543 7d457c2a-affb-35e4-300a-418c747d4874
2012-09-08 01:20:25 +00:00
Tom Morris
abc162a0d0 Switch back to old JSON lib for now
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2536 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-21 17:33:17 +00:00
Tom Morris
60c3a31242 Update Jackson and JSON libs
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2532 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-18 21:46:49 +00:00
Tom Morris
4bb6c43982 task 604: add Guava to main project so that we're not dependent on an extension
http://code.google.com/p/google-refine/issues/detail?id=604

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2531 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-15 13:33:17 +00:00
Tom Morris
d6e00fb3c7 Add JRDF source jar
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2524 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-05 15:01:55 +00:00
Stefano Mazzocchi
2947ebba0e updating the signpost library and attaching sources for easier inspection
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2517 7d457c2a-affb-35e4-300a-418c747d4874
2012-08-01 21:47:21 +00:00
Stefano Mazzocchi
5dffd249de updating signpost to the latest release
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2514 7d457c2a-affb-35e4-300a-418c747d4874
2012-07-13 06:50:53 +00:00
Tom Morris
1df0dd62ce Issue 566 - export httpclient libs
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2494 7d457c2a-affb-35e4-300a-418c747d4874
2012-05-03 15:43:22 +00:00
Tom Morris
166b176ba2 Update to Apache POI 3.8
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2486 7d457c2a-affb-35e4-300a-418c747d4874
2012-03-29 04:41:42 +00:00
Tom Morris
8ff6c5617f Update Jackson parser to 1.9.5
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2448 7d457c2a-affb-35e4-300a-418c747d4874
2012-03-01 18:11:28 +00:00
David Huynh
94e0369af7 Added extension for importing PC-Axis files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2365 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-07 07:27:35 +00:00
Stefano Mazzocchi
8184e16bb9 updating http client and http core to the latest released versions
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2351 7d457c2a-affb-35e4-300a-418c747d4874
2011-11-01 21:46:56 +00:00
David Huynh
ff7bbc8ec0 Export libraries.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2338 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-20 09:20:53 +00:00
Tom Morris
ca17e1ef0a New importer for Open Document Format (ODF) spreadsheet files (.ods)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2323 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-11 20:27:40 +00:00
Tom Morris
496d6b0b6a Update Eclipse classpath for Jackson 1.8.6
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2289 7d457c2a-affb-35e4-300a-418c747d4874
2011-10-07 17:03:33 +00:00
David Huynh
c42382f3ae Started deeper integration of GData: we now have a "Google Data" importing source, which lets you sign in and authorize access to your docs. It then lists all the spreadsheets you have access to. It does not yet let you import those spreadsheets.
Minor fixes to the open project action area; fixes to render relative dates properly.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@2190 7d457c2a-affb-35e4-300a-418c747d4874
2011-08-07 23:26:51 +00:00
Tom Morris
527d383bc5 Update to Apache commons codec 1.5
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2108 7d457c2a-affb-35e4-300a-418c747d4874
2011-06-11 22:53:17 +00:00
Tom Morris
297809847d Add references to source jars
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2091 7d457c2a-affb-35e4-300a-418c747d4874
2011-06-07 23:50:10 +00:00
Tom Morris
f674a96973 Add source directory for tests and necessary libraries
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2087 7d457c2a-affb-35e4-300a-418c747d4874
2011-06-06 21:27:24 +00:00
Tom Morris
6289f80da5 Update Eclipse classpath for POI 3.7
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2064 7d457c2a-affb-35e4-300a-418c747d4874
2011-05-25 06:37:50 +00:00
Stefano Mazzocchi
610de0d33a adding Metaphone3 algorithm
Many thanks to Lawrence Philips for donating the code to us under the BSD license.


git-svn-id: http://google-refine.googlecode.com/svn/trunk@2029 7d457c2a-affb-35e4-300a-418c747d4874
2011-03-01 00:17:48 +00:00
Iain Sproat
f55f11cd0d Adding classes to make it possible to parse HTML in GREL. Uses a small subset of methods from the JSoup library, licensed under the MIT license.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1948 7d457c2a-affb-35e4-300a-418c747d4874
2010-12-06 23:15:24 +00:00
Tom Morris
b963fc2fc7 Allow top level directory to be imported as Eclipse project grefine-all
git-svn-id: http://google-refine.googlecode.com/svn/trunk@1564 7d457c2a-affb-35e4-300a-418c747d4874
2010-10-15 12:47:40 +00:00
Stefano Mazzocchi
2c8595098c Major refactor to separate the webapp part from the embedded servlet engine part
git-svn-id: http://google-refine.googlecode.com/svn/trunk@883 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-28 23:19:08 +00:00
Iain Sproat
25d3a9dfc1 Added a basic RDF triple importer plus unit tests. Some more work required - it's not plugged into the client and it creates a very sparse data structure (each triple is a new row). It uses JRDF library (Apache 1.1 license).
git-svn-id: http://google-refine.googlecode.com/svn/trunk@813 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-18 12:41:40 +00:00
Stefano Mazzocchi
2cf360b723 adding even runtime jars to the eclipse build path so that people running gw from IDEs don't get ClassNotFound messages at runtime
git-svn-id: http://google-refine.googlecode.com/svn/trunk@798 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-17 16:35:20 +00:00
Stefano Mazzocchi
f28e23e503 Committing patches by Iain:
- use OpenCSV parser instead of our own
- use TestNG instead of JUnit, which is a lot more configurable in test selection (and allows us to do a much better job of leaving the tree green even while developing tests that are known to fail)
- integrated TestNG in './gridworks test'
- added Iain to the list of contributors in README.txt
- changed the Eclipse test launch file to use the TestNG launcher (unfortunately, this is not shipped by default in Eclipse, so you have to install it yourself from the http://beust.com/eclipse update file; I'll add this to the wiki shortly)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@782 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-16 18:42:52 +00:00
Stefano Mazzocchi
ea459aed07 Applied a bunch of patches from Tom Morris (Issues 25, 26 and 27)
- make java6 dependency explicit in eclipse project files
- avoid using NotImplementException especially the sun.* one
- avoid using internal sun signal handling and rely on standard java.* APIs
 (I tested this one and it seems to be working fine)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@756 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-13 21:02:19 +00:00
Stefano Mazzocchi
1f2531f303 uniform newlines and setting the proper svn controls for native line ending
(so that diffs from windows don't end up all screwed up)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@751 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-13 02:22:31 +00:00
Stefano Mazzocchi
86465c2d6f forgot these pieces for the previous commit
git-svn-id: http://google-refine.googlecode.com/svn/trunk@723 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-12 09:00:38 +00:00
Stefano Mazzocchi
8285083fb9 fixing classpath so that gw can be run directly from eclipse
git-svn-id: http://google-refine.googlecode.com/svn/trunk@712 7d457c2a-affb-35e4-300a-418c747d4874
2010-05-12 00:25:39 +00:00
Stefano Mazzocchi
6990604981 implemented the full gridworks -> freebase conduit via delegated oauth and freeq/tripleloader
(still doesn't work as argus returns a 500 but the entire conduit is in place)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@519 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-23 08:25:52 +00:00
Stefano Mazzocchi
439474caeb Checkpoint for OAuth functionality in Gridworks
(doesn't work but since it's a substantial chunk of stuff, I want to get it in sooner rather than later)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@516 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-21 21:08:34 +00:00
Stefano Mazzocchi
7a716a4a1b - upgraded commons-codec to the latest version (needed for base64 encoding of data: uris)
- added the ability to embed the scatterplot inside the returned json data with data: uris (although it doesn't seem to work well)
- connected the selection logic to the scatterfacets (although it doesn't seem to filter the rows... and I'm puzzled as to why)
- reduced cut/paste and code overlap between the scatterplot generator and the scatterplot facet


git-svn-id: http://google-refine.googlecode.com/svn/trunk@490 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-17 03:00:38 +00:00
David Huynh
9e73a4e68c Started to work on a MARC importer. It doesn't work properly yet.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@486 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-16 19:52:01 +00:00
Stefano Mazzocchi
397861b612 - replace the 'cos' library with the apache 'commons-fileupload' for licensing reasons (the cos library had a weird arm-twisting license that forced you to buy an o'reilly book on servlets for each developer in your company... good thing I read it all)
- some tweaks on imgareaselect's look


git-svn-id: http://google-refine.googlecode.com/svn/trunk@483 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-15 18:42:41 +00:00
Stefano Mazzocchi
93a8f78192 - updated to latest jquery (1.4.2)
- removed commons-math which I don't use anymore
- added imgareaselect
- added a bunch of licenses for the javascript libraries dependencies


git-svn-id: http://google-refine.googlecode.com/svn/trunk@482 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-15 06:56:07 +00:00
David Huynh
4a06c49a9a Added streaming json parser for faster re-loading of existing projects.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@470 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-13 23:57:03 +00:00
Stefano Mazzocchi
60d61b7808 add commons-math library (I'm going to need this for more advanced facets)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@451 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-12 04:25:50 +00:00
Stefano Mazzocchi
6114530723 make sure the junit tests still work
git-svn-id: http://google-refine.googlecode.com/svn/trunk@405 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 01:17:14 +00:00
Stefano Mazzocchi
8f5c35799b making room for windmill tests
git-svn-id: http://google-refine.googlecode.com/svn/trunk@403 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-07 00:35:59 +00:00
Stefano Mazzocchi
c24ec94835 had to shuffle around a bunch of classes to separate the main server classloader from the context classloader and allow reloading to happen for real
git-svn-id: http://google-refine.googlecode.com/svn/trunk@377 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-03 01:01:11 +00:00
Stefano Mazzocchi
72203cd3d5 - moved all code that contained MIT IP outside (http://code.google.com/p/simile-vicino/)
- moved bzip2 and tar code from apache ant into their own jar files
- now gridworks source contains only com.metaweb.* code; everything else is a jar dependency
- started to work on archive importer


git-svn-id: http://google-refine.googlecode.com/svn/trunk@376 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 23:40:12 +00:00
David Huynh
c3ebb5a9f4 Got Vishal's jython integration to work.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@277 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-11 19:56:43 +00:00
Stefano Mazzocchi
358586ac8f adding minimal unit testing framework (type ./gridworks test to run)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@253 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-09 08:08:35 +00:00
David Huynh
5d3a57eeeb Implemented project import and export commands (from/to .tar files).
git-svn-id: http://google-refine.googlecode.com/svn/trunk@234 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 02:34:25 +00:00
Stefano Mazzocchi
404883da92 forgot to remove from the eclipse build path
git-svn-id: http://google-refine.googlecode.com/svn/trunk@231 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 23:57:03 +00:00
Stefano Mazzocchi
a8177131b4 adding the protocol buffer library that is needed by the generated code
git-svn-id: http://google-refine.googlecode.com/svn/trunk@198 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 17:25:51 +00:00
Stefano Mazzocchi
c07431fb88 - cataloged all the licenses for the libraries Gridworks depends on
- added the secondstring library that contains all sorts of useful string distance functions
- added a java arithmetic coding library (used to implement a string distance based on PPM arithmetic coding)
- added the vicino kNN string clustering library (from MIT's SIMILE)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@181 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 00:29:52 +00:00
Stefano Mazzocchi
2691ee50d7 adding OS-specific data paths
git-svn-id: http://google-refine.googlecode.com/svn/trunk@173 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 02:53:07 +00:00
Stefano Mazzocchi
0c6590fe2c - added an encoding guesser
- fixed a bunch of encoding issues
- added a function to reinterpret cell content in another encoding
- added a 'phonetic' function to the expression language that supports metaphone and soundex
- updated the COS library to the latest released version 
- added the IBM ICU4j library (that contains the encoding guesser)
- added examples with same content but different encodings


git-svn-id: http://google-refine.googlecode.com/svn/trunk@154 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-28 21:51:33 +00:00
Stefano Mazzocchi
f1923758e7 - add a bunch of new functions
- very lax date parser
- lots of new advanced string functions
- new version of commons-lang


git-svn-id: http://google-refine.googlecode.com/svn/trunk@152 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-27 08:56:04 +00:00
David Huynh
cd376c7532 Added support for Excel 2007 XML file format.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@73 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-08 23:44:33 +00:00
Stefano Mazzocchi
2b985bf45a moving json support in its own jar (code was taken today directly from json.org and compiled and packaged by me)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@70 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-08 20:46:27 +00:00
Stefano Mazzocchi
1343162a75 major rewrite of the foundation:
- de-mavenization (uses the same code that Acre uses to drive jetty directly)
- removed all dependencies on external javascript code (jquery and suggest) by making a local copy (this makes gridworks totally self-serving, meaning that you can use it even if you don't have any internet connectivity)
- fixed a NPE when the servlet is shut down before any project is loaded
- found a way to spawn a browser directly from the java code (untested on windows)
- added two ant tasks to generate windows and macosx stand-alone binaries (unused just yet)

To run, just type "./gridworks run" at the command line


git-svn-id: http://google-refine.googlecode.com/svn/trunk@65 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-07 23:15:50 +00:00
Stefano Mazzocchi
2077d3f094 adding unix and windows startup scripts
use maven to build the eclipse scripts instead of committing them in svn which makes them less portable
(do './gridworks eclipse' at the beginning to regenerate your eclipse project files, then reload in eclipse)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@59 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-07 05:25:44 +00:00
David Huynh
22040a8348 Initial import.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@2 7d457c2a-affb-35e4-300a-418c747d4874
2010-01-24 21:09:50 +00:00