Commit Graph

60 Commits

Author SHA1 Message Date
Stefano Mazzocchi
6990604981 implemented the full gridworks -> freebase conduit via delegated oauth and freeq/tripleloader
(still doesn't work as argus returns a 500 but the entire conduit is in place)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@519 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-23 08:25:52 +00:00
Stefano Mazzocchi
439474caeb Checkpoint for OAuth functionality in Gridworks
(doesn't work but since it's a substantial chunk of stuff, I want to get it in sooner rather than later)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@516 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-21 21:08:34 +00:00
David Huynh
5ba67b7b26 Implemented column split command. It seems to be working in "by lengths" mode.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@510 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-20 23:08:14 +00:00
David Huynh
9e73a4e68c Started to work on a MARC importer. It doesn't work properly yet.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@486 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-16 19:52:01 +00:00
Stefano Mazzocchi
397861b612 - replace the 'cos' library with the apache 'commons-fileupload' for licensing reason (the cos library had a weird arm-twisting license that forced you to buy an o'reilly book on servlets for each developer in your company... good thing I read it all)
- some tweaks on imgareaselect's look


git-svn-id: http://google-refine.googlecode.com/svn/trunk@483 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-15 18:42:41 +00:00
David Huynh
4a06c49a9a Added streaming json parser for faster re-loading of existing projects.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@470 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-13 23:57:03 +00:00
David Huynh
f7e830e709 Fixed bug in which editing a single cell and then starring the same row seemed to revert the cell back to its original content.
Added an option for not guessing cell value type during import.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@446 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-11 21:54:56 +00:00
David Huynh
a0d8c385f9 Do a bit more checking when retrieving project metadata just in case project metadata is null.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@435 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-09 04:52:32 +00:00
Stefano Mazzocchi
d3d40d608a bunch of PMD-induced fixes
(now the PMD report is clean)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@430 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-09 00:14:11 +00:00
David Huynh
0996b9e1dd Gzip project export tar files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@394 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 22:28:30 +00:00
David Huynh
9d9329ca96 Implemented row remove command.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@391 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 07:47:44 +00:00
David Huynh
1fd85c62bf Implemented column rename command.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@390 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-06 07:15:34 +00:00
Stefano Mazzocchi
771810bc0d avoid exception if there is only one extension in the whole archive
git-svn-id: http://google-refine.googlecode.com/svn/trunk@385 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-05 21:36:27 +00:00
Stefano Mazzocchi
2efbf0031f - removed the 'thirdparty' directory (now the 'gridworks' script will download and install needed tools if they are not present in the system already)
- added 'findbugs' command that uses the findbugs static analyzer to look for problems in the code
- fixed a bunch of issues that findbugs found (a few methods would go a little faster, and a few NPE will be avoided... nothing major but good to have)


git-svn-id: http://google-refine.googlecode.com/svn/trunk@382 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-05 07:15:16 +00:00
Stefano Mazzocchi
798b2a36ca - archive and compressed file importer (supports zip, tar, gz, bz2, tar.gz and tar.bz2)
(works by loading the files that have the most common extensions in the archive)
- changed default max heap to 3Gb


git-svn-id: http://google-refine.googlecode.com/svn/trunk@381 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-04 07:48:47 +00:00
Stefano Mazzocchi
c24ec94835 had to shuffle around a bunch of classes to separate the main server classloader from the context classloader and allow reloading to happen for real
git-svn-id: http://google-refine.googlecode.com/svn/trunk@377 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-03 01:01:11 +00:00
Stefano Mazzocchi
72203cd3d5 - moved all code that contained MIT IP outside (http://code.google.com/p/simile-vicino/)
- moved bzip2 and tar code from apache ant into their own jar files
- now gridworks source contains only com.metaweb.* code everything else is a jar dependency
- started to work on archive importer


git-svn-id: http://google-refine.googlecode.com/svn/trunk@376 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 23:40:12 +00:00
Stefano Mazzocchi
62f5f21ca3 atom is handled as well by the XML importer
git-svn-id: http://google-refine.googlecode.com/svn/trunk@374 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-02 06:44:05 +00:00
Stefano Mazzocchi
0e07ec7acc crude, I know, but for now make Gridworks digest RDF/XML as it was XML (works surprisingly well, btw)
git-svn-id: http://google-refine.googlecode.com/svn/trunk@369 7d457c2a-affb-35e4-300a-418c747d4874
2010-04-01 16:56:38 +00:00
Stefano Mazzocchi
dced641599 - added the ability to specify the character separator for CSV or TSV files that don't use commas or tabs (this was needed to parse a dataset that we got from the BBC to try things out)
- used commons-lang split function instead of the java String.split one, this is necessary to avoid having to escape separators that might be confused for regexps


git-svn-id: http://google-refine.googlecode.com/svn/trunk@368 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-31 22:34:21 +00:00
David Huynh
df7389876f First shot at XML import.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@354 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-24 23:08:08 +00:00
David Huynh
3dc4db020f Support quick undo of the last operation (Ctrl-Z).
git-svn-id: http://google-refine.googlecode.com/svn/trunk@338 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-23 00:26:28 +00:00
David Huynh
60f60507f7 Fixed minor bug introduced recently into the Export Project menu command.
Documented the commands.* packages.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@331 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-21 20:26:35 +00:00
David Huynh
ff0049307e Increased file upload size limit to 1GB.
Fixed charset detector to be more robust in trying more than one charset.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@326 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-19 19:53:11 +00:00
David Huynh
1a8a236cdd Added an error page for when a project create operation fails.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@307 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 18:40:07 +00:00
David Huynh
c6e7986206 Extend data operation is working.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@301 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-16 00:24:20 +00:00
David Huynh
5d3a57eeeb Implemented project import and export commands (from/to .tar files).
git-svn-id: http://google-refine.googlecode.com/svn/trunk@234 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-08 02:34:25 +00:00
David Huynh
694f09fb0a Major refactoring: everything is now saved to disk using our own formats, mostly json-based, some inside zip files.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@226 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 22:37:26 +00:00
David Huynh
e0d72c81e9 Renamed "facet-based edit" operation and command to "mass edit", because it's not just facet-based.
Added option "apply to other cells with same original content" to single cell edit popup, so it can be used like a find&replace operation.
Renamed "do-text-transform" operation and command to just "text-transform".

git-svn-id: http://google-refine.googlecode.com/svn/trunk@223 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-07 00:25:00 +00:00
David Huynh
78b1eb7e73 Major refactoring:
- Made all Change classes save to and load from .zip files.
- Changed Column.headerLabel to Column.name.
- Save project's raw data to "raw-data" file for now. We'll make it save to a zip file next.


git-svn-id: http://google-refine.googlecode.com/svn/trunk@217 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-06 07:43:45 +00:00
David Huynh
b3ac945c33 Implemented single-cell editing.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@210 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 08:11:48 +00:00
Stefano Mazzocchi
37e37488ec ability to delete a project from the front page
git-svn-id: http://google-refine.googlecode.com/svn/trunk@206 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-05 01:52:55 +00:00
David Huynh
eaef7b2394 Also let user decide what to do on expression evaluation error when creating a new column.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@182 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 00:32:54 +00:00
David Huynh
2fe8f98e4e Added repeat and repeatCount options for text transform operation. This lets us fix those & repeated encoding problems easily.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@179 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-04 00:00:46 +00:00
David Huynh
b4d2cef526 Added an option for what to do when a text transform errors out. Made a custom expression preview dialog for the text transform command in order to suppor that option.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@178 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 22:12:48 +00:00
David Huynh
c1498448e4 Implemented global and per-project expression histories and hooked them up to the expression preview dialog.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@176 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 21:21:38 +00:00
David Huynh
b75f1faea8 Changed tabs to spaces. No functionality change.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@174 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 04:19:58 +00:00
Stefano Mazzocchi
2691ee50d7 adding OS-specific data paths
git-svn-id: http://google-refine.googlecode.com/svn/trunk@173 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-03 02:53:07 +00:00
David Huynh
3ecfb4e4d9 Implemented facet-based edit operation for real.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@167 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-02 20:33:11 +00:00
Stefano Mazzocchi
621655372f - save encoding and confidence in the project metadata
- use the saved encoding for decoding
- don't error when fingerprinting null


git-svn-id: http://google-refine.googlecode.com/svn/trunk@160 7d457c2a-affb-35e4-300a-418c747d4874
2010-03-01 04:56:16 +00:00
Stefano Mazzocchi
0c6590fe2c - added an encoding guesser
- fixed a bunch of encoding issues
- added a function to reinterpret call content in another encoding
- added a 'phonetic' function to the expression language that supports metaphone and soundex
- updated the COS library to the latest released version 
- added the IBM ICU4j library (that contains the encoding guesser)
- added examples with same content but different encodings


git-svn-id: http://google-refine.googlecode.com/svn/trunk@154 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-28 21:51:33 +00:00
David Huynh
30dce3b3d5 Made range facet more robust against bad expressions.
Centralized code that updates components of the UI. Show "Working..." indicator if anything takes more than 500ms.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@142 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-26 21:56:41 +00:00
David Huynh
bb83dcda1c Added support for specifying number of initial rows to skip when creating a new project.
Fixed the height of the histogram images in range facets to eliminate jitters.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@135 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-24 18:52:54 +00:00
David Huynh
254853b51d Added reverse and sort functions.
Support a limit on how many rows to load into a new project.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@134 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-23 23:22:02 +00:00
David Huynh
ec1604e815 Added support for starring rows.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@129 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-23 07:45:12 +00:00
David Huynh
c98a8ad552 Pulled the operations package up one level.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@119 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-22 18:42:25 +00:00
David Huynh
1227c9dff4 Centralized mapping between operation names and their reconstructors.
Implemented comparison operators.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@117 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-22 04:23:25 +00:00
David Huynh
b3167a1a9f Added option for automatically approving best recon candidates that match the expected type and score at least some minimum score.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@115 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-21 08:11:07 +00:00
David Huynh
846e540ff6 Keep track of type names of reconciled columns so we can display them later in the schema alignment dialog.
Automatically create properties linking to all columns when starting with an empty protograph.

git-svn-id: http://google-refine.googlecode.com/svn/trunk@110 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-19 18:32:48 +00:00
David Huynh
8831703a2c Implemented "apply operations" feature.
git-svn-id: http://google-refine.googlecode.com/svn/trunk@102 7d457c2a-affb-35e4-300a-418c747d4874
2010-02-18 05:00:56 +00:00