diff --git a/docs/docs/manual/exporting.md b/docs/docs/manual/exporting.md index 983be9446..bf0f12237 100644 --- a/docs/docs/manual/exporting.md +++ b/docs/docs/manual/exporting.md @@ -6,34 +6,117 @@ sidebar_label: Exporting ## Overview +Once your data is cleaned, you will need to get it out of OpenRefine and into the system of your choice. OpenRefine outputs a number of file formats, can upload your data directly into Google Sheets, and can create or update statements on Wikidata. + +You can also [export your full project data](#export-a-project) so that it can be opened by someone else using OpenRefine (or yourself, on another computer). ## Export data -Note you will only export data in the current view - that is, with current filters and facets applied. +Many of the following options only export data in the current view - that is, with current filters and facets applied. Some will give you the choice to export your entire dataset or just your current view. +To export from a project, click the Export dropdown button at the top right corner and pick the format you want. 
-You options are: +Your options are: - -* TSV/CSV -* HTML table +* Tab-separated value (TSV) or Comma-separated value (CSV) +* HTML-formatted table * Excel (XLS or XLSX) * ODF spreadsheet -* Google Sheets \ +* Upload to Google Sheets (requires [Google account authorization](starting#google-sheet-from-drive)) +* [Custom tabular exporter](#custom-tabular-exporter) +* [SQL statement exporter](#sql-exporter) +* [Templating exporter](#templating-exporter) -* Custom tabular export -* SQL -* Templating export \ +You can also export reconciled data to Wikidata, or export your Wikidata schema for future use with other OpenRefine projects: -* Upload edits to Wikidata -* Export to QuickStatement -* Export Wikidata schema +* [Upload edits to Wikidata](wikidata#upload-edits-to-wikidata) +* [Export to QuickStatements](wikidata#quickstatements-export) (version 1) +* [Export Wikidata schema](wikidata#import-and-export-schema) +### Custom tabular exporter + +![A screenshot of the custom tabular content tab.](/img/custom-tabular-exporter.png) + +With the custom tabular exporter, you can choose which of your data to export, the separator you wish to use, and whether you'd like to download it to your computer or upload it into a Google Sheet. + +On the Content tab, you can drag and drop the columns appearing in the column list to reorder the output. The options for reconciled and date data are applied to each column individually. + +This exporter is especially useful with reconciled data, as you can choose whether you wish to output the cells' original values, the matched values, or the matched IDs. Outputting “matched entity's name”, “matched entity's ID”, or “cell's content” will output, respectively, the contents of `cell.recon.match.name`, `cell.recon.match.id`, and `cell.value`. + +“Output nothing for unmatched cells” will export empty cells for both newly-created matches and cells with no chosen matches.
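To make the mapping between the three reconciliation output choices concrete, here is a minimal Python sketch. The `ReconCell` and `ReconMatch` classes and the `"name"`, `"id"` and `"content"` mode labels are hypothetical stand-ins, not OpenRefine's internal classes. The sketch also assumes that, when "Output nothing for unmatched cells" is off, an unmatched cell falls back to its own content.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReconMatch:
    """Hypothetical stand-in for the fields of cell.recon.match."""
    id: str
    name: str


@dataclass
class ReconCell:
    """Hypothetical stand-in for a reconciled cell."""
    value: str  # cell.value
    match: Optional[ReconMatch] = None  # None for unmatched and "create new" cells


def export_recon_cell(cell: ReconCell, mode: str, blank_unmatched: bool) -> str:
    """Return the text one reconciled cell contributes to the export.

    mode is "name", "id" or "content" (hypothetical labels for the three choices).
    """
    if cell.match is None:
        # "Output nothing for unmatched cells" blanks these out; otherwise this
        # sketch assumes the cell's own content is written instead.
        return "" if blank_unmatched else cell.value
    if mode == "name":
        return cell.match.name  # cell.recon.match.name
    if mode == "id":
        return cell.match.id  # cell.recon.match.id
    return cell.value  # cell.value


paris = ReconCell("Paris, France", ReconMatch("Q90", "Paris"))
print(export_recon_cell(paris, "id", blank_unmatched=True))  # prints Q90
```

Because the option is applied per column, the same project can export IDs in one column and original cell values in another.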
“Link to matched entity's page” will produce hyperlinked text in an HTML table output, but has no effect in other formats. + +At this time, the date-formatting options in this window do not work. You can [keep track of this issue on GitHub](https://github.com/OpenRefine/OpenRefine/issues/3368). +In the future, you will also be able to choose how to [output date-formatted cells](exploring#dates). You can create a custom date output by formatting your dates according to the [SimpleDateFormat parsing key](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-). + +![A screenshot of the custom tabular file download tab.](/img/custom-tabular-exporter2.png) + +On the Download tab, you can generate a preview of how the first ten rows of your dataset will be exported. If you do not choose one of the file formats on the right, the Download button will generate a text file. On the Upload tab, you can create a new Google Sheet. + +With the Option Code tab, you can copy JSON of your current settings to reuse on another project, or you can paste in existing JSON settings to apply to the current project. + +### SQL exporter + +The SQL exporter creates a SQL statement containing the data you’ve exported, which you can use to overwrite or add to an existing database. Choosing the SQL exporter will bring up a window with two tabs: one to define what data to output, and another to modify other aspects of the SQL statement with options to preview and download the statement. + +![A screenshot of the SQL statement content window.](/img/sql-exporter.png) + +The Content tab allows you to craft your dataset into an SQL table. From here, you can choose which columns to export, the data type to export for each (or choose "VARCHAR"), and the maximum character length for each field (if applicable based on the data type). You can set a default value for empty cells after unchecking “Allow null” in one or more columns.
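The following Python sketch shows roughly how the Content tab settings (column type, maximum length, “Allow null” and a default value) could shape a CREATE TABLE statement and its INSERT statements. The `SqlColumn` class and the helper functions are hypothetical, and OpenRefine's actual output may differ in quoting, formatting and type handling. The `load_into_sqlite` helper exists only to sanity-check the generated text with Python's built-in `sqlite3` module.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SqlColumn:
    name: str
    sql_type: str = "VARCHAR"
    max_length: Optional[int] = 255  # only applied to VARCHAR/CHAR in this sketch
    allow_null: bool = True
    default: Optional[str] = None  # used for empty cells when allow_null is False


def sql_literal(value: str) -> str:
    # Standard SQL string literal: wrap in single quotes, double embedded quotes.
    return "'" + value.replace("'", "''") + "'"


def column_definition(col: SqlColumn) -> str:
    sql_type = col.sql_type
    if col.max_length and sql_type in ("VARCHAR", "CHAR"):
        sql_type = f"{sql_type}({col.max_length})"
    parts = [col.name, sql_type]
    if not col.allow_null:
        parts.append("NOT NULL")
    if col.default is not None:
        parts.append("DEFAULT " + sql_literal(col.default))
    return " ".join(parts)


def build_sql(table: str, columns: Sequence[SqlColumn],
              rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Build a CREATE TABLE statement followed by one INSERT per row."""
    statements: List[str] = [
        f"CREATE TABLE {table} (" + ", ".join(column_definition(c) for c in columns) + ");"
    ]
    names = ", ".join(c.name for c in columns)
    for row in rows:
        values = []
        for col, cell in zip(columns, row):
            if cell is None or cell == "":
                # Empty cell: NULL when allowed, otherwise the column's default value.
                values.append("NULL" if col.allow_null else sql_literal(col.default or ""))
            else:
                values.append(sql_literal(cell))
        statements.append(f"INSERT INTO {table} ({names}) VALUES ({', '.join(values)});")
    return "\n".join(statements)


def load_into_sqlite(sql: str) -> sqlite3.Connection:
    """Run the generated statement in an in-memory SQLite database as a sanity check."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql)
    return conn


# Hypothetical two-column project; "city" does not allow nulls.
columns = [SqlColumn("name"), SqlColumn("city", allow_null=False, default="unknown")]
sql = build_sql("people", columns, [["Ada", "London"], ["O'Brien", ""]])
print(sql)
```

Treating every value as a quoted string keeps the sketch short. A real exporter would also need to format numbers, dates and booleans according to each column's chosen type.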
+ +With this output tool, you can choose whether to output only currently visible rows, or all the rows in your dataset, as well as whether to include empty rows. Trimming column names will remove their whitespace characters. + +![A screenshot of the SQL statement download window.](/img/sql-exporter2.png) + +The Download tab allows you to finalize your complete SQL statement. + +Include schema means that you will start your statement with the creation of a table. Without that, you will only have an INSERT statement. + +Include content adds the INSERT statement containing data from your project. Without it, the statement will only create an empty table. + +You can include DROP and IF EXISTS if you require them, and set a name for the table which the statement will refer to. + +You can then preview your statement, which will open up a new browser tab/window showing a statement with the first ten rows of your data (if included), or you can save a `.sql` file to your computer. + +### Templating exporter + +If you pick Templating… from the Export dropdown menu, you can “roll your own” exporter. This is useful for formats that we don't support natively yet, or won't support. The Templating exporter generates JSON by default. + +The window that appears allows you to set your own separators, prefix, and suffix to create a complete dataset in the language of your choice. In the Row Template section, you can choose which columns to generate from each row by calling them with variables. + +This can be used to: +* output reconciliation data (`cells["column name"].recon.match.name`, `.recon.match.id`, and `.recon.best.name`, for example) instead of cell values +* create multiple columns of output from different member fields of a single project column +* employ GREL expressions to modify cell data for output (for example, `cells["column name"].value.toUppercase()`).
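As a rough model of how the Templating exporter assembles its output, the sketch below concatenates the prefix, every rendered row joined by the row separator, and the suffix. A Python lambda stands in for the GREL Row Template, and GREL's `jsonize` is approximated with `json.dumps`. Both substitutions are assumptions for illustration, and the sample rows are hypothetical.

```python
import json
from typing import Callable, Dict, List

Row = Dict[str, object]


def jsonize(value: object) -> str:
    # Rough stand-in for GREL's jsonize(): encode one value as JSON text.
    return json.dumps(value)


def render_template(rows: List[Row], prefix: str, row_template: Callable[[Row], str],
                    row_separator: str, suffix: str) -> str:
    """Prefix, then every rendered row joined by the row separator, then the suffix."""
    return prefix + row_separator.join(row_template(row) for row in rows) + suffix


# Hypothetical project rows; the lambda plays the part of a Row Template such as
#   {"Year" : {{jsonize(cells["Year"].value)}}, "Requests" : {{jsonize(cells["RequestCount"].value)}}}
rows = [{"Year": 2020, "RequestCount": 14}, {"Year": 2021, "RequestCount": 9}]
output = render_template(
    rows,
    prefix='{\n  "rows" : [\n',
    row_template=lambda r: (
        '    {"Year" : ' + jsonize(r["Year"]) + ', "Requests" : ' + jsonize(r["RequestCount"]) + "}"
    ),
    row_separator=",\n",
    suffix="\n  ]\n}",
)
print(output)
```

Since the separator only goes between rows, the prefix and suffix are where the enclosing brackets of a JSON (or XML, or other) document belong.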
+ +Anything that appears inside doubled curly braces ({{}}) is treated as a GREL expression; anything outside is output as plain text. You can use Jython or Clojure by declaring the language at the start: for example, `{{jython:return cells["Author"].value}}` will run a Jython expression. + +:::caution +Note that some syntax is different in this tool than elsewhere in OpenRefine: a forward slash must be escaped with a backslash, while other characters do not need escaping. You cannot, at this time, include a closing curly brace (}) anywhere in your expression, or the template will malfunction. +::: + +You can include [regular expressions](expressions#regular-expressions) as usual (inside forward slashes, with any GREL function that accepts them). For example, you could output a version of your cells with punctuation removed, using an expression such as `{{jsonize(cells["Column Name"].value.replace(/[.!?$&,\/]/,""))}}`. + +You could also simply output a plain-text document inserting data from your project into sentences (for example, "In `{{cells["Year"].value}}` we received `{{cells["RequestCount"].value}}` requests."). + +You can use the shorthand `${Column Name}` (no need for quotes) to insert column values directly. You cannot use this inside an expression, because of the closing curly brace. + +If your project is in records mode, the Row separator field will insert a separator between records, rather than individual rows. Rows inside a single record will be directly appended to one another as per the content in the Row Template field. + +![A screenshot of the Templating exporter generating JSON by default.](/img/templating-exporter.png) + +Once you have created your template, you may wish to save the text you produced in each field, in order to reuse it in the future. Once you click Export, OpenRefine will output a simple text file, and your template will be discarded.
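The records-mode behavior and the `${Column Name}` shorthand can be sketched in Python as follows. The `render_records` and `expand_shorthand` helpers are hypothetical, and the regular expression is only an approximation of how OpenRefine recognizes the shorthand. The sample project is invented for illustration.

```python
import re
from typing import Callable, Dict, List

Row = Dict[str, str]

# Matches the ${Column Name} shorthand; the name may contain spaces.
SHORTHAND = re.compile(r"\$\{([^}]*)\}")


def expand_shorthand(template: str, row: Row) -> str:
    """Replace each ${Column Name} with that column's value (empty if absent)."""
    return SHORTHAND.sub(lambda m: row.get(m.group(1), ""), template)


def render_records(records: List[List[Row]], row_template: Callable[[Row], str],
                   row_separator: str) -> str:
    """Records mode: the separator goes between records, while the rows of one
    record are appended to each other directly."""
    return row_separator.join(
        "".join(row_template(row) for row in record) for record in records
    )


# Hypothetical project in records mode: the first record spans two rows.
records = [
    [{"Author": "Austen", "Title": "Emma"}, {"Author": "", "Title": "Persuasion"}],
    [{"Author": "Eliot", "Title": "Middlemarch"}],
]
text = render_records(records, lambda row: expand_shorthand("${Title};", row), "\n")
print(text)
```

With the same template in rows mode, the separator would instead fall between every row, so each title would end up on its own line.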
+ +We have recipes on using the Templating exporter to [produce several different formats](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#12-templating-exporter). ## Export a project +You can share a project in progress with a colleague or with someone who wants to check your history, or move it to another computer of your own. This can be useful for showing that your data cleanup didn’t distort or manipulate the information in any way. Once you have exported a project, another OpenRefine installation can [import it as a new project](starting#import-a-project). +:::caution +OpenRefine project archives contain your data as it was at every previous step, and that data is still accessible to anyone who has the file. If you are hoping to keep your original dataset hidden for privacy reasons, such as using OpenRefine to anonymize information, do not share your project archive. +::: -* tar.gz only -* Optional rename -* Local or to Google Drive - * Doesn’t supply a Google Drive link, just gives a confirmation message - * Other user (or you on another computer) will need to download it and save it locally in order to import it \ No newline at end of file +From the Export dropdown, select OpenRefine project archive to file. OpenRefine exports your full project with all of its history. It does not export any current views or applied facets. Any reconciliation information will be preserved, but the importing installation will need to add the same reconciliation services to keep working with that data. + +OpenRefine exports files in `.tar.gz` format. You can rename the file when you save it; otherwise it will bear the project name. You can either save it locally or upload it to Google Drive (which requires you to authorize a Google account), using the OpenRefine project archive to Google Drive... option. OpenRefine will not share the link with you, only confirm that the file was uploaded.
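Because a project archive carries your full history, it can be worth checking what a `.tar.gz` file contains before you share it. This minimal sketch uses Python's standard `tarfile` module and makes no assumptions about the names of the files OpenRefine writes inside the archive. The function names are hypothetical, and so is the file name `my-project.tar.gz` used below.

```python
import io
import tarfile
from typing import List


def list_archive(path: str) -> List[str]:
    """Names of the members of a .tar.gz archive on disk, without extracting it."""
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()


def list_archive_bytes(data: bytes) -> List[str]:
    """Same, for an archive already loaded into memory (e.g. downloaded from Google Drive)."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return archive.getnames()
```

For example, `list_archive("my-project.tar.gz")` returns the member names of an exported archive without writing anything to disk.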
+ +## Export operations + +You can [save and re-apply the history of any project](running#reusing-operations) (all the operations shown in the Undo/Redo tab). This creates JSON that you can save for later reuse on another OpenRefine project. \ No newline at end of file diff --git a/docs/docs/manual/starting.md b/docs/docs/manual/starting.md index 5e689fd1b..46208dbd4 100644 --- a/docs/docs/manual/starting.md +++ b/docs/docs/manual/starting.md @@ -137,7 +137,7 @@ You should create a project name at this stage. You can also supply tags to keep Because OpenRefine only runs locally on your computer, you can’t have a project accessible to more than one person at the same time. -The best way to collaborate with another person is to export and import projects that save all your changes, so that you can pick up where someone else left off. You can also [export projects](exporting) and import them to new computers of your own, such as for working on the same project from the office and from home. +The best way to collaborate with another person is to export and import projects that save all your changes, so that you can pick up where someone else left off. You can also [export projects](exporting#export-a-project) and import them to new computers of your own, such as for working on the same project from the office and from home. An exported project will include all of the [history](running#history-undoredo), so you can see (and undo) all the changes from the previous user. It is essentially a point-in-time snapshot of their work. OpenRefine only exports projects as `.tar.gz` files at this time. 
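The operation history saved from the Undo/Redo tab is plain JSON, so it can also be reviewed outside OpenRefine, for example before you reapply it to another project. This Python sketch assumes the history is a JSON array of objects that each carry `op` and `description` keys; the function name and the sample history are hypothetical.

```python
import json
from typing import List


def summarize_operations(history_json: str) -> List[str]:
    """One line per saved operation, in the order they were applied.

    Assumes the saved history is a JSON array of objects that each carry an
    "op" identifier and a human-readable "description".
    """
    operations = json.loads(history_json)
    return [f'{op.get("op", "?")}: {op.get("description", "")}' for op in operations]


# Hypothetical saved history with a single operation.
history = '''[
  {"op": "core/text-transform",
   "columnName": "Name",
   "expression": "value.trim()",
   "description": "Text transform on cells in column Name using expression value.trim()"}
]'''
for line in summarize_operations(history):
    print(line)
```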
diff --git a/docs/static/img/custom-tabular-exporter.png b/docs/static/img/custom-tabular-exporter.png new file mode 100644 index 000000000..89099eb46 Binary files /dev/null and b/docs/static/img/custom-tabular-exporter.png differ diff --git a/docs/static/img/custom-tabular-exporter2.png b/docs/static/img/custom-tabular-exporter2.png new file mode 100644 index 000000000..2b02f7437 Binary files /dev/null and b/docs/static/img/custom-tabular-exporter2.png differ diff --git a/docs/static/img/sql-exporter.png b/docs/static/img/sql-exporter.png new file mode 100644 index 000000000..f5cb98f3c Binary files /dev/null and b/docs/static/img/sql-exporter.png differ diff --git a/docs/static/img/sql-exporter2.png b/docs/static/img/sql-exporter2.png new file mode 100644 index 000000000..1f23784e8 Binary files /dev/null and b/docs/static/img/sql-exporter2.png differ diff --git a/docs/static/img/templating-exporter.png b/docs/static/img/templating-exporter.png new file mode 100644 index 000000000..836e47d68 Binary files /dev/null and b/docs/static/img/templating-exporter.png differ