RandomSec/OpenRefine/docs/versioned_docs/version-3.4/manual/exporting.md
2022-01-04 16:31:32 +01:00


id title sidebar_label
exporting Exporting your work Exporting

Overview

Once your dataset is ready, you will need to get it out of OpenRefine and into the system of your choice. OpenRefine outputs a number of file formats, can upload your data directly into Google Sheets, and can create or update statements on Wikidata.

You can also export your full project data so that it can be opened by someone else using OpenRefine (or yourself, on another computer).

Export data

A screenshot of the Export dropdown.

Many of the options only export data in the current view - that is, with current filters and facets applied. Some will give you the choice to export your entire dataset or just the currently-viewed rows.

To export data from a project, click the Export dropdown button in the top right corner and pick the format you want. Your options are:

You can also export reconciled data to Wikidata, or export your Wikidata schema for future use with other OpenRefine projects:

Custom tabular exporter

A screenshot of the custom tabular content tab.

With the custom tabular exporter, you can choose which of your data to export, the separator you wish to use, and whether you'd like to download the result to your computer or upload it into a Google Sheet.

On the Content tab, you can drag and drop the columns appearing in the column list to reorder the output. The options for reconciled and date data are applied to each column individually.

This exporter is especially useful with reconciled data, as you can choose whether you wish to output the cells' original values, the matched values, or the matched IDs. Outputting “matched entity's name”, “matched entity's ID”, or “cell's content” will output, respectively, the contents of cell.recon.match.name, cell.recon.match.id, and cell.value.

“Output nothing for unmatched cells” will export empty cells for both newly-created matches and cells with no chosen matches. “Link to matched entity's page” will produce hyperlinked text in an HTML table output, but has no effect in other formats.
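The reconciliation options above can be pictured with a small Python sketch. It is not OpenRefine's implementation: the dictionary shape, the function name `export_cell`, and the option strings are hypothetical, and falling back to the cell's content for an unmatched cell when blanking is switched off is an assumption.

```python
def export_cell(cell, output="cell-content", blank_unmatched=False):
    """Return the text a cell would contribute to the export (illustrative only).

    `cell` is a hypothetical stand-in for an OpenRefine cell:
    {"value": ..., "recon": {"match": {"id": ..., "name": ...} or None}}
    """
    value = str(cell["value"])
    recon = cell.get("recon")
    if recon is None:
        return value                      # not a reconciled cell
    match = recon.get("match")
    if match is None:
        # No chosen match: blank the cell, or keep its content (assumed fallback).
        return "" if blank_unmatched else value
    if output == "matched-name":
        return str(match["name"])         # cell.recon.match.name
    if output == "matched-id":
        return str(match["id"])           # cell.recon.match.id
    return value                          # cell.value
```

For a cell matched to an entity, `export_cell(cell, "matched-id")` returns the entity's ID, while an unmatched cell exported with `blank_unmatched=True` yields an empty string.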

At this time, the date-formatting options in this window do not work. You can keep track of this issue on GitHub. In the future, you will be able to choose how to output date-formatted cells. You can create a custom date output by using formatting according to the SimpleDateFormat parsing key found here.

A screenshot of the custom tabular file download tab.

On the Download tab, you can generate a preview of how the first ten rows of your dataset will output. If you do not choose one of the file formats on the right, the Download button will generate a text file. On the Upload tab, you can create a new Google Sheet.

With the Option Code tab, you can copy JSON of your current custom settings to reuse on another export, or you can paste in existing JSON settings to apply to the current project.

SQL exporter

The SQL exporter creates a SQL statement containing the data you've exported, which you can use to overwrite or add to an existing database. Choosing Export → SQL exporter will bring up a window with two tabs: one to define what data to output, and another to modify other aspects of the SQL statement, with options to preview and download the statement.

A screenshot of the SQL statement content window.

The Content tab allows you to craft your dataset into an SQL table. From here, you can choose which columns to export, the data type to export for each (or choose "VARCHAR"), and the maximum character length for each field (if applicable based on the data type). You can set a default value for empty cells after unchecking “Allow null” in one or more columns.

With this output tool, you can choose whether to output only currently visible rows, or all the rows in your dataset, as well as whether to include empty rows. The option to “Trim column names” will remove their whitespace characters.

A screenshot of the SQL statement download window.

The Download tab allows you to finalize your complete SQL statement.

Include schema means that you will start your statement with the creation of a table. Without that, you will only have an INSERT statement.

Include content means including the INSERT statement with data from your project. Without that, you will only create empty columns.

You can include DROP and IF EXISTS if you require them, and set a name for the table to which the statement will refer.
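To see how these options combine, here is a minimal Python sketch that assembles a statement from the same switches. It only illustrates the idea: the function names, parameters, and exact SQL layout are assumptions, and the text OpenRefine actually generates may differ.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (illustrative quoting only)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"   # double any single quotes


def build_sql(table, columns, rows, include_schema=True, include_content=True,
              include_drop=False, if_exists=False):
    """columns: list of (name, sql_type) pairs; rows: list of tuples."""
    parts = []
    if include_drop:
        guard = "IF EXISTS " if if_exists else ""
        parts.append(f"DROP TABLE {guard}{table};")
    if include_schema:
        cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
        parts.append(f"CREATE TABLE {table} ({cols});")
    if include_content and rows:
        names = ", ".join(name for name, _ in columns)
        values = ",\n".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
        )
        parts.append(f"INSERT INTO {table} ({names}) VALUES\n{values};")
    return "\n".join(parts)
```

Calling `build_sql("books", [("title", "VARCHAR(255)")], [("Emma",)], include_schema=False)` produces only the INSERT part, which mirrors unchecking "Include schema".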

You can then preview your statement, which will open up a new browser tab/window showing a statement with the first ten rows of your data (if included), or you can save a .sql file to your computer.

Templating exporter

If you pick Templating… from the Export dropdown menu, you can “roll your own” exporter. This is useful for formats that we don't support natively yet, or won't support. The Templating exporter generates JSON by default.

A screenshot of the Templating exporter generating JSON by default.

The Templating Export window allows you to set your own separators, prefix, and suffix to create a complete dataset in the language of your choice. In the Row template section, you can choose which columns to generate from each row by calling them with variables.

This can be used to:

  • output reconciliation data, such as cells["ColumnName"].recon.match.name
  • create multiple columns of output from different member fields of a single project column
  • employ expressions to modify data for output: for example, cells["ColumnName"].value.toUppercase().

Anything that appears inside doubled curly braces ({{ }}) is treated as a GREL expression; anything outside is generated as straight text. You can use Jython or Clojure by declaring it at the start:

{{jython:return cells["ColumnName"].value}}

:::caution
Note that some syntax is different in this tool than elsewhere in OpenRefine: a forward slash must be escaped with a backslash, while other characters do not need escaping. You cannot, at this time, include a closing curly brace (}) anywhere in your expression, or it will cause it to malfunction.
:::

You can include regular expressions as usual (inside forward slashes, with any GREL function that accepts them). For example, you could output a version of your cells with punctuation removed, using an expression such as

{{jsonize(cells["ColumnName"].value.replaceChars("/[.!?$&,/]/",""))}}

You could also simply output a plain-text document inserting data from your project into sentences: for example, "In {{cells["Year"].value}} we received {{cells["RequestCount"].value}} requests."

You can use the shorthand ${ColumnName} (no need for quotes) to insert column values directly. You cannot use this inside an expression, because of the closing curly brace.

If your project is in records mode, the Row separator field will insert a separator between records, rather than individual rows. Rows inside a single record will be directly appended to one another as per the content in the Row Template field.
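As a rough mental model of how the prefix, row template, row separator, and suffix fit together in rows mode, consider the following Python sketch. It is not the exporter itself: `render_template` and its parameters are hypothetical, it handles only the ${ColumnName} shorthand, and it does not evaluate {{ }} expressions or records mode.

```python
import re


def render_template(rows, row_template, prefix="", suffix="", separator="\n"):
    """rows: list of dicts mapping column name -> value (illustrative model)."""

    def render_row(row):
        # Replace each ${ColumnName} with that row's value (empty if missing).
        return re.sub(
            r"\$\{([^}]+)\}",
            lambda m: str(row.get(m.group(1), "")),
            row_template,
        )

    # The separator goes between rows; prefix and suffix wrap the whole output.
    return prefix + separator.join(render_row(r) for r in rows) + suffix
```

With rows for 2019 and 2020 and the template `In ${Year} we received ${RequestCount} requests.`, the sketch emits one sentence per row, joined by the separator.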

Once you have created your template, you may wish to save the text you produced in each field, in order to reuse it in the future. Once you click Export, OpenRefine will output a simple .txt file, and your template will be discarded.

We have recipes on using the Templating exporter to produce several different formats.

Export a project

You can share a project in progress with another computer, a colleague, or with someone who wants to check your history. This can be useful for showing that your data cleanup didn't distort or manipulate the information in any way. Once you have exported a project, another OpenRefine installation can import it as a new project.

You can either save it locally or upload it to Google Drive (which requires you to authorize a Google account).

:::caution
OpenRefine project archives contain confidential data from previous steps, which will still be accessible to anyone who has the archive. If you are hoping to keep your original dataset hidden for privacy reasons, such as using OpenRefine to anonymize information, do not share your project archive.
:::

To save your project archive locally: from the Export dropdown, select OpenRefine project archive to file. OpenRefine exports your full project with all of its history. It does not export any current views or applied facets. Existing reconciliation information will be preserved, but the importing computer will need to add the same reconciliation services to keep working with that data.

OpenRefine exports files in .tar.gz format. You can rename the file when you save it; otherwise it will bear the project name.
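Because the archive is an ordinary .tar.gz file, you can inspect what it contains before sharing it. The sketch below uses Python's standard `tarfile` module; the function name and the file path you pass in are hypothetical, and the names of the files inside a real archive are not specified here.

```python
import tarfile


def list_project_archive(path):
    """Return the member names stored in a .tar.gz project archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()
```

For example, `list_project_archive("my-project.tar.gz")` returns the list of entries packed into that archive.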

To save your project archive to Google Drive: from the Export dropdown, select OpenRefine project archive to Google Drive.... OpenRefine will not share the link with you, only confirm that the file was uploaded.

Export operations

You can save and re-apply the history of any project (all the operations shown in the Undo/Redo tab). This creates JSON that you can save for later reuse on another OpenRefine project.
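If you keep the saved JSON in a file, a short script can summarize it before you apply it elsewhere. This Python sketch assumes the JSON is an array of objects that each carry an "op" identifier and a "description" string; that shape and the function name are assumptions, so check them against your own export.

```python
import json


def summarize_operations(history_json):
    """Return (op, description) pairs from a saved operation history string."""
    operations = json.loads(history_json)
    # Missing keys fall back to placeholders rather than raising an error.
    return [(op.get("op", "?"), op.get("description", "")) for op in operations]
```

Reading the file with `open(...).read()` and passing the text to `summarize_operations` gives a quick overview of the steps the history would replay.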