Updates to Reconciling
parent
45646d0253
commit
1e0c3fa34e
@ -341,7 +341,7 @@ Examples:
|
||||
| `isError("abc")` | false |
|
||||
| `isError(1 / 0)` | true |
|
||||
|
||||
Remember that these are controls and not functions. So you can’t use dot notation (the `e.isX()` syntax).
|
||||
Remember that these are controls and not functions: you can’t use dot notation (the `e.isX()` syntax).
|
||||
|
||||
### Constants
|
||||
|Name |Meaning |
|
||||
@ -352,7 +352,7 @@ Remember that these are controls and not functions. So you can’t use dot notat
|
||||
|
||||
## Jython
|
||||
|
||||
Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (.py or .pyc) are compatible. Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython is essentially Java, you can also import Java libraries and utilize those. Remember to restart OpenRefine, so that new Jython/Python libraries are initialized during Butterfly's startup.
|
||||
Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (`.py` or `.pyc`) are compatible. Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Because Jython runs on the Java Virtual Machine, you can also import and use Java libraries. You will need to restart OpenRefine so that new Jython or Python libraries are initialized during startup.
|
||||
|
||||
OpenRefine now has [most of the Jsoup.org library built into GREL functions](#jsoup-xml-and-html-parsing-functions), for parsing and working with HTML elements and extraction.
|
||||
|
||||
@ -374,7 +374,7 @@ Fields have to be accessed using the bracket operator rather than the dot operat
|
||||
return cells["col1"]["value"]
|
||||
```
|
||||
|
||||
To access the Levenshtein distance between the reconciled value and the cell value (?) use the [recon variables](#reconciliation):
|
||||
To access the [edit distance](reconciling#reconciliation-facets) between a reconciled value and an original cell value, use [recon variables](#reconciliation):
|
||||
|
||||
```
|
||||
return cell["recon"]["features"]["nameLevenshtein"]
|
||||
@ -415,4 +415,4 @@ For help with syntax, see the [Clojure website's guide to syntax](https://clojur
|
||||
|
||||
User-contributed Clojure recipes can be found on our wiki at [https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure).
|
||||
|
||||
Full documentation on the Clojure language can be found on its official site: [https://clojure.org/](https://clojure.org/).
|
||||
Full documentation on the Clojure language can be found on its official site: [https://clojure.org/](https://clojure.org/).
|
@ -6,58 +6,56 @@ sidebar_label: Reconciling
|
||||
|
||||
## Overview
|
||||
|
||||
Reconciliation is the process of matching your dataset with that of an external source. Datasets for comparison are produced by libraries, archives, museums, academic organizations, scientific institutions, non-profits, and interest groups. You can also reconcile against user-edited data on [Wikidata](wikidata), or reconcile against [a local dataset that you yourself supply](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services).
|
||||
Reconciliation is the process of matching your dataset with that of an external source. Datasets for comparison might be produced by libraries, archives, museums, academic organizations, scientific institutions, non-profits, or interest groups. You can also reconcile against user-edited data on [Wikidata](wikidata), or reconcile against [a local dataset that you yourself supply](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services).
|
||||
|
||||
To reconcile your OpenRefine project against an external dataset, that dataset must offer a web service that conforms to the [Reconciliation Service API standards](https://reconciliation-api.github.io/specs/0.1/).
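As a rough illustration of what such a web service receives, the sketch below builds the form-encoded body for a small batch of queries in the shape described by version 0.1 of the specification. The function name, the sample values, and the type identifier are assumptions made for this example; the type IDs a real service accepts come from that service's own documentation.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_reconciliation_request(values, type_id=None, limit=3):
    """Build the form-encoded body for a batch of reconciliation queries."""
    queries = {}
    for index, value in enumerate(values):
        query = {"query": value, "limit": limit}
        if type_id is not None:
            # Restrict candidates to one type; the ID here is a placeholder.
            query["type"] = type_id
        # Keys such as "q0" and "q1" are arbitrary labels chosen by the client.
        queries["q%d" % index] = query
    # The whole batch travels as one JSON document in the "queries" form field.
    return urlencode({"queries": json.dumps(queries)})


body = build_reconciliation_request(["Paris", "Berlin"], type_id="Q515")

# Decode the body again to inspect the first query of the batch.
decoded = json.loads(parse_qs(body)["queries"][0])
print(decoded["q0"])
```

OpenRefine builds and sends these requests for you; the sketch is only meant to show why the typed string, the chosen type, and the candidate limit all influence what a service returns.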
|
||||
|
||||
You may wish to reconcile in order to:
|
||||
* fix spelling or variations in proper names
|
||||
* to clean up manually-entered subject headings against authorities such as the [Library of Congress Subject Headings](https://id.loc.gov/authorities/subjects.html) (LCSH)
|
||||
* to link your data to an existing dataset
|
||||
* to add it to an open and editable system such as [Wikidata](https://www.wikidata.org)
|
||||
* or to see whether entities in your project appear in some specific list, such as the [Panama Papers](https://aleph.occrp.org/datasets/734).
|
||||
* clean up manually-entered subject headings against authorities such as the [Library of Congress Subject Headings](https://id.loc.gov/authorities/subjects.html) (LCSH)
|
||||
* link your data to an existing dataset
|
||||
* add your data to an editable platform such as [Wikidata](https://www.wikidata.org)
|
||||
* or see whether entities in your project appear in some specific list, such as the [Panama Papers](https://aleph.occrp.org/datasets/734).
|
||||
|
||||
Reconciliation is semi-automated: OpenRefine matches your cell values to the reconciliation information as best it can, but human judgment is required to ensure the process is successful. Reconciling happens by default through string searching, so typos, whitespace, and extraneous characters will have an effect on the results. You may wish to [clean and cluster](cellediting) your data before reconciliaton.
|
||||
Reconciliation is semi-automated: OpenRefine matches your cell values to the reconciliation information as best it can, but human judgment is required to review and approve the results. Reconciling happens by default through string searching, so typos, whitespace, and extraneous characters will have an effect on the results. You may wish to [clean and cluster](cellediting) your data before reconciliation.
|
||||
|
||||
:::info
|
||||
We recommend planning your reconciliation operations as iterative: reconcile multiple times with different settings, and with different subgroups of your data.
|
||||
:::
|
||||
|
||||
## Sources
|
||||
|
||||
We recommend starting with [this current list of reconcilable authorities](https://reconciliation-api.github.io/testbench/), which includes instructions for adding new services via Wikidata editing if you have one to add.
|
||||
Start with [this current list of reconcilable authorities](https://reconciliation-api.github.io/testbench/), which includes instructions for adding new services via Wikidata editing if you have one to add.
|
||||
|
||||
OpenRefine maintains a [further list of sources on the wiki](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources), which can be edited by anyone. This list includes ways that you can reconcile against a [local dataset](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services).
|
||||
|
||||
Other services may exist that are not yet listed in these two places: for example, the [310 datasets hosted by the Organized Crime and Corruption Reporting Project (OCCRP)](https://aleph.occrp.org/datasets/) each have their own reconciliation URL, or you can reconcile against their entire database with the URL [shared on the reconciliation API list](https://reconciliation-api.github.io/testbench/). For another example, you can reconcile against the entire Virtual International Authority File (VIAF) dataset, or [only the contributions from certain institutions](http://refine.codefork.com/). Search online to see if the authority you wish to reconcile against has an available service, or whether you can download a copy to reconcile against locally.
|
||||
|
||||
OpenRefine includes Wikidata reconciliation in the installation package - see the [Wikidata](wikidata) page for more information particular to that service.
|
||||
OpenRefine includes Wikidata reconciliation in the installation package; see the [Wikidata](wikidata) page for more information particular to that service. Extensions can add reconciliation services, and can also add enhanced reconciliation capacities. Check the list of extensions on the [Downloads page](https://openrefine.org/download.html) for more information.
|
||||
|
||||
:::info
|
||||
OpenRefine extensions can add reconciliation services, and can also add enhanced reconciliation capacities. Check the list of extensions on the [Downloads page](https://openrefine.org/download.html) for more information.
|
||||
:::
|
||||
|
||||
Each source will have its own documentation on how it provides reconciliation. Refer to the service itself if you have questions about its behaviors and which OpenRefine features it supports.
|
||||
Each source will have its own documentation on how it provides reconciliation. The table on [the reconciliation API list](https://reconciliation-api.github.io/testbench/) indicates whether your chosen service supports the features described below. Refer to the service's documentation if you have questions about its behaviors and which OpenRefine features it supports.
|
||||
|
||||
## Getting started
|
||||
|
||||
Select <span class="menuItems">Reconcile</span> → <span class="menuItems">Start reconciling</span> on a column. If you want to reconcile only some cells in that column, first use filters and facets to isolate them.
|
||||
Choose a column to reconcile and use its dropdown menu to select <span class="menuItems">Reconcile</span> → <span class="menuItems">Start reconciling</span>. If you want to reconcile only some cells in that column, first use filters and facets to isolate them.
|
||||
|
||||
In the reconciliation window, you will see Wikidata offered as a default service. To add another service, click <span class="buttonLabels">Add Standard Service...</span> and paste in the URL of a [service](#sources). You should see the name of the service appear in the list of <span class="buttonLabels">Services</span> if the URL is correct.
|
||||
|
||||
![The reconciliation window.](/img/reconcilewindow.png)
|
||||
|
||||
Once you select a service, the service may sample your selected column and identify some [suggested categories (“types”)](#reconciling-by-type) to reconcile against. Other services will suggest their available types without sampling, and some services have no types.
|
||||
Once you select a service, your selected column may be sampled in order to suggest [“types” (categories)](#reconciling-by-type) to reconcile against. Other services will suggest their available types without sampling, and some services have no types.
|
||||
|
||||
For example, if you had a list of artists represented in a gallery collection, you could reconcile their names against the Getty Research Institute’s [Union List of Artist Names (ULAN)](https://www.getty.edu/research/tools/vocabularies/ulan/). The same [Getty reconciliation URL](https://services.getty.edu/vocab/reconcile/) will offer you ULAN, AAT (Art and Architecture Thesaurus), and TGN (Thesaurus of Geographic Names).
|
||||
|
||||
![The reconciliation window with types.](/img/reconcilewindow2.png)
|
||||
|
||||
Refer to the documentation specific to the reconciliation service (frequently linked on [this page](https://reconciliation-api.github.io/testbench/)) to learn whether types are offered, which types are offered, and which one is most appropriate for your column. You may wish to facet your data and reconcile batches against different types if available.
|
||||
Refer to the [documentation specific to the reconciliation service](https://reconciliation-api.github.io/testbench/) to learn whether types are offered, which types are offered, and which one is most appropriate for your column. You may wish to facet your data and reconcile batches against different types if available.
|
||||
|
||||
Reconciliation can be a time-consuming process, especially with large datasets. We suggest starting with a small test batch. There is no throttle (delay between requests) to set for the reconciliation process. The amount of time will vary for each service, and vary based on the options you select during the process.
|
||||
|
||||
When the process is done, you will see the reconciliation data in the cells.
|
||||
If the cell was successfully matched, it displays a single dark blue link. In this case, the reconciliation is confident that the match is correct, and you should not have to check it manually.
|
||||
If there is no clear match, one or more candidates are displayed, together with their reconciliation score, with light blue links. You will need to select the correct one.
|
||||
If the cell was successfully matched, it displays text as a single dark blue link. In this case, the reconciliation is confident that the match is correct, and you should not have to check it manually.
|
||||
If there is no clear match, one or more candidates are displayed as light blue links, each with its reconciliation score. You will need to select the correct one.
|
||||
|
||||
For each matching decision you make, you have two options: match this cell only (one checkmark), or also use the same identifier for all other cells containing the same original string (two checkmarks).
|
||||
|
||||
@ -71,26 +69,28 @@ Hovering over the suggestion will also offer the two matching options as buttons
|
||||
|
||||
For matched values (those appearing as dark blue links), the underlying cell value has not been altered: the cell is storing both the original string and the matched entity link at the same time. If you were to copy your column to a new column at this point using `value`, for example, the reconciliation data would not transfer, only the original strings. You can learn more about how OpenRefine stores different pieces of information in each cell in [the Variables section specific to reconciliation data](expressions#reconciliation).
|
||||
|
||||
For each cell, you can manually “Create new item,” which will take the cell’s current value and apply it as though it is a match. This will not become a dark blue link, because at this time there is nothing to link to: it is like a draft entity stored only in your project. You can use this feature to prepare these entries for eventual upload to an editable service such as [Wikidata](wikidata), but most services do not yet support this feature.
|
||||
For each cell, you can manually “Create new item,” which will take the cell’s original value and apply it, as though it is a match. This will not become a dark blue link, because at this time there is nothing to link to: it is a draft entity stored only in your project. You can use this feature to prepare these entries for eventual upload to an editable service such as [Wikidata](wikidata), but most services do not yet support this feature.
|
||||
|
||||
### Reconciliation facets
|
||||
|
||||
Under <span class="menuItems">Reconcile</span> → <span class="menuItems">Facets</span> you can see a number of reconciliation-specific faceting options. OpenRefine automatically creates two facets for you when you reconcile a column.
|
||||
Under <span class="menuItems">Reconcile</span> → <span class="menuItems">Facets</span> there are a number of reconciliation-specific faceting options. OpenRefine automatically creates two facets when you reconcile some cells.
|
||||
|
||||
One is a numeric facet for <span class="menuItems">best candidate's score</span>, the range of reconciliation scores of only the best candidate of each cell. Each service calculates scores differently and has a different range, but higher scores always mean better matches. You can facet for higher scores in the numeric facet, and then approve them all in bulk, by using <span class="menuItems">Reconcile</span> → <span class="menuItems">Actions</span> → <span class="menuItems">Match each cell to its best candidate</span>.
|
||||
One is a numeric facet for “best candidate's score,” the range of reconciliation scores of only the best candidate of each cell. Higher scores mean better matches, although each service calculates scores differently and has a different range. You can facet for higher scores using the numeric facet, and then approve them all in bulk, by using <span class="menuItems">Reconcile</span> → <span class="menuItems">[Actions](#reconciliation-actions)</span> → <span class="menuItems">Match each cell to its best candidate</span>.
|
||||
|
||||
There is also a “judgment” facet created, which lets you filter for the cells that haven't been matched (pick “none” in the facet). As you process each cell, its judgment changes from “none” to “matched” and it disappears from the view.
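The combination of a score facet and a bulk match can be pictured with the following minimal sketch. It uses plain dictionaries as a stand-in for cells; the field names (`candidates`, `score`, `judgment`, `match`) and the threshold are hypothetical and do not reflect OpenRefine's internal data model.

```python
def match_best_candidates(cells, min_score):
    """Match each cell to its top-scoring candidate when the score is high enough."""
    for cell in cells:
        candidates = cell.get("candidates", [])
        if not candidates:
            continue  # nothing to match against; judgment stays "none"
        best = max(candidates, key=lambda candidate: candidate["score"])
        if best["score"] >= min_score:
            cell["match"] = best
            cell["judgment"] = "matched"
    return cells


sample_cells = [
    {"value": "Paris", "judgment": "none",
     "candidates": [{"id": "a", "score": 95}, {"id": "b", "score": 60}]},
    {"value": "Pariss", "judgment": "none",
     "candidates": [{"id": "a", "score": 48}]},
    {"value": "???", "judgment": "none", "candidates": []},
]
match_best_candidates(sample_cells, min_score=80)

# Cells still marked "none" are the ones left for manual review.
remaining = [cell["value"] for cell in sample_cells if cell["judgment"] == "none"]
print(remaining)
```

Because each service scores on its own scale, a sensible threshold has to be chosen per service, which is why the numeric facet is worth inspecting before matching in bulk.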
|
||||
|
||||
You can add other facets by selecting <span class="menuItems">Reconcile</span> → <span class="menuItems">Facets</span> on your reconciled column. You can facet by:
|
||||
|
||||
* your judgments (“matched,” or “none” for unreconciled cells, or “new” for entities you've created)
|
||||
* the action you’ve performed on that cell (chosen a “single” match, or set a "mass" match, or no action, as “unknown”)
|
||||
* the action you’ve performed on that cell (chosen a “single” match, or set a “mass” match, or no action, which appears as “unknown”)
|
||||
* the timestamps on the edits you’ve made so far (these appear as millisecond counts since an arbitrary point: they can be sorted alphabetically to move forward and back in time).
|
||||
|
||||
You can facet only the best candidates for each cell, based on:
|
||||
* the score (calculated based on each service's own methods)
|
||||
* the edit distance (using the [Levenshtein distance](cellediting#nearest-neighbor), a number based on how many single-character edits would be required to get your original value to the candidate value, with a larger value being a greater difference)
|
||||
* the word similarity (a percentage based on how many words, excluding [stop words](https://en.wikipedia.org/wiki/Stop_word), in the original value match words in the candidate. For example, the value "Maria Luisa Zuloaga de Tovar" matched to the candidate "Palacios, Luisa Zuloaga de" results in a word similarity value of 0.6, or 60%, or 3 out of 5 words. Cells that are not yet matched to one candidate will show as 0.0).
|
||||
* the word similarity.
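For readers unfamiliar with edit distance, here is a minimal sketch of the standard Levenshtein algorithm. It illustrates the general technique only; whether OpenRefine normalizes case or whitespace before comparing is not stated here, so treat the exact numbers as illustrative.

```python
def levenshtein(source, target):
    """Count the single-character edits needed to turn source into target."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (source_char != target_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

A distance of 0 means the two strings are identical, and larger values mean the candidate differs more from the original value.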
|
||||
|
||||
Word similarity is calculated as a percentage based on how many words (excluding [stop words](https://en.wikipedia.org/wiki/Stop_word)) in the original value match words in the candidate. For example, the value “Maria Luisa Zuloaga de Tovar” matched to the candidate “Palacios, Luisa Zuloaga de” results in a word similarity value of 0.6, or 60%, or 3 out of 5 words. Cells that are not yet matched to one candidate will show as 0.0.
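The calculation can be approximated with the sketch below. The tokenization rule and the stop-word list are assumptions: the default list is empty, which reproduces the example above, where “de” is counted as a matching word.

```python
import re


def word_similarity(original, candidate, stop_words=frozenset()):
    """Return the share of words in original that also occur in candidate."""
    def words(text):
        # Lowercase, split on non-word characters, and drop stop words.
        tokens = re.findall(r"\w+", text.lower())
        return [token for token in tokens if token not in stop_words]

    original_words = words(original)
    if not original_words:
        return 0.0
    candidate_words = set(words(candidate))
    matched = sum(1 for word in original_words if word in candidate_words)
    return matched / len(original_words)


score = word_similarity("Maria Luisa Zuloaga de Tovar", "Palacios, Luisa Zuloaga de")
print(score)
```

Three of the five words in the original value (“Luisa,” “Zuloaga,” and “de”) appear in the candidate, giving 3 / 5.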
|
||||
|
||||
You can also look at each best candidate’s:
|
||||
* type (the ones you have selected in successive reconciliation attempts, or other types returned by the service based on the cell values)
|
||||
@ -102,17 +102,17 @@ These facets are useful for doing successive reconciliation attempts, against di
|
||||
### Reconciliation actions
|
||||
|
||||
You can use the <span class="menuItems">Reconcile</span> → <span class="menuItems">Actions</span> menu options to perform bulk changes (which will apply only to your currently viewed set of rows or records):
|
||||
* Match each cell to its best candidate (by highest score)
|
||||
* Create a new item for each cell (discard any suggested matches)
|
||||
* Create one new item for similar cells (a new entity will be created for each unique string)
|
||||
* Match all filtered cells to... (a specific item from the chosen service, via a search box. For services with the [“suggest entities” property](https://reconciliation-api.github.io/testbench/))
|
||||
* Discard all reconciliation judgments (reverts back to multiple candidates per cell, including cells that may have been auto-matched in the original reconciliation process)
|
||||
* Clear reconciliation data, reverting all cells back to their original values.
|
||||
* <span class="menuItems">Match each cell to its best candidate</span> (by highest score)
|
||||
* <span class="menuItems">Create a new item for each cell</span> (discard any suggested matches)
|
||||
* <span class="menuItems">Create one new item for similar cells</span> (a new entity will be created for each unique string)
|
||||
* <span class="menuItems">Match all filtered cells to...</span> (a specific item from the chosen service, via a search box; only works with services that support the “suggest entities” property)
|
||||
* <span class="menuItems">Discard all reconciliation judgments</span> (reverts back to multiple candidates per cell, including cells that may have been auto-matched in the original reconciliation process)
|
||||
* <span class="menuItems">Clear reconciliation data</span>, reverting all cells back to their original values.
|
||||
|
||||
The other options available under <span class="menuItems">Reconcile</span> are:
|
||||
* Copy reconciliation data... (to an existing column: if the original values in your reconciliation column are identical to those in your chosen column, the matched and/or new cells will copy over - unmatched values will not change)
|
||||
* [Use values as identifiers](#reconciling-with-unique-identifiers) (if you are reconciling with unique identifiers instead of by doing string searches)
|
||||
* [Add entity identifiers column](#add-entity-identifiers-column).
|
||||
* <span class="menuItems">Copy reconciliation data...</span> (to an existing column: if the original values in your reconciliation column are identical to those in your chosen column, the matched and new cells will copy over; unmatched values will not change)
|
||||
* [<span class="menuItems">Use values as identifiers</span>](#reconciling-with-unique-identifiers) (if you are reconciling with unique identifiers instead of by doing string searches)
|
||||
* [<span class="menuItems">Add entity identifiers column</span>](#add-entity-identifiers-column).
|
||||
|
||||
## Reconciling with unique identifiers
|
||||
|
||||
@ -130,13 +130,13 @@ You may get false positives, which you will need to hover over or click on to id
|
||||
|
||||
Reconciliation services, once added to OpenRefine, may suggest types from their databases. These types will usually be whatever the service specializes in: people, events, places, buildings, tools, plants, animals, organizations, etc.
|
||||
|
||||
Reconciling against a type may be faster and more accurate, but may result in fewer matches. Some services have hierarchical types (such as “mammal” as a subtype of “animal”). When you reconcile against a more specific type, unmatched values may fall back to more broad types. Other services will not do this, so you may need to perform successive reconciliation attempts against different types. Refer to the documentation specific to the reconciliation service to learn more.
|
||||
Reconciling against a type may be faster and more accurate, but may result in fewer matches. Some services have hierarchical types (such as “mammal” as a subtype of “animal”). When you reconcile against a more specific type, unmatched values may fall back to the broader type; other services will not do this, so you may need to perform successive reconciliation attempts against different types. Refer to the documentation specific to the reconciliation service to learn more.
|
||||
|
||||
When you select a service from the list, OpenRefine will load some or all available types. Some services will sample the first ten rows of your column to suggest types (check the [“Suggest types” column on this table of services](https://reconciliation-api.github.io/testbench/)). You will see a service’s types in the reconciliation window:
|
||||
When you select a service from the list, OpenRefine will load some or all available types. Some services will sample the first ten rows of your column to suggest types (check the [“Suggest types” column](https://reconciliation-api.github.io/testbench/)). You will see a service’s types in the reconciliation window:
|
||||
|
||||
![Reconciling using a type.](/img/reconcile-by-type.png)
|
||||
|
||||
In this example, “Person” and “Corporate Name” are potential types offered by VIAF. You can also use the <span class="fieldLabels">Reconcile against type:</span> field to enter in another type that the service offers. When you start typing, this field may search and suggest existing types. For VIAF, you could enter “/book/book” if your column contained publications.
|
||||
In this example, “Person” and “Corporate Name” are potential types offered by the reconciliation API for VIAF. You can also use the <span class="fieldLabels">Reconcile against type:</span> field to enter in another type that the service offers. When you start typing, this field may search and suggest existing types. For VIAF, you could enter “/book/book” if your column contained publications. You may need to enter the service's own strings precisely instead of attempting to search for a match.
|
||||
|
||||
Types are structured to fit their content: the Wikidata “human” type, for example, can include fields for birth and death dates, nationality, etc. The VIAF “person” type can include nationality and gender. You can use this to [include more properties](#reconciling-with-additional-columns) and find better matches.
|
||||
|
||||
@ -150,11 +150,11 @@ Some of your cells may be ambiguous, in the sense that a string can point to mor
|
||||
|
||||
![Reconciling sometimes turns up ambiguous matches.](/img/reconcileParis.gif)
|
||||
|
||||
Including supplementary information can be useful, depending on the service (such as including birthdate information about each person you are trying to reconcile). The other columns in your project will appear in the reconciliation window, with an <span class="fieldLabels">Include?</span> checkbox available on each.
|
||||
Including supplementary information can be useful, depending on the service (such as including birthdate information about each person you are trying to reconcile). You can re-reconcile unmatched cells with additional properties, in the right side of the <span class="menuItems">Start reconciling</span> window, under “Also use relevant details from other columns.” The column names in your project will appear in the reconciliation window, with an <span class="fieldLabels">Include?</span> checkbox next to each one.
|
||||
|
||||
You can fill in the <span class="fieldLabels">As Property</span> field with the type of information you are including. When you start typing, potential fields may pop up (depending on the [“suggest properties” feature](https://reconciliation-api.github.io/testbench/)), such as “birthDate” in the case of ULAN or “Geburtsdatum” in the case of Integrated Authority File (GND). Use the documentation for your chosen service to identify the fields in their terms.
|
||||
Fill in the <span class="fieldLabels">As Property</span> field with the type of information you are including. When you start typing, potential fields may pop up (depending on the [“suggest properties” feature](https://reconciliation-api.github.io/testbench/)), such as “birthDate” in the case of ULAN or “Geburtsdatum” in the case of Integrated Authority File (GND). Use the documentation for your chosen service to identify the fields in their terms.
|
||||
|
||||
Some services will not be able to search for the exact name of your desired <span class="fieldLabels">As Property</span> entry, but you can still manually supply the field name. Refer to the service to make sure you enter it correctly.
|
||||
Some services will not be able to search for the exact name of your desired <span class="fieldLabels">As Property</span> entry, but you can still manually supply the field name. Refer to the service to choose the most appropriate field, and make sure you enter it correctly.
|
||||
|
||||
![Including a birth-date type.](/img/reconcile-with-property.png)
|
||||
|
||||
@ -174,44 +174,51 @@ Once you have selected matches for your cells, you can retrieve the unique ident
|
||||
|
||||
If the reconciliation service supports [data extension](https://reconciliation-api.github.io/testbench/), then you can augment your reconciled data with new columns using <span class="menuItems">Edit column</span> → <span class="menuItems">Add columns from reconciled values...</span>.
|
||||
|
||||
For example, if you have a column of chemical elements identified by name, you can fetch categorical information about them such as their atomic number and their element symbol, as the animation shows below:
|
||||
For example, if you have a column of chemical elements identified by name, you can fetch categorical information about them such as their atomic number and their element symbol:
|
||||
|
||||
![A screenshare of elements fetching related information.](/img/reconcileelements.gif)
|
||||
|
||||
Once you have pulled reconciliation values and selected one for each cell, selecting <span class="menuItems">Add column from reconciled values...</span> will bring up a window to choose which information you’d like to import into a new column. The quality of the suggested properties will depend on how you have reconciled your data beforehand: reconciling against a specific type will provide you with suggested properties of that type. For example, GND suggests elements about the “people” type after you've reconciled with it, such as their parents, native languages, children, etc.
|
||||
Once you have pulled reconciliation values and selected one for each cell, selecting <span class="menuItems">Add columns from reconciled values...</span> will bring up a window to choose which information you’d like to import into new columns. You can manually enter desired properties, or select from a list of suggestions.
|
||||
|
||||
The quality of the suggested properties will depend on how you have reconciled your data beforehand: reconciling against a specific type will provide you with the associated properties of that type. For example, GND suggests elements about the “people” type after you've reconciled with it, such as their parents, native languages, children, etc.
|
||||
|
||||
![A screenshot of available properties from GND.](/img/reconcileGND.png)
|
||||
|
||||
If you have left any values unreconciled in your column, you will see “<not reconciled>” in the preview. These will generate blank cells if you continue with the column addition process. This process may pull more than one property per row in your data, so you may need to switch into records mode after you've added columns.
|
||||
If you have left any values unreconciled in your column, you will see “<not reconciled>” in the preview. These will generate blank cells if you continue with the column addition process.
|
||||
|
||||
This process may pull more than one property per row in your data (such as multiple children's names), so you may need to switch into records mode after you've added columns.
|
||||
|
||||
### Add columns by fetching URLs
|
||||
|
||||
If the reconciliation service cannot extend data, look for a generic web API for that data source, or a structured URL that points to their dataset entities via unique IDs (such as https://viaf.org/viaf/000000). You can use the <span class="menuItems">Edit column</span> → <span class="menuItems">[Add column by fetching URLs](columnediting#add-column-by-fetching-urls)</span> operation to call this API or URL with the IDs obtained from the reconciliation process. This will require using [expressions](expressions).
|
||||
If the reconciliation service cannot extend data, look for a generic web API for that data source, or a structured URL that points to their dataset entities via unique IDs (such as “https://viaf.org/viaf/000000”). You can use the <span class="menuItems">Edit column</span> → <span class="menuItems">[Add column by fetching URLs](columnediting#add-column-by-fetching-urls)</span> operation to call this API or URL with the IDs obtained from the reconciliation process. This will require using [expressions](expressions).
|
||||
|
||||
You may not want to pull the entire HTML content of the pages at the ends of these URLs, so look to see whether the service offers a metadata endpoint, such as JSON-formatted data. You can either use a column of IDs, or you can pull the ID from each matched cell during the fetching process.
|
||||
|
||||
For example, if you have reconciled artists to the Getty's ULAN, and [have their unique ULAN IDs as a column](#add-entity-identifiers-column), you can generate a new column of JSON-formatted data by using <span class="menuItems">Add column by fetching URLs</span> and entering the GREL expression `“http://vocab.getty.edu/” + value + “.json”` in the window. For this service, the unique IDs are formatted “ulan/000000” and so the generated URLs look like “http://vocab.getty.edu/ulan/000000.json”.
|
||||
For example, if you have reconciled artists to the Getty's ULAN, and [have their unique ULAN IDs as a column](#add-entity-identifiers-column), you can generate a new column of JSON-formatted data by using <span class="menuItems">Add column by fetching URLs</span> and entering the GREL expression `"http://vocab.getty.edu/" + value + ".json"`. For this service, the unique IDs are formatted “ulan/000000” and so the generated URLs look like “http://vocab.getty.edu/ulan/000000.json”.
|
||||
|
||||
You can alternatively insert the ID directly from the matched column using a GREL expression like `“http://vocab.getty.edu/” + cell.recon.match.id + “.json”` instead.
|
||||
Alternatively, you can insert the ID directly from the matched column's reconciliation variables, using a GREL expression like `"http://vocab.getty.edu/" + cell.recon.match.id + ".json"` instead.
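The URL construction described above can also be sketched outside OpenRefine. The following is a minimal illustration in plain Python, not OpenRefine's own implementation. The helper name `build_json_url` is hypothetical, and the ID `ulan/000000` is the placeholder from the text rather than a real record. The sketch returns `None` for blank IDs, on the assumption that unreconciled rows should not produce a request.

```python
# Mirrors the GREL expression: "http://vocab.getty.edu/" + value + ".json"
BASE_URL = "http://vocab.getty.edu/"


def build_json_url(entity_id):
    """Return the JSON URL for an ID such as "ulan/000000".

    Hypothetical helper: returns None when the ID is missing or blank,
    so that unreconciled rows can be skipped before fetching.
    """
    if not entity_id or not entity_id.strip():
        return None
    return BASE_URL + entity_id.strip() + ".json"


# Placeholder IDs for illustration only; the blank entries stand for
# cells that were left unreconciled.
ids = ["ulan/000000", "", None]
urls = [build_json_url(i) for i in ids]
```

Skipping blank IDs before fetching avoids sending requests that cannot succeed, which also helps stay within a service's rate limits.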
|
||||
|
||||
Remember to set an appropriate throttle and to refer to the service documentation to ensure your compliance with their terms. See [the section about this operation](columnediting#add-column-by-fetching-urls) to learn more about common errors with this process.
|
||||
Remember to set an appropriate throttle and to refer to the service documentation to ensure your compliance with their terms. See [the section about this operation](columnediting#add-column-by-fetching-urls) to learn more about the fetching process.
|
||||
|
||||
## Keep all the suggestions made
|
||||
|
||||
If you would like to generate a list of each suggestion made, rather than only the best candidate, you can use a [GREL expression](expressions#GREL). Go to “Edit column” → “Add column based on this column.” To create a list of all the possible matches, use
|
||||
To generate a list of each suggestion made, rather than only the best candidate, you can use a [GREL expression](expressions#GREL). Go to <span class="menuItems">Edit column</span> → <span class="menuItems">Add column based on this column</span>. To create a list of all the possible matches, use something like
|
||||
|
||||
```forEach(cell.recon.candidates,c,c.name).join(“,”)```
|
||||
```
|
||||
forEach(cell.recon.candidates,c,c.name).join(", ")
|
||||
```
|
||||
|
||||
To get the unique identifiers of these matches instead, use
|
||||
|
||||
```forEach(cell.recon.candidates,c,c.id).join(“,”)```
|
||||
```
|
||||
forEach(cell.recon.candidates,c,c.id).join(", ")
|
||||
```
|
||||
|
||||
This information is stored as a string without any attached reconciliation information.
|
||||
This information is stored as a string, without any attached reconciliation information.
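To make the behaviour of the two expressions above concrete, here is a rough Python analogue. It is only a sketch: `Candidate` and `join_candidates` are hypothetical stand-ins for the candidate objects OpenRefine exposes, and the sample values are invented. It shows that the result is one flat string, which is why no reconciliation information survives.

```python
from collections import namedtuple

# Stand-in for a reconciliation candidate; only the two fields used in the
# GREL expressions (id and name) are modelled here.
Candidate = namedtuple("Candidate", ["id", "name"])


def join_candidates(candidates, field="name", separator=", "):
    """Mimic forEach(cell.recon.candidates, c, c.<field>).join(separator)."""
    return separator.join(getattr(c, field) for c in candidates)


# Made-up sample data, for illustration only.
candidates = [
    Candidate("id-1", "First suggestion"),
    Candidate("id-2", "Second suggestion"),
]

names = join_candidates(candidates)           # one string of candidate names
ids = join_candidates(candidates, field="id")  # one string of candidate IDs
```

Because the output is an ordinary string, any later step that needs the individual candidates has to split it again on the separator.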
|
||||
|
||||
## Writing reconciliation expressions
|
||||
|
||||
OpenRefine's GREL supplies a number of variables related specifically to reconciled values.
|
||||
For example, some of the reconciliation variables are:
|
||||
OpenRefine supplies a number of variables related specifically to reconciled values. These can be used in GREL and Jython expressions. For example, some of the reconciliation variables are:
|
||||
|
||||
* `cell.recon.match.id` or `cell.recon.match.name` for matched values
|
||||
* `cell.recon.best.name` or `cell.recon.best.id` for best-candidate values
|
||||
@ -222,8 +229,8 @@ For example, some of the reconciliation variables are:
|
||||
|
||||
You can find out more in the [reconciliation variables](expressions#reconciliaton-variables) section.
|
||||
|
||||
## Exporting your reconciled data
|
||||
## Exporting reconciled data
|
||||
|
||||
Once you have data that is reconciled to existing entities online, you may wish to export that data to a user-editable service such as Wikidata. See the section on [uploading your edits to Wikidata](wikidata#upload-edits-to-wikidata) for more information, or the section on [exporting](exporting) to see other formats OpenRefine can produce.
|
||||
|
||||
You can share reconciled data in progress through a [project export or import](exporting#export-a-project), with some preparation. The importing user needs to have the reconciliation services installed on their OpenRefine instance in advance of opening the project in order to use candidate and match links. Otherwise, the links will be broken and the user will need to add the reconciliation service and re-reconcile the columns in question. [Wikidata](wikidata) reconciliation data can be shared more easily as the service comes bundled with OpenRefine.
|
||||
You can share reconciled data in progress through a [project export or import](exporting#export-a-project), with some preparation. The importing user needs to have the appropriate reconciliation services installed on their OpenRefine instance (by going to <span class="menuItems">Start reconciling</span> and clicking on <span class="buttonLabels">Add Standard Service...</span>) in advance of opening the project, in order to use candidate and match links. Otherwise, the links will be broken and the user will need to add the reconciliation service and re-reconcile the columns in question. [Wikidata](wikidata) reconciliation data can be shared more easily as the service comes bundled with OpenRefine.
|