Cleanup technical reference, incorporating changes made on the wiki (#3863)
* Cleanup technical reference, incorporating changes made on the wiki
* Re-add reconciliation-api.md erroneously deleted in merge
parent 7961c90d44
commit 0d9c197a5f
@ -53,7 +53,7 @@ This describes the overall steps to your first code contribution in OpenRefine.
|
||||
|
||||
- Reproduce the issue locally, by following the steps described in the issue. You might need to locate a particular dialog, use a specific importer on a sample file, or follow any other user workflow. If you have followed all the steps described in the issue and cannot observe the issue mentioned, write a comment on the issue explaining that you are not able to reproduce it (perhaps it was fixed by another change).
|
||||
|
||||
- Locate the code that is relevant for the issue you want to solve. Text search across files is often useful for that. For instance, if the issue you want to solve is about a dialog entitled "Columnize by key/values", you can search for "Columnize" in the entire source code.
|
||||
- Locate the code that is relevant for the issue you want to solve. Text search across files is often useful for that. For instance, if the issue you want to solve is about a dialog entitled "Columnize by key/values", you can search for "Columnize" in the entire source code. For more details about this technique, see [this comment](https://github.com/OpenRefine/OpenRefine/issues/3137#issuecomment-691649962).
|
||||
|
||||
- Study how the current code works. You might want to use a debugger to put breakpoints at the relevant locations (for inspecting the backend, use your IDE's debugger, for the frontend, use your browser's developer tools).
|
||||
|
||||
@ -65,5 +65,4 @@ This describes the overall steps to your first code contribution in OpenRefine.
|
||||
|
||||
- Push your branch to your fork and create a pull request for it, explaining the approach you have used and any design decisions you have made.
|
||||
|
||||
|
||||
Thank you!
|
||||
|
@ -1,261 +0,0 @@
|
||||
---
|
||||
id: data-extension-api
|
||||
title: Data extension API
|
||||
sidebar_label: Data extension API
|
||||
---
|
||||
|
||||
This page describes a new optional API for reconciliation services, allowing clients to pull properties of reconciled records. It is supported from OpenRefine 2.8 onwards. A sample server implementation is available in the [Wikidata reconciliation interface](https://wikidata.reconci.link/).
|
||||
|
||||
## Overview of the workflow
|
||||
|
||||
1. Reconcile a column with a standard reconciliation service
|
||||
|
||||
2. Click "Add column from reconciled values"
|
||||
|
||||
3. The user is offered some properties to fetch, based on the type they reconciled their column against (if any). They can also pick their own property with the suggest widget (same as for the reconciliation dialog).
|
||||
|
||||
4. A preview of the columns to be fetched is displayed on the right-hand side of the dialog, based on a sample of the rows.
|
||||
|
||||
5. Once the user has clicked "OK", columns are fetched and added to the project. Columns corresponding to other items from the service are directly reconciled, and each such column is marked as reconciled against the type suggested by the service for that property. The user can run data extension again from that column.
|
||||
|
||||
[GIF Screencast](http://pintoch.ulminfo.fr/92dcdd20f3/recorded.gif)
|
||||
|
||||
## Specification
|
||||
|
||||
Services supporting data extension must add an `extend` field in their service metadata. This field is expected to have the following subfields, all optional:
|
||||
* `propose_properties` stores the endpoint of an API which will be used to suggest properties to fetch (see specification below). The field contains an object with a `service_url` and `service_path` which will be concatenated to obtain the URL where the endpoint is available, just like the other services in the metadata. If this field is not provided, no property will be suggested in the dialog (the user will have to input them manually).
|
||||
* `property_settings` stores the specification of a form where the user will be able to configure how a given property should be fetched (see specification below). If this field is not provided, the user will not be offered any settings.
|
||||
|
||||
The service endpoint must also accept a new parameter `extend` (in addition to `queries` which is used for reconciliation). Its behaviour is described in the following section.
|
||||
|
||||
Example service metadata:
|
||||
```json
|
||||
"extend": {
|
||||
"propose_properties": {
|
||||
"service_url": "https://wikidata.reconci.link/",
|
||||
"service_path": "/en/propose_properties"
|
||||
},
|
||||
"property_settings": []
|
||||
}
|
||||
```
|
||||
### Property proposal protocol
|
||||
|
||||
The role of the property proposal endpoint is to suggest a list of properties to fetch. Its only input is the following GET parameters:
|
||||
* the `type` that the column was reconciled against. If no type is provided, it should suggest properties for a column reconciled against no type.
|
||||
* a `limit` on the number of results to return
|
||||
|
||||
The type is specified by its id in the `type` GET parameter of the endpoint, as follows:
|
||||
|
||||
https://wikidata.reconci.link/en/propose_properties?type=Q3354859&limit=3
|
||||
|
||||
The endpoint returns a JSON response as follows:
|
||||
|
||||
```json
|
||||
{
|
||||
"properties": [
|
||||
{
|
||||
"id": "P969",
|
||||
"name": "located at street address"
|
||||
},
|
||||
{
|
||||
"id": "P1449",
|
||||
"name": "nickname"
|
||||
},
|
||||
{
|
||||
"id": "P17",
|
||||
"name": "country"
|
||||
}
|
||||
],
|
||||
"type": "Q3354859",
|
||||
"limit": 3
|
||||
}
|
||||
```
|
||||
This endpoint must support JSONP via the `callback` parameter (just like all other endpoints of the reconciliation service).
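As a rough illustration of the request side, the sketch below builds the property proposal URL from the two metadata fields and reads the property ids out of a parsed response. The function names are hypothetical, and stripping a trailing slash from `service_url` before concatenation is an assumption made here to avoid a doubled `/`; the specification above only says the two fields are concatenated.

```python
from urllib.parse import urlencode


def propose_properties_url(service_url, service_path, type_id=None, limit=None):
    """Build the GET URL for the property proposal endpoint (hypothetical helper)."""
    # Assumption: drop a trailing "/" so that "https://host/" + "/path"
    # does not produce "https://host//path".
    base = service_url.rstrip("/") + service_path
    params = {}
    if type_id is not None:
        params["type"] = type_id
    if limit is not None:
        params["limit"] = limit
    return base + ("?" + urlencode(params) if params else "")


def property_ids(response):
    """Return the ids listed under "properties" in a parsed proposal response."""
    return [prop["id"] for prop in response["properties"]]
```

Called with the metadata values shown earlier, type `Q3354859` and limit `3`, the first helper yields the example URL given above.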
|
||||
|
||||
### Data extension protocol
|
||||
|
||||
After calling the property proposal endpoint, the consumer (OpenRefine) calls the service endpoint with a JSON object in the `extend` parameter, containing the following fields:
|
||||
* `ids` is a list of strings, each of which is an identifier of a record as returned by the reconciliation method. These are the records whose properties should be retrieved.
|
||||
* `properties` is a list of JSON objects. They specify the properties to be fetched for each item, and contain the following fields:
|
||||
* `id` (a string): the identifier of the property as returned by the property suggest service (and optionally the property proposal service)
|
||||
* `settings`: a JSON object storing parameters about how the property should be fetched (see below).
|
||||
|
||||
Example:
|
||||
```json
|
||||
{
|
||||
"ids": [
|
||||
"Q7205598",
|
||||
"Q218765",
|
||||
"Q845632",
|
||||
"Q5661356"
|
||||
],
|
||||
"properties": [
|
||||
{
|
||||
"id": "P856"
|
||||
},
|
||||
{
|
||||
"id": "P159"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
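A client could assemble this object programmatically. The following is a minimal sketch, assuming the `extend` parameter is sent form-encoded (as the `queries` parameter is for reconciliation); the helper names are hypothetical and not part of the specification.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_extend_body(ids, property_ids, settings=None):
    """Serialize a data extension query into a form-encoded request body."""
    settings = settings or {}
    properties = []
    for pid in property_ids:
        prop = {"id": pid}
        # "settings" is optional and only attached when the caller provides it.
        if pid in settings:
            prop["settings"] = settings[pid]
        properties.append(prop)
    payload = {"ids": list(ids), "properties": properties}
    return urlencode({"extend": json.dumps(payload)})


def parse_extend_body(body):
    """Server-side counterpart: recover the JSON object from the body."""
    return json.loads(parse_qs(body)["extend"][0])
```

Round-tripping a body through `parse_extend_body` returns the same `ids` and `properties` structure as in the example above.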
|
||||
The service returns a JSON response formatted as follows:
|
||||
|
||||
* `meta` contains a list of column metadata. The order of the properties must be the same
|
||||
as the one provided in the query. Each element is an object containing the following keys:
|
||||
* `id` (mandatory): the identifier of the property
|
||||
* `name` (mandatory): the human-readable name of the property
|
||||
* `type` (optional): an object with `id` and `name` keys representing the expected
|
||||
type of values for that property. The notion of type is the same as the one
|
||||
used for reconciliation. The `type` field should only be provided when the property returns
|
||||
reconciled items.
|
||||
* `rows` contains an object. Its keys must be exactly the record ids (`ids`) passed in the query.
|
||||
The value for each record id is an object representing a row for that id. The keys of a row object must be exactly the property ids passed in the query (`"P856"` and `"P159"` in the example above). The value for a property id should be a list of cell objects.
|
||||
|
||||
Cell objects are JSON objects which contain the representation of an OpenRefine cell.
|
||||
* an object with a single `"str"` key and a string value for it represents
|
||||
a cell with a (bare) string in it.
|
||||
Example: `{"str": "193.54.0.0/15"}`
|
||||
|
||||
* an object with `"id"` and `"name"` represents a reconciled value
|
||||
(from the same reconciliation service). It will be stored as
|
||||
a matched cell (with maximum reconciliation score).
|
||||
Example: `{"name": "Warsaw","id": "Q270"}`
|
||||
|
||||
* an empty object `{}` represents an empty cell
|
||||
|
||||
* an object with `"date"` and an ISO-formatted date string represents a point in time.
|
||||
Example: `{"date": "1987-02-01T00:00:00+00:00"}`
|
||||
|
||||
* an object with `"float"` and a numerical value represents a quantity.
|
||||
Example: `{"float": 48.2736}`
|
||||
|
||||
* an object with `"int"` and an integer represents a number.
|
||||
Example: `{"int": 54}`
|
||||
|
||||
* an object with `"bool"` and `true` or `false` represents a boolean.
|
||||
Example: `{"bool": false}`
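The cell object variants listed above can be decoded with a simple dispatch on the keys present. This is only a sketch: the function name and the choice of Python return types (a tuple for reconciled values, `None` for empty cells) are assumptions for illustration, not how OpenRefine itself stores cells.

```python
from datetime import datetime


def cell_to_value(cell):
    """Convert a cell object from a data extension response to a Python value."""
    if not cell:
        return None  # {} represents an empty cell
    if "id" in cell and "name" in cell:
        # Reconciled value from the same service: keep both id and name.
        return (cell["id"], cell["name"])
    if "str" in cell:
        return cell["str"]
    if "date" in cell:
        # Handles offsets such as "+00:00"; other ISO variants may need
        # extra handling depending on the Python version.
        return datetime.fromisoformat(cell["date"])
    for key in ("float", "int", "bool"):
        if key in cell:
            return cell[key]
    raise ValueError(f"unrecognized cell object: {cell!r}")
```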
|
||||
|
||||
Example of a full response (for the example query above):
|
||||
```json
|
||||
{
|
||||
"rows": {
|
||||
"Q5661356": {
|
||||
"P159": [],
|
||||
"P856": []
|
||||
},
|
||||
"Q7205598": {
|
||||
"P159": [
|
||||
{
|
||||
"name": "Warsaw",
|
||||
"id": "Q270"
|
||||
}
|
||||
],
|
||||
"P856": [
|
||||
{
|
||||
"str": "http://www.polkomtel.com.pl/english"
|
||||
},
|
||||
{
|
||||
"str": "http://www.polkomtel.com.pl/"
|
||||
}
|
||||
]
|
||||
},
|
||||
"Q845632": {
|
||||
"P159": [
|
||||
{
|
||||
"name": "Bærum",
|
||||
"id": "Q57076"
|
||||
}
|
||||
],
|
||||
"P856": [
|
||||
{
|
||||
"str": "http://www.telenor.com/"
|
||||
}
|
||||
]
|
||||
},
|
||||
"Q218765": {
|
||||
"P159": [
|
||||
{
|
||||
"name": "Paris",
|
||||
"id": "Q90"
|
||||
}
|
||||
],
|
||||
"P856": [
|
||||
{
|
||||
"str": "http://www.sfr.fr/"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"meta": [
|
||||
{
"id": "P856",
"name": "official website"
},
{
"id": "P159",
"name": "headquarters location",
"type": {
"id": "Q7540126",
"name": "headquarters"
}
}
|
||||
]
|
||||
}
|
||||
```
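The structural constraints on the response (row keys equal to the requested ids, property keys equal to the requested properties, `meta` in query order) lend themselves to an automated check. The sketch below is one possible way to test a service implementation; the function name and the wording of the messages are hypothetical.

```python
def check_extend_response(query, response):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    wanted_ids = set(query["ids"])
    wanted_props = [prop["id"] for prop in query["properties"]]

    rows = response.get("rows", {})
    if set(rows) != wanted_ids:
        problems.append("row keys differ from the requested ids")
    for record_id, row in rows.items():
        if set(row) != set(wanted_props):
            problems.append(f"row {record_id} has unexpected property keys")
        for pid, cells in row.items():
            if not isinstance(cells, list):
                problems.append(f"row {record_id}, property {pid}: expected a list of cells")

    # The column metadata must follow the order of the query's properties.
    meta_ids = [column["id"] for column in response.get("meta", [])]
    if meta_ids != wanted_props:
        problems.append("meta is not in the same order as the requested properties")
    return problems
```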
|
||||
### Settings specification
|
||||
|
||||
The `property_settings` field in the service metadata allows the service to declare it accepts some settings for the properties it fetches. They are specified as a list of JSON objects which define the fields which should be exposed to the user.
|
||||
|
||||
Each setting object looks like this:
|
||||
```json
|
||||
{
|
||||
"default": 0,
|
||||
"type": "number",
|
||||
"label": "Limit",
|
||||
"name": "limit",
|
||||
"help_text": "Maximum number of values to return per row (0 for no limit)"
|
||||
}
|
||||
```
|
||||
It is essentially a definition of a form field in JSON, with self-explanatory fields.
|
||||
The `type` field specifies the type of the form field (among `number`, `select`, `text`, `checkbox`).
|
||||
The field `default` gives the default value of the form: the service must assume this value if the
|
||||
client does not specify this setting.
|
||||
|
||||
For the `select` field, an additional `choices` field defines the possible choices, with both labels and values:
|
||||
```json
|
||||
{
|
||||
"default": "any",
|
||||
"label": "References",
|
||||
"name": "references",
|
||||
"type": "select",
|
||||
"choices": [
|
||||
{
|
||||
"value": "any",
|
||||
"name": "Any statement"
|
||||
},
|
||||
{
|
||||
"value": "referenced",
|
||||
"name": "At least one reference"
|
||||
},
|
||||
{
|
||||
"value": "no_wiki",
|
||||
"name": "At least one non-wiki reference"
|
||||
}
|
||||
],
|
||||
"help_text": "Filter statements by their references"
|
||||
}
|
||||
```
|
||||
When querying the service for rows, the client can pass an optional `settings` object in each of the requested columns:
|
||||
```json
|
||||
{
"id": "P342",
"settings": {
"limit": "20",
"references": "referenced"
}
}
|
||||
```
|
||||
Each key of the settings object must correspond to one form field proposed by the service. The value of that key is the value of the form field represented as a string (for uniformity and consistency with JSON form serialization).
|
||||
The settings are intended to modify the results returned by the service. Their semantics are up to the service, since the service itself defines which settings it accepts.
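On the service side, the rule that a missing setting falls back to its declared `default` can be sketched as follows. The function name is hypothetical, and rejecting unknown keys with an exception is one possible reading of the requirement that each key correspond to a declared form field.

```python
def resolve_settings(declared, received):
    """Merge settings sent by the client with the defaults the service declares.

    `declared` is the `property_settings` list from the service metadata and
    `received` is the optional `settings` object of one requested property.
    """
    known = {field["name"]: field for field in declared}
    unknown = set(received) - set(known)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    resolved = {}
    for name, field in known.items():
        # Values sent by the client arrive as strings; defaults keep the
        # type used in the service metadata.
        resolved[name] = received.get(name, field.get("default"))
    return resolved
```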
|
@ -6,6 +6,10 @@ sidebar_label: OpenRefine API
|
||||
|
||||
This is a generic API reference for interacting with OpenRefine's HTTP API.
|
||||
|
||||
**NOTE:** This protocol is subject to change without warning at any time (and has in the past) and is not versioned. Use at your own risk!
|
||||
|
||||
For OpenRefine 3.3 and later, all POST requests need to include a CSRF token as described here: https://github.com/OpenRefine/OpenRefine/wiki/Changes-for-3.3#csrf-protection-changes
|
||||
|
||||
## Create project:
|
||||
|
||||
> **Command:** _POST /command/core/create-project-from-upload_
|
||||
|
@ -4,17 +4,14 @@ title: Reconciliation API
|
||||
sidebar_label: Reconciliation API
|
||||
---
|
||||
|
||||
_This page is kept for the record. [A cleaner version of this specification](https://reconciliation-api.github.io/specs/0.1/) was written by the [W3C Entity Reconciliation Community Group](https://www.w3.org/community/reconciliation/), which has been formed to improve and promote this API. Join the community group to get involved!_
|
||||
|
||||
_This is a technical description of the mechanisms behind the reconciliation system in OpenRefine. For usage instructions, see [Reconciliation](/manual/reconciling)._
|
||||
|
||||
## Introduction
|
||||
|
||||
A reconciliation service is a web service that, given some text which is a name or label for something, and optionally some additional details, returns a ranked list of potential entities matching the criteria. The candidate text does not have to match each entity's official name perfectly; that is the whole point of reconciliation: to get from an ambiguous text name to precisely identified entities. For instance, given the text "apple", a reconciliation service probably should return the fruit apple, the Apple Inc. company, and New York City (also known as the Big Apple).
|
||||
|
||||
Entities are identified by strong identifiers in some particular identifier space. In the same identifier space, identifiers follow the same syntax. For example, given the string "apple", a reconciliation service might return entities identified by the strings "[Q89](https://www.wikidata.org/wiki/Q89)", "[Q312](https://www.wikidata.org/wiki/Q312)", and "[Q60](https://www.wikidata.org/wiki/Q60)", in the Wikidata ID space. Each reconciliation service can only reconcile to one single identifier space, but several reconciliation services can reconcile to the same identifier space.
|
||||
|
||||
OpenRefine defines a reconciliation API so that users can use the reconciliation features of OpenRefine with various databases (this API was originally developed to work with the now deprecated "[Freebase](https://en.wikipedia.org/wiki/Freebase)" API).
|
||||
OpenRefine can connect to any reconciliation service which follows the [reconciliation API v0.1](https://reconciliation-api.github.io/specs/0.1/). This was formerly a specification edited by the OpenRefine project, which has now transitioned to its own
|
||||
[W3C Entity Reconciliation Community Group](https://www.w3.org/community/reconciliation/).
|
||||
|
||||
Informally, the main function of any reconciliation service is to find good candidates in the underlying database, given the following data:
|
||||
|
||||
@ -22,238 +19,6 @@ Informally, the main function of any reconciliation service is to find good cand
|
||||
* Optionally, a type which can be used to narrow down the search to entities of this type. OpenRefine does not define a particular set of acceptable types: this choice is left to the reconciliation service (see the suggest API for that).
|
||||
* Optionally, a list of properties and their values, which can be used to refine the search. For instance, when reconciling a database of books, the author name or the publication date are useful bits of information that can be transferred to the reconciliation service. This information will be sent to the reconciliation service if the user binds columns to properties. Again, the notion of property is not predefined in OpenRefine: its definition depends on the reconciliation service.
|
||||
|
||||
A standard reconciliation service is an HTTP-based RESTful JSON-formatted API. It consists of various endpoints, each of which fulfills a specific function. Only the first one is mandatory.
|
||||
See [the specifications of the protocol](https://reconciliation-api.github.io/specs/0.1) for more details about the protocol. You can suggest changes on its [issues tracker](https://github.com/reconciliation-api/specs/issues) or on the [group mailing list](https://lists.w3.org/Archives/Public/public-reconciliation/).
|
||||
|
||||
* The root URL. This is the URL that users will need to add the service to OpenRefine. For instance, the Wikidata reconciliation interface in English has the following URL: [https://wikidata.reconci.link/en/api](https://wikidata.reconci.link/en/api)
|
||||
* _Optional._ The suggest API, which enables autocompletion at various places in OpenRefine;
|
||||
* _Optional._ The preview API, which lets users preview the reconciled items directly from OpenRefine;
|
||||
* _Optional._ The data extension API, which lets users add columns from reconciled values based on the properties of the items in the reconciliation service.
|
||||
|
||||
The specification of each of these endpoints is given in the following sections.
|
||||
|
||||
## Workflow overview
|
||||
|
||||
OpenRefine communicates with reconciliation services in the following way.
|
||||
|
||||
* The user adds the service by inputting its endpoint in the dialog. OpenRefine queries this URL to retrieve its [service and metadata](reconciliation-api#service-metadata) which contains basic metadata about the service (such as its name and the available features).
|
||||
* The user selects a service to configure reconciliation using this service. OpenRefine queries the reconciliation service in [batch mode](reconciliation-api#multiple-query-mode) on the first ten items of the column to be reconciled. The reconciliation results are not presented to the user, but the types of the candidate items are aggregated and proposed to the user to restrict the matching to one of them. For example, if your service reconciles both people and companies, and the query for the first 10 names returns 15 candidates which are people and 7 candidates which are companies, the types will be presented to the user in that order. They can override the order and pick whichever type they want, as well as choose a different type by hand or choose to reconcile without any type information.
|
||||
* The user configures the reconciliation. If a [suggest service](reconciliation-api#suggest-apis) is available, it will be used to provide auto-completion in the dialog, to choose types or properties.
|
||||
* When reconciliation starts, OpenRefine queries the service in [batch mode](reconciliation-api#multiple-query-mode) for small batches of rows and stores the responses of the service.
|
||||
* Once reconciliation is complete, the results are displayed. The user makes reconciliation decisions based on the choices provided. If a [suggest service](reconciliation-api#suggest-apis) is available, it will be used to input custom reconciliation decisions. If a [preview service](reconciliation-api#preview-api) is available, the user will be able to preview the reconciliation candidates without leaving OpenRefine.
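The type aggregation step in this workflow can be pictured as a frequency count over the candidates returned for the sample rows. The sketch below is an illustration only, not OpenRefine's actual implementation; it assumes each candidate carries a `type` list of strings, as in the response format described on this page, and the function name is hypothetical.

```python
from collections import Counter


def rank_types(batch_response):
    """Order candidate types by how often they occur in a batch response."""
    counts = Counter()
    for answer in batch_response.values():
        for candidate in answer.get("result", []):
            for type_id in candidate.get("type", []):
                counts[type_id] += 1
    # most_common() sorts by decreasing count, so frequent types come first.
    return [type_id for type_id, _ in counts.most_common()]
```

With 15 candidates typed as people and 7 typed as companies, the people type would come first, matching the example in the list above.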
|
||||
|
||||
## Main reconciliation service
|
||||
|
||||
The root URL has two functions:
|
||||
|
||||
* it returns the [service and metadata](reconciliation-api#service-metadata) if no query is provided.
|
||||
* when given the `queries` parameter, it performs a [set of queries in batch mode](reconciliation-api#multiple-query-mode) and returns the results for each query. This makes it efficient
|
||||
|
||||
There is a deprecated "single query" mode which is used if the `query` parameter is given. This mode is no longer supported or used by OpenRefine and other API consumers should not rely on it.
|
||||
|
||||
### Service metadata
|
||||
|
||||
When a service is called with just a JSONP `callback` parameter and no other parameters, it must return its _service metadata_ as a JSON object literal with the following fields:
|
||||
|
||||
* `"name"`: the name of the service, which will be used to display the service in the reconciliation menu;
|
||||
* `"identifierSpace"`: a URI for the type of identifiers returned by the service;
* `"schemaSpace"`: a URI for the type of types understood by the service;
* `"view"`: an object with a template URL to view a given item from its identifier: `"view": {"url":"http://example.com/object/{{id}}"}`.
|
||||
|
||||
The `identifierSpace` and `schemaSpace` fields are mainly useful to assert that the identifiers returned by two different reconciliation services mean the same thing. Other fields are optional: they are used to specify the URLs for the other endpoints (suggest, preview and extend) described in the next sections.
|
||||
|
||||
Here are two live examples:
|
||||
|
||||
1. [https://wikidata.reconci.link/en/api](https://wikidata.reconci.link/en/api)
|
||||
2. [http://refine.codefork.com/reconcile/viaf](http://refine.codefork.com/reconcile/viaf)
|
||||
|
||||
```json
|
||||
{
|
||||
"name" : "Wikidata Reconciliation for OpenRefine (en)",
|
||||
"identifierSpace" : "http://www.wikidata.org/entity/",
|
||||
"schemaSpace" : "http://www.wikidata.org/prop/direct/",
|
||||
"view" : {
|
||||
"url" : "https://www.wikidata.org/wiki/{{id}}"
|
||||
},
|
||||
"defaultTypes" : [],
|
||||
"preview" : {
|
||||
...
|
||||
},
|
||||
"suggest" : {
|
||||
...
|
||||
},
|
||||
"extend" : {
|
||||
...
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
## Query Request
|
||||
|
||||
### Multiple Query Mode
|
||||
|
||||
A call to a standard reconciliation service API for multiple queries looks like this:
|
||||
|
||||
http://foo.com/bar/reconcile?queries={...json object literal...}
|
||||
|
||||
The JSON object literal has zero or more key/value pairs with arbitrary keys, where each value is in the same format as a single query, e.g.
|
||||
|
||||
http://foo.com/bar/reconcile?queries={ "q0" : { "query" : "foo" }, "q1" : { "query" : "bar" } }
|
||||
|
||||
"q0" and "q1" can be arbitrary strings. They will be used to key the results returned.
|
||||
|
||||
For larger data, it can make sense to use POST requests instead of GET:
|
||||
|
||||
```shell
|
||||
curl -X POST -d 'queries={ "q0" : { "query" : "foo" }, "q1" : { "query" : "bar" } }' http://foo.com/bar/reconcile
|
||||
```
|
||||
|
||||
OpenRefine uses POST for all requests, so make sure your service supports the format above.
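For reference, the same POST request can be prepared from Python's standard library. This is a minimal sketch using the placeholder endpoint from the examples above; the helper names are hypothetical, and the request object is only constructed here, not sent.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_batch_request(endpoint, queries):
    """Prepare a form-encoded POST request carrying the `queries` parameter."""
    body = urlencode({"queries": json.dumps(queries)}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(endpoint, data=body, headers=headers, method="POST")


def decode_batch_body(body):
    """Server-side counterpart: recover the queries object from the raw body."""
    return json.loads(parse_qs(body.decode("utf-8"))["queries"][0])


request = build_batch_request(
    "http://foo.com/bar/reconcile",
    {"q0": {"query": "foo"}, "q1": {"query": "bar"}},
)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out because the endpoint above is only a placeholder.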
|
||||
|
||||
### **DEPRECATED** Single Query Mode
|
||||
|
||||
A call to a reconciliation service API for a single query looks like either of these:
|
||||
|
||||
http://foo.com/bar/reconcile?query=...string...
|
||||
http://foo.com/bar/reconcile?query={...json object literal...}
|
||||
|
||||
If the query parameter is a string, then it's an abbreviation of `query={"query":...string...}`. Here are two live examples:
|
||||
|
||||
1. [https://wikidata.reconci.link/en/api?query=boston](https://wikidata.reconci.link/en/api?query=boston)
|
||||
2. [https://wikidata.reconci.link/en/api?query={%22query%22:%22boston%22,%22type%22:%22Q515%22}](https://wikidata.reconci.link/en/api?query={%22query%22:%22boston%22,%22type%22:%22Q515%22})
|
||||
|
||||
### Query JSON Object
|
||||
|
||||
The query JSON object literal has the following fields:
|
||||
|
||||
| Parameter | Description |
|
||||
| --- | --- |
|
||||
| "query" | A string to search for. Required. |
|
||||
| "limit" | An integer to specify how many results to return. Optional. |
|
||||
| "type" | A single string, or an array of strings, specifying the types of result e.g., person, product, ... The actual format of each type depends on the service (e.g., "Q515" as a Wikidata type). Optional. |
|
||||
| "type\_strict" | A string, one of "any", "all", "should". Optional. |
|
||||
| "properties" | Array of JSON object literals. Optional. |
|
||||
|
||||
Each JSON object literal of the `"properties"` array is of this form:
|
||||
|
||||
```json
|
||||
{
|
||||
"p" : string, property name, e.g., "country", or
|
||||
"pid" : string, property ID, e.g., "P17" as a Wikidata property ID
|
||||
"v" : a single, or an array of, string or number or object literal, e.g., "Japan"
|
||||
}
|
||||
```
|
||||
|
||||
A `"v"` object literal would have a single key `"id"` whose value is an identifier resolved previously to the same identifier space.
|
||||
|
||||
Here is an example of a full query parameter:
|
||||
|
||||
```json
|
||||
{
|
||||
"query" : "Ford Taurus",
|
||||
"limit" : 3,
|
||||
"type" : "Q3231690",
|
||||
"type_strict" : "any",
|
||||
"properties" : [
|
||||
{ "pid" : "P571", "v" : 2009 },
|
||||
{ "pid" : "P176" , "v" : { "id" : "Q20827633" } }
|
||||
]
|
||||
}
|
||||
```
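A service may want to check incoming query objects against the field table above before processing them. The following sketch shows one way to do so; the function name and error messages are hypothetical, and the checks cover only the constraints stated on this page.

```python
def validate_query(query):
    """Return a list of problems with a single query object (empty if none)."""
    errors = []
    if not isinstance(query.get("query"), str):
        errors.append('"query" is required and must be a string')
    if "limit" in query and not isinstance(query["limit"], int):
        errors.append('"limit" must be an integer')
    if "type" in query and not isinstance(query["type"], (str, list)):
        errors.append('"type" must be a string or an array of strings')
    if query.get("type_strict", "any") not in ("any", "all", "should"):
        errors.append('"type_strict" must be one of "any", "all", "should"')
    for index, prop in enumerate(query.get("properties", [])):
        # Each property is identified either by name ("p") or by id ("pid").
        if "p" not in prop and "pid" not in prop:
            errors.append(f'property {index} needs a "p" or "pid" key')
        if "v" not in prop:
            errors.append(f'property {index} needs a "v" key')
    return errors
```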
|
||||
|
||||
## Query Response
|
||||
For multiple queries, the response is a JSON object literal with the same keys as in the request:
|
||||
|
||||
```json
|
||||
{
"q0" : {
"result" : [ ... ]
},
"q1" : {
"result" : [ ... ]
}
}
|
||||
```
|
||||
|
||||
Each result consists of a JSON object literal with the following structure:
|
||||
|
||||
```json
|
||||
{
|
||||
"result" : [
|
||||
{
|
||||
"id" : ... string, database ID ...
|
||||
"name" : ... string ...
|
||||
"type" : ... array of strings ...
|
||||
"score" : ... double ...
|
||||
"match" : ... boolean, true if the service is quite confident about the match ...
|
||||
},
|
||||
... more results ...
|
||||
],
|
||||
... potentially some useful envelope data, such as timing stats ...
|
||||
}
|
||||
```
|
||||
|
||||
The results should be sorted by decreasing score.
|
||||
The service must also support JSONP through a `callback` parameter, i.e. `&callback=foo`.
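To make the response handling concrete, the sketch below shows a client-side helper that picks, for each query key, the highest-scoring candidate flagged with `match`, and a server-side helper that wraps a payload for JSONP. Both function names are hypothetical, and selecting only candidates with `match` set to true is an assumption about how a consumer might auto-match, not a rule stated by the protocol.

```python
import json


def best_matches(batch_response):
    """Map each query key to the id of its best confident candidate, or None."""
    best = {}
    for key, answer in batch_response.items():
        # Sort defensively, even though services should already sort by score.
        candidates = sorted(
            answer.get("result", []), key=lambda c: c["score"], reverse=True
        )
        confident = [c for c in candidates if c.get("match")]
        best[key] = confident[0]["id"] if confident else None
    return best


def wrap_jsonp(payload, callback=None):
    """Serialize a payload as JSON, wrapped in a JSONP callback if one is given."""
    body = json.dumps(payload)
    return f"{callback}({body});" if callback else body
```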
|
||||
|
||||
## Preview API
|
||||
|
||||
The preview service API (complementary to the reconciliation service API) is quite simple. Pass it an identifier and it renders information about the corresponding entity in an HTML page, which will be shown in an iframe inside OpenRefine. The given width and height dimensions tell OpenRefine how to size that iframe.
|
||||
|
||||
## Suggest APIs
|
||||
|
||||
In the "Start Reconciling" dialog box in OpenRefine, you can specify which type of entities the column in question contains. For instance, the column might contain titles of scientific journals. But you don't know the identifier corresponding to the "scientific journal" type. So we need a suggest API that translates "scientific journal" to something like, say, "[Q5633421](https://www.wikidata.org/wiki/Q5633421)" if we're reconciling against Wikidata.
|
||||
|
||||
In the same dialog box, you can specify that other columns should be used to provide more details for the reconciliation. For instance, if there is a column specifying the journal's ISSN (a standard identifier for serial publications), passing that data on to the reconciliation service might make reconciliation more accurate. You might want to specify how that second column is related to the column being reconciled, but you might not know how to specify "ISSN" as a precise relationship. So we need a suggest API that translates "ISSN" to something like "[P236](https://www.wikidata.org/wiki/Property:P236)" (once again using Wikidata as an example).
|
||||
|
||||
There is also a need for a suggest service for entities rather than just for types and properties. When a cell has no good candidate, then you would want to perform a search yourself (by clicking on "search for match" in that cell).
|
||||
|
||||
Each suggest API has 2 jobs to do:
|
||||
|
||||
* translate what the user types into a ranked list of entities (and this is similar to the core reconciliation service and might share the same implementation)
|
||||
* render a flyout when an entity is moused over or highlighted using arrow keys (and this is similar to the preview API and might share the same implementation)
|
||||
|
||||
The metadata for each suggest API (type, property, or entity) is as follows:
|
||||
|
||||
```json
|
||||
{
|
||||
"service_url" : "... url including only the domain ...",
|
||||
"service_path" : "... optional relative path ...",
|
||||
"flyout_service_url" : "... optional url including only the domain ...",
|
||||
"flyout_service_path" : "... optional relative path ..."
|
||||
}
|
||||
```
|
||||
|
||||
The `service_url` field is required and it should look like this: `http://foo.com`. There should be no trailing `/` at the end. The other fields are optional and have defaults if not provided:
|
||||
|
||||
* `service_path` defaults to `/private/suggest`
|
||||
* `flyout_service_url` defaults to the provided `service_url` field
|
||||
* `flyout_service_path` defaults to `/private/flyout`
|
||||
|
||||
Refer to [the Suggest API documentation](suggest-api) for further details.
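The default rules above translate directly into code. Here is a minimal sketch of how a client might resolve the two endpoint URLs from one suggest metadata object; the function name is hypothetical.

```python
def resolve_suggest_urls(metadata):
    """Return (suggest_url, flyout_url) for one suggest metadata object."""
    service_url = metadata["service_url"]  # required, no trailing "/"
    service_path = metadata.get("service_path", "/private/suggest")
    flyout_url = metadata.get("flyout_service_url", service_url)
    flyout_path = metadata.get("flyout_service_path", "/private/flyout")
    return service_url + service_path, flyout_url + flyout_path
```

With only `service_url` set to `http://foo.com`, both defaults apply and the two URLs share that domain.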
|
||||
|
||||
## Data Extension
|
||||
|
||||
From OpenRefine 2.8 it is possible to fetch values from reconciled sources natively. This is only possible for the reconciliation endpoints that support this additional feature, described in the [Data Extension API documentation](Data-Extension-API).
|
||||
|
||||
## Examples
|
||||
|
||||
We've cloned a number of the Refine reconciliation services as a way of providing them visibility. They can be found at [https://github.com/OpenRefine](https://github.com/OpenRefine)
|
||||
|
||||
Some examples of reconciliation services which have made code available include:
|
||||
|
||||
* [https://github.com/dergachev/redmine-reconcile](https://github.com/dergachev/redmine-reconcile) - Python & Flask implementation that just returns the given name/number with a base url prepended
|
||||
* [https://github.com/okfn/helmut](https://github.com/okfn/helmut) - A generic Refine reconciliation API implementation using Python & Flask
|
||||
* [https://github.com/mblwhoi/reconciliation_service_skeleton](https://github.com/mblwhoi/reconciliation_service_skeleton) - Skeleton for Standalone Python & Flask Reconciliation Service for Refine
|
||||
* [https://github.com/mikejs/reconcile-demo](https://github.com/mikejs/reconcile-demo)
|
||||
* [https://github.com/rdmpage/phyloinformatics/tree/master/services](https://github.com/rdmpage/phyloinformatics/tree/master/services) - PHP examples (reconciliation\_\*.php)
|
||||
* [https://github.com/opensemanticsearch/open-semantic-entity-search-api](https://github.com/opensemanticsearch/open-semantic-entity-search-api) - Open Source REST-API for Named Entity Extraction, Normalization, Reconciliation, Recommendation, Named Entity Disambiguation and Named Entity Linking of named entities in full-text documents by SKOS thesaurus, RDF ontologies, SQL databases and lists of names (powered by [Apache Solr](http://lucene.apache.org/solr/) and Python Django)
|
||||
* [https://github.com/granoproject/grano-reconcile](https://github.com/granoproject/grano-reconcile) - python example
|
||||
* [https://github.com/codeforkjeff/conciliator](https://github.com/codeforkjeff/conciliator) - a Java framework for creating reconciliation services over the top of existing data sources. The code includes reconciliation services layered over [the Virtual International Authority File (VIAF)](http://viaf.org), [ORCID](http://orcid.org), [the Open Library](http://openlibrary.org) and [Apache Solr](http://lucene.apache.org/solr/).
|
||||
* The open-reconcile project provides a complete Java based reconciliation service which queries a SQL database. [https://code.google.com/p/open-reconcile](https://code.google.com/p/open-reconcile)
|
||||
* The [RDF Extension](http://refine.deri.ie) incorporates, among other things, reconciliation support with different approaches:
|
||||
* a service that reconciles by querying a SPARQL endpoint
|
||||
* reconcile against a provided RDF file
|
||||
* based on Apache Stanbol ([implementation details](https://github.com/fadmaa/grefine-rdf-extension/pull/59))
|
||||
* [Sunlight Labs](https://github.com/sunlightlabs) implemented a reconciliation service using Piston on Django for their [Influence Explorer](https://sunlightlabs.github.io/datacommons/). [The code is available](https://github.com/sunlightlabs/datacommons/blob/master/dcapi/reconcile/handlers.py)
|
||||
|
||||
Also look at the [[Reconcilable Data Sources]] page for other examples of available reconciliation services that are compatible with Refine. Not all of them are open source, but they might spark some ideas.
|
||||
|
@ -1,101 +0,0 @@
|
||||
---
|
||||
id: suggest-api
|
||||
title: Suggest API
|
||||
sidebar_label: Suggest API
|
||||
---
|
||||
|
||||
The Suggest API has 2 entry points:
|
||||
|
||||
- `suggest`: translates some text that the user has typed, optionally constrained in some ways, to a ranked list of entities (this is very similar to a [Reconciliation API](reconciliation-api) and can share the same implementation)
|
||||
- `flyout` : renders a small view of an entity
|
||||
|
||||
For the `suggest` entry point, it is important to balance speed versus accuracy. The widget must respond in interactive time, meaning about 200 msec, for each change to the user's text input. At the same time, the ranked list of entities must seem quite relevant to what the user types.
|
||||
|
||||
Similarly, for the `flyout` entry point, it is important to respond quickly while providing enough essential details so that the user can visually check if the highlighted entity is the desired one. You probably would want to embed a thumbnail image, as we have found that images are excellent for visual identification.
|
||||
|
||||
## suggest Entry Point
|
||||
|
||||
The `suggest` entry point takes the following URL parameters:
|
||||
|
||||
Parameter | Description | Required/Optional
----------|-------------|------------------
"prefix" | a string the user has typed | required
"type" | a single string, or an array of strings, specifying the types of result, e.g. person, product, ... The actual format of each type depends on the service | optional
"type\_strict" | a string, one of "any", "all", "should" | optional
"limit" | an integer specifying how many results to return | optional
"start" | an integer specifying the first result to return (thus, in conjunction with `limit`, supporting pagination) | optional
|
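As a rough illustration of how a client might assemble a `suggest` request from the parameters listed above, here is a minimal Python sketch. The function name `build_suggest_url` and its argument names are hypothetical, and it only handles a single `type` string, since the encoding of an array of types is left to the service.

```python
from urllib.parse import urlencode

# Values allowed for "type_strict" according to the parameter table.
TYPE_STRICT_VALUES = {"any", "all", "should"}


def build_suggest_url(endpoint, prefix, type_=None, type_strict=None,
                      limit=None, start=None):
    """Return a suggest URL; only "prefix" is required (hypothetical helper)."""
    if type_strict is not None and type_strict not in TYPE_STRICT_VALUES:
        raise ValueError("type_strict must be one of any, all, should")
    params = {"prefix": prefix}
    optional = {"type": type_, "type_strict": type_strict,
                "limit": limit, "start": start}
    # Leave out every optional parameter the caller did not set.
    params.update({k: v for k, v in optional.items() if v is not None})
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    base = "https://wikidata.reconci.link/en/suggest/entity"
    print(build_suggest_url(base, "A5"))
    print(build_suggest_url(base, "A5", limit=10, start=10))
```

Passing `limit` together with `start` is how a client would request successive pages of results.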
||||
|
||||
The Suggest API should return results as JSON (JSONP must also be supported). The JSON should consist of a 'result' array containing objects with at least a 'name' and 'id'. The JSON response can optionally include other information as illustrated in this structure for a full JSON response:
|
||||
```json
|
||||
{
|
||||
"code" : "/api/status/ok",
|
||||
"status" : "200 OK",
|
||||
"prefix" : ... string, the prefix URL parameter echoed back ...
|
||||
"result" : [
|
||||
{
|
||||
"id" : ... string, identifier of entity ...
|
||||
"name" : ... string, name of entity ...
|
||||
"notable" : [{
|
||||
"id" : ... string, identifier of type ...
|
||||
"name" : ... string, name of type ...
|
||||
}, ...]
|
||||
},
|
||||
... more results ...
|
||||
],
|
||||
}
|
||||
```
|
||||
|
||||
* `code` (optional) error/success state of the API above the level of HTTP. Use "/api/status/ok" for a successful request and "/api/status/error" if there has been an error.
|
||||
* `status` (optional) should correspond to the HTTP response code.
|
||||
* `prefix` (optional) the query string submitted to the Suggest API.
|
||||
* `result` (required) array containing multiple results from the Suggest API consisting of at least an id and name.
|
||||
* `id` (required) the id of an entity being suggested by the Suggest API
|
||||
* `name` (required) a short string which labels or names the entity being suggested by the Suggest API
|
||||
* `description` (optional) a short description of the item, which will be displayed below the name
|
||||
* `notable` (optional) is a list of JSON objects that describes the types of the entity. They are rendered in addition to the entity's name to provide more disambiguation details and stored in the reconciliation data of the cells. This list can also be supplied as a list of type identifiers (such as `["Q5"]` instead of `[{"id":"Q5","name":"human"}]`).
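The response fields described above can be sketched on the server side as follows. This is only an illustration under stated assumptions: the `ENTITIES` list is a hypothetical in-memory stand-in for a real data source, the substring match is a placeholder for real ranking, and the `callback` parameter name for JSONP is assumed here to match the one used by the `flyout` entry point.

```python
import json

# Hypothetical in-memory data standing in for a real entity store.
ENTITIES = [
    {"id": "Q429719", "name": "A5", "description": "road"},
    {"id": "Q420764", "name": "Apple A5", "description": None},
    {"id": "Q788832", "name": "A5 autoroute", "description": "controlled-access highway"},
]


def suggest(prefix, limit=None, start=0, callback=None):
    """Build a suggest response body as a string (JSON, or JSONP if callback is set)."""
    needle = prefix.lower()
    # Placeholder matching: case-insensitive substring, in storage order.
    matches = [e for e in ENTITIES if needle in e["name"].lower()]
    end = None if limit is None else start + limit
    result = []
    for entity in matches[start:end]:
        item = {"id": entity["id"], "name": entity["name"]}  # required fields
        if entity.get("description"):
            item["description"] = entity["description"]      # optional
        if entity.get("types"):
            item["notable"] = entity["types"]                 # optional
        result.append(item)
    body = json.dumps({
        "code": "/api/status/ok",
        "status": "200 OK",
        "prefix": prefix,
        "result": result,
    })
    # JSONP: wrap the JSON in a call to the client-supplied function name.
    return f"{callback}({body});" if callback else body


if __name__ == "__main__":
    print(suggest("a5", limit=2))
```

A real service would replace the list scan with an indexed lookup so that each keystroke can be answered within the interactive time budget mentioned earlier.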
|
||||
|
||||
Here is an example of a minimal request and response using the Suggest API layered over [Wikidata](https://www.wikidata.org):
|
||||
|
||||
URL: https://wikidata.reconci.link/en/suggest/entity?prefix=A5
|
||||
JSON response:
|
||||
|
||||
```json
|
||||
{
|
||||
"result": [
|
||||
{
|
||||
"name": "A5",
|
||||
"description": "road",
|
||||
"id": "Q429719"
|
||||
},
|
||||
{
|
||||
"name": "Apple A5",
|
||||
"description": null,
|
||||
"id": "Q420764"
|
||||
},
|
||||
{
|
||||
"name": "A5 autoroute",
|
||||
"description": "controlled-access highway from Paris's Francilienne to the A31 near Beauchemin",
|
||||
"id": "Q788832"
|
||||
},
|
||||
...
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## flyout Entry Point
|
||||
|
||||
The `flyout` entry point takes a single URL parameter: `id`, which is the identifier of the entity to render, as a string. It also takes a `callback` parameter to support JSONP. It returns a JSON object literal with a single field: `html`, which is the rendered view of the given entity.
|
||||
|
||||
Here is an example of a minimal request and response using the Suggest API layered over [Wikidata](https://www.wikidata.org) with only the required fields in each case:
|
||||
|
||||
URL: https://wikidata.reconci.link/en/flyout/entity?id=Q786288
|
||||
JSON response:
|
||||
|
||||
```json
|
||||
{
|
||||
"html": "<p style=\"font-size: 0.8em; color: black;\">national road in Latvia</p>",
|
||||
"id": "Q786288"
|
||||
}
|
||||
```
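A `flyout` handler along these lines could produce such a response. This is a minimal sketch, not the implementation of any existing service: the `DESCRIPTIONS` lookup table is hypothetical, and the markup is deliberately plain.

```python
import html
import json

# Hypothetical lookup table standing in for a real entity store.
DESCRIPTIONS = {"Q786288": "national road in Latvia"}


def flyout(entity_id, callback=None):
    """Build a flyout response body for the given entity identifier."""
    description = DESCRIPTIONS.get(entity_id, "")
    # Escape the text so that stored data cannot inject markup into the widget.
    markup = "<p>" + html.escape(description) + "</p>"
    body = json.dumps({"id": entity_id, "html": markup})
    # The `callback` parameter enables JSONP, as described above.
    return f"{callback}({body});" if callback else body


if __name__ == "__main__":
    print(flyout("Q786288"))
    print(flyout("Q786288", callback="handleFlyout"))
```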
|
||||
|
||||
OpenRefine incorporates a set of `fbs-` CSS class names which can be used in the flyout HTML if desired to render the flyout information in a standard style.
|
@ -3,3 +3,49 @@ id: version-release-process
|
||||
title: How to do an OpenRefine version release
|
||||
sidebar_label: How to do an OpenRefine version release
|
||||
---
|
||||
|
||||
When releasing a new version of Refine, the following steps should be followed:
|
||||
|
||||
1. Make sure the `master` branch is stable and nothing has broken since the previous version. We need developers to stabilize the trunk and some volunteers to try out `master` for a few days.
|
||||
2. Change the version number in [RefineServlet.java](http://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/RefineServlet.java#L62) and in the POM files using `mvn versions:set -DnewVersion=2.6-beta -DgenerateBackupPoms=false`. Commit the changes.
|
||||
3. Compose the list of changes in the code and on the wiki. If the issues have been updated with the appropriate milestone, the Github issue tracker should be able to provide a good starting point for this.
|
||||
4. Set up build machine. This needs to be Mac OS X or Linux.
|
||||
5. Download Windows and Mac JREs to bundle them in the Windows and Mac packages from [AdoptOpenJDK](https://adoptopenjdk.net/). You only need the JREs, not the JDKs. Use the lowest version of Java supported (Java 8 currently). Configure the location of these JREs in the `settings.xml` file at the root of the repository. It is important to download recent versions of the JREs as this impacts which HTTPS certificates are accepted by the tool.
|
||||
6. Insert the production Google credentials in https://github.com/OpenRefine/OpenRefine/blob/bc540a880eceb88e54f85ca43eb54769de3bfa4f/extensions/gdata/src/com/google/refine/extension/gdata/GoogleAPIExtension.java#L36-L39 without committing the changes.
|
||||
7. [Build the release candidate kits using the shell script (not just Maven)](https://github.com/OpenRefine/OpenRefine/wiki/Building-OpenRefine-From-Source). This must be done on Mac OS X or Linux to be able to build all 3 kits. On Linux you will need to install the `genisoimage` program first.
|
||||
```shell
|
||||
./refine dist 2.6-beta.2
|
||||
```
|
||||
To build the Windows version with embedded JRE, use `mvn package -s settings.xml -P embedded-jre -DskipTests=true`.
|
||||
|
||||
8. On a Mac, compress the Mac `.dmg` (`genisoimage` does not compress it by default) with the following command: `hdiutil convert openrefine-uncompressed.dmg -format UDZO -imagekey zlib-level=9 -o openrefine-3.1-mac.dmg`. If running OS X in a VM, it's probably quicker and more reliable to transfer the kits to the host machine first and then to Github (Finder -> Go -> Connect -> smb://10.0.2.2/). You can then sign the generated DMG file with `codesign -s "Apple Distribution: Code for Science and Society, Inc." openrefine-3.1-mac.dmg`. This requires that you have installed the appropriate certificate on your Mac, see below.
|
||||
|
||||
9. Tag the release candidate in git and push the tag to Github. For example:
|
||||
```shell
|
||||
git tag -a -m "Second beta" 2.6-beta.2
|
||||
git push origin --tags
|
||||
```
|
||||
10. Upload the kits to Github releases at [https://github.com/OpenRefine/OpenRefine/releases/](https://github.com/OpenRefine/OpenRefine/releases/). Mention the SHA sums of all uploaded artifacts.
|
||||
11. Announce the beta/release candidate for testing.
|
||||
12. Repeat build/release candidate/testing cycle, if necessary.
|
||||
13. Tag the release in git. Build the distributions and upload them.
|
||||
14. [Update the OpenRefine Homebrew cask](https://github.com/OpenRefine/OpenRefine/wiki/Maintaining-OpenRefine's-Homebrew-Cask) or coordinate an update via the [developer list](https://groups.google.com/forum/#!forum/openrefine-dev)
|
||||
15. Verify that the correct versions are shown in the widget at [http://openrefine.org/download](http://openrefine.org/download)
|
||||
16. Announce on the [OpenRefine mailing list](https://groups.google.com/forum/#!forum/openrefine).
|
||||
17. Update the version in master to the next version number with `-SNAPSHOT` (such as `4.3-SNAPSHOT`)
|
||||
```shell
|
||||
mvn versions:set -DnewVersion=4.3-SNAPSHOT
|
||||
```
|
||||
18. If releasing a new major or minor version, create a snapshot of the docs, following [Docusaurus' versioning procedure](https://docusaurus.io/docs/versioning).
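
For the step that asks for the SHA sums of the uploaded artifacts, a small script like the one below can produce them. It is a sketch under assumptions: the steps do not say which SHA variant to publish, so SHA-256 is assumed here, and the `openrefine-*` file pattern and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_lines(directory, pattern="openrefine-*"):
    """List "<digest>  <file name>" lines for the matching files, sorted by path."""
    files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    return [f"{sha256_of(p)}  {p.name}" for p in files]


if __name__ == "__main__":
    # Assumption: the kits were written to the current directory.
    for line in checksum_lines("."):
        print(line)
```

The output lines can be pasted into the release notes next to the uploaded files.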
|
||||
|
||||
Apple code signing
|
||||
==================
|
||||
|
||||
We have code signing certificates for our macOS distributions. To use them, follow these steps:
|
||||
* Ask advisory.committee@openrefine.org to add you to the Apple team: you need to provide the email address that corresponds to your AppleID account;
|
||||
* Create a certificate signing request from your Mac: https://help.apple.com/developer-account/#/devbfa00fef7
|
||||
* Go to https://developer.apple.com/account/resources/certificates/add and select "Apple Distribution" as certificate type
|
||||
* Upload the certificate signing request in the form
|
||||
* Download the generated certificate
|
||||
* Import this certificate in the "Keychain Access" app on your mac
|
||||
* You can now sign code on behalf of the team using the `codesign` utility, such as `codesign -s "Apple Distribution: Code for Science and Society, Inc." openrefine-3.1-mac.dmg`.
|
||||
|
@ -37,8 +37,6 @@ module.exports = {
|
||||
'technical-reference/architecture',
|
||||
'technical-reference/openrefine-api',
|
||||
'technical-reference/reconciliation-api',
|
||||
'technical-reference/suggest-api',
|
||||
'technical-reference/data-extension-api',
|
||||
'technical-reference/contributing',
|
||||
'technical-reference/build-test-run',
|
||||
'technical-reference/development-roadmap',
|
||||
|