---
id: schema-alignment
title: Schema alignment
sidebar_label: Schema alignment
---

A Wikibase schema is a template of Wikidata edits that is applied
to each row in the project. This page describes how each part of this
template works, and how it generates edits depending on the contents of
the table cells.

## Items {#items}

An item in the schema represents a set of changes on a particular
Wikidata item, generated by a single row. This item can contain changes
in [terms](#terms) (labels, descriptions and aliases) or
[statements](#statements).

It is possible to make edits on different items for each row of your
table: just add multiple items in your schema. Each item has a subject.
The subject can be entered manually (when the item on which the edits
should be made is the same for all rows), or you can drop any reconciled
column in this field. In the latter case, the edits will depend on the
reconciliation status of each cell:

- If the cell is matched to an item, edits will be made on that item;
- If the cell is marked as corresponding to a new item, a new item
  will be created for it. See [New items](./new-entities) for more
  details about how this works;
- If the cell has reconciliation candidates but has not been matched
  to any of them, the edit will be skipped (even if there is only one
  candidate with a high reconciliation score);
- If the cell is not reconciled or blank, the edit will be skipped.

Do not worry about the ordering of items in the schema or the order of
your rows, as OpenRefine will rearrange your edits to optimize their
upload. If your project makes edits on the same item across multiple
rows, these edits will be merged together and performed in one edit. See
[Uploading your changes](./uploading) for details.

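The rules above can be summarised in a short Python sketch. It is an illustration only, not OpenRefine's implementation: the `Recon` enum, `subject_action` and `group_edits` are hypothetical names introduced here, and the item identifiers in the usage note are placeholders.

```python
from collections import defaultdict
from enum import Enum


class Recon(Enum):
    """Hypothetical reconciliation statuses for a subject cell."""
    MATCHED = "matched"        # matched to an existing item
    NEW = "new"                # marked as a new item
    CANDIDATES = "candidates"  # has candidates, none selected
    NONE = "none"              # not reconciled, or blank


def subject_action(status):
    """Return what happens to the edits generated for this cell."""
    if status is Recon.MATCHED:
        return "edit existing item"
    if status is Recon.NEW:
        return "create new item"
    # Unmatched candidates, unreconciled and blank cells are all skipped.
    return "skip"


def group_edits(edits):
    """Merge (item_id, change) pairs so each item is edited once."""
    merged = defaultdict(list)
    for item_id, change in edits:
        merged[item_id].append(change)
    return dict(merged)
```

For example, `group_edits([("Q1", "a"), ("Q2", "b"), ("Q1", "c")])` collects both changes for `Q1` under a single key, which mirrors how edits on the same item across rows end up in one edit.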
## Terms {#terms}

**Terms** are the language-specific strings that you find at the top of
Wikidata items: labels, descriptions and aliases. OpenRefine lets you
edit these terms via the Wikidata schema.

### Languages {#languages}

Each term belongs to a particular language. Wikidata supports [hundreds
of languages](https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all), which
are designated by language codes. For each term that you want to add to
an item, you will need to specify the language for this term. There are
two cases:

- Either the language is constant across your dataset: you know that
  all the names in a given column are spelled in the same language. In
  this case, type the name of the language in the input and select the
  language in the drop-down suggestion dialog. This will place the
  appropriate language code in the input.
- Or the language varies across your dataset. In this case, you need
  to provide a column of Wikimedia language codes that indicates the
  language for each term that you want to add. Just drag and drop this
  column to the language field. If there are any invalid language
  codes in this column, the corresponding terms will be ignored.
  OpenRefine silently translates any deprecated language codes to
  their preferred values.

### Labels {#labels}

Wikidata items can have at most one label per language, so when adding
a label you need to choose whether to override any existing label
(default behaviour before 3.2) or only insert your label if there is no
such label in the given language (default behaviour starting from 3.2).
When the content of the cell providing the label is blank, nothing will
be changed (so it is not possible to remove labels).

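The two label modes can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenRefine's code: `apply_label` and its `override` flag are hypothetical names, and labels are modelled as a plain dictionary from language code to string.

```python
def apply_label(labels, lang, new_label, override=False):
    """Return the labels of an item after applying one label edit.

    labels:   dict mapping language codes to the existing labels
    override: True replaces an existing label, False only fills a gap
    """
    if new_label is None or not new_label.strip():
        # Blank cell: nothing changes, so labels cannot be removed.
        return dict(labels)
    if lang in labels and not override:
        # A label already exists in this language and is left alone.
        return dict(labels)
    updated = dict(labels)
    updated[lang] = new_label
    return updated
```

With `override=False` an existing English label survives an edit that proposes a different one, while a missing French label is filled in.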
### Descriptions {#descriptions}

Descriptions work like labels: there is at most one description per
language, and OpenRefine can override existing descriptions or leave
them unchanged. It is not possible to remove descriptions either.

### Aliases {#aliases}

Aliases are added to the list of existing aliases in the given language.
When adding an alias in a language where no label has been added yet,
the alias is automatically promoted to a label for this language. It is
not possible to remove aliases or to override any existing aliases.

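The promotion rule for aliases can be pictured with a small sketch. The function `add_alias` and the dictionary layout are assumptions made for illustration; they are not part of OpenRefine.

```python
def add_alias(labels, aliases, lang, alias):
    """Return (labels, aliases) after adding one alias in `lang`.

    labels:  dict mapping language codes to a label
    aliases: dict mapping language codes to a list of aliases
    """
    new_labels = dict(labels)
    new_aliases = {code: list(values) for code, values in aliases.items()}
    if lang not in new_labels:
        # No label in this language yet: the alias becomes the label.
        new_labels[lang] = alias
    else:
        # Otherwise the alias is appended; existing aliases are kept.
        new_aliases.setdefault(lang, []).append(alias)
    return new_labels, new_aliases
```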
## Statements {#statements}

You can add statements in the schema: this will generate new statements
on the corresponding items. These statements will be merged with any
existing statements on the actual Wikidata items and [this merging process depends on the upload medium](./uploading#Merging-strategies-for-statements).
More control over the merging strategy is planned for the near future.

### Main values {#main-values}

Statements must have main values: "novalue" or "somevalue"
statements are not supported yet. The main value of a statement is a
data value whose type depends on the property used for the statement. If
the main value cannot be evaluated (for instance because one of the
cells it depends on is empty), then the entire statement will be
skipped.

See the [data values](#data-values) section for more details
about how to specify each type of data value and when they are skipped.

### Qualifiers {#qualifiers}

Qualifiers can be added on each statement. When their values are
skipped, only the qualifier will be discarded: the rest of the statement
will still be added.

### References {#references}

References can (and should) be added to back each statement. If values
inside the reference are skipped, the corresponding part of the
reference will be discarded but the reference will still be added
(unless the reference becomes empty).

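The skipping rules for main values, qualifiers and references can be combined in one sketch. It assumes that a skipped value is represented by `None` and that statements are plain dictionaries; `build_statement` is a hypothetical helper, not an OpenRefine function.

```python
def build_statement(main_value, qualifiers, references):
    """Assemble one statement, or return None if it must be skipped.

    qualifiers: dict mapping property ids to values (None = skipped)
    references: list of dicts with the same shape as qualifiers
    """
    if main_value is None:
        # No main value: the entire statement is skipped.
        return None
    # A skipped qualifier is dropped; the statement itself survives.
    kept_qualifiers = {p: v for p, v in qualifiers.items() if v is not None}
    kept_references = []
    for reference in references:
        parts = {p: v for p, v in reference.items() if v is not None}
        if parts:
            # The reference is kept unless all of its parts were skipped.
            kept_references.append(parts)
    return {
        "value": main_value,
        "qualifiers": kept_qualifiers,
        "references": kept_references,
    }
```

For instance, a statement whose only qualifier is skipped and whose reference loses one of two parts is still produced, with an empty qualifier set and a one-part reference.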
### Ranks {#ranks}

All statement ranks are set to **Normal**. It is currently not possible
to set a different rank.

## Data values {#data-values}

Data values are the data that you can find as the target of a statement
(or qualifier, or part of a reference). Each property dictates a
particular type of data value. In each case, OpenRefine uses a
particular process to translate cell contents to a data value of the
appropriate type. This section explains the process for all data types.

### Items {#items-1}

Items are evaluated in the same way as the subjects of items in the
schema. They can be input directly using the auto-suggest service
provided, or any column reconciled against Wikidata can be used. Refer to
[the first Items section](#items) to see how they are
evaluated.

### Strings and external identifiers {#strings-and-external-identifiers}

Bare strings and external identifiers can be input directly as constants
(if they do not change across rows) or using any column. If a reconciled
column is used for a string value, it is the value of the cell that is
going to be used, not the name of the reconciled item (which is what
OpenRefine displays). Values are skipped when the column is blank or
null.

### Monolingual texts {#monolingual-texts}

Monolingual texts consist of two parts:

- the language: see [Languages](#languages) for their
  structure;
- the value of the text: see [the section above](#strings-and-external-identifiers).

A monolingual text is skipped when any of its parts is skipped (that is,
if the language or the text is invalid).

### Dates {#dates}

Dates are parsed from cell contents (or from any constant provided in
the schema) and the precision of the date is inferred from its format.
Here are the valid formats:

- `YYYYM`, such as `2001M` (millennium precision)
- `YYYYC`, such as `1901C` (century precision)
- `YYYYD`, such as `1981D` (decade precision)
- `YYYY`, such as `1984` (year precision)
- `YYYY-MM`, such as `2019-03` (month precision)
- `YYYY-MM-DD`, such as `1897-08-14` (day precision)

Any value that does not match any of these formats will be ignored. All
dates are represented in UTC, Gregorian calendar.

OpenRefine 3.3 introduced the following formats:

- `TODAY` returns today's date with day precision. This will be
  evaluated when performing the edits (or exporting to
  QuickStatements);
- `YYYY-MM-DD_QID` can be used to specify a date in a particular
  calendar (such as the [proleptic Julian calendar (Q1985786)](https://www.wikidata.org/wiki/Q1985786)).

OpenRefine 3.5 introduced the following format:

- `-234` represents the year 234 [BCE](https://en.wikipedia.org/wiki/Common_Era)

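As a rough illustration of how a precision can be inferred from the format alone, here is a Python sketch. The regular expressions are an assumption derived from the formats listed above (for instance, that a negative year is a minus sign followed by digits); OpenRefine's actual parser may accept or reject other inputs. The name `infer_precision` is hypothetical.

```python
import re

# Each pattern is tried against the whole string, in order.
_FORMATS = [
    (re.compile(r"\d{4}M"), "millennium"),
    (re.compile(r"\d{4}C"), "century"),
    (re.compile(r"\d{4}D"), "decade"),
    (re.compile(r"\d{4}|-\d+"), "year"),  # assumption: BCE years as -N
    (re.compile(r"\d{4}-\d{2}"), "month"),
    (re.compile(r"\d{4}-\d{2}-\d{2}(_Q\d+)?"), "day"),  # optional calendar QID
]


def infer_precision(value):
    """Return the precision name for a date string, or None if ignored."""
    if value == "TODAY":
        return "day"
    for pattern, precision in _FORMATS:
        if pattern.fullmatch(value):
            return precision
    return None
```

A value such as `14/08/1897` matches none of the patterns, so the sketch returns `None`, which corresponds to the value being ignored.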
### Quantities {#quantities}

Quantities consist of two parts: the amount and the unit.

- the amount is mandatory and must be a string, such as `18,229.1020`.
  The precision that is displayed will be respected (the same number
  of trailing zeros will be shown in Wikidata). By default, no upper
  and lower bounds will be set. To define these, use scientific (E)
  notation, such as `3.45E+3`, which will be interpreted as `3,450±5`.
  As usual, the amount can be provided as a constant or as a column
  variable. In the latter case, the values in the column must be
  strings.
- the unit is optional. It is an item, so it can be provided either
  with the auto-suggest dialog or as a reconciled column. Note that if
  a reconciled column is used, any unreconciled cell will discard the
  entire quantity value. So a template for a quantity value is either
  always unit-less, or always has a unit.

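The `3.45E+3` example can be reproduced with Python's `decimal` module. The sketch below assumes that the bounds are half a unit of the last significant digit, which is consistent with the `3,450±5` example but is not taken from OpenRefine's source, and that commas are thousands separators. `parse_amount` is a hypothetical name.

```python
from decimal import Decimal


def parse_amount(text):
    """Return (amount, lower_bound, upper_bound) for an amount string.

    Bounds are None unless the amount is written in E notation.
    """
    cleaned = text.replace(",", "")  # assumption: commas group thousands
    amount = Decimal(cleaned)
    if "e" not in cleaned.lower():
        # Plain notation: the digits (and trailing zeros) are kept as-is.
        return amount, None, None
    # Half a unit of the last significant digit, e.g. 5 for 3.45E+3.
    half_unit = Decimal(5).scaleb(amount.as_tuple().exponent - 1)
    return amount, amount - half_unit, amount + half_unit
```

Because `Decimal` preserves trailing zeros, `str(parse_amount("18,229.1020")[0])` still shows four decimal places, matching the behaviour described for displayed precision.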
### Globe coordinates {#globe-coordinates}

Geographic coordinates are specified as strings with the following
formats, where all components are floating point numbers in degrees:

- `latitude,longitude` for a default precision of ten microdegrees
  (for instance:
  [`49.265278,4.028611`](https://tools.wmflabs.org/geohack/geohack.php?params=49.265277777778_N_4.0286111111111_E_globe:earth&language=en)
  can be used to indicate the position of Reims, France).
- `latitude,longitude,precision` when specifying an explicit precision
  (for instance: `49.265278,4.028611,0.1` can be used to indicate the
  position of Reims within a tenth of a degree).

All globe coordinates are on Earth ([Q2](https://www.wikidata.org/wiki/Q2)).

If your coordinates are in a different format, such as
`49° 15′ 55″ N, 4° 1′ 43″ E`, you will need to convert them to decimal
format first.

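One possible way to do this conversion, and to read the two accepted formats, is sketched below in Python. Both functions are hypothetical helpers written for this page; the regular expression only covers the degree, minute and second layout of the example above.

```python
import re

DEFAULT_PRECISION = 0.00001  # ten microdegrees

_DMS = re.compile(r"(\d+)°\s*(\d+)′\s*(\d+(?:\.\d+)?)″\s*([NSEW])")


def dms_to_decimal(text):
    """Convert '49° 15′ 55″ N, 4° 1′ 43″ E' to 'latitude,longitude'."""
    values = {}
    for degrees, minutes, seconds, hemisphere in _DMS.findall(text):
        decimal = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
        if hemisphere in "SW":
            decimal = -decimal  # south and west are negative
        values["lat" if hemisphere in "NS" else "lon"] = decimal
    return f"{values['lat']:.6f},{values['lon']:.6f}"


def parse_coordinates(text):
    """Return (latitude, longitude, precision), or None if malformed."""
    try:
        parts = [float(part) for part in text.split(",")]
    except ValueError:
        return None
    if len(parts) == 2:
        return parts[0], parts[1], DEFAULT_PRECISION
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    return None
```

Applied to the Reims example, `dms_to_decimal("49° 15′ 55″ N, 4° 1′ 43″ E")` gives `49.265278,4.028611`, the decimal string used earlier in this section.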
### Media on Commons {#media-on-commons}

Media files on Wikimedia Commons are treated as strings, whose values
must exactly match filenames on Commons. These values are not checked
during schema evaluation: if they are wrong, uploading the statements
will fail.

Tabular data and Geoshapes must be prefixed with the `Data:` namespace.
This is indicated by the placeholder in the field that appears when
constructing the schema.

### Properties {#properties}

Properties are always constants: there is currently no way to reconcile
a column against properties. They have to be selected with the
auto-suggest dialog.

### Other data types {#other-data-types}

URLs, mathematical expressions and other textual datatypes are supported
and treated as strings. At the time of writing, all datatypes supported
by Wikidata are supported by OpenRefine.