Sometimes your data is not as simple as a plain table, or the
statements you want to make vary from row to row. This document
explains how to handle these cases.
## Hierarchical data {#hierarchical-data}
Sometimes your source provides data in a structured format, such as XML,
JSON or RDF. OpenRefine can import these files and will convert them to
tables. These tables will reflect some of the hierarchy in the file by
means of null cells, using the [records mode](/manual/exploring#rows-vs-records).
The Wikibase extension always works in rows mode. So if, for example,
the file lists several songs per artist and we want to add statements
which reference both the artist and the song, we need to fill the null
cells with the corresponding artist. You can do this with the
**Fill down** operation (in the **Edit cells** menu of the column
concerned). This operation copies not just cell values but also
reconciliation results.
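
As an illustration of what **Fill down** does to the table, here is a minimal Python sketch. It is not OpenRefine code: the `fill_down` helper, the column names and the sample rows are all hypothetical, and it only treats `None` as a blank cell.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_down(rows: List[Row], column: str) -> List[Row]:
    """Return copies of the rows in which each null cell of `column`
    takes the value of the nearest non-null cell above it."""
    filled = []
    last_value = None
    for row in rows:
        new_row = dict(row)
        if new_row.get(column) is None:
            new_row[column] = last_value  # inherit from the row above
        else:
            last_value = new_row[column]  # remember for the rows below
        filled.append(new_row)
    return filled


# Hypothetical import of a hierarchical file: one artist, several songs.
rows = [
    {"artist": "Artist A", "song": "Song 1"},
    {"artist": None, "song": "Song 2"},
    {"artist": "Artist B", "song": "Song 3"},
    {"artist": None, "song": "Song 4"},
]
filled = fill_down(rows, "artist")
```

After this step every row carries its own artist, which is what a schema evaluated in rows mode needs.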
## Conditional additions {#conditional-additions}
Sometimes you want to add a statement only under certain conditions.
One workflow to achieve this is as follows:
- Use facets to select the rows where you do not want to add any
information;
- Blank out the cells in the column that contain the information you
want to add. If you do not want to lose this information, you can
create a copy of the column beforehand;
- Remove your facets to see all rows again;
- Create a schema using the column you partially blanked out as
statement value.
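
The effect of this workflow on the data can be sketched in Python. This is only an illustration under assumptions: the `blank_where` helper, the column names and the condition are hypothetical, and in OpenRefine itself the selection is made with facets rather than code.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def blank_where(
    rows: List[Row],
    column: str,
    should_skip: Callable[[Row], bool],
    backup_column: Optional[str] = None,
) -> List[Row]:
    """Blank `column` in every row matched by `should_skip`.
    If `backup_column` is given, the original value is kept there."""
    result = []
    for row in rows:
        new_row = dict(row)
        if backup_column is not None:
            new_row[backup_column] = row.get(column)  # copy made beforehand
        if should_skip(row):
            new_row[column] = None  # no statement will be generated
        result.append(new_row)
    return result


# Hypothetical data: only add the population when the row is verified.
rows = [
    {"name": "Item 1", "status": "verified", "population": "1200"},
    {"name": "Item 2", "status": "unverified", "population": "800"},
]
prepared = blank_where(
    rows,
    "population",
    should_skip=lambda r: r["status"] != "verified",
    backup_column="population (copy)",
)
```

A schema that uses `population` as statement value would then produce a statement for the first row only, while the copy column still holds every original value.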
## Varying properties {#varying-properties}
Sometimes you wish you could use column variables for properties in your
schema. This is currently not possible: first because we do not have a
reconciliation service for properties yet, and second because allowing
varying properties in a statement would mean that these properties could
potentially have different datatypes, which would break the structure of
the schema.
If you only need a few properties, there is a way to work around
this problem. For instance, say you have a first column of altitudes and a
second column that indicates whether each value should be added as
[operating altitude (P2254)](https://www.wikidata.org/wiki/Property:P2254) or as
[elevation above sea level (P2044)](https://www.wikidata.org/wiki/Property:P2044).
Create a text facet on the second column and select the *altitude* value
to keep only those rows. Add a new column based on the first column,
keeping the default expression (`value`), which just copies the existing
values. Then select the *maximum operating altitude* value in the facet
and do the same. Reset the facet: you should now have two new columns
which partition the original column of altitudes. You can now create a schema
which adds two statements, with values taken from those columns. Since blank
values are ignored, exactly one statement will be added for each item, with the
desired property.
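
The partition of the altitude column can be sketched as follows. This is a hypothetical Python illustration, not part of OpenRefine: the `partition_column` helper, the column names and the sample values are invented, and the mapping of the *altitude* indicator to P2044 is an assumption made for the example.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def partition_column(
    rows: List[Row],
    value_column: str,
    kind_column: str,
    targets: Dict[str, str],
) -> List[Row]:
    """Split `value_column` into one new column per indicator value.
    `targets` maps each indicator value to the name of its new column."""
    result = []
    for row in rows:
        new_row = dict(row)
        for kind, new_column in targets.items():
            matches = row.get(kind_column) == kind
            # Copy the value into the matching column, leave the other blank.
            new_row[new_column] = row.get(value_column) if matches else None
        result.append(new_row)
    return result


rows = [
    {"altitude": "4500", "kind": "maximum operating altitude"},
    {"altitude": "320", "kind": "altitude"},
]
# Assumed mapping from indicator values to the two target columns.
targets = {
    "altitude": "elevation (P2044)",
    "maximum operating altitude": "operating altitude (P2254)",
}
split = partition_column(rows, "altitude", "kind", targets)
```

Each row ends up with a value in exactly one of the two new columns, so a schema with one statement per column yields a single statement per row.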
## Adapting to existing data on Wikibase {#adapting-to-existing-data-on-wikibase}
Sometimes you want to create statements only if there are no such
statements on the item yet. Here is one way to achieve this:
- first, retrieve the existing values from Wikidata, using the
**Edit columns** → **Add columns from reconciled values** action;
- second, create a *facet by null* on the newly created column that
contains the information you want to check against;
- select the non-null rows (value **false**);
- clear the contents of the column where your source values are
(**Edit cells** → **Common transformations** → **To null**).
You can now construct your schema as usual: null values will be ignored
when generating the statements.
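
The net effect of these steps can be sketched in Python. The helper and column names below are hypothetical and the data is invented; the sketch assumes the existing values have already been fetched into their own column.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def clear_where_existing(
    rows: List[Row], source_column: str, existing_column: str
) -> List[Row]:
    """Null the source value wherever the item already has a value,
    so that no duplicate statement is generated for that row."""
    result = []
    for row in rows:
        new_row = dict(row)
        if row.get(existing_column) is not None:
            new_row[source_column] = None
        result.append(new_row)
    return result


# "existing" stands for the column added from reconciled values.
rows = [
    {"item": "Item 1", "source": "1987", "existing": "1987"},
    {"item": "Item 2", "source": "1990", "existing": None},
]
cleaned = clear_where_existing(rows, "source", "existing")
```

Only the second row keeps its source value, so only that item would receive a new statement.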