RandomSec/OpenRefine/docs/versioned_docs/version-3.5/manual/wikibase/advanced-schemas.md
2022-01-04 16:31:32 +01:00

3.5 KiB

Sometimes your data is not as simple as a normal table, or the sort of statements that you want to do varies on each row. This document explains how to work around these cases.

Hierarchical data

Sometimes your source provides data in a structured format, such as XML, JSON or RDF. OpenRefine can import these files and will convert them to tables. These tables will reflect some of the hierarchy in the file by means of null cells, using the records mode.

The Wikibase extension always works in rows mode, so if we want to add statements which reference both the artist and the song, we need to fill the null cells with the corresponding artist. You can do this with the Fill down operation (in the Edit cells menu for this column). This function will copy not just cell values but also reconciliation results.

Conditional additions

Sometimes you want to add a statement only in some conditions.

The workflow to achieve this looks like this:

  • Use facets to select the rows where you do not want to add any information;
  • Blank out the cells in the column that contain the information you want to add. If you do not want to lose this information, you can create a copy of the column beforehand;
  • Remove your facets to see all rows again;
  • Create a schema using the column you partially blanked out as statement value.

Varying properties

Sometimes you wish you could use column variables for properties in your schema. It is currently not possible, first because we do not have a reconciliation service for properties yet, but also because allowing varying properties in a statement would mean that these properties could potentially have different datatypes, which would break the structure of the schema.

If you only want to use a few properties, there is a way to go around this problem. For instance, say you have a first column of altitudes and a second column that indicates whether you should add it as operating altitude (P2254) or as elevation above sea level (P2044).

Create a text facet on the first column. Filter to keep only the altitude values. Add a new column based on the second column, by keeping the default expression (value) which just copies the existing values. Then, select the maximum operating altitude value in the facet and do the same. Reset the facet, you should have obtained two new columns which partition the original column. You can now create a schema which adds two statements, with values taken from those columns. Since blank values are ignored, exactly one statement will be added for each item, with the desired property.

Adapting to existing data on Wikibase

Sometimes you want to create statements only if there are no such statements on the item yet. Here is one way to achieve this:

  • first, retrieve the existing values from Wikidata first, using the Edit columnsAdd columns from reconciled values action;
  • second, create a facet by null on the newly created column that contains the information you want to control against;
  • select the non-null rows (value false);
  • clear the contents of the column where your source values are (Edit cellsCommon transformationsTo null).

You can now construct your schema as usual - null values will be ignored when generating the statements.