Add explicit anchor ids in docs. Closes #3888. (#3891)

Antonin Delpeuch 2021-05-26 06:48:54 +02:00 committed by GitHub
parent 13b6f4f93b
commit aa05936a07
65 changed files with 938 additions and 938 deletions


@ -3,7 +3,7 @@ id: cellediting
title: Cell editing
sidebar_label: Cell editing
---
## Overview
## Overview {#overview}
OpenRefine offers a number of features to edit and improve the contents of cells automatically and efficiently.
@ -11,7 +11,7 @@ One way of doing this is editing through a [text facet](facets#text-facet). Once
You can apply a text facet on numbers, boolean values, and dates, but if you edit a value it will be converted into the text [data type](exploring#data-types) (regardless of whether you edit a date into another correctly-formatted date, or a “true” value into “false”, etc.).
## Transform
## Transform {#transform}
Select <span class="menuItems">Edit cells</span> → <span class="menuItems">Transform...</span> to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. When used on a column operation, `value` is the information in each cell in the selected column.
@ -21,29 +21,29 @@ You can also switch to the <span class="tabLabels">Undo / Redo</span> tab inside
OpenRefine offers you some frequently-used transformations in the next menu option, <span class="menuItems">Common transforms</span>. For more custom transforms, read up on [expressions](expressions).
## Common transforms
## Common transforms {#common-transforms}
### Trim leading and trailing whitespace
### Trim leading and trailing whitespace {#trim-leading-and-trailing-whitespace}
Often cell contents that should be identical, and look identical, are different because of space or line-break characters that are invisible to users. This function will get rid of any characters that sit before or after visible text characters.
### Collapse consecutive whitespace
### Collapse consecutive whitespace {#collapse-consecutive-whitespace}
You may find that some text cells contain what look like spaces but are actually tabs, or contain multiple spaces in a row. This function will remove all space characters that sit in sequence and replace them with a single space.
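The two whitespace transforms described above can be approximated outside OpenRefine. The following is a minimal Python sketch, not OpenRefine's own implementation; the function names `trim` and `collapse_whitespace` are hypothetical, and the exact set of characters OpenRefine treats as whitespace may differ.

```python
import re


def trim(value: str) -> str:
    """Remove whitespace (spaces, tabs, line breaks) before and after the text."""
    return value.strip()


def collapse_whitespace(value: str) -> str:
    """Replace every run of consecutive whitespace characters with one space."""
    return re.sub(r"\s+", " ", value)


if __name__ == "__main__":
    print(repr(trim("  Paris\n")))
    print(repr(collapse_whitespace("New\t\tYork   City")))
```

Note that collapsing does not trim: a cell with leading spaces keeps one leading space unless you also trim it.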
### Unescape HTML
### Unescape HTML {#unescape-html}
Your data may come from an HTML-formatted source that expresses some characters through references (such as “&amp;nbsp;” for a space, or “%u0107” for a ć) instead of the actual Unicode characters. You can use the “unescape HTML entities” transform to look for these codes and replace them with the characters they represent. For other formatting that needs to be escaped, try a custom transformation with [`escape()`](grelfunctions#escapes-s-mode).
### Replace smart quotes with ASCII
### Replace smart quotes with ASCII {#replace-smart-quotes-with-ascii}
Smart quotes (or curly quotes) recognize whether they come at the beginning or end of a string, and will generate an “open” quote (“) and a “close” quote (”). These characters are not ASCII-compliant (though they are UTF8-compliant) so you can use this transform to replace them with a straight double quote character (") instead.
### Case transforms
### Case transforms {#case-transforms}
You can transform an entire column of text into UPPERCASE, lowercase, or Title Case using these three options. This can be useful if you are planning to do textual analysis and wish to avoid case-sensitivity (which some functions are) causing problems in your analysis. Consider also using a [custom facet](facets#custom-text-facet) to temporarily modify cases instead of this permanent operation if appropriate.
### Data-type transforms
### Data-type transforms {#data-type-transforms}
As detailed in [Data types](exploring#data-types), OpenRefine recognizes different data types: string, number, boolean, and date. When you use these transforms, OpenRefine will check to see if the given values can be converted, then both transform the data in the cells (such as “3” as a text string to “3” as a number) and convert the data type on each successfully transformed cell. Cells that cannot be transformed will output the original value and maintain their original data type.
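The "convert where possible, otherwise keep the original" behavior described above can be illustrated with a small Python sketch. This is an assumption-laden illustration rather than OpenRefine's code: the helper name `to_number_or_original` is hypothetical, and OpenRefine's own parsing rules for numbers may be stricter or looser.

```python
def to_number_or_original(value):
    """Return a number if the text parses as one; otherwise return the input unchanged."""
    if not isinstance(value, str):
        return value  # already a non-string type: leave it alone
    text = value.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return value  # cannot be converted: keep original value and type


if __name__ == "__main__":
    for cell in ["3", "3.5", "three", True]:
        result = to_number_or_original(cell)
        print(repr(cell), "->", repr(result), type(result).__name__)
```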
@ -55,7 +55,7 @@ Because these common transforms do not offer the ability to output an error inst
You can also convert cells into null values or empty strings. This can be useful if you wish to, for example, erase duplicates that you have identified and are analyzing as a subset.
## Fill down and blank down
## Fill down and blank down {#fill-down-and-blank-down}
Fill down and blank down are two functions most frequently used when encountering data organized into [records](exploring#row-types-rows-vs-records) - that is, multiple rows associated with one specific entity.
@ -65,7 +65,7 @@ Be careful that your data is sorted properly before you begin blanking down - no
If, conversely, you've received data with empty cells because it was already in something akin to records mode, you can fill down information to the rest of the rows. This will duplicate whatever value exists in the topmost cell with a value: if the first row in the record is blank, it will take information from the next cell, or the cell after that, until it finds a value. The blank cells above this will remain blank.
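The fill-down behavior described above can be sketched in Python over a single column represented as a list. This is only an illustration: the function name `fill_down` is hypothetical, and treating `None` and the empty string as "blank" is an assumption.

```python
def fill_down(cells):
    """Copy each non-blank value into the blank cells that follow it.

    Blank cells that appear before the first value are left as they are.
    """
    filled = []
    last_value = None
    for cell in cells:
        if cell is None or cell == "":
            # Nothing to copy yet: keep the blank. Otherwise reuse the last value.
            filled.append(cell if last_value is None else last_value)
        else:
            last_value = cell
            filled.append(cell)
    return filled


if __name__ == "__main__":
    print(fill_down([None, "A", None, None, "B", ""]))
```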
## Split multi-valued cells
## Split multi-valued cells {#split-multi-valued-cells}
Splitting cells with more than one value in them is a common way to get your data from single rows into [multi-row records](exploring#rows-vs-records). Survey data, for example, frequently allows respondents to “Select all that apply,” or an inventory list might have items filed under more than one category.
@ -79,11 +79,11 @@ You can also split based on the lengths of the strings you expect to find. This
If you have data that should be split into multiple columns instead of multiple rows, see [Split into several columns](columnediting#split-into-several-columns).
## Join multi-valued cells
## Join multi-valued cells {#join-multi-valued-cells}
Joining will reverse the “split multi-valued cells” operation, or join up information from multiple rows into one row. All the strings will be compressed into the topmost cell in the record, in the order they appear. A window will appear where you can set the separator; the default is a comma and a space (, ). This separator is optional. We suggest the separator | as a sufficiently rare character.
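As a rough illustration of how joining reverses splitting, here is a Python sketch that works on the cells of one record. The function names are hypothetical, and skipping blank cells when joining is an assumption made for the example.

```python
def split_multi_valued(cell, separator=", "):
    """Split one multi-valued cell into a list of values, one per future row."""
    return cell.split(separator)


def join_multi_valued(cells, separator=", "):
    """Join the values of a record's rows into one string, in row order."""
    values = [c for c in cells if c not in (None, "")]
    return separator.join(values)


if __name__ == "__main__":
    rows = split_multi_valued("red|green|blue", separator="|")
    print(rows)
    print(join_multi_valued(rows, separator="|"))
```

Choosing a separator that never occurs inside the values, such as `|`, is what makes the round trip safe.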
## Cluster and edit
## Cluster and edit {#cluster-and-edit}
Creating a facet on a column is a great way to look for inconsistencies in your data; clustering is a great way to fix those inconsistencies. Clustering uses a variety of comparison methods to find text entries that are similar but not exact, then shares those results with you so that you can merge the cells that should match. Where editing a single cell or text facet at a time can be time-consuming and difficult, clustering is quick and streamlined.
@ -101,7 +101,7 @@ For each cluster identified, you can pick one of the existing values to apply to
You can also export the currently identified clusters as a JSON file, or close the window with or without applying your changes. You can also use the histograms on the right to narrow down to, for example, clusters with lots of matching rows, or clusters of long or short values.
### Clustering methods
### Clustering methods {#clustering-methods}
You don't need to understand the details behind each clustering method to apply them successfully to your data. The order in which these methods are presented in the interface and on this page is the order we recommend - starting with the most strict rules and moving to the most lax, which require more human supervision to apply correctly.
@ -118,7 +118,7 @@ The clustering pop-up window offers you a variety of clustering methods:
* levenshtein
* ppm
#### Key collision
#### Key collision {#key-collision}
**Key collisions** are very fast and can process millions of cells in seconds:
@ -128,7 +128,7 @@ The clustering pop-up window offers you a variety of clustering methods:
This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself won't identify because it separates words). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”).
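To see why the examples above collide, here is a simplified Python sketch of an n-gram key. It assumes the key is built by lowercasing, dropping whitespace and ASCII punctuation, collecting the unique n-grams, sorting them, and joining them; OpenRefine's real keyer may normalize characters differently, and the name `ngram_fingerprint` is hypothetical.

```python
import string


def ngram_fingerprint(value: str, n: int = 2) -> str:
    """Build a simplified n-gram key: values with equal keys land in one cluster."""
    cleaned = "".join(
        ch for ch in value.lower()
        if not ch.isspace() and ch not in string.punctuation
    )
    # Unique n-grams, sorted so that their order in the text does not matter.
    grams = {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}
    return "".join(sorted(grams))


if __name__ == "__main__":
    print(ngram_fingerprint("lookout") == ngram_fingerprint("look out"))
    print(ngram_fingerprint("Wellington", 1), ngram_fingerprint("Elgin Town", 1))
    print(ngram_fingerprint("Wellington", 2) == ngram_fingerprint("Elgin Town", 2))
```

With `n=1` the two place names share the same set of letters and therefore the same key, while with `n=2` their keys differ, which matches the advice to watch 1-gram clusters closely.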
##### Phonetic clustering
##### Phonetic clustering {#phonetic-clustering}
The next four methods are phonetic algorithms: they identify letters that sound the same when pronounced out loud, and assess text values based on that (such as knowing that a word with an “S” might be a mistype of a word with a “Z”). They are great for spotting mistakes made by not knowing the spelling of a word or name after hearing it spoken aloud.
@ -140,7 +140,7 @@ The next four methods are phonetic algorithms: they identify letters that sound
Regardless of the language of your data, applying each of them might find different potential matches: for example, Metaphone clusters “Cornwall” and “Corn Hill” and “Green Hill,” while Cologne clusters “Greenvale” and “Granville” and “Cornwall” and “Green Wall.”
#### Nearest neighbor
#### Nearest neighbor {#nearest-neighbor}
**Nearest neighbor** clustering methods are slower than key collision methods. They allow the user to set a radius - a threshold for matching or not matching. OpenRefine uses a “blocking” method first, which sorts values based on whether they have a certain amount of similarity (the default is “6” for a six-character string of identical characters) and then runs the nearest-neighbor operations on those sorted groups.
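The idea of a radius can be illustrated with Levenshtein (edit) distance, one of the methods listed on this page. The sketch below is a plain textbook implementation in Python and leaves out the blocking step entirely; the helper `within_radius` and its default value are hypothetical and are not OpenRefine settings.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def within_radius(a: str, b: str, radius: float = 1.0) -> bool:
    """Treat two values as cluster candidates when their distance fits the radius."""
    return levenshtein(a, b) <= radius


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(within_radius("Greenvale", "Greenvile", radius=1.0))
```

Comparing every value against every other value is what makes this family of methods slow, which is why a blocking pass that narrows the candidates first is useful.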
@ -152,13 +152,13 @@ We recommend setting the block number to at least 3, and then increasing it if y
For more of the theory behind clustering, see [Clustering In Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth).
## Replace
## Replace {#replace}
OpenRefine provides a find/replace function for you to edit your data. Selecting <span class="menuItems">Edit cells</span> → <span class="menuItems">Replace</span> will bring up a simple window where you can input a string to search and a string to replace it with. You can set case-sensitivity, and set it to only select whole words, defined by a string with spaces or punctuation around it (to prevent, for example, “house” selecting the “house” part of “doghouse”). You can use [regular expressions](expressions#regular-expressions) in this field. You may wish to preview the results of this operation by testing it with a [Text filter](facets#text-filter) first.
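The whole-word and case-sensitivity options can be approximated with word-boundary anchors in a regular expression. This Python sketch is an illustration only: the function `replace_whole_word` is hypothetical, and OpenRefine's own definition of a word boundary may not match Python's `\b` exactly.

```python
import re


def replace_whole_word(text, find, replacement, case_sensitive=True):
    """Replace `find` only where it stands alone as a word."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape keeps characters in `find` from being read as regex syntax.
    pattern = r"\b" + re.escape(find) + r"\b"
    # A function replacement is used so backslashes in `replacement` stay literal.
    return re.sub(pattern, lambda _match: replacement, text, flags=flags)


if __name__ == "__main__":
    print(replace_whole_word("house and doghouse", "house", "home"))
    print(replace_whole_word("House rules", "house", "home", case_sensitive=False))
```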
You can also perform a sort of find/replace operation by editing one cell, and selecting “apply to all identical cells.”
## Edit one cell at a time
## Edit one cell at a time {#edit-one-cell-at-a-time}
You can edit individual cells by hovering your mouse over that cell. You should see a tiny blue link labeled “edit.” Click it to edit the cell. That pops up a window with a bigger text field for you to edit. You can change the [data type](exploring#data-types) of that cell, and you can apply these changes to all identical cells (in the same column), using this pop-up window.


@ -4,15 +4,15 @@ title: Column editing
sidebar_label: Column editing
---
## Overview
## Overview {#overview}
Column editing contains some of the most powerful data-improvement methods in OpenRefine. The operations in the <span class="menuItems">Edit column</span> menu involve using one column of data to add entirely new columns and fields to your dataset.
## Splitting or joining
## Splitting or joining {#splitting-or-joining}
Many users find that they frequently need to make their data more granular: for example, splitting a “Firstname Lastname” column into two columns, one for first names and one for last names. The reverse is also often true: you may have several columns of category values that you want to join into one “category” column.
.
### Split into several columns
### Split into several columns {#split-into-several-columns}
![A screenshot of the settings window for splitting columns.](/img/columnsplit.png)
@ -22,7 +22,7 @@ You can also specify a maximum number of new columns to be made: separator chara
New columns will be named after the original column, with a number: “Location 1,” “Location 2,” etc. You can choose to remove the original column with this operation, and you can have [data types](exploring#data-types) identified where possible. This function will work best with converting strings to numbers, and may not work with [dates](exploring#dates).
### Join columns
### Join columns {#join-columns}
![A screenshot of the settings window for joining columns.](/img/columnjoin.png)
@ -30,7 +30,7 @@ You can join columns by selecting <span class="menuItems">Edit column</span> →
The joined data will appear in the column you originally selected, or you can create a new column for this content and specify a name. You can delete all the columns that were used in this join operation.
## Add column based on this column
## Add column based on this column {#add-column-based-on-this-column}
Selecting <span class="menuItems">Edit column</span> → <span class="menuItems">Add column based on this column...</span> will open up an [expressions](expressions) window where you can transform the data from this column (using `value`), or write a more complex expression that takes information from any number of columns or from external sources.
@ -58,7 +58,7 @@ row.record.cells.Column1.value + row.record.cells.Column2.value
You may wish to add separators or spaces, or modify your input during this operation with more advanced expressions.
## Add column by fetching URLs
## Add column by fetching URLs {#add-column-by-fetching-urls}
Through the <span class="menuItems">Add column by fetching URLs</span> function, OpenRefine supports the ability to fetch HTML or data from web pages or services. In this operation you will be building URL strings based on your column of data, by using `value` to insert a relevant substring. Your chosen column needs to contain parts of paths to valid HTML pages or files online.
@ -85,7 +85,7 @@ Note the following:
* [Accept](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept)
* [Authorization](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization)
### Common errors
### Common errors {#common-errors}
When OpenRefine attempts to fetch information from a web service, it can fail in a variety of ways. The following information is meant to help troubleshoot and fix problems encountered when using this function.
@ -116,7 +116,7 @@ Note that for Mac users and for Windows users with the OpenRefine installation w
* On Mac, it will look something like `/Applications/OpenRefine.app/Contents/PlugIns/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts`.
* On Windows: `\server\target\jre\lib\security\`.
## Renaming, removing, and moving
## Renaming, removing, and moving {#renaming-removing-and-moving}
Every column's <span class="menuItems">Edit column</span> dropdown contains options to move it (to the beginning, end, left, or right), rename it, and delete it.
These operations can be undone, but a removed column cannot be restored later if you keep modifying your data. If you wish to temporarily hide a column, go to <span class="menuItems">[View](sortview#view)</span> → <span class="menuItems">Collapse this column</span> instead.


@ -4,13 +4,13 @@ title: Exploring data
sidebar_label: Overview
---
## Overview
## Overview {#overview}
OpenRefine offers lots of features to help you learn about your dataset, even if you don't change a single character. In this section we cover different ways for sorting through, filtering, and viewing your data.
Unlike spreadsheets, OpenRefine doesn't store formulas and display the output of those calculations; it only shows the value inside each cell. It doesn't support cell colors or text formatting.
## Data types
## Data types {#data-types}
Each piece of information (each cell) in OpenRefine is assigned a data type. Some file formats, when imported, can set data types that are recognized by OpenRefine. Cells without an associated data type on import will be considered a “string” at first, but you can have OpenRefine convert cell contents into other data types later. This is set at the cell level, not at the column level.
@ -41,7 +41,7 @@ Changing a cell's data type is not the same operation as transforming its conten
To transform data from one type to another, see [Transforming data](cellediting#data-type-transforms) for information on using common transforms, and see [Expressions](expressions) for information on using [toString()](grelfunctions#tostringo-string-format-optional), [toDate()](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-), and other functions.
### Dates
### Dates {#dates}
A “date” type is created when a column is [transformed into dates](transforming#to-date), when an expression is used to [convert cells to dates](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-) or when individual cells are set to have the data type “date”.
@ -74,7 +74,7 @@ The following table shows some example [date and time formatting styles for the
|Long |June 30, 2009 7:03:47 AM PDT |30 juin 2009 07:03:47 PDT|
|Full |Tuesday, June 30, 2009 7:03:47 AM PDT |mardi 30 juin 2009 07 h 03 PDT|
## Rows vs. records
## Rows vs. records {#rows-vs-records}
A row is a simple way to organize data: a series of cells, one cell per column. Sometimes there are multiple pieces of information in one cell, such as when a survey respondent can select more than one response.


@ -4,13 +4,13 @@ title: Exporting your work
sidebar_label: Exporting
---
## Overview
## Overview {#overview}
Once your dataset is ready, you will need to get it out of OpenRefine and into the system of your choice. OpenRefine outputs a number of file formats, can upload your data directly into Google Sheets, and can create or update statements on Wikidata.
You can also [export your full project data](#export-a-project) so that it can be opened by someone else using OpenRefine (or yourself, on another computer).
## Export data
## Export data {#export-data}
![A screenshot of the Export dropdown.](/img/export-menu.png)
@ -33,7 +33,7 @@ You can also export reconciled data to Wikidata, or export your Wikidata schema
* [Export to QuickStatements](wikidata#quickstatements-export) (version 1)
* [Export Wikidata schema](wikidata#import-and-export-schema)
### Custom tabular exporter
### Custom tabular exporter {#custom-tabular-exporter}
![A screenshot of the custom tabular content tab.](/img/custom-tabular-exporter.png)
@ -54,7 +54,7 @@ On the <span class="tabLabels">Download</span> tab, you can generate a preview o
With the <span class="tabLabels">Option Code</span> tab, you can copy JSON of your current custom settings to reuse on another export, or you can paste in existing JSON settings to apply to the current project.
### SQL exporter
### SQL exporter {#sql-exporter}
The SQL exporter creates a SQL statement containing the data you've exported, which you can use to overwrite or add to an existing database. Choosing <span class="menuItems">Export</span> → <span class="menuItems">SQL exporter</span> will bring up a window with two tabs: one to define what data to output, and another to modify other aspects of the SQL statement, with options to preview and download the statement.
@ -76,7 +76,7 @@ You can include DROP and IF EXISTS if you require them, and set a name for the t
You can then preview your statement, which will open up a new browser tab/window showing a statement with the first ten rows of your data (if included), or you can save a `.sql` file to your computer.
### Templating exporter
### Templating exporter {#templating-exporter}
If you pick <span class="menuItems">Templating…</span> from the <span class="menuItems">Export</span> dropdown menu, you can “roll your own” exporter. This is useful for formats that we don't support natively yet, or won't support. The Templating exporter generates JSON by default.
@ -113,7 +113,7 @@ Once you have created your template, you may wish to save the text you produced
We have recipes on using the Templating exporter to [produce several different formats](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#12-templating-exporter).
## Export a project
## Export a project {#export-a-project}
You can share a project in progress with another computer, a colleague, or with someone who wants to check your history. This can be useful for showing that your data cleanup didn't distort or manipulate the information in any way. Once you have exported a project, another OpenRefine installation can [import it as a new project](starting#import-a-project).
@ -129,6 +129,6 @@ OpenRefine exports files in `.tar.gz` format. You can rename the file when you s
To save your project archive to Google Drive: from the <span class="menuItems">Export</span> dropdown, select <span class="menuItems">OpenRefine project archive to Google Drive...</span>. OpenRefine will not share the link with you, only confirm that the file was uploaded.
## Export operations
## Export operations {#export-operations}
You can [save and re-apply the history of any project](running#reusing-operations) (all the operations shown in the Undo/Redo tab). This creates JSON that you can save for later reuse on another OpenRefine project.


@ -4,7 +4,7 @@ title: Expressions
sidebar_label: Overview
---
## Overview
## Overview {#overview}
You can use expressions in multiple places in OpenRefine to extend data cleanup and transformation. Expressions are available with the following functions:
* <span class="menuItems">Facet</span>:
@ -30,7 +30,7 @@ These languages have some syntax differences but support many of the same [varia
This page is a general reference for available functions, variables, and syntax. For examples that use these expressions for common data tasks, look at the [Recipes section on the wiki](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#recipes-and-worked-examples).
## Expressions
## Expressions {#expressions}
There are significant differences between OpenRefine's expressions and the spreadsheet formulas you may be used to using for data manipulation. OpenRefine does not store formulas in cells and display output dynamically: OpenRefine's transformations are one-time operations that can change column contents or generate new columns. These are applied using variables such as `value` or `cell` to perform the same modification to each cell in a column.
@ -53,7 +53,7 @@ For another example, if you were to create a new column based on your data using
Note that an expression is typically based on one particular column in the data - the column whose drop-down menu is first selected. Many variables are created to stand for things about the cell in that “base column” of the current row on which the expression is evaluated. There are also variables about rows, which you can use to access cells in other columns.
## The expressions editor
## The expressions editor {#the-expressions-editor}
When you select a function that accepts expressions, you will see a window overlay the screen with what we call the expressions editor.
@ -72,13 +72,13 @@ Starring formulas you've used in the past can be helpful for repetitive tasks
You can also choose how formula errors are handled: replicate the original cell value, output an error message into the cell, or output a blank cell.
## Regular expressions
## Regular expressions {#regular-expressions}
OpenRefine offers several fields that support the use of regular expressions (regex), such as in a <span class="menuItems">Text filter</span> or a <span class="menuItems">Replace…</span> operation. GREL and other expressions can also use regular expression markup to extend their functionality.
If this is your first time working with regex, you may wish to read [this tutorial specific to the Java syntax that OpenRefine supports](https://docs.oracle.com/javase/tutorial/essential/regex/). We also recommend this [testing and learning tool](https://regexr.com/).
### GREL-supported regex
### GREL-supported regex {#grel-supported-regex}
To write a regular expression inside a GREL expression, wrap it between a pair of forward slashes (/) much like the way you would in Javascript. For example, in
@ -100,7 +100,7 @@ On the [GREL functions](grelfunctions) page, functions that support regex will i
* [split](grelfunctions#splits-s-or-p-sep)
* [smartSplit](grelfunctions#smartsplits-s-or-p-sep-optional)
### Jython-supported regex
### Jython-supported regex {#jython-supported-regex}
You can also use [regex with Jython expressions](http://www.jython.org/docs/library/re.html), instead of GREL, for example with a <span class="menuItems">Custom Text Facet</span>:
@ -108,7 +108,7 @@ You can also use [regex with Jython expressions](http://www.jython.org/docs/libr
python import re g = re.search(ur"\u2014 (.*),\s*BWV", value) return g.group(1)
```
### Clojure-supported regex
### Clojure-supported regex {#clojure-supported-regex}
[Clojure](https://clojure.org/reference/reader) uses the same regex engine as Java, and can be invoked with [re-find](http://clojure.github.io/clojure/clojure.core-api.html#clojure.core/re-find), [re-matches](http://clojure.github.io/clojure/clojure.core-api.html#clojure.core/re-matches), etc. You can use the #"pattern" reader macro as described [in the Clojure documentation](https://clojure.org/reference/other_functions#regex). For example, to get the nth element of a returned sequence, you can use the nth function:
@ -116,7 +116,7 @@ python import re g = re.search(ur"\u2014 (.*),\s*BWV", value) return g.group(1)
clojure (nth (re-find #"\u2014 (.*),\s*BWV" value) 1)
```
## Variables
## Variables {#variables}
Most OpenRefine variables have attributes: aspects of the variables that can be called separately. We call these attributes “member fields” because they belong to certain variables. For example, you can query a record to find out how many rows it contains with `row.record.rowCount`: `rowCount` is a member field specific to the `record` variable, which is a member field of `row`. Member fields can be called using a dot separator, or with square brackets (`row["record"]`). The square bracket syntax is also used for variables that can call columns by name, for example, `cells["Postal Code"]`.
@ -131,7 +131,7 @@ Most OpenRefine variables have attributes: aspects of the variables that can be
| `rowIndex` | The index value of the current row (the first row is 0) |
| `columnName` | The name of the current cell's column, as a string |
### Row
### Row {#row}
The `row` variable itself is best used to access its member fields, which you can do using either a dot operator or square brackets: `row.index` or `row["index"]`.
@ -150,11 +150,11 @@ For array objects such as `row.columnNames` you can preview the array using the
forEach(row.columnNames,v,v).join("; ")
```
### Cells
### Cells {#cells}
The `cells` object is used to call information from the columns in your project. For example, `cells.Foo` returns a [cell](#cell) object representing the cell in the column named “Foo” of the current row. If the column name has spaces, use square brackets, e.g., `cells["Postal Code"]`. To get the corresponding column's value inside the `cells` variable, use `.value` at the end, for example, `cells["Postal Code"].value`. There is no `cells.value` - it can only be used with member fields.
### Cell
### Cell {#cell}
A `cell` object contains all the data of a cell and is stored as a single object.
@ -167,7 +167,7 @@ You can use `cell` on its own in the expressions editor to copy all the contents
| `cell.recon` | An object encapsulating reconciliation results for that cell | See the [reconciliation](expressions#reconciliation) section |
| `cell.errorMessage` | Returns the message of an *EvalError* instead of the error object itself (use value to return the error object) | .value |
### Reconciliation
### Reconciliation {#reconciliation}
Several of the fields here provide the data used in [reconciliation facets](reconciling#reconciliation-facets). You must type `cell.recon`; `recon` on its own will not work.
@ -193,7 +193,7 @@ Arrays such as `cell.recon.candidates` and `cell.recon.candidates.type` can be j
forEach(cell.recon.candidates,v,v.name).join("; ")
```
### Record
### Record {#record}
A `row.record` object encapsulates one or more rows that are grouped together, when your project is in records mode. You must call it as `row.record`; `record` will not return values.


@ -4,7 +4,7 @@ title: Exploring facets
sidebar_label: Facets
---
## Overview
## Overview {#overview}
Facets are one of OpenRefine's strongest features - that's where the diamond logo comes from!
@ -15,7 +15,7 @@ Faceted browsing gives you a big-picture look at your data (do they agree or dis
Typically, you create a facet on a particular column. That facet selection appears on the left, in the <span class="tabLabels">Facet/Filter</span> tab, and you can click on a displayed facet to view all the records that match. You can also “exclude” the facet, to view every record that does _not_ match, and you can select more than one facet by clicking “include.”
### An example
### An example {#an-example}
You can learn about facets and filtering with the following example. You can copy the following table and paste it using the <span class="menuItems">Clipboard</span> method of starting a project if you would like to try it yourself.
@ -62,7 +62,7 @@ When you look back at the text facet display of country names, you should see a
We can combine these facets - say, by narrowing to only the Chinese cities with populations greater than 20 million - simply by clicking in both. You should see 2 matching rows for both these criteria.
### Things to know about facets
### Things to know about facets {#things-to-know-about-facets}
When you have facets applied, you will see “matching rows” in the [project grid header](running#project-grid-header). If you click <span class="menuItems">Export</span> and copy your data out of OpenRefine while facets are active, many of the exporting options will only export the matching rows, not all the rows in your project.
@ -74,7 +74,7 @@ You can modify any facet expression by clicking the “change” button to the r
Facet boxes that appear in the sidebar can be resized and rearranged. You can drag and drop the title bar of each box to reorder them, and drag on the bottom bar of text facet boxes.
## Text facet
## Text facet {#text-facet}
A text facet can be generated on any column with the “text” data type. Select the column dropdown and go to <span class="menuItems">Facet</span> → <span class="menuItems">Text facet</span>. The created facet will be sorted alphabetically, and can be sorted by count.
@ -88,7 +88,7 @@ The choices and counts displayed in each facet can be copied as tab-separated va
![A column of years faceted as text and numbers, and with the count ready to be copied.](/img/yeardata.png)
## Numeric facet
## Numeric facet {#numeric-facet}
![A screenshot of an example numeric facet.](/img/numericfacet.png)
@ -100,7 +100,7 @@ You will be offered the option to include blank, non-numeric, and error values i
You can create a text facet on numeric data, which will treat each entry as a string. This can be useful if you wish, for example, to manually include facets instead of selecting a range, or sort by count, or copy that count.
:::
## Timeline facet
## Timeline facet {#timeline-facet}
![A screenshot of an example timeline facet.](/img/timelinefacet.png)
@ -108,7 +108,7 @@ Much like a numeric facet, a timeline facet will display as a small histogram wi
The facet appears with a count of blank cells and those with errors, which can help you analyze whether your date cells are correctly converted.
## Scatterplot facet
## Scatterplot facet {#scatterplot-facet}
A scatterplot is a visual representation of two related sets of numeric data.
@ -124,7 +124,7 @@ If you have multiple facets applied, plotted points in your scatterplot displays
If you would like to export a scatterplot, OpenRefine will open a new tab with a generated PNG file that you can save.
## Custom text facet
## Custom text facet {#custom-text-facet}
You may want to explore your textual data with modifications that aren't permanent. Creating custom text facets will load your column into memory, transform the data temporarily, and store those transformations inside the facet.
@ -156,7 +156,7 @@ That expression will look for the first letter (the character at index 0) of eac
You can learn more about text-modification functions on the [Expressions page](expressions).
## Custom numeric facet
## Custom numeric facet {#custom-numeric-facet}
You may want to explore your numerical data with modifications that aren't permanent. You can also use custom numeric facets to analyze textual data, such as by getting the length of text strings (with `value.length()`), or by analyzing it as though it were formatted as numbers (with `toNumber(value)`).
@ -188,13 +188,13 @@ mod(value, 7)
You can learn more about numeric-modification functions on the [Expressions page](expressions).
## Customized facets
## Customized facets {#customized-facets}
Customized facets have been added to expand the number of default facets users can apply with a single click. They represent some common and useful functions you shouldn't have to work out using an [expression](expressions).
All facets that display in the <span class="tabLabels">Facet/Filter</span> tab can be edited by clicking on the “change” button to the right of the column title. This brings up the expressions window that will allow you to modify and preview the expression being used.
### Word facet
### Word facet {#word-facet}
A <span class="menuItems">Word facet</span> is a simple version of a text facet: it splits up the content of the cells based on spaces, and outputs each character string as a facet:
@ -206,7 +206,7 @@ This can be useful for exploring the language used in a corpus, looking for comm
Word facet is case-sensitive and only splits by spaces, not by line breaks or other natural divisions.
### Duplicates facet
### Duplicates facet {#duplicates-facet}
A <span class="menuItems">Duplicates facet</span> will return only rows that have non-unique values in the column you've selected. It will create a facet of “true” and “false” values - “true” being cells that are not unique, and “false” being cells that are. The actual expression being used is
@ -220,7 +220,7 @@ Duplicates facets are case-sensitive and you may wish to filter out things like
facetCount(trim(toLowercase(value)), 'trim(toLowercase(value))', 'cityLabel') > 1
```
### Numeric log facet
### Numeric log facet {#numeric-log-facet}
Logarithmic scales reduce wide-ranging quantities to more compact and manageable ranges. A log transformation can be used to make highly skewed distributions less skewed. If your numerical data is unevenly distributed (say, lots of values in one range, and then a long tail extending off into different magnitudes), a <span class="menuItems">Numeric log facet</span> can represent that range better than a simple numeric facet. It will break these values down into more navigable segments than the buckets of a numeric facet. This facet can make patterns in your data more visible. OpenRefine uses a base-10 log, the “common logarithm.”
@ -248,7 +248,7 @@ Most values will be clustered in the 0-100 range, but 35,000 is many magnitudes
A 1-bounded numeric log facet can be used if you'd like to exclude all the values below 1 (including zero and negative numbers).
### Text-length facet
### Text-length facet {#text-length-facet}
The <span class="menuItems">Text-length facet</span> returns a numerical value for each cell and plots it on a numeric facet chart. The expression used is
@ -261,7 +261,7 @@ This can be useful to, for example, look for values that did not successfully sp
You can also employ a <span class="menuItems">Log of text-length facet</span> that allows you to navigate a wide range of string lengths more easily. This can be useful in the case of web-scraping, where lots of textual data is loaded into single cells and needs to be parsed out.
### Unicode character-code facet
### Unicode character-code facet {#unicode-character-code-facet}
![A screenshot of the Unicode facet.](/img/unicodefacet.png)
@ -269,7 +269,7 @@ The Unicode facet identifies and returns [Unicode decimal values](https://en.wik
This facet creates a numerical chart, which offers you the ability to narrow down to a range of numbers. For example, lowercase characters are numbers 97-122, uppercase characters are numbers 65-90, and numerical digits are numbers 48-57.
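As a rough illustration of the ranges quoted above, the following Python sketch classifies a single character by its decimal code. The function name `classify_code_point` is hypothetical and is not part of OpenRefine.

```python
def classify_code_point(ch):
    """Classify one character using the decimal ranges quoted above."""
    code = ord(ch)  # decimal code of the character
    if 97 <= code <= 122:
        return "lowercase"
    if 65 <= code <= 90:
        return "uppercase"
    if 48 <= code <= 57:
        return "digit"
    return "other"


labels = [classify_code_point(c) for c in "aZ5 "]
```

Anything outside those three ranges (spaces, punctuation, accented letters) falls into "other" in this sketch.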
### Facet by error
### Facet by error {#facet-by-error}
An error is a data type created by OpenRefine in the process of transforming data. For example, say you had converted a column to the number data type. If one cell had text characters in it, OpenRefine could either output the original text string unchanged or output an error. If you allow errors to be created, you can facet by them later to search for them and fix them.
@ -277,7 +277,7 @@ An error is a data type created by OpenRefine in the process of transforming dat
To store errors in cells, ensure that you have <span class="fieldLabels">store error</span> selected for the “On error” option in the expressions window.
### Facet by null, empty, or blank
### Facet by null, empty, or blank {#facet-by-null-empty-or-blank}
Any column can be faceted for [null and/or empty cells](#cell-data-types). These can help you find cells where you want to manually enter content.
@ -285,7 +285,7 @@ Any column can be faceted for [null and/or empty cells](#cell-data-types). These
An empty cell is a cell that is set to contain a string, but doesn't have any characters in it (a zero-length string). This can be left over from an operation that removed characters, or from manually editing a cell and deleting its contents.
### Facet by star or flag
### Facet by star or flag {#facet-by-star-or-flag}
Stars and flags offer you the opportunity to mark specific rows for yourself for later focus. Stars and flags persist through closing and opening your project, and thus can provide a different function than using a permalink to persist your facets. Stars and flags can be used in any way you want, although they are designed to help you flag errors and star rows of particular importance.
@ -304,7 +304,7 @@ You may wish to create a custom subset of your data through a series of separate
You can also create a text facet on any column with the expression `row.starred` or `row.flagged`.
## Text filter
## Text filter {#text-filter}
Filters allow you to narrow down your data based on whether a given column includes a text string.

@ -4,7 +4,7 @@ title: General Refine Expression Language
sidebar_label: General Refine Expression Language
---
## Basics
## Basics {#basics}
GREL (General Refine Expression Language) is designed to resemble JavaScript. Formulas use variables and depend on data types to do things like string manipulation or mathematical calculations:
@ -22,7 +22,7 @@ Evaluating conditions uses symbols such as <, >, *, /, etc. To check whether two
See the [GREL functions page for a thorough reference](grelfunctions) on each function and its inputs and outputs. Read on below for more about the general nature of GREL expressions.
## Syntax
## Syntax {#syntax}
In GREL, functions can use either of these two forms:
* functionName(arg0, arg1, ...)
@ -56,13 +56,13 @@ Any function that outputs an array can use square brackets to select only one pa
For example, [partition()](grelfunctions#partitions-s-or-p-fragment-b-omitfragment-optional) would normally output an array of three items: the part before your chosen fragment, the fragment you've identified, and the part after. Selecting only the third part with `"internationalization".partition("nation")[2]` will output “alization” (and so will [-1], indicating the final item in the array).
## Controls
## Controls {#controls}
GREL offers controls to support branching and looping (that is, “if” and “for” functions), but unlike functions, their arguments don't all get evaluated before they get run. A control can decide which part of the code to execute and can affect the environment bindings. Functions, on the other hand, can't do either. Each control decides which of its arguments to evaluate to a value, and how.
Please note that the GREL control names are case-sensitive: for example, the isError() control can't be called with iserror().
#### if(e, eTrue, eFalse)
#### if(e, eTrue, eFalse) {#ife-etrue-efalse}
Expression e is evaluated to a value. If that value is true, then expression eTrue is evaluated and the result is the value of the whole if() expression. Otherwise, expression eFalse is evaluated and that result is the value.
@ -83,7 +83,7 @@ Nested if (switch case) example:
null)))
#### with(e1, variable v, e2)
#### with(e1, variable v, e2) {#withe1-variable-v-e2}
Evaluates expression e1 and binds its value to variable v. Then evaluates expression e2 and returns that result.
@ -93,7 +93,7 @@ Evaluates expression e1 and binds its value to variable v. Then evaluates expres
| `with("european union".split(" "), a, forEach(a, v, v.length()))` | [ 8, 5 ] |
| `with("european union".split(" "), a, forEach(a, v, v.length()).sum() / a.length())` | 6.5 |
#### filter(e1, v, e test)
#### filter(e1, v, e test) {#filtere1-v-e-test}
Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression test - which should return a boolean. If the boolean is true, pushes v onto the result array.
@ -101,7 +101,7 @@ Evaluates expression e1 to an array. Then for each array element, binds its valu
| ---------------------------------------------- | ------------- |
| `filter([ 3, 4, 8, 7, 9 ], v, mod(v, 2) == 1)` | [ 3, 7, 9 ] |
#### forEach(e1, v, e2)
#### forEach(e1, v, e2) {#foreache1-v-e2}
Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression e2, and pushes the result onto the result array. When e1 is a JSON object, `forEach` iterates over its keys.
@ -109,7 +109,7 @@ Evaluates expression e1 to an array. Then for each array element, binds its valu
| ------------------------------------------ | ------------------- |
| `forEach([ 3, 4, 8, 7, 9 ], v, mod(v, 2))` | [ 1, 0, 0, 1, 1 ] |
#### forEachIndex(e1, i, v, e2)
#### forEachIndex(e1, i, v, e2) {#foreachindexe1-i-v-e2}
Evaluates expression e1 to an array. Then for each array element, binds its index to variable i and its value to variable v, evaluates expression e2, and pushes the result onto the result array.
@ -117,17 +117,17 @@ Evaluates expression e1 to an array. Then for each array element, binds its inde
| ------------------------------------------------------------------------------- | --------------------------- |
| `forEachIndex([ "anne", "ben", "cindy" ], i, v, (i + 1) + ". " + v).join(", ")` | 1. anne, 2. ben, 3. cindy |
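For readers more familiar with general-purpose languages, the three array controls above (filter, forEach, forEachIndex) behave much like comprehensions. The Python sketch below mirrors the table examples; it is an analogy only, not GREL syntax.

```python
numbers = [3, 4, 8, 7, 9]

# filter(e1, v, test): keep the elements for which the test is true
odd_numbers = [v for v in numbers if v % 2 == 1]

# forEach(e1, v, e2): evaluate an expression for every element
remainders = [v % 2 for v in numbers]

# forEachIndex(e1, i, v, e2): same, with the zero-based index available
names = ["anne", "ben", "cindy"]
numbered = ", ".join(f"{i + 1}. {v}" for i, v in enumerate(names))
```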
#### forRange(n from, n to, n step, v, e)
#### forRange(n from, n to, n step, v, e) {#forrangen-from-n-to-n-step-v-e}
Iterates over the variable v starting at from, incrementing by the value of step each time while less than to. At each iteration, evaluates expression e, and pushes the result onto the result array.
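The description above can be sketched in Python as follows. This assumes integer bounds and a positive step; the helper name `for_range` and the squaring expression are hypothetical, chosen only for illustration.

```python
def for_range(start, stop, step, expression):
    """Evaluate expression(v) for v = start, start + step, ... while v < stop."""
    return [expression(v) for v in range(start, stop, step)]


# Square every third number below 10: v takes the values 0, 3, 6, 9
squares = for_range(0, 10, 3, lambda v: v * v)
```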
#### forNonBlank(e, v, eNonBlank, eBlank)
#### forNonBlank(e, v, eNonBlank, eBlank) {#fornonblanke-v-enonblank-eblank}
Evaluates expression e. If it is non-blank, forNonBlank() binds its value to variable v, evaluates expression eNonBlank and returns the result. Otherwise (if e evaluates to blank), forNonBlank() evaluates expression eBlank and returns that result instead.
Unlike the other GREL controls beginning with “for,” forNonBlank() is not iterative. It essentially offers a shorter syntax for achieving the same outcome as using isNonBlank() within an “if” statement.
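A minimal Python sketch of that equivalence is shown below. It assumes that "blank" means either a missing value (`None` here) or a zero-length string; the helper names are hypothetical and this is not OpenRefine's implementation.

```python
def is_blank(value):
    """Assumption: blank means None or a zero-length string."""
    return value is None or value == ""


def for_non_blank(value, when_non_blank, when_blank):
    """Pick one branch, like an if() wrapped around a blank check."""
    if is_blank(value):
        return when_blank()
    return when_non_blank(value)


shout = for_non_blank("abc", lambda v: v.upper(), lambda: "n/a")
fallback = for_non_blank("", lambda v: v.upper(), lambda: "n/a")
```

Only the chosen branch is called, which mirrors how a control evaluates just one of its arguments.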
#### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e)
#### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e) {#isblanke-isnonblanke-isnulle-isnotnulle-isnumerice-iserrore}
Evaluates the expression e, and returns a boolean based on the named evaluation.
@ -146,7 +146,7 @@ Examples:
Remember that these are controls and not functions: you can't use dot notation (for example, the format `e.isX()` will not work).
## Constants
## Constants {#constants}
|Name |Meaning |
|-|-|
| true | The boolean constant true |

@ -4,7 +4,7 @@ title: GREL functions
sidebar_label: GREL functions
---
## Reading this reference
## Reading this reference {#reading-this-reference}
For the reference below, the function is given in full-length notation and the in-text examples are written in dot notation. Shorthands are used to indicate the kind of [data type](exploring#data-types) used in each function: s for string, b for boolean, n for number, d for date, a for array, p for a regex pattern, and o for object (meaning any data type), as well as “null” and “error” data types.
@ -15,31 +15,31 @@ Optional arguments will say “(optional)”.
In places where OpenRefine will accept a string (s) or a regex pattern (p), you can supply a string by putting it in quotes. If you wish to use any [regex](expressions#regular-expressions) notation, wrap the pattern in forward slashes.
## Boolean functions
## Boolean functions {#boolean-functions}
###### and(b1, b2, ...)
###### and(b1, b2, ...) {#andb1-b2-}
Uses the logical operator AND on two or more booleans to output a boolean. Evaluates multiple statements into booleans, then returns true if all of the statements are true. For example, `(1 < 3).and(1 < 0)` returns false because one condition is true and one is false.
###### or(b1, b2, ...)
###### or(b1, b2, ...) {#orb1-b2-}
Uses the logical operator OR on two or more booleans to output a boolean. For example, `(1 < 3).or(1 > 7)` returns true because at least one of the conditions (the first one) is true.
###### not(b)
###### not(b) {#notb}
Uses the logical operator NOT on a boolean to output a boolean. For example, `not(1 > 7)` returns true because 1 > 7 itself is false.
###### xor(b1, b2, ...)
###### xor(b1, b2, ...) {#xorb1-b2-}
Uses the logical operator XOR (exclusive-or) on two or more booleans to output a boolean. Evaluates multiple statements, then returns true if only one of them is true. For example, `(1 < 3).xor(1 < 7)` returns false because more than one of the conditions is true.
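Taking the wording above literally ("true if only one of them is true"), a multi-argument exclusive-or can be sketched in Python like this. The sketch follows the description in this reference and is not taken from OpenRefine's source.

```python
def xor(*conditions):
    """Return True when exactly one of the conditions is true."""
    return sum(bool(c) for c in conditions) == 1


both_true = xor(1 < 3, 1 < 7)   # two true conditions
one_true = xor(1 < 3, 1 > 7)    # exactly one true condition
```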
## String functions
## String functions {#string-functions}
###### length(s)
###### length(s) {#lengths}
Returns the length of string s as a number.
###### toString(o, string format (optional))
###### toString(o, string format (optional)) {#tostringo-string-format-optional}
Takes any value type (string, number, date, boolean, error, null) and gives a string version of that value.
@ -56,79 +56,79 @@ You can also convert dates to strings, using date parsing syntax built in to Ope
Note: In OpenRefine, using toString() on a null cell outputs the string “null”.
### Testing string characteristics
### Testing string characteristics {#testing-string-characteristics}
###### startsWith(s, sub)
###### startsWith(s, sub) {#startswiths-sub}
Returns a boolean indicating whether s starts with sub. For example, `"food".startsWith("foo")` returns true, whereas `"food".startsWith("bar")` returns false.
###### endsWith(s, sub)
###### endsWith(s, sub) {#endswiths-sub}
Returns a boolean indicating whether s ends with sub. For example, `"food".endsWith("ood")` returns true, whereas `"food".endsWith("odd")` returns false.
###### contains(s, sub or p)
###### contains(s, sub or p) {#containss-sub-or-p}
Returns a boolean indicating whether s contains sub, which is either a substring or a regex pattern. For example, `"food".contains("oo")` returns true whereas `"food".contains("ee")` returns false.
You can search for a regular expression by wrapping it in forward slashes rather than quotes: `"rose is a rose".contains(/\s+/)` returns true. startsWith() and endsWith() can only take strings, while contains() can take a regex pattern, so you can use contains() to look for beginning and ending string patterns.
### Basic string modification
### Basic string modification {#basic-string-modification}
#### Case conversion
#### Case conversion {#case-conversion}
###### toLowercase(s)
###### toLowercase(s) {#tolowercases}
Returns string s converted to all lowercase characters.
###### toUppercase(s)
###### toUppercase(s) {#touppercases}
Returns string s converted to all uppercase characters.
###### toTitlecase(s)
###### toTitlecase(s) {#totitlecases}
Returns string s converted into titlecase: a capital letter starting each word, and the rest of the letters lowercase. For example, `"Once upon a midnight DREARY".toTitlecase()` returns the string “Once Upon A Midnight Dreary”.
#### Trimming
#### Trimming {#trimming}
###### trim(s)
###### trim(s) {#trims}
Returns a copy of the string s with leading and trailing whitespace removed. For example, `" island ".trim()` returns the string “island”. Identical to strip().
###### strip(s)
###### strip(s) {#strips}
Returns a copy of the string s with leading and trailing whitespace removed. For example, `" island ".strip()` returns the string “island”. Identical to trim().
###### chomp(s, sep)
###### chomp(s, sep) {#chomps-sep}
Returns a copy of string s with the string sep removed from the end if s ends with sep; otherwise, just returns s. For example, `"barely".chomp("ly")` and `"bare".chomp("ly")` both return the string “bare”.
#### Substring
#### Substring {#substring}
###### substring(s, n from, n to (optional))
###### substring(s, n from, n to (optional)) {#substrings-n-from-n-to-optional}
Returns the substring of s starting from character index from, and up to (excluding) character index to. If the to argument is omitted, substring will output to the end of s. For example, `"profound".substring(3)` returns the string “found”, and `"profound".substring(2, 4)` returns the string “of”.
Remember that character indices start from zero. A negative character index counts from the end of the string. For example, `"profound".substring(0, -1)` returns the string “profoun”.
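Python slices happen to follow the same conventions (zero-based, end index excluded, negative indices counted from the end), so the examples above can be checked with a short sketch. This is an analogy in Python, not GREL.

```python
word = "profound"

tail = word[3:]        # from index 3 to the end
middle = word[2:4]     # index 2 up to, but excluding, index 4
trimmed = word[0:-1]   # everything except the last character
```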
###### slice(s, n from, n to (optional))
###### slice(s, n from, n to (optional)) {#slices-n-from-n-to-optional}
Identical to substring() in relation to strings. Also works with arrays; see [Array functions section](#slicea-n-from-n-to-optional).
###### get(s, n from, n to (optional))
###### get(s, n from, n to (optional)) {#gets-n-from-n-to-optional}
Identical to substring() in relation to strings. Also works with named fields. Also works with arrays; see [Array functions section](#geta-n-from-n-to-optional).
#### Find and replace
#### Find and replace {#find-and-replace}
###### indexOf(s, sub)
###### indexOf(s, sub) {#indexofs-sub}
Returns the first character index of sub as it first occurs in s; or, returns -1 if s does not contain sub. For example, `"internationalization".indexOf("nation")` returns 5, whereas `"internationalization".indexOf("world")` returns -1.
###### lastIndexOf(s, sub)
###### lastIndexOf(s, sub) {#lastindexofs-sub}
Returns the first character index of sub as it last occurs in s; or, returns -1 if s does not contain sub. For example, `"parallel".lastIndexOf("a")` returns 3 (pointing at the second “a”).
###### replace(s, s or p find, s replace)
###### replace(s, s or p find, s replace) {#replaces-s-or-p-find-s-replace}
Returns the string obtained by replacing the find string with the replace string in the inputted string. For example, `"The cow jumps over the moon and moos".replace("oo", "ee")` returns the string “The cow jumps over the meen and mees”. Find can be a regex pattern. For example, `"The cow jumps over the moon and moos".replace(/\s+/, "_")` will return “The_cow_jumps_over_the_moon_and_moos”.
@ -137,17 +137,17 @@ You cannot find or replace nulls with this, as null is not a string. You can ins
1. Facet by null and then bulk-edit them to a string, or
2. Transform the column with an expression such as `if(value==null,"new",value)`.
###### replaceChars(s, s find, s replace)
###### replaceChars(s, s find, s replace) {#replacecharss-s-find-s-replace}
Returns the string obtained by replacing a character in s, identified by find, with the corresponding character identified in replace. For example, `"Téxt thát was optícálly recógnízéd".replaceChars("áéíóú", "aeiou")` returns the string “Text that was optically recognized”. You cannot use this to replace a single character with more than one character.
###### find(s, sub or p)
###### find(s, sub or p) {#finds-sub-or-p}
Outputs an array of all consecutive substrings inside string s that match the substring or [regex](expressions#grel-supported-regex) pattern p. For example, `"abeadsabmoloei".find(/[aeio]+/)` would result in the array [ "a", "ea", "a", "o", "oei" ].
You can supply a substring instead of p, by putting it in quotes, and OpenRefine will compile it into a regex pattern. Anytime you supply quotes, OpenRefine interprets the contents as a string, not regex. If you wish to use any regex notation, wrap the pattern in forward slashes.
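The difference between a regex pattern and a quoted substring can be illustrated with Python's `re` module. This is only an analogy: OpenRefine uses Java regular expressions, which differ from Python's in some details, and `re.escape` stands in here for treating the quoted text literally.

```python
import re

text = "abeadsabmoloei"

# Pattern form: every consecutive run of the letters a, e, i, o
vowel_runs = re.findall(r"[aeio]+", text)

# Substring form: the quoted text is matched literally
literal_hits = re.findall(re.escape("ab"), text)
```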
###### match(s, p)
###### match(s, p) {#matchs-p}
Attempts to match the string s in its entirety against the [regex](expressions#grel-supported-regex) pattern p and, if the pattern is found, outputs an array of all [capturing groups](https://www.regular-expressions.info/brackets.html) (found in order). For example, `"230.22398, 12.3480".match(/.*(\d\d\d\d)/)` returns an array of 1 substring: [ "3480" ]. It does not find 2239 as the first sequence with four digits, because the regex indicates the four digits must come at the end of the string.
@ -164,31 +164,31 @@ For example, if `value` is “hello 123456 goodbye”, the following would occur
|`value.match(/.*(\d{6}).*/)` |[ "123456" ] (array with one value)|
|`value.match(/(.*)(\d{6})(.*)/)` |[ "hello ", "123456", " goodbye" ] (array with three values)|
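The whole-string behaviour described above corresponds roughly to `re.fullmatch` in Python. The sketch below reproduces the examples; the helper name `match_groups` is hypothetical, and returning `None` when the pattern does not cover the entire string is an assumption of this sketch.

```python
import re


def match_groups(s, pattern):
    """Return the capturing groups if the pattern matches all of s, else None."""
    m = re.fullmatch(pattern, s)
    return list(m.groups()) if m else None


value = "hello 123456 goodbye"
one_group = match_groups(value, r".*(\d{6}).*")
three_groups = match_groups(value, r"(.*)(\d{6})(.*)")
partial_only = match_groups(value, r"\d{6}")  # pattern covers only part of the string
```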
### String parsing and splitting
### String parsing and splitting {#string-parsing-and-splitting}
###### toNumber(s)
###### toNumber(s) {#tonumbers}
Returns a string converted to a number. Will attempt to convert other formats into a string, then into a number. If the value is already a number, it will return the number.
###### split(s, s or p sep, b preserveTokens (optional))
###### split(s, s or p sep, b preserveTokens (optional)) {#splits-s-or-p-sep-b-preservetokens-optional}
Returns the array of strings obtained by splitting s by sep. The separator can be either a string or a regex pattern. For example, `"fire, water, earth, air".split(",")` returns an array of 4 strings: [ "fire", " water", " earth", " air" ]. Note that the space characters are retained but the separator is removed. If you include “true” for the preserveTokens boolean, empty segments are preserved.
###### splitByLengths(s, n1, n2, ...)
###### splitByLengths(s, n1, n2, ...) {#splitbylengthss-n1-n2-}
Returns the array of strings obtained by splitting s into substrings with the given lengths. For example, `"internationalization".splitByLengths(5, 6, 3)` returns an array of 3 strings: [ "inter", "nation", "ali" ]. Excess characters are discarded.
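As a sketch of the behaviour described above, the following Python helper cuts a string into consecutive pieces of the given lengths and drops whatever is left over. The name `split_by_lengths` is hypothetical.

```python
def split_by_lengths(s, *lengths):
    """Cut s into consecutive pieces of the given lengths; the rest is discarded."""
    parts = []
    position = 0
    for length in lengths:
        parts.append(s[position:position + length])
        position += length
    return parts


pieces = split_by_lengths("internationalization", 5, 6, 3)
```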
###### smartSplit(s, s or p sep (optional))
###### smartSplit(s, s or p sep (optional)) {#smartsplits-s-or-p-sep-optional}
Returns the array of strings obtained by splitting s by sep, or by guessing either tab or comma separation if there is no sep given. Handles quotes properly and understands cancelled characters. The separator can be either a string or a regex pattern. For example, `value.smartSplit("\n")` will split at a carriage return or a new-line character.
Note: [`value.escape('javascript')`](#escapes-s-mode) is useful for previewing unprintable characters prior to using smartSplit().
###### splitByCharType(s)
###### splitByCharType(s) {#splitbychartypes}
Returns an array of strings obtained by splitting s into groups of consecutive characters each time the characters change [Unicode categories](https://en.wikipedia.org/wiki/Unicode_character_property#General_Category). For example, `"HenryCTaylor".splitByCharType()` will result in an array of [ "H", "enry", "CT", "aylor" ]. It is useful for separating letters and numbers: `"BE1A3E".splitByCharType()` will result in [ "BE", "1", "A", "3", "E" ].
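The grouping idea can be approximated in Python with `unicodedata.category` and `itertools.groupby`. This is a sketch of the technique, assuming that each change of Unicode general category starts a new group; OpenRefine's own implementation may treat some edge cases differently.

```python
import unicodedata
from itertools import groupby


def split_by_char_type(s):
    """Group consecutive characters that share a Unicode general category."""
    return ["".join(group) for _, group in groupby(s, key=unicodedata.category)]


name_parts = split_by_char_type("HenryCTaylor")
code_parts = split_by_char_type("BE1A3E")
```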
###### partition(s, s or p fragment, b omitFragment (optional))
###### partition(s, s or p fragment, b omitFragment (optional)) {#partitions-s-or-p-fragment-b-omitfragment-optional}
Returns an array of strings [ a, fragment, z ] where a is the substring within s before the first occurrence of fragment, and z is the substring after fragment. Fragment can be a string or a regex. For example, `"internationalization".partition("nation")` returns 3 strings: [ "inter", "nation", "alization" ]. If s does not contain fragment, it returns an array of [ s, "", "" ] (the original unpartitioned string, and two empty strings).
@ -196,69 +196,69 @@ If the omitFragment boolean is true, for example with `"internationalization".pa
You can use regex for your fragment. The expression `"abcdefgh".partition(/c.e/)` will output [ "ab", "cde", "fgh" ].
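A Python sketch of partition() with either a plain string or a compiled regex as the fragment is given below. The function and parameter names are hypothetical, and the result when `omit_fragment` is true and nothing matches is an assumption of this sketch.

```python
import re


def partition(s, fragment, omit_fragment=False):
    """Split s around the first occurrence of fragment (a string or compiled regex)."""
    if isinstance(fragment, re.Pattern):
        m = fragment.search(s)
        if m is None:
            before, found, after = s, "", ""
        else:
            before, found, after = s[:m.start()], m.group(0), s[m.end():]
    else:
        # str.partition already returns (s, "", "") when nothing is found
        before, found, after = s.partition(fragment)
    return [before, after] if omit_fragment else [before, found, after]


by_string = partition("internationalization", "nation")
by_regex = partition("abcdefgh", re.compile(r"c.e"))
```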
###### rpartition(s, s or p fragment, b omitFragment (optional))
###### rpartition(s, s or p fragment, b omitFragment (optional)) {#rpartitions-s-or-p-fragment-b-omitfragment-optional}
Returns an array of strings [ a, fragment, z ] where a is the substring within s before the last occurrence of fragment, and z is the substring after the last instance of fragment. (Rpartition means “reverse partition.”) For example, `"parallel".rpartition("a")` returns 3 strings: [ "par", "a", "llel" ]. Otherwise works identically to partition() above.
### Encoding and hashing
### Encoding and hashing {#encoding-and-hashing}
###### diff(s1, s2, s timeUnit (optional))
###### diff(s1, s2, s timeUnit (optional)) {#diffs1-s2-s-timeunit-optional}
Takes two strings and compares them, returning a string. Returns the remainder of s2 starting with the first character where they differ. For example, `"cacti".diff("cactus")` returns "us". Also works with dates; see [Date functions](#diffd1-d2-s-timeunit).
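For strings, the comparison described above amounts to skipping the common prefix and returning what remains of the second string. A minimal Python sketch follows; the function name is hypothetical and the date variant is not covered.

```python
def diff(s1, s2):
    """Return the part of s2 that starts at the first character differing from s1."""
    i = 0
    while i < len(s1) and i < len(s2) and s1[i] == s2[i]:
        i += 1
    return s2[i:]


remainder = diff("cacti", "cactus")
```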
###### escape(s, s mode)
###### escape(s, s mode) {#escapes-s-mode}
Escapes s in the given escaping mode. The mode can be one of: "html", "xml", "csv", "url", "javascript". Note that quotes are required around your mode. See the [recipes](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#question-marks--showing-in-your-data) for examples of escaping and unescaping.
###### unescape(s, s mode)
###### unescape(s, s mode) {#unescapes-s-mode}
Unescapes s in the given escaping mode. The mode can be one of: "html", "xml", "csv", "url", "javascript". Note that quotes are required around your mode. See the [recipes](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#atampampt----att) for examples of escaping and unescaping.
###### md5(o)
###### md5(o) {#md5o}
Returns the [MD5 hash](https://en.wikipedia.org/wiki/MD5) of an object. If fed something other than a string (array, number, date, etc.), md5() will convert it to a string and deliver the hash of the string. For example, `"internationalization".md5()` will return 2c55a1626e31b4e373ceedaa9adc12a3.
###### sha1(o)
###### sha1(o) {#sha1o}
Returns the [SHA-1 hash](https://en.wikipedia.org/wiki/SHA-1) of an object. If fed something other than a string (array, number, date, etc.), sha1() will convert it to a string and deliver the hash of the string. For example, `"internationalization".sha1()` will return cd05286ee0ff8a830dbdc0c24f1cb68b83b0ef36.
###### phonetic(s, s encoding)
###### phonetic(s, s encoding) {#phonetics-s-encoding}
Returns a phonetic encoding of a string, based on an available phonetic algorithm. See the [section on phonetic clustering](cellediting#clustering-methods) for more information. Can be one of the following supported phonetic methods: [metaphone, doublemetaphone, metaphone3](https://www.wikipedia.org/wiki/Metaphone), [soundex](https://en.wikipedia.org/wiki/Soundex), [cologne](https://en.wikipedia.org/wiki/Cologne_phonetics). Quotes are required around your encoding method. For example, `"Ruth Prawer Jhabvala".phonetic("metaphone")` outputs the string “R0PRWRJHBFL”.
###### reinterpret(s, s encoderTarget, s encoderSource)
###### reinterpret(s, s encoderTarget, s encoderSource) {#reinterprets-s-encodertarget-s-encodersource}
Returns s reinterpreted through the given character encoders. You must supply one of the [supported encodings](http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html) for each of the original source and the target output. Note that quotes are required around your character encoder.
When an OpenRefine project is started, data is imported and interpreted. A specific character encoding is identified or manually selected at that time (such as UTF-8). You can reinterpret a column into another specified encoding using this function. This function may not fix your data; it may be better to use this in conjunction with new projects to test the interpretation, and pre-format your data as needed.
###### fingerprint(s)
###### fingerprint(s) {#fingerprints}
Returns the fingerprint of s, a string that is the first step in [fingerprint clustering methods](cellediting#clustering-methods): it will trim whitespaces, convert all characters to lowercase, remove punctuation, sort words alphabetically, etc. For example, `"Ruth Prawer Jhabvala".fingerprint()` outputs the string “jhabvala prawer ruth”.
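The steps listed above can be sketched in Python as follows. This covers only the steps named in the text (trim, lowercase, strip ASCII punctuation, sort words); removing duplicate words is an assumption of this sketch, and whatever the "etc." covers is not modelled.

```python
import string


def fingerprint(s):
    """Trim, lowercase, drop ASCII punctuation, then sort the unique words."""
    cleaned = s.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))  # assumption: duplicate words collapse
    return " ".join(tokens)


key = fingerprint("Ruth Prawer Jhabvala")
```

Two differently ordered or punctuated spellings of the same name end up with the same key, which is what makes the fingerprint useful for clustering.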
###### ngram(s, n)
###### ngram(s, n) {#ngrams-n}
Returns an array of the word n-grams of s. That is, it lists all the possible consecutive combinations of n words in the string. For example, `"Ruth Prawer Jhabvala".ngram(2)` would output the array [ "Ruth Prawer", "Prawer Jhabvala" ]. A word n-gram of 1 simply lists all the words in original order; an n-gram larger than the number of words in the string will only return the original string inside an array (e.g. `"Ruth Prawer Jhabvala".ngram(4)` would simply return ["Ruth Prawer Jhabvala"]).
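A short Python sketch of word n-grams, following the description above, is shown below. The function name `word_ngrams` is hypothetical, and splitting on whitespace is an assumption.

```python
def word_ngrams(s, n):
    """List every run of n consecutive words; fall back to [s] when n is too large."""
    words = s.split()
    if n >= len(words):
        return [s]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = word_ngrams("Ruth Prawer Jhabvala", 2)
too_long = word_ngrams("Ruth Prawer Jhabvala", 4)
```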
###### ngramFingerprint(s, n)
###### ngramFingerprint(s, n) {#ngramfingerprints-n}
Returns the [n-gram fingerprint](cellediting#clustering-methods) of s. For example, `"banana".ngramFingerprint(2)` would output “anbana”, after first generating the 2-grams “ba an na an na”, removing duplicates, and sorting them alphabetically.
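The three steps named above for ngramFingerprint() (generate character n-grams, remove duplicates, sort) can be sketched in Python. Lowercasing and dropping non-alphanumeric characters first are assumptions of this sketch, and the function name is hypothetical.

```python
def ngram_fingerprint(s, n):
    """Join the sorted, de-duplicated character n-grams of s."""
    cleaned = "".join(ch for ch in s.lower() if ch.isalnum())
    grams = {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}
    return "".join(sorted(grams))


banana_key = ngram_fingerprint("banana", 2)
```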
###### unicode(s)
###### unicode(s) {#unicodes}
Returns an array giving the decimal Unicode code point of each character of s. For example, `"Bernice Rubens".unicode()` outputs [ 66, 101, 114, 110, 105, 99, 101, 32, 82, 117, 98, 101, 110, 115 ].
###### unicodeType(s)
###### unicodeType(s) {#unicodetypes}
Returns an array of strings describing each character of s by their unicode type. For example, `"Bernice Rubens".unicodeType()` outputs [ "uppercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "space separator", "uppercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter" ].
## Format-based functions (JSON, HTML, XML)
## Format-based functions (JSON, HTML, XML) {#format-based-functions-json-html-xml}
###### jsonize(o)
###### jsonize(o) {#jsonizeo}
Quotes a value as a JSON literal value.
###### parseJson(s)
###### parseJson(s) {#parsejsons}
Parses a string as JSON. get() can then be used with parseJson(): for example, `parseJson(" { 'a' : 1 } ").get("a")` returns 1.
@ -286,9 +286,9 @@ For example, from the following JSON array in `value`, we want to get all instan
The GREL expression `forEach(value.parseJson().keywords,v,v.text).join(":::")` will output “York en route:::Anthony Eden:::President Eisenhower”.
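The same parse, iterate, and join pattern can be sketched in Python. The input below is hypothetical: it is shaped only so that each keyword object has a `text` field, and it is not the JSON from the documentation. The function name `join_keyword_texts` is likewise an assumption for illustration.

```python
import json

# Hypothetical input: an object with a "keywords" array of {"text": ...} items.
value = (
    '{"keywords": ['
    '{"text": "York en route"}, '
    '{"text": "Anthony Eden"}, '
    '{"text": "President Eisenhower"}]}'
)


def join_keyword_texts(raw, sep=":::"):
    """Parse raw JSON and join the "text" field of every keyword with sep."""
    parsed = json.loads(raw)
    return sep.join(item["text"] for item in parsed["keywords"])


print(join_keyword_texts(value))
```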
### Jsoup XML and HTML parsing
### Jsoup XML and HTML parsing {#jsoup-xml-and-html-parsing}
###### parseHtml(s)
###### parseHtml(s) {#parsehtmls}
Given a cell full of HTML-formatted text, parseHtml() simplifies HTML tags (such as by removing “ /” at the end of self-closing tags), closes any unclosed tags, and inserts linebreaks and indents for cleaner code. You cannot pass parseHtml() a URL, but you can pre-fetch HTML with the <span class="menuItems">[Add column by fetching URLs](columnediting#add-column-by-fetching-urls)</span> menu option.
A cell cannot store the output of parseHtml() unless you convert it with toString(): for example, `value.parseHtml().toString()`.
@ -297,10 +297,10 @@ When parseHtml() simplifies HTML, it can sometimes introduce errors. When closin
You can then extract or [select()](#selects-element) the portions of the HTML document you need for further splitting, partitioning, etc. An example of extracting all table rows from a div using parseHtml() and select() together is described in more depth at [StrippingHTML](https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML).
###### parseXml(s)
###### parseXml(s) {#parsexmls}
Given a cell full of XML-formatted text, parseXml() returns a full XML document and adds any missing closing tags. You can then extract or [select()](#selects-element) the portions of the XML document you need for further splitting, partitioning, etc. Works the same way as parseHtml(), described above.
###### select(s, element)
###### select(s, element) {#selects-element}
Returns an array of all the desired elements from an HTML or XML document, if the element exists. Elements are identified using the [Jsoup selector syntax](https://jsoup.org/apidocs/org/jsoup/select/Selector.html). For example, `value.parseHtml().select("img.portrait")[0]` would return the entirety of the first “img” tag with the “portrait” class found in the parsed HTML inside `value`. Returns an empty array if no matching element is found. Use with toString() to capture the results in a cell. A tutorial of select() is shown in [StrippingHTML](https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML).
You can use select() more than once:
@ -309,73 +309,73 @@ You can use select() more than once:
```
value.parseHtml().select("div#content")[0].select("tr").toString()
```
###### htmlAttr(s, element)
###### htmlAttr(s, element) {#htmlattrs-element}
Returns a string from an attribute on an HTML element. Use it in conjunction with parseHtml() as in the following example: `value.parseHtml().select("a.email")[0].htmlAttr("href")` would retrieve the email address attached to a link with the “email” class.
###### xmlAttr(s, element)
###### xmlAttr(s, element) {#xmlattrs-element}
Returns a string from an attribute on an XML element. Works the same way as htmlAttr(), described above. Use it in conjunction with parseXml().
###### htmlText(element)
###### htmlText(element) {#htmltextelement}
Returns a string of the text from within an HTML element (including all child elements), removing HTML tags and line breaks inside the string. Use it in conjunction with parseHtml() and select() to provide an element, as in the following example: `value.parseHtml().select("div.footer")[0].htmlText()`.
###### xmlText(element)
###### xmlText(element) {#xmltextelement}
Returns a string of the text from within an XML element (including all child elements). Works the same way as htmlText(), described above. Use it in conjunction with parseXml() and select() to provide an element.
###### wholeText(element)
###### wholeText(element) {#wholetextelement}
Selects the (unencoded) text of an element and its children, including any new lines and spaces, and returns a string of unencoded, un-normalized text. Use it in conjunction with parseHtml() and select() to provide an element as in the following example: `value.parseHtml().select("div.footer")[0].wholeText()`.
###### innerHtml(element)
###### innerHtml(element) {#innerhtmlelement}
Returns the [inner HTML](https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML) of an HTML element. This will include text and children elements within the element selected. Use it in conjunction with parseHtml() and select() to provide an element.
###### innerXml(element)
###### innerXml(element) {#innerxmlelement}
Returns the inner XML elements of an XML element. Does not return the text directly inside your chosen XML element - only the contents of its children. To select the direct text, use ownText(). To select both, use xmlText(). Use it in conjunction with parseXml() and select() to provide an element.
###### ownText(element)
###### ownText(element) {#owntextelement}
Returns the text directly inside the selected XML or HTML element only, ignoring text inside children elements (for this, use innerXml()). Use it in conjunction with a parser and select() to provide an element.
## Array functions
## Array functions {#array-functions}
###### length(a)
###### length(a) {#lengtha}
Returns the size of an array, meaning the number of objects inside it. Arrays can be empty, in which case length() will return 0.
###### slice(a, n from, n to (optional))
###### slice(a, n from, n to (optional)) {#slicea-n-from-n-to-optional}
Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0. If the to value is omitted, it is understood to be the end of the array. For example, `[0, 1, 2, 3, 4].slice(1, 3)` returns [ 1, 2 ], and `[ 0, 1, 2, 3, 4].slice(2)` returns [ 2, 3, 4 ]. Also works with strings; see [String functions](#slices-n-from-n-to-optional).
###### get(a, n from, n to (optional))
###### get(a, n from, n to (optional)) {#geta-n-from-n-to-optional}
Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0.
If the to value is omitted, only one array item is returned, as a string, instead of a sub-array. To return a sub-array from one index to the end, you can set the to argument to a very high number such as `value.get(2,999)` or you can use something like `with(value,a,a.get(1,a.length()))` to count the length of each array.
Also works with strings; see [String functions](#gets-n-from-n-to-optional).
###### inArray(a, s)
###### inArray(a, s) {#inarraya-s}
Returns true if the array contains the desired string, and false otherwise. Will not convert data types; for example, `[ 1, 2, 3, 4 ].inArray("3")` will return false.
###### reverse(a)
###### reverse(a) {#reversea}
Reverses the array. For example, `[ 0, 1, 2, 3].reverse()` returns the array [ 3, 2, 1, 0 ].
###### sort(a)
###### sort(a) {#sorta}
Sorts the array in ascending order. Sorting is case-sensitive, uppercase first and lowercase second. For example, `[ "al", "Joe", "Bob", "jim" ].sort()` returns the array [ "Bob", "Joe", "al", "jim" ].
###### sum(a)
###### sum(a) {#suma}
Return the sum of the numbers in the array. For example, `[ 2, 1, 0, 3 ].sum()` returns 6.
###### join(a, sep)
###### join(a, sep) {#joina-sep}
Joins the items in the array with sep, and returns it all as a string. For example, `[ "and", "or", "not" ].join("/")` returns the string “and/or/not”.
###### uniques(a)
###### uniques(a) {#uniquesa}
Returns the array with duplicates removed. Case-sensitive. For example, `[ "al", "Joe", "Bob", "Joe", "Al", "Bob" ].uniques()` returns the array [ "Joe", "al", "Al", "Bob" ].
As of OpenRefine 3.4.1, uniques() reorders the array items it returns; in 3.4 beta 644 and onwards, it preserves the original order (in this case, [ "al", "Joe", "Bob", "Al" ]).
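The order-preserving behaviour mentioned above can be illustrated with a small Python sketch. It is not OpenRefine's implementation, and the name `uniques_in_order` is hypothetical.

```python
def uniques_in_order(items):
    """Remove duplicates while keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in items:
        # Comparison is case-sensitive: "al" and "Al" are different items.
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(uniques_in_order(["al", "Joe", "Bob", "Joe", "Al", "Bob"]))
```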
## Date functions
## Date functions {#date-functions}
###### now()
###### now() {#now}
Returns the current time according to your system clock, in the [ISO 8601 extended format](exploring#data-types) (converted to UTC). For example, 10:53am (and 00 seconds) on November 26th 2020 in EST returns [date 2020-11-26T15:53:00Z].
###### toDate(o, b monthFirst, s format1, s format2, ...)
###### toDate(o, b monthFirst, s format1, s format2, ...) {#todateo-b-monthfirst-s-format1-s-format2-}
Returns the inputted object converted to a date object. Without arguments, it returns the ISO 8601 extended format. With arguments, you can control the output format:
* monthFirst: set false if the date is formatted with the day before the month.
@ -409,17 +409,17 @@ For example, you can parse a column containing dates in different formats, such
| Z | Time zone | RFC 822 time zone | \-0800 |
| X | Time zone | ISO 8601 time zone | \-08; -0800; -08:00 |
###### diff(d1, d2, s timeUnit)
###### diff(d1, d2, s timeUnit) {#diffd1-d2-s-timeunit}
Given two dates, returns a number indicating the difference in a given time unit (see the table below). For example, `diff(("Nov-11".toDate('MMM-yy')), ("Nov-09".toDate('MMM-yy')), "weeks")` will return 104, for 104 weeks, or two years. The later date should go first. If the output is negative, invert d1 and d2.
Also works with strings; see [diff() in string functions](#diffsd1-sd2-s-timeunit-optional).
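To check the arithmetic in the example above, here is a Python sketch using the standard `datetime` module. It is an independent illustration, not OpenRefine's date handling. The name `diff_in_weeks` is hypothetical, the `%b-%y` pattern stands in for the `MMM-yy` format, and English month abbreviations are assumed for parsing.

```python
from datetime import datetime


def diff_in_weeks(later, earlier, fmt="%b-%y"):
    """Return the number of whole weeks between two date strings."""
    d1 = datetime.strptime(later, fmt)    # e.g. "Nov-11" -> 2011-11-01
    d2 = datetime.strptime(earlier, fmt)  # e.g. "Nov-09" -> 2009-11-01
    # Truncate toward zero so partial weeks are not counted.
    return int((d1 - d2).days / 7)


print(diff_in_weeks("Nov-11", "Nov-09"))
```

The two dates are 730 days apart, which is 104 whole weeks.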
###### inc(d, n, s timeUnit)
###### inc(d, n, s timeUnit) {#incd-n-s-timeunit}
Returns a date changed by the given amount in the given unit of time (see the table below). The default unit is “hour”. A positive value increases the date, and a negative value moves it back in time. For example, if you want to move a date backwards by two months, use `value.inc(-2,"month")`.
###### datePart(d, s timeUnit)
###### datePart(d, s timeUnit) {#datepartd-s-timeunit}
Returns part of a date. The data type returned depends on the unit (see the table below).
@ -452,7 +452,7 @@ OpenRefine supports the following values for timeUnit:
| nanos | Nanoseconds | Number | value.datePart("n") → 789000 |
| time | Milliseconds between input and the [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time) | Number | value.datePart("time") → 1394775004000 |
## Math functions
## Math functions {#math-functions}
For integer division and precision, you can use simple evaluations such as `1 / 2`, which is equivalent to `floor(1/2)` - that is, it returns only whole number results. If either operand is a floating point number, they both get promoted to floating point and a floating point result is returned. You can use `1 / 2.0` or `1.0 / 2` or `1.0 * x / y` (if you're working with variables of unknown contents).
@ -497,12 +497,12 @@ Some of these math functions don't recognize integers when supplied as the first
|`tan(n)`|Returns the trigonometric tangent of an angle.|`tan(10)` returns 0.6483608274590866.|
|`tanh(n)`|Returns the hyperbolic tangent of a value.|`tanh(10)` returns 0.9999999958776927.|
## Other functions
## Other functions {#other-functions}
###### type(o)
###### type(o) {#typeo}
Returns a string with the data type of o, such as undefined, string, number, boolean, etc. For example, a [Transform](cellediting#transform) operation using `value.type()` will convert all cells in a column to strings of their data types.
###### facetCount(choiceValue, s facetExpression, s columnName)
###### facetCount(choiceValue, s facetExpression, s columnName) {#facetcountchoicevalue-s-facetexpression-s-columnname}
Returns the facet count corresponding to the given choice value, by looking for the facetExpression in the choiceValue in columnName. For example, to create facet counts for the following table, we could generate a new column based on “Gift” and enter in `value.facetCount("value", "Gift")`. This would add the column we've named “Count”:
| Gift | Recipient | Price | Count |
@ -514,13 +514,13 @@ Returns the facet count corresponding to the given choice value, by looking for
The facet expression, wrapped in quotes, can be useful to manipulate the inputted values before counting. For example, you could do a textual cleanup using fingerprint(): `(value.fingerprint()).facetCount(value.fingerprint(),"Gift")`.
###### hasField(o, s name)
###### hasField(o, s name) {#hasfieldo-s-name}
Returns a boolean indicating whether o has a member field called [name](expressions#variables). For example, `cell.recon.hasField("match")` will return false if a reconciliation match hasn't been selected yet, or true if it has. You cannot chain your desired fields: for example, `cell.hasField("recon.match")` will return false even if the above expression returns true.
###### coalesce(o1, o2, o3, ...)
###### coalesce(o1, o2, o3, ...) {#coalesceo1-o2-o3-}
Returns the first non-null from a series of objects. For example, `coalesce(value, "")` would return an empty string “” if `value` was null, but otherwise return `value`.
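The "first non-null" rule can be sketched in Python, with `None` standing in for null. This is an illustration of the logic only, under that assumption, and not OpenRefine's code.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    for v in values:
        if v is not None:
            return v
    return None


print(repr(coalesce(None, "")))       # falls back to the empty string
print(repr(coalesce("Paris", "")))    # keeps the original value
```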
###### cross(cell, s projectName (optional), s columnName (optional))
###### cross(cell, s projectName (optional), s columnName (optional)) {#crosscell-s-projectname-optional-s-columnname-optional}
Returns an array of zero or more rows in the project projectName for which the cells in their column columnName have the same content as the cell in your chosen column. For example, if two projects contained matching names, and you wanted to pull addresses for people by their names from a project called “People” you would apply the following expression to your column of names:
```
cell.cross("People","Name").cells["Address"].value[0]
```


@ -4,17 +4,17 @@ title: Installing OpenRefine
sidebar_label: Installing
---
## System requirements
## System requirements {#system-requirements}
OpenRefine does not require internet access to run its basic functions. Once you download and install it, it runs as a small web server on your own computer, and you access that local web server by using your browser. It only requires an internet connection to import data from the web, reconcile data using a web service, or export data to the web.
OpenRefine requires three things on your computer in order to function:
#### Compatible operating system
#### Compatible operating system {#compatible-operating-system}
OpenRefine is designed to work with **Windows**, **Mac**, and **Linux** operating systems. [Our team releases packages for each](https://openrefine.org/download.html).
#### Java
#### Java {#java}
[Java](https://java.com/en/download/) must be installed and configured on your computer to run OpenRefine. The Mac version of OpenRefine includes Java; new in OpenRefine 3.4, there is also a Windows package with Java included.
@ -22,7 +22,7 @@ If you install and start OpenRefine on a Windows computer without Java, it will
We recommend you [download](https://java.com/en/download/) and install Java before proceeding with the OpenRefine installation.
#### Compatible browser
#### Compatible browser {#compatible-browser}
OpenRefine works best on browsers based on Webkit, such as:
@ -33,13 +33,13 @@ OpenRefine works best on browsers based on Webkit, such as:
We are aware of some minor rendering and performance issues on other browsers such as Firefox. We don't support Internet Explorer. If you are having issues running OpenRefine, see the [section on Running](running.md#troubleshooting).
### Release versions
### Release versions {#release-versions}
OpenRefine always has a [latest stable release](https://github.com/OpenRefine/OpenRefine/releases/latest), as well as some more recent developments available in beta, release candidate, or [snapshot releases](https://github.com/OpenRefine/OpenRefine-snapshot-releases/releases). If you are installing for the first time, we recommend [the latest stable release](https://github.com/OpenRefine/OpenRefine/releases/latest).
If you wish to use an extension that is only compatible with an earlier version of OpenRefine, and do not require the latest features, you may find that [an older stable version is best for you](https://github.com/OpenRefine/OpenRefine/releases) in our list of releases. Look at later releases to see which security vulnerabilities are being fixed, in order to assess your own risk tolerance for using earlier versions. Look for “final release” versions instead of “beta” or “release candidate” versions.
#### Unstable versions
#### Unstable versions {#unstable-versions}
If you need a recently developed function, and are willing to risk some untested code, you can look at [the most recent items in the list](https://github.com/OpenRefine/OpenRefine/releases) and see what changes appeal to you.
@ -47,7 +47,7 @@ If you need a recently developed function, and are willing to risk some untested
For the absolute latest development updates, see the [snapshot releases](https://github.com/OpenRefine/OpenRefine-snapshot-releases/releases). These are created with every commit.
#### What's changed
#### What's changed {#whats-changed}
Our [latest version is OpenRefine 3.4.1](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1), released September 24th 2020. The major changes in this version are listed on the [3.4.1 release page](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1) with the downloadable packages.
@ -57,8 +57,8 @@ You can find information about all OpenRefine versions on the [Releases page on
OpenRefine may also work in other environments, such as [Chromebooks](https://gist.github.com/organisciak/3e12e5138e44a2fed75240f4a4985b4f) where Linux terminals are available. Look at our list of [Other Distributions on the Downloads page](https://openrefine.org/download.html) for other ways of running OpenRefine, and refer to our contributor community to see new environments in development.
:::
## Installing or upgrading
### Back up your data
## Installing or upgrading {#installing-or-upgrading}
### Back up your data {#back-up-your-data}
If you are upgrading from an older version of OpenRefine and have projects already on your computer, you should create backups of those projects before you install a new version.
@ -70,7 +70,7 @@ For extra security you can [export your existing OpenRefine projects](exporting#
Take note of the [extensions](#installing-extensions) you have currently installed. They may not be compatible with the upgraded version of OpenRefine. Extensions can be installed in two places, so be sure to check both your workspace directory and the existing installation directory.
:::
### Install or upgrade OpenRefine
### Install or upgrade OpenRefine {#install-or-upgrade-openrefine}
If you are upgrading an existing OpenRefine installation, you can delete the old program files and install the new files into the same space. Do not overwrite the files as some obsolete files may be left over unnecessarily.
@ -118,7 +118,7 @@ The long version:
[Homebrew](http://brew.sh) is a popular command-line package manager for Mac. Installing Homebrew is accomplished by pasting the installation command on the Homebrew website into a Terminal window. Once Homebrew is installed, applications like OpenRefine can be installed via a simple command. You can [install Homebrew from their website](http://brew.sh).
###### Install
###### Install {#install}
Install OpenRefine with this command:
@ -141,7 +141,7 @@ Behind the scenes, this command causes Homebrew to download the OpenRefine insta
If an existing `OpenRefine.app` is found in the Applications folder, Homebrew will not overwrite it, so installing via Homebrew requires either deleting or renaming previously installed copies.
###### Uninstall
###### Uninstall {#uninstall}
To uninstall OpenRefine, paste this command into the Terminal:
@ -155,7 +155,7 @@ You should see output like this:
```
==> Removing App '/Applications/OpenRefine.app'.
```
###### Update
###### Update {#update}
To update to the latest version of OpenRefine, paste this command into the Terminal:
@ -197,7 +197,7 @@ tar xzf openrefine-linux-3.4.tar.gz
---
### Set where data is stored
### Set where data is stored {#set-where-data-is-stored}
OpenRefine stores data in two places: program files in the program directory, wherever it is you've installed it; and project files in what we call the “workspace directory.” You can access this folder easily from OpenRefine by going to the [home screen](running#the-home-screen) (at [http://127.0.0.1:3333/](http://127.0.0.1:3333/)) and clicking <span class="buttonLabels">Browse workspace directory</span>.
@ -276,7 +276,7 @@ You can change this when you run OpenRefine from the terminal, by pointing to th
---
### Logs
### Logs {#logs}
OpenRefine does not currently output an error log, but because the OpenRefine console window is always open (on Linux and Windows) while OpenRefine runs in your browser, you can copy information from the console if an error occurs.
@ -284,7 +284,7 @@ Using a Mac, you can [run OpenRefine using the terminal](running#starting-and-ex
---
## Increasing memory allocation
## Increasing memory allocation {#increasing-memory-allocation}
OpenRefine relies on having computer memory available to it to work effectively. If you are planning to work with large datasets, you may wish to set up OpenRefine to handle it at the outset. By “large” we generally mean one of the following indicators:
* more than one million total cells
@ -313,7 +313,7 @@ If your project is big enough to need more than the default amount of memory, co
<TabItem value="win">
#### Using openrefine.exe
#### Using openrefine.exe {#using-openrefineexe}
If you run `openrefine.exe`, you will need to edit the `openrefine.l4j.ini` file found in the program directory and edit the line
@ -328,7 +328,7 @@ The line “-Xmx1024M” defines the amount of memory available in megabytes. Ch
Once you increase the memory allocation, you may find that you cannot run `openrefine.exe`. In this case, your computer needs a 64-bit version of [Java](https://www.java.com/en/download/help/index_installing.xml) (this is different from [Java JDK](#install-or-upgrade-java)). Look for the “Windows Offline (64-bit)” download on the Downloads page and install that. Your system must also be set to use the 64-bit version of Java by [changing the Java configuration](https://www.java.com/en/download/help/update_runtime_settings.xml).
:::
#### Using refine.bat
#### Using refine.bat {#using-refinebat}
On Windows, OpenRefine can also be run by using the file `refine.bat` in the program directory. If you start OpenRefine using `refine.bat`, the memory available to OpenRefine can be specified either through command line options, or through the `refine.ini` file.
@ -364,7 +364,7 @@ If you have downloaded the `.dmg` package and you start OpenRefine by double-cli
If you have downloaded the `.tar.gz` package and you start OpenRefine from the command line, add the “-m xxxxM” parameter like this:
`./refine -m 2048m`
#### Setting a default
#### Setting a default {#setting-a-default}
If you don't want to set this option on the command line each time, you can also set it in the `refine.ini` file. Edit the line
@ -381,7 +381,7 @@ Make sure it is not commented out (that is, that the line doesn't start with a
---
## Installing extensions
## Installing extensions {#installing-extensions}
Extensions have been created by our contributor community to add functionality or provide convenient shortcuts for common uses of OpenRefine. [We list extensions we know about on our downloads page](https://openrefine.org/download.html).
@ -389,7 +389,7 @@ Extensions have been created by our contributor community to add functionality o
If you'd like to create or modify an extension, [see our developer documentation here](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Developers). If you're having a problem, [use our downloads page](https://openrefine.org/download.html) to go to the extensions page and report the issue there.
:::
### Two ways to install extensions
### Two ways to install extensions {#two-ways-to-install-extensions}
You can [install extensions in one of two places](#set-where-data-is-stored):
@ -398,7 +398,7 @@ You can [install extensions in one of two places](#set-where-data-is-stored):
We provide these options because you may wish to reinstall a given extension manually each time you upgrade OpenRefine, in order to be sure it works properly.
### Find the right place to install
### Find the right place to install {#find-the-right-place-to-install}
If you want to install the extension into the program folder, go to your program directory and then go to `webapp\extensions` (or create it if it does not exist).
@ -410,7 +410,7 @@ If you want to install the extension into your workspace, you can:
You can also [find your workspace on each operating system using these instructions](#set-where-data-is-stored).
### Install the extension
### Install the extension {#install-the-extension}
Some extensions have their own instructions: make sure you read the documentation before you begin installing.

View File

@ -4,7 +4,7 @@ title: Jython & Clojure
sidebar_label: Jython & Clojure
---
## Jython
## Jython {#jython}
Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (`.py` or `.pyc`) are compatible.
@ -14,7 +14,7 @@ You will need to restart OpenRefine, so that new Jython or Python libraries are
OpenRefine now has [most of the Jsoup.org library built into GREL functions](grelfunctions#jsoup-xml-and-html-parsing) for parsing and working with HTML and XML elements.
### Syntax
### Syntax {#syntax}
Expressions in Jython must have a `return` statement:
@ -47,13 +47,13 @@ To return the lower case of `value` (if the value is not null):
```
return None
```
### Tutorials
### Tutorials {#tutorials}
- [Extending Jython with pypi modules](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules)
- [Working with phone numbers using Java libraries inside Python](https://github.com/OpenRefine/OpenRefine/wiki/Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python)
Full documentation on the Jython language can be found on its official site: [http://www.jython.org](http://www.jython.org).
## Clojure
## Clojure {#clojure}
Clojure 1.10.1 comes bundled with the default installation of OpenRefine 3.4.1. At this time, not all [variables](expressions#variables) can be used with Clojure expressions: only `value`, `row`, `rowIndex`, `cell`, and `cells` are available.

View File

@ -4,7 +4,7 @@ title: Reconciling
sidebar_label: Reconciling
---
## Overview
## Overview {#overview}
Reconciliation is the process of matching your dataset with that of an external source. Datasets for comparison might be produced by libraries, archives, museums, academic organizations, scientific institutions, non-profits, or interest groups. You can also reconcile against user-edited data on [Wikidata](wikidata), or reconcile against [a local dataset that you yourself supply](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services).
@ -23,7 +23,7 @@ Reconciliation is semi-automated: OpenRefine matches your cell values to the rec
We recommend planning your reconciliation operations as iterative: reconcile multiple times with different settings, and with different subgroups of your data.
:::
## Sources
## Sources {#sources}
Start with [this current list of reconcilable authorities](https://reconciliation-api.github.io/testbench/), which includes instructions for adding new services via Wikidata editing if you have one to add.
@ -41,7 +41,7 @@ Of particular note is [reconcile-csv](http://okfnlabs.org/reconcile-csv/) which
Similarly, you may choose to export some SPARQL output to a TSV to limit the scope of values you're reconciling against and/or for better performance.
## Getting started
## Getting started {#getting-started}
Choose a column to reconcile and use its dropdown menu to select <span class="menuItems">Reconcile</span><span class="menuItems">Start reconciling</span>. If you want to reconcile only some cells in that column, first use filters and facets to isolate them.
@ -77,7 +77,7 @@ For matched values (those appearing as dark blue links), the underlying cell val
For each cell, you can manually “Create new item,” which will take the cell's original value and apply it, as though it is a match. This will not become a dark blue link, because at this time there is nothing to link to: it is a draft entity stored only in your project. You can use this feature to prepare these entries for eventual upload to an editable service such as [Wikidata](wikidata), but most services do not yet support this feature.
### Reconciliation facets
### Reconciliation facets {#reconciliation-facets}
Under <span class="menuItems">Reconcile</span><span class="menuItems">Facets</span> there are a number of reconciliation-specific faceting options. OpenRefine automatically creates two facets when you reconcile some cells.
@ -105,7 +105,7 @@ You can also look at each best candidates:
These facets are useful for doing successive reconciliation attempts, against different types, and with different supplementary information. The information represented by these facets is held in the cells themselves and can be called using the [reconciliation variables](expressions#reconciliation) available in expressions.
### Reconciliation actions
### Reconciliation actions {#reconciliation-actions}
You can use the <span class="menuItems">Reconcile</span><span class="menuItems">Actions</span> menu options to perform bulk changes (which will apply only to your currently viewed set of rows or records):
* <span class="menuItems">Match each cell to its best candidate</span> (by highest score)
@ -120,7 +120,7 @@ The other options available under <span class="menuItems">Reconcile</span> are:
* [<span class="menuItems">Use values as identifiers</span>](#reconciling-with-unique-identifiers) (if you are reconciling with unique identifiers instead of by doing string searches)
* [<span class="menuItems">Add entity identifiers column</span>](#add-entity-identifiers-column).
## Reconciling with unique identifiers
## Reconciling with unique identifiers {#reconciling-with-unique-identifiers}
Reconciliation services use unique identifiers for their entities. For example, the 14th Dalai Lama has the VIAF ID [38242123](https://viaf.org/viaf/38242123/) and the Wikidata ID [Q17293](https://www.wikidata.org/wiki/Q17293). You can supply these identifiers directly to your chosen reconciliation service in order to pull more data, but these strings will not be “reconciled” against the external dataset.
@ -132,7 +132,7 @@ You may get false positives, which you will need to hover over or click on to id
![Hovering over an error.](/img/reconcileIDerror.png)
## Reconciling by type
## Reconciling by type {#reconciling-by-type}
Reconciliation services, once added to OpenRefine, may suggest types from their databases. These types will usually be whatever the service specializes in: people, events, places, buildings, tools, plants, animals, organizations, etc.
@ -150,7 +150,7 @@ If your column doesnt fit one specific type offered, you can <span class="fie
We recommend working in batches and reconciling against different types, moving from specific to broad. You can create a <span class="menuItems">Best candidate's types</span> facet to see which types are being represented. Some candidates may return more than one type, depending on the service. Types may appear in facets by their unique IDs, rather than by their semantic labels (for example, Q5 for “human” in Wikidata).
## Reconciling with additional columns
## Reconciling with additional columns {#reconciling-with-additional-columns}
Some of your cells may be ambiguous, in the sense that a string can point to more than one entity: there are dozens of places called “Paris” and many characters, people, and pieces of culture, too. Selecting non-geographic or more localized types can help narrow that down, but if your chosen service doesn't provide a useful type, you can include more properties that make it clear whether you're looking for Paris, France.
@ -164,7 +164,7 @@ Some services will not be able to search for the exact name of your desired <spa
![Including a birth-date type.](/img/reconcile-with-property.png)
## Fetching more data
## Fetching more data {#fetching-more-data}
One reason to reconcile to some external service is that it allows you to pull data from that service into your OpenRefine project. There are three ways to do this:
@ -172,11 +172,11 @@ One reason to reconcile to some external service is that it allows you to pull d
* Add columns from reconciled values
* Add column by fetching URLs.
### Add entity identifiers column
### Add entity identifiers column {#add-entity-identifiers-column}
Once you have selected matches for your cells, you can retrieve the unique identifiers for those cells and create a new column for these, with <span class="menuItems">Reconcile</span><span class="menuItems">Add entity identifiers column</span>. You will be asked to supply a column name. New items and other unmatched cells will generate null values in this column.
### Add columns from reconciled values
### Add columns from reconciled values {#add-columns-from-reconciled-values}
If the reconciliation service supports [data extension](https://reconciliation-api.github.io/testbench/), then you can augment your reconciled data with new columns using <span class="menuItems">Edit column</span><span class="menuItems">Add columns from reconciled values...</span>.
@ -194,7 +194,7 @@ If you have left any values unreconciled in your column, you will see “&lt;not
This process may pull more than one property per row in your data (such as multiple occupations), so you may need to switch into records mode after you've added columns.
### Add columns by fetching URLs
### Add columns by fetching URLs {#add-columns-by-fetching-urls}
If the reconciliation service cannot extend data, look for a generic web API for that data source, or a structured URL that points to their dataset entities via unique IDs (such as “https&#58;//viaf.org/viaf/000000”). You can use the <span class="menuItems">Edit column</span><span class="menuItems">[Add column by fetching URLs](columnediting#add-column-by-fetching-urls)</span> operation to call this API or URL with the IDs obtained from the reconciliation process. This will require using [expressions](expressions).
@ -206,7 +206,7 @@ Alternatively, you can insert the ID directly from the matched column's reconcil
Remember to set an appropriate throttle and to refer to the service documentation to ensure your compliance with their terms. See [the section about this operation](columnediting#add-column-by-fetching-urls) to learn more about the fetching process.
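The URL-building step described above can be sketched outside OpenRefine. The Python below is only an illustration, not the way OpenRefine performs the fetch: the helper names `build_entity_urls` and `throttled` are hypothetical, and the base URL is the VIAF pattern quoted in the text. It shows how one URL per identifier could be assembled, with unmatched cells left empty, and how a delay between requests plays the role of the throttle.

```python
import time
from urllib.parse import quote

# URL pattern taken from the text above; any other service would need its own.
VIAF_BASE = "https://viaf.org/viaf/"


def build_entity_urls(ids, base=VIAF_BASE):
    """Return one URL per identifier; unmatched (None or empty) cells stay None."""
    return [(base + quote(str(i), safe="")) if i else None for i in ids]


def throttled(items, delay_seconds=1.0, sleep=time.sleep):
    """Yield items one at a time, pausing between them (a stand-in for the throttle)."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)
        yield item


# Hypothetical identifiers column: one matched cell, one unmatched, one placeholder.
urls = build_entity_urls(["38242123", None, "000000"])
```

In a real project the equivalent URL would be written as an expression inside the fetch dialog; the sketch only makes the shape of the result explicit.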
## Keep all the suggestions made
## Keep all the suggestions made {#keep-all-the-suggestions-made}
To generate a list of each suggestion made, rather than only the best candidate, you can use a [GREL expression](expressions#GREL). Go to <span class="menuItems">Edit column</span><span class="menuItems">Add column based on this column</span>. To create a list of all the possible matches, use something like
@ -222,7 +222,7 @@ forEach(cell.recon.candidates,c,c.id).join(", ")
This information is stored as a string, without any attached reconciliation information.
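To make the effect of that expression concrete, here is a rough Python equivalent. It is a sketch under assumptions: the candidate list is modelled as plain dictionaries with an `id` key, the identifiers and names are made up for illustration, and `join_candidate_ids` is a hypothetical helper rather than anything OpenRefine provides.

```python
def join_candidate_ids(candidates, separator=", "):
    """Collect the id of every candidate and join them into one plain string."""
    return separator.join(candidate["id"] for candidate in candidates)


# Hypothetical candidates for a single cell (ids and names are invented).
candidates = [
    {"id": "Q1", "name": "First suggestion"},
    {"id": "Q2", "name": "Second suggestion"},
    {"id": "Q3", "name": "Third suggestion"},
]

all_ids = join_candidate_ids(candidates)
```

The result is an ordinary string, which matches the note above: nothing in the new column carries reconciliation information any more.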
## Writing reconciliation expressions
## Writing reconciliation expressions {#writing-reconciliation-expressions}
OpenRefine supplies a number of variables related specifically to reconciled values. These can be used in GREL and Jython expressions. For example, some of the reconciliation variables are:
@ -235,7 +235,7 @@ OpenRefine supplies a number of variables related specifically to reconciled val
You can find out more in the [reconciliation variables](expressions#reconciliaton-variables) section.
## Exporting reconciled data
## Exporting reconciled data {#exporting-reconciled-data}
Once you have data that is reconciled to existing entities online, you may wish to export that data to a user-editable service such as Wikidata. See the section on [uploading your edits to Wikidata](wikidata#upload-edits-to-wikidata) for more information, or the section on [exporting](exporting) to see other formats OpenRefine can produce.


@ -4,7 +4,7 @@ title: Running OpenRefine
sidebar_label: Running
---
## Starting and exiting
## Starting and exiting {#starting-and-exiting}
OpenRefine does not require internet access to run its basic functions. Once you download and install it, it runs as a small web server on your own computer, and you access that local web server by using your browser.
@ -37,18 +37,18 @@ import TabItem from '@theme/TabItem';
<TabItem value="win">
#### With openrefine.exe
#### With openrefine.exe {#with-openrefineexe}
You can run OpenRefine by double-clicking `openrefine.exe` or calling it from the command line.
If you want to [modify the way `openrefine.exe` opens](#starting-with-modifications), you can edit the `openrefine.l4j.ini` file.
#### With refine.bat
#### With refine.bat {#with-refinebat}
On Windows, OpenRefine can also be run by using the file `refine.bat` in the program directory. If you start OpenRefine using `refine.bat`, you can do so by opening the file itself, or by calling it from the command line.
If you call `refine.bat` from the command line, you can [start OpenRefine with modifications](#starting-with-modifications).
If you want to modify the way `refine.bat` opens through double-clicking or using a shortcut, you can edit the `refine.ini` file.
#### Exiting
#### Exiting {#exiting}
To exit OpenRefine, close all the browser tabs or windows, then navigate to the command line window. To close this window and ensure OpenRefine exits properly, hold down `Control` and press `C` on your keyboard. This will save any last changes to your projects.
@ -98,11 +98,11 @@ If you see this error, you need to [install and configure a JDK package](install
---
### Troubleshooting
### Troubleshooting {#troubleshooting}
If you are having problems connecting to OpenRefine with your browser, [check our Wiki for information about browser settings and operating-system issues](https://github.com/OpenRefine/OpenRefine/wiki/FAQ#i-am-having-trouble-connecting-to-openrefine-with-my-browser).
### Starting with modifications
### Starting with modifications {#starting-with-modifications}
When you run OpenRefine from a command line, you can change a number of default settings.
@ -168,7 +168,7 @@ To see the full list of command-line options, run `./refine -h`.
---
#### Modifications set within files
#### Modifications set within files {#modifications-set-within-files}
On Windows, you can modify the way `openrefine.exe` runs by editing `openrefine.l4j.ini`; you can modify the way `refine.bat` runs by editing `refine.ini`.
@ -194,7 +194,7 @@ REFINE_MIN_MEMORY=1400M
...
```
##### JVM preferences
##### JVM preferences {#jvm-preferences}
Further modifications can be performed by using JVM preferences. These JVM preferences are different options and have different syntax than the key/value descriptions used on the command line.
@ -293,13 +293,13 @@ JAVA_OPTIONS=-Drefine.data_dir=usr/lib/OpenRefineWorkspace
Refer to the [official Java documentation](https://docs.oracle.com/javase/8/docs/technotes/tools/windows/java.html) for more preferences that can be set.
## The home screen
## The home screen {#the-home-screen}
When you first launch OpenRefine, you will see a screen with a menu on the left hand side that includes <span class="menuItems">Create Project</span>, <span class="menuItems">Open Project</span>, <span class="menuItems">Import Project</span>, and <span class="menuItems">Language Settings</span>. This is called the “home screen,” where you can manage your projects and general settings.
In the lower left-hand corner of the screen, you'll see <span class="menuItems">Preferences</span>, <span class="menuItems">Help</span>, and <span class="menuItems">About</span>.
### Language settings
### Language settings {#language-settings}
From the home screen, look in the options to the left for <span class="menuItems">Language Settings</span>. You can set your preferred interface language here. This language setting will persist until you change it again in the future. Languages are translated as a community effort; some languages are partially complete and default back to English where unfinished. Currently OpenRefine supports the following languages for 75% or more of the interface:
@ -324,7 +324,7 @@ To leave the Language Settings screen, click on the diamond “OpenRefine” log
We use Weblate to provide translations for the interface. You can check [our profile on Weblate](https://hosted.weblate.org/projects/openrefine/translations/) to see which languages are in the process of being supported. See [our technical reference if you are interested in contributing translation work](https://docs.openrefine.org/technical-reference/translating) to make OpenRefine accessible to people in other languages.
:::
### Preferences
### Preferences {#preferences}
In the bottom left corner of the screen, look for <span class="menuItems">Preferences</span>. At this time you can set preferences using a key/value pair: that is, selecting one of the keys below and setting a value for it.
@ -343,13 +343,13 @@ To leave the Preferences screen, click on the diamond “OpenRefine” logo.
If the preference youre looking for isnt here, look at the options you can set from the [command line or in an `.ini` file](#starting-with-modifications).
## The project screen
## The project screen {#the-project-screen}
The project screen (or work screen) is where you will spend most of your time once you have [begun to work on a project](starting). This is a quick walkthrough of the parts of the interface you should familiarize yourself with.
![A screenshot of the project screen.](/img/projectscreen.png)
### The project bar
### The project bar {#the-project-bar}
The project bar runs across the very top of the project screen. It contains the OpenRefine logo, the project title, and the project control buttons on the right side.
@ -369,7 +369,7 @@ The <span class="menuItems">Open…</span> button will open up a new browser tab
<span class="menuItems">Help</span> will open up a new browser tab and bring you to this user manual on the web.
### The grid header
### The grid header {#the-grid-header}
The grid header sits below the project bar and above the project grid (where the data of your project is displayed). The grid header will tell you the total number of rows or records in your project, and indicate whether you are in [rows or records mode](exploring#rows-vs-records).
@ -379,11 +379,11 @@ Directly below the row number, you have the ability to switch between [row mode
To the right of the rows/records selection is the array of options for how many rows/records to view on screen at one time. At the far right of the screen you can navigate through your entire dataset one page at a time.
### Extensions
### Extensions {#extensions}
The <span class="menuItems">Extensions</span> dropdown offers you options for extending your data - most commonly by uploading your edited statements to Wikidata, or by importing or exporting schema. You can learn more about these functions on the [Wikibase section](wikibase/overview). Other extensions may also add functions to this dropdown menu.
### The grid
### The grid {#the-grid}
The area of the project screen that displays your dataset is called the “grid” (or the “data grid,” or the “project grid”). The grid presents data in a tabular format, which may look like a normal spreadsheet program to you.
@ -397,7 +397,7 @@ The project grid may display with both vertical and horizontal scrolling, depend
Mousing over individual cells will allow you to [edit cells individually](cellediting#edit-one-cell-at-a-time).
### Facet/Filter
### Facet/Filter {#facetfilter}
The <span class="tabLabels">Facet/Filter</span> tab is one of the main ways of exploring your data: displaying the patterns and trends in your data, and helping you narrow your focus and modify that data. [Facets](facets) and [filters](facets#text-filter) are explained more in [Exploring data](exploring).
@ -413,7 +413,7 @@ Removing your facets will clear out the sidebar entirely. If you have written cu
You can preserve your facets and filters for future use by copying a <span class="menuItems">[Permalink](#the-project-bar)</span>.
### History (Undo/Redo)
### History (Undo/Redo) {#history-undoredo}
In OpenRefine, any activity that changes the data can be undone. Changes are tracked from the very beginning, when a project is first created. The change history of each project is saved with the project's data, so quitting OpenRefine does not erase the steps you've taken. When you restart OpenRefine, you can view and undo changes that you made before you quit OpenRefine. OpenRefine [autosaves](starting#autosaving) your actions every five minutes by default, and when you close OpenRefine properly (using Ctrl + C). You can [change this interval](running#jvm-preferences).
@ -440,7 +440,7 @@ If you have moved back one or more states, and then you perform a new operation
The Undo/Redo tab will indicate which step you’re on, and if you’re about to risk erasing work - by saying something like “4/5” or “1/7” at the end.
#### Reusing operations
#### Reusing operations {#reusing-operations}
Operations that you perform in OpenRefine can be reused. For example, a formula you wrote inside one project can be copied and applied to another project later.
@ -450,13 +450,13 @@ Move to the second project, go to the <span class="tabLabels">Undo/Redo</span> t
Not all operations can be extracted. Edits to a single cell, for example, cant be replicated.
## Advanced OpenRefine uses
## Advanced OpenRefine uses {#advanced-openrefine-uses}
### Running OpenRefine's Linux version on a Mac
### Running OpenRefine's Linux version on a Mac {#running-openrefines-linux-version-on-a-mac}
You can run OpenRefine from the command line on a Mac by using the Linux installation package. We do not promise support for this method. Follow the instructions in the Linux section.
### Running as a server
### Running as a server {#running-as-a-server}
:::caution
Please note that if your machine has an external IP (is exposed to the Internet), you should not do this, or should protect it behind a proxy or firewall, such as nginx. Proceed at your own risk.
@ -491,7 +491,7 @@ On Mac, you can add a specific entry to the `Info.plist` file located within the
OpenRefine has no built-in security or version control for multi-user scenarios. OpenRefine has a single data model that is not shared, so there is a risk of data operations being overwritten by other users. Care must be taken by users.
:::
### Automating OpenRefine
### Automating OpenRefine {#automating-openrefine}
Some users may wish to employ OpenRefine for batch processing as part of a larger automated pipeline. Not all OpenRefine features can work without human supervision and advancement (such as clustering), but many data transformation tasks can be automated.


@ -4,7 +4,7 @@ title: Sort and view
sidebar_label: Sort and view
---
## Sort
## Sort {#sort}
You can temporarily sort your rows by one column. You can sort based on [data type](exploring#data-types):
* text alphabetically or reverse
@ -24,11 +24,11 @@ If you have multiple sorting methods applied, they will work in the order you ap
When the sorting method you've applied is temporary, you will see that the rows retain their original numbering. When you make that sorting method permanent, by selecting <span class="menuItems">Reorder rows permanently</span>, the row numbers will change and the <span class="menuItems">Sort</span> menu in the project grid header will disappear. This will apply all current sorting methods.
## View
## View {#view}
You can control what data you view in the grid. On each column, you will see a <span class="menuItems">View</span> menu option. From there, you can “collapse” (hide) that specific column, all other columns, all columns to the left, and all columns to the right. Using the <span class="menuItems">View</span> option that appears in the <span class="menuItems">All</span> columns dropdown menu, you can collapse all columns, and expand all the columns that you previously collapsed.
### Show/hide “null”
### Show/hide “null” {#showhide-null}
You can find, under <span class="menuItems">All</span><span class="menuItems">View</span>, the option to show and hide [“null” values](exploring#data-types). A small grey “null” will appear in each applicable cell. Remember that a null cell is not the same thing as an empty cell.


@ -4,7 +4,7 @@ title: Starting a project
sidebar_label: Starting a project
---
## Overview
## Overview {#overview}
An OpenRefine project is started by importing in some existing data - OpenRefine doesnt allow you to create a dataset from nothing.
@ -14,7 +14,7 @@ The data and all of your edits are [automatically saved](#autosaving) inside the
You can also receive and open other peoples projects, or send them yours, by [exporting a project archive](exporting#export-a-project) and [importing it](#import-a-project).
## Create a project by importing data
## Create a project by importing data {#create-a-project-by-importing-data}
When you start OpenRefine, youll be taken to the <span class="menuItems">Create Project</span> screen. Youll see on the left side of the screen that your options are to:
@ -53,13 +53,13 @@ You cannot combine two datasets into one project by appending data within rows.
For whichever method you choose to start your project, when you click <span class="menuItems">Next >></span> you will be given a preview and a chance to configure the way OpenRefine interprets the data you input.
### Get data from this computer
### Get data from this computer {#get-data-from-this-computer}
Click on <span class="menuItems">Browse…</span> and select a file (or several) on your hard drive. All files will be shown, not just compatible ones.
If you import an archive file (something with the extension `.zip`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.gz`, or `.bz2`), OpenRefine detects the files inside it, shows you a preview screen, and allows you to select which ones to load. This does not work with `.rar` files. When importing multiple archives you can store the name of the archive each file was extracted from by ticking the `Store archive file` option upon import.
### Web addresses (URLs)
### Web addresses (URLs) {#web-addresses-urls}
Type or paste the URL to a data file into the field provided. You can add as many fields as you want. OpenRefine will download the file and preview the project for you.
@ -67,7 +67,7 @@ If you supply two or more file URLs, OpenRefine will identify each one and ask y
Do not use this form to load a Google Sheet by its link; use [the Google Data form instead](#google-data).
### Clipboard
### Clipboard {#clipboard}
You can copy and paste in data from anywhere. OpenRefine will recognize comma-separated, tab-separated, or table-formatted information copied from sources such as word-processing documents, spreadsheets, and tables in PDFs. You can also just paste in a list of items that you want to turn into rows. OpenRefine recognizes each new text line as a row.
@ -75,7 +75,7 @@ This can be useful if you want to pre-select a specific number of rows from your
This can also be useful if you would like to paste in a list of URLs, which you can use later to [fetch more data](columnediting).
### Database (SQL)
### Database (SQL) {#database-sql}
If you are an administrator or have SQL access to a database of information, you may want to pull the latest dataset directly from there. This could include an online catalogue, a content management system, or a digital repository or collection management system. You can also load a database (`.db`) file saved locally. You will need to use an [SQL query](https://www.w3schools.com/sql/) to import your intended data.
@ -91,13 +91,13 @@ You can either connect just once to gather data, or save the connection to use i
If your connection is successful, you will see a Query Editor where you can run your SQL query. OpenRefine will give you an error if you write a statement that tries to modify the source database in any way.
### Google data
### Google data {#google-data}
You have two ways to load in data from Google Sheets:
* providing a link to an accessible Google Sheet (that is, one with link-sharing turned on), and
* selecting a Google Sheet in your Google Drive.
#### Google Sheet by URL
#### Google Sheet by URL {#google-sheet-by-url}
You can import data from any Google Sheet that has link-sharing turned on. Paste in a URL that looks something like
@ -107,7 +107,7 @@ https://docs.google.com/spreadsheets/………/edit?usp=sharing
This will only work with Sheets, not with any other Google Drive file that might have an available link, including `.xls` and other valid files that are hosted in Google Drive. These links will not work when attempting to start a project [by URL](#web-addresses-urls) either, so you need to download those files to your computer.
#### Google Sheet from Drive
#### Google Sheet from Drive {#google-sheet-from-drive}
You can authorize OpenRefine to access your Google Drive data and import data from any Google Sheet it finds there. This will include Sheets that belong to you and Sheets that are shared with you, as well as Sheets that are in your trash.
@ -120,7 +120,7 @@ OpenRefine will generate a list of all Sheets it finds, with the most recently m
When you click <span class="buttonLabels">Preview</span> the Sheet will open in a new browser tab. When you click the Sheet title, OpenRefine will begin to process the data.
## Project preview
## Project preview {#project-preview}
Once OpenRefine is ready to import the data, you will see a screen with <span class="menuItems">Configure Parsing Options</span> at the top. Youll see a preview of the first 100 rows and all identified columns.
@ -141,7 +141,7 @@ Look for character encoding issues at this stage. You may want to manually selec
You should create a project name at this stage. You can also supply tags to keep your projects organized. When youre happy with the preview, click <span class="buttonLabels">Create Project</span>.
## Import a project
## Import a project {#import-a-project}
Because OpenRefine only runs locally on your computer, you cant have a project accessible to more than one person at the same time.
@ -163,17 +163,17 @@ Then, click <span class="buttonLabels">Import Project</span>. Your project shoul
OpenRefine will store the project in its own workspace directory, so you can now delete the original file that was sent to you.
## Project management
## Project management {#project-management}
You can access all of your created projects by clicking on <span class="menuItems">Open Project</span>. Your project list can be organized by modification date, title, row count, and other metadata you can supply (such as subject, description, tags, or creator). To edit the fields you see here, click <span class="menuItems">About</span> to the left of each project. There you can edit a number of available fields. You can also see the project ID that corresponds to the name of the folder in your work directory.
### Naming projects
### Naming projects {#naming-projects}
You may have multiple projects from the same dataset, or multiple versions from sharing a project with another person. OpenRefine automatically generates a project name from the imported file, or “clipboard” when you use <span class="menuItems">Clipboard</span> importing. Project names dont have to be unique, and OpenRefine will create many projects with the same name unless you intervene.
You can edit a project's name when you create it or import it, and you can rename a project later by opening it and clicking on the project name at the top of the screen.
### Autosaving
### Autosaving {#autosaving}
OpenRefine [saves all of your actions](running#history-undoredo) (everything you can see in the <span class="tabLabels">Undo/Redo</span> panel). That includes flagging and starring rows.
@ -183,12 +183,12 @@ Autosaving happens by default every five minutes. You can [change this preferenc
You can only save and share facets and filters, not any other type of view. To save current facets and filters, click <span class="menuItems">Permalink</span>. The project will reload with a different URL, which you can then copy and save elsewhere. This permalink will save both the facets and filters youve set, and the settings for each one (such as sorting by count rather than by name).
### Deleting projects
### Deleting projects {#deleting-projects}
You can delete projects, which will erase the project files from the workspace directory on your computer. This is immediate and cannot be undone.
Go to <span class="menuItems">Open Project</span> and find the project you want to delete. Click on the <span class="menuItems">X</span> to the left of the project name. There will be a confirmation dialog.
### Project files
### Project files {#project-files}
You can find all of your raw project files in your work directory. They will be named according to the unique “Project ID” that OpenRefine has assigned them, which you can find on the <span class="menuItems">Open Project</span> screen, under the “About” link for each project.


@ -4,7 +4,7 @@ title: Transforming data
sidebar_label: Overview
---
## Overview
## Overview {#overview}
OpenRefine gives you powerful ways to clean, correct, codify, and extend your data. Without ever needing to type inside a single cell, you can automatically fix typos, convert things to the right format, and add structured categories from trusted sources.
@ -17,7 +17,7 @@ This section of ways to improve data are organized by their appearance in the me
* [add new columns](columnediting) based on existing data, with fetching new information, or through [reconciliation](reconciling)
* convert your rows of data into [multi-row records](exploring#rows-vs-records).
## Edit rows
## Edit rows {#edit-rows}
Moving rows around is a permanent change to your data.


@ -4,11 +4,11 @@ title: Transposing
sidebar_label: Transposing
---
## Overview
## Overview {#overview}
These functions were created to solve common problems with reshaping your data: pivoting cells from a row into a column, or pivoting cells from a column into a row. You can also transpose from a repeated set of values into multiple columns.
## Transpose cells across columns into rows
## Transpose cells across columns into rows {#transpose-cells-across-columns-into-rows}
Imagine personal data with addresses in this format:
@ -21,7 +21,7 @@ You can transpose the address information from this format into multiple rows. G
![A screenshot of the transpose across columns window.](/img/transpose1.png)
### One column
### One column {#one-column}
You can transpose the multiple address columns into a series of rows:
@ -51,7 +51,7 @@ You can choose one column and include the column-name information in each cell b
||Country: USA|
||Postal code: 19010|
### Two columns
### Two columns {#two-columns}
You can retain the column names as separate cell values, by selecting <span class="fieldLabels">Two new columns</span> and naming the key and value columns.
@ -67,7 +67,7 @@ You can retain the column names as separate cell values, by selecting <span clas
||Country|USA|
||Postal code|19010|
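The reshaping shown in the table above can be sketched in Python. This is an illustration of the idea only, not OpenRefine's implementation: `transpose_to_key_value`, the `Key` and `Value` column names, the `ignore_blank` flag, and the sample person are all assumptions made for the example. Kept columns are filled only on the first new row, which mirrors the blank leading cells in the table.

```python
def transpose_to_key_value(rows, keep_columns, transpose_columns, ignore_blank=True):
    """Turn each transposed column into its own row with a key cell and a value cell."""
    result = []
    for row in rows:
        first = True
        for column in transpose_columns:
            value = row.get(column)
            if ignore_blank and value in (None, ""):
                continue  # skip empty cells instead of emitting an empty pair
            # Kept columns appear only on the first new row for each original row.
            new_row = {name: (row.get(name) if first else None) for name in keep_columns}
            new_row["Key"] = column
            new_row["Value"] = value
            result.append(new_row)
            first = False
    return result


# Hypothetical input row; only the Country and Postal code values come from the table.
people = [{"Name": "Jane Doe", "Country": "USA", "Postal code": "19010"}]
long_rows = transpose_to_key_value(people, ["Name"], ["Country", "Postal code"])
```

Each original row becomes as many new rows as it has non-blank transposed cells, so a wide table with sparse columns produces fewer rows than columns times rows.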
## Transpose cells in rows into columns
## Transpose cells in rows into columns {#transpose-cells-in-rows-into-columns}
Imagine employee data in this format:
@ -107,7 +107,7 @@ value.replace("Employee: ", "")
If your dataset doesn't have a predictable number of cells per intended row, such that you cannot specify easily how many columns to create, try <span class="menuItems">Columnize by key/value columns</span>.
## Columnize by key/value columns
## Columnize by key/value columns {#columnize-by-keyvalue-columns}
This operation can be used to reshape a dataset that contains key and value columns: the repeating strings in the key column become new column names, and the contents of the value column are moved to new columns. This operation can be found at <span class="menuItems">Transpose</span><span class="menuItems">Columnize by key/value columns</span>.
@ -131,7 +131,7 @@ In this format, each flower species is described by multiple attributes on conse
| Galanthus nivalis | White | 162168 |
| Narcissus cyclamineus | Yellow | 161899 |
### Entries with multiple values in the same column
### Entries with multiple values in the same column {#entries-with-multiple-values-in-the-same-column}
If a new row would have multiple values for a given key, then these values will be grouped on consecutive rows, to form a [record structure](exploring#rows-vs-records).
@ -157,7 +157,7 @@ This table is transformed by the Columnize operation to:
The first key encountered by the operation serves as the record key, so the “Green” value is attached to the “Galanthus nivalis” name. See the [Row order](#row-order) section for more details about the influence of row order on the results of the operation.
### Notes column
### Notes column {#notes-column}
In addition to the key and value columns, you can optionally add a column for notes. This can be used to store extra metadata associated with a key/value pair.
@ -181,7 +181,7 @@ If the “Source” column is selected as the notes column, this table is transf
Notes columns can therefore be used to preserve provenance or other context about a particular key/value pair.
### Row order
### Row order {#row-order}
The order in which the key/value pairs appear matters. The Columnize operation will use the first key it encounters as the delimiter for entries: every time it encounters this key again, it will produce a new row, and add the following key/value pairs to that row.
@ -207,7 +207,7 @@ The occurrences of the “Name” value in the “Field” column define the bou
This sensitivity to order is removed if there are extra columns: in that case, the first extra column will serve as the key for the new rows.
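The role of the first key can be illustrated with a short Python sketch. It is a simplified model rather than OpenRefine's code: `columnize` is a hypothetical function, the input is reduced to a list of (key, value) pairs with no notes or extra columns, and the column name `IUCN ID` is assumed for the numeric values. Repeated keys inside one entry are collected in a list, standing in for the record structure described earlier.

```python
def columnize(pairs):
    """Group (key, value) pairs into rows, starting a new row at each repeat of the first key."""
    rows = []
    first_key = None
    current = None
    for key, value in pairs:
        if first_key is None:
            first_key = key  # the first key encountered becomes the row delimiter
        if key == first_key:
            current = {}
            rows.append(current)
        # Repeated keys within one entry accumulate instead of overwriting.
        current.setdefault(key, []).append(value)
    return rows


pairs = [
    ("Name", "Galanthus nivalis"),
    ("Color", "White"),
    ("IUCN ID", "162168"),
    ("Name", "Narcissus cyclamineus"),
    ("Color", "Yellow"),
    ("IUCN ID", "161899"),
]
flowers = columnize(pairs)
```

If the same pairs arrived with `Color` first, `Color` would become the delimiter instead, which is the order sensitivity this section describes.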
### Extra columns
### Extra columns {#extra-columns}
If your dataset contains extra columns that are not being used as the key, value, or notes columns, they can be preserved by the operation. For this to work, they must have the same value in all old rows corresponding to a new row.


@ -4,15 +4,15 @@ title: Troubleshooting
sidebar_label: Troubleshooting
---
## Frequently asked questions
## Frequently asked questions {#frequently-asked-questions}
We collect and share FAQs and responses on GitHub at [https://github.com/OpenRefine/OpenRefine/wiki/FAQ](https://github.com/OpenRefine/OpenRefine/wiki/FAQ).
If you dont find your problem and solution there, continue on to the resources in the Community section below to see more conversations and look for solutions.
## Community
## Community {#community}
### If youre having a problem:
### If youre having a problem: {#if-youre-having-a-problem}
* Search the [User forum](https://groups.google.com/g/openrefine) to see if the problem is already reported
* Search [GitHub issues](https://github.com/OpenRefine/OpenRefine/issues) to see if the problem is already reported
* Read [Stack Overflow](https://stackoverflow.com/questions/tagged/openrefine) to see if others had a similar problem
@ -21,7 +21,7 @@ If you dont find your problem and solution there, continue on to the resource
* First as a new thread (conversation) in the [User forum](https://groups.google.com/g/openrefine).
* Then, if you wish, you can create a GitHub issue.
### If you want to contribute:
### If you want to contribute: {#if-you-want-to-contribute}
* [Help us translate the tool into more languages](https://docs.openrefine.org/technical-reference/translating), using Weblate
* [We have a guide to contributing](technical-reference/contributing) in the [Technical Reference](technical-reference/technical-reference-index) section
* Contribute your feature requests in the [User forum](https://groups.google.com/g/openrefine) or as [GitHub issues](https://github.com/OpenRefine/OpenRefine/issues/new/choose)

@ -2,7 +2,7 @@ Sometimes your data is not as simple as a normal table, or the sort of
statements that you want to do varies on each row. This document
explains how to work around these cases.
## Hierarchical data
## Hierarchical data {#hierarchical-data}
Sometimes your source provides data in a structured format, such as XML,
JSON or RDF. OpenRefine can import these files and will convert them to
@ -16,7 +16,7 @@ the null cells with the corresponding artist. You can do this with the
This function will copy not just cell values but also reconciliation
results.
## Conditional additions
## Conditional additions {#conditional-additions}
Sometimes you want to add a statement only in some conditions.
@ -30,7 +30,7 @@ The workflow to achieve this looks like this:
- Create a schema using the column you partially blanked out as
statement value.
## Varying properties
## Varying properties {#varying-properties}
Sometimes you wish you could use column variables for properties in your
schema. It is currently not possible, first because we do not have a
@ -54,7 +54,7 @@ which partition the original column. You can now create a schema which adds
two statements, with values taken from those columns. Since blank values are
ignored, exactly one statement will be added for each item, with the desired property.
## Adapting to existing data on Wikibase
## Adapting to existing data on Wikibase {#adapting-to-existing-data-on-wikibase}
Sometimes you want to create statements only if there are no such
statements on the item yet. Here is one way to achieve this:

@ -6,23 +6,23 @@ sidebar_label: Connecting to Wikibase
This page explains how to connect OpenRefine to any Wikibase instance. If you just want to work with [Wikidata](https://www.wikidata.org/), you can ignore this page as Wikidata is configured out of the box in OpenRefine.
## For Wikibase end users
## For Wikibase end users {#for-wikibase-end-users}
All you need to configure OpenRefine to work with a Wikibase instance is a *manifest* for that instance, which provides some metadata and links required for the integration to work.
We offer some off-the-shelf manifests for some public Wikibase instances in the [wikibase-manifests](https://github.com/OpenRefine/wikibase-manifests) repository. But the administrators of your Wikibase instance should provide one that is potentially more
up to date, so it makes sense to request it from them first.
## For Wikibase administrators
## For Wikibase administrators {#for-wikibase-administrators}
To let your users contribute to your Wikibase instance with OpenRefine, you will need to write a manifest as described above. There is currently no canonical location where this manifest should be hosted - just make sure it can be found easily by your users. This section explains the format of the manifest.
### Requirements
### Requirements {#requirements}
To work with OpenRefine, your Wikibase instance needs an associated reconciliation service. For instance you can use [a Python wrapper](https://github.com/wetneb/openrefine-wikibase) for this.
### The format of the manifest
### The format of the manifest {#the-format-of-the-manifest}
Here is the manifest of Wikidata:
@ -66,83 +66,83 @@ Here is the manifest of Wikidata:
In general, there are several parts of the manifest: version, mediawiki, wikibase, oauth, reconciliation and editgroups.
#### version
#### version {#version}
The version should be in the format "1.x". The minor version should be increased when you update the manifest in a backward-compatible manner. The major version should be "1" if the manifest is in the format specified by [wikibase-manifest-schema-v1.json](https://github.com/afkbrb/wikibase-manifest/blob/master/wikibase-manifest-schema-v1.json).
#### mediawiki
#### mediawiki {#mediawiki}
This part contains some basic information of the Wikibase.
##### name
##### name {#name}
The name of the Wikibase, which should be unique across Wikibase instances.
##### root
##### root {#root}
The root of the Wikibase. Typically in the form "https://foo.bar/wiki/". The trailing slash cannot be omitted.
##### main_page
##### main_page {#main_page}
The main page of the Wikibase. Typically in the form "https://foo.bar/wiki/Main_Page".
##### api
##### api {#api}
The MediaWiki API endpoint of the Wikibase. Typically in the form "https://foo.bar/w/api.php".
#### wikibase
#### wikibase {#wikibase}
This part contains configurations of the Wikibase extension.
##### site_iri
##### site_iri {#site_iri}
The IRI of the Wikibase, in the form 'http://foo.bar/entity/'. This should match the IRI prefixes used in RDF serialization. Be careful about using "http" or "https", because any variation will break comparisons at various places. The trailing slash cannot be omitted.
##### maxlag
##### maxlag {#maxlag}
Maxlag is a parameter that controls how aggressive a mass-editing tool should be when uploading edits to a Wikibase instance. See https://www.mediawiki.org/wiki/Manual:Maxlag_parameter for more details. The value should be adapted according to the actual traffic of the Wikibase.
##### properties
##### properties {#properties}
Some special properties of the Wikibase.
###### instance_of
###### instance_of {#instance_of}
The ID of the property "instance of".
###### subclass_of
###### subclass_of {#subclass_of}
The ID of the property "subclass of".
##### constraints
##### constraints {#constraints}
Not required. Should be configured if the Wikibase has the [WikibaseQualityConstraints extension](https://www.mediawiki.org/wiki/Extension:WikibaseQualityConstraints) installed. The constraints configuration consists of the IDs of constraint-related properties and items. For Wikidata, these IDs are retrieved from [extension.json](https://github.com/wikimedia/mediawiki-extensions-WikibaseQualityConstraints/blob/master/extension.json). To configure this for another Wikibase instance, you should contact an admin of the Wikibase instance to get the content of `extension.json`.
#### oauth
#### oauth {#oauth}
Not required. Should be configured if the Wikibase has the [OAuth extension](https://www.mediawiki.org/wiki/Extension:OAuth) installed.
##### registration_page
##### registration_page {#registration_page}
The page to register an OAuth consumer of the Wikibase. Typically in the form "https://foo.bar/wiki/Special:OAuthConsumerRegistration/propose".
#### reconciliation
#### reconciliation {#reconciliation}
The Wikibase instance must have at least one reconciliation service endpoint linked to it. If there is no reconciliation service for the Wikibase, you can run one with [openrefine-wikibase](https://github.com/wetneb/openrefine-wikibase).
##### endpoint
##### endpoint {#endpoint}
The default reconciliation service endpoint of the Wikibase instance. The endpoint must contain the "${lang}" variable such as "https://wdreconcile.toolforge.org/${lang}/api", since the reconciliation service is expected to work for different languages.
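To illustrate how the "${lang}" variable is meant to be filled in, here is a minimal Python sketch. It only mimics the substitution; the function name `endpoint_for` is hypothetical and this is not how OpenRefine itself is implemented.

```python
from string import Template

# Endpoint template quoted above for Wikidata.
ENDPOINT_TEMPLATE = "https://wdreconcile.toolforge.org/${lang}/api"


def endpoint_for(template, lang):
    """Return the reconciliation endpoint for one language code (illustrative helper)."""
    if "${lang}" not in template:
        raise ValueError("the endpoint must contain the ${lang} variable")
    # Template.substitute replaces ${lang} with the given language code.
    return Template(template).substitute(lang=lang)


print(endpoint_for(ENDPOINT_TEMPLATE, "fr"))
```

An endpoint without the variable is rejected up front, which mirrors the requirement stated above.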
#### editgroups
#### editgroups {#editgroups}
Not required. Should be configured if the Wikibase instance has [EditGroups](https://github.com/Wikidata/editgroups) service(s).
##### url_schema
##### url_schema {#url_schema}
The URL schema used in edit summaries. This is used for EditGroups to extract the batch id from a batch of edits and for linking to the EditGroups page of the batch. The URL schema must contain the variable '${batch_id}', such as '([[:toollabs:editgroups/b/OR/${batch_id}|details]])' for Wikidata.
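As a rough illustration of the two uses described above (building the summary suffix and recovering the batch id from it), here is a Python sketch. The helper names, the sample batch id `a1b2c3` and the assumption that batch ids are made of word characters are all hypothetical; EditGroups' actual parsing may differ.

```python
import re

# URL schema quoted above for Wikidata.
URL_SCHEMA = "([[:toollabs:editgroups/b/OR/${batch_id}|details]])"
PLACEHOLDER = "${batch_id}"


def summary_suffix(url_schema, batch_id):
    """Fill the batch id into the URL schema (illustrative helper)."""
    return url_schema.replace(PLACEHOLDER, batch_id)


def extract_batch_id(url_schema, edit_summary):
    """Recover the batch id from an edit summary, or None if it is absent."""
    if PLACEHOLDER not in url_schema:
        raise ValueError("the URL schema must contain ${batch_id}")
    prefix, suffix = url_schema.split(PLACEHOLDER, 1)
    # Assumption: batch ids consist of letters, digits and underscores.
    pattern = re.escape(prefix) + r"(\w+)" + re.escape(suffix)
    match = re.search(pattern, edit_summary)
    return match.group(1) if match else None


summary = "Importing museum data " + summary_suffix(URL_SCHEMA, "a1b2c3")
print(extract_batch_id(URL_SCHEMA, summary))
```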
#### Check the format of the manifest
#### Check the format of the manifest {#check-the-format-of-the-manifest}
As mentioned above, the manifest should be in the format specified by [wikibase-manifest-schema-v1.json](https://github.com/afkbrb/wikibase-manifest/blob/master/wikibase-manifest-schema-v1.json). You can check the format by adding the manifest directly to OpenRefine, and OpenRefine will complain if there is anything wrong with the format.
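Before loading a manifest into OpenRefine, a few of the rules stated on this page can be checked by hand. The Python sketch below is only a partial sanity check written for illustration: it is not the JSON schema validation that OpenRefine performs, the function name `check_manifest` is hypothetical, and it assumes the minor version is a plain number.

```python
import re


def check_manifest(manifest):
    """Return a list of problems found; an empty list means none of these checks failed."""
    problems = []

    # "version" should look like "1.x" (assumption: x is a plain number).
    if not re.fullmatch(r"1\.\d+", str(manifest.get("version", ""))):
        problems.append('version should be in the format "1.x"')

    # The trailing slash cannot be omitted in these two values.
    if not manifest.get("mediawiki", {}).get("root", "").endswith("/"):
        problems.append("mediawiki.root must end with a slash")
    if not manifest.get("wikibase", {}).get("site_iri", "").endswith("/"):
        problems.append("wikibase.site_iri must end with a slash")

    if "${lang}" not in manifest.get("reconciliation", {}).get("endpoint", ""):
        problems.append("reconciliation.endpoint must contain ${lang}")

    # editgroups is optional, but if present its url_schema needs ${batch_id}.
    editgroups = manifest.get("editgroups")
    if editgroups is not None and "${batch_id}" not in editgroups.get("url_schema", ""):
        problems.append("editgroups.url_schema must contain ${batch_id}")
    return problems


example = {
    "version": "1.0",
    "mediawiki": {"root": "https://foo.bar/wiki"},  # trailing slash missing on purpose
    "wikibase": {"site_iri": "http://foo.bar/entity/"},
    "reconciliation": {"endpoint": "https://foo.bar/reconcile/${lang}/api"},  # hypothetical URL
}
print(check_manifest(example))
```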

@ -7,7 +7,7 @@ sidebar_label: New items
OpenRefine can create new items. This page explains how they are
generated.
## Words of caution
## Words of caution {#words-of-caution}
- The fact that OpenRefine does not propose any item when reconciling
a cell does not mean that the item is not present in the Wikibase instance:
@ -21,7 +21,7 @@ generated.
edit group that includes new items in Wikidata, you will need to ask an
administrator to do it.
## Workflow overview
## Workflow overview {#workflow-overview}
Here is how you would typically create new items with OpenRefine:
@ -57,7 +57,7 @@ an item in a Wikibase schema.
You can also perform the edits with QuickStatements - in this case, your
OpenRefine project will not be updated with the newly created Qids.
## Adding labels to new items
## Adding labels to new items {#adding-labels-to-new-items}
The text that is in a cell reconciled to \"new\" is not automatically
used as label for the newly-created item. This is because OpenRefine has
@ -73,7 +73,7 @@ issues will be raised if insufficient basic information is added on the
items (but these other warnings will not prevent you from performing the
edits).
## Marking multiple cells as identical items
## Marking multiple cells as identical items {#marking-multiple-cells-as-identical-items}
If you mark individual cells as new items, one new item per cell will be
created. Sometimes multiple rows refer to the same item. OpenRefine
@ -87,7 +87,7 @@ If these two conditions are met, then isolate these cells with facets
and go to **Reconcile** → **Actions** → **Create one item for similar
cells**. This will mark the cells as new and referring to the same item.
## Retrieving the Qids of the newly-created items
## Retrieving the Qids of the newly-created items {#retrieving-the-qids-of-the-newly-created-items}
Once you have performed your edits with OpenRefine, any new cells
covered by the facet will be updated with their new Qids. You can

@ -8,7 +8,7 @@ sidebar_label: Overview
OpenRefine's Wikibase integration is provided by an extension which is available by default in OpenRefine. In this page, we present the functionalities for Wikidata, but [any Wikibase instance can be connected to OpenRefine](./configuration) to obtain a similar integration.
## Editing Wikidata with OpenRefine
## Editing Wikidata with OpenRefine {#editing-wikidata-with-openrefine}
As a user-maintained data source, Wikidata can be edited by anyone. OpenRefine makes it simple to upload information in bulk. You simply need to get your information into the correct format, and ensure that it is new (not redundant to information already on Wikidata) and does not conflict with existing Wikidata information.
@ -32,7 +32,7 @@ If you upload edits that are redundant (that is, all the statements you want to
You can use OpenRefine's reconciliation preview to look at the target Wikidata elements and see what information they already have, and whether the elements' histories have had similar edits reverted in the past.
### Wikidata schema
### Wikidata schema {#wikidata-schema}
A [schema](https://en.wikipedia.org/wiki/Database_schema) is a plan for how to structure information in a database. In OpenRefine, the schema operates as a template for how Wikidata edits should be applied: how to translate your tabular data into statements. With a schema, you can:
* preview the Wikidata edits and inspect them manually;
@ -52,7 +52,7 @@ OpenRefine presents you with an easy visual way to map out the relationships in
You may wish to refer to [this Wikidata tutorial on how OpenRefine handles Wikidata schema](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Tutorials/Basic_editing). For details about how each data type is handled in the Wikibase schema, see [Schema alignment](./schema-alignment).
#### Editing terms with your schema
#### Editing terms with your schema {#editing-terms-with-your-schema}
With OpenRefine, you can edit the terms (labels, aliases, descriptions, or sitelinks) of Wikidata entities as well as establish relationships between entities. For example, you may wish to upload pseudonyms, pen names, maiden names, or married names for authors.
@ -77,7 +77,7 @@ You could upload the “Translated titles” to “Label” with the language sp
![Constructing a schema with aliases and languages.](/img/wikidata-translated.png)
### Manage Wikidata account
### Manage Wikidata account {#manage-wikidata-account}
To edit Wikidata directly from OpenRefine, you must log in with a Wikidata account. OpenRefine can only upload edits with Wikidata user accounts that are “[autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users)” - at this time, that means accounts that have more than 50 edits and have existed for longer than four days.
@ -96,7 +96,7 @@ If your account or your bot is not properly authorized, OpenRefine will not disp
You can store your unencrypted username and password in OpenRefine, saved locally to your computer and available for future use. For security reasons, you may wish to leave this box unchecked. You can also save your OpenRefine-specific bot password in your browser or with a password management tool.
### Import and export schema
### Import and export schema {#import-and-export-schema}
You can save time on repetitive processes by defining a schema on one project, then exporting it and importing for use on new datasets in the future. Or you and your colleagues can share a schema with each other to coordinate your work.
@ -104,7 +104,7 @@ You can export a schema from a project using <span class="menuItems">Export</spa
You can import a schema using <span class="menuItems">Extensions</span> → <span class="menuItems">Import schema</span>. You can upload a JSON file, or paste JSON statements directly into a field in the window. An imported schema will look for columns with the same names, and you will see an error message if your project doesn't contain matching columns.
### Upload edits to Wikidata
### Upload edits to Wikidata {#upload-edits-to-wikidata}
There are two menu options in OpenRefine for applying your edits to Wikidata, and the details of the differences between the two can be found in the [Uploading page](./uploading). Under <span class="menuItems">Export</span> you will see <span class="menuItems">Wikidata edits...</span> and under <span class="menuItems">Extensions</span> you will see <span class="menuItems">Upload edits to Wikidata</span>. Both will bring up the same window for you to [log in with a Wikidata account](#manage-wikidata-account).
@ -114,7 +114,7 @@ If you are ready to upload your edits, you can provide an “Edit summary” - a
If your edits have been successful, you will see them listed on [your Wikidata user contributions page](https://www.wikidata.org/wiki/Special:Contributions/), and on the [Edit groups page](https://editgroups.toolforge.org/). All edits can be undone from this second interface.
### QuickStatements export
### QuickStatements export {#quickstatements-export}
Your OpenRefine data can be exported in a format recognized by [QuickStatements](https://www.wikidata.org/wiki/Help:QuickStatements), a tool that creates Wikidata edits using text commands. OpenRefine generates “version 1” QuickStatements commands.
@ -125,7 +125,7 @@ In order to use QuickStatements, you must authorize it with a Wikidata account t
Follow the [steps listed on this page](https://www.wikidata.org/wiki/Help:QuickStatements#Running_QuickStatements).
To export your OpenRefine data as QuickStatements commands, select <span class="menuItems">Export</span> → <span class="menuItems">QuickStatements file</span>, or <span class="menuItems">Extensions</span> → <span class="menuItems">Export to QuickStatements</span>. Exporting your schema from OpenRefine will generate a text file called `statements.txt` by default. Paste the contents of the text file into a new QuickStatements batch using version 1. You can find [version 1 of the tool (no longer maintained) here](https://wikidata-todo.toolforge.org/quick_statements.php). The text commands will be processed into Wikidata edits and previewed for you to review before submitting.
### Issue detection
### Issue detection {#issue-detection}
This section is an overview of the [Quality assurance page](./quality-assurance).

@ -7,13 +7,13 @@ sidebar_label: Quality assurance
This page explains how the Wikidata extension of OpenRefine analyzes edits before they are uploaded to the Wikibase
instance. Most of these checks rely on the use of the [Wikibase Quality Constraints](https://gerrit.wikimedia.org/g/mediawiki/extensions/WikibaseQualityConstraints) extension and the configuration of the property and item identifiers in the [Wikibase manifest](./configuration).
## Overview
## Overview {#overview}
Changes are scrutinized before they are uploaded, but also before the current content of the corresponding items is retrieved and merged with the updates. This means that some constraint violations cannot be predicted by the software (for instance, adding a new statement that conflicts with an existing statement on the item). However, this makes it possible to run the checks quickly, even for relatively large batches of edits. Issues are therefore refreshed in real time while the user builds the schema.
As a consequence, not all constraint violations can be detected: the ones that are supported are listed in the [Constraint violations](#constraint-violations) section. Conversely, not all issues reported will be flagged as constraint violations on the Wikibase site: see [Generic issues](#generic-issues) for these.
## Reconciliation
## Reconciliation {#reconciliation}
You should always assess the quality of your reconciliation results first. OpenRefine has various tools for quality assurance of reconciliation results. For instance:
@ -21,7 +21,7 @@ You should always assess the quality of your reconciliation results first. OpenR
* you can compare the values in your table with those on the items (via a text facet defined by a custom expression);
* you can facet by type on the reconciled items (add a new column with the types and use a text facet ordered by counts to get a sense of the distribution of types in your reconciled items).
## Constraint violations
## Constraint violations {#constraint-violations}
Constraints are retrieved as defined on the properties, using [property constraint (P2302)](https://www.wikidata.org/wiki/Property:P2302).
@ -36,7 +36,7 @@ The following constraints are supported:
A comparison of the supported constraints with respect to other implementations is available [here](https://www.wikidata.org/wiki/Wikidata:WikiProject_property_constraints/reports/implementations).
## Generic issues
## Generic issues {#generic-issues}
OpenRefine also detects issues that are not flagged (yet) by constraint violations on Wikidata:
* Statements without references. This does not rely on [citation needed constraint (Q54554025)](https://www.wikidata.org/wiki/Q54554025): all statements are expected to have references. (The idea is that when importing a dataset, every statement you add

@ -14,7 +14,7 @@ You can find documentation and further resources on the reconciliation API [here
For the most part, Wikidata reconciliation behaves the same way other reconciliation services do, but there are a few processes and features specific to Wikidata.
## Language settings
## Language settings {#language-settings}
You can install a version of the Wikidata reconciliation service that uses your language. First, you need the language code: this is the [two-letter code found on this list](https://en.wikipedia.org/wiki/List_of_Wikipedias), or in the domain name of the desired Wikipedia/Wikidata (for instance, “fr” if your Wikipedia is https://fr.wikipedia.org/wiki/).
@ -22,7 +22,7 @@ Then, open the reconciliation window (under <span class="menuItems">Reconcile</s
When reconciling using this interface, items and properties will be displayed in your chosen language if the label is available. The matching score of the reconciliation is not influenced by your choice of language for the service: items are matched by considering all labels and returning the best possible match. The language of your dataset is also irrelevant to your choice of language for the reconciliation service; it simply determines which language labels to return based on the entity chosen.
## Restricting matches by type
## Restricting matches by type {#restricting-matches-by-type}
In Wikidata, types are items themselves. For instance, the [university of Ljubljana (Q1377)](https://www.wikidata.org/wiki/Q1377) has the type [public university (Q875538)](https://www.wikidata.org/wiki/Q875538), using the [instance of (P31)](https://www.wikidata.org/wiki/Property:P31) property. Types can be subclasses of other types, using the [subclass of (P279)](https://www.wikidata.org/wiki/Property:P279) property. For instance, [public university (Q875538)](https://www.wikidata.org/wiki/Q875538) is a subclass of [university (Q3918)](https://www.wikidata.org/wiki/Q3918). You can visualize these structures with the [Wikidata Graph Builder](https://angryloki.github.io/wikidata-graph-builder/).
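The way "instance of" and "subclass of" links combine can be sketched in Python as follows. This is a toy model built only from the example above, with the links stored in two small dictionaries; the function name `has_type` is hypothetical and this is not the reconciliation service's implementation.

```python
# Toy data taken from the example above.
INSTANCE_OF = {"Q1377": {"Q875538"}}   # University of Ljubljana -> public university
SUBCLASS_OF = {"Q875538": {"Q3918"}}   # public university -> university


def has_type(item, wanted_type):
    """True if the item is an instance of wanted_type or of one of its subclasses."""
    to_visit = list(INSTANCE_OF.get(item, ()))
    seen = set()
    while to_visit:
        current = to_visit.pop()
        if current == wanted_type:
            return True
        if current in seen:
            continue  # guard against cycles in the subclass graph
        seen.add(current)
        # Climb one level up the "subclass of" hierarchy.
        to_visit.extend(SUBCLASS_OF.get(current, ()))
    return False


print(has_type("Q1377", "Q3918"))
```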
@ -30,13 +30,13 @@ When you select or enter a type for reconciliation, OpenRefine will include that
Some items and types may not yet be set as an instance or subclass of anything (because Wikidata is crowdsourced). If you restrict reconciliation to a type, items without the chosen type will not appear in the results, except as a fallback, and will have a lower score.
## Reconciling via unique identifiers
## Reconciling via unique identifiers {#reconciling-via-unique-identifiers}
You can supply a column of unique identifiers (in the form "Q###" for entities) directly to Wikidata in order to pull more data, but [these strings will not be “reconciled” against the external dataset](reconciling#reconciling-with-unique-identifiers). Apply the operation <span class="menuItems">Reconcile</span> → <span class="menuItems">Use values as identifiers</span> on your column of QIDs. All cells will appear as dark blue “confirmed” matches. Some of the “matches” may be errors, which you will need to hover over or click on to identify. You cannot use this to reconcile properties (in the form "P###").
If the identifier you submit is assigned to multiple Wikidata items (because Wikidata is crowdsourced), all of the items are returned as candidates, with none automatically matched.
## Property paths, special properties, and subfields
## Property paths, special properties, and subfields {#property-paths-special-properties-and-subfields}
Wikidata's hierarchical property structure can be called by using property paths (using |, /, and . symbols). Labels, aliases, descriptions, and sitelinks can also be accessed. You can also match values against subfields, such as latitude and longitude subfields of a geographical coordinate.

@ -9,7 +9,7 @@ to each row in the project. This page describes how each part of this
template works, and how it generates edits depending on the contents of
the table cells.
## Items
## Items {#items}
An item in the schema represents a set of changes on a particular
Wikidata item, generated by a single row. This item can contain changes
@ -38,13 +38,13 @@ upload. If your project makes edits on the same item across multiple
rows, these edits will be merged together and performed in one edit. See
[Uploading your changes](./upload) about that.
## Terms
## Terms {#terms}
**Terms** are the language-specific strings that you find at the top of
Wikidata items: labels, descriptions and aliases. OpenRefine lets you
edit these terms via the Wikidata schema.
### Languages
### Languages {#languages}
Each term belongs to a particular language. Wikidata supports [hundreds
of languages](https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all), which
@ -65,7 +65,7 @@ two cases:
OpenRefine will translate any deprecated language codes to their
preferred values silently.
### Labels
### Labels {#labels}
This is because Wikidata items can have at most one label per language,
so you need to choose whether to override any existing label (default
@ -74,20 +74,20 @@ label in the given language (default behaviour starting from 3.2). When
the content of the cell providing the label is blank, nothing will be
changed (so, it is not possible to remove labels).
### Descriptions
### Descriptions {#descriptions}
Descriptions work like labels: there is at most one description per
language, and OpenRefine can override existing descriptions or leave
them unchanged. It is not possible to remove descriptions either.
### Aliases
### Aliases {#aliases}
Aliases are added to the list of existing aliases in the given language.
When adding an alias in a language where no label has been added yet,
the alias is automatically promoted to a label for this language. It is
not possible to remove aliases or to override any existing aliases.
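The rules for labels and aliases described above can be summarized with a small Python sketch. It models an item's terms as plain dictionaries; the helper names `set_label` and `add_alias` are hypothetical, and skipping an alias that duplicates an existing term is an assumption of this sketch, not documented behaviour.

```python
def set_label(terms, lang, label, override=True):
    """Set the label for a language, optionally keeping any existing label."""
    if not label:
        return  # blank cell: nothing is changed, so labels cannot be removed
    if override or lang not in terms["labels"]:
        terms["labels"][lang] = label


def add_alias(terms, lang, alias):
    """Add an alias, promoting it to a label if the language has no label yet."""
    if not alias:
        return
    if lang not in terms["labels"]:
        terms["labels"][lang] = alias  # promoted to label
        return
    existing = terms["aliases"].setdefault(lang, [])
    # Assumption: an alias identical to the label or to an existing alias is skipped.
    if alias != terms["labels"][lang] and alias not in existing:
        existing.append(alias)


terms = {"labels": {"en": "Douglas Adams"}, "aliases": {}}
add_alias(terms, "en", "Douglas Noel Adams")   # added to the English aliases
add_alias(terms, "fr", "Douglas Adams")        # no French label yet: promoted
set_label(terms, "en", "D. Adams", override=False)  # existing label is kept
print(terms)
```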
## Statements
## Statements {#statements}
You can add statements in the schema: this will generate new statements
on the corresponding items. These statements will be merged with any
@ -95,7 +95,7 @@ existing statements on the actual Wikidata items and [this merging process depen
More control over the merging strategy is planned for the
near future.
### Main values
### Main values {#main-values}
Statements must have main values: \"novalue\" or \"somevalue\"
statements are not supported yet. The main value of a statement is a
@ -107,25 +107,25 @@ skipped.
See the [data values](#data-values) section for more details
about how to specify each type of data value and when they are skipped.
### Qualifiers
### Qualifiers {#qualifiers}
Qualifiers can be added on each statement. When their values are
skipped, only the qualifier will be discarded: the rest of the statement
will still be added.
### References
### References {#references}
References can (and should) be added to back each statement. If values
inside the reference are skipped, the corresponding part of the
reference will be discarded but the reference will still be added
(unless the reference becomes empty).
### Ranks
### Ranks {#ranks}
All statement ranks are set to **Normal**. It is currently not possible
to set a different rank.
## Data values
## Data values {#data-values}
Data values are the data that you can find as target of a statement (or
qualifier, or part of a reference). Each property dictates a particular
@ -133,7 +133,7 @@ type of data value. In each case, OpenRefine uses a particular process
to translate cell contents to a data value of the appropriate type. This
section explains the process for all data types.
### Items
### Items {#items-1}
Items are evaluated in the same way as the subjects of items in the
schema. They can be input directly using the auto-suggest service
@ -141,7 +141,7 @@ provided, or any column reconciled against Wikidata can be used. Refer to
[the first Items section](#items) to see how they are
evaluated.
### Strings and external identifiers
### Strings and external identifiers {#strings-and-external-identifiers}
Bare strings and external identifiers can be input directly as constants
(if they do not change across rows) or using any column. If a reconciled
@ -150,7 +150,7 @@ going to be used, not the name of the reconciled item (which is what
OpenRefine displays). Values are skipped when the column is blank or
null.
### Monolingual texts
### Monolingual texts {#monolingual-texts}
Monolingual texts consist of two parts:
@ -161,7 +161,7 @@ Monolingual texts consist of two parts:
A monolingual text is skipped when any of its parts is skipped (that is,
if the language or the text are invalid).
### Dates
### Dates {#dates}
Dates are parsed from cell contents (or from any constant provided in
the schema) and the precision of the date is inferred from its format.
@ -189,7 +189,7 @@ In OpenRefine 3.5, the following new format has been introduced:
- `-234` represents the year 234 [BCE](https://en.wikipedia.org/wiki/Common_Era)
### Quantities
### Quantities {#quantities}
Quantities consist of two parts: the amount and the unit.
@ -208,7 +208,7 @@ Quantities consist of two parts: the amount and the unit.
template for a quantity value is either always unit-less, or always
has a unit.
### Globe coordinates
### Globe coordinates {#globe-coordinates}
Geographic coordinates are specified as strings with the following
formats, where all components are floating point numbers in degrees:
@ -229,7 +229,7 @@ If your coordinates are in a different format, such as
`49° 15′ 55″ N, 4° 1′ 43″ E`, you will need to convert them to decimal
format first.
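The conversion from degrees, minutes and seconds to decimal degrees can be done outside OpenRefine or with an expression; here is a minimal Python sketch of the arithmetic. The function names and the accepted input pattern are assumptions made for this example, and the resulting numbers still need to be written in one of the supported formats listed above.

```python
import re

# Accepts e.g. "49° 15′ 55″ N"; the minute and second marks are optional.
DMS_PATTERN = re.compile(
    r"(\d+)\s*°\s*(\d+)\s*[′']?\s*(\d+(?:\.\d+)?)\s*[″\"]?\s*([NSEW])"
)


def dms_to_decimal(text):
    """Convert one degrees/minutes/seconds component to signed decimal degrees."""
    match = DMS_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative in decimal notation.
    return -value if hemisphere in "SW" else value


def parse_dms_pair(text):
    """Split "lat, lon" and convert both components."""
    latitude, longitude = text.split(",")
    return dms_to_decimal(latitude), dms_to_decimal(longitude)


lat, lon = parse_dms_pair("49° 15′ 55″ N, 4° 1′ 43″ E")
print(round(lat, 6), round(lon, 6))
```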
### Media on Commons
### Media on Commons {#media-on-commons}
Media on Wikimedia Commons is treated like strings, whose values must
exactly match filenames on Commons. These values are not checked during
@ -240,13 +240,13 @@ Tabular data and Geoshapes must be prefixed with the `Data:` namespace.
This is indicated by the placeholder in the field that appears when
constructing the schema.
### Properties
### Properties {#properties}
Properties are always constants: there is currently no way to reconcile
a column against properties. They have to be selected with the
auto-suggest dialog.
### Other data types
### Other data types {#other-data-types}
URLs, mathematical expressions and other textual datatypes are supported
and treated as strings. At the time of writing, all datatypes supported

@ -6,7 +6,7 @@ sidebar_label: Uploading edits
This page explains how to upload your edits to the target Wikibase. It assumes you have already created a Wikibase schema in your OpenRefine project.
## Uploading with OpenRefine
## Uploading with OpenRefine {#uploading-with-openrefine}
* Click <span class="menuItems">Wikidata</span> → <span class="menuItems">Upload edits to Wikidata</span>.
* Log in with your personal account or your bot account depending on which account you want to use to make the edits. It is a good practice to use a [bot password](https://www.mediawiki.org/wiki/Manual:Bot_passwords).
@ -15,7 +15,7 @@ This page explains how to upload your edits to the target Wikibase. It assumes y
Because performing edits in OpenRefine counts as an operation, you can extract this operation and reapply it to other projects. If you do so, you should also include the operation that saves the schema (only the last one is required), and make sure that the column names in the schema match those of the OpenRefine project where you are applying the operation.
## Uploading with QuickStatements
## Uploading with QuickStatements {#uploading-with-quickstatements}
This requires that the Wikibase site has an associated [QuickStatements](https://meta.wikimedia.org/wiki/QuickStatements) tool.
@ -25,22 +25,22 @@ This requires that the Wikibase site has an associated [QuickStatements](https:/
* Paste the generated changes in the text area;
* Perform the edits with <span class="buttonLabels">Run</span> or <span class="buttonLabels">Run in background</span>.
## Notable differences between the two methods
## Notable differences between the two methods {#notable-differences-between-the-two-methods}
### Merging strategy for statements
### Merging strategy for statements {#merging-strategy-for-statements}
OpenRefine checks for existing statements which match not only the property and the target value, but also the qualifiers. On the other hand, QuickStatements ignores qualifiers when matching statements.
Both merging strategies can be useful depending on the properties. Letting the user configure the matching method in OpenRefine is planned.
If references are provided, both tools merge references in matching statements.
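The difference between the two matching rules can be made concrete with a simplified Python model. The `Statement` class, the function names and the placeholder value `Q_OFFICE` are hypothetical; real statements carry richer data values, and neither tool is implemented this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    prop: str
    value: str
    qualifiers: frozenset = frozenset()  # set of (property, value) pairs


def matches_openrefine(new, existing):
    """OpenRefine-style match: property, target value and qualifiers must agree."""
    return (new.prop, new.value, new.qualifiers) == (
        existing.prop, existing.value, existing.qualifiers)


def matches_quickstatements(new, existing):
    """QuickStatements-style match: qualifiers are ignored."""
    return (new.prop, new.value) == (existing.prop, existing.value)


# Same position held (P39) twice, with different start times (P580).
existing = Statement("P39", "Q_OFFICE", frozenset({("P580", "2010")}))
new = Statement("P39", "Q_OFFICE", frozenset({("P580", "2018")}))

print(matches_openrefine(new, existing))       # a separate statement would be added
print(matches_quickstatements(new, existing))  # the existing statement would be reused
```

In this model, a property that can legitimately hold the same value several times with different qualifiers favours the first rule, while a property where qualifiers only add detail favours the second.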
### New item creation
### New item creation {#new-item-creation}
OpenRefine supports creating new items with arbitrary relations between them.
QuickStatements supports creating new items with the <code>CREATE</code> instruction, and subsequent instructions can use the <code>LAST</code> placeholder to use the Qid of the last created item. When generating QuickStatements instructions, OpenRefine reorders your edits so that this syntax can be used. In rare cases, such as when a statement links two newly-created items, it is impossible to use QuickStatements to perform the edit. In this case, no QuickStatements script will be generated.
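To show what the <code>CREATE</code> and <code>LAST</code> syntax looks like, the following Python sketch builds tab-separated version 1 commands for one new item. It is an illustration only, not OpenRefine's exporter; the function name `new_item_commands` and the sample label are hypothetical.

```python
def new_item_commands(label_en, statements):
    """Build QuickStatements v1 commands creating one item with the given statements."""
    lines = [
        "CREATE",                        # creates a new, empty item
        f'LAST\tLen\t"{label_en}"',      # English label on the item just created
    ]
    for prop, target in statements:
        # LAST stands for the Qid of the last created item.
        lines.append(f"LAST\t{prop}\t{target}")
    return "\n".join(lines)


# P31 is "instance of" and Q5 is "human" on Wikidata.
commands = new_item_commands("Jane Example", [("P31", "Q5")])
print(commands)
```

Because <code>LAST</code> can only point at the most recently created item, a statement linking two newly-created items cannot be expressed this way, which is the limitation described above.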
### Speed and number of edits
### Speed and number of edits {#speed-and-number-of-edits}
OpenRefine generally performs one edit per item touched by an edit batch and at most two in general (in the case where new items contain links between them). This was chosen to minimize server load, speed up the upload and keep item histories compact. The downside is that the edit summaries can be less meaningful - it is therefore important that users supply informative summaries when uploading their batches. OpenRefine asymptotically edits at the rate of 60 edits per minute (so, usually 60 items per minute). The first edits are made more quickly, which is convenient for small batches.
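For planning purposes, the figures above translate into a simple estimate. The function below is only a back-of-the-envelope sketch (its name and parameters are made up for this example); since the first edits are made more quickly, it will be slightly pessimistic for small batches.

```javascript
// Rough upload time, assuming the steady rate of 60 edits per minute quoted
// above and one edit per item (two when new items link to each other).
function estimateUploadMinutes(itemCount, editsPerItem = 1, editsPerMinute = 60) {
  return (itemCount * editsPerItem) / editsPerMinute;
}

console.log(estimateUploadMinutes(3000));    // 50
console.log(estimateUploadMinutes(3000, 2)); // 100
```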


@ -8,7 +8,7 @@ OpenRefine is a web application, but is designed to be run locally on your own m
This architecture provides a good separation of concerns (data vs. UI); allows the use of familiar web technologies (HTML, CSS, Javascript) to implement user interface features; and enables the server side to be called by third-party software through standard GET and POST operations.
## Technology stack
## Technology stack {#technology-stack}
The server-side part of OpenRefine is implemented in Java as one single servlet which is executed by the [Jetty](http://jetty.codehaus.org/jetty/) web server + servlet container. The use of Java strikes a balance between performance and portability across operating systems (there is very little OS-specific code, and it mostly has to do with starting the application).
@ -27,7 +27,7 @@ String clustering is provided by the [SIMILE Vicino](http://code.google.com/p/si
OAuth functionality is provided by the [Signpost](https://github.com/mttkay/signpost) project.
## Server-side architecture
## Server-side architecture {#server-side-architecture}
OpenRefine's server-side is written entirely in Java (`main/src/`) and its entry point is the Java servlet `com.google.refine.RefineServlet`. By default, the servlet is hosted in the lightweight Jetty web server instantiated by `server/src/com.google.refine.Refine`. Note that the server class itself is under `server/src/`, not `main/src/`; this separation leaves the possibility of hosting `RefineServlet` in a different servlet container.
@ -35,7 +35,7 @@ The web server configuration is in `main/webapp/WEB-INF/web.xml`; that's where `
As mentioned before, the server-side maintains states of the data, and the primary class involved is `com.google.refine.ProjectManager`.
### Projects
### Projects {#projects}
In OpenRefine there's the concept of a workspace similar to that in Eclipse. When you run OpenRefine it manages projects within a single workspace, and the workspace is embodied in a file directory with sub-directories. The default workspace directories are listed in the [FAQs](https://github.com/OpenRefine/OpenRefine/wiki/FAQ-Where-Is-Data-Stored). You can get OpenRefine to use a different directory by specifying a -d parameter at the command line.
@ -45,14 +45,14 @@ A project's _actual_ data includes the columns, rows, cells, reconciliation reco
A project is loaded into memory when it needs to be displayed or modified, and it remains in memory until 1 hour after the last time it gets modified. Periodically the project manager tries to save modified projects, and it saves as many modified projects as possible within 30 seconds.
### Data Model
### Data Model {#data-model}
A project's data consists of
- _raw data_: a list of rows, each row consisting of a list of cells
- _models_ on top of that raw data that give high-level presentation or interpretation of that data. This design lets the same raw data be viewed in different ways by different models, and lets the models be changed without costly changes to the raw data.
#### Column Model
#### Column Model {#column-model}
Cells in rows are not named and can only be addressed by their list position indices. So, a _column model_ is needed to give a name to each list position. The column model also stores other metadata for each column, including the type that cells in the column have been reconciled to and the overall reconciliation statistics of those cells.
@ -60,7 +60,7 @@ Each column also acts as a cache for data computed from the raw data related to
Columns in the column model can be removed and re-ordered without changing the raw data--the cells in the rows. This makes column removal and ordering operations really quick.
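The relationship between raw rows and the column model can be sketched as below. This is a simplified JavaScript illustration rather than the actual Java classes; the `cellIndex` field name and the sample values are assumptions made for the example.

```javascript
// Raw data: each row is a list of cells, addressable only by position.
const rows = [
  ["a1", "b1", "c1"],
  ["a2", "b2", "c2"],
];

// Column model: gives each list position a name.
let columns = [
  { name: "A", cellIndex: 0 },
  { name: "B", cellIndex: 1 },
  { name: "C", cellIndex: 2 },
];

// Look a cell up by column name through the column model.
function cellValue(row, columnName) {
  const column = columns.find((c) => c.name === columnName);
  return column === undefined ? undefined : row[column.cellIndex];
}

// Removing column "B" and re-ordering the others only touches the model.
columns = [columns[2], columns[0]];

console.log(columns.map((c) => c.name).join(", ")); // C, A
console.log(cellValue(rows[0], "A"));               // a1
console.log(cellValue(rows[0], "B"));               // undefined
console.log(rows[0].length);                        // 3
```

The rows still hold three cells each after the change, which is why these operations are cheap: only the small list of column descriptors is rewritten.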
##### Column Groups
##### Column Groups {#column-groups}
Consider the following data:
@ -74,7 +74,7 @@ Blank cells play a very important role. The blank cell in a key column of a row
Currently (as of 12th December 2017) only the XML and JSON importers create column groups, and while the data table view does display column groups, it doesn't support modifying them.
### Changes, History, Processes, and Operations
### Changes, History, Processes, and Operations {#changes-history-processes-and-operations}
All changes to the project's data are tracked (N.B. this does not include changes to a project's metadata - such as the project name.)
@ -98,17 +98,17 @@ In summary,
- some processes are long-running and some are immediate; processes are run sequentially in a queue
- generalizable processes can be re-constructed from abstract operations
## Client-side architecture
## Client-side architecture {#client-side-architecture}
The client-side part of OpenRefine is implemented in HTML, CSS and Javascript and uses the following Javascript libraries:
* [jQuery](http://jquery.com/)
* [jQueryUI](http://jqueryui.com/)
* [Recurser jquery-i18n](https://github.com/recurser/jquery-i18n)
### Importing architecture
### Importing architecture {#importing-architecture}
OpenRefine has a sophisticated architecture for accommodating a diverse and extensible set of importable file formats and work flows. The formats range from simple CSV, TSV to fixed-width fields to line-based records to hierarchical XML and JSON. The work flows allow the user to preview and tweak many different import settings before creating the project. In some cases, such as XML and JSON, the user also has to select which elements in the data file to import. Additionally, a data file can also be an archive file (e.g., .zip) that contains many files inside; the user can select which of those files to import. Finally, extensions to OpenRefine can inject functionalities into any part of this architecture.
### The Index Page and Action Areas
### The Index Page and Action Areas {#the-index-page-and-action-areas}
The opening screen of OpenRefine is implemented by the file refine/main/webapp/modules/core/index.vt and will be referred to here as the index page. Its default implementation contains 3 finger tabs labeled Create Project, Open Project, and Import Project. Each tab selects an "action area". The 3 default action areas are for, obviously, creating a new project, opening an existing project, and importing a project .tar file.
@ -126,13 +126,13 @@ The UI class is a constructor function that takes one argument, a jQuery-wrapped
If your extension requires a very unique importing work flow, or a very novel feature that should be exposed on the index page, then add a new action area. Otherwise, try to use the existing work flows as much as possible.
### The Create Project Action Area
### The Create Project Action Area {#the-create-project-action-area}
The Create Project action area is itself extensible. Initially, it embeds a set of finger tabs corresponding to a variety of "source selection UIs": you can select a source of data by specifying a file on your computer, or you can specify the URL to a publicly accessible data file or data feed, or you can paste in from the clipboard a chunk of data.
There are actually 3 points of extension in the Create Project action area, and the first is invisible.
#### Importing Controllers
#### Importing Controllers {#importing-controllers}
The Create Project action area manages a list of "importing controllers". Each controller follows a particular work flow (in UI terms, think "wizard"). Refine comes with a "default importing controller" (refine/main/webapp/modules/core/scripts/index/default-importing-controller/controller.js) and its work flow assumes that the data can be retrieved and cached in whole before getting processed in order to generate a preview for the user to inspect. (If the data cannot be retrieved and cached in whole before previewing, then another importing controller is needed.)
@ -153,7 +153,7 @@ Refine.CreateProjectUI.controllers.push(Refine.DefaultImportingController); // r
We will cover the server-side code below.
#### Data Source Selection UIs
#### Data Source Selection UIs {#data-source-selection-uis}
Data source selection UIs are another point of extensibility in the Create Project action area. As mentioned previously, by default there are 3 data source UIs. Those are added by the default importing controller.
@ -192,34 +192,34 @@ The argument `form` is a jQuery-wrapped FORM element that will get submitted to
See refine/main/webapp/modules/core/scripts/index/default-importing-sources/sources.js for examples of such source selection UIs. While we write about source selection UIs managed by the default importing controller here, chances are your own extension will not be adding such a new source selection UI. Your extension probably adds a new importing controller as well as a new source selection UI that work together.
#### File Selection Panel
#### File Selection Panel {#file-selection-panel}
Documentation not currently available
#### Parsing UI Panel
#### Parsing UI Panel {#parsing-ui-panel}
Documentation not currently available
### Server-side Components
### Server-side Components {#server-side-components}
#### ImportingController
#### ImportingController {#importingcontroller}
Documentation not currently available
#### UrlRewriter
#### UrlRewriter {#urlrewriter}
Documentation not currently available
#### FormatGuesser
#### FormatGuesser {#formatguesser}
Documentation not currently available
#### ImportingParser
#### ImportingParser {#importingparser}
Documentation not currently available
## Faceted browsing architecture
## Faceted browsing architecture {#faceted-browsing-architecture}
Faceted browsing support is core to OpenRefine as it is the primary and only mechanism for filtering to a subset of rows on which to do something _en masse_ (i.e., in bulk). Without faceted browsing or an equivalent querying/browsing mechanism, you can only change one thing at a time (one cell or row) or else change everything all at once; both kinds of editing are practically useless when dealing with large data sets.
In OpenRefine, different components of the code need to know which rows to process from the faceted browsing state (how the facets are constrained). For example, when the user applies some facet selections and then exports the data, the exporter serializes only the matching rows, not all rows in the project. Thus, faceted browsing isn't only hooked up to the data view for displaying data to the user, but it is also hooked up to almost all other parts of the system.
### Engine Configuration
### Engine Configuration {#engine-configuration}
As OpenRefine is a web app, there might be several browser windows opened on the same project, each in a different faceted browsing state. It is best to maintain the faceted browsing state in each browser window while keeping the server side completely stateless with regard to faceted browsing. Whenever the client-side needs something done by the server, it transfers the entire faceted browsing state over to the server-side. The faceted browsing state behaves much like the `WHERE` clause in a SQL query, telling the server-side how to select the rows to process.
@ -267,7 +267,7 @@ In the code, the faceted browsing state, or faceted browsing query, is actually
}
```
### Server-Side Subsystem
### Server-Side Subsystem {#server-side-subsystem}
From an engine configuration like the one above, the server-side faceted browsing subsystem is capable of producing:
@ -278,6 +278,6 @@ When the engine config JSON arrives in an HTTP request on the server-side, a `co
To produce information on how to render a particular facet in the UI, the engine follows the same procedure described previously, except that it skips over the facet in question. In other words, it produces an iteration over all rows constrained by the other facets. Then it feeds that iteration to the facet in question by calling the facet's `computeChoices()` method. This gives the method a chance to compute the rendering information for its UI counterpart on the client-side. When all facets have been given a chance to compute their rendering information, the engine calls all facets to serialize their information as JSON and returns the JSON to the client-side. Only one HTTP call is needed to compute all facets.
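The "skip the facet in question" rule can be sketched in a few lines. The code below is a JavaScript illustration of the idea, not the server's Java implementation: `makeFacet`, `matches` and `computeAllFacets` are hypothetical names, and only `computeChoices` echoes the method named in the text.

```javascript
// Hypothetical facet: a column index plus the set of selected values
// (an empty selection constrains nothing).
function makeFacet(cellIndex, selected) {
  return {
    cellIndex,
    selected: new Set(selected),
    matches(row) {
      return this.selected.size === 0 || this.selected.has(row[this.cellIndex]);
    },
    // Count the values seen in the rows handed over by the engine.
    computeChoices(rows) {
      const counts = {};
      for (const row of rows) {
        const value = row[this.cellIndex];
        counts[value] = (counts[value] || 0) + 1;
      }
      return counts;
    },
  };
}

// For each facet, iterate over the rows constrained by all the *other* facets.
function computeAllFacets(rows, facets) {
  return facets.map((facet) => {
    const others = facets.filter((f) => f !== facet);
    const visible = rows.filter((row) => others.every((f) => f.matches(row)));
    return facet.computeChoices(visible);
  });
}

const rows = [
  ["red", "small"],
  ["red", "large"],
  ["blue", "small"],
];
const facets = [makeFacet(0, ["red"]), makeFacet(1, [])];
const choices = computeAllFacets(rows, facets);

console.log(JSON.stringify(choices[0])); // {"red":2,"blue":1}
console.log(JSON.stringify(choices[1])); // {"small":1,"large":1}
```

The first facet still lists "blue" even though "red" is selected, because its own selection is ignored when computing its choices; the second facet only counts the rows that the first facet lets through.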
### Client-side subsystem
### Client-side subsystem {#client-side-subsystem}
On the client-side there is also an engine object (implemented in Javascript rather than Java) and zero or more facet objects (also in Javascript, obviously). The engine is responsible for distributing the rendering information computed on the server-side to the right facets, and when the user interacts with a facet, the facet tells the engine to update the whole UI. To do so, the engine gathers the configuration of each facet and composes the whole engine config as a single JSON object. Two separate AJAX calls are made with that engine config, one to retrieve the rows to render, and one to re-compute the rendering information for the facets because changing one facet does affect all the other facets.
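A minimal sketch of that client-side flow is shown below. It is not the actual OpenRefine client code: `post` is an injected stand-in for an AJAX helper, and the command names, the `getConfig()` method and the facet configuration fields are assumptions made for this example.

```javascript
// Client-side engine sketch: composes one engine config from all facets and
// issues two requests with it.
class Engine {
  constructor(post) {
    this.post = post;
    this.facets = [];
  }

  addFacet(facet) {
    this.facets.push(facet);
  }

  // Gather each facet's configuration into a single engine config object.
  getConfig() {
    return { facets: this.facets.map((facet) => facet.getConfig()) };
  }

  // Called by a facet after the user interacts with it.
  update() {
    const body = { engine: JSON.stringify(this.getConfig()) };
    return [
      this.post("get-rows", body),       // rows to render
      this.post("compute-facets", body), // rendering information for all facets
    ];
  }
}

// Record the requests instead of sending them over the network.
const sent = [];
const engine = new Engine((command, body) => sent.push({ command, body }));
engine.addFacet({ getConfig: () => ({ type: "list", columnName: "country" }) });
engine.update();

console.log(sent.map((request) => request.command).join(", ")); // get-rows, compute-facets
```

Both requests carry the same serialized engine config, which is what allows the server side to stay stateless with regard to faceted browsing.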


@ -15,7 +15,7 @@ You will need:
From the top level directory in the OpenRefine application you can build, test and run OpenRefine using the `./refine` shell script (if you are working in a \*nix shell), or using the `refine.bat` script from the Windows command line. Note that `refine.bat` on Windows only supports a subset of the functionality supported by the `refine` shell script. The example commands below use the `./refine` shell script, and you will need to use `refine.bat` if you are working from the Windows command line.
### Set up JDK
### Set up JDK {#set-up-jdk}
You must [install JDK](https://jdk.java.net/15/) and set the JAVA_HOME environment variable (please ensure it points to the JDK, and not the JRE).
@ -77,7 +77,7 @@ export JAVA_HOME="$(/usr/libexec/java_home -v 13)"
<TabItem value="linux">
##### With the terminal
##### With the terminal {#with-the-terminal}
Enter the following:
@ -87,7 +87,7 @@ sudo apt install default-jre
This probably won't install the latest JDK package available on the Java website, but it is faster and more straightforward. (At the time of writing, it installs OpenJDK 11.0.7.)
##### Manually
##### Manually {#manually}
First, [extract the JDK package](https://openjdk.java.net/install/) to the new directory `usr/lib/jvm`:
@ -132,7 +132,7 @@ It should show the path you set above.
### Maven (Optional)
### Maven (Optional) {#maven-optional}
OpenRefine's build script will download Maven for you and use it, if not found already locally installed.
If you will be using your Maven installation instead of OpenRefine's build script download installation, then set the `MVN_HOME` environment variable. You may need to reboot your machine after setting these environment variables. If you receive a message `Could not find the main class: com.google.refine.Refine. Program will exit.` it is likely `JAVA_HOME` is not set correctly.
@ -145,7 +145,7 @@ MAVEN_HOME=E:\Downloads\apache-maven-3.5.4-bin\apache-maven-3.5.4\
NOTE: You can use Maven commands directly, but running some goals in isolation might fail (try adding the `compile test-compile` goals in your invocation if that is the case).
### Building
### Building {#building}
To see what functions are supported by OpenRefine's build system, type
```shell
@ -158,7 +158,7 @@ To build the OpenRefine application from source type:
./refine build
```
### Testing
### Testing {#testing}
Since OpenRefine is composed of two parts, a server and an in-browser UI, the testing system reflects that:
* on the server side, it's powered by [TestNG](http://testng.org/) and the unit tests are written in Java;
@ -182,14 +182,14 @@ If you want to run only the client side portion of the tests, use:
./refine ui_test chrome
```
## Running
## Running {#running}
To run OpenRefine from the command line (assuming you have been able to build from the source code successfully)
```shell
./refine
```
By default, OpenRefine will use [refine.ini](https://github.com/OpenRefine/OpenRefine/blob/master/refine.ini) for configuration. You can copy it and rename it to `refine-dev.ini`, which will be used for configuration instead. `refine-dev.ini` won't be tracked by Git, so feel free to put your custom configurations into it.
## Building Distributions (Kits)
## Building Distributions (Kits) {#building-distributions-kits}
The Refine build system uses Apache Ant to automate the creation of the installation packages for the different operating systems. The packages are currently optimized to run on Mac OS X which is the only platform capable of creating the packages for all three OS that we support.
@ -200,7 +200,7 @@ To build the distributions type
```
where 'version' is the release version.
## Building, Testing and Running OpenRefine from Eclipse
## Building, Testing and Running OpenRefine from Eclipse {#building-testing-and-running-openrefine-from-eclipse}
OpenRefine's source comes with Maven configuration files which are recognized by [Eclipse](http://www.eclipse.org/) if the Eclipse Maven plugin (m2e) is installed.
At the command line, go to a directory **not** under your Eclipse workspace directory and check out the source:
@ -224,22 +224,22 @@ Right click on the `server` subproject, click `Run as...` and `Run configuration
This will add a run configuration that you can then use to run OpenRefine from Eclipse.
## Testing in Eclipse
## Testing in Eclipse {#testing-in-eclipse}
You can run the server tests directly from Eclipse. To do that you need to have the TestNG launcher plugin installed, as well as the TestNG M2E plugin (for integration with Maven). If you don't have them, you can get them by [installing new software](https://help.eclipse.org/2020-03/index.jsp?topic=/org.eclipse.platform.doc.user/tasks/tasks-129.htm) from this update URL http://dl.bintray.com/testng-team/testng-eclipse-release/
Once the TestNG launching plugin is installed in your Eclipse, right click on the source folder "main/tests/server/src", select `Run As` -> `TestNG Test`. This should open a new tab with the TestNG launcher running the OpenRefine tests.
### Test coverage in Eclipse
### Test coverage in Eclipse {#test-coverage-in-eclipse}
It is possible to analyze test coverage in Eclipse with the `EclEmma Java Code Coverage` plugin. It will add a `Coverage as…` menu similar to the `Run as…` and `Debug as…` menus which will then display the covered and missed lines in the source editor.
### Debug with Eclipse
### Debug with Eclipse {#debug-with-eclipse}
Here's an example of setting configuration in Eclipse for debugging, such as values for the Google Data extension. Other types of configuration that can be set include memory, Wikidata login information and more.
![Screenshot of Eclipse debug configuration](/img/eclipse-debug-config.png)
## Building, Testing and Running OpenRefine from IntelliJ idea
## Building, Testing and Running OpenRefine from IntelliJ idea {#building-testing-and-running-openrefine-from-intellij-idea}
At the command line, go to the directory where you want to save the OpenRefine project and execute the following command to clone the repository:


@ -6,14 +6,14 @@ sidebar_label: Contributing
Please read the general [guidelines on contributing to OpenRefine](https://github.com/OpenRefine/OpenRefine/blob/master/CONTRIBUTING.md) first, then review the information on [reporting and tracking issues](#reporting-and-tracking-issues), and on making your [first pull request](#your-first-pull-request) below.
## Reporting and tracking issues
## Reporting and tracking issues {#reporting-and-tracking-issues}
If you need to file a bug or request a feature, [create an Issue in the OpenRefine Github repository](https://github.com/OpenRefine/OpenRefine/issues). Github issues should be used for reporting specific bugs and requesting specific features. If you just don't know how to do something using OpenRefine, or want to discuss some ideas, please:
- [Try the user manual](/)
- [post to our OpenRefine mailing list](http://groups.google.com/group/openrefine/)
## Contributing to the documentation
## Contributing to the documentation {#contributing-to-the-documentation}
We use [Docusaurus](https://docusaurus.io/) for our docs. For small documentation changes, you should be able to edit the Markdown files directly and submit them as a pull request. A preview of the docs will be generated automatically. But it is also
possible to preview your changes locally. Assuming you have [Node.js](https://nodejs.org/en/download/) installed (which includes npm), you can install Docusaurus with:
@ -41,7 +41,7 @@ You can also spin a local web server to serve the docs for you, with auto-refres
yarn start
```
## Your first code pull request
## Your first code pull request {#your-first-code-pull-request}
This describes the overall steps to your first code contribution in OpenRefine. If you have trouble with any of these steps feel free to reach out on the [developer mailing list](https://groups.google.com/forum/#!forum/openrefine-dev) or the [Gitter channel](https://gitter.im/OpenRefine/OpenRefine).


@ -9,17 +9,17 @@ Please be aware that the OpenRefine roadmap is subject to change at any time, so
If there are features you would like to see that are not currently listed here or in current [milestones](https://github.com/OpenRefine/OpenRefine/milestones), [projects](https://github.com/OpenRefine/OpenRefine/projects) and [issues](https://github.com/OpenRefine/OpenRefine/issues), please add them to the [issue tracker](https://github.com/OpenRefine/OpenRefine/issues).
## Planned releases
## Planned releases {#planned-releases}
### 4.0
### 4.0 {#40}
[New backend storage option to allow using much bigger datasets at the expense of real-time feedback.](https://github.com/OpenRefine/OpenRefine/milestone/7)
New UI (possibly Vue or React based)
## Work in progress
## Work in progress {#work-in-progress}
Alongside the planned releases there are often smaller pieces of work in progress. Check for [recently updated issues](https://github.com/OpenRefine/OpenRefine/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) and [pull requests](https://github.com/OpenRefine/OpenRefine/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc) to see what is currently in the works.
## On the back burner
## On the back burner {#on-the-back-burner}
Some aspects of OpenRefine have previously been targeted for release, but have not made it into a release and have not been worked on recently. If you would like to see features in these areas, please create an issue that describes what development you would like to see:
- Streamlining traditional features


@ -6,7 +6,7 @@ sidebar_label: Functional tests
import useBaseUrl from '@docusaurus/useBaseUrl';
## Introduction
## Introduction {#introduction}
The OpenRefine interface is tested with the [Cypress framework](https://www.cypress.io/).
With Cypress, tests perform assertions using a real browser, the same way a real user would use the software.
@ -18,7 +18,7 @@ Cypress tests can be ran
If you are writing tests, the Cypress test runner is good enough, and the command-line is mainly used by the CI/CD platform (GitHub Actions).
## Cypress brief overview
## Cypress brief overview {#cypress-brief-overview}
Cypress operates inside a browser and internally uses NodeJS.
That's a key difference with tools such as Selenium.
@ -36,14 +36,14 @@ The general workflow of a Cypress test is to
- Trigger user actions
- Assert that the DOM contains expected texts and elements using selectors
## Getting started
## Getting started {#getting-started}
If this is the first time you use Cypress, it is recommended for you to get familiar with the tool.
- [Cypress overview](https://docs.cypress.io/guides/overview/why-cypress.html)
- [Cypress examples of tests and syntax](https://example.cypress.io/)
### 1. Install Cypress
### 1. Install Cypress {#1-install-cypress}
You will need:
@ -58,7 +58,7 @@ cd ./main/tests/cypress
yarn install
```
### 2. Start the test runner
### 2. Start the test runner {#2-start-the-test-runner}
The test runner assumes that OpenRefine is up and running on the local machine; the tests themselves neither launch OpenRefine nor restart it.
@ -74,21 +74,21 @@ Then start Cypress
yarn --cwd ./main/tests/cypress run cypress open
```
### 3. Run the existing tests
### 3. Run the existing tests {#3-run-the-existing-tests}
Once the test runner is up, you can choose to run one or several tests by selecting them from the interface.
Click on one of them and the test will start.
### 4. Add your first test
### 4. Add your first test {#4-add-your-first-test}
- Add a `test.spec.js` into the `main/tests/cypress/cypress/integration` folder.
- The test is instantly available in the list
- Click on the test
- Start to add some code
## Tests technical documentation
## Tests technical documentation {#tests-technical-documentation}
### A typical test
### A typical test {#a-typical-test}
A typical OpenRefine test starts with the following code
@ -117,14 +117,14 @@ For example
See below on the dedicated section 'Testing utilities'
### Testing guidelines
### Testing guidelines {#testing-guidelines}
- `cy.wait` should be used only as a last resort. It's considered a bad practice, though sometimes there is no other choice
- Tests should remain isolated from each other. It's best to test one feature at a time
- A test should always start with a fresh project
- The name of the files should mirror the OpenRefine UI organization
### Testing utilities
### Testing utilities {#testing-utilities}
OpenRefine contributors have added some utility methods on top of the Cypress framework.
Those methods perform some common actions or assertions on OpenRefine, to avoid code duplication.
@ -156,12 +156,12 @@ The fixture parameter can be
Those datasets live in `cypress/fixtures`
### Browsers
### Browsers {#browsers}
In terms of browsers, Cypress is using what is installed on your operating system.
See the [Cypress documentation](https://docs.cypress.io/guides/guides/launching-browsers.html#Browsers) for a list of supported browsers
### Folder organization
### Folder organization {#folder-organization}
Tests are located in `main/tests/cypress/cypress` folder.
The test should not use any file outside the cypress folder.
@ -172,7 +172,7 @@ The test should not use any file outside the cypress folder.
- `/screenshots` and `/videos` contain the recordings of the tests, Git ignored
- `/support` is a custom library of assertion and common user actions, to avoid code duplication in the tests themselves
### Configuration
### Configuration {#configuration}
Cypress execution can be configured with environment variables. They can be declared at the OS level, or when running the test
@ -182,11 +182,11 @@ Available variables are
Cypress contains [exhaustive documentation](https://docs.cypress.io/guides/guides/environment-variables.html#Setting) about configuration, but here are two simple ways to configure the execution of the tests:
#### Overriding with a cypress.env.json file
#### Overriding with a cypress.env.json file {#overriding-with-a-cypressenvjson-file}
This file is ignored by Git, and you can use it to configure Cypress locally
#### Command-line
#### Command-line {#command-line}
You can pass variables at the command-line level
@ -194,7 +194,7 @@ You can pass variables at the command-line level
yarn --cwd ./main/tests/cypress run cypress open --env OPENREFINE_URL="http://localhost:1234"
```
### Visual testing
### Visual testing {#visual-testing}
Tests generally ensure application behavior by making assertions against the DOM, to ensure specific texts or css attributes are present in the document body.
Visual testing, on the contrary, is a way to test applications by comparing images.
@ -212,7 +212,7 @@ Identified cases are so far:
Reference screenshots (called snapshots) are stored in /cypress/snapshots.
A snapshot can be taken for the whole page, or just a single part of the page.
#### When a visual test fails
#### When a visual test fails {#when-a-visual-test-fails}
First, Cypress will display the following error message:
@ -223,7 +223,7 @@ The diff images shows the reference image on the left, the image that was taken
![Diff image when a visual test fails](/img/failed-visual-test.png)
## CI/CD
## CI/CD {#cicd}
In CI/CD, tests are run headless, with the following command-line
@ -233,7 +233,7 @@ In CI/CD, tests are run headless, with the following command-line
Results are displayed in the standard output
## Resources
## Resources {#resources}
[Cypress command line options](https://docs.cypress.io/guides/guides/command-line.html#Installation)
[Lots of good Cypress examples](https://example.cypress.io/)


@ -6,29 +6,29 @@ sidebar_label: Maintainer guidelines
This page describes our practices to review issues and pull requests in the OpenRefine project.
## Reviewing issues
## Reviewing issues {#reviewing-issues}
When people create new issues, they automatically get assigned [the "to be reviewed" tag](https://github.com/OpenRefine/OpenRefine/issues?q=is%3Aissue+is%3Aopen+label%3A%22to+be+reviewed%22).
Ideally, for each of these issues, someone familiar with OpenRefine (not necessarily a developer!) should read the issue and try to determine if there is a genuine bug to fix, or if the enhancement request is legitimate. In those cases, we can remove the "to be reviewed" tag and leave the issue open. In the others, the issue should be politely closed.
### Bugs
### Bugs {#bugs}
For a bug, we should first check if it is a real unexpected behaviour or if it just comes from a misunderstanding of the intended behaviour of the tool (which could suggest an improvement to the documentation). Then, if it sounds like a genuine problem, we need to check if it can be reproduced independently on the master branch. If the issue does not give enough details about the bug to reproduce it on master, mark it as "not reproducible" and ask the reporter for more information. After some time without any information from the reporter, we can close the issue.
### Enhancement requests
### Enhancement requests {#enhancement-requests}
For an enhancement, we need to make a judgment call of whether the proposed functionality is in the scope of the project. There is no universal rule for this of course, so just use your own intuition: do you think this would improve the tool? Would it be consistent with the spirit of the project? Trust your own opinion - if people disagree, they can have a discussion on the issue.
### Tagging good first issues
### Tagging good first issues {#tagging-good-first-issues}
Adding [the "good first issue" tag](https://github.com/OpenRefine/OpenRefine/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) is something that requires a bit more familiarity with the development process. This tag is used by GitHub to showcase issues in some project lists and we point interested potential contributors to it. It is therefore important that tackling these issues gives them a nice onboarding experience, with as few hurdles as possible.
Developers should add the "good first issue" tag when they are confident that they can provide a good pull request for the issue with at most a few hours of work. Also, solving the issue should not require any difficult design decision. The issue should be uncontentious: it should be clear that the proposed solution should be accepted by the team.
## Reviewing pull requests
## Reviewing pull requests {#reviewing-pull-requests}
### Process
### Process {#process}
1. A committer reviews the PR to check for the requirements below and tests it. Each PR should be linked to one or more corresponding issues and the reviewer should check that those are correctly addressed by the PR. The reviewer should be someone other than the PR author. For PRs with an important impact or contentious issues, it is important to leave enough time for other contributors to give their opinion.
@ -38,14 +38,14 @@ Develepers should add the "good first issue" tag when they are confident that th
4. If the change is worth noting for users or developers, the reviewer adds an entry in the changelog for the next release (such as [Changes for 3.5](https://github.com/OpenRefine/OpenRefine/wiki/Changes-for-3.5))
### Requirements
### Requirements {#requirements}
#### Code style
#### Code style {#code-style}
Currently, only our code style for integration tests (using Cypress) is codified and enforced by the CI.
For the rest, we rely on imitating the surrounding code. [We should decide on a code style and check it in the CI for other areas of the tool](https://github.com/OpenRefine/OpenRefine/issues/2338).
#### Testing
#### Testing {#testing}
We currently have two sorts of tests:
* Backend tests, in Java, written with the TestNG framework. Their granularity varies, but generally speaking they are unit tests which test components in isolation.
@ -56,32 +56,32 @@ Functional changes to the UI should ideally come with corresponding Cypress test
Those tests should be supplied in the same PR as the one that touches the product code.
#### Documentation
#### Documentation {#documentation}
Changes to user-facing functionality should be reflected in the docs. Those documentation changes should happen in the same PR as the one that touches the product code.
#### UI style
#### UI style {#ui-style}
We do not have formally defined UI style guidelines. Contributors are invited to imitate the existing style.
#### Licensing and dependencies
#### Licensing and dependencies {#licensing-and-dependencies}
Dependencies can only be added if they are released under a license that is compatible with our BSD 3-Clause license.
One should pay attention to the size of the dependencies since they inflate the size of the release bundles.
#### Continuous integration
#### Continuous integration {#continuous-integration}
The various check statuses reported by our continuous integration suite should be green.
### Special pull requests
### Special pull requests {#special-pull-requests}
#### Weblate PRs
#### Weblate PRs {#weblate-prs}
Weblate PRs should not be squashed as it prevents Weblate from recognizing that the corresponding changes have been made in master. They should be merged without squashing.
Reviewing Weblate PRs only amounts to a quick visual sanity check as maintainers are not expected to master the languages involved. If corrections need to be made, they should be done in Weblate itself.
#### Dependabot PRs
#### Dependabot PRs {#dependabot-prs}
When reviewing a Dependabot PR it is generally useful to pay attention to:
* the type of version change: most libraries follow the "semver" versioning convention, which indicates the nature of the change.


@ -4,19 +4,19 @@ title: Migrating older Extensions
sidebar_label: Migrating older Extensions
---
## Migrating from Ant to Maven
## Migrating from Ant to Maven {#migrating-from-ant-to-maven}
### Why are we doing this change?
### Why are we doing this change? {#why-are-we-doing-this-change}
Ant is a fairly old (antique?) build system that does not incorporate any dependency management.
By migrating to Maven we are making it easier for developers to extend OpenRefine with new libraries, and stop having to ship dozens of .jar files in the repository. Using the Maven repository also encourages developers to add dependencies to released versions of libraries instead of custom snapshots that are hard to update.
### When was this change made?
### When was this change made? {#when-was-this-change-made}
The migration was done between 3.0 and 3.1-beta with this commit:
https://github.com/OpenRefine/OpenRefine/commit/47323a9e750a3bc9d43af606006b5eb20ca397b8
### How to migrate an extension
### How to migrate an extension {#how-to-migrate-an-extension}
You will need to write a `pom.xml` in the root folder of your extension to configure the compilation process with Maven. Sample `pom.xml` files for extensions can be found in the extensions that are shipped with OpenRefine (`gdata`, `database`, `jython`, `pc-axis` and `wikidata`). A sample extension (`sample`) is also provided, with a minimal build file.
@ -56,17 +56,17 @@ And add the dependency to the `<dependencies>` section as usual:
<version>0.5.3-SNAPSHOT</version>
</dependency>
## Migrating to Wikimedia's i18n jQuery plugin
## Migrating to Wikimedia's i18n jQuery plugin {#migrating-to-wikimedias-i18n-jquery-plugin}
### Why are we doing this change?
### Why are we doing this change? {#why-are-we-doing-this-change-1}
This adds various important localization features, such as the ability to handle plurals or interpolation. This also restores the language fallback (displaying strings in English if they are not available in the target language) which did not work with the previous set up.
### When was the migration made?
### When was the migration made? {#when-was-the-migration-made}
The migration was made between 3.1-beta and 3.1, with this commit: https://github.com/OpenRefine/OpenRefine/commit/22322bd0272e99869ab8381b1f28696cc7a26721
### How to migrate an extension
### How to migrate an extension {#how-to-migrate-an-extension-1}
You will need to update your translation files, merging nested objects into one global object and concatenating keys. You can do this by running the following Python script on all your JSON translation files:
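The flattening step described above can be sketched as follows. This is a minimal illustration, not the script shipped with the original documentation: the function name `flatten_translations` is hypothetical, and the `/` separator is an assumption inferred from keys such as `core-index/help` used elsewhere in the translation files.

```python
import json


def flatten_translations(nested, prefix=""):
    """Merge nested translation objects into one flat dict.

    Keys are concatenated with "/" (an assumption based on keys such as
    "core-index/help"); adjust the separator if your files differ.
    """
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the key prefix along.
            flat.update(flatten_translations(value, full_key))
        else:
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    # Hypothetical old-style (nested) translation content.
    old_format = {"core-index": {"help": "Aide"}}
    print(json.dumps(flatten_translations(old_format), ensure_ascii=False))
```

Run it on a copy of each translation file first and compare the result with the keys your JavaScript code requests.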
@ -97,27 +97,27 @@ Then your javascript files which retrieve the translated strings should be updat
You can then chase down the places where you are concatenating translated strings, and replace that with more flexible patterns using [the plugin's features](https://github.com/wikimedia/jquery.i18n#jqueryi18n-plugin).
## Migrating from org.json to Jackson
## Migrating from org.json to Jackson {#migrating-from-orgjson-to-jackson}
### Why are we doing this change?
### Why are we doing this change? {#why-are-we-doing-this-change-2}
The org.json (or json-java) library has multiple drawbacks.
* First, it has limited functionality - all the serialization and deserialization has to be done explicitly - an important proportion of OpenRefine's code was dedicated to implementing these;
* Second, its implementation is not optimized for speed - multiple projects have reported speedups when migrating to more modern JSON libraries;
* Third, and this was the decisive factor to initiate the migration: [its license](https://json.org/license) is the MIT license with an additional condition which makes it non-free. Getting rid of this dependency was required by the Software Freedom Conservancy as a prerequisite to become a fiscal sponsor for the project.
### When was the migration made?
### When was the migration made? {#when-was-the-migration-made-1}
This change was made between 3.1 and 3.2-beta, with this commit: https://github.com/OpenRefine/OpenRefine/commit/5639f1b2f17303b03026629d763dcb6fef98550b
### How to migrate an extension or fork
### How to migrate an extension or fork {#how-to-migrate-an-extension-or-fork}
You will need to use the Jackson library to serialize the classes that implement interfaces or extend classes exposed by OpenRefine.
The interface `Jsonizable` was deleted. Any class that used to implement this now needs to be serializable by Jackson, producing the same format as the previous serialization code. This applies to any operation, facet, overlay model or GREL function. If you are new to Jackson, have a look at [this tutorial](https://www.baeldung.com/jackson) to learn how to annotate your class for serialization. Once this is done, you can remove the `void write(JSONWriter writer, Properties options)` method from your class. Note that it is important that you do this migration for all classes implementing the `Jsonizable` interface that are exposed to OpenRefine's core.
We encourage you to migrate out of org.json completely, but this is only required for the classes that interact with OpenRefine's core.
#### General notes about migrating
#### General notes about migrating {#general-notes-about-migrating}
OpenRefine's ObjectMapper is available at `ParsingUtilities.mapper`. It is configured to only serialize the fields and getters that have been explicitly marked with `@JsonProperty` (to avoid accidental JSON format changes due to refactoring). On deserialization it will ignore any field in the JSON payload that does not correspond to a field in the Java class. It has serializers and deserializers for `OffsetDateTime` and `LocalDateTime`.
@ -130,13 +130,13 @@ Useful snippets to use in tests:
Before undertaking the migration, we recommend that you write some tests which serialize and deserialize your objects. This will help you make sure that the JSON format is preserved during the migration. One way to do this is to collect some sample JSON representations of your objects, and check in your tests that deserializing these JSON payloads and serializing them back to JSON preserves the JSON payload. Some utilities are available to help you with that in [`TestUtils`](https://github.com/OpenRefine/OpenRefine/blob/master/main/tests/server/src/com/google/refine/tests/util/TestUtils.java) (we had [some to test org.json serialization](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/tests/server/src/com/google/refine/tests/util/TestUtils.java) before we got rid of the dependency, feel free to copy them).
#### For functions
#### For functions {#for-functions}
Before the migration, you had to explicitly define JSON serialization of functions with a `write` method. You should now override the getters returning the various documentation fields.
Example: `Cos` function [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/src/com/google/refine/expr/functions/math/Cos.java) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/expr/functions/math/Cos.java).
#### For operations
#### For operations {#for-operations}
Before the JSON migration we refactored engine-dependent operations so that the engine configuration is represented by an `EngineConfig` object instead of a `JSONObject`. Therefore the constructor for your operation should be updated to use this new class. Your constructor should also be annotated to be used during deserialization.
@ -144,25 +144,25 @@ Note that you do not need to explicitly serialize the operation type, this is al
Example: `ColumnRemovalOperation` [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/src/com/google/refine/operations/column/ColumnRemovalOperation.java) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/operations/column/ColumnRemovalOperation.java).
#### For changes
#### For changes {#for-changes}
Changes are serialized in plain text but often rely on JSON serialization for parts of the data. Just use the methods above with `ParsingUtilities.mapper` to maintain this behaviour.
Example: `ReconChange` [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/src/com/google/refine/model/changes/ReconChange.java) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/operations/column/ColumnRemovalOperation.java).
#### For importers
#### For importers {#for-importers}
The importing options have been migrated from `JSONObject` to `ObjectNode`. Your compiler should help you propagate this change. Utility functions in `JSONUtilities` have been migrated to Jackson so you should have minimal changes if you used them.
Example: `TabularImportingParserBase` [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/src/com/google/refine/importers/TabularImportingParserBase.java) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/importers/TabularImportingParserBase.java).
#### For overlay models
#### For overlay models {#for-overlay-models}
Migrate serialization and deserialization as for other objects.
Example: `WikibaseSchema` [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/extensions/wikidata/src/org/openrefine/wikidata/schema/WikibaseSchema.java#L203) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/extensions/wikidata/src/org/openrefine/wikidata/schema/WikibaseSchema.java#L60)
#### For preference values
#### For preference values {#for-preference-values}
Any class that is stored in OpenRefine's preferences now needs to implement the `com.google.refine.preferences.PreferenceValue` interface. The static `load` method and the `write` method used previously for deserialization should be deleted and regular Jackson serialization and deserialization should be implemented instead. Note that you do not need to explicitly serialize the class name, this is already done for you by the interface.


@ -10,7 +10,7 @@ This is a generic API reference for interacting with OpenRefine's HTTP API.
For OpenRefine 3.3 and later, all POST requests need to include a CSRF token as described here: https://github.com/OpenRefine/OpenRefine/wiki/Changes-for-3.3#csrf-protection-changes
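As a rough sketch of what this means for a client, the snippet below fetches a token and appends it to a command URL. It assumes a local instance at `http://127.0.0.1:3333`, and it assumes the `get-csrf-token` command and the `csrf_token` parameter name as described in the changelog linked above; the helper names are hypothetical, so check them against your OpenRefine version.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: OpenRefine is running locally on its default port.
BASE_URL = "http://127.0.0.1:3333"


def fetch_csrf_token(base_url=BASE_URL):
    """Ask the server for a fresh CSRF token (requires a running instance)."""
    with urlopen(f"{base_url}/command/core/get-csrf-token") as response:
        return json.load(response)["token"]


def with_csrf_token(command_url, token):
    """Return command_url with the token appended as a query parameter."""
    separator = "&" if "?" in command_url else "?"
    return f"{command_url}{separator}{urlencode({'csrf_token': token})}"
```

A typical use would be `with_csrf_token(BASE_URL + "/command/core/delete-project", fetch_csrf_token())`, with the resulting URL then used for the POST request.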
## Create project:
## Create project: {#create-project}
> **Command:** _POST /command/core/create-project-from-upload_
@ -41,7 +41,7 @@ If the project creation is successful, you will be redirected to a URL of the fo
From the project parameter you can extract the project id for use in future API calls. The content of the response is the HTML for the OpenRefine interface for viewing the project.
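Extracting the id from the redirect URL can be sketched like this. The example URL is an assumption about what the redirect looks like (a `project` query parameter on the `/project` page), and `extract_project_id` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse


def extract_project_id(redirect_url):
    """Return the value of the `project` query parameter as a string."""
    query = parse_qs(urlparse(redirect_url).query)
    values = query.get("project")
    if not values:
        raise ValueError(f"no project parameter in {redirect_url!r}")
    return values[0]


if __name__ == "__main__":
    # Hypothetical redirect target returned after project creation.
    print(extract_project_id("http://127.0.0.1:3333/project?project=1234567890"))
```

The id is kept as a string, since it is only ever passed back to the server as a request parameter.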
### Get project models:
### Get project models: {#get-project-models}
> **Command:** _GET /command/core/get-models?_
@ -49,7 +49,7 @@ From the project parameter you can extract the project id for use in future API
Recovers the models for the specific project. This includes columns, records, overlay models, scripting. In the columnModel a list of the columns is displayed, key index and name, and column groupings.
### Response:
### Response: {#response}
**On success:**
```JSON
{
@ -112,7 +112,7 @@ Recovers the models for the specific project. This includes columns, records, o
## Apply operations
## Apply operations {#apply-operations}
> **Command:** _POST /command/core/apply-operations?_
@ -159,7 +159,7 @@ Example of a Valid JSON **Array**
On success returns JSON response
`{ "code" : "ok" }`
## Export rows
## Export rows {#export-rows}
> **Command:** _POST /command/core/export-rows_
@ -181,7 +181,7 @@ Returns exported row data in the specified format. The formats supported will de
* ods
* html
## Delete project
## Delete project {#delete-project}
> **Command:** _POST /command/core/delete-project_
@ -189,7 +189,7 @@ Returns exported row data in the specified format. The formats supported will de
Returns JSON response
## Check status of async processes
## Check status of async processes {#check-status-of-async-processes}
> **Command:** _GET /command/core/get-processes_
@ -197,13 +197,13 @@ Returns JSON response
Returns JSON response
## Get all projects metadata:
## Get all projects metadata: {#get-all-projects-metadata}
> **Command:** _GET /command/core/get-all-project-metadata_
Recovers the meta data for all projects. This includes the project's id, name, time of creation and last time of modification.
### Response:
### Response: {#response-1}
```json
{
"projects":{
@ -217,12 +217,12 @@ Recovers the meta data for all projects. This includes the project's id, name, t
}
```
## Expression Preview
## Expression Preview {#expression-preview}
> **Command:** _POST /command/core/preview-expression_
Pass some expression (GREL or otherwise) to the server where it will be executed on selected columns and the result returned.
### Parameters:
### Parameters: {#parameters}
* **cellIndex**: _[column]_
The cell/column you wish to execute the expression on.
* **rowIndices**: _[rows]_
@ -236,7 +236,7 @@ A boolean value (true/false) indicating whether or not this command should be re
* **repeatCount**: _[repeatCount]_
The maximum amount of times a command will be repeated.
### Response:
### Response: {#response-2}
**On success:**
```json
{
@ -256,30 +256,30 @@ The result array will hold up to ten results, depending on how many rows there a
}
```
## Third-party software libraries
## Third-party software libraries {#third-party-software-libraries}
Libraries using the [OpenRefine API](openrefine-api):
### Python
### Python {#python}
* [refine-client-py](https://github.com/PaulMakepeace/refine-client-py/)
* Or this fork of the above with an extended CLI [openrefine-client](https://github.com/felixlohmeier/openrefine-client)
* [refine-python](https://github.com/maxogden/refine-python)
### Ruby
### Ruby {#ruby}
* [refine-ruby](https://github.com/distillytics/refine-ruby)
* The above is a maintained fork of [google-refine](https://github.com/maxogden/refine-ruby)
* [google_refine](https://github.com/chengguangnan/google_refine)
### NodeJS
### NodeJS {#nodejs}
* [node-openrefine](https://github.com/pm5/node-openrefine)
### R
### R {#r}
* [rrefine](https://cran.r-project.org/web/packages/rrefine/index.html)
### PHP
### PHP {#php}
* [openrefine-php-client](https://github.com/keboola/openrefine-php-client)
### Java
### Java {#java}
* [refine-java](https://github.com/dtap-gmbh/refine-java)
### Bash
### Bash {#bash}
* [bash-refine.sh](https://gist.github.com/felixlohmeier/d76bd27fbc4b8ab6d683822cdf61f81d) (templates for shell scripts)


@ -12,11 +12,11 @@ You can help translate OpenRefine into your language by visiting [Weblate](https
Click to help translate --> [Weblate](https://hosted.weblate.org/engage/openrefine/?utm_source=widget)
## User entry of language data ##
## User entry of language data ## {#user-entry-of-language-data-}
Localized strings are entered in a .json file, one per language. They are located in the folder `main/webapp/modules/core/langs/` in a file named `translation-xx.json`, where xx is the language code (e.g. `fr` for French).
### Simple case of localized string ###
### Simple case of localized string ### {#simple-case-of-localized-string-}
This is an example of a simple string, with the start of the JSON file. This example is for French.
```
{
@ -28,22 +28,22 @@ This is an example of a simple string, with the start of the JSON file. This exa
So the key `core-index/help` will render as `"Aide"` in French.
### Localization with a parameterized value ###
### Localization with a parameterized value ### {#localization-with-a-parameterized-value-}
In this example, `$1` is a placeholder that will be replaced with the name of the column.
`"core-facets/edit-facet-title": "Cliquez ici pour éditer le nom de la facette\nColonne : $1",`
### Localization with a singular/plural value ###
### Localization with a singular/plural value ### {#localization-with-a-singularplural-value-}
In this example, one of the parameters will use a different string depending on whether its value is 1 or another value.
Here, the word for "page" takes an « s » or not depending on the value of the second parameter, `$2`.
`"core-views/goto-page": "$1 de $2 {{plural:$2|page|pages}}"`
## Front End Coding
## Front End Coding {#front-end-coding}
The OpenRefine front end has been localized using the [Wikimedia jquery.i18n library](https://github.com/OpenRefine/OpenRefine/pull/1285). The localized text is stored in a JSON dictionary on the server and retrieved with a new OpenRefine command.
### Adding a new string
### Adding a new string {#adding-a-new-string}
There should be no hard-coded language strings in the HTML or JSON used for the front end. If you need a new string, first check the existing strings to make sure there isn't an equivalent string, **in an equivalent context**, that you can reuse. Context is important because it can affect how the same literal English text is translated. This cuts down on the amount of text which needs to be translated.
@ -70,7 +70,7 @@ or, if you need to embed HTML tags:
$('#new-html-element-id').html($.i18n('section/newkey'));
```
### Adding a new language
### Adding a new language {#adding-a-new-language}
The language dictionaries are stored in the `langs` subdirectory for the module e.g.
@ -81,14 +81,14 @@ The language dictionaries are stored in the `langs` subdirectory for the module
To add support for a new language, copy `translation-en.json` to `translation-<locale>.json` and have your translator translate all the value strings (i.e. the right-hand side).
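To keep track of progress on a copied file, a small helper along these lines can list the keys that still need attention. It is only a sketch: it assumes the translation files are flat key/value JSON objects, and `untranslated_keys` is a hypothetical name, not part of OpenRefine.

```python
import json


def untranslated_keys(english, localized):
    """Return keys missing from `localized` or still identical to English."""
    return sorted(
        key
        for key, text in english.items()
        if key not in localized or localized[key] == text
    )


def load_translations(path):
    """Read one translation file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Some strings are legitimately the same in both languages (product names, for example), so treat the result as a checklist rather than an error report.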
#### Main interface
#### Main interface {#main-interface}
The translation is best done [with Weblate](https://hosted.weblate.org/engage/openrefine/?utm_source=widget). Files are periodically merged by the developer team.
Run the latest (hopefully cloned from GitHub) version and check whether the translated words fit the layout. Not all items can be translated word by word, especially into non-Indo-European languages.
If you see any text which remains in English even when you have checked all items, please create a bug report in the issue tracker so that the developers can fix it.
#### Extensions
#### Extensions {#extensions}
Extensions can be translated via Weblate just like the core software.
@ -100,6 +100,6 @@ To support a new language file, the developer should add a corresponding entry t
<option value="<locale>">[Language Label]</option>
```
## Server / Backend Coding
## Server / Backend Coding {#server--backend-coding}
Currently no back end functions are translated, so things like error messages, undo history, etc may appear in English form. Rather than sending raw error text to the front end, it's better to send an error code which is translated into text on the front end. This allows for multiple languages to be supported.


@ -4,7 +4,7 @@ title: Writing Extensions
sidebar_label: Writing Extensions
---
## Introduction
## Introduction {#introduction}
This is a very brief overview of the structure of OpenRefine extensions. For more detailed documentation and step-by-step guides please see the following external documentation/tutorials:
@ -20,7 +20,7 @@ Extensions that come with the code base are located under [the extensions subdir
Please note that you should bundle any dependencies yourself, so you are insulated from OpenRefine packaging changes over time.
### Directory Layout
### Directory Layout {#directory-layout}
An OpenRefine extension sits in a file directory that contains the following files and sub-directories:
@ -62,11 +62,11 @@ The `pom.xml` file is an [Apache Maven](http://maven.apache.org/) build file. Yo
Note that your extension's Java code would need to reference some libraries used in OpenRefine and OpenRefine's Java classes themselves. These dependencies are reflected in the Maven configuration for the extension.
## Sample extension
## Sample extension {#sample-extension}
The sample extension is included in the code base so that you can copy it and get started on writing your own extension. After you copy it, make sure you change its name inside its `module/MOD-INF/controller.js` file.
### Basic Structure
### Basic Structure {#basic-structure}
The sample extension's code is in `refine/extensions/sample/`. In that directory, Java source code is contained under the `src` sub-directory, and webapp code is under the `module` sub-directory. Here is the full directory layout:
@ -99,15 +99,15 @@ Client-side code is in the inner `module` sub-directory. They can be plain old .
The `init()` function in `controller.js` allows the extension to register various client-side handlers for augmenting pages served by Refine's core. These handlers are feature-specific. For example, [this is where the jython extension adds its parser](https://github.com/OpenRefine/OpenRefine/blob/master/extensions/jython/module/MOD-INF/controller.js#L46). As for the sample extension, it adds its script `project-injection.js` and style `project-injection.less` into the `/project` page. If you [view the source of the /project page](http://127.0.0.1:3333/project), you will see references to those two files.
### Wiring Up the Extension
### Wiring Up the Extension {#wiring-up-the-extension}
Extensions are loaded by the Butterfly framework, which refers to them as 'modules'. [The location of modules is set in the `main/webapp/WEB-INF/butterfly.properties` file](https://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/WEB-INF/butterfly.properties#L27). Butterfly simply descends into each of those paths and looks for any `MOD-INF` directories.
For more information, see [Extension Points](https://github.com/OpenRefine/OpenRefine/wiki/Extension-Points).
## Extension points
## Extension points {#extension-points}
### Client-side: Javascript and CSS
### Client-side: Javascript and CSS {#client-side-javascript-and-css}
The UI in OpenRefine for working with a project is coded in [the /main/webapp/modules/core/project.vt file](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/project.vt). The file is quite small, and that's because almost all of its content is to be expanded dynamically through the Velocity variables $scriptInjection and $styleInjection. So that your own Javascript and CSS files get loaded, you need to register them with the ClientSideResourceManager, which is done in the /module/MOD-INF/controller.js file. See [the controller.js file in this sample extension code](http://github.com/OpenRefine/OpenRefine/blob/master/extensions/sample/module/MOD-INF/controller.js) for an example.
@ -128,7 +128,7 @@ You can specify one or more files for registration, and their paths are relative
Javascript Bundling: Note that `project.vt` belongs to the core module and is thus under the control of the core module's [controller.js file](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/MOD-INF/controller.js). The Javascript files to be included in `project.vt` are by default bundled together for performance. When debugging, you can prevent this bundling behavior by setting `bundle` to `false` near the top of that `controller.js` file. (If you have commit access to this code base, be sure not to check that change in.)
### Client-side: Images
### Client-side: Images {#client-side-images}
We recommend that you always refer to images through your CSS files rather than in your Javascript code. URLs to images will thus be relative to your CSS files, e.g.,
@ -144,7 +144,7 @@ If you really really absolutely need to refer to your images in your Javascript
ModuleWirings["my-extension"] + "images/x.png"
```
### Client-side: HTML Templates
### Client-side: HTML Templates {#client-side-html-templates}
Beside Javascript, CSS, and images, your extension might also include HTML templates that get loaded on the fly by your Javascript code and injected into the page's DOM. For example, here is [the Cluster edit dialog template](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/dialogs/clustering-dialog.html), which gets loaded by code in [the equivalent javascript file 'clustering-dialog.js'](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/dialogs/clustering-dialog.js):
@ -154,11 +154,11 @@ var dialog = $(DOM.loadHTML("core", "scripts/dialogs/clustering-dialog.html"));
`DOM.loadHTML` returns the content of the file as a string, and `$(...)` turns it into a DOM fragment. Where `"core"` is, you would want your extension's name. The path of the HTML file is relative to your extension's `module` subdirectory.
### Client-side: Project UI Extension Points
### Client-side: Project UI Extension Points {#client-side-project-ui-extension-points}
Getting your extension's Javascript code included in `project.vt` doesn't accomplish much by itself unless your code also registers hooks into the UI. For example, you can surely implement an exporter in Javascript, but unless you add a corresponding menu command in the UI, your user can't use your exporter.
#### Main Menu
#### Main Menu {#main-menu}
The main menu can be extended by calling any one of the methods `MenuBar.appendTo`, `MenuBar.insertBefore`, and `MenuBar.insertAfter`. Each method takes 2 arguments: an array of strings that identify a particular existing menu item or submenu, and one new single menu item or submenu or an array of menu items and submenus. For example, to insert 2 menu items and a menu separator before the menu item Project > Export Filtered Rows > Templating..., write this Javascript code wherever that would execute when your Javascript files get loaded:
@ -183,7 +183,7 @@ The array `["core/project", "core/export", "core/export-templating"]` pinpoints
See the beginning of [/main/webapp/modules/core/scripts/project/menu-bar.js](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/project/menu-bar.js) for IDs of menu items and submenus.
#### Column Header Menu
#### Column Header Menu {#column-header-menu}
The drop-down menu of each column can also be extended, but the mechanism is slightly different compared to the main menu. Because the drop-down menu for a particular column is constructed on the fly when the user actually clicks the drop-down menu button, extending the column header menu can't really be done once at start-up time, but must be done every time a column header menu gets created. So, registration in this case involves providing a function that gets called each such time:
@ -207,7 +207,7 @@ MenuSystem.appendTo(menu, ["core/facet"], [
In addition to `MenuSystem.appendTo`, you can also call `MenuSystem.insertBefore` and `MenuSystem.insertAfter`, which take the same 3 arguments. To see what IDs you can use, see the function `DataTableColumnHeaderUI.prototype._createMenuForColumnHeader` in [/main/webapp/modules/core/scripts/views/data-table/column-header-ui.js](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/views/data-table/column-header-ui.js).
### Server-side: Ajax Commands
### Server-side: Ajax Commands {#server-side-ajax-commands}
The client-side of OpenRefine gets things done by calling AJAX commands on the server-side. These commands must be registered with the OpenRefine servlet, so that the servlet knows how to route AJAX calls from the client-side. This can be done inside the `init` function in your extension's `controller.js` file, e.g.,
@ -220,7 +220,7 @@ function init() {
Your command will then be accessible at [http://127.0.0.1:3333/command/my-extension/my-command](http://127.0.0.1:3333/command/my-extension/my-command).
### Server-side: Operations
### Server-side: Operations {#server-side-operations}
Most commands change the project's data. Most of them do so by creating abstract operations. See the Changes, History, Processes, and Operations section of the [Server Side Architecture](https://github.com/OpenRefine/OpenRefine/wiki/Server-Side-Architecture) document.
@ -242,7 +242,7 @@ static public AbstractOperation reconstruct(Project project, JSONObject obj) thr
}
```
### Server-side: GREL
### Server-side: GREL {#server-side-grel}
GREL can be extended with new functions. This is also done in the `init` function in `controller.js`, e.g.,
@ -258,7 +258,7 @@ Packages.com.google.refine.expr.ExpressionUtils.registerBinder(
new Packages.com.foo.bar.MyBinder());
```
### Server-side: Importers
### Server-side: Importers {#server-side-importers}
You can register an importer as follows:
@ -269,7 +269,7 @@ Packages.com.google.refine.importers.ImporterRegistry.registerImporter(
The string `"importer-name"` isn't important at all. It's not really related to file extension or mime-type. Just use something unique. Your importer will be explicitly called to test if it can import something.
### Server-side: Exporters
### Server-side: Exporters {#server-side-exporters}
You can register an exporter as follows:
@ -280,7 +280,7 @@ Packages.com.google.refine.exporters.ExporterRegistry.registerExporter(
The string `"exporter-name"` isn't important at all. It's only used by the client-side to tell the server-side which exporter to use. Just use something unique and, of course, relevant.
### Server-side: Overlay Models
### Server-side: Overlay Models {#server-side-overlay-models}
Overlay models are objects attached onto a core Project object to store and manage additional data for that project. For example, the schema alignment skeleton is managed by the Protograph overlay model. An overlay model implements the interface `com.google.refine.model.OverlayModel` and can be registered like so:
@ -306,7 +306,7 @@ public void write(JSONWriter writer, Properties options) throws JSONException {
}
```
### Server-side: Scripting Languages
### Server-side: Scripting Languages {#server-side-scripting-languages}
A scripting language (such as Jython) can be registered as follows:


@ -3,7 +3,7 @@ id: cellediting
title: Cell editing
sidebar_label: Cell editing
---
## Overview
## Overview {#overview}
OpenRefine offers a number of features to edit and improve the contents of cells automatically and efficiently.
@ -11,7 +11,7 @@ One way of doing this is editing through a [text facet](facets#text-facet). Once
You can apply a text facet on numbers, boolean values, and dates, but if you edit a value it will be converted into the text [data type](exploring#data-types) (regardless of whether you edit a date into another correctly-formatted date, or a “true” value into “false”, etc.).
## Transform
## Transform {#transform}
Select <span class="menuItems">Edit cells</span> → <span class="menuItems">Transform...</span> to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. When used on a column operation, `value` is the information in each cell in the selected column.
@ -21,29 +21,29 @@ You can also switch to the <span class="tabLabels">Undo / Redo</span> tab inside
OpenRefine offers you some frequently-used transformations in the next menu option, <span class="menuItems">Common transforms</span>. For more custom transforms, read up on [expressions](expressions).
## Common transforms
## Common transforms {#common-transforms}
### Trim leading and trailing whitespace
### Trim leading and trailing whitespace {#trim-leading-and-trailing-whitespace}
Often cell contents that should be identical, and look identical, are different because of space or line-break characters that are invisible to users. This function will get rid of any characters that sit before or after visible text characters.
### Collapse consecutive whitespace
### Collapse consecutive whitespace {#collapse-consecutive-whitespace}
You may find that some text cells contain what look like spaces but are actually tabs, or contain multiple spaces in a row. This function will remove all space characters that sit in sequence and replace them with a single space.
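To make the difference between trimming and collapsing concrete, here is a minimal Python sketch that you can run outside OpenRefine. It is an illustration only, not OpenRefine's own implementation; the function names `trim` and `collapse_whitespace` are made up for this example, and it assumes that tabs and line breaks count as whitespace.

```python
import re


def trim(value: str) -> str:
    """Remove whitespace before and after the visible text."""
    return value.strip()


def collapse_whitespace(value: str) -> str:
    """Replace every run of whitespace (spaces, tabs, line breaks) with one space."""
    return re.sub(r"\s+", " ", value)


print(repr(trim("  Paris \n")))
print(repr(collapse_whitespace("New\t\tYork   City")))
```

Note that collapsing does not trim: a cell that starts with several spaces will still start with one space afterwards, so the two transforms are often applied together.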
### Unescape HTML
### Unescape HTML {#unescape-html}
Your data may come from an HTML-formatted source that expresses some characters through references (such as “&amp;nbsp;” for a space, or “%u0107” for a ć) instead of the actual Unicode characters. You can use the “unescape HTML entities” transform to look for these codes and replace them with the characters they represent. For other formatting that needs to be escaped, try a custom transformation with [`escape()`](grelfunctions#escapes-s-mode).
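If you want to see what unescaping does to a value before you run it on a column, the following Python sketch uses the standard library's `html.unescape`. It only illustrates the idea of replacing HTML character references; it is not the code OpenRefine runs, and the wrapper name `unescape_html` is an assumption of this example.

```python
import html


def unescape_html(value: str) -> str:
    """Replace HTML character references with the characters they represent."""
    return html.unescape(value)


# Named and numeric references are both resolved.
print(unescape_html("Caf&eacute; &amp; bar"))
print(unescape_html("Mili&#263;"))
```

A non-breaking space reference (`&nbsp;`) becomes the Unicode character U+00A0 in this sketch, which still looks like a space but is not identical to one.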
### Replace smart quotes with ASCII
### Replace smart quotes with ASCII {#replace-smart-quotes-with-ascii}
Smart quotes (or curly quotes) recognize whether they come at the beginning or end of a string, and will generate an “open” quote (“) and a “close” quote (”). These characters are not ASCII-compliant (though they are UTF-8-compliant) so you can use this transform to replace them with a straight double quote character (") instead.
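As a rough sketch of the replacement, the Python below maps curly quotes to their straight equivalents. The mapping table and the name `straighten_quotes` are assumptions made for illustration; handling curly single quotes as well as double quotes is a choice of this example, not something stated above.

```python
# Curly quote characters mapped to their ASCII equivalents (illustrative table).
SMART_TO_ASCII = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark (assumption: also replaced)
    "\u2019": "'",  # right single quotation mark (assumption: also replaced)
})


def straighten_quotes(value: str) -> str:
    """Return the value with curly quotes replaced by straight quotes."""
    return value.translate(SMART_TO_ASCII)


print(straighten_quotes("\u201copen\u201d and \u2018close\u2019"))
```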
### Case transforms
### Case transforms {#case-transforms}
You can transform an entire column of text into UPPERCASE, lowercase, or Title Case using these three options. This can be useful if you are planning to do textual analysis and want to keep case sensitivity (some functions are case-sensitive) from causing problems in your analysis. Consider also using a [custom facet](facets#custom-text-facet) to temporarily modify cases instead of this permanent operation if appropriate.
### Data-type transforms
### Data-type transforms {#data-type-transforms}
As detailed in [Data types](exploring#data-types), OpenRefine recognizes different data types: string, number, boolean, and date. When you use these transforms, OpenRefine will check to see if the given values can be converted, then both transform the data in the cells (such as “3” as a text string to “3” as a number) and convert the data type on each successfully transformed cell. Cells that cannot be transformed will output the original value and maintain their original data type.
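The "convert if possible, otherwise keep the original" behavior can be pictured with a small Python sketch. This is an assumption-laden illustration rather than OpenRefine's conversion logic: the function name `to_number_or_keep` is hypothetical, and Python's own number parsing decides what counts as convertible here.

```python
def to_number_or_keep(value):
    """Convert a text value to a number; return it unchanged if that fails."""
    if not isinstance(value, str):
        return value  # already a non-text type, leave it alone
    text = value.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return value  # not convertible: original value and type are kept


column = ["3", "3.5", "n/a"]
print([to_number_or_keep(cell) for cell in column])
```

Because failed cells are left as they were, a column can end up holding a mix of numbers and text after this kind of transform.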
@ -55,7 +55,7 @@ Because these common transforms do not offer the ability to output an error inst
You can also convert cells into null values or empty strings. This can be useful if you wish to, for example, erase duplicates that you have identified and are analyzing as a subset.
## Fill down and blank down
## Fill down and blank down {#fill-down-and-blank-down}
Fill down and blank down are two functions most frequently used when encountering data organized into [records](exploring#row-types-rows-vs-records) - that is, multiple rows associated with one specific entity.
@ -65,7 +65,7 @@ Be careful that your data is sorted properly before you begin blanking down - no
If, conversely, you've received data with empty cells because it was already in something akin to records mode, you can fill down information to the rest of the rows. This will duplicate whatever value exists in the topmost cell with a value: if the first row in the record is blank, it will take information from the next cell, or the cell after that, until it finds a value. The blank cells above this will remain blank.
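The fill-down rule described here can be sketched in a few lines of Python. This is only a model of the behavior on a single column, assuming blank cells are represented as `None` or an empty string; `fill_down` is a name chosen for this example.

```python
def fill_down(cells):
    """Copy each value into the blank cells below it.

    Blank cells above the first value have nothing to copy and stay blank (None).
    """
    filled = []
    last_value = None
    for cell in cells:
        if cell is None or cell == "":
            filled.append(last_value)
        else:
            last_value = cell
            filled.append(cell)
    return filled


print(fill_down([None, "Canada", "", "Mexico", None]))
```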
## Split multi-valued cells
## Split multi-valued cells {#split-multi-valued-cells}
Splitting cells with more than one value in them is a common way to get your data from single rows into [multi-row records](exploring#rows-vs-records). Survey data, for example, frequently allows respondents to “Select all that apply,” or an inventory list might have items filed under more than one category.
@ -79,11 +79,11 @@ You can also split based on the lengths of the strings you expect to find. This
If you have data that should be split into multiple columns instead of multiple rows, see [Split into several columns](columnediting#split-into-several-columns).
## Join multi-valued cells
## Join multi-valued cells {#join-multi-valued-cells}
Joining will reverse the “split multi-valued cells” operation, or join up information from multiple rows into one row. All the strings will be compressed into the topmost cell in the record, in the order they appear. A window will appear where you can set the separator; the default is a comma and a space (, ). This separator is optional. We suggest the separator | as a sufficiently rare character.
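As a simple picture of what joining does to one record, the Python sketch below combines the values of a column into a single string. It is illustrative only: `join_record` is a hypothetical name, and skipping blank cells is an assumption of this example.

```python
def join_record(values, separator=", "):
    """Join the non-blank values of one record, in the order they appear."""
    return separator.join(v for v in values if v not in (None, ""))


print(join_record(["red", "green", "blue"]))
print(join_record(["red", "", "blue"], separator="|"))
```

Choosing a separator that does not occur in your data, such as `|`, makes it possible to split the cell again later without breaking values that contain commas.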
## Cluster and edit
## Cluster and edit {#cluster-and-edit}
Creating a facet on a column is a great way to look for inconsistencies in your data; clustering is a great way to fix those inconsistencies. Clustering uses a variety of comparison methods to find text entries that are similar but not exact, then shares those results with you so that you can merge the cells that should match. Where editing a single cell or text facet at a time can be time-consuming and difficult, clustering is quick and streamlined.
@ -101,7 +101,7 @@ For each cluster identified, you can pick one of the existing values to apply to
You can also export the currently identified clusters as a JSON file, or close the window with or without applying your changes. You can also use the histograms on the right to narrow down to, for example, clusters with lots of matching rows, or clusters of long or short values.
### Clustering methods
### Clustering methods {#clustering-methods}
You don't need to understand the details behind each clustering method to apply them successfully to your data. The order in which these methods are presented in the interface and on this page is the order we recommend - starting with the most strict rules and moving to the most lax, which require more human supervision to apply correctly.
@ -118,7 +118,7 @@ The clustering pop-up window offers you a variety of clustering methods:
* levenshtein
* ppm
#### Key collision
#### Key collision {#key-collision}
**Key collisions** are very fast and can process millions of cells in seconds:
@ -128,7 +128,7 @@ The clustering pop-up window offers you a variety of clustering methods:
This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself won't identify because it separates words). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”).
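To see why these values collide, here is a minimal Python sketch of an n-gram fingerprint key. It is a simplified approximation written for this page, not OpenRefine's code: the exact normalization steps (lowercasing, dropping punctuation and whitespace, reducing to ASCII) and the name `ngram_fingerprint` are assumptions.

```python
import re
import unicodedata


def ngram_fingerprint(value: str, n: int = 2) -> str:
    """Build a key from the sorted, de-duplicated n-grams of a value."""
    text = value.lower()
    text = re.sub(r"[\W_]+", "", text)  # drop punctuation and whitespace
    # Reduce accented letters to their ASCII base letters.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    grams = {text[i:i + n] for i in range(len(text) - n + 1)}
    return "".join(sorted(grams))


# Values with the same key would land in the same cluster.
print(ngram_fingerprint("lookout") == ngram_fingerprint("look out"))
print(ngram_fingerprint("Wellington", 1) == ngram_fingerprint("Elgin Town", 1))
```

With `n=1` the key is just the set of letters used, which is why near-anagrams end up with identical keys.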
##### Phonetic clustering
##### Phonetic clustering {#phonetic-clustering}
The next four methods are phonetic algorithms: they identify letters that sound the same when pronounced out loud, and assess text values based on that (such as knowing that a word with an “S” might be a mistype of a word with a “Z”). They are great for spotting mistakes made by not knowing the spelling of a word or name after hearing it spoken aloud.
@ -140,7 +140,7 @@ The next four methods are phonetic algorithms: they identify letters that sound
Regardless of the language of your data, applying each of them might find different potential matches: for example, Metaphone clusters “Cornwall” and “Corn Hill” and “Green Hill,” while Cologne clusters “Greenvale” and “Granville” and “Cornwall” and “Green Wall.”
#### Nearest neighbor
#### Nearest neighbor {#nearest-neighbor}
**Nearest neighbor** clustering methods are slower than key collision methods. They allow the user to set a radius - a threshold for matching or not matching. OpenRefine uses a “blocking” method first, which sorts values based on whether they have a certain amount of similarity (the default is “6” for a six-character string of identical characters) and then runs the nearest-neighbor operations on those sorted groups.
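The role of the radius can be illustrated with a plain Levenshtein distance in Python. This sketch compares every pair of values directly and leaves out the blocking step entirely, so it is much slower than what is described above and is meant only to show the matching rule; `levenshtein` and `neighbors` are names invented for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def neighbors(values, radius=1.0):
    """Return the pairs of values whose distance is within the radius."""
    pairs = []
    for i, left in enumerate(values):
        for right in values[i + 1:]:
            if levenshtein(left, right) <= radius:
                pairs.append((left, right))
    return pairs


print(neighbors(["Granville", "Grenville", "Cornwall"], radius=1))
```

Raising the radius lets more distant values match, which finds more candidate clusters but also more false positives to review.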
@ -152,13 +152,13 @@ We recommend setting the block number to at least 3, and then increasing it if y
For more of the theory behind clustering, see [Clustering In Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth).
## Replace
## Replace {#replace}
OpenRefine provides a find/replace function for you to edit your data. Selecting <span class="menuItems">Edit cells</span> → <span class="menuItems">Replace</span> will bring up a simple window where you can input a string to search and a string to replace it with. You can set case-sensitivity, and set it to only select whole words, defined by a string with spaces or punctuation around it (to prevent, for example, “house” selecting the “house” part of “doghouse”). You can use [regular expressions](expressions#regular-expressions) in this field. You may wish to preview the results of this operation by testing it with a [Text filter](facets#text-filter) first.
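The effect of the case-sensitivity and whole-word options can be sketched with Python's `re` module. This is an approximation for illustration: it treats a "whole word" as text between regex word boundaries (`\b`), which may not match OpenRefine's definition in every case, and `replace_text` and its parameters are hypothetical names.

```python
import re


def replace_text(value, find, replacement, case_sensitive=True, whole_word=False):
    """Replace occurrences of a literal string, optionally as whole words only."""
    pattern = re.escape(find)  # treat the search text literally
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    # A function replacement keeps backslashes in the replacement text literal.
    return re.sub(pattern, lambda _match: replacement, value, flags=flags)


print(replace_text("house and doghouse", "house", "home", whole_word=True))
print(replace_text("house and doghouse", "house", "home"))
```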
You can also perform a sort of find/replace operation by editing one cell, and selecting “apply to all identical cells.”
## Edit one cell at a time
## Edit one cell at a time {#edit-one-cell-at-a-time}
You can edit individual cells by hovering your mouse over that cell. You should see a tiny blue link labeled “edit.” Click it to edit the cell. That pops up a window with a bigger text field for you to edit. You can change the [data type](exploring#data-types) of that cell, and you can apply these changes to all identical cells (in the same column), using this pop-up window.


@ -4,15 +4,15 @@ title: Column editing
sidebar_label: Column editing
---
## Overview
## Overview {#overview}
Column editing contains some of the most powerful data-improvement methods in OpenRefine. The operations in the <span class="menuItems">Edit column</span> menu involve using one column of data to add entirely new columns and fields to your dataset.
## Splitting or joining
## Splitting or joining {#splitting-or-joining}
Many users find that they frequently need to make their data more granular: for example, splitting a “Firstname Lastname” column into two columns, one for first names and one for last names. The reverse is also often true: you may have several columns of category values that you want to join into one “category” column.
### Split into several columns
### Split into several columns {#split-into-several-columns}
![A screenshot of the settings window for splitting columns.](/img/columnsplit.png)
@ -22,7 +22,7 @@ You can also specify a maximum number of new columns to be made: separator chara
New columns will be named after the original column, with a number: “Location 1,” “Location 2,” etc. You can choose to remove the original column with this operation, and you can have [data types](exploring#data-types) identified where possible. This function will work best with converting strings to numbers, and may not work with [dates](exploring#dates).
### Join columns
### Join columns {#join-columns}
![A screenshot of the settings window for joining columns.](/img/columnjoin.png)
@ -30,7 +30,7 @@ You can join columns by selecting <span class="menuItems">Edit column</span> →
The joined data will appear in the column you originally selected, or you can create a new column for this content and specify a name. You can delete all the columns that were used in this join operation.
## Add column based on this column
## Add column based on this column {#add-column-based-on-this-column}
Selecting <span class="menuItems">Edit column</span> → <span class="menuItems">Add column based on this column...</span> will open up an [expressions](expressions) window where you can transform the data from this column (using `value`), or write a more complex expression that takes information from any number of columns or from external sources.
@ -58,7 +58,7 @@ row.record.cells.Column1.value + row.record.cells.Column2.value
You may wish to add separators or spaces, or modify your input during this operation with more advanced expressions.
## Add column by fetching URLs
## Add column by fetching URLs {#add-column-by-fetching-urls}
Through the <span class="menuItems">Add column by fetching URLs</span> function, OpenRefine supports the ability to fetch HTML or data from web pages or services. In this operation you will be building URL strings based on your column of data, by using `value` to insert a relevant substring. Your chosen column needs to contain parts of paths to valid HTML pages or files online.
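To show what "building URL strings" means, the Python sketch below turns each cell value into a full URL by appending it to a fixed prefix. The base address `https://example.org/api/lookup?q=` is a placeholder invented for this example, not a real service, and `build_url` is a hypothetical helper; no request is actually sent.

```python
from urllib.parse import quote


def build_url(value, base="https://example.org/api/lookup?q="):
    """Append a cell value to a base URL, escaping characters that are not URL-safe."""
    return base + quote(str(value), safe="")


for cell in ["Toronto", "New York"]:
    print(build_url(cell))
```

Escaping matters because values with spaces or punctuation would otherwise produce URLs that the remote service may reject.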
@ -85,7 +85,7 @@ Note the following:
* [Accept](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept)
* [Authorization](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization)
### Common errors
### Common errors {#common-errors}
When OpenRefine attempts to fetch information from a web service, it can fail in a variety of ways. The following information is meant to help troubleshoot and fix problems encountered when using this function.
@ -116,7 +116,7 @@ Note that for Mac users and for Windows users with the OpenRefine installation w
* On Mac, it will look something like `/Applications/OpenRefine.app/Contents/PlugIns/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts`.
* On Windows: `\server\target\jre\lib\security\`.
## Renaming, removing, and moving
## Renaming, removing, and moving {#renaming-removing-and-moving}
Every column's <span class="menuItems">Edit column</span> dropdown contains options to move it (to the beginning, end, left, or right), rename it, and delete it.
These operations can be undone, but a removed column cannot be restored later if you keep modifying your data. If you wish to temporarily hide a column, go to <span class="menuItems">[View](sortview#view)</span> → <span class="menuItems">Collapse this column</span> instead.


@ -4,13 +4,13 @@ title: Exploring data
sidebar_label: Overview
---
## Overview
## Overview {#overview}
OpenRefine offers lots of features to help you learn about your dataset, even if you don't change a single character. In this section we cover different ways for sorting through, filtering, and viewing your data.
Unlike spreadsheets, OpenRefine doesn't store formulas and display the output of those calculations; it only shows the value inside each cell. It doesn't support cell colors or text formatting.
## Data types
## Data types {#data-types}
Each piece of information (each cell) in OpenRefine is assigned a data type. Some file formats, when imported, can set data types that are recognized by OpenRefine. Cells without an associated data type on import will be considered a “string” at first, but you can have OpenRefine convert cell contents into other data types later. This is set at the cell level, not at the column level.
@ -41,7 +41,7 @@ Changing a cell's data type is not the same operation as transforming its conten
To transform data from one type to another, see [Transforming data](cellediting#data-type-transforms) for information on using common transforms, and see [Expressions](expressions) for information on using [toString()](grelfunctions#tostringo-string-format-optional), [toDate()](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-), and other functions.
### Dates
### Dates {#dates}
A “date” type is created when a column is [transformed into dates](transforming#to-date), when an expression is used to [convert cells to dates](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-) or when individual cells are set to have the data type “date”.
@ -74,7 +74,7 @@ The following table shows some example [date and time formatting styles for the
|Long |June 30, 2009 7:03:47 AM PDT |30 juin 2009 07:03:47 PDT|
|Full |Tuesday, June 30, 2009 7:03:47 AM PDT |mardi 30 juin 2009 07 h 03 PDT|
## Rows vs. records
## Rows vs. records {#rows-vs-records}
A row is a simple way to organize data: a series of cells, one cell per column. Sometimes there are multiple pieces of information in one cell, such as when a survey respondent can select more than one response.


@ -4,13 +4,13 @@ title: Exporting your work
sidebar_label: Exporting
---
## Overview
## Overview {#overview}
Once your dataset is ready, you will need to get it out of OpenRefine and into the system of your choice. OpenRefine outputs a number of file formats, can upload your data directly into Google Sheets, and can create or update statements on Wikidata.
You can also [export your full project data](#export-a-project) so that it can be opened by someone else using OpenRefine (or yourself, on another computer).
## Export data
## Export data {#export-data}
![A screenshot of the Export dropdown.](/img/export-menu.png)
@ -33,7 +33,7 @@ You can also export reconciled data to Wikidata, or export your Wikidata schema
* [Export to QuickStatements](wikidata#quickstatements-export) (version 1)
* [Export Wikidata schema](wikidata#import-and-export-schema)
### Custom tabular exporter
### Custom tabular exporter {#custom-tabular-exporter}
![A screenshot of the custom tabular content tab.](/img/custom-tabular-exporter.png)
@ -54,7 +54,7 @@ On the <span class="tabLabels">Download</span> tab, you can generate a preview o
With the <span class="tabLabels">Option Code</span> tab, you can copy JSON of your current custom settings to reuse on another export, or you can paste in existing JSON settings to apply to the current project.
### SQL exporter
### SQL exporter {#sql-exporter}
The SQL exporter creates a SQL statement containing the data you've exported, which you can use to overwrite or add to an existing database. Choosing <span class="menuItems">Export</span> → <span class="menuItems">SQL exporter</span> will bring up a window with two tabs: one to define what data to output, and another to modify other aspects of the SQL statement, with options to preview and download the statement.
@ -76,7 +76,7 @@ You can include DROP and IF EXISTS if you require them, and set a name for the t
You can then preview your statement, which will open up a new browser tab/window showing a statement with the first ten rows of your data (if included), or you can save a `.sql` file to your computer.
### Templating exporter
### Templating exporter {#templating-exporter}
If you pick <span class="menuItems">Templating…</span> from the <span class="menuItems">Export</span> dropdown menu, you can “roll your own” exporter. This is useful for formats that we don't support natively yet, or won't support. The Templating exporter generates JSON by default.
@ -113,7 +113,7 @@ Once you have created your template, you may wish to save the text you produced
We have recipes on using the Templating exporter to [produce several different formats](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#12-templating-exporter).
## Export a project
## Export a project {#export-a-project}
You can share a project in progress with another computer, a colleague, or with someone who wants to check your history. This can be useful for showing that your data cleanup didn't distort or manipulate the information in any way. Once you have exported a project, another OpenRefine installation can [import it as a new project](starting#import-a-project).
@ -129,6 +129,6 @@ OpenRefine exports files in `.tar.gz` format. You can rename the file when you s
To save your project archive to Google Drive: from the <span class="menuItems">Export</span> dropdown, select <span class="menuItems">OpenRefine project archive to Google Drive...</span>. OpenRefine will not share the link with you, only confirm that the file was uploaded.
## Export operations
## Export operations {#export-operations}
You can [save and re-apply the history of any project](running#reusing-operations) (all the operations shown in the Undo/Redo tab). This creates JSON that you can save for later reuse on another OpenRefine project.


@ -4,7 +4,7 @@ title: Expressions
sidebar_label: Overview
---
## Overview
## Overview {#overview}
You can use expressions in multiple places in OpenRefine to extend data cleanup and transformation. Expressions are available with the following functions:
* <span class="menuItems">Facet</span>:
@ -30,7 +30,7 @@ These languages have some syntax differences but support many of the same [varia
This page is a general reference for available functions, variables, and syntax. For examples that use these expressions for common data tasks, look at the [Recipes section on the wiki](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#recipes-and-worked-examples).
## Expressions
## Expressions {#expressions}
There are significant differences between OpenRefine's expressions and the spreadsheet formulas you may be used to using for data manipulation. OpenRefine does not store formulas in cells and display output dynamically: OpenRefine's transformations are one-time operations that can change column contents or generate new columns. These are applied using variables such as `value` or `cell` to perform the same modification to each cell in a column.
@ -53,7 +53,7 @@ For another example, if you were to create a new column based on your data using
Note that an expression is typically based on one particular column in the data - the column whose drop-down menu is first selected. Many variables are created to stand for things about the cell in that “base column” of the current row on which the expression is evaluated. There are also variables about rows, which you can use to access cells in other columns.
## The expressions editor
## The expressions editor {#the-expressions-editor}
When you select a function that accepts expressions, you will see a window overlay the screen with what we call the expressions editor.
@ -72,13 +72,13 @@ Starring formulas you've used in the past can be helpful for repetitive tasks
You can also choose how formula errors are handled: replicate the original cell value, output an error message into the cell, or output a blank cell.
## Regular expressions
## Regular expressions {#regular-expressions}
OpenRefine offers several fields that support the use of regular expressions (regex), such as in a <span class="menuItems">Text filter</span> or a <span class="menuItems">Replace…</span> operation. GREL and other expressions can also use regular expression markup to extend their functionality.
If this is your first time working with regex, you may wish to read [this tutorial specific to the Java syntax that OpenRefine supports](https://docs.oracle.com/javase/tutorial/essential/regex/). We also recommend this [testing and learning tool](https://regexr.com/).
### GREL-supported regex
### GREL-supported regex {#grel-supported-regex}
To write a regular expression inside a GREL expression, wrap it between a pair of forward slashes (/) much like the way you would in JavaScript. For example, in
@ -100,7 +100,7 @@ On the [GREL functions](grelfunctions) page, functions that support regex will i
* [split](grelfunctions#splits-s-or-p-sep)
* [smartSplit](grelfunctions#smartsplits-s-or-p-sep-optional)
### Jython-supported regex
### Jython-supported regex {#jython-supported-regex}
You can also use [regex with Jython expressions](http://www.jython.org/docs/library/re.html), instead of GREL, for example with a <span class="menuItems">Custom Text Facet</span>:
@ -108,7 +108,7 @@ You can also use [regex with Jython expressions](http://www.jython.org/docs/libr
```python
import re
g = re.search(ur"\u2014 (.*),\s*BWV", value)
return g.group(1)
```
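If you want to try the same pattern outside OpenRefine first, here is a hedged Python 3 sketch. It differs from the Jython snippet in two ways that are assumptions of this example: the `ur"..."` prefix is not valid in Python 3, so an ordinary string is used, and the code is wrapped in a hypothetical function `extract_title` that returns `None` when nothing matches instead of raising an error.

```python
import re

# Text between an em dash plus space and a trailing ", BWV".
PATTERN = re.compile("\u2014 (.*),\\s*BWV")


def extract_title(value):
    """Return the captured group, or None when the pattern does not match."""
    match = PATTERN.search(value)
    return match.group(1) if match else None


print(extract_title("Bach \u2014 Toccata and Fugue, BWV 565"))
print(extract_title("no match here"))
```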
### Clojure-supported regex
### Clojure-supported regex {#clojure-supported-regex}
[Clojure](https://clojure.org/reference/reader) uses the same regex engine as Java, and can be invoked with [re-find](http://clojure.github.io/clojure/clojure.core-api.html#clojure.core/re-find), [re-matches](http://clojure.github.io/clojure/clojure.core-api.html#clojure.core/re-matches), etc. You can use the #"pattern" reader macro as described [in the Clojure documentation](https://clojure.org/reference/other_functions#regex). For example, to get the nth element of a returned sequence, you can use the nth function:
@ -116,7 +116,7 @@ python import re g = re.search(ur"\u2014 (.*),\s*BWV", value) return g.group(1)
```clojure
(nth (re-find #"\u2014 (.*),\s*BWV" value) 1)
```
## Variables
## Variables {#variables}
Most OpenRefine variables have attributes: aspects of the variables that can be called separately. We call these attributes “member fields” because they belong to certain variables. For example, you can query a record to find out how many rows it contains with `row.record.rowCount`: `rowCount` is a member field specific to the `record` variable, which is a member field of `row`. Member fields can be called using a dot separator, or with square brackets (`row["record"]`). The square bracket syntax is also used for variables that can call columns by name, for example, `cells["Postal Code"]`.
@ -131,7 +131,7 @@ Most OpenRefine variables have attributes: aspects of the variables that can be
| `rowIndex` | The index value of the current row (the first row is 0) |
| `columnName` | The name of the current cell's column, as a string |
### Row
### Row {#row}
The `row` variable itself is best used to access its member fields, which you can do using either a dot operator or square brackets: `row.index` or `row["index"]`.
@ -150,11 +150,11 @@ For array objects such as `row.columnNames` you can preview the array using the
forEach(row.columnNames,v,v).join("; ")
```
### Cells
### Cells {#cells}
The `cells` object is used to call information from the columns in your project. For example, `cells.Foo` returns a [cell](#cell) object representing the cell in the column named “Foo” of the current row. If the column name has spaces, use square brackets, e.g., `cells["Postal Code"]`. To get the corresponding column's value inside the `cells` variable, use `.value` at the end, for example, `cells["Postal Code"].value`. There is no `cells.value` - it can only be used with member fields.
### Cell
### Cell {#cell}
A `cell` object contains all the data of a cell and is stored as a single object.
@ -167,7 +167,7 @@ You can use `cell` on its own in the expressions editor to copy all the contents
| `cell.recon` | An object encapsulating reconciliation results for that cell | See the [reconciliation](expressions#reconciliation) section |
| `cell.errorMessage` | Returns the message of an *EvalError* instead of the error object itself (use value to return the error object) | .value |
### Reconciliation
### Reconciliation {#reconciliation}
Several of the fields here provide the data used in [reconciliation facets](reconciling#reconciliation-facets). You must type `cell.recon`; `recon` on its own will not work.
@ -193,7 +193,7 @@ Arrays such as `cell.recon.candidates` and `cell.recon.candidates.type` can be j
forEach(cell.recon.candidates,v,v.name).join("; ")
```
### Record
### Record {#record}
A `row.record` object encapsulates one or more rows that are grouped together, when your project is in records mode. You must call it as `row.record`; `record` will not return values.


@ -4,7 +4,7 @@ title: Exploring facets
sidebar_label: Facets
---
## Overview
## Overview {#overview}
Facets are one of OpenRefine's strongest features - that's where the diamond logo comes from!
@ -15,7 +15,7 @@ Faceted browsing gives you a big-picture look at your data (do they agree or dis
Typically, you create a facet on a particular column. That facet selection appears on the left, in the <span class="tabLabels">Facet/Filter</span> tab, and you can click on a displayed facet to view all the records that match. You can also “exclude” the facet, to view every record that does _not_ match, and you can select more than one facet by clicking “include.”
### An example
### An example {#an-example}
You can learn about facets and filtering with the following example. You can copy the following table and paste it using the <span class="menuItems">Clipboard</span> method of starting a project if you would like to try it yourself.
@ -62,7 +62,7 @@ When you look back at the text facet display of country names, you should see a
We can combine these facets - say, by narrowing to only the Chinese cities with populations greater than 20 million - simply by clicking in both. You should see 2 matching rows for both these criteria.
### Things to know about facets
### Things to know about facets {#things-to-know-about-facets}
When you have facets applied, you will see “matching rows” in the [project grid header](running#project-grid-header). If you click <span class="menuItems">Export</span> and copy your data out of OpenRefine while facets are active, many of the exporting options will only export the matching rows, not all the rows in your project.
@ -74,7 +74,7 @@ You can modify any facet expression by clicking the “change” button to the r
Facet boxes that appear in the sidebar can be resized and rearranged. You can drag and drop the title bar of each box to reorder them, and drag on the bottom bar of text facet boxes.
## Text facet
## Text facet {#text-facet}
A text facet can be generated on any column with the “text” data type. Select the column dropdown and go to <span class="menuItems">Facet</span> → <span class="menuItems">Text facet</span>. The created facet will be sorted alphabetically, and can be sorted by count.
@ -88,7 +88,7 @@ The choices and counts displayed in each facet can be copied as tab-separated va
![A column of years faceted as text and numbers, and with the count ready to be copied.](/img/yeardata.png)
## Numeric facet
## Numeric facet {#numeric-facet}
![A screenshot of an example numeric facet.](/img/numericfacet.png)
@ -100,7 +100,7 @@ You will be offered the option to include blank, non-numeric, and error values i
You can create a text facet on numeric data, which will treat each entry as a string. This can be useful if you wish, for example, to manually include facets instead of selecting a range, or sort by count, or copy that count.
:::
## Timeline facet
## Timeline facet {#timeline-facet}
![A screenshot of an example timeline facet.](/img/timelinefacet.png)
@ -108,7 +108,7 @@ Much like a numeric facet, a timeline facet will display as a small histogram wi
The facet appears with a count of blank cells and those with errors, which can help you analyze whether your date cells are correctly converted.
## Scatterplot facet
## Scatterplot facet {#scatterplot-facet}
A scatterplot is a visual representation of two related sets of numeric data.
@ -124,7 +124,7 @@ If you have multiple facets applied, plotted points in your scatterplot displays
If you would like to export a scatterplot, OpenRefine will open a new tab with a generated PNG file that you can save.
## Custom text facet
## Custom text facet {#custom-text-facet}
You may want to explore your textual data with modifications that aren't permanent. Creating custom text facets will load your column into memory, transform the data temporarily, and store those transformations inside the facet.
@ -156,7 +156,7 @@ That expression will look for the first letter (the character at index 0) of eac
You can learn more about text-modification functions on the [Expressions page](expressions).
## Custom numeric facet
## Custom numeric facet {#custom-numeric-facet}
You may want to explore your numerical data with modifications that aren't permanent. You can also use custom numeric facets to analyze textual data, such as by getting the length of text strings (with `value.length()`), or by analyzing it as though it were formatted as numbers (with `toNumber(value)`).
@ -188,13 +188,13 @@ mod(value, 7)
You can learn more about numeric-modification functions on the [Expressions page](expressions).
## Customized facets
## Customized facets {#customized-facets}
Customized facets have been added to expand the number of default facets users can apply with a single click. They represent some common and useful functions you shouldn't have to work out using an [expression](expressions).
All facets that display in the <span class="tabLabels">Facet/Filter</span> tab can be edited by clicking on the “change” button to the right of the column title. This brings up the expressions window that will allow you to modify and preview the expression being used.
### Word facet
### Word facet {#word-facet}
A <span class="menuItems">Word facet</span> is a simple version of a text facet: it splits up the content of the cells based on spaces, and outputs each character string as a facet:
@ -206,7 +206,7 @@ This can be useful for exploring the language used in a corpus, looking for comm
Word facet is case-sensitive and only splits by spaces, not by line breaks or other natural divisions.
### Duplicates facet
### Duplicates facet {#duplicates-facet}
A <span class="menuItems">Duplicates facet</span> will return only rows that have non-unique values in the column you've selected. It will create a facet of “true” and “false” values: “true” for cells that are not unique, and “false” for cells that are. The actual expression being used is
@ -220,7 +220,7 @@ Duplicates facets are case-sensitive and you may wish to filter out things like
facetCount(trim(toLowercase(value)), 'trim(toLowercase(value))', 'cityLabel') > 1
```
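The normalized duplicate check can also be approximated outside OpenRefine. The following is a minimal Python sketch of the idea, not OpenRefine's implementation; the `flag_duplicates` helper and the sample city names are hypothetical.

```python
from collections import Counter


def flag_duplicates(values):
    """Return True for each value whose normalized form occurs more than once."""
    # Normalize the way trim(toLowercase(value)) does: lowercase, then strip
    # leading and trailing whitespace.
    normalized = [v.lower().strip() for v in values]
    counts = Counter(normalized)
    return [counts[n] > 1 for n in normalized]


# Hypothetical sample column
cities = ["Tokyo", " tokyo", "Delhi", "Shanghai", "DELHI "]
flags = flag_duplicates(cities)
print(flags)
```

Without the normalization step, “Tokyo” and “ tokyo” would be counted as two distinct values and neither would be flagged.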
### Numeric log facet
### Numeric log facet {#numeric-log-facet}
Logarithmic scales reduce wide-ranging quantities to more compact and manageable ranges. A log transformation can be used to make highly skewed distributions less skewed. If your numerical data is unevenly distributed (say, lots of values in one range, and then a long tail extending off into different magnitudes), a <span class="menuItems">Numeric log facet</span> can represent that range better than a simple numeric facet. It will break these values down into more navigable segments than the buckets of a numeric facet. This facet can make patterns in your data more visible. OpenRefine uses a base-10 log, the “common logarithm.”
@ -248,7 +248,7 @@ Most values will be clustered in the 0-100 range, but 35,000 is many magnitudes
A 1-bounded numeric log facet can be used if you'd like to exclude all the values below 1 (including zero and negative numbers).
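To see why a base-10 log makes skewed data easier to navigate, here is a small Python sketch. It is only an illustration: the `log_buckets` helper and the sample values are hypothetical, grouping by whole magnitudes is a simplification of the facet's histogram, and skipping values below 1 is an assumption modeled on the 1-bounded variant described above.

```python
import math


def log_buckets(values):
    """Group numbers by the integer part of their base-10 logarithm."""
    buckets = {}
    for v in values:
        if v < 1:
            # Assumption: values below 1 (including zero and negatives)
            # are left out, as in a 1-bounded log facet.
            continue
        magnitude = math.floor(math.log10(v))
        buckets.setdefault(magnitude, []).append(v)
    return buckets


sample = [0, 3, 12, 47, 88, 100, 35000]
buckets = log_buckets(sample)
print(buckets)
```

On a linear scale the single large value would stretch the axis and squeeze everything else into one bar; on the log scale each magnitude gets its own segment.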
### Text-length facet
### Text-length facet {#text-length-facet}
The <span class="menuItems">Text-length facet</span> returns a numerical value for each cell and plots it on a numeric facet chart. The expression used is
@ -261,7 +261,7 @@ This can be useful to, for example, look for values that did not successfully sp
You can also employ a <span class="menuItems">Log of text-length facet</span>, which makes a wide range of string lengths easier to navigate. This can be useful in the case of web-scraping, where lots of textual data is loaded into single cells and needs to be parsed out.
### Unicode character-code facet
### Unicode character-code facet {#unicode-character-code-facet}
![A screenshot of the Unicode facet.](/img/unicodefacet.png)
@ -269,7 +269,7 @@ The Unicode facet identifies and returns [Unicode decimal values](https://en.wik
This facet creates a numerical chart, which offers you the ability to narrow down to a range of numbers. For example, lowercase characters are numbers 97-122, uppercase characters are numbers 65-90, and numerical digits are numbers 48-57.
### Facet by error
### Facet by error {#facet-by-error}
An error is a data type created by OpenRefine in the process of transforming data. For example, say you had converted a column to the number data type. If one cell had text characters in it, OpenRefine could either output the original text string unchanged or output an error. If you allow errors to be created, you can facet by them later to search for them and fix them.
@ -277,7 +277,7 @@ An error is a data type created by OpenRefine in the process of transforming dat
To store errors in cells, ensure that you have <span class="fieldLabels">store error</span> selected for the “On error” option in the expressions window.
### Facet by null, empty, or blank
### Facet by null, empty, or blank {#facet-by-null-empty-or-blank}
Any column can be faceted for [null and/or empty cells](#cell-data-types). These can help you find cells where you want to manually enter content.
@ -285,7 +285,7 @@ Any column can be faceted for [null and/or empty cells](#cell-data-types). These
An empty cell is a cell that is set to contain a string, but doesn't have any characters in it (a zero-length string). This can be left over from an operation that removed characters, or from manually editing a cell and deleting its contents.
### Facet by star or flag
### Facet by star or flag {#facet-by-star-or-flag}
Stars and flags offer you the opportunity to mark specific rows for yourself for later focus. Stars and flags persist through closing and opening your project, and thus can provide a different function than using a permalink to persist your facets. Stars and flags can be used in any way you want, although they are designed to help you flag errors and star rows of particular importance.
@ -304,7 +304,7 @@ You may wish to create a custom subset of your data through a series of separate
You can also create a text facet on any column with the expression `row.starred` or `row.flagged`.
## Text filter
## Text filter {#text-filter}
Filters allow you to narrow down your data based on whether a given column includes a text string.


@ -4,7 +4,7 @@ title: General Refine Expression Language
sidebar_label: General Refine Expression Language
---
## Basics
## Basics {#basics}
GREL (General Refine Expression Language) is designed to resemble JavaScript. Formulas use variables and depend on data types to do things like string manipulation or mathematical calculations:
@ -22,7 +22,7 @@ Evaluating conditions uses symbols such as <, >, *, /, etc. To check whether two
See the [GREL functions page for a thorough reference](grelfunctions) on each function and its inputs and outputs. Read on below for more about the general nature of GREL expressions.
## Syntax
## Syntax {#syntax}
In GREL, functions can use either of these two forms:
* functionName(arg0, arg1, ...)
@ -56,13 +56,13 @@ Any function that outputs an array can use square brackets to select only one pa
For example, [partition()](grelfunctions#partitions-s-or-p-fragment-b-omitfragment-optional) would normally output an array of three items: the part before your chosen fragment, the fragment you've identified, and the part after. Selecting only the third part with `"internationalization".partition("nation")[2]` will output “alization” (and so will [-1], indicating the final item in the array).
## Controls
## Controls {#controls}
GREL offers controls to support branching and looping (that is, “if” and “for” functions), but unlike functions, their arguments don't all get evaluated before they get run. A control can decide which part of the code to execute and can affect the environment bindings. Functions, on the other hand, can't do either. Each control decides which of its arguments to evaluate to `value`, and how.
Please note that the GREL control names are case-sensitive: for example, the isError() control can't be called with iserror().
#### if(e, eTrue, eFalse)
#### if(e, eTrue, eFalse) {#ife-etrue-efalse}
Expression e is evaluated to a value. If that value is true, then expression eTrue is evaluated and the result is the value of the whole if() expression. Otherwise, expression eFalse is evaluated and that result is the value.
@ -83,7 +83,7 @@ Nested if (switch case) example:
null)))
#### with(e1, variable v, e2)
#### with(e1, variable v, e2) {#withe1-variable-v-e2}
Evaluates expression e1 and binds its value to variable v. Then evaluates expression e2 and returns that result.
@ -93,7 +93,7 @@ Evaluates expression e1 and binds its value to variable v. Then evaluates expres
| `with("european union".split(" "), a, forEach(a, v, v.length()))` | [ 8, 5 ] |
| `with("european union".split(" "), a, forEach(a, v, v.length()).sum() / a.length())` | 6.5 |
#### filter(e1, v, e test)
#### filter(e1, v, e test) {#filtere1-v-e-test}
Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression test - which should return a boolean. If the boolean is true, pushes v onto the result array.
@ -101,7 +101,7 @@ Evaluates expression e1 to an array. Then for each array element, binds its valu
| ---------------------------------------------- | ------------- |
| `filter([ 3, 4, 8, 7, 9 ], v, mod(v, 2) == 1)` | [ 3, 7, 9 ] |
#### forEach(e1, v, e2)
#### forEach(e1, v, e2) {#foreache1-v-e2}
Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression e2, and pushes the result onto the result array.
@ -109,7 +109,7 @@ Evaluates expression e1 to an array. Then for each array element, binds its valu
| ------------------------------------------ | ------------------- |
| `forEach([ 3, 4, 8, 7, 9 ], v, mod(v, 2))` | [ 1, 0, 0, 1, 1 ] |
#### forEachIndex(e1, i, v, e2)
#### forEachIndex(e1, i, v, e2) {#foreachindexe1-i-v-e2}
Evaluates expression e1 to an array. Then for each array element, binds its index to variable i and its value to variable v, evaluates expression e2, and pushes the result onto the result array.
@ -117,17 +117,17 @@ Evaluates expression e1 to an array. Then for each array element, binds its inde
| ------------------------------------------------------------------------------- | --------------------------- |
| `forEachIndex([ "anne", "ben", "cindy" ], i, v, (i + 1) + ". " + v).join(", ")` | 1. anne, 2. ben, 3. cindy |
#### forRange(n from, n to, n step, v, e)
#### forRange(n from, n to, n step, v, e) {#forrangen-from-n-to-n-step-v-e}
Iterates over the variable v starting at from, incrementing by the value of step each time while less than to. At each iteration, evaluates expression e, and pushes the result onto the result array.
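The iteration described above can be sketched in Python as follows. This is an approximation for illustration only: `for_range` is a hypothetical helper, and it covers only positive steps, which is an assumption rather than a statement about how GREL treats other step values.

```python
def for_range(start, stop, step, fn):
    """Collect fn(v) for v = start, start + step, ... while v < stop."""
    if step <= 0:
        # Assumption: only positive steps are handled in this sketch.
        raise ValueError("step must be positive")
    results = []
    v = start
    while v < stop:
        results.append(fn(v))
        v += step
    return results


# v takes the values 0, 3, 6, 9; the upper bound 10 is never reached
squares = for_range(0, 10, 3, lambda v: v * v)
print(squares)
```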
#### forNonBlank(e, v, eNonBlank, eBlank)
#### forNonBlank(e, v, eNonBlank, eBlank) {#fornonblanke-v-enonblank-eblank}
Evaluates expression e. If it is non-blank, forNonBlank() binds its value to variable v, evaluates expression eNonBlank and returns the result. Otherwise (if e evaluates to blank), forNonBlank() evaluates expression eBlank and returns that result instead.
Unlike other GREL functions beginning with “for,” forNonBlank() is not iterative. forNonBlank() essentially offers a shorter syntax for achieving the same outcome as using the isNonBlank() function within an “if” statement.
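The branching behavior can be sketched in Python. The helper `for_non_blank` is hypothetical, and treating only `None` and the empty string as blank is an assumption. The two branches are passed as callables so that only one of them is evaluated, mirroring the way a control chooses which argument to evaluate.

```python
def for_non_blank(value, when_non_blank, when_blank):
    """Evaluate exactly one branch, depending on whether value is blank."""
    # Assumption: "blank" means None or a zero-length string.
    if value is None or value == "":
        return when_blank()
    # Bind the value to the branch's variable, like v in forNonBlank().
    return when_non_blank(value)


filled = for_non_blank("Lagos", lambda v: v.upper(), lambda: "(missing)")
empty = for_non_blank("", lambda v: v.upper(), lambda: "(missing)")
print(filled, empty)
```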
#### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e)
#### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e) {#isblanke-isnonblanke-isnulle-isnotnulle-isnumerice-iserrore}
Evaluates the expression e, and returns a boolean based on the named evaluation.
@ -146,7 +146,7 @@ Examples:
Remember that these are controls and not functions: you can't use dot notation (for example, the format `e.isX()` will not work).
## Constants
## Constants {#constants}
|Name |Meaning |
|-|-|
| true | The boolean constant true |


@ -4,7 +4,7 @@ title: GREL functions
sidebar_label: GREL functions
---
## Reading this reference
## Reading this reference {#reading-this-reference}
For the reference below, the function is given in full-length notation and the in-text examples are written in dot notation. Shorthands are used to indicate the kind of [data type](exploring#data-types) used in each function: s for string, b for boolean, n for number, d for date, a for array, p for a regex pattern, and o for object (meaning any data type), as well as “null” and “error” data types.
@ -15,31 +15,31 @@ Optional arguments will say “(optional)”.
In places where OpenRefine will accept a string (s) or a regex pattern (p), you can supply a string by putting it in quotes. If you wish to use any [regex](expressions#regular-expressions) notation, wrap the pattern in forward slashes.
## Boolean functions
## Boolean functions {#boolean-functions}
###### and(b1, b2, ...)
###### and(b1, b2, ...) {#andb1-b2-}
Uses the logical operator AND on two or more booleans to output a boolean. Evaluates multiple statements into booleans, then returns true if all of the statements are true. For example, `(1 < 3).and(1 < 0)` returns false because one condition is true and one is false.
###### or(b1, b2, ...)
###### or(b1, b2, ...) {#orb1-b2-}
Uses the logical operator OR on two or more booleans to output a boolean. For example, `(1 < 3).or(1 > 7)` returns true because at least one of the conditions (the first one) is true.
###### not(b)
###### not(b) {#notb}
Uses the logical operator NOT on a boolean to output a boolean. For example, `not(1 > 7)` returns true because 1 > 7 itself is false.
###### xor(b1, b2, ...)
###### xor(b1, b2, ...) {#xorb1-b2-}
Uses the logical operator XOR (exclusive-or) on two or more booleans to output a boolean. Evaluates multiple statements, then returns true if only one of them is true. For example, `(1 < 3).xor(1 < 7)` returns false because more than one of the conditions is true.
## String functions
## String functions {#string-functions}
###### length(s)
###### length(s) {#lengths}
Returns the length of string s as a number.
###### toString(o, string format (optional))
###### toString(o, string format (optional)) {#tostringo-string-format-optional}
Takes any value type (string, number, date, boolean, error, null) and gives a string version of that value.
@ -56,79 +56,79 @@ You can also convert dates to strings, using date parsing syntax built in to Ope
Note: In OpenRefine, using toString() on a null cell outputs the string “null”.
### Testing string characteristics
### Testing string characteristics {#testing-string-characteristics}
###### startsWith(s, sub)
###### startsWith(s, sub) {#startswiths-sub}
Returns a boolean indicating whether s starts with sub. For example, `"food".startsWith("foo")` returns true, whereas `"food".startsWith("bar")` returns false.
###### endsWith(s, sub)
###### endsWith(s, sub) {#endswiths-sub}
Returns a boolean indicating whether s ends with sub. For example, `"food".endsWith("ood")` returns true, whereas `"food".endsWith("odd")` returns false.
###### contains(s, sub or p)
###### contains(s, sub or p) {#containss-sub-or-p}
Returns a boolean indicating whether s contains sub, which is either a substring or a regex pattern. For example, `"food".contains("oo")` returns true whereas `"food".contains("ee")` returns false.
You can search for a regular expression by wrapping it in forward slashes rather than quotes: `"rose is a rose".contains(/\s+/)` returns true. startsWith() and endsWith() can only take strings, while contains() can take a regex pattern, so you can use contains() to look for beginning and ending string patterns.
### Basic string modification
### Basic string modification {#basic-string-modification}
#### Case conversion
#### Case conversion {#case-conversion}
###### toLowercase(s)
###### toLowercase(s) {#tolowercases}
Returns string s converted to all lowercase characters.
###### toUppercase(s)
###### toUppercase(s) {#touppercases}
Returns string s converted to all uppercase characters.
###### toTitlecase(s)
###### toTitlecase(s) {#totitlecases}
Returns string s converted into titlecase: a capital letter starting each word, and the rest of the letters lowercase. For example, `"Once upon a midnight DREARY".toTitlecase()` returns the string “Once Upon A Midnight Dreary”.
#### Trimming
#### Trimming {#trimming}
###### trim(s)
###### trim(s) {#trims}
Returns a copy of the string s with leading and trailing whitespace removed. For example, `" island ".trim()` returns the string “island”. Identical to strip().
###### strip(s)
###### strip(s) {#strips}
Returns a copy of the string s with leading and trailing whitespace removed. For example, `" island ".strip()` returns the string “island”. Identical to trim().
###### chomp(s, sep)
###### chomp(s, sep) {#chomps-sep}
Returns a copy of string s with the string sep removed from the end if s ends with sep; otherwise, just returns s. For example, `"barely".chomp("ly")` and `"bare".chomp("ly")` both return the string “bare”.
#### Substring
#### Substring {#substring}
###### substring(s, n from, n to (optional))
###### substring(s, n from, n to (optional)) {#substrings-n-from-n-to-optional}
Returns the substring of s starting from character index from, and up to (excluding) character index to. If the to argument is omitted, substring will output to the end of s. For example, `"profound".substring(3)` returns the string “found”, and `"profound".substring(2, 4)` returns the string “of”.
Remember that character indices start from zero. A negative character index counts from the end of the string. For example, `"profound".substring(0, -1)` returns the string “profoun”.
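If you are familiar with Python, its slice notation follows the same zero-based, end-exclusive indexing for the three examples above. This is a comparison only, written in plain Python rather than GREL.

```python
word = "profound"

tail = word[3:]        # from index 3 to the end
middle = word[2:4]     # index 2 up to, but excluding, index 4
trimmed = word[0:-1]   # a negative index counts from the end

print(tail, middle, trimmed)
```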
###### slice(s, n from, n to (optional))
###### slice(s, n from, n to (optional)) {#slices-n-from-n-to-optional}
Identical to substring() in relation to strings. Also works with arrays; see [Array functions section](#slicea-n-from-n-to-optional).
###### get(s, n from, n to (optional))
###### get(s, n from, n to (optional)) {#gets-n-from-n-to-optional}
Identical to substring() in relation to strings. Also works with named fields. Also works with arrays; see [Array functions section](#geta-n-from-n-to-optional).
#### Find and replace
#### Find and replace {#find-and-replace}
###### indexOf(s, sub)
###### indexOf(s, sub) {#indexofs-sub}
Returns the first character index of sub as it first occurs in s; or, returns -1 if s does not contain sub. For example, `"internationalization".indexOf("nation")` returns 5, whereas `"internationalization".indexOf("world")` returns -1.
###### lastIndexOf(s, sub)
###### lastIndexOf(s, sub) {#lastindexofs-sub}
Returns the first character index of sub as it last occurs in s; or, returns -1 if s does not contain sub. For example, `"parallel".lastIndexOf("a")` returns 3 (pointing at the second “a”).
###### replace(s, s or p find, s replace)
###### replace(s, s or p find, s replace) {#replaces-s-or-p-find-s-replace}
Returns the string obtained by replacing the find string with the replace string in the inputted string. For example, `"The cow jumps over the moon and moos".replace("oo", "ee")` returns the string “The cow jumps over the meen and mees”. Find can be a regex pattern. For example, `"The cow jumps over the moon and moos".replace(/\s+/, "_")` will return “The_cow_jumps_over_the_moon_and_moos”.
@ -137,17 +137,17 @@ You cannot find or replace nulls with this, as null is not a string. You can ins
1. Facet by null and then bulk-edit them to a string, or
2. Transform the column with an expression such as `if(value==null,"new",value)`.
###### replaceChars(s, s find, s replace)
###### replaceChars(s, s find, s replace) {#replacecharss-s-find-s-replace}
Returns the string obtained by replacing a character in s, identified by find, with the corresponding character identified in replace. For example, `"Téxt thát was optícálly recógnízéd".replaceChars("áéíóú", "aeiou")` returns the string “Text that was optically recognized”. You cannot use this to replace a single character with more than one character.
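The one-to-one character mapping can be illustrated in Python with `str.maketrans` and `str.translate`. This is a sketch of the same idea, not the GREL implementation.

```python
# Each character in the first string maps to the character at the same
# position in the second string, so both strings must have equal length.
table = str.maketrans("áéíóú", "aeiou")

cleaned = "Téxt thát was optícálly recógnízéd".translate(table)
print(cleaned)
```

Because the mapping is positional, one character can only ever be replaced by exactly one other character.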
###### find(s, sub or p)
###### find(s, sub or p) {#finds-sub-or-p}
Outputs an array of all consecutive substrings inside string s that match the substring or [regex](expressions#grel-supported-regex) pattern p. For example, `"abeadsabmoloei".find(/[aeio]+/)` would result in the array [ "a", "ea", "a", "o", "oei" ].
You can supply a substring instead of p, by putting it in quotes, and OpenRefine will compile it into a regex pattern. Anytime you supply quotes, OpenRefine interprets the contents as a string, not regex. If you wish to use any regex notation, wrap the pattern in forward slashes.
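For comparison, Python's `re.findall` returns the same kind of list of consecutive matches. Note that this uses Python's regex engine, which is an assumption for illustration; it can differ from the regex syntax GREL supports for more advanced patterns.

```python
import re

# Every maximal run of the vowels a, e, i, o (u is deliberately not in the set)
matches = re.findall(r"[aeio]+", "abeadsabmoloei")
print(matches)
```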
###### match(s, p)
###### match(s, p) {#matchs-p}
Attempts to match the string s in its entirety against the [regex](expressions#grel-supported-regex) pattern p and, if the pattern is found, outputs an array of all [capturing groups](https://www.regular-expressions.info/brackets.html) (found in order). For example, `"230.22398, 12.3480".match(/.*(\d\d\d\d)/)` returns an array of 1 substring: [ "3480" ]. It does not find 2239 as the first sequence with four digits, because the regex indicates the four digits must come at the end of the string.
@ -164,31 +164,31 @@ For example, if `value` is “hello 123456 goodbye”, the following would occur
|`value.match(/.*(\d{6}).*/)` |[ "123456" ] (array with one value)|
|`value.match(/(.*)(\d{6})(.*)/)` |[ "hello ", "123456", " goodbye" ] (array with three values)|
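The whole-string matching and capturing-group behavior can be approximated in Python with `re.fullmatch`. The `match_groups` helper is hypothetical, and returning `None` when nothing matches is a choice made for this sketch.

```python
import re


def match_groups(pattern, s):
    """Return the capturing groups if the pattern matches all of s, else None."""
    m = re.fullmatch(pattern, s)
    return list(m.groups()) if m else None


value = "hello 123456 goodbye"

one_group = match_groups(r".*(\d{6}).*", value)
three_groups = match_groups(r"(.*)(\d{6})(.*)", value)
# Without the surrounding .* the pattern cannot cover the entire string.
no_match = match_groups(r"\d{6}", value)

print(one_group, three_groups, no_match)
```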
### String parsing and splitting
### String parsing and splitting {#string-parsing-and-splitting}
###### toNumber(s)
###### toNumber(s) {#tonumbers}
Returns a string converted to a number. Will attempt to convert other formats into a string, then into a number. If the value is already a number, it will return the number.
###### split(s, s or p sep, b preserveTokens (optional))
###### split(s, s or p sep, b preserveTokens (optional)) {#splits-s-or-p-sep-b-preservetokens-optional}
Returns the array of strings obtained by splitting s by sep. The separator can be either a string or a regex pattern. For example, `"fire, water, earth, air".split(",")` returns an array of 4 strings: [ "fire", " water", " earth", " air" ]. Note that the space characters are retained but the separator is removed. If you include “true” for the preserveTokens boolean, empty segments are preserved.
###### splitByLengths(s, n1, n2, ...)
###### splitByLengths(s, n1, n2, ...) {#splitbylengthss-n1-n2-}
Returns the array of strings obtained by splitting s into substrings with the given lengths. For example, `"internationalization".splitByLengths(5, 6, 3)` returns an array of 3 strings: [ "inter", "nation", "ali" ]. Excess characters are discarded.
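A minimal Python sketch of the same fixed-width splitting is shown below; `split_by_lengths` is a hypothetical helper written for illustration. It is the kind of operation you might use on fixed-width records.

```python
def split_by_lengths(s, *lengths):
    """Cut s into consecutive pieces of the given lengths; drop the remainder."""
    parts = []
    pos = 0
    for n in lengths:
        parts.append(s[pos:pos + n])
        pos += n
    return parts


parts = split_by_lengths("internationalization", 5, 6, 3)
print(parts)
```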
###### smartSplit(s, s or p sep (optional))
###### smartSplit(s, s or p sep (optional)) {#smartsplits-s-or-p-sep-optional}
Returns the array of strings obtained by splitting s by sep, or by guessing either tab or comma separation if there is no sep given. Handles quotes properly and understands cancelled characters. The separator can be either a string or a regex pattern. For example, `value.smartSplit("\n")` will split at a carriage return or a new-line character.
Note: [`value.escape('javascript')`](#escapes-s-mode) is useful for previewing unprintable characters prior to using smartSplit().
###### splitByCharType(s)
###### splitByCharType(s) {#splitbychartypes}
Returns an array of strings obtained by splitting s into groups of consecutive characters each time the characters change [Unicode categories](https://en.wikipedia.org/wiki/Unicode_character_property#General_Category). For example, `"HenryCTaylor".splitByCharType()` will result in an array of [ "H", "enry", "CT", "aylor" ]. It is useful for separating letters and numbers: `"BE1A3E".splitByCharType()` will result in [ "BE", "1", "A", "3", "E" ].
###### partition(s, s or p fragment, b omitFragment (optional))
###### partition(s, s or p fragment, b omitFragment (optional)) {#partitions-s-or-p-fragment-b-omitfragment-optional}
Returns an array of strings [ a, fragment, z ] where a is the substring within s before the first occurrence of fragment, and z is the substring after fragment. Fragment can be a string or a regex. For example, `"internationalization".partition("nation")` returns 3 strings: [ "inter", "nation", "alization" ]. If s does not contain fragment, it returns an array of [ s, "", "" ] (the original unpartitioned string, and two empty strings).
@ -196,69 +196,69 @@ If the omitFragment boolean is true, for example with `"internationalization".pa
You can use regex for your fragment. The expression `"abcdefgh".partition(/c.e/)` will output [ "ab", "cde", "fgh" ].
###### rpartition(s, s or p fragment, b omitFragment (optional))
###### rpartition(s, s or p fragment, b omitFragment (optional)) {#rpartitions-s-or-p-fragment-b-omitfragment-optional}
Returns an array of strings [ a, fragment, z ] where a is the substring within s before the last occurrence of fragment, and z is the substring after the last instance of fragment. (Rpartition means “reverse partition.”) For example, `"parallel".rpartition("a")` returns 3 strings: [ "par", "a", "llel" ]. Otherwise works identically to partition() above.
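Python strings have methods with the same names that behave the same way for the plain-string examples given here, which makes the difference between first and last occurrence easy to compare. This is a side-by-side illustration in Python, not GREL; Python's versions do not accept regex patterns or an omitFragment argument.

```python
# Split around the first occurrence of the fragment
first = "internationalization".partition("nation")

# Split around the last occurrence of the fragment
last = "parallel".rpartition("a")

# When the fragment is absent, partition returns the original string
# followed by two empty strings.
missing = "internationalization".partition("world")

print(first, last, missing)
```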
### Encoding and hashing
### Encoding and hashing {#encoding-and-hashing}
###### diff(s1, s2, s timeUnit (optional))
###### diff(s1, s2, s timeUnit (optional)) {#diffs1-s2-s-timeunit-optional}
Takes two strings and compares them, returning a string. Returns the remainder of s2 starting with the first character where they differ. For example, `"cacti".diff("cactus")` returns "us". Also works with dates; see [Date functions](#diffd1-d2-s-timeunit).
###### escape(s, s mode)
###### escape(s, s mode) {#escapes-s-mode}
Escapes s in the given escaping mode. The mode can be one of: "html", "xml", "csv", "url", "javascript". Note that quotes are required around your mode. See the [recipes](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#question-marks--showing-in-your-data) for examples of escaping and unescaping.
###### unescape(s, s mode)
###### unescape(s, s mode) {#unescapes-s-mode}
Unescapes s in the given escaping mode. The mode can be one of: "html", "xml", "csv", "url", "javascript". Note that quotes are required around your mode. See the [recipes](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#atampampt----att) for examples of escaping and unescaping.
###### md5(o)
###### md5(o) {#md5o}
Returns the [MD5 hash](https://en.wikipedia.org/wiki/MD5) of an object. If fed something other than a string (array, number, date, etc.), md5() will convert it to a string and deliver the hash of the string. For example, `"internationalization".md5()` will return 2c55a1626e31b4e373ceedaa9adc12a3.
###### sha1(o)
###### sha1(o) {#sha1o}
Returns the [SHA-1 hash](https://en.wikipedia.org/wiki/SHA-1) of an object. If fed something other than a string (array, number, date, etc.), sha1() will convert it to a string and deliver the hash of the string. For example, `"internationalization".sha1()` will return cd05286ee0ff8a830dbdc0c24f1cb68b83b0ef36.
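If you need to compute comparable hashes outside OpenRefine, for example to check values in a script, Python's standard `hashlib` module provides both algorithms. Encoding the string as UTF-8 before hashing is an assumption of this sketch.

```python
import hashlib

text = "internationalization"

# Hash functions operate on bytes, so the string is encoded first
# (assumption: UTF-8).
md5_hex = hashlib.md5(text.encode("utf-8")).hexdigest()
sha1_hex = hashlib.sha1(text.encode("utf-8")).hexdigest()

print(md5_hex)   # 32 hexadecimal characters
print(sha1_hex)  # 40 hexadecimal characters
```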
###### phonetic(s, s encoding)
###### phonetic(s, s encoding) {#phonetics-s-encoding}
Returns a phonetic encoding of a string, based on an available phonetic algorithm. See the [section on phonetic clustering](cellediting#clustering-methods) for more information. Can be one of the following supported phonetic methods: [metaphone, doublemetaphone, metaphone3](https://www.wikipedia.org/wiki/Metaphone), [soundex](https://en.wikipedia.org/wiki/Soundex), [cologne](https://en.wikipedia.org/wiki/Cologne_phonetics). Quotes are required around your encoding method. For example, `"Ruth Prawer Jhabvala".phonetic("metaphone")` outputs the string “R0PRWRJHBFL”.
###### reinterpret(s, s encoderTarget, s encoderSource)
###### reinterpret(s, s encoderTarget, s encoderSource) {#reinterprets-s-encodertarget-s-encodersource}
Returns s reinterpreted through the given character encoders. You must supply one of the [supported encodings](http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html) for each of the original source and the target output. Note that quotes are required around your character encoder.
When an OpenRefine project is started, data is imported and interpreted. A specific character encoding is identified or manually selected at that time (such as UTF-8). You can reinterpret a column into another specified encoding using this function. This function may not fix your data; it may be better to use this in conjunction with new projects to test the interpretation, and pre-format your data as needed.
###### fingerprint(s)
###### fingerprint(s) {#fingerprints}
Returns the fingerprint of s, a string that is the first step in [fingerprint clustering methods](cellediting#clustering-methods): it will trim whitespaces, convert all characters to lowercase, remove punctuation, sort words alphabetically, etc. For example, `"Ruth Prawer Jhabvala".fingerprint()` outputs the string “jhabvala prawer ruth”.
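The steps listed above can be sketched in Python. This is a simplified approximation, not OpenRefine's implementation: the steps covered by “etc.” are not reproduced, removing duplicate words is an assumption, and only ASCII punctuation is stripped.

```python
import string


def fingerprint(s):
    """Simplified fingerprint: trim, lowercase, strip punctuation, sort words."""
    s = s.strip().lower()
    # Remove ASCII punctuation characters.
    s = s.translate(str.maketrans("", "", string.punctuation))
    # Assumption: duplicate words are collapsed before sorting.
    tokens = sorted(set(s.split()))
    return " ".join(tokens)


key = fingerprint("Ruth Prawer Jhabvala")
same_key = fingerprint("  Jhabvala, Ruth Prawer ")
print(key, same_key == key)
```

Two spellings that differ only in word order, case, or punctuation produce the same key, which is what makes the fingerprint useful for clustering.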
###### ngram(s, n)
###### ngram(s, n) {#ngrams-n}
Returns an array of the word n-grams of s. That is, it lists all the possible consecutive combinations of n words in the string. For example, `"Ruth Prawer Jhabvala".ngram(2)` would output the array [ "Ruth Prawer", "Prawer Jhabvala" ]. A word n-gram of 1 simply lists all the words in original order; an n-gram larger than the number of words in the string will only return the original string inside an array (e.g. `"Ruth Prawer Jhabvala".ngram(4)` would simply return ["Ruth Prawer Jhabvala"]).
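A small Python sketch of word n-grams that follows the behavior described above; `word_ngrams` is a hypothetical helper, and splitting on whitespace is an assumption.

```python
def word_ngrams(s, n):
    """List every run of n consecutive words in s."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = s.split()
    if n > len(words):
        # More words requested than available: return the original string.
        return [s]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


name = "Ruth Prawer Jhabvala"
bigrams = word_ngrams(name, 2)
unigrams = word_ngrams(name, 1)
too_long = word_ngrams(name, 4)
print(bigrams, unigrams, too_long)
```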
###### ngramFingerprint(s, n)
###### ngramFingerprint(s, n) {#ngramfingerprints-n}
Returns the [n-gram fingerprint](cellediting#clustering-methods) of s. For example, `"banana".ngramFingerprint(2)` would output “anbana”, after first generating the 2-grams “ba an na an na”, removing duplicates, and sorting them alphabetically.
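The “banana” example can be traced step by step in Python. This is a simplified sketch: `ngram_fingerprint` is a hypothetical helper, and lowercasing and dropping non-alphanumeric characters before building the n-grams are assumptions.

```python
def ngram_fingerprint(s, n):
    """Join the sorted, de-duplicated character n-grams of s."""
    # Assumption: normalize by lowercasing and keeping letters and digits only.
    cleaned = "".join(ch for ch in s.lower() if ch.isalnum())
    # A set removes the repeated n-grams ("an" and "na" each occur twice).
    grams = {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}
    return "".join(sorted(grams))


key = ngram_fingerprint("banana", 2)
print(key)
```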
###### unicode(s)
###### unicode(s) {#unicodes}
Returns an array containing the decimal Unicode value of each character of s. For example, `"Bernice Rubens".unicode()` outputs [ 66, 101, 114, 110, 105, 99, 101, 32, 82, 117, 98, 101, 110, 115 ].
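The same decimal values can be reproduced in Python with the built-in `ord()`, which is a quick way to check what a given code refers to. This is an illustration in Python, not GREL.

```python
name = "Bernice Rubens"

# ord() gives the decimal Unicode code point of a single character
codes = [ord(ch) for ch in name]
print(codes)

# chr() goes the other way: code 32 is the space between the two words
print(repr(chr(32)))
```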
###### unicodeType(s)
###### unicodeType(s) {#unicodetypes}
Returns an array of strings describing each character of s by their unicode type. For example, `"Bernice Rubens".unicodeType()` outputs [ "uppercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "space separator", "uppercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter", "lowercase letter" ].
## Format-based functions (JSON, HTML, XML)
## Format-based functions (JSON, HTML, XML) {#format-based-functions-json-html-xml}
###### jsonize(o)
###### jsonize(o) {#jsonizeo}
Quotes a value as a JSON literal value.
###### parseJson(s)
###### parseJson(s) {#parsejsons}
Parses a string as JSON. get() can then be used with parseJson(): for example, `parseJson(" { 'a' : 1 } ").get("a")` returns 1.
@ -286,9 +286,9 @@ For example, from the following JSON array in `value`, we want to get all instan
The GREL expression `forEach(value.parseJson().keywords,v,v.text).join(":::")` will output “York en route:::Anthony Eden:::President Eisenhower”.
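To see what the parseJson(), forEach() and join() combination is doing, here is a rough Python equivalent. The JSON string below is a hypothetical stand-in that contains only the three keyword texts from the output above, and Python's `json` module is stricter than the GREL parser (it requires double-quoted keys).

```python
import json

# Hypothetical cell content: a "keywords" array whose items each carry a "text" field.
value = (
    '{"keywords": ['
    '{"text": "York en route"}, '
    '{"text": "Anthony Eden"}, '
    '{"text": "President Eisenhower"}]}'
)

# Parse the string, pick the "text" of every keyword, and join with ":::".
parsed = json.loads(value)
joined = ":::".join(item["text"] for item in parsed["keywords"])

print(joined)  # York en route:::Anthony Eden:::President Eisenhower
```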
### Jsoup XML and HTML parsing
### Jsoup XML and HTML parsing {#jsoup-xml-and-html-parsing}
###### parseHtml(s)
###### parseHtml(s) {#parsehtmls}
Given a cell full of HTML-formatted text, parseHtml() simplifies HTML tags (such as by removing “ /” at the end of self-closing tags), closes any unclosed tags, and inserts linebreaks and indents for cleaner code. You cannot pass parseHtml() a URL, but you can pre-fetch HTML with the <span class="menuItems">[Add column by fetching URLs](columnediting#add-column-by-fetching-urls)</span> menu option.
A cell cannot store the output of parseHtml() unless you convert it with toString(): for example, `value.parseHtml().toString()`.
@ -297,10 +297,10 @@ When parseHtml() simplifies HTML, it can sometimes introduce errors. When closin
You can then extract or [select()](#selects-element) which portions of the HTML document you need for further splitting, partitioning, etc. An example of extracting all table rows from a div using parseHtml().select() together is described more in depth at [StrippingHTML](https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML).
###### parseXml(s)
###### parseXml(s) {#parsexmls}
Given a cell full of XML-formatted text, parseXml() returns a full XML document and adds any missing closing tags. You can then extract or [select()](#selects-element) which portions of the XML document you need for further splitting, partitioning, etc. It works the same way as parseHtml(), described above.
###### select(s, element)
###### select(s, element) {#selects-element}
Returns an array of all the desired elements from an HTML or XML document, if the element exists. Elements are identified using the [Jsoup selector syntax](https://jsoup.org/apidocs/org/jsoup/select/Selector.html). For example, `value.parseHtml().select("img.portrait")[0]` would return the entirety of the first “img” tag with the “portrait” class found in the parsed HTML inside `value`. Returns an empty array if no matching element is found. Use with toString() to capture the results in a cell. A tutorial of select() is shown in [StrippingHTML](https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML).
You can use select() more than once:
@ -309,69 +309,69 @@ You can use select() more than once:
value.parseHtml().select("div#content")[0].select("tr").toString()
```
###### htmlAttr(s, element)
###### htmlAttr(s, element) {#htmlattrs-element}
Returns a string from an attribute on an HTML element. Use it in conjunction with parseHtml() as in the following example: `value.parseHtml().select("a.email")[0].htmlAttr("href")` would retrieve the email address attached to a link with the “email” class.
###### xmlAttr(s, element)
###### xmlAttr(s, element) {#xmlattrs-element}
Returns a string from an attribute on an XML element. It works the same way as htmlAttr(), described above. Use it in conjunction with parseXml().
###### htmlText(element)
###### htmlText(element) {#htmltextelement}
Returns a string of the text from within an HTML element (including all child elements), removing HTML tags and line breaks inside the string. Use it in conjunction with parseHtml() and select() to provide an element, as in the following example: `value.parseHtml().select("div.footer")[0].htmlText()`.
###### xmlText(element)
###### xmlText(element) {#xmltextelement}
Returns a string of the text from within an XML element (including all child elements). It works the same way as htmlText(), described above. Use it in conjunction with parseXml() and select() to provide an element.
###### innerHtml(element)
###### innerHtml(element) {#innerhtmlelement}
Returns the [inner HTML](https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML) of an HTML element. This will include text and children elements within the element selected. Use it in conjunction with parseHtml() and select() to provide an element.
###### innerXml(element)
###### innerXml(element) {#innerxmlelement}
Returns the inner XML elements of an XML element. Does not return the text directly inside your chosen XML element - only the contents of its children. To select the direct text, use ownText(). To select both, use xmlText(). Use it in conjunction with parseXml() and select() to provide an element.
###### ownText(element)
###### ownText(element) {#owntextelement}
Returns the text directly inside the selected XML or HTML element only, ignoring text inside children elements (for this, use innerXml()). Use it in conjunction with a parser and select() to provide an element.
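The difference between an element's own text, its full text, and its attributes can be illustrated with Python's built-in XML parser. This is an analogy only: OpenRefine uses Jsoup rather than `xml.etree.ElementTree`, and the markup and email address below are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet: a footer with direct text and one child link.
doc = ET.fromstring(
    '<div class="footer">Contact <a class="email" href="mailto:info@example.org">us</a></div>'
)

# Text directly inside the element, before its first child (compare ownText()).
own_text = doc.text

# Text of the element and all of its children (compare htmlText() / xmlText()).
all_text = "".join(doc.itertext())

# Attribute value on the child element (compare htmlAttr() / xmlAttr()).
href = doc.find("a").get("href")

print(repr(own_text), repr(all_text), href)
```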
## Array functions
## Array functions {#array-functions}
###### length(a)
###### length(a) {#lengtha}
Returns the size of an array, meaning the number of objects inside it. Arrays can be empty, in which case length() will return 0.
###### slice(a, n from, n to (optional))
###### slice(a, n from, n to (optional)) {#slicea-n-from-n-to-optional}
Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0. If the to value is omitted, it is understood to be the end of the array. For example, `[0, 1, 2, 3, 4].slice(1, 3)` returns [ 1, 2 ], and `[ 0, 1, 2, 3, 4].slice(2)` returns [ 2, 3, 4 ]. Also works with strings; see [String functions](#slices-n-from-n-to-optional).
###### get(a, n from, n to (optional))
###### get(a, n from, n to (optional)) {#geta-n-from-n-to-optional}
Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0.
If the to value is omitted, only one array item is returned, as a string, instead of a sub-array. To return a sub-array from one index to the end, you can set the to argument to a very high number such as `value.get(2,999)` or you can use something like `with(value,a,a.get(1,a.length()))` to count the length of each array.
Also works with strings; see [String functions](#gets-n-from-n-to-optional).
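The indexing rules for slice() and get() resemble Python list slicing, which can serve as a mental model. The helpers below are hypothetical names for illustration and only cover non-negative indexes; they are not OpenRefine's implementation.

```python
def slice_sketch(a, start, end=None):
    """Sub-array from start up to, but excluding, end (end of array if omitted)."""
    return a[start:end]


def get_sketch(a, start, end=None):
    """With one index, return a single item; with two, return a sub-array."""
    if end is None:
        return a[start]
    return a[start:end]


numbers = [0, 1, 2, 3, 4]
print(slice_sketch(numbers, 1, 3))   # [1, 2]
print(slice_sketch(numbers, 2))      # [2, 3, 4]
print(get_sketch(numbers, 2))        # 2
print(get_sketch(numbers, 2, 999))   # [2, 3, 4]
```

The last call mirrors the “very high number” trick described above: an end index beyond the array simply stops at the last item.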
###### inArray(a, s)
###### inArray(a, s) {#inarraya-s}
Returns true if the array contains the desired string, and false otherwise. Will not convert data types; for example, `[ 1, 2, 3, 4 ].inArray("3")` will return false.
###### reverse(a)
###### reverse(a) {#reversea}
Reverses the array. For example, `[ 0, 1, 2, 3].reverse()` returns the array [ 3, 2, 1, 0 ].
###### sort(a)
###### sort(a) {#sorta}
Sorts the array in ascending order. Sorting is case-sensitive, uppercase first and lowercase second. For example, `[ "al", "Joe", "Bob", "jim" ].sort()` returns the array [ "Bob", "Joe", "al", "jim" ].
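The “uppercase first” ordering follows from comparing character codes, where capital letters come before lowercase ones. Python's default string sort behaves the same way, so it can be used to preview the result (a comparison only, not OpenRefine's code):

```python
names = ["al", "Joe", "Bob", "jim"]

# Default string comparison is by code point, so "B" and "J" sort before "a" and "j".
sorted_names = sorted(names)

print(sorted_names)          # ['Bob', 'Joe', 'al', 'jim']
print(ord("J") < ord("a"))   # True
```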
###### sum(a)
###### sum(a) {#suma}
Returns the sum of the numbers in the array. For example, `[ 2, 1, 0, 3 ].sum()` returns 6.
###### join(a, sep)
###### join(a, sep) {#joina-sep}
Joins the items in the array with sep, and returns it all as a string. For example, `[ "and", "or", "not" ].join("/")` returns the string “and/or/not”.
###### uniques(a)
###### uniques(a) {#uniquesa}
Returns the array with duplicates removed. Case-sensitive. For example, `[ "al", "Joe", "Bob", "Joe", "Al", "Bob" ].uniques()` returns the array [ "Joe", "al", "Al", "Bob" ].
As of OpenRefine 3.4.1, uniques() reorders the array items it returns; in 3.4 beta 644 and onwards, it preserves the original order (in this case, [ "al", "Joe", "Bob", "Al" ]).
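The order-preserving behaviour can be pictured with a small Python sketch: keep the first occurrence of each item and skip later repeats. This is an illustration of the described result, not OpenRefine's implementation.

```python
names = ["al", "Joe", "Bob", "Joe", "Al", "Bob"]

# dict.fromkeys keeps the first occurrence of each key in insertion order.
# The comparison is case-sensitive, so "al" and "Al" are different items.
unique_in_order = list(dict.fromkeys(names))

print(unique_in_order)  # ['al', 'Joe', 'Bob', 'Al']
```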
## Date functions
## Date functions {#date-functions}
###### now()
###### now() {#now}
Returns the current time according to your system clock, in the [ISO 8601 extended format](exploring#data-types) (converted to UTC). For example, 10:53am (and 00 seconds) on November 26th 2020 in EST returns [date 2020-11-26T15:53:00Z].
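The conversion in that example can be checked with a few lines of Python. This sketch assumes EST is a fixed offset of five hours behind UTC and only reproduces the arithmetic, not OpenRefine's date handling.

```python
from datetime import datetime, timedelta, timezone

# 10:53:00 on 26 November 2020 in EST (assumed fixed offset UTC-5).
est = timezone(timedelta(hours=-5))
local_time = datetime(2020, 11, 26, 10, 53, 0, tzinfo=est)

# Convert to UTC and format in the ISO 8601 extended style.
utc_time = local_time.astimezone(timezone.utc)
iso_string = utc_time.strftime("%Y-%m-%dT%H:%M:%SZ")

print(iso_string)  # 2020-11-26T15:53:00Z
```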
###### toDate(o, b monthFirst, s format1, s format2, ...)
###### toDate(o, b monthFirst, s format1, s format2, ...) {#todateo-b-monthfirst-s-format1-s-format2-}
Returns the inputted object converted to a date object. Without arguments, it returns the ISO 8601 extended format. With arguments, you can control the output format:
* monthFirst: set false if the date is formatted with the day before the month.
@ -405,17 +405,17 @@ For example, you can parse a column containing dates in different formats, such
| Z | Time zone | RFC 822 time zone | \-0800 |
| X | Time zone | ISO 8601 time zone | \-08; -0800; -08:00 |
###### diff(d1, d2, s timeUnit)
###### diff(d1, d2, s timeUnit) {#diffd1-d2-s-timeunit}
Given two dates, returns a number indicating the difference in a given time unit (see the table below). For example, `diff(("Nov-11".toDate('MMM-yy')), ("Nov-09".toDate('MMM-yy')), "weeks")` will return 104, for 104 weeks, or two years. The later date should go first. If the output is negative, invert d1 and d2.
Also works with strings; see [diff() in string functions](#diffsd1-sd2-s-timeunit-optional).
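The figure of 104 weeks can be verified by hand or with a short Python sketch. It assumes that “Nov-11” and “Nov-09” parse to the first day of November 2011 and 2009, and that only whole weeks are counted; `diff_in_weeks` is a hypothetical helper, not a GREL function.

```python
from datetime import date


def diff_in_weeks(d1, d2):
    """Whole weeks from d2 to d1, truncated toward zero (illustrative)."""
    return int((d1 - d2).days / 7)


later = date(2011, 11, 1)
earlier = date(2009, 11, 1)

print((later - earlier).days)         # 730
print(diff_in_weeks(later, earlier))  # 104
```

Swapping the two arguments in this sketch gives a negative number, which is the signal to invert d1 and d2.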
###### inc(d, n, s timeUnit)
###### inc(d, n, s timeUnit) {#incd-n-s-timeunit}
Returns a date changed by the given amount in the given unit of time (see the table below). The default unit is “hour”. A positive value increases the date, and a negative value moves it back in time. For example, if you want to move a date backwards by two months, use `value.inc(-2,"month")`.
###### datePart(d, s timeUnit)
###### datePart(d, s timeUnit) {#datepartd-s-timeunit}
Returns part of a date. The data type returned depends on the unit (see the table below).
@ -448,7 +448,7 @@ OpenRefine supports the following values for timeUnit:
| nanos | Nanoseconds | Number | value.datePart("n") → 789000 |
| time | Milliseconds between input and the [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time) | Number | value.datePart("time") → 1394775004000 |
## Math functions
## Math functions {#math-functions}
For integer division and precision, you can use simple evaluations such as `1 / 2`, which is equivalent to `floor(1/2)` - that is, it returns only whole number results. If either operand is a floating point number, they both get promoted to floating point and a floating point result is returned. You can use `1 / 2.0` or `1.0 / 2` or `1.0 * x / y` (if you're working with variables of unknown contents).
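The promotion rule can be mimicked in Python to make the difference concrete. This is a sketch of the rule as described above, using a hypothetical `divide_sketch` helper and non-negative operands only; it is not how GREL is implemented.

```python
def divide_sketch(a, b):
    """Whole-number result for two integers, floating point otherwise."""
    if isinstance(a, int) and isinstance(b, int):
        # Both operands are integers: keep only the whole-number part.
        return a // b
    # At least one operand is floating point: return a floating point result.
    return a / b


x, y = 3, 2
print(divide_sketch(1, 2))          # 0
print(divide_sketch(1, 2.0))        # 0.5
print(divide_sketch(1.0 * x, y))    # 1.5
```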
@ -493,12 +493,12 @@ Some of these math functions don't recognize integers when supplied as the first
|`tan(n)`|Returns the trigonometric tangent of an angle.|`tan(10)` returns 0.6483608274590866.|
|`tanh(n)`|Returns the hyperbolic tangent of a value.|`tanh(10)` returns 0.9999999958776927.|
## Other functions
## Other functions {#other-functions}
###### type(o)
###### type(o) {#typeo}
Returns a string with the data type of o, such as undefined, string, number, boolean, etc. For example, a [Transform](cellediting#transform) operation using `value.type()` will convert all cells in a column to strings of their data types.
###### facetCount(choiceValue, s facetExpression, s columnName)
###### facetCount(choiceValue, s facetExpression, s columnName) {#facetcountchoicevalue-s-facetexpression-s-columnname}
Returns the facet count corresponding to the given choice value, by looking for the facetExpression in the choiceValue in columnName. For example, to create facet counts for the following table, we could generate a new column based on “Gift” and enter in `value.facetCount("value", "Gift")`. This would add the column we've named “Count”:
| Gift | Recipient | Price | Count |
@ -510,13 +510,13 @@ Returns the facet count corresponding to the given choice value, by looking for
The facet expression, wrapped in quotes, can be useful to manipulate the inputted values before counting. For example, you could do a textual cleanup using fingerprint(): `(value.fingerprint()).facetCount("value.fingerprint()","Gift")`.
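Conceptually, a facet count is the number of rows in the column that share a given value. The Python sketch below shows that idea with made-up gift names; it is an analogy for what facetCount() returns for each row, not OpenRefine's implementation.

```python
from collections import Counter

# Hypothetical contents of a "Gift" column.
gifts = ["lamp", "clock", "lamp", "watch", "clock", "lamp"]

# Count how often each value occurs in the column.
counts = Counter(gifts)

# For every row, look up the count of that row's own value.
count_column = [counts[gift] for gift in gifts]

print(count_column)  # [3, 2, 3, 1, 2, 3]
```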
###### hasField(o, s name)
###### hasField(o, s name) {#hasfieldo-s-name}
Returns a boolean indicating whether o has a member field called [name](expressions#variables). For example, `cell.recon.hasField("match")` will return false if a reconciliation match hasn’t been selected yet, or true if it has. You cannot chain your desired fields: for example, `cell.hasField("recon.match")` will return false even if the above expression returns true.
###### coalesce(o1, o2, o3, ...)
###### coalesce(o1, o2, o3, ...) {#coalesceo1-o2-o3-}
Returns the first non-null from a series of objects. For example, `coalesce(value, "")` would return an empty string “” if `value` was null, but otherwise return `value`.
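The same “first non-null” logic looks like this in Python, with `None` standing in for null. `coalesce_sketch` is a hypothetical name used for illustration only.

```python
def coalesce_sketch(*objects):
    """Return the first argument that is not None, or None if all are."""
    for obj in objects:
        if obj is not None:
            return obj
    return None


print(repr(coalesce_sketch(None, "")))      # ''
print(repr(coalesce_sketch("Paris", "")))   # 'Paris'
```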
###### cross(cell, s projectName, s columnName)
###### cross(cell, s projectName, s columnName) {#crosscell-s-projectname-s-columnname}
Returns an array of zero or more rows in the project projectName for which the cells in their column columnName have the same content as the cell in your chosen column. For example, if two projects contained matching names, and you wanted to pull addresses for people by their names from a project called “People” you would apply the following expression to your column of names:
```
cell.cross("People","Name").cells["Address"].value[0]

@ -4,17 +4,17 @@ title: Installing OpenRefine
sidebar_label: Installing
---
## System requirements
## System requirements {#system-requirements}
OpenRefine does not require internet access to run its basic functions. Once you download and install it, it runs as a small web server on your own computer, and you access that local web server by using your browser. It only requires an internet connection to import data from the web, reconcile data using a web service, or export data to the web.
OpenRefine requires three things on your computer in order to function:
#### Compatible operating system
#### Compatible operating system {#compatible-operating-system}
OpenRefine is designed to work with **Windows**, **Mac**, and **Linux** operating systems. [Our team releases packages for each](https://openrefine.org/download.html).
#### Java
#### Java {#java}
[Java](https://java.com/en/download/) must be installed and configured on your computer to run OpenRefine. The Mac version of OpenRefine includes Java; new in OpenRefine 3.4, there is also a Windows package with Java included.
@ -22,7 +22,7 @@ If you install and start OpenRefine on a Windows computer without Java, it will
We recommend you [download](https://java.com/en/download/) and install Java before proceeding with the OpenRefine installation.
#### Compatible browser
#### Compatible browser {#compatible-browser}
OpenRefine works best on browsers based on Webkit, such as:
@ -33,13 +33,13 @@ OpenRefine works best on browsers based on Webkit, such as:
We are aware of some minor rendering and performance issues on other browsers such as Firefox. We don't support Internet Explorer. If you are having issues running OpenRefine, see the [section on Running](running.md#troubleshooting).
### Release versions
### Release versions {#release-versions}
OpenRefine always has a [latest stable release](https://github.com/OpenRefine/OpenRefine/releases/latest), as well as some more recent developments available in beta, release candidate, or [snapshot releases](https://github.com/OpenRefine/OpenRefine-snapshot-releases/releases). If you are installing for the first time, we recommend [the latest stable release](https://github.com/OpenRefine/OpenRefine/releases/latest).
If you wish to use an extension that is only compatible with an earlier version of OpenRefine, and do not require the latest features, you may find that [an older stable version is best for you](https://github.com/OpenRefine/OpenRefine/releases) in our list of releases. Look at later releases to see which security vulnerabilities are being fixed, in order to assess your own risk tolerance for using earlier versions. Look for “final release” versions instead of “beta” or “release candidate” versions.
#### Unstable versions
#### Unstable versions {#unstable-versions}
If you need a recently developed function, and are willing to risk some untested code, you can look at [the most recent items in the list](https://github.com/OpenRefine/OpenRefine/releases) and see what changes appeal to you.
@ -47,7 +47,7 @@ If you need a recently developed function, and are willing to risk some untested
For the absolute latest development updates, see the [snapshot releases](https://github.com/OpenRefine/OpenRefine-snapshot-releases/releases). These are created with every commit.
#### Whats changed
#### Whats changed {#whats-changed}
Our [latest version is OpenRefine 3.4.1](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1), released September 24th 2020. The major changes in this version are listed on the [3.4.1 release page](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1) with the downloadable packages.
@ -57,8 +57,8 @@ You can find information about all OpenRefine versions on the [Releases page on
OpenRefine may also work in other environments, such as [Chromebooks](https://gist.github.com/organisciak/3e12e5138e44a2fed75240f4a4985b4f) where Linux terminals are available. Look at our list of [Other Distributions on the Downloads page](https://openrefine.org/download.html) for other ways of running OpenRefine, and refer to our contributor community to see new environments in development.
:::
## Installing or upgrading
### Back up your data
## Installing or upgrading {#installing-or-upgrading}
### Back up your data {#back-up-your-data}
If you are upgrading from an older version of OpenRefine and have projects already on your computer, you should create backups of those projects before you install a new version.
@ -70,7 +70,7 @@ For extra security you can [export your existing OpenRefine projects](exporting#
Take note of the [extensions](#installing-extensions) you have currently installed. They may not be compatible with the upgraded version of OpenRefine. Extensions can be installed in two places, so be sure to check both your workspace directory and the existing installation directory.
:::
### Install or upgrade OpenRefine
### Install or upgrade OpenRefine {#install-or-upgrade-openrefine}
If you are upgrading an existing OpenRefine installation, you can delete the old program files and install the new files into the same space. Do not simply overwrite the old files, as some obsolete files may be left behind.
@ -118,7 +118,7 @@ The long version:
[Homebrew](http://brew.sh) is a popular command-line package manager for Mac. Installing Homebrew is accomplished by pasting the installation command on the Homebrew website into a Terminal window. Once Homebrew is installed, applications like OpenRefine can be installed via a simple command. You can [install Homebrew from their website](http://brew.sh).
###### Install
###### Install {#install}
Install OpenRefine with this command:
@ -141,7 +141,7 @@ Behind the scenes, this command causes Homebrew to download the OpenRefine insta
If an existing `OpenRefine.app` is found in the Applications folder, Homebrew will not overwrite it, so installing via Homebrew requires either deleting or renaming previously installed copies.
###### Uninstall
###### Uninstall {#uninstall}
To uninstall OpenRefine, paste this command into the Terminal:
@ -155,7 +155,7 @@ You should see output like this:
==> Removing App '/Applications/OpenRefine.app'.
```
###### Update
###### Update {#update}
To update to the latest version of OpenRefine, paste this command into the Terminal:
@ -197,7 +197,7 @@ tar xzf openrefine-linux-3.4.tar.gz
---
### Set where data is stored
### Set where data is stored {#set-where-data-is-stored}
OpenRefine stores data in two places: program files in the program directory, wherever it is youve installed it; and project files in what we call the “workspace directory.” You can access this folder easily from OpenRefine by going to the [home screen](running#the-home-screen) (at [http://127.0.0.1:3333/](http://127.0.0.1:3333/)) and clicking <span class="buttonLabels">Browse workspace directory</span>.
@ -276,7 +276,7 @@ You can change this when you run OpenRefine from the terminal, by pointing to th
---
### Logs
### Logs {#logs}
OpenRefine does not currently output an error log, but because the OpenRefine console window is always open (on Linux and Windows) while OpenRefine runs in your browser, you can copy information from the console if an error occurs.
@ -284,7 +284,7 @@ Using a Mac, you can [run OpenRefine using the terminal](running#starting-and-ex
---
## Increasing memory allocation
## Increasing memory allocation {#increasing-memory-allocation}
OpenRefine relies on having computer memory available to it to work effectively. If you are planning to work with large datasets, you may wish to set up OpenRefine to handle it at the outset. By “large” we generally mean one of the following indicators:
* more than one million total cells
@ -313,7 +313,7 @@ If your project is big enough to need more than the default amount of memory, co
<TabItem value="win">
#### Using openrefine.exe
#### Using openrefine.exe {#using-openrefineexe}
If you run `openrefine.exe`, you will need to edit the `openrefine.l4j.ini` file found in the program directory and edit the line
@ -328,7 +328,7 @@ The line “-Xmx1024M” defines the amount of memory available in megabytes. Ch
Once you increase the memory allocation, you may find that you cannot run `openrefine.exe`. In this case, your computer needs a 64-bit version of [Java](https://www.java.com/en/download/help/index_installing.xml) (this is different from [Java JDK](#install-or-upgrade-java)). Look for the “Windows Offline (64-bit)” download on the Downloads page and install that. Your system must also be set to use the 64-bit version of Java by [changing the Java configuration](https://www.java.com/en/download/help/update_runtime_settings.xml).
:::
#### Using refine.bat
#### Using refine.bat {#using-refinebat}
On Windows, OpenRefine can also be run by using the file `refine.bat` in the program directory. If you start OpenRefine using `refine.bat`, the memory available to OpenRefine can be specified either through command line options, or through the `refine.ini` file.
@ -364,7 +364,7 @@ If you have downloaded the `.dmg` package and you start OpenRefine by double-cli
If you have downloaded the `.tar.gz` package and you start OpenRefine from the command line, add the “-m xxxxM” parameter like this:
`./refine -m 2048m`
#### Setting a default
#### Setting a default {#setting-a-default}
If you don't want to set this option on the command line each time, you can also set it in the `refine.ini` file. Edit the line
@ -381,7 +381,7 @@ Make sure it is not commented out (that is, that the line doesn't start with a
---
## Installing extensions
## Installing extensions {#installing-extensions}
Extensions have been created by our contributor community to add functionality or provide convenient shortcuts for common uses of OpenRefine. [We list extensions we know about on our downloads page](https://openrefine.org/download.html).
@ -389,7 +389,7 @@ Extensions have been created by our contributor community to add functionality o
If youd like to create or modify an extension, [see our developer documentation here](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Developers). If youre having a problem, [use our downloads page](https://openrefine.org/download.html) to go to the extensions page and report the issue there.
:::
### Two ways to install extensions
### Two ways to install extensions {#two-ways-to-install-extensions}
You can [install extensions in one of two places](#set-where-data-is-stored):
@ -398,7 +398,7 @@ You can [install extensions in one of two places](#set-where-data-is-stored):
We provide these options because you may wish to reinstall a given extension manually each time you upgrade OpenRefine, in order to be sure it works properly.
### Find the right place to install
### Find the right place to install {#find-the-right-place-to-install}
If you want to install the extension into the program folder, go to your program directory and then go to `webapp\extensions` (or create it if it does not exist).
@ -410,7 +410,7 @@ If you want to install the extension into your workspace, you can:
You can also [find your workspace on each operating system using these instructions](#set-where-data-is-stored).
### Install the extension
### Install the extension {#install-the-extension}
Some extensions have their own instructions: make sure you read the documentation before you begin installing.

@ -4,7 +4,7 @@ title: Jython & Clojure
sidebar_label: Jython & Clojure
---
## Jython
## Jython {#jython}
Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (`.py` or `.pyc`) are compatible.
@ -14,7 +14,7 @@ You will need to restart OpenRefine, so that new Jython or Python libraries are
OpenRefine now has [most of the Jsoup.org library built into GREL functions](grelfunctions#jsoup-xml-and-html-parsing-functions) for parsing and working with HTML and XML elements.
### Syntax
### Syntax {#syntax}
Expressions in Jython must have a `return` statement:
@ -47,13 +47,13 @@ To return the lower case of `value` (if the value is not null):
return None
```
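If you want to try the logic of a Jython expression outside OpenRefine, one option is to wrap it in an ordinary Python function. The sketch below assumes the expression behaves like the body of a function that receives `value`; the name `transform` is made up for the example.

```python
def transform(value):
    # Body of the expression as it would be typed into the expression window.
    if value is not None:
        return value.lower()
    else:
        return None


print(transform("OpenRefine"))  # openrefine
print(transform(None))          # None
```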
### Tutorials
### Tutorials {#tutorials}
- [Extending Jython with pypi modules](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules)
- [Working with phone numbers using Java libraries inside Python](https://github.com/OpenRefine/OpenRefine/wiki/Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python)
Full documentation on the Jython language can be found on its official site: [http://www.jython.org](http://www.jython.org).
## Clojure
## Clojure {#clojure}
Clojure 1.10.1 comes bundled with the default installation of OpenRefine 3.4.1. At this time, not all [variables](expressions#variables) can be used with Clojure expressions: only `value`, `row`, `rowIndex`, `cell`, and `cells` are available.

@ -4,7 +4,7 @@ title: Reconciling
sidebar_label: Reconciling
---
## Overview
## Overview {#overview}
Reconciliation is the process of matching your dataset with that of an external source. Datasets for comparison might be produced by libraries, archives, museums, academic organizations, scientific institutions, non-profits, or interest groups. You can also reconcile against user-edited data on [Wikidata](wikidata), or reconcile against [a local dataset that you yourself supply](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services).
@ -23,7 +23,7 @@ Reconciliation is semi-automated: OpenRefine matches your cell values to the rec
We recommend planning your reconciliation operations as iterative: reconcile multiple times with different settings, and with different subgroups of your data.
:::
## Sources
## Sources {#sources}
Start with [this current list of reconcilable authorities](https://reconciliation-api.github.io/testbench/), which includes instructions for adding new services via Wikidata editing if you have one to add.
@ -41,7 +41,7 @@ Of particular note is [reconcile-csv](http://okfnlabs.org/reconcile-csv/) which
Similarly, you may choose to export some SPARQL output to a TSV to limit the scope of values you're reconciling against and/or for better performance.
## Getting started
## Getting started {#getting-started}
Choose a column to reconcile and use its dropdown menu to select <span class="menuItems">Reconcile</span><span class="menuItems">Start reconciling</span>. If you want to reconcile only some cells in that column, first use filters and facets to isolate them.
@ -77,7 +77,7 @@ For matched values (those appearing as dark blue links), the underlying cell val
For each cell, you can manually “Create new item,” which will take the cells original value and apply it, as though it is a match. This will not become a dark blue link, because at this time there is nothing to link to: it is a draft entity stored only in your project. You can use this feature to prepare these entries for eventual upload to an editable service such as [Wikidata](wikidata), but most services do not yet support this feature.
### Reconciliation facets
### Reconciliation facets {#reconciliation-facets}
Under <span class="menuItems">Reconcile</span><span class="menuItems">Facets</span> there are a number of reconciliation-specific faceting options. OpenRefine automatically creates two facets when you reconcile some cells.
@ -105,7 +105,7 @@ You can also look at each best candidates:
These facets are useful for doing successive reconciliation attempts, against different types, and with different supplementary information. The information represented by these facets is held in the cells themselves and can be called using the [reconciliation variables](expressions#reconciliation) available in expressions.
### Reconciliation actions
### Reconciliation actions {#reconciliation-actions}
You can use the <span class="menuItems">Reconcile</span><span class="menuItems">Actions</span> menu options to perform bulk changes (which will apply only to your currently viewed set of rows or records):
* <span class="menuItems">Match each cell to its best candidate</span> (by highest score)
@ -120,7 +120,7 @@ The other options available under <span class="menuItems">Reconcile</span> are:
* [<span class="menuItems">Use values as identifiers</span>](#reconciling-with-unique-identifiers) (if you are reconciling with unique identifiers instead of by doing string searches)
* [<span class="menuItems">Add entity identifiers column</span>](#add-entity-identifiers-column).
## Reconciling with unique identifiers
## Reconciling with unique identifiers {#reconciling-with-unique-identifiers}
Reconciliation services use unique identifiers for their entities. For example, the 14th Dalai Lama has the VIAF ID [38242123](https://viaf.org/viaf/38242123/) and the Wikidata ID [Q17293](https://www.wikidata.org/wiki/Q17293). You can supply these identifiers directly to your chosen reconciliation service in order to pull more data, but these strings will not be “reconciled” against the external dataset.
@ -132,7 +132,7 @@ You may get false positives, which you will need to hover over or click on to id
![Hovering over an error.](/img/reconcileIDerror.png)
## Reconciling by type
## Reconciling by type {#reconciling-by-type}
Reconciliation services, once added to OpenRefine, may suggest types from their databases. These types will usually be whatever the service specializes in: people, events, places, buildings, tools, plants, animals, organizations, etc.
@ -150,7 +150,7 @@ If your column doesnt fit one specific type offered, you can <span class="fie
We recommend working in batches and reconciling against different types, moving from specific to broad. You can create a <span class="menuItems">Best candidate’s types</span> facet to see which types are being represented. Some candidates may return more than one type, depending on the service. Types may appear in facets by their unique IDs, rather than by their semantic labels (for example, Q5 for “human” in Wikidata).
## Reconciling with additional columns
## Reconciling with additional columns {#reconciling-with-additional-columns}
Some of your cells may be ambiguous, in the sense that a string can point to more than one entity: there are dozens of places called “Paris” and many characters, people, and pieces of culture, too. Selecting non-geographic or more localized types can help narrow that down, but if your chosen service doesn't provide a useful type, you can include more properties that make it clear whether you're looking for Paris, France.
@ -164,7 +164,7 @@ Some services will not be able to search for the exact name of your desired <spa
![Including a birth-date type.](/img/reconcile-with-property.png)
## Fetching more data
## Fetching more data {#fetching-more-data}
One reason to reconcile to some external service is that it allows you to pull data from that service into your OpenRefine project. There are three ways to do this:
@ -172,11 +172,11 @@ One reason to reconcile to some external service is that it allows you to pull d
* Add columns from reconciled values
* Add column by fetching URLs.
### Add entity identifiers column
### Add entity identifiers column {#add-entity-identifiers-column}
Once you have selected matches for your cells, you can retrieve the unique identifiers for those cells and create a new column for these, with <span class="menuItems">Reconcile</span><span class="menuItems">Add entity identifiers column</span>. You will be asked to supply a column name. New items and other unmatched cells will generate null values in this column.
### Add columns from reconciled values
### Add columns from reconciled values {#add-columns-from-reconciled-values}
If the reconciliation service supports [data extension](https://reconciliation-api.github.io/testbench/), then you can augment your reconciled data with new columns using <span class="menuItems">Edit column</span><span class="menuItems">Add columns from reconciled values...</span>.
@ -194,7 +194,7 @@ If you have left any values unreconciled in your column, you will see “&lt;not
This process may pull more than one property per row in your data (such as multiple occupations), so you may need to switch into records mode after you've added columns.
### Add columns by fetching URLs
### Add columns by fetching URLs {#add-columns-by-fetching-urls}
If the reconciliation service cannot extend data, look for a generic web API for that data source, or a structured URL that points to their dataset entities via unique IDs (such as “https&#58;//viaf.org/viaf/000000”). You can use the <span class="menuItems">Edit column</span><span class="menuItems">[Add column by fetching URLs](columnediting#add-column-by-fetching-urls)</span> operation to call this API or URL with the IDs obtained from the reconciliation process. This will require using [expressions](expressions).
@ -206,7 +206,7 @@ Alternatively, you can insert the ID directly from the matched column's reconcil
Remember to set an appropriate throttle and to refer to the service documentation to ensure your compliance with their terms. See [the section about this operation](columnediting#add-column-by-fetching-urls) to learn more about the fetching process.
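
As a rough illustration of the idea above, the following Python sketch builds one URL per reconciled ID and pauses between them. It is not how OpenRefine itself fetches data: the base URL reuses the VIAF pattern mentioned earlier, and the function names and the one-second default delay are assumptions made for this example.

```python
import time
from typing import Iterable, Iterator, Optional


def build_entity_url(entity_id: Optional[str],
                     base: str = "https://viaf.org/viaf/") -> Optional[str]:
    """Return a URL for one reconciled ID, or None for unmatched (blank) cells."""
    if entity_id is None or not entity_id.strip():
        return None
    return base + entity_id.strip()


def throttled_urls(ids: Iterable[Optional[str]],
                   delay_seconds: float = 1.0) -> Iterator[Optional[str]]:
    """Yield one URL per ID, pausing before each non-empty one.

    The delay stands in for the throttle setting; a caller would fetch
    each yielded URL and should follow the service's own terms of use.
    """
    for entity_id in ids:
        url = build_entity_url(entity_id)
        if url is not None:
            time.sleep(delay_seconds)
        yield url
```

Blank IDs produce `None` rather than a malformed URL, which mirrors the way unmatched cells have no identifier to send.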
## Keep all the suggestions made
## Keep all the suggestions made {#keep-all-the-suggestions-made}
To generate a list of each suggestion made, rather than only the best candidate, you can use a [GREL expression](expressions#GREL). Go to <span class="menuItems">Edit column</span><span class="menuItems">Add column based on this column</span>. To create a list of all the possible matches, use something like
@ -222,7 +222,7 @@ forEach(cell.recon.candidates,c,c.id).join(", ")
This information is stored as a string, without any attached reconciliation information.
## Writing reconciliation expressions
## Writing reconciliation expressions {#writing-reconciliation-expressions}
OpenRefine supplies a number of variables related specifically to reconciled values. These can be used in GREL and Jython expressions. For example, some of the reconciliation variables are:
@ -235,7 +235,7 @@ OpenRefine supplies a number of variables related specifically to reconciled val
You can find out more in the [reconciliation variables](expressions#reconciliaton-variables) section.
## Exporting reconciled data
## Exporting reconciled data {#exporting-reconciled-data}
Once you have data that is reconciled to existing entities online, you may wish to export that data to a user-editable service such as Wikidata. See the section on [uploading your edits to Wikidata](wikidata#upload-edits-to-wikidata) for more information, or the section on [exporting](exporting) to see other formats OpenRefine can produce.

@ -4,7 +4,7 @@ title: Running OpenRefine
sidebar_label: Running
---
## Starting and exiting
## Starting and exiting {#starting-and-exiting}
OpenRefine does not require internet access to run its basic functions. Once you download and install it, it runs as a small web server on your own computer, and you access that local web server by using your browser.
@ -37,18 +37,18 @@ import TabItem from '@theme/TabItem';
<TabItem value="win">
#### With openrefine.exe
#### With openrefine.exe {#with-openrefineexe}
You can run OpenRefine by double-clicking `openrefine.exe` or calling it from the command line.
If you want to [modify the way `openrefine.exe` opens](#starting-with-modifications), you can edit the `openrefine.l4j.ini` file.
#### With refine.bat
#### With refine.bat {#with-refinebat}
On Windows, OpenRefine can also be run by using the file `refine.bat` in the program directory. If you start OpenRefine using `refine.bat`, you can do so by opening the file itself, or by calling it from the command line.
If you call `refine.bat` from the command line, you can [start OpenRefine with modifications](#starting-with-modifications).
If you want to modify the way `refine.bat` opens through double-clicking or using a shortcut, you can edit the `refine.ini` file.
#### Exiting
#### Exiting {#exiting}
To exit OpenRefine, close all the browser tabs or windows, then navigate to the command line window. To close this window and ensure OpenRefine exits properly, hold down `Control` and press `C` on your keyboard. This will save any last changes to your projects.
@ -98,11 +98,11 @@ If you see this error, you need to [install and configure a JDK package](install
---
### Troubleshooting
### Troubleshooting {#troubleshooting}
If you are having problems connecting to OpenRefine with your browser, [check our Wiki for information about browser settings and operating-system issues](https://github.com/OpenRefine/OpenRefine/wiki/FAQ#i-am-having-trouble-connecting-to-openrefine-with-my-browser).
### Starting with modifications
### Starting with modifications {#starting-with-modifications}
When you run OpenRefine from a command line, you can change a number of default settings.
@ -166,7 +166,7 @@ To see the full list of command-line options, run `./refine -h`.
---
#### Modifications set within files
#### Modifications set within files {#modifications-set-within-files}
On Windows, you can modify the way `openrefine.exe` runs by editing `openrefine.l4j.ini`; you can modify the way `refine.bat` runs by editing `refine.ini`.
@ -192,7 +192,7 @@ REFINE_MIN_MEMORY=1400M
...
```
##### JVM preferences
##### JVM preferences {#jvm-preferences}
Further modifications can be performed by using JVM preferences. These JVM preferences are different options and have different syntax than the key/value descriptions used on the command line.
@ -291,13 +291,13 @@ JAVA_OPTIONS=-Drefine.data_dir=usr/lib/OpenRefineWorkspace
Refer to the [official Java documentation](https://docs.oracle.com/javase/8/docs/technotes/tools/windows/java.html) for more preferences that can be set.
## The home screen
## The home screen {#the-home-screen}
When you first launch OpenRefine, you will see a screen with a menu on the left hand side that includes <span class="menuItems">Create Project</span>, <span class="menuItems">Open Project</span>, <span class="menuItems">Import Project</span>, and <span class="menuItems">Language Settings</span>. This is called the “home screen,” where you can manage your projects and general settings.
In the lower left-hand corner of the screen, you'll see <span class="menuItems">Preferences</span>, <span class="menuItems">Help</span>, and <span class="menuItems">About</span>.
### Language settings
### Language settings {#language-settings}
From the home screen, look in the options to the left for <span class="menuItems">Language Settings</span>. You can set your preferred interface language here. This language setting will persist until you change it again in the future. Languages are translated as a community effort; some languages are partially complete and default back to English where unfinished. Currently OpenRefine supports the following languages for 75% or more of the interface:
@ -322,7 +322,7 @@ To leave the Language Settings screen, click on the diamond “OpenRefine” log
We use Weblate to provide translations for the interface. You can check [our profile on Weblate](https://hosted.weblate.org/projects/openrefine/translations/) to see which languages are in the process of being supported. See [our technical reference if you are interested in contributing translation work](https://docs.openrefine.org/technical-reference/translating) to make OpenRefine accessible to people in other languages.
:::
### Preferences
### Preferences {#preferences}
In the bottom left corner of the screen, look for <span class="menuItems">Preferences</span>. At this time you can set preferences using a key/value pair: that is, selecting one of the keys below and setting a value for it.
@ -340,13 +340,13 @@ To leave the Preferences screen, click on the diamond “OpenRefine” logo.
If the preference you’re looking for isn’t here, look at the options you can set from the [command line or in an `.ini` file](#starting-with-modifications).
## The project screen
## The project screen {#the-project-screen}
The project screen (or work screen) is where you will spend most of your time once you have [begun to work on a project](starting). This is a quick walkthrough of the parts of the interface you should familiarize yourself with.
![A screenshot of the project screen.](/img/projectscreen.png)
### The project bar
### The project bar {#the-project-bar}
The project bar runs across the very top of the project screen. It contains the OpenRefine logo, the project title, and the project control buttons on the right side.
@ -366,7 +366,7 @@ The <span class="menuItems">Open…</span> button will open up a new browser tab
<span class="menuItems">Help</span> will open up a new browser tab and bring you to this user manual on the web.
### The grid header
### The grid header {#the-grid-header}
The grid header sits below the project bar and above the project grid (where the data of your project is displayed). The grid header will tell you the total number of rows or records in your project, and indicate whether you are in [rows or records mode](exploring#rows-vs-records).
@ -376,11 +376,11 @@ Directly below the row number, you have the ability to switch between [row mode
To the right of the rows/records selection is the array of options for how many rows/records to view on screen at one time. At the far right of the screen you can navigate through your entire dataset one page at a time.
### Extensions
### Extensions {#extensions}
The <span class="menuItems">Extensions</span> dropdown offers you options for extending your data - most commonly by uploading your edited statements to Wikidata, or by importing or exporting schema. You can learn more about these functions on the [Wikidata page](wikidata). Other extensions may also add functions to this dropdown menu.
### The grid
### The grid {#the-grid}
The area of the project screen that displays your dataset is called the “grid” (or the “data grid,” or the “project grid”). The grid presents data in a tabular format, which may look like a normal spreadsheet program to you.
@ -394,7 +394,7 @@ The project grid may display with both vertical and horizontal scrolling, depend
Mousing over individual cells will allow you to [edit cells individually](cellediting#edit-one-cell-at-a-time).
### Facet/Filter
### Facet/Filter {#facetfilter}
The <span class="tabLabels">Facet/Filter</span> tab is one of the main ways of exploring your data: displaying the patterns and trends in your data, and helping you narrow your focus and modify that data. [Facets](facets) and [filters](facets#text-filter) are explained more in [Exploring data](exploring).
@ -410,7 +410,7 @@ Removing your facets will clear out the sidebar entirely. If you have written cu
You can preserve your facets and filters for future use by copying a <span class="menuItems">[Permalink](#the-project-bar)</span>.
### History (Undo/Redo)
### History (Undo/Redo) {#history-undoredo}
In OpenRefine, any activity that changes the data can be undone. Changes are tracked from the very beginning, when a project is first created. The change history of each project is saved with the project's data, so quitting OpenRefine does not erase the steps you've taken. When you restart OpenRefine, you can view and undo changes that you made before you quit OpenRefine. OpenRefine [autosaves](starting#autosaving) your actions every five minutes by default, and when you close OpenRefine properly (using Ctrl + C). You can [change this interval](running#jvm-preferences).
@ -437,7 +437,7 @@ If you have moved back one or more states, and then you perform a new operation
The Undo/Redo tab will indicate which step you’re on, and whether you’re about to risk erasing work, by showing something like “4/5” or “1/7” at the end.
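
The behaviour described above can be pictured as a list of steps plus a position marker. The Python sketch below is only an illustration of that idea, not OpenRefine's actual implementation; the `History` class and its method names are hypothetical.

```python
class History:
    """Toy model of an undo/redo list with a current position."""

    def __init__(self) -> None:
        self.steps: list[str] = []
        self.position = 0  # how many steps are currently applied

    def apply(self, description: str) -> None:
        """Perform a new operation; any steps after the current position are lost."""
        del self.steps[self.position:]
        self.steps.append(description)
        self.position = len(self.steps)

    def move_to(self, position: int) -> None:
        """Jump back or forward to a given step (0 means the original data)."""
        if not 0 <= position <= len(self.steps):
            raise ValueError("position is outside the recorded history")
        self.position = position

    def label(self) -> str:
        """Return the 'current/total' indicator, such as '4/5'."""
        return f"{self.position}/{len(self.steps)}"
```

In this model, moving back from step 5 to step 4 gives a label of `4/5`; applying a new operation at that point replaces the old fifth step, which is the kind of loss the indicator warns about.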
#### Reusing operations
#### Reusing operations {#reusing-operations}
Operations that you perform in OpenRefine can be reused. For example, a formula you wrote inside one project can be copied and applied to another project later.
@ -447,13 +447,13 @@ Move to the second project, go to the <span class="tabLabels">Undo/Redo</span> t
Not all operations can be extracted. Edits to a single cell, for example, can’t be replicated.
## Advanced OpenRefine uses
## Advanced OpenRefine uses {#advanced-openrefine-uses}
### Running OpenRefine's Linux version on a Mac
### Running OpenRefine's Linux version on a Mac {#running-openrefines-linux-version-on-a-mac}
You can run OpenRefine from the command line on a Mac by using the Linux installation package. We do not promise support for this method. Follow the instructions in the Linux section.
### Running as a server
### Running as a server {#running-as-a-server}
:::caution
Please note that if your machine has an external IP (is exposed to the Internet), you should not do this, or should protect it behind a proxy or firewall, such as nginx. Proceed at your own risk.
@ -488,7 +488,7 @@ On Mac, you can add a specific entry to the `Info.plist` file located within the
OpenRefine has no built-in security or version control for multi-user scenarios. OpenRefine has a single data model that is not shared, so there is a risk of data operations being overwritten by other users. Care must be taken by users.
:::
### Automating OpenRefine
### Automating OpenRefine {#automating-openrefine}
Some users may wish to employ OpenRefine for batch processing as part of a larger automated pipeline. Not all OpenRefine features can work without human supervision and advancement (such as clustering), but many data transformation tasks can be automated.

@ -4,7 +4,7 @@ title: Sort and view
sidebar_label: Sort and view
---
## Sort
## Sort {#sort}
You can temporarily sort your rows by one column. You can sort based on [data type](exploring#data-types):
* text alphabetically or reverse
@ -24,11 +24,11 @@ If you have multiple sorting methods applied, they will work in the order you ap
When the sorting method you've applied is temporary, you will see that the rows retain their original numbering. When you make that sorting method permanent, by selecting <span class="menuItems">Reorder rows permanently</span>, the row numbers will change and the <span class="menuItems">Sort</span> menu in the project grid header will disappear. This will apply all current sorting methods.
## View
## View {#view}
You can control what data you view in the grid. On each column, you will see a <span class="menuItems">View</span> menu option. From there, you can “collapse” (hide) that specific column, all other columns, all columns to the left, and all columns to the right. Using the <span class="menuItems">View</span> option that appears in the <span class="menuItems">All</span> columns dropdown menu, you can collapse all columns, and expand all the columns that you previously collapsed.
### Show/hide “null”
### Show/hide “null” {#showhide-null}
You can find, under <span class="menuItems">All</span><span class="menuItems">View</span>, the option to show and hide [“null” values](exploring#data-types). A small grey “null” will appear in each applicable cell. Remember that a null cell is not the same thing as an empty cell.

@ -4,7 +4,7 @@ title: Starting a project
sidebar_label: Starting a project
---
## Overview
## Overview {#overview}
An OpenRefine project is started by importing some existing data - OpenRefine doesn’t allow you to create a dataset from nothing.
@ -14,7 +14,7 @@ The data and all of your edits are [automatically saved](#autosaving) inside the
You can also receive and open other people’s projects, or send them yours, by [exporting a project archive](exporting#export-a-project) and [importing it](#import-a-project).
## Create a project by importing data
## Create a project by importing data {#create-a-project-by-importing-data}
When you start OpenRefine, you’ll be taken to the <span class="menuItems">Create Project</span> screen. You’ll see on the left side of the screen that your options are to:
@ -53,13 +53,13 @@ You cannot combine two datasets into one project by appending data within rows.
For whichever method you choose to start your project, when you click <span class="menuItems">Next >></span> you will be given a preview and a chance to configure the way OpenRefine interprets the data you input.
### Get data from this computer
### Get data from this computer {#get-data-from-this-computer}
Click on <span class="menuItems">Browse…</span> and select a file (or several) on your hard drive. All files will be shown, not just compatible ones.
If you import an archive file (something with the extension `.zip`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.gz`, or `.bz2`), OpenRefine detects the files inside it, shows you a preview screen, and allows you to select which ones to load. This does not work with `.rar` files.
### Web addresses (URLs)
### Web addresses (URLs) {#web-addresses-urls}
Type or paste the URL to a data file into the field provided. You can add as many fields as you want. OpenRefine will download the file and preview the project for you.
@ -67,7 +67,7 @@ If you supply two or more file URLs, OpenRefine will identify each one and ask y
Do not use this form to load a Google Sheet by its link; use [the Google Data form instead](#google-data).
### Clipboard
### Clipboard {#clipboard}
You can copy and paste in data from anywhere. OpenRefine will recognize comma-separated, tab-separated, or table-formatted information copied from sources such as word-processing documents, spreadsheets, and tables in PDFs. You can also just paste in a list of items that you want to turn into rows. OpenRefine recognizes each new text line as a row.
@ -75,7 +75,7 @@ This can be useful if you want to pre-select a specific number of rows from your
This can also be useful if you would like to paste in a list of URLs, which you can use later to [fetch more data](columnediting).
### Database (SQL)
### Database (SQL) {#database-sql}
If you are an administrator or have SQL access to a database of information, you may want to pull the latest dataset directly from there. This could include an online catalogue, a content management system, or a digital repository or collection management system. You can also load a database (`.db`) file saved locally. You will need to use an [SQL query](https://www.w3schools.com/sql/) to import your intended data.
@ -91,13 +91,13 @@ You can either connect just once to gather data, or save the connection to use i
If your connection is successful, you will see a Query Editor where you can run your SQL query. OpenRefine will give you an error if you write a statement that tries to modify the source database in any way.
### Google data
### Google data {#google-data}
You have two ways to load in data from Google Sheets:
* providing a link to an accessible Google Sheet (that is, one with link-sharing turned on), and
* selecting a Google Sheet in your Google Drive.
#### Google Sheet by URL
#### Google Sheet by URL {#google-sheet-by-url}
You can import data from any Google Sheet that has link-sharing turned on. Paste in a URL that looks something like
@ -107,7 +107,7 @@ https://docs.google.com/spreadsheets/………/edit?usp=sharing
This will only work with Sheets, not with any other Google Drive file that might have an available link, including `.xls` and other valid files that are hosted in Google Drive. These links will not work when attempting to start a project [by URL](#web-addresses-urls) either, so you need to download those files to your computer.
#### Google Sheet from Drive
#### Google Sheet from Drive {#google-sheet-from-drive}
You can authorize OpenRefine to access your Google Drive data and import data from any Google Sheet it finds there. This will include Sheets that belong to you and Sheets that are shared with you, as well as Sheets that are in your trash.
@ -120,7 +120,7 @@ OpenRefine will generate a list of all Sheets it finds, with the most recently m
When you click <span class="buttonLabels">Preview</span> the Sheet will open in a new browser tab. When you click the Sheet title, OpenRefine will begin to process the data.
## Project preview
## Project preview {#project-preview}
Once OpenRefine is ready to import the data, you will see a screen with <span class="menuItems">Configure Parsing Options</span> at the top. You’ll see a preview of the first 100 rows and all identified columns.
@ -141,7 +141,7 @@ Look for character encoding issues at this stage. You may want to manually selec
You should create a project name at this stage. You can also supply tags to keep your projects organized. When you’re happy with the preview, click <span class="buttonLabels">Create Project</span>.
## Import a project
## Import a project {#import-a-project}
Because OpenRefine only runs locally on your computer, you can’t have a project accessible to more than one person at the same time.
@ -163,17 +163,17 @@ Then, click <span class="buttonLabels">Import Project</span>. Your project shoul
OpenRefine will store the project in its own workspace directory, so you can now delete the original file that was sent to you.
## Project management
## Project management {#project-management}
You can access all of your created projects by clicking on <span class="menuItems">Open Project</span>. Your project list can be organized by modification date, title, row count, and other metadata you can supply (such as subject, description, tags, or creator). To edit the fields you see here, click <span class="menuItems">About</span> to the left of each project. There you can edit a number of available fields. You can also see the project ID that corresponds to the name of the folder in your work directory.
### Naming projects
### Naming projects {#naming-projects}
You may have multiple projects from the same dataset, or multiple versions from sharing a project with another person. OpenRefine automatically generates a project name from the imported file, or “clipboard” when you use <span class="menuItems">Clipboard</span> importing. Project names don’t have to be unique, and OpenRefine will create many projects with the same name unless you intervene.
You can edit a project's name when you create it or import it, and you can rename a project later by opening it and clicking on the project name at the top of the screen.
### Autosaving
### Autosaving {#autosaving}
OpenRefine [saves all of your actions](running#history-undoredo) (everything you can see in the <span class="tabLabels">Undo/Redo</span> panel). That includes flagging and starring rows.
@ -183,12 +183,12 @@ Autosaving happens by default every five minutes. You can [change this preferenc
You can only save and share facets and filters, not any other type of view. To save current facets and filters, click <span class="menuItems">Permalink</span>. The project will reload with a different URL, which you can then copy and save elsewhere. This permalink will save both the facets and filters you’ve set, and the settings for each one (such as sorting by count rather than by name).
### Deleting projects
### Deleting projects {#deleting-projects}
You can delete projects, which will erase the project files from the workspace directory on your computer. This is immediate and cannot be undone.
Go to <span class="menuItems">Open Project</span> and find the project you want to delete. Click on the <span class="menuItems">X</span> to the left of the project name. There will be a confirmation dialog.
### Project files
### Project files {#project-files}
You can find all of your raw project files in your work directory. They will be named according to the unique “Project ID” that OpenRefine has assigned them, which you can find on the <span class="menuItems">Open Project</span> screen, under the “About” link for each project.

@ -4,7 +4,7 @@ title: Transforming data
sidebar_label: Overview
---
## Overview
## Overview {#overview}
OpenRefine gives you powerful ways to clean, correct, codify, and extend your data. Without ever needing to type inside a single cell, you can automatically fix typos, convert things to the right format, and add structured categories from trusted sources.
@ -17,7 +17,7 @@ This section of ways to improve data are organized by their appearance in the me
* [add new columns](columnediting) based on existing data, with fetching new information, or through [reconciliation](reconciling)
* convert your rows of data into [multi-row records](exploring#rows-vs-records).
## Edit rows
## Edit rows {#edit-rows}
Moving rows around is a permanent change to your data.

@ -4,11 +4,11 @@ title: Transposing
sidebar_label: Transposing
---
## Overview
## Overview {#overview}
These functions were created to solve common problems with reshaping your data: pivoting cells from a row into a column, or pivoting cells from a column into a row. You can also transpose from a repeated set of values into multiple columns.
## Transpose cells across columns into rows
## Transpose cells across columns into rows {#transpose-cells-across-columns-into-rows}
Imagine personal data with addresses in this format:
@ -21,7 +21,7 @@ You can transpose the address information from this format into multiple rows. G
![A screenshot of the transpose across columns window.](/img/transpose1.png)
### One column
### One column {#one-column}
You can transpose the multiple address columns into a series of rows:
@ -51,7 +51,7 @@ You can choose one column and include the column-name information in each cell b
||Country: USA|
||Postal code: 19010|
### Two columns
### Two columns {#two-columns}
You can retain the column names as separate cell values, by selecting <span class="fieldLabels">Two new columns</span> and naming the key and value columns.
@ -67,7 +67,7 @@ You can retain the column names as separate cell values, by selecting <span clas
||Country|USA|
||Postal code|19010|
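
The two output shapes described above (one column with the column name prepended, or separate key and value columns) can be sketched in Python as follows. This is an illustration of the reshaping only, not OpenRefine's code; the function names are hypothetical, and skipping blank cells is an assumption made for this sketch.

```python
from typing import Dict, List, Tuple


def to_key_value_rows(row: Dict[str, str], columns: List[str]) -> List[Tuple[str, str]]:
    """Turn the selected columns of one row into (key, value) pairs, one per new row."""
    return [(name, row[name]) for name in columns if row.get(name)]


def to_single_column(row: Dict[str, str], columns: List[str]) -> List[str]:
    """Same reshaping, but with the column name prepended to each cell."""
    return [f"{key}: {value}" for key, value in to_key_value_rows(row, columns)]


# Example values taken from the tables above.
example_row = {"Country": "USA", "Postal code": "19010"}
selected = ["Country", "Postal code"]
print(to_key_value_rows(example_row, selected))
print(to_single_column(example_row, selected))
```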
## Transpose cells in rows into columns
## Transpose cells in rows into columns {#transpose-cells-in-rows-into-columns}
Imagine employee data in this format:
@ -107,7 +107,7 @@ value.replace("Employee: ", "")
If your dataset doesn't have a predictable number of cells per intended row, such that you cannot specify easily how many columns to create, try <span class="menuItems">Columnize by key/value columns</span>.
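
To make the "predictable number of cells per intended row" requirement concrete, here is a minimal Python sketch that cuts one column into rows of a fixed width. It only illustrates the idea; the function name is hypothetical and OpenRefine's own operation has more options than this.

```python
from typing import List


def rows_to_columns(cells: List[str], cells_per_row: int) -> List[List[str]]:
    """Cut a single column of cells into rows of `cells_per_row` columns each."""
    if cells_per_row < 1:
        raise ValueError("cells_per_row must be at least 1")
    return [
        cells[start:start + cells_per_row]
        for start in range(0, len(cells), cells_per_row)
    ]
```

If the column length is not a multiple of the chosen width, the last row simply comes out short, which is a sign that a key-based approach may suit the data better.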
## Columnize by key/value columns
## Columnize by key/value columns {#columnize-by-keyvalue-columns}
This operation can be used to reshape a dataset that contains key and value columns: the repeating strings in the key column become new column names, and the contents of the value column are moved to new columns. This operation can be found at <span class="menuItems">Transpose</span><span class="menuItems">Columnize by key/value columns</span>.
@ -131,7 +131,7 @@ In this format, each flower species is described by multiple attributes on conse
| Galanthus nivalis | White | 162168 |
| Narcissus cyclamineus | Yellow | 161899 |
### Entries with multiple values in the same column
### Entries with multiple values in the same column {#entries-with-multiple-values-in-the-same-column}
If a new row would have multiple values for a given key, then these values will be grouped on consecutive rows, to form a [record structure](exploring#rows-vs-records).
@ -157,7 +157,7 @@ This table is transformed by the Columnize operation to:
The first key encountered by the operation serves as the record key, so the “Green” value is attached to the “Galanthus nivalis” name. See the [Row order](#row-order) section for more details about the influence of row order on the results of the operation.
### Notes column
### Notes column {#notes-column}
In addition to the key and value columns, you can optionally add a column for notes. This can be used to store extra metadata associated to a key/value pair.
@ -181,7 +181,7 @@ If the “Source” column is selected as the notes column, this table is transf
Notes columns can therefore be used to preserve provenance or other context about a particular key/value pair.
### Row order
### Row order {#row-order}
The order in which the key/value pairs appear matters. The Columnize operation will use the first key it encounters as the delimiter for entries: every time it encounters this key again, it will produce a new row, and add the following key/value pairs to that row.
@ -207,7 +207,7 @@ The occurrences of the “Name” value in the “Field” column define the bou
This sensitivity to order is removed if there are extra columns: in that case, the first extra column will serve as the key for the new rows.
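
The delimiter behaviour described in this section can be sketched in Python as shown below. This is a simplified illustration, not OpenRefine's implementation: it ignores notes columns and extra columns, the function name is hypothetical, and the "Color" key in the usage note is an assumed column name.

```python
from typing import Dict, List, Tuple


def columnize(pairs: List[Tuple[str, str]]) -> List[Dict[str, List[str]]]:
    """Group key/value pairs into rows, starting a new row at each repeat of the first key.

    Values are kept in lists so that a key appearing several times within
    one entry keeps all of its values.
    """
    rows: List[Dict[str, List[str]]] = []
    if not pairs:
        return rows
    delimiter_key = pairs[0][0]  # the first key encountered marks entry boundaries
    for key, value in pairs:
        if key == delimiter_key:
            rows.append({})
        rows[-1].setdefault(key, []).append(value)
    return rows
```

With the pairs `("Name", "Galanthus nivalis")`, `("Color", "White")`, `("Name", "Narcissus cyclamineus")`, `("Color", "Yellow")`, the sketch returns two rows, one per name. If the same pairs started with a "Color" entry instead, "Color" would become the delimiter and the grouping would change, which is the order sensitivity this section describes.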
### Extra columns
### Extra columns {#extra-columns}
If your dataset contains extra columns, that are not being used as the key, value, or notes columns, they can be preserved by the operation. For this to work, they must have the same value in all old rows corresponding to a new row.

@ -4,15 +4,15 @@ title: Troubleshooting
sidebar_label: Troubleshooting
---
## Frequently asked questions
## Frequently asked questions {#frequently-asked-questions}
We collect and share FAQs and responses on Github at [https://github.com/OpenRefine/OpenRefine/wiki/FAQ](https://github.com/OpenRefine/OpenRefine/wiki/FAQ).
If you don’t find your problem and solution there, continue on to the resources in the Community section below to see more conversations and look for solutions.
## Community
## Community {#community}
### If you’re having a problem:
### If you’re having a problem: {#if-youre-having-a-problem}
* Search the [User forum](https://groups.google.com/g/openrefine) to see if the problem is already reported
* Search [Github issues](https://github.com/OpenRefine/OpenRefine/issues) to see if the problem is already reported
* Read [Stack Overflow](https://stackoverflow.com/questions/tagged/openrefine) to see if others had a similar problem
@ -21,7 +21,7 @@ If you dont find your problem and solution there, continue on to the resource
* First as a new thread (conversation) in the [User forum](https://groups.google.com/g/openrefine).
* Then, if you wish, you can create a Github issue.
### If you want to contribute:
### If you want to contribute: {#if-you-want-to-contribute}
* [Help us translate the tool into more languages](https://docs.openrefine.org/technical-reference/translating), using Weblate
* [We have a guide to contributing](technical-reference/contributing) in the [Technical Reference](technical-reference/technical-reference-index) section
* Contribute your feature requests in the [User forum](https://groups.google.com/g/openrefine) or as [Github issues](https://github.com/OpenRefine/OpenRefine/issues/new/choose)


@ -4,7 +4,7 @@ title: Wikidata
sidebar_label: Wikidata
---
## Overview
## Overview {#overview}
OpenRefine provides powerful ways to both pull data from Wikidata and add data to it.
@ -16,7 +16,7 @@ The best source for information about how OpenRefine works with Wikidata is [on
OpenRefine's connections to Wikidata were formerly an optional extension, but are now included automatically with installation. The Wikidata extension can be removed manually by navigating to your OpenRefine installation folder, and then looking inside `webapp/extensions/` and deleting the `wikidata` folder found there.
## Reconciling with Wikidata
## Reconciling with Wikidata {#reconciling-with-wikidata}
The Wikidata [reconciliation service](reconciling) for OpenRefine [supports](https://reconciliation-api.github.io/testbench/):
* A large number of potential types to reconcile against
@ -28,7 +28,7 @@ You can find documentation and further resources on the reconciliation API [here
For the most part, Wikidata reconciliation behaves the same way other reconciliation services do, but there are a few processes and features specific to Wikidata.
### Language settings
### Language settings {#language-settings}
You can install a version of the Wikidata reconciliation service that uses your language. First, you need the language code: this is the [two-letter code found on this list](https://en.wikipedia.org/wiki/List_of_Wikipedias), or in the domain name of the desired Wikipedia/Wikidata (for instance, “fr” if your Wikipedia is https://fr.wikipedia.org/wiki/).
@ -36,7 +36,7 @@ Then, open the reconciliation window (under <span class="menuItems">Reconcile</s
When reconciling using this interface, items and properties will be displayed in your chosen language if the label is available. The matching score of the reconciliation is not influenced by your choice of language for the service: items are matched by considering all labels and returning the best possible match. The language of your dataset is also irrelevant to your choice of language for the reconciliation service; it simply determines which language labels to return based on the entity chosen.
### Restricting matches by type
### Restricting matches by type {#restricting-matches-by-type}
In Wikidata, types are items themselves. For instance, the [university of Ljubljana (Q1377)](https://www.wikidata.org/wiki/Q1377) has the type [public university (Q875538)](https://www.wikidata.org/wiki/Q875538), using the [instance of (P31)](https://www.wikidata.org/wiki/Property:P31) property. Types can be subclasses of other types, using the [subclass of (P279)](https://www.wikidata.org/wiki/Property:P279) property. For instance, [public university (Q875538)](https://www.wikidata.org/wiki/Q875538) is a subclass of [university (Q3918)](https://www.wikidata.org/wiki/Q3918). You can visualize these structures with the [Wikidata Graph Builder](https://angryloki.github.io/wikidata-graph-builder/).
@ -44,19 +44,19 @@ When you select or enter a type for reconciliation, OpenRefine will include that
Some items and types may not yet be set as an instance or subclass of anything (because Wikidata is crowdsourced). If you restrict reconciliation to a type, items without the chosen type will not appear in the results, except as a fallback, and will have a lower score.
### Reconciling via unique identifiers
### Reconciling via unique identifiers {#reconciling-via-unique-identifiers}
You can supply a column of unique identifiers (in the form "Q###" for entities) directly to Wikidata in order to pull more data, but [these strings will not be “reconciled” against the external dataset](reconciling#reconciling-with-unique-identifiers). Apply the operation <span class="menuItems">Reconcile</span> → <span class="menuItems">Use values as identifiers</span> on your column of QIDs. All cells will appear as dark blue “confirmed” matches. Some of the “matches” may be errors, which you will need to hover over or click on to identify. You cannot use this to reconcile properties (in the form "P###").
If the identifier you submit is assigned to multiple Wikidata items (because Wikidata is crowdsourced), all of the items are returned as candidates, with none automatically matched.
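Because every cell is marked as a confirmed match regardless of its content, it can help to check the shape of the identifiers before applying the operation. The following is a minimal sketch, assuming QIDs are the letter Q followed by a positive integer without leading zeros; the function name `split_by_qid_shape` is hypothetical and not part of OpenRefine.

```python
import re

# Assumption: an entity identifier is "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def split_by_qid_shape(values):
    """Separate values that look like QIDs from values that do not."""
    valid, suspect = [], []
    for value in values:
        if QID_PATTERN.fullmatch(value):
            valid.append(value)
        else:
            suspect.append(value)
    return valid, suspect


valid, suspect = split_by_qid_shape(["Q1377", " Q875538 ", "P31", "Q", "q42"])
```

This only checks the format of each value. It cannot tell whether the item exists on Wikidata or is the item you intended, and values with stray whitespace are reported as suspect so that they can be trimmed first.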
### Property paths, special properties, and subfields
### Property paths, special properties, and subfields {#property-paths-special-properties-and-subfields}
Wikidata's hierarchical property structure can be called by using property paths (using |, /, and . symbols). Labels, aliases, descriptions, and sitelinks can also be accessed. You can also match values against subfields, such as latitude and longitude subfields of a geographical coordinate.
For information on how to do this, read the [documentation and further resources here](https://wikidata.reconci.link/#documentation).
## Editing Wikidata with OpenRefine
## Editing Wikidata with OpenRefine {#editing-wikidata-with-openrefine}
The best resource is the [Editing section](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing) on Wikidata.
@ -80,7 +80,7 @@ If you upload edits that are redundant (that is, all the statements you want to
You can use OpenRefine's reconciliation preview to look at the target Wikidata elements and see what information they already have, and whether the elements' histories have had similar edits reverted in the past.
### Edit Wikidata schema
### Edit Wikidata schema {#edit-wikidata-schema}
The best resource is the [Schema alignment page](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Schema_alignment) on Wikidata.
@ -102,7 +102,7 @@ OpenRefine presents you with an easy visual way to map out the relationships in
You may wish to refer to [this Wikidata tutorial on how OpenRefine handles Wikidata schema](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Tutorials/Basic_editing).
#### Editing terms with your schema
#### Editing terms with your schema {#editing-terms-with-your-schema}
With OpenRefine, you can edit the terms (labels, aliases, descriptions, or sitelinks) of Wikidata entities as well as establish relationships between entities. For example, you may wish to upload pseudonyms, pen names, maiden names, or married names for authors.
@ -127,7 +127,7 @@ You could upload the “Translated titles” to “Label” with the language sp
![Constructing a schema with aliases and languages.](/img/wikidata-translated.png)
### Manage Wikidata account
### Manage Wikidata account {#manage-wikidata-account}
To edit Wikidata directly from OpenRefine, you must log in with a Wikidata account. OpenRefine can only upload edits with Wikidata user accounts that are “[autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users)” - at this time, that means accounts that have more than 50 edits and have existed for longer than four days.
@ -146,7 +146,7 @@ If your account or your bot is not properly authorized, OpenRefine will not disp
You can store your unencrypted username and password in OpenRefine, saved locally to your computer and available for future use. For security reasons, you may wish to leave this box unchecked. You can also save your OpenRefine-specific bot password in your browser or with a password management tool.
### Import and export schema
### Import and export schema {#import-and-export-schema}
You can save time on repetitive processes by defining a schema on one project, then exporting it and importing for use on new datasets in the future. Or you and your colleagues can share a schema with each other to coordinate your work.
@ -154,7 +154,7 @@ You can export a schema from a project using <span class="menuItems">Export</spa
You can import a schema using <span class="menuItems">Extensions</span> → <span class="menuItems">Import schema</span>. You can upload a JSON file, or paste JSON statements directly into a field in the window. An imported schema will look for columns with the same names, and you will see an error message if your project doesn't contain matching columns.
### Upload edits to Wikidata
### Upload edits to Wikidata {#upload-edits-to-wikidata}
The best resource is the [Uploading page](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Uploading) on Wikidata.
@ -166,7 +166,7 @@ If you are ready to upload your edits, you can provide an “Edit summary” - a
If your edits have been successful, you will see them listed on [your Wikidata user contributions page](https://www.wikidata.org/wiki/Special:Contributions/), and on the [Edit groups page](https://editgroups.toolforge.org/). All edits can be undone from this second interface.
### QuickStatements export
### QuickStatements export {#quickstatements-export}
Your OpenRefine data can be exported in a format recognized by [QuickStatements](https://www.wikidata.org/wiki/Help:QuickStatements), a tool that creates Wikidata edits using text commands. OpenRefine generates “version 1” QuickStatements commands.
@ -177,11 +177,11 @@ In order to use QuickStatements, you must authorize it with a Wikidata account t
Follow the [steps listed on this page](https://www.wikidata.org/wiki/Help:QuickStatements#Running_QuickStatements).
To prepare your OpenRefine data for QuickStatements, select <span class="menuItems">Export</span> → <span class="menuItems">QuickStatements file</span>, or <span class="menuItems">Extensions</span> → <span class="menuItems">Export to QuickStatements</span>. Exporting your schema from OpenRefine will generate a text file called `statements.txt` by default. Paste the contents of the text file into a new QuickStatements batch using version 1. You can find [version 1 of the tool (no longer maintained) here](https://wikidata-todo.toolforge.org/quick_statements.php). The text commands will be processed into Wikidata edits and previewed for you to review before submitting.
### Schema alignment
### Schema alignment {#schema-alignment}
The best resource is the [Schema alignment page](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Schema_alignment) on Wikidata.
### Issue detection
### Issue detection {#issue-detection}
The best resource is the [Quality assurance page](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Quality_assurance) on Wikidata.


@ -8,7 +8,7 @@ OpenRefine is a web application, but is designed to be run locally on your own m
This architecture provides a good separation of concerns (data vs. UI); allows the use of familiar web technologies (HTML, CSS, Javascript) to implement user interface features; and enables the server side to be called by third-party software through standard GET and POST operations.
## Technology stack
## Technology stack {#technology-stack}
The server-side part of OpenRefine is implemented in Java as one single servlet which is executed by the [Jetty](http://jetty.codehaus.org/jetty/) web server + servlet container. The use of Java strikes a balance between performance and portability across operating systems (there is very little OS-specific code, and what exists mostly has to do with starting the application).
@ -27,7 +27,7 @@ String clustering is provided by the [SIMILE Vicino](http://code.google.com/p/si
OAuth functionality is provided by the [Signpost](https://github.com/mttkay/signpost) project.
## Server-side architecture
## Server-side architecture {#server-side-architecture}
OpenRefine's server-side is written entirely in Java (`main/src/`) and its entry point is the Java servlet `com.google.refine.RefineServlet`. By default, the servlet is hosted in the lightweight Jetty web server instantiated by `server/src/com.google.refine.Refine`. Note that the server class itself is under `server/src/`, not `main/src/`; this separation leaves the possibility of hosting `RefineServlet` in a different servlet container.
@ -35,7 +35,7 @@ The web server configuration is in `main/webapp/WEB-INF/web.xml`; that's where `
As mentioned before, the server-side maintains states of the data, and the primary class involved is `com.google.refine.ProjectManager`.
### Projects
### Projects {#projects}
In OpenRefine there's the concept of a workspace similar to that in Eclipse. When you run OpenRefine it manages projects within a single workspace, and the workspace is embodied in a file directory with sub-directories. The default workspace directories are listed in the [FAQs](https://github.com/OpenRefine/OpenRefine/wiki/FAQ-Where-Is-Data-Stored). You can get OpenRefine to use a different directory by specifying a -d parameter at the command line.
@ -45,14 +45,14 @@ A project's _actual_ data includes the columns, rows, cells, reconciliation reco
A project is loaded into memory when it needs to be displayed or modified, and it remains in memory until 1 hour after the last time it gets modified. Periodically the project manager tries to save modified projects, and it saves as many modified projects as possible within 30 seconds.
### Data Model
### Data Model {#data-model}
A project's data consists of
- _raw data_: a list of rows, each row consisting of a list of cells
- _models_ on top of that raw data that give high-level presentation or interpretation of that data. This design lets the same raw data be viewed in different ways by different models, and lets the models be changed without costly changes to the raw data.
#### Column Model
#### Column Model {#column-model}
Cells in rows are not named and can only be addressed by their list position indices. So, a _column model_ is needed to give a name to each list position. The column model also stores other metadata for each column, including the type that cells in the column have been reconciled to and the overall reconciliation statistics of those cells.
@ -60,7 +60,7 @@ Each column also acts as a cache for data computed from the raw data related to
Columns in the column model can be removed and re-ordered without changing the raw data--the cells in the rows. This makes column removal and ordering operations really quick.
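The relationship between the raw rows and the column model can be illustrated with a minimal Python sketch. The names `Column`, `Project`, `cell_index`, and `remove_column` are hypothetical and chosen for illustration only; they are not OpenRefine's actual Java classes. The point is that removing a column only touches the model, never the row lists.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str        # human-readable column name
    cell_index: int  # position of this column's cells in each row list


@dataclass
class Project:
    rows: list                                   # raw data: a list of lists of cells
    columns: list = field(default_factory=list)  # column model on top of the raw data

    def cell(self, row_number, column_name):
        """Look up a cell by column name via the column model."""
        for column in self.columns:
            if column.name == column_name:
                return self.rows[row_number][column.cell_index]
        raise KeyError(column_name)

    def remove_column(self, column_name):
        """Drop the column from the model only; the rows are left untouched."""
        self.columns = [c for c in self.columns if c.name != column_name]


project = Project(
    rows=[["Paris", "FR", 2148000], ["Berlin", "DE", 3645000]],
    columns=[Column("city", 0), Column("country", 1), Column("population", 2)],
)
project.remove_column("country")
```

After the call, the "FR" and "DE" cells are still present in `project.rows`, but no column in the model points at them any more, so the cost of the removal does not depend on the number of rows.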
##### Column Groups
##### Column Groups {#column-groups}
Consider the following data:
@ -74,7 +74,7 @@ Blank cells play a very important role. The blank cell in a key column of a row
Currently (as of 12th December 2017) only the XML and JSON importers create column groups, and while the data table view does display column groups, it doesn't support modifying them.
### Changes, History, Processes, and Operations
### Changes, History, Processes, and Operations {#changes-history-processes-and-operations}
All changes to the project's data are tracked (N.B. this does not include changes to a project's metadata - such as the project name.)
@ -98,17 +98,17 @@ In summary,
- some processes are long-running and some are immediate; processes are run sequentially in a queue
- generalizable processes can be re-constructed from abstract operations
## Client-side architecture
## Client-side architecture {#client-side-architecture}
The client-side part of OpenRefine is implemented in HTML, CSS and Javascript and uses the following Javascript libraries:
* [jQuery](http://jquery.com/)
* [jQueryUI](http://jqueryui.com/)
* [Recurser jquery-i18n](https://github.com/recurser/jquery-i18n)
### Importing architecture
### Importing architecture {#importing-architecture}
OpenRefine has a sophisticated architecture for accommodating a diverse and extensible set of importable file formats and work flows. The formats range from simple CSV, TSV to fixed-width fields to line-based records to hierarchical XML and JSON. The work flows allow the user to preview and tweak many different import settings before creating the project. In some cases, such as XML and JSON, the user also has to select which elements in the data file to import. Additionally, a data file can also be an archive file (e.g., .zip) that contains many files inside; the user can select which of those files to import. Finally, extensions to OpenRefine can inject functionalities into any part of this architecture.
### The Index Page and Action Areas
### The Index Page and Action Areas {#the-index-page-and-action-areas}
The opening screen of OpenRefine is implemented by the file refine/main/webapp/modules/core/index.vt and will be referred to here as the index page. Its default implementation contains 3 finger tabs labeled Create Project, Open Project, and Import Project. Each tab selects an "action area". The 3 default action areas are for, obviously, creating a new project, opening an existing project, and importing a project .tar file.
@ -126,13 +126,13 @@ The UI class is a constructor function that takes one argument, a jQuery-wrapped
If your extension requires a very unique importing work flow, or a very novel feature that should be exposed on the index page, then add a new action area. Otherwise, try to use the existing work flows as much as possible.
### The Create Project Action Area
### The Create Project Action Area {#the-create-project-action-area}
The Create Project action area is itself extensible. Initially, it embeds a set of finger tabs corresponding to a variety of "source selection UIs": you can select a source of data by specifying a file on your computer, or you can specify the URL to a publicly accessible data file or data feed, or you can paste in from the clipboard a chunk of data.
There are actually 3 points of extension in the Create Project action area, and the first is invisible.
#### Importing Controllers
#### Importing Controllers {#importing-controllers}
The Create Project action area manages a list of "importing controllers". Each controller follows a particular work flow (in UI terms, think "wizard"). Refine comes with a "default importing controller" (refine/main/webapp/modules/core/scripts/index/default-importing-controller/controller.js) and its work flow assumes that the data can be retrieved and cached in whole before getting processed in order to generate a preview for the user to inspect. (If the data cannot be retrieved and cached in whole before previewing, then another importing controller is needed.)
@ -153,7 +153,7 @@ Refine.CreateProjectUI.controllers.push(Refine.DefaultImportingController); // r
We will cover the server-side code below.
#### Data Source Selection UIs
#### Data Source Selection UIs {#data-source-selection-uis}
Data source selection UIs are another point of extensibility in the Create Project action area. As mentioned previously, by default there are 3 data source UIs. Those are added by the default importing controller.
@ -192,34 +192,34 @@ The argument `form` is a jQuery-wrapped FORM element that will get submitted to
See refine/main/webapp/modules/core/scripts/index/default-importing-sources/sources.js for examples of such source selection UIs. While we write about source selection UIs managed by the default importing controller here, chances are your own extension will not be adding such a new source selection UI. Your extension probably adds a new importing controller as well as a new source selection UI that work together.
#### File Selection Panel
#### File Selection Panel {#file-selection-panel}
Documentation not currently available
#### Parsing UI Panel
#### Parsing UI Panel {#parsing-ui-panel}
Documentation not currently available
### Server-side Components
### Server-side Components {#server-side-components}
#### ImportingController
#### ImportingController {#importingcontroller}
Documentation not currently available
#### UrlRewriter
#### UrlRewriter {#urlrewriter}
Documentation not currently available
#### FormatGuesser
#### FormatGuesser {#formatguesser}
Documentation not currently available
#### ImportingParser
#### ImportingParser {#importingparser}
Documentation not currently available
## Faceted browsing architecture
## Faceted browsing architecture {#faceted-browsing-architecture}
Faceted browsing support is core to OpenRefine as it is the primary and only mechanism for filtering to a subset of rows on which to do something _en masse_ (i.e., in bulk). Without faceted browsing or an equivalent querying/browsing mechanism, you can only change one thing at a time (one cell or row) or else change everything all at once; both kinds of editing are practically useless when dealing with large data sets.
In OpenRefine, different components of the code need to know which rows to process from the faceted browsing state (how the facets are constrained). For example, when the user applies some facet selections and then exports the data, the exporter serializes only the matching rows, not all rows in the project. Thus, faceted browsing isn't only hooked up to the data view for displaying data to the user, but it is also hooked up to almost all other parts of the system.
### Engine Configuration
### Engine Configuration {#engine-configuration}
As OpenRefine is a web app, there might be several browser windows opened on the same project, each in a different faceted browsing state. It is best to maintain the faceted browsing state in each browser window while keeping the server side completely stateless with regard to faceted browsing. Whenever the client-side needs something done by the server, it transfers the entire faceted browsing state over to the server-side. The faceted browsing state behaves much like the `WHERE` clause in a SQL query, telling the server-side how to select the rows to process.
@ -267,7 +267,7 @@ In the code, the faceted browsing state, or faceted browsing query, is actually
}
```
### Server-Side Subsystem
### Server-Side Subsystem {#server-side-subsystem}
From an engine configuration like the one above, the server-side faceted browsing subsystem is capable of producing:
@ -278,6 +278,6 @@ When the engine config JSON arrives in an HTTP request on the server-side, a `co
To produce information on how to render a particular facet in the UI, the engine follows the same procedure described in the previous paragraph, except it skips over the facet in question. In other words, it produces an iteration over all rows constrained by the other facets. Then it feeds that iteration to the facet in question by calling the facet's `computeChoices()` method. This gives the method a chance to compute the rendering information for its UI counterpart on the client-side. When all facets have been given a chance to compute their rendering information, the engine calls all facets to serialize their information as JSON and returns the JSON to the client-side. Only one HTTP call is needed to compute all facets.
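The "skip the facet in question" procedure can be sketched in a few lines of Python. This is an illustration of the idea only: the class `ListFacet` and the functions `filtered_rows` and `compute_facets` are hypothetical names, rows are modelled as plain dictionaries, and the real engine is written in Java and works on iterations rather than lists.

```python
from collections import Counter


class ListFacet:
    """A simplified facet that constrains one column to a set of selected values."""

    def __init__(self, column, selection=()):
        self.column = column
        self.selection = set(selection)

    def matches(self, row):
        # An empty selection means the facet does not constrain anything.
        return not self.selection or row[self.column] in self.selection

    def compute_choices(self, rows):
        # Rendering information: how many rows carry each value.
        return Counter(row[self.column] for row in rows)


def filtered_rows(rows, facets, skip=None):
    """Rows matching every facet, optionally ignoring one of them."""
    return [r for r in rows if all(f.matches(r) for f in facets if f is not skip)]


def compute_facets(rows, facets):
    """Each facet sees the rows constrained by all the *other* facets."""
    return [f.compute_choices(filtered_rows(rows, facets, skip=f)) for f in facets]


rows = [
    {"country": "FR", "kind": "city"},
    {"country": "FR", "kind": "river"},
    {"country": "DE", "kind": "city"},
]
country = ListFacet("country", ["FR"])
kind = ListFacet("kind", ["city"])

visible = filtered_rows(rows, [country, kind])
choices = compute_facets(rows, [country, kind])
```

Here the country facet still reports a count for "DE" even though "FR" is selected, because its own selection is ignored while its choices are computed. This is what lets the user see and switch to other values in a facet that is already constrained.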
### Client-side subsystem
### Client-side subsystem {#client-side-subsystem}
On the client-side there is also an engine object (implemented in Javascript rather than Java) and zero or more facet objects (also in Javascript, obviously). The engine is responsible for distributing the rendering information computed on the server-side to the right facets, and when the user interacts with a facet, the facet tells the engine to update the whole UI. To do so, the engine gathers the configuration of each facet and composes the whole engine config as a single JSON object. Two separate AJAX calls are made with that engine config, one to retrieve the rows to render, and one to re-compute the rendering information for the facets because changing one facet does affect all the other facets.


@ -15,7 +15,7 @@ You will need:
From the top level directory in the OpenRefine application you can build, test and run OpenRefine using the `./refine` shell script (if you are working in a \*nix shell), or using the `refine.bat` script from the Windows command line. Note that the `refine.bat` on Windows only supports a subset of the functionality supported by the `refine` shell script. The example commands below are using the `./refine` shell script, and you will need to use `refine.bat` if you are working from the Windows command line.
### Set up JDK
### Set up JDK {#set-up-jdk}
You must [install JDK](https://jdk.java.net/15/) and set the JAVA_HOME environment variable (please ensure it points to the JDK, and not the JRE).
@ -77,7 +77,7 @@ export JAVA_HOME="$(/usr/libexec/java_home -v 13)"
<TabItem value="linux">
##### With the terminal
##### With the terminal {#with-the-terminal}
Enter the following:
@ -87,7 +87,7 @@ sudo apt install default-jre
This probably won't install the latest JDK package available on the Java website, but it is faster and more straightforward. (At the time of writing, it installs OpenJDK 11.0.7.)
##### Manually
##### Manually {#manually}
First, [extract the JDK package](https://openjdk.java.net/install/) to the new directory `/usr/lib/jvm`:
@ -132,7 +132,7 @@ It should show the path you set above.
### Maven (Optional)
### Maven (Optional) {#maven-optional}
OpenRefine's build script will download Maven for you and use it if it is not already installed locally.
If you will be using your Maven installation instead of OpenRefine's build script download installation, then set the `MVN_HOME` environment variable. You may need to reboot your machine after setting these environment variables. If you receive a message `Could not find the main class: com.google.refine.Refine. Program will exit.` it is likely `JAVA_HOME` is not set correctly.
@ -145,7 +145,7 @@ MAVEN_HOME=E:\Downloads\apache-maven-3.5.4-bin\apache-maven-3.5.4\
NOTE: You can use Maven commands directly, but running some goals in isolation might fail (try adding the `compile test-compile` goals in your invocation if that is the case).
### Building
### Building {#building}
To see what functions are supported by OpenRefine's build system, type
```shell
@ -158,7 +158,7 @@ To build the OpenRefine application from source type:
./refine build
```
### Testing
### Testing {#testing}
Since OpenRefine is composed of two parts, a server and an in-browser UI, the testing system reflects that:
* on the server side, it's powered by [TestNG](http://testng.org/) and the unit tests are written in Java;
@ -182,14 +182,14 @@ If you want to run only the client side portion of the tests, use:
./refine ui_test chrome
```
## Running
## Running {#running}
To run OpenRefine from the command line (assuming you have been able to build from the source code successfully):
```shell
./refine
```
By default, OpenRefine will use [refine.ini](https://github.com/OpenRefine/OpenRefine/blob/master/refine.ini) for configuration. You can copy it and rename it to `refine-dev.ini`, which will be used for configuration instead. `refine-dev.ini` won't be tracked by Git, so feel free to put your custom configurations into it.
## Building Distributions (Kits)
## Building Distributions (Kits) {#building-distributions-kits}
The Refine build system uses Apache Ant to automate the creation of the installation packages for the different operating systems. The packages are currently optimized to run on Mac OS X which is the only platform capable of creating the packages for all three OS that we support.
@ -200,7 +200,7 @@ To build the distributions type
```
where 'version' is the release version.
## Building, Testing and Running OpenRefine from Eclipse
## Building, Testing and Running OpenRefine from Eclipse {#building-testing-and-running-openrefine-from-eclipse}
OpenRefine's source comes with Maven configuration files which are recognized by [Eclipse](http://www.eclipse.org/) if the Eclipse Maven plugin (m2e) is installed.
At the command line, go to a directory **not** under your Eclipse workspace directory and check out the source:
@ -224,22 +224,22 @@ Right click on the `server` subproject, click `Run as...` and `Run configuration
This will add a run configuration that you can then use to run OpenRefine from Eclipse.
## Testing in Eclipse
## Testing in Eclipse {#testing-in-eclipse}
You can run the server tests directly from Eclipse. To do that you need to have the TestNG launcher plugin installed, as well as the TestNG M2E plugin (for integration with Maven). If you don't have it, you can get it by [installing new software](https://help.eclipse.org/2020-03/index.jsp?topic=/org.eclipse.platform.doc.user/tasks/tasks-129.htm) from this update URL http://dl.bintray.com/testng-team/testng-eclipse-release/
Once the TestNG launching plugin is installed in your Eclipse, right click on the source folder "main/tests/server/src", select `Run As` -> `TestNG Test`. This should open a new tab with the TestNG launcher running the OpenRefine tests.
### Test coverage in Eclipse
### Test coverage in Eclipse {#test-coverage-in-eclipse}
It is possible to analyze test coverage in Eclipse with the `EclEmma Java Code Coverage` plugin. It will add a `Coverage as…` menu similar to the `Run as…` and `Debug as…` menus which will then display the covered and missed lines in the source editor.
### Debug with Eclipse
### Debug with Eclipse {#debug-with-eclipse}
Here's an example of putting configuration in Eclipse for debugging, like putting values for the Google Data extension. Other types of configuration that can be set include memory, Wikidata login information, and more.
![Screenshot of Eclipse debug configuration](/img/eclipse-debug-config.png)
## Building, Testing and Running OpenRefine from IntelliJ idea
## Building, Testing and Running OpenRefine from IntelliJ idea {#building-testing-and-running-openrefine-from-intellij-idea}
At the command line, go to a directory you want to save the OpenRefine project and execute the following command to clone the repository:


@ -6,14 +6,14 @@ sidebar_label: Contributing
Please read the general [guidelines on contributing to OpenRefine](https://github.com/OpenRefine/OpenRefine/blob/master/CONTRIBUTING.md) first, then review the information on [reporting and tracking issues](#reporting-and-tracking-issues), and on making your [first pull request](#your-first-pull-request) below.
## Reporting and tracking issues
## Reporting and tracking issues {#reporting-and-tracking-issues}
If you need to file a bug or request a feature, [create an Issue in the OpenRefine Github repository](https://github.com/OpenRefine/OpenRefine/issues). Github issues should be used for reporting specific bugs and requesting specific features. If you just don't know how to do something using OpenRefine, or want to discuss some ideas, please:
- [Try the user manual](/)
- [post to our OpenRefine mailing list](http://groups.google.com/group/openrefine/)
## Contributing to the documentation
## Contributing to the documentation {#contributing-to-the-documentation}
We use [Docusaurus](https://docusaurus.io/) for our docs. For small documentation changes, you should be able to edit the Markdown files directly and submit them as a pull request. A preview of the docs will be generated automatically. But it is also
possible to preview your changes locally. Assuming you have [Node.js](https://nodejs.org/en/download/) installed (which includes npm), you can install Docusaurus with:
@ -41,7 +41,7 @@ You can also spin a local web server to serve the docs for you, with auto-refres
yarn start
```
## Your first code pull request
## Your first code pull request {#your-first-code-pull-request}
This describes the overall steps to your first code contribution in OpenRefine. If you have trouble with any of these steps feel free to reach out on the [developer mailing list](https://groups.google.com/forum/#!forum/openrefine-dev) or the [Gitter channel](https://gitter.im/OpenRefine/OpenRefine).


@ -6,7 +6,7 @@ sidebar_label: Data extension API
This page describes a new optional API for reconciliation services, allowing clients to pull properties of reconciled records. It is supported from OpenRefine 2.8 onwards. A sample server implementation is available in the [Wikidata reconciliation interface](https://tools.wmflabs.org/openrefine-wikidata/).
## Overview of the workflow
## Overview of the workflow {#overview-of-the-workflow}
1. Reconcile a column with a standard reconciliation service
@ -21,7 +21,7 @@ property. The user can run data extension again from that column.
[GIF Screencast](http://pintoch.ulminfo.fr/92dcdd20f3/recorded.gif)
## Specification
## Specification {#specification}
Services supporting data extension must add an `extend` field in their service metadata. This field is expected to have the following subfields, all optional:
* `propose_properties` stores the endpoint of an API which will be used to suggest properties to fetch (see specification below). The field contains an object with a `service_url` and `service_path` which will be concatenated to obtain the URL where the endpoint is available, just like the other services in the metadata. If this field is not provided, no property will be suggested in the dialog (the user will have to input them manually).
@ -39,7 +39,7 @@ Example service metadata:
"property_settings": []
}
```
### Property proposal protocol
### Property proposal protocol {#property-proposal-protocol}
The role of the property proposal endpoint is to suggest a list of properties to fetch. As its only input, it accepts GET parameters:
* the `type` that a column was reconciled against. If no type is provided, it should suggest properties for a column reconciled against no type.
@ -73,7 +73,7 @@ The endpoint returns a JSON response as follows:
```
This endpoint must support JSONP via the `callback` parameter (just like all other endpoints of the reconciliation service).
### Data extension protocol
### Data extension protocol {#data-extension-protocol}
After calling the property proposal endpoint, the consumer (OpenRefine) calls the service endpoint with a JSON object in the `extend` parameter, containing the following fields:
* `ids` is a list of strings, each of which is the identifier of a record as returned by the reconciliation method. These are the records whose properties should be retrieved.
@ -204,7 +204,7 @@ Example of a full response (for the example query above):
]
}
```
### Settings specification
### Settings specification {#settings-specification}
The `property_settings` field in the service metadata allows the service to declare it accepts some settings for the properties it fetches. They are specified as a list of JSON objects which define the fields which should be exposed to the user.

View File

@ -9,17 +9,17 @@ Please be aware that the OpenRefine roadmap is subject to change at any time, so
If there are features you would like to see that are not currently listed here or in current [milestones](https://github.com/OpenRefine/OpenRefine/milestones), [projects](https://github.com/OpenRefine/OpenRefine/projects) and [issues](https://github.com/OpenRefine/OpenRefine/issues), please add them to the [issue tracker](https://github.com/OpenRefine/OpenRefine/issues).
## Planned releases
## Planned releases {#planned-releases}
### 4.0
### 4.0 {#40}
[New backend storage option to allow using much bigger datasets at the expense of real-time feedback.](https://github.com/OpenRefine/OpenRefine/milestone/7)
New UI (possibly Vue or React based)
## Work in progress
## Work in progress {#work-in-progress}
Alongside the planned releases there are often smaller pieces of work in progress. Check for [recently updated issues](https://github.com/OpenRefine/OpenRefine/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) and [pull requests](https://github.com/OpenRefine/OpenRefine/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc) to see what is currently in the works.
## On the back burner
## On the back burner {#on-the-back-burner}
Some aspects of OpenRefine have previously been targeted for release, but have not made it into a release and have not been worked on recently. If you would like to see features in these areas, please create an issue that describes what development you would like to see:
- Streamlining traditional features

View File

@ -6,7 +6,7 @@ sidebar_label: Functional tests
import useBaseUrl from '@docusaurus/useBaseUrl';
## Introduction
## Introduction {#introduction}
The OpenRefine interface is tested with the [Cypress framework](https://www.cypress.io/).
With Cypress, tests perform assertions using a real browser, the same way a real user would use the software.
@ -18,7 +18,7 @@ Cypress tests can be ran
If you are writing tests, the Cypress test runner is good enough; the command line is mainly used by the CI/CD platform (GitHub Actions).
## Cypress brief overview
## Cypress brief overview {#cypress-brief-overview}
Cypress operates inside a browser and uses Node.js internally.
That's a key difference with tools such as Selenium.
@ -36,14 +36,14 @@ The general workflow of a Cypress test is to
- Trigger user actions
- Assert that the DOM contains expected texts and elements using selectors
## Getting started
## Getting started {#getting-started}
If this is your first time using Cypress, we recommend getting familiar with the tool first.
- [Cypress overview](https://docs.cypress.io/guides/overview/why-cypress.html)
- [Cypress examples of tests and syntax](https://example.cypress.io/)
### 1. Install Cypress
### 1. Install Cypress {#1-install-cypress}
You will need:
@ -58,7 +58,7 @@ cd ./main/tests/cypress
yarn install
```
### 2. Start the test runner
### 2. Start the test runner {#2-start-the-test-runner}
The test runner assumes that OpenRefine is up and running on the local machine. The tests themselves neither launch nor restart OpenRefine.
@ -74,21 +74,21 @@ Then start Cypress
yarn --cwd ./main/tests/cypress run cypress open
```
### 3. Run the existing tests
### 3. Run the existing tests {#3-run-the-existing-tests}
Once the test runner is up, you can choose to run one or several tests by selecting them from the interface.
Click on one of them and the test will start.
### 4. Add your first test
### 4. Add your first test {#4-add-your-first-test}
- Add a `test.spec.js` into the `main/tests/cypress/cypress/integration` folder.
- The test is instantly available in the list
- Click on the test
- Start to add some code
## Tests technical documentation
## Tests technical documentation {#tests-technical-documentation}
### A typical test
### A typical test {#a-typical-test}
A typical OpenRefine test starts with the following code
@ -117,14 +117,14 @@ For example
See below on the dedicated section 'Testing utilities'
### Testing guidelines
### Testing guidelines {#testing-guidelines}
- `cy.wait` should be used only as a last resort. It is considered bad practice, though sometimes there is no other choice
- Tests should remain isolated from each other. It's best to test one feature at a time
- A test should always start with a fresh project
- The name of the files should mirror the OpenRefine UI organization
### Testing utilities
### Testing utilities {#testing-utilities}
OpenRefine contributors have added some utility methods on top of the Cypress framework.
Those methods perform some common actions or assertions on OpenRefine, to avoid code duplication.
@ -156,12 +156,12 @@ The fixture parameter can be
Those datasets live in `cypress/fixtures`
### Browsers
### Browsers {#browsers}
In terms of browsers, Cypress is using what is installed on your operating system.
See the [Cypress documentation](https://docs.cypress.io/guides/guides/launching-browsers.html#Browsers) for a list of supported browsers
### Folder organization
### Folder organization {#folder-organization}
Tests are located in `main/tests/cypress/cypress` folder.
The test should not use any file outside the cypress folder.
@ -172,7 +172,7 @@ The test should not use any file outside the cypress folder.
- `/screenshots` and `/videos` contain the recordings of the tests and are ignored by Git
- `/support` is a custom library of assertion and common user actions, to avoid code duplication in the tests themselves
### Configuration
### Configuration {#configuration}
Cypress execution can be configured with environment variables, which can be declared at the OS level or when running the tests.
@ -182,11 +182,11 @@ Available variables are
Cypress provides [exhaustive documentation](https://docs.cypress.io/guides/guides/environment-variables.html#Setting) about configuration, but here are two simple ways to configure the execution of the tests:
#### Overriding with a cypress.env.json file
#### Overriding with a cypress.env.json file {#overriding-with-a-cypressenvjson-file}
This file is ignored by Git, and you can use it to configure Cypress locally
#### Command-line
#### Command-line {#command-line}
You can pass variables at the command-line level
@ -194,7 +194,7 @@ You can pass variables at the command-line level
yarn --cwd ./main/tests/cypress run cypress open --env OPENREFINE_URL="http://localhost:1234"
```
### Visual testing
### Visual testing {#visual-testing}
Tests generally ensure application behavior by making assertions against the DOM, to ensure specific texts or css attributes are present in the document body.
Visual testing, on the contrary, is a way to test applications by comparing images.
@ -212,7 +212,7 @@ Identified cases are so far:
Reference screenshots (called snapshots) are stored in `/cypress/snapshots`.
A snapshot can be taken for the whole page or for just a single part of the page.
#### When a visual test fails
#### When a visual test fails {#when-a-visual-test-fails}
First, Cypress will display the following error message:
@ -223,7 +223,7 @@ The diff images shows the reference image on the left, the image that was taken
![Diff image when a visual test fails](/img/failed-visual-test.png)
## CI/CD
## CI/CD {#cicd}
In CI/CD, tests are run headless, with the following command-line
@ -233,7 +233,7 @@ In CI/CD, tests are run headless, with the following command-line
Results are displayed in the standard output
## Resources
## Resources {#resources}
[Cypress command line options](https://docs.cypress.io/guides/guides/command-line.html#Installation)
[Lots of good Cypress examples](https://example.cypress.io/)

View File

@ -4,19 +4,19 @@ title: Migrating older Extensions
sidebar_label: Migrating older Extensions
---
## Migrating from Ant to Maven
## Migrating from Ant to Maven {#migrating-from-ant-to-maven}
### Why are we doing this change?
### Why are we doing this change? {#why-are-we-doing-this-change}
Ant is a fairly old (antique?) build system that does not incorporate any dependency management.
By migrating to Maven we are making it easier for developers to extend OpenRefine with new libraries, and stop having to ship dozens of .jar files in the repository. Using the Maven repository also encourages developers to add dependencies to released versions of libraries instead of custom snapshots that are hard to update.
### When was this change made?
### When was this change made? {#when-was-this-change-made}
The migration was done between 3.0 and 3.1-beta with this commit:
https://github.com/OpenRefine/OpenRefine/commit/47323a9e750a3bc9d43af606006b5eb20ca397b8
### How to migrate an extension
### How to migrate an extension {#how-to-migrate-an-extension}
You will need to write a `pom.xml` in the root folder of your extension to configure the compilation process with Maven. Sample `pom.xml` files for extensions can be found in the extensions that are shipped with OpenRefine (`gdata`, `database`, `jython`, `pc-axis` and `wikidata`). A sample extension (`sample`) is also provided, with a minimal build file.
@ -56,17 +56,17 @@ And add the dependency to the `<dependencies>` section as usual:
<version>0.5.3-SNAPSHOT</version>
</dependency>
## Migrating to Wikimedia's i18n jQuery plugin
## Migrating to Wikimedia's i18n jQuery plugin {#migrating-to-wikimedias-i18n-jquery-plugin}
### Why are we doing this change?
### Why are we doing this change? {#why-are-we-doing-this-change-1}
This adds various important localization features, such as the ability to handle plurals or interpolation. This also restores the language fallback (displaying strings in English if they are not available in the target language), which did not work with the previous setup.
### When was the migration made?
### When was the migration made? {#when-was-the-migration-made}
The migration was made between 3.1-beta and 3.1, with this commit: https://github.com/OpenRefine/OpenRefine/commit/22322bd0272e99869ab8381b1f28696cc7a26721
### How to migrate an extension
### How to migrate an extension {#how-to-migrate-an-extension-1}
You will need to update your translation files, merging nested objects into one global object and concatenating their keys. You can do this by running the following Python script on all your JSON translation files:
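
As a rough illustration of the flattening described here (a sketch, not the project's own script), the function below merges nested objects into a single object. It assumes keys are joined with `/`, which matches keys such as `core-index/help` in OpenRefine's translation files. The name `flatten` is a hypothetical helper.

```python
import json


def flatten(nested, prefix=""):
    """Merge nested translation objects into one flat dict.

    Assumption: parent and child keys are concatenated with '/'.
    """
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections and keep their keys prefixed.
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Old nested layout (illustrative sample, not taken from a real file).
old_format = {"core-index": {"help": "Aide"}}
print(json.dumps(flatten(old_format), ensure_ascii=False, indent=2))
```

Running this on a copy of a translation file first makes it easy to compare the flattened keys against the ones your JavaScript code requests.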
@ -97,27 +97,27 @@ Then your javascript files which retrieve the translated strings should be updat
You can then chase down the places where you are concatenating translated strings, and replace that with more flexible patterns using [the plugin's features](https://github.com/wikimedia/jquery.i18n#jqueryi18n-plugin).
## Migrating from org.json to Jackson
## Migrating from org.json to Jackson {#migrating-from-orgjson-to-jackson}
### Why are we doing this change?
### Why are we doing this change? {#why-are-we-doing-this-change-2}
The org.json (or json-java) library has multiple drawbacks.
* First, it has limited functionality - all the serialization and deserialization has to be done explicitly - an important proportion of OpenRefine's code was dedicated to implementing these;
* Second, its implementation is not optimized for speed - multiple projects have reported speedups when migrating to more modern JSON libraries;
* Third, and this was the decisive factor to initiate the migration: [its license](https://json.org/license) is the MIT license with an additional condition which makes it non-free. Getting rid of this dependency was required by the Software Freedom Conservancy as a prerequisite to become a fiscal sponsor for the project.
### When was the migration made?
### When was the migration made? {#when-was-the-migration-made-1}
This change was made between 3.1 and 3.2-beta, with this commit: https://github.com/OpenRefine/OpenRefine/commit/5639f1b2f17303b03026629d763dcb6fef98550b
### How to migrate an extension or fork
### How to migrate an extension or fork {#how-to-migrate-an-extension-or-fork}
You will need to use the Jackson library to serialize the classes that implement interfaces or extend classes exposed by OpenRefine.
The interface `Jsonizable` was deleted. Any class that used to implement this now needs to be serializable by Jackson, producing the same format as the previous serialization code. This applies to any operation, facet, overlay model or GREL function. If you are new to Jackson, have a look at [this tutorial](https://www.baeldung.com/jackson) to learn how to annotate your class for serialization. Once this is done, you can remove the `void write(JSONWriter writer, Properties options)` method from your class. Note that it is important that you do this migration for all classes implementing the `Jsonizable` interface that are exposed to OpenRefine's core.
We encourage you to migrate out of org.json completely, but this is only required for the classes that interact with OpenRefine's core.
#### General notes about migrating
#### General notes about migrating {#general-notes-about-migrating}
OpenRefine's ObjectMapper is available at `ParsingUtilities.mapper`. It is configured to only serialize the fields and getters that have been explicitly marked with `@JsonProperty` (to avoid accidental JSON format changes due to refactoring). On deserialization it will ignore any field in the JSON payload that does not correspond to a field in the Java class. It has serializers and deserializers for `OffsetDateTime` and `LocalDateTime`.
@ -130,13 +130,13 @@ Useful snippets to use in tests:
Before undertaking the migration, we recommend that you write some tests which serialize and deserialize your objects. This will help you make sure that the JSON format is preserved during the migration. One way to do this is to collect some sample JSON representations of your objects, and check in your tests that deserializing these JSON payloads and serializing them back to JSON preserves the JSON payload. Some utilities are available to help you with that in [`TestUtils`](https://github.com/OpenRefine/OpenRefine/blob/master/main/tests/server/src/com/google/refine/tests/util/TestUtils.java) (we had [some to test org.json serialization](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/tests/server/src/com/google/refine/tests/util/TestUtils.java) before we got rid of the dependency, feel free to copy them).
#### For functions
#### For functions {#for-functions}
Before the migration, you had to explicitly define JSON serialization of functions with a `write` method. You should now override the getters returning the various documentation fields.
Example: `Cos` function [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/src/com/google/refine/expr/functions/math/Cos.java) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/expr/functions/math/Cos.java).
#### For operations
#### For operations {#for-operations}
Before the JSON migration we refactored engine-dependent operations so that the engine configuration is represented by an `EngineConfig` object instead of a `JSONObject`. Therefore the constructor for your operation should be updated to use this new class. Your constructor should also be annotated to be used during deserialization.
@ -144,25 +144,25 @@ Note that you do not need to explicitly serialize the operation type, this is al
Example: `ColumnRemovalOperation` [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/src/com/google/refine/operations/column/ColumnRemovalOperation.java) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/operations/column/ColumnRemovalOperation.java).
#### For changes
#### For changes {#for-changes}
Changes are serialized in plain text but often rely on JSON serialization for parts of the data. Just use the methods above with `ParsingUtilities.mapper` to maintain this behaviour.
Example: `ReconChange` [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/src/com/google/refine/model/changes/ReconChange.java) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/operations/column/ColumnRemovalOperation.java).
#### For importers
#### For importers {#for-importers}
The importing options have been migrated from `JSONObject` to `ObjectNode`. Your compiler should help you propagate this change. Utility functions in `JSONUtilities` have been migrated to Jackson so you should have minimal changes if you used them.
Example: `TabularImportingParserBase` [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/main/src/com/google/refine/importers/TabularImportingParserBase.java) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/importers/TabularImportingParserBase.java).
#### For overlay models
#### For overlay models {#for-overlay-models}
Migrate serialization and deserialization as for other objects.
Example: `WikibaseSchema` [before](https://github.com/OpenRefine/OpenRefine/blob/3.1/extensions/wikidata/src/org/openrefine/wikidata/schema/WikibaseSchema.java#L203) and [after](https://github.com/OpenRefine/OpenRefine/blob/master/extensions/wikidata/src/org/openrefine/wikidata/schema/WikibaseSchema.java#L60)
#### For preference values
#### For preference values {#for-preference-values}
Any class that is stored in OpenRefine's preferences now needs to implement the `com.google.refine.preferences.PreferenceValue` interface. The static `load` method and the `write` method used previously for deserialization should be deleted and regular Jackson serialization and deserialization should be implemented instead. Note that you do not need to explicitly serialize the class name, this is already done for you by the interface.

View File

@ -6,7 +6,7 @@ sidebar_label: OpenRefine API
This is a generic API reference for interacting with OpenRefine's HTTP API.
## Create project:
## Create project: {#create-project}
> **Command:** _POST /command/core/create-project-from-upload_
@ -37,7 +37,7 @@ If the project creation is successful, you will be redirected to a URL of the fo
From the project parameter you can extract the project id for use in future API calls. The content of the response is the HTML for the OpenRefine interface for viewing the project.
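
A minimal sketch of that extraction, assuming the redirect URL carries the id in a `project` query parameter as described above. The host, port and id in the sample URL are placeholders, and `project_id_from_redirect` is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs


def project_id_from_redirect(url):
    """Return the value of the `project` query parameter, or None if absent."""
    params = parse_qs(urlparse(url).query)
    values = params.get("project")
    return values[0] if values else None


# Placeholder redirect URL for illustration only.
print(project_id_from_redirect("http://127.0.0.1:3333/project?project=1234567890123"))
```

The id is kept as a string, since it is only ever passed back to the API as a parameter value.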
### Get project models:
### Get project models: {#get-project-models}
> **Command:** _GET /command/core/get-models?_
@ -45,7 +45,7 @@ From the project parameter you can extract the project id for use in future API
Retrieves the models for the specified project. This includes columns, records, overlay models and scripting. The `columnModel` lists the columns, the key column index and name, and the column groupings.
### Response:
### Response: {#response}
**On success:**
```JSON
{
@ -108,7 +108,7 @@ Recovers the models for the specific project. This includes columns, records, o
## Apply operations
## Apply operations {#apply-operations}
> **Command:** _POST /command/core/apply-operations?_
@ -155,7 +155,7 @@ Example of a Valid JSON **Array**
On success returns JSON response
`{ "code" : "ok" }`
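
The request and the success check can be sketched as follows. This only builds the request object and does not send it. It assumes the project id goes in a `project` query parameter and the operations array in an `operations` form field; recent OpenRefine versions may also require a CSRF token, which is not shown. Both function names and the sample operation are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_apply_operations_request(base_url, project_id, operations):
    """Build (but do not send) a POST request for apply-operations.

    Assumptions: `project` is a query parameter and `operations` is a
    form field holding a JSON array.
    """
    query = urlencode({"project": project_id})
    body = urlencode({"operations": json.dumps(operations)}).encode("utf-8")
    url = f"{base_url}/command/core/apply-operations?{query}"
    return Request(url, data=body, method="POST")


def succeeded(response_text):
    """True when the JSON response reports the documented success code."""
    return json.loads(response_text).get("code") == "ok"


# Illustrative operation; real entries are best copied from the Undo / Redo tab.
request = build_apply_operations_request(
    "http://127.0.0.1:3333", "1234567890123",
    [{"op": "core/column-removal", "columnName": "notes"}],
)
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` and feed the decoded body to `succeeded`.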
## Export rows
## Export rows {#export-rows}
> **Command:** _POST /command/core/export-rows_
@ -177,7 +177,7 @@ Returns exported row data in the specified format. The formats supported will de
* ods
* html
## Delete project
## Delete project {#delete-project}
> **Command:** _POST /command/core/delete-project_
@ -185,7 +185,7 @@ Returns exported row data in the specified format. The formats supported will de
Returns JSON response
## Check status of async processes
## Check status of async processes {#check-status-of-async-processes}
> **Command:** _GET /command/core/get-processes_
@ -193,13 +193,13 @@ Returns JSON response
Returns JSON response
## Get all projects metadata:
## Get all projects metadata: {#get-all-projects-metadata}
> **Command:** _GET /command/core/get-all-project-metadata_
Retrieves the metadata for all projects. This includes the project's id, name, time of creation and last time of modification.
### Response:
### Response: {#response-1}
```json
{
"projects":{
@ -213,12 +213,12 @@ Recovers the meta data for all projects. This includes the project's id, name, t
}
```
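
A small sketch of reading such a response, assuming `projects` maps each project id to an object with `name`, `created` and `modified` fields. The field names and the sample payload are assumptions for illustration, and `summarize_projects` is a hypothetical helper.

```python
import json

# Hand-written sample payload, not captured from a real server.
SAMPLE = """
{"projects": {
  "1234": {"name": "demo", "created": "2021-05-26T04:48:54Z", "modified": "2021-05-26T05:00:00Z"},
  "5678": {"name": "other", "created": "2021-01-01T00:00:00Z", "modified": "2021-02-01T00:00:00Z"}
}}
"""


def summarize_projects(response_text):
    """Return (id, name, modified) tuples sorted by project id."""
    projects = json.loads(response_text).get("projects", {})
    return [
        (project_id, meta.get("name"), meta.get("modified"))
        for project_id, meta in sorted(projects.items())
    ]


for row in summarize_projects(SAMPLE):
    print(row)
```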
## Expression Preview
## Expression Preview {#expression-preview}
> **Command:** _POST /command/core/preview-expression_
Pass some expression (GREL or otherwise) to the server where it will be executed on selected columns and the result returned.
### Parameters:
### Parameters: {#parameters}
* **cellIndex**: _[column]_
The cell/column you wish to execute the expression on.
* **rowIndices**: _[rows]_
@ -232,7 +232,7 @@ A boolean value (true/false) indicating whether or not this command should be re
* **repeatCount**: _[repeatCount]_
The maximum amount of times a command will be repeated.
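
A sketch of how these parameters could be encoded as a form body. It assumes `rowIndices` is sent as a JSON array, booleans as `true`/`false` strings, and that the expression is passed in an `expression` field with a language prefix such as `grel:`; check these against your OpenRefine version. `preview_expression_body` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode, parse_qs


def preview_expression_body(cell_index, row_indices, expression,
                            repeat=False, repeat_count=10):
    """Encode the preview-expression parameters as a URL-encoded form body."""
    return urlencode({
        "cellIndex": cell_index,
        "rowIndices": json.dumps(row_indices),  # assumed to be a JSON array
        "expression": expression,
        "repeat": "true" if repeat else "false",
        "repeatCount": repeat_count,
    })


body = preview_expression_body(2, [0, 1, 2], "grel:value.toUppercase()")
# Decode again to show what the server would receive.
print(parse_qs(body))
```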
### Response:
### Response: {#response-2}
**On success:**
```json
{
@ -252,30 +252,30 @@ The result array will hold up to ten results, depending on how many rows there a
}
```
## Third-party software libraries
## Third-party software libraries {#third-party-software-libraries}
Libraries using the [OpenRefine API](openrefine-api):
### Python
### Python {#python}
* [refine-client-py](https://github.com/PaulMakepeace/refine-client-py/)
* Or this fork of the above with an extended CLI [openrefine-client](https://github.com/felixlohmeier/openrefine-client)
* [refine-python](https://github.com/maxogden/refine-python)
### Ruby
### Ruby {#ruby}
* [refine-ruby](https://github.com/distillytics/refine-ruby)
* The above is a maintained fork of [google-refine](https://github.com/maxogden/refine-ruby)
* [google_refine](https://github.com/chengguangnan/google_refine)
### NodeJS
### NodeJS {#nodejs}
* [node-openrefine](https://github.com/pm5/node-openrefine)
### R
### R {#r}
* [rrefine](https://cran.r-project.org/web/packages/rrefine/index.html)
### PHP
### PHP {#php}
* [openrefine-php-client](https://github.com/keboola/openrefine-php-client)
### Java
### Java {#java}
* [refine-java](https://github.com/dtap-gmbh/refine-java)
### Bash
### Bash {#bash}
* [bash-refine.sh](https://gist.github.com/felixlohmeier/d76bd27fbc4b8ab6d683822cdf61f81d) (templates for shell scripts)

View File

@ -8,7 +8,7 @@ _This page is kept for the record. [A cleaner version of this specification](htt
_This is a technical description of the mechanisms behind the reconciliation system in OpenRefine. For usage instructions, see [Reconciliation](/manual/reconciling)._
## Introduction
## Introduction {#introduction}
A reconciliation service is a web service that, given some text which is a name or label for something, and optionally some additional details, returns a ranked list of potential entities matching the criteria. The candidate text does not have to match each entity's official name perfectly, and that's the whole point of reconciliation--to get from ambiguous text name to precisely identified entities. For instance, given the text "apple", a reconciliation service probably should return the fruit apple, the Apple Inc. company, and New York City (also known as the Big Apple).
@ -31,7 +31,7 @@ A standard reconciliation service is a HTTP-based RESTful JSON-formatted API. It
The specification of each of these endpoints is given in the following sections.
## Workflow overview
## Workflow overview {#workflow-overview}
OpenRefine communicates with reconciliation services in the following way.
@ -41,7 +41,7 @@ OpenRefine communicates with reconciliation services in the following way.
* When reconciliation starts, OpenRefine queries the service in [batch mode](reconciliation-api#multiple-query-mode) for small batches of rows and stores the responses of the service.
* Once reconciliation is complete, the results are displayed. The user makes reconciliation decisions based on the choices provided. If a [suggest service](reconciliation-api#suggest-apis) is available, it will be used to input custom reconciliation decisions. If a [preview service](reconciliation-api#preview-api) is available, the user will be able to preview the reconciliation candidates without leaving OpenRefine.
## Main reconciliation service
## Main reconciliation service {#main-reconciliation-service}
The root URL has two functions:
@ -50,7 +50,7 @@ The root URL has two functions:
There is a deprecated "single query" mode which is used if the `query` parameter is given. This mode is no longer supported or used by OpenRefine and other API consumers should not rely on it.
### Service metadata
### Service metadata {#service-metadata}
When a service is called with just a JSONP `callback` parameter and no other parameters, it must return its _service metadata_ as a JSON object literal with the following fields:
@ -87,9 +87,9 @@ Here are two live examples:
}
```
## Query Request
## Query Request {#query-request}
### Multiple Query Mode
### Multiple Query Mode {#multiple-query-mode}
A call to a standard reconciliation service API for multiple queries looks like this:
@ -109,7 +109,7 @@ curl -X POST -d 'queries={ "q0" : { "query" : "foo" }, "q1" : { "query" : "bar"
OpenRefine uses POST for all requests, so make sure your service supports the format above.
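
A client-side sketch of building that POST body, assuming the batch is sent as a single form field named `queries` containing a JSON object keyed `q0`, `q1`, and so on, as in the `curl` call. `build_queries_body` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_queries_body(texts):
    """Encode a batch of query strings as a `queries` form field."""
    queries = {f"q{i}": {"query": text} for i, text in enumerate(texts)}
    return urlencode({"queries": json.dumps(queries)})


body = build_queries_body(["foo", "bar"])
# Decode again to show the JSON object a service would have to parse.
print(json.loads(parse_qs(body)["queries"][0]))
```

A service implementation does the reverse: it reads the `queries` form field, parses it as JSON, and answers each key separately.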
### **DEPRECATED** Single Query Mode
### **DEPRECATED** Single Query Mode {#deprecated-single-query-mode}
A call to a reconciliation service API for a single query looks like either of these:
@ -121,7 +121,7 @@ If the query parameter is a string, then it's an abbreviation of `query={"query"
1. [https://tools.wmflabs.org/openrefine-wikidata/en/api?query=boston](https://tools.wmflabs.org/openrefine-wikidata/en/api?query=boston)
2. [https://tools.wmflabs.org/openrefine-wikidata/en/api?query={%22query%22:%22boston%22,%22type%22:%22Q515%22}](https://tools.wmflabs.org/openrefine-wikidata/en/api?query={%22query%22:%22boston%22,%22type%22:%22Q515%22})
### Query JSON Object
### Query JSON Object {#query-json-object}
The query json object literal has a few fields
@ -160,7 +160,7 @@ Here is an example of a full query parameter:
}
```
## Query Response
## Query Response {#query-response}
For multiple queries, the response is a JSON literal object with the same keys as in the request
```json
@ -195,11 +195,11 @@ Each result consists of a JSON literal object with the structure
The results should be sorted by decreasing score.
The service must also support JSONP through a callback parameter, i.e. `&callback=foo`.
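
On the service side, the ordering requirement can be sketched like this. It assumes each query key maps to an object with a `result` list and that every candidate carries a numeric `score`; the candidate ids and scores are made up, and `build_response` is a hypothetical helper.

```python
import json


def build_response(candidates_by_key):
    """Serialize candidates per query key, sorted by decreasing score."""
    return json.dumps({
        key: {"result": sorted(cands, key=lambda c: c["score"], reverse=True)}
        for key, cands in candidates_by_key.items()
    })


# Made-up candidates for illustration.
print(build_response({
    "q0": [
        {"id": "a", "name": "Apple (fruit)", "score": 41.5},
        {"id": "b", "name": "Apple Inc.", "score": 87.0},
    ],
}))
```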
## Preview API
## Preview API {#preview-api}
The preview service API (complementary to the reconciliation service API) is quite simple. Pass it an identifier and it renders information about the corresponding entity in an HTML page, which will be shown in an iframe inside OpenRefine. The given width and height dimensions tell OpenRefine how to size that iframe.
## Suggest APIs
## Suggest APIs {#suggest-apis}
In the "Start Reconciling" dialog box in OpenRefine, you can specify which type of entities the column in question contains. For instance, the column might contain titles of scientific journals. But you don't know the identifier corresponding to the "scientific journal" type. So we need a suggest API that translates "scientific journal" to something like, say, "[Q5633421](https://www.wikidata.org/wiki/Q5633421)" if we're reconciling against Wikidata.
@ -231,11 +231,11 @@ The `service_url` field is required and it should look like this: `http://foo.co
Refer to [the Suggest API documentation](suggest-api) for further details.
## Data Extension
## Data Extension {#data-extension}
From OpenRefine 2.8 it is possible to fetch values from reconciled sources natively. This is only possible for the reconciliation endpoints that support this additional feature, described in the [Data Extension API documentation](Data-Extension-API).
## Examples
## Examples {#examples}
We've cloned a number of the Refine reconciliation services as a way of providing them visibility. They can be found at [https://github.com/OpenRefine](https://github.com/OpenRefine)

View File

@ -13,7 +13,7 @@ For the `suggest` entry point, it is important to balance speed versus accuracy.
Similarly, for the `flyout` entry point, it is important to respond quickly while providing enough essential details so that the user can visually check if the highlighted entity is the desired one. You probably would want to embed a thumbnail image, as we have found that images are excellent for visual identification.
## suggest Entry Point
## suggest Entry Point {#suggest-entry-point}
The `suggest` entry point takes the following URL parameters
@ -82,7 +82,7 @@ JSON response:
}
```
## flyout Entry Point
## flyout Entry Point {#flyout-entry-point}
The `flyout` entry point takes a single URL parameter: `id`, which is the identifier of the entity to render, as a string. It also takes a `callback` parameter to support JSONP. It returns a JSON object literal with a single field: `html`, which is the rendered view of the given entity.
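
A minimal sketch of such a handler body, assuming an in-memory lookup table in place of a real data source. The `ENTITIES` table, its fields and the HTML layout are all hypothetical; only the `id` and `callback` inputs and the `html` output field come from the description above.

```python
import html
import json

# Hypothetical in-memory data standing in for a real entity store.
ENTITIES = {
    "Q5633421": {"name": "scientific journal", "description": "Periodical for research & reviews"},
}


def flyout_response(entity_id, callback=None):
    """Render an entity as {"html": ...}, wrapped in JSONP when a callback is given."""
    entity = ENTITIES[entity_id]
    rendered = "<div><h3>{}</h3><p>{}</p></div>".format(
        html.escape(entity["name"]), html.escape(entity["description"])
    )
    body = json.dumps({"html": rendered})
    # JSONP wraps the JSON payload in a call to the requested function.
    return f"{callback}({body});" if callback else body


print(flyout_response("Q5633421", callback="cb"))
```

A real service should also validate the callback name before echoing it back, since it ends up in executable script.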

View File

@ -12,11 +12,11 @@ You can help translate OpenRefine into your language by visiting [Weblate](https
Click to help translate --> [Weblate](https://hosted.weblate.org/engage/openrefine/?utm_source=widget)
## User entry of language data ##
## User entry of language data ## {#user-entry-of-language-data-}
Localized strings are entered in a `.json` file, one per language. They are located in the folder `main/webapp/modules/core/langs/` in a file named `translation-xx.json`, where `xx` is the language code (e.g. `fr` for French).
### Simple case of localized string ###
### Simple case of localized string ### {#simple-case-of-localized-string-}
This is an example of a simple string, with the start of the JSON file. This example is for French.
```
{
@ -28,22 +28,22 @@ This is an example of a simple string, with the start of the JSON file. This exa
So the key `core-index/help` will render as `"Aide"` in French.
### Localization with a parameterized value ###
### Localization with a parameterized value ### {#localization-with-a-parameterized-value-}
In this example, the name of the column (represented by `$1` in this example), will be substituted with the string of the name of the column.
`"core-facets/edit-facet-title": "Cliquez ici pour éditer le nom de la facette\nColonne : $1",`
### Localization with a singular/plural value ###
### Localization with a singular/plural value ### {#localization-with-a-singularplural-value-}
In this example, part of the string changes depending on whether one of the parameters is 1 or another value.
Here the word for page takes an « s » or not depending on the value of the second parameter, `$2`.
`"core-views/goto-page": "$1 de $2 {{plural:$2|page|pages}}"`
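The sketch below shows one way such a string could be resolved, assuming the simple "1 versus any other value" rule described above. `resolvePlural` is a hypothetical function written for this illustration; the real library applies language-specific plural rules that may differ from this simplification.

```javascript
// Illustrative only: expands {{plural:$n|singular|plural}} and then $n placeholders.
function resolvePlural(template, params) {
  // Pick the singular form when the referenced parameter equals 1, else the plural form.
  var expanded = template.replace(
    /\{\{plural:\$(\d+)\|([^|}]*)\|([^|}]*)\}\}/g,
    function (match, index, singular, plural) {
      return Number(params[Number(index) - 1]) === 1 ? singular : plural;
    }
  );
  // Fill in the remaining positional placeholders.
  return expanded.replace(/\$(\d+)/g, function (match, index) {
    var value = params[Number(index) - 1];
    return value === undefined ? match : String(value);
  });
}

var onePage = resolvePlural("$1 de $2 {{plural:$2|page|pages}}", [1, 1]);
var manyPages = resolvePlural("$1 de $2 {{plural:$2|page|pages}}", [2, 5]);
```

The plural marker is expanded first so that the `$2` inside it is read as a parameter reference rather than being replaced by its value.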
## Front End Coding
## Front End Coding {#front-end-coding}
The OpenRefine front end has been localized using the [Wikidata jquery.i18n library](https://github.com/OpenRefine/OpenRefine/pull/1285). The localized text is stored in a JSON dictionary on the server and retrieved with a new OpenRefine command.
### Adding a new string
### Adding a new string {#adding-a-new-string}
There should be no hard-coded language strings in the HTML or JSON used for the front end. If you need a new string, first check the existing strings to make sure there isn't an equivalent string, **in an equivalent context**, that you can reuse. Reusing strings cuts down on the amount of text that needs to be translated, but context is important because it can affect how the same literal English text is translated.
@ -70,7 +70,7 @@ or, if you need to embed HTML tags:
$('#new-html-element-id').html($.i18n('section/newkey'));
```
### Adding a new language
### Adding a new language {#adding-a-new-language}
The language dictionaries are stored in the `langs` subdirectory for the module e.g.
@ -81,14 +81,14 @@ The language dictionaries are stored in the `langs` subdirectory for the module
To add support for a new language, copy `translation-en.json` to `translation-<locale>.json` and have your translator translate all the value strings (i.e. the right-hand side).
#### Main interface
#### Main interface {#main-interface}
The translation is best done [with Weblate](https://hosted.weblate.org/engage/openrefine/?utm_source=widget). Files are periodically merged by the developer team.
Run the latest version (ideally cloned from GitHub) and check whether the translated words fit the layout. Not all items can be translated word for word, especially into non-Indo-European languages.
If you see any text that remains in English even after you have checked all items, please create a bug report in the issue tracker so that the developers can fix it.
#### Extensions
#### Extensions {#extensions}
Extensions can be translated via Weblate just like the core software.
@ -100,6 +100,6 @@ To support a new language file, the developer should add a corresponding entry t
<option value="<locale>">[Language Label]</option>
```
## Server / Backend Coding
## Server / Backend Coding {#server--backend-coding}
Currently no back end functions are translated, so things like error messages, undo history, etc. may appear in English. Rather than sending raw error text to the front end, it's better to send an error code which is translated into text on the front end. This allows for multiple languages to be supported.
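The error-code approach could look roughly like the sketch below. Everything named here is hypothetical: the error codes, the `my-extension/...` message keys, the `errorCode` field, and the `describeError` function are invented for illustration and do not exist in OpenRefine. The `translate` argument stands in for whatever lookup function the front end uses.

```javascript
// Hypothetical mapping from server error codes to localization keys.
var ERROR_MESSAGE_KEYS = {
  "column-not-found": "my-extension/error-column-not-found",
  "invalid-expression": "my-extension/error-invalid-expression"
};

// Turns a server response carrying an error code into localized text.
// `translate` is assumed to map a message key to a string in the user's language.
function describeError(response, translate) {
  var known = Object.prototype.hasOwnProperty.call(
    ERROR_MESSAGE_KEYS, response.errorCode);
  // Fall back to a generic key when the code is unknown.
  var key = known
    ? ERROR_MESSAGE_KEYS[response.errorCode]
    : "my-extension/error-unknown";
  return translate(key);
}
```

Because the server only sends a code, adding a language requires no back end change: only the front end dictionaries need new entries.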


@ -4,7 +4,7 @@ title: Writing Extensions
sidebar_label: Writing Extensions
---
## Introduction
## Introduction {#introduction}
This is a very brief overview of the structure of OpenRefine extensions. For more detailed documentation and step-by-step guides please see the following external documentation/tutorials:
@ -20,7 +20,7 @@ Extensions that come with the code base are located under [the extensions subdir
Please note that you should bundle any dependencies yourself, so you are insulated from OpenRefine packaging changes over time.
### Directory Layout
### Directory Layout {#directory-layout}
An OpenRefine extension sits in a file directory that contains the following files and sub-directories:
@ -62,11 +62,11 @@ The `pom.xml` file is an [Apache Maven](http://maven.apache.org/) build file. Yo
Note that your extension's Java code would need to reference some libraries used in OpenRefine and OpenRefine's Java classes themselves. These dependencies are reflected in the Maven configuration for the extension.
## Sample extension
## Sample extension {#sample-extension}
The sample extension is included in the code base so that you can copy it and get started on writing your own extension. After you copy it, make sure you change its name inside its `module/MOD-INF/controller.js` file.
### Basic Structure
### Basic Structure {#basic-structure}
The sample extension's code is in `refine/extensions/sample/`. In that directory, Java source code is contained under the `src` sub-directory, and webapp code is under the `module` sub-directory. Here is the full directory layout:
@ -99,15 +99,15 @@ Client-side code is in the inner `module` sub-directory. They can be plain old .
The `init()` function in `controller.js` allows the extension to register various client-side handlers for augmenting pages served by Refine's core. These handlers are feature-specific. For example, [this is where the jython extension adds its parser](https://github.com/OpenRefine/OpenRefine/blob/master/extensions/jython/module/MOD-INF/controller.js#L46). As for the sample extension, it adds its script `project-injection.js` and style `project-injection.less` into the `/project` page. If you [view the source of the /project page](http://127.0.0.1:3333/project), you will see references to those two files.
### Wiring Up the Extension
### Wiring Up the Extension {#wiring-up-the-extension}
Extensions are loaded by the Butterfly framework. Butterfly refers to these as 'modules'. [The location of modules is set in the `main/webapp/WEB-INF/butterfly.properties` file](https://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/WEB-INF/butterfly.properties#L27). Butterfly simply descends into each of those paths and looks for any `MOD-INF` directories.
For more information, see [Extension Points](https://github.com/OpenRefine/OpenRefine/wiki/Extension-Points).
## Extension points
## Extension points {#extension-points}
### Client-side: Javascript and CSS
### Client-side: Javascript and CSS {#client-side-javascript-and-css}
The UI in OpenRefine for working with a project is coded in [the /main/webapp/modules/core/project.vt file](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/project.vt). The file is quite small, and that's because almost all of its content is to be expanded dynamically through the Velocity variables $scriptInjection and $styleInjection. So that your own Javascript and CSS files get loaded, you need to register them with the ClientSideResourceManager, which is done in the /module/MOD-INF/controller.js file. See [the controller.js file in this sample extension code](http://github.com/OpenRefine/OpenRefine/blob/master/extensions/sample/module/MOD-INF/controller.js) for an example.
@ -128,7 +128,7 @@ You can specify one or more files for registration, and their paths are relative
Javascript Bundling: Note that `project.vt` belongs to the core module and is thus under the control of the core module's [controller.js file](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/MOD-INF/controller.js). The Javascript files to be included in `project.vt` are by default bundled together for performance. When debugging, you can prevent this bundling behavior by setting `bundle` to `false` near the top of that `controller.js` file. (If you have commit access to this code base, be sure not to check that change in.)
### Client-side: Images
### Client-side: Images {#client-side-images}
We recommend that you always refer to images through your CSS files rather than in your Javascript code. URLs to images will thus be relative to your CSS files, e.g.,
@ -144,7 +144,7 @@ If you really really absolutely need to refer to your images in your Javascript
ModuleWirings["my-extension"] + "images/x.png"
```
### Client-side: HTML Templates
### Client-side: HTML Templates {#client-side-html-templates}
Besides Javascript, CSS, and images, your extension might also include HTML templates that get loaded on the fly by your Javascript code and injected into the page's DOM. For example, here is [the Cluster edit dialog template](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/dialogs/clustering-dialog.html), which gets loaded by code in [the equivalent javascript file 'clustering-dialog.js'](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/dialogs/clustering-dialog.js):
@ -154,11 +154,11 @@ var dialog = $(DOM.loadHTML("core", "scripts/dialogs/clustering-dialog.html"));
`DOM.loadHTML` returns the content of the file as a string, and `$(...)` turns it into a DOM fragment. Where `"core"` is, you would want your extension's name. The path of the HTML file is relative to your extension's `module` subdirectory.
### Client-side: Project UI Extension Points
### Client-side: Project UI Extension Points {#client-side-project-ui-extension-points}
Getting your extension's Javascript code included in `project.vt` doesn't accomplish much by itself unless your code also registers hooks into the UI. For example, you can surely implement an exporter in Javascript, but unless you add a corresponding menu command in the UI, your user can't use your exporter.
#### Main Menu
#### Main Menu {#main-menu}
The main menu can be extended by calling any one of the methods `MenuBar.appendTo`, `MenuBar.insertBefore`, and `MenuBar.insertAfter`. Each method takes 2 arguments: an array of strings that identify a particular existing menu item or submenu, and one new single menu item or submenu or an array of menu items and submenus. For example, to insert 2 menu items and a menu separator before the menu item Project > Export Filtered Rows > Templating..., write this Javascript code wherever that would execute when your Javascript files get loaded:
@ -183,7 +183,7 @@ The array `["core/project", "core/export", "core/export-templating"]` pinpoints
See the beginning of [/main/webapp/modules/core/scripts/project/menu-bar.js](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/project/menu-bar.js) for IDs of menu items and submenus.
#### Column Header Menu
#### Column Header Menu {#column-header-menu}
The drop-down menu of each column can also be extended, but the mechanism is slightly different compared to the main menu. Because the drop-down menu for a particular column is constructed on the fly when the user actually clicks the drop-down menu button, extending the column header menu can't really be done once at start-up time, but must be done every time a column header menu gets created. So, registration in this case involves providing a function that gets called each such time:
@ -207,7 +207,7 @@ MenuSystem.appendTo(menu, ["core/facet"], [
In addition to `MenuSystem.appendTo`, you can also call `MenuSystem.insertBefore` and `MenuSystem.insertAfter` with the same 3 arguments. To see what IDs you can use, see the function `DataTableColumnHeaderUI.prototype._createMenuForColumnHeader` in [/main/webapp/modules/core/scripts/views/data-table/column-header-ui.js](http://github.com/OpenRefine/OpenRefine/blob/master/main/webapp/modules/core/scripts/views/data-table/column-header-ui.js).
### Server-side: Ajax Commands
### Server-side: Ajax Commands {#server-side-ajax-commands}
The client-side of OpenRefine gets things done by calling AJAX commands on the server-side. These commands must be registered with the OpenRefine servlet, so that the servlet knows how to route AJAX calls from the client-side. This can be done inside the `init` function in your extension's `controller.js` file, e.g.,
@ -220,7 +220,7 @@ function init() {
Your command will then be accessible at [http://127.0.0.1:3333/command/my-extension/my-command](http://127.0.0.1:3333/command/my-extension/my-command).
### Server-side: Operations
### Server-side: Operations {#server-side-operations}
Most commands change the project's data. Most of them do so by creating abstract operations. See the Changes, History, Processes, and Operations section of the [Server Side Architecture](https://github.com/OpenRefine/OpenRefine/wiki/Server-Side-Architecture) document.
@ -242,7 +242,7 @@ static public AbstractOperation reconstruct(Project project, JSONObject obj) thr
}
```
### Server-side: GREL
### Server-side: GREL {#server-side-grel}
GREL can be extended with new functions. This is also done in the `init` function in `controller.js`, e.g.,
@ -258,7 +258,7 @@ Packages.com.google.refine.expr.ExpressionUtils.registerBinder(
new Packages.com.foo.bar.MyBinder());
```
### Server-side: Importers
### Server-side: Importers {#server-side-importers}
You can register an importer as follows:
@ -269,7 +269,7 @@ Packages.com.google.refine.importers.ImporterRegistry.registerImporter(
The string `"importer-name"` isn't important at all. It's not really related to file extension or mime-type. Just use something unique. Your importer will be explicitly called to test if it can import something.
### Server-side: Exporters
### Server-side: Exporters {#server-side-exporters}
You can register an exporter as follows:
@ -280,7 +280,7 @@ Packages.com.google.refine.exporters.ExporterRegistry.registerExporter(
The string `"exporter-name"` isn't important at all. It's only used by the client-side to tell the server-side which exporter to use. Just use something unique and, of course, relevant.
### Server-side: Overlay Models
### Server-side: Overlay Models {#server-side-overlay-models}
Overlay models are objects attached onto a core Project object to store and manage additional data for that project. For example, the schema alignment skeleton is managed by the Protograph overlay model. An overlay model implements the interface `com.google.refine.model.OverlayModel` and can be registered like so:
@ -306,7 +306,7 @@ public void write(JSONWriter writer, Properties options) throws JSONException {
}
```
### Server-side: Scripting Languages
### Server-side: Scripting Languages {#server-side-scripting-languages}
A scripting language (such as Jython) can be registered as follows: