Expressions

allanaaa 2021-01-08 15:26:22 -05:00
parent 72af127010
commit 13c4dd8fee
3 changed files with 79 additions and 71 deletions


@ -13,7 +13,7 @@ You can apply a text facet on numbers, boolean values, and dates, but if you edi
## Transform ## Transform
Select <span class="menuItems">Edit cells</span><span class="menuItems">Transforms</span> to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. When used on a column operation, `value` is the information in each cell in the selected column. Select <span class="menuItems">Edit cells</span><span class="menuItems">Transform...</span> to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. When used on a column operation, `value` is the information in each cell in the selected column.
Use the preview to ensure your data is being transformed correctly. Use the preview to ensure your data is being transformed correctly.


@ -6,14 +6,12 @@ sidebar_label: Expressions
## Overview ## Overview
You can use expressions in multiple places in OpenRefine to extend data cleanup and manipulation. You can use expressions in multiple places in OpenRefine to extend data cleanup and transformation. Expressions are available with the following functions:
Expressions are available with the following functions:
* <span class="menuItems">Facet</span>: * <span class="menuItems">Facet</span>:
* <span class="menuItems">Custom text facet...</span> * <span class="menuItems">Custom text facet...</span>
* <span class="menuItems">Custom numeric facet…</span> * <span class="menuItems">Custom numeric facet…</span>
* You can also manually “change” most Customized facets after they have been created, which will bring up an expressions window. * <span class="menuItems">Customized facets</span> (click “change” after they have been created to bring up an expressions window)
* <span class="menuItems">Edit cells</span>: * <span class="menuItems">Edit cells</span>:
* <span class="menuItems">Transform…</span> * <span class="menuItems">Transform…</span>
@ -24,11 +22,11 @@ Expressions are available with the following functions:
* <span class="menuItems">Split</span> * <span class="menuItems">Split</span>
* <span class="menuItems">Join</span> * <span class="menuItems">Join</span>
* <span class="menuItems">Add column based on this column</span> * <span class="menuItems">Add column based on this column</span>
* <span class="menuItems">Add column by fetching URLs</span> * <span class="menuItems">Add column by fetching URLs</span>.
In the expressions editor window you will have the opportunity to select one supported language. The default is [GREL (General Refine Expression Language)](#grel-general-refine-expression-language); OpenRefine also comes with support for [Clojure](#clojure) and [Jython](#jython). Extensions may offer support for more expressions languages. In the expressions editor window you have the opportunity to select a supported language. The default is [GREL (General Refine Expression Language)](#grel-general-refine-expression-language); OpenRefine also comes with support for [Clojure](#clojure) and [Jython](#jython). Extensions may offer support for more expressions languages.
These languages have some syntax differences but support most of the same [variables](#variables). For example, the GREL expression `value.split(" ")[1]` would be written in Jython as `return value.split(" ")[1]`. These languages have some syntax differences but support many of the same [variables](#variables). For example, the GREL expression `value.split(" ")[1]` would be written in Jython as `return value.split(" ")[1]`.
This page is a general reference for available functions, variables, and syntax. For examples that use these expressions for common data tasks, look at the [Recipes section on the Wiki](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#recipes-and-worked-examples). This page is a general reference for available functions, variables, and syntax. For examples that use these expressions for common data tasks, look at the [Recipes section on the Wiki](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#recipes-and-worked-examples).
@ -49,34 +47,34 @@ Were you to apply a transformation to the “friend” column with the expressio
value.split(" ")[1] value.split(" ")[1]
``` ```
OpenRefine would work through each row, splitting the “friend” values based on a space character. `value` for row 1 would be “John Smith” so the output would be “Smith” (as "[1]" selects the second part of the created output); `value` for row 2 would be “Jane Doe” so the output would be “Doe.” Using variables, a single expression yields different results for different rows. The old information would be discarded; you couldn't get "John" and "Jane" back unless you undid the operation in the History tab. OpenRefine would work through each row, splitting the “friend” values based on a space character. The `value` for row 1 is “John Smith” so the output would be “Smith” (as "[1]" selects the second part of the created output); the `value` for row 2 is “Jane Doe” so the output would be “Doe”. Using variables, a single expression yields different results for different rows. The old information would be discarded; you couldn't get "John" and "Jane" back unless you undid the operation in the [History](running#history-undoredo) tab.
For another example, if you were to create a new column based on your data using the expression `row.starred`, it would generate a column of true and false values based on whether your rows were starred at that moment. If you were to then star more rows and unstar some rows, that data would not dynamically update - you would need to run the operation again to have current true/false values. For another example, if you were to create a new column based on your data using the expression `row.starred`, it would generate a column of true and false values based on whether your rows were starred at that moment. If you were to then star more rows and unstar some rows, that data would not dynamically update - you would need to run the operation again to have current true/false values.
Note that an expression is typically based on one particular column in the data - the column whose drop-down menu is invoked. Many variables are created to stand for things about the cell in that “base column” of the current row on which the expression is evaluated. There are also variables about rows, which you can use to access cells in other columns. Note that an expression is typically based on one particular column in the data - the column whose drop-down menu is first selected. Many variables are created to stand for things about the cell in that “base column” of the current row on which the expression is evaluated. There are also variables about rows, which you can use to access cells in other columns.
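The per-row evaluation described above can be mimicked outside OpenRefine. The following is a minimal Python sketch, not OpenRefine code: the `rows` list, its `friend` key, and the function name `second_word` are hypothetical stand-ins for a project with a “friend” column and the expression `value.split(" ")[1]`.

```python
# Hypothetical stand-in for a project with a "friend" column.
rows = [
    {"friend": "John Smith"},
    {"friend": "Jane Doe"},
]


def second_word(value):
    # Same idea as the GREL expression value.split(" ")[1]:
    # split on a space and keep the second part (index 1).
    return value.split(" ")[1]


# The expression is evaluated once per row, with `value` bound
# to that row's cell in the base column.
results = [second_word(row["friend"]) for row in rows]
print(results)
```

One expression, applied to each row in turn, gives a different result per row because `value` changes from row to row.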
### The expressions editor ### The expressions editor
When you select a function that offers the ability to supply expressions, you will see a window overlay the screen with what we call the expressions editor. When you select a function that accepts expressions, you will see a window overlay the screen with what we call the expressions editor.
![The expressions editor window with a simple expression: value + 10.](/img/expression-editor.png) ![The expressions editor window with a simple expression: value + 10.](/img/expression-editor.png)
The expressions editor offers you a field for entering your formula and shows you a preview of its transformation on your first few rows of cells. The expressions editor offers you a field for entering your formula and shows you a preview of its transformation on your first few rows of cells.
There is a dropdown menu from which you can choose an expression language. The default is GREL. Jython and Clojure are also offered with the installation package, and you may be able to add more language support with third-party extensions and customizations. There is a dropdown menu from which you can choose an expression language. The default at first is GREL; if you begin working with another language, that selection will persist across OpenRefine. Jython and Clojure are also offered with the installation package, and you may be able to add more language support with third-party extensions and customizations.
There are also tabs for: There are also tabs for:
* History, which shows you formulas you've recently used from across all your projects * <span class="tabLabels">History</span>, which shows you formulas you've recently used from across all your projects
* Starred, which shows you formulas from your History that you've starred for reuse * <span class="tabLabels">Starred</span>, which shows you formulas from your History that you've starred for reuse
* Help, a quick reference to GREL functions. * <span class="tabLabels">Help</span>, a quick reference to GREL functions.
Starring formulas you've used in the past can be very helpful for repetitive tasks you're performing in batches. Starring formulas you've used in the past can be helpful for repetitive tasks you're performing in batches.
You can also choose how formula errors are handled: replicate the original cell value, output an error message into the cell, or output a blank cell. You can also choose how formula errors are handled: replicate the original cell value, output an error message into the cell, or output a blank cell.
### Regular expressions ### Regular expressions
OpenRefine offers several fields that support the use of regular expressions (regex), such as in a <span>Text filter</span> or a <span>Replace…</span> operation. GREL and other expressions can also use regular expression markup to extend their functionality. OpenRefine offers several fields that support the use of regular expressions (regex), such as in a <span class="menuItems">Text filter</span> or a <span class="menuItems">Replace…</span> operation. GREL and other expressions can also use regular expression markup to extend their functionality.
If this is your first time working with regex, you may wish to read [this tutorial specific to the Java syntax that OpenRefine supports](https://docs.oracle.com/javase/tutorial/essential/regex/). We also recommend this [testing and learning tool](https://regexr.com/). If this is your first time working with regex, you may wish to read [this tutorial specific to the Java syntax that OpenRefine supports](https://docs.oracle.com/javase/tutorial/essential/regex/). We also recommend this [testing and learning tool](https://regexr.com/).
@ -92,19 +90,19 @@ the regular expression is `\s+`, and the syntax used in the expression wraps it
Do not use slashes to wrap regular expressions outside of a GREL expression. Do not use slashes to wrap regular expressions outside of a GREL expression.
The [GREL functions](#grel-general-refine-expression-language) that support regex are: On the [GREL functions](#grel-general-refine-expression-language) page, functions that support regex will indicate that with a “p” for “pattern.” The GREL functions that support regex are:
* contains * [contains](grelfunctions#containss-sub-or-p)
* replace * [replace](grelfunctions#replaces-s-or-p-find-s-replace)
* find * [find](grelfunctions#finds-sub-or-p)
* match * [match](grelfunctions#matchs-p)
* partition * [partition](grelfunctions#partitions-s-or-p-fragment-b-omitfragment-optional)
* rpartition * [rpartition](grelfunctions#rpartitions-s-or-p-fragment-b-omitfragment-optional)
* split * [split](grelfunctions#splits-s-or-p-sep)
* smartSplit * [smartSplit](grelfunctions#smartsplits-s-or-p-sep-optional)
#### Jython-supported regex #### Jython-supported regex
You can also use [regex with Jython expressions](http://www.jython.org/docs/library/re.html), instead of GREL, for example with a Custom Text Facet: You can also use [regex with Jython expressions](http://www.jython.org/docs/library/re.html), instead of GREL, for example with a <span class="menuItems">Custom Text Facet</span>:
``` ```
python import re g = re.search(ur"\u2014 (.*),\s*BWV", value) return g.group(1) python import re g = re.search(ur"\u2014 (.*),\s*BWV", value) return g.group(1)
@ -120,7 +118,7 @@ clojure (nth (re-find #"\u2014 (.*),\s*BWV" value) 1)
## Variables ## Variables
Most of the OpenRefine-specific variables have attributes: aspects of the variables that can be called separately. We call these attributes "member fields" because they belong to certain variables. For example, you can query a record to find out how many rows it contains with `row.record.rowCount`: `rowCount` is a member field specific to `record`, which is a member field of `row`. Member fields can be called using a dot separator, or with square brackets (`row["record"]`). Most OpenRefine variables have attributes: aspects of the variables that can be called separately. We call these attributes “member fields” because they belong to certain variables. For example, you can query a record to find out how many rows it contains with `row.record.rowCount`: `rowCount` is a member field specific to the `record` variable, which is a member field of `row`. Member fields can be called using a dot separator, or with square brackets (`row["record"]`). The square bracket syntax is also used for variables that can call columns by name, for example, `cells["Postal Code"]`.
|Variable |Meaning | |Variable |Meaning |
|-|-| |-|-|
@ -141,49 +139,51 @@ The `row` variable itself is best used to access its member fields, which you ca
|-|-| |-|-|
| `row.index` | The index value of the current row (the first row is 0) | | `row.index` | The index value of the current row (the first row is 0) |
| `row.cells` | The cells of the row, returned as an array | | `row.cells` | The cells of the row, returned as an array |
| `row.columnNames` | An array of the column names of the row, i.e. the column names in the project. This will report all columns, even those with null cell values in the particular row. Call a column by number with row.columnNames[3] | | `row.columnNames` | An array of the column names of the project. This will report all columns, even those with null cell values in that particular row. Call a column by number with `row.columnNames[3]` |
| `row.starred` | A boolean indicating if the row is starred | | `row.starred` | A boolean indicating if the row is starred |
| `row.flagged` | A boolean indicating if the row is flagged | | `row.flagged` | A boolean indicating if the row is flagged |
| `row.record` | The [record](#record) object containing the current row | | `row.record` | The [record](#record) object containing the current row |
For array objects such as `row.columnNames` you can preview the array using the expressions window, and output it as a string using `toString(row.columnNames)` or with something like: For array objects such as `row.columnNames` you can preview the array using the expressions window, and output it as a string using `toString(row.columnNames)` or with something like:
```forEach(row.columnNames,v,v).join("; ")``` ```
forEach(row.columnNames,v,v).join("; ")
```
### Cells ### Cells
The `cells` object is used to call information from the columns in your project. For example, `cells.Foo` returns a [cell](#cell) object representing the cell in the column named “Foo” of the current row. If the column name has spaces, use square brackets, e.g., `cells["Postal Code"]`. There is no `cells.value` - it can only be used with member fields. To get the corresponding column value inside the `cells` variable, use `.value` at the end, for example `cells["Postal Code"].value`. The `cells` object is used to call information from the columns in your project. For example, `cells.Foo` returns a [cell](#cell) object representing the cell in the column named “Foo” of the current row. If the column name has spaces, use square brackets, e.g., `cells["Postal Code"]`. To get the corresponding column's value inside the `cells` variable, use `.value` at the end, for example, `cells["Postal Code"].value`. There is no `cells.value` - it can only be used with member fields.
### Cell ### Cell
A `cell` object contains all the data of a cell and is stored as a single object that has two fields. A `cell` object contains all the data of a cell and is stored as a single object.
You can use `cell` on its own in the expressions editor to copy all the contents of a column to another column, including reconciliation information. Although the preview in the expressions editor will only show a small representation [object Cell], it will actually copy all the cell's data. Try this with <span class="menuItems">Edit Column</span><span class="menuItems">Add Column based on this column ...</span>. You can use `cell` on its own in the expressions editor to copy all the contents of a column to another column, including reconciliation information. Although the preview in the expressions editor will only show a small representation (“[object Cell]”), it will actually copy all the cell's data. Try this with <span class="menuItems">Edit Column</span><span class="menuItems">Add Column based on this column ...</span>.
|Field |Meaning |Member fields | |Field |Meaning |Member fields |
|-|-|-| |-|-|-|
| `cell` | An object containing the entire contents of the cell | .value, .recon, .errorMessage | | `cell` | An object containing the entire contents of the cell | .value, .recon, .errorMessage |
| `cell.value` | The value in the cell, which can be a string, a number, a boolean, null, or an error | | | `cell.value` | The value in the cell, which can be a string, a number, a boolean, null, or an error | |
| `cell.recon` | An object encapsulating reconciliation results for that cell | See the reconciliation section below | | `cell.recon` | An object encapsulating reconciliation results for that cell | See the [reconciliation](expressions#reconciliation) section |
| `cell.errorMessage` | Returns the message of an *EvalError* instead of the error object itself (use value to return the error object) | .value | | `cell.errorMessage` | Returns the message of an *EvalError* instead of the error object itself (use value to return the error object) | .value |
### Reconciliation ### Reconciliation
Several of the fields here are equivalent to what can be used through [reconciliation facets](reconciling#reconciliation-facets). You must type `cell.recon`; `recon` on its own will not work. Several of the fields here provide the data used in [reconciliation facets](reconciling#reconciliation-facets). You must type `cell.recon`; `recon` on its own will not work.
|Field|Meaning |Member fields | |Field|Meaning |Member fields |
|-|-|-| |-|-|-|
| `cell.recon.judgment` | A string, either "matched", "new", "none" | | | `cell.recon.judgment` | A string: either “matched”, “new”, “none” | |
| `cell.recon.judgmentAction` | A string, either "single" or "similar" (or "unknown") | | | `cell.recon.judgmentAction` | A string: either “single” or “similar” (or “unknown”) | |
| `cell.recon.judgmentHistory` | A number, the epoch timestamp (in milliseconds) of your judgment | | | `cell.recon.judgmentHistory` | A number, the epoch timestamp (in milliseconds) of your judgment | |
| `cell.recon.matched` | A boolean, true if judgment is "matched" | | | `cell.recon.matched` | A boolean, true if judgment is “matched” | |
| `cell.recon.match` | The recon candidate that has been matched against this cell (or null) | .id, .name, .type | | `cell.recon.match` | The recon candidate that has been matched against this cell (or null) | .id, .name, .type |
| `cell.recon.best` | The highest scoring recon candidate from the reconciliation service (or null) | .id, .name, .type, .score | | `cell.recon.best` | The highest scoring recon candidate from the reconciliation service (or null) | .id, .name, .type, .score |
| `cell.recon.features` | An array of reconciliation features to help you assess the accuracy of your matches | .typeMatch, .nameMatch, .nameLevenshtein, .nameWordDistance | | `cell.recon.features` | An array of reconciliation features to help you assess the accuracy of your matches | .typeMatch, .nameMatch, .nameLevenshtein, .nameWordDistance |
| `cell.recon.features.typeMatch` | A boolean, true if your chosen type is "matched" and false if not (or "(no type)" if unreconciled) | | | `cell.recon.features.typeMatch` | A boolean, true if your chosen type is “matched” and false if not (or “(no type)” if unreconciled) | |
| `cell.recon.features.nameMatch` | A boolean, true if the cell and candidate strings are identical and false if not (or "(unreconciled)") | | | `cell.recon.features.nameMatch` | A boolean, true if the cell and candidate strings are identical and false if not (or “(unreconciled)”) | |
| `cell.recon.features.nameLevenshtein` | A number, representing the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance): larger if the difference is greater between value and candidate | | | `cell.recon.features.nameLevenshtein` | A number representing the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance): larger if the difference is greater between value and candidate | |
| `cell.recon.features.nameWordDistance` | A number, based on the [word similarity](reconciling#reconciliation-facets) | | | `cell.recon.features.nameWordDistance` | A number based on the [word similarity](reconciling#reconciliation-facets) | |
| `cell.recon.candidates` | An array of the top 3 candidates (default) | .id, .name, .type, .score | | `cell.recon.candidates` | An array of the top 3 candidates (default) | .id, .name, .type, .score |
The `cell.recon.candidates` and `cell.recon.best` objects have a few deeper fields: `id`, `name`, `type`, and `score`. `type` is an array of type identifiers for a list of candidates, or a single string for the best candidate. The `cell.recon.candidates` and `cell.recon.best` objects have a few deeper fields: `id`, `name`, `type`, and `score`. `type` is an array of type identifiers for a list of candidates, or a single string for the best candidate.
@ -203,7 +203,7 @@ A `row.record` object encapsulates one or more rows that are grouped together, w
| `row.record.cells` | The cells of the row | | `row.record.cells` | The cells of the row |
| `row.record.fromRowIndex` | The row index of the first row in the record | | `row.record.fromRowIndex` | The row index of the first row in the record |
| `row.record.toRowIndex` | The row index of the last row in the record + 1 (i.e. the next row) | | `row.record.toRowIndex` | The row index of the last row in the record + 1 (i.e. the next row) |
| `row.record.rowCount` | count of the number of rows in the record | | `row.record.rowCount` | A count of the number of rows in the record |
## GREL (General Refine Expression Language) ## GREL (General Refine Expression Language)
@ -219,7 +219,9 @@ GREL is designed to resemble Javascript. Formulas use variables and depend on da
| `value.substring(7, 10)` | Output the substring of the value from character index 7, 8, and 9 (excluding character index 10) | | `value.substring(7, 10)` | Output the substring of the value from character index 7, 8, and 9 (excluding character index 10) |
| `value.substring(13)` | Output the substring from index 13 to the end of the string | | `value.substring(13)` | Output the substring from index 13 to the end of the string |
If you're used to Excel, note that the operator for string concatenation is + (not &). Evaluating conditions uses symbols such as <, >, *, /, etc. To check whether two objects are equal, use two equal signs (`value=="true"`). Note that the operator for string concatenation is `+` (not “&” as is used in Excel).
Evaluating conditions uses symbols such as <, >, *, /, etc. To check whether two objects are equal, use two equal signs (`value=="true"`).
### Syntax ### Syntax
@ -234,7 +236,7 @@ The second form is a shorthand to make expressions easier to read. It simply pul
| `value.trim().length()` | `length(trim(value))` | | `value.trim().length()` | `length(trim(value))` |
| `value.substring(7, 10)` | `substring(value, 7, 10)` | | `value.substring(7, 10)` | `substring(value, 7, 10)` |
So, in the dot shorthand, the functions occur from left to right in the order of calling, rather than in the reverse order with parentheses. So, in the dot shorthand, the functions occur from left to right in the order of calling, rather than in the reverse order with parentheses. This allows you to string together multiple functions in a readable order.
The dot notation can also be used to access the member fields of [variables](#variables). For referring to column names that contain spaces (anything not a continuous string), use square brackets instead of dot notation: The dot notation can also be used to access the member fields of [variables](#variables). For referring to column names that contain spaces (anything not a continuous string), use square brackets instead of dot notation:
@ -243,7 +245,7 @@ The dot notation can also be used to access the member fields of [variables](#va
| `cells.FirstName` | Access the cell in the column named “FirstName” of the current row | | `cells.FirstName` | Access the cell in the column named “FirstName” of the current row |
| `cells["First Name"]` | Access the cell in the column called “First Name” of the current row | | `cells["First Name"]` | Access the cell in the column called “First Name” of the current row |
Brackets can also be used to get substrings and sub-arrays, and single items from arrays: Square brackets can also be used to get substrings and sub-arrays, and single items from arrays:
|Example |Description | |Example |Description |
|-|-| |-|-|
@ -251,24 +253,26 @@ Brackets can also be used to get substrings and sub-arrays, and single items fro
| `"internationalization"[1,-2]` | Will return “nternationalizati” (negative indexes are counted from the end) | | `"internationalization"[1,-2]` | Will return “nternationalizati” (negative indexes are counted from the end) |
| `row.columnNames[5]` | Will return the name of the sixth column (indexes start at 0) | | `row.columnNames[5]` | Will return the name of the sixth column (indexes start at 0) |
Any function that outputs an array can use square brackets to select only one part of the array to output as a string (remember that the index of the items in an array starts with 0). For example, partition() would normally output an array of three items: the part before your chosen fragment, the fragment you've identified, and the part after. Selecting the third part with "internationalization".partition("nation")[2] will output “alization” (and so will [-1], indicating the final item in the array). Any function that outputs an array can use square brackets to select only one part of the array to output as a string (remember that the index of the items in an array starts with 0).
### Controls For example, partition() would normally output an array of three items: the part before your chosen fragment, the fragment you've identified, and the part after. Selecting only the third part with `"internationalization".partition("nation")[2]` will output “alization” (and so will [-1], indicating the final item in the array).
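The indexing behaviour described here can be tried out in Python, whose built-in `str.partition` happens to return the same three parts. This is only an analogy, not GREL: Python returns a tuple rather than an array, and it writes a range as a slice with a colon instead of a comma.

```python
word = "internationalization"

# str.partition returns (before, fragment, after), matching the
# three-item result described for GREL's partition().
parts = word.partition("nation")

third = parts[2]   # third item, because indexes start at 0
last = parts[-1]   # negative indexes count from the end

# The GREL range "internationalization"[1,-2] corresponds to this slice.
middle = word[1:-2]

print(parts, third, last, middle)
```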
GREL offers controls to support branching and looping (that is, “if” and “for” functions), but unlike functions, their arguments don't all get evaluated before they get run. A control can decide which part of the code to execute and can affect the environment bindings. Functions, on the other hand, can't do either. Each control decides which of their arguments to evaluate to value, and how. ### GREL controls
GREL offers controls to support branching and looping (that is, “if” and “for” functions), but unlike functions, their arguments don't all get evaluated before they get run. A control can decide which part of the code to execute and can affect the environment bindings. Functions, on the other hand, can't do either. Each control decides which of their arguments to evaluate to `value`, and how.
Please note that the GREL control names are case-sensitive: for example, the isError() control can't be called with iserror(). Please note that the GREL control names are case-sensitive: for example, the isError() control can't be called with iserror().
#### if(e, expression eTrue, expression eFalse) #### if(e, eTrue, eFalse)
Expression o is evaluated to a value. If that value is true, then expression eTrue is evaluated and the result is the value of the whole `if` expression. Otherwise, expression eFalse is evaluated and that result is the value. Expression e is evaluated to a value. If that value is true, then expression eTrue is evaluated and the result is the value of the whole if() expression. Otherwise, expression eFalse is evaluated and that result is the value.
Examples: Examples:
| Example expression | Result | | Example expression | Result |
| ------------------------------------------------------------------------ | ------------ | | ------------------------------------------------------------------------ | ------------ |
| `if("internationalization".length() > 10, "big string", "small string")` | big string | | `if("internationalization".length() > 10, "big string", "small string")` | big string |
| `if(mod(37, 2) == 0, "even", "odd")` | odd | | `if(mod(37, 2) == 0, "even", "odd")` | odd |
Nested if (switch case) example: Nested if (switch case) example:
@ -290,7 +294,7 @@ Evaluates expression e1 and binds its value to variable v. Then evaluates expres
| `with("european union".split(" "), a, forEach(a, v, v.length()))` | [ 8, 5 ] | | `with("european union".split(" "), a, forEach(a, v, v.length()))` | [ 8, 5 ] |
| `with("european union".split(" "), a, forEach(a, v, v.length()).sum() / a.length())` | 6.5 | | `with("european union".split(" "), a, forEach(a, v, v.length()).sum() / a.length())` | 6.5 |
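As a rough comparison, the two `with()` examples above correspond to ordinary variable binding in Python. This sketch only mirrors the logic; the names `a`, `lengths`, and `average` are chosen here for illustration.

```python
# with("european union".split(" "), a, ...) binds the split result to a.
a = "european union".split(" ")

# forEach(a, v, v.length())
lengths = [len(v) for v in a]

# forEach(a, v, v.length()).sum() / a.length()
average = sum(lengths) / len(a)

print(lengths, average)
```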
#### filter(e1, variable v, e test) #### filter(e1, v, e test)
Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression test - which should return a boolean. If the boolean is true, pushes v onto the result array. Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression test - which should return a boolean. If the boolean is true, pushes v onto the result array.
@ -298,7 +302,7 @@ Evaluates expression e1 to an array. Then for each array element, binds its valu
| ---------------------------------------------- | ------------- | | ---------------------------------------------- | ------------- |
| `filter([ 3, 4, 8, 7, 9 ], v, mod(v, 2) == 1)` | [ 3, 7, 9 ] | | `filter([ 3, 4, 8, 7, 9 ], v, mod(v, 2) == 1)` | [ 3, 7, 9 ] |
#### forEach(e1, variable v, e2) #### forEach(e1, v, e2)
Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression e2, and pushes the result onto the result array. Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression e2, and pushes the result onto the result array.
@ -306,7 +310,7 @@ Evaluates expression e1 to an array. Then for each array element, binds its valu
| ------------------------------------------ | ------------------- | | ------------------------------------------ | ------------------- |
| `forEach([ 3, 4, 8, 7, 9 ], v, mod(v, 2))` | [ 1, 0, 0, 1, 1 ] | | `forEach([ 3, 4, 8, 7, 9 ], v, mod(v, 2))` | [ 1, 0, 0, 1, 1 ] |
#### forEachIndex(e1, variable i, variable v, e2) #### forEachIndex(e1, i, v, e2)
Evaluates expression e1 to an array. Then for each array element, binds its index to variable i and its value to variable v, evaluates expression e2, and pushes the result onto the result array. Evaluates expression e1 to an array. Then for each array element, binds its index to variable i and its value to variable v, evaluates expression e2, and pushes the result onto the result array.
@ -314,15 +318,15 @@ Evaluates expression e1 to an array. Then for each array element, binds its inde
| ------------------------------------------------------------------------------- | --------------------------- | | ------------------------------------------------------------------------------- | --------------------------- |
| `forEachIndex([ "anne", "ben", "cindy" ], i, v, (i + 1) + ". " + v).join(", ")` | 1. anne, 2. ben, 3. cindy | | `forEachIndex([ "anne", "ben", "cindy" ], i, v, (i + 1) + ". " + v).join(", ")` | 1. anne, 2. ben, 3. cindy |
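
The index-and-value binding of `forEachIndex()` is similar in spirit to Python's built-in `enumerate()`. The following sketch reproduces the table row above in plain Python; `names` and `numbered` are hypothetical names.

```python
# Plain-Python analogue of the forEachIndex() example above.
# Illustrative names only; not OpenRefine code.
names = ["anne", "ben", "cindy"]

# enumerate() yields (index, value) pairs, like the i and v variables in GREL.
numbered = ", ".join(str(i + 1) + ". " + v for i, v in enumerate(names))

print(numbered)  # 1. anne, 2. ben, 3. cindy
```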
#### forRange(n from, n to, n step, variable v, e) #### forRange(n from, n to, n step, v, e)
Iterates over the variable v starting at from, incrementing by step each time while less than to. At each iteration, evaluates expression e, and pushes the result onto the result array. Iterates over the variable v starting at from, incrementing by the value of step each time while less than to. At each iteration, evaluates expression e, and pushes the result onto the result array.
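
The description above can be sketched in Python as follows. This is a minimal illustration that assumes a positive step; `for_range` and `body` are hypothetical names, and the behavior of the real control for zero or negative steps is not covered here.

```python
# Minimal sketch of the forRange() description, assuming a positive step.
# for_range and body are hypothetical names, not part of OpenRefine.
def for_range(start, stop, step, body):
    results = []
    v = start
    # Continue while v is still less than the upper bound.
    while v < stop:
        results.append(body(v))
        v += step
    return results

squares = for_range(1, 10, 3, lambda v: v * v)
print(squares)  # [1, 16, 49]
```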
#### forNonBlank(e, variable v, expression eNonBlank, expression eBlank) #### forNonBlank(e, v, eNonBlank, eBlank)
Evaluates expression e. If it is non-blank, forNonBlank() binds its value to variable v, evaluates expression eNonBlank and returns the result. Otherwise (if o evaluates to blank), forNonBlank() evaluates expression eBlank and returns that result instead. Evaluates expression e. If it is non-blank, forNonBlank() binds its value to variable v, evaluates expression eNonBlank and returns the result. Otherwise (if e evaluates to blank), forNonBlank() evaluates expression eBlank and returns that result instead.
Unlike other GREL functions beginning with "for", forNonBlank() is not iterative. forNonBlank() essentially offers a shorter syntax for achieving the same outcome by using the isNonBlank() function within an "if" statement. Unlike other GREL functions beginning with “for,” forNonBlank() is not iterative. forNonBlank() essentially offers a shorter syntax for achieving the same outcome by using the isNonBlank() function within an “if” statement.
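
The branching described above can be sketched in Python. This is an illustration only: the helper names are hypothetical, and the sketch assumes that "blank" means either a null value or an empty string, which may not match OpenRefine's definition in every case.

```python
# Sketch of the forNonBlank() branching logic. Hypothetical helper names.
# Assumption: a value is blank when it is None or an empty string.
def is_non_blank(value):
    return value is not None and value != ""

def for_non_blank(value, on_non_blank, on_blank):
    # Not a loop: the value is tested once and exactly one branch runs.
    if is_non_blank(value):
        return on_non_blank(value)
    return on_blank()

filled = for_non_blank("abc", lambda v: v.upper(), lambda: "(empty)")
missing = for_non_blank("", lambda v: v.upper(), lambda: "(empty)")
print(filled)   # ABC
print(missing)  # (empty)
```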
#### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e) #### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e)
@ -333,7 +337,7 @@ Examples:
| Expression | Result | | Expression | Result |
| ------------------- | ------- | | ------------------- | ------- |
| `isBlank("abc")` | false | | `isBlank("abc")` | false |
| `isNonBlank("abc")` | true | | `isNonBlank("abc")` | true |
| `isNull("abc")` | false | | `isNull("abc")` | false |
| `isNotNull("abc")` | true | | `isNotNull("abc")` | true |
| `isNumeric(2)` | true | | `isNumeric(2)` | true |
@ -341,7 +345,7 @@ Examples:
| `isError("abc")` | false | | `isError("abc")` | false |
| `isError(1 / 0)` | true | | `isError(1 / 0)` | true |
Remember that these are controls and not functions: you can't use dot notation (the `e.isX()` syntax). Remember that these are controls and not functions: you can't use dot notation (for example, the format `e.isX()` will not work).
### Constants ### Constants
|Name |Meaning | |Name |Meaning |
@ -352,9 +356,13 @@ Remember that these are controls and not functions: you can't use dot notation
## Jython ## Jython
Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (`.py` or `.pyc`) are compatible. Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython is essentially Java, you can also import Java libraries and utilize those. You will need to restart OpenRefine, so that new Jython or Python libraries are initialized during startup. Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (`.py` or `.pyc`) are compatible.
OpenRefine now has [most of the Jsoup.org library built into GREL functions](#jsoup-xml-and-html-parsing-functions), for parsing and working with HTML elements and extraction. Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython is essentially Java, you can also import Java libraries and utilize those.
You will need to restart OpenRefine, so that new Jython or Python libraries are initialized during startup.
OpenRefine now has [most of the Jsoup.org library built into GREL functions](#jsoup-xml-and-html-parsing-functions) for parsing and working with HTML and XML elements.
### Syntax ### Syntax
@ -368,19 +376,19 @@ Expressions in Jython must have a `return` statement:
return rowIndex%2 return rowIndex%2
``` ```
Fields have to be accessed using the bracket operator rather than the dot operator: Fields have to be accessed using the bracket operator rather than dot notation:
``` ```
return cells["col1"]["value"] return cells["col1"]["value"]
``` ```
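
To try bracket access outside OpenRefine, the expression body can be placed in an ordinary function and called with stand-in data. Everything in this sketch is an assumption made for illustration: the `evaluate` wrapper and the nested dictionary only imitate the `cells` variable and are not OpenRefine's actual objects.

```python
# Stand-in harness for experimenting with bracket access outside OpenRefine.
# evaluate() and the sample dictionary are hypothetical; inside OpenRefine,
# the cells variable is supplied for you.
def evaluate(cells):
    # This line is the part you would type into the expression editor.
    return cells["col1"]["value"]

sample_cells = {"col1": {"value": "abc"}}
result = evaluate(sample_cells)
print(result)  # abc
```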
To access the [edit distance](reconciling#reconciliation-facets) between a reconciled value and an original cell value, use [recon variables](#reconciliation): For example, to access the [edit distance](reconciling#reconciliation-facets) between a reconciled value and an original cell value using [recon variables](#reconciliation):
``` ```
return cell["recon"]["features"]["nameLevenshtein"] return cell["recon"]["features"]["nameLevenshtein"]
``` ```
To return the lower case of value (if the value is not null): To return the lower case of `value` (if the value is not null):
``` ```
if value is not None: if value is not None:
@ -391,7 +399,7 @@ To return the lower case of value (if the value is not null):
### Tutorials ### Tutorials
- [Extending Jython with pypi modules](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules) - [Extending Jython with pypi modules](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules)
- [Working with Phone numbers using Java libraries inside Python](https://github.com/OpenRefine/OpenRefine/wiki/Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python) - [Working with phone numbers using Java libraries inside Python](https://github.com/OpenRefine/OpenRefine/wiki/Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python)
Full documentation on the Jython language can be found on its official site: [http://www.jython.org](http://www.jython.org). Full documentation on the Jython language can be found on its official site: [http://www.jython.org](http://www.jython.org).

@ -59,7 +59,7 @@ Click on <span class="menuItems">Browse…</span> and select a file (or several)
If you import an archive file (something with the extension `.zip`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.gz`, or `.bz2`), OpenRefine detects the files inside it, shows you a preview screen, and allows you to select which ones to load. This does not work with `.rar` files. If you import an archive file (something with the extension `.zip`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.gz`, or `.bz2`), OpenRefine detects the files inside it, shows you a preview screen, and allows you to select which ones to load. This does not work with `.rar` files.
### Web Addresses (URLs) ### Web addresses (URLs)
Type or paste the URL to a data file into the field provided. You can add as many fields as you want. OpenRefine will download the file and preview the project for you. Type or paste the URL to a data file into the field provided. You can add as many fields as you want. OpenRefine will download the file and preview the project for you.
@ -91,7 +91,7 @@ You can either connect just once to gather data, or save the connection to use i
If your connection is successful, you will see a Query Editor where you can run your SQL query. OpenRefine will give you an error if you write a statement that tries to modify the source database in any way. If your connection is successful, you will see a Query Editor where you can run your SQL query. OpenRefine will give you an error if you write a statement that tries to modify the source database in any way.
### Google Data ### Google data
You have two ways to load in data from Google Sheets: You have two ways to load in data from Google Sheets:
* providing a link to an accessible Google Sheet (that is, one with link-sharing turned on), and * providing a link to an accessible Google Sheet (that is, one with link-sharing turned on), and