Expressions reorganizing (#3610)
* Expressions reorganizing

  Moved GREL basics, Jython/Clojure to separate pages

  Fix for https://groups.google.com/g/openrefine/c/Cvsg5IB4KZk

* Promoting some headers
* Fixed link
* Partial fix for #3578
* Add top-level item, fix links

Co-authored-by: Antonin Delpeuch <antonin@delpeuch.eu>
parent 2f9e49534f
commit 5c95539de0
@@ -1,7 +1,7 @@
 ---
 id: expressions
 title: Expressions
-sidebar_label: Expressions
+sidebar_label: Overview
 ---
 
 ## Overview
@@ -24,13 +24,13 @@ You can use expressions in multiple places in OpenRefine to extend data cleanup
 * <span class="menuItems">Add column based on this column</span>
 * <span class="menuItems">Add column by fetching URLs</span>.
 
-In the expressions editor window you have the opportunity to select a supported language. The default is [GREL (General Refine Expression Language)](#grel-general-refine-expression-language); OpenRefine also comes with support for [Clojure](#clojure) and [Jython](#jython). Extensions may offer support for more expressions languages.
+In the expressions editor window you have the opportunity to select a supported language. The default is [GREL (General Refine Expression Language)](grel); OpenRefine also comes with support for [Clojure](jythonclojure#clojure) and [Jython](jythonclojure#jython). Extensions may offer support for more expressions languages.
 
 These languages have some syntax differences but support many of the same [variables](#variables). For example, the GREL expression `value.split(" ")[1]` would be written in Jython as `return value.split(" ")[1]`.
 
-This page is a general reference for available functions, variables, and syntax. For examples that use these expressions for common data tasks, look at the [Recipes section on the Wiki](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#recipes-and-worked-examples).
+This page is a general reference for available functions, variables, and syntax. For examples that use these expressions for common data tasks, look at the [Recipes section on the wiki](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#recipes-and-worked-examples).
 
-### Expressions
+## Expressions
 
 There are significant differences between OpenRefine's expressions and the spreadsheet formulas you may be used to using for data manipulation. OpenRefine does not store formulas in cells and display output dynamically: OpenRefine’s transformations are one-time operations that can change column contents or generate new columns. These are applied using variables such as `value` or `cell` to perform the same modification to each cell in a column.
 
@@ -53,7 +53,7 @@ For another example, if you were to create a new column based on your data using
 
 Note that an expression is typically based on one particular column in the data - the column whose drop-down menu is first selected. Many variables are created to stand for things about the cell in that “base column” of the current row on which the expression is evaluated. There are also variables about rows, which you can use to access cells in other columns.
 
-### The expressions editor
+## The expressions editor
 
 When you select a function that accepts expressions, you will see a window overlay the screen with what we call the expressions editor.
 
@@ -72,13 +72,13 @@ Starring formulas you’ve used in the past can be helpful for repetitive tasks
 
 You can also choose how formula errors are handled: replicate the original cell value, output an error message into the cell, or ouput a blank cell.
 
-### Regular expressions
+## Regular expressions
 
 OpenRefine offers several fields that support the use of regular expressions (regex), such as in a <span class="menuItems">Text filter</span> or a <span class="menuItems">Replace…</span> operation. GREL and other expressions can also use regular expression markup to extend their functionality.
 
 If this is your first time working with regex, you may wish to read [this tutorial specific to the Java syntax that OpenRefine supports](https://docs.oracle.com/javase/tutorial/essential/regex/). We also recommend this [testing and learning tool](https://regexr.com/).
 
-#### GREL-supported regex
+### GREL-supported regex
 
 To write a regular expression inside a GREL expression, wrap it between a pair of forward slashes (/) much like the way you would in Javascript. For example, in
 
@@ -90,7 +90,7 @@ the regular expression is `\s+`, and the syntax used in the expression wraps it
 
 Do not use slashes to wrap regular expressions outside of a GREL expression.
 
-On the [GREL functions](#grel-general-refine-expression-language) page, functions that support regex will indicate that with a “p” for “pattern.” The GREL functions that support regex are:
+On the [GREL functions](grelfunctions) page, functions that support regex will indicate that with a “p” for “pattern.” The GREL functions that support regex are:
 * [contains](grelfunctions#containss-sub-or-p)
 * [replace](grelfunctions#replaces-s-or-p-find-s-replace)
 * [find](grelfunctions#finds-sub-or-p)
@@ -100,7 +100,7 @@ On the [GREL functions](#grel-general-refine-expression-language) page, function
 * [split](grelfunctions#splits-s-or-p-sep)
 * [smartSplit](grelfunctions#smartsplits-s-or-p-sep-optional)
 
-#### Jython-supported regex
+### Jython-supported regex
 
 You can also use [regex with Jython expressions](http://www.jython.org/docs/library/re.html), instead of GREL, for example with a <span class="menuItems">Custom Text Facet</span>:
 
@@ -108,7 +108,7 @@ You can also use [regex with Jython expressions](http://www.jython.org/docs/libr
 python import re g = re.search(ur"\u2014 (.*),\s*BWV", value) return g.group(1)
 ```
 
-#### Clojure-supported regex
+### Clojure-supported regex
 
 [Clojure](https://clojure.org/reference/reader) uses the same regex engine as Java, and can be invoked with [re-find](http://clojure.github.io/clojure/clojure.core-api.html#clojure.core/re-find), [re-matches](http://clojure.github.io/clojure/clojure.core-api.html#clojure.core/re-matches), etc. You can use the #"pattern" reader macro as described [in the Clojure documentation](https://clojure.org/reference/other_functions#regex). For example, to get the nth element of a returned sequence, you can use the nth function:
 
@@ -200,227 +200,9 @@ A `row.record` object encapsulates one or more rows that are grouped together, w
 |Field|Meaning |
 |-|-|
 | `row.record.index` | The index of the current record (starting at 0) |
-| `row.record.cells` | The cells of the row |
+| `row.record.cells` | An array of the [cells](#cells) in the given column of the record |
 | `row.record.fromRowIndex` | The row index of the first row in the record |
-| `row.record.toRowIndex` | The row index of the last row in the record + 1 (i.e. the next row) |
+| `row.record.toRowIndex` | The row index of the last row in the record + 1 (i.e. the next record) |
 | `row.record.rowCount` | A count of the number of rows in the record |
 
-## GREL (General Refine Expression Language)
-
-### Basics
-
-GREL is designed to resemble Javascript. Formulas use variables and depend on data types to do things like string manipulation or mathematical calculations:
-
-|Example|Output|
-|---|---|
-| `value + " (approved)"` | Concatenate two strings; whatever is in the cell gets converted to a string first |
-| `value + 2.239` | Add 2.239 to the existing value (if a number); append text "2.239" to the end of the string otherwise |
-| `value.trim().length()` | Trim leading and trailing whitespace of the cell value and then output the length of the result |
-| `value.substring(7, 10)` | Output the substring of the value from character index 7, 8, and 9 (excluding character index 10) |
-| `value.substring(13)` | Output the substring from index 13 to the end of the string |
-
-Note that the operator for string concatenation is `+` (not “&” as is used in Excel).
-
-Evaluating conditions uses symbols such as <, >, *, /, etc. To check whether two objects are equal, use two equal signs (`value=="true"`).
-
-### Syntax
-
-In OpenRefine expression language function can use either of these two forms:
-* functionName(arg0, arg1, ...)
-* arg0.functionName(arg1, ...)
-
-The second form is a shorthand to make expressions easier to read. It simply pulls the first argument out and appends it to the front of the function, with a dot:
-
-|Dot notation |Full notation |
-|-|-|
-| `value.trim().length()` | `length(trim(value))` |
-| `value.substring(7, 10)` | `substring(value, 7, 10)` |
-
-So, in the dot shorthand, the functions occur from left to right in the order of calling, rather than in the reverse order with parentheses. This allows you to string together multiple functions in a readable order.
-
-The dot notation can also be used to access the member fields of [variables](#variables). For referring to column names that contain spaces (anything not a continuous string), use square brackets instead of dot notation:
-
-|Example |Description |
-|-|-|
-| `FirstName.cells` | Access the cell in the column named “FirstName” of the current row |
-| `cells["First Name"]` | Access the cell in the column called “First Name” of the current row |
-
-Square brackets can also be used to get substrings and sub-arrays, and single items from arrays:
-
-|Example |Description |
-|-|-|
-| `value[1,3]` | A substring of value, starting from character 1 up to but excluding character 3 |
-| `"internationalization"[1,-2]` | Will return “nternationalizati” (negative indexes are counted from the end) |
-| `row.columnNames[5]` | Will return the name of the fifth column |
-
-Any function that outputs an array can use square brackets to select only one part of the array to output as a string (remember that the index of the items in an array starts with 0).
-
-For example, partition() would normally output an array of three items: the part before your chosen fragment, the fragment you've identified, and the part after. Selecting only the third part with `"internationalization".partition("nation")[2]` will output “alization” (and so will [-1], indicating the final item in the array).
-
-### GREL controls
-
-GREL offers controls to support branching and looping (that is, “if” and “for” functions), but unlike functions, their arguments don't all get evaluated before they get run. A control can decide which part of the code to execute and can affect the environment bindings. Functions, on the other hand, can't do either. Each control decides which of their arguments to evaluate to `value`, and how.
-
-Please note that the GREL control names are case-sensitive: for example, the isError() control can't be called with iserror().
-
-#### if(e, eTrue, eFalse)
-
-Expression e is evaluated to a value. If that value is true, then expression eTrue is evaluated and the result is the value of the whole if() expression. Otherwise, expression eFalse is evaluated and that result is the value.
-
-Examples:
-
-| Example expression | Result |
-| ------------------------------------------------------------------------ | ------------ |
-| `if("internationalization".length() > 10, "big string", "small string")` | “big string” |
-| `if(mod(37, 2) == 0, "even", "odd")` | “odd” |
-
-Nested if (switch case) example:
-
-if(value == 'Place', 'http://www.example.com/Location',
-
-if(value == 'Person', 'http://www.example.com/Agent',
-
-if(value == 'Book', 'http://www.example.com/Publication',
-
-null)))
-
-#### with(e1, variable v, e2)
-
-Evaluates expression e1 and binds its value to variable v. Then evaluates expression e2 and returns that result.
-
-| Example expression | Result |
-| ------------------------------------------------------------------------------------ | ---------- |
-| `with("european union".split(" "), a, a.length())` | 2 |
-| `with("european union".split(" "), a, forEach(a, v, v.length()))` | [ 8, 5 ] |
-| `with("european union".split(" "), a, forEach(a, v, v.length()).sum() / a.length())` | 6.5 |
-
-#### filter(e1, v, e test)
-
-Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression test - which should return a boolean. If the boolean is true, pushes v onto the result array.
-
-| Expression | Result |
-| ---------------------------------------------- | ------------- |
-| `filter([ 3, 4, 8, 7, 9 ], v, mod(v, 2) == 1)` | [ 3, 7, 9 ] |
-
-#### forEach(e1, v, e2)
-
-Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression e2, and pushes the result onto the result array.
-
-| Expression | Result |
-| ------------------------------------------ | ------------------- |
-| `forEach([ 3, 4, 8, 7, 9 ], v, mod(v, 2))` | [ 1, 0, 0, 1, 1 ] |
-
-#### forEachIndex(e1, i, v, e2)
-
-Evaluates expression e1 to an array. Then for each array element, binds its index to variable i and its value to variable v, evaluates expression e2, and pushes the result onto the result array.
-
-| Expression | Result |
-| ------------------------------------------------------------------------------- | --------------------------- |
-| `forEachIndex([ "anne", "ben", "cindy" ], i, v, (i + 1) + ". " + v).join(", ")` | 1. anne, 2. ben, 3. cindy |
-
-#### forRange(n from, n to, n step, v, e)
-
-Iterates over the variable v starting at from, incrementing by the value of step each time while less than to. At each iteration, evaluates expression e, and pushes the result onto the result array.
-
-#### forNonBlank(e, v, eNonBlank, eBlank)
-
-Evaluates expression e. If it is non-blank, forNonBlank() binds its value to variable v, evaluates expression eNonBlank and returns the result. Otherwise (if e evaluates to blank), forNonBlank() evaluates expression eBlank and returns that result instead.
-
-Unlike other GREL functions beginning with “for,” forNonBlank() is not iterative. forNonBlank() essentially offers a shorter syntax to achieving the same outcome by using the isNonBlank() function within an “if” statement.
-
-#### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e)
-
-Evaluates the expression e, and returns a boolean based on the named evaluation.
-
-Examples:
-
-| Expression | Result |
-| ------------------- | ------- |
-| `isBlank("abc")` | false |
-| `isNonBlank("abc")` | true |
-| `isNull("abc")` | false |
-| `isNotNull("abc")` | true |
-| `isNumeric(2)` | true |
-| `isError(1)` | false |
-| `isError("abc")` | false |
-| `isError(1 / 0)` | true |
-
-Remember that these are controls and not functions: you can’t use dot notation (for example, the format `e.isX()` will not work).
-
-### Constants
-|Name |Meaning |
-|-|-|
-| true | The boolean constant true |
-| false | The boolean constant false |
-| PI | From [Java's Math.PI](https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#PI), the value of pi (that is, 3.1415...) |
-
-## Jython
-
-Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (`.py` or `.pyc`) are compatible.
-
-Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython is essentially Java, you can also import Java libraries and utilize those.
-
-You will need to restart OpenRefine, so that new Jython or Python libraries are initialized during startup.
-
-OpenRefine now has [most of the Jsoup.org library built into GREL functions](#jsoup-xml-and-html-parsing-functions) for parsing and working with HTML and XML elements.
-
-### Syntax
-
-Expressions in Jython must have a `return` statement:
-
-```
-return value[1:-1]
-```
-
-```
-return rowIndex%2
-```
-
-Fields have to be accessed using the bracket operator rather than dot notation:
-
-```
-return cells["col1"]["value"]
-```
-
-For example, to access the [edit distance](reconciling#reconciliation-facets) between a reconciled value and an original cell value using [recon variables](#reconciliation):
-
-```
-return cell["recon"]["features"]["nameLevenshtein"]
-```
-
-To return the lower case of `value` (if the value is not null):
-
-```
-if value is not None:
-  return value.lower()
-else:
-  return None
-```
-
-### Tutorials
-- [Extending Jython with pypi modules](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules)
-- [Working with phone numbers using Java libraries inside Python](https://github.com/OpenRefine/OpenRefine/wiki/Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python)
-
-Full documentation on the Jython language can be found on its official site: [http://www.jython.org](http://www.jython.org).
-
-## Clojure
-
-Clojure 1.10.1 comes bundled with the default installation of OpenRefine 3.4.1. At this time, not all [variables](#variables) can be used with Clojure expressions: only `value`, `row`, `rowIndex`, `cell`, and `cells` are available.
-
-For example, functions can take the form
-```
-(.. value (toUpperCase) )
-```
-
-Or can look like
-```
-(-> value (str/split #" ") last )
-```
-
-which functions like `value.split(" ")` in GREL.
-
-For help with syntax, see the [Clojure website's guide to syntax](https://clojure.org/guides/learn/syntax).
-
-User-contributed Clojure recipes can be found on our wiki at [https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure).
-
-Full documentation on the Clojure language can be found on its official site: [https://clojure.org/](https://clojure.org/).
+For example, you can facet by number of rows in each record by creating a <span class="menuItems">Custom Numeric Facet</span> (or a <span class="menuItems">Custom Text Facet</span>) and entering `row.record.rowCount`.
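The Jython regex example in the context lines above was flattened onto one line (`python import re g = re.search(ur"\u2014 (.*),\s*BWV", value) return g.group(1)`). It is easier to follow as separate statements. The following is a minimal sketch in plain Python 3, not Jython, under these assumptions:

- The expression body is wrapped in a hypothetical `extract_title` function. In OpenRefine's editor only the body, ending in `return`, is needed.
- The Python 2 `ur"..."` literal is replaced by `r"..."`.
- A guard is added for cells that do not match.
- The sample cell value is made up for illustration.

```python
import re

# Same pattern as the Jython example. The original uses ur"...", which is
# Python 2 syntax; in Python 3 a raw string is enough, because the re module
# itself understands the \u2014 (em dash) escape inside the pattern.
PATTERN = re.compile(r"\u2014 (.*),\s*BWV")


def extract_title(value):
    """Return the text between the em dash and ', BWV', or None.

    Hypothetical wrapper: in OpenRefine the Jython editor would contain only
    the body below (ending in a `return`), with `value` supplied by OpenRefine.
    """
    if value is None:
        return None
    g = PATTERN.search(value)
    # The one-liner calls g.group(1) unconditionally, which raises an
    # AttributeError when there is no match; return None instead.
    return g.group(1) if g else None


if __name__ == "__main__":
    # Made-up sample value for illustration.
    print(extract_title("J. S. Bach \u2014 Wachet auf, BWV 140"))  # Wachet auf
    print(extract_title("no dash here"))  # None
```

Because `(.*)` is greedy, a value with several commas before `BWV` is captured up to the last comma that is followed by `BWV`.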
docs/docs/manual/grel.md (Normal file, +154)
@ -0,0 +1,154 @@
|
|||||||
|
---
|
||||||
|
id: grel
|
||||||
|
title: General Refine Expression Language
|
||||||
|
sidebar_label: General Refine Expression Language
|
||||||
|
---
|
||||||
|
|
||||||
|
## Basics
|
||||||
|
|
||||||
|
GREL (General Refine Expression Language) is designed to resemble Javascript. Formulas use variables and depend on data types to do things like string manipulation or mathematical calculations:
|
||||||
|
|
||||||
|
|Example|Output|
|
||||||
|
|---|---|
|
||||||
|
| `value + " (approved)"` | Concatenate two strings; whatever is in the cell gets converted to a string first |
|
||||||
|
| `value + 2.239` | Add 2.239 to the existing value (if a number); append text "2.239" to the end of the string otherwise |
|
||||||
|
| `value.trim().length()` | Trim leading and trailing whitespace of the cell value and then output the length of the result |
|
||||||
|
| `value.substring(7, 10)` | Output the substring of the value from character index 7, 8, and 9 (excluding character index 10) |
|
||||||
|
| `value.substring(13)` | Output the substring from index 13 to the end of the string |
|
||||||
|
|
||||||
|
Note that the operator for string concatenation is `+` (not “&” as is used in Excel).
|
||||||
|
|
||||||
|
Evaluating conditions uses symbols such as <, >, *, /, etc. To check whether two objects are equal, use two equal signs (`value=="true"`).
|
||||||
|
|
||||||
|
See the [GREL functions page for a thorough reference](grelfunctions) on each function and its inputs and outputs. Read on below for more about the general nature of GREL expressions.
|
||||||
|
|
||||||
|
## Syntax
|
||||||
|
|
||||||
|
In GREL, functions can use either of these two forms:
|
||||||
|
* functionName(arg0, arg1, ...)
|
||||||
|
* arg0.functionName(arg1, ...)
|
||||||
|
|
||||||
|
The second form is a shorthand to make expressions easier to read. It simply pulls the first argument out and appends it to the front of the function, with a dot:
|
||||||
|
|
||||||
|
|Dot notation |Full notation |
|
||||||
|
|-|-|
|
||||||
|
| `value.trim().length()` | `length(trim(value))` |
|
||||||
|
| `value.substring(7, 10)` | `substring(value, 7, 10)` |
|
||||||
|
|
||||||
|
So, in the dot shorthand, the functions occur from left to right in the order of calling, rather than in the reverse order with parentheses. This allows you to string together multiple functions in a readable order.
|
||||||
|
|
||||||
|
The dot notation can also be used to access the member fields of [variables](expressions#variables). For referring to column names that contain spaces (anything not a continuous string), use square brackets instead of dot notation:
|
||||||
|
|
||||||
|
|Example |Description |
|
||||||
|
|-|-|
|
||||||
|
| `FirstName.cells` | Access the cell in the column named “FirstName” of the current row |
|
||||||
|
| `cells["First Name"]` | Access the cell in the column called “First Name” of the current row |
|
||||||
|
|
||||||
|
Square brackets can also be used to get substrings and sub-arrays, and single items from arrays:
|
||||||
|
|
||||||
|
|Example |Description |
|
||||||
|
|-|-|
|
||||||
|
| `value[1,3]` | A substring of value, starting from character 1 up to but excluding character 3 |
|
||||||
|
| `"internationalization"[1,-2]` | Will return “nternationalizati” (negative indexes are counted from the end) |
|
||||||
|
| `row.columnNames[5]` | Will return the name of the fifth column |
|
||||||
|
|
||||||
|
Any function that outputs an array can use square brackets to select only one part of the array to output as a string (remember that the index of the items in an array starts with 0).
|
||||||
|
|
||||||
|
For example, [partition()](grelfunctions#partitions-s-or-p-fragment-b-omitfragment-optional) would normally output an array of three items: the part before your chosen fragment, the fragment you've identified, and the part after. Selecting only the third part with `"internationalization".partition("nation")[2]` will output “alization” (and so will [-1], indicating the final item in the array).
|
||||||
|
|
||||||
|
## Controls
|
||||||
|
|
||||||
|
GREL offers controls to support branching and looping (that is, “if” and “for” functions), but unlike functions, their arguments don't all get evaluated before they get run. A control can decide which part of the code to execute and can affect the environment bindings. Functions, on the other hand, can't do either. Each control decides which of their arguments to evaluate to `value`, and how.
|
||||||
|
|
||||||
|
Please note that the GREL control names are case-sensitive: for example, the isError() control can't be called with iserror().
|
||||||
|
|
||||||
|
#### if(e, eTrue, eFalse)
|
||||||
|
|
||||||
|
Expression e is evaluated to a value. If that value is true, then expression eTrue is evaluated and the result is the value of the whole if() expression. Otherwise, expression eFalse is evaluated and that result is the value.
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
|
||||||
|
| Example expression | Result |
|
||||||
|
| ------------------------------------------------------------------------ | ------------ |
|
||||||
|
| `if("internationalization".length() > 10, "big string", "small string")` | “big string” |
|
||||||
|
| `if(mod(37, 2) == 0, "even", "odd")` | “odd” |
|
||||||
|
|
||||||
|
Nested if (switch case) example:
|
||||||
|
|
||||||
|
if(value == 'Place', 'http://www.example.com/Location',
|
||||||
|
|
||||||
|
if(value == 'Person', 'http://www.example.com/Agent',
|
||||||
|
|
||||||
|
if(value == 'Book', 'http://www.example.com/Publication',
|
||||||
|
|
||||||
|
null)))
|
||||||
|
|
||||||
|
#### with(e1, variable v, e2)
|
||||||
|
|
||||||
|
Evaluates expression e1 and binds its value to variable v. Then evaluates expression e2 and returns that result.
|
||||||
|
|
||||||
|
| Example expression | Result |
|
||||||
|
| ------------------------------------------------------------------------------------ | ---------- |
|
||||||
|
| `with("european union".split(" "), a, a.length())` | 2 |
|
||||||
|
| `with("european union".split(" "), a, forEach(a, v, v.length()))` | [ 8, 5 ] |
|
||||||
|
| `with("european union".split(" "), a, forEach(a, v, v.length()).sum() / a.length())` | 6.5 |
|
||||||
|
|
||||||
|
#### filter(e1, v, e test)
|
||||||
|
|
||||||
|
Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression test - which should return a boolean. If the boolean is true, pushes v onto the result array.
|
||||||
|
|
||||||
|
| Expression | Result |
|
||||||
|
| ---------------------------------------------- | ------------- |
|
||||||
|
| `filter([ 3, 4, 8, 7, 9 ], v, mod(v, 2) == 1)` | [ 3, 7, 9 ] |
|
||||||
|
|
||||||
|
#### forEach(e1, v, e2)
|
||||||
|
|
||||||
|
Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression e2, and pushes the result onto the result array.
|
||||||
|
|
||||||
|
| Expression | Result |
|
||||||
|
| ------------------------------------------ | ------------------- |
|
||||||
|
| `forEach([ 3, 4, 8, 7, 9 ], v, mod(v, 2))` | [ 1, 0, 0, 1, 1 ] |
|
||||||
|
|

#### forEachIndex(e1, i, v, e2)

Evaluates expression e1 to an array. Then, for each array element, binds its index to variable i and its value to variable v, evaluates expression e2, and appends the result to the result array.

| Expression | Result |
| ------------------------------------------------------------------------------- | --------------------------- |
| `forEachIndex([ "anne", "ben", "cindy" ], i, v, (i + 1) + ". " + v).join(", ")` | 1. anne, 2. ben, 3. cindy |
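
A plain-Python model of forEachIndex() pairs each element with its zero-based index, which is what `enumerate` provides. `grel_for_each_index` is a hypothetical name for this sketch. Note that Python needs an explicit `str()` where the GREL example concatenates a number and a string directly.

```python
def grel_for_each_index(array, e2):
    """Model of GREL forEachIndex(e1, i, v, e2): e2 receives index and value."""
    return [e2(i, v) for i, v in enumerate(array)]


labels = grel_for_each_index(["anne", "ben", "cindy"],
                             lambda i, v: str(i + 1) + ". " + v)
print(", ".join(labels))  # 1. anne, 2. ben, 3. cindy
```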

#### forRange(n from, n to, n step, v, e)

Iterates variable v from the value from up to, but not including, the value to, incrementing v by step after each iteration. At each iteration, forRange() evaluates expression e and appends the result to the result array.
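
Since forRange() has no example above, here is a plain-Python sketch of the iteration as described, assuming a positive step (the behavior for zero or negative steps is not covered by the description). `grel_for_range` is a hypothetical name, and the parameters are called `start` and `stop` because `from` is a reserved word in Python.

```python
def grel_for_range(start, stop, step, e):
    """Model of GREL forRange(from, to, step, v, e) for a positive step."""
    if step <= 0:
        raise ValueError("this sketch only models a positive step")
    result = []
    v = start
    while v < stop:          # continue while v is less than `to`
        result.append(e(v))  # evaluate e with v bound to the current value
        v += step            # increment by `step` after each iteration
    return result


print(grel_for_range(0, 10, 3, lambda v: v * v))  # [0, 9, 36, 81]
```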

#### forNonBlank(e, v, eNonBlank, eBlank)

Evaluates expression e. If the result is non-blank, forNonBlank() binds it to variable v, evaluates expression eNonBlank, and returns that result. Otherwise (if e evaluates to blank), forNonBlank() evaluates expression eBlank and returns that result instead.

Unlike other GREL controls beginning with “for,” forNonBlank() is not iterative. It offers a shorter way to achieve the same outcome as using the isNonBlank() control inside an if() statement.
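
The equivalence with an if-based test can be sketched in plain Python. The names `is_non_blank` and `for_non_blank` are hypothetical, and treating "blank" as either `None` (null) or the empty string is an assumption of this sketch rather than a statement of GREL's exact rules.

```python
def is_non_blank(x):
    # Assumption: "blank" means None (null) or the empty string.
    return x is not None and x != ""


def for_non_blank(e, e_non_blank, e_blank):
    """Model of GREL forNonBlank(e, v, eNonBlank, eBlank)."""
    if is_non_blank(e):
        return e_non_blank(e)  # e's value is bound to v only in this branch
    return e_blank()           # eBlank is evaluated without v


value = "abc"
short_form = for_non_blank(value, lambda v: v.upper(), lambda: "n/a")
# The same outcome written with an explicit test, like isNonBlank() inside if():
long_form = value.upper() if is_non_blank(value) else "n/a"
print(short_form == long_form)                                 # True
print(for_non_blank(None, lambda v: v.upper(), lambda: "n/a"))  # n/a
```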

#### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e)

Evaluates expression e and returns a boolean indicating whether the result passes the named test: whether it is blank, non-blank, null, not null, numeric, or an error.

Examples:

| Expression | Result |
| ------------------- | ------- |
| `isBlank("abc")` | false |
| `isNonBlank("abc")` | true |
| `isNull("abc")` | false |
| `isNotNull("abc")` | true |
| `isNumeric(2)` | true |
| `isError(1)` | false |
| `isError("abc")` | false |
| `isError(1 / 0)` | true |

Remember that these are controls and not functions: you can’t use dot notation (for example, the format `e.isX()` will not work).
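
One way to picture why `isError(1 / 0)` can return true is that a control receives expression e and evaluates it itself. The plain-Python analogy below assumes that a GREL error corresponds to a raised exception (Python has no error values like GREL's), so e is passed as a zero-argument function. `is_error` is a hypothetical name, and this is not how OpenRefine implements the control.

```python
def is_error(e):
    """Analogy for GREL isError(e): evaluate e and report whether it failed."""
    try:
        e()               # the control, not the caller, evaluates e
    except Exception:
        return True
    return False


print(is_error(lambda: 1))      # False
print(is_error(lambda: "abc"))  # False
print(is_error(lambda: 1 / 0))  # True (ZeroDivisionError)
```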
## Constants

| Name | Meaning |
|-|-|
| true | The boolean constant true |
| false | The boolean constant false |
| PI | From [Java's Math.PI](https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#PI), the value of pi (that is, 3.1415...) |
@ -116,7 +116,7 @@ Identical to substring() in relation to strings. Also works with arrays; see [Ar
|
|||||||
|
|
||||||
###### get(s, n from, n to (optional))
|
###### get(s, n from, n to (optional))
|
||||||
|
|
||||||
Identical to substring() in relation to strings. Also works with named fields. Also works with arrays; see [Array functions section](#geta-n-or-s-from-n-to-optional).
|
Identical to substring() in relation to strings. Also works with named fields. Also works with arrays; see [Array functions section](#geta-n-from-n-to-optional).
|
||||||
|
|
||||||
#### Find and replace
|
#### Find and replace
|
76
docs/docs/manual/jythonclojure.md
Normal file
@ -0,0 +1,76 @@
---
id: jythonclojure
title: Jython & Clojure
sidebar_label: Jython & Clojure
---

## Jython

Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). Many Python files (`.py` or `.pyc`) are compatible.

Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython runs on the Java Virtual Machine, you can also import Java libraries and use them.

You will need to restart OpenRefine so that new Jython or Python libraries are initialized during startup.

OpenRefine now has [most of the Jsoup.org library built into GREL functions](grelfunctions#jsoup-xml-and-html-parsing-functions) for parsing and working with HTML and XML elements.

### Syntax

Expressions in Jython must have a `return` statement:

```
return value[1:-1]
```

```
return rowIndex%2
```
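
One way to picture why `return` is required is to treat the expression as the body of a function: without `return`, the body runs but hands back nothing. The plain-Python sketch below imitates that idea. `run_jython_expression`, `_expression`, and the reduced parameter list (only `value` and `rowIndex`) are hypothetical names chosen for illustration, not OpenRefine's internals.

```python
def run_jython_expression(expression, value, rowIndex=0):
    """Hypothetical helper: run `expression` as the body of a function."""
    # Build a function definition whose body is the expression text,
    # indenting every line so that multi-line expressions stay valid.
    source = ["def _expression(value, rowIndex):"]
    source.extend("    " + line for line in expression.splitlines())
    namespace = {}
    exec("\n".join(source), namespace)
    return namespace["_expression"](value, rowIndex)


print(run_jython_expression("return value[1:-1]", "[abc]"))  # abc
print(run_jython_expression("return rowIndex%2", None, 3))   # 1
print(run_jython_expression("value[1:-1]", "[abc]"))         # None (no return)
```

The last call shows the failure mode the `return` rule guards against: the slice is computed, but nothing is returned.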

Fields have to be accessed using the bracket operator rather than dot notation:

```
return cells["col1"]["value"]
```

For example, to access the [edit distance](reconciling#reconciliation-facets) between a reconciled value and an original cell value using [recon variables](expressions#reconciliation):

```
return cell["recon"]["features"]["nameLevenshtein"]
```

To return `value` in lower case (or `None` if the value is null):

```
if value is not None:
  return value.lower()
else:
  return None
```

### Tutorials

- [Extending Jython with pypi modules](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules)
- [Working with phone numbers using Java libraries inside Python](https://github.com/OpenRefine/OpenRefine/wiki/Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python)

Full documentation on the Jython language can be found on its official site: [http://www.jython.org](http://www.jython.org).

## Clojure

Clojure 1.10.1 comes bundled with the default installation of OpenRefine 3.4.1. At this time, not all [variables](expressions#variables) can be used with Clojure expressions: only `value`, `row`, `rowIndex`, `cell`, and `cells` are available.

For example, an expression can take the form

```
(.. value (toUpperCase) )
```

which calls Java's `toUpperCase()` method on `value`, or look like

```
(-> value (str/split #" ") last )
```

which splits the value on spaces, as `value.split(" ")` does in GREL, and then returns the last element of the resulting array.

For help with syntax, see the [Clojure website's guide to syntax](https://clojure.org/guides/learn/syntax).

User-contributed Clojure recipes can be found on our wiki at [https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure).

Full documentation on the Clojure language can be found on its official site: [https://clojure.org/](https://clojure.org/).
@ -20,11 +20,13 @@ module.exports = {
|
|||||||
{
|
{
|
||||||
type: 'category',
|
type: 'category',
|
||||||
label: 'Expressions',
|
label: 'Expressions',
|
||||||
items: ['manual/expressions', 'manual/grelfunctions'],
|
items: ['manual/expressions', 'manual/grel', 'manual/grelfunctions', 'manual/jythonclojure'],
|
||||||
},
|
},
|
||||||
'manual/exporting',
|
'manual/exporting',
|
||||||
'manual/troubleshooting'
|
'manual/troubleshooting'
|
||||||
],
|
],
|
||||||
|
'GREL Reference': [
|
||||||
|
'manual/grelfunctions'],
|
||||||
'Technical Reference': [
|
'Technical Reference': [
|
||||||
'technical-reference/technical-reference-index',
|
'technical-reference/technical-reference-index',
|
||||||
'technical-reference/architecture',
|
'technical-reference/architecture',
|
||||||
|