From f7404799e35162586b8d6c3d5da10d1451936688 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Wed, 2 Dec 2020 13:44:55 -0500 Subject: [PATCH 01/21] Across-the-docs updates --- docs/docs/manual/cellediting.md | 10 +- docs/docs/manual/exploring.md | 14 ++ docs/docs/manual/glossary.md | 5 - docs/docs/manual/installing.md | 192 ++++++++++++++---------- docs/docs/manual/key_value_columnize.md | 152 ------------------- docs/docs/manual/running.md | 152 ++++++++++++++----- docs/docs/manual/starting.md | 30 ++-- docs/docs/manual/transforming.md | 4 +- docs/docs/manual/troubleshooting.md | 24 ++- docs/docs/manual/wikidata.md | 10 +- docs/sidebars.js | 1 - 11 files changed, 281 insertions(+), 313 deletions(-) delete mode 100644 docs/docs/manual/glossary.md delete mode 100644 docs/docs/manual/key_value_columnize.md diff --git a/docs/docs/manual/cellediting.md b/docs/docs/manual/cellediting.md index 73c8dc7cd..f2aed7f19 100644 --- a/docs/docs/manual/cellediting.md +++ b/docs/docs/manual/cellediting.md @@ -56,7 +56,7 @@ For example, the following column of strings on the left will transform into the |today|>|today| |never|>|never| -This is based on OpenRefine’s ability to recognize dates with the [`toDate()` function](expressions#dates). +This is based on OpenRefine’s ability to recognize dates with the [`toDate()` function](expressions#date-functions). Clicking the “today” cell and editing its data type manually will convert “today” into a value such as “2020-08-14T00:00:00Z”. Attempting the same data-type change on “never” will give you an error message and refuse to proceed. @@ -123,13 +123,17 @@ The clustering pop-up window offers you a variety of clustering methods: * levenshtein * ppm +#### Key collision + **Key collisions** are very fast and can process millions of cells in seconds: **Fingerprinting** is the least likely to produce false positives, so it’s a good place to start. 
It does the same kind of data-cleaning behind the scenes that you might think to do manually: fix whitespace into single spaces, put all uppercase letters into lowercase, discard punctuation, remove diacritics (e.g. accents) from characters, split all strings (words) and sort them alphabetically (so “Zhenyi, Wang” becomes “Wang Zhenyi”). This makes comparing those types of name values very easy. **N-gram fingerprinting** allows you to set the _n_ value to whatever number you’d like, and will create n-grams of _n_ size (after doing some cleaning), alphabetize them, then join them back together into a _fingerprint_. For example, a 1-gram fingerprint will simply organize all the letters in the cell into alphabetical order - by creating segments one character in length. A 2-gram fingerprint will find all the two-character segments, remove duplicates, alphabetize them, and join them back together (for example, “banana” generates “ba an na an na,” which becomes “anbana”). This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself won’t identify). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”). -The next four methods are phonetic algorithsm: they know whether two letters sound the same when pronounced out loud, and assess text values based on that (such as knowing that a word with an “S” might be a mistype of a word with a “Z”). They are great for spotting mistakes made by not knowing the spelling of a word or name after only hearing it spoken aloud. +##### Phonetic clustering + +The next four methods are phonetic algorithms: they know whether two letters sound the same when pronounced out loud, and assess text values based on that (such as knowing that a word with an “S” might be a mistype of a word with a “Z”). 
They are great for spotting mistakes made by not knowing the spelling of a word or name after only hearing it spoken aloud. **Metaphone3 fingerprinting** is an English-language phonetic algorithm. For example, “Reuben Gevorkiantz” and “Ruben Gevorkyants” share the same phonetic fingerprint in English. @@ -139,6 +143,8 @@ The next four methods are phonetic algorithsm: they know whether two letters sou Regardless of the language of your data, applying each of them might find different potential matches: for example, Metaphone clusters “Cornwall” and “Corn Hill” and “Green Hill,” while Cologne clusters “Greenvale” and “Granville” and “Cornwall” and “Green Wall.” +#### Nearest neighbor + **Nearest neighbor** clustering methods are slower than key collision methods. They allow the user to set a radius - a threshold for matching or not matching. OpenRefine uses a “blocking” method first, which sorts values based on whether they have a certain amount of similarity (the default is “6” for a six-character string of identical characters) and then runs the nearest-neighbor operations on those sorted groups. We recommend setting the block number to at least 3, and then increasing it if you need to be more strict (for example, if every value with “river” is being matched, you should increase it to 6 or more). Note bigger block values will take much longer to process, while smaller blocks may miss matches. Increasing the radius will make the matches more lax, as bigger differences will be clustered: **Levenshtein distance** counts the number of edits required to make one value perfectly match another. As in the key collision methods above, it will do things like change uppercase to lowercase, fix whitespace, change special characters, etc. Each character that gets changed counts as 1 “distance.” “New York” and “newyork” have an edit distance value of 3 (“N” to “n”, “Y” to “y,” remove the space). It can do relatively advanced edits, such as understand the distance between “M. 
Makeba” and “Miriam Makeba” (5), but it may create false positives if these distances are greater than other, simpler transformations (such as the one-character distance to “B. Makeba,” another person entirely). diff --git a/docs/docs/manual/exploring.md b/docs/docs/manual/exploring.md index 6b7b051af..9b0f63b36 100644 --- a/docs/docs/manual/exploring.md +++ b/docs/docs/manual/exploring.md @@ -42,6 +42,20 @@ Converting a cell's data type is not the same operation as transforming its cont To transform data from one type to another, see [Transforming data](transforming#transform) for information on using common transforms, and see [Expressions](expressions) for information on using `toString()`, `toDate()`, and other functions. +### Dates + +Date-formatted data in OpenRefine relies on a number of conversion tools and standards. When you convert a cell into a "date" data type, what you'll be doing is trying to transform the original contents into an ISO-8601-compliant extended format with time in UTC: YYYY-MM-DDTHH:MM:SSZ. + +You can convert dates when you [export your data using the custom tabular exporter](exporting#custom-tabular-exporter). You are given the option to keep your dates in ISO 8601 format, or to output short, medium, long, or full locale formats. This means that you can format your dates into, for example, MM/DD/YY (the US short standard) with or without including the time, after manipulating your data formatted into ISO 8601. + +The following table shows the [date and time formatting styles for the U.S. and French locales](https://docs.oracle.com/javase/tutorial/i18n/format/dateFormat.html): +|Style |U.S. Locale |French Locale| +|DEFAULT |Jun 30, 2009 7:03:47 AM |30 juin 2009 07:03:47| +|SHORT |6/30/09 7:03 AM |30/06/09 07:03| +|MEDIUM |Jun 30, 2009 7:03:47 AM |30 juin 2009 07:03:47| +|LONG |June 30, 2009 7:03:47 AM PDT |30 juin 2009 07:03:47 PDT| +|FULL |Tuesday, June 30, 2009 7:03:47 AM PDT |mardi 30 juin 2009 07 h 03 PDT| + ## Rows vs. 
records A row is a simple way to organize data: a series of cells, one cell per column. Sometimes there are multiple pieces of information in one cell, such as when a survey respondent can select more than one response. In cases where there is more than one value for a single column in one or more rows, you may wish to use OpenRefine’s records mode: this defines a single record (a survey response, for example) as potentially containing more than one row. From there you can transform cells into multiple rows, each cell containing one value you’d like to work with. diff --git a/docs/docs/manual/glossary.md b/docs/docs/manual/glossary.md deleted file mode 100644 index 333781973..000000000 --- a/docs/docs/manual/glossary.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -id: glossary -title: OpenRefine Glossary -sidebar_label: Glossary ---- diff --git a/docs/docs/manual/installing.md b/docs/docs/manual/installing.md index b5a6781fa..88f766eb8 100644 --- a/docs/docs/manual/installing.md +++ b/docs/docs/manual/installing.md @@ -31,28 +31,26 @@ We are aware of some minor rendering and performance issues on other browsers su ### Release versions -OpenRefine always has a latest stable release as well as some more recent work available in beta, release candidate, or nightly release versions. - -If you are installing for the first time, we recommend [the latest stable release](https://github.com/OpenRefine/OpenRefine/releases/latest). +OpenRefine always has a [latest stable release](https://github.com/OpenRefine/OpenRefine/releases/latest), as well as some more recent developments available in beta, release candidate, or [snapshot releases](https://github.com/OpenRefine/OpenRefine-snapshot-releases/releases). If you are installing for the first time, we recommend [the latest stable release](https://github.com/OpenRefine/OpenRefine/releases/latest). 
If you wish to use an extension that is only compatible with an earlier version of OpenRefine, and do not require the latest features, you may find that [an older stable version is best for you](https://github.com/OpenRefine/OpenRefine/releases) in our list of releases. Look at later releases to see which security vulnerabilities are being fixed, in order to assess your own risk tolerance for using earlier versions. Look for “final release” versions instead of “beta” or “release candidate” versions. #### Unstable versions -If you need a recently developed function, and are willing to risk some untested code, you can look at [the most recent items in the reverse-chronological list](https://github.com/OpenRefine/OpenRefine/releases) and see what changes appeal to you. +If you need a recently developed function, and are willing to risk some untested code, you can look at [the most recent items in the list](https://github.com/OpenRefine/OpenRefine/releases) and see what changes appeal to you. “Beta” and “release candidate” versions may both have unreported bugs and are most suitable for people who are willing to help us troubleshoot these versions by [creating bug reports](https://github.com/OpenRefine/OpenRefine/issues). -For the absolute latest development updates, see the [snapshot releases](https://github.com/OpenRefine/OpenRefine-nightly-releases/releases). These are created with every commit. +For the absolute latest development updates, see the [snapshot releases](https://github.com/OpenRefine/OpenRefine-snapshot-releases/releases). These are created with every commit. #### What’s changed -Our [latest release is at the time of writing is OpenRefine 3.4](**link goes here!**), released **XXXX XX 2020**. The major changes in this version are listed on the [3.4 final release page](**link goes here!**) with the downloadable packages. +Our [latest version is OpenRefine 3.4.1](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1), released September 24th 2020. 
The major changes in this version are listed on the [3.4 release page](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1) with the downloadable packages. -You can find information about all of our releases on the [Releases page on Github](https://github.com/OpenRefine/OpenRefine/releases). +You can find information about all OpenRefine versions on the [Releases page on Github](https://github.com/OpenRefine/OpenRefine/releases). :::info Other distributions -OpenRefine may also work in other environments, such as [Chromebooks](https://gist.github.com/organisciak/3e12e5138e44a2fed75240f4a4985b4f) where Linux terminals are available. Look at our list of [Other Distributions](https://openrefine.org/download.html) on the Downloads page for other ways of running OpenRefine, and refer to our contributor community to see new environments in development. +OpenRefine may also work in other environments, such as [Chromebooks](https://gist.github.com/organisciak/3e12e5138e44a2fed75240f4a4985b4f) where Linux terminals are available. Look at our list of [Other Distributions on the Downloads page](https://openrefine.org/download.html) for other ways of running OpenRefine, and refer to our contributor community to see new environments in development. ::: ## Installing or upgrading @@ -60,9 +58,9 @@ OpenRefine may also work in other environments, such as [Chromebooks](https://gi If you are upgrading from an older version of OpenRefine and have projects already on your computer, you should create backups of those projects before you install a new version. -First, [locate your workspace directory](installing.md#where-is-data-stored). Then copy everything you find there and paste it into a folder elsewhere on your computer. +First, [locate your workspace directory](#where-is-data-stored). Then copy everything you find there and paste it into a folder elsewhere on your computer. 
-For extra security you can [export your existing OpenRefine projects](exporting.md#export-a-project). +For extra security you can [export your existing OpenRefine projects](exporting#export-a-project). :::caution Take note of the [extensions](#installing-extensions) you have currently installed. They may not be compatible with the upgraded version of OpenRefine. Extensions can be installed in two places, so be sure to check both your workspace directory and the existing installation directory. @@ -93,16 +91,16 @@ import TabItem from '@theme/TabItem'; -1. On Windows 10, click the Windows start menu button, type `env`, and look at the search results. **Edit the system environment** variables. (If you are using an earlier version of Windows, use the **Search** or **Search programs and files** box in the start menu.) +1. On Windows 10, click the Windows start menu button, type “env,” and look at the search results. Click “Edit the system environment variables.” (If you are using an earlier version of Windows, use the “Search” or “Search programs and files” box in the start menu.) ![A screenshot of the search results for 'env'.](/img/env.png "A screenshot of the search results for 'env'.") -2. Click **Environment Variables…** at the bottom of the **Advanced** window that appears. -3. In the **Environment Variables** dialog that appears, click **New…** and create a variable with the key `JAVA_HOME`. You can set the variable for only your user account, as in the screenshot below, or set it as a system variable - it will work either way. +2. Click “Environment Variables…” at the bottom of the “Advanced” window that appears. +3. In the “Environment Variables” dialog that appears, click “New…” and create a variable with the key `JAVA_HOME`. You can set the variable for only your user account, as in the screenshot below, or set it as a system variable - it will work either way. 
![A screenshot of 'Environment Variables'.](/img/javahome.png "A screenshot of 'Environment Variables'.") -4. Set the **Value** to the folder where you installed JDK, in the format `D:\Programs\OpenJDK`. You can locate this folder with the **Browse directory...** button. +4. Set the `Value` to the folder where you installed JDK, in the format `D:\Programs\OpenJDK`. You can locate this folder with the “Browse directory...” button. @@ -110,19 +108,27 @@ import TabItem from '@theme/TabItem'; First, find where Java is on your computer with this command: -```which java``` +``` +which java +``` Check the environment variable `JAVA_HOME` with: -```$JAVA_HOME/bin/java --version``` +``` +$JAVA_HOME/bin/java --version +``` To set the environment variable for the current Java version of your MacOS: -```export JAVA_HOME="$(/usr/libexec/java_home)"``` +``` +export JAVA_HOME="$(/usr/libexec/java_home)" +``` Or, for Java 13.x: -```export JAVA_HOME="$(/usr/libexec/java_home -v 13)"``` +``` +export JAVA_HOME="$(/usr/libexec/java_home -v 13)" +``` @@ -132,20 +138,27 @@ Or, for Java 13.x: Enter the following: -```sudo apt install default-jre``` +``` +sudo apt install default-jre +``` This probably won’t install the latest JDK package available on the Java website, but it is faster and more straightforward. (At the time of writing, it installs OpenJDK 11.0.7.) ##### Manually First, [extract the JDK package](https://openjdk.java.net/install/) to the new directory `/usr/lib/jvm`: + ``` sudo mkdir -p /usr/lib/jvm sudo tar -x -C /usr/lib/jvm -f /tmp/openjdk-14.0.1_linux-x64_bin.tar.gz ``` -Then, navigate to this folder and confirm the final path (in this case, `usr/lib/jvm/jdk-14.0.1`. -Open a terminal and type -```sudo gedit /etc/profile``` + +Then, navigate to this folder and confirm the final path (in this case, `/usr/lib/jvm/jdk-14.0.1`). 
Open a terminal and type + +``` +sudo gedit /etc/profile +``` + In the text window that opens, insert the following lines at the end of the `profile` file, using the path above: ``` @@ -154,10 +167,18 @@ PATH=$PATH:$HOME/bin:$JAVA_HOME/bin export JAVA_HOME export PATH ``` + Save and close the file. When you are back in the terminal, type -```source /etc/environment``` + +``` +source /etc/environment +``` + Exit the terminal and restart your system. You can then check that JAVA_HOME is set properly by opening another terminal and typing -```echo $JAVA_HOME``` +``` +echo $JAVA_HOME +``` + It should show the path you set above. @@ -168,7 +189,7 @@ It should show the path you set above. ### Install or upgrade OpenRefine -If you are upgrading an existing OpenRefine installation, you can delete the old program files and install the new files into the same space. Do not overwrite the files as some obsolete files may be left over unnecessarily. +If you are upgrading an existing OpenRefine installation, you can delete the old program files and install the new files into the same space. Do not overwrite the files as some obsolete files may be left over unnecessarily. :::caution If you have extensions installed, do not delete the `webapp\extensions` folder where you installed them. You may wish to install extensions into the workspace directory instead of the program directory. There is no guarantee that extensions will be forward-compatible with new versions of OpenRefine, and we do not maintain extensions. @@ -201,19 +222,21 @@ Once you have downloaded the `.dmg` file, open it and drag the OpenRefine icon o The quick version: -1. Install[ Homebrew from here](http://brew.sh) +1. Install [Homebrew](http://brew.sh) 2. In Terminal enter ` brew cask install openrefine` 1. Then find OpenRefine in your Applications folder. The long version: -[Homebrew](http://brew.sh) is a popular command-line package manager for Mac. 
Installing Homebrew is accomplished by pasting the installation command on the Homebrew website into a Terminal window. Once Homebrew is installed, applications like OpenRefine can be installed via a simple command. You can [install Homebrew from their website]([http://brew.sh](http://brew.sh)). +[Homebrew](http://brew.sh) is a popular command-line package manager for Mac. Installing Homebrew is accomplished by pasting the installation command on the Homebrew website into a Terminal window. Once Homebrew is installed, applications like OpenRefine can be installed via a simple command. You can [install Homebrew from their website](http://brew.sh). ###### Install Install OpenRefine with this command: -``` brew cask install openrefine``` +``` +brew cask install openrefine +``` You should see output like this: @@ -228,23 +251,29 @@ You should see output like this: Behind the scenes, this command causes Homebrew to download the OpenRefine installer, verify the file’s authenticity (using a SHA-256 checksum), mount the disk image, copy the `OpenRefine.app` application bundle into the Applications folder, unmount the disk image, and save a copy of the installer and metadata about the installation for future use. -_If an existing `OpenRefine.app` is found in the Applications folder, Homebrew will not overwrite it, so installing via Homebrew requires either deleting or renaming previously installed copies._ +If an existing `OpenRefine.app` is found in the Applications folder, Homebrew will not overwrite it, so installing via Homebrew requires either deleting or renaming previously installed copies. ###### Uninstall To uninstall OpenRefine, paste this command into the Terminal: -``` brew cask uninstall openrefine``` +``` + brew cask uninstall openrefine +``` You should see output like this: -``` ==> Removing App '/Applications/OpenRefine.app'.``` +``` + ==> Removing App '/Applications/OpenRefine.app'. 
+``` ###### Update To update to the latest version of OpenRefine, paste this command into the Terminal: -``` brew cask reinstall openrefine``` +``` + brew cask reinstall openrefine +``` You should see output like this: @@ -269,7 +298,9 @@ If you had previously installed the `openrefine-dev` cask (containing a release Once you have downloaded the `.tar.gz` file, open a shell, navigate to the folder containing the download, and type: -```tar xzf openrefine-linux-3.4.tar.gz``` +``` +tar xzf openrefine-linux-3.4.tar.gz +``` @@ -280,7 +311,7 @@ Once you have downloaded the `.tar.gz` file, open a shell, navigate to the folde ### Set where data is stored -OpenRefine stores data in two places: program files in the program directory, wherever it is you’ve installed it; and project files in what we call the “workspace directory.” You can access this folder easily from OpenRefine by going to the [home screen](running.md#the-home-screen) (at [http://127.0.0.1:3333/](http://127.0.0.1:3333/)) and clicking "Browse workspace directory." +OpenRefine stores data in two places: program files in the program directory, wherever it is you’ve installed it; and project files in what we call the “workspace directory.” You can access this folder easily from OpenRefine by going to the [home screen](running#the-home-screen) (at [http://127.0.0.1:3333/](http://127.0.0.1:3333/)) and clicking “Browse workspace directory.” By default this is: @@ -308,11 +339,15 @@ For older Google Refine releases, replace `OpenRefine` with `Google\Refine`. You can change this by adding this line to the file `openrefine.l4j.ini` and specifying your desired drive and folder path: -```-Drefine.data_dir=D:\MyDesiredFolder``` +``` +-Drefine.data_dir=D:\MyDesiredFolder +``` If your folder path has spaces, use neutral quotation marks around it: -```-Drefine.data_dir="D:\My Desired Folder"``` +``` +-Drefine.data_dir="D:\My Desired Folder" +``` If the folder does not exist, OpenRefine will create it. 
@@ -320,11 +355,15 @@ If the folder does not exist, OpenRefine will create it. -```~/Library/Application Support/OpenRefine/``` +``` +~/Library/Application Support/OpenRefine/ +``` For older versions as Google Refine: -```~/Library/Application Support/Google/Refine/ ``` +``` +~/Library/Application Support/Google/Refine/ +``` Logging is to `/var/log/daemon.log` - grep for `com.google.refine.Refine`. @@ -332,11 +371,15 @@ Logging is to `/var/log/daemon.log` - grep for `com.google.refine.Refine`. -```~/.local/share/openrefine/``` +``` +~/.local/share/openrefine/ +``` You can change this when you run OpenRefine from the terminal, by pointing to the workspace directory through the `-d` parameter: -``` ./refine -p 3333 -i 0.0.0.0 -m 6000M -d /My/Desired/Folder``` +``` + ./refine -p 3333 -i 0.0.0.0 -m 6000M -d /My/Desired/Folder +``` @@ -362,10 +405,10 @@ OpenRefine does not currently output an error log, but because the OpenRefine co You can access OpenRefine server logs from the terminal on Mac: * Find the OpenRefine app/icon in Finder -* Ctrl+Click on the icon and select **Show Package Contents** from the context menu that displays -* This should open a new Finder menu showing a folder called **Contents** - navigate into this folder then into the **MacOS** folder -* Ctrl+Click on **JavaAppLauncher** -* Choose **Open With** from menu, and select **Terminal** +* control-click on the icon and select “Show Package Contents” from the context menu that displays +* This should open a new Finder menu showing a folder called “Contents” - navigate into this folder then into the “MacOS” folder +* control-click on “JavaAppLauncher” +* Choose “Open With” from the menu, and select “Terminal” --- @@ -373,25 +416,21 @@ You can access OpenRefine server logs from the terminal on Mac: - - - ## Increasing memory allocation OpenRefine relies on having computer memory available to it to work effectively. 
If you are planning to work with large data sets, you may wish to set up OpenRefine to handle it at the outset. By “large” we generally mean one of the following indicators: -* more than **one million** rows -* more than **one million **total cells +* more than one million total cells * an input file size of more than 50 megabytes (MB) -* more than **50** [rows per record in records mode](**running.md#records-mode**) +* more than 50 [rows per record in records mode](running#records-mode) -By default OpenRefine is set to operate with 1 gigabyte (GB) of memory (1024MB). If OpenRefine is running slowly, or you are getting "out of memory" errors (for example, `java.lang.OutOfMemoryError`), or generally feel that OpenRefine is slow, you can try allocating more memory. +By default OpenRefine is set to operate with 1 gigabyte (GB) of memory (1024MB). If you feel that OpenRefine is running slowly, or you are getting “out of memory” errors (for example, `java.lang.OutOfMemoryError`), you can try allocating more memory. A good practice is to start with no more than 50% of whatever memory is left over after the estimated usage of your operating system, to leave memory for your browser to run. -All of the settings below use a four-digit number to specify the megabytes (MB) used. The default is usually 1024MB, but the new value doesn't need to be a multiple of 1024. +All of the settings below use a four-digit number to specify the megabytes (MB) used (actually [mebibytes](https://en.wikipedia.org/wiki/Mebibyte)). The default is usually 1024MB, but the new value doesn't need to be a multiple of 1024. :::info Dealing with large datasets -If your project is big enough to need more than the default amount of memory, consider turning off "Parse cell text into numbers, dates, ..." on import. It's convenient, but less efficient than explicitly converting any columns that you need as a data type other than the default "string" type. 
+If your project is big enough to need more than the default amount of memory, consider turning off “Parse cell text into numbers, dates, ...” on import. It's convenient, but less efficient than explicitly converting any columns that you need as a data type other than the default “string” type. ::: -If you have downloaded the **.dmg** package and you start OpenRefine by double-clicking on it: +If you have downloaded the `.dmg` package and you start OpenRefine by double-clicking on it: * close OpenRefine -* **control-click** on the OpenRefine icon (opens the contextual menu) -* click on **show package content** (a finder window opens) -* open the **Contents** folder -* open the **Info.plist** file with any text editor (like Mac's default TextEdit) -* Change `-Xmx1024M` into, for example, `-Xmx2048M` or `-Xmx8G` +* control-click on the OpenRefine icon (opens the contextual menu) +* click on “Show Package Contents” (a Finder window opens) +* open the “Contents” folder +* open the `Info.plist` file with any text editor (like Mac's default TextEdit) +* Change “-Xmx1024M” into, for example, “-Xmx2048M” or “-Xmx8G” * save the file * restart OpenRefine. -If you have downloaded the `.tar.gz` package and you start OpenRefine from the command line, add the `-m xxxxM` parameter like this: - +If you have downloaded the `.tar.gz` package and you start OpenRefine from the command line, add the “-m xxxxM” parameter like this: `./refine -m 2048m` #### Setting a default If you don't want to set this option on the command line each time, you can also set it in the `refine.ini` file. Edit the line -```REFINE_MEMORY=1024M``` +``` +REFINE_MEMORY=1024M +``` -Make sure it is not commented out (that is, that the line doesn't start with a '#' character), and change `1024` to a higher value. Save the file, and when you next start OpenRefine it will use this value. 
+Make sure it is not commented out (that is, that the line doesn't start with a “#” character), and change “1024” to a higher value. Save the file, and when you next start OpenRefine it will use this value. @@ -483,7 +523,7 @@ If you’d like to create or modify an extension, [see our developer documentati ### Two ways to install extensions -You can [install extensions in one of two places](installing.md#set-where-data-is-stored): +You can [install extensions in one of two places](installing#set-where-data-is-stored): * Into your OpenRefine program folder, so they will only be available to that version/installation of OpenRefine (meaning the extension will not run if you upgrade OpenRefine), or * Into your workspace, where your projects are stored, so they will be available no matter which version of OpenRefine you’re using. @@ -496,11 +536,11 @@ If you want to install the extension into the program folder, go to your program If you want to install the extension into your workspace, you can: * launch OpenRefine and click Open Project in the sidebar -* At the bottom of the screen, click Browse workspace directory +* At the bottom of the screen, click Browse workspace directory * A file-explorer or finder window will open in your workspace -* Create a new folder called `extensions` inside the workspace if it does not exist. +* Create a new folder called “extensions” inside the workspace if it does not exist. -You can also [find your workspace on each operating system using these instructions](installing.md#set-where-data-is-stored). +You can also [find your workspace on each operating system using these instructions](installing#set-where-data-is-stored). ### Install the extension @@ -514,12 +554,4 @@ Generally, the installation process will be: * Extract the zip contents into the `extensions` directory, making sure all the contents go into one folder with the name of the extension * Start (or restart) OpenRefine. 
-To confirm that installation was a success, follow the instructions provided by the extension. Each extension will appear in its own way inside the OpenRefine interface: make sure you read the documentation to know where the functionality will appear, such as under specific dropdown menus. - -## Advanced OpenRefine uses - - -### Running as a server - - -### Automating OpenRefine +To confirm that installation was a success, follow the instructions provided by the extension. Each extension will appear in its own way inside the OpenRefine interface: make sure you read the documentation to know where the functionality will appear, such as under specific dropdown menus. \ No newline at end of file diff --git a/docs/docs/manual/key_value_columnize.md b/docs/docs/manual/key_value_columnize.md deleted file mode 100644 index d28bd2b9f..000000000 --- a/docs/docs/manual/key_value_columnize.md +++ /dev/null @@ -1,152 +0,0 @@ ---- -id: key_value_columnize -title: Columnize by key/value columns -sidebar_label: Columnize by key/value ---- - -This operation can be used to reshape a table which contains *key* and *value* columns, such that the repeating contents in the key column become new column names, and the contents of the value column are spread in the new columns. This operation can be invoked from -any column menu, via **Transpose** → **Columnize by key/value columns**. - -Overview --------- - -Consider the following table: - -| Field | Data | -|---------|-----------------------| -| Name | Galanthus nivalis | -| Color | White | -| IUCN ID | 162168 | -| Name | Narcissus cyclamineus | -| Color | Yellow | -| IUCN ID | 161899 | - -In this format, each flower species is described by multiple attributes, which are spread on consecutive rows. -In this example, the "Field" column contains the keys and the "Data" column contains the values. 
With -this configuration, the *Columnize by key/value columns* operations transforms this table as follows: - -| Name | Color | IUCN ID | -|-----------------------|----------|---------| -| Galanthus nivalis | White | 162168 | -| Narcissus cyclamineus | Yellow | 161899 | - -Entries with multiple values in the same column ------------------------------------------------ - -If an entry has multiple values for a given key, then these values will be grouped on consecutive rows, -to form a [record structure](exploring#rows-vs-records). - -For instance, flower species can have multiple colors: - -| Field | Data | -|-------------|-----------------------| -| Name | Galanthus nivalis | -| **Color** | **White** | -| **Color** | **Green** | -| IUCN ID | 162168 | -| Name | Narcissus cyclamineus | -| Color | Yellow | -| IUCN ID | 161899 | - -This table is transformed by the operation as follows: - -| Name | Color | IUCN ID | -|-----------------------|----------|---------| -| Galanthus nivalis | White | 162168 | -| | Green | | -| Narcissus cyclamineus | Yellow | 161899 | - -The first key encountered by the operation serves as the record key. -The "Green" value is attached to the "Galanthus nivalis" name because it is the latest record key encountered by the operation as it scans the table. See the [Row order](#row-order) section for more details about the influence of row order on -the results of the operation. - -Notes column ------------- - -In addition to the key and value columns, a *notes* column can be used optionally. This can be used -to store extra metadata associated to a key/value pair. 
- -Consider the following example: - -| Field | Data | Source | -|---------|-----------------------|-----------------------| -| Name | Galanthus nivalis | IUCN | -| Color | White | Contributed by Martha | -| IUCN ID | 162168 | | -| Name | Narcissus cyclamineus | Legacy | -| Color | Yellow | 2009 survey | -| IUCN ID | 161899 | | - -If the "Source" column is selected as notes column, this table is transformed to: - -| Name | Color | IUCN ID | Source : Name | Source : Color | -|-----------------------|----------|---------|---------------|-----------------------| -| Galanthus nivalis | White | 162168 | IUCN | Contributed by Martha | -| Narcissus cyclamineus | Yellow | 161899 | Legacy | 2009 survey | - -Notes columns can therefore be used to preserve provenance or other context about a particular key/value pair. - -Extra columns -------------- - -If the table contains extra columns, which are not used as key, value or notes columns, they can be preserved -by the operation. For this to work, they must have the same value in all old rows corresponding to a new row. - -Consider for instance the following table, where the "Field" and "Data" columns are used as key and value columns -respectively, and the "Wikidata ID" column is not selected: - -| Field | Data | Wikidata ID | -|---------|-----------------------|-------------| -| Name | Galanthus nivalis | Q109995 | -| Color | White | Q109995 | -| IUCN ID | 162168 | Q109995 | -| Name | Narcissus cyclamineus | Q1727024 | -| Color | Yellow | Q1727024 | -| IUCN ID | 161899 | Q1727024 | - -This will be transformed to - -| Wikidata ID | Name | Color | IUCN ID | -|-------------|-----------------------|----------|---------| -| Q109995 | Galanthus nivalis | White | 162168 | -| Q1727024 | Narcissus cyclamineus | Yellow | 161899 | - -If extra columns do not contain identical values for all old rows spanning an entry, this can -be fixed beforehand by using the [fill down operation](cellediting#fill-down). 
- -Row order ---------- - -In the absence of extra columns, it is important to note that the order in which -the key/value pairs appear matters. Specifically, the operation will use the first key it encounters as the delimiter for entries: -every time it encounters this key again, it will produce a new row and add the following other key/value pairs to that row. - -Consider for instance the following table: - -| Field | Data | -|----------|-----------------------| -| **Name** | Galanthus nivalis | -| Color | White | -| IUCN ID | 162168 | -| **Name** | Crinum variabile | -| **Name** | Narcissus cyclamineus | -| Color | Yellow | -| IUCN ID | 161899 | - -The occurrences of the "Name" value in the "Field" column define the boundaries of the entries. Because there is -no other row between the "Crinum variabile" and the "Narcissus cyclamineus" rows, the "Color" and "IUCN ID" columns -for the "Crinum variabile" entry will be empty: - -| Name | Color | IUCN ID | -|-----------------------|----------|---------| -| Galanthus nivalis | White | 162168 | -| Crinum variabile | | | -| Narcissus cyclamineus | Yellow | 161899 | - -This sensitivity to order is removed if there are extra columns: in that case, the first extra column will serve as root identifier -for the entries. - -Behaviour in records mode -------------------------- - -In records mode, this operation behaves just like in rows mode, except that any facets applied to it will be interpreted in records mode. diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md index d39858b44..3503e1bae 100644 --- a/docs/docs/manual/running.md +++ b/docs/docs/manual/running.md @@ -10,7 +10,7 @@ OpenRefine does not require internet access to run its basic functions. Once you You will see a command line window open when you run OpenRefine. Leave that window alone while you work on datasets in your browser. -No matter how you load OpenRefine, it will load in your computer’s default browser. 
If you would like to use another browser instead, start OpenRefine and then point your chosen browser at the home screen: [http://127.0.0.1:3333/](http://127.0.0.1:3333/). +No matter how you load OpenRefine, it will load in your computer’s default browser. If you would like to use another browser instead, start OpenRefine and then point your chosen browser at the home screen: http://127.0.0.1:3333/. OpenRefine works best on browsers based on Webkit, such as: * Google Chrome @@ -20,7 +20,7 @@ OpenRefine works best on browsers based on Webkit, such as: We are aware of some minor rendering and performance issues on other browsers such as Firefox. We don't support Internet Explorer. -You can launch multiple projects at the same time by simply having multiple tabs or browser windows open. From the Open Project screen, you can right-click on project names and select Open in new tab. +You can launch multiple projects at the same time by simply having multiple tabs or browser windows open. From the Open Project screen, you can right-click on project names and open them in new tabs or windows. import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; @@ -37,7 +37,7 @@ import TabItem from '@theme/TabItem'; -To exit OpenRefine, close all the browser tabs, then navigate to the command line window. To close this window and ensure OpenRefine exits properly, hold down `Control` and press `C` on your keyboard. +To exit OpenRefine, close all the browser tabs or windows, then navigate to the command line window. To close this window and ensure OpenRefine exits properly, hold down `Control` and press `C` on your keyboard. This will save any last changes to your projects. #### With openrefine.exe You can run OpenRefine by double-clicking `openrefine.exe` or calling it from the command line. If you want to [modify the way `openrefine.exe` opens](#starting-with-modifications), you can edit the `openrefine.l4j.ini` file. 
@@ -106,7 +106,9 @@ When you run OpenRefine from a command line, you can change a number of default On Windows, use a slash: -```C:>refine /i 127.0.0.2 /p 3334``` +``` +C:>refine /i 127.0.0.2 /p 3334 +``` Get a list of all the commands with `refine /?`. @@ -119,7 +121,6 @@ Get a list of all the commands with `refine /?`. |/d|Enable debugging (on port 8000)|refine /d| |/x|Enable JMX monitoring for Jconsole and JvisualVM|refine /x| - @@ -166,9 +167,31 @@ To see the full list of command-line options, run `./refine -h`. #### Modifications set within files -On Windows, you can modify the way `openrefine.exe` runs by editing `openrefine.l4j.ini`; you can modify the way `refine.bat` runs by editing `refine.ini`. You can modify the Mac application by editing `info.plist`. On Linux, you can edit `refine.ini`. +On Windows, you can modify the way `openrefine.exe` runs by editing `openrefine.l4j.ini`; you can modify the way `refine.bat` runs by editing `refine.ini`. +You can modify the Mac application by editing `Info.plist`. +On Linux, you can edit `refine.ini`. -These JVM preferences are different options and have different syntax than the key/value descriptions above. Some of the most common keys (with their defaults) are: +Some settings, such as changing memory allocations, are already set inside these files, and all you have to do is change the values. Some lines need to be un-commented to work. + +For example, inside `refine.ini`, you should see: +``` +no_proxy="localhost,127.0.0.1" +#REFINE_PORT=3334 +#REFINE_HOST=127.0.0.1 +#REFINE_WEBAPP=main\webapp + +# Memory and max form size allocations +#REFINE_MAX_FORM_CONTENT_SIZE=1048576 +REFINE_MEMORY=1400M + +# Set initial java heap space (default: 256M) for better performance with large datasets +REFINE_MIN_MEMORY=1400M +... +``` + +Further modifications can be performed by using JVM preferences. 
+These JVM preferences are different options and have different syntax than the key/value descriptions used on the command line. Some of the most common keys (with their defaults) are: * -Drefine.autosave (5 [minutes]) * -Drefine.data_dir (/) * -Drefine.development (false) @@ -177,7 +200,7 @@ These JVM preferences are different options and have different syntax than the k * -Drefine.port (3333) * -Drefine.webapp (main/webapp) -The syntax within the `.ini` files is as follows: +The syntax is as follows: -Inside either of the `.ini` files, insert lines in this way: +Inside the `openrefine.l4j.ini` file, insert lines in this way: ``` --Drefine.port=3333 --Drefine.host=127.0.0.1 +-Drefine.port=3334 +-Drefine.host=127.0.0.2 -Drefine.webapp=broker/core ``` +In `refine.ini`, use a similar syntax, but set multiple parameters within a single line starting with `JAVA_OPTIONS=`: + +``` +JAVA_OPTIONS=-Drefine.data_dir=C:\Users\user\Documents\OpenRefine\ -Drefine.port=3334 + +``` @@ -257,7 +286,7 @@ Refer to the [official Java documentation](https://docs.oracle.com/javase/8/docs ## The home screen -When you first launch OpenRefine, you will see a screen with a menu on the left hand side that includes Create Project, Open Project, Import Project, and Language Settings. This is called the "home screen", where you can manage your projects and general settings. +When you first launch OpenRefine, you will see a screen with a menu on the left hand side that includes Create Project, Open Project, Import Project, and Language Settings. This is called the “home screen,” where you can manage your projects and general settings.
### Language settings @@ -294,7 +323,7 @@ At this time you can set preferences using a key/value pair: that is, selecting |Timeout for Google Drive authorization|googleConnectTimeOut|Number (microseconds)|180000|500000| |Maximum lag for Wikidata edit retries|wikibase.upload.maxLag|Number (seconds)|5|10| -To leave the Preferences screen, click on the “OpenRefine” logo. +To leave the Preferences screen, click on the diamond “OpenRefine” logo. If the preference you’re looking for isn’t here, look at the options you can set from the [command line or in an `.ini` file](#starting-with-modifications). @@ -316,21 +345,21 @@ Don’t click the “back” button on your browser - it will likely close your You can rename a project at any time by clicking inside the project title, which will turn into a text field. Project names don’t have to be unique, as OpenRefine organizes them based on a unique identifier behind the scenes. -Permalink allows you to return to a project at a specific view state - that is, with facets and filters applied. The permalink can help you pick up where you left off if you have to close your project while working with facets and filters. It puts view-specific information directly into the URL: clicking on it will load this current-view URL in the existing tab. You can right-click and copy the Permalink URL to copy the current view state to your clipboard, without refreshing the tab you’re using. -
-Open… will open up a new browser tab showing the “Create Project” screen. From here you can change settings, start a new project, or open an existing project. -
+The Permalink allows you to return to a project at a specific view state - that is, with facets and filters applied. The permalink can help you pick up where you left off if you have to close your project while working with facets and filters. It puts view-specific information directly into the URL: clicking on it will load this current-view URL in the existing tab. You can right-click and copy the Permalink URL to copy the current view state to your clipboard, without refreshing the tab you’re using. + +The Open… button will open up a new browser tab showing the Create Project screen. From here you can change settings, start a new project, or open an existing project. + Export is a dropdown menu that allows you to pick a format for exporting your current dataset. It will only export rows and records that are currently visible - the currently selected facets and filters, not the total data in the project. -
+ Help will open up a new browser tab and bring you to this user manual on the web. ### The grid header The grid header sits below the project bar and above the project grid (the data of your project). The grid header will tell you the total number of rows or records in your project, and indicate whether you are in rows or records mode. -It will also tell you if you’re currently looking at a select number of rows via facets or filtering, rather than the entire dataset, by displaying either, for example, 180 rows or 67 matching rows (180 total). +It will also tell you if you’re currently looking at a select number of rows via facets or filtering, rather than the entire dataset, by displaying either, for example, “180 rows” or “67 matching rows (180 total).” -Directly below the row number, you have the ability to switch between row mode and records mode. OpenRefine stores which projects are in records mode, and displays your data as records by default if you are. +Directly below the row number, you have the ability to switch between [row mode and records mode](exploring#rows-vs-records). OpenRefine stores which projects are in records mode, and displays your data as records by default if you are. To the right of the rows/records selection is the array of options for how many rows/records to view on screen at one time. At the far right of the screen you can navigate through your entire dataset one page at a time. @@ -340,23 +369,21 @@ The Extensions dropdown offers you options for ex ### The grid -The area of the project screen that displays your dataset is called the "project grid" (or the "data grid", or simply the "grid"). The grid presents data in a tabular format, which may look like a normal spreadsheet program to you. +The area of the project screen that displays your dataset is called the “project grid” (or the “data grid,” or simply the “grid”). The grid presents data in a tabular format, which may look like a normal spreadsheet program to you. 
Columns widths are automatically set based on their contents; some column headers may be cut off, but can be viewed by mousing over the headers. In each column header you will see a small arrow. Clicking on this arrow brings up a dropdown menu containing column-specific data exploration and transformation options. You will learn about each of these options in the [Exploring data](exploring) and [Transforming data](transforming) sections. -The first column in every project will always be All, which contains options to flag, star, and do non-column-specific operations. The All column is also where rows/records are numbered. +The first column in every project will always be “All,” which contains options to flag, star, and do non-column-specific operations. The “All” column is also where rows/records are numbered. The project grid may display with both vertical and horizontal scrolling, depending on the number and width of columns, and the number of rows/records displayed. You can control the display of the project grid by using [Sort and View options](exploring#sort-and-view). -Mousing over individual cells will allow you to [edit cells individually](transforming). +Mousing over individual cells will allow you to [edit cells individually](cellediting#edit-one-cell-at-a-time). -### The sidebar +### Facet/Filter -#### Facet/Filter - -The Facet/Filter tab is one of the main ways of exploring your data: displaying the patterns and trends in your data, and helping you narrow your focus and modify that data. [Facets](exploring#facets) and [filters](exploring#filters) are explained more in [Exploring data](exploring). +The Facet/Filter tab is one of the main ways of exploring your data: displaying the patterns and trends in your data, and helping you narrow your focus and modify that data. [Facets](facets) and [filters](facets#text-filter) are explained more in [Exploring data](exploring). 
![A screenshot of facets and filters in action.](/img/facetfilter.png) @@ -368,7 +395,7 @@ Removing your facets will clear out the sidebar entirely. If you have written cu You can preserve your facets and filters for future use by copying a [Permalink](#the-project-bar). -#### History (Undo/Redo) +### History (Undo/Redo) In OpenRefine, any activity that changes the data can be undone. Changes are tracked from the very beginning, when a project is first created. The change history of each project is saved with the project's data, so quitting OpenRefine does not erase the steps you've taken. When you restart OpenRefine, you can view and undo changes that you made before you quit OpenRefine. @@ -376,7 +403,7 @@ Project history gets saved when you export a project archive, and restored when ![A screenshot of the History (Undo/Redo) tab with 13 steps.](/img/history.png "A screenshot of the History (Undo/Redo) tab with 13 steps.") -When you click on Undo / Redo in the sidebar of any project, that project’s history is shown as a list of changes in order, with the first change being the action of creating the project itself. (That first change, indexed as step zero, cannot be undone.) Here is a sample history with 3 changes: +When you click on Undo / Redo in the sidebar of any project, that project’s history is shown as a list of changes in order, with the first “change” being the action of creating the project itself. (That first change, indexed as step zero, cannot be undone.) Here is a sample history with 3 changes: ``` 0. Create project @@ -387,22 +414,77 @@ When you click on Undo / Redo in the sidebar of a The current state of the project is highlighted with a dark blue background. If you move back and forth on the timeline you will see the current state become highlighted, while the actions that came after that state will be grayed out. -To revert your data back to an earlier state, simply click on the last action in the timeline you want to keep. 
In the example above, if we keep the removal of 7 rows but revert everything we did after that, then click on Remove 7 rows. The last 2 changes will be undone, in order to bring the project back to state #1. +To revert your data back to an earlier state, simply click on the last action in the timeline you want to keep. In the example above, if we keep the removal of 7 rows but revert everything we did after that, then click on “Remove 7 rows.” The last 2 changes will be undone, in order to bring the project back to state #1. In this example, changes #2 and #3 will now be grayed out. You can redo a change by clicking on it in the history - everything up to and including it will be redone. If you have moved back one or more states, and then you perform a new operation on your data, the later actions (everything that’s greyed out) will be erased and cannot be re-applied. -The Undo/Redo tab will show you which step you’re on, and if you’re about to risk erasing work - by saying something like "4/5" or "1/7" at the end. +The Undo/Redo tab will show you which step you’re on, and if you’re about to risk erasing work - by saying something like “4/5” or “1/7” at the end. -##### Reusing operations +#### Reusing operations Operations that you perform in OpenRefine can be reused. For example, a formula you wrote inside one project can be copied and applied to another project later. -To reuse one or more operations, you first extract it from the project where it was first applied. Click to the Undo/Redo tab and click Extract…. This brings up a box that lists all operations up to the current state (it does not show undone operations). Select the operation or operations you want to extract using the checkboxes on the left, and they will be encoded as JSON on the right. Copy that JSON off to the clipboard. +To reuse one or more operations, you first extract it from the project where it was first applied. Click to the Undo/Redo tab and click Extract….
This brings up a box that lists all operations up to the current state (it does not show undone operations). Select the operation or operations you want to extract using the checkboxes on the left, and they will be encoded as JSON on the right. Copy that JSON off to the clipboard. -Move to the second project, go to the Undo/Redo tab, click Apply… and paste in that JSON. +Move to the second project, go to the Undo/Redo tab, click Apply… and paste in that JSON. Not all operations can be extracted. Edits to a single cell, for example, can’t be replicated. -### Common extension buttons +## Advanced OpenRefine uses + +### Running as a server + +:::caution +Please note that if your machine has an external IP (is exposed to the Internet), you should not do this, or should protect it behind a proxy or firewall, such as nginx. Proceed at your own risk. +::: + +By default (and for security reasons), OpenRefine only listens to TCP requests coming from localhost (127.0.0.1) on port 3333. If you want to share your OpenRefine instance with colleagues and respond to TCP requests to any IP address of the machine, start it from the command line like this: +``` +./refine -i 0.0.0.0 +``` + +or set this option in `refine.ini`: +``` +REFINE_HOST=0.0.0.0 +``` + +or set this JVM option: +``` +-Drefine.host=0.0.0.0 +``` + +On Mac, you can add a specific entry to the `Info.plist` file located within the app bundle (`/Applications/OpenRefine.app/Contents/Info.plist`): +``` +JVMOptions + + + -Drefine.host=0.0.0.0 + … + +``` + +:::caution +OpenRefine has no built-in security or version control for multi-user scenarios. OpenRefine has a single data model that is not shared, so there is a risk of data operations being overwritten by other users. Care must be taken by users. +::: + +### Automating OpenRefine + +Some users may wish to employ OpenRefine for batch processing as part of a larger automated pipeline. 
Not all OpenRefine features can work without human supervision and advancement (such as clustering), but many data transformation tasks can be automated. + +:::info +The following are all third-party extensions and code; the OpenRefine team does not maintain them and cannot guarantee that any of them work. +::: + + +Some examples: + +* This project allows OpenRefine to be run from the command line using [operations saved in a JSON file](running#reusing-operations): [OpenRefine batch processing](https://github.com/opencultureconsulting/openrefine-batch) +* A Python project for applying a JSON file of operations to a data file, outputting the new file, and deleting the temporary project, written by David Huynh and Max Ogden: [Python client library for Google Refine](https://github.com/maxogden/refine-python) +* And the same in Ruby: [Refine-Ruby](https://github.com/maxogden/refine-ruby) +* Another Python client library, by Paul Makepeace: [OpenRefine Python Client Library](https://github.com/PaulMakepeace/refine-client-py) + +To look for other instances, search our Google Groups [for users](https://groups.google.com/g/openrefine) and [for developers](https://groups.google.com/g/openrefine-dev), where [these projects were originally posted](https://groups.google.com/g/openrefine/c/GfS1bfCBJow/m/qWYOZo3PKe4J). + + diff --git a/docs/docs/manual/starting.md b/docs/docs/manual/starting.md index 5e689fd1b..642b18f85 100644 --- a/docs/docs/manual/starting.md +++ b/docs/docs/manual/starting.md @@ -10,13 +10,13 @@ An OpenRefine project is started by importing in some existing data - OpenRefine No matter where your data comes from, OpenRefine doesn’t modify your original data source. It copies all the information from your input, creates its own project file, and stores it in your [workspace directory](installing#set-where-data-is-stored). -The data and all of your edits are automatically saved inside the project file.
When you’re finished modifying the data, you can export it back out into the file format of your choice. +The data and all of your edits are [automatically saved](#autosaving) inside the project file. When you’re finished modifying the data, you can [export it back out](exporting) into the file format of your choice. -You can also receive and open other people’s projects, or send them yours, by exporting a project archive and importing it. +You can also receive and open other people’s projects, or send them yours, by [exporting a project archive](exporting#export-a-project) and [importing it](#import-a-project). ## Create project by importing data -When you start OpenRefine, you’ll be taken to the “Create Project” screen. You’ll see on the left side of the screen that your options are to: +When you start OpenRefine, you’ll be taken to the Create Project screen. You’ll see on the left side of the screen that your options are to: * import data from a file on your computer * import data from a link to the web @@ -49,6 +49,7 @@ If you supply two or more files for one project, the files’ rows will be loade |berries.csv||9|Mulberry|Greece| |berries.csv||2|Blueberry|Canada| +You cannot combine two datasets into one project by appending data within rows. You can, however, combine two projects later using functions such as [cross()](grelfunctions/#crosscell-s-projectname-s-columnname). For whichever method you choose, when you click Next >> you will be given a preview and a chance to configure the way OpenRefine interprets the file. @@ -94,16 +95,17 @@ If your connection is successful, you will see a Query Editor where you can run You have two ways to load in data from Google Sheets: * A link to an accessible Google Sheet (that is, one with link-sharing turned on) -* Selecting a Google Sheet in your Google Drive - +* Selecting a Google Sheet in your Google Drive. #### Google Sheet by URL You can import data from any Google Sheet that has link-sharing turned on. 
Paste in a URL that looks something like -```https://docs.google.com/spreadsheets/………/edit?usp=sharing``` +``` +https://docs.google.com/spreadsheets/………/edit?usp=sharing +``` -This will only work with Sheets, not with any other Google Drive file that might have an available link, including `.xls` and other valid files that are hosted in Google Drive. These links will also not work [by URL](#web-addresses-urls), so you need to download the files to your computer. +This will only work with Sheets, not with any other Google Drive file that might have an available link, including `.xls` and other valid files that are hosted in Google Drive. These links will not work when attempting to start a project [by URL](#web-addresses-urls) either, so you need to download those files to your computer. #### Google Sheet from Drive @@ -130,6 +132,10 @@ If you imported a spreadsheet with multiple worksheets, they will be listed alon Note that OpenRefine does not preserve any formatting, such as cell or text colour, that my have been in the original data file. +:::info +Look for character encoding issues at this stage. You may want to manually select an encoding, such as UTF-8, UTF-16, or ASCII, if OpenRefine does not display some characters correctly in the preview. Once your project is created, you can specify another encoding for specific columns using the [reinterpret() function](grelfunctions#reinterprets-s-encoder). +::: + You should create a project name at this stage. You can also supply tags to keep your projects organized. When you’re happy with the preview, click Create Project. @@ -137,12 +143,10 @@ You should create a project name at this stage. You can also supply tags to keep Because OpenRefine only runs locally on your computer, you can’t have a project accessible to more than one person at the same time. -The best way to collaborate with another person is to export and import projects that save all your changes, so that you can pick up where someone else left off. 
You can also [export projects](exporting) and import them to new computers of your own, such as for working on the same project from the office and from home. +The best way to collaborate with another person is to export and import projects that save all your changes, so that you can pick up where someone else left off. You can also [export projects](exporting#export-a-project) and import them to new computers of your own, such as for working on the same project from the office and from home. An exported project will include all of the [history](running#history-undoredo), so you can see (and undo) all the changes from the previous user. It is essentially a point-in-time snapshot of their work. OpenRefine only exports projects as `.tar.gz` files at this time. -### Instructions - Once someone has sent you a project archive file from their computer, you can save it anywhere, including your Downloads folder. In the left-hand menu of the home screen, click Import Project. Click Browse… and navigate to wherever you saved the file you were sent (for example, your Downloads folder). @@ -161,13 +165,13 @@ You can access all of your created projects by clicking on when you use Clipboard importing. Project names don’t have to be unique, so OpenRefine will create many projects with the same name unless you intervene. +You may have multiple projects from the same dataset, or multiple versions from sharing a project with another person. OpenRefine automatically generates a project name from the imported file, or “clipboard” when you use Clipboard importing. Project names don’t have to be unique, so OpenRefine will create many projects with the same name unless you intervene. You can name a project when you create it or import it, and you can rename a project by opening it and clicking on the project name at the top of the screen. ### Autosaving -OpenRefine saves all of your actions (everything you can see in the Undo/Redo panel). That includes flagging and starring rows. 
+OpenRefine [saves all of your actions](running#history-undoredo) (everything you can see in the Undo/Redo panel). That includes flagging and starring rows. It doesn’t, however, save your facets, filters, or any kind of view you may have in place while you work. This includes the number of rows showing, whether you are showing your data as rows or records, and any sorting or column collapsing you may have done. A good rule of thumb is: if it’s not showing in Undo/Redo, you will lose it when you leave the project workspace. @@ -181,4 +185,4 @@ Go to Open Project and find the project you want ### Project files -You can find all of your raw project files in your work directory. They will be named according to the unique Project ID that OpenRefine has assigned them, which you can find on the Open Project screen, under the “About” link for each project. \ No newline at end of file +You can find all of your raw project files in your work directory. They will be named according to the unique “Project ID” that OpenRefine has assigned them, which you can find on the Open Project screen, under the “About” link for each project. \ No newline at end of file diff --git a/docs/docs/manual/transforming.md b/docs/docs/manual/transforming.md index be808ea94..e77d1a2d3 100644 --- a/docs/docs/manual/transforming.md +++ b/docs/docs/manual/transforming.md @@ -8,7 +8,7 @@ sidebar_label: Overview OpenRefine gives you powerful ways to clean, correct, codify, and extend your data. Without ever needing to type inside a single cell, you can automatically fix typos, convert things to the right format, and add structured categories from trusted sources. -The following ways to improve data are organized by their appearance in the menu options in OpenRefine. You can: +The ways to improve data in this section are organized by their appearance in the menu options in OpenRefine.
You can: * change the order of rows or columns * edit cell contents within a particular column @@ -16,7 +16,7 @@ The following ways to improve data are organized by their appearance in the menu * transform rows into columns, and columns into rows * split or join columns * add new columns based on existing data or through reconciliation -* convert your rows of data into multi-row records +* convert your rows of data into multi-row records. ## Edit rows diff --git a/docs/docs/manual/troubleshooting.md b/docs/docs/manual/troubleshooting.md index 49e9e7403..9877c6673 100644 --- a/docs/docs/manual/troubleshooting.md +++ b/docs/docs/manual/troubleshooting.md @@ -4,32 +4,26 @@ title: Troubleshooting sidebar_label: Troubleshooting --- -## Frequently Asked Questions +## Frequently asked questions We collect and share FAQs and responses on Github at [https://github.com/OpenRefine/OpenRefine/wiki/FAQ](https://github.com/OpenRefine/OpenRefine/wiki/FAQ). If you don’t find your problem and solution there, continue on to the resources in the Community section to see more conversations and look for solutions. - ## Community If you’re having a problem: - - - * Search the [User forum](https://groups.google.com/g/openrefine) to see if the problem is already reported * Read [Github issues](https://github.com/OpenRefine/OpenRefine/issues) to see if the problem is already reported * Read [Stack Overflow](https://stackoverflow.com/questions/tagged/openrefine) to see if the problem is already reported * Check [Twitter](https://twitter.com/search?f=tweets&vertical=default&q=OpenRefine%20OR%20%22Open%20Refine%22%20OR%20%23OpenRefine&src=typd) to see if others are discussing the problem * Report an issue: * First as a new thread in the User forum - * Then, if you wish, you can create a Github issue + * Then, if you wish, you can create a Github issue. 
If you want to contribute: - - - -* [We have a guide to contributing here.](https://github.com/OpenRefine/OpenRefine/blob/master/CONTRIBUTING.md) -* Contribute your feature requests in the User forum or as Github issues -* Share with us your successes and use cases in the User forum -* Add your blog posts, guides, tips, tricks, tutorials to our list -* Respond to our biennial user survey -* Join the User Forum and/or [Developer Forum](https://groups.google.com/g/openrefine-dev) \ No newline at end of file +* [Help us translate the tool into more languages](https://docs.openrefine.org/technical-reference/translating), using Weblate +* [We have a guide to contributing](https://docs.openrefine.org/technical-reference/contributing) in the Technical Reference section +* Contribute your feature requests in the [User forum](https://groups.google.com/g/openrefine) or as [Github issues](https://github.com/OpenRefine/OpenRefine/issues) +* Join the User Forum and/or the [Developer Forum](https://groups.google.com/g/openrefine-dev) +* Share your successes and use cases with us, in the User forum +* Add your [blog posts, guides, tips, tricks, tutorials to our list](https://github.com/OpenRefine/OpenRefine/wiki/External-Resources) +* Keep an eye out for and respond to our biennial user survey. \ No newline at end of file diff --git a/docs/docs/manual/wikidata.md b/docs/docs/manual/wikidata.md index 6c7a9094b..76c6d75e0 100644 --- a/docs/docs/manual/wikidata.md +++ b/docs/docs/manual/wikidata.md @@ -8,7 +8,7 @@ sidebar_label: Wikidata OpenRefine provides powerful ways to both pull data from Wikidata and add data to it. -OpenRefine’s connections to Wikidata is supplied by an extension that is available by default in OpenRefine. The Wikidata extension can be removed manually by navigating to your OpenRefine installation folder, and then looking inside `webapp/extensions/` and deleting the `wikidata` folder inside. 
+OpenRefine’s connections to Wikidata were formerly an optional extension, but are now installed automatically with the downloadable package. The Wikidata extension can be removed manually by navigating to your OpenRefine installation folder, and then looking inside `webapp/extensions/` and deleting the `wikidata` folder inside. You do not need a Wikidata account to reconcile your local OpenRefine project to Wikidata. If you wish to [upload your cleaned dataset to Wikidata](#editing-wikidata-with-openrefine), you will need an [autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users) account, and you must [authorize OpenRefine with that account](#manage-wikidata-account). @@ -180,10 +180,4 @@ The best resource is the [Quality assurance page](https://www.wikidata.org/wiki/ OpenRefine will analyze your schema and make suggestions. It does not check for conflicts in your proposed edits, or tell you about redundancies. -One of the most common suggestions is to attach [a reference to your edits](https://www.wikidata.org/wiki/Help:Sources) - a citation for where the information can be found. This can be a book or newspaper citation, a URL to an online page, a reference to a physical source in an archival or special collection, or another source. If the source is itself an item on Wikidata, use the relationship [stated in (P248)](https://www.wikidata.org/wiki/Property:P248); otherwise, use [reference URL (P854)](https://www.wikidata.org/wiki/Property:P854) to identify an external source. - -## Wikibases - -Much of the above is also true of other Wikibase instances. You can reconcile your dataset against an available Wikibase reconciliation API. - -Wikibase administrators can configure a reconciliation API using the [instructions here](https://openrefine-wikibase.readthedocs.io/en/latest/index.html). 
+One of the most common suggestions is to attach [a reference to your edits](https://www.wikidata.org/wiki/Help:Sources) - a citation for where the information can be found. This can be a book or newspaper citation, a URL to an online page, a reference to a physical source in an archival or special collection, or another source. If the source is itself an item on Wikidata, use the relationship [stated in (P248)](https://www.wikidata.org/wiki/Property:P248); otherwise, use [reference URL (P854)](https://www.wikidata.org/wiki/Property:P854) to identify an external source. \ No newline at end of file diff --git a/docs/sidebars.js b/docs/sidebars.js index f4690102f..61839a88b 100644 --- a/docs/sidebars.js +++ b/docs/sidebars.js @@ -23,7 +23,6 @@ module.exports = { items: ['manual/expressions', 'manual/grelfunctions'], }, 'manual/exporting', - 'manual/glossary', 'manual/troubleshooting' ], 'Technical Reference': [ From dbf6434a2ca15ac4ac9a926518f6b20617dc7678 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Wed, 2 Dec 2020 14:01:58 -0500 Subject: [PATCH 02/21] Delete expression-editor.PNG --- docs/static/img/expression-editor.PNG | Bin 16230 -> 0 bytes 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 docs/static/img/expression-editor.PNG diff --git a/docs/static/img/expression-editor.PNG b/docs/static/img/expression-editor.PNG deleted file mode 100644 index 51a1674c370881d6cd458b2836e53967a8719bf8..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 16230 zcmeHud00~E`?pO~%}k5aWLcBxRAYr}mboj_K5A-dX0BA0RxV_yxF9q&W@*Wk3n?m7 ziB>M;ii$!Tk{ME(D+1+4h(L;f$o3wz-^_e}`}=#p^Iq>CFW2RDJkL4jx%ba~pXc1q z^~0VAmw)s9H##~x%Uyro=dGjj1r_-F{;S2n$jkfNhk(NZnD@awI+Yy;JmBQZ@ZBD} zb#$uGOO+=V0q0*|_}L$(qqE|<_HO|V`rx#Vj*G%|-|nLcAp%xA+L{-{b z!)@cHmBzoF2>oTn(lu*duIGH~ZT>s{){Wl+f_^;247Yj^Z-zL%@pLHZ=Gjn;)6bhv zw$xhE!^Zkdu6_vneR9#`?L;T}z5}OSCNs8f+H6v0vU!J#5syzC%J(`Db{fHXFwyIW zQ;ux4-K70RWO|!vk7elp;>aNDX&@Te&1=B#C3UOibB_T0%u)?(v-atWRY%hI@7uTU 
zwB%nO6Ay#bHAIaMaB<)1s_Uv_pfxL3uDsQ4EDgjT*A5=T^g(k@0b?uugHKU>=ckGL zJ8qQ9-6s8z$>L!@KWv!;Sd(~F4H;BrVe(9lzUV`$PW~K3}Xue z5oB0^zX3a)D^oWNC}&WLnxZodS-ujjk@}E^r5zl$68-EVAP>!Lg2*_QF4HG%x!-=- z8{wTjRCD?|er9|eV?*%?D$&ch_Hxu%$_zK5qUz;u9O3$GoF)}w0=blYUk|5;pE2_f z1Faa=ia&CBl|mrS9#Tvwcfv?*1+!nwL{7cf}16 zavkxdr9ZupNcB~MSiUeZsEA(HIiUG)r#KcVDBQ}4M%{?#8EO(D>)U9LJt7bw>Qtv8 zB&Ki)1=$sXl_t=X`y(qv=q>#---NQkNd7KC(S5qu4b4&x9^$r(6Js=#EO2kxvj}}! zkr5hkNI|&R+v*Z3&x>S20NI_KNz%m)|K8{2wNM;@eKvf$+zN zu;p^qqz_Zo%eSRH*gn==n+c+_PNTE%{m~!g=pi~_yrLF^&EAfFe#($R_A;gFiBP`S)V+I!lHAjID`^q5AkuNKN^ z@tl6>caOM7){2pc{s`NkbZlrI=AE#4tR*Hvot84nh$6B-$VI)p*X@RcT|Lk0opA?& z80o*RlJeYRs+PrC-jh>JERV9{+?mlnh=Tb1rE$rc#M##0f=BolSC%Ra!Igd{tFMT+ z2q{_Q;T=c6913o*lRB`5@2tM%{Yd3BEphx%=Q^qGV?-E7!V_*1MH$ar+(SVvC3xC~ z{9Xgbl=^wSA||JX>Bmr*ar_v7i~1h~j5XzrXm}|-jrwK6)F4ex%JAb*EAu#hNwZ5m z&J`xyybPqPmx0Tnq^6Zgm z!3sEdYC=tj^a*S@gst+2cz+8+@8k&CmF6i6YfhR6H%>!8Pl| zk9_#Eh^!%16|UFq9BBr7IkI0JgKN@5H~9=|0y2{=OAJD~hWY8&UK#^&R%%YYqEC3f zzYMm!*pt#!cvNO~M5&X-6P47XPg5}j$) zJ2US*TN|+d_o7}FAJ9G^D)&PfQuH()26-^oJE5V-~+q~KYj(2p$}f#}EuqT{h1 zMmYBCgk_|h`XJaLNmDqK@D}-r;xpKCw!~VWzh!boCo+-WnEI?XculWk1kZj-q$iT) z+?EnK@egk3Wx`AC&M@_R^QLA~W1@GC*8wcB%4Yx;+L6o#v*smk+_N=;!wW3vOfVs; z(Ro&+N0Xi6TXLLn>*J*88n2TYAOPM10pKbEpb+@IwyNToP_?y4+ahJwU4|UG{Q-jV>P~|uNA#3>Z zin!ZH$Ezjdo$>IRLO8Lo^7~-YU|sREKELTq=VP2Z0bP?-OG?TF`FjQqanZQEYGJ$( zIhaMu67A$L>@4D+Sh;My)CjoiYkY#KM`?U6gMjp;ykd0e)s@uZPhonzy5zXvD{!!5 zYGdj4+}0MH%iUjFd4pAPWxP(Se1{^0ksR3?^QRTKmefGFE%WlR6ir#IE#$d2|1WSZ>PNvBJHN>#WY#HPE$w_)XXUr?j!jfvvmp~xoF z)vXPblAo@g>4w9^L;X(C$t~kT7$>1L`;A@U`T$#_@LhSF4~5%{8Y0g+Ise5BPHvOO zNT;`uz6(&sA?wiPFMStTm9Oe%T0>Ua>Ak4vsg4AnEGJnBdr0IKRwlvU1`v#ER0_Mo z7fKV?GJ>UdR<-)9v+LR1@TcDa(ghjlJHB6wTlHj68#J&Tf5TaIY}Klj_G>QLYm39h zIw#_^F|$DDc%im{)X_P%6(}j!>Rjsn=Uq;`2$}5!3b>gJN`w5Zk?Jj7SNTNuD5`{K zjpKgUl@O3=`D98fulTfD#nGgQO(|2RSe&XK$5N_A8ui$E&P?kVL?eNefxBY!ak&~M zMHSex2u+XpRBHgaG@1@bGS;S?veZnC^&$nmHX1ojbQjy8z zKQKT5t7$CLuR=i7Qy5#YcqiD$E6N7iJlQ(76*aZB3}RjiklS2=>46j}=mJUiSjlv2 
z1)Eb!E?pu8??EiTY9`xdhwC=t985&RS*&-xzw()ab=!A_;>4i_*it*P*L40f8Um zIO9dLdyM*f1K3o=<7XLy8bgkK((PVlG;G-6u~?4=PvwG^0l~El-=NupLM1yhNDB{1 zpM#apu|y7%G;7aVXOS~x4VDAiV$#vdm*}Hla8s!MFIg7fi7lw|hzqaV2v5JUtY;)4 zxF10w;?-nyl0Es;2vFguy`5sckF}4JzLX>pA8zY~R5chjr~D?|u18M4C55HbH1_9a zoJST!_w+-tEc~{L_H7TRZ94t7oZ9H=D$bfV+Oi9*`v~>?O=6)5z88%1Yf`+cX%?y8 z9KrNppEpQFZ_gX`CztG!M&*|T68w<3H18l$=!}!y&$NW{u8=F&X;ld!k^;(pXMU?H z*D$>E&Q<5O$*7x_%G3wYiEhPd#}sUfjpS;#-^yp4M~|?&C)v>)xwwO!e2`7*J~~LQ zp{yi8`SK_ug@)ycPEqAgvcuZD?8K5`ScgPlOGk?yc8DXTsARh^d6I0#iJ=o}8WU;K)=jyEYfy`p)sPiCm9K-NCR%`2JeC4{k}@NnscI!enCdNbQTzVH_I}~Xv!wI=^*3sFTm%>D zViKxlH*+@fG_^md)>mjIjWkC*O21I90#B~O0vIDAs3X6L2`K7WFGF2?O!{a|m{c!J zdc`;LNCd~mVgwoc2ch9NLfJTys&sPjOzN5PneV3)n|n`;#5g5IdRJ#eMxSL3A&u-x zRvId1j)@9;W_DZ=wF?#7A*i5!mK*K&fh__{M|ELnh#h-!CN6iB8Lqf?Rcg;$Yk=)I zgd_+{NCZDd{6fWpqg|OMf>p%M*8iEB8P}ZZWH*ZrcOC70&kU&2$SFu)sp&&GShd;vW zdw=lc4uhM7m+r=)1;iQwk2`zckmH%xw(MrhCzsb5pQ2f{Q322y=U8}6Fp`*3-ow2> z_F!0L46hHcIX&TGx|jC8FE;EfWpC)m=8$uwS*t0@Rp}9x)6mQT-4sHsAt^k_CzJag zd4he4;=C8ceq>SdVd0~EOp4thhquqZ#uAca!Tn48cjdLdtufCk*60(I8i8pFVTWqT zK_0`pa15U!98tyRNm@!)Q3ag#-w#xkGZ$nH+=YzLOv7Fil=2dfp&Bmc;OdJhzvsA4-`T z)f=@eT47b@ZP?n23~{D8KzjT|Ggfv^+|aJZUsSDD1}Pnq={Mq+r3Z~T+io-lz+WWe z>lmFg*fXm#xo^Z*@-kFXCwxP-<3?S|j&HWU9a&}0fyBoc*5J;wuSF&PHXSiMo$_kh z=zF~83$QA$gr;h8@kwdW4=HtH@FA##QhA#0bkNB}j^gfp@^%AN@75LQsz7e^$RcCX zr=#PlY~0J=OHs?W7yxA{pd3h>5)jFeTroLHP#xp+rLog)&c->adt7NY;1$Chk_Tw8 z0={0lb4o&&`6qc~Zn&a)MJu^N21L;-9AM~)WlHc>B3HM%ob7F^oz7?NJ*c=<|2UfUze+0({$ zh}B=*p=c|HOZ(Svqx#0UQw;rjiuR}_SDq45b!znfw$lHC4WW4ocs;$pxK*3>7k9KS zT9$Cg#pli@ZQWwGUVr<0EBT?5=~~|2M+QB)L>t+e3ewg%ztqGmSZJ8V&?~v7lFHpD zRf4hE@$)d>*YvTpJS--42WHb z=y&yv>MHn|=lLqWUm*0e^3Bh3u7Cq6yU|lD&LYaOkqLvdz4oMEvl^iO1}NuxvLw2U zTOBC$yQ@c)9etN%8J$Jpq_IQg$>K5Tg(W*&LJEY*&Bah$&M&ZvgOgtzb3R~almLVwuLpRY}?14y;Z-j$n90QOoS1mcLwJOdI)LNt87!&qWyd1Ctey7W`81J zxQPjli6(X=N&D*M&*b{*aet1Hk7I4_4H6_*VNP6BTDRUiWsA7EY#v-TGEQrq(s>=R4V<=A;b>Fz7?FTX 
zC`u7M?XbIA6wax(MomN%dB9tDT_lZji)iZy!tFc-!>fzqhA%%Bb=ce3wf+b(TyDbpEQc*`fe4QoXUEi|pfcp~}cR<>hoa*0NpJ-;a`>yzuJy zam-!WfjTqFlN$(hY6d%+G~2vxAo}BE$pox;Xi@pjt3TejFRQASnCP;i%W9Bh&y%W_ z$K^>b{oy@(Y2ez(3%DbLUWX~UwpS42tv4jjE0`H zP=QJpY%yQ|TMvtpSr^JZ&gWLl7-qHyUdB5B{m+|%2+e}cHvoJtyED(ST; zY3GR(+Y3FP%znwAaMb&VNK_tv2|k@^5n;GFdE$IEw57ss`AvZTh8cnI<2w=ksqDC- zrI94=J%q{aW-fPTzjs;EQMVvP|M7T?A3fpWDUbPN+Bo7T^-z#?C|^OUK!S=L!Q!k_a8j{A9ld$M1Px^>VqnduLx- zuFU;R!h>(Ox>@F6MHlkh%sG z?g}>meRt_`msS%zF)^T~~X=<^;;lS^`30HT<$PE!Q|japPH1TqRU< zRyH7lOjOLmK8;yK@3hW5z)6^u?y?3mQG5gtR|OqZnp=P2!eNMbdm&`!P~$CgW+61~ zj0#QMRm+$$NP>ve-wmKLbI`00&Or`HUL$_FTl+Jvx9GT6>6~R5kaVwOn4t;Sdb<+D zEk`;}5xXETy);1+RGnA|_0216+rN`F{YrEJY=Z*bM)+I<)60F3LUi-n-$_0(4vWH-J@u#l!F{ixZnRq1UZ7 z{huyQ;9B%FeQ4DkU@0`{xXa8_-5$TB<5&Bj4Da*>m)dVTEHrbSTeZAU{chdj78R9F zL-L6>Mv5V(a6J2Sl=%iatZ#x#sP82i6f!dEXW!^WJC#^c}yhS2eJn z$+I2kmI|WQ5yraFm{*B4g34l4l=#3&h=A5t!iLsS5G_fAg@m+;QH3r8p6tY)IqnSh zwB=6_m!sG9o~iAFVj$_FXd6F2Ki^8Ih@aPf5m*v%Qcu{2um-z#E;!+$<$5)#sR4hl zKf4E^mZE%O24($^ev{_3)JAV30J7F!gw{#(iZ_Lbh!%GgQ~BTn;iS%lSEG)qLCCcl znwQVR!^GBn=K}>6nH!CuWD9M`*u&F_)X8%Ej5;1B{*C69YP*_677)b*0@Rj%`{?hI zQ^1;P+|}I`&+^<(1zU)t1IJ#vUoG;BadI9m zTNc^=BT$ak>Dj`6mcdn->5Hla#3^pxz&6}@CfMOUjtM({^zxkCpS zQW?<}E1eJ|)Sc1^(~_(j#Pi+-h~;t?+ie?h1L@QtqHZGus)@Tk%T)yrcG~6h8q6M! 
zZ7{qqk)#$k!5pnyD>_`NRm*N}UFWa`Ke;1dg2<)#7_>*4JWb@0D9ph#8KUSF#zefx zb21(op*rj$imqhhcdq88=?!zusJiAtolpC;dfs+{XSxK6u)=cux3cp#{D!Ld{dT=a zSAQ_?^n+j50BOsJ=HrDj&vg9gM0ous;dV!T|Jtk4_9KAI6g?=PC6LhT#|5!&=bN)g z^x>Q#AsKFeBm6i=>KtltPjBJl6M(OotZq0T(iAc9+AGzP#2)}P<(+4i|GuV&g%K(} zBMc6@q@2ce!>`HO;qXYTW&(~0>*P3BiCNH2U( z3fXNb&ECx{4SnpTXCpgzXlQ}XC9!Ma$-`d1Fjb~2s`RcB)=oeCX6Q+i zWU9w+{bKizmCc6r_J>Ll6B!UDhWY3RqT8p+W}#-`v&^K#oXGMk!w)4JvK|XRp(YOM zDFjY#F~Octclh-{_cZ>O69JQ9lK9c4ep2LUT5aQ_{r`0 zPZnF<3m4@6I0rd){^QFY11<@C;LyhZCF4JW`>)7x!gX#j^->nTL&shF6=s=Efmz(* zKRn zP;w9aG1X5;AS*ji(Z&CnZzv}4Ggpk1C%8|wogVUE06&xVy8~a-3IhM8uWB8S^ia~& zg70*6TCc0(IGwNPiRB_QL>9=muQonc^Edn0}( z8U~B2YaAa*f+a%;Qy+!4#9p3NbUhl0MEj;|WEwSljH1K{JEr6*{Bo)zM^1Uf<&03& zy&4}&dxCZD^guz^9U0@dIto!I!KI250uUv$j&&dK$;LfeOCai@Q=<}hZbGL}{+Kfi~UIRVv3IdNmoL}ld#S6(A^IE&-k zfrAx9k1)rkR3#}TU@XlEGd&Yi;Cc?ZQ;I+aS#HRfZnEVUDkMwZUj%aJQ+xyHL6ve0 zQhJS<$|GSPH3&u4#FEgr8`yr$5iO>JESf|1$T>rjp3E3hdJE|(zJ2ta_bu1i^uqyL z&$$J6mNU&vqp(t@8MxHCa!Xb5&6JLFEo5#1n5MKZz1U~K{1HL~Wu7zQzH>?=ThA8x z)ue2uGfKM{=Tda+d?$z)-*m@yCQkxWH#M0O=SDJxb;D>gn*!mY(=1?hiLX#uP_U+6$*F}(xev1I@OBR42X&xqsK zf*kPk{jDqh9PZ&);q9G|bT4#x7Zmfhr0lOhI+EY%GY}qfsgnU~+1q!+S>>N(NN;w+ z$0h7>a@UiPhy)4mB}nFaMex%%+xzNna%GR+_%N$D9=YuW*P%AF41 zmC16cghKazC=w3h8*aI|bCU3a&e(hokr5VRvgceO>=iEXou3fPLrl{nqavKH|CEA)?=e^!s4?(H6b0Fzo{hKjW0Zri-XKw?#4b0pg ze)M#`u5C29mz^%m{_hS(XTe+e!Blyo+cE!iidsU@_EWq(1UvNbV@dIz!Q$FxVA8ps zzM5B7_Yj+ zIB{xHVJJv7zSF}No;XC|6A) zE*_(^_S#DWK-CkOH@yo$8VEMAmIZ+gnA{#T-d>ZXOfPz`E1%%|?kroJ#A!V3@ssNH89>r(ehg7`E zzs=^uoO<)m;71hOj5EvnF@PVq15QxwXyX=B68p&=Z^!<+veV?1q;R;_!}m9zb)E{> z{dR#pk&ZifLxVvq$G5nS%Y>1%x`mOKZ-VNQ)49VR0w5{Cjss>P+8M7%Fk) zxBfnVOo^B%`5hxX3|QWnmi8c>%SRWA>K=N zcw5}~cvQcCY^X=sWJ^)#5lua87ySS!LUtH!j<&qWJI5Bw6CaPjDrz~_ghUG-7h$kz z6YTY$I%vn+;X%(ls`nV=iYQ@6I3CECep~ac^*@XWJ+op@I^H-#bW-&XO0Mg_7#k|; zR6eqU6m^IzIn0=UnTnp?%5Gc-jF_NA?P`wsGn8=064c zBaZqvtjF8=ILv4xZ3o)G$AlDmUoYHk2}IipJ9aD7?F>9Jd*~4F{ajExUHrq-5ZVse zyn}x_$-e~aym7~<8TxZ`c72|6q0WVsbCUs`5f*#q{1pJ2{oCezY=KV0S98w+<(dB7 
zIj$G$yx%mhe;u8mU%j<_0=*O&hvyjdx`0Aa8k?FPm3D+%3}15Zg;&}ySupl+?#ZGb zg1_Wm*jUhR_r6n77}2r&QliNmm4gi}Efp4*Y;A4BTfXf$V{m~t+p&xL!;|KmIXiD2 zWU+ivMGC(X*t#V_O=K9|{;TcKIkqOhgf)KK^uu%UN5|fRD>~w%^IZI{_q+)}&UWwr z!Rg_qZ%a`5Ak|I}M%$uwMQPs49-IB|sNlb+&zQg04(QKz7~l$pl_68jV>?zH&loFc zx7S@TRyJo0!;O?rc%b>CX}b%t@y5cQO#sN*b2mOg%U|G={# z;a8127ct6?rWdB^h<)b_YxFiqkCAflEZ}LouH*X>|L}^Hba0zqm$G;4Tw)V!`d#MD zg$al%Jx4(JivQY&|9b}fGax#e6DId)qcSXiJ}Uo5fpj#|>px2I1&t&UGblJ%07Ic5 zmWd%+?d*4;bvEY00FC<17Y9KR7U}d} zT_A{xL$U@34a{2rGs?HkXXWJLj8Uz@<6N}1ycfH~?woL$EN0GRbJd4|wLT5TS)H%- zgju&L@wX?6=F|I-AxysFu4u=zF1{&sIbh7@&xv{A*hRS3HtuoddRzZOYRNY0;}vdJ zPpUuPS2x3d8$0PpLml6ViFuO$!h7ORzU{h{;zJ%6dALZocHHwfc9&@S?lD8ab;7({ z0QBttTvq@MUfKVI!P~yML;98>T40{9m4MRA@&BL-`;Y6jMKYNTTUJ(!n`vmtgb zHy_X{^0QViW3gEOpf9=FD?BGYfNy}``B;NOd_oJWP&(&Bxfj$+cD zwnod#b3Vok`m2Mvb3jhgvUl9=b4G&E=fdrtqp`7DjCt_1+c}FH47~N#Y_8#wa#u6* zq%uNlrnGsBG@P)2Umd)AnQJvqt@P*|HH&S*Uv?3Ex`2ADAn3-2#Cgn~6A`G1x8dhz z2O=%QdtRvjqrvJ|-pV2B(z*0_ZoU|eDgEdB&v$q`AWpSOaDY2I+w^b1cV5jvN{fn) zjvjv~)j$#P91d><8)MBdzYq_Tz z1b5+2GyrCx#%?@>Dk@;(u*!RPERa0KuZ*a_g7Caol*qYx z`6ESQT=@6jFZncc3goGEVc&m(DS`Fjf9oFq_w3t1CbI!N_H6{p7Oo3L8%6KcYO+J` zroYA*5WnM_?pW)$O>?!W?%xA`uz9j9Ox)*?k;!b6b=;t;N$Wy4${1B3%B7b06iXWnQYd&X5f69DR9*+ zN>ax0>|Xq7*ggC6z)@r){gX?A(F9_25oS^F+J`1$F%*V6xpoj-Y$Xt_BCoy5<;pwTuO zwUOjY9i2njO{d4ROMwoXe)mbNh7%@6sq2e$bS_=@#vCe~0-9W0l0oVYuqZzzX@_v_ zLY?i$#xo!8EwRXR7zJ8tQl!9Q&(>Q5DM@>HiXXJ4_$J!zrqJ=|2A?64%1y-?PsyrPEs}WkR8BgqTj}y2u*o;b28S z^~amGHyv+kyo-E5dl}kYfFrkoy$4yjLqv8`gt(i4j1#j&NVvE~lmfMK9rhUZ9HwM* zkG^^%)&p2=9C?9M6ys;aIE``)2Q)|9YJgtp>W1Q&(n&}s%mgn=Ob0T0&_%!io!$k= z{W#$Pi(TiG7-@kU$k_{+;g-90`%OrCSK&*12|*9j6MS)Q<-$J@cx1|3>7|+W%J6`I zEGkGN+qC;ps_OxgBY@(7rW~(!oEnTZ0x&;FAckQXvAJo93b`K^r#N&6yb;*qQyXm> z^BOG6n@VEbA`fTn*RHbMssr{PZMCFw>PBTuE(w7WWI><}xNqlHTza(ocbfH33ML0=OdLRr zj2^v4YG=>BY{!;iz}1PS9u&zfdi%p^0fs`BKk07AqMPH6Ywq$)srWd!D>`^BY*#C7 zUDrr#rkCy9s!zE&;LM32bq|p&d5n-eXYP=x>7~h-mS*`UKsc<#r6o~QpH0?u^*3!T z)+()HiYJ3s&bAhm23RVRs0wfk76QiTq7njU`DZLqGi7OlN26(i0R~Jq2yd^pTDF1a 
z@DHTMY$@``i1UF45M-1%r?VQ0Jnx4D*Qw;D1Yf+!hogKoR)-;`ep2B~$~BLtbA~#b zd>7LOe_WrO@5zi@s*Zs4tAm^|Q7e%G)Bft|!EcxX96X=xEDP)R3vq{c@?&ayOb`B5 z{doEPsWw?uWk^z#c(k(`H8V1n!n`~~_KHTSry{VJ%8-HP>CXCZDWW=h1QQHN611mA zm6Q_nXo-qtnT?jb#^%~YK|2%2=X$FOa(%l*OvDbAQ>_TmViJRsS4aPY8v&jVHiOL4 zgY!(ylau{skz#iyu1w;+yAtEuC8_R+@9OkRcjq5-rPje;`;S}MLasENcg932B4^zx z^3Hla&DfYbz1pczwS1!HAMCCHHWsKBipR$GS_Vh470IkKP~^7NFLa~Whq|E1vw^d0 zlDwtC>2>nYvu-ixSiwlBm>soIkZF&F+?3(zY3$dS?ykwkkuHji{T!AFGPrW2ef%Fh zU8EL@42!%XGftRf;jW3jua}qdJ7w3y5_5v@l9NPRs}sArw9MP|ubzrT(2slH_2}xJ z(H&O3NQT4X%7Di7a~`Nue&yKEBjyo?rRt^oBK%u|CueH!Yj<_M8~|3CvoJ$Y#PIs9 zW5Zb4QOJ_BQ$=)|c+?UZD!vlKMylggGDNd8YR{Ym$6VA!kS?voHzH^kgV8_dGn|G+Ut*-iW!M9x5wXIHDdZ?|Gw3tLej#>4EpTo8at7ux`4UESw)#Jx1o z?oXTFwf(njlhd8wn70X|_j!x9(Z4$4j?52?bY@{bY*EbfQr$D>Rhz13|BK@k;-Xly b@MZ1cpQ;K$2Eg8U9oPMy`zrUGxct8WTE!G4 From 699e80b94e8dc8303b18a3c72f85f6e7c542e112 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Wed, 2 Dec 2020 14:02:12 -0500 Subject: [PATCH 03/21] Create expression-editor.png --- docs/static/img/expression-editor.png | Bin 0 -> 16230 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 docs/static/img/expression-editor.png diff --git a/docs/static/img/expression-editor.png b/docs/static/img/expression-editor.png new file mode 100644 index 0000000000000000000000000000000000000000..51a1674c370881d6cd458b2836e53967a8719bf8 GIT binary patch literal 16230 zcmeHud00~E`?pO~%}k5aWLcBxRAYr}mboj_K5A-dX0BA0RxV_yxF9q&W@*Wk3n?m7 ziB>M;ii$!Tk{ME(D+1+4h(L;f$o3wz-^_e}`}=#p^Iq>CFW2RDJkL4jx%ba~pXc1q z^~0VAmw)s9H##~x%Uyro=dGjj1r_-F{;S2n$jkfNhk(NZnD@awI+Yy;JmBQZ@ZBD} zb#$uGOO+=V0q0*|_}L$(qqE|<_HO|V`rx#Vj*G%|-|nLcAp%xA+L{-{b z!)@cHmBzoF2>oTn(lu*duIGH~ZT>s{){Wl+f_^;247Yj^Z-zL%@pLHZ=Gjn;)6bhv zw$xhE!^Zkdu6_vneR9#`?L;T}z5}OSCNs8f+H6v0vU!J#5syzC%J(`Db{fHXFwyIW zQ;ux4-K70RWO|!vk7elp;>aNDX&@Te&1=B#C3UOibB_T0%u)?(v-atWRY%hI@7uTU zwB%nO6Ay#bHAIaMaB<)1s_Uv_pfxL3uDsQ4EDgjT*A5=T^g(k@0b?uugHKU>=ckGL zJ8qQ9-6s8z$>L!@KWv!;Sd(~F4H;BrVe(9lzUV`$PW~K3}Xue 
z5oB0^zX3a)D^oWNC}&WLnxZodS-ujjk@}E^r5zl$68-EVAP>!Lg2*_QF4HG%x!-=- z8{wTjRCD?|er9|eV?*%?D$&ch_Hxu%$_zK5qUz;u9O3$GoF)}w0=blYUk|5;pE2_f z1Faa=ia&CBl|mrS9#Tvwcfv?*1+!nwL{7cf}16 zavkxdr9ZupNcB~MSiUeZsEA(HIiUG)r#KcVDBQ}4M%{?#8EO(D>)U9LJt7bw>Qtv8 zB&Ki)1=$sXl_t=X`y(qv=q>#---NQkNd7KC(S5qu4b4&x9^$r(6Js=#EO2kxvj}}! zkr5hkNI|&R+v*Z3&x>S20NI_KNz%m)|K8{2wNM;@eKvf$+zN zu;p^qqz_Zo%eSRH*gn==n+c+_PNTE%{m~!g=pi~_yrLF^&EAfFe#($R_A;gFiBP`S)V+I!lHAjID`^q5AkuNKN^ z@tl6>caOM7){2pc{s`NkbZlrI=AE#4tR*Hvot84nh$6B-$VI)p*X@RcT|Lk0opA?& z80o*RlJeYRs+PrC-jh>JERV9{+?mlnh=Tb1rE$rc#M##0f=BolSC%Ra!Igd{tFMT+ z2q{_Q;T=c6913o*lRB`5@2tM%{Yd3BEphx%=Q^qGV?-E7!V_*1MH$ar+(SVvC3xC~ z{9Xgbl=^wSA||JX>Bmr*ar_v7i~1h~j5XzrXm}|-jrwK6)F4ex%JAb*EAu#hNwZ5m z&J`xyybPqPmx0Tnq^6Zgm z!3sEdYC=tj^a*S@gst+2cz+8+@8k&CmF6i6YfhR6H%>!8Pl| zk9_#Eh^!%16|UFq9BBr7IkI0JgKN@5H~9=|0y2{=OAJD~hWY8&UK#^&R%%YYqEC3f zzYMm!*pt#!cvNO~M5&X-6P47XPg5}j$) zJ2US*TN|+d_o7}FAJ9G^D)&PfQuH()26-^oJE5V-~+q~KYj(2p$}f#}EuqT{h1 zMmYBCgk_|h`XJaLNmDqK@D}-r;xpKCw!~VWzh!boCo+-WnEI?XculWk1kZj-q$iT) z+?EnK@egk3Wx`AC&M@_R^QLA~W1@GC*8wcB%4Yx;+L6o#v*smk+_N=;!wW3vOfVs; z(Ro&+N0Xi6TXLLn>*J*88n2TYAOPM10pKbEpb+@IwyNToP_?y4+ahJwU4|UG{Q-jV>P~|uNA#3>Z zin!ZH$Ezjdo$>IRLO8Lo^7~-YU|sREKELTq=VP2Z0bP?-OG?TF`FjQqanZQEYGJ$( zIhaMu67A$L>@4D+Sh;My)CjoiYkY#KM`?U6gMjp;ykd0e)s@uZPhonzy5zXvD{!!5 zYGdj4+}0MH%iUjFd4pAPWxP(Se1{^0ksR3?^QRTKmefGFE%WlR6ir#IE#$d2|1WSZ>PNvBJHN>#WY#HPE$w_)XXUr?j!jfvvmp~xoF z)vXPblAo@g>4w9^L;X(C$t~kT7$>1L`;A@U`T$#_@LhSF4~5%{8Y0g+Ise5BPHvOO zNT;`uz6(&sA?wiPFMStTm9Oe%T0>Ua>Ak4vsg4AnEGJnBdr0IKRwlvU1`v#ER0_Mo z7fKV?GJ>UdR<-)9v+LR1@TcDa(ghjlJHB6wTlHj68#J&Tf5TaIY}Klj_G>QLYm39h zIw#_^F|$DDc%im{)X_P%6(}j!>Rjsn=Uq;`2$}5!3b>gJN`w5Zk?Jj7SNTNuD5`{K zjpKgUl@O3=`D98fulTfD#nGgQO(|2RSe&XK$5N_A8ui$E&P?kVL?eNefxBY!ak&~M zMHSex2u+XpRBHgaG@1@bGS;S?veZnC^&$nmHX1ojbQjy8z zKQKT5t7$CLuR=i7Qy5#YcqiD$E6N7iJlQ(76*aZB3}RjiklS2=>46j}=mJUiSjlv2 z1)Eb!E?pu8??EiTY9`xdhwC=t985&RS*&-xzw()ab=!A_;>4i_*it*P*L40f8Um zIO9dLdyM*f1K3o=<7XLy8bgkK((PVlG;G-6u~?4=PvwG^0l~El-=NupLM1yhNDB{1 
zpM#apu|y7%G;7aVXOS~x4VDAiV$#vdm*}Hla8s!MFIg7fi7lw|hzqaV2v5JUtY;)4 zxF10w;?-nyl0Es;2vFguy`5sckF}4JzLX>pA8zY~R5chjr~D?|u18M4C55HbH1_9a zoJST!_w+-tEc~{L_H7TRZ94t7oZ9H=D$bfV+Oi9*`v~>?O=6)5z88%1Yf`+cX%?y8 z9KrNppEpQFZ_gX`CztG!M&*|T68w<3H18l$=!}!y&$NW{u8=F&X;ld!k^;(pXMU?H z*D$>E&Q<5O$*7x_%G3wYiEhPd#}sUfjpS;#-^yp4M~|?&C)v>)xwwO!e2`7*J~~LQ zp{yi8`SK_ug@)ycPEqAgvcuZD?8K5`ScgPlOGk?yc8DXTsARh^d6I0#iJ=o}8WU;K)=jyEYfy`p)sPiCm9K-NCR%`2JeC4{k}@NnscI!enCdNbQTzVH_I}~Xv!wI=^*3sFTm%>D zViKxlH*+@fG_^md)>mjIjWkC*O21I90#B~O0vIDAs3X6L2`K7WFGF2?O!{a|m{c!J zdc`;LNCd~mVgwoc2ch9NLfJTys&sPjOzN5PneV3)n|n`;#5g5IdRJ#eMxSL3A&u-x zRvId1j)@9;W_DZ=wF?#7A*i5!mK*K&fh__{M|ELnh#h-!CN6iB8Lqf?Rcg;$Yk=)I zgd_+{NCZDd{6fWpqg|OMf>p%M*8iEB8P}ZZWH*ZrcOC70&kU&2$SFu)sp&&GShd;vW zdw=lc4uhM7m+r=)1;iQwk2`zckmH%xw(MrhCzsb5pQ2f{Q322y=U8}6Fp`*3-ow2> z_F!0L46hHcIX&TGx|jC8FE;EfWpC)m=8$uwS*t0@Rp}9x)6mQT-4sHsAt^k_CzJag zd4he4;=C8ceq>SdVd0~EOp4thhquqZ#uAca!Tn48cjdLdtufCk*60(I8i8pFVTWqT zK_0`pa15U!98tyRNm@!)Q3ag#-w#xkGZ$nH+=YzLOv7Fil=2dfp&Bmc;OdJhzvsA4-`T z)f=@eT47b@ZP?n23~{D8KzjT|Ggfv^+|aJZUsSDD1}Pnq={Mq+r3Z~T+io-lz+WWe z>lmFg*fXm#xo^Z*@-kFXCwxP-<3?S|j&HWU9a&}0fyBoc*5J;wuSF&PHXSiMo$_kh z=zF~83$QA$gr;h8@kwdW4=HtH@FA##QhA#0bkNB}j^gfp@^%AN@75LQsz7e^$RcCX zr=#PlY~0J=OHs?W7yxA{pd3h>5)jFeTroLHP#xp+rLog)&c->adt7NY;1$Chk_Tw8 z0={0lb4o&&`6qc~Zn&a)MJu^N21L;-9AM~)WlHc>B3HM%ob7F^oz7?NJ*c=<|2UfUze+0({$ zh}B=*p=c|HOZ(Svqx#0UQw;rjiuR}_SDq45b!znfw$lHC4WW4ocs;$pxK*3>7k9KS zT9$Cg#pli@ZQWwGUVr<0EBT?5=~~|2M+QB)L>t+e3ewg%ztqGmSZJ8V&?~v7lFHpD zRf4hE@$)d>*YvTpJS--42WHb z=y&yv>MHn|=lLqWUm*0e^3Bh3u7Cq6yU|lD&LYaOkqLvdz4oMEvl^iO1}NuxvLw2U zTOBC$yQ@c)9etN%8J$Jpq_IQg$>K5Tg(W*&LJEY*&Bah$&M&ZvgOgtzb3R~almLVwuLpRY}?14y;Z-j$n90QOoS1mcLwJOdI)LNt87!&qWyd1Ctey7W`81J zxQPjli6(X=N&D*M&*b{*aet1Hk7I4_4H6_*VNP6BTDRUiWsA7EY#v-TGEQrq(s>=R4V<=A;b>Fz7?FTX zC`u7M?XbIA6wax(MomN%dB9tDT_lZji)iZy!tFc-!>fzqhA%%Bb=ce3wf+b(TyDbpEQc*`fe4QoXUEi|pfcp~}cR<>hoa*0NpJ-;a`>yzuJy zam-!WfjTqFlN$(hY6d%+G~2vxAo}BE$pox;Xi@pjt3TejFRQASnCP;i%W9Bh&y%W_ 
z$K^>b{oy@(Y2ez(3%DbLUWX~UwpS42tv4jjE0`H zP=QJpY%yQ|TMvtpSr^JZ&gWLl7-qHyUdB5B{m+|%2+e}cHvoJtyED(ST; zY3GR(+Y3FP%znwAaMb&VNK_tv2|k@^5n;GFdE$IEw57ss`AvZTh8cnI<2w=ksqDC- zrI94=J%q{aW-fPTzjs;EQMVvP|M7T?A3fpWDUbPN+Bo7T^-z#?C|^OUK!S=L!Q!k_a8j{A9ld$M1Px^>VqnduLx- zuFU;R!h>(Ox>@F6MHlkh%sG z?g}>meRt_`msS%zF)^T~~X=<^;;lS^`30HT<$PE!Q|japPH1TqRU< zRyH7lOjOLmK8;yK@3hW5z)6^u?y?3mQG5gtR|OqZnp=P2!eNMbdm&`!P~$CgW+61~ zj0#QMRm+$$NP>ve-wmKLbI`00&Or`HUL$_FTl+Jvx9GT6>6~R5kaVwOn4t;Sdb<+D zEk`;}5xXETy);1+RGnA|_0216+rN`F{YrEJY=Z*bM)+I<)60F3LUi-n-$_0(4vWH-J@u#l!F{ixZnRq1UZ7 z{huyQ;9B%FeQ4DkU@0`{xXa8_-5$TB<5&Bj4Da*>m)dVTEHrbSTeZAU{chdj78R9F zL-L6>Mv5V(a6J2Sl=%iatZ#x#sP82i6f!dEXW!^WJC#^c}yhS2eJn z$+I2kmI|WQ5yraFm{*B4g34l4l=#3&h=A5t!iLsS5G_fAg@m+;QH3r8p6tY)IqnSh zwB=6_m!sG9o~iAFVj$_FXd6F2Ki^8Ih@aPf5m*v%Qcu{2um-z#E;!+$<$5)#sR4hl zKf4E^mZE%O24($^ev{_3)JAV30J7F!gw{#(iZ_Lbh!%GgQ~BTn;iS%lSEG)qLCCcl znwQVR!^GBn=K}>6nH!CuWD9M`*u&F_)X8%Ej5;1B{*C69YP*_677)b*0@Rj%`{?hI zQ^1;P+|}I`&+^<(1zU)t1IJ#vUoG;BadI9m zTNc^=BT$ak>Dj`6mcdn->5Hla#3^pxz&6}@CfMOUjtM({^zxkCpS zQW?<}E1eJ|)Sc1^(~_(j#Pi+-h~;t?+ie?h1L@QtqHZGus)@Tk%T)yrcG~6h8q6M! 
zZ7{qqk)#$k!5pnyD>_`NRm*N}UFWa`Ke;1dg2<)#7_>*4JWb@0D9ph#8KUSF#zefx zb21(op*rj$imqhhcdq88=?!zusJiAtolpC;dfs+{XSxK6u)=cux3cp#{D!Ld{dT=a zSAQ_?^n+j50BOsJ=HrDj&vg9gM0ous;dV!T|Jtk4_9KAI6g?=PC6LhT#|5!&=bN)g z^x>Q#AsKFeBm6i=>KtltPjBJl6M(OotZq0T(iAc9+AGzP#2)}P<(+4i|GuV&g%K(} zBMc6@q@2ce!>`HO;qXYTW&(~0>*P3BiCNH2U( z3fXNb&ECx{4SnpTXCpgzXlQ}XC9!Ma$-`d1Fjb~2s`RcB)=oeCX6Q+i zWU9w+{bKizmCc6r_J>Ll6B!UDhWY3RqT8p+W}#-`v&^K#oXGMk!w)4JvK|XRp(YOM zDFjY#F~Octclh-{_cZ>O69JQ9lK9c4ep2LUT5aQ_{r`0 zPZnF<3m4@6I0rd){^QFY11<@C;LyhZCF4JW`>)7x!gX#j^->nTL&shF6=s=Efmz(* zKRn zP;w9aG1X5;AS*ji(Z&CnZzv}4Ggpk1C%8|wogVUE06&xVy8~a-3IhM8uWB8S^ia~& zg70*6TCc0(IGwNPiRB_QL>9=muQonc^Edn0}( z8U~B2YaAa*f+a%;Qy+!4#9p3NbUhl0MEj;|WEwSljH1K{JEr6*{Bo)zM^1Uf<&03& zy&4}&dxCZD^guz^9U0@dIto!I!KI250uUv$j&&dK$;LfeOCai@Q=<}hZbGL}{+Kfi~UIRVv3IdNmoL}ld#S6(A^IE&-k zfrAx9k1)rkR3#}TU@XlEGd&Yi;Cc?ZQ;I+aS#HRfZnEVUDkMwZUj%aJQ+xyHL6ve0 zQhJS<$|GSPH3&u4#FEgr8`yr$5iO>JESf|1$T>rjp3E3hdJE|(zJ2ta_bu1i^uqyL z&$$J6mNU&vqp(t@8MxHCa!Xb5&6JLFEo5#1n5MKZz1U~K{1HL~Wu7zQzH>?=ThA8x z)ue2uGfKM{=Tda+d?$z)-*m@yCQkxWH#M0O=SDJxb;D>gn*!mY(=1?hiLX#uP_U+6$*F}(xev1I@OBR42X&xqsK zf*kPk{jDqh9PZ&);q9G|bT4#x7Zmfhr0lOhI+EY%GY}qfsgnU~+1q!+S>>N(NN;w+ z$0h7>a@UiPhy)4mB}nFaMex%%+xzNna%GR+_%N$D9=YuW*P%AF41 zmC16cghKazC=w3h8*aI|bCU3a&e(hokr5VRvgceO>=iEXou3fPLrl{nqavKH|CEA)?=e^!s4?(H6b0Fzo{hKjW0Zri-XKw?#4b0pg ze)M#`u5C29mz^%m{_hS(XTe+e!Blyo+cE!iidsU@_EWq(1UvNbV@dIz!Q$FxVA8ps zzM5B7_Yj+ zIB{xHVJJv7zSF}No;XC|6A) zE*_(^_S#DWK-CkOH@yo$8VEMAmIZ+gnA{#T-d>ZXOfPz`E1%%|?kroJ#A!V3@ssNH89>r(ehg7`E zzs=^uoO<)m;71hOj5EvnF@PVq15QxwXyX=B68p&=Z^!<+veV?1q;R;_!}m9zb)E{> z{dR#pk&ZifLxVvq$G5nS%Y>1%x`mOKZ-VNQ)49VR0w5{Cjss>P+8M7%Fk) zxBfnVOo^B%`5hxX3|QWnmi8c>%SRWA>K=N zcw5}~cvQcCY^X=sWJ^)#5lua87ySS!LUtH!j<&qWJI5Bw6CaPjDrz~_ghUG-7h$kz z6YTY$I%vn+;X%(ls`nV=iYQ@6I3CECep~ac^*@XWJ+op@I^H-#bW-&XO0Mg_7#k|; zR6eqU6m^IzIn0=UnTnp?%5Gc-jF_NA?P`wsGn8=064c zBaZqvtjF8=ILv4xZ3o)G$AlDmUoYHk2}IipJ9aD7?F>9Jd*~4F{ajExUHrq-5ZVse zyn}x_$-e~aym7~<8TxZ`c72|6q0WVsbCUs`5f*#q{1pJ2{oCezY=KV0S98w+<(dB7 
zIj$G$yx%mhe;u8mU%j<_0=*O&hvyjdx`0Aa8k?FPm3D+%3}15Zg;&}ySupl+?#ZGb zg1_Wm*jUhR_r6n77}2r&QliNmm4gi}Efp4*Y;A4BTfXf$V{m~t+p&xL!;|KmIXiD2 zWU+ivMGC(X*t#V_O=K9|{;TcKIkqOhgf)KK^uu%UN5|fRD>~w%^IZI{_q+)}&UWwr z!Rg_qZ%a`5Ak|I}M%$uwMQPs49-IB|sNlb+&zQg04(QKz7~l$pl_68jV>?zH&loFc zx7S@TRyJo0!;O?rc%b>CX}b%t@y5cQO#sN*b2mOg%U|G={# z;a8127ct6?rWdB^h<)b_YxFiqkCAflEZ}LouH*X>|L}^Hba0zqm$G;4Tw)V!`d#MD zg$al%Jx4(JivQY&|9b}fGax#e6DId)qcSXiJ}Uo5fpj#|>px2I1&t&UGblJ%07Ic5 zmWd%+?d*4;bvEY00FC<17Y9KR7U}d} zT_A{xL$U@34a{2rGs?HkXXWJLj8Uz@<6N}1ycfH~?woL$EN0GRbJd4|wLT5TS)H%- zgju&L@wX?6=F|I-AxysFu4u=zF1{&sIbh7@&xv{A*hRS3HtuoddRzZOYRNY0;}vdJ zPpUuPS2x3d8$0PpLml6ViFuO$!h7ORzU{h{;zJ%6dALZocHHwfc9&@S?lD8ab;7({ z0QBttTvq@MUfKVI!P~yML;98>T40{9m4MRA@&BL-`;Y6jMKYNTTUJ(!n`vmtgb zHy_X{^0QViW3gEOpf9=FD?BGYfNy}``B;NOd_oJWP&(&Bxfj$+cD zwnod#b3Vok`m2Mvb3jhgvUl9=b4G&E=fdrtqp`7DjCt_1+c}FH47~N#Y_8#wa#u6* zq%uNlrnGsBG@P)2Umd)AnQJvqt@P*|HH&S*Uv?3Ex`2ADAn3-2#Cgn~6A`G1x8dhz z2O=%QdtRvjqrvJ|-pV2B(z*0_ZoU|eDgEdB&v$q`AWpSOaDY2I+w^b1cV5jvN{fn) zjvjv~)j$#P91d><8)MBdzYq_Tz z1b5+2GyrCx#%?@>Dk@;(u*!RPERa0KuZ*a_g7Caol*qYx z`6ESQT=@6jFZncc3goGEVc&m(DS`Fjf9oFq_w3t1CbI!N_H6{p7Oo3L8%6KcYO+J` zroYA*5WnM_?pW)$O>?!W?%xA`uz9j9Ox)*?k;!b6b=;t;N$Wy4${1B3%B7b06iXWnQYd&X5f69DR9*+ zN>ax0>|Xq7*ggC6z)@r){gX?A(F9_25oS^F+J`1$F%*V6xpoj-Y$Xt_BCoy5<;pwTuO zwUOjY9i2njO{d4ROMwoXe)mbNh7%@6sq2e$bS_=@#vCe~0-9W0l0oVYuqZzzX@_v_ zLY?i$#xo!8EwRXR7zJ8tQl!9Q&(>Q5DM@>HiXXJ4_$J!zrqJ=|2A?64%1y-?PsyrPEs}WkR8BgqTj}y2u*o;b28S z^~amGHyv+kyo-E5dl}kYfFrkoy$4yjLqv8`gt(i4j1#j&NVvE~lmfMK9rhUZ9HwM* zkG^^%)&p2=9C?9M6ys;aIE``)2Q)|9YJgtp>W1Q&(n&}s%mgn=Ob0T0&_%!io!$k= z{W#$Pi(TiG7-@kU$k_{+;g-90`%OrCSK&*12|*9j6MS)Q<-$J@cx1|3>7|+W%J6`I zEGkGN+qC;ps_OxgBY@(7rW~(!oEnTZ0x&;FAckQXvAJo93b`K^r#N&6yb;*qQyXm> z^BOG6n@VEbA`fTn*RHbMssr{PZMCFw>PBTuE(w7WWI><}xNqlHTza(ocbfH33ML0=OdLRr zj2^v4YG=>BY{!;iz}1PS9u&zfdi%p^0fs`BKk07AqMPH6Ywq$)srWd!D>`^BY*#C7 zUDrr#rkCy9s!zE&;LM32bq|p&d5n-eXYP=x>7~h-mS*`UKsc<#r6o~QpH0?u^*3!T z)+()HiYJ3s&bAhm23RVRs0wfk76QiTq7njU`DZLqGi7OlN26(i0R~Jq2yd^pTDF1a 
z@DHTMY$@``i1UF45M-1%r?VQ0Jnx4D*Qw;D1Yf+!hogKoR)-;`ep2B~$~BLtbA~#b zd>7LOe_WrO@5zi@s*Zs4tAm^|Q7e%G)Bft|!EcxX96X=xEDP)R3vq{c@?&ayOb`B5 z{doEPsWw?uWk^z#c(k(`H8V1n!n`~~_KHTSry{VJ%8-HP>CXCZDWW=h1QQHN611mA zm6Q_nXo-qtnT?jb#^%~YK|2%2=X$FOa(%l*OvDbAQ>_TmViJRsS4aPY8v&jVHiOL4 zgY!(ylau{skz#iyu1w;+yAtEuC8_R+@9OkRcjq5-rPje;`;S}MLasENcg932B4^zx z^3Hla&DfYbz1pczwS1!HAMCCHHWsKBipR$GS_Vh470IkKP~^7NFLa~Whq|E1vw^d0 zlDwtC>2>nYvu-ixSiwlBm>soIkZF&F+?3(zY3$dS?ykwkkuHji{T!AFGPrW2ef%Fh zU8EL@42!%XGftRf;jW3jua}qdJ7w3y5_5v@l9NPRs}sArw9MP|ubzrT(2slH_2}xJ z(H&O3NQT4X%7Di7a~`Nue&yKEBjyo?rRt^oBK%u|CueH!Yj<_M8~|3CvoJ$Y#PIs9 zW5Zb4QOJ_BQ$=)|c+?UZD!vlKMylggGDNd8YR{Ym$6VA!kS?voHzH^kgV8_dGn|G+Ut*-iW!M9x5wXIHDdZ?|Gw3tLej#>4EpTo8at7ux`4UESw)#Jx1o z?oXTFwf(njlhd8wn70X|_j!x9(Z4$4j?52?bY@{bY*EbfQr$D>Rhz13|BK@k;-Xly b@MZ1cpQ;K$2Eg8U9oPMy`zrUGxct8WTE!G4 literal 0 HcmV?d00001 From 5c133979656eb547cc577a3b8e2172865be21fa6 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Mon, 14 Dec 2020 16:11:48 -0500 Subject: [PATCH 04/21] Links, HTML, small rewrites --- docs/docs/manual/cellediting.md | 59 ++++++------ docs/docs/manual/exploring.md | 70 +++++++++----- docs/docs/manual/facets.md | 154 ++++++++++++++++++------------ docs/docs/manual/grelfunctions.md | 2 +- docs/docs/manual/installing.md | 38 ++++---- docs/docs/manual/running.md | 122 +++++++++++++---------- docs/docs/manual/sortview.md | 16 ++-- docs/docs/manual/starting.md | 76 ++++++++------- docs/docs/manual/transforming.md | 21 ++-- 9 files changed, 318 insertions(+), 240 deletions(-) diff --git a/docs/docs/manual/cellediting.md b/docs/docs/manual/cellediting.md index f2aed7f19..fc5b5540d 100644 --- a/docs/docs/manual/cellediting.md +++ b/docs/docs/manual/cellediting.md @@ -9,15 +9,17 @@ OpenRefine offers a number of features to edit and improve the contents of cells One way of doing this is editing through a [text facet](facets#text-facet). Once you have created a facet on a column, hover over the displayed results in the sidebar. 
Click on the small “edit” button that appears to the right of the facet, and type in a new value. This will apply to all the cells in the facet. +You can apply a text facet on numbers, boolean values, and dates, but if you edit a value it will be converted into the text [data type](exploring#data-types) (regardless of whether you edit a date into another correctly-formatted date, or a “true” value into “false”, etc.). + ## Transform -Select “Edit cells” → “Transforms” to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as `toUppercase` or `toLowercase`, used in expressions as `toUppercase(value)` or `toLowercase(value)`. In all of these cases, `value` is the value in each cell in the selected column. +Select Edit cells → Transforms to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. In these cases, `value` is the value in each cell in the selected column. Use the preview to ensure your data is being transformed correctly. -You can also switch to the “History” tab inside the expressions window to reuse expressions you’ve already attempted in this project, whether they have been undone or not. +You can also switch to the Undo / Redo tab inside the expressions window to reuse expressions you’ve already attempted in this project, whether they have been undone or not. -OpenRefine offers you some frequently-used transformations in the next menu option, “Common transforms.” For more custom transforms, read up on [expressions](expressions). +OpenRefine offers you some frequently-used transformations in the next menu option, Common transforms. For more custom transforms, read up on [expressions](expressions).
## Common transforms @@ -27,11 +29,11 @@ Often cell contents that should be identical, and look identical, are different ### Collapse consecutive whitespace -You may also find that some text cells contain what look like spaces but are actually tabs, or contain multiple spaces in a row. This function will remove all space characters that sit in sequence and replace them with a single space. +You may find that some text cells contain what look like spaces but are actually tabs, or contain multiple spaces in a row. This function will remove all space characters that sit in sequence and replace them with a single space. ### Unescape HTML -Your data may come from an HTML-formatted source that expresses some characters through references (such as “&nbsp;” for a space, or “%u0107” for a ć) instead of the actual Unicode characters. You can use the “unescape HTML entities” transform to look for these codes and replace them with the characters they represent. +Your data may come from an HTML-formatted source that expresses some characters through references (such as “&nbsp;” for a space, or “%u0107” for a ć) instead of the actual Unicode characters. You can use the “unescape HTML entities” transform to look for these codes and replace them with the characters they represent. For other formatting that needs to be escaped, try a custom transformation with [`escape()`](grelfunctions#escapes-s-mode). ### Replace smart quotes with ASCII @@ -39,28 +41,17 @@ Smart quotes (or curly quotes) recognize whether they come at the beginning or e ### Case transforms -You can transform an entire column of text into UPPERCASE, lowercase, or Title Case using these three options. This can be useful if you are planning to do textual analysis and wish to avoid case-sensitivity (which many functions are) causing problems in your analysis. +You can transform an entire column of text into UPPERCASE, lowercase, or Title Case using these three options. 
This can be useful if you are planning to do textual analysis and wish to avoid case-sensitivity (which some functions are) causing problems in your analysis. Consider also using a [custom facet](facets#custom-text-facet) to temporarily modify cases instead of this permanent operation if appropriate. ### Data-type transforms As detailed in [Data types](exploring#data-types), OpenRefine recognizes different data types: string, number, boolean, and date. When you use these transforms, OpenRefine will check to see if the given values can be converted, then both transform the data in the cells (such as “3” as a text string to “3” as a number) and convert the data type on each successfully transformed cell. Cells that cannot be transformed will output the original value and maintain their original data type. -For example, the following column of strings on the left will transform into the values on the right: +:::caution +Be aware that dates may require manual intervention to transform successfully: see the section on [Dates](exploring#dates) for more information. +::: -|Input|>|Output| -|---|---|---| -|23/12/2019|>|2019-12-23T00:00:00Z| -|14-10-2015|>|2015-10-14T00:00:00Z| -|2012 02 16|>|2012-02-16T00:00:00Z| -|August 2nd 1964|>|1964-08-02T00:00:00Z| -|today|>|today| -|never|>|never| - -This is based on OpenRefine’s ability to recognize dates with the [`toDate()` function](expressions#date-functions). - -Clicking the “today” cell and editing its data type manually will convert “today” into a value such as “2020-08-14T00:00:00Z”. Attempting the same data-type change on “never” will give you an error message and refuse to proceed. - -Because these common transforms do not offer the ability to output an error instead of the original cell contents, be careful to look for unconverted and untransformed values. 
You will see a yellow alert at the top of screen that will tell you how many cells were converted - if this number does not match your current row set, you will need to look for and manually correct the remaining cells. +Because these common transforms do not offer the ability to output an error instead of the original cell contents, be careful to look for unconverted and untransformed values. You will see a yellow alert at the top of screen that will tell you how many cells were converted - if this number does not match your current row set, you will need to look for and manually correct the remaining cells. Also consider faceting by data type, with the GREL function [`type()`](grelfunctions#typeo). You can also convert cells into null values or empty strings. This can be useful if you wish to, for example, erase duplicates that you have identified and are analyzing as a subset. @@ -68,41 +59,45 @@ You can also convert cells into null values or empty strings. This can be useful Fill down and blank down are two functions most frequently used when encountering data organized into [records](exploring#row-types-rows-vs-records) - that is, multiple rows associated with one specific entity. -If you receive information in rows mode and want to convert it to records mode, the easiest way is to sort your first column by the value that you want to use as a unique records key, [make that sorting permanent](transforming#edit-rows), then blank down all the duplicates in that column. OpenRefine will retain the first unique value and erase the rest. Then you can switch from “Show as rows” to “Show as records” and OpenRefine will convert the data based on the remaining values in the first column. Be careful that your data is sorted properly before you begin blanking down - not just the first column but other columns you may want to have in a certain order. 
For example, you may have multiple identical entries in the first column, one with a value in the second column and one with an empty cell in the second column. In this case you want the value to come first, so that you can clean up empty rows later, once you blank down. +If you receive information in rows mode and want to convert it to records mode, the easiest way is to sort your first column by the value that you want to use as a unique records key, [make that sorting permanent](transforming#edit-rows), then blank down all the duplicates in that column. OpenRefine will retain the first unique value and erase the rest. Then you can switch from “Show as rows” to “Show as records” and OpenRefine will associate rows to each other based on the remaining values in the first column. + +Be careful that your data is sorted properly before you begin blanking down - not just the first column but other columns you may want to have in a certain order. For example, you may have multiple identical entries in the first column, one with a value in the second column and one with an empty cell in the second column. In this case you want the row with the second-column value to come first, so that you can clean up empty rows later, once you blank down. If, conversely, you’ve received data with empty cells because it was already in something akin to records mode, you can fill down information to the rest of the rows. This will duplicate whatever value exists in the topmost cell with a value: if the first row in the record is blank, it will take information from the next cell, or the cell after that, until it finds a value. The blank cells above this will remain blank. ## Split multi-valued cells -Splitting cells with more than one value in them is a common way to get your data from single rows into multi-row records. Survey data, for example, frequently allows respondents to “Select all that apply,” or an inventory list might have items filed under more than one category. 
+Splitting cells with more than one value in them is a common way to get your data from single rows into [multi-row records](exploring#rows-vs-records). Survey data, for example, frequently allows respondents to “Select all that apply,” or an inventory list might have items filed under more than one category. You can split a column based on any character or series of characters you input, such as a semi-colon (;) or a slash (/). The default is a comma. Splitting based on a separator will remove the separator characters, so you may wish to include a space with your separator (; ) if it exists in your data. -You can use [expressions](expressions) to design the point at which a cell should split itself into two or more rows. This can be used to identify special characters or create more advanced evaluations. You can split on a line-break by entering `\n` and checking the “regular expression” checkbox. +You can use [expressions](expressions) to design the point at which a cell should split itself into two or more rows. This can be used to identify special characters or create more advanced evaluations. You can split on a line-break by entering `\n` and checking the “[regular expression](expressions#regular-expressions)” checkbox. -This can be useful if the split is not straightforward: say, if a capital letter indicates the beginning of a new string, or if you need to _not_ always split on a character that appears in both the strings and as a separator. Remember that this will remove all the matching characters. +Regular expressions can be useful if the split is not straightforward: say, if a capital letter (`[A-Z]`) indicates the beginning of a new string, or if you need to _not_ always split on a character that appears in both the strings and as a separator. Remember that this will remove all the matching characters. You can also split based on the lengths of the strings you expect to find. 
This can be useful if you have predictable data in the cells: for example, a 10-digit phone number, followed by a space, followed by another 10-digit phone number. Any characters past the explicit length you’ve specified will be discarded: if you split by “11, 10” any characters that may come after the 21st character will disappear. If some cells only have one phone number, you will end up with blank rows. -If you have data that should be split into multiple columns instead of multiple rows, see [split into several columns(columnediting#split-into-several-columns). +If you have data that should be split into multiple columns instead of multiple rows, see [Split into several columns](columnediting#split-into-several-columns). ## Join multi-valued cells -Joining will reverse the “split multi-valued cells” operation, or join up information from multiple rows into one row. All the strings will be compressed into the topmost cell in the record, in the order they appear. A window will appear where you can set the separator; the default is a comma and a space (, ). This separator is optional. +Joining will reverse the “split multi-valued cells” operation, or join up information from multiple rows into one row. All the strings will be compressed into the topmost cell in the record, in the order they appear. A window will appear where you can set the separator; the default is a comma and a space (, ). This separator is optional. We suggest the separator | as a sufficiently rare character. ## Cluster and edit Creating a facet on a column is a great way to look for inconsistencies in your data; clustering is a great way to fix those inconsistencies. Clustering uses a variety of comparison methods to find text entries that are similar but not exact, then shares those results with you so that you can merge the cells that should match. Where editing a single cell or text facet at a time can be time-consuming and difficult, clustering is quick and streamlined. 
-Clustering always requires the user to approve each suggested edit - it will display values it thinks are variations on the same thing, and you can select which version to keep and apply across all those matching cells (or type in your own version). OpenRefine will do a number of cleanup operations behind the scenes, in memory, in order to do its analysis, but only the merges you approve will modify your data. +Clustering always requires the user to approve each suggested edit - it will display values it thinks are variations on the same thing, and you can select which version to keep and apply across all the matching cells (or type in your own version). -You can start the process in two ways: using the dropdown menu on your column, select “Edit cells” → “Cluster and edit…” or create a text facet and then press the “Cluster” button that appears in the facet box. +OpenRefine will do a number of cleanup operations behind the scenes in order to do its analysis, but only the merges you approve will modify your data. Understanding those different behind-the-scenes cleanups can help you choose which clustering method will be more accurate and effective. + +You can start the process in two ways: using the dropdown menu on your column, select Edit cells → Cluster and edit…; or create a text facet and then press the “Cluster” button that appears in the facet box. ![A screenshot of the Clustering window.](/img/cluster.png) The clustering pop-up window will take a small amount of time to analyze your column, and then make some suggestions based on the clustering method currently active. -For each cluster identified, you can pick one of the existing values to apply to all cells, or manually type in a new value in the text box. And, of course, you can choose not to cluster them at all. OpenRefine will keep analyzing every time you make a change, with “Merge selected & re-cluster,” and you can work through all the methods this way.
+For each cluster identified, you can pick one of the existing values to apply to all cells, or manually type in a new value in the text box. And, of course, you can choose not to cluster them at all. OpenRefine will keep analyzing every time you make a change, with Merge selected & re-cluster, and you can work through all the methods this way. You can also export the currently identified clusters as a JSON file, or close the window with or without applying your changes. You can also use the histograms on the right to narrow down to, for example, clusters with lots of matching rows, or clusters of long or short values. @@ -127,9 +122,9 @@ The clustering pop-up window offers you a variety of clustering methods: **Key collisions** are very fast and can process millions of cells in seconds: -**Fingerprinting** is the least likely to produce false positives, so it’s a good place to start. It does the same kind of data-cleaning behind the scenes that you might think to do manually: fix whitespace into single spaces, put all uppercase letters into lowercase, discard punctuation, remove diacritics (e.g. accents) from characters, split all strings (words) and sort them alphabetically (so “Zhenyi, Wang” becomes “Wang Zhenyi”). This makes comparing those types of name values very easy. +**Fingerprinting** is the least likely to produce false positives, so it’s a good place to start. It does the same kind of data-cleaning behind the scenes that you might think to do manually: fix whitespace into single spaces, put all uppercase letters into lowercase, discard punctuation, remove diacritics (e.g. accents) from characters, split all strings (words) and sort them alphabetically (so “Zhenyi, Wang” becomes “wang zhenyi”). -**N-gram fingerprinting** allows you to set the _n_ value to whatever number you’d like, and will create n-grams of _n_ size (after doing some cleaning), alphabetize them, then join them back together into a _fingerprint_. 
For example, a 1-gram fingerprint will simply organize all the letters in the cell into alphabetical order - by creating segments one character in length. A 2-gram fingerprint will find all the two-character segments, remove duplicates, alphabetize them, and join them back together (for example, “banana” generates “ba an na an na,” which becomes “anbana”). This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself won’t identify). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”). +**N-gram fingerprinting** allows you to set the _n_ value to whatever number you’d like, and will create n-grams of _n_ size (after doing some cleaning), alphabetize them, then join them back together into a _fingerprint_. For example, a 1-gram fingerprint will simply organize all the letters in the cell into alphabetical order - by creating segments one character in length. A 2-gram fingerprint will find all the two-character segments, remove duplicates, alphabetize them, and join them back together (for example, “banana” generates “ba an na an na,” which becomes “anbana”). This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself won’t identify because it keeps words separated). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”). 
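The two key-collision methods described above can be sketched in a few lines of Python. This is a minimal illustration of the steps listed in the text, not OpenRefine's own implementation: the function names `fingerprint` and `ngram_fingerprint` and the helper `_clean` are hypothetical, the exact punctuation rules are an assumption, and removing duplicate tokens and n-grams for every value of _n_ is also an assumption (the text only mentions it for 2-grams).

```python
import re
import unicodedata


def _clean(text):
    """Trim, lowercase, strip diacritics, and drop punctuation (assumed rules)."""
    text = unicodedata.normalize("NFKD", text.strip().lower())
    # After NFKD, accents are separate combining characters that can be dropped.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^\w\s]", "", text)


def fingerprint(text):
    """Split on whitespace, de-duplicate and sort the words, rejoin with spaces."""
    tokens = _clean(text).split()
    return " ".join(sorted(set(tokens)))


def ngram_fingerprint(text, n=2):
    """Remove whitespace, collect unique n-character segments, sort, and join."""
    chars = re.sub(r"\s+", "", _clean(text))
    grams = {chars[i:i + n] for i in range(len(chars) - n + 1)}
    return "".join(sorted(grams))


if __name__ == "__main__":
    print(fingerprint("Zhenyi, Wang"))
    print(ngram_fingerprint("banana", 2))
    print(ngram_fingerprint("lookout") == ngram_fingerprint("look out"))
```

Two cells land in the same cluster when their keys are equal. Because `ngram_fingerprint` discards whitespace before building segments, “lookout” and “look out” share a key, while `fingerprint` keeps them apart.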
##### Phonetic clustering diff --git a/docs/docs/manual/exploring.md b/docs/docs/manual/exploring.md index 9b0f63b36..b91d91217 100644 --- a/docs/docs/manual/exploring.md +++ b/docs/docs/manual/exploring.md @@ -1,4 +1,4 @@ ---- +--- id: exploring title: Exploring data sidebar_label: Overview @@ -6,7 +6,7 @@ sidebar_label: Overview ## Overview -OpenRefine is a powerful tool for learning about your dataset, even if you don’t change a single character. In this section we cover different ways for sorting through, filtering, and viewing your data. +OpenRefine offers lots of features to help you learn about your dataset, even if you don’t change a single character. In this section we cover different ways for sorting through, filtering, and viewing your data. Unlike spreadsheets, OpenRefine doesn’t store formulas and display the output of those calculations; it only shows the value inside each cell. It doesn’t support cell colors or text formatting. @@ -14,21 +14,19 @@ Unlike spreadsheets, OpenRefine doesn’t store formulas and display the output Each piece of information (each cell) in OpenRefine is assigned a data type. Some file formats, when imported, can set data types that are recognized by OpenRefine. Cells without an associated data type on import will be considered a “string” at first, but you can have OpenRefine convert cell contents into other data types later. This is set at the cell level, not at the column level. -You can see data types in action when you preview a new project: check the box that says “Attempt to parse cell text into numbers” and cells will be converted to the “number” data type based on their contents. You’ll see numbers change from black text to green if they are recognized. +You can see data types in action when you preview a new project: check the box next to Attempt to parse cell text into numbers, and cells will be converted to the “number” data type based on their contents. 
You’ll see numbers change from black text to green if they are recognized. The data type will determine what you can do with the value. For example, if you want to add two values together, they must both be recognized as the number type. You can check data types at any time by: * clicking “edit” on a single cell (where you can also edit the type) -* creating a Custom Text Facet on a column, and inserting `type(value)` into the “Expression” field. This will generate the data type in the preview, and you can facet by data type if you press “OK.” +* creating a Custom Text Facet on a column, and inserting `type(value)` into the Expression field. This will generate the data type in the preview, and you can facet by data type if you press OK. The data types supported are: * string (one or more text characters) * number (one or more characters of numbers only) * boolean (values of “true” or “false”) -* date (ISO-8601-compliant extended format with time in UTC: YYYY-MM-DDTHH:MM:SSZ) - -A “date” type is created when a text column is [transformed into dates](transforming#to-date), or when individual cells are set to have the data type “date.” +* [date](#dates) (ISO-8601-compliant extended format with time in UTC: YYYY-MM-DDTHH:MM:SSZ) OpenRefine recognizes two further data types as a result of its own processes: * error @@ -36,33 +34,59 @@ OpenRefine recognizes two further data types as a result of its own processes: An “error” data type is created when the cell is storing an error generated during a transformation in OpenRefine. -A “null” data type is a special value which basically means “this cell has no value.” It’s used to differentiate between cells that have values such as “0” or “false” - or a cell that looks empty but has, for example, spaces in it. 
When you use `type(value)`, it will show you that the cell’s value is “null” and its type is “undefined.” You can opt to [show “null” values](#view) to differentiate them from empty strings, by going to “All” → “View” → “Show/Hide ‘null’ values in cells.” +A “null” data type is a special type that means “this cell has no value.” It’s distinct from cells that have values such as “0” or “false”, or cells that look empty but have whitespace in them, or cells that contain empty strings. When you use `type(value)`, it will show you that the cell’s value is “null” and its type is “undefined.” You can opt to [show “null” values](sortview#showhide-null) by going to All → View → Show/Hide ‘null’ values in cells. -Converting a cell's data type is not the same operation as transforming its contents. For example, using a column-wide transform such as “Transform” → “Common transforms …” → “to date” may not convert all values successfully, but going to an individual cell, clicking “edit” and changing the data type can successfully convert text to a date. These operations use different underlying code. +Changing a cell's data type is not the same operation as transforming its contents. For example, using a column-wide transform such as Transform → Common transforms → To date may not convert all values successfully, but going to an individual cell, clicking “edit”, and changing the data type can successfully convert text to a date. These operations use different underlying code. Learn more about date formatting and transformations in the next section. + +To transform data from one type to another, see [Transforming data](cellediting#data-type-transforms) for information on using common transforms, and see [Expressions](expressions) for information on using [toString()](grelfunctions#tostringo-string-format-optional), [toDate()](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-), and other functions.
-To transform data from one type to another, see [Transforming data](transforming#transform) for information on using common tranforms, and see [Expressions](expressions) for information on using `toString()`, `toDate()`, and other functions. ### Dates -Date-formatted data in OpenRefine relies on a number of conversion tools and standards. When you convert a cell into a "date" data type, what you'll be doing is trying to transform the original contents in an ISO-8601-compliant extended format with time in UTC: YYYY-MM-DDTHH:MM:SSZ. +A “date” type is created when a column is [transformed into dates](transforming#to-date), when an expression is used to [convert cells to dates](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-) or when individual cells are set to have the data type “date.” -You can convert dates when you [export your data using the custom tabular exporter](exporting#custom-tabular-exporter). You are given the option to keep your dates in ISO 8601 format, or to output short, medium, long, or full locale formats. This means that you can format your dates into, for example, MM/DD/YY (the US short standard) with or without including the time, after manipulating your data formatted into ISO 8601. +Date-formatted data in OpenRefine relies on a number of conversion tools and standards. For something to be considered a date in OpenRefine, it will be converted into the ISO-8601-compliant extended format with time in UTC: YYYY-MM-DDTHH:MM:SSZ. 
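As a rough illustration of the target format only, the following Python sketch turns a date string with a known layout into the `YYYY-MM-DDTHH:MM:SSZ` form. It does not reproduce how OpenRefine recognizes dates: the helper name `to_iso_utc` is hypothetical, and the caller must supply the input layout explicitly, which is an assumption made for this example.

```python
from datetime import datetime, timezone


def to_iso_utc(text, fmt):
    """Parse `text` using the strptime layout `fmt` and render it as ISO 8601 in UTC."""
    # The input carries no time zone, so this sketch assumes it is already UTC.
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(to_iso_utc("23/12/2019", "%d/%m/%Y"))
    print(to_iso_utc("2012 02 16", "%Y %m %d"))
```

A string with no time component comes out with the time set to midnight, which is why converted dates often end in `T00:00:00Z`.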
+ +When you run Edit cells → Common transforms → To date, the following column of strings on the left will transform into the values on the right: + +|Input|→|Output| +|---|---|---| +|23/12/2019|→|2019-12-23T00:00:00Z| +|14-10-2015|→|2015-10-14T00:00:00Z| +|2012 02 16|→|2012-02-16T00:00:00Z| +|August 2nd 1964|→|1964-08-02T00:00:00Z| +|today|→|today| +|never|→|never| + +OpenRefine uses a variety of tools to recognize, convert, and format [dates](exploring#dates) and so some of the values above can be reformatted using other methods. In this case, clicking the “today” cell and editing its data type manually will convert “today” into a value such as “2020-08-14T00:00:00Z”. Attempting the same data-type change on “never” will give you an error message and refuse to proceed. + +You can do more precise conversion and formatting using expressions and arguments based on the state of your data: see the GREL functions reference section on [Date functions](grelfunctions#date-functions) for more help. + +You can convert dates into a more human-readable format when you [export your data using the custom tabular exporter](exporting#custom-tabular-exporter). You are given the option to keep your dates in the ISO 8601 format, to output short, medium, long, or full locale formats, or to specify a custom format. This means that you can format your dates into, for example, MM/DD/YY (the US short standard) with or without including the time, after working with ISO-8601-formatted dates in your project. + +The following table shows some example [date and time formatting styles for the U.S. and French locales](https://docs.oracle.com/javase/tutorial/i18n/format/dateFormat.html): -The following table shows the [date and time formatting styles for the U.S. and French locales](https://docs.oracle.com/javase/tutorial/i18n/format/dateFormat.html): |Style |U.S.
Locale |French Locale| -|DEFAULT |Jun 30, 2009 7:03:47 AM |30 juin 2009 07:03:47| -|SHORT |6/30/09 7:03 AM |30/06/09 07:03| -|MEDIUM |Jun 30, 2009 7:03:47 AM |30 juin 2009 07:03:47| -|LONG |June 30, 2009 7:03:47 AM PDT |30 juin 2009 07:03:47 PDT| -|FULL |Tuesday, June 30, 2009 7:03:47 AM PDT |mardi 30 juin 2009 07 h 03 PDT| +|---|---|---| +|Default |Jun 30, 2009 7:03:47 AM |30 juin 2009 07:03:47| +|Short |6/30/09 7:03 AM |30/06/09 07:03| +|Medium |Jun 30, 2009 7:03:47 AM |30 juin 2009 07:03:47| +|Long |June 30, 2009 7:03:47 AM PDT |30 juin 2009 07:03:47 PDT| +|Full |Tuesday, June 30, 2009 7:03:47 AM PDT |mardi 30 juin 2009 07 h 03 PDT| ## Rows vs. records -A row is a simple way to organize data: a series of cells, one cell per column. Sometimes there are multiple pieces of information in one cell, such as when a survey respondent can select more than one response. In cases where there is more than one value for a single column in one or more rows, you may wish to use OpenRefine’s records mode: this defines a single record (a survey response, for example) as potentially containing more than one row. From there you can transform cells into multiple rows, each cell containing one value you’d like to work with. +A row is a simple way to organize data: a series of cells, one cell per column. Sometimes there are multiple pieces of information in one cell, such as when a survey respondent can select more than one response. -Generally, when you import some data, OpenRefine reads that data in row mode. From there you can convert the project into records mode. OpenRefine remembers this action and will present you with records mode each time you open the project from then on. +In cases where there is more than one value for a single column in one or more rows, you may wish to use OpenRefine’s records mode: this defines a single record as potentially containing more than one row. 
From there you can transform cells into multiple rows, each cell containing one value you’d like to work with. -OpenRefine understands records based on the content of the first column, what we call the "key column." Splitting a row into a multi-row record will base all association on the first column in your dataset. If you have more than one column to split out into multiple rows, OpenRefine will keep your data associated with its original record: you can imagine this structure as a tree with many branches, all leading back to the same trunk. +Generally, when you import some data, OpenRefine reads that data in row mode. From the project screen, you can convert the project into records mode. OpenRefine remembers this action and will present you with records mode each time you open the project from then on. + +OpenRefine understands records based on the content of the first column, what we call the “key column.” Splitting a row into a multi-row record will base all association on the first column in your dataset. + +If you have more than one column to split out into multiple rows, OpenRefine will keep your data associated with its original record, and associate subgroups based on the top-most row in each group. + +You can imagine the structure as a tree with many branches, all leading back to the same trunk. For example, your key column may be a film or television show, with multiple cast members identified by name, associated to that work. You may have one or more roles listed for each person. The roles are linked to the actors, which are linked to the title. @@ -83,9 +107,9 @@ For example, your key column may be a film or television show, with multiple cas ||Margaret Hamilton|Miss Almira Gulch| |||The Wicked Witch of the West| -Once you are in records mode, you can still move columns around, but if you move a column to the beginning, you may find your data becomes misaligned. 
The new key column will sort into records based on empty cells, and values in the old key column will be assigned to the last row in the old record (the key value sitting above those values). +Once you are in records mode, you can still move some columns around, but if you move a column to the beginning, you may find your data becomes misaligned. The new key column will sort into records based on empty cells, and values in the old key column will be assigned to the last row in the old record (the key value sitting above those values). -OpenRefine assigns a unique key behind the scenes, so your records don’t need a unique identifier in the key column (but you will likely have one, to ensure data stays properly sorted). You can keep track of which rows are assigned to which record by the record number that appears under the “All” column. +OpenRefine assigns a unique key behind the scenes, so your records don’t need a unique identifier in the key column. You can keep track of which rows are assigned to each record by the record number that appears under the All column. To [split multi-valued cells](transforming#split-multi-valued-cells) and apply other operations that take advantage of records mode, see [Transforming data](transforming). diff --git a/docs/docs/manual/facets.md b/docs/docs/manual/facets.md index 1eb37ab31..20e99e95a 100644 --- a/docs/docs/manual/facets.md +++ b/docs/docs/manual/facets.md @@ -1,4 +1,4 @@ ---- +--- id: facets title: Exploring facets sidebar_label: Facets @@ -6,16 +6,18 @@ sidebar_label: Facets ## Overview -Facets are one of OpenRefine’s strongest features - that’s where the diamond logo comes from! Faceting allows you to look for patterns and trends. Facets are essentially aspects or angles of data variance in a given column. 
For example, if you had survey data where respondents indicated one of five responses from “Strongly agree” to “Strongly disagree,” those five responses make up a text facet, showing how many people selected each option. +Facets are one of OpenRefine’s strongest features - that’s where the diamond logo comes from! + +Faceting allows you to look for patterns and trends. Facets are essentially aspects or angles of data variance in a given column. For example, if you had survey data where respondents indicated one of five responses from “Strongly agree” to “Strongly disagree,” those five responses make up a text facet, showing how many people selected each option. Faceted browsing gives you a big-picture look at your data (do they agree or disagree?) and also allows you to filter down to a specific subset to explore it more (what do people who disagree say in other responses?). -Typically, you create a facet on a particular column. That facet selection appears on the left, in the Facet/Filter tab, and you can click on a displayed facet to view all the records that match. You can also “exclude” the facet, to view every record that does _not_ match, and you can select more than one facet by clicking “include.” +Typically, you create a facet on a particular column. That facet selection appears on the left, in the Facet/Filter tab, and you can click on a displayed facet to view all the records that match. You can also “exclude” the facet, to view every record that does _not_ match, and you can select more than one facet by clicking “include.” ### An example -You can learn about facets and filtering with the following example. +You can learn about facets and filtering with the following example. You can copy the following table and paste it using the Clipboard method of starting a project if you would like to try it yourself. We collected a list of the [10 most populous cities from Wikidata](https://w.wiki/3Em), using an example query of theirs. 
We removed the GPS coordinates and added the country. @@ -32,9 +34,9 @@ We collected a list of the [10 most populous cities from Wikidata](https://w.wik | Guangzhou | 13080500 | People's Republic of China | | São Paulo | 12106920 | Brazil | -If we want to see which countries have the most populous cities, we can create a “text facet” on the “countryLabel” column and OpenRefine will generate a list of all the different strings used in these cells. +If we want to see which countries have the most populous cities, we can create a text facet on the “countryLabel” column and OpenRefine will generate a list of all the different strings used in these cells. -We will see in the sidebar that the countries identified are displayed, along with the number of matches (the “count”). We can sort this list alphabetically or by the count. If you sort by count, you’ll learn which countries hold the most populous cities. +We will see in the sidebar that the countries identified are displayed, along with the number of matches (the “count”). We can sort this list alphabetically or by the count. If you sort by count at the top of the facet window, you’ll learn which countries hold the most populous cities. |Facet|Count| |---|---| @@ -48,25 +50,25 @@ We will see in the sidebar that the countries identified are displayed, along wi If we want to learn more about a particular country, we can click on its appearance in the facet sidebar. This narrows our dataset down temporarily to only rows matching that facet. -You’ll see the “10 rows” notification change to “4 matching rows (10 total)” if you click on “People’s Republic of China”. In the data grid, you’ll see the same number of rows, but only the ones matching your current filter. Each row will maintain its original numbering, though - in this case, rows #1, 2, and 8. +You’ll see the “10 rows” indicator change to “4 matching rows (10 total)” if you click on “People’s Republic of China”. 
In the data grid, you’ll see fewer rows: only the ones matching your current filter. Each row will maintain its original numbering, though - in this case, rows #1, 2, and 8. -If you want to go back to the original dataset, click “reset” or “exclude.” If you want to view the most populous cities in both China and India, click “include” next to each facet. Now you’ll see 5 rows - #1, 2, 5, 8, 9. +If you want to go back to the original dataset, click Reset All or the small “exclude” text next to the facet. If you want to view the most populous cities in both China and India, click “include” next to each facet. Now you’ll see 5 rows - #1, 2, 5, 8, 9. We can also explore our data using the population information. In this case, because population is a number, we can create a numeric facet. This will give us the ability to explore by range rather than by exact matching values. With the numeric facet, we are given a scale from the smallest to the largest value in the column. We can drag the range minimum and maximum to narrow the results. In this case, if we narrow down to only cities with more than 20 million in population, we get 3 matching rows out of the original 10. -When you look at the facet display of countries, you should see a smaller list with a reduced count: OpenRefine is now displaying the facets of the 3 matching rows, not the total dataset of 10 rows. +When you look back at the text facet display of country names, you should see a smaller list with a reduced count: OpenRefine is now displaying the facets of the 3 matching rows, not the total dataset of 10 rows. We can combine these facets - say, by narrowing to only the Chinese cities with populations greater than 20 million - simply by clicking in both. You should see 2 matching rows for both these criteria. ### Things to know about facets -When you have facets applied, you will see “matching rows” in the [project grid header](running#project-grid-header). 
If you press “Export” and copy your data out of OpenRefine while facets are active, you will only export the matching rows, not all the rows in your project. +When you have facets applied, you will see “matching rows” in the [project grid header](running#project-grid-header). If you click Export and copy your data out of OpenRefine while facets are active, many of the exporting options will only export the matching rows, not all the rows in your project. OpenRefine has several default facets, which you’ll learn about below. The most powerful facets are the ones designed by you - custom facets, written using [expressions](expressions) to transform the data behind the scenes and help you narrow down to precisely what you’re looking for. -Facets are not saved in the project along with the data. But you can save a link to the current state of the application. Find the "permalink" next to the project’s name. +Facets are not saved in the project along with the data. But you can save a link to the current state of the application. Find the [Permalink](running#the-project-bar) next to the project’s name. You can modify any facet expression by clicking the “change” button to the right of the column name in the facet sidebar. @@ -74,15 +76,17 @@ Facet boxes that appear in the sidebar can be resized and rearranged. You can dr ## Text facet -A text facet can be generated on any column with the “text” data type. Select the column dropdown and go to “Facet” → “Text facet”. The created facet will be sorted alphabetically, and can be sorted by count. +A text facet can be generated on any column with the “text” data type. Select the column dropdown and go to Facet → Text facet. The created facet will be sorted alphabetically, and can be sorted by count. A text facet is very simple: it takes the total contents of the cells of the column in question and matches them up. It does no guessing about typos or near-matches.
-You can edit any entry that appears in the facet display, by hovering over the facet and clicking the “edit” button that appears. You can then type in a new value manually. This will mass-edit every identical cell in your data. This is a great way to fix typos, whitespace, and other issues that may be affecting the way facets appear. You can also automate the cleanup of facets by using [clustering](transforming#cluster-and-edit), with the “Cluster” button displayed within the facet window. It may be most efficient to cluster cells to one value, and then mass-edit that value to your desired string within the clustering operation window. +You can edit any entry that appears in the facet display, by hovering over the facet and clicking the “edit” button that appears. You can then type in a new value manually. This will mass-edit every identical cell in the column. This is a great way to fix typos, whitespace, and other issues that may be affecting the way facets appear. You can also automate the cleanup of facets by using [clustering](transforming#cluster-and-edit): a “Cluster” button is displayed within the facet window. It may be most efficient to cluster cells to one value, and then mass-edit that value to your desired string within the clustering operation window. -Each text facet shows up to 2,000 choices by default. You can [increase this limit on the Preferences screen](running#preferences) if you need to, which will increase the processing work required by your browser. If your applied facet has more choices than the current limit, you'll be offered the option to increase the limit, which will edit that preference for you. +Each text facet shows up to 2,000 choices by default. You can [increase this limit on the Preferences screen](running#preferences) if you need to, which may slow down your browser. 
If your applied facet has more choices than the current limit, you'll be offered the option to increase the limit, which will permanently edit that preference for you. -The choices and counts displayed in each facet can be copied as tab-separated values. To do so, click on the "X choices" link near the top left corner of the facet. +The choices and counts displayed in each facet can be copied as tab-separated values. To do so, click on the "X choices" link near the top left corner of the facet. This can be useful to generate small summary tables of your data. + +![A column of years faceted as text and numbers, and with the count ready to be copied.](/img/yeardata.png) ## Numeric facet @@ -92,73 +96,95 @@ Whereas a text facet groups unique text values into groups, a numeric facet sort You will be offered the option to include blank, non-numeric, and error values in your numeric visualization; these will appear in the visual range as “0” values. +:::info +You can create a text facet on numeric data, which will treat each entry as a string. This can be useful if you wish, for example, to manually include facets instead of selecting a range, or sort by count, or copy that count. +::: + ## Timeline facet ![A screenshot of an example timeline facet.](/img/timelinefacet.png) -Much like a numeric facet, a timeline facet will display as a small histogram with the values sorted: in this case, chronologically. A timeline facet only works on dates formatted as “date” data types (e.g. by [using the `toDate()` function](expressions#dates) to transform text into dates, or by manually setting the [data type](#cell-data-types) on individual cells) and in the structure of the ISO-8601-compliant extended format with time in UTC: **YYYY**-**MM**-**DD**T**HH**:**MM**:**SS**Z. +Much like a numeric facet, a timeline facet will display as a small histogram with the values sorted: in this case, chronologically. 
A timeline facet only works on cells formatted as the [“date” data type](exploring#dates). + +The facet appears with a count of blank cells and those with errors, which can help you analyze whether your date cells are correctly converted. ## Scatterplot facet -A scatterplot facet can be generated on any number-formatted column. You require two or more number columns to generate scatterplots. +A scatterplot is a visual representation of two related sets of numeric data. You have the option to generate linear scatterplots (where the X and Y axes show continuous increases) or logarithmic scatterplots (where the X and Y axes show exponential or scaled increases). You can also rotate the plot by 45 degrees in either direction, and you can choose the size of the dot indicating a datapoint. You can make these choices in both the preview and in the facet display. -Selecting “Facet” → “Scatterplot facet” will create a preview of data plotted from every number-formatted column in your dataset, comparing every column against every other column. Each scatterplot will show in its own square, allowing you to choose which data comparison you would like to analyze further. +A scatterplot facet can be generated on any column. You require two or more number columns to generate scatterplots. Selecting Facet → Scatterplot facet will create a preview of data plotted from every number-formatted column in your dataset, comparing every column against every other column. Each scatterplot will show in its own square, allowing you to choose which data comparison you would like to analyze further. You can control which columns are on the X and Y axes by rearranging the columns in your dataset. -When you click on your desired square, that two-column comparison will appear in the facets sidebar. From here, you can drag your mouse to draw a rectangle inside the scatterplot, which will narrow down to just the rows matching the points plotted inside that rectangle.
This rectangle can be resized by dragging any of the four edges. To draw a new rectangle, simply click and drag your mouse again. To add more scatterplots to the facet sidebar, re-run this process and select a different square. +![A simple scatterplot of two numeric values.](/img/scatterplot.png) + +When you click on your desired square, that two-column comparison will appear in the facets sidebar. From here, you can drag your mouse to draw a rectangle inside the scatterplot, which will narrow down to just the rows matching the points plotted inside that rectangle (as shown by the rectangle inside the square in the image above). This rectangle can be resized by dragging any of the four edges. To draw a new rectangle, simply click and drag your mouse again. To add more scatterplots to the facet sidebar, re-run this process and select a different square. If you have multiple facets applied, plotted points in your scatterplot displays will be greyed out if they are not part of the current matching data subset. If the rectangle you have drawn within a scatterplot display only includes grey dots, you will see no matching rows. -If you would like to export a scatterplot, OpenRefine will open a new tab with a generated PNG image that you can save. +If you would like to export a scatterplot, OpenRefine will open a new tab with a generated PNG file that you can save. ## Custom text facet -You may want to explore your textual data in a way that doesn’t involve modifying it but does require being more selective about what gets considered. Creating custom text facets will load your column into memory, transform the data, and store those transformations inside the facet. +You may want to explore your textual data with modifications that aren't permanent. Creating custom text facets will load your column into memory, transform the data temporarily, and store those transformations inside the facet. 
-You can also use text facets to analyze numerical data, such as by analyzing a number as a string, or by creating a test that will return “true” and false” as values. +You can also use custom text facets to analyze numerical data, such as by analyzing a number as a string, or by creating a test that will return “true” and “false” as values. -If you would like to build your own version of a text facet, you can use the “Custom Text Facet” option. Clicking on “Facets” → “Custom text facet…” will bring up an [expressions](expressions) window where you can enter in a GREL, Python or Jython, or Clojure expression to modify how the facet works. +Clicking on Facet → Custom text facet… will bring up an [expressions](expressions) window where you can enter in a GREL, Jython, or Clojure expression to modify how the facet works. -A custom text facet operates just like a [text facet](#text-facet) by default. Unlike a text facet, however, you cannot edit the facets that appear in the sidebar and change the matching cells in your dataset. +A custom text facet operates just like a [text facet](#text-facet) by default. Unlike a text facet, however, you cannot click “edit” on the facets that appear in the sidebar and change the matching cells in your dataset - because what they display is modified, not the original entries. For example, you may wish to analyze only the first word in a text field - perhaps the first name in a column of “[First Name] [Last Name]” entries. In this case, you can tell OpenRefine to facet only on the information that comes before the first space: -```value.split(" ")[0]``` +``` +value.split(" ")[0] +``` -
We can do the same splitting and ask for the last name with +In this case, `split()` is creating an array of text strings based on every space in the cells ["Firstname", "Lastname"]. Because arrays number their entries starting with 0, we want the first value, so we ask for `[0]`. (Assuming the first name is one word, not something like “Mary Anne”.) We can do the same splitting and ask for the last name with -```value.split(" ")[1]``` +``` +value.split(" ")[1] +``` -You may want to create a facet that references several columns. For example, let’s say you have two columns, "First Name" and "Last Name", and you want out how many people have the same initial letter for both names (e.g., Marilyn Monroe, Steven Segal). To do so, create a custom text facet on either column and enter the expression +You may want to create a facet that references several columns. For example, let’s say you have two columns, “First Name” and “Last Name”, and you want out how many people have the same initial letter for both names (e.g., Marilyn Monroe, Steven Segal). To do so, create a custom text facet on either column and enter the expression -```cells["First Name"].value[0] == cells["Last Name"].value[0]``` +``` +cells["First Name"].value[0] == cells["Last Name"].value[0] +``` -That expression will facet your rows into `true` and `false`. +That expression will look for the first letter (the character at index 0) of each entry and compare them. Then it will facet your rows into `true` and `false`. You can learn more about text-modification functions on the [Expressions page](expressions). ## Custom numeric facet -You may want to explore your numerical data in a way that doesn’t involve modifying it but does require being more selective about what gets considered. You can also use custom numeric facets to analyze textual data, such as by getting the length of text strings (with `value.length()`), or by analyzing it as though it were formatted as numbers (with `toNumber(value)`). 
+You may want to explore your numerical data with modifications that aren't permanent. You can also use custom numeric facets to analyze textual data, such as by getting the length of text strings (with `value.length()`), or by analyzing it as though it were formatted as numbers (with `toNumber(value)`). -If you would like to build your own version of a numeric facet, you can use the “Custom Numeric Facet” option. Clicking on “Facets” → “Custom Numeric Facet…” will bring up an [expressions](expressions) window where you can enter in a GREL, Python or Jython, or Clojure expression to modify how the facet works. A custom numeric facet operates just like a [numeric facet](#numeric-facet) by default. +If you would like to build your own version of a numeric facet, you can use the Custom Numeric Facet option. Clicking on FacetCustom Numeric Facet… will bring up an [expressions](expressions) window where you can enter in a GREL, Jython, or Clojure expression to modify how the facet works. A custom numeric facet operates just like a [numeric facet](#numeric-facet) by default. For example, you may wish to create a numeric facet that rounds your value to the nearest integer, enter -```round(value)``` +``` +round(value) +``` If you have two columns of numbers and for each row you wish to create a numeric facet only on the larger of the two, enter -```max(cells["Column1"].value, cells[“Column2”].value)``` +``` +max(cells["Column1"].value, cells["Column2"].value) +``` If the numeric values in a column are drawn from a power law distribution, then it's better to group them by their logs: -```value.log()``` +``` +value.log() +``` If the values are periodic you could take the modulus by the period to understand if there's a pattern: -```mod(value, 7)``` +``` +mod(value, 7) +``` You can learn more about numeric-modification functions on the [Expressions page](expressions). 
@@ -166,13 +192,15 @@ You can learn more about numeric-modification functions on the [Expressions page Customized facets have been added to expand the number of default facets users can apply with a single click. They represent some common and useful functions you shouldn’t have to work out using an [expression](expressions). -All facets that display in the “Facet/Filter” sidebar can be edited by clicking on the “change” button to the right of the column title. This brings up the expressions window that will allow you to modify and preview the expression being used. +All facets that display in the Facet/Filter tab can be edited by clicking on the “change” button to the right of the column title. This brings up the expressions window that will allow you to modify and preview the expression being used. ### Word facet -Word facet is a simple version of a text facet: it splits up the content of the cells based on spaces, and outputs each character string as a facet: +A Word facet is a simple version of a text facet: it splits up the content of the cells based on spaces, and outputs each character string as a facet: -```value.split(" ")``` +``` +value.split(" ") +``` This can be useful for exploring the language used in a corpus, looking for common first and last names or titles, or seeing what’s in multi-valued cells you don’t wish to split up. @@ -180,19 +208,21 @@ Word facet is case-sensitive and only splits by spaces, not by line breaks or ot ### Duplicates facet -A duplicates facet will return only rows that have non-unique values in the column you’ve selected. It will create a facet of “true” and “false” values - true being cells that are not unique, and “false” being cells that are. The actual expression being used is +A Duplicates facet will return only rows that have non-unique values in the column you’ve selected. It will create a facet of “true” and “false” values - true being cells that are not unique, and “false” being cells that are. 
The actual expression being used is -```facetCount(value, 'value', '[Column]') > 1``` +``` +facetCount(value, 'value', '[Column]') > 1 +``` Duplicates facets are case-sensitive and you may wish to filter out things like leading and trailing whitespace or other hard-to-see issues. You can modify the facet expression, for example, with: -```facetCount(trim(toLowercase(value)), 'trim(toLowercase(value))', 'cityLabel') > 1``` +``` +facetCount(trim(toLowercase(value)), 'trim(toLowercase(value))', 'cityLabel') > 1 +``` ### Numeric log facet -Logarithmic scales reduce wide-ranging quantities to more compact and manageable ranges. A log transformation can be used to make highly skewed distributions less skewed. If your numerical data is unevenly distributed (say, lots of values in one range, and then a long tail extending off into different magnitudes), a numeric log facet can represent that range better than a simple numeric facet. It will break these values down into more navigable segments than the buckets of a numeric facet. This facet can make patterns in your data more visible. - -OpenRefine uses a base-10 log, the "common logarithm." +Logarithmic scales reduce wide-ranging quantities to more compact and manageable ranges. A log transformation can be used to make highly skewed distributions less skewed. If your numerical data is unevenly distributed (say, lots of values in one range, and then a long tail extending off into different magnitudes), a Numeric log facet can represent that range better than a simple numeric facet. It will break these values down into more navigable segments than the buckets of a numeric facet. This facet can make patterns in your data more visible. 
OpenRefine uses a base-10 log, the “common logarithm.” For example, we can look at [this data about the body weight of various mammals](http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Brain2BodyWeight): @@ -220,13 +250,15 @@ A 1-bounded numeric log facet can be used if you'd like to exclude all the value ### Text-length facet -The text-length facet returns a numerical value for each cell and plots it on a numeric facet chart. The expression used is +The Text-length facet returns a numerical value for each cell and plots it on a numeric facet chart. The expression used is -```value.length()``` +``` +value.length() +``` -This can be useful to, for example, look for values that did not successfully split on an earlier split operation, or to validate that data is a certain expected length (such as whether a date, as YYYY/MM/DD, is eight to ten characters). +This can be useful to, for example, look for values that did not successfully split on an earlier split operation, or to validate that data is a certain expected length (such as whether a date in YYYY/MM/DD is eight to ten characters). -You can also employ a log of text-length facet that allows you to navigate more easily a wide range of string lengths. This can be useful in the case of web-scraping, where lots of textual data is loaded into single cells and needs to be parsed out. +You can also employ a Log of text-length facet that allows you to navigate more easily a wide range of string lengths. This can be useful in the case of web-scraping, where lots of textual data is loaded into single cells and needs to be parsed out. ### Unicode character-code facet @@ -243,19 +275,23 @@ An error is a data type created by OpenRefine in the process of transforming dat ![A view of the expressions window with an error converting a string to a number.](/img/error.png) -To store errors in cells, ensure that you have “store error” selected for the “On error” option in the expressions window. 
+To store errors in cells, ensure that you have store error selected for the “On error” option in the expressions window. ### Facet by null, empty, or blank -Any column can be faceted for [null and/or empty cells](#cell-data-types). These can help you find cells where you want to manually enter content. “Blank” means both null values and empty values. All three facets will generate “true” and “false” facets, “true” being blank. +Any column can be faceted for [null and/or empty cells](#cell-data-types). These can help you find cells where you want to manually enter content. -An empty cell is a cell that is set to contain a string, but doesn’t have any characters in it (a zero-length string). This can be a leftover from an operation that removed characters, or from manually editing a cell and deleting its contents. +“Blank” means both null values and empty values. All three facets will generate “true” and “false” facets, “true” being blank. + +An empty cell is a cell that is set to contain a string, but doesn’t have any characters in it (a zero-length string). This can be left over from an operation that removed characters, or from manually editing a cell and deleting its contents. ### Facet by star or flag Stars and flags offer you the opportunity to mark specific rows for yourself for later focus. Stars and flags persist through closing and opening your project, and thus can provide a different function than using a permalink to persist your facets. Stars and flags can be used in any way you want, although they are designed to help you flag errors and star rows of particular importance. -You can manually star or flag rows simply by clicking on the icons to the left of each row. You can also apply stars or flags to all matching rows by using the “All” dropdown menu and selecting “Edit rows” → “Star rows” or “Flag rows.” These operations will modify all matching rows in your current subset. You can unstar or unflag them as well. 
+You can manually star or flag rows simply by clicking on the icons to the left of each row. + +You can also apply stars or flags to all matching rows by using the All dropdown menu (on the first column) and selecting Edit rows → Star rows or Flag rows. This will create “true” and “false” facets in the Facet/Filter tab. These operations will modify all matching rows in your current subset. You can unstar or unflag them as well. You may wish to create a custom subset of your data through a series of separate faceting activities (rather than successively narrowing down with multiple facets applied). For example, you may wish to: * apply a facet @@ -266,21 +302,21 @@ You may wish to create a custom subset of your data through a series of separate * remove that facet * and then work with all of the cumulative starred rows. -You can use the dropdown menu on the “All” column and selecting “Facet by star” or “Facet by flag.” This will create “true” and “false” facets in the facet sidebar. - You can also create a text facet on any column with the expression `row.starred` or `row.flagged`. ## Text filter Filters allow you to narrow down your data based on whether a given column includes a text string. -When you choose “Text filter” a box appears in the “Facet/Filter” sidebar that allows you to enter in text. Matching rows will narrow dynamically with every character you enter. You can set the search to be case-sensitive or not, and you can use this box to enter in a regular expression. +When you choose Text filter, a box appears in the Facet/Filter tab that allows you to enter in text. Matching rows will narrow dynamically with every character you enter. You can set the search to be case-sensitive or not, and you can use this box to enter in a regular expression. -For example, you can enter in "side" as a text filter, and it will return all cells in that column containing "side," "sideways," "offside," etc.
+For example, you can enter in “side” as a text filter, and it will return all cells in that column containing “side,” “sideways,” “offside,” etc. -The text filter field supports [Java's regular expression language](http://download.oracle.com/javase/tutorial/essential/regex/). For example, you can employ a regular expression to view all properly-formatted emails: +The text filter field supports [regular expressions](expressions#regular-expressions). For example, you can employ a regular expression to view all properly-formatted emails: -```([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9\-\.]+)\.([a-zA-Z0-9\-]{2,15})``` +``` +([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9\-\.]+)\.([a-zA-Z0-9\-]{2,15}) +``` You can press “invert” on this facet to then see blank cells or invalid email addresses. diff --git a/docs/docs/manual/grelfunctions.md b/docs/docs/manual/grelfunctions.md index 4de5425ce..22393b45d 100644 --- a/docs/docs/manual/grelfunctions.md +++ b/docs/docs/manual/grelfunctions.md @@ -182,7 +182,7 @@ Returns the array of strings obtained by splitting s into substrings with the gi Returns the array of strings obtained by splitting s by sep, or by guessing either tab or comma separation if there is no sep given. Handles quotes properly and understands cancelled characters. The separator can be either a string or a regex pattern. For example, `value.smartSplit("\n")` will split at a carriage return or a new-line character. -Note: `value.[escape](#escapes-s-mode)('javascript')` is useful for previewing unprintable characters prior to using smartSplit(). +Note: [`value.escape('javascript')`](#escapes-s-mode) is useful for previewing unprintable characters prior to using smartSplit(). 
###### splitByCharType(s) diff --git a/docs/docs/manual/installing.md b/docs/docs/manual/installing.md index 88f766eb8..adda2313c 100644 --- a/docs/docs/manual/installing.md +++ b/docs/docs/manual/installing.md @@ -45,7 +45,7 @@ For the absolute latest development updates, see the [snapshot releases](https:/ #### What’s changed -Our [latest version is OpenRefine 3.4.1](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1), released September 24th 2020. The major changes in this version are listed on the [3.4 release page](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1) with the downloadable packages. +Our [latest version is OpenRefine 3.4.1](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1), released September 24th 2020. The major changes in this version are listed on the [3.4.1 release page](https://github.com/OpenRefine/OpenRefine/releases/tag/3.4.1) with the downloadable packages. You can find information about all OpenRefine versions on the [Releases page on Github](https://github.com/OpenRefine/OpenRefine/releases). @@ -70,7 +70,7 @@ Take note of the [extensions](#installing-extensions) you have currently install [Java Development Kit (JDK)](https://jdk.java.net/) is required to run OpenRefine and should be installed first. [OpenRefine installation packages for Mac and Windows come bundled with JDK](https://openrefine.org/download.html), so you do not need to install it separately if you use those bundles. -There are JDK packages for Mac, Windows, and Linux. We recommend you install the latest “Ready for use” version: at the time of writing, this is [JDK 14.0.1](https://jdk.java.net/14/). +There are JDK packages for Mac, Windows, and Linux. We recommend you install the latest “Ready for use” version. At the time of writing, this is [JDK 14.0.1](https://jdk.java.net/14/). Download the archive (either a `.tar.gz` or a `.zip`) to your computer and then extract its contents to a location of your choice. 
There is no installation process, so you may wish to extract this folder directly into a place where you put program files, or another stable folder. @@ -91,16 +91,16 @@ import TabItem from '@theme/TabItem'; -1. On Windows 10, click the Windows start menu button, type “env,” and look at the search results. Click “Edit the system environment variables.” (If you are using an earlier version of Windows, use the “Search” or “Search programs and files” box in the start menu.) +1. On Windows 10, click the Start Menu button, type `env`, and look at the search results. Click Edit the system environment variables. (If you are using an earlier version of Windows, use the “Search” or “Search programs and files” box in the Start Menu.) ![A screenshot of the search results for 'env'.](/img/env.png "A screenshot of the search results for 'env'.") -2. Click “Environment Variables…” at the bottom of the “Advanced” window that appears. -3. In the “Environment Variables” dialog that appears, click “New…” and create a variable with the key `JAVA_HOME`. You can set the variable for only your user account, as in the screenshot below, or set it as a system variable - it will work either way. +2. Click Environment Variables… at the bottom of the Advanced window. +3. In the Environment Variables window that appears, click New… and create a variable with the key `JAVA_HOME`. You can set the variable for only your user account, as in the screenshot below, or set it as a system variable - it will work either way. ![A screenshot of 'Environment Variables'.](/img/javahome.png "A screenshot of 'Environment Variables'.") -4. Set the `Value` to the folder where you installed JDK, in the format `D:\Programs\OpenJDK`. You can locate this folder with the “Browse directory...” button. +4. Set the `Value` to the folder where you installed JDK, in the format `D:\Programs\OpenJDK`. You can locate this folder with the Browse directory... button. @@ -174,7 +174,7 @@ Save and close the file. 
When you are back in the terminal, type source /etc/environment ``` -Exit the terminal and restart your system. You can then check that JAVA_HOME is set properly by opening another terminal and typing +Exit the terminal and restart your system. You can then check that `JAVA_HOME` is set properly by opening another terminal and typing ``` echo $JAVA_HOME ``` @@ -208,7 +208,9 @@ If you have extensions installed, do not delete the `webapp\extensions` folder w -Once you have downloaded the `.zip` file, extract it into a folder where you wish to store program files (such as `D:\Program Files\OpenRefine`). You can right-click on `openrefine.exe` or `refine.bat` and pin one of those programs to your Start Menu or create shortcuts for easier access. +Once you have downloaded the `.zip` file, extract it into a folder where you wish to store program files (such as `D:\Program Files\OpenRefine`). + +You can right-click on `openrefine.exe` or `refine.bat` and pin one of those programs to your Start Menu or create shortcuts for easier access. @@ -311,7 +313,7 @@ tar xzf openrefine-linux-3.4.tar.gz ### Set where data is stored -OpenRefine stores data in two places: program files in the program directory, wherever it is you’ve installed it; and project files in what we call the “workspace directory.” You can access this folder easily from OpenRefine by going to the [home screen](running#the-home-screen) (at [http://127.0.0.1:3333/](http://127.0.0.1:3333/)) and clicking “Browse workspace directory.” +OpenRefine stores data in two places: program files in the program directory, wherever it is you’ve installed it; and project files in what we call the “workspace directory.” You can access this folder easily from OpenRefine by going to the [home screen](running#the-home-screen) (at [http://127.0.0.1:3333/](http://127.0.0.1:3333/)) and clicking Browse workspace directory. By default this is: @@ -359,7 +361,7 @@ If the folder does not exist, OpenRefine will create it. 
~/Library/Application Support/OpenRefine/ ``` -For older versions as Google Refine: +For older versions, as Google Refine: ``` ~/Library/Application Support/Google/Refine/ @@ -418,7 +420,7 @@ You can access OpenRefine server logs from the terminal on Mac: ## Increasing memory allocation -OpenRefine relies on having computer memory available to it to work effectively. If you are planning to work with large data sets, you may wish to set up OpenRefine to handle it at the outset. By “large” we generally mean one of the following indicators: +OpenRefine relies on having computer memory available to it to work effectively. If you are planning to work with large datasets, you may wish to set up OpenRefine to handle it at the outset. By “large” we generally mean one of the following indicators: * more than one million total cells * an input file size of more than 50 megabytes (MB) * more than 50 [rows per record in records mode](running#records-mode) @@ -430,7 +432,7 @@ A good practice is to start with no more than 50% of whatever memory is left ove All of the settings below use a four-digit number to specify the megabytes (MB) used (actually [mebibytes](https://en.wikipedia.org/wiki/Mebibyte)). The default is usually 1024MB, but the new value doesn't need to be a multiple of 1024. :::info Dealing with large datasets -If your project is big enough to need more than the default amount of memory, consider turning off “Parse cell text into numbers, dates, ...” on import. It's convenient, but less efficient than explicitly converting any columns that you need as a data type other than the default “string” type. +If your project is big enough to need more than the default amount of memory, consider turning off Parse cell text into numbers, dates, ... on import. It's convenient, but less efficient than explicitly converting any columns that you need as a data type other than the default “string” type. 
::: Open Project in the sidebar @@ -540,7 +542,7 @@ If you want to install the extension into your workspace, you can: * A file-explorer or finder window will open in your workspace * Create a new folder called “extensions” inside the workspace if it does not exist. -You can also [find your workspace on each operating system using these instructions](installing#set-where-data-is-stored). +You can also [find your workspace on each operating system using these instructions](#set-where-data-is-stored). ### Install the extension @@ -551,7 +553,7 @@ Some extensions may have multiple versions, to match OpenRefine versions, so be Generally, the installation process will be: * Download the extension (usually as a zip file from GitHub) -* Extract the zip contents into the `extensions` directory, making sure all the contents go into one folder with the name of the extension +* Extract the zip contents into the “extensions” directory, making sure all the contents go into one folder with the name of the extension * Start (or restart) OpenRefine. -To confirm that installation was a success, follow the instructions provided by the extension. Each extension will appear in its own way inside the OpenRefine interface: make sure you read the documentation to know where the functionality will appear, such as under specific dropdown menus. \ No newline at end of file +To confirm that installation was a success, follow the instructions provided by the extension. Each extension will appear in its own way inside the OpenRefine interface. Make sure you read its documentation to know where the functionality will appear, such as under specific dropdown menus. 
\ No newline at end of file diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md index 3503e1bae..97877be44 100644 --- a/docs/docs/manual/running.md +++ b/docs/docs/manual/running.md @@ -1,4 +1,4 @@ ---- +--- id: running title: Running OpenRefine sidebar_label: Running @@ -8,9 +8,9 @@ sidebar_label: Running OpenRefine does not require internet access to run its basic functions. Once you download and install it, it runs as a small web server on your own computer, and you access that local web server by using your browser. -You will see a command line window open when you run OpenRefine. Leave that window alone while you work on datasets in your browser. +You will see a command line window open when you run OpenRefine. Ignore that window while you work on datasets in your browser. -No matter how you load OpenRefine, it will load in your computer’s default browser. If you would like to use another browser instead, start OpenRefine and then point your chosen browser at the home screen: http://127.0.0.1:3333/. +No matter how you start OpenRefine, it will load its interface in your computer’s default browser. If you would like to use another browser instead, start OpenRefine and then point your chosen browser at the home screen: [http://127.0.0.1:3333/](http://127.0.0.1:3333/). OpenRefine works best on browsers based on Webkit, such as: * Google Chrome @@ -20,7 +20,7 @@ OpenRefine works best on browsers based on Webkit, such as: We are aware of some minor rendering and performance issues on other browsers such as Firefox. We don't support Internet Explorer. -You can launch multiple projects at the same time by simply having multiple tabs or browser windows open. From the Open Project screen, you can right-click on project names and open them in new tabs or windows. +You can view and work on multiple projects at the same time by simply having multiple tabs or browser windows open. 
From the Open Project screen, you can right-click on project names and open them in new tabs or windows. import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; @@ -37,21 +37,28 @@ import TabItem from '@theme/TabItem'; -To exit OpenRefine, close all the browser tabs or windows, then navigate to the command line window. To close this window and ensure OpenRefine exits properly, hold down `Control` and press `C` on your keyboard. This will save any last changes to your projects. - #### With openrefine.exe -You can run OpenRefine by double-clicking `openrefine.exe` or calling it from the command line. If you want to [modify the way `openrefine.exe` opens](#starting-with-modifications), you can edit the `openrefine.l4j.ini` file. +You can run OpenRefine by double-clicking `openrefine.exe` or calling it from the command line. + +If you want to [modify the way `openrefine.exe` opens](#starting-with-modifications), you can edit the `openrefine.l4j.ini` file. #### With refine.bat On Windows, OpenRefine can also be run by using the file `refine.bat` in the program directory. If you start OpenRefine using `refine.bat`, you can do so by opening the file itself, or by calling it from the command line. -If you call `refine.bat` from the command line, you can [start OpenRefine with modifications](#starting-with-modifications). If you want to modify the way `refine.bat` opens through double-clicking or using a shortcut, you can edit the `refine.ini` file. +If you call `refine.bat` from the command line, you can [start OpenRefine with modifications](#starting-with-modifications). +If you want to modify the way `refine.bat` opens through double-clicking or using a shortcut, you can edit the `refine.ini` file. + +#### Exiting + +To exit OpenRefine, close all the browser tabs or windows, then navigate to the command line window. To close this window and ensure OpenRefine exits properly, hold down `Control` and press `C` on your keyboard. 
This will save any last changes to your projects. -You can find OpenRefine in your Applications folder, or you can call it from the command line. To exit, close all your OpenRefine browser tabs, go back to the terminal window and press `Command` and `Q` to close it down. +You can find OpenRefine in your Applications folder, or you can call it from the command line with `./refine`. + +To exit, close all your OpenRefine browser tabs, go back to the terminal window and press `Command` and `Q` to close it down. :::caution Problems starting? If you are using an older version of OpenRefine or are on an older version of MacOS, [check our Wiki for solutions to problems with MacOS](https://github.com/OpenRefine/OpenRefine/wiki/Installation-Instructions#macos). @@ -64,7 +71,7 @@ If you are using an older version of OpenRefine or are on an older version of Ma Use a terminal to launch OpenRefine. First, navigate to the installation folder. Then call the program: ``` -cd openrefine-3.4 +cd openrefine-3.4.1 ./refine ``` @@ -134,9 +141,9 @@ To see the full list of command-line options, run `./refine -h`. |-m|Memory maximum heap|./refine -m 6000M| |-p|Port|./refine -p 3334| |-i|Interface (IP address, or IP and port)|./refine -i 127.0.0.2:3334| -|-k|Add a Google API key|_need an example_| -|-v|Verbosity (from low to high)|error,warn,info,debug,trace| -|-x|Additional configuration parameters|_need an example_| +|-k|Add a Google API key|./refine -k YOUR_API_KEY| +|-v|Verbosity (from low to high: error,warn,info,debug,trace)|./refine -v info| +|-x|Additional configuration parameters|| |--debug|Enable debugging (on port 8000)|./refine --debug| |--jmx|Enable JMX monitoring for Jconsole and JvisualVM|./refine --jmx| @@ -153,9 +160,9 @@ To see the full list of command-line options, run `./refine -h`. 
|-m|Memory maximum heap|./refine -m 6000M| |-p|Port|./refine -p 3334| |-i|Interface (IP address, or IP and port)|./refine -i 127.0.0.2:3334| -|-k|Add a Google API key|_need an example_| -|-v|Verbosity (from low to high)|error,warn,info,debug,trace| -|-x|Additional configuration parameters|_need an example_| +|-k|Add a Google API key|./refine -k YOUR_API_KEY| +|-v|Verbosity (from low to high: error,warn,info,debug,trace)|./refine -v info| +|-x|Additional configuration parameters|| |--debug|Enable debugging (on port 8000)|./refine --debug| |--jmx|Enable JMX monitoring for Jconsole and JvisualVM|./refine --jmx| @@ -168,7 +175,9 @@ To see the full list of command-line options, run `./refine -h`. #### Modifications set within files On Windows, you can modify the way `openrefine.exe` runs by editing `openrefine.l4j.ini`; you can modify the way `refine.bat` runs by editing `refine.ini`. -You can modify the Mac application by editing `Info.plist`. + +You can modify the Mac application by editing `info.plist`. + On Linux, you can edit `refine.ini`. Some settings, such as changing memory allocations, are already set inside these files, and all you have to do is change the values. Some lines need to be un-commented to work. @@ -189,16 +198,18 @@ REFINE_MIN_MEMORY=1400M ... ``` -Further modifications can be performed by using JVM preferences. +##### JVM preferences -These JVM preferences are different options and have different syntax than the key/value descriptions used on the command line. Some of the most common keys (with their defaults) are: -* -Drefine.autosave (5 [minutes]) -* -Drefine.data_dir (/) -* -Drefine.development (false) -* -Drefine.headless (false) -* -Drefine.host (127.0.0.1) -* -Drefine.port (3333) -* -Drefine.webapp (main/webapp) +Further modifications can be performed by using JVM preferences. These JVM preferences are different options and have different syntax than the key/value descriptions used on the command line. 
+ +Some of the most common keys (with their defaults) are: +* The project [autosave](starting#autosaving) frequency: `-Drefine.autosave` (5 [minutes]) +* The workspace directory: `-Drefine.data_dir` (/) +* Development mode: `-Drefine.development` (false) +* Headless mode: `-Drefine.headless` (false) +* IP: `-Drefine.host` (127.0.0.1) +* Port: `-Drefine.port` (3333) +* The application folder: `-Drefine.webapp` (main/webapp) The syntax is as follows: @@ -214,7 +225,7 @@ The syntax is as follows: -Inside the `refine.l4j.ini` file, insert lines in this way: +Locate the `refine.l4j.ini` file, and insert lines in this way: ``` -Drefine.port=3334 @@ -232,9 +243,11 @@ JAVA_OPTIONS=-Drefine.data_dir=C:\Users\user\Documents\OpenRefine\ -Drefine.port -Find the 'array' element that follows the line: +Locate the `info.plist`, and find the `array` element that follows the line -`JVMOptions` +``` +JVMOptions +``` Typically this looks something like: @@ -248,7 +261,7 @@ Typically this looks something like: ``` -Add in values like: +Add in values such as: ``` JVMOptions @@ -267,7 +280,7 @@ Add in values like: -In `refine.ini`, add `JAVA_OPTIONS=` before the `-Drefine.preference` declaration. You can un-comment and edit the existing suggested lines, or add lines: +Locate the `refine.ini` file, and add `JAVA_OPTIONS=` before the `-Drefine.preference` declaration. You can un-comment and edit the existing suggested lines, or add lines: ``` JAVA_OPTIONS=-Drefine.autosave=2 @@ -288,9 +301,11 @@ Refer to the [official Java documentation](https://docs.oracle.com/javase/8/docs When you first launch OpenRefine, you will see a screen with a menu on the left hand side that includes Create Project, Open Project, Import Project, and Language Settings. This is called the “home screen,” where you can manage your projects and general settings. +In the lower left-hand corner of the screen, you'll see Preferences, Help, and About.
+ ### Language settings -You can set your preferred interface language here. This language setting will persist until you change it again in the future. Languages are translated as a community effort; some languages are partially complete and default back to English where unfinished. Currently OpenRefine supports the following languages for 75% or more of the interface: +From the home screen, look in the options to the left for Language Settings. You can set your preferred interface language here. This language setting will persist until you change it again in the future. Languages are translated as a community effort; some languages are partially complete and default back to English where unfinished. Currently OpenRefine supports the following languages for 75% or more of the interface: * Cebuano * German @@ -307,13 +322,15 @@ You can set your preferred interface language here. This language setting will p * Tagalog * Chinese (简体中文) +To leave the Language Settings screen, click on the diamond “OpenRefine” logo. + :::info We use Weblate to provide translations for the interface. You can check [our profile on Weblate](https://hosted.weblate.org/projects/openrefine/translations/) to see which languages are in the process of being supported. See [our technical reference if you are interested in contributing translation work](https://docs.openrefine.org/technical-reference/translating) to make OpenRefine accessible to people in other languages. ::: ### Preferences -At this time you can set preferences using a key/value pair: that is, selecting one of the keys below and setting a value for it. +In the bottom left corner of the screen, look for Preferences. At this time you can set preferences using a key/value pair: that is, selecting one of the keys below and setting a value for it. 
|Setting|Key|Value syntax|Default|Example| |---|---|---|---|---| @@ -337,7 +354,7 @@ The project screen (or work screen) is where you will spend most of your time on The project bar runs across the very top of the project screen. It contains the the OpenRefine logo, the project title, and the project control buttons on the right side. -At any time you can close your current project and go back to the home screen by clicking on the OpenRefine logo. If you’d like to open another project in a new browser tab or window, you can right-click on the logo and use “Open in a new tab.” You will lose your current facets and view settings if you close your project (but data transformations will be saved in the [History](#history-undoredo) of the project). +At any time you can close your current project and go back to the home screen by clicking on the OpenRefine logo. If you’d like to open another project in a new browser tab or window, you can right-click on the logo and use “Open in a new tab.” You will lose [your current facets and view settings](#facetfilter) if you close your project (but data transformations will be saved in the [History](#history-undoredo) of the project). :::caution Don’t click the “back” button on your browser - it will likely close your current project and you will lose your facets and view settings. @@ -345,21 +362,21 @@ Don’t click the “back” button on your browser - it will likely close your You can rename a project at any time by clicking inside the project title, which will turn into a text field. Project names don’t have to be unique, as OpenRefine organizes them based on a unique identifier behind the scenes. -The Permalink allows you to return to a project at a specific view state - that is, with facets and filters applied. The permalink can help you pick up where you left off if you have to close your project while working with facets and filters. 
It puts view-specific information directly into the URL: clicking on it will load this current-view URL in the existing tab. You can right-click and copy the Permalink URL to copy the current view state to your clipboard, without refreshing the tab you’re using. +The Permalink allows you to return to a project at a specific view state - that is, with [facets and filters](facets) applied. The Permalink can help you pick up where you left off if you have to close your project while working with facets and filters. It puts view-specific information directly into the URL: clicking on it will load this current-view URL in the existing tab. You can right-click and copy the Permalink URL to copy the current view state to your clipboard, without refreshing the tab you’re using. The Open… button will open up a new browser tab showing the Create Project screen. From here you can change settings, start a new project, or open an existing project. -Export is a dropdown menu that allows you to pick a format for exporting your current dataset. It will only export rows and records that are currently visible - the currently selected facets and filters, not the total data in the project. +Export is a dropdown menu that allows you to pick a format for exporting a dataset. Many of the export options will only export rows and records that are currently visible - the currently selected facets and filters, not the total data in the project. Help will open up a new browser tab and bring you to this user manual on the web. ### The grid header -The grid header sits below the project bar and above the project grid (the data of your project). The grid header will tell you the total number of rows or records in your project, and indicate whether you are in rows or records mode. +The grid header sits below the project bar and above the project grid (where the data of your project is displayed). 
The grid header will tell you the total number of rows or records in your project, and indicate whether you are in [rows or records mode](exploring#rows-vs-records). It will also tell you if you’re currently looking at a select number of rows via facets or filtering, rather than the entire dataset, by displaying either, for example, “180 rows” or “67 matching rows (180 total).” -Directly below the row number, you have the ability to switch between [row mode and records mode](exploring#rows-vs-records). OpenRefine stores which projects are in records mode, and displays your data as records by default if you are. +Directly below the row number, you have the ability to switch between [row mode and records mode](exploring#rows-vs-records). OpenRefine stores projects persistently in one of the two modes, and displays your data as records by default if you are. To the right of the rows/records selection is the array of options for how many rows/records to view on screen at one time. At the far right of the screen you can navigate through your entire dataset one page at a time. @@ -369,13 +386,13 @@ The Extensions dropdown offers you options for ex ### The grid -The area of the project screen that displays your dataset is called the “project grid” (or the “data grid,” or simply the “grid”). The grid presents data in a tabular format, which may look like a normal spreadsheet program to you. +The area of the project screen that displays your dataset is called the “grid” (or the “data grid,” or the “project grid”). The grid presents data in a tabular format, which may look like a normal spreadsheet program to you. Columns widths are automatically set based on their contents; some column headers may be cut off, but can be viewed by mousing over the headers. In each column header you will see a small arrow. Clicking on this arrow brings up a dropdown menu containing column-specific data exploration and transformation options. 
You will learn about each of these options in the [Exploring data](exploring) and [Transforming data](transforming) sections. -The first column in every project will always be “All,” which contains options to flag, star, and do non-column-specific operations. The “All” column is also where rows/records are numbered. +The first column in every project will always be All, which contains options to flag, star, and do non-column-specific operations. The All column is also where rows/records are numbered. Numbering shows the permanent order of rows and records; a temporary sorting or facet may reorder the rows or show a limited set, but numbering will show you the original identifiers unless you make a permanent change. The project grid may display with both vertical and horizontal scrolling, depending on the number and width of columns, and the number of rows/records displayed. You can control the display of the project grid by using [Sort and View options](exploring#sort-and-view). @@ -383,17 +400,19 @@ Mousing over individual cells will allow you to [edit cells individually](celled ### Facet/Filter -The Facet/Filter tab is one of the main ways of exploring your data: displaying the patterns and trends in your data, and helping you narrow your focus and modify that data. [Facets](facets) and [filters](facets#text-filter) are explained more in [Exploring data](exploring). +The Facet/Filter tab is one of the main ways of exploring your data: displaying the patterns and trends in your data, and helping you narrow your focus and modify that data. [Facets](facets) and [filters](facets#text-filter) are explained more in [Exploring data](exploring). ![A screenshot of facets and filters in action.](/img/facetfilter.png) -In the interface, you will see three buttons: Refresh, Reset all, and Remove all. Refreshing your facets will ensure you are looking at the latest information about each facet, if you have changed the counts or eliminated some options, for example. 
+In the tab, you will see three buttons: Refresh, Reset all, and Remove all. -Resetting your facets will remove any inclusion or exclusion you may have set - the facet options will stay in the sidebar, but your view settings will be reset. +Refreshing your facets will ensure you are looking at the latest information about each facet, for example if you have changed the counts or eliminated some options. -Removing your facets will clear out the sidebar entirely. If you have written custom facets using expressions, these will be lost. +Resetting your facets will remove any inclusion or exclusion you may have set - the facet options will stay in the sidebar, but your view settings will be undone. -You can preserve your facets and filters for future use by copying a [Permalink](#the-project-bar). +Removing your facets will clear out the sidebar entirely. If you have written custom facets using [expressions](expressions), these will be lost. + +You can preserve your facets and filters for future use by copying a [Permalink](#the-project-bar). ### History (Undo/Redo) @@ -403,7 +422,7 @@ Project history gets saved when you export a project archive, and restored when ![A screenshot of the History (Undo/Redo) tab with 13 steps.](/img/history.png "A screenshot of the History (Undo/Redo) tab with 13 steps.") -When you click on Undo / Redo in the sidebar of any project, that project’s history is shown as a list of changes in order, with the first “change” being the action of creating the project itself. (That first change, indexed as step zero, cannot be undone.) Here is a sample history with 3 changes: +When you click on the Undo / Redo tab in the sidebar of any project, that project’s history is shown as a list of changes in order, with the first “change” being the action of creating the project itself. (That first change, indexed as step zero, cannot be undone.) Here is a sample history with 3 changes: ``` 0. 
Create project @@ -420,15 +439,15 @@ In this example, changes #2 and #3 will now be grayed out. You can redo a change If you have moved back one or more states, and then you perform a new operation on your data, the later actions (everything that’s greyed out) will be erased and cannot be re-applied. -The Undo/Redo tab will show you which step you’re on, and if you’re about to risk erasing work - by saying something like “4/5" or “1/7” at the end. +The Undo/Redo tab will indicate which step you’re on, and if you’re about to risk erasing work - by saying something like “4/5” or “1/7” at the end. #### Reusing operations Operations that you perform in OpenRefine can be reused. For example, a formula you wrote inside one project can be copied and applied to another project later. -To reuse one or more operations, you first extract it from the project where it was first applied. Click to the Undo/Redo tab and click Extract…. This brings up a box that lists all operations up to the current state (it does not show undone operations). Select the operation or operations you want to extract using the checkboxes on the left, and they will be encoded as JSON on the right. Copy that JSON off to the clipboard. +To reuse one or more operations, first extract them from the project where they were first applied. Click the Undo/Redo tab and click Extract…. This brings up a box that lists all operations up to the current state (it does not show undone operations). Select the operation or operations you want to extract using the checkboxes on the left, and they will be encoded as JSON on the right. Copy that JSON to the clipboard. -Move to the second project, go to the Undo/Redo tab, click Apply… and paste in that JSON. +Move to the second project, go to the Undo/Redo tab, click Apply… and paste in that JSON. Not all operations can be extracted. Edits to a single cell, for example, can’t be replicated.
@@ -477,7 +496,6 @@ Some users may wish to employ OpenRefine for batch processing as part of a large The following are all third-party extensions and code; the OpenRefine team does not maintain them and cannot guarantee that any of them work. ::: - Some examples: * This project allows OpenRefine to be run from the command line using [operations saved in a JSON file](running#reusing-operations): [OpenRefine batch processing](https://github.com/opencultureconsulting/openrefine-batch) @@ -485,6 +503,4 @@ Some examples: * And the same in Ruby: [Refine-Ruby](https://github.com/maxogden/refine-ruby) * Another Python client library, by Paul Makepeace: [OpenRefine Python Client Library](https://github.com/PaulMakepeace/refine-client-py) -To look for other instances, search our Google Groups [for users](https://groups.google.com/g/openrefine and [for developers](https://groups.google.com/g/openrefine-dev), where [these projects were originally posted](https://groups.google.com/g/openrefine/c/GfS1bfCBJow/m/qWYOZo3PKe4J). - - +To look for other instances, search our Google Groups [for users](https://groups.google.com/g/openrefine) and [for developers](https://groups.google.com/g/openrefine-dev), where [these projects were originally posted](https://groups.google.com/g/openrefine/c/GfS1bfCBJow/m/qWYOZo3PKe4J). \ No newline at end of file diff --git a/docs/docs/manual/sortview.md b/docs/docs/manual/sortview.md index 70f721acd..a6f3614f0 100644 --- a/docs/docs/manual/sortview.md +++ b/docs/docs/manual/sortview.md @@ -6,30 +6,30 @@ sidebar_label: Sort and view ## Sort -You can temporarily sort your rows by one column. You can sort: +You can temporarily sort your rows by one column. You can sort based on [data type](exploring#data-types): * text alphabetically or reverse * numbers by largest or smallest * dates by earliest or latest -* boolean values by false first or true first +* boolean values by false first or true first. 
-You can also choose where to place errors and blank cells in the sorting. Text can be case-sensitive or not: cells that start with lowercase characters will appear ahead of uppercase. +You can also choose where to place errors and blank cells in the sorting. Text can be case-sensitive or not: if so, cells that start with lowercase characters will appear ahead of those that start with uppercase characters. ![A screenshot of the Sort window.](/img/sort.png) -After you apply a sorting method, you can make it permanent, remove it, reverse it, or apply a subsequent sorting. You’ll find “Sort” in the project grid header to the right of the rows-display setting, which will show all current sorting settings. +After you apply a sorting method, you can make it permanent, remove it, reverse it, or apply a subsequent sorting. When it is applied, you’ll find Sort in the project grid header to the right of the rows-display setting, which will show all current sorting settings. -If you have multiple sorting methods applied, they will work in the order you applied them (represented in order in the "Sort" menu). For example, you can sort an "authors" column alphabetically, and then sort books by publication date, for those authors that have more than one book. If you apply those in a different order - sort all the publication dates in the dataset first, and then alphabetically by author - your dataset will look different. +If you have multiple sorting methods applied, they will work in the order you applied them (represented in order in the Sort menu). For example, you can sort an “authors” column alphabetically, and then sort their books by publication date, for those authors that have more than one book. If you apply those in a different order - sort all the publication dates in the dataset first, and then alphabetically by author - your dataset will look different. 
![Temporarily sorted rows.](/img/sort2.png) -When the sorting method you've applied is temporary, you will see that the rows retain their original numbering. When you make that sorting method permanent, by selecting "Reorder rows permanently," the row numbers will change and the "Sort" menu in the project grid header will disappear. This will apply all current sorting methods. +When the sorting method you've applied is temporary, you will see that the rows retain their original numbering. When you make that sorting method permanent, by selecting Reorder rows permanently, the row numbers will change and the Sort menu in the project grid header will disappear. This will apply all current sorting methods. ## View -You can control what data you view in the grid. On each column, you can “collapse” that specific column, all other columns, all columns to the left, and all columns to the right. Using the “All” column’s dropdown menu, you can collapse all columns, and expand all the columns that you previously collapsed. +You can control what data you view in the grid. On each column, you will see a View menu option. From there, you can “collapse” (hide) that specific column, all other columns, all columns to the left, and all columns to the right. Using the View option that appears in the All column’s dropdown menu, you can collapse all columns, and expand all the columns that you previously collapsed. ### Show/hide “null” -You can also use the “All” dropdown to show and hide [“null” values](#cell-data-types). A small grey “null” will appear in each applicable cell. +You can find, under All → View, the option to show and hide [“null” values](exploring#data-types). A small grey “null” will appear in each applicable cell. Remember that a null cell is not the same thing as an empty cell.
![A screenshot of what a null value looks like.](/img/null.png) diff --git a/docs/docs/manual/starting.md b/docs/docs/manual/starting.md index 642b18f85..795c380b6 100644 --- a/docs/docs/manual/starting.md +++ b/docs/docs/manual/starting.md @@ -1,4 +1,4 @@ ---- +--- id: starting title: Starting a project sidebar_label: Starting a project @@ -8,21 +8,21 @@ sidebar_label: Starting a project An OpenRefine project is started by importing in some existing data - OpenRefine doesn’t allow you to create a dataset from nothing. -No matter where your data comes from, OpenRefine doesn’t modify your original data source. It copies all the information from your input, creates its own project file, and stores it in your [workspace directory](installing#set-where-data-is-stored). +No matter where your data comes from, OpenRefine won’t modify your original data source. It copies all the information from your input, creates its own project file, and stores it in your [workspace directory](installing#set-where-data-is-stored). The data and all of your edits are [automatically saved](#autosaving) inside the project file. When you’re finished modifying the data, you can [export it back out](exporting) into the file format of your choice. You can also receive and open other people’s projects, or send them yours, by [exporting a project archive](exporting#export-a-project) and [importing it](#import-a-project). -## Create project by importing data +## Create a project by importing data When you start OpenRefine, you’ll be taken to the Create Project screen. You’ll see on the left side of the screen that your options are to: -* import data from a file on your computer -* import data from a link to the web +* import data from one or more files on your computer +* import data from one or more links on the web * import data by pasting in text from your clipboard * import data from a database (using SQL), and -* import Sheets from Google Drive. +* import one or more Sheets from Google Drive. 
From these sources, you can load any of the following file formats: @@ -40,7 +40,7 @@ From these sources, you can load any of the following file formats: More formats can be imported by [adding extensions to provide that functionality](https://openrefine.org/download.html). -If you supply two or more files for one project, the files’ rows will be loaded in the order that you specify, and OpenRefine will create a column at the beginning of the dataset with the source URL or file name in it to help you identify where each row came from. If the files have matching columns, the data will load in each column; if not, the successive files will append all of their new columns to the end of the dataset: +If you supply two or more files for one project, the files’ rows will be loaded in the order that you specify, and OpenRefine will create a column at the beginning of the dataset with the source URL or file name in it to help you identify where each row came from. If the files have columns with identical names, the data will load in those columns; if not, the successive files will append all of their new columns to the end of the dataset: |File|Fruit|Quantity|Berry|Berry source| |---|---|---|---|---| @@ -49,19 +49,19 @@ If you supply two or more files for one project, the files’ rows will be loade |berries.csv||9|Mulberry|Greece| |berries.csv||2|Blueberry|Canada| -You cannot combine two datasets into one project by appending data within rows. You can, however, combine two projects later using functions such as [cross()](grelfunctions/#crosscell-s-projectname-s-columnname). +You cannot combine two datasets into one project by appending data within rows. You can, however, combine two projects later using functions such as [cross()](grelfunctions/#crosscell-s-projectname-s-columnname), or [fetch further data](columnediting) using other methods. 
-For whichever method you choose, when you click Next >> you will be given a preview and a chance to configure the way OpenRefine interprets the file. +For whichever method you choose to start your project, when you click Next >> you will be given a preview and a chance to configure the way OpenRefine interprets the data you input. ### Get data from this computer -Click on Browse… and select a file on your hard drive. All files will be shown, not just compatible ones. +Click on Browse… and select a file (or several) on your hard drive. All files will be shown, not just compatible ones. If you import an archive file (something with the extension `.zip`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.gz`, or `.bz2`), OpenRefine detects the files inside it, shows you a preview screen, and allows you to select which ones to load. This does not work with `.rar` files. ### Web Addresses (URLs) -Type or paste the URL to the data file into the field provided. You can add as many fields as you want. OpenRefine will download the file and preview it for you. +Type or paste the URL to a data file into the field provided. You can add as many fields as you want. OpenRefine will download the file and preview the project for you. If you supply two or more file URLs, OpenRefine will identify each one and ask you to choose which (or all) to load. @@ -69,33 +69,33 @@ Do not use this form to load a Google Sheet by its link; use [the Google Data fo ### Clipboard -You can copy and paste in data from anywhere. OpenRefine will recognize comma-separated, tab-separated, or table-formatted information copied from sources such as word-processing documents, spreadsheets, and tables in PDFs. You can also just paste in a list of items that you want to turn into multi-column rows. OpenRefine recognizes each new text line as a row. +You can copy and paste in data from anywhere. 
OpenRefine will recognize comma-separated, tab-separated, or table-formatted information copied from sources such as word-processing documents, spreadsheets, and tables in PDFs. You can also just paste in a list of items that you want to turn into rows. OpenRefine recognizes each new text line as a row. This can be useful if you want to pre-select a specific number of rows from your source data, or paste together rows from different places, rather than delete unwanted rows later in the project interface. -This can also be useful if you would like to paste in a list of URLs, which you can use later to fetch the data online and build columns with. +This can also be useful if you would like to paste in a list of URLs, which you can use later to [fetch more data](columnediting). ### Database (SQL) If you are an administrator or have SQL access to a database of information, you may want to pull the latest dataset directly from there. This could include an online catalogue, a content management system, or a digital repository or collection management system. You can also load a database (`.db`) file saved locally. You will need to use an [SQL query](https://www.w3schools.com/sql/) to import your intended data. -There are some publicly-accessible databases you can query, such as [one provided by Rfam](https://docs.rfam.org/en/latest/database.html). The instructions provided by Rfam can help you understand how to connect to and query from any database. +There are some publicly-accessible databases you can query, such as [one provided by Rfam](https://docs.rfam.org/en/latest/database.html). The instructions provided by Rfam can help you understand how to connect to and query from other databases. -OpenRefine can connect to PostgreSQL, MySQL, MariaDB, and SQLite database systems. It will automatically populate the Port field based on which of these you choose, but you can manually edit this if needed.
+OpenRefine can connect to PostgreSQL, MySQL, MariaDB, and SQLite database systems. It will automatically populate the Port field based on which of these you choose, but you can manually edit this if needed. -If you have a `.db` file, you can supply the path to the file on your computer directly in the Database field at the bottom of the form. You can leave the rest of the fields blank. +If you have a `.db` file, you can supply the path to the file on your computer in the Database field at the bottom of the form. You can leave the rest of the fields blank. -To import data directly from a database, you will need the database type (such as MySQL), database name, the hostname (either an IP address or the domain that hosts the database), and the port on the host. You will need an account authorized for access, and you may need to add OpenRefine's IP address or host to the allowable hosts for that account. You can find that information by pressing Test and getting the IP from the error message that results. +To import data directly from a database, you will need the database type (such as MySQL), database name, the hostname (either an IP address or the domain that hosts the database), and the port on the host. You will need an account authorized for access, and you may need to add OpenRefine's IP address or host to the "allowable hosts" for that account. You can find that information by pressing Test and getting the IP address from the error message that results. -You can either connect just once to gather data, or save the connection to use it again later. If you press Connect without saving, OpenRefine will forget all the information you just entered. If you’d like to save the connection, name your connection in a way you will recognize later. Click Save and it will appear in the Saved Connections list on the left. From now on, you can click on the ... ellipsis to the right of the connection you’ve saved, and click Connect. 
+You can either connect just once to gather data, or save the connection to use it again later. If you press Connect without saving, OpenRefine will forget all the information you just entered. If you’d like to save the connection, name your connection in a way you will recognize later. Click Save and it will appear in the Saved Connections list on the left. From now on, you can click on the ... ellipsis to the right of the connection you’ve saved, and click Connect. If your connection is successful, you will see a Query Editor where you can run your SQL query. OpenRefine will give you an error if you write a statement that tries to modify the source database in any way. ### Google Data You have two ways to load in data from Google Sheets: -* A link to an accessible Google Sheet (that is, one with link-sharing turned on) -* Selecting a Google Sheet in your Google Drive. +* providing a link to an accessible Google Sheet (that is, one with link-sharing turned on), and +* selecting a Google Sheet in your Google Drive. #### Google Sheet by URL @@ -111,11 +111,13 @@ This will only work with Sheets, not with any other Google Drive file that might You can authorize OpenRefine to access your Google Drive data and import data from any Google Sheet it finds there. This will include Sheets that belong to you and Sheets that are shared with you, as well as Sheets that are in your trash. +When you select a Google option (either here, or [when exporting project data to Google Drive or Google Sheets](exporting)), you will see a pop-up window that asks you to select a Google account to authorize with. You may see an error message when you authorize: if so, try your import or export operation again and it should succeed. + OpenRefine will not show spreadsheets that are in your email inbox or stored in any other Google property - only in Drive. It also won’t show all compatible file formats, only Sheets files.
OpenRefine will generate a list of all Sheets it finds, with the most recently modified Sheets at the top. If a file you’ve just added isn’t showing in this list, you can close and restart OpenRefine, or simply navigate to an existing project, open it, then head back to the Create Project window and check again. -When you click Preview the Sheet will open in a new browser tab. When you click the Sheet title, OpenRefine will begin to process the data. +When you click Preview the Sheet will open in a new browser tab. When you click the Sheet title, OpenRefine will begin to process the data. ## Project preview @@ -124,36 +126,39 @@ Once OpenRefine is ready to import the data, you will see a screen with Parse data as and some settings. You can specify a custom separator now, or split columns later on in the project interface. +If OpenRefine isn’t certain what format you imported, it will provide a list of possibilities under Parse data as and some settings. You can specify a custom separator now, or split columns later while [transforming your data](transforming). If you imported a spreadsheet with multiple worksheets, they will be listed along with the number of rows they contain. You can only select data from one worksheet. -Note that OpenRefine does not preserve any formatting, such as cell or text colour, that may have been in the original data file. +Note that OpenRefine does not preserve any formatting, such as cell or text colour, that may have been in the original data file. Hyperlinked text will be input as plain text, but OpenRefine will recognize links and make them clickable inside the project interface. :::info Look for character encoding issues at this stage. You may want to manually select an encoding, such as UTF-8, UTF-16, or ASCII, if OpenRefine does not display some characters correctly in the preview.
Once your project is created, you can specify another encoding for specific columns using the [reinterpret() function](grelfunctions#reinterprets-s-encoder). ::: -You should create a project name at this stage. You can also supply tags to keep your projects organized. When you’re happy with the preview, click Create Project. +You should create a project name at this stage. You can also supply tags to keep your projects organized. When you’re happy with the preview, click Create Project. ## Import a project Because OpenRefine only runs locally on your computer, you can’t have a project accessible to more than one person at the same time. -The best way to collaborate with another person is to export and import projects that save all your changes, so that you can pick up where someone else left off. You can also [export projects](exporting#export-a-project) and import them to new computers of your own, such as for working on the same project from the office and from home. +The best way to collaborate with another person is to export and import projects that save all your changes, so that you can pick up where someone else left off. You can also [export projects](exporting#export-a-project) and import them to other computers, such as for working on the same project from the office and from home. An exported project will include all of the [history](running#history-undoredo), so you can see (and undo) all the changes from the previous user. It is essentially a point-in-time snapshot of their work. OpenRefine only exports projects as `.tar.gz` files at this time. +:::caution +If you wish to hide the original state of your data and your history of edits (for example, if you are using OpenRefine to anonymize information), export your cleaned dataset only and do not share your project archive. +::: -Once someone has sent you a project archive file from their computer, you can save it anywhere, including your Downloads folder. 
+Once someone has sent you a project archive file from their computer, you can save it anywhere. OpenRefine will import it like a new project and save its information to your workspace directory. -In the left-hand menu of the home screen, click Import Project. Click Browse… and navigate to wherever you saved the file you were sent (for example, your Downloads folder). +In the left-hand menu of the home screen, click Import Project. Click Browse… and navigate to wherever you saved the file you were sent (for example, your Downloads folder). You can rename the project if you’d like - we recommend adding your name, a date, or a version number, if you’re planning to continue collaborating with another person (or working from multiple computers). -Then, click Import Project. Your project should appear with a step count beside Undo/Redo if steps were saved by the exporter. +Then, click Import Project. Your project should appear with a step count beside Undo/Redo if steps were saved by the exporter. OpenRefine will store the project in its own workspace directory, so you can now delete the original file that was sent to you. @@ -162,27 +167,26 @@ OpenRefine will store the project in its own workspace directory, so you can now You can access all of your created projects by clicking on Open Project. Your project list can be organized by modification date, title, row count, and other metadata you can supply (such as subject, description, tags, or creator). To edit the fields you see here, click About to the left of each project. There you can edit a number of available fields. You can also see the project ID that corresponds to the name of the folder in your work directory. - ### Naming projects -You may have multiple projects from the same dataset, or multiple versions from sharing a project with another person. OpenRefine automatically generates a project name from the imported file, or “clipboard” when you use Clipboard importing.
Project names don’t have to be unique, so OpenRefine will create many projects with the same name unless you intervene. +You may have multiple projects from the same dataset, or multiple versions from sharing a project with another person. OpenRefine automatically generates a project name from the imported file, or “clipboard” when you use Clipboard importing. Project names don’t have to be unique, and OpenRefine will create many projects with the same name unless you intervene. -You can name a project when you create it or import it, and you can rename a project by opening it and clicking on the project name at the top of the screen. +You can edit a project's name when you create it or import it, and you can rename a project later by opening it and clicking on the project name at the top of the screen. ### Autosaving -OpenRefine [saves all of your actions](running#history-undoredo) (everything you can see in the Undo/Redo panel). That includes flagging and starring rows. +OpenRefine [saves all of your actions](running#history-undoredo) (everything you can see in the Undo/Redo panel). That includes flagging and starring rows. -It doesn’t, however, save your facets, filters, or any kind of view you may have in place while you work. This includes the number of rows showing, whether you are showing your data as rows or records, and any sorting or column collapsing you may have done. A good rule of thumb is: if it’s not showing in Undo/Redo, you will lose it when you leave the project workspace. +It doesn’t, however, save your facets, filters, or any kind of view you may have in place while you work. This includes the number of rows showing, and any sorting or column collapsing you may have done. A good rule of thumb is: if it’s not showing in Undo/Redo, you will lose it when you leave the project workspace. You can only save and share facets and filters, not any other type of view. To save current facets and filters, click Permalink. 
The project will reload with a different URL, which you can then copy and save elsewhere. This permalink will save both the facets and filters you’ve set, and the settings for each one (such as sorting by count rather than by name). ### Deleting projects -You can delete projects, which will erase the project files from the work directory on your computer. This is immediate and cannot be undone. +You can delete projects, which will erase the project files from the workspace directory on your computer. This is immediate and cannot be undone. Go to Open Project and find the project you want to delete. Click on the X to the left of the project name. There will be a confirmation dialog. ### Project files -You can find all of your raw project files in your work directory. They will be named according to the unique “Project ID” that OpenRefine has assigned them, which you can find on the Open Project screen, under the “About” link for each project. \ No newline at end of file +You can find all of your raw project files in your work directory. They will be named according to the unique “Project ID” that OpenRefine has assigned them, which you can find on the Open Project screen, under the “About” link for each project. diff --git a/docs/docs/manual/transforming.md b/docs/docs/manual/transforming.md index e77d1a2d3..50cb13610 100644 --- a/docs/docs/manual/transforming.md +++ b/docs/docs/manual/transforming.md @@ -10,13 +10,12 @@ OpenRefine gives you powerful ways to clean, correct, codify, and extend your da This section of ways to improve data are organized by their appearance in the menu options in OpenRefine. You can: -* change the order of rows or columns -* edit cell contents within a particular column -* edit cell contents across all rows and columns -* transform rows into columns, and columns into rows -* split or join columns -* add new columns based on existing data or through reconciliation -* convert your rows of data into multi-row records. 
+* change the order of [rows](#edit-rows) or [columns](columnediting#rename-remove-and-move) +* edit [cell contents](cellediting) within a particular column +* [transform](transposing) rows into columns, and columns into rows +* [split or join columns](columnediting#split-or-join) +* [add new columns](columnediting) based on existing data, by fetching new information, or through [reconciliation](reconciling) +* convert your rows of data into [multi-row records](exploring#rows-vs-records). ## Edit rows @@ -26,8 +25,10 @@ You can [sort your data](sortview#sort) based on the values in one column, but t ![A screenshot of where to find the Sort menu with a sorting applied.](/img/sortPermanent.png) -In the project grid header, the word “Sort” will appear when a sort operation is applied. Click on it to show the dropdown menu, and select “Reorder rows permanently.” You will see the numbering of the rows change under the “All” column. +In the project grid header, the word “Sort” will appear when a sort operation is applied. Click on it to show the dropdown menu, and select Reorder rows permanently. You will see the numbering of the rows change under the All column. -Reordering rows permanently will affect all rows in the dataset, not just those currently viewed through facets and filters. +:::info +Reordering rows permanently will affect all rows in the dataset, not just those currently viewed through [facets and filters](facets). +::: -You can undo this action using the [“History” sidebar](running#history-undoredo). \ No newline at end of file +You can undo this action using the [History tab](running#history-undoredo).
\ No newline at end of file From 61f531b6e7eb305000e81da28ab37c97a2396463 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Mon, 14 Dec 2020 16:22:13 -0500 Subject: [PATCH 05/21] Images are good --- docs/docs/manual/running.md | 2 +- docs/static/img/scatterplot.png | Bin 0 -> 5943 bytes docs/static/img/yeardata.png | Bin 0 -> 21833 bytes 3 files changed, 1 insertion(+), 1 deletion(-) create mode 100644 docs/static/img/scatterplot.png create mode 100644 docs/static/img/yeardata.png diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md index 97877be44..6d385c275 100644 --- a/docs/docs/manual/running.md +++ b/docs/docs/manual/running.md @@ -162,7 +162,7 @@ To see the full list of command-line options, run `./refine -h`. |-i|Interface (IP address, or IP and port)|./refine -i 127.0.0.2:3334| |-k|Add a Google API key|./refine -k YOUR_API_KEY| |-v|Verbosity (from low to high: error,warn,info,debug,trace)|./refine -v info| -|-x|Additional configuration parameters|| +|-x|Additional Java configuration parameters (see Java documentation)|| |--debug|Enable debugging (on port 8000)|./refine --debug| |--jmx|Enable JMX monitoring for Jconsole and JvisualVM|./refine --jmx| diff --git a/docs/static/img/scatterplot.png b/docs/static/img/scatterplot.png new file mode 100644 index 0000000000000000000000000000000000000000..215bcea6a7c87454e65a2a071167c32be0b993e9 GIT binary patch literal 5943 [binary image data omitted] diff --git a/docs/static/img/yeardata.png b/docs/static/img/yeardata.png new file mode 100644 index 0000000000000000000000000000000000000000..8250d4036172dfafc7004d526741cfdc57ff6c64 GIT binary patch literal 21833 [binary image data omitted]
zqJ(;3!Mz;>EZO|AFo?%9vHc4(gG9WO_!~%R)#5LNC*hjw*$ugMe{QiHn*PQte+wT! zq{tliDZ70!#0S!a(n%B}A;H&Q{69!iM>>P{mZRV!?v`<@6KSAAANFiSz<}M#-E~vz z68^+1Yp+O%`My9>x^1vI((D<$DhFL|Mp#p=NV3Eeyc4h?73+7aR@Kjyq19^ogb|RK z-C)xkLd!`@#n#9%O$k0nASzTS0G8E1!S)svx10vwXF4s&PNWGzqJT3ZMAzYT!{O17#aj0;L_O?(X>bgcnVng_$*jdcoHu`I8t*7%)wHHJ-@l?oowdGD19iMs z*Jrp91$9*dtaJd_;HrN9WzNqX&N;I$y55_-l6J$e?>=O)`_mTaI*U6UIL4tW(j`JS zw{Fjhla(wX?}WH?ETL;p&ye1cXA{tixbG^Q4pq{e{7`uBilzZlK+{k{Wk|~47>di| z8w#(ln7jRYP%oP9y;Q}pHHdF~3 zd3f|}(wh;nq^cme7K}8PvleYOahD^2A1HQATwYIMPP=1b9ujTBkVyj2`!Gr;1Sl53 zCPegi76o=UESdrc3IM-@oK{5<5O@6&GAp6OT4oKYjVI_%68=uXolVm_ZOl4eiM$|= zWN7+Qo%Hx7^<_vR5C6k7>YfWGrU}ev95%*cr`-$YFk6)-THR%%T*%sqmwRHQ-mXpZ zo~DXoO4hxRRt2pa(_*~Yn_YV5Ygx)U(Coj`f<;?_41fqCFf%5ER!Tci$mU2zcVt?N zjLYEIAaHaL1+v^aq|or|rEqv0>oasyJ(Yi+(Uc%c9v-6t9UQT(vX$QYT2rtMG#EcY znnjDBr>=wq0lgIQqwTZcQ{JZWHft_6wiObZjzXAp->Ji$Ph$kmL2(3ANix3 z;GMC*2_0oIKcUK$(;cfFe2hw{*1TB+StN>j$P z)VcNH4q7rsWDc4f|6=H8A&eZ3$2I5cH^YKf)LpLq<;C8K>Bia-D)5J0!*xQL>wjbp zYsE78nh8}`USy9ZmJmYEffEe6RZ`ndku-@m@b-(LHyz9t5zj{}(Ec2z$f3w}DpAIw zssy<`Lndw!U{(h<8O#E!W0Vr(q zW(E=WM){ywYVqd4{VZaD3IfzKCck&SK{Y$5m6Jz7VX4z{AH&g_1Sr|L1ORD_m+#-v zcbYk+pkr`;AL3-9Z!_(!I_${e)_1U#wA-$KG=J^s<#mb;>f);70_52-ZdHDF8F>xI zp>x%2DOrS+Q#ch@n<QPtMw`6*W{%XGgPp3_$f>!$-`E z41FVtDGcE$Y1p#+30QWE3jrbJkyrPDt%c~1eggN%3;Oor9Y6OMubsDpFFN;DeM)=X z9_CKk>>^-dNexge!L&2%A#5M-DTdPmDE??B_EN2K>k1|){EPkkVY zrf#+Rs*Ynhcq8=$tI2pp;~07r)ZIuBLi-sIHTUEopX(7B0L1t-i{>&vrt!(I1%%uM zfq=ChYT zHwwi7>&I@18VKop`YI^g^bZ@lghT$*c_!2>0Kow!fhR>41F_hGV02-fl#rV8A|Q(B z;woEwFIZ?(a%D=X=lFzJguIgFq@CC667Ff)FnQJeM&~uA68axwEuL6GEaOOaEpUnb zPF(mpDD?z`Ci?i2Pn@HnOFkG5ZT`&{4vVt1E}54~tZaaKgNO8i3Yd=nVnrqZh1O{b z-%x8^O=p%?)$jj~woN%F`Vlx6HBND5rzw?D{wS*AL5Eb+H*kHjvP~ZX=mCmjr0n`H z$(8vPR?|_}ko9kXILuwG{Se-MDGnq5hj4jc!Ie?vxceW$wkvz|rp88pNvr}~e`xtu zzD9+9BGZ~5if-_sc!9MS<6tTL5E?6veqJx-Y;0`o6}yX*<}nql0YHEFi9C#-BF0PEUf9TuG5u_l?1Y=bGokSnzLUJKinfT?NHjXNs|Dr2l+!EZ}n zR9fXc9vX2?7+F+~h#990^4=|i`;~XRRMQ0%jUM)?$x-~y{J>!WPTLxo!l@I^?%&~U 
z@-383xx{Uiol?;apB;$I1uQLr1OveK;WptC0Yz{z=-q5s`=bJmg=dt8&$hJ&DP!Ks zjdbH-kR1|ZE=_-v)=4a(5@5AL>-%1IWqyC#Wcb#x%LR~2K)WkcT)KcP;Suze6%bRy zkdmsP=A|pQ7yNUb;Z*!X1F;bq`Q6DTVI#@tw;qhkza}T;lbA0V>`DPqxWB~fyE$H! z=@;qA7iy$p7tVMI0!3K?ni5e=OVmyz#rs4=7jlS8#FCT?!*dSgC}}YrR;#!4TNM$f z=L%BpW|e6L1P*Wo!$;kQ%73^+oAoP0(X$kxB<}5Maoy-u$}7@tmB{uGtJ|XA+Pr?R z?!a+7SpjojLgLKox+kC7ek|WFeL2VW#kz8^+a1BS(uU-f(QN+jaQXOJvpnWbOSf%N zK)HUQfG?qHq|N=gSS~_{a7Y_U<FSb%K(H;)K+b7>TLd~UA3^ua^GQ_6T)KOlSW?0)KI2%HQ*l6XW-+B#MHB&l= z0|X)c-Xv9#8%yb=q9_`WITt5syM)5z7bdi|f(mx!19e#s!`@3-{{n~Y*aWKP1&6=* zso_hxYGwUo*v(vw5(G8<-Ez#gTnn5_ygzzpj5VNlew7c0Ek6GkF8`u2|IP@)9<3@` zskOgBOKUdH|K&OZ9aI%-tHRM~`wl7DjN#~s65sOWEU$dc@eGETlQlsgemfHjN;|sPA~E32ZF}&wq4+0t@o?))*D`h zi>qu6Ccr`&1W8CbV{DXevj0)=Ttvwv6oU*}tsip!Q^Io^NpJMKIq91NyB6&eT}(H* zNMRz^gCBr=v7?~!_H9BPBF=YpdykIUjQ@jd`%Q&#dfM%!xH$c2x9hs)85)F$@Id5S zSy|3ZIS(P^Xv|9Df>rGBr$j%;yrJb2o*&>#_2-94z*4zo*aBbAE49@x*h%yCwcU?N z$&ZKD=9Jkh9=R(4Mr6PE!EB$FiW*ZoZQ~0G4pJ(O3a+x8j+oNwt!mZ3D5yD0i>Xj# z2szChA(;x)k^0!on)Vvz_MOgYE3Q0!45RXNUqlEd3t3$?@RA84Z`JJQ> zcrJ;;MQv(tA{_R8lXQ{cRFrN+PJY(_TKNYLvJIyc|y;2pDL>wE}3KQ}dXM9Cw2KZd!rG zm`1_~yRCw}VP)fMT`Bpm6qss(C=9QeF0l7rnZmS5(Y7^xE{^v*1Yf7h{c?ROs{qg; z8XruH*H5A#HY5Ii!D0}dCTpha-OhvdxZQw_SPjCo53gkNMbxXSK#mCTO+Blb-A3GI zRhhxrbk&mmfyns4*1KZDcoK!q_u+=tYZ*)9{D!ky-_yv*9D`y{(7BY{KL))R4cz)>)Ni>3x)G zm<#E2zH84FBgs~YJ9VF7Cl8Fq1&u#oF8TaX^;&1}ciZ$@na6Y8o_+D}04OcTRJ0rq zUy8DZtOPbA3{1V-G8mlJ4|sVPdA+4I_;0sL?SII1d)FOs)0D7&h{#@sEgdAmN|zzQ zE!;RQ#{J)&0REBCwFE5b)z^S;a^tsDo*>KNSugk6I0C|W)Dj_~DvD1z{JJutWMT*Ba^~GaOM&ws zuGt|=-6{O@cVI5E{Q3n&N7bXr_*1(Z@ z8PpJFx?PE+9Ff>%GUKR^zl}^)X&Qq`ad}bdUVfxTX*6t`S#aCP3bR)toNs~om%6te zUy@e}-_W$iNk-41f{}*9nO&-h6VB2}r`0*B=G83@c1_l*6uTmo)PSiPso`Fb9WL@Y z+OPgEitO#47*cqneiXi-YM(V`}j0P1=GIe=88yM9NAZ(2vjiHZ_CAJFA`TX+qTuwwf;HFft3$* z@B*yW)MgPFN%(^~M)ult#`_-b2yO$v@o;#`#ID!YXS_WQNuM+d*wfwY0a_;eyJs!| zivf@LUc}i5u@*uKp)(pVbsPo8j(f5s)g}`T-?rQP;5emw0cqLkS{dLH20UOa2b<4( 
z)~s+Dtu6#uJFUsC@9!KqT6O_EkeFQT)^X_YT_TMM)1s!-e3oMhTB3uKB^F?&@9Ee^J>NqQo{bIDpHh`kX16*40w}3JH{D^|0>MzMs zz`q%VmOU|x4kPcLq6G>f1<<5dB`^z>(kZ61>aFCr@@k~^!C+#9m>fDBvXEfE0KnTF zcE)hGst^mll6vW>Y)7BW@ODt>ZGWrEa})b#l?VIcN*7)QFRM(VW>~X*!htUT2O+lQ zP$SV~AAW(^(nV2mhzbePZ_cZ`_X4+~nG#HUImJ%(dDi=|#nio`%A0vC1YRImO zo-+Mv@~II0zKPs&b5IDInZmD)aKbNE;{%j|hxwCl)Os;7@qzJkDb1fE>$KAQ#_@J) zduk4J@Lj#3dpy_dR;h?;dhAP5eMotJF1Ib-?gt_YIb9&G0ej9nvBbqO9Udf@u2H8@ z3^S@_1L8SH*SF&SQLHmodhoVor%2MZVlE$;g3Tv>rfw}nW@&{bwvDcPl8&=?@9vEGTl=OycY!Bc6# zi)@iil=_*(I{RnEi;Uw5Y|)nAwVdVTqtM8&xBVS!LE1!K87(nTT8*e6xlg6 z9o_CQYsMc2O3wq4@wacq-ven$_3cJ?ZsXzUVR}6#ccZET&-z}{^6L%*}vzkr+A!u#6(*SD9IUFS^cpFevXR79GHZ>c^%5(8FV`!_f zGr?}HmWQ+_WC?Upy!qVr%PcH^RF^aKtUN%ng%|z8rvZQ5&^+jGF_m=cA!lcg|J%K1 zwL>|w6Ra2JWp6rUA$^t+CQyr6E*%d(E}eL4`3Ufm=Mx&g(*u5BaU9E!d)*?J`JP!S&2n|-o9wOL)Nkh&_2(Fqs?nLzvE-D znm}q`p%Yl{=g?HTnS1cizuj-QQTI8iU0kWW*6*Pr8AfhM_I{y-J?tu-xlixWX;z6t z83s6>`U^1nSK+L?Yl2w;*T;_cigee+iDD|e6i-1ce@JGjm$;o_ATAVb-TmhGlK~#w z1UI4zYKgVY@QC)!h0`A1Pp>hbX=ox8VvS7~xu3*D@0sP`S!;+&v~A^+& z0c8fI;pj_Ctxg~pszf$*g8xe);=s!){G9GiM)F=>< z1)l+)6NFrg@T%y5F);Z*DVYOeQ3#=ZC|VI0A&I&yLDqz5GG%Ho?KPmnxTYMz|B9OI zumGCvDRGmuGsdsaxhkeeCBgNoD%*JY_#M)MVY{Jp_HsslA)CXuiwN|bGWN;?D>Ahd z7r83>PARX_&qS?hujWRvfpk?jfOihfah3lAuk{Jn(xx9u0cyFjFnmpQX>@QQqjxck zW8JIfz=U0nX>oNaIg#!{=|@J61pgR;G=I0kkQTt?0g=Y7QnrZeUFwmMtB)P{vQxQK8Ye*%$Sim z`OJ(n{;?ceg;|zdz$S;<$T0cLf39EgH!6cV$c4SZD#VJKyto!fmpNzGL!E@am{GaV z{cUx8VraNqqwnNzu?ghl19|Nur>%SrmIyGxk`7wK3)xIIQi3TWp6|c&itQ@7pKDS_ zKQ;91{|;Dh$I_R9LfJkdC_bTkPVZKZHI8Wj=n6o&4;~#>zdIYx=^4|7!c<%fm=50} z%sEeFpKyzl@YW1W)a*=to|TV;c`VvMAZFV_s@ds&3QCn}T~j8%#w8d4%4g1AYeHiE zXx!jxg&Qm}^zq^k`)iD4(n^wiI=klW#AX_Wr$W{vu;4RP1Yt|!=oH3Of{RB7DNn@f zh%e7t%O{rP!4J8ArfV*^|683k2OH1U8raf!BWnZkwxvRp6o+>rWV?b?Mg0Ph8~zh0 z)B4sVC2p$Mn9C$5FVzadeDM9yx#5orHp$fM?z3HL*eIZBF1aPGX~Kwm zDbLvjX&0k`{1^R{to$%ZWivgpk`7C>WGVu_9&te6*x(g;wED+GgkkT9@kmSSCvC}Z zQptf3oSHG}6XZa;n(8ZL%i<8=naY{=&X0gDI7f>^Z}Ck0g^>lwWn#q0{Uct1$V 
zYCVS^A2Aa=5mxAtN~T6j*nG%xXtVs;S_Y>J1l%I3to|K0(@Z|O{bv~L#GvQ`c{6tWtJ&6% za&2D*P?fWv%mPV~`EtInY9A|U&_{p3_K8 zvq`RxGUh{HHyseXAdx52J_vJOwrf{Uq^Wvk3yx?k<0Y5(1Y~2lcz|4y@1IY2z1%-1 zZG*Y4+-tMAv&woj|+~9|p6Vm=eZK7@T8MCE>Q<|%b!Bx0r<%VUYS-%>YbW&RaZ^Btp(v@$pzX|+{7XX7fR3bV zKLgToWOk1HpH;>mt;{oTr5?DMhjtb{4n#cTe3f3QeSmyMR-nKWEOPZ<_${7Ch+tbm z;NQ;sl|j9EyIo36zm@27K4AVHu8#R78uEzG!SAtV2-F~Q^6bnpy!v508z~O3&C)UE z)A_IBHb2+WCg?JQi~C^BQ>d5}-ziywB-s=O03m9@IeRQ1t`g3(a)56^AD7<+{Y=6w z#=j9s3S&{l6;wTaUzM*aFdY|A(9@@g4-T`aHt2e+gP?c0PWM(nRdjiq^p44k5&orN zbmAX$OaU#3&p{shvjSgMtdCfsQ(#h}ZI#?{q}_q{o7M^X7XwZ*EK*}YZhGsn;m|^< zPv!DhS|?0gW3BMg>6>fT2;mR!b@$D%Q#HUYnJV~u-L+5wa=5Tdz<`*Q`-bqJpq47h zf4R8-wF`Jofb4s!n)63>91$?6bNf*izHaZ)^J|VbLSy%gpi&u+IQ3`i8(15n!)8gU z+!Gl7XL}?HWme0OkW)m)mZBB<7H2k`DJS+$U|I#STUG#TcK75ztJY@-mTb7Mru~|04klWR|`n82?G>{n0gXjE|=XC zwKtdAzROeBE?_hQ**KI#T7KcPwoX-AaPYTT)sq;yufw`pBHYYf^Qo2Z`c zTCk$wGw#n?aKakH-6MJXLgobq#=I;)_Lq94Dz7()l<8HgtDIt(zOg1nj^yO@KMqz* zMkaXM%Ia(IWe*W{qj*4C|E%gKpFer@WHRJVw(un^Frm@UOYVKW{W=N42XB@#V zO35;P{kQbpWNi#n++-7gLq}%vjHM|4Kh#>7`f>OFS3i_wIc@=M(Pt6aB!WukeX$V$ zNo3e%uCQ&H_vOip;R;kd6|;m5U}`xD_5r(kG4S=hyp92;ImEN}GsId6vm zy}6NARRf5jgg_8s3rXSVB{8b`T#ptDeHa%Ab`2juqZ&G77jzL5CLP}fln2VAmA;q2 z2>nyGC*37qp;KoF^xlg|2^ybk(@vFPqQK7Tjb%$%J@|7s&zGtCbZTkgL+Kh9dZCC! zA=%^buy|N6_sSEU&x zh2eN~GTS<<-z|}}rcO&3m`YIY5hiM2TGRq}@GB)y9i+_mvb!eW* z)F1V04Twh4qry@RY{W}Lu|Aewobfgc2S=w@yUJv+1o=%7EJ3dOSAx8}D*ZB@hRQ@B zS||hPq&1arAmut+K)~^bUwi%G*QT)n5kM^+MWU2@j2RDG603 z58#NV(XYFgdN1Qoga+3s{4F8PjZV7T;WxNQ1(d1jr@I)y>Ir8Z6~wNw4!brGsItR) zwUf0tVCjK9fLLD3wUU9Z`$7k{9}oO2h$N5%CHSK&1=oEeY-xk|z+esVVm z0EokX1dQnV!^4&c;%Nq|4S}AVmIQN}!aIAc2ZyFRBX_=*bqAY2K{Rh8XOEsilukTQ zionCi&t(9n2|6zNtqi)|`UV;Reh~x$ik=Y^M;Ege%()9-4|rAA6L$Ck z!kV+glDM)A((3k4K8_0y^nS6)rsriJ(I0CDKHS5Gg8@po zmlXoZQYEV>p*IB=2|)?ciy}o@Kp+%BkZz*DoiEB>_ujo{`6uUa=A_Iw^UnJ|@9&w! 
zQ^5STa0K@SbB#MidH7RR@-oV0nog-2cJy3L@7D^|wFqyK!~-cQ<@hClqXnj!RGfhF zU&i|OUfNMmxXZa7gPS_3^;GY6@y7I}573(@e3UhuU9JiJL4Tt| zKc;5ApH$hFh+CP&dCVu3xi<`7N`_Bq%njXp+_Leqh80>Jn{yn68N-D6?J*^hLFgrC zDIT>xyJoQ0l*c{9?!eUz)7Ya@(hn?wnCdNwBAEhsDOI|K7LjG>=1d@bZQM z0w{x{tsezdd#5Qs5tO-)@OkEepD=OtDVDWLoqz?*7rhae{__%(V3s3A=5hQNAfuJ9@Su zm#yU(ao{y7P2`ciWtylD7T9y8 z0bVg?ZbI-i5zdQKF<)&Qp~tW=Sa(@2$PNq`tJ8~fN`jQ;(!LSUw|!WDhUEj+f|hz1 zjhM^r{_(-@d0$R|fGlVgIp=gCt}7V6l*OsU<;5}jCpLBLXvx3`kB6VaBT_< zR({?Fgmgc#uWVQ(E)Neva1QGKv`c{K+HJ?s>HYjY_9-adkpEoB8MtcXasc~2{vGX! zWc^8dHp!}au2eDZ2TSW%R^P1HVpCekAOYZ90aEgpzu+i!KP9_;1DL_q@WGVW`B~&H zI&eXB8IzeN?8y7}^R1ja*vVfUPu{iNdHc2fe#8UCv?<-&oN6On3L42{m$k?zEz^{c zUaCT#2x2b+;;h&tYM5MjX3U#i_u8S@lBWWNUIN@D#EkVX;!p5kU!AN&d=|%`fGxu1 zJiTvpqDjviVP<~~KLrI<_HwwK3@$j@{gshhU1*k6?BzW6DYkbphk_wMi)8hjiR zE+8F%46@0i*s#GfEI1^ORg?oG1AwBpz!K=-g!<0Y+4aM$wu8&UeAcz}F@0b|lzt#K zdjCeR>gI0RL5oKv@zANeaQ?+i^g$ttc>S;wW)CmD0)5^78GR-Gii?C})XdP4%>?LJ zC6ACHQIFm6>J~dh_7$Sh+DFCy)eRV3|Ik`7Gwa7>;lwh|z+WgCL4x|(dCe4m=?4yO zK-1AotrIHP;ERiEByWpFu4TYj>?m-^d&bZXm^6d%6pi2ua z1>(>oJo@1q3#-e|jJC9FGR|(Xo?F%gsdyLs#FNl4i^Gww-aMiT=)|d>RgElr`8$SH z=#a>3S3ISg9`d8T-$2j{B+Bo$7+CnM5GEs2?u@AP6OTn?%TkFwLQm%@5l3x zepb|p@cHJd%2MEO_}y_uZv9EUT0lqN`&*$pW0TNRT^LN$>m`8P=HomOn6LvCt7}f0{RM;_N4dx&tyFqM=!8C>(x%f%86fb z>#Yr#i~4f|a`d_F??2jXWgGxYTQ1_3eGAoaGeMk^!NjXWAR^dmgrvSAoDqSAY~s%rJRfJNt#( z%q*6#w=HhH=M-qr@eyO8+z0q2#FO;SF`(RsTItpEn#3J6q+19MD-m#52|}q4{uPw4 zWzHf8=o8P`eY`!nTCb4TCUf>ZAj_n5WNzyz=%1N31_^F7RF)9;@@v%7s=4B(_3UV# ztv_b5*GvyPn|0M9ru{Qn8`*UL`#1JpXSv4N>t5{6^DEQAdtH@mhe~l9C}vNrWaAZTtQb<2_8|;@ER_B>JvqS zpVtFMK*(hC8R_gp^-wQ&Dwt+xNTN1mNXrDR?_M5Fotl_rYNh4s;&sh@77={fat_!v z;O}p|%pB)fsf-z%jTat!8H_Zn`zsz}x!Sl=k6xeH<9OUQOG`nexQm+Vg>&DIoq8)? zZGAFbtP-n9zv{)??=tyGG19=8C*6+^Ii0Ns)tJdFkg{M#c33rszkA>lKX(@>=o+*j;!yR`c1|FX|`a@ z>TU?IovKpD#VEIO%D~wm3JoW_0M_uk;s1yP4Ib!e#+4^^EIXJih@FY8j)S}~+krDcfGEBv64`q7t? 
zXJ*uU2FJVb{Xt@@!IBPtS23(Op~~MyoB-!?f8J zxyfY7L=2DAa7zXbi*2XU5ogwm^~<8-e&%H=_A1V+kSu$Xbbc?sDRZ(%dC>u{8r)Pl zo-uI(?V@A@_Qi6E5GBu)qdmWu>K&WuHZeyjx=|b7HzOu??^+?@X3op1rB!eXU0_@; z9v@Ecm*nz~u8qqW&d3^@F2V2wQ4}z&I?U`$p;oS|X1U2?H>JkuWsymL}4jr;Nv0)*XSGRGX~^Gd@lgCw_$V4Z?;QYm8Gc*7UBWV`z8 z=eJd^#?i9&KSfGk;EvP6!_5;iWG;*(s3A>NZbWx+AU-(JeZ%W!@cl)XHoZyo{v5j; z<*gaukY^z!N_`7t4d6I=E(^=g(!XdS5&EhMNfT=C3RCe^GfM&4HjmC^rR3aq%?+4L zw{Z;GQvqg_+@-w#{qqZRE934QzNNzImBrD5N;1GtV+U>+kc~91O;YaEQuv{k^RPg{ ztY-z8wqyGZ@U;qm<OTvY9Wox!3ji1z+t-- zjp;`J_6zX9d;FIX+|~>7y)xyIZ7NdLL=qvroi%|;5ia{SlVc&>5jSP$PoVg7JpSN7 z>UdOeqYLqrp1megkxe2f*;gq?1m&Dk`yUR@A2Ewq5->POe|);j(aROZ)g;M zXybmy41S4m_p#hNYu0r}%%ef~>NVvx(U;ANx5wBH1TZC15{r&}ffXWL4wf_mday9; zC0i>1%pb=oql4+oP!6gdvV+sNO4RaAxwUy1c+Ta(eGOjNSNeh1|AzTScNRAYM?CFa z-U(J0pLK=^MR=1BOdDMi3dDDUqHLGOp07!i^gumKulzq= Date: Wed, 16 Dec 2020 11:18:00 -0500 Subject: [PATCH 06/21] More --- docs/docs/manual/cellediting.md | 24 ++++++------ docs/docs/manual/columnediting.md | 63 +++++++++++++++--------------- docs/docs/manual/facets.md | 6 +-- docs/docs/manual/transposing.md | 24 +++++++----- docs/src/css/custom.css | 2 +- docs/static/img/transpose1.png | Bin 30532 -> 23886 bytes 6 files changed, 62 insertions(+), 57 deletions(-) diff --git a/docs/docs/manual/cellediting.md b/docs/docs/manual/cellediting.md index fc5b5540d..c2c525ac1 100644 --- a/docs/docs/manual/cellediting.md +++ b/docs/docs/manual/cellediting.md @@ -13,7 +13,7 @@ You can apply a text facet on numbers, boolean values, and dates, but if you edi ## Transform -Select Edit cellsTransforms to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. In these cases, `value` is the value in each cell in the selected column. 
+Select Edit cells → Transforms to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. When used on a column operation, `value` is the information in each cell in the selected column. Use the preview to ensure your data is being transformed correctly. @@ -122,13 +122,15 @@ The clustering pop-up window offers you a variety of clustering methods: **Key collisions** are very fast and can process millions of cells in seconds: -**Fingerprinting** is the least likely to produce false positives, so it’s a good place to start. It does the same kind of data-cleaning behind the scenes that you might think to do manually: fix whitespace into single spaces, put all uppercase letters into lowercase, discard punctuation, remove diacritics (e.g. accents) from characters, split all strings (words) and sort them alphabetically (so “Zhenyi, Wang” becomes “wang zhenyi”). +**Fingerprinting** is the least likely to produce false positives, so it’s a good place to start. It does the same kind of data-cleaning behind the scenes that you might think to do manually: fix whitespace into single spaces, put all uppercase letters into lowercase, discard punctuation, remove diacritics (e.g. accents) from characters, split up all strings (words) and sort them alphabetically (so “Zhenyi, Wang” becomes “wang zhenyi”). -**N-gram fingerprinting** allows you to set the _n_ value to whatever number you’d like, and will create n-grams of _n_ size (after doing some cleaning), alphabetize them, then join them back together into a _fingerprint_. For example, a 1-gram fingerprint will simply organize all the letters in the cell into alphabetical order - by creating segments one character in length.
A 2-gram fingerprint will find all the two-character segments, remove duplicates, alphabetize them, and join them back together (for example, “banana” generates “ba an na an na,” which becomes “anbana”). This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself won’t identify because it keeps words separated). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”). +**N-gram fingerprinting** allows you to set the _n_ value to whatever number you’d like, and will create n-grams of _n_ size (after doing some cleaning), alphabetize them, then join them back together into a fingerprint. For example, a 1-gram fingerprint will simply organize all the letters in the cell into alphabetical order - by creating segments one character in length. A 2-gram fingerprint will find all the two-character segments, remove duplicates, alphabetize them, and join them back together (for example, “banana” generates “ba an na an na,” which becomes “anbana”). + +This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself won’t identify because it separates words). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”). ##### Phonetic clustering -The next four methods are phonetic algorithms: they know whether two letters sound the same when pronounced out loud, and assess text values based on that (such as knowing that a word with an “S” might be a mistype of a word with a “Z”). They are great for spotting mistakes made by not knowing the spelling of a word or name after only hearing it spoken aloud. 
+The next four methods are phonetic algorithms: they identify letters that sound the same when pronounced out loud, and assess text values based on that (such as knowing that a word with an “S” might be a mistype of a word with a “Z”). They are great for spotting mistakes made by not knowing the spelling of a word or name after hearing it spoken aloud. **Metaphone3 fingerprinting** is an English-language phonetic algorithm. For example, “Reuben Gevorkiantz” and “Ruben Gevorkyants” share the same phonetic fingerprint in English. @@ -140,24 +142,24 @@ Regardless of the language of your data, applying each of them might find differ #### Nearest neighbor -**Nearest neighbor** clustering methods are slower than key collision methods. They allow the user to set a radius - a threshold for matching or not matching. OpenRefine uses a “blocking” method first, which sorts values based on whether they have a certain amount of similarity (the default is “6” for a six-character string of identical characters) and then runs the nearest-neighbor operations on those sorted groups. We recommend setting the block number to at least 3, and then increasing it if you need to be more strict (for example, if every value with “river” is being matched, you should increase it to 6 or more). Note bigger block values will take much longer to process, while smaller blocks may miss matches. Increasing the radius will make the matches more lax, as bigger differences will be clustered: +**Nearest neighbor** clustering methods are slower than key collision methods. They allow the user to set a radius - a threshold for matching or not matching. OpenRefine uses a “blocking” method first, which sorts values based on whether they have a certain amount of similarity (the default is “6” for a six-character string of identical characters) and then runs the nearest-neighbor operations on those sorted groups. 
-**Levenshtein distance** counts the number of edits required to make one value perfectly match another. As in the key collision methods above, it will do things like change uppercase to lowercase, fix whitespace, change special characters, etc. Each character that gets changed counts as 1 “distance.” “New York” and “newyork” have an edit distance value of 3 (“N” to “n”, “Y” to “y,” remove the space). It can do relatively advanced edits, such as understand the distance between “M. Makeba” and “Miriam Makeba” (5), but it may create false positives if these distances are greater than other, simpler transformations (such as the one-character distance to “B. Makeba,” another person entirely). +We recommend setting the block number to at least 3, and then increasing it if you need to be more strict (for example, if every value with “river” is being matched, you should increase it to 6 or more). Note that bigger block values will take much longer to process, while smaller blocks may miss matches. Increasing the radius will make the matches more lax, as bigger differences will be clustered. -**PPM (or Prediction by Partial Matching)** uses compression to see whether two values are similar or different. In practice, this method is very lax even for small radius values and tends to generate many false positives, but because it operates at a sub-character level it is capable of finding substructures that are not easily identifiable by distances that work at the character level. So it should be used as a 'last resort' clustering method. It is also more effective on longer strings than on shorter ones. +**Levenshtein distance** counts the number of edits required to make one value perfectly match another. As in the key collision methods above, it will do things like change uppercase to lowercase, fix whitespace, change special characters, etc. 
Each character that gets changed counts as 1 “distance.” “New York” and “newyork” have an edit distance value of 3 (“N” to “n”; “Y” to “y”; remove the space). It can do relatively advanced edits, such as understanding the distance between “M. Makeba” and “Miriam Makeba” (5), but it may create false positives if these distances are greater than other, simpler transformations (such as the one-character distance to “B. Makeba,” another person entirely). +**PPM (Prediction by Partial Matching)** uses compression to see whether two values are similar or different. In practice, this method is very lax even for small radius values and tends to generate many false positives, but because it operates at a sub-character level it is capable of finding substructures that are not easily identifiable by distances that work at the character level. So it should be used as a “last resort” clustering method. It is also more effective on longer strings than on shorter ones. For more of the theory behind clustering, see [Clustering In Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth). ## Replace -OpenRefine provides a find/replace function for you to edit your data. Selecting “Edit cells” → “Replace” will bring up a simple window where you can input a string to search and a string to replace it with. You can set case-sensitivity, and set it to only select whole words, defined by a string with spaces or punctuation around it (to prevent, for example, “house” selecting the “house” part of “doghouse”). You can use regular expressions in this field. -You may wish to preview the results of this operation by testing it with a [Text filter](facets#text-filter) first. +OpenRefine provides a find/replace function for you to edit your data. Selecting Edit cells → Replace will bring up a simple window where you can input a string to search for and a string to replace it with.
You can set case-sensitivity, and set it to only select whole words, defined by a string with spaces or punctuation around it (to prevent, for example, “house” selecting the “house” part of “doghouse”). You can use [regular expressions](expressions#regular-expressions) in this field. You may wish to preview the results of this operation by testing it with a [Text filter](facets#text-filter) first. You can also perform a sort of find/replace operation by editing one cell, and selecting “apply to all identical cells.” ## Edit one cell at a time -You can edit individual cells by hovering your mouse over that cell. You should see a tiny blue button labeled “edit.” Click it to edit the cell. That pops up a window with a bigger text field for you to edit. You can change the data type of that cell, and you can apply these changes to all identical cells (in the same column), using this pop-up window. +You can edit individual cells by hovering your mouse over that cell. You should see a tiny blue link labeled “edit.” Click it to edit the cell. That pops up a window with a bigger text field for you to edit. You can change the [data type](exploring#data-types) of that cell, and you can apply these changes to all identical cells (in the same column), using this pop-up window. You will likely want to avoid doing this except in rare cases - the more efficient means of improving your data will be through automated and bulk operations. \ No newline at end of file diff --git a/docs/docs/manual/columnediting.md b/docs/docs/manual/columnediting.md index 9f2ff645b..470db2441 100644 --- a/docs/docs/manual/columnediting.md +++ b/docs/docs/manual/columnediting.md @@ -6,47 +6,45 @@ sidebar_label: Column editing ## Overview -Column editing contains some of the most powerful data-improvement methods in OpenRefine. While we call it “edit column,” this includes using one column of data to add entirely new columns and fields to your dataset. 
+Column editing contains some of the most powerful data-improvement methods in OpenRefine. The operations in the Edit column menu involve using one column of data to add entirely new columns and fields to your dataset. -## Split or Join +## Splitting or joining -Many users find that they frequently need to make their data more granular: for example, splitting a “Firstname Lastname” column into two columns, one for first names and one for last names. You may want to split out an address column into columns for street addresses, cities, territories, and postal codes. - -The reverse is also often true: you may have several columns of category values that you want to join into one “category” column. - -### Split into several columns... +Many users find that they frequently need to make their data more granular: for example, splitting a “Firstname Lastname” column into two columns, one for first names and one for last names. The reverse is also often true: you may have several columns of category values that you want to join into one “category” column. +. +### Split into several columns ![A screenshot of the settings window for splitting columns.](/img/columnsplit.png) -Splitting one column into several columns requires you to identify the character, string lengths, or evaluating expression you want to split on. Just like [splitting multi-valued cells into rows](cellediting#split-multi-valued-cells), splitting cells into multiple columns will remove the separator character or string you indicate. Lengths will discard any information that comes after the specified total length. +You can find this operation at Edit column → Split into several columns.... Splitting one column into several columns requires you to identify the character, string lengths, or evaluating expression you want to split on.
Just like [splitting multi-valued cells into rows](cellediting#split-multi-valued-cells), splitting cells into multiple columns will remove the separator character or string you indicate. Splitting by lengths will discard any information that comes after the specified total length. You can also specify a maximum number of new columns to be made: separator characters after this limit will be ignored, and the remaining characters will end up in the last column. -New columns will be named after the original column, with a number: “Location 1,” “Location 2,” etc. You can have the original column removed with this operation, and you can have [data types](exploring#data-types) identified where possible. This function will work best with converting to numbers, and may not work with dates. +New columns will be named after the original column, with a number: “Location 1,” “Location 2,” etc. You can choose to remove the original column with this operation, and you can have [data types](exploring#data-types) identified where possible. This function will work best with converting strings to numbers, and may not work with [dates](exploring#dates). -### Join columns… +### Join columns ![A screenshot of the settings window for joining columns.](/img/columnjoin.png) -You can join columns by selecting “Edit column” → “Join columns…”. All the columns currently in your dataset will appear in the pop-up window. You can select or un-select all the columns you want to join, and drag columns to put them in the order you want to join them in. You will define a separator character (optional) and define a string to insert into empty cells (nulls). +You can join columns by selecting Edit column → Join columns.... All the columns currently in your dataset will appear in the pop-up window. You can select or un-select all the columns you want to join, and drag columns to put them in the order you want to join them in.
You will define a separator character (optional) and define a string to insert into empty cells (nulls). -The joined data will appear in the column you originally selected, or you can create a new column based on this join and specify a name. You can delete all the columns that were used in this join operation. +The joined data will appear in the column you originally selected, or you can create a new column for this content and specify a name. You can delete all the columns that were used in this join operation. ## Add column based on this column -This selection will open up an [expressions](expressions) window where you can transform the data from this column (using `value`), or write a more complex expression that takes information from any number of columns or from external reconciliation sources. +Selecting Edit column → Add column based on this column... will open up an [expressions](expressions) window where you can transform the data from this column (using `value`), or write a more complex expression that takes information from any number of columns or from external sources. -The simplest way to use this operation is simply leave the default `value` in the expression field, to create an exact copy of your column. +Expressions used in this operation will rely on your knowledge of variables. You can learn more in the [Expressions section on variables](expressions#variables). -For a reconciled column, you can use the variable `cell` instead, to copy both the original string and the existing reconciliation data. This will include matched values, candidates, and new items. You can learn other useful variables in the [Expressions section on GREL variables](expressions#variables). +The simplest way to use this operation is to simply leave the default `value` in the expression field, to create an exact copy of your column.
For a column of [reconciled data](reconciling), you can use the variable `cell` instead, to copy both the original string and the existing reconciliation data. This will include matched values, candidates, and new items. -You can create a column based on concatenating (merging) two other columns. Select either of the source columns, apply “Column editing” → “Add column based on this column...”, name your new column, and use the following format in the expression window: +One useful expression is to create a column based on concatenating (merging) two other columns. Select either of the source columns, choose Edit columnAdd column based on this column..., name your new column, and use the following format in the expression window: ``` -cells[“Column 1”].value + cells[“Column 2”].value +cells["Column 1"].value + cells["Column 2"].value ``` -If your column names do not contain spaces, you can use the following format: +If your column names do not contain spaces, you can use the following format instead: ``` cells.Column1.value + cells.Column2.value @@ -58,29 +56,29 @@ If you are in records mode instead of rows mode, you can concatenate using the f row.record.cells.Column1.value + row.record.cells.Column2.value ``` -You may wish to add separators or spaces, or modify your input during this operation with more advanced GREL. +You may wish to add separators or spaces, or modify your input during this operation with more advanced expressions. ## Add column by fetching URLs -Through the “Add column by fetching URLs” function, OpenRefine supports the ability to fetch HTML or data from web pages or services. In this operation you will be taking strings from your selected column and inserting them into URL strings. This presumes your chosen column contains parts of paths to valid HTML pages or files online. +Through the Add column by fetching URLs function, OpenRefine supports the ability to fetch HTML or data from web pages or services. 
In this operation you will be building URL strings based on your column of data, by using `value` to insert a relevant substring. Your chosen column needs to contain parts of paths to valid HTML pages or files online. -If you have a column of URLs and watch to fetch the information that they point to, you can simply run the expression as `value`. If your column has, for example, unique identifiers for Wikidata entities (numerical values starting with Q), you can download the JSON-formatted metadata about each entity with +If you have a column of URLs and want to fetch the information that they point to, you can simply run the expression as `value`. If your column has, for example, unique identifiers for Wikidata entities (numerical values starting with Q), you can download the JSON-formatted metadata about each entity with ``` “https://www.wikidata.org/wiki/Special:EntityData/” + value + “.json” ``` -or whatever metadata format you prefer. [Information about these Wikidata options can be found here](https://www.wikidata.org/wiki/Wikidata:Data_access). - -This service is more useful when getting metadata files instead of HTML, but you may wish to work with a page’s entire HTML contents and then parse out information from that. - -Be aware that the fetching process can take quite some time and that servers may not want to fulfill hundreds or thousands of page requests in seconds. Fetching allows you to set a “throttle delay” which determines the amount of time between requests. The default is 5 seconds per row in your dataset (5000 milliseconds). +or whatever metadata format you prefer. Information about the format options in Wikidata can be found [here](https://www.wikidata.org/wiki/Wikidata:Data_access). The service you are fetching data from may have similar documentation on its provided options.
![A screenshot of the settings window for fetching URLs.](/img/fetchingURLs.png) -Note the following: +This service is more useful when getting metadata files instead of HTML, but you may wish to work with a page’s entire HTML contents and then parse out information from that. -* Many systems prevent you from making too many requests per second. To avoid this problem, set the throttle delay, which tells OpenRefine to wait the specified number of milliseconds between URL requests. +:::caution +Be aware that the fetching process can take quite some time and that servers may not want to fulfill hundreds or thousands of page requests in seconds. Fetching allows you to set a “throttle delay” which determines the amount of time between requests. The default is 5 seconds per row in your dataset (5000 milliseconds). We recommend leaving this at 1000 or greater. +::: + +Note the following: * Before pressing “OK,” copy and paste a URL or two from the preview and test them in another browser tab to make sure they work. * In some situations you may need to set [HTTP request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers). To set these, click the small “Show” button next to “HTTP headers to be used when fetching URLs” in the settings window. The authorization credentials get logged in your operation history in plain text, which may be a security concern for you. You can set the following request headers: * [User-Agent](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent) @@ -89,7 +87,7 @@ Note the following: ### Common errors -When OpenRefine attempts to fetch information from a web page or service, it can fail in a variety of ways. The following information is meant to help troubleshoot and fix problems encountered when using this function. +When OpenRefine attempts to fetch information from a web service, it can fail in a variety of ways. 
The following information is meant to help troubleshoot and fix problems encountered when using this function. First, make sure that your fetching operation is storing errors (check “store error”). Then run the fetch and look at the error messages. @@ -109,7 +107,7 @@ Note that for Mac users and for Windows users with the OpenRefine installation w * On Mac, it will look something like `/Applications/OpenRefine.app/Contents/PlugIns/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security`. * On Windows: `\server\target\jre\lib\security`. -An error that includes **“javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed”** can occur when you try to retrieve information over HTTPS but the remote site is using a certificate not trusted by your local Java installation. You will need to make sure that the certificate, or (more likely) the root certificate, is trusted. +**“javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed”** can appear when the remote site is using an HTTPS certificate not trusted by your local Java installation. You will need to make sure that the certificate, or (more likely) the root certificate, is trusted. The list of trusted certificates is stored in an encrypted file called `cacerts` in your local Java installation. This can be read and updated by a tool called “keytool.” You can find directions on how to add a security certificate to the list of trusted certificates for a Java installation [here](http://magicmonster.com/kb/prg/java/ssl/pkix_path_building_failed.html) and [here](http://javarevisited.blogspot.co.uk/2012/03/add-list-certficates-java-keystore.html). @@ -118,8 +116,9 @@ Note that for Mac users and for Windows users with the OpenRefine installation w * On Mac, it will look something like `/Applications/OpenRefine.app/Contents/PlugIns/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts`. * On Windows: `\server\target\jre\lib\security\`. 
-## Rename, Remove, and Move +## Renaming, removing, and moving -Any column can be repositioned, renamed, or deleted with these actions. They can be undone, but a removed column cannot be restored later if you keep modifying your data. If you wish to temporarily hide a column, go to “View” → “Collapse this column” instead. +Every column's Edit column dropdown contains options to move it (to the beginning, end, left, or right), rename it, and delete it. +These operations can be undone, but a removed column cannot be restored later if you keep modifying your data. If you wish to temporarily hide a column, go to [View](sortview#view)Collapse this column instead. Be cautious about moving columns in [records mode](cellediting#rows-vs-records): if you change the first column in your dataset (the key column), your records may change in unintended ways. diff --git a/docs/docs/manual/facets.md b/docs/docs/manual/facets.md index 20e99e95a..19e38913a 100644 --- a/docs/docs/manual/facets.md +++ b/docs/docs/manual/facets.md @@ -128,7 +128,7 @@ If you would like to export a scatterplot, OpenRefine will open a new tab with a You may want to explore your textual data with modifications that aren't permanent. Creating custom text facets will load your column into memory, transform the data temporarily, and store those transformations inside the facet. -You can also use custom text facets to analyze numerical data, such as by analyzing a number as a string, or by creating a test that will return “true” and false” as values. +You can also use custom text facets to analyze numerical data, such as by analyzing a number as a string, or by creating a test that will return “true” and “false” as values. Clicking on FacetCustom text facet… will bring up an [expressions](expressions) window where you can enter in a GREL, Jython, or Clojure expression to modify how the facet works. 
@@ -140,7 +140,7 @@ For example, you may wish to analyze only the first word in a text field - perha value.split(" ")[0] ``` -In this case, `split()` is creating an array of text strings based on every space in the cells ["Firstname", "Lastname"]. Because arrays number their entries starting with 0, we want the first value, so we ask for `[0]`. (Assuming the first name is one word, not something like “Mary Anne”.) We can do the same splitting and ask for the last name with +In this case, `split()` is creating an array of text strings based on every space in the cells ["Firstname", "Lastname"]. Because arrays number their entries starting with 0, we want the first value, so we ask for `[0]`. (Assuming the first name is one word, not something like “Mary Anne.”) We can do the same splitting and ask for the last name with ``` value.split(" ")[1] @@ -152,7 +152,7 @@ You may want to create a facet that references several columns. For example, let cells["First Name"].value[0] == cells["Last Name"].value[0] ``` -That expression will look for the first letter (the character at index 0) of each entry and compare them. Then it will facet your rows into `true` and `false`. +That expression will look for the first letter (the character at index 0) of each entry and compare them. Then it will facet your rows into “true” and “false.” You can learn more about text-modification functions on the [Expressions page](expressions). diff --git a/docs/docs/manual/transposing.md b/docs/docs/manual/transposing.md index fb81110c9..08210a65a 100644 --- a/docs/docs/manual/transposing.md +++ b/docs/docs/manual/transposing.md @@ -17,13 +17,13 @@ Imagine personal data with addresses in this format: |Jacques Cousteau|23, quai de Conti|Paris||France|75270| |Emmy Noether|010 N Merion Avenue|Bryn Mawr|Pennsylvania|USA|19010| -You can transpose the address information from this format into multiple rows. 
Go to the “Street” column and select “Transpose” → “Transpose cells across columns into rows.” From there you can select all of the five columns, starting with “Street” and ending with “Postal code,” that correspond to address information. Once you begin, you should put your project into [records mode](exploring#rows-vs-records) to associate the subsequent rows with “Name” as the key column. +You can transpose the address information from this format into multiple rows. Go to the “Street” column and select TransposeTranspose cells across columns into rows. From there you can select all of the five columns, starting with “Street” and ending with “Postal code,” that correspond to address information. Once you begin, you should put your project into [records mode](exploring#rows-vs-records) to associate the subsequent rows with “Name” as the key column. ![A screenshot of the transpose across columns window.](/img/transpose1.png) ### One column -You can transpose the multiple address columns into a series of rows instead: +You can transpose the multiple address columns into a series of rows: |Name|Address| |---|---| @@ -37,7 +37,7 @@ You can transpose the multiple address columns into a series of rows instead: ||USA| ||19010| -You can include the column-name information in each cell by prepending it to the value, with or without a separator: +You can choose one column and include the column-name information in each cell by prepending it to the value, with or without a separator: |Name|Address| |---|---| @@ -53,7 +53,7 @@ You can include the column-name information in each cell by prepending it to the ### Two columns -You can retain the column names as separate cell values, by selecting “Two new columns” and naming the key and value columns. +You can retain the column names as separate cell values, by selecting Two new columns and naming the key and value columns. 
|Name|Address part|Address| |---|---|---| @@ -91,7 +91,7 @@ The goal is to sort out all of the information contained in one column into sepa |Joe Khoury |Junior analyst |Beirut| |Samantha Martinez |CTO |Tokyo| -By selecting “Transpose” → “Transpose cells in rows into columns...” a window will appear that simply asks how many rows to transpose. In this case, each employee record has three rows, so input “3” (do not subtract one for the original column). The original column will disappear and be replaced with three columns, with the name of the original column plus a number appended. +By selecting TransposeTranspose cells in rows into columns... a window will appear that simply asks how many rows to transpose. In this case, each employee record has three rows, so input “3” (do not subtract one for the original column). The original column will disappear and be replaced with three columns, with the name of the original column plus a number appended. |Column 1 |Column 2 |Column 3| |---|---|---| @@ -99,13 +99,17 @@ By selecting “Transpose” → “Transpose cells in rows into columns...” a |Employee: Joe Khoury |Job title: Junior analyst |Office: Beirut| |Employee: Samantha Martinez |Job title: CTO |Office: Tokyo| -From here you can use “Cell editing” → “Replace” to remove “Employee: ”, “Job title: ”, and “Office: ”, or use [GREL functions](expressions#grel) with “Edit cells” → “Transform...” to clean out the extraneous characters: `value.replace('Employee: ', '')`, etc. +From here you can use Cell editingReplace to remove “Employee: ”, “Job title: ”, and “Office: ” if you wish, or use [expressions](expressions) with Edit cellsTransform... 
to clean out the extraneous characters: -If your dataset doesn't have a predictable number of cells per intended row, such that you cannot specify easily how many columns to create, try “Columnize by key/value columns.“ +``` +value.replace("Employee: ", "") +``` + +If your dataset doesn't have a predictable number of cells per intended row, such that you cannot specify easily how many columns to create, try Columnize by key/value columns. ## Columnize by key/value columns -This operation can be used to reshape a dataset that contains key and value columns: the repeating strings in the key column become new column names, and the contents of the value column are moved to new columns. This operation can be found at “Transpose” → “Columnize by key/value columns.” +This operation can be used to reshape a dataset that contains key and value columns: the repeating strings in the key column become new column names, and the contents of the value column are moved to new columns. This operation can be found at TransposeColumnize by key/value columns. ![A screenshot of the Columnize window.](/img/transpose2.png) @@ -120,7 +124,7 @@ Consider the following example, with flowers, their colours, and their Internati |Color |Yellow | |IUCN ID |161899 | -In this format, each flower species is described by multiple attributes on consecutive rows. The “Field” column contains the keys and the “Data” column contains the values. In the “Columnize by key/value columns” window you can select each of these from the available columns. It transforms the table as follows: +In this format, each flower species is described by multiple attributes on consecutive rows. The “Field” column contains the keys and the “Data” column contains the values. In the Columnize by key/value columns window you can select each of these from the available columns. 
It transforms the table as follows: | Name | Color | IUCN ID | |-----------------------|----------|---------| @@ -227,4 +231,4 @@ This will be transformed to: This actually changes the operation: OpenRefine no longer looks for the first key (“Name”) but simply pivots all information based on the first extra column's values. Every old row with the same value gets transposed into one new row. If you have more than one extra column, they are pivoted as well but not used as the new key. -You can use [“Fill down”](cellediting#fill-down) to put identical values in the extra columns if you need to. \ No newline at end of file +You can use [Fill down](cellediting#fill-down-and-blank-down) to put identical values in the extra columns if you need to. \ No newline at end of file diff --git a/docs/src/css/custom.css b/docs/src/css/custom.css index 05504f1bb..05723e88a 100644 --- a/docs/src/css/custom.css +++ b/docs/src/css/custom.css @@ -67,7 +67,7 @@ font-size: 35px; } - .markdown .menuItems { + .markdown .menuItems, .markdown .fieldLabels, .markdown .tabLabels, .markdown .buttonLabels { font-size: var(--ifm-code-font-size); display: inline-block; border: 2px solid #4dc6e1; diff --git a/docs/static/img/transpose1.png b/docs/static/img/transpose1.png index 63dfea7fcf38fe6b1c924592ae38cb9bdc91ccd4..649c9315c8a1f1831552605a4194a6057a7b5333 100644 GIT binary patch literal 23886 zcmeIac~sL^w?CYYR&7P30wN?8L{wB75t%~LDuW}6ObSR^Vg*6K7#Twn+iICYQk+l( zQbc7YrOKp`_$VY&hDa5H1VSM|C=g-@Nq~^#JppQ!_TG2h^{)H7zxS@^50?SHIp=)$ zygz%NoqX8c?UQ9j%ODWQCwuoeAAvyLWr2UzAAJaZ^PX)l4E*;F?ugqiNLj1#AK-`g zgTFlZB?MAVT|WEu2jJ(WXZ9S!K_DxhX#c&#jx0C9=V$i70k8MovsoE-68F{4^AFO$eNg@0q0(!s_U^qB`;`u&=DR)V z?`}g>Z~XSK@0E>5K3SUFBaGMK_*VqAccs@btZrsI(w4-gmt(}8j#VxEsix@! z(`Ekor^h!aw=OI)zWrsSE@p1!vl)%VR71VFLp(en_g^?r_cy=vn$DkcVB2Mt{=(5? 
zZu}RQ#_Ph>XyP=95Gx;h8KI_g)TqJo6DW;xm zK9D%&r5dQ7 zf}Z?(LOrgM!G)_ZO-aHup`W>=%T@W1n=scwDK3!3tAkNsv=B*(1=wX_(`N4g8xUIUyv5FhzV(k8UGZ2Bx)MX2+>Q z*sH|5O1BGY@L^N@O`ppP06DUCq;R1$=8_z zjGR?(j9Vd=+RPZzW!IIq`w%{&HvAX%_kHSyko58q96}Rv;L2?bYhWr<&Kfw!-|cPE zPv8jT<1SHwX&kAqL@9tP3j7{&k2m1FM`e;YreL(Nzrr_679WI8U=9t87K~%EETOg! zqPaf;otVC?igtjv>J%d?;|sjmp^oFl^$nP!#A69|aaoGEPf}oH$K0CxHWF=h8#VGcNj<5#Eu6iBzNcvVdjjgS`N2t zs=KU;bCXArPtQ296ntR{yWk+u-q}W?W457kUAiDKMUa?!#AGy)#uk}~>*12zOzzkc zzl1%O#G^Cs7{R*tF)DZK3|Iefy=}U9?$oS=t=cme$%dv^_X_K32u^X$A*8b73>fxc zD;#syRS3KDz+xZ6s>UgvI)rVHm&=`^i56(LtD)lvaV}PZfL=!V=HW*ND%b-vg9@r= zDZH{SknSR#s%7|4u@h0ic-WJE>2jP&~B$wH1|8``aGcP%trS&b-r0Ar6iMicMgsYNx#aSGO{I%o#luD#O`jVrk- zJksA;_aHtYFprWfYcFm+)*c|r8VLXb=t zKR2iWXHcqiD3T3?Ih{PPn(E=IBphd~a0%lwMrNdO#c(W*9Y?I9_815!9(*ceW2rzy zyW8PxiWJ$P?8k2}L$SD7y}SAJDgI`@Z?1es=T>U)G`~C zr3c%?H=ve6iO>u|qBrs9!TKfP?ezL3=!9oO151dlxl086I*^t-7D{bQ z#su}N$colv?wMiYl(F~m?6&e0_E?WCS_#Lv7E-Vigd)~PY!-dfcBzx{WaZ)NvG$U>?0X7milRe_V5D$;P0*!I{F|+{uQ^}73I-+qSR+bI zhOWoCChG*{8^sJ8^TqgKNqi~G9v_y* z4U}27BP4d_W&a}7l)0Dr+iatHoFxDe*cZs`S7zc0wy|G)Nr=nhuB1AiM9+Q<7$LRn z!F#nFZ*lbdUa?6+9O6704s>J;jnwIKocu29QE`ZvF{)xy;325Eo(SBF@SvQRnks2t z>2r&Ocla;)0LCCrHd*;QtM^pT;j~{k-lA1gMNLCfdsjld1MpA)K45J(l^)GZ~j1W4mPd>c@z$u%|5sflW6XRXl2`lu1&*BTWotp&*`by*ULQ4u+uB2@j%M%N16ze97ISbV+SH7LQBmw0$gP zi$;vPeG;Hlj`AJIS4a?Nc$0AxPrP*O^mxQqYf!h-X1e78?N9bf>jkdlR#A<#@~H^J z9S%puN-J9384UaT{F05nPxFI(R>u?Ygc-Xq`ATiOsZ> zyVQM z2;_5vmy!AUn7qUQ{kv1ojQ6HBhUXFRqbrT99by84191_shpgSwX7@lbU)2=Sn;X`1 z(<(>#jL;lrSfSH$&!Js5E=p}~W^t!&f-78Il-Ag6JoiT><=9#X zqMM!b4W6W5S;HO$utN8mc`GRk;z zU>Yy4eD#Cgax3mYCz@=AZLPxv_je|AmlIsWMN1&(r=Jq>dO=%W$ahaXdv6k~nHWwy zbE-DTRoFU879i;M(P?q5yA`23A5=6Qrq&dc*@{zbR$lV0u!#_> zJUR3Q23j;)D*uCcigpW67*jP>_%Pdyubp<08gV>Ls%-N+N3C*kF8E`*F!aSzn|C3QPOWtbyuOd(@FiJztH{IR#YYgx zSNnl^iT!%B{mA?9oYW%M?5tm0(KEkgK#}rts#C)7Zdm-(qb0~!{oY#ML8@144cKze zMh}Z5Nr0El$iSyIGin7>^D6mJLVuqbZ2IXFAxblySk+5Z_C4%Xji>rnRG?M8O1>fx z3e4T-R=^etE*1_H=+(z=D>8~Yn_@(m6~IVSFY=MHS5IK-SA#iQyEK0sBF!||0ZaH& 
zOHHc-=58QC!+Eq9T+Z(pZbGXko+P4X`!Y%iJKP1-zS1gwyf&Hne1vI{0L;$svs&wF zaQBj#a9S!HR2$CiWuIRb+^{yXR`RNbLm2-i43_qsWc9*F5bzBRsRTVC0 zGmAUba8&#b2{!zoU%89U?|_dO#HZn$Z+(`6@R8!7X%f6Uk6mJ>$GuA5DsPCY$4+1G z?r^Ae%|%u@`Ox`t2f_3O_EIzq^2q5&ZDP@pagQWRQf^n@vh8YRQU)rMbt<~6z+omI zd9Dy?cVtw>o9HS9l_Pqv)}p~Z$nlE-G+f8MU8IvoHTx?u&#FCV*I+7C$Y=O3xI>%@ zd2u#XH&2nxeqaTCXiGgCVzNe*D(-uli%aXc!kg~StE4_Jeku)l_~1jx`F4;f+%X(o z-lda0_06^y6fD}NEYz}#LWZ8Ma@RLIj^S5TX}p4J-AbMpoQzWjHp=XSoh&CaVum%I zvs#W1<`~Jsx-sfss3T5-F8}R4V&CI*W%ICgwvZYgDGXu0l82bwnAM3%ue0$D+g_7R z$Gt%ApdRKW1L&jVfSk9f;pp)49yFhyFPbX;dR$s*SRuPe7iF!6y?@m>>g;AXCF*It z8sVckb349xXFRIg(L2o{uE;R;5OLzNl^WHMs7A4j6o#`o$y0>i73X>CMB&KwHsv;I znkY2qX>QaBIll@Y#`pJ(2vbk>*o49|EpAEjkl@{)5*kP;C4#Byt*#>c z%y0Aopj9d5&%Qr4@p%yahoY;Ls`GQZ0?8PSyq`*vW~;Z9t9Lur<@84~JXKQRbMbr3 zC0p_(PPv#J?qdX&bhaiu{nHxp^{5m54-m9Km$B5aR$-dpTv?|9*V0vl2tF;0YN!AN zaF_E=0RrsrQM#h}5kicjYx?WEBD?6-zyb|?8Z_q6kXUi}6X&w8q4|Vm&FfgMqyxK1 zKOEJ3o2?e>#d|B^qy*Djid|(@is`y^l=3kqtJ9!SDy%eY5tQRcpH3NCFqk-+P(sA4 z@o8Phdd%&Q-3oyegD4{uF3+d{v8?l~&9MOt^_1ha`|Snb%{`OKpAo1Ef5{VnUfKQ< zx<|H~rf7y=5c;BVQv(RRzWM+tk~!;)pZ0`aL9?GuSbe29;GO3_)eY>~>DT5dz@Mzz zZ#|m?Z0M^R;2{1ow>6c9QufA5U@b3jOhA}-O$(4b_B#8bv7*6n?$<_dtu$}w^g`c< zKz;^ULwo(yODm1-Sys|NneMj#&fgyUk}TsMu(Ukft~J-;{8+&!zZ{*juG#KnYmP=) zWlNx|GU(wf+Gb&v(nxn+f;)Nx4fHg>t*DUMcAM-DE@?w=5@Gs{K9yx>vP&dKfz|)c zkp78jh5ie@-?yJ&X73|wMsD^JVpaDdnUS=r4}uZ`oPoj5+C?^}acC&g%NbpdJ&vX? 
zuezu8IA##{GVU;s(SZX?l5~A`BTVA=jnvgsyG8WdgKY zQR=3pv}eQ7Rh!qCtUKk+Gz;=C_wvQKbZ2~k(7>0aB?lQdykz6I^~oHOJ0q4;_2RNx z>AsO}wdh9XEk{Y_-;O(OAedT?r6e&&QV@Y6%`BQZ9pu;~52XEWAP5KcYghjoQbo%i z^{o@Gx6Wv-q~A{I(Ok715@xlg;>RK@x~Fo>-D@Xxsd|39v)ICCl?tKkj{xk7Nbzw` zAXg{#1qr1rr4%)p^)5$LA3p~78?E^Ac1I9BXNK}r^Aq6{ifT)M(i+zol_T(@6EvnX zIaz&mG5hA^G-bf%l1nT-`y1c3gqd$*>4$ixtx=0`%M-7MUYBxwpbLdHyvmYwk*OUBM(R_ls-M~jCZ$h0zQ zf7Q@!^4k>HnQ%9Wha+(~h}2!#IdIg}#!daI2$f@bNx9>Y{^-U+;9;-D!Dll zwK479{Pv-2Wz4>Qf=%F)T%1o>_B-{lTU})(44!*qiCFC4XRL_4n>k#U!E}tw)_Mw%FGnb)HV*UHv?o<+{EY8rCnl(QWIO%3L`)_&ez=MW@j2 zP|ovkbI#Tkb6kknORf;%vpr}SuhiKvde{~FbF?(>K=n0jXWpropMV`vGeiZ zDWkIbRq;03teg)mm0rgm(dCq$8@XEL5;5x6pIK367a1rl=jKa_=P=G|S6CXGo2EC~ zVKHa?rwjtC?lrSeZY3c+jC{s9OW9$jwN8aWB|4W-7hGXG31`xfXVgKjmbAmZAbs9L z!CYiQ$d!W516h`yvQspn!jH@^dsV0J3Fuvek@+BmGrZgL0RI1&IF>)Fu!1H zzhna?Erdqp?UqhY9o58^CGfjD;`(b|WN=G-BdPpI_IX3Pi#KOhy(&~m$W&s`GH4k; zGO@L<+{l0pbrkaLa>Hr+F+q-c$|7lvRSvqI5ldm0JtsJ9pjqlsdH%i#bj7yIy!@F~ ziCr$kP8bTz!H5ASIc$SmN%i@t!bN5oxjsb}h|YQaj$0+VazP&#U5}D z@AWhhB<{26FqWD#W6C-+*n`mlmklcjS%EG&Lc^7Uih_|s#l8geSY1Z3;>TVLUR*U& zeIUmz)}LPP)K$jv=H@Y`ZU9tM5C_9k>>?s5%cwnT6))nG5!qx@RJz@)g6Fn!ZN~`}}@(L)WmeLxfIRqt4H>U+7}} z2ci}0zB#MZ9>_2mn}8m1g7VT4KE_gFF(1!20Er(vZAK1GBd`gE;t6=6U3B(ik)fzb zV47dR4KZP{uX`w;A$EmZw3Zubqi>fvVex6OBUSt6YPIV#mada?KO^)2OEojWozgY( zCA8r+!|QTYbGfP~^7e{v=lRTdhIbR^ri?hIv&ZXtCyap*ajq+i1GJrY5ODV=50yd-QBW`0NKz*!yOPj%49VWR(CnSTXaj z+n=${Y<873i*sj@J)@qEteA>-f$_o10%S1ttWM=MFHLCNzHQhQ)ms-OYbOScroWEi z=-7d!7yq-(2}k8P@{xBor*>D!r>Ug-UuNfN)qht4JSn$%BQsGNIt^G4{ z>C<66Qu_BCOp%~pDfn}RKEGjUES)4-25APE*B|w<)QyO%v*elatMk5(`qXsGi^`2c z2bLV8Z01iL)28J?A9J09#yC46kG7m(9`856^r2Og4fy36zu8sVsHLzz*6VqZPbhj9 zIZ~oJqK(1~%a-=GKt4AE0FW*w*Q%wCbf?Y&rT(q%DrqZm>AV-%A`dOfETrgGSgm7K zO%dmIs@)k+9jz-<+zYG=pN+#!@@>^wEF>p`y-e(DI&D0S^A)`ZsonsVU4~= zlX3T|--9%Np{4uOrdMJ0*XC)?`L(nTJs-Xuyw>62$lAqQEoeiSj&o>M08>3_rW9=t z%7VRn-cXE96=%%9#O2=S<)ZMq%+B!?2|OeL3b5HnrvZ0d$&n&-D*8LH^s!^F*%2H5 z%Y9b#HLp5F`*;eLPZ7SV6gCS#jSGaF4+BZZ8u$+it!!JSO15g*2UN=Ah)cx&eeFJz 
z#iBjY4Y`#%DF|eFJV5kL4twB3TtiB#j+5`LEz+6V0f{Fo{}*Tr{N3 zAfv~R1&?9u6tg$x$t`8Ajr_6UU5F5rM1cmM`c>md+jvBH#dJQ@ABxjXO5%qgHNhMe=b= z>AjM2tbc6QQ^7f6UA$wzYM**_Pb1e0utvBN6Rt$g1lnN(YMCizHB#uxP`16b#IE;w zolSCQhEkvoNcCLEfL>cY(K)7}f9B?2lAO2f*g$Qsz1D*8DM0OaKfOO|%b}5`(HF?3 zAMd!=3odl56iV#xi&Sw02j8qGyu`9XpiCe$^~2YyM{=h9Ca|Vr>Hz*{+vszLq7GDS z&=prDNnu3|o3v~Us@<9NV!;AOZ-(oy`grd}y@m}{F2-w>f@t&24%83!s6yTPv-Vxt zSRruu<%^{i70gSPG~3q8!gKzOgvUxVY=qXc0NZNV%dW!@g{5+X?xT8_i&C*98=qKp zcT97pX-?iqQ-$3Jz!Kg#Mvh|k2XDEh%0qheXz=f+`++cj1ol{fx{Xf7pzvHZIq+<@ zu)4!eSS}rPlzu8s#|0d6mWl~X^mlW1-PtAWgSN_uTON-3NI5%v`5{WfotaA>B!usJ z#i^Vv9UkP`QcQpKT(nJh5~%Ev`EzM_GPZVGUetKcy#xZECiGNZSA0>A3zm0ynFywP zGV)aUw?U|4or;YjwdCRpjmVTb3t$gAi-q>R;)I+Wmi;(7OWHXd=|aBP5kDgQCUGte zIQ*s8_1fc3dmJ%lmta&I$<%@IwWSRgJ8nLY{0(E?Iv?*})*b+tiVb9TF@n?WcDCAP zK2c#Ef2+v{*-#W&NT)b1nJ3Ogan!(dykn8D4g^FE1v%69XG5nDPHDJq<{UwsE-??m zo|yA>{DRdUBbVk}N|aQ}`D6NIq20Fosds$)mK5-?CC?C&CTAY@^oG`}g){d#Qv6m| zOnd9QP+=pERY^%zuy}axdaW_f0zkQMk99bE^fw6+7D65^QelalzUb~&@^~R-O}hsV z8H42J`el6w)TwzJtLV#;E99m4my4SeMym-)|R5?TBl3Dohgk zOjI-=JxiMgq6$dP6+1!sk$ag+ACgMqG`Uee%55$HL?y!IH|;j8>iwyv)O4iXb|-ap zgr!5q;quuJlR}yc_IT5}uJ65@k8sr@E8YQy>GqoX! 
z#OBA}1BwlWcjuks83%3lwA>~IO=-^vzO8J?R$dSu)uyRM@gvr{y-)D8^g6{uQOsNrzLN7DD(<%Vwa%5+1OZR40_puTV%6^7HJ$n!x)$D5r5Fs54Tflz0nZFy zCD`aQlR$PZzVvxTh2}bE@9mkfkX}AEk5Pwhm&LZ+UJZe!{>u3jYDMH| zdv!K7_1x$;LA|-tZUo)EApxnv7sXh-xEmxlUVq!vC-k;eSOe2Ib=IW_Ss^v3dREC= zPH1BxTSghJ2Gd(XhS~rR+S0G=QN^z=w<7*9fhf-wCb5&-f~_J_J$kh%D*Yu5U1eP^ z?i|&VRL4><+%+6?IfJ@=L|Owd?4U{2xey*(7?t+QE;U3Le_vz+RjT5f9R%eW+QQG9 z@o&T(SM_hr=u0#GFNiMar)tv{bhr8!d3C2ncm7mr{0G`3uHAb1ggg_TRD3M7b)batjt}SHoP7g6&S#Cx+nL*Q zF73~HFMik;-`?lH3WWTzFd}^3^8U&+QldGI!YOmbD&cv-vpVJwz2VI`MGq8g z^pG)O-y6~wy!Go)BxfEd^e$_&-Z!_`u$PgN%|Z)lttQzA4H+G?i_@#x_7Lr9>$GGyj$VIz;oL?WZt^KdV}1* zl}1vztzrKz7+>>xSI4V{O6H0d(}Y&k^-}3NRuo7qG}BC?v~QZEk)Nbh$H0Bgq0fI$i#;8d;}l0r2Ei97Anc!Y=3Cv0b->nE3G9|f@M&f*$;=~c!503tV^rN7 z>j+=MvHLhjsO<=@h;*hv%j)g-422x&Mf&CjFulQ$^RA#sYG+y|)y ztqjH>2-6ejwtw##p889rO=7Z$Ex%DGvDOB@hgM@6gFp3hb6(S2z%f0Qo~CPdV-j0~ zO8was!F&bBK77tKDIQjPaj%RPX9$^N`*I&&^>th|7A?p5FT9+%PmTAU< z7AGM(hPgo+H~XrEeD~_U*$aweh>P$0R9gX{%pPZ|k1>HIkEVq0zh{?k7{>dBv{7q4 z?xg~MzHDz%&-R@7bIp zbax3sP5$?8t#P9MuCFn?I3GloYeF9LY0DY;h#{g@VLsD#+ko_Bp=b?AKQTFPdEP3~ zfQopsHsGc?f|B#rKTFh?%M72y{nyIpht1wxV8Jc_zocBmWc9l`nyr4q-@B#0f1x|M zt_zu!IqMSJP}N>?DAz16yPVqh&reyfJfNUR++v|h!zE$2?qPp8U(o;aM4(?JT%6;m-fvgl7k(Em4(@Jr#jh4`Kw$V!f@Yvu&GDd)&N-RlKDE&%$%%Xw)}L=;bK=;WcmG{T<@cr%H$2z zOe0GKY25MTL0yiAZ10s~8A=POEF3Pspbc~%s1iZqDRG$uM>MTcJq?yb|se+8Np!4oT)=0!#2bD0M?%5#d zwCfF_><%g+Oxu-L-5oLHpiN@70i9lGvp`h1f+mIx7;?5xhk4owF)M*-Bu_Z%F%DxY z@6&j1UPu-zMiOjsa)WE{l8krL{g#g`Jp&Lx{sqJ3vS}`VV@4N+`R-_}=29^z$P<5x zn@Ic$%+v4Mh(T$-;KtQ^=$N#m_Xwz5;|Y^J{* zm(=tU_NmR*fQE@b0fk1-u*rYv5MTAJ?VVz!hO2f3B$vxm=_J*t4!us^l`6tWU@pH7 zq0`^#^sc4@xMsiJ;;$gU^fH*Pd(GHj|v`~!0zw}Qmj9F zXrr|QRoLt$Ehtstg-xRLkJeTJ%zgp51-@~?P_37yOXXR)#PRe(q?RX z_pUO(-~p{W^vgzv?GG%jo8~lO2sxcXnM%K0jC*`YTgV{I?g;XK}wf0myMMm@;awq#LP)v)|OAXlzy zmCipiWU0*q!ulg8N+FM;UiTBc-374dJ$2t~BUs?ZpZ?-4LBH^ywHEwOEn0KOsBYp{ zOA+vI=N7&1Wq&=v3T>Aw0#pg!5;ko>@ps%ZC}AxEFCUxh0DEba2RFj+r8o&UbwpD} zA5J>qRWdG&51eAgR=W=}3cg0L@AO)qiF+Lm%;&A|!x^7UuIzYXcNzAq81*v;OUPgZ 
zTP8_Pi|_3(vB<$xjwU|WUblI(cP>d1{!`-5P}Ysv^MR^w33Iz${uBlJX1^KTc|$~q zPL|c)nq>*=HpM^MG-G~fI5!_@j+{ko8`sN=I}nuxE0I)BS>CE| zRNBx2qYkhz-QqZbQJ;6&xQ8VPY6;CU%krI}RDjL&R|t_6 z0(I@yvk!}+vTW(NbA%1(d`hw1j^1HnGKPD)7QIH79ovEht@X~vYhOORmau}$RSL?- ziw#>aK{N?GZf%FQ@A%DpcKkS5@Z=zGRnpXtHrE_;;h{I%5JBg%IDJ^7%OHs^EX8b} zen(XKjVb40jLD7aAM6e>P;sA>bHaQbDN;F{D@K}J=3Gr?i%KnX=zbZE;XU+N&Wnt& zB2Ydq36x&!t$TFw*1@u$#tJ`;fE(iTi{eo_xegDyx)h=Jwm#4>Ob>6h4u4cDd zB031jZ4{Hz<4Qqwo&9D+bYwtOgr|uwfIkMK%VY~C*;UyuF{5O2>dSj<-1-se2>B*< zO54#sWT{73>8q@qJcmU4lR4mK5&$>Z0~2q!K#ST^UqLi-+mURZ7uy>*gssUAa{`@X z`McO@tg#lMwjM$p%TzqY6l0{B&HT7-11iox?1i7@L>68++^m;Ozze?mWU;LC8|;vu z$$nb9^t-_?2h>{03CJ<45*V)vudvx55Ol|c@asmO7I1)nEl+IKrAoWnDFXRWl5o{b z;>X|-ScV6UBP!S8HaTXTHQ?^m20jWNDzq<-63QBE07aYF4N7y$aU8!@NUr7xSB5H& z)=?;{^P~fUiHdq$y-mHnb40MKG^wr2mG?6!Z@uLe&V~hbw?N||Gyf{y^IiRLt`M67 zyA`y9T1aUU8dh9&g6lglXJntFcF+&=*0gNuH|p=)4WvHpw)~kfV}lDf@>ru_`*_S2 zJ5z1=v9>p|fyWyCJTE%Ox+;w|6Oh8aVy{&`&=_a4mXk;BFiUWUjoO&tN(|Fvwfh*? z5-7F31JNgUP?HIP{#nvzvuBix#HrDVyK(uyF!t?;;#Gt1LVnkOjgu_+hk8Re(p+uc zVp^hnY!wDM6Sio9r9osvTIQ6S?o@>#Bx3Gvp0@Suzt?Ac20F#kPQ3>?b83-9Trm3f zy7w0#y{rCk5i543eWA_O#M-RyO+)+QBWK?)tYjI{;IC)?CSW;XliC{!D-o&>Ow49& zYX(L8?aU1l$L zn3Y%Wz=wg9Qg}TRb#WG2GlX}om$Jj_4Zm6(u+(skOq9={$qjef;-b{c>Zk0sc`y?w zVT@pB1gZPlrzn{dF6o~)D@liSQr=%D_H?AQ@Pf`AxU!Rq^{0>6N~V%naZHVjDddD8 zsUEZYf0A#~l?@X5^&6@moD`@6gUi};GsB23%7ey^rj#}W(yb)FcR35hz!UM*8*KJ~ zXivb+Ti6>TM|)rnC`n7F$=0f`hZ`5V2@(rtd#IT`N1WA-NlypEy`?t!d8zxy@z zd2!WN{awlzJFyY6!cwX4aP>&%K4ibWS=C0+0=AkC=W`=k1O%g&8;!yG&)~$L z_a{w0sxmwVkyCt0`tK>coeNhw_#$Zqt*3rDAAdT_c6+Aq1!yUYYQZaPl@uEvc*X;))d)Ufu*?+XZ>rh;jPQGlVS*ZX*2JKDEaW2w?BqIo>paZC(=kKsvO!s9 zU0jjW8{8JL;YtF(4{X=Zu?5~@y1%ku=Injg?{8wI3(i2DSo!MvxE&EGshtPf~ zvv^vHkRZLPd&L4opxB0!!O0J&w~0?ZuhD^Ie6}c50MJ@rWnOeQt*T+zun&nE+qx1< zwmVmQmTc(12Z8Lz;kT{^LWJjzsZzu@#f8a=*vPCxmQP5yQJ&JtA-Ep5884kVYQeRZ z$K}K_M!Fx1sR?6Ah(x-thsl#GSA26mj?zy7$oq6B~H2D9SZ!Wd*qouEO`Q`k~ z_4EU^34{#K|$MtqY>2f&a(61p2yGH``B;Z zRQQiA)cc?Xa{(xKQ;g9nH^MUteVltQ-<(!|ho0>~vEHJ}a~*Z~Bl+fN)G<1o<7*XW!0? 
z?Hf9e%(hX9lYNf8(<_3y-v_txs#h#J-PvvM)+1Y13ZURHkD?YaQQJtbPP*{^QPb0# zH|YcPyXqZ723_h#OKj9*+NLRvUK!TixRLHehuV8_a>3D@>kIJN#3QfMuy?i32T zEpCxRkri$QR&snBb2#IxJDC<|$EIzVns**yHp}a6IGZM>U7-1* z*l-9$;k-zGK}*lY!PxZ;{hNXURPkZ(mLmqdt2>|$+uW3v2x^;=CA@5?ozZ5OH@%wU zcH;8IE+RIHwmK_MKz%Dw)#<0iXcg(I! z>|lC|;G*)?tK8|k(@RnjbXEggza3HL(7MYqfw2rWS}$-OliD;$2XX#CNP$2rzF2&| zvuP{AWj4Ddi#T)E5_Cyr>VK`4Ar~&7&0Ql{Py>)vPjgl~&iZLDq%Ahi1^@p>4E_<@ z7}#VJKX}!*3*Z9RLDTnJXaP`=7_E)}r-1n%ZoM^cd2KP~F2DAu?RG}OzGsbG-~Zcj z0j9d%-l4e?3OTR4m?O)B5TBMWZ%(DV`ZWWF$}) zN^%Pqv;M{uydEamL*F*79o~)|*PFR$w=b8c9<+lot0hK|i+UYN5me{SYC=d(5{o(mftW59mPpXd6Wn;Iv#1o~Jl14-r}Dx9_DB1F;Z1{c zcKVC?b7ThX=NR5V^pzISknUZ8f))GfDvM+{>{_EfH`Co2A+*Oh(k;tAfP4k6dj3u1 zpRe10=ssmhDWEg|})2_=o9J!7NB7Tw~4@b%=bm_0^m!o!F|HroVn zia7ZxAuo;~zKR2NPea;ni5~#)A6Xi8b(bRv?=4QTrRP$2Hx%nA%|Gi^eMK2qsv~6@xe3>)tiB?2Ost$VUvB93C z@#OKe@wQ(T;bMombCg{js|q&2ENw&2HlPs+N?tEZ?bY>d_}L45`TxXN9XGM5f&N;H^hAn~yUMFJaG7$NL!G+`Za&KU)e0 zza<%JcudvWmyLVK#pPAT^CN1l!Lly(M>R%HLMzRL6)eXIzmq&uEbwX8J_VnKs(-Um z9kCx@CGjgl&X3-GdH^&{f`POdr0I;FqeVpw{8R0-t-&beOGVE;Exn0T4d8`Z82bmW z8_mzZy|L+Kq?IX4Gt1G`u2uCeTkq}N>vUp0tNo~(V*goV#WZrnW}N9vDO@)<3Gl}o zmsE1opxM$$LYOUj`d8n;J>GnUCrajN)z3gpeUFMQ7tKDlntePp)jO5e8(VdVmtzU0 z$&EjsmDq6#WBDVXTjXk|fx>r9GSo(~gvCI+cS0(6Evh)+8vJA8)wu+U@G=R;1}BtJ_6g`P^z0OBGkb z!)BE|5nZRvHqyE$_sWTCAxTZZZIl@2+3T42ezTqxN6V9j_IURO)~=wpg{6e?Z#q(N z6Ita}Z*idm{XuPp73kHqx;@fKfOloEeb_9_xr6*jc(gx{pYJQCvwXn#4X^vX?7_LC z4hLl38%ZtfxJ8&&*URauE_$8#L3SwePUgL3Z_%ErnW8?NmZCILgCiAHV6+t&hMQL& z!Jg%dRs6e$RuvE;G$VBN$P+izOxxKNG^=sXe$ZM11!o6b6()_gD(fiKfmKo?j6ZuM zr`l=*Eao(@q}nOTC+^M*lb zZ7P`-TYhNzDth{=fv9&nyEm@4v%A9(mS50I$P@^P<3hgZdF{9qp&dlH8Bee`gO{;b z12)4BK^lxWgGZrYX{(e8;c=ABZ;4H>M%8<0ZDW6=Mjn|#$e26d+nO~cR*#GCwrFXI zn(c^G`MiuN)UpBJG(UiPyXV|2W$=AXbk`m-Noq^FH}=rWH7FphZtyJ`(*9|8^!8PP z(6Oy=h(I)4{T!uwo=7K%!gNR~F6m&(Fn<_YO6+z0OZS}~8j@tiXu{O5zKn@i+2RCY zA4)Ai{fG8Ql@`3}mey=psl6TItR7fAuzlcW*;`dNx*iN3{iw$0m)@M?3j?%ozyWCb ze`E@_GGt+X_6guWwdYv(=IHP5=ErZZoN%Y;&mU)(&msBmisXgy-U`FlQ^NmxbmhX6 
l{G|&sy`O?f(UTR{_?@9Zw=UgF}yhw-di z?g4)OG#3|N9p~>ZBI=Vh7neux&8wI0d08;4Tk>RD+>^aGGt~0GKWRVVBz}9h z?g6n2U;gAj<@NpJ*{@$DZg^UiDLvAtxOQjm%a^DK(g#Y>Wyn=sZr3ceU z((l*T(K`#s+9l)+Hm86DikqTBqjwF6)K;ui6JiJ-^E)w@HvZ$s0l zml(2ZZW{Jt^oB)&hNZdM+_edPf1hkr+Eng+B$nXfGeA0kSh$QwGZG)dY^K|?O0TAY->*Fig@1=;ptFnl`fBxTaENa{HYcRRgUzds z`MMvB>r4O_(%5^?=P}9ZGt18DF8(2D%Tsc1~T{~!2?H=Y*1#Y;Qmw2PEeq&A* z2gPYM{}j%?g*@q0UEGfU$PXqVT;6y5#lg$t!)Y4%hPJq^*{#`U2RYw;uiK9f+4*#g1m1b-xjb?E>3zHv%^-hf zy#Ldk2P|ziE--{Sv?7TV#SW_gQ=6z-}}}FH&v?M*DUbM!H!Vq zL2}9LcLtg#O{kLWH8{HE!HBqI&IR_L`;NOEF!2wDY8G5i9nNynn0<&VWaWZczR}CJ z*X$b}SmAsd+{=QXf|hR%T^>p@E!h#HoHG?GswO*2NDLhSLziw+5}tk@{8Gne5XuILm(l8_*lW7P zYdlKtF?Mk&6g@ z8CCtY)xIWd55ud89jRC%j`MDM;ua@{v~9dvCp=K_J~P=$+$JKbtLey?qTN9)CB|(J zP=|ZzVl%QmXB6Jj!e|S(0K*J(-YP7)^e0`GK_yzP&6UH|AOyS`LY$b|GA)*SXH~u5 z9fW9LEiqrCjfj%K<~rGII4Kg_cTa7>tG(wLhuw|P3H|v$vBZ9B2&h_r;5NO6RHdTt zvB2w8bt%(0mUVmYwPGf3eV4Ohx5o@ZtXOcc=4xvh@ni9 z9v6}6x+a-S#iaF~2Jca#->ddJMA|3YGW82B$H!ET5Ui#~S=|r3)(Tb}k~N8<6mYnK z9jLT#EJv3D?=T!)ijf<283&!SZ_e?X`GBA8ShN8JJraf|_Sfk+4CTnnuwBTD>lby{LI|Zcu5U_N4LAHP7M`K?U?TjG&yc(2NR{)A}WCN@K7S zy|p`@Ep4vAXyaE~x~j0MFw;g5CU&Bh=^Mi>{Je;_;J#vsG0jE`NgyH&SWt!}zJT;0 z6*_cX#}p`KErj4f=go6FsYAwB$M(i2OPGGJ)aj0Q%jh6q9K08=V|ocAxhhJ~EV7R^ zhw0P%A{BEMg@;6sq}@$)sZ}=lFdY-+dP{Qtw0nr4@Y>B5^f;C2(x_u1XWC{Oc5O|N z0586-H?8hmZQ+dTbYZ(2grX(FA~p5NE5SsiS=f@=#>^EVPY?4cTc= zAT7pOW!PS0dNS_KoRpl(izZPH)8#)q-y>*R>7cUdHQ1a1mzWX*_)^a*{pMK3$JXx| z?Z(7xVv&CwwE*PG&dF`RhYHCfR6a^JnKhOZCpIg&T2(p;QpA?4?@nnjtW~i&clmUQ zS96?(&BWFy;02fQuDsCZOYOpluZzbC*MNOhF!A^OBt?uR#J?L-uHE%HKH+|qWo*3Z zIZUDXLG7^P;pk3CTs4$JNe{OOUe%ma0?||;^EwW#ul8R@Mj)(jAwn?`A-cuP^KI&* zqEyG5;zp55hfhxxIcm()ACjUX&83ba+iSjoTL^gDP{FuChw4iyVa=-#)cl*FAE_SA zv3Ts6cxpv8%%fVlg?{42!HUT2+JRc=Oho^m-^enQP{_gB9RJ-U-kYuh-H_1{U))A~ zWX*&Pe(xycPmNg%?Vl@iA$KL<_hwJR>+m(o7tm97isS*wa9jr68p()-5<%Q1os905 zkVLd!beHs57{l8_D|AY|F?Y<^LXFQnHV1lg2Ras<>_xgHRi@*O*7YFQVM( z26Y6|#0?^nJ3H}4qjc5LDTVGlB4^*kO#Ig_GQ8T(3Ge14*5eCa1)G9s#{lScP7YQL 
z&8}?#nOg3oTD_A~DDX~h9xviNwv)N_&i{Y=b>2k)4Fi9)2y9alnDZ$?kTWCa)8hx6 z1?~nueX+*_D>}gWGNa-D<%5Yz#M#;{Y|odSky9=I6}2(HzH&#O-C2dYC}rsNShs{R z&(5^`p^K$42+jgjE^y$c{;MndvS7cbhR#J@UF_tnAOxe^u&lP*PqXT-*m)zx<19O zbqxqF8f#Xm*UmUcX@?w!#6{ElipR>5#c-D56mBj(;L#Ym^Y8k32^rm%emqI1Ty}dy zb07HP)+UC9DV7+nAE%7{E8%FF(Y$0?{kcq$xX%gZFpYZnydeSM7I zca`7Imk1gsQU)9R3YGMqhff0c;4PF z&uCz&@LU*pPokJ|P%t=PlNS!RWu&#w%eG!37I@E2b91T4Q-*5tp*SIvv=KsT@^q{e z=2f`}7PQGO_dR_;XsJxx>Zt6>C9@Rlgi|G0< zF5JHTFNTG?^3=3((M1N-gi}L-2{$Iw)4SS+S7Y!4K~^5xt2txl@mESTi}Aa-KtfZf z<&6>Jp?&>4?lStQ1M{@kBYO?rQ%YqE7vC`h!GRvd^L5Q_!8y=mxQC#D0;!fg4WD}mg?RD>fS`C*aOQ0|V zi1+AEZ+2g1dYwiQILkb(kKS>pVs~3H(ltvIu{spkllwl6EvuqDOFHRKWdTH@0x3b zO(Kt)X)tq3#;uOoCWWhLh)PU)V^t!Lo>zhz5n_oVxJIOxH3G)Y?2mI z2)T}_rna>WkWWP^g|xR0nDYFhrnx2DdfeX@Tw2pxv$>)}Tw>1ABS)gp)jn~}MEFaA z%b+}SEpYnxc%xdK6rxFDo#HYWjzsL{QaKCQP6FUpDyNrt4|8!dO+L)B3LLx#pmJ1Ru%*`X!b##c)LzP=E65nve-W!Iog(` zkuS&)7Nm1ttvrvOU8|Yp*=i{P{Obv^juQqQ zpV=8dcmdUfSm+f!3Qkxlgsl~}w5^lc%&-eu{tI)JwQH5pC_lPb^ekr9ci6u-YUA!^ zhCe&w3J({T*YBM0D*3uJI>RVWu1%6JUhUhz*H&~*Z+$_3YhiQ@J2NKh&l2(nPv3mI z0o$5~NlxzK`a*xW!NieR)gj5uF})4!il6t!WcO%f2N3QY!7+%hZHaw-vL>azlwHBV z+f-0M{;Xhl)p}#tdSm{5t;w4k*Hp$fJ_;~y8i!@f$n4_k;pd$4>{nXGV+GOy+SHgf zO@0YkrCxV(B;ZzApPOS4DW%HDentI)d!w7-VNG&G^T)?luQ-d#bSAW^sbamP_d$W`!#H5VA zl{kzMo}TuC99ziafBnm7{Lnfh|NdCm)=HS1Jq>BE5UJjD6# zz9?E-zi`0TQ)ap(IUxhzw5L}=#if#B1Xo@vUN|8+={^6N)P}l9@ySk~4nV0US+)~w zTYoq@D1sj8PRrCHlK zoZEXkfiK9T!8-^!`8^OVzy;dIIC`WH-{C*%+`(o?w1w0#C%+3_qlNY5F5X8P)eUmV z6wVjgTuC;GEpHkSK^kVK;Aw`-O2k3cUg5MDRon%kDi8S}GIzY32W|2Jdi}6I4t7LA zNC02SSB2g1QFOmcsg3fxyBcRMcjSt57(qTW-a_*7(DMm%Wxik$ql=}@G42|L5)&S< z!KRpHa3|yd+f0T`O9;DbcyVnTAZP?a^<-@)Pb8cDuCx*0R`F@7r$n;Yd@75Q8(8zw zSCQNna^G3-I-Pf-)wuAdNNf&?e4nA>n2s!6U-^vb#lP}mKIDt%FC|IxuO7hrnbQt` z96XZ@jbaY>`>%D4hWNdZI+a?nw@0~ny?`n@P$Fp8q$1Pnt_`Ld+3PfwFe?kh`;Ici z*|MZvTo1pnY203}*)4nfmBA%UEhj4bb8%J&ynZ7?PF#0BD|FEYZ<}UqRU!`wc9k+o zZpMpF^{QHk73W3xI;+LhG3qYN6R&K33GxeLC2y@J^ZWXJG@Hy}KR?zmZ<2cm%)PjE zH?5`Zr_W1^mQjvZ_K@QYg%hzfqCf3M)h>>N+dT|27yi6 
zh^G&KV{A2M*ZygnymmlH>*)EScF1p-r(GKqb^*PRcJwmH_s&dTkXWNx@@MmlVc3~I zH~*FOB1d(|7a$_^b%BEDdDQEXp2$yBnXdc9uWi9E2aOu5W%y*iw!9KFN8$pKH+qwK zhY;$^8pOM(qp|kkZqu&UG>z0^ntqKKVkaPH-XIqoE%|coTqiG1dg!xbA3l_Fk4Big_#=a<3l_`m1S*%!a9}1lOjouA!|wR#G3D& zVUfw{J<-+Dy-bmJWz_^8j1zX)N;4wX+Zru5kUjGR2#RjiP{XP3_7$)8vyFQzI(-c1 zPdr?XH8z=X5wLj839sKkE%2I7|Ks|arZTmDQd_7wp*i)=GbqZmonN;H2@g(2GFl1x zC86a-?sh~=FnvF(15FY~cc32R!BG3@z)Od_xOY+bw-KM(MdD*?^}3D^QzS4*90=`2 zHhPU(XGRnn=CnQMa*oW^I_9y@1`y&@j!zd}@2HX%o9uz0MU=6WWURv2@*VM@(<0Hz zx#DTyc(5+&)Toq3+rWxd?{$XpKC}zrH!k^yK%!fDK;k{o?JZlMWNCkhvC{7aZ9n1q z8y6R48+dtrtrd`WntaR&do?ke2xGLe%A*uiumoWfSV-P*Rx8)b@ zL;G3$0VI%@Q-1(iZ5v+yomcXH2VSLq@%R6jLp2JzNf>30`bNW7@8=yTzV*xX!!bH%?FjGONcVkt#yjz*9Ozp*)M1ZGJs#rl-+OiJrm zMnN8CbLxst*Ex=_SOS#;XP>rfi>U@Oe!Y5YP?&G8EI4JOrU~-;5N@yACrO_0+NCa7 zgE+6Vm6`U24t+o4eq@Vqi@%jp$FhlXyS?N|OGj;jd0nvixV|w6(!g=jJ`fC3oHeci z73>7*-khyw9N6nopGU- z&<�hu+1t8=5F{anI0lXcKZ{ftCde+wo!9SOe)!N#F18)gHFVtH#$*5l(7yd=LKG z$}n=}Y)!C)4wn{y&k^x-C4F3>hbZbmK#iKgde!*Jqo>{ai^958`0o+T5w?0Ws(tKv zjypgUh`m}8ZsvId?~F_I(s<({e~exYmM0STDl8g5Rv&%@KnF84Xh|9!7o%qSBl|a z%8FR%%#j+(g}Lt8PZbIbl`2X^U-z1p`~RE_2v(*)N}Vi7T~WwN`J}FwN!yvwZ_k%kTzfj&QF+ZewzI8d1-yG#-4jOx>1Qp@n zQcD3KxSt@g%2~iJ5#@<>a*PVkmW5^EXRD#LkK+gOC1yP(fRzXMsu!v&qmG2T8l=~{ zoa-{Ml{=rckWw3ZlWHznXxO%J)W3ZIrav|YiJI6%VzNPYF1mWa`TsiMpIX)Sq+V6vzwZL>ad<+e7Zgu9|U%M&haiy^tuWW%d;~d zUa6GP!|(o1(*3h48*yY#Re{RQjb1lCLdIY8Xwuh`iT&dq>R`^(PFjghV{6WE)#u{# zk@qJ@=QGz|*p04##>jx|jybk%PvoJ+DD&r56wlU66R?XqO`zXGzdLO-eCpFPC=Rf- zXbQ`4s`6UD68K}NhMcn!aqTDIeM=}@rkIPm5IZz+T1V- z`mqVhbHU#FZ3CN(DbnUw5KJ(b-|P4!j9_*HCab-bIWhcOM!qVtw72C@(kEcI=|1I>r{>zmY%DQDK1E$bVch7KB>P1P@ zpT2v2npcm!uy)7BUm%OZ`-k9~0=k*;p{29H9=As1cxNaSx0X0Ja^kkhvW%x@XMCK? 
z(81Kvn5lK1#l5MDNS{FEcT@CZ?x0Tn413ut8u7CZSbMz;ymMkO$634>WhWUrt}^sE z8fa1&*PRpAp#21jm073T4|45qaxXMYZ{_yYJWZ=Bb_-E*3bugNCKBG$x~*h$q+T#n zYRhYtv!-N|;97u7%!9!9BpfAIL?>0VNk_y4mA_HQsR0 zq3DIPz*;Hd&M9bDmQ z3IL#IoWkf)$le6wZFpSmmSUobOI@-3qYxg2KTD~u@d*AKFJ~dka+}%!|Sh^9@ z+wu(9wPvD0t^&Jn?#~`1#sl8-#CO5vE>I;chSUF$KCj`j*cXdA{`5@U_aV5WnjG#* z`}Bbu@FoLC8<~QuhL|{TI${9EF+egIuT;$vCg-fZ>$g{B9D3DGG_%27k(_%lJB5%{ z(?3U(d$WCdDP)&2MkCm(KmHgHNI!|UX<>apu-*OIaCj+Q0+VoP3S~!T`Oa1gDO+@b z?DnR5bP7e)7r38uI_yI?P%j_ApEEFw(`vsezMJ#XOAN2FPj1{N%2i6UeWg9)aNef~ zx3BvMX23UM6+A)s8hcZkvq^X;Cl~NL;qh14L`lmQHk z;fnfqQjb9d&tLU=hb`DEx(&1YU4Hh@bFysPtC3XYXB%a@0pT%-1>@gT4*!oFv)1}* z^W?c$e!g_iuklF&05pCQUH?BQA1d^%hLY@?^^regZJlsAcl1iuGTMjrb%-iI8{PZI zzXy4=O)5+_$tC~Bl>ktVU}F9sJ1Lh_JA4?3r}x4iDe6ZTr#LGDi9dqCyzMDJFzh}d zX2|!#JWR-X6<*PQr~s?2*|+=G4-WAbsLKI<^29L$C&>?j?9S+8d|z&}tHVvKd&Yx& z%>DWqVf8oy<_5a|IGrF_*NflIJOh3|95@Bid@N#4pR;iuDBRcqGu!t$Z^ipe6=EDq ziBTBv6wI;>^A=l62vtUQTUU*C5dx%fwAy;ru9RI7?wcd~GQ-@XCW%|FWy zrOo3Y+or9GHjCTJq1XhMg@O}`sXllq!^nh394x|I;|&KPQRZEnq>;?^?=~eC1(?TZ z5aD)X>PMYKh);hvlovhpbG~Po-1^(lC#iTtZ}WnQ?Rm#7b2poip z=5X=F{EFJ|dg5Br@h!`}ULMY1Y+X!H1z4{!_@lWKNm-w(H3;bDDHDfB_@!8n@7#{l z8%Y_UZT{8;cS{I<>MSYB@1B%VV52{j;Iel*Ap7LPTD08OJlqfyA8gg1OwLfu?K86~ z9=7VIwj<2d#<0%C6+8LSrGGXeNGVQDP`ua2KONReuQ`6*g&d%W#gY>GGk=nZYS-zy zk3YfHXChaHV%uJ&Ay05Dsyd$j#CPW0t*OY7&yJH)Z`uY_2?d)mI+o{YXKYI_D1w=L zBO!eb9({kTt3o+#adzIrq4bG^dT^ZSe2lw}xE+?HwNa(`#Sl1mBpS5YmFIa~O(<*O z+h?m@p74n66a(uoyCo!}s|h%WRX5z4y@K(gU8jcY+G)C&Rb;D1m%*3M$NN}B_9zu# z(Hx7T#UV@r6{FLWZ{5Q+d{`7@yDw6!!eEO7PB7^@)sP$}!Fp}>X0qT!*~SV1{DRf6 zRBVHJTx9+w4C zI=@iOuhjyVj8{%_xTb4)7jsXeZ|c1K9_Fd+5$ae4<(x(i8E@D6Ev^gK@NSVWK2nJv z+>G$`@u~iqUi7f0f%?o*F2+B!JR!zByeMZUWG0Zf&KLOISB zuUxS3a%v-&VH#qPDhL9I_GP0Vjyu!%rCMfeJ6wUG3vVX zxcK0R=%fNtgX~-LFWY+>ACK@o`3iYYL(sdt5?8Fm&#aD*svk!<+u_nGk6T*OG z#lgE&whp7A?PX`S%RxUk_Piyj@>H!(Y-@7;k@Urmsr(wg{gE8|A1v{n9dW&WA<15D zZ1%Ghu_W7`q%8Yo6@ab63>CFl_rHrZ-ETV8!ygt-36--P5PkRLKJn7$7(a9GSN&3| 
z*Bh5hQW6*gBL2W2s3m10nbEREH~UQdcDp&TFODWgzY&6G5bnjHfp(f>RX#p&L;je&&9D^pcQ{B8rZ;3N%hEh8WxqQ1 zs$RaZ^cz5|{dCjdFj{06)m&`i{Az%tsV(HUPEDwwL+$HG@wT6FJZJQbi`*PBRcXW% z$cg8GjBs9{iMz5PRy_$?WFcx@&gCt@QPuv?9Pn4T+B^(5>qcv{0kyC|JV-Ydq z$-ienDl1|KcLNskGJ(tZY1PxwOz@u!AXf>F-R${@5ew;dl(*Z-pZIGt0m7^dfwxB) zE3IQs|3Uct-PHZj1n%0&P_#GxA1?zxa&i6n#}q0Rijy$j+M0aMuUU3=;2Wrx-Q*u2 z@#jC7u}Q;1VTs~7s{MqucW$DK9V`v5fu zw0@LhGKtUVe`o<^k9ZutRoTywxz&v_uMu35XfTe_k!O^p(_-kRELTPugfL-LT#cz1 z>leDc(c-PS%+ws-a*L?BRoz0>+{&pTVSdXhy&v+F!)57J zm*uQXLUa<#jitj=JgwFzsdBY*IVn~@^kugSXh}ca*^a8SCvS2N!Wn_1V>8rOH6KdL zsSn@tZ@Mr(p|AnJ3|e+#3C#tDJ9ar1)cZmy6L`VpRH1xuq= zPf%@*HC5lmjPBl9IueH#IGpq)&qZNWN3Laq5hUIPC+^ z)A}6IaV*F_v9G7fyPkR6&{4Q=qGC^hQsD^w5U0Avl3lb47@~!O5y5wDM5&}bLeVSFxlXzO16soZ$rl6KNTWR(Y+ zb<1YQE(ZEQ8GF+p*`sjItt+-Uli&Rk0v(puWDNngsE8@Ow8uYTIwM;YJEzoz}OVRT2Ye zcv-TG-{G~TtbvV_G3~`?3U3Q0{Wey+VieX5b4nr@up0^QsMUmw2JX!p0XMc!+ZmTL z5qd^JIKn8FAkZ1Hyr%1aR|=f-Iq$=+N#5{!GbFCA^Vh*t>qlx1h?dDIC)0G={e40( zg}tE)65Cjf25Kp)q+1Khd8;0tJ0+oiztjWKdM_o%TW1M=`Fi zPuYAPc(dF#ye!9_+f?sZ&6|WIn0k{k9ZQl9HX~qdi^%j`&0cMLxRKf9PR z#hVvQH!CZlikkUUQ`F$+J? zDVu3pzG+SGpx<5=79sLj6auwr4pfjn4vX#9e;gF{@CfgtrUcmy68?1>FMh-L`q}xu zUTSAq2J4TcJ22e#D@icQU|=;wf_il$xj(3FK+U;#TFUurSAOFFZ8P9g7)(QH zx`b?pqnEiEo&F}!IP%+vu}2KC7)evtOXnMI^^p%wPA(p*@N!@-cyU??MZRc9ifZ;9 z*7f(48ZX4er^pFmZZ5zP%7~p7R~*LvypVh)wwXEU{V+4pl*fuPX)`~O=RV>7_|H-v z8GuXKW_MEa&qQV6q9u&7jZBh-fL0q25+0byr)rRgtlvgR_qQn{-Dr=Rxog;omtpqF z2JuY?f}}=LZ^F{I-^&LL@>di%E%jdY61OW_I%Ww`9;(*xdbWocj8U1G_t7|`S0?%q zp&e?Um^pi?$tzK~7#|>J2w!5aniR{mVGi9)t|kP(7YXBV#Q<9sL|2x%Jok40ann4@ z>N@N!q2q1j#L_v!t}g&xbPb)ITRrS|7HBuV%!EW2jy*jmP=H2|%a!1A$zk4CvQ411 z4w5`Xk$p4;q}3;hJQ<7Qwt}$gwlbu<6mV#7p@Z>v5x!z+`d`D&{l(GNEgcD2+Aht5 z8ee%K@*ywtbEpq;FigVT^J>C7w4kz9sn=oYuGZ)I)x+iYv!clv5Nz&nq08lgJ@Z+B z>33Vbpz=`aJKU;{m<&a(q!p-f8cSuL)MDsA?UKVz@8mqs7d%1Ov79tH+vPva<&_|! 
zsX(e#y{@;Q?>&Oh=p;I-l5~%+yVX^jq(NxEPT9#Q=w~`WQp7n7iC`=uyba@B6&Q07 zgX`I5$?yyPaiQh!EtXMO?B{=zV}8~6E}YRQ+W=#(X}HYMf4ST#g%HDjz1{zollv@j z%YkP5x&x*NocLcia=X({dK7 z;L@i0p2L{~y|y&EdltRyTK^)rLy9}`7CdkML}S%-3}b8(u;c&ed5Do+7QV2zj8KDB zC;Rk&)4M3a;N#D%MxU4D~3GD|gBAa0c!4FaF{=&)al>ZeXqOXWLW* zA?OWFHNkj9$7O)IaY!te`mz)az8DyDv&QoX(Uy7*#h;g)u`1`A`k5dv`!Q6E*sYs#yn`I1wR`Oy{+mlYJ`k zL7W=i6y0C_PEOUx_LTZlUK)lkgrG0>mY^w!zMVeh-2LY=y>Y%;Q5hxS z;`{QbSfEFR^DAk>!d2Q*>Nv%CyPX0dHdXK_%biQ82c;4Zb1>r47e9krVR8LfY(5HrkAxtPW7sgLMJhC#u=SVxzUB)XrnDqwN`N`iQFB7$m( zJhk~D{4kfqE?^x<9J9eR+EZw5L^`O87s7>pjtB4d8G5WBBIZV;k6jWH72L zj9xG5?l}6-u}+{s!+0AcMTT6zKn!#_K<_Y*+@~Kbkec>aegi$&(iILc4km|-*U4{{ zwY3%ppWhBo@$`IIA56FE*pdNMFyh31*8043@PtT26H#9Aw8Vou>P%zJexI;;i@fiJBzZ zoii*LaKLH1@O{xBRF3HKa;g8m-KVd6`&Wd>*VvPaImtn)HRuoiso>bImwm~x8pWFx z*$xi;4!FXaZk6u%(oJ&LZSB(7MD5yE%mD@BsT%U~TGDQ=U#|Zo2E~WdB~V?4%QZKJ zLcWF)IM%b2B(SQ#*Jt-5|65>DlZ_%M*w?;V(^*FC6SRaLfq&u7qF5VW(HNXG>*%qp zz6G;yN;*PY-2FuBpHhTDu{~(N?KK{E{jhLE@vx4$j1dw=O5QZSwRJ~$XrEzeR^Z&` z3n=gINNSVahSoj3Nfw3ij;uXa3?AYOu6+*T;@j5n{;hf7G};b9T^`rh2WFI5{a}hc zUkSWaZU0H+N48F>5%{<2OE??KSp;rZ>(l@4&jG*Y_YHn80;VPb>pK{dqf~$-;m>8I zz?v^Uf$mk|{#VTYyJI*~zgymKx_+zg>)@vfeO#r#kUyWALTj_19is;;vxiIK=oCud z5w~d9_T#~SkfZ<#e3Bqg4h@y`_S#r4S&5yP8_N$ER9_xtBxq}LJGW`I-g#2EEycsv zJmI~CMVAGf7E+S+GyVBj^+=pEBzgsx8P)G29MdpzUUtiJPq+&cH(d(0o@z4J?smXD8d#uy-ghF)J_n~EU3IwF$<&9mbN>P-m~8^S)l zJ=-O%aLIK|f=eYNkt?5rT#BK5`iBd3-08fAd?Z!dc&K@VXn)3VF11iEg>`$( z`y5?Jj>6&Pfw74jdfsyg6Y1o+ljX;4%jaW9w5sWLN5V78J!I_HPiYCr=KZq|4& z`!==zjTZXDPi_H4^nB2X?IcFVg9f)LLE*YgbXJ_2t$(yOV&zoy-sez351KxUcSy11 z8j4iFGIpT)X^1tbhZR^U(|L!L}93IDGBx?e~BZjpC2KA}lP%DAt_!E9K>!4{^fHzM4FomoWub^YP=kUc%*KPLVkYOw#- zW5v+<#$NsU7@!}CYT_l10YS1XX-6&mw_@vWf(ejh{ijsxudw9@s9wz%9{#4X|9=TK zEDkvW(*IxDN~Vl@fN&0I(ckFOH^=_s_Ww198Cu3{D}x`a{nfex;y?^ND@}CyfI7Vl6p9UtH_EN`Tb3tPU>M2yI3?^~#L``eJ|8q9+&r zy$V1T7ESt=qf;;FLal$JoJaCx4m#kLw`D6FLBOoxRK>Oa^HyCoV=K<%EO@O1hpFH} zuXn;WI-`@<%9AVC8{q2=`BK-Gw$|IW{62Bhp2zygs3bi;m0hdx1X^=c&gL;&c>-=< 
zx!hm+SKA!;GEz;vyY|iUU9;^G#{mUugB+GMcM>*tvW_x3r@z?*D_;+-WkJ3X&yD!^ z;%;bU4(TkV+s=f2?vItPi53&s0goKy_NP7$Xv15{js{5CPv*Jq(ZCSaPv;%K?mAz9 zxJe)LA9Y-rdRTbI=Fxn~6cY3ihhR@Al5IXMXKytdQ_rquAu5&?4IMs}Y&+`?*uBEg zgUe|UM1j_QZ@);(E#`~96(+o4%{1{21QDT>5E~B^UVwJGC3GE~yk_RLp}#c&tFD@F zRg3*_DavO~WaLa0M9DIl94YiW`heIs){B#t8_cI9e82Y3&~mAD6I#BL&hwhaAgZdf zkX(8BvhSLc`}}@Lu2-t?fiK2APOv7jrnqTu(IdC`o?7wXZwf9hp!PSXX%jI6O69x$ z3YFQ5yGQhtlSJMj205L1a?EKagE&M?vI;1^Z|l{x&Or{vj1_L|4{ljN0}ZJ|df%On zMfA$8vn01zqX@}0W3TEEEHCl&y`Ey9JMZXN|GNs2j4o#WMt{~2ksE!0qu;s`i)eFu zA*SbXr_^RJv}qx_UhtzC@oX3~Q=z;i1pkmBpem0LVJ9UKfGf)umX#%Ew#BIcbjH5y zZz4>#FIUTWF7)qb+7jS6^KZRn0xC9tjQDekH#Q4{v^=? zGNm6*F8dvQ(JUg8Aofgpb~%3Yqu$)_T8ojf$uEH(NbW^q3cq*G@*+6Ht3)TXyhNk) zjM#R%>j%Lu&-|VWu$>)5vnVx(kYbN+R?E9-AF>>wpg%DVv5b(IjH;>FRjM@oVi9|!>*#lRtFnpi$*OfY?oz6gwz4#;t(jx(J=2QMv2|DM_ z13%5`#TFaBH5FaE5WRVK1tyA+j`q>@KX@t|WiQ^P+UMh=Ru#3_b(g4f2lH8ms4P7d zhEnk`@eja{D?e@aP||BYfzF;+e(x$*zeGJnX`!|*rIsKu-F=t1cFu((T4Eh;3XrZH ze*w>JmfhS&{(AUtx~HbaWSPbEN`NIC!&^!Xb((pW|MDlXC-zLF|&9zcL^; z?_UOKS1V_G10n&zd9tWhwNs0O^&u!3GPQ0}L${7^wRm)DNt^kgmB<6a zP*1Weno^nwjk2wlqW8Ux$W%!OlFSR>`jYt&etHktrDtg6Q2O_b*GK#6nVN%&9s{Tx zIW^5n-oF>{w-zv`e3o1}dh#eoczi+%r|*nQeHtbjV&Bxyty6iN8!0Jv3wGGLHQWlm zi)Ytk&~_dJHx-C|2vcY-~S(cPqIQV$GHCn4*wh$pc7?9$fhFSFZ@&{+`yKf zG@9dOfI|VA~4t|eu=UZf!)b1_B7cS0Bh6MX=!!XW`Pe6Rn%!XoTtCznK zBuGG~{#oe_omUTA76htaPRPBwu)wf=4G=kD68u^=B-SH5V?43$^B4vS) z;rg}Se!deqcip}mX~9WCcYq{R&q+f6OC$bk_9|kZV)iQz6H^V9C8xr!ce2`n&aemE zrm|wjVw$hhc1bvq!`Ht6Zr}Z@8JZ&=hhU*Mw4_J2+J6bB8rozd*)3K#Rv%gvNWFL~ zd<6zJyJcS^ele;wLQ_hzJAbizNIgh_$g&@sqE;<$Q=*15h(Q4er)LYNem(aRPve<2E!teKUtEl;fc2r+!%uphA`Z$R-I5#apZ^ukKZE{5D+<-LG zDtMF9Fdpg>_Xd?b;~2+yoUH0Ge<8+;)1G^P22m76ri+Qof~#J^G4r6|GYp`Rktb@s zf48-CW@8+}G$qZpBF|9!Pp;ql-27jo`?yX=NO2oB^XRK^r$Q!y#&z;@iRFo_uhyoG zJ62ZVjbfYi3;hB=z6U|t8V}Apfm-@|rYiu7VsTz^%jS&qR$-%vgZj2C49GXENu`6+ zs`bTF#phrbb^3+f+VLHEdooB85T(|Mo+FbE*6P0NO)tIyKA%Vh@t5b}{NDxTC09U- 
z3Wh56CaZGm|2WbYVCFyBrwjW-drm*uvQj}((#Z>R51&=IRBRlS4%M_ZbOOh%B5^S7ClhTe`o!Hh%_G2%0sYm=e6KE*cgyD&sd~fn(8D_n~ys0 z#9G(1>2*s!$%(BCPUoEl8$dYmG7G`Fm#g`mcdYBN1vIDF$D#z=Qnx03lb;Eu@r{k# zqiBm021jw3KWrZ4UwHuOnxUL}*@FejlZ{j;cGnQ26iYRFbmZRE>#YGIm@rx#01&m@H|m|iLgK&a-Lh{>v^c_pMSUwBca zl&hb9SLG$`^i-D&HQ71XpgHu0g>S>k&CZp~=DMd7VE!q}SXJttbRcn#*Iz{f(P?s% zYuw4nKW}a6iOSq03})GAW$clv1x_&&6ZE?Ut%q4p^*Vno+)<3Rq0>u@s5yRCv%ao+ z#@xkv8Yw}kKQY>>T3w^(qQ7mvM!zAjj@TeF6fTpT>Ks~mFJ5(4>Jea1lyE@a{#^eL zHUZ6Vrz=V2GJ@%Cxg<^ZoCT!pb!J9gv76cOWJR%BKX&I-L`dqcqcseRjA(k!o4)s8(Wvgt%9Z{jFuh=UDN%;V z*z_pyu+|=a*h^Y1c$DSS0gtkMnb$U`eTd6jUW4?HV(Lr+)&PpA-ffqEyZt%_Yzy#y z2SAMusNkjwUy5>sSX3LKo;xH0-1J0I=gce6T!2|T1-xY@M9L9%Z}1GY!8}1-xir*A z8Eri_cJuij@#jF5<7-ce;stEGrW!*XcRBcck2}F`>ePJoe(f;>(P-{i>_H(QvR$Hq zM2&f6{!F$JKqcfCC0=LdS>m@NS^KE^s@3i)?q*;y%_UWOQf+oER0KH4HGy_+eQWb_ z70&qu_S+h4pETW3Ppu0DpB8^|ytZJW69J~|;qnx^ttZ^&SC;o}#}^KA&&0aTSe(zz zkfVr!%c99o_XdSVhTw~IY%|FZ6$f`Nq3&eg!FnBae3X(&KRl*>XMur~m4+DdVrsW8 zA$wFALX4DtE-E|wwsW6*T(-*vQtqw}l6J`qHrLUJvhPjqZ;wqY-&f>Yb_bVVbW+)% zx>+M9sn6UL1xzF?k;r3q)!EcIEVAp2d5TPo6CzT z_oS0r7-5c6Li#U|{k!@y3+(&*R3s*Z!WXc80T70j8BmXMsRh+LO@g$%Ezy=n>~N#> zNjwXMu#In$@=7e?;l~cZ17bbYB`uw*ObL#Sm0->39-L_%y&14X&$LHB^kGQ^%OC?0 zvk7`VkCSV>=aTl0xasbCzF#nGvG?+S`%rXv-12!5vwnd{+Sxa8wdU3t;O=SED4 ztLtN81Bg87npLD-GB)Y{#VVBM&$~u+W2*%Yi1I1f26Y4z29Z=9=nW zwf-3)*jQQW9DLHR!}hGRoytLbeg|Trx$NC4`(Qi0-@98~nDSBT5TO)qpqQgbrC$5<5NtCq;qBRDzMZOYH~}*0 zrO%=%@qt9SIQGuwdx@Ye{$ew(55*MPym)D4T7J236ZSZ3T*Rev1AqmtbI;$NpbG8a z21TDknf3D&ueR#%SqAy(t~J)d7Z78{*ZZ zEMOcTH4=dPyc_Ts8pz?hjtBUWd;nM5vIVq6P+*xpFNG~7FhQ)!cD(u{MSn7PlHP3H z#jsZg#Auj65VKKL(Y+yJ?snw-gqGucFGZwqws|IrKCsyj*Lz?I|+>AK*)kKfL0a&1!d>cz)K2+}MSMC&C7n5j9 z_Ti@0KLJpie(3WhKxTm<^XI!{uynhP-uLrzAC?aS4hQ8_$#2C zZ$(i`iE1`)ttflt%@A(YZZ$elxgQV``HGA=^BOc26nM}YOoTX`t>NTof&$EFlx47+vfa@zR>Z?lJwy&uU zl^NnTQJmSL*hG7c@Y#M&)3Wc7z`P)m-)y~x#N8Hi9L5aQf5V|@{UVpQ17Xnt2cYPH zZAP?_M_}QzcfMh5S@mggV6YCgTU7ujc70aUn*&oJ&K7fZ&J2dvi&r2i>Oo>FgT@;$ 
zoVKBx+9V;R>QS~0@HW6I3$({;CF9?Wdpr?SgN9r@d+*k%nHmFQhkZ<^q1#I|ha^SJ zN4Pmc{FN8xX7z8&*t%r;knb;#Mq^CHl>M#Pu1T_7BMnq+^U4HIA&(>k(8OUi@`rlU zTi)iQ6_)4@ubd(*jjW%!n&lp+$YaL-Ri~%= z?l#etx-gNia+h~$kJa?Rw9S56h zhwyk;{xe{~V398@%xh45_Xan1^kj4K0q5CwW)6fr_p2wKlhkBS7EgtNB}F^-B8Y9r zP^6l zz?Uj>6GAS!3O`kBOUUw0I_Vd&uo3Ii4H8<-{CALO&S({S(CTF6`bYYnN zIV>$`q!Ili)gZ&dA`{}GPJ#m9#yj2zkud1rsJF0 zabZu*6^9{_-K@sfQLl{m;meZ|(sC4I(2nI1&$p%5_F+=5DEX9FKEIQ_X9cTOl|Sg5 zq{L`16Rr20p_xnH@~{-u@R1PK7}q4mP|tX=R^4SP((r@3h+g3xXSz4I`~CGO4iE~y zyPS5(RRSoP_kIO1;b_<1;C>zy{qeJT0|;WYt*~kP&=6$+6wcqwWuR9tZU}lRMXl-%Bp2pkW5tXFK;jiwB5sUxI zR|1FupTj5QOV(CjO?5)2$zuK|mL4ogOYMFV+uf!A7BQSKm-J#1yj^TONxlqKH65)h zSLfOz-$t%+SdL3gb{f)L3EEr;gNMu0-w+RdnWWsh>*H7eZ7H15JpFV7T`%b;S$_kO z%J)k0NEqPxfaKle^^6oP5I>cRR|Rg>BO1zAhB>N4^(j<>i8Iv$3WVB`K6iQlasgc* zHC{pcr*Z$9Z?yNZ;^rO*V1XMz-v08QucqzNC1qjZ8`JQ|N)C-GUA*TAgN@>Qo`!{V zT@fL+4RuBiI-iP1NWZZzd3f2?&*#OLui(2)0A9`&enP|oY${4e^YxR(u~YKG)&{e} zx|tCwlGji+{_;n_D=|*gp7^ozPM}`liJ7Q^bL~<}t<2qDJ7jD1rPF*pQfo(%i#f>0 z^BmhtLDVj*wftJS%7}JZiYBqF^Lt%L6-T6uHbGd(gb0`XAxu z?NFk9cfCk_Q3otl)gSU5f}Dt7M-M$PV|(71`d z^<4LV1Cj$GU4Z-qpc@#!MknRzYJI-lKN)v2>2_mW==Rg%`t-Q#xpm9kDGXJjhejx; zg1%OS!ZYC(5qsw<7xW4N!9Z7FGx3t?&lPjoew9EyGrv>-5CZ}bZz;eW;`5aFlsZR+ zpsRF&%#&3Y{O&fdmEh$wx6M}lD&I*pHVgMxTh5E;?A3ujODpGRsZQaq2vR!xwieUK z_{U47N-55SnANQlpfAn&UozE=@d(M8LEP!;L=63laSQ*I9-B8Ab!7B0?yKbujdSD=&PKzw=|a}BRDYq zz_zh6sF0gHUn4;RxpNzWvGHBnEHg22gizfaLl@fU=PpZ;BA$&L{lcC&+hJ0`>{@X( z*X3Cce2}6uFu#0E;))rqvTviqOe@H&LnO&@Ga^gZDv$QSv4P1-gJwqDcRWvzJTS)1 znwP{<;@(ap^k|C;%`p)ouv81T)zqGL$4AjQx|q+VaomvFBaGO9?Eelx02qJ+-6Bi8 zGcyGRpr+7^`L=8j0WkLx{!e!b(uco?5Nyjga0xPfqMD%e1IV6(D>=pSqIi_*Auv~_ z0-J@HNaLn>CR7292JH<6(4)%oZfn^m$Pze zHSv$T+sE^^qb%n7L;1UN#mjvb=|r`evKsjlgz0@!o?Wi-4wPLfo~l|@+q_Ia#R*qK zqGPTLAK%8ckW>)gB7`UL>;=*T8`V5J95ZnJwLu$OUD@J3a|Y{NmO%(|6vMB?4p^C2 zWF4vbUK#hkvCJ{y3dA|SxDAnAVkXUgUvXDf9}@rF*Q1DmgI~Qp6djDc{Y+7>webB- zsB?f2gSXdx1wgkvoyF8;Dw-+J3gsPP&!(p9`Wda|s`INTvTVB+WmUPjzPN~Ey- 
zgh2NOz7`-01?20kYl}vX+&X((^qt^BK~RF7%UOm&(<#GBaBqBYPB&tB(TI7&5v4-T zJ>PDWH}YC+396c@)_~mIFyos_iJtdFPT0RO4fV*9`T9AZZ@*J7w9P1%()64@W}{H2 zuIkn@>AvE2{u!>={kKIGn3it3bmD>usJWdT@S!ygquE?ZqP}Ph_p*1|z;_y`7&(gM zU+8qcQ+f#R8ui7tce29x**?)G7?%-10g&f`8?KJU$p7l4EQ>+T_(NW$T<#881?@yU z1tgwDk*AN5QTGop$wH;)tIs{|OSp@6ml!z1y7IhTl`lbvdwMpbSk|WG%9nVYBQh=; z#C&Pe`Wiq|?@C~pnPo#+ly=hus9f0J%;3W9j52Xq)$4yRn>wvo(HIReilA9+xh-H{*8H8($Vu3k8j{#Nb%wORS?+vUyCFHfCw06a|AlMg|7$L16;ML z)`(O4(f2kod_V8+^>hPJzT0H*H06uKz`FN1SPL41MWzo~AMC3N!H-HL4T&&R?Zk0>lxB);lVf@eeHNFV6u>)joP z(8W(U^b2~9%x&`eTaeSj-vB>c$tD0RluIm85x&~%qFX#*)T`H&{~RCP5vVHUw0isT zx1;cWB2Ls#gZ`MjhQO7u9@ReoT~RhXKRuKcrf}}Nt~E$dIVgu_E%I`v zl`9tWIe)^#1p#42N9dRcza1a|ajoh0c^5#z!Q=ji)vK5R`m6m-mcSS3Wy=Zmy zHlDk+UsI{g6%n9;8}6GxXPzF8DgdA0bNq&64TvCNWv&PSQ{hz4A1-sn zMA_h6kvFq*#+&T4WlM(6P20RCV<%9BqZ97D&1!L3q3cy;?dBUT1j)rD$9Qpiv6C|C z7-lhtyx|X$FD!PYsEnc0UF}RgGg3Q-ozTdQ59waCl?F1LmbgI-Qe}9^i=LDf! z^$bK%^g$1UaaT9P2Ys>gUOW~Fmq~btmTD?hMn5YT?ZQUDW#pk!go!yl@CAlMojBcKG1&a zEvx#edANx#bqvu)C{B@6+gl2$MOCpCY70V|u)8o}vI2ZPoBrw<{)e)K|A+DYSHdQL zZ)N`T(EpN?)W2<}*ZYLTATmBm(QzpX>mK{7qsHG~vaa_Y#I1|@hYsgIOX2_bQuvza zvRm1s@p<<64tpDj{Eq}&ksJfZa1*np>hBOD5QFj6|BD`J}jt}I|SX*a@(HIxF?5%G; zd;8{RSwv|L*UnHMHVfjME_@V>Un7BD5Q&S!?q1`4l{Nr92>3olk&mK;MAkqruKGCe ziS4>8KW^XB^JvX-W#lYu0kluJu8 zy#f1$pSe9@yiNhFb_9_Uuj{6KYcnDkvAK*)@-IN@dgX%s)k*?q}n0Is%nG- z9Z=>Qy5QxjRnR+qDjAolJz4Q@a^J!3x8!;3^uw;!jv_64KX#2{9Q#}V2gInHQT%Fj z1$#VN89}E+w|d!#U#SOqyJ?~rL}iWa?&;98_*EmNhpUt%-KhcXZJ5JredWBv>7{Q$ znme;OX!@%yV&=bdZ?z2Nj8#<6H%5m(#?CkBJr7r8OfpucM~U#Niq#Ldn%Vgnv8vU1 zqb9bu?JICC=qNCLTUf!zupDTv=7MID-B8QXN(g#=<6I9)j5RVEAr=|R&u@cYsGm5x z>30}gubmz3Gm(Pm)kIG;8f3p10P3a|N3A%e`1{MLq3&e-iv~m!IJZ030Gu}NaW)Cm z*i;FMY{D8rPA;m7dC z2sgV49gpcNuch2wc7)$M(?z|BP* zpLxi1H}o5}?B(vw@l@VEqd}@^ArJCAu8E1IpgD6KmL7e%k#pFe#+h%JP3QjSL6fiE zD+h;Onsa~2S(?XlLi5gY8aWe@12FE&Q!V&vsj*8rjG@`9oB{|@%+qGcfVK2(bh**i`mqmZ2@PE}7S#=x4=WCn_Zo4_YW_U?7VE7p zy`TPjt^XIE1#m%Y9%npk%{Bm5YS+hhn#n(XywG>EeC(Pt@;zt);340CY+jSX;cy$d 
z22IkSoS}JieNC#{GUprz$=|>!JU7ZQiC_GuM(Gj~^{zp059BtDlQ zrAd&eD#OMpalo5%l9+!#S_eKERD3F^STR(AlyL%>4Ur&~`}bV=8?2pxpO&K-`IlRf x!EJ;iJK>g~hXuy*z|X=XL`Z`da+$NCcpsgbVg3}z*YWXbUC{eESN-~f{{r=|E7<@5 From 45646d025394de258f0492374be7e1453db6d42c Mon Sep 17 00:00:00 2001 From: allanaaa Date: Wed, 16 Dec 2020 12:58:28 -0500 Subject: [PATCH 07/21] More --- docs/docs/manual/reconciling.md | 87 +++++++++++--------- docs/static/img/reconcile-with-property.png | Bin 0 -> 23650 bytes 2 files changed, 47 insertions(+), 40 deletions(-) create mode 100644 docs/static/img/reconcile-with-property.png diff --git a/docs/docs/manual/reconciling.md b/docs/docs/manual/reconciling.md index 3e9974521..40cf147af 100644 --- a/docs/docs/manual/reconciling.md +++ b/docs/docs/manual/reconciling.md @@ -6,23 +6,30 @@ sidebar_label: Reconciling ## Overview -Reconciliation is the process of matching your dataset with that of an external source. Datasets are produced by libraries, archives, museums, academic organizations, scientific institutions, non-profits, and interest groups. You can also reconcile against user-edited data on [Wikidata](wikidata), or reconcile against [a local dataset that you yourself supply](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services). +Reconciliation is the process of matching your dataset with that of an external source. Datasets for comparison are produced by libraries, archives, museums, academic organizations, scientific institutions, non-profits, and interest groups. You can also reconcile against user-edited data on [Wikidata](wikidata), or reconcile against [a local dataset that you yourself supply](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services). To reconcile your OpenRefine project against an external dataset, that dataset must offer a web service that conforms to the [Reconciliation Service API standards](https://reconciliation-api.github.io/specs/0.1/). 
-You may wish to reconcile in order to fix spelling or variations in proper names, to clean up manually-entered subject headings against authorities such as the [Library of Congress Subject Headings](https://id.loc.gov/authorities/subjects.html) (LCSH), to link your data to an existing set, to add it to an open and editable system such as [Wikidata](https://www.wikidata.org), or to see whether entities in your project appear in some specific list or not, such as the [Panama Papers](https://aleph.occrp.org/datasets/734). +You may wish to reconcile in order to: +* fix spelling or variations in proper names +* clean up manually-entered subject headings against authorities such as the [Library of Congress Subject Headings](https://id.loc.gov/authorities/subjects.html) (LCSH) +* link your data to an existing dataset +* add it to an open and editable system such as [Wikidata](https://www.wikidata.org) +* see whether entities in your project appear in some specific list, such as the [Panama Papers](https://aleph.occrp.org/datasets/734). -Reconciliation is semi-automated: OpenRefine matches your cell values to the reconciliation information as best it can, but human judgment is required to ensure the process is successful. Reconciling happens by default through string searching, so typos, whitespace, and extraneous characters will have an effect on the results. You may wish to clean and cluster your data before reconciliaton. +Reconciliation is semi-automated: OpenRefine matches your cell values to the reconciliation information as best it can, but human judgment is required to ensure the process is successful. Reconciling happens by default through string searching, so typos, whitespace, and extraneous characters will have an effect on the results. You may wish to [clean and cluster](cellediting) your data before reconciliation.
We recommend planning your reconciliation operations as iterative: reconcile multiple times with different settings, and with different subgroups of your data. ## Sources -There is a [current list of reconcilable authorities](https://reconciliation-api.github.io/testbench/) that includes instructions for adding new services via Wikidata editing. OpenRefine maintains a [further list of sources on the wiki](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources), which can be edited by anyone. This list includes ways that you can reconcile against a [local dataset](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services). +We recommend starting with [this current list of reconcilable authorities](https://reconciliation-api.github.io/testbench/), which includes instructions for adding new services via Wikidata editing if you have one to add. -Other services may exist that are not yet listed in these two places: for example, the [310 datasets hosted by the Organized Crime and Corruption Reporting Project (OCCRP)](https://aleph.occrp.org/datasets/) each have their own reconciliation URL, or you can reconcile against their entire database with the URL listed [here](https://reconciliation-api.github.io/testbench/). For another example, you can reconcile against the entire Virtual International Authority File (VIAF) dataset, or [only the contributions from certain institutions](http://refine.codefork.com/). Search online to see if the authority you wish to reconcile against has an available service, or whether you can download a copy to reconcile against locally. +OpenRefine maintains a [further list of sources on the wiki](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources), which can be edited by anyone. This list includes ways that you can reconcile against a [local dataset](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services). 
-OpenRefine offers Wikidata reconciliation by default - see the [Wikidata](wikidata) page for more information particular to that service. +Other services may exist that are not yet listed in these two places: for example, the [310 datasets hosted by the Organized Crime and Corruption Reporting Project (OCCRP)](https://aleph.occrp.org/datasets/) each have their own reconciliation URL, or you can reconcile against their entire database with the URL [shared on the reconciliation API list](https://reconciliation-api.github.io/testbench/). For another example, you can reconcile against the entire Virtual International Authority File (VIAF) dataset, or [only the contributions from certain institutions](http://refine.codefork.com/). Search online to see if the authority you wish to reconcile against has an available service, or whether you can download a copy to reconcile against locally. + +OpenRefine includes Wikidata reconciliation in the installation package - see the [Wikidata](wikidata) page for more information particular to that service. :::info OpenRefine extensions can add reconciliation services, and can also add enhanced reconciliation capacities. Check the list of extensions on the [Downloads page](https://openrefine.org/download.html) for more information. @@ -32,27 +39,27 @@ Each source will have its own documentation on how it provides reconciliation. R ## Getting started -Select “Reconcile” → “Start reconciling” on a column. If you want to reconcile only some cells in that column, first use filters and facets to isolate them. +Select Reconcile → Start reconciling on a column. If you want to reconcile only some cells in that column, first use filters and facets to isolate them. -In the reconciliation window, you will see Wikidata offered as a default service. To add another service, click “Add Standard Service…” and paste in the URL of a [service](#sources). You should see the name of the service appear in the list of Services if the URL is correct.
+In the reconciliation window, you will see Wikidata offered as a default service. To add another service, click Add Standard Service... and paste in the URL of a [service](#sources). You should see the name of the service appear in the list of Services if the URL is correct. ![The reconciliation window.](/img/reconcilewindow.png) Once you select a service, the service may sample your selected column and identify some [suggested categories (“types”)](#reconciling-by-type) to reconcile against. Other services will suggest their available types without sampling, and some services have no types. -For example, if you had a list of artists represented in a gallery collection, you could reconcile their names against the Getty Research Institute’s [Union List of Artist Names (ULAN)](https://www.getty.edu/research/tools/vocabularies/ulan/). The same Getty reconciliation URL will offer you ULAN, AAT (Art and Architecture Thesaurus), and TGN (Thesaurus of Geographic Names). +For example, if you had a list of artists represented in a gallery collection, you could reconcile their names against the Getty Research Institute’s [Union List of Artist Names (ULAN)](https://www.getty.edu/research/tools/vocabularies/ulan/). The same [Getty reconciliation URL](https://services.getty.edu/vocab/reconcile/) will offer you ULAN, AAT (Art and Architecture Thesaurus), and TGN (Thesaurus of Geographic Names). ![The reconciliation window with types.](/img/reconcilewindow2.png) -Refer to the documentation specific to the reconciliation service (from the testbench, for example) to learn whether types are offered, which types are offered, and which one is most appropriate for your column. You may wish to facet your data and reconcile batches against different types if available. 
+Refer to the documentation specific to the reconciliation service (frequently linked on [this page](https://reconciliation-api.github.io/testbench/)) to learn whether types are offered, which types are offered, and which one is most appropriate for your column. You may wish to facet your data and reconcile batches against different types if available. -Reconciliation can be a time-consuming process, especially with large datasets. We suggest starting with a small test batch. There is no throttle (delay between requests) to set for the reconciliation process. The amount of time will vary for each service, and based on the options you select during the process. +Reconciliation can be a time-consuming process, especially with large datasets. We suggest starting with a small test batch. There is no throttle (delay between requests) to set for the reconciliation process. The amount of time will vary for each service, and vary based on the options you select during the process. When the process is done, you will see the reconciliation data in the cells. If the cell was successfully matched, it displays a single dark blue link. In this case, the reconciliation is confident that the match is correct, and you should not have to check it manually. -If there is no clear match, a few candidates are displayed, together with their reconciliation score, with light blue links. You will need to select the correct one. +If there is no clear match, one or more candidates are displayed, together with their reconciliation score, with light blue links. You will need to select the correct one. -For each matching decision you make, you have two options: match this cell only ![button to perform a single match](https://openrefine-wikidata.toolforge.org/static/screenshot_single_match.png), or also use the same identifier for all other cells containing the same original string ![button to perform a multiple match](https://openrefine-wikidata.toolforge.org/static/screenshot_bulk_match.png). 
+For each matching decision you make, you have two options: match this cell only (one checkmark), or also use the same identifier for all other cells containing the same original string (two checkmarks). For services that offer the [“preview entities” feature](https://reconciliation-api.github.io/testbench/), you can hover your mouse over the suggestions to see more information about the candidates or matches. Each participating service (and each type) will deliver different structured data that may help you compare the candidates. @@ -62,19 +69,19 @@ For example, the Getty ULAN shows an artist’s discipline, nationality, and bir Hovering over the suggestion will also offer the two matching options as buttons. -For matched values (those appearing as dark blue links), the underlying cell value has not been altered - the cell is storing both the original string and the matched entity link at the same time. If you were to copy your column to a new column at this point, for example, the reconcilation data would not transfer - only the original strings. +For matched values (those appearing as dark blue links), the underlying cell value has not been altered - the cell is storing both the original string and the matched entity link at the same time. If you were to copy your column to a new column at this point using `value`, for example, the reconciliation data would not transfer - only the original strings. You can learn more about how OpenRefine stores different pieces of information in each cell in [the Variables section specific to reconciliation data](expressions#reconciliation). For each cell, you can manually “Create new item,” which will take the cell’s current value and apply it as though it is a match. This will not become a dark blue link, because at this time there is nothing to link to: it is like a draft entity stored only in your project.
You can use this feature to prepare these entries for eventual upload to an editable service such as [Wikidata](wikidata), but most services do not yet support this feature. ### Reconciliation facets -Under “Reconcile” → “Facets” you can see a number of reconciliation-specific faceting options. OpenRefine automatically creates two facets for you when you reconcile a column. +Under Reconcile → Facets you can see a number of reconciliation-specific faceting options. OpenRefine automatically creates two facets for you when you reconcile a column. -One is a numeric facet for “best candidate's score,” the range of reconciliation scores of only the best candidate of each cell. Each service calculates scores differently and has a different range, but higher scores always mean better matches. You can facet for higher scores in the numeric facet, and then approve them all in bulk, by using “Reconcile” → “Actions” → “Match each cell to its best candidate.” +One is a numeric facet for best candidate's score, the range of reconciliation scores of only the best candidate of each cell. Each service calculates scores differently and has a different range, but higher scores always mean better matches. You can facet for higher scores in the numeric facet, and then approve them all in bulk, by using Reconcile → Actions → Match each cell to its best candidate. There is also a “judgment” facet created, which lets you filter for the cells that haven't been matched (pick “none” in the facet). As you process each cell, its judgment changes from “none” to “matched” and it disappears from the view. -You can add other reconciliation facets by selecting “Reconcile” → “Facets” on your column. You can facet by: +You can add other facets by selecting Reconcile → Facets on your reconciled column.
You can facet by: * your judgments (“matched,” or “none” for unreconciled cells, or “new” for entities you've created) * the action you’ve performed on that cell (chosen a “single” match, or set a "mass" match, or no action, as “unknown”) @@ -82,8 +89,8 @@ You can add other reconciliation facets by selecting “Reconcile” → “Face You can facet only the best candidates for each cell, based on: * the score (calculated based on each service's own methods) -* the edit distance (using the [Levenshtein distance](https://www.wikiwand.com/en/Levenshtein_distance), a number based on how many single-character edits would be required to get your original value to the candidate value, with a larger value being a greater difference) -* the word similarity (not including [stop words](https://en.wikipedia.org/wiki/Stop_word), a percentage based on how many words in the original value match words in the candidate. For example, the value "Maria Luisa Zuloaga de Tovar" matched to the candidate "Palacios, Luisa Zuloaga de" results in a word similarity value of 0.6, or 60%, or 3 out of 5 words. Cells that are not yet matched to one candidate will show as 0.0.) +* the edit distance (using the [Levenshtein distance](cellediting#nearest-neighbor), a number based on how many single-character edits would be required to get your original value to the candidate value, with a larger value being a greater difference) +* the word similarity (a percentage based on how many words, excluding [stop words](https://en.wikipedia.org/wiki/Stop_word), in the original value match words in the candidate. For example, the value "Maria Luisa Zuloaga de Tovar" matched to the candidate "Palacios, Luisa Zuloaga de" results in a word similarity value of 0.6, or 60%, or 3 out of 5 words. Cells that are not yet matched to one candidate will show as 0.0). 
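The word similarity figure described above can be sketched in Python. This is only an illustration of the arithmetic in the example (3 of 5 words, or 0.6), not OpenRefine's actual implementation: the function names, the tokenization rule, and the tiny stop-word list are all assumptions made for this sketch.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the list OpenRefine actually uses is not given in the text.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def words(text):
    """Lowercase the text, split it on non-word characters, drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def word_similarity(original, candidate):
    """Return the fraction of the original's words that occur in the candidate."""
    original_words = words(original)
    if not original_words:
        return 0.0
    candidate_words = set(words(candidate))
    matched = sum(1 for word in original_words if word in candidate_words)
    return matched / len(original_words)


score = word_similarity("Maria Luisa Zuloaga de Tovar", "Palacios, Luisa Zuloaga de")
```

Here “Luisa,” “Zuloaga,” and “de” are found in the candidate while “Maria” and “Tovar” are not, which reproduces the 3-out-of-5 ratio from the example.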
You can also look at each best candidate’s: * type (the ones you have selected in successive reconciliation attempts, or other types returned by the service based on the cell values) @@ -94,24 +101,24 @@ These facets are useful for doing successive reconciliation attempts, against di ### Reconciliation actions -You can use the “Reconcile” → “Actions” menu options to perform bulk changes, which will apply only to your current set of rows or records: +You can use the Reconcile → Actions menu options to perform bulk changes (which will apply only to your currently viewed set of rows or records): * Match each cell to its best candidate (by highest score) * Create a new item for each cell (discard any suggested matches) * Create one new item for similar cells (a new entity will be created for each unique string) -* Match all filtered cells to... (a specific item from the chosen service, via a search box. For [services with the “suggest entities” property](https://reconciliation-api.github.io/testbench/).) +* Match all filtered cells to... (a specific item from the chosen service, via a search box. For services with the [“suggest entities” property](https://reconciliation-api.github.io/testbench/)) * Discard all reconciliation judgments (reverts back to multiple candidates per cell, including cells that may have been auto-matched in the original reconciliation process) * Clear reconciliation data, reverting all cells back to their original values. -The other options available under “Reconcile” are: -* Copy reconciliation data... (to an existing column: if the original values in your reconciliation column are identical to those in your chosen column, the matched and/or new cells will copy over. Unmatched values will not change.) -* [Use values as identifiers](#reconciling-with-unique-identifiers) (if you are reconciling with unique identifiers instead of by doing string searches). +The other options available under Reconcile are: +* Copy reconciliation data...
(to an existing column: if the original values in your reconciliation column are identical to those in your chosen column, the matched and/or new cells will copy over - unmatched values will not change) +* [Use values as identifiers](#reconciling-with-unique-identifiers) (if you are reconciling with unique identifiers instead of by doing string searches) * [Add entity identifiers column](#add-entity-identifiers-column). ## Reconciling with unique identifiers Reconciliation services use unique identifiers for their entities. For example, the 14th Dalai Lama has the VIAF ID [38242123](https://viaf.org/viaf/38242123/) and the Wikidata ID [Q17293](https://www.wikidata.org/wiki/Q37349). You can supply these identifiers directly to your chosen reconciliation service in order to pull more data, but these strings will not be “reconciled” against the external dataset. -Select the column with unique identifiers and apply the operation “Reconcile” → “Use values as identifiers.” This will bring up the list of reconciliation services you have already added (to add a new service, open the “Start reconciling…” window first). If you use this operation on a column of IDs, you will not have access to the usual reconciliation settings. +Select the column with unique identifiers and apply the operation Reconcile → Use values as identifiers. This will bring up the list of reconciliation services you have already added (to add a new service, open the Start reconciling... window first). If you use this operation on a column of IDs, you will not have access to the usual reconciliation settings. Matching identifiers does not validate them. All cells will appear as dark blue “confirmed” matches. You should check before this operation that the identifiers in the column exist on the target service.
@@ -121,7 +128,7 @@ You may get false positives, which you will need to hover over or click on to id ## Reconciling by type -Reconciliation services, once added to OpenRefine, may suggest types from their databases. These types will usually be whatever the service specializes in: people, events, places, buildings, tools, plants, animals, organizations, etc. +Reconciliation services, once added to OpenRefine, may suggest types from their databases. These types will usually be whatever the service specializes in: people, events, places, buildings, tools, plants, animals, organizations, etc. Reconciling against a type may be faster and more accurate, but may result in fewer matches. Some services have hierarchical types (such as “mammal” as a subtype of “animal”). When you reconcile against a more specific type, unmatched values may fall back to more broad types. Other services will not do this, so you may need to perform successive reconciliation attempts against different types. Refer to the documentation specific to the reconciliation service to learn more. @@ -129,13 +136,13 @@ When you select a service from the list, OpenRefine will load some or all availa ![Reconciling using a type.](/img/reconcile-by-type.png) -In this example, “Person” and “Corporate Name” are potential types offered by VIAF. You can also use the “Reconcile against type:” field to enter in another type that the service offers. When you start typing, this field may search and suggest existing types. For VIAF, you could enter “/book/book” if your column contained publications. +In this example, “Person” and “Corporate Name” are potential types offered by VIAF. You can also use the Reconcile against type: field to enter in another type that the service offers. When you start typing, this field may search and suggest existing types. For VIAF, you could enter “/book/book” if your column contained publications. 
Types are structured to fit their content: the Wikidata “human” type, for example, can include fields for birth and death dates, nationality, etc. The VIAF “person” type can include nationality and gender. You can use this to [include more properties](#reconciling-with-additional-columns) and find better matches. -If your column doesn’t fit one specific type offered, you can “Reconcile against no particular type.” This may take longer for some services. +If your column doesn’t fit one specific type offered, you can Reconcile against no particular type. This may take longer. -We recommend working in batches and reconciling against different types, moving from specific to broad. You can create a “best candidate’s type” facet to see which types are being represented. Some candidates may return more than one type, depending on the service. +We recommend working in batches and reconciling against different types, moving from specific to broad. You can create a Best candidate’s types facet to see which types are being represented. Some candidates may return more than one type, depending on the service. Types may appear in facets by their unique IDs, rather than by their semantic labels (for example, Q5 for “human” in Wikidata). ## Reconciling with additional columns @@ -143,13 +150,13 @@ Some of your cells may be ambiguous, in the sense that a string can point to mor ![Reconciling sometimes turns up ambiguous matches.](/img/reconcileParis.gif) -Including supplementary information can be useful, depending on the service (such as including birthdate information about each person you are trying to reconcile). The other columns in your project will appear in the reconciliation window, with an “Include?” checkbox available on each. +Including supplementary information can be useful, depending on the service (such as including birthdate information about each person you are trying to reconcile).
The other columns in your project will appear in the reconciliation window, with an Include? checkbox available on each. -You can fill in the “As Property” field with the type of information you are including. When you start typing, potential fields may pop up (depending on the [“suggest properties” feature](https://reconciliation-api.github.io/testbench/)), such as “birthDate” in the case of ULAN or “Geburtsdatum” in the case of Integrated Authority File (GND). Use the documentation for your chosen service to identify the fields in their terms. +You can fill in the As Property field with the type of information you are including. When you start typing, potential fields may pop up (depending on the [“suggest properties” feature](https://reconciliation-api.github.io/testbench/)), such as “birthDate” in the case of ULAN or “Geburtsdatum” in the case of Integrated Authority File (GND). Use the documentation for your chosen service to identify the fields in their terms. -Some services will not be able to search for the exact name of your desired “As Property” entry, but you can still manually supply the field name. Refer to the service to make sure you enter it correctly. +Some services will not be able to search for the exact name of your desired As Property entry, but you can still manually supply the field name. Refer to the service to make sure you enter it correctly. -![Including a birth-date type.](/img/reconcile-by-type.png) +![Including a birth-date type.](/img/reconcile-with-property.png) ## Fetching more data @@ -157,21 +164,21 @@ One reason to reconcile to some external service is that it allows you to pull d * Add identifiers for your values * Add columns from reconciled values -* Add column by fetching URLs +* Add column by fetching URLs. 
### Add entity identifiers column -Once you have selected matches for your cells, you can retrieve the unique identifiers for those cells and create a new column for these, with “Reconcile” → “Add entity identifiers column.” You will be asked to supply a column name. New items and other unmatched cells will generate null values in this column. +Once you have selected matches for your cells, you can retrieve the unique identifiers for those cells and create a new column for these, with Reconcile → Add entity identifiers column. You will be asked to supply a column name. New items and other unmatched cells will generate null values in this column. ### Add columns from reconciled values -If the reconciliation service supports [data extension](https://reconciliation-api.github.io/testbench/), then you can augment your reconciled data with new columns using “Edit column” → “Add columns from reconciled values....” +If the reconciliation service supports [data extension](https://reconciliation-api.github.io/testbench/), then you can augment your reconciled data with new columns using Edit column → Add columns from reconciled values.... For example, if you have a column of chemical elements identified by name, you can fetch categorical information about them such as their atomic number and their element symbol, as the animation shows below: ![A screenshare of elements fetching related information.](/img/reconcileelements.gif) -Once you have pulled reconciliation values and selected one for each cell, selecting “Add column from reconciled values...” will bring up a window to choose which information you’d like to import into a new column. The quality of the suggested properties will depend on how you have reconciled your data beforehand: reconciling against a specific type will provide you with suggested properties of that type. For example, GND suggests elements about the “people” type after you've reconciled with it, such as their parents, native languages, children, etc.
+Once you have pulled reconciliation values and selected one for each cell, selecting Add column from reconciled values... will bring up a window to choose which information you’d like to import into a new column. The quality of the suggested properties will depend on how you have reconciled your data beforehand: reconciling against a specific type will provide you with suggested properties of that type. For example, GND suggests elements about the “people” type after you've reconciled with it, such as their parents, native languages, children, etc. ![A screenshot of available properties from GND.](/img/reconcileGND.png) @@ -179,11 +186,11 @@ If you have left any values unreconciled in your column, you will see “<not ### Add columns by fetching URLs -If the reconciliation service cannot extend data, look for a generic web API for that data source, or a structured URL that points to their dataset entities via unique IDs (such as https://viaf.org/viaf/000000). You can use the “Edit column” → “[Add column by fetching URLs](columnediting#add-column-by-fetching-urls)” operation to call this API or URL with the IDs obtained from the reconciliation process. This will require using [GREL expressions](expressions#GREL). +If the reconciliation service cannot extend data, look for a generic web API for that data source, or a structured URL that points to their dataset entities via unique IDs (such as https://viaf.org/viaf/000000). You can use the Edit column → [Add column by fetching URLs](columnediting#add-column-by-fetching-urls) operation to call this API or URL with the IDs obtained from the reconciliation process. This will require using [expressions](expressions). -You will likely not want to pull the entire HTML content of the pages at the ends of these URLs, so look to see whether the service offers a metadata endpoint, such as JSON-formatted data. You can either use a column of IDs, or you can pull the ID from each matched cell during the fetching process.
+You may not want to pull the entire HTML content of the pages at the ends of these URLs, so look to see whether the service offers a metadata endpoint, such as JSON-formatted data. You can either use a column of IDs, or you can pull the ID from each matched cell during the fetching process. -For example, if you have reconciled artists to the Getty's ULAN, and [have their unique ULAN IDs as a column](#add-entity-identifiers-column), you can generate a new column of JSON-formatted data by using “Add column by fetching URLs” and entering the GREL expression `“http://vocab.getty.edu/” + value + “.json”` in the window. For this service, the unique IDs are formatted “ulan/000000” and so the generated URLs look like “http://vocab.getty.edu/ulan/000000.json”. +For example, if you have reconciled artists to the Getty's ULAN, and [have their unique ULAN IDs as a column](#add-entity-identifiers-column), you can generate a new column of JSON-formatted data by using Add column by fetching URLs and entering the GREL expression `"http://vocab.getty.edu/" + value + ".json"` in the window. For this service, the unique IDs are formatted “ulan/000000” and so the generated URLs look like “http://vocab.getty.edu/ulan/000000.json”. You can alternatively insert the ID directly from the matched column using a GREL expression like `"http://vocab.getty.edu/" + cell.recon.match.id + ".json"` instead.
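The URL construction described above can also be sketched outside OpenRefine, for instance to spot-check a few identifiers before fetching a whole column. This is a minimal Python sketch, assuming IDs in the “ulan/000000” form given in the text; the constant and function names are hypothetical and are not part of OpenRefine or the Getty service.

```python
# Base URL taken from the example in the text.
BASE_URL = "http://vocab.getty.edu/"


def ulan_json_url(entity_id):
    """Build the JSON endpoint URL for an ID formatted like "ulan/000000"."""
    # Strip stray whitespace, which is common in hand-edited ID columns.
    return BASE_URL + entity_id.strip() + ".json"


# "ulan/000000" is the placeholder ID used in the text, not a real record.
url = ulan_json_url("ulan/000000")
```

The string concatenation mirrors the GREL expression: a base URL, the cell value, and the `.json` suffix joined in that order.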
diff --git a/docs/static/img/reconcile-with-property.png b/docs/static/img/reconcile-with-property.png new file mode 100644 index 0000000000000000000000000000000000000000..e25b81a27042dc550c74d212a9f52342b4b2ea25 GIT binary patch literal 23650 zcmdqJ3sjQn+BeQLwVN3`OhsvFvZo`JhoPf*+>@C#Sw>E&iBd9VrfABj0g5$Kny5IP z)KX!z(#%t&f~aUyA|(k;sgNj^sFaAL1gN0=Z>*Wwd%ycVt?zsP?^>U0wesBq32T%Vn1|4XpVUQb74%AVG5&;9Vl z8HWpfFYkM_I_}L)JNs9r{i?Mu&BgaPo%w68w=44-gVxwRS>VP|FDFl4FTPtEO1xW7 z46=XKHgcCZR5E1m=2aLPr~ z&*R-tj}H&OyjTcdHS=SMf_lt6_5Er};mlJPBX*%awvhp+e0b{qqW@ZBFZZDuVilby ziFzAbqo{IPqUJ5pVS6yfT;miG87Ub}*`u<`+(w`o^}=borM!n)Dv3#OIuX#PDbWXp zREX_W#s>?NPH+tLX|6Gtlc!4Yqhn{rmzCpB3BJYZz5IOBaZ2C76*;>cequR)coOZI znl@E&6iSlcnVIL~lEIy*1WkvpeIS%%F2zc+tUX4@p3E|rBacAwXjc&jfqlD_%fA&* zMTFYO6_p>~>febP(oe#_S!Wsny?)|oWUR1j=fMS=-p=L+f@b1@R8Ir2uAxh4U)DZJ@A|igb;o@)wayOp`*^4A;Q5b`6rW5p?gqpf{|L%Ixl9+g1MPm|m+L zM{I%vsC)#D$h~d5m+gF2i(c(R&Bqcw?g(54tX6ZPm1FI(KJwme8khEcEKj=V`$wE^!2$@MVlxZM>o z$jqf-bUp)L#uLWIqU)Vw0|keLo0uAH{tcX7&P-@*p~?JwdAnnb36Ugh7caJhx5m`( z{afg8R2g@J{X@l^_}sdvDAJlt#CDI49qMNjU#&uXBidv%gNEYlj33Wd-rrZN@*<5qU zX1>xR{#K9u^Pqm+VDI)K7Ws}}bB5(3E+*-1$* zNmj+JGVNt-rM~;9=qPlxtN(1sSy_x;tL(^kitqHl48MRwr1#Yd@1MyZLrc^vH=e8U zqoGmhFn3M-vQHqd9GZ(SPJ{(_+wGk?Z`&65i`^z+ReU2h^Q+bnXbT%A?<^dD6%H#? 
zN0H@IvD2FR%eL*g33iAx~Wc_o31gMC;vXtH(N`Od@!Cp$5wVSd?_r0{ulLk-RkX6mDWCkvJ zfs#pd<8gB~NQi~CH3Q{S{q`d)x92aV-u|vw>N9GpUWtyjOs2k|Y`#IaVg@JbS68|h zrN1oZEaQ4lJ)^Jw`(lS({)7fgEePe==j~uLsTiOMRuA{ih#_ znFm=N*a1U*q#`wD>4gn1mhldwWd`Y<7n66PWTG+E*rho+OaLsy+elB!_RH7=L3;zw9thDVm~PJ)0|e;Nh9SS2j)X zE!QIY%tt;2C;pK%t?qk%zsY@ShNKI0<7pIzr0sauQE$JOE)N>YCl|>ab@X5Im;6|1lXe>i* ziT7eKuBX!!Mtr1b!!jDvKJ{|568#YBCX~G7haX5d2!P!yH#%6ij!R_7irxE-9i?c}N<5nnEpvGyBZm zAjqR-)VF3LEBT>JL|g-w7!((KhqqMSV&I*8)7>SPInwDH79=J_7LB_u?X6tZvHy&l zbsr*VFbCnPwZy;V#axpgYbCVZPPFnRphd66g1M2GZ`x4yjTfHwk~{F&?b!5{QJ+`# z{Os;rfh9M_5nLBcIZ!7E{aXh}6*_7C=PI7Na|_>U3+5&wJJANOiVy5kBem*J2di=l zE%?_r@=o-5o;y@;o28f}=|=LS()3{`1?XO>cPt&3+%lNCIb_IKB8>}*?VWV4P9EGF z8_=Z{`}FK+xu8A6ZVY`hDz|d4^!KKw-C_`8kc(w85m5mupYu7zf!Q$?j-K@ux*Hm8 zse(t#M8lT9aN1oTIqY(XClW=q>Ni$b5}qFJz_}0Qh?69WCG^=H@`{xK^3oQ;w!xE_ zPY?Jey46%BP80TERk>CY$&niTNKE=c*8J4OKH=_LRE1Jv?N%)#VuMEGwM?QCL{7#;MlrY$~mxt=C>Xn8r{c%b7mofCGp>Fz@WH_|clt=BbS$&qKo^*Q_I?1+mh&02`4;IrYrWJCPg{s7hSc62#BBNxk3`cNGZU zm!0-;qjTMwa++ZRzXOb<$Z&b7WNcyL9n4>M_ADDB#|jGc$A_%~d|=)*U#m^+I1c`a z+(u`)d8@aDBTG6;Bua)x`mJX8Y$3Pk~M?~Kx}7atgNcRH5JIQ zZ$@KA)QI=iR!(#abA5`14{Lv-mAl-hgWX5&vDT^CR!)Orc1%?|$Gl6|D;A5h zRIJ)^{b6NZz18XwIikpbY-Joj*{dCQmkmvd*Na55W*jX^keg(U4>+0m(g!S9jP&at zs2qaAnWHvSl9B70U6DzpJ3Os7wGTMQQILuTMtPzYVL^ETK{0V2E!8LpDn|8?R$Chz z;BsQ+jvp$*hSXPdBvsYVvYip7_=o#rpaAEmf1Qskq73-CX;b1On!1;<3aI&x@@G@u z$t&ZQTgFJPs#}pn-X7mX!9%e@$z&jlA2X2NBcj|L8s`4%#XV6G0T<;6n(&!~MwA32 zJ2A0Mwew?##pH4ZgM%yS$>K)`WJ~lV)C~QE^AU-*l!)a8(d%rRMR~RGuDGYQy-vQ7 zS!WhjV>}h4n~AXboG2!z_io&9to3SEhBAPak->Kh_-aI?QXO`R>P~iy=k#uQU)N43 zW#G!C;<%>TUef%d9AqEU&Nr5%uL+mLtjJ}7ZV6B8I4do?2r*4ZRGPg$gY%WOaRjge zdM@syJXOz;FN(Zzb8;6VTLBZGfj;By$SbgSM0nPhPtCeCG$ziAMXSiUe!X4WnJkyi z{VtWy?0kI&!Evt3LRuWZ+Sv#vLaqrqe=fgV88C8m<x`#t%v`J`>#4*ySB?`&ZQI2T+ewQY#g)(O3N^%yfZAmcFM}9W9{|+Vi zqFXw{j(I_v1I0@0ZNlCxOe~9Hbqti?7xeU=h=_u^K0&nzwZ zjK*tN1rr@956dnMWOfV$GH;{Jg<#>qD@l2tQ=c>b)R;~7G>T{9S_D{X0XO>g@xcS{yF@nlmRF5m!G9qE)(*PXjQ 
z17`;2tkGrwv5qsCcD)Km+t{)Y_mfZj4<3l-9oHtgx2C^|(g9O|38Xb(>L;lH5i4-4 z=m`aw;Eq)wlh0TOBU{ebHYpo+)_1SCv3_2@NCkQRxDd8#BC5n4;wERiCH?UcI+`ynOHQAu@+R)N8z4c83qMhk(3Xci#P2evf6F zjw_8LBnC$W89c^jZ6BC?+>UR%{2(kGqubaEW4J`V5bM|Rw9J~ZLWj$`7h(jkzOKrj z4=E*<63gr^#)N#^eZFW?N>^{>$s3{rglkzf3JIY&4?~KS1pTyiyXWIr4q?psIdkBY zq=8Y9m|hUL0O`3#{x&0gAjiu^6eoGsfK6IZ@VQ{zCzH#Ja9h?*qvvWgZnnApxZ5AxPFU)>2(5(_&;eV{(Waa zR5>kadZJ^C(7B2@_GGGMR-F&)MYU$U}oV;NEUJk4~iv3Em2!PM9?iC zxT1Yg-A8Qh+90mqwRMPYT9#Vp(sY8yCFO4s-nJ|DqcRxYQToIE-C+{Om|;uJ`F7s& zK5~A2Pmuhs(DRyPb!-LASzaJtrxmI*qLm;WzsXU49yf~`~t?$Wwd#T`Q;Eq3%A$bYV&s& zhb9XU<>SH2du@hx)E;aKJM1HOIgfSSaW=8GqR>I(^NUu$lk{tk9dF7n8<-k;uRyB4 zUl_sJYjnuhXPgLHkl49CXmxS@tiEggn z3@_#S!K8`0o{ksZZk)#6E0}jjW2c_wY~;Of8LS%*zR~OT@G4>9K#p+!G1vlsC^mqM z*`utJ&Pn2&t(2(4>uv@0e3`pA>?TBWkrH95}c2UCn8YWT`C$C9Aj&kH{*8jpU#T(dhk z@nX`ebt+xIdF_!vx2;7K`?3j-&5;W8p1LGen(}AEEL!U5;hrNk`VRZSXv3b7EApRH z6BUz=zLB3g%n==isuhD`(bEQtN^DG%VH(wRB&8P9l=8kYOD@uXz1=lJLSCxu#`1<- z%ny@->IEGV={}Od&o}eM3tE7FMY7eE<-|M&PBPks;P50g_j>Oh*MW(P`dFisWDCPa zmS)1lFpuaTv4&lUroSbR_jTzDR|E&xMq<^}w&gl?+L?rJDkQ-sqY&wBZO^(;hZe3^ z$=QR>km9~NO7I36z7*n(EvA|qF3t9bxX-@3;tn6CpdRHXrim^A$1*2A2NWK z?sI9n%Cjb7<|%z^u*A&bz#Y}z-CAu_jb0}WM|zi~Ni04crWZPVW5|CrzSQEi8D@MZ z3qE}_>3wFW1lt{JT*)1u^^zQ&Im&RY*7L-EES%KQvqNGJ>;Ugb{!|QF)!^f_J^Pab z+}LH@+IQ5->zeR037m3_uZ%oZcSG_*97En8nVszib?>nj`rCvO!g!qnTEb`1P5I#` zNhjsr%Gx81Ex0GG6MfzH9k_&k_JV8oaXd2)f?;>Li+4q$=7=Rh@@}R}a~<?8rNxed-Ia;QjM`_T@?5eazmQ7BXfD=FOpm z)vfmF6Z?xpPX<(EN~AgCKY{~Q^f)U0cHG^JS>Pw($;i7CreA|i9g`#Q;sGVJ3flDIA39vZfM z`((3N;wGBkmMf8MZ!fgyOHijp=Qpj)#?)fHR%QpY^NV*dS@sVuZ=iI~9_5gzg%-T% ze28ctsvBkS3p1>29-kZDhuHx@*IFO_f@bsG^~sw{{4aO)um6$3S-($m@JW|8HF$&` zmbsbsnRCYaHj=S8M?m zfx}8lpL3io8CVl;s@{-8;C0vi>8l)mzXkPhaFsUkZ^5niOpC(;{-TJfSMo8_V??BS zQc+nclGl!(-wb4;dhuC8pf9Q+QLh~mf$^YV zI<^S@aCCHL9FW3yHEZoNmU3%h`ed67MA0&C-Tbq%=s*tyKgfGoeL-Q)+A7-{?y~H^ zq0Y->Q}teyzE5`Tefp5K4;8^>9)-%bM@SZ)od7-0<9JV=Iybzu3B7}fOZ{`$f3uCo 
zm6Y<`C8vUW#8wci6a@%Wd(XiljW^sY$GObzV(NTK-)M9}+hu~$?n6ib>y&ftBYifavZk#}ZyPNlTi03(dY)5l*o8Dzf#>*z~5baz+NKEE_f zL^!M1u;viT+10MvfAh?q1A)Usx#LdBsCQ$k6-7r4ecQ80F~-~G+`y{Y|M-;0fowBe z*3*wIpDTQPxc1qJCtwnLKUj6>3;%a+b60XSodKnCIxuXWZV{Hz62_ira5g+5#G()$ zvoyNE_J038XG-5LRTD@QjP%r(%4wuX&0ua+g_I!A?;xo-tVZCB!6Yyj;t$NZkOZ$8 z`;?m+lrYNe;M+GfOul=xP}mtk(Ns2!wGyo$)gO%0yjMzAZ;*?AxGHp?EEt`x`dyBH zN>!Hh#Hpm5DqEP)M|1(5lun)Kh75i%T>EB^;ya=mMq6G!ueC5w3XRS!=)0;=^)@9j z2N_4_;l1jc3O_sD@OvL0O^SCBVcqqr#(Lu9+hSo>^8BU6Ah?Jbzx8$dD8a2o;~M6? zo3iTkiwzXChkoHt7j0~9G7BAkUE=K=9T!y(QVx9>_nso(yrQfUM>|SL7c(j0+vSoi zh{n5;yKzlU@zza|j3g_!%ldxg_hVOnLu2Hp+_uee2%)UKaME}TkMWy$Bm)84p zUDU8oT7XmNQc1Aa$;_p?s#r*yvn!|7c9W9lj=Ej7$p`e0tgkPElVpvg^aM z9($_fUg+R+bXc%RwG1X0Z(;@jIA z)!uF`35xB8x_f}+iwaitx%(X3U8wY_qO8ivn{@1LTaeE%>6Z<3cv0N)SM2~(tn|dL zNIw10^2wu#Ulc)`Dia1aDVo-e;oWq2%BrUiOs9aGYVAU-t^={s$t91*N1u<^UmRU1 zY6hY@X&Sr5WD89pxo^?ij!leEDRUJ>d~&xnt6&-TOEruBb#fd!==}qelXp}E zPzXM87YoI$l~0Nz0!I0l$5Xu%f;DYXWIV$onqE1u{|x4|AEgh#jS~g7DWmBpi{9@) zs2|#xtf+$-HS9lc3+mbPXN?k9)YIqLVNDz6;Y#I~-GWx&HGRh9TNT0wnGJ3I#^zGRU51fiVXXSv$ou}*bmr5z zrtU0WrM|`uS!Ith^m%@J^UK``FV1C)$iDRyWuxzPGq9&0Te%Ns>fEO`&7F9$s$`id zM#-IW^zIfuH`n0>0Pnr6op=LG#g|-r$}#9en>6>e2pcwh!BUlaY4V8#C zVY2JWSDF*`RNF^~H&B4~1FkET0YC5?`Qg_0slkHftPR-`nKH%@s61*oa2wt!x*ZwL zyAapI45{7d5q*8Z5va@!IK4Z#h^5?vX&zDN?&KiUQ|~X`BhnDcT6iYXn=~9pphuoU zrH6Q2!rq1h&2#3jqk8kHkm$3yvbwb3+;QtGAai$@TFL$EK1@`=)n=P>mnJzj%nptJ zZV6T=2*DJND{ri|*hInYt&|K+K^6DEOOEr3g4KQjMgZl0I?-p|RC@s0GVU=2$zv)u zQtl^a-pZ^>R+?bBp`V zyMvOZG#ase(ga_wkX?D)ZLRyn0T>Q{Ldhtk@V&<;Kodi;AdqEq0f(B$sOT|2EBj%i z$E8Q7e^;?*AgIMvpD{fUo$}(05?e~NC2{SNv$9(N*Z)v93xCO~XAX0VK^qZC@3(@` z2EX9%?}skCV4pO+wK^oKbv*^>9=`=1mr5_1;V!K;*+8Mo=oY)*?(^67Z4W`r<22pO zfqJ~EO8|!bmx~s?4Dhq|h(0kIcHH*DSDhU(Idi^VOm}U;mH9x=vB!%RJZK!J-3Nxo ztnX2?bJC)QAj4^@4JHfdy5Z{AX~z><-}ty6i`3*!HW`MTKyKb?{fvlM7Z^w&NxN4cpN38$lC^;%BHPgUHvBb6iQQ%#HH@|gMfq^%N;WC{Jsz76t(xAOO^*{V)t zg$z0C!1cn~gpYc3GJb`tt{Y#vqc;Jdg_EjG3e>1c>rRTNfn8nY0?7(Lgi4QzRtP+o 
z_B|y|XosYJxwR1`>|wGGn%6BNG!_Vwu3`80=;~k#Psc#H13u98#8IxfuYHqYZ0ONK zH|Kz=4V_J6t(z3lkbOJYaO$;`$uKOr?1=6yZXsfws50#hk1psLvQquZk(kRj@sDz^!XHuZ1AtWm!d znygy7Lw=Em8Jvb^W=dR*qvNf{ze3&sB=H75{XZ&sdf)VvUJ=!>(7M`uIV_ylj3a-l z3W=z2f2^L96>a*1edzs!5B=)%VBTV+CAnf;t~ zopkz(zcJqC@5AS1s0Hd%swk&Q$LRz%P4b`t%e$i(5>NW&PPvk)X;xQQC>!D2`NBE( zGG?}5wtd69heH`Vqz(Zq%w>Hln*-4^ zQ2!*y!)8y!T1_6;50#3OtKYXCPFR)YarZGIcKE7#MI9z1<0Tx+5MO=c$5vYWB;jLM z5^WBT1VrHzaMvt(5b*(%m`49tOe3DsynFqq>UOlMQ9m;DwwUf3up(>Dsj@vd%c}E> zndkAd?YcN{*JX4_1%HoEvLNKOm^l?O+Wl-YcKM3&A4U=kRF7y0VSZ0#OUqVXZD*mF z1Uiyfug-Ft>{^Lwvd^=8Fq_ghwt=$58ANRUkeWxb2qWdpam3uv+Z{E7VOUhO@_~jWLzOQmk z+Z#o)B`LU>)ZaF3F3Bp;Oynu+5#*FYhaZBJInt?95neV?n^;(V%CE`+M$6;~9^Ky< z%4*7U8zQaycKFSD4&>7hvRX^^hTi--U7mnjx^>msMTSc5C$vBOO=8_SM;XYjlMVT9@AIiB0Rc-=n`HK?J_}iBMN{EjqY|lo z$Q`rv-|Ll=Py~@3&_)jwhqjL{||tKWBvL_Hm}L$WGly(`uW^ zv~q~_0YLN~{^9K=FE_Gx-&|P2X=af0+$B=K_xlbiF8drnZ?M#h7rm$-pMFp2gH4aF1|-9B6x17p%^%D$ z5lhkADSctUZ?F9uplDsZ%^&XppUy-~$grb~h6P{c@DD-HDQ*E0=i`u@aN1vh=RTB~PoCgo zg0dT*Dk_`E%#bqyf$`TKe*UZ~0A;w)q3f=zsneK~zt;%sCB}}S7fG*gDp}JlLv174 zAm-BvdvHg|nA%A@nmCDRN7P&S?AoPWD z_n^{)b)ux#5rl71!%ubUB$@2KiHmBrW@gR{|N3okRuky)Ww+|QqQe^z-E zD*X1X&R`H*fSC*6l;Uqt&S#mgLf)1>DRijbkD6WaA|$P^*f{CQiZ*bBT=oNY zfEi}lqi0#aGH?*l>q8!+2G!gXh=}am!ZWW`OUTs~Q@chJyq58*ye(2FeRcC_<-Hc_ zGe-3+zocC?Jxm5>0}Wo^dU@zM6;G6X2@O!39rOF`_Hn2UHKOLYf>8?Aj%m z*}CWx98B6G1+b6`_q4=hJ=q{RH9A|IlDL!`Wh_Y2{5BT5lv}X)OjewiwahuwtoKdv zd2{W=nIJ&g>8beJ|Y`8V{!^^fj-zhU|%5Kwz03rqKdXM&E@mSoY|14}=J~ zE@*R6Ca;M-?HPaH!PT{Muov|dmzi15#_^*)c14~6!U){f)4ztep}LhAmmULD6-`Ln zwO4{iIrkc)8>dpp1wIfr>GUYuQy>V*pDej`@#6bJlL!KvCT^37It`myrkOJYh@LNJ z_G9*|S*Al|XZdlwzx*k>Gq%vjgJA0WoQcGkol$sZh3T*z1WfJz`!k2C_9jb84A&Uc z!b%!p@OsA;Z&$X-m0x&zBXQ=0p89B47^if;ASDy%@}Jl&rM9Nw)~}l}(+<4vOS7*C zB2Nr-q`=5$m?wP7k_N8%cFG`G6}FU8<7C#wi;w$1faW``B&w2@sC^04B1ox3;Z6Fu zJb)#@%LUFt4 zy37ZPIrI8Lhhv@S(8!Z-t8O>Kccbp!OqyDUJ(p5f$LgCeU)G^O2LfQr=Iw2T-$LJ%@6y`~p94Pvs4D zw}s;fUudq}gYRJE1=G8;Le_tKAgiQ%*-%VmrA{qI3V(L=R^=2(Rc}|WmOs%e5r=v@ 
zt&3S0!F4XOgAt1^E0w(}@&VAf)njxG$dfhqBGWPF5FY6gZ9Fsp<+sYei+G_l>&6if zW-q!*kBhML?y%nCb$zvCBsO=GE5CAI|}}e0u|B{cDh> z+-uC)|IYyhJ2hWd1?S?VQC9|vS)7sVF7H*PaS`?1H)5XR@aG+hZB9k!x_!~L=_miO z7HdyH&Upr+vxb8@)QZ>G`lVcbsne+Z4Y09gZiMo=r6iV2_%c6p_ko%a!CY8kAUuh@ zg{9I6A!nC^LHxQNB>(cg3N1E7aW4#H1s@KXjv~yT%=vh74aQJhS*e z2}Cec$DS{vYkjJYo<5a&%4|QBZnHZUP5G!-!-f#N!8>E4B5&=^;aqu_byXx^)~pn> zFE}50JQYlL^gJm)mmApxEJeflci(!9uGY!IADo|c2R;i8o7uSE!x6?~`wfaBK!2^D zt_@pu!z=6h-ALiiCBYFD!XN$l-q#eHaWB8+B$jUAb#M1U{7A*Tw08nWHB57Sq4c1zoAo#V=T;CIv zpLD0zjtUrfwHIk^oCC}289}MDKME8CeY&LZOi`g`(5mU-0ASFQFjlZO1AKB z6^bdyxs<*O6Rm!bH~{doO~9x<422mVZN@L-7QGs1kT2gviA66rdl311Yr$?PKuo2U zgj8OA90}Cv+y^7S-?~ldJBG+R2*vkYqx8uV1xYTuZ8H>HanmyH(N!>E68Q{w(W|lF zpZ-E8{?a<{+h{3Y!JRgh)UKjKAR7xtOpAC=kcjD zjqkSi%{k*e>B_{j3T6O^*_sdv7%K^*X&7{9df{hQl#i!p=8dh0)?N#bMGnA(`2a~E zN}=H$Xfh1gSMEDDs1ac>mA=q(?k2PB!wH`ICHHIov5T`T6g#t8;K?9*3F>Sh*+r%m z0`Y-0ClJFZ@WP)*e(tknjW*}Dfd}{Wm{h`M#EiAMCETRek97mWxkEKj_7^V>-?(K8 zRX9|QU$X#XP^;Dy!7qW=PBxZ&de6SG*v52ETFJc_i<-3>wFZm);+APTe>(89PJff& zV-5lUz90Yq7zg+Gx>da_h{M+Z8vWqEpO*QbCPe)2jtB&~Fo~MOKr$@dO|kVqf?gN? 
zrwAbbEAs{F+il{da1#a6Z2=^CUkotf-@+1s5&Np;AIU4=N%DK>*<3(y`G0mbzzY2T z!1VcF+35fKQv)Z$O~)33;N_p9(-mI%$Myd?CF=i?g#sSH{~T@X;~K?aO|JUmf-Q%m zFu;LlZ%O%Srlg6Q`S>pnG zcZg)-_2%QlKj{`h?)~8D3+*NAl^s<=SJI59F^>4>k_N0}wxWL)J@K9EEJ)VK8%i+x z7-zJ_awg+x$mb&5=g0;5PNk+w2iuj!KAS=%)w(De-L6UaV5e;oz<1BZH6pyAUUqm>2Jm$%_9F-5J6>wvOH z$FXbwRjmO6fmvxu|D}jHZ1{d#rdu^e)sSo*;N+RUkP~~j-h~<{bZU-{3CxRkNr{Wd zucSn45;pxYx*ggjT;}=q`_Q$8(|)Ih>5{IAA*RiquXrI}l}L8>R0@zkxTWV~O9gZu zbz)~vmdB>-6s;m(BnHG`F4I})=;>Y0+0p1aH@e9slJ>$*czEFN*XjdH$fvosB*_z0 zEm59rS_7zp;lG7tD6=Jf+uIpFa;rM}kU`QfmNKvbS$B_&h@7&W<9e1Q4~Fw`}6RX#St_pG?5wVQ%e@v9(}V0K$%b>&C&8tPsL-$4HCZDypIf zFs?~N88%QQCJ2t7Y(isrx8jifjhEs^zTznTr^m=^I@Aj$i2csE;Hm9@ic$qUIXj3L zHbq*@nzH6vu=ZCb>?*j!E!eRxWa7R73@cFn*^+k1ArSHW7P%zrY z+ooZkVm}ClHDp50-gas&%<1EOD^vZo#m#Stu`aB;WsC{iQQq?H&a%wKmjMf%|E=KC zlUa+qaBK{zyI{bpvb|b=zRo)Ww%B6>FXYLo%tc}Q`k$FPwBwEEt-g#;$!l^=IUGe2 zh(?xR)rQs4jfX2TL0xp;%RhJRPTDH<$+UJxSlY?f#r58QEHad9xo)5DvPXvMksNMu zlIBV;NOG0UjMN7j^$R=*wO|WxGAuZ&t6){>pLYsQ|GtjJ@SU%JmyH9c^AG2^>R%Ujh`swU4Em9}w+y0laZ?sMLgC z59Y@ez*~|43PaB3?)&$e`1fHAG+n_%E#vBXlafq<-VVIbg+)}i?eJAKs0Bj+&jNa9 z7+9IsE{+mV7j`xg*;)vI>|Y0B0NhsI+(h}4;R@0r!G;_YhZp3zI@%DeYMctzg~;a@ z49k%3oM%Dq%?6nFZ=x$(emG7BFxH`zu!LFE7bVvyw_Ruohu*|C!YLJ~bENWrTgv^H znU5jMx?h4GUeK0YJNGYMSx-CDH!O`CUD(~+lYpf zQTwyNY^wGb>Ht)~@Y>lrTl-0OD!I(kSVM^Yh8ERT4Hai4#gImkhWuj^@PE}@|ZSXly_Ig+YyHA zZ0EI!sNtYwM8cxtI>wgtX(HV`_5sG{YI#Jae|B+Wa)BL7a+G1)(qCJveB0&&!`COi znRwH-B@PXWkduLQjPl>hbZf}(ew`Z{J{e2`@glxXV)hzcVczxuCePpYT{iY(!kgOT z5unNTO15EC(w`rW8!YB(A*DHSn{3cNaS)z0FCB)>TV}rjb$m%PgJ9clbV- zM+dW0m>tb^ap+Q(3m+rRyj38-8E@EAHW%`0(6R2HSS8yzW<#}mD$E~o5GBjfk3==) zv3ibVo2H+XYh?-80wf+4#~_=korXaDnC}ESg%N!`LMm+pnuF98e0A+cc?tNW0x6v_ zP*QpP4xgis&W&MH70bJ5ioWil1_=$+&fWn=gX@2gOu=pW;VUciCz_p2L3>#G=Bc=V zpz|VJbgsSz@0DLg4c9Gq;@zcrUgX_Sr_|EbyN*8|09!05W!Xl$=U4S~-c_HELypJ^ zEO{42ez&oat!n34#2_OTF%&()RVl8srOOV=#RKT0%F#nl7x79nUFNyKzUpLUehP#_ z2}(LXl2ZPN0!hf-)hp`XWoV+0=L7Jye}JcY1$~zi7ce5Sjz5WfQGYN>_jxQ`?7GHo z8joIA`SZJS 
zQ`LuEv$#OMx_@$ym8EG+CS-Qt#{7IiQM7zx&JN841oA2dREE+G!}6uvC0QP;(vFj7 z#h!{hg`jwUp(ySLTmr~3b8||cE^(+B%3b1{WOF~O#V6(k1vS0!Y{BzNP&aYU>_hd$ z>zybARW%(A;s=mdf2lwCf%E}+(02RZ6jvmhdHm*f0zlvCs)=3T@qbl9|M!;W>pPS_ zK=WdNZ^r;MH1il0=s3h@g|%j;zK$8e44Ni)Bpv!t(gChi`gZS?u_!~UZ{vLvnameZ zl-yW8s0L7-`ez+I(q&ZKhp@hqAYkQ77){%;@+Q8zF_RF7l*Y0_PzhaXtT80(`aF@V^`ioMN+1;`3wHi6jFu$8rI-E5H3CdgQ+aCAWeXM~uPaSj zXT`eSAZ}ACCu!r*EmT0U2UQ=FZ%<60jHoZKuyjljvnnd_158>0;o-B1epgL&hrDXU zT;GfvzM{Wf5Upf(TZ76_%c04;`NbnXTm=;CP7Af;;UX&Ky#jyT6Ae2lJR-tn_v3** z&T|R0dMo>OLKj2H^s7)vMC_HumqKz}V(t*KrZW67?$XvD%G#}~63b#OArO5uC;W*7VfW*rT1>&3hh9)-cq!^;Vp!3>;4PymW2{L7g6 zz0aIYbyVI#k6FAsdhn%|9{c$jX0ZV(@!_oEFV)q;c)@Zk_{@N|ROp}1=b5uE5r?(T zF@#pN>vm@Tju82y9#W>Z&|O^oQTc%9Yw^U~Tqf+a(f zlijSCCg)1O3YJC;#?v)$Dk6f@OQHrNtsmL(EJ^v`s|Z%uH`7-|P5XPYm~GcvW)_CZ zgdbqw5=fuih)L{bVnM+s^Q!uNzeoIvqXJMP%kH+WijSrPp@`WQLaW^mYH_nkc&#>f zvt6W56!n>1W-6ANkr9L0x7SWLo$`>Wwg76Fp_zV$hbrQcK{Lycg`E;3l1ID=fut^+ zWY7zXp9ML+ZG9kAw#v=^eSrV$o#=>`Ibj!L38UA|3qf)~T)VRVV1`+@m1T04gwWSL zR{KG0N&+M2w{00nI8<*1xONG4B>nu1+?e6o{TgIB?<9|@W(JdvL`kxsAducQ(D}^y_?rag%9{fJw1@4{ z$Ft?D`o(($i-Mm%V;-S-3v%^?u56bM^+{F@ze6l8z+DP#C6)@7ln4W7s?op8P8mr- zQ>#tq3l!W~l?axX$YDcFq74LMr@8~u&t8jWL5|r0{%3&!Y71cf;pY$t{vh=Ao`KBB z6A3pygFHZ+Ox%AogF!0tk3zZslHCkm$ku`yri;UC!FNhPEPLNaSJy;NFkK{7O^v9g zN3LB4cD{*h@ZdK4LDS2^VUX4jH~-=lv)}azFx#oZZ_@gHn!E4_E3)TkacG3(J3@Ny zpjIg!jA&gs2foY@6hU!oluc6w`(R5{L~ypD}7WJ%4}Al;k_GlzxC)hqR0H z{;k_1x*~Y z&aK5Ip%Ee;j?#C<9CitQsjl%ZFReMIzQXq)uVo&mg6*whe-Yc1P^L+E{;W1GAaWu2R@400 z#14Uvu7rpkOOR;F{d!4>LyaXBX;WhsC3Y150CZcu>hamps8_6^lcTj$a|C0;i6&{>x{O1K1L5spg)m&}_tFb86&8IrgrzTibG zjzFGpc<7q>(lS>VSSHj*TPv;D<;9UOC*6^r%9WDbEx<69;+GDQ5k-GMamO?DvY|Xg zQDiAmtrZlk*R?UgHwLb^fMba#ydhMgU-d@#Lz5INa(MB~I#P=OxP}Gg?EWn+?S;0F zX`)?I()z`$pj#|_yP4wa>h6*FmxQB8Z(4kQJe?Q83b_<1pGR;7X+l!buw;pRk$g+X zvk6f6J2;%r7uV9HR-nW*mR5_QtG=oi1Pq9~w=<*aaS=!rPbrQ8-=s5bXVmmy)CfrE zK&ZXHLAkYb^Y_X*5v>C^{IdIYx^#A!Y2&EOzsLDzFTZX^VpZCDpIj 
zx^k(S+i=)AG@^HMudlZ%kN6SfEG^ArBKA9ZzR5`IcP^-?6oMUd(&jPaki`Xz5!zu9gQ+LiV(jxs=2g+TL_Q;FWJwpc+WfUUUrEmXv)9WOw@dP?z$bKw z62NZCR9b==zc7{4ueecVryCqr?g&A57nx>ly@Y4A!YM=w6jXVjAWR~zYv7Bsz=T1Z z{_Km`?>rE#UN)WupU09&qxo2@*Tv00>pQ$Jbzz$)b2eqz^2h|Y!G>Yy!z5SbSR-}0 zZuM80DAvFf(V#^Whn%CGVteZ_Ks+2h?^K36g?_-)3gYjWDxtBgB!U2-v22_ae%nVHt-M{T|k8MKBshumEm>{X^}yJXHk z9)YwkQl9~fa7tJlC>0?9O_`oo3eHP`yna1zxGt(5Lz1#x1zTXAPV#3lBjzkd5^7p{` z&Ubk4-}^oQCZh=5T!nJgouhwWOi6Ite$~m}l;~MWe6s6oeL>J;IVXxti-y>q^&Fb- zSQw3xlCa7eobz=PF~u*eu^0S8}S3s!WMnF*bxtQ z;QXI7V5h=21MCD-4y_tXt)iY67M44+=6#7}GBDGZ&%ZgMoBY{Vm`&hGf;?B<6*;$W zgY)PK!4q5Qp2ULRf*btoYVZOlcwsB+D^zWDV(UffEYHa7xW8ZuKFcL9$=6O{gXge% zUXH1O1VDs5sBC5GUr6qHZ+dMr-e!M**Y?Tny1X~^hQ4}E-Js>-U zNYn52atU9L`;C5HY)Kbe`k0{S$|Q1XDUI<0Pz~wBCPT z+?fl@%IA;4?rq}}tqEf0njQ)~i3lx=j5#6Pl_huFo9O3{D75dvc72H)RU@&Xt)Ux^ zXE)IFXjd5jwB4T8JLvxa>aTaA{k*5-S>#xzf1~QTP*>VdB?rE+7?!>hpWv0+X3{>| zU1excDO1FgV*kKaDi1mRm0JRwf4#!_11X}1s;XJ3RTpJx{x%8YoVH#0|&KtEfJ^Ah*5lb-1-l!BCuxCG>dg-E%gd|Q{9#RLGUdkqwdMb4%`pRomp}*qY-H= z?eWSs6Zx`X7gybmFiYIVy=ZXE;J%_PVMeQWC-nSY&3LM7hv*A~<4Y#16ig|Zc{e`)7TL23tFBz5S9MHSrz^YCtoMlSVsFUokPQM;bJkv?Zh+Mydvee7(CT{ks% zn!4qy!T?xK!+he(r^NRKOAkV0(J&|O76~rOTBacx>NOHeeQ7d9EH4x^ZI+_A7gVaK z*gH+5<=;+&i2=*tGfH@JCq_(L15j?py(IOMiXXL@1W$6M?Hb$IR}5*vzHBbC4B64P z2#LPbM7Xwkt}$S&1=*PzswBd5%fdo+)$F|o-BOHh7u5k(ntYK=CmSz0$&{*02M?O}U+>{(Zq|c4#+i!`Mo~N!$80E7nVe2-a zc?R4}LwK3SdW!GYP42+k2&}TkQcJftw{b|;vPjNl#yb6sVdOb&l)gNo z$?{iST2RIV@v0M|pSU87tN4$hf%vk3DZ%~JWd#vs6#2PR@>m6ttDF(kd0<veB}y%E9;&AcIY`pz>WHOuBIz3pr~dZV4oT2+YpzPH4lY96*ZsqxnuLn zWa*KGmbLDI2BObYn{XKF4e+F9Mn7ubCGY1-Rf za{gbC^2=?jaNkKgss0RbL^|;uZFelK@PYOW6L&92 zh|#|XfMsL>2F-dUuYZ_Ks%st}uDDiBNQH2H1u7_bI|JJoK{(-Bfjb#@iy#QwB1Pek z78=BgoPJ3y%OzI0l0oK%ddr~R$D}&-4VQiFo{HjrHSV9DN%C*h8{7w_vB=C~{FM^s Ym+=>a`vGn6g%{qzl@$vQ{O(Wx1{}q}eE Date: Tue, 5 Jan 2021 11:59:11 -0500 Subject: [PATCH 08/21] Updates to Reconciling --- docs/docs/manual/expressions.md | 8 +-- docs/docs/manual/reconciling.md | 113 +++++++++++++++++--------------- 2 files 
changed, 64 insertions(+), 57 deletions(-) diff --git a/docs/docs/manual/expressions.md b/docs/docs/manual/expressions.md index ce41f097e..15e75f19c 100644 --- a/docs/docs/manual/expressions.md +++ b/docs/docs/manual/expressions.md @@ -341,7 +341,7 @@ Examples: | `isError("abc")` | false | | `isError(1 / 0)` | true | -Remember that these are controls and not functions. So you can’t use dot notation (the `e.isX()` syntax). +Remember that these are controls and not functions: you can’t use dot notation (the `e.isX()` syntax). ### Constants |Name |Meaning | @@ -352,7 +352,7 @@ Remember that these are controls and not functions. So you can’t use dot notat ## Jython -Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (.py or .pyc) are compatible. Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython is essentially Java, you can also import Java libraries and utilize those. Remember to restart OpenRefine, so that new Jython/Python libraries are initialized during Butterfly's startup. +Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (`.py` or `.pyc`) are compatible. Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython is essentially Java, you can also import Java libraries and utilize those. You will need to restart OpenRefine, so that new Jython or Python libraries are initialized during startup. 
OpenRefine now has [most of the Jsoup.org library built into GREL functions](#jsoup-xml-and-html-parsing-functions), for parsing and working with HTML elements and extraction. @@ -374,7 +374,7 @@ Fields have to be accessed using the bracket operator rather than the dot operat return cells["col1"]["value"] ``` -To access the Levenshtein distance between the reconciled value and the cell value (?) use the [recon variables](#reconciliation): +To access the [edit distance](reconciling#reconciliation-facets) between a reconciled value and an original cell value, use [recon variables](#reconciliation): ``` return cell["recon"]["features"]["nameLevenshtein"] @@ -415,4 +415,4 @@ For help with syntax, see the [Clojure website's guide to syntax](https://clojur User-contributed Clojure recipes can be found on our wiki at [https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#11-clojure). -Full documentation on the Clojure language can be found on its official site: [https://clojure.org/](https://clojure.org/). +Full documentation on the Clojure language can be found on its official site: [https://clojure.org/](https://clojure.org/). \ No newline at end of file diff --git a/docs/docs/manual/reconciling.md b/docs/docs/manual/reconciling.md index 40cf147af..4a8c46baf 100644 --- a/docs/docs/manual/reconciling.md +++ b/docs/docs/manual/reconciling.md @@ -6,58 +6,56 @@ sidebar_label: Reconciling ## Overview -Reconciliation is the process of matching your dataset with that of an external source. Datasets for comparison are produced by libraries, archives, museums, academic organizations, scientific institutions, non-profits, and interest groups. You can also reconcile against user-edited data on [Wikidata](wikidata), or reconcile against [a local dataset that you yourself supply](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services). 
+Reconciliation is the process of matching your dataset with that of an external source. Datasets for comparison might be produced by libraries, archives, museums, academic organizations, scientific institutions, non-profits, or interest groups. You can also reconcile against user-edited data on [Wikidata](wikidata), or reconcile against [a local dataset that you yourself supply](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services). To reconcile your OpenRefine project against an external dataset, that dataset must offer a web service that conforms to the [Reconciliation Service API standards](https://reconciliation-api.github.io/specs/0.1/). You may wish to reconcile in order to: * fix spelling or variations in proper names -* to clean up manually-entered subject headings against authorities such as the [Library of Congress Subject Headings](https://id.loc.gov/authorities/subjects.html) (LCSH) -* to link your data to an existing dataset -* to add it to an open and editable system such as [Wikidata](https://www.wikidata.org) -* or to see whether entities in your project appear in some specific list, such as the [Panama Papers](https://aleph.occrp.org/datasets/734). +* clean up manually-entered subject headings against authorities such as the [Library of Congress Subject Headings](https://id.loc.gov/authorities/subjects.html) (LCSH) +* link your data to an existing dataset +* add to an editable platform such as [Wikidata](https://www.wikidata.org) +* or see whether entities in your project appear in some specific list, such as the [Panama Papers](https://aleph.occrp.org/datasets/734). -Reconciliation is semi-automated: OpenRefine matches your cell values to the reconciliation information as best it can, but human judgment is required to ensure the process is successful. Reconciling happens by default through string searching, so typos, whitespace, and extraneous characters will have an effect on the results. 
You may wish to [clean and cluster](cellediting) your data before reconciliaton. +Reconciliation is semi-automated: OpenRefine matches your cell values to the reconciliation information as best it can, but human judgment is required to review and approve the results. Reconciling happens by default through string searching, so typos, whitespace, and extraneous characters will have an effect on the results. You may wish to [clean and cluster](cellediting) your data before reconciliation. +:::info We recommend planning your reconciliation operations as iterative: reconcile multiple times with different settings, and with different subgroups of your data. +::: ## Sources -We recommend starting with [this current list of reconcilable authorities](https://reconciliation-api.github.io/testbench/), which includes instructions for adding new services via Wikidata editing if you have one to add. +Start with [this current list of reconcilable authorities](https://reconciliation-api.github.io/testbench/), which includes instructions for adding new services via Wikidata editing if you have one to add. OpenRefine maintains a [further list of sources on the wiki](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources), which can be edited by anyone. This list includes ways that you can reconcile against a [local dataset](https://github.com/OpenRefine/OpenRefine/wiki/Reconcilable-Data-Sources#local-services). Other services may exist that are not yet listed in these two places: for example, the [310 datasets hosted by the Organized Crime and Corruption Reporting Project (OCCRP)](https://aleph.occrp.org/datasets/) each have their own reconciliation URL, or you can reconcile against their entire database with the URL [shared on the reconciliation API list](https://reconciliation-api.github.io/testbench/).
For another example, you can reconcile against the entire Virtual International Authority File (VIAF) dataset, or [only the contributions from certain institutions](http://refine.codefork.com/). Search online to see if the authority you wish to reconcile against has an available service, or whether you can download a copy to reconcile against locally. -OpenRefine includes Wikidata reconciliation in the installation package - see the [Wikidata](wikidata) page for more information particular to that service. +OpenRefine includes Wikidata reconciliation in the installation package - see the [Wikidata](wikidata) page for more information particular to that service. Extensions can add reconciliation services, and can also add enhanced reconciliation capacities. Check the list of extensions on the [Downloads page](https://openrefine.org/download.html) for more information. -:::info -OpenRefine extensions can add reconciliation services, and can also add enhanced reconciliation capacities. Check the list of extensions on the [Downloads page](https://openrefine.org/download.html) for more information. -::: - -Each source will have its own documentation on how it provides reconciliation. Refer to the service itself if you have questions about its behaviors and which OpenRefine features it supports. +Each source will have its own documentation on how it provides reconciliation. The table on [the reconciliation API list](https://reconciliation-api.github.io/testbench/) indicates whether your chosen service supports the features described below. Refer to the service's documentation if you have questions about its behaviors and which OpenRefine features it supports. ## Getting started -Select ReconcileStart reconciling on a column. If you want to reconcile only some cells in that column, first use filters and facets to isolate them. +Choose a column to reconcile and use its dropdown menu to select ReconcileStart reconciling. 
If you want to reconcile only some cells in that column, first use filters and facets to isolate them. In the reconciliation window, you will see Wikidata offered as a default service. To add another service, click Add Standard Service... and paste in the URL of a [service](#sources). You should see the name of the service appear in the list of Services if the URL is correct. ![The reconciliation window.](/img/reconcilewindow.png) -Once you select a service, the service may sample your selected column and identify some [suggested categories (“types”)](#reconciling-by-type) to reconcile against. Other services will suggest their available types without sampling, and some services have no types. +Once you select a service, your selected column may be sampled in order to suggest [“types” (categories)](#reconciling-by-type) to reconcile against. Other services will suggest their available types without sampling, and some services have no types. For example, if you had a list of artists represented in a gallery collection, you could reconcile their names against the Getty Research Institute’s [Union List of Artist Names (ULAN)](https://www.getty.edu/research/tools/vocabularies/ulan/). The same [Getty reconciliation URL](https://services.getty.edu/vocab/reconcile/) will offer you ULAN, AAT (Art and Architecture Thesaurus), and TGN (Thesaurus of Geographic Names). ![The reconciliation window with types.](/img/reconcilewindow2.png) -Refer to the documentation specific to the reconciliation service (frequently linked on [this page](https://reconciliation-api.github.io/testbench/)) to learn whether types are offered, which types are offered, and which one is most appropriate for your column. You may wish to facet your data and reconcile batches against different types if available. 
+Refer to the [documentation specific to the reconciliation service](https://reconciliation-api.github.io/testbench/) to learn whether types are offered, which types are offered, and which one is most appropriate for your column. You may wish to facet your data and reconcile batches against different types if available. Reconciliation can be a time-consuming process, especially with large datasets. We suggest starting with a small test batch. There is no throttle (delay between requests) to set for the reconciliation process. The amount of time will vary for each service, and vary based on the options you select during the process. When the process is done, you will see the reconciliation data in the cells. -If the cell was successfully matched, it displays a single dark blue link. In this case, the reconciliation is confident that the match is correct, and you should not have to check it manually. -If there is no clear match, one or more candidates are displayed, together with their reconciliation score, with light blue links. You will need to select the correct one. +If the cell was successfully matched, it displays text as a single dark blue link. In this case, the reconciliation is confident that the match is correct, and you should not have to check it manually. +If there is no clear match, one or more candidates are displayed, together with their reconciliation score, with the text in light blue links. You will need to select the correct one. For each matching decision you make, you have two options: match this cell only (one checkmark), or also use the same identifier for all other cells containing the same original string (two checkmarks). @@ -71,26 +69,28 @@ Hovering over the suggestion will also offer the two matching options as buttons For matched values (those appearing as dark blue links), the underlying cell value has not been altered - the cell is storing both the original string and the matched entity link at the same time. 
If you were to copy your column to a new column at this point using `value`, for example, the reconcilation data would not transfer - only the original strings. You can learn more about how OpenRefine stores different pieces of information in each cell in [the Variables section specific to reconciliation data](expressions#reconciliation). -For each cell, you can manually “Create new item,” which will take the cell’s current value and apply it as though it is a match. This will not become a dark blue link, because at this time there is nothing to link to: it is like a draft entity stored only in your project. You can use this feature to prepare these entries for eventual upload to an editable service such as [Wikidata](wikidata), but most services do not yet support this feature. +For each cell, you can manually “Create new item,” which will take the cell’s original value and apply it, as though it is a match. This will not become a dark blue link, because at this time there is nothing to link to: it is a draft entity stored only in your project. You can use this feature to prepare these entries for eventual upload to an editable service such as [Wikidata](wikidata), but most services do not yet support this feature. ### Reconciliation facets -Under ReconcileFacets you can see a number of reconciliation-specific faceting options. OpenRefine automatically creates two facets for you when you reconcile a column. +Under ReconcileFacets there are a number of reconciliation-specific faceting options. OpenRefine automatically creates two facets when you reconcile some cells. -One is a numeric facet for best candidate's score, the range of reconciliation scores of only the best candidate of each cell. Each service calculates scores differently and has a different range, but higher scores always mean better matches. You can facet for higher scores in the numeric facet, and then approve them all in bulk, by using ReconcileActionsMatch each cell to its best candidate. 
+One is a numeric facet for “best candidate's score,” the range of reconciliation scores of only the best candidate of each cell. Higher scores mean better matches, although each service calculates scores differently and has a different range. You can facet for higher scores using the numeric facet, and then approve them all in bulk, by using Reconcile[Actions](#reconciliation-actions)Match each cell to its best candidate. There is also a “judgment” facet created, which lets you filter for the cells that haven't been matched (pick “none” in the facet). As you process each cell, its judgment changes from “none” to “matched” and it disappears from the view. You can add other facets by selecting ReconcileFacets on your reconciled column. You can facet by: * your judgments (“matched,” or “none” for unreconciled cells, or “new” for entities you've created) -* the action you’ve performed on that cell (chosen a “single” match, or set a "mass" match, or no action, as “unknown”) +* the action you’ve performed on that cell (chosen a “single” match, or set a “mass” match, or no action, which appears as “unknown”) * the timestamps on the edits you’ve made so far (these appear as millisecond counts since an arbitrary point: they can be sorted alphabetically to move forward and back in time). You can facet only the best candidates for each cell, based on: * the score (calculated based on each service's own methods) * the edit distance (using the [Levenshtein distance](cellediting#nearest-neighbor), a number based on how many single-character edits would be required to get your original value to the candidate value, with a larger value being a greater difference) -* the word similarity (a percentage based on how many words, excluding [stop words](https://en.wikipedia.org/wiki/Stop_word), in the original value match words in the candidate. 
For example, the value "Maria Luisa Zuloaga de Tovar" matched to the candidate "Palacios, Luisa Zuloaga de" results in a word similarity value of 0.6, or 60%, or 3 out of 5 words. Cells that are not yet matched to one candidate will show as 0.0). +* the word similarity. + +Word similarity is calculated as a percentage based on how many words (excluding [stop words](https://en.wikipedia.org/wiki/Stop_word)) in the original value match words in the candidate. For example, the value “Maria Luisa Zuloaga de Tovar” matched to the candidate “Palacios, Luisa Zuloaga de” results in a word similarity value of 0.6, or 60%, or 3 out of 5 words. Cells that are not yet matched to one candidate will show as 0.0. You can also look at each best candidate’s: * type (the ones you have selected in successive reconciliation attempts, or other types returned by the service based on the cell values) @@ -102,17 +102,17 @@ These facets are useful for doing successive reconciliation attempts, against di ### Reconciliation actions You can use the ReconcileActions menu options to perform bulk changes (which will apply only to your currently viewed set of rows or records): -* Match each cell to its best candidate (by highest score) -* Create a new item for each cell (discard any suggested matches) -* Create one new item for similar cells (a new entity will be created for each unique string) -* Match all filtered cells to... (a specific item from the chosen service, via a search box. For services with the [“suggest entities” property](https://reconciliation-api.github.io/testbench/)) -* Discard all reconciliation judgments (reverts back to multiple candidates per cell, including cells that may have been auto-matched in the original reconciliation process) -* Clear reconciliation data, reverting all cells back to their original values.
+* Match each cell to its best candidate (by highest score) +* Create a new item for each cell (discard any suggested matches) +* Create one new item for similar cells (a new entity will be created for each unique string) +* Match all filtered cells to... (a specific item from the chosen service, via a search box; only works with services that support the “suggest entities” property) +* Discard all reconciliation judgments (reverts back to multiple candidates per cell, including cells that may have been auto-matched in the original reconciliation process) +* Clear reconciliation data, reverting all cells back to their original values. The other options available under Reconcile are: -* Copy reconciliation data... (to an existing column: if the original values in your reconciliation column are identical to those in your chosen column, the matched and/or new cells will copy over - unmatched values will not change) -* [Use values as identifiers](#reconciling-with-unique-identifiers) (if you are reconciling with unique identifiers instead of by doing string searches) -* [Add entity identifiers column](#add-entity-identifiers-column). +* Copy reconciliation data... (to an existing column: if the original values in your reconciliation column are identical to those in your chosen column, the matched and new cells will copy over; unmatched values will not change) +* [Use values as identifiers](#reconciling-with-unique-identifiers) (if you are reconciling with unique identifiers instead of by doing string searches) +* [Add entity identifiers column](#add-entity-identifiers-column). ## Reconciling with unique identifiers @@ -130,13 +130,13 @@ You may get false positives, which you will need to hover over or click on to id Reconciliation services, once added to OpenRefine, may suggest types from their databases. These types will usually be whatever the service specializes in: people, events, places, buildings, tools, plants, animals, organizations, etc. 
-Reconciling against a type may be faster and more accurate, but may result in fewer matches. Some services have hierarchical types (such as “mammal” as a subtype of “animal”). When you reconcile against a more specific type, unmatched values may fall back to more broad types. Other services will not do this, so you may need to perform successive reconciliation attempts against different types. Refer to the documentation specific to the reconciliation service to learn more. +Reconciling against a type may be faster and more accurate, but may result in fewer matches. Some services have hierarchical types (such as “mammal” as a subtype of “animal”). When you reconcile against a more specific type, unmatched values may fall back to the broader type; other services will not do this, so you may need to perform successive reconciliation attempts against different types. Refer to the documentation specific to the reconciliation service to learn more. -When you select a service from the list, OpenRefine will load some or all available types. Some services will sample the first ten rows of your column to suggest types (check the [“Suggest types” column on this table of services](https://reconciliation-api.github.io/testbench/)). You will see a service’s types in the reconciliation window: +When you select a service from the list, OpenRefine will load some or all available types. Some services will sample the first ten rows of your column to suggest types (check the [“Suggest types” column](https://reconciliation-api.github.io/testbench/)). You will see a service’s types in the reconciliation window: ![Reconciling using a type.](/img/reconcile-by-type.png) -In this example, “Person” and “Corporate Name” are potential types offered by VIAF. You can also use the Reconcile against type: field to enter in another type that the service offers. When you start typing, this field may search and suggest existing types. 
For VIAF, you could enter “/book/book” if your column contained publications. +In this example, “Person” and “Corporate Name” are potential types offered by the reconciliation API for VIAF. You can also use the Reconcile against type: field to enter in another type that the service offers. When you start typing, this field may search and suggest existing types. For VIAF, you could enter “/book/book” if your column contained publications. You may need to enter the service's own strings precisely instead of attempting to search for a match. Types are structured to fit their content: the Wikidata “human” type, for example, can include fields for birth and death dates, nationality, etc. The VIAF “person” type can include nationality and gender. You can use this to [include more properties](#reconciling-with-additional-columns) and find better matches. @@ -150,11 +150,11 @@ Some of your cells may be ambiguous, in the sense that a string can point to mor ![Reconciling sometimes turns up ambiguous matches.](/img/reconcileParis.gif) -Including supplementary information can be useful, depending on the service (such as including birthdate information about each person you are trying to reconcile). The other columns in your project will appear in the reconciliation window, with an Include? checkbox available on each. +Including supplementary information can be useful, depending on the service (such as including birthdate information about each person you are trying to reconcile). You can re-reconcile unmatched cells with additional properties, in the right side of the Start reconciling window, under “Also use relevant details from other columns.” The column names in your project will appear in the reconciliation window, with an Include? checkbox next to each one. -You can fill in the As Property field with the type of information you are including. 
When you start typing, potential fields may pop up (depending on the [“suggest properties” feature](https://reconciliation-api.github.io/testbench/)), such as “birthDate” in the case of ULAN or “Geburtsdatum” in the case of Integrated Authority File (GND). Use the documentation for your chosen service to identify the fields in their terms. +Fill in the As Property field with the type of information you are including. When you start typing, potential fields may pop up (depending on the [“suggest properties” feature](https://reconciliation-api.github.io/testbench/)), such as “birthDate” in the case of ULAN or “Geburtsdatum” in the case of Integrated Authority File (GND). Use the documentation for your chosen service to identify the fields in their terms. -Some services will not be able to search for the exact name of your desired As Property entry, but you can still manually supply the field name. Refer to the service to make sure you enter it correctly. +Some services will not be able to search for the exact name of your desired As Property entry, but you can still manually supply the field name. Refer to the service to choose the most appropriate field, and make sure you enter it correctly. ![Including a birth-date type.](/img/reconcile-with-property.png) @@ -174,44 +174,51 @@ Once you have selected matches for your cells, you can retrieve the unique ident If the reconciliation service supports [data extension](https://reconciliation-api.github.io/testbench/), then you can augment your reconciled data with new columns using Edit columnAdd columns from reconciled values.... 
-For example, if you have a column of chemical elements identified by name, you can fetch categorical information about them such as their atomic number and their element symbol, as the animation shows below: +For example, if you have a column of chemical elements identified by name, you can fetch categorical information about them such as their atomic number and their element symbol: ![A screenshare of elements fetching related information.](/img/reconcileelements.gif) -Once you have pulled reconciliation values and selected one for each cell, selecting Add column from reconciled values... will bring up a window to choose which information you’d like to import into a new column. The quality of the suggested properties will depend on how you have reconciled your data beforehand: reconciling against a specific type will provide you with suggested properties of that type. For example, GND suggests elements about the “people” type after you've reconciled with it, such as their parents, native languages, children, etc. +Once you have pulled reconciliation values and selected one for each cell, selecting Add column from reconciled values... will bring up a window to choose which information you’d like to import into new columns. You can manually enter desired properties, or select from a list of suggestions. + +The quality of the suggested properties will depend on how you have reconciled your data beforehand: reconciling against a specific type will provide you with the associated properties of that type. For example, GND suggests elements about the “people” type after you've reconciled with it, such as their parents, native languages, children, etc. ![A screenshot of available properties from GND.](/img/reconcileGND.png) -If you have left any values unreconciled in your column, you will see “<not reconciled>” in the preview. These will generate blank cells if you continue with the column addition process. 
This process may pull more than one property per row in your data, so you may need to switch into records mode after you've added columns. +If you have left any values unreconciled in your column, you will see “<not reconciled>” in the preview. These will generate blank cells if you continue with the column addition process. + +This process may pull more than one property per row in your data (such as multiple children's names), so you may need to switch into records mode after you've added columns. ### Add columns by fetching URLs -If the reconciliation service cannot extend data, look for a generic web API for that data source, or a structured URL that points to their dataset entities via unique IDs (such as https://viaf.org/viaf/000000). You can use the Edit column[Add column by fetching URLs](columnediting#add-column-by-fetching-urls) operation to call this API or URL with the IDs obtained from the reconciliation process. This will require using [expressions](expressions). +If the reconciliation service cannot extend data, look for a generic web API for that data source, or a structured URL that points to their dataset entities via unique IDs (such as “https://viaf.org/viaf/000000”). You can use the Edit column[Add column by fetching URLs](columnediting#add-column-by-fetching-urls) operation to call this API or URL with the IDs obtained from the reconciliation process. This will require using [expressions](expressions). You may not want to pull the entire HTML content of the pages at the ends of these URLs, so look to see whether the service offers a metadata endpoint, such as JSON-formatted data. You can either use a column of IDs, or you can pull the ID from each matched cell during the fetching process. 
-For example, if you have reconciled artists to the Getty's ULAN, and [have their unique ULAN IDs as a column](#add-entity-identifiers-column), you can generate a new column of JSON-formatted data by using Add column by fetching URLs and entering the GREL expression `“http://vocab.getty.edu/” + value + “.json”` in the window. For this service, the unique IDs are formatted “ulan/000000” and so the generated URLs look like “http://vocab.getty.edu/ulan/000000.json”. +For example, if you have reconciled artists to the Getty's ULAN, and [have their unique ULAN IDs as a column](#add-entity-identifiers-column), you can generate a new column of JSON-formatted data by using Add column by fetching URLs and entering the GREL expression `"http://vocab.getty.edu/" + value + ".json"`. For this service, the unique IDs are formatted “ulan/000000” and so the generated URLs look like “http://vocab.getty.edu/ulan/000000.json”. -You can alternatively insert the ID directly from the matched column using a GREL expression like `“http://vocab.getty.edu/” + cell.recon.match.id + “.json”` instead. +Alternatively, you can insert the ID directly from the matched column's reconciliation variables, using a GREL expression like `"http://vocab.getty.edu/" + cell.recon.match.id + ".json"` instead. -Remember to set an appropriate throttle and to refer to the service documentation to ensure your compliance with their terms. See [the section about this operation](columnediting#add-column-by-fetching-urls) to learn more about common errors with this process. +Remember to set an appropriate throttle and to refer to the service documentation to ensure your compliance with their terms. See [the section about this operation](columnediting#add-column-by-fetching-urls) to learn more about the fetching process. ## Keep all the suggestions made -If you would like to generate a list of each suggestion made, rather than only the best candidate, you can use a [GREL expression](expressions#GREL).
Go to “Edit column” → “Add column based on this column.” To create a list of all the possible matches, use +To generate a list of each suggestion made, rather than only the best candidate, you can use a [GREL expression](expressions#GREL). Go to Edit columnAdd column based on this column. To create a list of all the possible matches, use something like -```forEach(cell.recon.candidates,c,c.name).join(“,”)``` +``` +forEach(cell.recon.candidates,c,c.name).join(", ") +``` To get the unique identifiers of these matches instead, use -```forEach(cell.recon.candidates,c,c.id).join(“,”)``` +``` +forEach(cell.recon.candidates,c,c.id).join(", ") +``` -This information is stored as a string without any attached reconciliation information. +This information is stored as a string, without any attached reconciliation information. ## Writing reconciliation expressions -OpenRefine's GREL supplies a number of variables related specifically to reconciled values. -For example, some of the reconciliation variables are: +OpenRefine supplies a number of variables related specifically to reconciled values. These can be used in GREL and Jython expressions. For example, some of the reconciliation variables are: * `cell.recon.match.id` or `cell.recon.match.name` for matched values * `cell.recon.best.name` or `cell.recon.best.id` for best-candidate values @@ -222,8 +229,8 @@ For example, some of the reconciliation variables are: You can find out more in the [reconciliaton variables](expressions#reconciliaton-variables) section. -## Exporting your reconciled data +## Exporting reconciled data Once you have data that is reconciled to existing entities online, you may wish to export that data to a user-editable service such as Wikidata. See the section on [uploading your edits to Wikidata](wikidata#upload-edits-to-wikidata) for more information, or the section on [exporting](exporting) to see other formats OpenRefine can produce. 
-You can share reconciled data in progress through a [project export or import](exporting#export-a-project), with some preparation. The importing user needs to have the reconciliation services installed on their OpenRefine instance in advance of opening the project in order to use candidate and match links. Otherwise, the links will be broken and the user will need to add the reconciliation service and re-reconcile the columns in question. [Wikidata](wikidata) reconciliation data can be shared more easily as the service comes bundled with OpenRefine. \ No newline at end of file +You can share reconciled data in progress through a [project export or import](exporting#export-a-project), with some preparation. The importing user needs to have the appropriate reconciliation services installed on their OpenRefine instance (by going to Start reconciling and clicking on Add Standard Service...) in advance of opening the project, in order to use candidate and match links. Otherwise, the links will be broken and the user will need to add the reconciliation service and re-reconcile the columns in question. [Wikidata](wikidata) reconciliation data can be shared more easily as the service comes bundled with OpenRefine. 
\ No newline at end of file From 5d9086cc6f93cdd891cb79b22a545afbe651281d Mon Sep 17 00:00:00 2001 From: allanaaa Date: Tue, 5 Jan 2021 12:07:20 -0500 Subject: [PATCH 09/21] Update reconciling.md --- docs/docs/manual/reconciling.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/docs/manual/reconciling.md b/docs/docs/manual/reconciling.md index 4a8c46baf..7b2544af4 100644 --- a/docs/docs/manual/reconciling.md +++ b/docs/docs/manual/reconciling.md @@ -178,7 +178,7 @@ For example, if you have a column of chemical elements identified by name, you c ![A screenshare of elements fetching related information.](/img/reconcileelements.gif) -Once you have pulled reconciliation values and selected one for each cell, selecting Add column from reconciled values... will bring up a window to choose which information you’d like to import into new columns. You can manually enter desired properties, or select from a list of suggestions. +Once you have chosen reconciliation matches for your cells, selecting Add column from reconciled values... will bring up a window to choose which related information you’d like to import into new columns. You can manually enter desired properties, or select from a list of suggestions. The quality of the suggested properties will depend on how you have reconciled your data beforehand: reconciling against a specific type will provide you with the associated properties of that type. For example, GND suggests elements about the “people” type after you've reconciled with it, such as their parents, native languages, children, etc. @@ -186,7 +186,7 @@ The quality of the suggested properties will depend on how you have reconciled y If you have left any values unreconciled in your column, you will see “<not reconciled>” in the preview. These will generate blank cells if you continue with the column addition process. 
-This process may pull more than one property per row in your data (such as multiple children's names), so you may need to switch into records mode after you've added columns. +This process may pull more than one property per row in your data (such as multiple occupations), so you may need to switch into records mode after you've added columns. ### Add columns by fetching URLs From 72af1270100a43e662bf9d4835f704eb2034c112 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Tue, 5 Jan 2021 16:46:02 -0500 Subject: [PATCH 10/21] Updates to wikidata --- docs/docs/manual/wikidata.md | 75 +++++++++++++++------------- docs/static/img/wikidata-terms.png | Bin 0 -> 24677 bytes docs/static/img/wikidata-terms2.png | Bin 0 -> 7336 bytes 3 files changed, 41 insertions(+), 34 deletions(-) create mode 100644 docs/static/img/wikidata-terms.png create mode 100644 docs/static/img/wikidata-terms2.png diff --git a/docs/docs/manual/wikidata.md b/docs/docs/manual/wikidata.md index 76c6d75e0..2061a65fc 100644 --- a/docs/docs/manual/wikidata.md +++ b/docs/docs/manual/wikidata.md @@ -8,11 +8,13 @@ sidebar_label: Wikidata OpenRefine provides powerful ways to both pull data from Wikidata and add data to it. -OpenRefine’s connections to Wikidata were formerly an optional extension, but are now installed automatically with the downloadable package. The Wikidata extension can be removed manually by navigating to your OpenRefine installation folder, and then looking inside `webapp/extensions/` and deleting the `wikidata` folder inside. - You do not need a Wikidata account to reconcile your local OpenRefine project to Wikidata. If you wish to [upload your cleaned dataset to Wikidata](#editing-wikidata-with-openrefine), you will need an [autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users) account, and you must [authorize OpenRefine with that account](#manage-wikidata-account). 
-The best source for information about how OpenRefine works with Wikidata is [on Wikidata itself, under Tools](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine). This section has tutorials, guidelines on editing, and spaces for discussion and help. The following text reviews the basics and can help you get set up, but the Wikidata help page is more regularly updated when technology or policies change. +:::info +The best source for information about how OpenRefine works with Wikidata is [on Wikidata itself, under Tools](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine). That page has tutorials, guidelines on editing, and spaces for discussion and help. The following text on this page reviews the basics and can help you get set up, but the Wikidata help page is more regularly updated when technology or policies change. Links to the Wikidata help page are included throughout this page. +::: + +OpenRefine’s connections to Wikidata were formerly an optional extension, but are now included automatically with installation. The Wikidata extension can be removed manually by navigating to your OpenRefine installation folder, and then looking inside `webapp/extensions/` and deleting the `wikidata` folder found there. ## Reconciling with Wikidata @@ -24,23 +26,23 @@ The Wikidata [reconciliation service](reconciling) for OpenRefine [supports](htt You can find documentation and further resources on the reconciliation API [here](https://wikidata.reconci.link/). -For the most part, Wikidata reconciliation behaves the same way other reconciliation processes do, but there are a few processes and features specific to Wikidata. +For the most part, Wikidata reconciliation behaves the same way other reconciliation services do, but there are a few processes and features specific to Wikidata. ### Language settings -You can install a version of the Wikidata reconciliation service that uses your language. 
First, you need the language code: this is the [two-letter code found on this list](https://en.wikipedia.org/wiki/List_of_Wikipedias), or in the domain name of the desired Wikipedia (for instance, “fr” if your Wikipedia is [https://fr.wikipedia.org/wiki/](https://fr.wikipedia.org/wiki/)). +You can install a version of the Wikidata reconciliation service that uses your language. First, you need the language code: this is the [two-letter code found on this list](https://en.wikipedia.org/wiki/List_of_Wikipedias), or in the domain name of the desired Wikipedia/Wikidata (for instance, “fr” if your Wikipedia is https://fr.wikipedia.org/wiki/). -Then, open the reconciliation window (under ReconcileStart reconciling...) and click Add Standard Service. The URL is `https://openrefine-wikidata.toolforge.org/fr/api` where “fr” is your desired language code. +Then, open the reconciliation window (under ReconcileStart reconciling...) and click Add Standard Service. The URL to enter is `https://openrefine-wikidata.toolforge.org/fr/api`, where “fr” is your desired language code. -When reconciling using this interface, items and properties will be displayed in your language if a translation is available. The matching score of the reconciliation is not influenced by your choice of language: items are matched by considering all labels and keeping the best possible match. So the language of your dataset is irrelevant to the choice of the language for the reconciliation interface. +When reconciling using this interface, items and properties will be displayed in your chosen language if the label is available. The matching score of the reconciliation is not influenced by your choice of language for the service: items are matched by considering all labels and returning the best possible match. The language of your dataset is also irrelevant to your choice of language for the reconciliation service; it simply determines which language labels to return based on the entity chosen. 
### Restricting matches by type In Wikidata, types are items themselves. For instance, the [university of Ljubljana (Q1377)](https://www.wikidata.org/wiki/Q1377) has the type [public university (Q875538)](https://www.wikidata.org/wiki/Q875538), using the [instance of (P31)](https://www.wikidata.org/wiki/Property:P31) property. Types can be subclasses of other types, using the [subclass of (P279)](https://www.wikidata.org/wiki/Property:P279) property. For instance, [public university (Q875538)](https://www.wikidata.org/wiki/Q875538) is a subclass of [university (Q3918)](https://www.wikidata.org/wiki/Q3918). You can visualize these structures with the [Wikidata Graph Builder](https://angryloki.github.io/wikidata-graph-builder/). -When you select or enter a type for reconciliation, OpenRefine will include that type and all of its subtypes. For instance, if you select [university (Q3918)](https://www.wikidata.org/wiki/Q3918), then [university of Ljubljana (Q1377)](https://www.wikidata.org/wiki/Q1377) will be a possible match, though that item isn't directly linked to Q3918. +When you select or enter a type for reconciliation, OpenRefine will include that type and all of its subtypes. For instance, if you select [university (Q3918)](https://www.wikidata.org/wiki/Q3918), then [university of Ljubljana (Q1377)](https://www.wikidata.org/wiki/Q1377) will be a possible match, though that item isn't directly linked to Q3918 - because it is directly linked to Q875538, the subclass of Q3918. -Some items may not yet be set as an instance of anything, because Wikidata is crowdsourced. If you restrict reconciliation to a type, these items will not appear in the results, except as a fallback, and will have a lower score. +Some items and types may not yet be set as an instance or subclass of anything (because Wikidata is crowdsourced). 
If you restrict reconciliation to a type, items without the chosen type will not appear in the results, except as a fallback, and will have a lower score. ### Reconciling via unique identifiers @@ -52,7 +54,7 @@ If the identifier you submit is assigned to multiple Wikidata items (because Wik Wikidata's hierarchical property structure can be called by using property paths (using |, /, and . symbols). Labels, aliases, descriptions, and sitelinks can also be accessed. You can also match values against subfields, such as latitude and longitude subfields of a geographical coordinate. -For information on how to do this, read the documentation and further resources [here](https://wikidata.reconci.link/#documentation). +For information on how to do this, read the [documentation and further resources here](https://wikidata.reconci.link/#documentation). ## Editing Wikidata with OpenRefine @@ -62,9 +64,11 @@ As a user-maintained data source, Wikidata can be edited by anyone. OpenRefine m Wikidata is built by creating entities (such as people, organizations, or places, identified with unique numbers starting with Q), defining properties (unique numbers starting with P), and using properties to define relationships between entities (a Q has a property P, with a value of another Q). -For example, you may wish to create entities for local authors and the books they've set in your community. Each writer will be an entity with the occupation [author (Q482980)](https://www.wikidata.org/wiki/Q482980), each book will be an entity with [literary work (Q7725634)](https://www.wikidata.org/wiki/Q7725634), and books will be related to authors through a property [author (P50)](https://www.wikidata.org/wiki/Property:P50). Books can have places where they are set, with [setting (Q617332)](https://www.wikidata.org/wiki/Q617332). 
In OpenRefine, you'll need a column of publication titles that you have reconciled (and create new items where needed); each publication will have one or more locations in a “setting” column, which is also reconciled to municipalities or regions where they exist (and create new items where needed). Then you can add those new relationships to each book, and create new entities for both books and places. +For example, you may wish to create entities for local authors and the books they've set in your community. Each writer will be an entity with the occupation [author (Q482980)](https://www.wikidata.org/wiki/Q482980), each book will be an entity with the property “instance of” ([P31](https://www.wikidata.org/wiki/Property:P31)) linking it to a class such as [literary work (Q7725634)](https://www.wikidata.org/wiki/Q7725634), and books will be related to authors through a property [author (P50)](https://www.wikidata.org/wiki/Property:P50). Books can have places where they are set, with the property [narrative location (P840)](https://www.wikidata.org/wiki/Property:P840). -There is a list of [tutorials and walkthroughs on Wikidata](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing) that will allow you to see the full process. You can save your schemas and drafts in OpenRefine, and your progress stays in draft until you are sure you’re ready to upload it to Wikidata. You can also find information on [how to design a schema](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Schema_alignment) and [how OpenRefine evaluates your proposed edits for issues](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Quality_assurance). +To do this with OpenRefine, you'll need a column of publication titles that you have reconciled (and create new items where needed); each publication will have one or more locations in a “setting” column, which is also reconciled to municipalities or regions where they exist (and create new items where needed). 
Then you can add those new relationships, and create new entities for authors, books, and places where needed. You do not need columns for properties; those are defined later, in the creation of your [schema](#edit-wikidata-schema). + +There is a list of [tutorials and walkthroughs on Wikidata](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing) that will allow you to see the full process. You can save your schemas and drafts in OpenRefine, and your progress stays in draft until you are ready to upload it to Wikidata. You can also find information on [how to design a schema](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Schema_alignment) and [how OpenRefine evaluates your proposed edits for issues](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Quality_assurance). Batches of edits to Wikidata that are created with OpenRefine can be undone. You can test out the uploading process by reconciling to several “sandbox” entities created specifically for drafting edits and learning about Wikidata: * https://www.wikidata.org/wiki/Q4115189 @@ -80,15 +84,15 @@ You can use OpenRefine's reconciliation preview to look at the target Wikidata e The best resource is the [Schema alignment page](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Schema_alignment) on Wikidata. -A [schema](https://en.wikipedia.org/wiki/Database_schema) is the plan for how to structure information in a database. In OpenRefine, the schema operates as a template for how Wikidata edits should be applied: how to translate your tabular data into statements. With a schema, you can: +A [schema](https://en.wikipedia.org/wiki/Database_schema) is a plan for how to structure information in a database. In OpenRefine, the schema operates as a template for how Wikidata edits should be applied: how to translate your tabular data into statements. 
With a schema, you can: * preview the Wikidata edits and inspect them manually; -* analyze and fix any issues raised automatically by the tool; +* analyze and fix any issues highlighted by OpenRefine; * upload your changes to Wikidata by logging in with your own account; * export the changes to the QuickStatements v1 format. For example, if your dataset has columns for authors, publication titles, and publication years, your schema can be conceptualized as: [publication title] has the author [author], and was published in [publication year]. To establish these facts, you need to establish one or more columns as “items,” for which you will make “statements” that relate them to other columns. -You can export any schema you create, and import an existing schema for use with a new dataset. This can help you work in batches on a large amount of data with a minimum of redundant labor. +You can export any schema you create, and import an existing schema for use with a new dataset. This can help you work in batches on a large amount of data while minimizing redundant labor. Once you select Edit Wikidata schema under the Extensions dropdown menu, your project interface will change. You’ll see new tabs added to the right of “X rows/records" in the grid header: “Schema,” “Issues,” and “Preview.” You can now switch between the tabular grid format of your dataset and the screens that allow you to prepare data for uploading. @@ -96,15 +100,19 @@ OpenRefine presents you with an easy visual way to map out the relationships in ![A screenshot of the schema construction window in OpenRefine.](/img/wikidata-schema.png) -There is [a Wikidata tutorial on how OpenRefine handles Wikidata schema](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Tutorials/Basic_editing). +You may wish to refer to [this Wikidata tutorial on how OpenRefine handles Wikidata schema](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Tutorials/Basic_editing). 
#### Editing terms with your schema -You may wish to include edits to terms (labels, aliases, descriptions, or sitelinks) as well as establishing relationships between entities. +With OpenRefine, you can edit the terms (labels, aliases, descriptions, or sitelinks) of Wikidata entities as well as establish relationships between entities. For example, you may wish to upload pseudonyms, pen names, maiden names, or married names for authors. -For example, you may wish to upload pseudonyms, pen names, maiden or married names for historical authors. You can do so by putting the preferred names in one column of your dataset and alternative names in another column. In the schema interface, add an item for the preferred values, then click “Add term” on the right-hand side of the screen. Select “Alias” from the dropdown, enter in “English” in the language field, and drop your alternative names column into the space. +![An author with a number of aliases indicating pseudonyms.](/img/wikidata-terms.png) -Terms must always have a language selected. You cannot edit multiple languages at once, unless you drop a suitable column into the “language” field. For example, if you had translated publication titles, with data in the format +You can do so by putting the preferred names in one column of your dataset and alternative names in another column. In the schema interface, add an item for the preferred values, then click “Add term” on the right-hand side of the screen. Select “Alias” from the dropdown, enter in “English” in the language field, and drop your alternative names column into the space. For this example, you should also consider adding those alternative names to the authors' entries using the property [pseudonym (P742)](https://www.wikidata.org/wiki/Property:P742). The "description" and "label" terms can only contain one value, so there is an option to override existing values if needed. Aliases can be potentially infinite. 
+ +![The schema window showing a term being edited.](/img/wikidata-terms2.png) + +Terms must always have an associated language. You can select the term's language by typing in the “lang” field, which will auto-complete for you. You cannot edit multiple languages at once, unless you supply a suitable column instead. For example, suppose you had translated publication titles, with data in the following format: |English title|Translated title|Translation language| |---|---|---| @@ -115,19 +123,19 @@ Terms must always have a language selected. You cannot edit multiple languages a |Wolf Hall|En la corte del lobo|Spanish| ||ウルフ・ホール|Japanese| -You could upload translated titles to “Label” with the language from “Translation language.” You may wish to fetch the two-letter language code and use that instead for better language matches. +You could upload the “Translated titles” to “Label” with the language specified by “Translation language.” You may wish to fetch the two-letter language code and use that instead for better language matches. ![Constructing a schema with aliases and languages.](/img/wikidata-translated.png) ### Manage Wikidata account -To edit Wikidata directly from OpenRefine, you must have a Wikidata account and log into it in OpenRefine. OpenRefine can only upload edits with Wikidata user accounts that are “[autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users)” - at this time, that means accounts that have more than 50 edits and have existed for longer than four days. +To edit Wikidata directly from OpenRefine, you must log in with a Wikidata account. OpenRefine can only upload edits with Wikidata user accounts that are “[autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users)” - at this time, that means accounts that have more than 50 edits and have existed for longer than four days. 
-Use the Extensions menu to select Manage Wikidata account and you will be presented with the following window: +Use the Extensions menu to select Manage Wikidata account and you will be presented with the following window: ![The Wikidata authorization window in OpenRefine.](/img/wikidata-login.png) -For security reasons, it is suggested that you not use your main account authorization with OpenRefine. Wikidata allows you to set special passwords to access your account through software. You can find this setting for your account at [https://www.wikidata.org/wiki/Special:BotPasswords](https://www.wikidata.org/wiki/Special:BotPasswords) once logged in. Creating bot access will prompt you for a unique name, and allow you to enable the following required settings: +For security reasons, you should not use your main account authorization with OpenRefine. Wikidata allows you to set special passwords to access your account through software. You can find [this setting for your account here](https://www.wikidata.org/wiki/Special:BotPasswords) once logged in. Creating bot access will prompt you for a unique name. You should then enable the following required settings: * High-volume editing * Edit existing pages * Create, edit, and move pages @@ -136,13 +144,13 @@ It will then generate a username (in the form of “yourwikidatausername@yourbot If your account or your bot is not properly authorized, OpenRefine will not display a warning or error when you try to upload your edits. -You may also wish to store your unencrypted username and password in OpenRefine, saved locally to your computer. For security reasons, you may wish to leave this box unchecked. You can save your OpenRefine-specific bot password in your browser or with a password management tool. +You can store your unencrypted username and password in OpenRefine, saved locally to your computer and available for future use. For security reasons, you may wish to leave this box unchecked. 
You can also save your OpenRefine-specific bot password in your browser or with a password management tool. ### Import and export schema You can save time on repetitive processes by defining a schema on one project, then exporting it and importing for use on new datasets in the future. Or you and your colleagues can share a schema with each other to coordinate your work. -You can export a schema from a project using ExportWikidata schema, or by using ExtensionsExport schema. OpenRefine will generate a JSON file for you to save and share. You may experience issues with pop-up windows in your browser: consider allowing pop-ups for the OpenRefine URL (`127.0.0.1`) from now on. +You can export a schema from a project using ExportWikidata schema, or by using ExtensionsExport schema. OpenRefine will generate a JSON file for you to save and share. You may experience issues with pop-up windows in your browser: consider allowing pop-ups from the OpenRefine URL (`127.0.0.1`) from now on. You can import a schema using ExtensionsImport schema. You can upload a JSON file, or paste JSON statements directly into a field in the window. An imported schema will look for columns with the same names, and you will see an error message if your project doesn't contain matching columns. @@ -150,26 +158,25 @@ You can import a schema using ExtensionsExport you will see Wikidata edits... and under Extensions you will see Upload edits to Wikidata. Both will bring up the same window for you to [log in with your Wikidata account](#manage-wikidata-account). +There are two menu options in OpenRefine for applying your edits to Wikidata. Under Export you will see Wikidata edits... and under Extensions you will see Upload edits to Wikidata. Both will bring up the same window for you to [log in with a Wikidata account](#manage-wikidata-account). Once you are authorized, you will see a window with any outstanding issues. You can ignore these issues, but we recommend you resolve them. 
If you are ready to upload your edits, you can provide an “Edit summary” - a short message describing the batch of edits you are making. It can be helpful to leave notes for yourself, such as “batch 1: authors A-G” or other indicators of your workflow progress. OpenRefine will show the progress of the upload as it is happening, but does not show a confirmaton window. -If you have made edits successfully, you will see them on [your Wikidata user contributions page](https://www.wikidata.org/wiki/Special:Contributions/), and on the [Edit groups page](https://editgroups.toolforge.org/). - -All edits can be undone from this interface. +If your edits have been successful, you will see them listed on [your Wikidata user contributions page](https://www.wikidata.org/wiki/Special:Contributions/), and on the [Edit groups page](https://editgroups.toolforge.org/). All edits can be undone from this second interface. ### QuickStatements export -Your OpenRefine data can be exported in a format recognized by [QuickStatements](https://www.wikidata.org/wiki/Help:QuickStatements), a tool that creates Wikidata edits using text commands. OpenRefine generates “version 1” QuickStatements commands. In order to use QuickStatements, you must authorize it with a Wikidata account that is [autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users) (it may appear as “MediaWiki” when you authorize). - -Any dataset can be converted into QuickStatements text commands. You can follow the steps listed on [this page](https://www.wikidata.org/wiki/Help:QuickStatements#Running_QuickStatements). - -Under the Export menu, look for QuickStatements file; under Extensions look for Export to QuickStatements. Exporting your schema from OpenRefine will generated a text file called `statements.txt` by default. Paste the contents of the text file into a new QuickStatements batch using version 1. 
You can find version 1 of the tool (no longer maintained) [here](https://wikidata-todo.toolforge.org/quick_statements.php). The text commands will be processed into Wikidata edits and previewed for you to review before submitting. +Your OpenRefine data can be exported in a format recognized by [QuickStatements](https://www.wikidata.org/wiki/Help:QuickStatements), a tool that creates Wikidata edits using text commands. OpenRefine generates “version 1” QuickStatements commands. There are advantages to using QuickStatements rather than uploading your edits directly to Wikidata, including the way QuickStatements resolves duplicates and redundancies. You can learn more on QuickStatements' [Help page](https://www.wikidata.org/wiki/Help:QuickStatements), and on OpenRefine's [Uploading page](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Uploading). +In order to use QuickStatements, you must authorize it with a Wikidata account that is [autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users) (it may appear as “MediaWiki” when you authorize). + +Follow the [steps listed on this page](https://www.wikidata.org/wiki/Help:QuickStatements#Running_QuickStatements). +To prepare your OpenRefine data into QuickStatements, select ExportQuickStatements file, or ExtensionsExport to QuickStatements. Exporting your schema from OpenRefine will generate a text file called `statements.txt` by default. Paste the contents of the text file into a new QuickStatements batch using version 1. You can find [version 1 of the tool (no longer maintained) here](https://wikidata-todo.toolforge.org/quick_statements.php). The text commands will be processed into Wikidata edits and previewed for you to review before submitting. + ### Schema alignment The best resource is the [Schema alignment page](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine/Editing/Schema_alignment) on Wikidata. 
diff --git a/docs/static/img/wikidata-terms.png b/docs/static/img/wikidata-terms.png new file mode 100644 index 0000000000000000000000000000000000000000..8327242ae7e0663f8b81b5447885bc749ef1fdbc GIT binary patch literal 24677 [binary image data omitted]
diff --git a/docs/static/img/wikidata-terms2.png b/docs/static/img/wikidata-terms2.png new file mode 100644 index 0000000000000000000000000000000000000000..fbe7d3286e10fdd66b0dcda3be3791d72401ea92 GIT binary patch literal 7336 [binary image data omitted]
Date: Fri, 8 Jan 2021 15:26:22 -0500 Subject: [PATCH 11/21] Expressions --- docs/docs/manual/cellediting.md | 2 +- docs/docs/manual/expressions.md | 144 +++++++++++++++++--------------- docs/docs/manual/starting.md | 4 +- 3 files changed, 79 insertions(+), 71 deletions(-) diff --git a/docs/docs/manual/cellediting.md b/docs/docs/manual/cellediting.md index c2c525ac1..673661b01 100644 --- a/docs/docs/manual/cellediting.md +++ b/docs/docs/manual/cellediting.md @@ -13,7 +13,7 @@ You can apply a text facet on numbers, boolean values, and dates, but if you edi ## Transform -Select Edit cells → Transforms to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. When used on a column operation, `value` is the information in each cell in the selected column. +Select Edit cells → Transform... to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. When used on a column operation, `value` is the information in each cell in the selected column. Use the preview to ensure your data is being transformed correctly. diff --git a/docs/docs/manual/expressions.md b/docs/docs/manual/expressions.md index 15e75f19c..433476e57 100644 --- a/docs/docs/manual/expressions.md +++ b/docs/docs/manual/expressions.md @@ -6,14 +6,12 @@ sidebar_label: Expressions ## Overview -You can use expressions in multiple places in OpenRefine to extend data cleanup and manipulation.
- -Expressions are available with the following functions: +You can use expressions in multiple places in OpenRefine to extend data cleanup and transformation. Expressions are available with the following functions: * Facet: * Custom text facet... * Custom numeric facet… - * You can also manually “change” most Customized facets after they have been created, which will bring up an expressions window. + * Customized facets (click “change” after they have been created to bring up an expressions window) * Edit cells: * Transform… @@ -24,11 +22,11 @@ Expressions are available with the following functions: * Split * Join * Add column based on this column - * Add column by fetching URLs + * Add column by fetching URLs. -In the expressions editor window you will have the opportunity to select one supported language. The default is [GREL (General Refine Expression Language)](#grel-general-refine-expression-language); OpenRefine also comes with support for [Clojure](#clojure) and [Jython](#jython). Extensions may offer support for more expressions languages. +In the expressions editor window you have the opportunity to select a supported language. The default is [GREL (General Refine Expression Language)](#grel-general-refine-expression-language); OpenRefine also comes with support for [Clojure](#clojure) and [Jython](#jython). Extensions may offer support for more expressions languages. -These languages have some syntax differences but support most of the same [variables](#variables). For example, the GREL expression `value.split(" ")[1]` would be written in Jython as `return value.split(" ")[1]`. +These languages have some syntax differences but support many of the same [variables](#variables). For example, the GREL expression `value.split(" ")[1]` would be written in Jython as `return value.split(" ")[1]`. This page is a general reference for available functions, variables, and syntax. 
For examples that use these expressions for common data tasks, look at the [Recipes section on the Wiki](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Users#recipes-and-worked-examples). @@ -49,34 +47,34 @@ Were you to apply a transformation to the “friend” column with the expressio value.split(" ")[1] ``` -OpenRefine would work through each row, splitting the “friend” values based on a space character. `value` for row 1 would be “John Smith” so the output would be “Smith” (as "[1]" selects the second part of the created output); `value` for row 2 would be “Jane Doe” so the output would be “Doe.” Using variables, a single expression yields different results for different rows. The old information would be discarded; you couldn't get "John" and "Jane" back unless you undid the operation in the History tab. +OpenRefine would work through each row, splitting the “friend” values based on a space character. The `value` for row 1 is “John Smith” so the output would be “Smith” (as "[1]" selects the second part of the created output); the `value` for row 2 is “Jane Doe” so the output would be “Doe”. Using variables, a single expression yields different results for different rows. The old information would be discarded; you couldn't get "John" and "Jane" back unless you undid the operation in the [History](running#history-undoredo) tab. For another example, if you were to create a new column based on your data using the expression `row.starred`, it would generate a column of true and false values based on whether your rows were starred at that moment. If you were to then star more rows and unstar some rows, that data would not dynamically update - you would need to run the operation again to have current true/false values. -Note that an expression is typically based on one particular column in the data - the column whose drop-down menu is invoked. 
Many variables are created to stand for things about the cell in that “base column” of the current row on which the expression is evaluated. There are also variables about rows, which you can use to access cells in other columns. +Note that an expression is typically based on one particular column in the data - the column whose drop-down menu is first selected. Many variables are created to stand for things about the cell in that “base column” of the current row on which the expression is evaluated. There are also variables about rows, which you can use to access cells in other columns. ### The expressions editor -When you select a function that offers the ability to supply expressions, you will see a window overlay the screen with what we call the expressions editor. +When you select a function that accepts expressions, you will see a window overlay the screen with what we call the expressions editor. ![The expressions editor window with a simple expression: value + 10.](/img/expression-editor.png) The expressions editor offers you a field for entering your formula and shows you a preview of its transformation on your first few rows of cells. -There is a dropdown menu from which you can choose an expression language. The default is GREL. Jython and Clojure are also offered with the installation package, and you may be able to add more language support with third-party extensions and customizations. +There is a dropdown menu from which you can choose an expression language. The default at first is GREL; if you begin working with another language, that selection will persist across OpenRefine. Jython and Clojure are also offered with the installation package, and you may be able to add more language support with third-party extensions and customizations. 
There are also tabs for: -* History, which shows you formulas you’ve recently used from across all your projects -* Starred, which shows you formulas from your History that you’ve starred for reuse -* Help, a quick reference to GREL functions. +* History, which shows you formulas you’ve recently used from across all your projects +* Starred, which shows you formulas from your History that you’ve starred for reuse +* Help, a quick reference to GREL functions. -Starring formulas you’ve used in the past can be very helpful for repetitive tasks you’re performing in batches. +Starring formulas you’ve used in the past can be helpful for repetitive tasks you’re performing in batches. You can also choose how formula errors are handled: replicate the original cell value, output an error message into the cell, or output a blank cell. ### Regular expressions -OpenRefine offers several fields that support the use of regular expressions (regex), such as in a Text filter or a Replace… operation. GREL and other expressions can also use regular expression markup to extend their functionality. +OpenRefine offers several fields that support the use of regular expressions (regex), such as in a Text filter or a Replace… operation. GREL and other expressions can also use regular expression markup to extend their functionality. If this is your first time working with regex, you may wish to read [this tutorial specific to the Java syntax that OpenRefine supports](https://docs.oracle.com/javase/tutorial/essential/regex/). We also recommend this [testing and learning tool](https://regexr.com/). @@ -92,19 +90,19 @@ the regular expression is `\s+`, and the syntax used in the expression wraps it Do not use slashes to wrap regular expressions outside of a GREL expression.
-The [GREL functions](#grel-general-refine-expression-language) that support regex are: -* contains -* replace -* find -* match -* partition -* rpartition -* split -* smartSplit +On the [GREL functions](#grel-general-refine-expression-language) page, functions that support regex will indicate that with a “p” for “pattern.” The GREL functions that support regex are: +* [contains](grelfunctions#containss-sub-or-p) +* [replace](grelfunctions#replaces-s-or-p-find-s-replace) +* [find](grelfunctions#finds-sub-or-p) +* [match](grelfunctions#matchs-p) +* [partition](grelfunctions#partitions-s-or-p-fragment-b-omitfragment-optional) +* [rpartition](grelfunctions#rpartitions-s-or-p-fragment-b-omitfragment-optional) +* [split](grelfunctions#splits-s-or-p-sep) +* [smartSplit](grelfunctions#smartsplits-s-or-p-sep-optional) #### Jython-supported regex -You can also use [regex with Jython expressions](http://www.jython.org/docs/library/re.html), instead of GREL, for example with a Custom Text Facet: +You can also use [regex with Jython expressions](http://www.jython.org/docs/library/re.html), instead of GREL, for example with a Custom Text Facet: ``` python import re g = re.search(ur"\u2014 (.*),\s*BWV", value) return g.group(1) @@ -120,7 +118,7 @@ clojure (nth (re-find #"\u2014 (.*),\s*BWV" value) 1) ## Variables -Most of the OpenRefine-specific variables have attributes: aspects of the variables that can be called separately. We call these attributes "member fields" because they belong to certain variables. For example, you can query a record to find out how many rows it contains with `row.record.rowCount`: `rowCount` is a member field specific to `record`, which is a member field of `row`. Member fields can be called using a dot separator, or with square brackets (`row["record"]`). +Most OpenRefine variables have attributes: aspects of the variables that can be called separately. We call these attributes “member fields” because they belong to certain variables. 
For example, you can query a record to find out how many rows it contains with `row.record.rowCount`: `rowCount` is a member field specific to the `record` variable, which is a member field of `row`. Member fields can be called using a dot separator, or with square brackets (`row["record"]`). The square bracket syntax is also used for variables that can call columns by name, for example, `cells["Postal Code"]`. |Variable |Meaning | |-|-| @@ -141,49 +139,51 @@ The `row` variable itself is best used to access its member fields, which you ca |-|-| | `row.index` | The index value of the current row (the first row is 0) | | `row.cells` | The cells of the row, returned as an array | -| `row.columnNames` | An array of the column names of the row, i.e. the column names in the project. This will report all columns, even those with null cell values in the particular row. Call a column by number with row.columnNames[3] | +| `row.columnNames` | An array of the column names of the project. This will report all columns, even those with null cell values in that particular row. Call a column by number with `row.columnNames[3]` | | `row.starred` | A boolean indicating if the row is starred | | `row.flagged` | A boolean indicating if the row is flagged | | `row.record` | The [record](#record) object containing the current row | For array objects such as `row.columnNames` you can preview the array using the expressions window, and output it as a string using `toString(row.columnNames)` or with something like: -```forEach(row.columnNames,v,v).join("; ")``` +``` +forEach(row.columnNames,v,v).join("; ") +``` ### Cells -The `cells` object is used to call information from the columns in your project. For example, `cells.Foo` returns a [cell](#cell) object representing the cell in the column named “Foo” of the current row. If the column name has spaces, use square brackets, e.g., `cells["Postal Code"]`. There is no `cells.value` - it can only be used with member fields. 
To get the corresponding column value inside the `cells` variable, use `.value` at the end, for example `cells["Postal Code"].value`. +The `cells` object is used to call information from the columns in your project. For example, `cells.Foo` returns a [cell](#cell) object representing the cell in the column named “Foo” of the current row. If the column name has spaces, use square brackets, e.g., `cells["Postal Code"]`. To get the corresponding column's value inside the `cells` variable, use `.value` at the end, for example, `cells["Postal Code"].value`. There is no `cells.value` - it can only be used with member fields. ### Cell -A `cell` object contains all the data of a cell and is stored as a single object that has two fields. +A `cell` object contains all the data of a cell and is stored as a single object. -You can use `cell` on its own in the expressions editor to copy all the contents of a column to another column, including reconciliation information. Although the preview in the expressions editor will only show a small representation [object Cell], it will actually copy all the cell's data. Try this with Edit Column → Add Column based on this column .... +You can use `cell` on its own in the expressions editor to copy all the contents of a column to another column, including reconciliation information. Although the preview in the expressions editor will only show a small representation (“[object Cell]”), it will actually copy all the cell's data. Try this with Edit Column → Add Column based on this column .... 
|Field |Meaning |Member fields | |-|-|-| | `cell` | An object containing the entire contents of the cell | .value, .recon, .errorMessage | | `cell.value` | The value in the cell, which can be a string, a number, a boolean, null, or an error | | -| `cell.recon` | An object encapsulating reconciliation results for that cell | See the reconciliation section below | +| `cell.recon` | An object encapsulating reconciliation results for that cell | See the [reconciliation](expressions#reconciliation) section | | `cell.errorMessage` | Returns the message of an *EvalError* instead of the error object itself (use value to return the error object) | .value | ### Reconciliation -Several of the fields here are equivalent to what can be used through [reconciliation facets](reconciling#reconciliation-facets). You must type `cell.recon`; `recon` on its own will not work. +Several of the fields here provide the data used in [reconciliation facets](reconciling#reconciliation-facets). You must type `cell.recon`; `recon` on its own will not work. 
|Field|Meaning |Member fields | |-|-|-| -| `cell.recon.judgment` | A string, either "matched", "new", "none" | | -| `cell.recon.judgmentAction` | A string, either "single" or "similar" (or "unknown") | | +| `cell.recon.judgment` | A string: either “matched”, “new”, or “none” | | +| `cell.recon.judgmentAction` | A string: either “single” or “similar” (or “unknown”) | | | `cell.recon.judgmentHistory` | A number, the epoch timestamp (in milliseconds) of your judgment | | -| `cell.recon.matched` | A boolean, true if judgment is "matched" | | +| `cell.recon.matched` | A boolean, true if judgment is “matched” | | | `cell.recon.match` | The recon candidate that has been matched against this cell (or null) | .id, .name, .type | | `cell.recon.best` | The highest scoring recon candidate from the reconciliation service (or null) | .id, .name, .type, .score | | `cell.recon.features` | An array of reconciliation features to help you assess the accuracy of your matches | .typeMatch, .nameMatch, .nameLevenshtein, .nameWordDistance | -| `cell.recon.features.typeMatch` | A boolean, true if your chosen type is "matched" and false if not (or "(no type)" if unreconciled) | | -| `cell.recon.features.nameMatch` | A boolean, true if the cell and candidate strings are identical and false if not (or "(unreconciled)") | | -| `cell.recon.features.nameLevenshtein` | A number, representing the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance): larger if the difference is greater between value and candidate | | -| `cell.recon.features.nameWordDistance` | A number, based on the [word similarity](reconciling#reconciliation-facets) | | +| `cell.recon.features.typeMatch` | A boolean, true if your chosen type is “matched” and false if not (or “(no type)” if unreconciled) | | +| `cell.recon.features.nameMatch` | A boolean, true if the cell and candidate strings are identical and false if not (or “(unreconciled)”) | | +| `cell.recon.features.nameLevenshtein` | A number 
representing the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance): larger if the difference is greater between value and candidate | | +| `cell.recon.features.nameWordDistance` | A number based on the [word similarity](reconciling#reconciliation-facets) | | | `cell.recon.candidates` | An array of the top 3 candidates (default) | .id, .name, .type, .score | The `cell.recon.candidates` and `cell.recon.best` objects have a few deeper fields: `id`, `name`, `type`, and `score`. `type` is an array of type identifiers for a list of candidates, or a single string for the best candidate. @@ -203,7 +203,7 @@ A `row.record` object encapsulates one or more rows that are grouped together, w | `row.record.cells` | The cells of the row | | `row.record.fromRowIndex` | The row index of the first row in the record | | `row.record.toRowIndex` | The row index of the last row in the record + 1 (i.e. the next row) | -| `row.record.rowCount` | count of the number of rows in the record | +| `row.record.rowCount` | A count of the number of rows in the record | ## GREL (General Refine Expression Language) @@ -219,7 +219,9 @@ GREL is designed to resemble Javascript. Formulas use variables and depend on da | `value.substring(7, 10)` | Output the substring of the value from character index 7, 8, and 9 (excluding character index 10) | | `value.substring(13)` | Output the substring from index 13 to the end of the string | -If you're used to Excel, note that the operator for string concatenation is + (not &). Evaluating conditions uses symbols such as <, >, *, /, etc. To check whether two objects are equal, use two equal signs (`value=="true"`). +Note that the operator for string concatenation is `+` (not “&” as is used in Excel). + +Evaluating conditions uses symbols such as <, >, *, /, etc. To check whether two objects are equal, use two equal signs (`value=="true"`). ### Syntax @@ -234,7 +236,7 @@ The second form is a shorthand to make expressions easier to read. 
It simply pul | `value.trim().length()` | `length(trim(value))` | | `value.substring(7, 10)` | `substring(value, 7, 10)` | -So, in the dot shorthand, the functions occur from left to right in the order of calling, rather than in the reverse order with parentheses. +So, in the dot shorthand, the functions occur from left to right in the order of calling, rather than in the reverse order with parentheses. This allows you to string together multiple functions in a readable order. The dot notation can also be used to access the member fields of [variables](#variables). For referring to column names that contain spaces (anything not a continuous string), use square brackets instead of dot notation: @@ -243,7 +245,7 @@ The dot notation can also be used to access the member fields of [variables](#va | `cells.FirstName` | Access the cell in the column named “FirstName” of the current row | | `cells["First Name"]` | Access the cell in the column called “First Name” of the current row | -Brackets can also be used to get substrings and sub-arrays, and single items from arrays: +Square brackets can also be used to get substrings and sub-arrays, and single items from arrays: |Example |Description | |-|-| @@ -251,24 +253,26 @@ Brackets can also be used to get substrings and sub-arrays, and single items fro | `"internationalization"[1,-2]` | Will return “nternationalizati” (negative indexes are counted from the end) | | `row.columnNames[5]` | Will return the name of the sixth column (indexes start at 0) | -Any function that outputs an array can use square brackets to select only one part of the array to output as a string (remember that the index of the items in an array starts with 0). For example, partition() would normally output an array of three items: the part before your chosen fragment, the fragment you've identified, and the part after. 
Selecting the third part with "internationalization".partition("nation")[2] will output “alization” (and so will [-1], indicating the final item in the array). +Any function that outputs an array can use square brackets to select only one part of the array to output as a string (remember that the index of the items in an array starts with 0). -### Controls +For example, partition() would normally output an array of three items: the part before your chosen fragment, the fragment you've identified, and the part after. Selecting only the third part with `"internationalization".partition("nation")[2]` will output “alization” (and so will [-1], indicating the final item in the array). -GREL offers controls to support branching and looping (that is, “if” and “for” functions), but unlike functions, their arguments don't all get evaluated before they get run. A control can decide which part of the code to execute and can affect the environment bindings. Functions, on the other hand, can't do either. Each control decides which of their arguments to evaluate to value, and how. +### GREL controls + +GREL offers controls to support branching and looping (that is, “if” and “for” functions), but unlike functions, their arguments don't all get evaluated before they get run. A control can decide which part of the code to execute and can affect the environment bindings. Functions, on the other hand, can't do either. Each control decides which of their arguments to evaluate to `value`, and how. Please note that the GREL control names are case-sensitive: for example, the isError() control can't be called with iserror(). -#### if(e, expression eTrue, expression eFalse) +#### if(e, eTrue, eFalse) -Expression o is evaluated to a value. If that value is true, then expression eTrue is evaluated and the result is the value of the whole `if` expression. Otherwise, expression eFalse is evaluated and that result is the value. +Expression e is evaluated to a value. 
If that value is true, then expression eTrue is evaluated and the result is the value of the whole if() expression. Otherwise, expression eFalse is evaluated and that result is the value. Examples: | Example expression | Result | | ------------------------------------------------------------------------ | ------------ | -| `if("internationalization".length() > 10, "big string", "small string")` | big string | -| `if(mod(37, 2) == 0, "even", "odd")` | odd | +| `if("internationalization".length() > 10, "big string", "small string")` | “big string” | +| `if(mod(37, 2) == 0, "even", "odd")` | “odd” | Nested if (switch case) example: @@ -290,7 +294,7 @@ Evaluates expression e1 and binds its value to variable v. Then evaluates expres | `with("european union".split(" "), a, forEach(a, v, v.length()))` | [ 8, 5 ] | | `with("european union".split(" "), a, forEach(a, v, v.length()).sum() / a.length())` | 6.5 | -#### filter(e1, variable v, e test) +#### filter(e1, v, e test) Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression test - which should return a boolean. If the boolean is true, pushes v onto the result array. @@ -298,7 +302,7 @@ Evaluates expression e1 to an array. Then for each array element, binds its valu | ---------------------------------------------- | ------------- | | `filter([ 3, 4, 8, 7, 9 ], v, mod(v, 2) == 1)` | [ 3, 7, 9 ] | -#### forEach(e1, variable v, e2) +#### forEach(e1, v, e2) Evaluates expression e1 to an array. Then for each array element, binds its value to variable v, evaluates expression e2, and pushes the result onto the result array. @@ -306,7 +310,7 @@ Evaluates expression e1 to an array. 
Then for each array element, binds its valu | ------------------------------------------ | ------------------- | | `forEach([ 3, 4, 8, 7, 9 ], v, mod(v, 2))` | [ 1, 0, 0, 1, 1 ] | -#### forEachIndex(e1, variable i, variable v, e2) +#### forEachIndex(e1, i, v, e2) Evaluates expression e1 to an array. Then for each array element, binds its index to variable i and its value to variable v, evaluates expression e2, and pushes the result onto the result array. @@ -314,15 +318,15 @@ Evaluates expression e1 to an array. Then for each array element, binds its inde | ------------------------------------------------------------------------------- | --------------------------- | | `forEachIndex([ "anne", "ben", "cindy" ], i, v, (i + 1) + ". " + v).join(", ")` | 1. anne, 2. ben, 3. cindy | -#### forRange(n from, n to, n step, variable v, e) +#### forRange(n from, n to, n step, v, e) -Iterates over the variable v starting at from, incrementing by step each time while less than to. At each iteration, evaluates expression e, and pushes the result onto the result array. +Iterates over the variable v starting at from, incrementing by the value of step each time while less than to. At each iteration, evaluates expression e, and pushes the result onto the result array. -#### forNonBlank(e, variable v, expression eNonBlank, expression eBlank) +#### forNonBlank(e, v, eNonBlank, eBlank) -Evaluates expression e. If it is non-blank, forNonBlank() binds its value to variable v, evaluates expression eNonBlank and returns the result. Otherwise (if o evaluates to blank), forNonBlank() evaluates expression eBlank and returns that result instead. +Evaluates expression e. If it is non-blank, forNonBlank() binds its value to variable v, evaluates expression eNonBlank and returns the result. Otherwise (if e evaluates to blank), forNonBlank() evaluates expression eBlank and returns that result instead. -Unlike other GREL functions beginning with "for", forNonBlank() is not iterative. 
forNonBlank() essentially offers a shorter syntax to achieving the same outcome by using the isNonBlank() function within an "if" statement. +Unlike other GREL functions beginning with “for,” forNonBlank() is not iterative. forNonBlank() essentially offers a shorter syntax for achieving the same outcome as using the isNonBlank() function within an “if” statement. #### isBlank(e), isNonBlank(e), isNull(e), isNotNull(e), isNumeric(e), isError(e) @@ -333,7 +337,7 @@ Examples: | Expression | Result | | ------------------- | ------- | | `isBlank("abc")` | false | -| `isNonBlank("abc")` | true | +| `isNonBlank("abc")` | true | | `isNull("abc")` | false | | `isNotNull("abc")` | true | | `isNumeric(2)` | true | @@ -341,7 +345,7 @@ Examples: | `isError("abc")` | false | | `isError(1 / 0)` | true | -Remember that these are controls and not functions: you can’t use dot notation (the `e.isX()` syntax). +Remember that these are controls and not functions: you can’t use dot notation (for example, the format `e.isX()` will not work). ### Constants |Name |Meaning | @@ -352,9 +356,13 @@ Remember that these are controls and not functions: you can’t use dot notation ## Jython -Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). A large number of Python files (`.py` or `.pyc`) are compatible. Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython is essentially Java, you can also import Java libraries and utilize those. You will need to restart OpenRefine, so that new Jython or Python libraries are initialized during startup. +Jython 2.7.2 comes bundled with the default installation of OpenRefine 3.4.1. You can add libraries and code by following [this tutorial](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules). 
A large number of Python files (`.py` or `.pyc`) are compatible. -OpenRefine now has [most of the Jsoup.org library built into GREL functions](#jsoup-xml-and-html-parsing-functions), for parsing and working with HTML elements and extraction. +Python code that depends on C bindings will not work in OpenRefine, which uses Java / Jython only. Since Jython is essentially Java, you can also import Java libraries and utilize those. + +You will need to restart OpenRefine, so that new Jython or Python libraries are initialized during startup. + +OpenRefine now has [most of the Jsoup.org library built into GREL functions](#jsoup-xml-and-html-parsing-functions) for parsing and working with HTML and XML elements. ### Syntax @@ -368,19 +376,19 @@ Expressions in Jython must have a `return` statement: return rowIndex%2 ``` -Fields have to be accessed using the bracket operator rather than the dot operator: +Fields have to be accessed using the bracket operator rather than dot notation: ``` return cells["col1"]["value"] ``` -To access the [edit distance](reconciling#reconciliation-facets) between a reconciled value and an original cell value, use [recon variables](#reconciliation): +For example, to access the [edit distance](reconciling#reconciliation-facets) between a reconciled value and an original cell value using [recon variables](#reconciliation): ``` return cell["recon"]["features"]["nameLevenshtein"] ``` -To return the lower case of value (if the value is not null): +To return the lower case of `value` (if the value is not null): ``` if value is not None: @@ -391,7 +399,7 @@ To return the lower case of value (if the value is not null): ### Tutorials - [Extending Jython with pypi modules](https://github.com/OpenRefine/OpenRefine/wiki/Extending-Jython-with-pypi-modules) -- [Working with Phone numbers using Java libraries inside Python](https://github.com/OpenRefine/OpenRefine/wiki/Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python) +- [Working 
with phone numbers using Java libraries inside Python](https://github.com/OpenRefine/OpenRefine/wiki/Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python) Full documentation on the Jython language can be found on its official site: [http://www.jython.org](http://www.jython.org). diff --git a/docs/docs/manual/starting.md b/docs/docs/manual/starting.md index 795c380b6..08b06f710 100644 --- a/docs/docs/manual/starting.md +++ b/docs/docs/manual/starting.md @@ -59,7 +59,7 @@ Click on Browse… and select a file (or several) If you import an archive file (something with the extension `.zip`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.gz`, or `.bz2`), OpenRefine detects the files inside it, shows you a preview screen, and allows you to select which ones to load. This does not work with `.rar` files. -### Web Addresses (URLs) +### Web addresses (URLs) Type or paste the URL to a data file into the field provided. You can add as many fields as you want. OpenRefine will download the file and preview the project for you. @@ -91,7 +91,7 @@ You can either connect just once to gather data, or save the connection to use i If your connection is successful, you will see a Query Editor where you can run your SQL query. OpenRefine will give you an error if you write a statement that tries to modify the source database in any way. 
-### Google Data +### Google data You have two ways to load in data from Google Sheets: * providing a link to an accessible Google Sheet (that is, one with link-sharing turned on), and From bd70208446ac770932cad91c3192e85cc378b820 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Mon, 11 Jan 2021 15:58:29 -0500 Subject: [PATCH 12/21] Update grelfunctions.md --- docs/docs/manual/grelfunctions.md | 128 +++++++++++++++--------------- 1 file changed, 63 insertions(+), 65 deletions(-) diff --git a/docs/docs/manual/grelfunctions.md b/docs/docs/manual/grelfunctions.md index 22393b45d..fe5d7d76a 100644 --- a/docs/docs/manual/grelfunctions.md +++ b/docs/docs/manual/grelfunctions.md @@ -6,32 +6,32 @@ sidebar_label: GREL functions ## Reading this reference -For the reference below, the function is given in full-length notation and the in-text examples are written in dot notation. Shorthands are used to indicate the kind of [data type](exploring#data-types) used in each function: s for string, b for boolean, n for number, d for date, a for array, p for a regex pattern, as well as with “null” and “error.” +For the reference below, the function is given in full-length notation and the in-text examples are written in dot notation. Shorthands are used to indicate the kind of [data type](exploring#data-types) used in each function: s for string, b for boolean, n for number, d for date, a for array, p for a regex pattern, and o for any data type, as well as “null” and “error” data types. If a function can take more than one kind of data as input or can output more than one kind of data, that is indicated with more than one letter (as with “s or a”) or with o for object. We also use shorthands for substring (“sub”) and separator string (“sep”). Optional arguments will say “(optional)”. -In places where OpenRefine will accept a string (s) or a regex pattern (p), you can supply a string by putting it in quotes. If you wish to use any regex notation, wrap the pattern in forward slashes. 
+In places where OpenRefine will accept a string (s) or a regex pattern (p), you can supply a string by putting it in quotes. If you wish to use any [regex](expressions#regular-expressions) notation, wrap the pattern in forward slashes. ## Boolean functions ###### and(b1, b2, ...) -Uses the logical operator AND on two or more booleans to yield a boolean. Evaluates multiple statements into booleans, then returns true if all of the statements are true. For example, `and(1 < 3, 1 < 0)` returns false because one condition is true and one is false. +Uses the logical operator AND on two or more booleans to output a boolean. Evaluates multiple statements into booleans, then returns true if all of the statements are true. For example, `(1 < 3).and(1 < 0)` returns false because one condition is true and one is false. ###### or(b1, b2, ...) -Uses the logical operator OR on two or more booleans to yield a boolean. For example, `or(1 < 3, 1 > 7)` returns true because at least one of the conditions (the first one) is true. +Uses the logical operator OR on two or more booleans to output a boolean. For example, `(1 < 3).or(1 > 7)` returns true because at least one of the conditions (the first one) is true. ###### not(b) -Uses the logical operator NOT on a boolean to yield a boolean. For example, `not(1 > 7)` returns true because 1 > 7 itself is false. +Uses the logical operator NOT on a boolean to output a boolean. For example, `not(1 > 7)` returns true because 1 > 7 itself is false. ###### xor(b1, b2, ...) -Uses the logical operator XOR (exclusive-or) on two or more booleans to yield a boolean. Evaluates multiple statements, then returns true if only one of them is true. For example, `xor(1 < 3, 1 < 7)` returns false because more than one of the conditions is true. +Uses the logical operator XOR (exclusive-or) on two or more booleans to output a boolean. Evaluates multiple statements, then returns true if only one of them is true. 
For example, `(1 < 3).xor(1 < 7)` returns false because more than one of the conditions is true. ## String functions @@ -41,9 +41,9 @@ Returns the length of string s as a number. ###### toString(o, string format (optional)) -Takes any value type (string, number, date, boolean, error, null) and gives a string version of that value. You can convert between types, within limits (for example, you can't turn the string “asdfsd” into a date or a number, but you can convert the number “123” into a string). +Takes any value type (string, number, date, boolean, error, null) and gives a string version of that value. -You can also use toString() to convert numbers to strings with rounding, using an [optional string format](https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html). For example, if you applied the expression `value.toString("%.0f")` to a column: +You can use toString() to convert numbers to strings with rounding, using an [optional string format](https://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html). For example, if you applied the expression `value.toString("%.0f")` to a column: |Input|Output| |-|-| @@ -100,7 +100,7 @@ Returns a copy of the string s with leading and trailing whitespace removed. For ###### chomp(s, sep) -Returns a copy of string s with the string sep removed from the end if s ends with sep; otherwise, just returns s. For example, `"hardly".chomp("ly")` and `"hard".chomp("ly")` both return the string “hard”. +Returns a copy of string s with the string sep removed from the end if s ends with sep; otherwise, just returns s. For example, `"barely".chomp("ly")` and `"bare".chomp("ly")` both return the string “bare”. #### Substring @@ -108,7 +108,7 @@ Returns a copy of string s with the string sep removed from the end if s ends wi Returns the substring of s starting from character index from, and up to (excluding) character index to. If the to argument is omitted, substring will output to the end of s. 
For example, `"profound".substring(3)` returns the string “found”, and `"profound".substring(2, 4)` returns the string “of”. -Character indexes start from zero. Negative character indexes count from the end of the string. For example, `"profound".substring(0, -1)` returns the string “profoun”. +Remember that character indices start from zero. A negative character index counts from the end of the string. For example, `"profound".substring(0, -1)` returns the string “profoun”. ###### slice(s, n from, n to (optional)) @@ -130,12 +130,12 @@ Returns the first character index of sub as it last occurs in s; or, returns -1 ###### replace(s, s or p find, s replace) -Returns the string obtained by replacing the find string with the replace string in the inputted string. For example, `"The cow jumps over the moon and moos".replace("oo", "ee")` returns the string “The cow jumps over the meen and mees”. Find can be a regex pattern; if so, replace can also contain capture groups declared in find. +Returns the string obtained by replacing the find string with the replace string in the inputted string. For example, `"The cow jumps over the moon and moos".replace("oo", "ee")` returns the string “The cow jumps over the meen and mees”. Find can be a regex pattern. For example, `"The cow jumps over the moon and moos".replace(/\s+/, "_")` will return “The_cow_jumps_over_the_moon_and_moos”. You cannot find or replace nulls with this, as null is not a string. You can instead: 1. Facet by null and then bulk-edit them to a string, or -2. Transform the column with an expression such as `if(value==null,'new',value)` +2. Transform the column with an expression such as `if(value==null,"new",value)`. ###### replaceChars(s, s find, s replace) @@ -145,7 +145,7 @@ Returns the string obtained by replacing a character in s, identified by find, w Outputs an array of all consecutive substrings inside string s that match the substring or [regex](expressions#grel-supported-regex) pattern p. 
For example, `"abeadsabmoloei".find(/[aeio]+/)` would result in the array [ "a", "ea", "a", "o", "oei" ]. -You can supply a sub instead of p, by putting it in quotes, and OpenRefine will compile it into a regex pattern. Anytime you supply quotes, OpenRefine interprets the contents as a string, not regex. If you wish to use any regex notation, wrap the pattern in forward slashes, for example: `"OpenRefine is Awesome".find(/fine\sis/)` would return [ "fine is" ]. +You can supply a substring instead of p, by putting it in quotes, and OpenRefine will compile it into a regex pattern. Anytime you supply quotes, OpenRefine interprets the contents as a string, not regex. If you wish to use any regex notation, wrap the pattern in forward slashes. ###### match(s, p) @@ -153,9 +153,9 @@ Attempts to match the string s in its entirety against the [regex](expressions#g You will need to convert the array to a string to store it in a cell, with a function such as toString(). An empty array [] is returned when there is no match to the desired substrings. A null is output when the entire regex does not match. -Remember to enclose your regex in forward slashes, and to escape characters and use parentheses as needed. Parentheses are required to denote a desired substring (capturing group); for example, “.*(\d\d\d\d)” would return an array containing a single value, while “(.*)(\d\d\d\d)” would return two. So, if you are looking for a desired substring anywhere within a string, use the syntax `value.match(/.*(desired-substring-regex).*/)`. +Remember to enclose your regex in forward slashes, and to escape characters and use parentheses as needed. Parentheses denote a desired substring (capturing group); for example, “.*(\d\d\d\d)” would return an array containing a single value, while “(.*)(\d\d\d\d)” would return two. So, if you are looking for a desired substring anywhere within a string, use the syntax `value.match(/.*(desired-substring-regex).*/)`. 
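The whole-string matching and capture-group behaviour described for match() can be tried outside OpenRefine. The following is a minimal Python sketch, assuming that Python's `re.fullmatch` is a close enough stand-in for GREL's matching on these simple patterns. The helper name `grel_like_match` is hypothetical and is not an OpenRefine function.

```python
import re


def grel_like_match(s, pattern):
    """Hypothetical stand-in for GREL's match(): the pattern must match
    the entire string, and the result is the list of capturing groups."""
    m = re.fullmatch(pattern, s)
    if m is None:
        # The whole regex did not match the whole string (GREL outputs null).
        return None
    # A successful match with no capturing groups gives an empty list.
    return list(m.groups())


value = "hello 123456 goodbye"

# One capturing group: the greedy leading .* leaves only the last four digits.
single = grel_like_match(value, r".*(\d\d\d\d).*")

# Two capturing groups: everything before the last four digits, then the digits.
pair = grel_like_match(value, r"(.*)(\d\d\d\d).*")

# No capturing groups, but the pattern still matches the whole string.
no_groups = grel_like_match(value, r".*\d\d\d\d.*")

# The pattern does not cover the entire string, so there is no match at all.
no_match = grel_like_match(value, r"\d\d\d\d")

print(single, pair, no_groups, no_match)
```

Because the pattern has to cover the entire string, a bare `\d\d\d\d` fails here, which is why the surrounding `.*` is needed when the desired substring can appear anywhere in the value.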
-For example, if the value is “hello 123456 goodbye”: +For example, if `value` is “hello 123456 goodbye”, the following would occur: |Expression|Result| |-|-| @@ -186,9 +186,7 @@ Note: [`value.escape('javascript')`](#escapes-s-mode) is useful for previewing u ###### splitByCharType(s) -Returns an array of strings obtained by splitting s into groups of consecutive characters each time the characters change unicode types. For example, `"HenryCTaylor".splitByCharType()` will result in an array of [ "H", "enry", "CT", "aylor" ]. - -It is useful for separating letters and numbers: `"BE1A3E".splitByCharType()` will result in [ "BE", "1", "A", "3", "E" ]. +Returns an array of strings obtained by splitting s into groups of consecutive characters each time the characters change [Unicode categories](https://en.wikipedia.org/wiki/Unicode_character_property#General_Category). For example, `"HenryCTaylor".splitByCharType()` will result in an array of [ "H", "enry", "CT", "aylor" ]. It is useful for separating letters and numbers: `"BE1A3E".splitByCharType()` will result in [ "BE", "1", "A", "3", "E" ]. ###### partition(s, s or p fragment, b omitFragment (optional)) @@ -200,15 +198,13 @@ You can use regex for your fragment. The expression `"abcdefgh".partition(/c.e/)` ###### rpartition(s, s or p fragment, b omitFragment (optional)) -Returns an array of strings [ a, fragment, z ] where a is the substring within s before the last occurrence of fragment, and z is the substring after the last instance of fragment. 
(Rpartition means “reverse partition.”) For example, `"parallel".rpartition("a")` returns 3 strings: [ "par", "a", "llel" ]. Otherwise works identically to partition() above. ### Encoding and hashing ###### diff(s1, s2, s timeUnit (optional)) -Takes two strings and compares them, returning a string. Returns the remainder of s2 starting with the first character where they differ. For example, `diff("cacti", "cactus")` returns "us". Also works with dates; see [Date functions](#diffd1-d2-s-timeunit). +Takes two strings and compares them, returning a string. Returns the remainder of s2 starting with the first character where they differ. For example, `"cacti".diff("cactus")` returns "us". Also works with dates; see [Date functions](#diffd1-d2-s-timeunit). ###### escape(s, s mode) @@ -266,7 +262,7 @@ Quotes a value as a JSON literal value. Parses a string as JSON. get() can then be used with parseJson(): for example, `parseJson(" { 'a' : 1 } ").get("a")` returns 1. -For example from the following JSON array, let's get all instances called “keywords” having the same object string name of “text”, and combine it with the forEach() function to iterate over the array. +For example, from the following JSON array in `value`, we want to get all instances of “keywords” having the same object string name of “text”, and combine them, using the forEach() function to iterate over the array. { "status":"OK", @@ -293,14 +289,16 @@ The GREL expression `forEach(value.parseJson().keywords,v,v.text).join(":::")` w ### Jsoup XML and HTML parsing ###### parseHtml(s) -Given a cell full of HTML-formatted text, simplifies HTML tags (such as by removing “ /” at the end of self-closing tags), closes any unclosed tags, and inserts linebreaks and indents for cleaner code. You cannot pass parseHtml() a URL, but you can pre-fetch HTML with the Add column by fetching URLs menu option. A cell cannot store the output of parseHtml() unless you convert it with toString(). 
+Given a cell full of HTML-formatted text, parseHtml() simplifies HTML tags (such as by removing “ /” at the end of self-closing tags), closes any unclosed tags, and inserts linebreaks and indents for cleaner code. You cannot pass parseHtml() a URL, but you can pre-fetch HTML with the [Add column by fetching URLs](columnediting#add-column-by-fetching-urls) menu option. + +A cell cannot store the output of parseHtml() unless you convert it with toString(): for example, `value.parseHtml().toString()`. When parseHtml() simplifies HTML, it can sometimes introduce errors. When closing tags, it makes its best guesses based on line breaks, indentation, and the presence of other tags. You may need to manually check the results. -You can then extract or select() which portions of the HTML document you need for further splitting, partitioning, etc. An example of extracting all table rows from a div using parseHtml().select() together is described more in depth at [StrippingHTML](https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML). +You can then extract or [select()](#selects-element) which portions of the HTML document you need for further splitting, partitioning, etc. An example of extracting all table rows from a div using parseHtml().select() together is described more in depth at [StrippingHTML](https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML). ###### parseXml(s) -Given a cell full of XML-formatted text, returns a full XML document and adds any missing closing tags. You can then extract or select() which portions of the XML document you need for further splitting, partitioning, etc. Functions the same way as parseHtml() is described above. +Given a cell full of XML-formatted text, parseXml() returns a full XML document and adds any missing closing tags. You can then extract or [select()](#selects-element) which portions of the XML document you need for further splitting, partitioning, etc. Functions the same way as parseHtml() is described above. 
###### select(s, element) Returns an array of all the desired elements from an HTML or XML document, if the element exists. Elements are identified using the [Jsoup selector syntax](https://jsoup.org/apidocs/org/jsoup/select/Selector.html). For example, `value.parseHtml().select("img.portrait")[0]` would return the entirety of the first “img” tag with the “portrait” class found in the parsed HTML inside `value`. Returns an empty array if no matching element is found. Use with toString() to capture the results in a cell. A tutorial of select() is shown in [StrippingHTML](https://github.com/OpenRefine/OpenRefine/wiki/StrippingHTML). @@ -312,34 +310,34 @@ value.parseHtml().select("div#content")[0].select("tr").toString() ``` ###### htmlAttr(s, element) -Returns a string from an attribute on an HTML element. Use it in conjunction with parseHtml() as in the following example: `value.parseHtml().select("a.email")[0].htmlAttr("href")`. +Returns a string from an attribute on an HTML element. Use it in conjunction with parseHtml() as in the following example: `value.parseHtml().select("a.email")[0].htmlAttr("href")` would retrieve the email address attached to a link with the “email” class. ###### xmlAttr(s, element) -Returns a string from an attribute on an XML element. Function the same way htmlAttr() is described above. Use it in conjunction with parseXml(). +Returns a string from an attribute on an XML element. Functions the same way htmlAttr() is described above. Use it in conjunction with parseXml(). ###### htmlText(element) Returns a string of the text from within an HTML element (including all child elements), removing HTML tags and line breaks inside the string. Use it in conjunction with parseHtml() and select() to provide an element, as in the following example: `value.parseHtml().select("div.footer")[0].htmlText()`. ###### xmlText(element) -Returns a string of the text from within an XML element (including all child elements). 
Functions the same way as htmlText() is described above. Use it in conjunction with parseXml() and select() to provide an element. +Returns a string of the text from within an XML element (including all child elements). Functions the same way htmlText() is described above. Use it in conjunction with parseXml() and select() to provide an element. ###### wholeText(element) _Works from OpenRefine 3.4.1 beta 644 onwards only_ -Selects the (unencoded) text of an element and its children, including any newlines and spaces, and returns a string of unencoded, un-normalized text. Use it in conjunction with parseHtml() and select() to provide an element as in the following example: `value.parseHtml().select("div.footer")[0].wholeText()`. +Selects the (unencoded) text of an element and its children, including any new lines and spaces, and returns a string of unencoded, un-normalized text. Use it in conjunction with parseHtml() and select() to provide an element as in the following example: `value.parseHtml().select("div.footer")[0].wholeText()`. ###### innerHtml(element) Returns the [inner HTML](https://developer.mozilla.org/en-US/docs/Web/API/Element/innerHTML) of an HTML element. This will include text and children elements within the element selected. Use it in conjunction with parseHtml() and select() to provide an element. ###### innerXml(element) -Returns all the inner XML elements inside your chosen XML element. Does not return the text directly inside your chosen XML element - only the contents of its children. To select the direct text, use ownText(). To select both, use xmlText(). Use it in conjunction with parseXml() and select() to provide an element. +Returns the inner XML elements of an XML element. Does not return the text directly inside your chosen XML element - only the contents of its children. To select the direct text, use ownText(). To select both, use xmlText(). Use it in conjunction with parseXml() and select() to provide an element. 
###### outerHtml(element) Returns the [outer HTML](https://developer.mozilla.org/en-US/docs/Web/API/Element/outerHTML) of an HTML element. outerHtml includes the start and end tags of the current element. Use it in conjunction with parseHtml() and select() to provide an element. ###### ownText(element) -Returns the text directly inside the selected XML or HTML element only, ignoring text inside children elements. Use it in conjunction with a parser and select() to provide an element. +Returns the text directly inside the selected XML or HTML element only, ignoring text inside children elements (for this, use innerXml()). Use it in conjunction with a parser and select() to provide an element. ## Array functions @@ -347,17 +345,17 @@ Returns the text directly inside the selected XML or HTML element only, ignoring Returns the size of an array, meaning the number of objects inside it. Arrays can be empty, in which case length() will return 0. ###### slice(a, n from, n to (optional)) -Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0. If to is omitted, it is understood to be the end of the array. For example, `[0, 1, 2, 3, 4].slice(1, 3)` returns [ 1, 2 ], and `[ 0, 1, 2, 3, 4].slice(1)` returns [ 1, 2, 3, 4 ]. Also works with strings; see [String functions](#slices-n-from-n-to-optional). +Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0. If the to value is omitted, it is understood to be the end of the array. For example, `[0, 1, 2, 3, 4].slice(1, 3)` returns [ 1, 2 ], and `[ 0, 1, 2, 3, 4].slice(2)` returns [ 2, 3, 4 ]. Also works with strings; see [String functions](#slices-n-from-n-to-optional). 
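The index rules for slice() described above (zero-based, end index excluded, open-ended when the second index is omitted) can be sketched in Python. This is only an analogy for the documented behaviour, not OpenRefine's implementation; `grel_slice` is a hypothetical helper name introduced for illustration.

```python
def grel_slice(items, start, end=None):
    """Mimic the documented slice() rules on a Python list.

    'start' is zero-based, 'end' is exclusive, and leaving 'end' out
    means "through the end of the array".
    """
    if end is None:
        return items[start:]
    return items[start:end]


# The two examples given in the text above:
print(grel_slice([0, 1, 2, 3, 4], 1, 3))  # [1, 2]
print(grel_slice([0, 1, 2, 3, 4], 2))     # [2, 3, 4]
```
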
###### get(a, n from, n to (optional)) Returns a sub-array of a given array, from the first index provided and up to and excluding the optional last index provided. Remember that array objects are indexed starting at 0. -If to is omitted, only one array item is returned, as a string, instead of a sub-array. To return a sub-array from one index to the end, you can set the to argument to a very high number such as `value.get(2,999)` or you can use something like `with(value,a,a.get(1,a.length()))` to count the length of each array. +If the to value is omitted, only one array item is returned, as a string, instead of a sub-array. To return a sub-array from one index to the end, you can set the to argument to a very high number such as `value.get(2,999)` or you can use something like `with(value,a,a.get(1,a.length()))` to count the length of each array. -Also works with strings; see [get() in String functions](#gets-n-from-n-to-optional). +Also works with strings; see [String functions](#gets-n-from-n-to-optional). ###### inArray(a, s) -Returns true if the array contains the desired string, and false otherwise. +Returns true if the array contains the desired string, and false otherwise. Will not convert data types; for example, `[ 1, 2, 3, 4 ].inArray("3")` will return false. ###### reverse(a) Reverses the array. For example, `[ 0, 1, 2, 3].reverse()` returns the array [ 3, 2, 1, 0 ]. @@ -366,7 +364,7 @@ Reverses the array. For example, `[ 0, 1, 2, 3].reverse()` returns the array [ 3 Sorts the array in ascending order. Sorting is case-sensitive, uppercase first and lowercase second. For example, `[ "al", "Joe", "Bob", "jim" ].sort()` returns the array [ "Bob", "Joe", "al", "jim" ]. ###### sum(a) -Return the sum of the numbers in the array. For example, `[ 2, 1, 0, 3].sum()` returns 6. +Return the sum of the numbers in the array. For example, `[ 2, 1, 0, 3 ].sum()` returns 6. ###### join(a, sep) Joins the items in the array with sep, and returns it all as a string. 
For example, `[ "and", "or", "not" ].join("/")` returns the string “and/or/not”. @@ -424,40 +422,40 @@ Also works with strings; see [diff() in string functions](#diffsd1-sd2-s-timeuni ###### inc(d, n, s timeUnit) -Returns a date changed by the given amount in the given unit of time (see the table below). The default unit is “hour”. For example, if you want to move a date backwards by two months, use `value.inc(-2,'month')`. +Returns a date changed by the given amount in the given unit of time (see the table below). The default unit is “hour”. A positive value increases the date, and a negative value moves it back in time. For example, if you want to move a date backwards by two months, use `value.inc(-2,"month")`. ###### datePart(d, s timeUnit) -Returns part of a date. Data type returned depends on the unit (see the table below). +Returns part of a date. The data type returned depends on the unit (see the table below). OpenRefine supports the following values for timeUnit: | Unit | Date part returned | Returned data type | Example using [date 2014-03-14T05:30:04.000789000Z] as value | |-|-|-|-| -| years | Year | Number | value.datePart("years") -> 2014 | -| year | Year | Number | value.datePart("year") -> 2014 | -| months | Month | Number | value.datePart("months") -> 2 | -| month | Month | Number | value.datePart("month") -> 2 | -| weeks | Week of the month | Number | value.datePart("weeks") -> 3 | -| week | Week of the month | Number | value.datePart("week") -> 3 | -| w | Week of the month | Number | value.datePart("w") -> 3 | -| weekday | Day of the week | String | value.datePart("weekday") -> Friday | -| hours | Hour | Number | value.datePart("hours") -> 5 | -| hour | Hour | Number | value.datePart("hour") -> 5 | -| h | Hour | Number | value.datePart("h") -> 5 | -| minutes | Minute | Number | value.datePart("minutes") -> 30 | -| minute | Minute | Number | value.datePart("minute") -> 30 | -| min | Minute | Number | value.datePart("min") -> 30 | -| seconds | 
Seconds | Number | value.datePart("seconds") -> 04 | -| sec | Seconds | Number | value.datePart("sec") -> 04 | -| s | Seconds | Number | value.datePart("s") -> 04 | -| milliseconds | Millseconds | Number | value.datePart("milliseconds") -> 789 | -| ms | Millseconds | Number | value.datePart("ms") -> 789 | -| S | Millseconds | Number | value.datePart("S") -> 789 | -| n | Nanoseconds | Number | value.datePart("n") -> 789000 | -| nano | Nanoseconds | Number | value.datePart("n") -> 789000 | -| nanos | Nanoseconds | Number | value.datePart("n") -> 789000 | -| time | Date expressed as milliseconds since the Unix Epoch | Number | value.datePart("time") -> 1394775004000 | +| years | Year | Number | value.datePart("years") → 2014 | +| year | Year | Number | value.datePart("year") → 2014 | +| months | Month | Number | value.datePart("months") → 2 | +| month | Month | Number | value.datePart("month") → 2 | +| weeks | Week of the month | Number | value.datePart("weeks") → 3 | +| week | Week of the month | Number | value.datePart("week") → 3 | +| w | Week of the month | Number | value.datePart("w") → 3 | +| weekday | Day of the week | String | value.datePart("weekday") → Friday | +| hours | Hour | Number | value.datePart("hours") → 5 | +| hour | Hour | Number | value.datePart("hour") → 5 | +| h | Hour | Number | value.datePart("h") → 5 | +| minutes | Minute | Number | value.datePart("minutes") → 30 | +| minute | Minute | Number | value.datePart("minute") → 30 | +| min | Minute | Number | value.datePart("min") → 30 | +| seconds | Seconds | Number | value.datePart("seconds") → 04 | +| sec | Seconds | Number | value.datePart("sec") → 04 | +| s | Seconds | Number | value.datePart("s") → 04 | +| milliseconds | Milliseconds | Number | value.datePart("milliseconds") → 789 | +| ms | Milliseconds | Number | value.datePart("ms") → 789 | +| S | Milliseconds | Number | value.datePart("S") → 789 | +| n | Nanoseconds | Number | value.datePart("n") → 789000 | +| nano | Nanoseconds | Number | 
value.datePart("nano") → 789000 | +| nanos | Nanoseconds | Number | value.datePart("nanos") → 789000 | +| time | Milliseconds between input and the [Unix Epoch](https://en.wikipedia.org/wiki/Unix_time) | Number | value.datePart("time") → 1394775004000 | ## Math functions @@ -507,7 +505,7 @@ Some of these math functions don't recognize integers when supplied as the first ## Other functions ###### type(o) -Returns a string with the data type of o, such as undefined, string, number, boolean, etc. For example, a Transform operation using `value.type()` will convert all cells in a column to strings of their data types. +Returns a string with the data type of o, such as undefined, string, number, boolean, etc. For example, a [Transform](cellediting#transform) operation using `value.type()` will convert all cells in a column to strings of their data types. ###### facetCount(choiceValue, s facetExpression, s columnName) Returns the facet count corresponding to the given choice value, by looking for the facetExpression in the choiceValue in columnName. For example, to create facet counts for the following table, we could generate a new column based on “Gift” and enter in `value.facetCount("value", "Gift")`. This would add the column we've named “Count”: @@ -522,7 +520,7 @@ Returns the facet count corresponding to the given choice value, by looking for The facet expression, wrapped in quotes, can be useful to manipulate the inputted values before counting. For example, you could do a textual cleanup using fingerprint(): `(value.fingerprint()).facetCount(value.fingerprint(),"Gift")`. ###### hasField(o, s name) -Returns a boolean indicating whether o has a member field called [name](expressions#variables). For example, `cell.recon.hasField("match")` will return false if a reconciliation match hasn’t been selected yet, or true if it has. You cannot chain your desired fields: for example, `cell.hasField(“recon.match”)` will return false even if the above expression returns true). 
+Returns a boolean indicating whether o has a member field called [name](expressions#variables). For example, `cell.recon.hasField("match")` will return false if a reconciliation match hasn’t been selected yet, or true if it has. You cannot chain your desired fields: for example, `cell.hasField("recon.match")` will return false even if the above expression returns true. ###### coalesce(o1, o2, o3, ...) Returns the first non-null from a series of objects. For example, `coalesce(value, "")` would return an empty string “” if `value` was null, but otherwise return `value`. From 07d3153839a09d9871810c030a58dc6b199ba2b6 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Tue, 12 Jan 2021 15:19:24 -0500 Subject: [PATCH 13/21] Exporting --- docs/docs/manual/exporting.md | 66 +++++++++++++++++++------------- docs/static/img/export-menu.png | Bin 0 -> 12993 bytes 2 files changed, 39 insertions(+), 27 deletions(-) create mode 100644 docs/static/img/export-menu.png diff --git a/docs/docs/manual/exporting.md b/docs/docs/manual/exporting.md index bf0f12237..b6ffe8379 100644 --- a/docs/docs/manual/exporting.md +++ b/docs/docs/manual/exporting.md @@ -6,24 +6,26 @@ sidebar_label: Exporting ## Overview -Once your data is cleaned, you will need to get it out of OpenRefine and into the system of your choice. OpenRefine outputs a number of file formats, can upload your data directly into Google Sheets, and can create or update statements on Wikidata. +Once your dataset is ready, you will need to get it out of OpenRefine and into the system of your choice. OpenRefine outputs a number of file formats, can upload your data directly into Google Sheets, and can create or update statements on Wikidata. You can also [export your full project data](#export-a-project) so that it can be opened by someone else using OpenRefine (or yourself, on another computer). ## Export data -Many of the following options only export data in the current view - that is, with current filters and facets applied. 
Some will give you the choice to export your entire dataset or just your current view. +![A screenshot of the Export dropdown.](/img/export-menu.png) -To export from a project, click the Export dropdown button at the top right corner and pick the format you want. You options are: +Many of the options only export data in the current view - that is, with current filters and facets applied. Some will give you the choice to export your entire dataset or just the currently-viewed rows. + +To export data from a project, click the Export dropdown button in the top right corner and pick the format you want. Your options are: * Tab-separated value (TSV) or Comma-separated value (CSV) * HTML-formatted table -* Excel (XLS or XLSX) -* ODF spreadsheet +* Excel spreadsheet (XLS or XLSX) +* Open Document Format (ODF) spreadsheet (ODS) * Upload to Google Sheets (requires [Google account authorization](starting#google-sheet-from-drive)) * [Custom tabular exporter](#custom-tabular-exporter) * [SQL statement exporter](#sql-statement-exporter) -* [Templating exporter](#templating-exporter) +* [Templating exporter](#templating-exporter), which generates JSON by default You can also export reconciled data to Wikidata, or export your Wikidata schema for future use with other OpenRefine projects: @@ -35,7 +37,7 @@ You can also export reconciled data to Wikidata, or export your Wikidata schema ![A screenshot of the custom tabular content tab.](/img/custom-tabular-exporter.png) -With the custom tabular exporter, you can choose which of your data to export, the separator you wish to use, and whether you'd like to download it to your computer or upload it into a Google Sheet. +With the custom tabular exporter, you can choose which of your data to export, the separator you wish to use, and whether you'd like to download the result to your computer or upload it into a Google Sheet. On the Content tab, you can drag and drop the columns appearing in the column list to reorder the output. 
The options for reconciled and date data are applied to each column individually. @@ -44,23 +46,23 @@ This exporter is especially useful with reconciled data, as you can choose wheth “Output nothing for unmatched cells” will export empty cells for both newly-created matches and cells with no chosen matches. “Link to matched entity's page” will produce hyperlinked text in an HTML table output, but have no effect in other formats. At this time, the date-formatting options in this window do not work. You can [keep track of this issue on Github](https://github.com/OpenRefine/OpenRefine/issues/3368). -In the future, you will also be able to choose how to [output date-formatted cells](exploring#dates). You can create a custom date output by using [formatting according to the SimpleDateFormat parsing key found here](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-). +In the future, you will be able to choose how to [output date-formatted cells](exploring#dates). You can create a custom date output by using [formatting according to the SimpleDateFormat parsing key found here](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-). ![A screenshot of the custom tabular file download tab.](/img/custom-tabular-exporter2.png) On the Download tab, you can generate a preview of how the first ten rows of your dataset will output. If you do not choose one of the file formats on the right, the Download button will generate a text file. On the Upload tab, you can create a new Google Sheet. -With the Option Code tab, you can copy JSON of your current settings to reuse on another project, or you can paste in existing JSON settings to apply to the current project. +With the Option Code tab, you can copy JSON of your current custom settings to reuse on another export, or you can paste in existing JSON settings to apply to the current project. 
### SQL exporter -The SQL exporter creates a SQL statement containing the data you’ve exported, which you can use to overwrite or add to an existing database. Choosing ExportSQL exporter will bring up a window with two tabs: one to define what data to output, and another to modify other aspects of the SQL statement with options to preview and download the statement. +The SQL exporter creates a SQL statement containing the data you’ve exported, which you can use to overwrite or add to an existing database. Choosing ExportSQL exporter will bring up a window with two tabs: one to define what data to output, and another to modify other aspects of the SQL statement, with options to preview and download the statement. ![A screenshot of the SQL statement content window.](/img/sql-exporter.png) The Content tab allows you to craft your dataset into an SQL table. From here, you can choose which columns to export, the data type to export for each (or choose "VARCHAR"), and the maximum character length for each field (if applicable based on the data type). You can set a default value for empty cells after unchecking “Allow null” in one or more columns. -With this output tool, you can choose whether to output only currently visible rows, or all the rows in your dataset, as well as whether to include empty rows. Trimming column names will remove their whitespace characters. +With this output tool, you can choose whether to output only currently visible rows, or all the rows in your dataset, as well as whether to include empty rows. The option to “Trim column names” will remove their whitespace characters. ![A screenshot of the SQL statement download window.](/img/sql-exporter2.png) @@ -68,9 +70,9 @@ The Download tab allows you to finalize your comp Include schema means that you will start your statement with the creation of a table. Without that, you will only have an INSERT statement. -Include content means the INSERT statement with data from your project. 
Without that, you will only create empty columns. +Include content means including the INSERT statement with data from your project. Without that, you will only create empty columns. -You can include DROP and IF EXISTS if you require them, and set a name for the table which the statement will refer to. +You can include DROP and IF EXISTS if you require them, and set a name for the table to which the statement will refer. You can then preview your statement, which will open up a new browser tab/window showing a statement with the first ten rows of your data (if included), or you can save a `.sql` file to your computer. @@ -78,30 +80,36 @@ You can then preview your statement, which will open up a new browser tab/window If you pick Templating… from the Export dropdown menu, you can “roll your own” exporter. This is useful for formats that we don't support natively yet, or won't support. The Templating exporter generates JSON by default. -The window that appears allows you to set your own separators, prefix, and suffix to create a complete dataset in the language of your choice. In the Row Template section, you can choose which columns to generate from each row by calling them with variables. +![A screenshot of the Templating exporter generating JSON by default.](/img/templating-exporter.png) + +The Templating Export window allows you to set your own separators, prefix, and suffix to create a complete dataset in the language of your choice. In the Row template section, you can choose which columns to generate from each row by calling them with [variables](expressions#variables). This can be used to: -* output reconciliation data (`cells["column name"].recon.match.name`, `.recon.match.id`, and `.recon.best.name`, for example) instead of cell values -* create multiple columns of output from different member fields of a single project column -* employ GREL expressions to modify cell data for output (for example, `cells["column name"].value.toUppercase()`). 
+* output [reconciliation data](expressions#reconciliation), such as `cells["ColumnName"].recon.match.name` +* create multiple columns of output from different [member fields](expressions#variables) of a single project column +* employ [expressions](expressions) to modify data for output: for example, `cells["ColumnName"].value.toUppercase()`. -Anything that appears inside doubled curly braces ({{}}) is treated as a GREL expression; anything outside is generated as straight text. You can use Jython or Clojure by declaring it at the start: for example, `{{jython:return cells["Author"].value}}` will run a Jython expression. +Anything that appears inside doubled curly braces ({{ }}) is treated as a GREL expression; anything outside is generated as straight text. You can use Jython or Clojure by declaring it at the start: +``` +{{jython:return cells["ColumnName"].value}} +``` :::caution Note that some syntax is different in this tool than elsewhere in OpenRefine: a forward slash must be escaped with a backslash, while other characters do not need escaping. You cannot, at this time, include a closing curly brace (}) anywhere in your expression, or it will cause it to malfunction. ::: -You can include [regular expressions](expressions#regular-expressions) as usual (inside forward slashes, with any GREL function that accepts them). For example, you could output a version of your cells with punctuation removed, using an expression such as `{{jsonize(cells["Column Name"].value.replaceChars("/[.!?$&,/]/",""))}}`. +You can include [regular expressions](expressions#regular-expressions) as usual (inside forward slashes, with any GREL function that accepts them). 
For example, you could output a version of your cells with punctuation removed, using an expression such as +``` +{{jsonize(cells["ColumnName"].value.replaceChars("/[.!?$&,/]/",""))}} +``` -You could also simply output a plain-text document inserting data from your project into sentences (for example, "In `{{cells["Year"].value}}` we received `{{cells["RequestCount"].value}}` requests."). +You could also simply output a plain-text document inserting data from your project into sentences: for example, "In `{{cells["Year"].value}}` we received `{{cells["RequestCount"].value}}` requests." -You can use the shorthand `${Column Name}` (no need for quotes) to insert column values directly. You cannot use this inside an expression, because of the closing curly brace. +You can use the shorthand `${ColumnName}` (no need for quotes) to insert column values directly. You cannot use this inside an expression, because of the closing curly brace. If your projects is in records mode, the Row separator field will insert a separator between records, rather than individual rows. Rows inside a single record will be directly appended to one another as per the content in the Row Template field. -![A screenshot of the Templating exporter generating JSON by default.](/img/templating-exporter.png) - -Once you have created your template, you may wish to save the text you produced in each field, in order to reuse it in the future. Once you click Export OpenRefine will output a simple text file, and your template will be discarded. +Once you have created your template, you may wish to save the text you produced in each field, in order to reuse it in the future. Once you click Export, OpenRefine will output a simple `.txt` file, and your template will be discarded. We have recipes on using the Templating exporter to [produce several different formats](https://github.com/OpenRefine/OpenRefine/wiki/Recipes#12-templating-exporter). 
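As a rough illustration of how the prefix, row template, row separator, and suffix fields described in this hunk combine into one output document, here is a minimal Python sketch. It is not OpenRefine code: `render_template` and `row_as_json` are hypothetical helpers, the sample rows are invented, and the JSON-style prefix and suffix strings are assumptions chosen to resemble a JSON export. Only the column names `Year` and `RequestCount` come from the text's own example.

```python
import json


def render_template(rows, prefix, row_template, separator, suffix):
    """Render each row with row_template, join with separator, wrap in prefix/suffix."""
    body = separator.join(row_template(row) for row in rows)
    return prefix + body + suffix


def row_as_json(row):
    # Loosely plays the role of a row template built from jsonize(...) calls:
    # each row becomes one JSON object, indented to sit inside the array.
    return "    " + json.dumps(row)


# Invented sample data, for illustration only.
rows = [
    {"Year": 2019, "RequestCount": 41},
    {"Year": 2020, "RequestCount": 57},
]

output = render_template(
    rows,
    prefix='{\n  "rows" : [\n',   # assumed JSON-style prefix
    row_template=row_as_json,
    separator=",\n",              # placed between rows, not after the last one
    suffix="\n  ]\n}",            # assumed JSON-style suffix
)
print(output)
```

Because the separator is only inserted between rows, the result stays valid JSON with no trailing comma; the same structure would serve for other text formats by swapping the four pieces.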
@@ -109,13 +117,17 @@ We have recipes on using the Templating exporter to [produce several different f You can share a project in progress with another computer, a colleague, or with someone who wants to check your history. This can be useful for showing that your data cleanup didn’t distort or manipulate the information in any way. Once you have exported a project, another OpenRefine installation can [import it as a new project](starting#import-a-project). +You can either save it locally or upload it to Google Drive (which requires you to authorize a Google account). + :::caution -OpenRefine project archives contain confidential data from previous steps which is still accessible to anyone who has the file. If you are hoping to keep your original dataset hidden for privacy reasons, such as using OpenRefine to anonymize information, do not share your project archive. +OpenRefine project archives contain confidential data from previous steps, which will still be accessible to anyone who has the archive. If you are hoping to keep your original dataset hidden for privacy reasons, such as using OpenRefine to anonymize information, do not share your project archive. ::: -From the Export dropdown, select OpenRefine project archive to file. OpenRefine exports your full project with all of its history. It does not export any current views or applied facets. Any reconciliation information will be preserved, but the importing installation will need to add the same reconciliation services to keep working with that data. +To save your project archive locally: from the Export dropdown, select OpenRefine project archive to file. OpenRefine exports your full project with all of its history. It does not export any current views or applied facets. Existing reconciliation information will be preserved, but the importing computer will need to add the same reconciliation services to keep working with that data. -OpenRefine exports files in `.tar.gz` format. 
You can rename the file when you save it; otherwise it will bear the project name. You can either save it locally or upload it to Google Drive (which requires you to authorize a Google account), using the OpenRefine project archive to Google Drive... option. OpenRefine will not share the link with you, only confirm that the file was uploaded. +OpenRefine exports files in `.tar.gz` format. You can rename the file when you save it; otherwise it will bear the project name. + +To save your project archive to Google Drive: from the Export dropdown, select OpenRefine project archive to Google Drive.... OpenRefine will not share the link with you, only confirm that the file was uploaded. ## Export operations diff --git a/docs/static/img/export-menu.png b/docs/static/img/export-menu.png new file mode 100644 index 0000000000000000000000000000000000000000..ad02db554911ebb4560bc04f7a6f0c64bf2205e5 GIT binary patch literal 12993 zcmb_@dpy(q-~W_yETJ4C%&~(*D5tOtQ9>v=ucG8I$B@HDA>^=$NJL$7mPO8F9b}VJ zIkYfT8^atj=CEP+XLWr~kH`JH?(g^h-G7YPK8M%){dzv1uhU1;1uL^Xya#zfAkdz( z<|eiv5Qhiw`oz5(_$2G}K4ah)N3g9K6jaqKIRpH`>1kwX1Oio~_%^TZ0{-R+Fn0_F zf%d*gmb?PNjOWclS7}STtWQe=NRZMPvmDJ-Gkof=nz?FeNc1Dla%pI;+>I4Nmc?Nkm6}P2z2Iu`tg!yd$j>eUt645WUb>MtNyJ^9XFu8X1x}@ zYBwq3W!WIVodMsIY%7+eBUE7(n%1#8qF3s)pZkz2`|*XkpBiyCXGHs#v8 zp!Y0)mO!@3SR0fACMUKlcIW;fF7U7%-+_~)cooRfm0c!|cD`>|#IvlJzAMt$F@(%i{ zFkWpF1=)5;TVoZmn5V)Rr-C<2wxgdj#8cQ;aINgUgqzF@Q%!4;N?PunM-|q?6#sss z?F4Bg9w&`l^jKj++L_yJ(cckueK*ORnX}X%TcW$~Y__qC%jdO7_hxlQ>7_3g+n3FU za(95=E23fRi&&PS@FHDJUsc1c;q}@(($*6kYdPDN$;tYMOEp=#kML2c6{OqWVIs7 zZktjdO*X89Y;W!iA0Oy&0nRA-Ka*Fo?doM~pPMO&V@-H7r5<&8e{M zY8LKdfoA(~70x?)!&BdHk!j#)RzAL;TU_hT=DTR-e8bFbcw?XxmM~#mhhRQ#;aF@_ z#4N6@<(V(6V#?}qi(>_&9mt8*wZ+cvh>zi{@xvMf|F2m~Q<9XVUn@L+vgh5Wz_q%j z37@cRc(_4<@siWDk5-)Fgcq@?2Ts9!kVEM|NWB;>?fF_N7{Nq#T*a=`iK9~GF8TAu z@#mgKe#dT1<=$##T#TkvM{c#0)%&~67aa5}E^&KESX^C`TE^gRuZ_EJJt=XU$A+)D zvA#YjVO)+SNl_PHpK1+-5BPXF(+v#Rmy5 zZ?E>J3{2(dI5PCnu>PtqE3~-4Cz|$fyP~1skRM)&dAWGYs2yI^W_HHvbXl8fxzFvQ 
z=pc0OD@^;AK$D1FT0;Us$%QXH&^!9*uf^EDFmnDrvc7He4;W>`kU6OAf?JyTO0cxG zB=gO-r8B{ILr2vF{#e4D*M0^;HXFgxU?uf^`*b;q7|o^SSM^J9`G`??pTAYP3PNRt z3MPgqyYeAY`-DAPq~30i6a>zUJpU&bt^Y_iSh8ic2auGd%?hSrydx8{1>q}Jtaf4R zZiEzKN@b`D6tg+_!?7b@{w%VI3E~3NMjWdU_EiTD2QS)yu!_CArpH*tg=} zgR`WpwOhQW8W3dhutpI(DacgRW>O%Lo)|q|wsjIhLSU7s|Jcnw2y3%_eQg_yPhFj` zg9~W0-ECm~^|gplL$tvyNhJIqgnrd$cuIYIj54{H2! zlpo4T3p5A?83wsEnHn_Q{+nukP>kFB;_NTT`T0UCu;T+`SQRZf575noXhuo&V>dWo z96v~u3#!Er-F=fU&Q~ms|C0#!@NsVOrz-nuuI|$~?JVQ*UdAJ){%a+ALd+#}<_?-; z1R9qY^tOl#x;rL$Cs-oQIBxUpkl(hgSJFa+cjFPm#@R6MR)yv^zi9zR#n|ocg_)8B zmpa^-qAjTk%eXS4zdtP1Z@TqdQ{Tx2n{W*XA-7dTVh3;`C-5op#j1PGZ-d9ef{Z2M z4|p}Szcy#_s}P@Rho?0JZl#IKg`MYvENaG}_n19Fo3=6iVygRw^pn#(5ruCqYr1r* z!p`=My|vo1+I$ZKvo&+w~VNsCpR zL37myMsY+jmG}^;rJ9Wl9;Cb~Y4c!$ar~rq2;ti!MAHZ-VBb+NEG$QSX8m(u>)8?* za>6>zM*A&2z>YawoBN${YIMxRyZ8mpY3z#J5`R!g0_Y7lNtA0KQWTRvgg!j6O$FX+M?u%i%qS#w%*)t$`{+iCY84A(3&ZT@>) z-g@XsAMix%CWN)N_0}%hIsDiGIFIgfmoj$6ec*6uQEr=!oNnX%ie+uv+f9Bm(|`>7 zsovKjcjuvPpy2}D2_4qa1St}pO75J8+2AkbI{Q};WKXHIEron{Iq_C3us{cAQk)i2 zb$?01ByJN3zd0xva-%5v=QktPB5O_UY}ki8=qKlE(eHDWvO*4jDf15y-s9o_lB3K$ zB-VaPb#hNV-^LDJigUNxO&%PyH$_EvwMRtQ+|r_|3|8Lum>6*iIiH5n7qwI^F{apM z&OSn6wsh+gxHkx}=+}wsKtRVW-b`zK$eRzw_<^?gq$l!{Nb&26-Q!o}779IIVl={z z_`h_;i(b%~xHJ}Mg_uhxbY_e$29=4oE{ICo<(|rQ*Gr&YolstV*L3r)ei8LKe#oYK zmy}>lx1pK>Q7#4VTz1nOPix5+dKNLR4e!&OncgTs=8M+u(*AsUb@wv92h$;5`qA~H z1l)6Z*Zy8@n=r!?5n3CP=dEH#kbGa@LC=u$*Zp1}s!PR6m#^L_r;1JLu0MiWhCpzu z`@mh@%eFhf6g12`{nP%BNjpit#ncXo;H{+qAqS$!m`Ra@Z^t}AJKA)?c6>PAniSRb*hnB-5Vx3ZzuCI^Wd$e8tZmWCsi}s|dYl(-m(W9;5I1D z%(X?#2pQXE33cm2N;kDNquPH(>_*I9Ky7kO<2Gk2FsmqoRR>m&_Cz;W7O^%K7K;eX zjcc`{FWG#|co0}#@V7&W!F{^m=q)-SIMI9}oN{Q*Y5Ulyo=3q12GvC$LfpD3UNS=Z zb19fs+I?Z#Wf!uu=S?b87HiU`tx=r+^7LF-a$A@mtC z=ag-Sl`z+_?HSgFx(!|&7S3gui7;;&571X-S+ZK&`nxfvtaXxSet5^$<<+yJg($;= zwVkULYOnX#ei$ulqa-TEE3*{1uZ@Be(LhjHPUgAsdut02tHU2S|J=)!0c$nIKMhbS zhzb&1`fMiMRFr4CaDOLw=bH~;==M4Z)q`WpJv)EdVg*rWi?)jX{4x~9k&>W1cA}~T zCtCt~X_JSR6YvEC*K?8N9u8nD^lAJ(;Mm$-P%TNed}|#e4>GK?McD66@JB>lB2LJ9 
zV9!U&t}6G2k6Cv0!Za$1CVj*)qn(uKm35HUzMAhkH}6F~mE5NR-5m+YeqarHtnK=j z%3b6*Vn}EG0fGhg)G?0E2h1|9;R*U>r0&*j$88sQO*8?J)eY;72M9aTl(*o;(M*Vr zr*_r$MR0$G~FJx@k@e4?#Iu*mJxH#Qp=W=B9G zY%>zjrnTFJ_1mRUWAQ;N(7lXE_a-7TT3KWWaucOcoAUkamr#6o^&&353T6SBgQZTAHMY)kF*R zZ1h~6!sskx%I9pYSLadU^4K%@g0n$e-eM2}ryo+sahtsWdU||za`LnqpUJ*1n4_}n zSuld4|5>asF)FuLlee52wdcFo@W@_0RA6z-U}lBsPXtL)(5X?lu5K3DQc>|t1R+)Q zr%TN7Q{0{@pqB;iahn7za*Q=L@%o|fDY`>{0XuTx^bKS ztktH`)ulexCaVS>R5@oof65aeO`}dNnni~}cs(8`JK$+f?Z+^$W9n{i`FJrCqSq@T z4^hnHA@<(oWfs2@2)TT4^6}R$!PVBq)o8m7@3_ic`CS>m(h@UekMwavn+^^7s9q~t zVFjr+bP#NT?{>Jca57{D;Cw*f^SC+BqNYz8bc{O(6)x>mp1|$+PCEBvE>H-7L~&&( z?#!sn!KB{VX9>oh{@U%sNd>An8VUzg~;cm4$idHHi@%*pTAsv z2^T^`M3w0Gmr9zm!K-vv?=FH+eQKQvG?bur5I5wei@a#j>RP@>HMWO$BlYd|Y^fpW zcqA$QokYWHknGvYdeppxFyZNNn2Y|sJ~>R?%8!T>vyBeE$AS=RLw^LHQ>jKPjH#lh zlz+x_ubYCiy<;tme&vt2Cu-P@I_yfRPz^DYc>}+cDw#;#*D2K>W^x0*?cP}MdN;GR z+9eekoo7cE7#SdW@f3H!V5syh1Ye!}Z$dJ|KM<;}ZQf298Toc?AGpLNVCHigdMx!( z^Ah>q8We5lB*J)PEFWnO3x5%pxz9f9TlzVdmNO`&=?GpD#=C631tT)1SSKGh=hG@h z+f8v$N4kWCzypr_@{&iB+N5;dyWGz`^@H7x^|6%P;8E}cgQxzsQq36IQcszYUV`C$ z-~8*;`u_FK`sTI`Cn2~~&HW`x5yKvZi54-mc7+Fs|AK)BPCiE0UYjJZCOSJM9Wj=O z=4!p&5g`;E@^oKLHzJn7=?YYhRm-I5`giNqATCt(S0f@+6p`eU7e-J2pFI)A)rMboH|EB){0stVx@1XYpJY(nfI)7&K z>Yp_R_Mg^=WSjA?71IseH&b4H&pEhtWtyrgE-53Ldty&?zrKX4Q(rQFngn!ZgG zQ!jESu4=S|XnOjS-v##fL5xG`8E7oWk+%6t+7%1Ju3Y9FyI zvqRS;pKH)rB|Ih-DByzkf~hBGh!4&!*H|h@HXYa#Qq2QUX!E1}j5mRC5Uhnc+VP0n70eq$!^d_nY&4`dQQC zp)tgGFK5mnPgFapIwarW4G(ykU)%{`*wk}Ryzwo5M;8h)V;JTY1MK;y94^PN|EmT&Pg ztR%-T14CPBx+VE4$ed2No7L+t9d+q&9uHt$1bE_I`N82plk%mHvdB{|C)<-k9i5(k z5=nO$DZ6iM68xGD%qQ`f)j{5sareBxuzkXJ!^I?~b_khW} zf}}KSv@=RA-15iLWUCP2S(HH+9W)0-;VFGh4>ewuluf5d$M188aD|*lah#fiAGJ&m z8>AW*>@NzPC}wa{M5S^?7WJYR_JE#>0Jdf1gO3gCKH~>KG3eNI-{Z$HZBQ*T<;MOk z{oYxd*pD#p+_Q6hH>xbU-6QPZ2^@iCTvXw!a98qRmG6z&32dkCsHO+Oj7Ra$_=kOJ)I;juk; zN0{r+WSaVQGFrb#6AGh@(riZ)kx?}_P-q315uQ%)ka@tI{mK)^Uvd{kA*(!bIa%QqjL+h`iQXFQnqbDq5(UavLg;;T0>2YFcq_+8 
z=t+TOOCD+hyBJZ05qJXzE^(1ZoYwtnI5z=zAYd=M4MdwE-aqVbv&oj6e{Kr>|Df%! zVEV{7PBPkNleTMV=Ry~OE?Pxct7aX-e$MWoB(}wDD-%tuZXN`j%WfEPP2f&dwX1iX zIFeS~r(Bc#kPmuq9*`G1C31Jak&}pSARehkGbYPgi1xg?iRgCd8q#-0WyOqyAhhwp zx;LWi94O)e^!0j&#@WUr6k>h_Tn}rd(v#UEPI%yZjLxOA7r4_=Fc-DiYp`4Wm~;9^ z`5td@m%0T+`xMMq=iEmEiL66^!danEDtghC(utuoI3t}s- zUjVvn=j+f9^-stO5i$=Xm^TV?2_GzE=sUmp+#LtuR`djL;+#Zx?bJcQV?R0weB8+UQ}gSy?`t|LU_F z%>kAuw!;62A3N?da7`>;3!-Luzfk`54xlcvhf3W*8_r05r+U6(PBtYXL-HUfTX<() zS7^ORgQKI;8+M{Q$UCf_)M<9y{nihM5gKBaEPL09HtlYLZ)u~F_drJmeRXr9g>DnW zu0&#d+PHqm9kxmWfF9o7Vu?)tlcbx-n$ zgRB5E{{un$6Jn&lmFIw&>(WA?T8{raA?4!PXwJh|X{cnh!Vuh(R56`cjj7*sRGtP9ZMMG$E`>rxD)cg2h1~l{Gxa)Q^)Jv9ulG)XPIxWxZja3P9L8LF zv0&NR5Z`pyM~tvFY2O;X?&(t!8#LY9 zLw3cHlF%z&y`x3J6+PGnSd^;8#8P1TgGwJl-K>wGlmuDzNx}lbKN~Yue)HnL8gxCE`L#F;VZ4_hR+b; zh#y})-^u+x2^MS|eaQ#=*h?4I?O#!kAO>@XXZGs9O7YGsnkv|1k2sACR^(;TmEFty;JB$k_RSeBFcx+zhV1yIeI;so<#s zFhyRz{lVi;8yQfiv_J}}NOVOd&l$ems3IFV9+Ve($l~(QLDz|ERhJN5q4dCj&%ZeW z)}Gl6wC}pz4xwg#b#>jPsjAen;eZH6X@0&wA)o63Q6ZiKn{9-oh81Ha9a^<|Ok5)$ z9gFl!udLZvJ7-zYARSqW$PtS?n?@N4K~}D}Y7zo28PAMV!RdARDkN%Cyk)1?xh#}cW*n=Yw%Nt$wi#6r;9}zWJYry--QqLJ);fg^Tike zqP$nArsJ%a(3cmzXs(s7X%6|e{qob7+TP`wyj0O8Y2Vw~x9g1z!2e6oUT*DUuc;NL zE}qDH%8h~7o3@*Z9q^RLV9(i715r>c8z4Ps-0-z&@`bzKVXi4#*X?zsgH^#6EfPvu z3&-=N&qI_#bBjFj-jz377&IYouQyPg{jUM>@0`etpTpJ~2zp;ilPFy68ViB6?URz>gKBjfX_6Db z&F(-m$B`WXMT2gH%VSeM4>hJ_#5w=C>GZ{GJk&hRZ#viE?h&a0ICE(;$jW99LoD2& zn~Z5*7encV0>zVC1kV-r_U0Y+c%7B!Z0BD;wo>}uKQevys3Ae#l*QeC)|uzzbar~k zQr!crTje1947^f*hXg>GsS{wcySGN$44kHe>t3J^#kY35T$&hqbnv8`4Ssh0iZkI@ z`^g=?0sw{v;`I^4g720E8gAJaB+X{uCqFZhJJk8SB1duSAnh6b;URPM z55LF%0?^0ca&P4Fu+*hg~*Gy;DhI zWDs_|F$53@P^{xq5HeB5@g@i`w?2N(bI6dt^$3V~9QalX*yDzO`28k*&z{+dd1U3d37P7^56ow;M9A2jS&{(gBxn}he4jR`OES2mMy!2a%w5{ zAReY5H7&Ue_0h zKH#saWIps(&T{H9CTLnct<@@@xP2hrrLKuyAEBbrf9)VlwAWNPMQ|-C_&i?yhz(a} z(4qb;X49;D+`3K+n~RDb8`p=Rd;RkrzC3fV!;*JKobnrZ^)Tlnv8eTJC>VD7dsOnq zE|u>N_?&75oZu_dGOx~)bc>%VEoypf@rnPQBTju94IGVO-V~dw=msWyzM?c>!WVWaEYD6lk{m)KueP{&@5V4c 
z;0C99ryliwkNB$HRT-fASjKur&&xx9Yo@)Bnf=@TjxAL#fINEOuElHT;$|I3YYqCI zuNh=7!-uqAd$x+xQ(hkhW|g&1df2l{ld+9FATeSnaOR42rR(D|9pXL$!%%ODXvoF+ zotFseAML{Cxt^Ln^9<*bS39XlNtMinXMi|19xoD1wlxJ;fl6%D-%! zkI6-L%wN_vw*22s5;NbV&M=D^*+G};LoRpqDH&2^%oaB1*rI|)C*##9lbg2ry-mFh zb=1(SmGV&=fU9K4?z&r16!E$Es6qHnh-$6p07k|GRDpLPIm*!>`YIsf#u0N{Z# zA=+$TatAkSPUTuGOK;E1U2Jm2=WRRp{E^A=2AEnM$MJ=qUS&dtgD^vv^%(BAq!YFw z&=gx~Phxj5haVs#k22!s;`BO)=w~-4{C9F#GnI^}ENU4u*QIzAqz&{#y1k(GHL1)} zc=`*G#dW=Hh&?{kn;)d3#5V5UG)-E~B$`%lh9`w?nnNM=+@m`j zxlno@njiOohS-z=Dif`pe-D}efpq?*_WkR2{;OyLB~n+p)GXhuf7HY2LTsX;8wT_9 z!v-ZrtvlgYTHu7Y>TuGlxo_4eKQOF&?$+mi+ns(JrNohF3JCmu7dy#TBHOf8@Kh%r zcHpw^JMV6EseZn>ZMo~@3zN$q)zdQzB}J#^^a_*=F~)u1YPEIw^fKXZ;9uD}`J?z1 zuh0FlBeLO)zz~Qw)zVNyd_up5uedocq^{lFWAv_`=_iYx4LuI2y}%X@L3?@p6!(8_ zoEKFzcy6h3vsvf~P_N)R-5X9R@U+hPhrpY8hj}aeIoEHKMH}02+Zf@ZAKq@uWV}~p$~E6arB5f*;B4m zb5|nA6apTt!0pH(*>CMIdD@JMizw_IvMDFP&6+^`vA#a~7-f)YnvQxJRFiNvyUHS#leaCAxbfHL% zuNB-#`szm&X=Wj|CV+K8k$D-u{Yc%q4c4ubE5+>S?;WhxQ<~BAYgsN>E(`9*k==Ic z12aNEkNH8)p#M!%l?I9AiYz=Eyu3~uuCT(7#?%y>=I6WP?<$@3xc|#NaJErDlfQo!}y{1cCraB6tqxG z@WFpTtgOzi4{Drq!Vmaf2vWjjk%}UfVI=YLL+> z7s5}6l_})c)Rhvisz@QaFN_xWVBkkZYk}wjBx0swoneKU2;hFP!epw7z z@|9{E6^fVjX|(f@oQDrLE8Y1Kc0Mh+;TY{au7{G2Sbp$0JL;+mFXH?H{q%M# z`{9SH-f?-@YSE;QYxr|=3)-4fBYQMlJSw?wq>Il~U8vz>nEv>(wW6#01CHng*6So#7%akocfj@)#>IZ}YrCF%Sd4DWO+n)~``tAl~q9MdQbIRV32rEz|f zX1_4B|NefW!Ktbhiy;}wqvZ4X>-w+iOs35w9{-}Y&?~}WyHCZa$R8;t@ikL zhaFiqx6v`gpYMItKw}G11|HAVAH6#pI!B!pNmA1}vqr#i2j$I4Fn6C^Xl~1$lwP$U z$Ve)Zee>7zV&`9Gk>#?)5K#+ta$#;DIRGRHK%J^8{eM?^{FQS4so3XJ(Ui#t`U9c^ znAOg-A<=53HhBcSX!h|^*Xa8LP@(<7dFi7ExxQtwvmwEn%6IB&d?#c^T?sE%XXevW z(7>YrF}XmvD4tD^H$UPQyr5TYt+nHh7Z~+Nh)?Sa`I4+dyjiXX0}xveOpsW&;+r2c z=CZFm9^=CG%4b0yd(RJQSX^B&QAwL2VzQQq5?Kc!w|7>&kp5=EW;XC##jRG~djzX? 
zS;;!JeT5TPZe52c&1vDfUji!1h%r>f+f0i-?K5tkAH4sCB+=1&B&6G4$_BA4k?yMs zFN<1U%~T>9)*@qo2N&2uRMuW8`Rvibg@&@1TX3%*7sV?)K>H`}A_sSw(Wf!WL`Y`m_N83N zjqZ#^QCp<)@!pHL`UlGNz$ACc>KGDSfqZ9#!C?jTwu{|>2gTTV@xOt#|GsPI7PVx$ zwo{h-k0)PcbAhM9-iQE$+W+}~GybhZ6sMUi0e{-WxvO8#i~#?30y=AIWl{yba`(Rg DpcDV> literal 0 HcmV?d00001 From b1870b84f29172765ab517c9c0bde9847f9ed177 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Wed, 13 Jan 2021 14:04:27 -0500 Subject: [PATCH 14/21] Adding titles to infoboxes --- docs/docs/manual/facets.md | 2 +- docs/docs/manual/reconciling.md | 2 +- docs/docs/manual/running.md | 4 ++-- docs/docs/manual/starting.md | 2 +- docs/docs/manual/transforming.md | 2 +- docs/docs/manual/wikidata.md | 2 +- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/docs/manual/facets.md b/docs/docs/manual/facets.md index 19e38913a..dfa603f4f 100644 --- a/docs/docs/manual/facets.md +++ b/docs/docs/manual/facets.md @@ -96,7 +96,7 @@ Whereas a text facet groups unique text values into groups, a numeric facet sort You will be offered the option to include blank, non-numeric, and error values in your numeric visualization; these will appear in the visual range as “0” values. -:::info +:::info Numbers as text You can create a text facet on numeric data, which will treat each entry as a string. This can be useful if you wish, for example, to manually include facets instead of selecting a range, or sort by count, or copy that count. ::: diff --git a/docs/docs/manual/reconciling.md b/docs/docs/manual/reconciling.md index 7b2544af4..1b14c1184 100644 --- a/docs/docs/manual/reconciling.md +++ b/docs/docs/manual/reconciling.md @@ -19,7 +19,7 @@ You may wish to reconcile in order to: Reconciliation is semi-automated: OpenRefine matches your cell values to the reconciliation information as best it can, but human judgment is required to review and approve the results. 
Reconciling happens by default through string searching, so typos, whitespace, and extraneous characters will have an effect on the results. You may wish to [clean and cluster](cellediting) your data before reconciliaton. -:::info +:::info Working iteratively We recommend planning your reconciliation operations as iterative: reconcile multiple times with different settings, and with different subgroups of your data. ::: diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md index 6d385c275..f0c1d3d89 100644 --- a/docs/docs/manual/running.md +++ b/docs/docs/manual/running.md @@ -324,7 +324,7 @@ From the home screen, look in the options to the left for Reorder rows permanently. You will see the numbering of the rows change under the All column. -:::info +:::info Reordering all rows Reordering rows permanently will affect all rows in the dataset, not just those currently viewed through [facets and filters](facets). ::: diff --git a/docs/docs/manual/wikidata.md b/docs/docs/manual/wikidata.md index 2061a65fc..5c414ee81 100644 --- a/docs/docs/manual/wikidata.md +++ b/docs/docs/manual/wikidata.md @@ -10,7 +10,7 @@ OpenRefine provides powerful ways to both pull data from Wikidata and add data t You do not need a Wikidata account to reconcile your local OpenRefine project to Wikidata. If you wish to [upload your cleaned dataset to Wikidata](#editing-wikidata-with-openrefine), you will need an [autoconfirmed](https://www.wikidata.org/wiki/Wikidata:Autoconfirmed_users) account, and you must [authorize OpenRefine with that account](#manage-wikidata-account). -:::info +:::info A better resource The best source for information about how OpenRefine works with Wikidata is [on Wikidata itself, under Tools](https://www.wikidata.org/wiki/Wikidata:Tools/OpenRefine). That page has tutorials, guidelines on editing, and spaces for discussion and help. 
The following text on this page reviews the basics and can help you get set up, but the Wikidata help page is more regularly updated when technology or policies change. Links to the Wikidata help page are included throughout this page. ::: From 690ae799b0e8f833d16abbcef554127298df45a4 Mon Sep 17 00:00:00 2001 From: allanaaa Date: Wed, 13 Jan 2021 14:46:25 -0500 Subject: [PATCH 15/21] Missed one --- docs/docs/manual/installing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/docs/manual/installing.md b/docs/docs/manual/installing.md index adda2313c..87d274470 100644 --- a/docs/docs/manual/installing.md +++ b/docs/docs/manual/installing.md @@ -519,7 +519,7 @@ Make sure it is not commented out (that is, that the line doesn't start with a Extensions have been created by our contributor community to add functionality or provide convenient shortcuts for common uses of OpenRefine. [We list extensions we know about on our downloads page](https://openrefine.org/download.html). -:::info +:::info Contributing extensions If you’d like to create or modify an extension, [see our developer documentation here](https://github.com/OpenRefine/OpenRefine/wiki/Documentation-For-Developers). If you’re having a problem, [use our downloads page](https://openrefine.org/download.html) to go to the extension’s page and report the issue there. ::: From ee5c3fa7c312d3152377f019400045903eda8f1f Mon Sep 17 00:00:00 2001 From: allanaaa Date: Thu, 14 Jan 2021 12:11:32 -0500 Subject: [PATCH 16/21] Update docs/docs/manual/running.md Co-authored-by: Antonin Delpeuch --- docs/docs/manual/running.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md index f0c1d3d89..f5782115d 100644 --- a/docs/docs/manual/running.md +++ b/docs/docs/manual/running.md @@ -456,7 +456,7 @@ Not all operations can be extracted. 
Edits to a single cell, for example, can’ ### Running as a server :::caution -Please note that if your machine has an external IP (is exposed to the Internet), you should not do this, or should protect it behind a proxy or firewall, such as nginx. Proceed at your own risk. +Please note that exposing an OpenRefine instance to the Internet is dangerous, as it gives anyone the ability to read and modify your projects and run arbitrary code on your computer. OpenRefine should at least be protected by an authenticating proxy. ::: By default (and for security reasons), OpenRefine only listens to TCP requests coming from localhost (127.0.0.1) on port 3333. If you want to share your OpenRefine instance with colleagues and respond to TCP requests to any IP address of the machine, start it from the command line like this: @@ -503,4 +503,4 @@ Some examples: * And the same in Ruby: [Refine-Ruby](https://github.com/maxogden/refine-ruby) * Another Python client library, by Paul Makepeace: [OpenRefine Python Client Library](https://github.com/PaulMakepeace/refine-client-py) -To look for other instances, search our Google Groups [for users](https://groups.google.com/g/openrefine) and [for developers](https://groups.google.com/g/openrefine-dev), where [these projects were originally posted](https://groups.google.com/g/openrefine/c/GfS1bfCBJow/m/qWYOZo3PKe4J). \ No newline at end of file +To look for other instances, search our Google Groups [for users](https://groups.google.com/g/openrefine) and [for developers](https://groups.google.com/g/openrefine-dev), where [these projects were originally posted](https://groups.google.com/g/openrefine/c/GfS1bfCBJow/m/qWYOZo3PKe4J). 
From a54e86becc6e0cf6d113d6e44d5bed274e69fa0c Mon Sep 17 00:00:00 2001
From: allanaaa
Date: Thu, 14 Jan 2021 12:13:46 -0500
Subject: [PATCH 17/21] Update docs/docs/manual/exploring.md

Co-authored-by: Antonin Delpeuch
---
 docs/docs/manual/exploring.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/docs/manual/exploring.md b/docs/docs/manual/exploring.md
index b91d91217..6ec611ee1 100644
--- a/docs/docs/manual/exploring.md
+++ b/docs/docs/manual/exploring.md
@@ -43,7 +43,7 @@ To transform data from one type to another, see [Transforming data](cellediting#

### Dates

-A “date” type is created when a column is [transformed into dates](transforming#to-date), when an expression is used to [convert cells to dates](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-) or when individual cells are set to have the data type “date.”
+A “date” type is created when a column is [transformed into dates](transforming#to-date), when an expression is used to [convert cells to dates](grelfunctions#todateo-b-monthfirst-s-format1-s-format2-) or when individual cells are set to have the data type “date”.

Date-formatted data in OpenRefine relies on a number of conversion tools and standards. For something to be considered a date in OpenRefine, it will be converted into the ISO-8601-compliant extended format with time in UTC: YYYY-MM-DDTHH:MM:SSZ.

@@ -113,4 +113,4 @@ OpenRefine assigns a unique key behind the scenes, so your records don’t need

To [split multi-valued cells](transforming#split-multi-valued-cells) and apply other operations that take advantage of records mode, see [Transforming data](transforming).

-Be careful when in records mode that you do not accidentally delete rows based on being blank in one column where there is a value in another.
\ No newline at end of file
+Be careful when in records mode that you do not accidentally delete rows based on being blank in one column where there is a value in another.
From 6fed414c0920cd45f65bf612ff6f1ff67ff7c102 Mon Sep 17 00:00:00 2001
From: allanaaa
Date: Thu, 14 Jan 2021 13:12:21 -0500
Subject: [PATCH 18/21] Mac instruction changes, autosave changes

---
 docs/docs/manual/columnediting.md | 2 +-
 docs/docs/manual/installing.md | 24 ++----------------------
 docs/docs/manual/running.md | 19 +++++++++++++++----
 docs/docs/manual/starting.md | 2 ++
 4 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/docs/docs/manual/columnediting.md b/docs/docs/manual/columnediting.md
index 470db2441..59be6d82e 100644
--- a/docs/docs/manual/columnediting.md
+++ b/docs/docs/manual/columnediting.md
@@ -65,7 +65,7 @@ Through the Add column by fetching URLs function,
If you have a column of URLs and want to fetch the information that they point to, you can simply run the expression as `value`. If your column has, for example, unique identifiers for Wikidata entities (numerical values starting with Q), you can download the JSON-formatted metadata about each entity with

```
-“https://www.wikidata.org/wiki/Special:EntityData/” + value + “.json”
+"https://www.wikidata.org/wiki/Special:EntityData/" + value + ".json"
```

or whatever metadata format you prefer. Information about the format options in Wikidata can be found [here](https://www.wikidata.org/wiki/Wikidata:Data_access). The service you are fetching data from may have similar documentation on its provided options.

diff --git a/docs/docs/manual/installing.md b/docs/docs/manual/installing.md
index 87d274470..181c2890d 100644
--- a/docs/docs/manual/installing.md
+++ b/docs/docs/manual/installing.md
@@ -392,32 +392,12 @@ You can change this when you run OpenRefine from the terminal, by pointing to th

### Logs

-OpenRefine does not currently output an error log, but because the OpenRefine console window is always open while OpenRefine runs in your browser, you can copy information from the console if an error occurs.
+OpenRefine does not currently output an error log, but because the OpenRefine console window is always open (on Linux and Windows) while OpenRefine runs in your browser, you can copy information from the console if an error occurs.

-
-
-
-
-You can access OpenRefine server logs from the terminal on Mac:
-
-* Find the OpenRefine app/icon in Finder
-* control-click on the icon and select “Show Package Contents” from the context menu that displays
-* This should open a new Finder menu showing a folder called “Contents” - navigate into this folder then into the “MacOS” folder
-* control-click on “JavaAppLauncher”
-* Choose “Open With” from the menu, and select “Terminal”
+Using a Mac, you can [run OpenRefine using the terminal](running#starting-and-exiting) in order to capture errors.

---

-
-
-
-
## Increasing memory allocation

OpenRefine relies on having computer memory available to it to work effectively. If you are planning to work with large datasets, you may wish to set up OpenRefine to handle it at the outset. By “large” we generally mean one of the following indicators:

diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md
index f5782115d..efdd57469 100644
--- a/docs/docs/manual/running.md
+++ b/docs/docs/manual/running.md
@@ -56,7 +56,14 @@ To exit OpenRefine, close all the browser tabs or windows, then navigate to the

-You can find OpenRefine in your Applications folder, or you can call it from the command line with `./refine`.
+You can find OpenRefine in your Applications folder, or you can open it using Terminal.
+
+To run OpenRefine using Terminal:
+* Find the OpenRefine application / icon in Finder
+* Control-click on the icon and select “Show Package Contents” from the context menu
+* This should open a new Finder menu: navigate into the “MacOS” folder
+* Control-click on “JavaAppLauncher”
+* Choose “Open With” from the menu, and select “Terminal.”

To exit, close all your OpenRefine browser tabs, go back to the terminal window and press `Command` and `Q` to close it down.

@@ -416,7 +423,7 @@ You can preserve your facets and filters for future use by copying a
Undo/Redo, you will lose it when you leave the project workspace.

+Autosaving happens by default every five minutes. You can [change this preference by following these directions](running#jvm-preferences).
+
You can only save and share facets and filters, not any other type of view. To save current facets and filters, click Permalink. The project will reload with a different URL, which you can then copy and save elsewhere. This permalink will save both the facets and filters you’ve set, and the settings for each one (such as sorting by count rather than by name).

### Deleting projects

From 611b9cb1ef09e96b62686ab2df195d7e8b083960 Mon Sep 17 00:00:00 2001
From: allanaaa
Date: Fri, 15 Jan 2021 09:31:49 -0500
Subject: [PATCH 19/21] Update running.md

---
 docs/docs/manual/running.md | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md
index efdd57469..b69f9478d 100644
--- a/docs/docs/manual/running.md
+++ b/docs/docs/manual/running.md
@@ -111,7 +111,6 @@ When you run OpenRefine from a command line, you can change a number of default
defaultValue="win"
values={[
{label: 'Windows', value: 'win'},
- {label: 'Mac', value: 'mac'},
{label: 'Linux', value: 'linux'}
]
}>
@@ -137,25 +136,6 @@ Get a list of all the commands with `refine /?`.

-
-
-To see the full list of command-line options, run `./refine -h`.
-
-|Command|Use|Syntax example|
-|---|---|---|
-|-w|Path to the webapp|./refine -w /path/to/openrefine|
-|-d|Path to the workspace|./refine -d /where/you/want/the/workspace|
-|-m|Memory maximum heap|./refine -m 6000M|
-|-p|Port|./refine -p 3334|
-|-i|Interface (IP address, or IP and port)|./refine -i 127.0.0.2:3334|
-|-k|Add a Google API key|./refine -k YOUR_API_KEY|
-|-v|Verbosity (from low to high: error,warn,info,debug,trace)|./refine -v info|
-|-x|Additional configuration parameters||
-|--debug|Enable debugging (on port 8000)|./refine --debug|
-|--jmx|Enable JMX monitoring for Jconsole and JvisualVM|./refine --jmx|
-
-
-

To see the full list of command-line options, run `./refine -h`.

From c300239b7a28ac025f2413516e9a7e489a075b5c Mon Sep 17 00:00:00 2001
From: allanaaa
Date: Fri, 15 Jan 2021 10:17:48 -0500
Subject: [PATCH 20/21] Update running.md

---
 docs/docs/manual/running.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md
index b69f9478d..c172ff9d0 100644
--- a/docs/docs/manual/running.md
+++ b/docs/docs/manual/running.md
@@ -106,6 +106,8 @@ If you are having problems connecting to OpenRefine with your browser, [check ou

When you run OpenRefine from a command line, you can change a number of default settings.

+You cannot start the Mac version with modifications using Terminal, but you can modify the way the application starts with [settings within files](#modifications-set-within-files).
+
Date: Fri, 15 Jan 2021 10:21:59 -0500
Subject: [PATCH 21/21] Update running.md

---
 docs/docs/manual/running.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/docs/docs/manual/running.md b/docs/docs/manual/running.md
index c172ff9d0..fbb44f993 100644
--- a/docs/docs/manual/running.md
+++ b/docs/docs/manual/running.md
@@ -106,13 +106,12 @@ If you are having problems connecting to OpenRefine with your browser, [check ou

When you run OpenRefine from a command line, you can change a number of default settings.

-You cannot start the Mac version with modifications using Terminal, but you can modify the way the application starts with [settings within files](#modifications-set-within-files).
-
@@ -138,6 +137,12 @@ Get a list of all the commands with `refine /?`.

+
+
+You cannot start the Mac version with modifications using Terminal, but you can modify the way the application starts with [settings within files](#modifications-set-within-files).
+
+
+

To see the full list of command-line options, run `./refine -h`.