This commit is contained in:
allanaaa 2020-12-16 11:18:00 -05:00
parent 61f531b6e7
commit 6fa08ee264
6 changed files with 62 additions and 57 deletions

View File

@ -13,7 +13,7 @@ You can apply a text facet on numbers, boolean values, and dates, but if you edi
## Transform
Select <span class="menuItems">Edit cells</span><span class="menuItems">Transforms</span> to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. In these cases, `value` is the value in each cell in the selected column.
Select <span class="menuItems">Edit cells</span><span class="menuItems">Transforms</span> to open up an expressions window. From here, you can apply [expressions](expressions) to your data. The simplest examples are GREL functions such as [`toUppercase()`](grelfunctions#touppercases) or [`toLowercase()`](grelfunctions#tolowercases), used in expressions as `toUppercase(value)` or `toLowercase(value)`. When used on a column operation, `value` is the information in each cell in the selected column.
Use the preview to ensure your data is being transformed correctly.
@ -122,13 +122,15 @@ The clustering pop-up window offers you a variety of clustering methods:
**Key collisions** are very fast and can process millions of cells in seconds:
**Fingerprinting** is the least likely to produce false positives, so its a good place to start. It does the same kind of data-cleaning behind the scenes that you might think to do manually: fix whitespace into single spaces, put all uppercase letters into lowercase, discard punctuation, remove diacritics (e.g. accents) from characters, split all strings (words) and sort them alphabetically (so “Zhenyi, Wang” becomes “wang zhenyi”).
**Fingerprinting** is the least likely to produce false positives, so its a good place to start. It does the same kind of data-cleaning behind the scenes that you might think to do manually: fix whitespace into single spaces, put all uppercase letters into lowercase, discard punctuation, remove diacritics (e.g. accents) from characters, split up all strings (words) and sort them alphabetically (so “Zhenyi, Wang” becomes “wang zhenyi”).
**N-gram fingerprinting** allows you to set the _n_ value to whatever number youd like, and will create n-grams of _n_ size (after doing some cleaning), alphabetize them, then join them back together into a _fingerprint_. For example, a 1-gram fingerprint will simply organize all the letters in the cell into alphabetical order - by creating segments one character in length. A 2-gram fingerprint will find all the two-character segments, remove duplicates, alphabetize them, and join them back together (for example, “banana” generates “ba an na an na,” which becomes “anbana”). This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself wont identify because it keeps words separated). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”).
**N-gram fingerprinting** allows you to set the _n_ value to whatever number youd like, and will create n-grams of _n_ size (after doing some cleaning), alphabetize them, then join them back together into a fingerprint. For example, a 1-gram fingerprint will simply organize all the letters in the cell into alphabetical order - by creating segments one character in length. A 2-gram fingerprint will find all the two-character segments, remove duplicates, alphabetize them, and join them back together (for example, “banana” generates “ba an na an na,” which becomes “anbana”).
This can help match cells that have typos, or incorrect spaces (such as matching “lookout” and “look out,” which fingerprinting itself wont identify because it separates words). The higher the _n_ value, the fewer clusters will be identified. With 1-grams, keep an eye out for mismatched values that are near-anagrams of each other (such as “Wellington” and “Elgin Town”).
##### Phonetic clustering
The next four methods are phonetic algorithms: they know whether two letters sound the same when pronounced out loud, and assess text values based on that (such as knowing that a word with an “S” might be a mistype of a word with a “Z”). They are great for spotting mistakes made by not knowing the spelling of a word or name after only hearing it spoken aloud.
The next four methods are phonetic algorithms: they identify letters that sound the same when pronounced out loud, and assess text values based on that (such as knowing that a word with an “S” might be a mistype of a word with a “Z”). They are great for spotting mistakes made by not knowing the spelling of a word or name after hearing it spoken aloud.
**Metaphone3 fingerprinting** is an English-language phonetic algorithm. For example, “Reuben Gevorkiantz” and “Ruben Gevorkyants” share the same phonetic fingerprint in English.
@ -140,24 +142,24 @@ Regardless of the language of your data, applying each of them might find differ
#### Nearest neighbor
**Nearest neighbor** clustering methods are slower than key collision methods. They allow the user to set a radius - a threshold for matching or not matching. OpenRefine uses a “blocking” method first, which sorts values based on whether they have a certain amount of similarity (the default is “6” for a six-character string of identical characters) and then runs the nearest-neighbor operations on those sorted groups. We recommend setting the block number to at least 3, and then increasing it if you need to be more strict (for example, if every value with “river” is being matched, you should increase it to 6 or more). Note bigger block values will take much longer to process, while smaller blocks may miss matches. Increasing the radius will make the matches more lax, as bigger differences will be clustered:
**Nearest neighbor** clustering methods are slower than key collision methods. They allow the user to set a radius - a threshold for matching or not matching. OpenRefine uses a “blocking” method first, which sorts values based on whether they have a certain amount of similarity (the default is “6” for a six-character string of identical characters) and then runs the nearest-neighbor operations on those sorted groups.
**Levenshtein distance** counts the number of edits required to make one value perfectly match another. As in the key collision methods above, it will do things like change uppercase to lowercase, fix whitespace, change special characters, etc. Each character that gets changed counts as 1 “distance.” “New York” and “newyork” have an edit distance value of 3 (“N” to “n”, “Y” to “y,” remove the space). It can do relatively advanced edits, such as understand the distance between “M. Makeba” and “Miriam Makeba” (5), but it may create false positives if these distances are greater than other, simpler transformations (such as the one-character distance to “B. Makeba,” another person entirely).
We recommend setting the block number to at least 3, and then increasing it if you need to be more strict (for example, if every value with “river” is being matched, you should increase it to 6 or more). Note that bigger block values will take much longer to process, while smaller blocks may miss matches. Increasing the radius will make the matches more lax, as bigger differences will be clustered.
**PPM (or Prediction by Partial Matching)** uses compression to see whether two values are similar or different. In practice, this method is very lax even for small radius values and tends to generate many false positives, but because it operates at a sub-character level it is capable of finding substructures that are not easily identifiable by distances that work at the character level. So it should be used as a 'last resort' clustering method. It is also more effective on longer strings than on shorter ones.
**Levenshtein distance** counts the number of edits required to make one value perfectly match another. As in the key collision methods above, it will do things like change uppercase to lowercase, fix whitespace, change special characters, etc. Each character that gets changed counts as 1 “distance.” “New York” and “newyork” have an edit distance value of 3 (“N” to “n”; “Y” to “y”; remove the space). It can do relatively advanced edits, such as understand the distance between “M. Makeba” and “Miriam Makeba” (5), but it may create false positives if these distances are greater than other, simpler transformations (such as the one-character distance to “B. Makeba,” another person entirely).
**PPM (Prediction by Partial Matching)** uses compression to see whether two values are similar or different. In practice, this method is very lax even for small radius values and tends to generate many false positives, but because it operates at a sub-character level it is capable of finding substructures that are not easily identifiable by distances that work at the character level. So it should be used as a “last resort” clustering method. It is also more effective on longer strings than on shorter ones.
For more of the theory behind clustering, see [Clustering In Depth](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth).
## Replace
OpenRefine provides a find/replace function for you to edit your data. Selecting “Edit cells” → “Replace” will bring up a simple window where you can input a string to search and a string to replace it with. You can set case-sensitivity, and set it to only select whole words, defined by a string with spaces or punctuation around it (to prevent, for example, “house” selecting the “house” part of “doghouse”). You can use regular expressions in this field.
You may wish to preview the results of this operation by testing it with a [Text filter](facets#text-filter) first.
OpenRefine provides a find/replace function for you to edit your data. Selecting <span class="menuItems">Edit cells</span><span class="menuItems">Replace</span> will bring up a simple window where you can input a string to search and a string to replace it with. You can set case-sensitivity, and set it to only select whole words, defined by a string with spaces or punctuation around it (to prevent, for example, “house” selecting the “house” part of “doghouse”). You can use [regular expressions](expressions#regular-expressions) in this field. You may wish to preview the results of this operation by testing it with a [Text filter](facets#text-filter) first.
You can also perform a sort of find/replace operation by editing one cell, and selecting “apply to all identical cells.”
## Edit one cell at a time
You can edit individual cells by hovering your mouse over that cell. You should see a tiny blue button labeled “edit.” Click it to edit the cell. That pops up a window with a bigger text field for you to edit. You can change the data type of that cell, and you can apply these changes to all identical cells (in the same column), using this pop-up window.
You can edit individual cells by hovering your mouse over that cell. You should see a tiny blue link labeled “edit.” Click it to edit the cell. That pops up a window with a bigger text field for you to edit. You can change the [data type](exploring#data-types) of that cell, and you can apply these changes to all identical cells (in the same column), using this pop-up window.
You will likely want to avoid doing this except in rare cases - the more efficient means of improving your data will be through automated and bulk operations.

View File

@ -6,47 +6,45 @@ sidebar_label: Column editing
## Overview
Column editing contains some of the most powerful data-improvement methods in OpenRefine. While we call it “edit column,” this includes using one column of data to add entirely new columns and fields to your dataset.
Column editing contains some of the most powerful data-improvement methods in OpenRefine. The operations in the <span class="menuItems">Edit column</span> menu involve using one column of data to add entirely new columns and fields to your dataset.
## Split or Join
## Splitting or joining
Many users find that they frequently need to make their data more granular: for example, splitting a “Firstname Lastname” column into two columns, one for first names and one for last names. You may want to split out an address column into columns for street addresses, cities, territories, and postal codes.
The reverse is also often true: you may have several columns of category values that you want to join into one “category” column.
### Split into several columns...
Many users find that they frequently need to make their data more granular: for example, splitting a “Firstname Lastname” column into two columns, one for first names and one for last names. The reverse is also often true: you may have several columns of category values that you want to join into one “category” column.
.
### Split into several columns
![A screenshot of the settings window for splitting columns.](/img/columnsplit.png)
Splitting one column into several columns requires you to identify the character, string lengths, or evaluating expression you want to split on. Just like [splitting multi-valued cells into rows](cellediting#split-multi-valued-cells), splitting cells into multiple columns will remove the separator character or string you indicate. Lengths will discard any information that comes after the specified total length.
You can find this operation at <span class="menuItems">Edit column</span><span class="menuItems">Split into several columns...</span>. Splitting one column into several columns requires you to identify the character, string lengths, or evaluating expression you want to split on. Just like [splitting multi-valued cells into rows](cellediting#split-multi-valued-cells), splitting cells into multiple columns will remove the separator character or string you indicate. Splitting by lengths will discard any information that comes after the specified total length.
You can also specify a maximum number of new columns to be made: separator characters after this limit will be ignored, and the remaining characters will end up in the last column.
New columns will be named after the original column, with a number: “Location 1,” “Location 2,” etc. You can have the original column removed with this operation, and you can have [data types](exploring#data-types) identified where possible. This function will work best with converting to numbers, and may not work with dates.
New columns will be named after the original column, with a number: “Location 1,” “Location 2,” etc. You can choose to remove the original column with this operation, and you can have [data types](exploring#data-types) identified where possible. This function will work best with converting strings to numbers, and may not work with [dates](exploring#dates).
### Join columns
### Join columns
![A screenshot of the settings window for joining columns.](/img/columnjoin.png)
You can join columns by selecting “Edit column” → “Join columns…”. All the columns currently in your dataset will appear in the pop-up window. You can select or un-select all the columns you want to join, and drag columns to put them in the order you want to join them in. You will define a separator character (optional) and define a string to insert into empty cells (nulls).
You can join columns by selecting <span class="menuItems">Edit column</span><span class="menuItems">Join columns...</span>. All the columns currently in your dataset will appear in the pop-up window. You can select or un-select all the columns you want to join, and drag columns to put them in the order you want to join them in. You will define a separator character (optional) and define a string to insert into empty cells (nulls).
The joined data will appear in the column you originally selected, or you can create a new column based on this join and specify a name. You can delete all the columns that were used in this join operation.
The joined data will appear in the column you originally selected, or you can create a new column for this content and specify a name. You can delete all the columns that were used in this join operation.
## Add column based on this column
This selection will open up an [expressions](expressions) window where you can transform the data from this column (using `value`), or write a more complex expression that takes information from any number of columns or from external reconciliation sources.
Selecting <span class="menuItems">Edit column</span><span class="menuItems">Add column based on this column...</span> will open up an [expressions](expressions) window where you can transform the data from this column (using `value`), or write a more complex expression that takes information from any number of columns or from external sources.
The simplest way to use this operation is simply leave the default `value` in the expression field, to create an exact copy of your column.
Expressions used in this operation will rely on your knowledge of variables. You can learn more in the [Expressions section on variables](expressions#variables).
For a reconciled column, you can use the variable `cell` instead, to copy both the original string and the existing reconciliation data. This will include matched values, candidates, and new items. You can learn other useful variables in the [Expressions section on GREL variables](expressions#variables).
The simplest way to use this operation is simply leave the default `value` in the expression field, to create an exact copy of your column. For a column of [reconciled data](reconciling), you can use the variable `cell` instead, to copy both the original string and the existing reconciliation data. This will include matched values, candidates, and new items.
You can create a column based on concatenating (merging) two other columns. Select either of the source columns, apply “Column editing” → “Add column based on this column...”, name your new column, and use the following format in the expression window:
One useful expression is to create a column based on concatenating (merging) two other columns. Select either of the source columns, choose <span class="menuItems">Edit column</span><span class="menuItems">Add column based on this column...</span>, name your new column, and use the following format in the expression window:
```
cells[“Column 1”].value + cells[“Column 2”].value
cells["Column 1"].value + cells["Column 2"].value
```
If your column names do not contain spaces, you can use the following format:
If your column names do not contain spaces, you can use the following format instead:
```
cells.Column1.value + cells.Column2.value
@ -58,29 +56,29 @@ If you are in records mode instead of rows mode, you can concatenate using the f
row.record.cells.Column1.value + row.record.cells.Column2.value
```
You may wish to add separators or spaces, or modify your input during this operation with more advanced GREL.
You may wish to add separators or spaces, or modify your input during this operation with more advanced expressions.
## Add column by fetching URLs
Through the “Add column by fetching URLs” function, OpenRefine supports the ability to fetch HTML or data from web pages or services. In this operation you will be taking strings from your selected column and inserting them into URL strings. This presumes your chosen column contains parts of paths to valid HTML pages or files online.
Through the <span class="menuItems">Add column by fetching URLs</span> function, OpenRefine supports the ability to fetch HTML or data from web pages or services. In this operation you will be building URL strings based on your column of data, by using `value` to insert a relevant substring. Your chosen column needs to contains parts of paths to valid HTML pages or files online.
If you have a column of URLs and watch to fetch the information that they point to, you can simply run the expression as `value`. If your column has, for example, unique identifiers for Wikidata entities (numerical values starting with Q), you can download the JSON-formatted metadata about each entity with
If you have a column of URLs and want to fetch the information that they point to, you can simply run the expression as `value`. If your column has, for example, unique identifiers for Wikidata entities (numerical values starting with Q), you can download the JSON-formatted metadata about each entity with
```
“https://www.wikidata.org/wiki/Special:EntityData/” + value + “.json”
```
or whatever metadata format you prefer. [Information about these Wikidata options can be found here](https://www.wikidata.org/wiki/Wikidata:Data_access).
This service is more useful when getting metadata files instead of HTML, but you may wish to work with a pages entire HTML contents and then parse out information from that.
Be aware that the fetching process can take quite some time and that servers may not want to fulfill hundreds or thousands of page requests in seconds. Fetching allows you to set a “throttle delay” which determines the amount of time between requests. The default is 5 seconds per row in your dataset (5000 milliseconds).
or whatever metadata format you prefer. Information about the format options in Wikidata can be found [here](https://www.wikidata.org/wiki/Wikidata:Data_access). The service you are fetching data from may have similar documentation on its provided options.
![A screenshot of the settings window for fetching URLs.](/img/fetchingURLs.png)
Note the following:
This service is more useful when getting metadata files instead of HTML, but you may wish to work with a pages entire HTML contents and then parse out information from that.
* Many systems prevent you from making too many requests per second. To avoid this problem, set the throttle delay, which tells OpenRefine to wait the specified number of milliseconds between URL requests.
:::caution
Be aware that the fetching process can take quite some time and that servers may not want to fulfill hundreds or thousands of page requests in seconds. Fetching allows you to set a “throttle delay” which determines the amount of time between requests. The default is 5 seconds per row in your dataset (5000 milliseconds). We recommend leaving this at 1000 or greater.
:::
Note the following:
* Before pressing “OK,” copy and paste a URL or two from the preview and test them in another browser tab to make sure they work.
* In some situations you may need to set [HTTP request headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers). To set these, click the small “Show” button next to “HTTP headers to be used when fetching URLs” in the settings window. The authorization credentials get logged in your operation history in plain text, which may be a security concern for you. You can set the following request headers:
* [User-Agent](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent)
@ -89,7 +87,7 @@ Note the following:
### Common errors
When OpenRefine attempts to fetch information from a web page or service, it can fail in a variety of ways. The following information is meant to help troubleshoot and fix problems encountered when using this function.
When OpenRefine attempts to fetch information from a web service, it can fail in a variety of ways. The following information is meant to help troubleshoot and fix problems encountered when using this function.
First, make sure that your fetching operation is storing errors (check “store error”). Then run the fetch and look at the error messages.
@ -109,7 +107,7 @@ Note that for Mac users and for Windows users with the OpenRefine installation w
* On Mac, it will look something like `/Applications/OpenRefine.app/Contents/PlugIns/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security`.
* On Windows: `\server\target\jre\lib\security`.
An error that includes **“javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed”** can occur when you try to retrieve information over HTTPS but the remote site is using a certificate not trusted by your local Java installation. You will need to make sure that the certificate, or (more likely) the root certificate, is trusted.
**“javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed”** can appear when the remote site is using an HTTPS certificate not trusted by your local Java installation. You will need to make sure that the certificate, or (more likely) the root certificate, is trusted.
The list of trusted certificates is stored in an encrypted file called `cacerts` in your local Java installation. This can be read and updated by a tool called “keytool.” You can find directions on how to add a security certificate to the list of trusted certificates for a Java installation [here](http://magicmonster.com/kb/prg/java/ssl/pkix_path_building_failed.html) and [here](http://javarevisited.blogspot.co.uk/2012/03/add-list-certficates-java-keystore.html).
@ -118,8 +116,9 @@ Note that for Mac users and for Windows users with the OpenRefine installation w
* On Mac, it will look something like `/Applications/OpenRefine.app/Contents/PlugIns/jdk1.8.0_60.jdk/Contents/Home/jre/lib/security/cacerts`.
* On Windows: `\server\target\jre\lib\security\`.
## Rename, Remove, and Move
## Renaming, removing, and moving
Any column can be repositioned, renamed, or deleted with these actions. They can be undone, but a removed column cannot be restored later if you keep modifying your data. If you wish to temporarily hide a column, go to “View” → “Collapse this column” instead.
Every column's <span class="menuItems">Edit column</span> dropdown contains options to move it (to the beginning, end, left, or right), rename it, and delete it.
These operations can be undone, but a removed column cannot be restored later if you keep modifying your data. If you wish to temporarily hide a column, go to <span class="menuItems">[View](sortview#view)</span><span class="menuItems">Collapse this column</span> instead.
Be cautious about moving columns in [records mode](cellediting#rows-vs-records): if you change the first column in your dataset (the key column), your records may change in unintended ways.

View File

@ -128,7 +128,7 @@ If you would like to export a scatterplot, OpenRefine will open a new tab with a
You may want to explore your textual data with modifications that aren't permanent. Creating custom text facets will load your column into memory, transform the data temporarily, and store those transformations inside the facet.
You can also use custom text facets to analyze numerical data, such as by analyzing a number as a string, or by creating a test that will return “true” and false” as values.
You can also use custom text facets to analyze numerical data, such as by analyzing a number as a string, or by creating a test that will return “true” and false” as values.
Clicking on <span class="menuItems">Facet</span><span class="menuItems">Custom text facet…</span> will bring up an [expressions](expressions) window where you can enter in a GREL, Jython, or Clojure expression to modify how the facet works.
@ -140,7 +140,7 @@ For example, you may wish to analyze only the first word in a text field - perha
value.split(" ")[0]
```
In this case, `split()` is creating an array of text strings based on every space in the cells ["Firstname", "Lastname"]. Because arrays number their entries starting with 0, we want the first value, so we ask for `[0]`. (Assuming the first name is one word, not something like “Mary Anne.) We can do the same splitting and ask for the last name with
In this case, `split()` is creating an array of text strings based on every space in the cells ["Firstname", "Lastname"]. Because arrays number their entries starting with 0, we want the first value, so we ask for `[0]`. (Assuming the first name is one word, not something like “Mary Anne.) We can do the same splitting and ask for the last name with
```
value.split(" ")[1]
@ -152,7 +152,7 @@ You may want to create a facet that references several columns. For example, let
cells["First Name"].value[0] == cells["Last Name"].value[0]
```
That expression will look for the first letter (the character at index 0) of each entry and compare them. Then it will facet your rows into `true` and `false`.
That expression will look for the first letter (the character at index 0) of each entry and compare them. Then it will facet your rows into “true” and “false.”
You can learn more about text-modification functions on the [Expressions page](expressions).

View File

@ -17,13 +17,13 @@ Imagine personal data with addresses in this format:
|Jacques Cousteau|23, quai de Conti|Paris||France|75270|
|Emmy Noether|010 N Merion Avenue|Bryn Mawr|Pennsylvania|USA|19010|
You can transpose the address information from this format into multiple rows. Go to the “Street” column and select “Transpose” → “Transpose cells across columns into rows.” From there you can select all of the five columns, starting with “Street” and ending with “Postal code,” that correspond to address information. Once you begin, you should put your project into [records mode](exploring#rows-vs-records) to associate the subsequent rows with “Name” as the key column.
You can transpose the address information from this format into multiple rows. Go to the “Street” column and select <span class="menuItems">Transpose</span><span class="menuItems">Transpose cells across columns into rows</span>. From there you can select all of the five columns, starting with “Street” and ending with “Postal code,” that correspond to address information. Once you begin, you should put your project into [records mode](exploring#rows-vs-records) to associate the subsequent rows with “Name” as the key column.
![A screenshot of the transpose across columns window.](/img/transpose1.png)
### One column
You can transpose the multiple address columns into a series of rows instead:
You can transpose the multiple address columns into a series of rows:
|Name|Address|
|---|---|
@ -37,7 +37,7 @@ You can transpose the multiple address columns into a series of rows instead:
||USA|
||19010|
You can include the column-name information in each cell by prepending it to the value, with or without a separator:
You can choose one column and include the column-name information in each cell by prepending it to the value, with or without a separator:
|Name|Address|
|---|---|
@ -53,7 +53,7 @@ You can include the column-name information in each cell by prepending it to the
### Two columns
You can retain the column names as separate cell values, by selecting “Two new columns” and naming the key and value columns.
You can retain the column names as separate cell values, by selecting <span class="fieldLabels">Two new columns</span> and naming the key and value columns.
|Name|Address part|Address|
|---|---|---|
@ -91,7 +91,7 @@ The goal is to sort out all of the information contained in one column into sepa
|Joe Khoury |Junior analyst |Beirut|
|Samantha Martinez |CTO |Tokyo|
By selecting “Transpose” → “Transpose cells in rows into columns...” a window will appear that simply asks how many rows to transpose. In this case, each employee record has three rows, so input “3” (do not subtract one for the original column). The original column will disappear and be replaced with three columns, with the name of the original column plus a number appended.
By selecting <span class="menuItems">Transpose</span><span class="menuItems">Transpose cells in rows into columns...</span> a window will appear that simply asks how many rows to transpose. In this case, each employee record has three rows, so input “3” (do not subtract one for the original column). The original column will disappear and be replaced with three columns, with the name of the original column plus a number appended.
|Column 1 |Column 2 |Column 3|
|---|---|---|
@ -99,13 +99,17 @@ By selecting “Transpose” → “Transpose cells in rows into columns...” a
|Employee: Joe Khoury |Job title: Junior analyst |Office: Beirut|
|Employee: Samantha Martinez |Job title: CTO |Office: Tokyo|
From here you can use “Cell editing” → “Replace” to remove “Employee: ”, “Job title: ”, and “Office: ”, or use [GREL functions](expressions#grel) with “Edit cells” → “Transform...” to clean out the extraneous characters: `value.replace('Employee: ', '')`, etc.
From here you can use <span class="menuItems">Cell editing</span><span class="menuItems">Replace</span> to remove “Employee: ”, “Job title: ”, and “Office: ” if you wish, or use [expressions](expressions) with <span class="menuItems">Edit cells</span><span class="menuItems">Transform...</span> to clean out the extraneous characters:
If your dataset doesn't have a predictable number of cells per intended row, such that you cannot specify easily how many columns to create, try “Columnize by key/value columns.“
```
value.replace("Employee: ", "")
```
If your dataset doesn't have a predictable number of cells per intended row, such that you cannot specify easily how many columns to create, try <span class="menuItems">Columnize by key/value columns</span>.
## Columnize by key/value columns
This operation can be used to reshape a dataset that contains key and value columns: the repeating strings in the key column become new column names, and the contents of the value column are moved to new columns. This operation can be found at “Transpose” → “Columnize by key/value columns.”
This operation can be used to reshape a dataset that contains key and value columns: the repeating strings in the key column become new column names, and the contents of the value column are moved to new columns. This operation can be found at <span class="menuItems">Transpose</span><span class="menuItems">Columnize by key/value columns</span>.
![A screenshot of the Columnize window.](/img/transpose2.png)
@ -120,7 +124,7 @@ Consider the following example, with flowers, their colours, and their Internati
|Color |Yellow |
|IUCN ID |161899 |
In this format, each flower species is described by multiple attributes on consecutive rows. The “Field” column contains the keys and the “Data” column contains the values. In the “Columnize by key/value columns” window you can select each of these from the available columns. It transforms the table as follows:
In this format, each flower species is described by multiple attributes on consecutive rows. The “Field” column contains the keys and the “Data” column contains the values. In the <span class="menuItems">Columnize by key/value columns</span> window you can select each of these from the available columns. It transforms the table as follows:
| Name | Color | IUCN ID |
|-----------------------|----------|---------|
@ -227,4 +231,4 @@ This will be transformed to:
This actually changes the operation: OpenRefine no longer looks for the first key (“Name”) but simply pivots all information based on the first extra column's values. Every old row with the same value gets transposed into one new row. If you have more than one extra column, they are pivoted as well but not used as the new key.
You can use [“Fill down”](cellediting#fill-down) to put identical values in the extra columns if you need to.
You can use <span class="menuItems">[Fill down](cellediting#fill-down-and-blank-down)</span> to put identical values in the extra columns if you need to.

View File

@ -67,7 +67,7 @@
font-size: 35px;
}
.markdown .menuItems {
.markdown .menuItems, .markdown .fieldLabels, .markdown .tabLabels, .markdown .buttonLabels {
font-size: var(--ifm-code-font-size);
display: inline-block;
border: 2px solid #4dc6e1;

Binary file not shown.

Before

Width:  |  Height:  |  Size: 30 KiB

After

Width:  |  Height:  |  Size: 23 KiB