Update starting.md

allanaaa 2020-07-21 17:24:53 -04:00
parent 8a57b4c0ee
commit 019c067706

@@ -8,7 +8,7 @@ sidebar_label: Starting a project
OpenRefine doesn't allow you to create a dataset from nothing - you can only start a project by importing in some existing data.
No matter where your data comes from, OpenRefine doesn't modify your original data source. It copies all the information from your input, creates its own project file, and stores it in your [workspace directory](installing.md#set-where-data-is-stored).
No matter where your data comes from, OpenRefine doesn't modify your original data source. It copies all the information from your input, creates its own project file, and stores it in your [workspace directory](installing#set-where-data-is-stored).
The data and all of your edits are automatically saved inside the project file. When you're finished modifying the data, you can export it back out into the file format of your choice.
@@ -66,7 +66,7 @@ Type or paste the URL to the data file into the field provided. You can add as m
If you supply two or more file URLs, OpenRefine will identify each one and ask you to choose which (or all) to load.
Do not use this form to load a Google Sheet by its link; use the Google Data form instead.
Do not use this form to load a Google Sheet by its link; use [the Google Data form instead](#google-data).
#### Clipboard
@@ -78,49 +78,27 @@ This can also be useful if you would like to paste in a list of URLs, which you
#### Database (SQL)
You may want to pull the latest dataset directly from an online database. This could include an online catalogue, a WordPress or similar content management system, or a [digital repository or collection management system](https://bits.ashleyblewer.com/blog/2017/08/09/collection-management-system-collection/).
If you are an administrator or have SQL access to a database of information, you may want to pull the latest dataset directly from there. This could include an online catalogue, a content management system, or a digital repository or collection management system.
There are some publicly accessible databases that you may want to query, such as [one provided by Rfam](https://docs.rfam.org/en/latest/database.html). The instructions provided by Rfam can help you understand how to connect to and query from any database.
OpenRefine can connect to PostgreSQL, MySQL, MariaDB, and SQLite database systems. It will automatically populate the “Port” field based on which of these you choose, but you can manually edit this if needed.
If you have a downloaded database (`.db`) file, you can supply the path to the file on your computer directly in the “Database” field at the bottom of the form. You can leave the rest of the fields blank.
If you have a downloaded database (`.db`) file from an SQLite database, you can supply the path to the file on your computer directly in the “Database” field at the bottom of the form. You can leave the rest of the fields blank.
To import data directly from a database online, you will need to do two things:
* Add OpenRefine (running from your computer) to an account authorized to access your database
* Set up OpenRefine to access that database using that account
* Add OpenRefine (running from your computer's IP address) to an account authorized to access your database
* Set up OpenRefine to access that database using that authorized account
![A screenshot of connecting to a database](img/databaseconnect.jpg "A screenshot of connecting to a database.")
Log in to your hosting provider. Get the database type (such as MySQL), database name, and the URL (either an IP address, such as `127.0.0.1`, or the domain that uses the database). Then look at the accounts authorized for access. You may wish to create a new account just for OpenRefine, or add OpenRefine to an existing account.
Each host will have a slightly different method, but generally speaking: look for “accounts with access” to the database you wish to authorize, and within the settings for that account, look for “allowable hosts” or “access hosts.”
In that list, add the IP address of your own computer, because that is where the OpenRefine access request will be coming from. You can find this easily, by clicking “Test” within OpenRefine once the rest of the information is filled out: OpenRefine will give you an error that looks like
``` error:Access denied for user 'yourusername'@'123-45-67-89.yourISP.com' ```
Take your IP address from this error message and put that, with periods instead of hyphens, into the “allowable hosts” field on the account you're trying to use. Add a wildcard to the end of your IP address (“123.45.67.89%”). Save that setting, and then test the connection again with OpenRefine.
You can either connect just once and gather data, or save the connection to use it again later. If you press “Connect” without saving, OpenRefine will forget all the information you just entered.
If you'd like to save the connection, name your connection in a way you will recognize later. Click “Save” and it will appear in the “Saved Connections” list on the left. Now, click on the “...” ellipsis to the right of the connection you've saved, and click “Connect.”
If your connection is successful, you will see a Query Editor. From here you can write an [SQL query](https://www.w3schools.com/sql/) to pull the specific data you need.
If you need help, you may be able to find instructions from your hosting provider. Here are the guides from:
* [Dreamhost](https://help.dreamhost.com/hc/en-us/articles/214883058-How-do-I-connect-to-my-database-using-a-third-party-program-)
* [GoDaddy](https://ca.godaddy.com/help/connect-remotely-to-a-mysql-database-in-my-linux-hosting-account-16103)
You can either connect just once and gather data, or save the connection to use it again later. If you press “Connect” without saving, OpenRefine will forget all the information you just entered. To save it, name your connection in a way you will recognize later. Click “Save” and it will appear in the “Saved Connections” list on the left. Now, you can click on the “...” ellipsis to the right of the connection you've saved, and click “Connect.”
If your connection is successful, you will see a Query Editor. From here you can write an [SQL query](https://www.w3schools.com/sql/) to pull the specific data you need. OpenRefine will give you an error if you write a statement that tries to modify the source database in any way.
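As a rough illustration of the kind of read-only statement you might write in the Query Editor, the sketch below runs a `SELECT` against a small SQLite database using Python's built-in `sqlite3` module. The `items` table, its columns, and the sample rows are hypothetical and exist only so the example is self-contained; your own database will have different names. It is not part of OpenRefine itself, only a way to preview what a query returns.

```python
import sqlite3

# Hypothetical stand-in for a downloaded .db file: an in-memory SQLite
# database, so the sketch runs without any external file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO items (title, year) VALUES (?, ?)",
    [("Atlas", 1998), ("Herbarium", 2005), ("Gazetteer", 2012)],
)

# A read-only query: it selects and filters rows but does not change
# the source data. The table and column names are assumptions.
query = "SELECT id, title, year FROM items WHERE year >= 2000 ORDER BY year"
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)

conn.close()
```

Only the `SELECT` statement itself would be pasted into the Query Editor; the table setup above is there purely to make the sketch runnable.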
#### Google Data
You have two ways to load in data from Google Sheets:
* A link to an accessible Google Sheet (that is, one with link-sharing turned on)
* Selecting a Google Sheet in your Google Drive
@@ -129,7 +107,7 @@ You have two ways to load in data from Google Sheets:
You can import data from any Google Sheet that has link-sharing turned on. Paste in a URL that looks something like
```https://docs.google.com/spreadsheets/……….../edit?usp=sharing```
```https://docs.google.com/spreadsheets/………/edit?usp=sharing```
This will only work with Sheets, not with any other Google Drive file that might have an available link.
@@ -168,7 +146,7 @@ Because OpenRefine only runs locally on your computer, you can't have a projec
The best way to collaborate with another person is to export and import projects that save all your changes, so that you can pick up where someone else left off. You can also [export projects](export.md) and import them to new computers of your own, such as for working on the same project from the office and from home.
An exported project will include all of the [history](interface/projectscreen.md#history), so you can see (and undo) all the changes from the previous user. It is essentially a point-in-time snapshot of their work. OpenRefine only exports projects as `.tar.gz` files at this time.
An exported project will include all of the [history](running#history-undoredo), so you can see (and undo) all the changes from the previous user. It is essentially a point-in-time snapshot of their work. OpenRefine only exports projects as `.tar.gz` files at this time.
### Instructions
@@ -214,8 +192,6 @@ You can only save and share facets and filters, not any other type of view. To s
You can delete projects, which will erase the project files from the work directory on your computer. This is immediate and cannot be undone.
Go to “Open Project” and find the project you want to delete. Click on the X to the left of the project name. There will be a confirmation dialog.
@@ -223,4 +199,4 @@ Go to “Open Project” and find the project you want to delete. Click on the X
You can find all of your raw project files in your work directory. They will be named according to the unique Project ID that OpenRefine has assigned them, which you can find on the “Open Project” screen, under the “About” button for each project.
You can point OpenRefine to use another work directory using [Preferences](running.md#preferences).
You can point OpenRefine to use another work directory using [Preferences](running#preferences).