# Web scraper 🔍
## Description
This project is a web scraper designed to extract data from websites.
## Features
☑️ Extracts data from web pages
## Usage
### With Docker
1. Clone the repository:
```bash
git clone https://git.wmi.amu.edu.pl/s500042/webscraper
```
2. Navigate to the project directory:
```bash
cd webscraper
```
3. Build the Docker image and run it using the `start.py` script:
```bash
python scripts/start.py
```
On macOS, use `python3` instead:
```bash
python3 scripts/start.py
```
4. Check the `/app/dist/data.json` file to see the extracted data.
### Without Docker
1. Clone the repository and navigate to the project directory:
```bash
git clone https://git.wmi.amu.edu.pl/s500042/webscraper
cd webscraper
```
2. Install the required dependencies:
```bash
pip install -r app/requirements.txt
```
If you're on Arch Linux, you'll need to create a virtual environment.
Here is a [step-by-step guide](#) that will help you create it.
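
The virtual environment mentioned above can be created with Python's built-in `venv` module. The following is a minimal sketch, assuming it is run from the repository root; the `.venv` directory name is an arbitrary choice, not something the project prescribes.

```bash
# Create an isolated environment in ./.venv (the directory name is an assumption)
python3 -m venv .venv

# Activate it for the current shell session ("." is the portable form of "source")
. .venv/bin/activate

# Install the project's dependencies into the environment
if [ -f app/requirements.txt ]; then
    pip install -r app/requirements.txt
else
    echo "app/requirements.txt not found; run this from the repository root" >&2
fi
```

While the environment is active, `python` refers to the environment's interpreter, so the next step can be run as written. Run `deactivate` to leave the environment.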
3. Run the `run_with_no_docker.py` script:
```bash
python scripts/run_with_no_docker.py
```
On macOS, you'll need to use:
```bash
python3 scripts/run_with_no_docker.py
```
4. Check the `/app/dist/data.json` file to see the extracted data.
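
To inspect the output from a terminal, one option is Python's standard `json.tool` module, which pretty-prints a JSON file. The sketch below assumes the output is valid JSON; `show_data` is a hypothetical helper name, and the default path `app/dist/data.json` (relative to the repository root) is an assumption, so pass the actual path if it differs.

```bash
# Hypothetical helper: pretty-print the scraper's JSON output.
# Uses the first argument as the path, or app/dist/data.json if none is given.
show_data() {
    python3 -m json.tool "${1:-app/dist/data.json}"
}
```

For example, `show_data` reads the default location, while `show_data /app/dist/data.json` reads the absolute path named in the step above.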
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.