diff --git a/IUM_05.Biblioteki_DL.ipynb b/IUM_05.Biblioteki_DL.ipynb
index 4d49da6..2456ccf 100644
--- a/IUM_05.Biblioteki_DL.ipynb
+++ b/IUM_05.Biblioteki_DL.ipynb
@@ -285,7 +285,7 @@
}
},
"source": [
- "## Assignment [20 pts]\n",
+ "## Assignment [22 pts]\n",
"\n",
"Deadline: 2 weeks (April 25)\n",
"\n",
@@ -321,7 +321,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.8.5"
+ "version": "3.9.1"
},
"toc": {
"base_numbering": 1,
diff --git a/IUM_09.Python_srodowiska.ipynb b/IUM_09.Python_srodowiska.ipynb
index d0c2e1e..b2b2862 100644
--- a/IUM_09.Python_srodowiska.ipynb
+++ b/IUM_09.Python_srodowiska.ipynb
@@ -3,7 +3,11 @@
{
"cell_type": "markdown",
"id": "be5ab2df",
- "metadata": {},
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"# Virtual environments"
]
@@ -11,7 +15,11 @@
{
"cell_type": "markdown",
"id": "cf14c577",
- "metadata": {},
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"## Python Virtual Env\n",
" - Python has a built-in mechanism for managing virtual environments\n",
@@ -23,16 +31,24 @@
},
{
"cell_type": "markdown",
- "id": "182bbf83",
- "metadata": {},
+ "id": "85284459",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"Below we create an environment in the `./myenv` directory:"
]
},
{
"cell_type": "markdown",
- "id": "69d39a9e",
- "metadata": {},
+ "id": "9cabe194",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"$ python3 -m venv myenv\n",
@@ -41,16 +57,24 @@
},
{
"cell_type": "markdown",
- "id": "47a4bf00",
- "metadata": {},
+ "id": "2a8b1048",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"Now we can activate it:"
]
},
{
"cell_type": "markdown",
- "id": "cf54cb09",
- "metadata": {},
+ "id": "4619a71d",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"$ source ./myenv/bin/activate\n",
@@ -61,16 +85,24 @@
},
{
"cell_type": "markdown",
- "id": "bdafb824",
- "metadata": {},
+ "id": "2e5bf86a",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"And modify it by installing dependencies:"
]
},
{
"cell_type": "markdown",
- "id": "399a8b45",
- "metadata": {},
+ "id": "256149c4",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"(myenv) $ python3 -m pip install requests\n",
@@ -79,16 +111,24 @@
},
{
"cell_type": "markdown",
- "id": "efde93a2",
- "metadata": {},
+ "id": "7fbba7d3",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"The environment can be deactivated with:"
]
},
{
"cell_type": "markdown",
- "id": "838c5ebd",
- "metadata": {},
+ "id": "a2d688b7",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"deactivate\n",
@@ -97,16 +137,37 @@
},
{
"cell_type": "markdown",
- "id": "0557e0c8",
- "metadata": {},
+ "id": "0d3eb6d4",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"An environment can be shared by copying the entire directory that contains it"
]
},
{
"cell_type": "markdown",
- "id": "26f253cb",
+ "id": "90605b49",
"metadata": {},
+ "source": [
+ "## pipx\n",
+ " - pipx: a command that installs a Python module in a separate venv virtual environment\n",
+ " - it also makes the associated command (\"command-line entry point\") available on `PATH`\n",
+ " - this way we can install a command that is available globally without interfering with the dependencies of other Python modules. This avoids dependency conflicts while still making the module's command accessible system-wide (without manually activating an environment)\n",
+ " - more information: https://packaging.python.org/guides/installing-stand-alone-command-line-tools/\n",
+ " - https://github.com/pypa/pipx"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "26f253cb",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"## Conda\n",
"> *Package, dependency and environment management for any language—Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, FORTRAN, and more.*\n",
@@ -122,15 +183,19 @@
"Differences between Conda and venv [source](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html#virtual-environments):\n",
" - Conda supports any Python version (other than the system one)\n",
" - Conda also manages non-Python dependencies\n",
- " - Packages on PyPI (used by P"
+ " - Packages on PyPI (used by `pip`) come from their authors. Conda packages are built by conda or by the conda-forge community"
]
},
{
"cell_type": "markdown",
"id": "57f19a08",
- "metadata": {},
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
- "## Distributions: Anaconda and Conda\n",
+ "## Distributions: Anaconda and Miniconda\n",
"Conda is available in two distributions:\n",
" - [Miniconda](https://docs.conda.io/en/latest/miniconda.html):\n",
" - requires 400 MB of disk space\n",
@@ -149,8 +214,12 @@
},
{
"cell_type": "markdown",
- "id": "b199274c",
- "metadata": {},
+ "id": "d6d5156a",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"## Distributions\n",
" - The versions of the packages/libraries included in a given distribution are tested for mutual compatibility\n",
@@ -162,8 +231,12 @@
},
{
"cell_type": "markdown",
- "id": "b1d8c391",
- "metadata": {},
+ "id": "61d89bd2",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"$ conda update conda\n",
@@ -253,7 +326,11 @@
{
"cell_type": "markdown",
"id": "1c7b2930",
- "metadata": {},
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"## Installation\n",
"Instructions: \n",
@@ -264,8 +341,12 @@
},
{
"cell_type": "markdown",
- "id": "3edc108c",
- "metadata": {},
+ "id": "0ceff229",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"## Conda packages\n",
"A conda package is an archive with the `.tar.bz2` or `.conda` extension containing:\n",
@@ -285,8 +366,12 @@
},
{
"cell_type": "markdown",
- "id": "3094ba52",
- "metadata": {},
+ "id": "f54f3bdb",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"### Repositories and channels\n",
"- Packages can be downloaded from various channels\n",
@@ -298,8 +383,12 @@
},
{
"cell_type": "markdown",
- "id": "8c045d64",
- "metadata": {},
+ "id": "5ab846d0",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"For example, the `mlflow` package is not available on the official channel:"
]
@@ -307,8 +396,12 @@
{
"cell_type": "code",
"execution_count": 6,
- "id": "9d72552f",
- "metadata": {},
+ "id": "f301bdf7",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -347,8 +440,12 @@
},
{
"cell_type": "markdown",
- "id": "123b3e4e",
- "metadata": {},
+ "id": "ea36f2cd",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"The `search` and `install` commands accept the `--channel` flag, which adds the given channel to the list of channels searched by that command:"
]
@@ -356,8 +453,12 @@
{
"cell_type": "code",
"execution_count": 7,
- "id": "d55a2d84",
- "metadata": {},
+ "id": "61911bf5",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -475,8 +576,12 @@
},
{
"cell_type": "markdown",
- "id": "a4994ed6",
- "metadata": {},
+ "id": "c7cea2c5",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"We can add the `conda-forge` channel so that it is used automatically (without passing the `--channel` flag):"
]
@@ -484,8 +589,12 @@
{
"cell_type": "code",
"execution_count": 11,
- "id": "0f93ee82",
- "metadata": {},
+ "id": "df31755c",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -503,8 +612,12 @@
},
{
"cell_type": "markdown",
- "id": "85d7014f",
- "metadata": {},
+ "id": "33b4d692",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"The `conda info` command shows, among other things, the channels used by default.\n",
"We can add and remove channels and change their order (priority) by editing the [`~/.condarc`](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html) file"
@@ -513,8 +626,12 @@
{
"cell_type": "code",
"execution_count": 10,
- "id": "e57a28b9",
- "metadata": {},
+ "id": "b5621afa",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -564,8 +681,12 @@
},
{
"cell_type": "markdown",
- "id": "9e67e25c",
- "metadata": {},
+ "id": "cb1df295",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"As we can see, after adding the `conda-forge` channel, the `mlflow` package is found:"
]
@@ -573,8 +694,12 @@
{
"cell_type": "code",
"execution_count": 9,
- "id": "56e51ade",
- "metadata": {},
+ "id": "92b5da5b",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -692,8 +817,12 @@
},
{
"cell_type": "markdown",
- "id": "327015c2",
- "metadata": {},
+ "id": "7ab96758",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"\n",
"## Environments\n",
@@ -706,8 +835,12 @@
},
{
"cell_type": "markdown",
- "id": "0ef28a62",
- "metadata": {},
+ "id": "5c57462d",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"Listing environments:"
]
@@ -715,8 +848,12 @@
{
"cell_type": "code",
"execution_count": 12,
- "id": "2865a5cf",
- "metadata": {},
+ "id": "638002cd",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -741,8 +878,12 @@
},
{
"cell_type": "markdown",
- "id": "c710e400",
- "metadata": {},
+ "id": "e7e40897",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"### Creating an environment\n",
"An environment can be created and configured interactively or from a `*.yml` file"
@@ -751,8 +892,12 @@
{
"cell_type": "code",
"execution_count": 14,
- "id": "8cd7d3f0",
- "metadata": {},
+ "id": "0373de68",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -790,8 +935,12 @@
},
{
"cell_type": "markdown",
- "id": "843c0104",
- "metadata": {},
+ "id": "c8688ba6",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"### Activating an environment\n",
" - To modify an environment or start using it, we must activate it.\n",
@@ -801,8 +950,12 @@
},
{
"cell_type": "markdown",
- "id": "a6adf414",
- "metadata": {},
+ "id": "767f03e6",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"$ conda activate hello_env\n",
@@ -813,8 +966,12 @@
},
{
"cell_type": "markdown",
- "id": "8457c95e",
- "metadata": {},
+ "id": "245e64be",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"By default, the Python version will be the same as the system one.\n",
"To deactivate an environment, we use `conda deactivate`:"
@@ -822,8 +979,12 @@
},
{
"cell_type": "markdown",
- "id": "e3175dc3",
- "metadata": {},
+ "id": "261f9aab",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"$ conda deactivate\n",
@@ -834,8 +995,12 @@
},
{
"cell_type": "markdown",
- "id": "753ff7e5",
- "metadata": {},
+ "id": "980c295c",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"### Modifying an environment\n",
"If we want to create an environment with a Python version other than the system one:"
@@ -844,8 +1009,12 @@
{
"cell_type": "code",
"execution_count": 22,
- "id": "5b5b0ca0",
- "metadata": {},
+ "id": "ccaadc75",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -956,16 +1125,24 @@
},
{
"cell_type": "markdown",
- "id": "a3350d71",
- "metadata": {},
+ "id": "68bcd671",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"Indeed, the created environment has a Python version from the 3.9 line:"
]
},
{
"cell_type": "markdown",
- "id": "ee072a93",
- "metadata": {},
+ "id": "db3311c4",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"$ conda activate myenv\n",
@@ -976,8 +1153,12 @@
},
{
"cell_type": "markdown",
- "id": "89d443aa",
- "metadata": {},
+ "id": "10b5cf2e",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"Likewise, when creating an environment we can specify other dependencies together with their versions:"
]
@@ -985,8 +1166,12 @@
{
"cell_type": "code",
"execution_count": 25,
- "id": "402c122e",
- "metadata": {},
+ "id": "dd2eefc5",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -1246,16 +1431,24 @@
},
{
"cell_type": "markdown",
- "id": "4f9b7b29",
- "metadata": {},
+ "id": "d8417e18",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"We can specify which packages, and in which versions, are added by default to newly created environments using the [create_default_packages](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#config-add-default-pkgs) section of the [`~/.condarc`](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html) file"
]
},
{
"cell_type": "markdown",
- "id": "50346792",
- "metadata": {},
+ "id": "7a2adbfa",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"```\n",
"create_default_packages:\n",
@@ -1267,8 +1460,12 @@
},
{
"cell_type": "markdown",
- "id": "01a9ea21",
- "metadata": {},
+ "id": "68c9045a",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"New packages can be installed with:\n",
" - `conda install mlflow` - run in an activated environment\n",
@@ -1278,8 +1475,12 @@
},
{
"cell_type": "markdown",
- "id": "7ec5430e",
- "metadata": {},
+ "id": "a6c3ee99",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"#### Cloning an environment\n",
"An existing environment can be copied:"
@@ -1288,8 +1489,12 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "18f72668",
- "metadata": {},
+ "id": "4363044b",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [],
"source": [
"conda create --name myclone --clone myenv\n"
@@ -1297,8 +1502,12 @@
},
{
"cell_type": "markdown",
- "id": "3bcaf2e6",
- "metadata": {},
+ "id": "856b631d",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"### Exporting environments\n",
"- An environment definition can be exported to a `*.yml` file, which can later be used to recreate it\n",
@@ -1308,8 +1517,12 @@
{
"cell_type": "code",
"execution_count": 26,
- "id": "f901b34b",
- "metadata": {},
+ "id": "3f497efb",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -1354,8 +1567,12 @@
},
{
"cell_type": "markdown",
- "id": "c9fb78ca",
- "metadata": {},
+ "id": "69054adb",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"Normally, we would save the export output to a file:"
]
@@ -1363,8 +1580,12 @@
{
"cell_type": "code",
"execution_count": 31,
- "id": "048760fd",
- "metadata": {},
+ "id": "08bb0906",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -1381,8 +1602,12 @@
},
{
"cell_type": "markdown",
- "id": "7e3c1e11",
- "metadata": {},
+ "id": "8d0fd480",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"#### Exporting across systems\n",
" - To ensure that our environment can be recreated on another system, we must use the `--from-history` flag\n",
@@ -1392,8 +1617,12 @@
{
"cell_type": "code",
"execution_count": 29,
- "id": "43fcb023",
- "metadata": {},
+ "id": "9bfbf736",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -1421,8 +1650,12 @@
},
{
"cell_type": "markdown",
- "id": "9b67fddd",
- "metadata": {},
+ "id": "52965351",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
"### Creating an environment from a `*.yml` file\n",
"Given a `*.yml` file exported with `conda env export` or created/modified [manually](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually), we can create an environment from it:"
@@ -1431,8 +1664,12 @@
{
"cell_type": "code",
"execution_count": 32,
- "id": "4a4ce330",
- "metadata": {},
+ "id": "efe7bcd9",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"outputs": [
{
"name": "stdout",
@@ -1463,10 +1700,14 @@
},
{
"cell_type": "markdown",
- "id": "f37e7aa7",
- "metadata": {},
+ "id": "b88eb07c",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
"source": [
- "## Assignments\n",
+ "## Assignments [10 pts]\n",
"1. Install Anaconda or Miniconda on your computer\n",
"2. Create an environment containing all dependencies required by the scripts/programs written during the classes\n",
"3. Export the environment to an `environment.yml` file and add that file to the repository"
@@ -1474,6 +1715,7 @@
}
],
"metadata": {
+ "celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
diff --git a/IUM_10.DVC.ipynb b/IUM_10.DVC.ipynb
new file mode 100644
index 0000000..1c080cd
--- /dev/null
+++ b/IUM_10.DVC.ipynb
@@ -0,0 +1,1387 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "0c6f27a5",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "# DVC\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "560eec71",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## DVC - Data Version Control\n",
+ "- [dvc.org](https://dvc.org/)\n",
+ "- \"Version Control System for Machine Learning Projects\"\n",
+ "- Open Source\n",
+ "- It enables:\n",
+ " - versioning of data and models: \"Git for data and models\"\n",
+ " - building pipelines that define how to build/train/evaluate models: \"Makefile for machine learning\"\n",
+ " - tracking and comparing metrics and parameters\n",
+ "- tightly integrated with Git\n",
+ "- works independently of the language/libraries and operating system used\n",
+ "- 5-minute introduction: https://www.youtube.com/watch?v=UbL7VUpv1Bs&t=197s"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9bfb356e",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Installation and initialization\n",
+ " - https://dvc.org/doc/install\n",
+ " - ```pip(x) install dvc``` or:\n",
+ " - ```conda install dvc```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "054c7a11",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Collecting package metadata (current_repodata.json): done\n",
+ "Solving environment: failed with initial frozen solve. Retrying with flexible solve.\n",
+ "Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.\n",
+ "Collecting package metadata (repodata.json): done\n",
+ "Solving environment: done\n",
+ "\n",
+ "## Package Plan ##\n",
+ "\n",
+ " environment location: /home/tomek/miniconda3\n",
+ "\n",
+ " added / updated specs:\n",
+ " - dvc\n",
+ "\n",
+ "\n",
+ "The following packages will be downloaded:\n",
+ "\n",
+ " package | build\n",
+ " ---------------------------|-----------------\n",
+ " atpublic-1.0 | py_0 7 KB conda-forge\n",
+ " bzip2-1.0.8 | h7f98852_4 484 KB conda-forge\n",
+ " cached-property-1.5.2 | hd8ed1ab_1 4 KB conda-forge\n",
+ " cached_property-1.5.2 | pyha770c72_1 11 KB conda-forge\n",
+ " colorama-0.4.4 | pyh9f0ad1d_0 18 KB conda-forge\n",
+ " commonmark-0.9.1 | py_0 46 KB conda-forge\n",
+ " configobj-5.0.6 | py_0 31 KB conda-forge\n",
+ " dictdiffer-0.8.1 | pyhd8ed1ab_0 16 KB conda-forge\n",
+ " diskcache-5.2.1 | pyh44b312d_0 36 KB conda-forge\n",
+ " distro-1.5.0 | pyh9f0ad1d_0 20 KB conda-forge\n",
+ " dpath-2.0.1 | py39hf3d152e_0 23 KB conda-forge\n",
+ " dulwich-0.20.23 | py39h3811e60_0 721 KB conda-forge\n",
+ " dvc-2.1.0 | py39hf3d152e_0 551 KB conda-forge\n",
+ " flatten-dict-0.3.0 | pyh9f0ad1d_0 11 KB conda-forge\n",
+ " flufl.lock-3.2 | py_0 19 KB conda-forge\n",
+ " fsspec-0.9.0 | pyhd8ed1ab_2 75 KB conda-forge\n",
+ " ftfy-5.5.1 | py_0 47 KB conda-forge\n",
+ " funcy-1.16 | pyhd8ed1ab_0 30 KB conda-forge\n",
+ " future-0.18.2 | py39hf3d152e_3 718 KB conda-forge\n",
+ " grandalf-0.6 | py_0 42 KB conda-forge\n",
+ " jsonpath-ng-1.5.2 | pyh9f0ad1d_0 26 KB conda-forge\n",
+ " libgit2-1.1.0 | h0b03e73_0 693 KB conda-forge\n",
+ " libssh2-1.9.0 | ha56f1ee_6 226 KB conda-forge\n",
+ " mailchecker-4.0.7 | pyhd8ed1ab_0 206 KB conda-forge\n",
+ " nanotime-0.5.2 | py_0 6 KB conda-forge\n",
+ " networkx-2.5 | py_0 1.2 MB conda-forge\n",
+ " pathlib2-2.3.5 | py39hf3d152e_3 35 KB conda-forge\n",
+ " pathspec-0.8.1 | pyhd3deb0d_0 29 KB conda-forge\n",
+ " pcre2-10.35 | h032f7d1_2 693 KB conda-forge\n",
+ " phonenumbers-8.10.14 | py_0 1.5 MB conda-forge\n",
+ " ply-3.11 | py_1 44 KB conda-forge\n",
+ " pyasn1-0.4.8 | py_0 53 KB conda-forge\n",
+ " pydot-1.2.4 | py_0 20 KB conda-forge\n",
+ " pygit2-1.5.0 | py39h3811e60_0 213 KB conda-forge\n",
+ " pygtrie-2.3.2 | pyh8c360ce_0 24 KB conda-forge\n",
+ " python-benedict-0.24.0 | pyhd8ed1ab_0 30 KB conda-forge\n",
+ " python-fsutil-0.5.0 | pyhd8ed1ab_0 13 KB conda-forge\n",
+ " python-slugify-5.0.2 | pyhd8ed1ab_0 12 KB conda-forge\n",
+ " rich-10.2.2 | py39hf3d152e_0 337 KB conda-forge\n",
+ " ruamel.yaml-0.17.4 | py39h3811e60_0 160 KB conda-forge\n",
+ " ruamel.yaml.clib-0.2.2 | py39h3811e60_2 173 KB conda-forge\n",
+ " shortuuid-1.0.1 | py39hf3d152e_4 15 KB conda-forge\n",
+ " shtab-1.3.6 | pyhd8ed1ab_0 15 KB conda-forge\n",
+ " text-unidecode-1.3 | py_0 68 KB conda-forge\n",
+ " toml-0.10.2 | pyhd8ed1ab_0 18 KB conda-forge\n",
+ " unidecode-1.2.0 | pyhd8ed1ab_0 155 KB conda-forge\n",
+ " voluptuous-0.12.1 | pyhd3deb0d_0 28 KB conda-forge\n",
+ " zc.lockfile-2.0 | py_0 11 KB conda-forge\n",
+ " ------------------------------------------------------------\n",
+ " Total: 8.8 MB\n",
+ "\n",
+ "The following NEW packages will be INSTALLED:\n",
+ "\n",
+ " _openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-1_gnu\n",
+ " appdirs conda-forge/noarch::appdirs-1.4.4-pyh9f0ad1d_0\n",
+ " atpublic conda-forge/noarch::atpublic-1.0-py_0\n",
+ " bzip2 conda-forge/linux-64::bzip2-1.0.8-h7f98852_4\n",
+ " cached-property conda-forge/noarch::cached-property-1.5.2-hd8ed1ab_1\n",
+ " cached_property conda-forge/noarch::cached_property-1.5.2-pyha770c72_1\n",
+ " colorama conda-forge/noarch::colorama-0.4.4-pyh9f0ad1d_0\n",
+ " commonmark conda-forge/noarch::commonmark-0.9.1-py_0\n",
+ " configobj conda-forge/noarch::configobj-5.0.6-py_0\n",
+ " dictdiffer conda-forge/noarch::dictdiffer-0.8.1-pyhd8ed1ab_0\n",
+ " diskcache conda-forge/noarch::diskcache-5.2.1-pyh44b312d_0\n",
+ " distro conda-forge/noarch::distro-1.5.0-pyh9f0ad1d_0\n",
+ " dpath conda-forge/linux-64::dpath-2.0.1-py39hf3d152e_0\n",
+ " dulwich conda-forge/linux-64::dulwich-0.20.23-py39h3811e60_0\n",
+ " dvc conda-forge/linux-64::dvc-2.1.0-py39hf3d152e_0\n",
+ " flatten-dict conda-forge/noarch::flatten-dict-0.3.0-pyh9f0ad1d_0\n",
+ " flufl.lock conda-forge/noarch::flufl.lock-3.2-py_0\n",
+ " fsspec conda-forge/noarch::fsspec-0.9.0-pyhd8ed1ab_2\n",
+ " ftfy conda-forge/noarch::ftfy-5.5.1-py_0\n",
+ " funcy conda-forge/noarch::funcy-1.16-pyhd8ed1ab_0\n",
+ " future conda-forge/linux-64::future-0.18.2-py39hf3d152e_3\n",
+ " gitdb conda-forge/noarch::gitdb-4.0.7-pyhd8ed1ab_0\n",
+ " gitpython conda-forge/noarch::gitpython-3.1.17-pyhd8ed1ab_0\n",
+ " grandalf conda-forge/noarch::grandalf-0.6-py_0\n",
+ " jsonpath-ng conda-forge/noarch::jsonpath-ng-1.5.2-pyh9f0ad1d_0\n",
+ " libgit2 conda-forge/linux-64::libgit2-1.1.0-h0b03e73_0\n",
+ " libgomp conda-forge/linux-64::libgomp-9.3.0-h2828fa1_19\n",
+ " libssh2 conda-forge/linux-64::libssh2-1.9.0-ha56f1ee_6\n",
+ " mailchecker conda-forge/noarch::mailchecker-4.0.7-pyhd8ed1ab_0\n",
+ " nanotime conda-forge/noarch::nanotime-0.5.2-py_0\n",
+ " networkx conda-forge/noarch::networkx-2.5-py_0\n",
+ " pathlib2 conda-forge/linux-64::pathlib2-2.3.5-py39hf3d152e_3\n",
+ " pathspec conda-forge/noarch::pathspec-0.8.1-pyhd3deb0d_0\n",
+ " pcre2 conda-forge/linux-64::pcre2-10.35-h032f7d1_2\n",
+ " phonenumbers conda-forge/noarch::phonenumbers-8.10.14-py_0\n",
+ " pip conda-forge/noarch::pip-21.1.2-pyhd8ed1ab_0\n",
+ " ply conda-forge/noarch::ply-3.11-py_1\n",
+ " pyasn1 conda-forge/noarch::pyasn1-0.4.8-py_0\n",
+ " pydot conda-forge/noarch::pydot-1.2.4-py_0\n",
+ " pygit2 conda-forge/linux-64::pygit2-1.5.0-py39h3811e60_0\n",
+ " pygtrie conda-forge/noarch::pygtrie-2.3.2-pyh8c360ce_0\n",
+ " python-benedict conda-forge/noarch::python-benedict-0.24.0-pyhd8ed1ab_0\n",
+ " python-fsutil conda-forge/noarch::python-fsutil-0.5.0-pyhd8ed1ab_0\n",
+ " python-slugify conda-forge/noarch::python-slugify-5.0.2-pyhd8ed1ab_0\n",
+ " rich conda-forge/linux-64::rich-10.2.2-py39hf3d152e_0\n",
+ " ruamel.yaml conda-forge/linux-64::ruamel.yaml-0.17.4-py39h3811e60_0\n",
+ " ruamel.yaml.clib conda-forge/linux-64::ruamel.yaml.clib-0.2.2-py39h3811e60_2\n",
+ " shortuuid conda-forge/linux-64::shortuuid-1.0.1-py39hf3d152e_4\n",
+ " shtab conda-forge/noarch::shtab-1.3.6-pyhd8ed1ab_0\n",
+ " smmap conda-forge/noarch::smmap-3.0.5-pyh44b312d_0\n",
+ " tabulate conda-forge/noarch::tabulate-0.8.9-pyhd8ed1ab_0\n",
+ " text-unidecode conda-forge/noarch::text-unidecode-1.3-py_0\n",
+ " toml conda-forge/noarch::toml-0.10.2-pyhd8ed1ab_0\n",
+ " typing_extensions conda-forge/noarch::typing_extensions-3.7.4.3-py_0\n",
+ " unidecode conda-forge/noarch::unidecode-1.2.0-pyhd8ed1ab_0\n",
+ " voluptuous conda-forge/noarch::voluptuous-0.12.1-pyhd3deb0d_0\n",
+ " wheel conda-forge/noarch::wheel-0.36.2-pyhd3deb0d_0\n",
+ " zc.lockfile conda-forge/noarch::zc.lockfile-2.0-py_0\n",
+ "\n",
+ "The following packages will be UPDATED:\n",
+ "\n",
+ " certifi pkgs/main::certifi-2020.12.5-py39h06a~ --> conda-forge::certifi-2020.12.5-py39hf3d152e_1\n",
+ " libgcc-ng pkgs/main::libgcc-ng-9.1.0-hdf63c60_0 --> conda-forge::libgcc-ng-9.3.0-h2828fa1_19\n",
+ "\n",
+ "The following packages will be SUPERSEDED by a higher-priority channel:\n",
+ "\n",
+ " _libgcc_mutex pkgs/main::_libgcc_mutex-0.1-main --> conda-forge::_libgcc_mutex-0.1-conda_forge\n",
+ " ca-certificates pkgs/main::ca-certificates-2021.4.13-~ --> conda-forge::ca-certificates-2020.12.5-ha878542_0\n",
+ " conda pkgs/main::conda-4.10.1-py39h06a4308_1 --> conda-forge::conda-4.10.1-py39hf3d152e_0\n",
+ " openssl pkgs/main::openssl-1.1.1k-h27cfd23_0 --> conda-forge::openssl-1.1.1k-h7f98852_0\n",
+ "\n",
+ "\n",
+ "\n",
+ "Downloading and Extracting Packages\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "diskcache-5.2.1 | 36 KB | ##################################### | 100% \n",
+ "pathspec-0.8.1 | 29 KB | ##################################### | 100% \n",
+ "cached-property-1.5. | 4 KB | ##################################### | 100% \n",
+ "networkx-2.5 | 1.2 MB | ##################################### | 100% \n",
+ "commonmark-0.9.1 | 46 KB | ##################################### | 100% \n",
+ "configobj-5.0.6 | 31 KB | ##################################### | 100% \n",
+ "python-fsutil-0.5.0 | 13 KB | ##################################### | 100% \n",
+ "fsspec-0.9.0 | 75 KB | ##################################### | 100% \n",
+ "dulwich-0.20.23 | 721 KB | ##################################### | 100% \n",
+ "funcy-1.16 | 30 KB | ##################################### | 100% \n",
+ "bzip2-1.0.8 | 484 KB | ##################################### | 100% \n",
+ "ply-3.11 | 44 KB | ##################################### | 100% \n",
+ "libgit2-1.1.0 | 693 KB | ##################################### | 100% \n",
+ "ftfy-5.5.1 | 47 KB | ##################################### | 100% \n",
+ "nanotime-0.5.2 | 6 KB | ##################################### | 100% \n",
+ "pyasn1-0.4.8 | 53 KB | ##################################### | 100% \n",
+ "unidecode-1.2.0 | 155 KB | ##################################### | 100% \n",
+ "dvc-2.1.0 | 551 KB | ##################################### | 100% \n",
+ "pydot-1.2.4 | 20 KB | ##################################### | 100% \n",
+ "zc.lockfile-2.0 | 11 KB | ##################################### | 100% \n",
+ "dpath-2.0.1 | 23 KB | ##################################### | 100% \n",
+ "pcre2-10.35 | 693 KB | ##################################### | 100% \n",
+ "ruamel.yaml-0.17.4 | 160 KB | ##################################### | 100% \n",
+ "flatten-dict-0.3.0 | 11 KB | ##################################### | 100% \n",
+ "python-slugify-5.0.2 | 12 KB | ##################################### | 100% \n",
+ "shortuuid-1.0.1 | 15 KB | ##################################### | 100% \n",
+ "text-unidecode-1.3 | 68 KB | ##################################### | 100% \n",
+ "cached_property-1.5. | 11 KB | ##################################### | 100% \n",
+ "colorama-0.4.4 | 18 KB | ##################################### | 100% \n",
+ "flufl.lock-3.2 | 19 KB | ##################################### | 100% \n",
+ "libssh2-1.9.0 | 226 KB | ##################################### | 100% \n",
+ "python-benedict-0.24 | 30 KB | ##################################### | 100% \n",
+ "distro-1.5.0 | 20 KB | ##################################### | 100% \n",
+ "grandalf-0.6 | 42 KB | ##################################### | 100% \n",
+ "future-0.18.2 | 718 KB | ##################################### | 100% \n",
+ "ruamel.yaml.clib-0.2 | 173 KB | ##################################### | 100% \n",
+ "rich-10.2.2 | 337 KB | ##################################### | 100% \n",
+ "shtab-1.3.6 | 15 KB | ##################################### | 100% \n",
+ "pygtrie-2.3.2 | 24 KB | ##################################### | 100% \n",
+ "mailchecker-4.0.7 | 206 KB | ##################################### | 100% \n",
+ "voluptuous-0.12.1 | 28 KB | ##################################### | 100% \n",
+ "atpublic-1.0 | 7 KB | ##################################### | 100% \n",
+ "phonenumbers-8.10.14 | 1.5 MB | ##################################### | 100% \n",
+ "pathlib2-2.3.5 | 35 KB | ##################################### | 100% \n",
+ "pygit2-1.5.0 | 213 KB | ##################################### | 100% \n",
+ "dictdiffer-0.8.1 | 16 KB | ##################################### | 100% \n",
+ "toml-0.10.2 | 18 KB | ##################################### | 100% \n",
+ "jsonpath-ng-1.5.2 | 26 KB | ##################################### | 100% \n",
+ "Preparing transaction: done\n",
+ "Verifying transaction: done\n",
+ "Executing transaction: done\n",
+ "\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "conda install dvc"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "20975d62",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "Let's create a directory in which we will keep our project:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "aae59ec2",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "!mkdir -p IUM_10/sample-ml-project"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "1e522a93",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/tomek/AITech/repo/aitech-ium-private/IUM_10/sample-ml-project\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Jupyter notebook magic https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd\n",
+ "%cd \"IUM_10/sample-ml-project\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "199c0d92",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "We initialize an empty Git repository (we can also skip this step and work in an existing repository):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "c13c525b",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Initialized empty Git repository in /home/tomek/AITech/repo/aitech-ium-private/IUM_10/sample-ml-project/.git/\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git init"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c7155369",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "Now we initialize the DVC repository:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "44f28226",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Initialized DVC repository.\n",
+ "\n",
+ "You can now commit the changes to git.\n",
+ "\n",
+ "\u001b[31m+---------------------------------------------------------------------+\n",
+ "\u001b[0m\u001b[31m|\u001b[0m \u001b[31m|\u001b[0m\n",
+ "\u001b[31m|\u001b[0m DVC has enabled anonymous aggregate usage analytics. \u001b[31m|\u001b[0m\n",
+ "\u001b[31m|\u001b[0m Read the analytics documentation (and how to opt-out) here: \u001b[31m|\u001b[0m\n",
+ "\u001b[31m|\u001b[0m <\u001b[36mhttps://dvc.org/doc/user-guide/analytics\u001b[39m> \u001b[31m|\u001b[0m\n",
+ "\u001b[31m|\u001b[0m \u001b[31m|\u001b[0m\n",
+ "\u001b[31m+---------------------------------------------------------------------+\n",
+ "\u001b[0m\n",
+ "\u001b[33mWhat's next?\u001b[39m\n",
+ "\u001b[33m------------\u001b[39m\n",
+ "- Check out the documentation: <\u001b[36mhttps://dvc.org/doc\u001b[39m>\n",
+ "- Get help and share ideas: <\u001b[36mhttps://dvc.org/chat\u001b[39m>\n",
+ "- Star us on GitHub: <\u001b[36mhttps://github.com/iterative/dvc\u001b[39m>\n",
+ "\u001b[0m"
+ ]
+ }
+ ],
+ "source": [
+ "!dvc init"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "00bc72ed",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "Let's see which files DVC added (and also staged in the Git repository).\n",
+ "They are described here: https://dvc.org/doc/user-guide/project-structure/internal-files"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "d1aefe16",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "On branch master\r\n",
+ "\r\n",
+ "No commits yet\r\n",
+ "\r\n",
+ "Changes to be committed:\r\n",
+ "  (use \"git rm --cached <file>...\" to unstage)\r\n",
+ "\t\u001b[32mnew file: .dvc/.gitignore\u001b[m\r\n",
+ "\t\u001b[32mnew file: .dvc/config\u001b[m\r\n",
+ "\t\u001b[32mnew file: .dvc/plots/confusion.json\u001b[m\r\n",
+ "\t\u001b[32mnew file: .dvc/plots/confusion_normalized.json\u001b[m\r\n",
+ "\t\u001b[32mnew file: .dvc/plots/default.json\u001b[m\r\n",
+ "\t\u001b[32mnew file: .dvc/plots/linear.json\u001b[m\r\n",
+ "\t\u001b[32mnew file: .dvc/plots/scatter.json\u001b[m\r\n",
+ "\t\u001b[32mnew file: .dvc/plots/smooth.json\u001b[m\r\n",
+ "\t\u001b[32mnew file: .dvcignore\u001b[m\r\n",
+ "\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git status"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "72e0a272",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "We can now commit the changes in Git:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "59780e99",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "On branch master\r\n",
+ "nothing to commit, working tree clean\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git commit -m \"Initial commit\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dd8e529b",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Tracking files with DVC\n",
+ " - large files, such as input data files or model files, are hard to manage with Git because of problems with:\n",
+ "    - performance\n",
+ "    - repository storage space\n",
+ " - Git has an extension, [LFS (Large File Storage)](https://git-lfs.github.com/), which partly solves this problem. The files themselves are stored on a dedicated remote server; the repository holds only pointers to those files and some metadata\n",
+ " - DVC takes a similar approach, but:\n",
+ "    - files can be stored on almost any server, or locally\n",
+ "    - there is no file size limit (with Git LFS the limit is most often 2 GB)\n",
+ "    - DVC provides additional tooling for tracking files and their links to experiment results\n",
+ "    - for more, see [here](https://dvc.org/doc/user-guide/related-technologies)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a8861abe",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "Let's prepare some sample data by downloading it from Kaggle:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "f05ece1b",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Downloading iris.zip to /home/tomek/AITech/repo/aitech-ium-private/IUM_10/sample-ml-project\n",
+ " 0%| | 0.00/3.60k [00:00, ?B/s]\n",
+ "100%|██████████████████████████████████████| 3.60k/3.60k [00:00<00:00, 2.63MB/s]\n",
+ "Archive: iris.zip\n",
+ " inflating: Iris.csv \n",
+ " inflating: database.sqlite \n"
+ ]
+ }
+ ],
+ "source": [
+ "!kaggle datasets download -d uciml/iris\n",
+ "!unzip -o iris.zip\n",
+ "!rm database.sqlite iris.zip\n",
+ "!mkdir -p data\n",
+ "!mv Iris.csv data/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "adb9a522",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "Now we add the data file(s) to DVC:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "74d182c7",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Adding... \n",
+ "!\u001b[A\n",
+ " 0%| |.E8dZEGBYoRayYsJLdesNS4.tmp 0.00/5.11k [00:00, ?it/s]\u001b[A\n",
+ "100% Add|██████████████████████████████████████████████|1/1 [00:04, 4.71s/file]\u001b[A\n",
+ "\n",
+ "To track the changes with git, run:\n",
+ "\n",
+ "\tgit add data/Iris.csv.dvc data/.gitignore\n",
+ "\u001b[0m"
+ ]
+ }
+ ],
+ "source": [
+ "!dvc add data/Iris.csv"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "72c6b5d0",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ " - DVC created the file `data/Iris.csv.dvc` and added the original file to `.gitignore`\n",
+ " - Only the `*.dvc` file, which contains a pointer to the real file, will be present in the repository"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "74d54652",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "On branch master\r\n",
+ "Untracked files:\r\n",
+ "  (use \"git add <file>...\" to include in what will be committed)\r\n",
+ "\t\u001b[31mdata/.gitignore\u001b[m\r\n",
+ "\t\u001b[31mdata/Iris.csv.dvc\u001b[m\r\n",
+ "\t\u001b[31miris.zip\u001b[m\r\n",
+ "\r\n",
+ "nothing added to commit but untracked files present (use \"git add\" to track)\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git status -u"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8589fecf",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "Let's add the files `data/Iris.csv.dvc` and `data/.gitignore` to the Git repository, as DVC suggests:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "460c4a17",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "!git add data/Iris.csv.dvc data/.gitignore"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "80644077",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[master cc0821a] Dodano dane IRIS (DVC)\r\n",
+ " 2 files changed, 5 insertions(+)\r\n",
+ " create mode 100644 data/.gitignore\r\n",
+ " create mode 100644 data/Iris.csv.dvc\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git commit -m \"Dodano dane IRIS (DVC)\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "03899863",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "A `*.dvc` file contains, among other things, the hash of the tracked file. More about `*.dvc` files: [link](https://dvc.org/doc/user-guide/project-structure/dvc-files)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8cb2ba7c",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# %load data/Iris.csv.dvc\n",
+ "outs:\n",
+ "- md5: 717820ef0af287ff346c5cabfb4c612c\n",
+ " size: 5107\n",
+ " path: Iris.csv\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0b421d45",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "The original `Iris.csv` file was moved to the `.dvc/cache/` directory, under a path derived from the file's hash (the first two characters of the hash name a subdirectory and the remaining characters name the file), and linked back to its original location. How the links are created can [differ depending on the file system](https://dvc.org/doc/user-guide/large-dataset-optimization)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "1d471f3a",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "total 8\r\n",
+ "-r--r--r-- 1 tomek tomek 5107 wrz 19 2019 7820ef0af287ff346c5cabfb4c612c\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!ls -l .dvc/cache/71"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "901e8e90",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## dvc remote\n",
+ " - to send the actual files tracked by DVC to a remote location (from which they can be downloaded, e.g. by a CI system or by other users), we first need to configure such a location\n",
+ " - the [`dvc remote add`](https://dvc.org/doc/command-reference/remote/add) command does this\n",
+ " - we will use a local \"remote\". Here it is simply the previously created directory `/dvcstore`. The same directory also exists on our Jenkins server; it has to be mounted into the Docker container, of course\n",
+ " - in real-world use we would instead give the location of a resource reachable over the internet, e.g. an SFTP server or an AWS S3 path"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "731f6ea4",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Setting 'my_local_remote' as a default remote.\r\n",
+ "\u001b[0m"
+ ]
+ }
+ ],
+ "source": [
+ "!dvc remote add -d my_local_remote /dvcstore"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "9c3deeaf",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "On branch master\r\n",
+ "Changes not staged for commit:\r\n",
+ "  (use \"git add <file>...\" to update what will be committed)\r\n",
+ "  (use \"git restore <file>...\" to discard changes in working directory)\r\n",
+ "\t\u001b[31mmodified: .dvc/config\u001b[m\r\n",
+ "\r\n",
+ "no changes added to commit (use \"git add\" and/or \"git commit -a\")\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git status"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "899eac7d",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[master 3ff62b6] Added DVC remote\r\n",
+ " 1 file changed, 4 insertions(+)\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git add .dvc/config\n",
+ "!git commit -m \"Added DVC remote\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8c556c96",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## dvc push\n",
+ "Once a \"remote\" is configured, we can push files to it with the `dvc push` command:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "8ecf3091",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 0% Uploading| |0/1 [00:00, ?file/s]\n",
+ "!\u001b[A\n",
+ " 0%| |data/Iris.csv 0.00/5.11k [00:00, ?it/s]\u001b[A\n",
+ "1 file pushed \u001b[A\n",
+ "\u001b[0m"
+ ]
+ }
+ ],
+ "source": [
+ "!dvc push"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "id": "8a355575",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[34;42m/dvcstore\u001b[00m\r\n",
+ "└── \u001b[01;34m71\u001b[00m\r\n",
+ " └── 7820ef0af287ff346c5cabfb4c612c\r\n",
+ "\r\n",
+ "1 directory, 1 file\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!tree /dvcstore"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "af59ecb3",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## dvc pull\n",
+ "To download data from DVC (e.g. in another location, or as another user), we need to:\n",
+ " - clone the Git repository (to get, among other things, the `*.dvc` files)\n",
+ " - run `dvc pull`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9fa914a7",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "Adding new files and modifying existing ones works much as it does for ordinary files tracked by Git, except that we use the `dvc` command instead of `git` and additionally remember to manage the `*.dvc` files with Git:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "dde39796",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "!head -n -1 data/Iris.csv | sponge data/Iris.csv"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "id": "7f14ec60",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "On branch master\r\n",
+ "nothing to commit, working tree clean\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git status"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "id": "8a841039",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "data/Iris.csv.dvc: core\u001b[39m>\n",
+ "\tchanged outs:\n",
+ "\t\tmodified: data/Iris.csv\n",
+ "\u001b[0m"
+ ]
+ }
+ ],
+ "source": [
+ "!dvc status"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "id": "bf6c1067",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Adding... \n",
+ "!\u001b[A\n",
+ " 0%| |.TatTHknArFHCT9iDCtxHzh.tmp 0.00/5.07k [00:00, ?it/s]\u001b[A\n",
+ "100% Add|██████████████████████████████████████████████|1/1 [00:00, 2.68file/s]\u001b[A\n",
+ "\n",
+ "To track the changes with git, run:\n",
+ "\n",
+ "\tgit add data/Iris.csv.dvc\n",
+ "\u001b[0m"
+ ]
+ }
+ ],
+ "source": [
+ "!dvc add data/Iris.csv"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "id": "4a4865c9",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[master e38c244] Removed last line from Iris dataset\r\n",
+ " 1 file changed, 2 insertions(+), 2 deletions(-)\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "!git add data/Iris.csv.dvc\n",
+ "!git commit -m \"Removed last line from Iris dataset\"\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d710977c",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "### dvc checkout\n",
+ " - We use `dvc checkout` together with `git checkout` to switch the branch we are working on.\n",
+ " - DVC replaces the versions of the files it tracks with those from the other branch (provided the files differ and the `*.dvc` files differ between the two branches)\n",
+ " - switching branches with Git changes the `*.dvc` files (if they differ), and `dvc checkout` then copies or links from `.dvc/cache` the files whose hashes match those recorded in the `*.dvc` files"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5897e8eb",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Sharing data between projects\n",
+ " - with the `dvc import` and `dvc update` commands we can add, and later update, files tracked by DVC in another repository"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "id": "9b018146",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Importing 'get-started/data.xml (https://github.com/iterative/dataset-registry)' -> 'data/data.xml'\n",
+ " 0% Downloading| |0/1 [00:00, ?file/s]\n",
+ "!\u001b[A\n",
+ " 0%| |get-started/data.xml 0.00/37.9M [00:00, ?it/s]\u001b[A\n",
+ " 57%|█████▋ |get-started/data.xml 20.4M/36.1M [00:13<00:03, 5.35MB/s]\u001b[A\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "100%|█████████▉|get-started/data.xml 36.0M/36.1M [00:18<00:00, 2.87MB/s]\u001b[A\n",
+ " \u001b[A\n",
+ "To track the changes with git, run:\n",
+ "\n",
+ "\tgit add data/.gitignore data/data.xml.dvc\n",
+ "\u001b[0m"
+ ]
+ }
+ ],
+ "source": [
+ "!dvc import https://github.com/iterative/dataset-registry \\\n",
+ " get-started/data.xml -o data/data.xml"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "id": "be2c1a37",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Data and pipelines are up to date. \n",
+ "\u001b[0m"
+ ]
+ }
+ ],
+ "source": [
+ "!dvc status"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "id": "3306c5b7",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "total 37020\r\n",
+ "-rw-rw-r-- 1 tomek tomek 37891850 maj 31 11:10 data.xml\r\n",
+ "-rw-rw-r-- 1 tomek tomek 284 maj 31 11:10 data.xml.dvc\r\n",
+ "-rw-rw-r-- 1 tomek tomek 5072 maj 31 11:01 Iris.csv\r\n",
+ "-rw-rw-r-- 1 tomek tomek 76 maj 31 11:01 Iris.csv.dvc\r\n"
+ ]
+ }
+ ],
+ "source": [
+ "ls -l data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b73c56ea",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# %load data/data.xml.dvc\n",
+ "md5: a7cd139231cc35ed63541ce3829b96db\n",
+ "frozen: true\n",
+ "deps:\n",
+ "- path: get-started/data.xml\n",
+ " repo:\n",
+ " url: https://github.com/iterative/dataset-registry\n",
+ " rev_lock: ba014f40e29670421a67cb1c47543f402348aa13\n",
+ "outs:\n",
+ "- md5: a304afb96060aad90176268345e10355\n",
+ " size: 37891850\n",
+ " path: data.xml\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "db1063ac",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## DVC pipelines\n",
+ " - introduction: https://youtu.be/71IGzyH95UY\n",
+ " - Getting started: https://dvc.org/doc/start/data-pipelines\n",
+ " - DVC pipelines let us build (with the `dvc run` command) or define (by editing the `dvc.yaml` file) a dependency graph between the steps performed in our project (such as \"data preparation\", \"training\", \"evaluation\")\n",
+ " - a pipeline defined this way can then be run with the `dvc repro` command"
+ ]
+ },
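+ {
+ "cell_type": "markdown",
+ "id": "dvc-yaml-sketch",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "The dependency graph described above might look roughly like the `dvc.yaml` sketched below. This is only an illustration under stated assumptions: the stage names `prepare` and `train`, the scripts `prepare.py` and `train.py`, and the paths `data/prepared.csv` and `model.pkl` are hypothetical and do not exist in this project; only `data/Iris.csv` comes from the earlier cells.\n",
+ "\n",
+ "```yaml\n",
+ "# dvc.yaml (hypothetical stages, scripts and output paths)\n",
+ "stages:\n",
+ "  prepare:\n",
+ "    # command executed for this stage\n",
+ "    cmd: python prepare.py data/Iris.csv data/prepared.csv\n",
+ "    # files the stage reads; a change in any of them makes the stage stale\n",
+ "    deps:\n",
+ "      - prepare.py\n",
+ "      - data/Iris.csv\n",
+ "    # files the stage produces; DVC tracks them\n",
+ "    outs:\n",
+ "      - data/prepared.csv\n",
+ "  train:\n",
+ "    cmd: python train.py data/prepared.csv model.pkl\n",
+ "    deps:\n",
+ "      - train.py\n",
+ "      # output of the prepare stage, which links the two stages\n",
+ "      - data/prepared.csv\n",
+ "    outs:\n",
+ "      - model.pkl\n",
+ "```\n",
+ "\n",
+ "Because `train` lists `data/prepared.csv`, an output of `prepare`, among its `deps`, DVC can order the two stages and rerun a stage only when one of its dependencies has changed."
+ ]
+ },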
+ {
+ "cell_type": "markdown",
+ "id": "e2939867",
+ "metadata": {
+ "slideshow": {
+ "slide_type": "slide"
+ }
+ },
+ "source": [
+ "## Tasks [15 pts]\n",
+ "1. Initialize a DVC repository inside your project repository [1 pt]\n",
+ "2. Add the data file(s) in your project to DVC [1 pt]\n",
+ "3. Configure a remote (the configuration details will be provided shortly) [1 pt]\n",
+ "4. Create/define and add to the repository a `dvc.yaml` file describing the steps performed in your project. Separate out at least 2 steps (e.g. data preparation and training) linked by dependencies (use the \"Getting started\" materials linked above) [6 pts]\n",
+ "5. Create a project on Jenkins (`s1233456-dvc`) in which you clone the repository, fetch the DVC-tracked files (with `dvc pull`) and run the pipeline (with `dvc repro`) [6 pts]"
+ ]
+ }
+ ],
+ "metadata": {
+ "celltoolbar": "Slideshow",
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/IUM_12.Praca.ipynb b/IUM_12.Praca.ipynb
new file mode 100644
index 0000000..ea4bbca
--- /dev/null
+++ b/IUM_12.Praca.ipynb
@@ -0,0 +1,79 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "85b20432",
+ "metadata": {},
+ "source": [
+ "# Job market overview"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "69587a1e",
+ "metadata": {},
+ "source": [
+ "## Task\n",
+ "\n",
+ "1. Find at least 6 job listings that interest you\n",
+ " - for now, ignore the requirements\n",
+ " - look for jobs you will be able to do e.g. after graduating\n",
+ " - you can still learn a lot before then\n",
+ " - do not worry about geographic location: we live in the age of remote work\n",
+ "\n",
+ "2. Count the requirements (both mandatory and nice-to-have) across these listings to build a frequency list, e.g.:\n",
+ "|requirement |number of listings|\n",
+ "|--------------------|------------------|\n",
+ "|analytical thinking | 6 |\n",
+ "|git | 5 |\n",
+ "|Python | 3 |\n",
+ "|Haskell | 1 |\n",
+ "\n",
+ "3. Prepare a 3-4 slide presentation lasting about 10 minutes. It should cover:\n",
+ " - the list of listings you found (position, company)\n",
+ " - requirement statistics (table + chart)\n",
+ " - one requirement of your choice, preferably a less well-known and less obvious one. If it is:\n",
+ " - a technology (library etc.): describe it briefly (what it is for, what it can do, etc.)\n",
+ " - a skill or field of knowledge: describe it briefly and point to a source for learning it (book, course, article)\n",
+ " \n",
+ " If you know some interesting technologies/skills you would like to present to the group, you can reverse the order: look for listings that mention them.\n",
+ " \n",
+ "One person in the group has a special task:\n",
+ " - does not look for listings\n",
+ " - collects the statistics from the rest of the group and aggregates them (eliminating duplicates)\n",
+ " - prepares 2 slides presenting statistics on:\n",
+ " - requirements (like everyone else, but for the whole group)\n",
+ " - the companies that posted the listings"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1fb3795f",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}