From 4ff197902d337ae5fc8b5978c0226e5315bee974 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20Zi=C4=99tkiewicz?= Date: Mon, 31 May 2021 11:55:27 +0200 Subject: [PATCH] 10. DVC --- IUM_05.Biblioteki_DL.ipynb | 4 +- IUM_09.Python_srodowiska.ipynb | 466 ++++++++--- IUM_10.DVC.ipynb | 1387 ++++++++++++++++++++++++++++++++ IUM_12.Praca.ipynb | 79 ++ 4 files changed, 1822 insertions(+), 114 deletions(-) create mode 100644 IUM_10.DVC.ipynb create mode 100644 IUM_12.Praca.ipynb diff --git a/IUM_05.Biblioteki_DL.ipynb b/IUM_05.Biblioteki_DL.ipynb index 4d49da6..2456ccf 100644 --- a/IUM_05.Biblioteki_DL.ipynb +++ b/IUM_05.Biblioteki_DL.ipynb @@ -285,7 +285,7 @@ } }, "source": [ - "## Zadanie [20 pkt.]\n", + "## Zadanie [22 pkt.]\n", "\n", "Termin: 2 tygodnie (25 IV)\n", "\n", @@ -321,7 +321,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.5" + "version": "3.9.1" }, "toc": { "base_numbering": 1, diff --git a/IUM_09.Python_srodowiska.ipynb b/IUM_09.Python_srodowiska.ipynb index d0c2e1e..b2b2862 100644 --- a/IUM_09.Python_srodowiska.ipynb +++ b/IUM_09.Python_srodowiska.ipynb @@ -3,7 +3,11 @@ { "cell_type": "markdown", "id": "be5ab2df", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "# Środowiska wirtualne" ] @@ -11,7 +15,11 @@ { "cell_type": "markdown", "id": "cf14c577", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "## Python Virtual Env\n", " - Python posiada wbudowany mechanizm do zarządzania wirtualnymi środowiskami\n", @@ -23,16 +31,24 @@ }, { "cell_type": "markdown", - "id": "182bbf83", - "metadata": {}, + "id": "85284459", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Poniżej stworzymy środowisko w katalogu `./myenv`:" ] }, { "cell_type": "markdown", - "id": "69d39a9e", - "metadata": {}, + "id": "9cabe194", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ 
"```\n", "$ python3 -m venv myenv\n", @@ -41,16 +57,24 @@ }, { "cell_type": "markdown", - "id": "47a4bf00", - "metadata": {}, + "id": "2a8b1048", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Teraz możemy je aktywować:" ] }, { "cell_type": "markdown", - "id": "cf54cb09", - "metadata": {}, + "id": "4619a71d", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "```\n", "$ source ./myenv/bin/activate\n", @@ -61,16 +85,24 @@ }, { "cell_type": "markdown", - "id": "bdafb824", - "metadata": {}, + "id": "2e5bf86a", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "I modyfikować instalując zależności:" ] }, { "cell_type": "markdown", - "id": "399a8b45", - "metadata": {}, + "id": "256149c4", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "```\n", "(myenv) $ python3 -m pip install requests\n", @@ -79,16 +111,24 @@ }, { "cell_type": "markdown", - "id": "efde93a2", - "metadata": {}, + "id": "7fbba7d3", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Środowisko możemy deaktywować poprzez:" ] }, { "cell_type": "markdown", - "id": "838c5ebd", - "metadata": {}, + "id": "a2d688b7", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "```\n", "deactivate\n", @@ -97,16 +137,37 @@ }, { "cell_type": "markdown", - "id": "0557e0c8", - "metadata": {}, + "id": "0d3eb6d4", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Środowisko można udostępnić kopiując cały katalog ze środowiskiem" ] }, { "cell_type": "markdown", - "id": "26f253cb", + "id": "90605b49", "metadata": {}, + "source": [ + "## pipx\n", + " - pipx: polecenie, które instaluje moduł Pythonowy w odrębnym środowisku wirtualnym venv\n", + " - jednocześnie dodaje powiązane z nim polecenie (\"Command line entry point\") do zmiennej `PATH`\n", + " - w ten sposób możemy zainstalować polecenie, które będzie globalnie dostępne a 
jednocześnie nie będzie \"mieszało\" w zależnościach modułów Pythonowych. Umożliwia to uniknięcie konfliktów między zależnościami i jednocześnie umożliwia dostęp do polecenia oferowanego przez moduł z poziomu systemu (bez ręcznej aktywacji środowiska)\n", + " - więcej informacji: https://packaging.python.org/guides/installing-stand-alone-command-line-tools/\n", + " - https://github.com/pypa/pipx" + ] + }, + { + "cell_type": "markdown", + "id": "26f253cb", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "## Conda\n", "> *Package, dependency and environment management for any language—Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, FORTRAN, and more.*\n", @@ -122,15 +183,19 @@ "Różnice między Conda a venv [źródło](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/environments.html#virtual-environments):\n", " - Dowolna wersja Python w Conda (inna niż systemowa)\n", " - Conda zarządza też zależnościami innymi niż Pythonowe\n", - " - Paczki w PyPI (używane przez P" + " - Paczki w PyPI (używane przez `pip`) pochodzą od ich autorów. 
Paczki w conda są  budowane przez conda albo społeczność conda-forge" ] }, { "cell_type": "markdown", "id": "57f19a08", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ - "## Dystrybucje: Anaconda i Conda\n", + "## Dystrybucje: Anaconda i MiniConda\n", "Conda jest dostępna w dwóch dystrybcjach:\n", " - [Miniconda](https://docs.conda.io/en/latest/miniconda.html):\n", " - wymaga 400 MB miejsca na dysku\n", @@ -149,8 +214,12 @@ }, { "cell_type": "markdown", - "id": "b199274c", - "metadata": {}, + "id": "d6d5156a", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "## Dystrybucje\n", " - Wersje paczek/bibliotek zawartych w danej dystrybucji są przetestowane pod względem zgodności ze sobą\n", @@ -162,8 +231,12 @@ }, { "cell_type": "markdown", - "id": "b1d8c391", - "metadata": {}, + "id": "61d89bd2", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "```\n", "$ conda update conda\n", @@ -253,7 +326,11 @@ { "cell_type": "markdown", "id": "1c7b2930", - "metadata": {}, + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "## Instalacja\n", "Instrukcje: \n", @@ -264,8 +341,12 @@ }, { "cell_type": "markdown", - "id": "3edc108c", - "metadata": {}, + "id": "0ceff229", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "## Pakiety Conda\n", "Pakiet (Package) conda to archiwum o rozszerzeniu `.tar.bz2` lub `.conda`zawierające:\n", @@ -285,8 +366,12 @@ }, { "cell_type": "markdown", - "id": "3094ba52", - "metadata": {}, + "id": "f54f3bdb", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "### Repozytoria i kanały\n", "- Pakiety mogą być ściągane z różnych kanałów (\"channels\")\n", @@ -298,8 +383,12 @@ }, { "cell_type": "markdown", - "id": "8c045d64", - "metadata": {}, + "id": "5ab846d0", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Na przykład, pakiet `mlflow` nie 
jest dostępny na oficjalnym kanale:" ] @@ -307,8 +396,12 @@ { "cell_type": "code", "execution_count": 6, - "id": "9d72552f", - "metadata": {}, + "id": "f301bdf7", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -347,8 +440,12 @@ }, { "cell_type": "markdown", - "id": "123b3e4e", - "metadata": {}, + "id": "ea36f2cd", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Do poleceń `search` i `install` możemy dodać flagę `channel` co doda podany kanał do listy przeszukiwanych przez to polecenie kanałów:" ] @@ -356,8 +453,12 @@ { "cell_type": "code", "execution_count": 7, - "id": "d55a2d84", - "metadata": {}, + "id": "61911bf5", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -475,8 +576,12 @@ }, { "cell_type": "markdown", - "id": "a4994ed6", - "metadata": {}, + "id": "c7cea2c5", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Możemy dodać kanał `conda-forge` tak, żeby był używany automatycznie (bez podawania flagi `channel`):" ] @@ -484,8 +589,12 @@ { "cell_type": "code", "execution_count": 11, - "id": "0f93ee82", - "metadata": {}, + "id": "df31755c", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -503,8 +612,12 @@ }, { "cell_type": "markdown", - "id": "85d7014f", - "metadata": {}, + "id": "33b4d692", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Polecanie `conda info` pokaże nam m.in. 
używane domyślnie kanały.\n", "Możem dodawać i usuwać kanały oraz zmieniać ich kolejność (priorytet) edytując plik [`~/.condarc`](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html)" @@ -513,8 +626,12 @@ { "cell_type": "code", "execution_count": 10, - "id": "e57a28b9", - "metadata": {}, + "id": "b5621afa", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -564,8 +681,12 @@ }, { "cell_type": "markdown", - "id": "9e67e25c", - "metadata": {}, + "id": "cb1df295", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Jak widać, po dodaniu kanału `conda-forge`, pakiet `mlflow` zostaje znaleziony:" ] @@ -573,8 +694,12 @@ { "cell_type": "code", "execution_count": 9, - "id": "56e51ade", - "metadata": {}, + "id": "92b5da5b", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -692,8 +817,12 @@ }, { "cell_type": "markdown", - "id": "327015c2", - "metadata": {}, + "id": "7ab96758", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "\n", "## Środowiska (Environments)\n", @@ -706,8 +835,12 @@ }, { "cell_type": "markdown", - "id": "0ef28a62", - "metadata": {}, + "id": "5c57462d", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Wyświetlanie listy środowisk:" ] @@ -715,8 +848,12 @@ { "cell_type": "code", "execution_count": 12, - "id": "2865a5cf", - "metadata": {}, + "id": "638002cd", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -741,8 +878,12 @@ }, { "cell_type": "markdown", - "id": "c710e400", - "metadata": {}, + "id": "e7e40897", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "### Tworzenie środowiska\n", "Środowisko można utworzyć i skonfigurować interaktywnie, lub z pliku `*.yml`" @@ -751,8 +892,12 @@ { "cell_type": "code", "execution_count": 14, - "id": "8cd7d3f0", - 
"metadata": {}, + "id": "0373de68", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -790,8 +935,12 @@ }, { "cell_type": "markdown", - "id": "843c0104", - "metadata": {}, + "id": "c8688ba6", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "### Aktywacja środowiska\n", " - Żeby zmodyfikować środowisko albo zacząć z niego korzystać, musimy je aktywować.\n", @@ -801,8 +950,12 @@ }, { "cell_type": "markdown", - "id": "a6adf414", - "metadata": {}, + "id": "767f03e6", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "```\n", "$ conda activate hello_env\n", @@ -813,8 +966,12 @@ }, { "cell_type": "markdown", - "id": "8457c95e", - "metadata": {}, + "id": "245e64be", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Domyślnie wersja pythona będzie taka sama jak systemowa.\n", "Żeby deaktywować środowisko, używamy `conda deactivate`:" @@ -822,8 +979,12 @@ }, { "cell_type": "markdown", - "id": "e3175dc3", - "metadata": {}, + "id": "261f9aab", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "```\n", "$ conda deactivate\n", @@ -834,8 +995,12 @@ }, { "cell_type": "markdown", - "id": "753ff7e5", - "metadata": {}, + "id": "980c295c", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "### Modyfikowanie środowiska\n", "Jeśli chcemy stworzyc środowisko z inną wersją niż systemowa:" @@ -844,8 +1009,12 @@ { "cell_type": "code", "execution_count": 22, - "id": "5b5b0ca0", - "metadata": {}, + "id": "ccaadc75", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -956,16 +1125,24 @@ }, { "cell_type": "markdown", - "id": "a3350d71", - "metadata": {}, + "id": "68bcd671", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Rzeczywiście, stworzone środowisko ma wersję Pythona z linii 3.9:" ] }, { "cell_type": "markdown", - 
"id": "ee072a93", - "metadata": {}, + "id": "db3311c4", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "```\n", "$ conda activate myenv\n", @@ -976,8 +1153,12 @@ }, { "cell_type": "markdown", - "id": "89d443aa", - "metadata": {}, + "id": "10b5cf2e", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Tak samo, przy tworzeniu środowiska możemy podać inne zależności wraz z ich wersjami:" ] @@ -985,8 +1166,12 @@ { "cell_type": "code", "execution_count": 25, - "id": "402c122e", - "metadata": {}, + "id": "dd2eefc5", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -1246,16 +1431,24 @@ }, { "cell_type": "markdown", - "id": "4f9b7b29", - "metadata": {}, + "id": "d8417e18", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Możemy podać jakie pakiety i w jakich wersjach mają być domyślnie dodawane do nowo tworzonych środowisk za pomocą sekcji [create_default_package](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#config-add-default-pkgs) w pliku [`~/.condarc`](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html)" ] }, { "cell_type": "markdown", - "id": "50346792", - "metadata": {}, + "id": "7a2adbfa", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "```\n", "create_default_packages:\n", @@ -1267,8 +1460,12 @@ }, { "cell_type": "markdown", - "id": "01a9ea21", - "metadata": {}, + "id": "68c9045a", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Doinstalowanie nowych pakietów może odbyć się poprzez:\n", " - `conda install mlflow` - wywołane w aktywowanym środowisku\n", @@ -1278,8 +1475,12 @@ }, { "cell_type": "markdown", - "id": "7ec5430e", - "metadata": {}, + "id": "a6c3ee99", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "#### Klonowanie środowiska\n", "Istniejące środowisko 
można skopiować:" @@ -1288,8 +1489,12 @@ { "cell_type": "code", "execution_count": null, - "id": "18f72668", - "metadata": {}, + "id": "4363044b", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [], "source": [ "conda create --name myclone --clone myenv\n" @@ -1297,8 +1502,12 @@ }, { "cell_type": "markdown", - "id": "3bcaf2e6", - "metadata": {}, + "id": "856b631d", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "### Eksportowanie środowisk\n", "- Definicję środowiska można wyeksportować do pliku `*.yml`, który może potem posłużyć do jego odtworzenia\n", @@ -1308,8 +1517,12 @@ { "cell_type": "code", "execution_count": 26, - "id": "f901b34b", - "metadata": {}, + "id": "3f497efb", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -1354,8 +1567,12 @@ }, { "cell_type": "markdown", - "id": "c9fb78ca", - "metadata": {}, + "id": "69054adb", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "Normalnie, zapisalibyśmy wyni eksportu do pliku:" ] @@ -1363,8 +1580,12 @@ { "cell_type": "code", "execution_count": 31, - "id": "048760fd", - "metadata": {}, + "id": "08bb0906", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -1381,8 +1602,12 @@ }, { "cell_type": "markdown", - "id": "7e3c1e11", - "metadata": {}, + "id": "8d0fd480", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "#### Eksport między systemami\n", " - Jeśli chcemy zapeniwć, że nasze środowisko będzie można odtworzyć na innym systemie, musimy uyżyć flagi `--from-history`\n", @@ -1392,8 +1617,12 @@ { "cell_type": "code", "execution_count": 29, - "id": "43fcb023", - "metadata": {}, + "id": "9bfbf736", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -1421,8 +1650,12 @@ }, { "cell_type": "markdown", - "id": "9b67fddd", - "metadata": {}, + "id": 
"52965351", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ "### Tworzenie środowiska z pliku `*.yml`\n", "Mając plik `*.yml` wyeksportowany za pomocą `conda env export` albo stworzony/zmodyfikowane [ręcznie](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually), możemy utworzyć na jego podstawie środowisko:" @@ -1431,8 +1664,12 @@ { "cell_type": "code", "execution_count": 32, - "id": "4a4ce330", - "metadata": {}, + "id": "efe7bcd9", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "outputs": [ { "name": "stdout", @@ -1463,10 +1700,14 @@ }, { "cell_type": "markdown", - "id": "f37e7aa7", - "metadata": {}, + "id": "b88eb07c", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, "source": [ - "## Zadania\n", + "## Zadania [10pkt]\n", "1. Zainstaluj Anaconda lub Miniconda na swoim komputerze\n", "2. Stwórz środowisko zawierające wszystkie zależności wymagane przez stworzone na zajęciach skrypty/programy\n", "3. Wyeksportuj środowisko do pliku `environment.yml` i dodaj ten plik do repozytorium" @@ -1474,6 +1715,7 @@ } ], "metadata": { + "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", diff --git a/IUM_10.DVC.ipynb b/IUM_10.DVC.ipynb new file mode 100644 index 0000000..1c080cd --- /dev/null +++ b/IUM_10.DVC.ipynb @@ -0,0 +1,1387 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0c6f27a5", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# DVC\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "560eec71", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## DVC - Data Version Control\n", + "- [dvc.org](https://dvc.org/)\n", + "- \"Version Control System for Machine Learning Projects\" (System kontroli wersji dla projektów uczenia maszynowego)\n", + "- Open Source\n", + "- Umożliwia:\n", + " - wersjonowanie danych i modeli. 
\"Git dla danych i modeli\"\n", + " - budowanie potoków (\"pipeline\") definiujących jak budować/trenować/ewaluować modele. \"Makefile dla uczenia maszynowego\"\n", + " - śledzeniem, porównywanie metryk i parametrów\n", + "- ściśle zintegowany z gitem\n", + "- działa niezależnie od używanego języka/bibliotek i systemu operacyjnego\n", + "- 5-minutowe wprowadzenie: https://www.youtube.com/watch?v=UbL7VUpv1Bs&t=197s" + ] + }, + { + "cell_type": "markdown", + "id": "9bfb356e", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Instalacja i inicjalizacja\n", + " - https://dvc.org/doc/install\n", + " - ```pip(x) install dvc``` albo:\n", + " - ```conda install dvc```" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "054c7a11", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Collecting package metadata (current_repodata.json): done\n", + "Solving environment: failed with initial frozen solve. 
Retrying with flexible solve.\n", + "Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.\n", + "Collecting package metadata (repodata.json): done\n", + "Solving environment: done\n", + "\n", + "## Package Plan ##\n", + "\n", + " environment location: /home/tomek/miniconda3\n", + "\n", + " added / updated specs:\n", + " - dvc\n", + "\n", + "\n", + "The following packages will be downloaded:\n", + "\n", + " package | build\n", + " ---------------------------|-----------------\n", + " atpublic-1.0 | py_0 7 KB conda-forge\n", + " bzip2-1.0.8 | h7f98852_4 484 KB conda-forge\n", + " cached-property-1.5.2 | hd8ed1ab_1 4 KB conda-forge\n", + " cached_property-1.5.2 | pyha770c72_1 11 KB conda-forge\n", + " colorama-0.4.4 | pyh9f0ad1d_0 18 KB conda-forge\n", + " commonmark-0.9.1 | py_0 46 KB conda-forge\n", + " configobj-5.0.6 | py_0 31 KB conda-forge\n", + " dictdiffer-0.8.1 | pyhd8ed1ab_0 16 KB conda-forge\n", + " diskcache-5.2.1 | pyh44b312d_0 36 KB conda-forge\n", + " distro-1.5.0 | pyh9f0ad1d_0 20 KB conda-forge\n", + " dpath-2.0.1 | py39hf3d152e_0 23 KB conda-forge\n", + " dulwich-0.20.23 | py39h3811e60_0 721 KB conda-forge\n", + " dvc-2.1.0 | py39hf3d152e_0 551 KB conda-forge\n", + " flatten-dict-0.3.0 | pyh9f0ad1d_0 11 KB conda-forge\n", + " flufl.lock-3.2 | py_0 19 KB conda-forge\n", + " fsspec-0.9.0 | pyhd8ed1ab_2 75 KB conda-forge\n", + " ftfy-5.5.1 | py_0 47 KB conda-forge\n", + " funcy-1.16 | pyhd8ed1ab_0 30 KB conda-forge\n", + " future-0.18.2 | py39hf3d152e_3 718 KB conda-forge\n", + " grandalf-0.6 | py_0 42 KB conda-forge\n", + " jsonpath-ng-1.5.2 | pyh9f0ad1d_0 26 KB conda-forge\n", + " libgit2-1.1.0 | h0b03e73_0 693 KB conda-forge\n", + " libssh2-1.9.0 | ha56f1ee_6 226 KB conda-forge\n", + " mailchecker-4.0.7 | pyhd8ed1ab_0 206 KB conda-forge\n", + " nanotime-0.5.2 | py_0 6 KB conda-forge\n", + " networkx-2.5 | py_0 1.2 MB conda-forge\n", + " pathlib2-2.3.5 | py39hf3d152e_3 35 KB conda-forge\n", + " 
pathspec-0.8.1 | pyhd3deb0d_0 29 KB conda-forge\n", + " pcre2-10.35 | h032f7d1_2 693 KB conda-forge\n", + " phonenumbers-8.10.14 | py_0 1.5 MB conda-forge\n", + " ply-3.11 | py_1 44 KB conda-forge\n", + " pyasn1-0.4.8 | py_0 53 KB conda-forge\n", + " pydot-1.2.4 | py_0 20 KB conda-forge\n", + " pygit2-1.5.0 | py39h3811e60_0 213 KB conda-forge\n", + " pygtrie-2.3.2 | pyh8c360ce_0 24 KB conda-forge\n", + " python-benedict-0.24.0 | pyhd8ed1ab_0 30 KB conda-forge\n", + " python-fsutil-0.5.0 | pyhd8ed1ab_0 13 KB conda-forge\n", + " python-slugify-5.0.2 | pyhd8ed1ab_0 12 KB conda-forge\n", + " rich-10.2.2 | py39hf3d152e_0 337 KB conda-forge\n", + " ruamel.yaml-0.17.4 | py39h3811e60_0 160 KB conda-forge\n", + " ruamel.yaml.clib-0.2.2 | py39h3811e60_2 173 KB conda-forge\n", + " shortuuid-1.0.1 | py39hf3d152e_4 15 KB conda-forge\n", + " shtab-1.3.6 | pyhd8ed1ab_0 15 KB conda-forge\n", + " text-unidecode-1.3 | py_0 68 KB conda-forge\n", + " toml-0.10.2 | pyhd8ed1ab_0 18 KB conda-forge\n", + " unidecode-1.2.0 | pyhd8ed1ab_0 155 KB conda-forge\n", + " voluptuous-0.12.1 | pyhd3deb0d_0 28 KB conda-forge\n", + " zc.lockfile-2.0 | py_0 11 KB conda-forge\n", + " ------------------------------------------------------------\n", + " Total: 8.8 MB\n", + "\n", + "The following NEW packages will be INSTALLED:\n", + "\n", + " _openmp_mutex conda-forge/linux-64::_openmp_mutex-4.5-1_gnu\n", + " appdirs conda-forge/noarch::appdirs-1.4.4-pyh9f0ad1d_0\n", + " atpublic conda-forge/noarch::atpublic-1.0-py_0\n", + " bzip2 conda-forge/linux-64::bzip2-1.0.8-h7f98852_4\n", + " cached-property conda-forge/noarch::cached-property-1.5.2-hd8ed1ab_1\n", + " cached_property conda-forge/noarch::cached_property-1.5.2-pyha770c72_1\n", + " colorama conda-forge/noarch::colorama-0.4.4-pyh9f0ad1d_0\n", + " commonmark conda-forge/noarch::commonmark-0.9.1-py_0\n", + " configobj conda-forge/noarch::configobj-5.0.6-py_0\n", + " dictdiffer conda-forge/noarch::dictdiffer-0.8.1-pyhd8ed1ab_0\n", + " diskcache 
conda-forge/noarch::diskcache-5.2.1-pyh44b312d_0\n", + " distro conda-forge/noarch::distro-1.5.0-pyh9f0ad1d_0\n", + " dpath conda-forge/linux-64::dpath-2.0.1-py39hf3d152e_0\n", + " dulwich conda-forge/linux-64::dulwich-0.20.23-py39h3811e60_0\n", + " dvc conda-forge/linux-64::dvc-2.1.0-py39hf3d152e_0\n", + " flatten-dict conda-forge/noarch::flatten-dict-0.3.0-pyh9f0ad1d_0\n", + " flufl.lock conda-forge/noarch::flufl.lock-3.2-py_0\n", + " fsspec conda-forge/noarch::fsspec-0.9.0-pyhd8ed1ab_2\n", + " ftfy conda-forge/noarch::ftfy-5.5.1-py_0\n", + " funcy conda-forge/noarch::funcy-1.16-pyhd8ed1ab_0\n", + " future conda-forge/linux-64::future-0.18.2-py39hf3d152e_3\n", + " gitdb conda-forge/noarch::gitdb-4.0.7-pyhd8ed1ab_0\n", + " gitpython conda-forge/noarch::gitpython-3.1.17-pyhd8ed1ab_0\n", + " grandalf conda-forge/noarch::grandalf-0.6-py_0\n", + " jsonpath-ng conda-forge/noarch::jsonpath-ng-1.5.2-pyh9f0ad1d_0\n", + " libgit2 conda-forge/linux-64::libgit2-1.1.0-h0b03e73_0\n", + " libgomp conda-forge/linux-64::libgomp-9.3.0-h2828fa1_19\n", + " libssh2 conda-forge/linux-64::libssh2-1.9.0-ha56f1ee_6\n", + " mailchecker conda-forge/noarch::mailchecker-4.0.7-pyhd8ed1ab_0\n", + " nanotime conda-forge/noarch::nanotime-0.5.2-py_0\n", + " networkx conda-forge/noarch::networkx-2.5-py_0\n", + " pathlib2 conda-forge/linux-64::pathlib2-2.3.5-py39hf3d152e_3\n", + " pathspec conda-forge/noarch::pathspec-0.8.1-pyhd3deb0d_0\n", + " pcre2 conda-forge/linux-64::pcre2-10.35-h032f7d1_2\n", + " phonenumbers conda-forge/noarch::phonenumbers-8.10.14-py_0\n", + " pip conda-forge/noarch::pip-21.1.2-pyhd8ed1ab_0\n", + " ply conda-forge/noarch::ply-3.11-py_1\n", + " pyasn1 conda-forge/noarch::pyasn1-0.4.8-py_0\n", + " pydot conda-forge/noarch::pydot-1.2.4-py_0\n", + " pygit2 conda-forge/linux-64::pygit2-1.5.0-py39h3811e60_0\n", + " pygtrie conda-forge/noarch::pygtrie-2.3.2-pyh8c360ce_0\n", + " python-benedict conda-forge/noarch::python-benedict-0.24.0-pyhd8ed1ab_0\n", + " python-fsutil 
conda-forge/noarch::python-fsutil-0.5.0-pyhd8ed1ab_0\n", + " python-slugify conda-forge/noarch::python-slugify-5.0.2-pyhd8ed1ab_0\n", + " rich conda-forge/linux-64::rich-10.2.2-py39hf3d152e_0\n", + " ruamel.yaml conda-forge/linux-64::ruamel.yaml-0.17.4-py39h3811e60_0\n", + " ruamel.yaml.clib conda-forge/linux-64::ruamel.yaml.clib-0.2.2-py39h3811e60_2\n", + " shortuuid conda-forge/linux-64::shortuuid-1.0.1-py39hf3d152e_4\n", + " shtab conda-forge/noarch::shtab-1.3.6-pyhd8ed1ab_0\n", + " smmap conda-forge/noarch::smmap-3.0.5-pyh44b312d_0\n", + " tabulate conda-forge/noarch::tabulate-0.8.9-pyhd8ed1ab_0\n", + " text-unidecode conda-forge/noarch::text-unidecode-1.3-py_0\n", + " toml conda-forge/noarch::toml-0.10.2-pyhd8ed1ab_0\n", + " typing_extensions conda-forge/noarch::typing_extensions-3.7.4.3-py_0\n", + " unidecode conda-forge/noarch::unidecode-1.2.0-pyhd8ed1ab_0\n", + " voluptuous conda-forge/noarch::voluptuous-0.12.1-pyhd3deb0d_0\n", + " wheel conda-forge/noarch::wheel-0.36.2-pyhd3deb0d_0\n", + " zc.lockfile conda-forge/noarch::zc.lockfile-2.0-py_0\n", + "\n", + "The following packages will be UPDATED:\n", + "\n", + " certifi pkgs/main::certifi-2020.12.5-py39h06a~ --> conda-forge::certifi-2020.12.5-py39hf3d152e_1\n", + " libgcc-ng pkgs/main::libgcc-ng-9.1.0-hdf63c60_0 --> conda-forge::libgcc-ng-9.3.0-h2828fa1_19\n", + "\n", + "The following packages will be SUPERSEDED by a higher-priority channel:\n", + "\n", + " _libgcc_mutex pkgs/main::_libgcc_mutex-0.1-main --> conda-forge::_libgcc_mutex-0.1-conda_forge\n", + " ca-certificates pkgs/main::ca-certificates-2021.4.13-~ --> conda-forge::ca-certificates-2020.12.5-ha878542_0\n", + " conda pkgs/main::conda-4.10.1-py39h06a4308_1 --> conda-forge::conda-4.10.1-py39hf3d152e_0\n", + " openssl pkgs/main::openssl-1.1.1k-h27cfd23_0 --> conda-forge::openssl-1.1.1k-h7f98852_0\n", + "\n", + "\n", + "\n", + "Downloading and Extracting Packages\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"diskcache-5.2.1 | 36 KB | ##################################### | 100% \n", + "pathspec-0.8.1 | 29 KB | ##################################### | 100% \n", + "cached-property-1.5. | 4 KB | ##################################### | 100% \n", + "networkx-2.5 | 1.2 MB | ##################################### | 100% \n", + "commonmark-0.9.1 | 46 KB | ##################################### | 100% \n", + "configobj-5.0.6 | 31 KB | ##################################### | 100% \n", + "python-fsutil-0.5.0 | 13 KB | ##################################### | 100% \n", + "fsspec-0.9.0 | 75 KB | ##################################### | 100% \n", + "dulwich-0.20.23 | 721 KB | ##################################### | 100% \n", + "funcy-1.16 | 30 KB | ##################################### | 100% \n", + "bzip2-1.0.8 | 484 KB | ##################################### | 100% \n", + "ply-3.11 | 44 KB | ##################################### | 100% \n", + "libgit2-1.1.0 | 693 KB | ##################################### | 100% \n", + "ftfy-5.5.1 | 47 KB | ##################################### | 100% \n", + "nanotime-0.5.2 | 6 KB | ##################################### | 100% \n", + "pyasn1-0.4.8 | 53 KB | ##################################### | 100% \n", + "unidecode-1.2.0 | 155 KB | ##################################### | 100% \n", + "dvc-2.1.0 | 551 KB | ##################################### | 100% \n", + "pydot-1.2.4 | 20 KB | ##################################### | 100% \n", + "zc.lockfile-2.0 | 11 KB | ##################################### | 100% \n", + "dpath-2.0.1 | 23 KB | ##################################### | 100% \n", + "pcre2-10.35 | 693 KB | ##################################### | 100% \n", + "ruamel.yaml-0.17.4 | 160 KB | ##################################### | 100% \n", + "flatten-dict-0.3.0 | 11 KB | ##################################### | 100% \n", + "python-slugify-5.0.2 | 12 KB | ##################################### | 100% \n", + "shortuuid-1.0.1 | 15 KB | 
##################################### | 100% \n", + "text-unidecode-1.3 | 68 KB | ##################################### | 100% \n", + "cached_property-1.5. | 11 KB | ##################################### | 100% \n", + "colorama-0.4.4 | 18 KB | ##################################### | 100% \n", + "flufl.lock-3.2 | 19 KB | ##################################### | 100% \n", + "libssh2-1.9.0 | 226 KB | ##################################### | 100% \n", + "python-benedict-0.24 | 30 KB | ##################################### | 100% \n", + "distro-1.5.0 | 20 KB | ##################################### | 100% \n", + "grandalf-0.6 | 42 KB | ##################################### | 100% \n", + "future-0.18.2 | 718 KB | ##################################### | 100% \n", + "ruamel.yaml.clib-0.2 | 173 KB | ##################################### | 100% \n", + "rich-10.2.2 | 337 KB | ##################################### | 100% \n", + "shtab-1.3.6 | 15 KB | ##################################### | 100% \n", + "pygtrie-2.3.2 | 24 KB | ##################################### | 100% \n", + "mailchecker-4.0.7 | 206 KB | ##################################### | 100% \n", + "voluptuous-0.12.1 | 28 KB | ##################################### | 100% \n", + "atpublic-1.0 | 7 KB | ##################################### | 100% \n", + "phonenumbers-8.10.14 | 1.5 MB | ##################################### | 100% \n", + "pathlib2-2.3.5 | 35 KB | ##################################### | 100% \n", + "pygit2-1.5.0 | 213 KB | ##################################### | 100% \n", + "dictdiffer-0.8.1 | 16 KB | ##################################### | 100% \n", + "toml-0.10.2 | 18 KB | ##################################### | 100% \n", + "jsonpath-ng-1.5.2 | 26 KB | ##################################### | 100% \n", + "Preparing transaction: done\n", + "Verifying transaction: done\n", + "Executing transaction: done\n", + "\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + 
"source": [ + "conda install dvc" + ] + }, + { + "cell_type": "markdown", + "id": "20975d62", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "Stwórzmy katalog, w którym będziemy przechowywać nasz projekt:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "aae59ec2", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "!mkdir -p IUM_10/sample-ml-project" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "1e522a93", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/home/tomek/AITech/repo/aitech-ium-private/IUM_10/sample-ml-project\n" + ] + } + ], + "source": [ + "#Jupyter notebook magic https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd\n", + "%cd \"IUM_10/sample-ml-project\"" + ] + }, + { + "cell_type": "markdown", + "id": "199c0d92", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "Inicjalizujemy puste repozytorium Git (możemy też pominąć ten krok i działać w istniejącym już repozytorium)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "c13c525b", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Initialized empty Git repository in /home/tomek/AITech/repo/aitech-ium-private/IUM_10/sample-ml-project/.git/\r\n" + ] + } + ], + "source": [ + "!git init" + ] + }, + { + "cell_type": "markdown", + "id": "c7155369", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "Teraz inicjalizujemy repozytorium DVC:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "44f28226", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Initialized DVC 
repository.\n", + "\n", + "You can now commit the changes to git.\n", + "\n", + "\u001b[31m+---------------------------------------------------------------------+\n", + "\u001b[0m\u001b[31m|\u001b[0m \u001b[31m|\u001b[0m\n", + "\u001b[31m|\u001b[0m DVC has enabled anonymous aggregate usage analytics. \u001b[31m|\u001b[0m\n", + "\u001b[31m|\u001b[0m Read the analytics documentation (and how to opt-out) here: \u001b[31m|\u001b[0m\n", + "\u001b[31m|\u001b[0m <\u001b[36mhttps://dvc.org/doc/user-guide/analytics\u001b[39m> \u001b[31m|\u001b[0m\n", + "\u001b[31m|\u001b[0m \u001b[31m|\u001b[0m\n", + "\u001b[31m+---------------------------------------------------------------------+\n", + "\u001b[0m\n", + "\u001b[33mWhat's next?\u001b[39m\n", + "\u001b[33m------------\u001b[39m\n", + "- Check out the documentation: <\u001b[36mhttps://dvc.org/doc\u001b[39m>\n", + "- Get help and share ideas: <\u001b[36mhttps://dvc.org/chat\u001b[39m>\n", + "- Star us on GitHub: <\u001b[36mhttps://github.com/iterative/dvc\u001b[39m>\n", + "\u001b[0m" + ] + } + ], + "source": [ + "!dvc init" + ] + }, + { + "cell_type": "markdown", + "id": "00bc72ed", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "Let's see which files DVC added (they are also staged in the Git repository).\n", + "They are described here: https://dvc.org/doc/user-guide/project-structure/internal-files" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "d1aefe16", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "On branch master\r\n", + "\r\n", + "No commits yet\r\n", + "\r\n", + "Changes to be committed:\r\n", + " (use \"git rm --cached <file>...\" to unstage)\r\n", + "\t\u001b[32mnew file: .dvc/.gitignore\u001b[m\r\n", + "\t\u001b[32mnew file: .dvc/config\u001b[m\r\n", + "\t\u001b[32mnew file: .dvc/plots/confusion.json\u001b[m\r\n", + "\t\u001b[32mnew file: 
.dvc/plots/confusion_normalized.json\u001b[m\r\n", + "\t\u001b[32mnew file: .dvc/plots/default.json\u001b[m\r\n", + "\t\u001b[32mnew file: .dvc/plots/linear.json\u001b[m\r\n", + "\t\u001b[32mnew file: .dvc/plots/scatter.json\u001b[m\r\n", + "\t\u001b[32mnew file: .dvc/plots/smooth.json\u001b[m\r\n", + "\t\u001b[32mnew file: .dvcignore\u001b[m\r\n", + "\r\n" + ] + } + ], + "source": [ + "!git status" + ] + }, + { + "cell_type": "markdown", + "id": "72e0a272", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "We can now commit the changes to Git:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "59780e99", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "On branch master\r\n", + "nothing to commit, working tree clean\r\n" + ] + } + ], + "source": [ + "!git commit -m \"Initial commit\"" + ] + }, + { + "cell_type": "markdown", + "id": "dd8e529b", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## Tracking files with DVC\n", + " - large files, such as input data files or model files, are hard to manage with Git because of problems with:\n", + " - performance\n", + " - repository storage space\n", + " - Git has an extension, [LFS (Large File Storage)](https://git-lfs.github.com/), which partially solves this problem. 
The files themselves are stored on a dedicated remote server, while the repository keeps only pointers to those files and some metadata\n", + " - DVC takes a similar approach, but:\n", + " - files can be stored on almost any server, or locally\n", + " - there is no file size limit (with Git LFS the limit is most often 2 GB)\n", + " - DVC provides additional tooling for tracking files and their links to experiment results\n", + " - for more, see [here](https://dvc.org/doc/user-guide/related-technologies)" + ] + }, + { + "cell_type": "markdown", + "id": "a8861abe", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "Let's prepare some sample data by downloading it from Kaggle:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "f05ece1b", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Downloading iris.zip to /home/tomek/AITech/repo/aitech-ium-private/IUM_10/sample-ml-project\n", + " 0%| | 0.00/3.60k [00:00...\" to include in what will be committed)\r\n", + "\t\u001b[31mdata/.gitignore\u001b[m\r\n", + "\t\u001b[31mdata/Iris.csv.dvc\u001b[m\r\n", + "\t\u001b[31miris.zip\u001b[m\r\n", + "\r\n", + "nothing added to commit but untracked files present (use \"git add\" to track)\r\n" + ] + } + ], + "source": [ + "!git status -u" + ] + }, + { + "cell_type": "markdown", + "id": "8589fecf", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "Let's add the files `data/Iris.csv.dvc` and `data/.gitignore` to the Git repository, as DVC suggests:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "460c4a17", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "!git add data/Iris.csv.dvc data/.gitignore" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "80644077", + "metadata": { + "slideshow": { + 
"slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[master cc0821a] Dodano dane IRIS (DVC)\r\n", + " 2 files changed, 5 insertions(+)\r\n", + " create mode 100644 data/.gitignore\r\n", + " create mode 100644 data/Iris.csv.dvc\r\n" + ] + } + ], + "source": [ + "!git commit -m \"Dodano dane IRIS (DVC)\"" + ] + }, + { + "cell_type": "markdown", + "id": "03899863", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "A `*.dvc` file contains, among other things, the hash of the tracked file. More about `*.dvc` files: [link](https://dvc.org/doc/user-guide/project-structure/dvc-files)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8cb2ba7c", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [], + "source": [ + "# %load data/Iris.csv.dvc\n", + "outs:\n", + "- md5: 717820ef0af287ff346c5cabfb4c612c\n", + " size: 5107\n", + " path: Iris.csv\n" + ] + }, + { + "cell_type": "markdown", + "id": "0b421d45", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "The original `Iris.csv` file was moved to the `.dvc/cache/` directory, under a path derived from its hash (here `.dvc/cache/71/7820ef0af287ff346c5cabfb4c612c`), and linked back to its original location. How the links are created [depends on the file system](https://dvc.org/doc/user-guide/large-dataset-optimization)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "1d471f3a", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "total 8\r\n", + "-r--r--r-- 1 tomek tomek 5107 wrz 19 2019 7820ef0af287ff346c5cabfb4c612c\r\n" + ] + } + ], + "source": [ + "!ls -l .dvc/cache/71" + ] + }, + { + "cell_type": "markdown", + "id": "901e8e90", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## dvc remote\n", + " - to send the actual files tracked by DVC to a remote location (from which they can be fetched, e.g. by a CI system or by other users), such a location must first be configured\n", + " - the [`dvc remote add`](https://dvc.org/doc/command-reference/remote/add) command does this\n", + " - we will use a local \"remote\". Here it is simply a previously created directory, `/dvcstore`. The same directory also exists on our Jenkins server; it has to be mounted in the Docker container, of course\n", + " - in real-world use we would instead give the address of a resource reachable over the internet, e.g. an SFTP server or an AWS S3 path" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "731f6ea4", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Setting 'my_local_remote' as a default remote.\r\n", + "\u001b[0m" + ] + } + ], + "source": [ + "!dvc remote add -d my_local_remote /dvcstore" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "9c3deeaf", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "On branch master\r\n", + "Changes not staged for commit:\r\n", + " (use \"git add <file>...\" to update what will be committed)\r\n", + " (use \"git restore <file>...\" to discard changes in working directory)\r\n", + "\t\u001b[31mmodified: .dvc/config\u001b[m\r\n", + "\r\n", + "no changes added to commit (use \"git add\" and/or \"git commit -a\")\r\n" + ] + } + ], + "source": [ + "!git status" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "899eac7d", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[master 3ff62b6] Added DVC remote\r\n", + " 1 file changed, 4 insertions(+)\r\n" + ] + } + ], + "source": [ + "!git add .dvc/config\n", + "!git commit -m \"Added DVC remote\"" + ] + }, + { + "cell_type": "markdown", + "id": "8c556c96", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "## dvc push\n", + "Once a \"remote\" is configured, we can push files to it with the `dvc push` command:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "8ecf3091", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 0% Uploading| |0/1 [00:00\n", + "\tchanged outs:\n", + "\t\tmodified: data/Iris.csv\n", + "\u001b[0m" + ] + } 
+ ], + "source": [ + "!dvc status" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "bf6c1067", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Adding... \n", + "!\u001b[A\n", + " 0%| |.TatTHknArFHCT9iDCtxHzh.tmp 0.00/5.07k [00:00 'data/data.xml'\n", + " 0% Downloading| |0/1 [00:00