{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7fe475ae",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<h1> Inżynieria uczenia maszynowego </h1>\n",
    "<h2> 10. <i>DVC</i>  [laboratoria]</h2> \n",
    "<h3> Tomasz Ziętkiewicz (2022)</h3>\n",
    "</div>\n",
    "\n",
    "![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c6f27a5",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<img src=\"img/expcontrol/dvc-logo.png\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "560eec71",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## DVC - Data Version Control\n",
    "- [dvc.org](https://dvc.org/)\n",
    "- \"Version Control System for Machine Learning Projects\" (System kontroli wersji dla projektów uczenia maszynowego)\n",
    "- Open Source\n",
    "- Umożliwia:\n",
    "  - wersjonowanie danych i modeli. \"Git dla danych i modeli\"\n",
    "  - budowanie potoków (\"pipeline\") definiujących jak budować/trenować/ewaluować modele. \"Makefile dla uczenia maszynowego\"\n",
    "  - śledzenie, porównywanie metryk i parametrów\n",
    "- ściśle zintegowany z gitem\n",
    "- działa niezależnie od używanego języka/bibliotek i systemu operacyjnego\n",
    "- 5-minutowe wprowadzenie: https://www.youtube.com/watch?v=UbL7VUpv1Bs&t=197s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9bfb356e",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Instalacja i inicjalizacja\n",
    " - https://dvc.org/doc/install\n",
    " - ```pip(x) install dvc``` albo:\n",
    " - ```conda install dvc```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "054c7a11",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting dvc\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/96/9d/9cca62742cb99da8002c9d6ac6c7463344deb60a219ca4f3a9778b02c67b/dvc-2.8.1-py3-none-any.whl (386kB)\n",
      "\u001b[K    100% |████████████████████████████████| 389kB 1.4MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting setuptools>=34.0.0 (from dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/b0/3a/88b210db68e56854d0bcf4b38e165e03be377e13907746f825790f3df5bf/setuptools-59.6.0-py3-none-any.whl\n",
      "Collecting colorama>=0.3.9 (from dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/44/98/5b86278fbbf250d239ae0ecb724f8572af1c91f4a11edf4d36a206189440/colorama-0.4.4-py2.py3-none-any.whl\n",
      "Collecting dictdiffer>=0.8.1 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/47/ef/4cb333825d10317a36a1154341ba37e6e9c087bac99c1990ef07ffdb376f/dictdiffer-0.9.0-py2.py3-none-any.whl\n",
      "Collecting ply>=3.9 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/a3/58/35da89ee790598a0700ea49b2a66594140f44dec458c07e8e3d4979137fc/ply-3.11-py2.py3-none-any.whl (49kB)\n",
      "\u001b[K    100% |████████████████████████████████| 51kB 1.8MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting nanotime>=0.5.2 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/d5/54/6d5924f59cf671326e7809f4b3f70fa8df535d67e952ad0b6fea02f52faf/nanotime-0.5.2.tar.gz\n",
      "Collecting fsspec[http]>=2021.10.0 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/f6/90/32e53b96067954c2f916667f5a11634aeabef8ed70e83133ed8037b8111b/fsspec-2022.1.0-py3-none-any.whl (133kB)\n",
      "\u001b[K    100% |████████████████████████████████| 143kB 1.3MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting networkx>=2.5 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/f3/b7/c7f488101c0bb5e4178f3cde416004280fd40262433496830de8a8c21613/networkx-2.5.1-py3-none-any.whl (1.6MB)\n",
      "\u001b[K    100% |████████████████████████████████| 1.6MB 609kB/s ta 0:00:01\n",
      "\u001b[?25hCollecting flufl.lock<4,>=3.2 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/1e/68/393c148df629f90a919de653ebb967a8bd8c83d07d2bc3150ca0faff3940/flufl.lock-3.2.tar.gz\n",
      "Collecting shtab<2,>=1.3.4 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/37/c0/7d887589bb87db6b87b55315ac41acbbe7355074006a0968343d52ded5cd/shtab-1.5.4-py2.py3-none-any.whl\n",
      "Collecting shortuuid>=0.5.0 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/01/14/df1aa61e1bb75a6fff19b25d7d175b7f188dc15a70dfc80e7badd5bda1de/shortuuid-1.0.9-py3-none-any.whl\n",
      "Collecting gitpython>3 (from dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/55/60/f884f01eef2a7255875862ec1b12d57d74113ec6e8d9e16c4d254cd6aa3c/GitPython-3.1.20-py3-none-any.whl\n",
      "Collecting tqdm<5,>=4.45.0 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/8a/c4/d15f1e627fff25443ded77ea70a7b5532d6371498f9285d44d62587e209c/tqdm-4.64.0-py2.py3-none-any.whl (78kB)\n",
      "\u001b[K    100% |████████████████████████████████| 81kB 2.0MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting zc.lockfile>=1.2.1 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/6c/2a/268389776288f0f26c7272c70c36c96dcc0bdb88ab6216ea18e19df1fadd/zc.lockfile-2.0-py2.py3-none-any.whl\n",
      "Collecting rich>=10.9.0 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/05/c0/844c879d659e15c4b1fbe8be89b25de7cf9b0d5e8485aadfe83ae6e7286a/rich-12.4.1-py3-none-any.whl (231kB)\n",
      "\u001b[K    100% |████████████████████████████████| 235kB 1.5MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting aiohttp-retry>=2.4.5 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/0f/fe/58130a432d4397e174b82e57d8a11ccf5066631d03949dada4aac8b88041/aiohttp_retry-2.4.6-py3-none-any.whl\n",
      "Collecting voluptuous>=0.11.7 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/a7/68/927add5dfd55a0d666ffc8939ff4390b76ca3ffbc36c12369f9a034393cb/voluptuous-0.13.1-py3-none-any.whl\n",
      "Collecting psutil>=5.8.0 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/d6/de/0999ea2562b96d7165812606b18f7169307b60cd378bc29cf3673322c7e9/psutil-5.9.1.tar.gz (479kB)\n",
      "\u001b[K    100% |████████████████████████████████| 481kB 1.2MB/s ta 0:00:011\n",
      "\u001b[?25hCollecting flatten-dict<1,>=0.4.1 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/43/f5/ee39c6e92acc742c052f137b47c210cd0a1b72dcd3f98495528bb4d27761/flatten_dict-0.4.2-py2.py3-none-any.whl\n",
      "Collecting configobj>=5.0.6 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/64/61/079eb60459c44929e684fa7d9e2fdca403f67d64dd9dbac27296be2e0fab/configobj-5.0.6.tar.gz\n",
      "Collecting pygtrie>=2.3.2 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/a5/8b/90d0f21a27a354e808a73eb0ffb94db990ab11ad1d8b3db3e5196c882cad/pygtrie-2.4.2.tar.gz\n",
      "Collecting tabulate>=0.8.7 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/ca/80/7c0cad11bd99985cfe7c09427ee0b4f9bd6b048bd13d4ffb32c6db237dfb/tabulate-0.8.9-py3-none-any.whl\n",
      "Collecting dulwich>=0.20.23 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/50/88/d5f6b478086a78995a71151a956ac97d6635f0f6d80f625c96ba13cc612d/dulwich-0.20.40.tar.gz (423kB)\n",
      "\u001b[K    100% |████████████████████████████████| 430kB 1.1MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting appdirs>=1.4.3 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/3b/00/2344469e2084fb287c2e0b57b72910309874c3245463acd6cf5e3db69324/appdirs-1.4.4-py2.py3-none-any.whl\n",
      "Collecting ruamel.yaml>=0.17.11 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/9e/cb/938214ac358fbef7058343b3765c79a1b7ed0c366f7f992ce7ff38335652/ruamel.yaml-0.17.21-py3-none-any.whl (109kB)\n",
      "\u001b[K    100% |████████████████████████████████| 112kB 2.1MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting python-benedict>=0.24.2 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/77/84/6268f9c31140f23ef290c89cb91197f5ea6cbbf102a15db13c1eea845a22/python_benedict-0.25.1-py3-none-any.whl (41kB)\n",
      "\u001b[K    100% |████████████████████████████████| 51kB 2.2MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting grandalf==0.6 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/54/f4/a0b6a4c6d616d0a838b2dd0bc7bf74d73e8e8cdc880bab7fdb5fdc3d0e06/grandalf-0.6-py3-none-any.whl\n",
      "Collecting pyparsing==2.4.7 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl (67kB)\n",
      "\u001b[K    100% |████████████████████████████████| 71kB 2.1MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting dpath<3,>=2.0.2 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/9d/68/19b2a579121446ba80aa3574f04de6931ad549a22cd5588e43c8031de7f8/dpath-2.0.6-py3-none-any.whl\n",
      "Collecting pygit2<1.7,>=1.5.0; python_version < \"3.7\" (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/6b/23/a8c5b726a58282fe2cadcc63faaddd4be147c3c8e0bd38b233114adf98fd/pygit2-1.6.1.tar.gz (258kB)\n",
      "\u001b[K    100% |████████████████████████████████| 266kB 3.5MB/s eta 0:00:01\n",
      "\u001b[?25hCollecting dataclasses>=0.7; python_version < \"3.7\" (from dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/fe/ca/75fac5856ab5cfa51bbbcefa250182e50441074fdc3f803f6e76451fab43/dataclasses-0.8-py3-none-any.whl\n",
      "Collecting pyasn1>=0.4.1 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/62/1e/a94a8d635fa3ce4cfc7f506003548d0a2447ae76fd5ca53932970fe3053f/pyasn1-0.4.8-py2.py3-none-any.whl (77kB)\n",
      "\u001b[K    100% |████████████████████████████████| 81kB 2.2MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting pathspec<0.9.0,>=0.6.0 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/29/29/a465741a3d97ea3c17d21eaad4c64205428bde56742360876c4391f930d4/pathspec-0.8.1-py2.py3-none-any.whl\n",
      "Collecting contextvars>=2.1; python_version < \"3.7\" (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/83/96/55b82d9f13763be9d672622e1b8106c85acb83edd7cc2fa5bc67cd9877e9/contextvars-2.4.tar.gz\n",
      "Collecting distro>=1.3.0 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/e1/54/d08d1ad53788515392bec14d2d6e8c410bffdc127780a9a4aa8e6854d502/distro-1.7.0-py3-none-any.whl\n",
      "Collecting packaging>=19.0 (from dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/05/8e/8de486cbd03baba4deef4142bd643a3e7bbe954a784dc1bb17142572d127/packaging-21.3-py3-none-any.whl\n",
      "Collecting typing-extensions>=3.7.4 (from dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/45/6b/44f7f8f1e110027cf88956b59f2fad776cca7e1704396d043f89effd3a0e/typing_extensions-4.1.1-py3-none-any.whl\n",
      "Collecting importlib-metadata>=1.4; python_version < \"3.8\" (from dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/a0/a1/b153a0a4caf7a7e3f15c2cd56c7702e2cf3d89b1b359d1f1c5e59d68f4ce/importlib_metadata-4.8.3-py3-none-any.whl\n",
      "Collecting diskcache>=5.2.1 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/a1/c4/80d38cf6852ba87f8b506a91f18b3a485c668f452700689add960d7e2ecc/diskcache-5.4.0-py3-none-any.whl (44kB)\n",
      "\u001b[K    100% |████████████████████████████████| 51kB 2.1MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting pydot>=1.2.4 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/ea/76/75b1bb82e9bad3e3d656556eaa353d8cd17c4254393b08ec9786ac8ed273/pydot-1.4.2-py2.py3-none-any.whl\n",
      "Collecting funcy>=1.14 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/cd/1c/895001e29f870c4625a90af00895ef9c9f4f37b0a9b967d2ed810b7be0fc/funcy-1.17-py2.py3-none-any.whl\n",
      "Collecting toml>=0.10.1 (from dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/44/6f/7120676b6d73228c96e17f1f794d8ab046fc910d781c8d151120c3f1569e/toml-0.10.2-py2.py3-none-any.whl\n",
      "Collecting requests>=2.22.0 (from dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/2d/61/08076519c80041bc0ffa1a8af0cbd3bf3e2b62af10435d269a9d0f40564d/requests-2.27.1-py2.py3-none-any.whl (63kB)\n",
      "\u001b[K    100% |████████████████████████████████| 71kB 2.2MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting aiohttp; extra == \"http\" (from fsspec[http]>=2021.10.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/f6/3b/2e3b8a5b19cdceb532c61d83077a09afe1f120cb876fb771b0ce577cc0ea/aiohttp-3.8.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1MB)\n",
      "\u001b[K    100% |████████████████████████████████| 1.1MB 729kB/s ta 0:00:01\n",
      "\u001b[?25hCollecting decorator<5,>=4.3 (from networkx>=2.5->dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/ed/1b/72a1821152d07cf1d8b6fce298aeb06a7eb90f4d6d41acec9861e7cc6df0/decorator-4.4.2-py2.py3-none-any.whl\n",
      "Collecting atpublic (from flufl.lock<4,>=3.2->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/ab/3d/3df1468805427fedcf880da42fa26353feea3a31b5a0cc71008adcfdb816/atpublic-2.3.tar.gz\n",
      "Collecting gitdb<5,>=4.0.1 (from gitpython>3->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/a3/7c/5d747655049bfbf75b5fcec57c8115896cb78d6fafa84f6d3ef4c0f13a98/gitdb-4.0.9-py3-none-any.whl\n",
      "Collecting importlib-resources; python_version < \"3.7\" (from tqdm<5,>=4.45.0->dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/24/1b/33e489669a94da3ef4562938cd306e8fa915e13939d7b8277cb5569cb405/importlib_resources-5.4.0-py3-none-any.whl\n",
      "Collecting commonmark<0.10.0,>=0.9.0 (from rich>=10.9.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/b1/92/dfd892312d822f36c55366118b95d914e5f16de11044a27cf10a7d71bbbf/commonmark-0.9.1-py2.py3-none-any.whl (51kB)\n",
      "\u001b[K    100% |████████████████████████████████| 51kB 1.6MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting pygments<3.0.0,>=2.6.0 (from rich>=10.9.0->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/5c/8e/1d9017950034297fffa336c72e693a5b51bbf85141b24a763882cf1977b5/Pygments-2.12.0-py3-none-any.whl\n",
      "Collecting six<2.0,>=1.12 (from flatten-dict<1,>=0.4.1->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl\n",
      "Collecting urllib3>=1.24.1 (from dulwich>=0.20.23->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/ec/03/062e6444ce4baf1eac17a6a0ebfe36bb1ad05e1df0e20b110de59c278498/urllib3-1.26.9-py2.py3-none-any.whl (138kB)\n",
      "\u001b[K    100% |████████████████████████████████| 143kB 1.7MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting certifi (from dulwich>=0.20.23->dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/11/dd/e015f3780f42dd9af62cf0107b44ea1298926627ecd70c17b0e484e95bcd/certifi-2022.5.18.1-py3-none-any.whl (155kB)\n",
      "\u001b[K    100% |████████████████████████████████| 163kB 1.8MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting ruamel.yaml.clib>=0.2.6; platform_python_implementation == \"CPython\" and python_version < \"3.11\" (from ruamel.yaml>=0.17.11->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/2a/25/5b1dfc832ef3b83576c546d1fb3e27f136022cdd1008aab290a1e28ef220/ruamel.yaml.clib-0.2.6-cp36-cp36m-manylinux1_x86_64.whl (552kB)\n",
      "\u001b[K    100% |████████████████████████████████| 552kB 1.2MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting python-slugify<7.0.0,>=6.0.1 (from python-benedict>=0.24.2->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/c1/35/74ab800f1108b95ff9b8e7672a01dbf1f357159e6d06c1f16e983674ff0c/python_slugify-6.1.2-py2.py3-none-any.whl\n",
      "Collecting python-fsutil<1.0.0,>=0.6.0 (from python-benedict>=0.24.2->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/d3/67/3bceac53a29c2cf6a27ca9a940c7a059af13b5b4283d09368e9c0d7f91b4/python_fsutil-0.6.1-py3-none-any.whl\n",
      "Collecting mailchecker<5.0.0,>=4.1.0 (from python-benedict>=0.24.2->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/1a/11/49e85a526dd3f2cb6bdf1dc519d64395bd56a88d9b2a5f6f3acd1f8c3f51/mailchecker-4.1.17.tar.gz (232kB)\n",
      "\u001b[K    100% |████████████████████████████████| 235kB 2.7MB/s ta 0:00:011\n",
      "\u001b[?25hCollecting pyyaml<7.0,>=6.0 (from python-benedict>=0.24.2->dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/b3/85/79b9e5b4e8d3c0ac657f4e8617713cca8408f6cdc65d2ee6554217cedff1/PyYAML-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (603kB)\n",
      "\u001b[K    100% |████████████████████████████████| 604kB 1.1MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting xmltodict<1.0.0,>=0.12.0 (from python-benedict>=0.24.2->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/94/db/fd0326e331726f07ff7f40675cd86aa804bfd2e5016c727fa761c934990e/xmltodict-0.13.0-py2.py3-none-any.whl\n",
      "Collecting phonenumbers<9.0.0,>=8.12.0 (from python-benedict>=0.24.2->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/a1/c8/23c7185f589da0edf4f9831de1e718989073aa094780f8ee34e6c29b7a2e/phonenumbers-8.12.48-py2.py3-none-any.whl (2.6MB)\n",
      "\u001b[K    100% |████████████████████████████████| 2.6MB 434kB/s ta 0:00:011\n",
      "\u001b[?25hCollecting ftfy<7.0.0,>=6.0.0 (from python-benedict>=0.24.2->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/af/da/d215a091986e5f01b80f5145cff6f22e2dc57c6b048aab2e882a07018473/ftfy-6.0.3.tar.gz (64kB)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[K    100% |████████████████████████████████| 71kB 2.0MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting python-dateutil<3.0.0,>=2.8.0 (from python-benedict>=0.24.2->dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl\n",
      "Collecting future (from grandalf==0.6->dvc)\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "\u001b[33m  Cache entry deserialization failed, entry ignored\u001b[0m\n",
      "  Downloading https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz (829kB)\n",
      "\u001b[K    100% |████████████████████████████████| 829kB 1.5MB/s eta 0:00:01\n",
      "\u001b[?25hCollecting cffi>=1.4.0 (from pygit2<1.7,>=1.5.0; python_version < \"3.7\"->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/49/7b/449daf9cacfd7355cea1b4106d2be614315c29ac16567e01756167f6daab/cffi-1.15.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl\n",
      "Collecting cached-property (from pygit2<1.7,>=1.5.0; python_version < \"3.7\"->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/48/19/f2090f7dad41e225c7f2326e4cfe6fff49e57dedb5b53636c9551f86b069/cached_property-1.5.2-py2.py3-none-any.whl\n",
      "Collecting immutables>=0.9 (from contextvars>=2.1; python_version < \"3.7\"->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/ff/88/9c71337193c3d24c2cf3c14d5ed05eeb502f9f21fa6117edfa9b3b43bff1/immutables-0.18-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (115kB)\n",
      "\u001b[K    100% |████████████████████████████████| 122kB 2.3MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting zipp>=0.5 (from importlib-metadata>=1.4; python_version < \"3.8\"->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/bd/df/d4a4974a3e3957fd1c1fa3082366d7fff6e428ddb55f074bf64876f8e8ad/zipp-3.6.0-py3-none-any.whl\n",
      "Collecting idna<4,>=2.5; python_version >= \"3\" (from requests>=2.22.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/04/a2/d918dcd22354d8958fe113e1a3630137e0fc8b44859ade3063982eacd2a4/idna-3.3-py3-none-any.whl (61kB)\n",
      "\u001b[K    100% |████████████████████████████████| 61kB 1.7MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting charset-normalizer~=2.0.0; python_version >= \"3\" (from requests>=2.22.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/06/b3/24afc8868eba069a7f03650ac750a778862dc34941a4bebeb58706715726/charset_normalizer-2.0.12-py3-none-any.whl\n",
      "Collecting asynctest==0.13.0; python_version < \"3.8\" (from aiohttp; extra == \"http\"->fsspec[http]>=2021.10.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/e8/b6/8d17e169d577ca7678b11cd0d3ceebb0a6089a7f4a2de4b945fe4b1c86db/asynctest-0.13.0-py3-none-any.whl\n",
      "Collecting yarl<2.0,>=1.0 (from aiohttp; extra == \"http\"->fsspec[http]>=2021.10.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/fa/cb/8791922f5ec97b9ebec516d062c0e113da963568ffe2c7c04d8187ab7cc3/yarl-1.7.2-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (270kB)\n",
      "\u001b[K    100% |████████████████████████████████| 276kB 1.9MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting idna-ssl>=1.0; python_version < \"3.7\" (from aiohttp; extra == \"http\"->fsspec[http]>=2021.10.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/46/03/07c4894aae38b0de52b52586b24bf189bb83e4ddabfe2e2c8f2419eec6f4/idna-ssl-1.1.0.tar.gz\n",
      "Collecting frozenlist>=1.1.1 (from aiohttp; extra == \"http\"->fsspec[http]>=2021.10.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/51/3f/f67395ff0090b9f2835838a1f61c3e840baac70fd65bae762095dead48b2/frozenlist-1.2.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (191kB)\n",
      "\u001b[K    100% |████████████████████████████████| 194kB 1.2MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting async-timeout<5.0,>=4.0.0a3 (from aiohttp; extra == \"http\"->fsspec[http]>=2021.10.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/d6/c1/8991e7c5385b897b8c020cdaad718c5b087a6626d1d11a23e1ea87e325a7/async_timeout-4.0.2-py3-none-any.whl\n",
      "Collecting aiosignal>=1.1.2 (from aiohttp; extra == \"http\"->fsspec[http]>=2021.10.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/3b/87/fe94898f2d44a93a35d5aa74671ed28094d80753a1113d68b799fab6dc22/aiosignal-1.2.0-py3-none-any.whl\n",
      "Collecting attrs>=17.3.0 (from aiohttp; extra == \"http\"->fsspec[http]>=2021.10.0->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/be/be/7abce643bfdf8ca01c48afa2ddf8308c2308b0c3b239a44e57d020afa0ef/attrs-21.4.0-py2.py3-none-any.whl\n",
      "Collecting multidict<7.0,>=4.5 (from aiohttp; extra == \"http\"->fsspec[http]>=2021.10.0->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/82/43/81ddfbcfbdfaeaa0624f36dcb715dc8135562377b3292e93b0315a861e92/multidict-5.2.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (159kB)\n",
      "\u001b[K    100% |████████████████████████████████| 163kB 2.0MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython>3->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/6d/01/7caa71608bc29952ae09b0be63a539e50d2484bc37747797a66a60679856/smmap-5.0.0-py3-none-any.whl\n",
      "Collecting text-unidecode>=1.3 (from python-slugify<7.0.0,>=6.0.1->python-benedict>=0.24.2->dvc)\n",
      "  Downloading https://files.pythonhosted.org/packages/a6/a5/c0b6468d3824fe3fde30dbb5e1f687b291608f9473681bbf7dabbf5a87d7/text_unidecode-1.3-py2.py3-none-any.whl (78kB)\n",
      "\u001b[K    100% |████████████████████████████████| 81kB 2.2MB/s ta 0:00:01\n",
      "\u001b[?25hCollecting wcwidth (from ftfy<7.0.0,>=6.0.0->python-benedict>=0.24.2->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/59/7c/e39aca596badaf1b78e8f547c807b04dae603a433d3e7a7e04d67f2ef3e5/wcwidth-0.2.5-py2.py3-none-any.whl\n",
      "Collecting pycparser (from cffi>=1.4.0->pygit2<1.7,>=1.5.0; python_version < \"3.7\"->dvc)\n",
      "  Using cached https://files.pythonhosted.org/packages/62/d5/5f610ebe421e85889f2e55e33b7f9a6795bd982198517d912eb1c76e1a53/pycparser-2.21-py2.py3-none-any.whl\n",
      "Building wheels for collected packages: nanotime, flufl.lock, psutil, configobj, pygtrie, dulwich, pygit2, contextvars, atpublic, mailchecker, ftfy, future, idna-ssl\n",
      "  Running setup.py bdist_wheel for nanotime ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/41/99/17/7135f635215e1f61e906295afd11f4f791cfe4ab45f3bfdca2\n",
      "  Running setup.py bdist_wheel for flufl.lock ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/4f/51/d7/f65a7b7f37da7594f7021b122fe677187667ad21f1171d2514\n",
      "  Running setup.py bdist_wheel for psutil ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/6e/94/8f/ef906811f8dcf6824a9747df0381615be48d723073fb59a317\n",
      "  Running setup.py bdist_wheel for configobj ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/f1/e4/16/4981ca97c2d65106b49861e0b35e2660695be7219a2d351ee0\n",
      "  Running setup.py bdist_wheel for pygtrie ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/db/57/91/73782136379fe419036c5ec0e4070d8b3a35f2a36bd6a94ed8\n",
      "  Running setup.py bdist_wheel for dulwich ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/63/28/8c/0bbff7d6e30f3fc523639b000b33aba9155152e9eb23689ba0\n",
      "  Running setup.py bdist_wheel for pygit2 ... \u001b[?25lerror\n",
      "  Complete output from command /usr/bin/python3 -u -c \"import setuptools, tokenize;__file__='/tmp/pip-build-hr8mtrcf/pygit2/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\\r\\n', '\\n');f.close();exec(compile(code, __file__, 'exec'))\" bdist_wheel -d /tmp/tmp4kyicel3pip-wheel- --python-tag cp36:\n",
      "  running bdist_wheel\n",
      "  running build\n",
      "  running build_py\n",
      "  creating build\n",
      "  creating build/lib.linux-x86_64-3.6\n",
      "  creating build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/__init__.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/_build.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/_run.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/blame.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/callbacks.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/config.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/credentials.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/errors.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/ffi.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/index.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/packbuilder.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/refspec.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/remote.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/repository.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/settings.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/submodule.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  copying pygit2/utils.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "  creating build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/attr.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/blame.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/buffer.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/callbacks.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/checkout.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/clone.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/common.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/config.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/describe.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/diff.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/errors.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/graph.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/index.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/indexer.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/merge.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/net.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/oid.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/pack.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/proxy.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/refspec.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/remote.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/repository.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/revert.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/stash.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/strarray.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/submodule.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/transport.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  copying pygit2/decl/types.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "  running build_ext\n",
      "  generating cffi module 'build/temp.linux-x86_64-3.6/pygit2._libgit2.c'\n",
      "  creating build/temp.linux-x86_64-3.6\n",
      "  building 'pygit2._pygit2' extension\n",
      "  creating build/temp.linux-x86_64-3.6/src\n",
      "  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/include -I/usr/include/python3.6m -c src/blob.c -o build/temp.linux-x86_64-3.6/src/blob.o\n",
      "  In file included from src/blob.c:30:0:\n",
      "  src/blob.h:33:10: fatal error: git2.h: No such file or directory\n",
      "   #include <git2.h>\n",
      "            ^~~~~~~~\n",
      "  compilation terminated.\n",
      "  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1\n",
      "  \n",
      "  ----------------------------------------\n",
      "\u001b[31m  Failed building wheel for pygit2\u001b[0m\n",
      "\u001b[?25h  Running setup.py clean for pygit2\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  Running setup.py bdist_wheel for contextvars ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/a5/7d/68/1ebae2668bda2228686e3c1cf16f2c2384cea6e9334ad5f6de\n",
      "  Running setup.py bdist_wheel for atpublic ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/33/25/82/57d46b60a048f8e30b31f10497539498a3b826c78e2433c2d4\n",
      "  Running setup.py bdist_wheel for mailchecker ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/fd/e1/e7/804e77a70eac7103bdba2f4b3e1eba36840b38554a4b8152c8\n",
      "  Running setup.py bdist_wheel for ftfy ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/99/2c/e6/109c8a28fef7a443f67ba58df21fe1d0067ac3322e75e6b0b7\n",
      "  Running setup.py bdist_wheel for future ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/8b/99/a0/81daf51dcd359a9377b110a8a886b3895921802d2fc1b2397e\n",
      "  Running setup.py bdist_wheel for idna-ssl ... \u001b[?25ldone\n",
      "\u001b[?25h  Stored in directory: /home/tomek/.cache/pip/wheels/d3/00/b3/32d613e19e08a739751dd6bf998cfed277728f8b2127ad4eb7\n",
      "Successfully built nanotime flufl.lock psutil configobj pygtrie dulwich contextvars atpublic mailchecker ftfy future idna-ssl\n",
      "Failed to build pygit2\n",
      "Installing collected packages: setuptools, colorama, dictdiffer, ply, nanotime, asynctest, idna, multidict, typing-extensions, yarl, idna-ssl, frozenlist, async-timeout, aiosignal, charset-normalizer, attrs, aiohttp, certifi, urllib3, requests, fsspec, decorator, networkx, atpublic, flufl.lock, shtab, shortuuid, smmap, gitdb, gitpython, zipp, importlib-resources, tqdm, zc.lockfile, dataclasses, commonmark, pygments, rich, aiohttp-retry, voluptuous, psutil, six, importlib-metadata, flatten-dict, configobj, pygtrie, tabulate, dulwich, appdirs, ruamel.yaml.clib, ruamel.yaml, text-unidecode, python-slugify, python-fsutil, mailchecker, pyyaml, toml, xmltodict, phonenumbers, wcwidth, ftfy, python-dateutil, python-benedict, future, pyparsing, grandalf, dpath, pycparser, cffi, cached-property, pygit2, pyasn1, pathspec, immutables, contextvars, distro, packaging, diskcache, pydot, funcy, dvc\n",
      "  Running setup.py install for pygit2 ... \u001b[?25lerror\n",
      "    Complete output from command /usr/bin/python3 -u -c \"import setuptools, tokenize;__file__='/tmp/pip-build-hr8mtrcf/pygit2/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\\r\\n', '\\n');f.close();exec(compile(code, __file__, 'exec'))\" install --record /tmp/pip-6tml5js1-record/install-record.txt --single-version-externally-managed --compile --user --prefix=:\n",
      "    running install\n",
      "    /home/tomek/.local/lib/python3.6/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.\n",
      "      setuptools.SetuptoolsDeprecationWarning,\n",
      "    running build\n",
      "    running build_py\n",
      "    creating build\n",
      "    creating build/lib.linux-x86_64-3.6\n",
      "    creating build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/__init__.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/_build.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/_run.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/blame.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/callbacks.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/config.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/credentials.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/errors.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/ffi.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/index.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/packbuilder.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/refspec.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/remote.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/repository.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/settings.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/submodule.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    copying pygit2/utils.py -> build/lib.linux-x86_64-3.6/pygit2\n",
      "    creating build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/attr.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/blame.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/buffer.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/callbacks.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/checkout.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/clone.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/common.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/config.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/describe.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/diff.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/errors.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/graph.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/index.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/indexer.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/merge.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/net.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/oid.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/pack.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/proxy.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/refspec.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/remote.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/repository.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/revert.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/stash.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/strarray.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/submodule.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/transport.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    copying pygit2/decl/types.h -> build/lib.linux-x86_64-3.6/pygit2/decl\n",
      "    running build_ext\n",
      "    generating cffi module 'build/temp.linux-x86_64-3.6/pygit2._libgit2.c'\n",
      "    creating build/temp.linux-x86_64-3.6\n",
      "    building 'pygit2._pygit2' extension\n",
      "    creating build/temp.linux-x86_64-3.6/src\n",
      "    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/include -I/usr/include/python3.6m -c src/blob.c -o build/temp.linux-x86_64-3.6/src/blob.o\n",
      "    In file included from src/blob.c:30:0:\n",
      "    src/blob.h:33:10: fatal error: git2.h: No such file or directory\n",
      "     #include <git2.h>\n",
      "              ^~~~~~~~\n",
      "    compilation terminated.\n",
      "    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1\n",
      "    \n",
      "    ----------------------------------------\n",
      "\u001b[31mCommand \"/usr/bin/python3 -u -c \"import setuptools, tokenize;__file__='/tmp/pip-build-hr8mtrcf/pygit2/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\\r\\n', '\\n');f.close();exec(compile(code, __file__, 'exec'))\" install --record /tmp/pip-6tml5js1-record/install-record.txt --single-version-externally-managed --compile --user --prefix=\" failed with error code 1 in /tmp/pip-build-hr8mtrcf/pygit2/\u001b[0m\n",
      "\u001b[?25h"
     ]
    }
   ],
   "source": [
    "!pip3 install dvc"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20975d62",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Let's create a directory to hold our project:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "aae59ec2",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "!mkdir -p IUM_10/sample-ml-project"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1e522a93",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/tomek/repos/aitech-ium/IUM_10/sample-ml-project\n"
     ]
    }
   ],
   "source": [
    "#Jupyter notebook magic https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-cd\n",
    "%cd \"IUM_10/sample-ml-project\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "199c0d92",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We initialize an empty Git repository (this step can be skipped when working in an existing repository):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c13c525b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Reinitialized existing Git repository in /home/tomek/repos/aitech-ium/IUM_10/sample-ml-project/.git/\r\n"
     ]
    }
   ],
   "source": [
    "!git init"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7155369",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Now we initialize the DVC repository:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "44f28226",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized DVC repository.\n",
      "\n",
      "You can now commit the changes to git.\n",
      "\n",
      "\u001b[31m+---------------------------------------------------------------------+\n",
      "\u001b[0m\u001b[31m|\u001b[0m                                                                     \u001b[31m|\u001b[0m\n",
      "\u001b[31m|\u001b[0m        DVC has enabled anonymous aggregate usage analytics.         \u001b[31m|\u001b[0m\n",
      "\u001b[31m|\u001b[0m     Read the analytics documentation (and how to opt-out) here:     \u001b[31m|\u001b[0m\n",
      "\u001b[31m|\u001b[0m             <\u001b[36mhttps://dvc.org/doc/user-guide/analytics\u001b[39m>              \u001b[31m|\u001b[0m\n",
      "\u001b[31m|\u001b[0m                                                                     \u001b[31m|\u001b[0m\n",
      "\u001b[31m+---------------------------------------------------------------------+\n",
      "\u001b[0m\n",
      "\u001b[33mWhat's next?\u001b[39m\n",
      "\u001b[33m------------\u001b[39m\n",
      "- Check out the documentation: <\u001b[36mhttps://dvc.org/doc\u001b[39m>\n",
      "- Get help and share ideas: <\u001b[36mhttps://dvc.org/chat\u001b[39m>\n",
      "- Star us on GitHub: <\u001b[36mhttps://github.com/iterative/dvc\u001b[39m>\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc init"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00bc72ed",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Let's see which files DVC added (they are also staged in the Git repository).\n",
    "They are described here: https://dvc.org/doc/user-guide/project-structure/internal-files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d1aefe16",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "On branch master\r\n",
      "\r\n",
      "No commits yet\r\n",
      "\r\n",
      "Changes to be committed:\r\n",
      "  (use \"git rm --cached <file>...\" to unstage)\r\n",
      "\t\u001b[32mnew file:   .dvc/.gitignore\u001b[m\r\n",
      "\t\u001b[32mnew file:   .dvc/config\u001b[m\r\n",
      "\t\u001b[32mnew file:   .dvcignore\u001b[m\r\n",
      "\r\n"
     ]
    }
   ],
   "source": [
    "!git status"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72e0a272",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We can now commit the changes to Git:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "59780e99",
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[master (root-commit) d00d0ac] Initial commit\r\n",
      " 3 files changed, 6 insertions(+)\r\n",
      " create mode 100644 .dvc/.gitignore\r\n",
      " create mode 100644 .dvc/config\r\n",
      " create mode 100644 .dvcignore\r\n"
     ]
    }
   ],
   "source": [
    "!git commit -m \"Initial commit\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d4ce1cb",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Tracking files with DVC\n",
    " - large files, such as input data files or model files, are hard to manage with Git because of problems with:\n",
    "   - performance\n",
    "   - repository storage space\n",
    "   - limits imposed by the hosting service (e.g. the [100 MB per-file limit on GitHub](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github))\n",
    " - Git has an extension, [LFS (Large File Storage)](https://git-lfs.github.com/), that partly solves this problem. The files themselves are stored on a dedicated remote server, while the repository holds only pointers to them and some metadata"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd8e529b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    " - DVC takes a similar approach, but:\n",
    "   - files can be stored on almost any server, or locally\n",
    "   - there is no file size limit (Git LFS on GitHub has a [2 GB limit](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage))\n",
    "   - DVC provides additional tooling for tracking files and how they relate to experiment results\n",
    "   - for more, see [here](https://dvc.org/doc/user-guide/related-technologies)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8861abe",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Let's prepare some sample data by downloading it from Kaggle:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f05ece1b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/tomek/.kaggle/kaggle.json'\n",
      "Downloading iris.zip to /home/tomek/repos/aitech-ium/IUM_10/sample-ml-project\n",
      "  0%|                                               | 0.00/3.60k [00:00<?, ?B/s]\n",
      "100%|██████████████████████████████████████| 3.60k/3.60k [00:00<00:00, 1.66MB/s]\n",
      "Archive:  iris.zip\n",
      "  inflating: Iris.csv                \n",
      "  inflating: database.sqlite         \n"
     ]
    }
   ],
   "source": [
    "!kaggle datasets download -d uciml/iris\n",
    "!unzip -o iris.zip\n",
    "!rm database.sqlite iris.zip\n",
    "!mkdir -p data\n",
    "!mv Iris.csv data/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adb9a522",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Now we add the data file(s) to DVC:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "74d182c7",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[2K\u001b[32m⠧\u001b[0m Checking graph                                                   \u001b[32m⠋\u001b[0m Checking graph\n",
      "Adding...                                                                       \n",
      "!\u001b[A\n",
      "  0% Checking cache in '/home/tomek/repos/aitech-ium/IUM_10/sample-ml-project/.d\u001b[A\n",
      "                                                                                \u001b[A\n",
      "!\u001b[A\n",
      "  0%|          |Transferring                          0/1 [00:00<?,     ?file/s]\u001b[A\n",
      "                                                                                \u001b[A\n",
      "!\u001b[A\n",
      "  0%|          |.oAL9GSGErYepJSZTnvkTL8.tmp        0.00/? [00:00<?,        ?B/s]\u001b[A\n",
      "  0%|          |.oAL9GSGErYepJSZTnvkTL8.tmp     0.00/4.00 [00:00<?,        ?B/s]\u001b[A\n",
      "                                                                                \u001b[A\n",
      "!\u001b[A\n",
      "  0%|          |7820ef0af287ff346c5cabfb4c612c     0.00/? [00:00<?,        ?B/s]\u001b[A\n",
      "  0%|          |7820ef0af287ff346c5cabfb4c612c 0.00/4.99k [00:00<?,        ?B/s]\u001b[A\n",
      "100% Adding...|████████████████████████████████████████|1/1 [00:00,  2.59file/s]\u001b[A\n",
      "\n",
      "To track the changes with git, run:\n",
      "\n",
      "    git add data/.gitignore data/Iris.csv.dvc\n",
      "\n",
      "To enable auto staging, run:\n",
      "\n",
      "\tdvc config core.autostage true\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc add data/Iris.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72c6b5d0",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    " - DVC created the file `data/Iris.csv.dvc` and added the original file to `data/.gitignore`\n",
    " - Only the `*.dvc` file, which contains a pointer to the actual file, will be present in the repository"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "74d54652",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "On branch master\r\n",
      "Untracked files:\r\n",
      "  (use \"git add <file>...\" to include in what will be committed)\r\n",
      "\t\u001b[31mdata/.gitignore\u001b[m\r\n",
      "\t\u001b[31mdata/Iris.csv.dvc\u001b[m\r\n",
      "\r\n",
      "nothing added to commit but untracked files present (use \"git add\" to track)\r\n"
     ]
    }
   ],
   "source": [
    "!git status -u"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8589fecf",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Let's add `data/Iris.csv.dvc` and `data/.gitignore` to the Git repository, as DVC suggests:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "460c4a17",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "!git add data/Iris.csv.dvc data/.gitignore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "80644077",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[master 67214ea] Dodano dane IRIS (DVC)\n",
      " 2 files changed, 5 insertions(+)\n",
      " create mode 100644 data/.gitignore\n",
      " create mode 100644 data/Iris.csv.dvc\n"
     ]
    }
   ],
   "source": [
    "!git commit -m \"Dodano dane IRIS (DVC)\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03899863",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "A `*.dvc` file contains, among other things, the hash of the tracked file. More about `*.dvc` files: [link](https://dvc.org/doc/user-guide/project-structure/dvc-files)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cb2ba7c",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# %load data/Iris.csv.dvc\n",
    "outs:\n",
    "- md5: 717820ef0af287ff346c5cabfb4c612c\n",
    "  size: 5107\n",
    "  path: Iris.csv\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b421d45",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The original `Iris.csv` file was moved to the `.dvc/cache/` directory, under a path derived from the file's hash, and linked back to its original location. The way the links are created can [differ depending on the file system](https://dvc.org/doc/user-guide/large-dataset-optimization)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "1d471f3a",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 8\r\n",
      "-r--r--r-- 1 tomek tomek 5107 Sep 19  2019 7820ef0af287ff346c5cabfb4c612c\r\n"
     ]
    }
   ],
   "source": [
    "!ls -l .dvc/cache/71"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "f86a5b55",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 8\r\n",
      "-rw-r--r-- 1 tomek tomek 5107 May 29 09:19 Iris.csv\r\n",
      "-rw-r--r-- 1 tomek tomek   76 May 29 09:19 Iris.csv.dvc\r\n"
     ]
    }
   ],
   "source": [
    "!ls -l ./data"
   ]
  },
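  {
   "cell_type": "markdown",
   "id": "5e2c9a41",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "The relationship between the hash in the `*.dvc` file and the cache path listed above can be sketched in plain Python. This is only an illustration, not DVC's own code: `file_md5` and `cache_path` are hypothetical helper names, and the `<first two characters>/<remaining characters>` layout is assumed from the `ls` output above. Depending on the DVC version, text files may be normalized before hashing and the cache layout may differ, so the values computed here need not match DVC's exactly.\n",
    "\n",
    "```python\n",
    "import hashlib\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def file_md5(path, chunk_size=1 << 20):\n",
    "    # Hex MD5 digest of the raw file bytes, read in chunks to bound memory use.\n",
    "    digest = hashlib.md5()\n",
    "    with open(path, 'rb') as handle:\n",
    "        for chunk in iter(lambda: handle.read(chunk_size), b''):\n",
    "            digest.update(chunk)\n",
    "    return digest.hexdigest()\n",
    "\n",
    "\n",
    "def cache_path(md5_hex, cache_dir='.dvc/cache'):\n",
    "    # Assumed layout: the first two hex characters name a subdirectory,\n",
    "    # the remaining characters name the file inside it.\n",
    "    return Path(cache_dir) / md5_hex[:2] / md5_hex[2:]\n",
    "\n",
    "\n",
    "# Hash taken from data/Iris.csv.dvc shown earlier in this notebook.\n",
    "print(cache_path('717820ef0af287ff346c5cabfb4c612c'))\n",
    "```\n",
    "\n",
    "Calling `file_md5('data/Iris.csv')` and comparing the result with the `md5` field of `data/Iris.csv.dvc` is a quick way to check by hand whether the working copy still matches what DVC recorded."
   ]
  },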
  {
   "cell_type": "markdown",
   "id": "901e8e90",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## dvc remote\n",
    " - to send the actual files tracked by DVC to a remote location (from which they can later be fetched, e.g. by a CI system or by other users), such a location must first be configured\n",
    " - this is done with the [`dvc remote add`](https://dvc.org/doc/command-reference/remote/add) command\n",
    " - we will use a local \"remote\". Here it is simply the previously created directory `/dvcstore`. The same directory exists on our Jenkins server; it must of course be mounted in Docker\n",
    " - in real-world use we would instead give the path to a resource available over the internet, such as an SFTP server or an AWS S3 path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53429521",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Supported remote storage types: https://dvc.org/doc/command-reference/remote/add#supported-storage-types\n",
    " - Amazon S3\n",
    " - S3-compatible storage\n",
    " - Microsoft Azure Blob Storage\n",
    " - Google Drive\n",
    " - Google Cloud Storage\n",
    " - Aliyun OSS\n",
    " - SSH\n",
    " - HDFS\n",
    " - WebHDFS\n",
    " - HTTP\n",
    " - WebDAV\n",
    " - local remote"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "a16f2bfa",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Setting 'my_local_remote' as a default remote.\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc remote add -d my_local_remote /dvcstore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "9c3deeaf",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "On branch master\r\n",
      "nothing to commit, working tree clean\r\n"
     ]
    }
   ],
   "source": [
    "!git status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "899eac7d",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "On branch master\r\n",
      "nothing to commit, working tree clean\r\n"
     ]
    }
   ],
   "source": [
    "!git add .dvc/config\n",
    "!git commit -m \"Added DVC remote\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c556c96",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## dvc push\n",
    "Once a \"remote\" is configured, we can push files to it with the `dvc push` command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "c7f24f75",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  0% Transferring|                                   |0/1 [00:00<?,     ?file/s]\n",
      "!\u001b[A\n",
      "  0%|          |7820ef0af287ff346c5cabfb4c612c     0.00/? [00:00<?,        ?B/s]\u001b[A\n",
      "  0%|          |7820ef0af287ff346c5cabfb4c612c 0.00/4.99k [00:00<?,        ?B/s]\u001b[A\n",
      "1 file pushed                                                                   \u001b[A\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc push"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "8a355575",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[34;42m/dvcstore\u001b[0m\r\n",
      "└── \u001b[01;34m71\u001b[0m\r\n",
      "    └── 7820ef0af287ff346c5cabfb4c612c\r\n",
      "\r\n",
      "1 directory, 1 file\r\n"
     ]
    }
   ],
   "source": [
    "!tree /dvcstore"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af59ecb3",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## dvc pull\n",
    "To fetch data from DVC (e.g. in another location or as another user), we need to:\n",
    " - clone the Git repository (to get, among other things, the `*.dvc` files)\n",
    " - run `dvc pull`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fa914a7",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Adding new files and modifying existing ones works much like it does for ordinary files tracked by Git, except that we use the `dvc` command instead of `git`, and we must also remember to manage the `*.dvc` files with Git:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "dde39796",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "!head -n -1 data/Iris.csv | sponge data/Iris.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "7f14ec60",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "On branch master\r\n",
      "nothing to commit, working tree clean\r\n"
     ]
    }
   ],
   "source": [
    "!git status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "8a841039",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "data/Iris.csv.dvc:                                                    core\u001b[39m>\n",
      "\tchanged outs:\n",
      "\t\tmodified:           data/Iris.csv\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "bf6c1067",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[2K\u001b[32m⠹\u001b[0m Checking graph                                                   \u001b[32m⠋\u001b[0m Checking graph\n",
      "Adding...                                                                       \n",
      "!\u001b[A\n",
      "  0% Checking cache in '/home/tomek/repos/aitech-ium/IUM_10/sample-ml-project/.d\u001b[A\n",
      "                                                                                \u001b[A\n",
      "!\u001b[A\n",
      "  0%|          |Transferring                          0/1 [00:00<?,     ?file/s]\u001b[A\n",
      "                                                                                \u001b[A\n",
      "!\u001b[A\n",
      "  0%|          |.GbNyfXVqWGYkQKjqaSP8tL.tmp        0.00/? [00:00<?,        ?B/s]\u001b[A\n",
      "  0%|          |.GbNyfXVqWGYkQKjqaSP8tL.tmp     0.00/4.00 [00:00<?,        ?B/s]\u001b[A\n",
      "                                                                                \u001b[A\n",
      "!\u001b[A\n",
      "  0%|          |cff2e578d76852294184c1dce9fdbf     0.00/? [00:00<?,        ?B/s]\u001b[A\n",
      "  0%|          |cff2e578d76852294184c1dce9fdbf 0.00/4.95k [00:00<?,        ?B/s]\u001b[A\n",
      "100% Adding...|████████████████████████████████████████|1/1 [00:00, 11.00file/s]\u001b[A\n",
      "\n",
      "To track the changes with git, run:\n",
      "\n",
      "    git add data/Iris.csv.dvc\n",
      "\n",
      "To enable auto staging, run:\n",
      "\n",
      "\tdvc config core.autostage true\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc add data/Iris.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "4a4865c9",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[master d6ff265] Removed last line from Iris dataset\r\n",
      " 1 file changed, 2 insertions(+), 2 deletions(-)\r\n"
     ]
    }
   ],
   "source": [
    "!git add data/Iris.csv.dvc\n",
    "!git commit -m \"Removed last line from Iris dataset\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d710977c",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### dvc checkout\n",
    "  - Polecenia `dvc checkout` używamy razem z `git checkout`, żeby zmienić branch, na którym pracujemy.\n",
    "  - DVC podmieni wersje plików śledzonych przez siebie na pochodzące z innego brancha (o ile pliki te się różnią i różnią się pliki `*.dvc` w odpowiednich branchach\n",
    "  - zmiana brancha przez git powoduje (ewentualną) zmianę plików `*.dvc` a `dvc checkout` kopiuje/linkuje pliki z katalogu `.dvc/cache` o wartościach hash odpowiadających tym z plików `*.dvc`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5897e8eb",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Wymiana danych między projektami\n",
    " - za pomocą poleceń `dvc import` i `dvc update` możemy dodać i później aktualizować pliki śledzone przez DVC w innym repozytorium"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "9b018146",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Importing 'get-started/data.xml (https://github.com/iterative/dataset-registry)' -> 'data/data.xml'\n",
      "  0% Downloading|                                    |0/1 [00:00<?,     ?file/s]\n",
      "!\u001b[A\n",
      "  0%|          |get-started/data.xml           0.00/37.9M [00:00<?,       ?it/s]\u001b[A\n",
      "  0%|          |get-started/data.xml      64.0k/36.1M [00:00<02:12,     286kB/s]\u001b[A\n",
      "  0%|          |get-started/data.xml       128k/36.1M [00:00<01:33,     403kB/s]\u001b[A\n",
      "  1%|          |get-started/data.xml       256k/36.1M [00:00<00:57,     658kB/s]\u001b[A\n",
      "  1%|          |get-started/data.xml       384k/36.1M [00:00<00:45,     818kB/s]\u001b[A\n",
      "  1%|▏         |get-started/data.xml       512k/36.1M [00:00<00:53,     693kB/s]\u001b[A\n",
      "  2%|▏         |get-started/data.xml       640k/36.1M [00:01<00:57,     644kB/s]\u001b[A\n",
      "  2%|▏         |get-started/data.xml       768k/36.1M [00:01<00:59,     619kB/s]\u001b[A\n",
      "  2%|▏         |get-started/data.xml       896k/36.1M [00:01<00:51,     718kB/s]\u001b[A\n",
      "  3%|▎         |get-started/data.xml      1.00M/36.1M [00:01<00:55,     666kB/s]\u001b[A\n",
      "  3%|▎         |get-started/data.xml      1.12M/36.1M [00:01<00:57,     633kB/s]\u001b[A\n",
      "  3%|▎         |get-started/data.xml      1.25M/36.1M [00:02<00:57,     638kB/s]\u001b[A\n",
      "  4%|▍         |get-started/data.xml      1.38M/36.1M [00:02<00:52,     698kB/s]\u001b[A\n",
      "  4%|▍         |get-started/data.xml      1.50M/36.1M [00:02<00:55,     656kB/s]\u001b[A\n",
      "  4%|▍         |get-started/data.xml      1.62M/36.1M [00:02<00:57,     628kB/s]\u001b[A\n",
      "  5%|▍         |get-started/data.xml      1.69M/36.1M [00:02<00:58,     618kB/s]\u001b[A\n",
      "  5%|▌         |get-started/data.xml      1.81M/36.1M [00:02<00:53,     675kB/s]\u001b[A\n",
      "  5%|▌         |get-started/data.xml      1.94M/36.1M [00:03<00:53,     672kB/s]\u001b[A\n",
      "  6%|▌         |get-started/data.xml      2.06M/36.1M [00:03<00:55,     642kB/s]\u001b[A\n",
      "  6%|▌         |get-started/data.xml      2.12M/36.1M [00:03<00:56,     628kB/s]\u001b[A\n",
      "  6%|▌         |get-started/data.xml      2.19M/36.1M [00:03<00:57,     616kB/s]\u001b[A\n",
      "  6%|▌         |get-started/data.xml      2.25M/36.1M [00:03<00:58,     606kB/s]\u001b[A\n",
      "  7%|▋         |get-started/data.xml      2.38M/36.1M [00:03<00:48,     732kB/s]\u001b[A\n",
      "  7%|▋         |get-started/data.xml      2.50M/36.1M [00:04<00:52,     666kB/s]\u001b[A\n",
      "  7%|▋         |get-started/data.xml      2.62M/36.1M [00:04<00:55,     636kB/s]\u001b[A\n",
      "  8%|▊         |get-started/data.xml      2.75M/36.1M [00:04<00:56,     614kB/s]\u001b[A\n",
      "  8%|▊         |get-started/data.xml      2.88M/36.1M [00:04<00:49,     711kB/s]\u001b[A\n",
      "  8%|▊         |get-started/data.xml      3.00M/36.1M [00:04<00:52,     663kB/s]\u001b[A\n",
      "  9%|▊         |get-started/data.xml      3.12M/36.1M [00:05<00:54,     637kB/s]\u001b[A\n",
      "  9%|▉         |get-started/data.xml      3.25M/36.1M [00:05<00:55,     623kB/s]\u001b[A\n",
      "  9%|▉         |get-started/data.xml      3.38M/36.1M [00:05<00:48,     710kB/s]\u001b[A\n",
      " 10%|▉         |get-started/data.xml      3.50M/36.1M [00:05<00:51,     664kB/s]\u001b[A\n",
      " 10%|█         |get-started/data.xml      3.62M/36.1M [00:05<00:45,     751kB/s]\u001b[A\n",
      " 10%|█         |get-started/data.xml      3.75M/36.1M [00:05<00:49,     691kB/s]\u001b[A\n",
      " 11%|█         |get-started/data.xml      3.88M/36.1M [00:06<00:43,     777kB/s]\u001b[A\n",
      " 11%|█         |get-started/data.xml      4.00M/36.1M [00:06<00:47,     705kB/s]\u001b[A\n",
      " 11%|█▏        |get-started/data.xml      4.12M/36.1M [00:06<00:42,     790kB/s]\u001b[A\n",
      " 12%|█▏        |get-started/data.xml      4.25M/36.1M [00:06<00:46,     716kB/s]\u001b[A\n",
      " 12%|█▏        |get-started/data.xml      4.38M/36.1M [00:06<00:44,     749kB/s]\u001b[A\n",
      " 12%|█▏        |get-started/data.xml      4.50M/36.1M [00:07<00:45,     734kB/s]\u001b[A\n",
      " 13%|█▎        |get-started/data.xml      4.62M/36.1M [00:07<00:40,     810kB/s]\u001b[A\n",
      " 13%|█▎        |get-started/data.xml      4.75M/36.1M [00:07<00:42,     773kB/s]\u001b[A\n",
      " 13%|█▎        |get-started/data.xml      4.88M/36.1M [00:07<00:41,     795kB/s]\u001b[A\n",
      " 14%|█▍        |get-started/data.xml      5.00M/36.1M [00:07<00:37,     870kB/s]\u001b[A\n",
      " 14%|█▍        |get-started/data.xml      5.12M/36.1M [00:07<00:34,     932kB/s]\u001b[A\n",
      " 15%|█▍        |get-started/data.xml      5.25M/36.1M [00:07<00:35,     916kB/s]\u001b[A\n",
      " 15%|█▍        |get-started/data.xml      5.38M/36.1M [00:08<00:35,     898kB/s]\u001b[A\n",
      " 15%|█▌        |get-started/data.xml      5.50M/36.1M [00:08<00:33,     962kB/s]\u001b[A\n",
      " 16%|█▌        |get-started/data.xml      5.62M/36.1M [00:08<00:33,     949kB/s]\u001b[A\n",
      " 16%|█▌        |get-started/data.xml      5.75M/36.1M [00:08<00:31,    1.00MB/s]\u001b[A\n",
      " 16%|█▋        |get-started/data.xml      5.88M/36.1M [00:08<00:30,    1.04MB/s]\u001b[A\n",
      " 17%|█▋        |get-started/data.xml      6.06M/36.1M [00:08<00:26,    1.19MB/s]\u001b[A\n",
      " 17%|█▋        |get-started/data.xml      6.19M/36.1M [00:08<00:26,    1.19MB/s]\u001b[A\n",
      " 17%|█▋        |get-started/data.xml      6.31M/36.1M [00:08<00:26,    1.19MB/s]\u001b[A\n",
      " 18%|█▊        |get-started/data.xml      6.50M/36.1M [00:08<00:23,    1.31MB/s]\u001b[A\n",
      " 18%|█▊        |get-started/data.xml      6.62M/36.1M [00:09<00:23,    1.30MB/s]\u001b[A\n",
      " 19%|█▉        |get-started/data.xml      6.81M/36.1M [00:09<00:21,    1.41MB/s]\u001b[A\n",
      " 19%|█▉        |get-started/data.xml      7.00M/36.1M [00:09<00:20,    1.48MB/s]\u001b[A\n",
      " 20%|█▉        |get-started/data.xml      7.19M/36.1M [00:09<00:19,    1.54MB/s]\u001b[A\n",
      " 20%|██        |get-started/data.xml      7.38M/36.1M [00:09<00:18,    1.60MB/s]\u001b[A\n",
      " 21%|██        |get-started/data.xml      7.56M/36.1M [00:09<00:18,    1.62MB/s]\u001b[A\n",
      " 21%|██▏       |get-started/data.xml      7.75M/36.1M [00:09<00:17,    1.68MB/s]\u001b[A\n",
      " 22%|██▏       |get-started/data.xml      7.94M/36.1M [00:09<00:17,    1.70MB/s]\u001b[A\n",
      " 22%|██▏       |get-started/data.xml      8.12M/36.1M [00:10<00:17,    1.72MB/s]\u001b[A\n",
      " 23%|██▎       |get-started/data.xml      8.38M/36.1M [00:10<00:15,    1.88MB/s]\u001b[A\n",
      " 24%|██▎       |get-started/data.xml      8.56M/36.1M [00:10<00:15,    1.84MB/s]\u001b[A\n",
      " 24%|██▍       |get-started/data.xml      8.81M/36.1M [00:10<00:14,    1.96MB/s]\u001b[A\n",
      " 25%|██▌       |get-started/data.xml      9.06M/36.1M [00:10<00:13,    2.06MB/s]\u001b[A\n",
      " 26%|██▌       |get-started/data.xml      9.31M/36.1M [00:10<00:13,    2.14MB/s]\u001b[A\n",
      " 27%|██▋       |get-started/data.xml      9.62M/36.1M [00:10<00:11,    2.32MB/s]\u001b[A\n",
      " 27%|██▋       |get-started/data.xml      9.88M/36.1M [00:10<00:11,    2.33MB/s]\u001b[A\n",
      " 28%|██▊       |get-started/data.xml      10.2M/36.1M [00:10<00:11,    2.46MB/s]\u001b[A\n",
      " 29%|██▉       |get-started/data.xml      10.4M/36.1M [00:11<00:10,    2.45MB/s]\u001b[A\n",
      " 30%|██▉       |get-started/data.xml      10.8M/36.1M [00:11<00:10,    2.57MB/s]\u001b[A\n",
      " 31%|███       |get-started/data.xml      11.1M/36.1M [00:11<00:09,    2.67MB/s]\u001b[A\n",
      " 32%|███▏      |get-started/data.xml      11.4M/36.1M [00:11<00:09,    2.84MB/s]\u001b[A\n",
      " 33%|███▎      |get-started/data.xml      11.8M/36.1M [00:11<00:08,    2.85MB/s]\u001b[A\n",
      " 34%|███▎      |get-started/data.xml      12.1M/36.1M [00:11<00:08,    3.01MB/s]\u001b[A\n",
      " 35%|███▍      |get-started/data.xml      12.5M/36.1M [00:11<00:07,    3.12MB/s]\u001b[A\n",
      " 36%|███▌      |get-started/data.xml      12.9M/36.1M [00:11<00:07,    3.22MB/s]\u001b[A\n",
      " 37%|███▋      |get-started/data.xml      13.2M/36.1M [00:11<00:07,    3.31MB/s]\u001b[A\n",
      " 38%|███▊      |get-started/data.xml      13.7M/36.1M [00:12<00:06,    3.49MB/s]\u001b[A\n",
      " 39%|███▉      |get-started/data.xml      14.1M/36.1M [00:12<00:06,    3.62MB/s]\u001b[A\n",
      " 40%|████      |get-started/data.xml      14.6M/36.1M [00:12<00:06,    3.74MB/s]\u001b[A\n",
      " 42%|████▏     |get-started/data.xml      15.0M/36.1M [00:12<00:05,    3.82MB/s]\u001b[A\n",
      " 43%|████▎     |get-started/data.xml      15.4M/36.1M [00:12<00:05,    3.97MB/s]\u001b[A\n",
      " 44%|████▍     |get-started/data.xml      15.9M/36.1M [00:12<00:05,    4.08MB/s]\u001b[A\n",
      " 45%|████▌     |get-started/data.xml      16.4M/36.1M [00:12<00:04,    4.23MB/s]\u001b[A\n",
      " 47%|████▋     |get-started/data.xml      17.0M/36.1M [00:12<00:04,    4.44MB/s]\u001b[A\n",
      " 48%|████▊     |get-started/data.xml      17.5M/36.1M [00:12<00:04,    4.52MB/s]\u001b[A\n",
      " 50%|████▉     |get-started/data.xml      18.1M/36.1M [00:13<00:04,    4.69MB/s]\u001b[A\n",
      " 52%|█████▏    |get-started/data.xml      18.6M/36.1M [00:13<00:03,    4.84MB/s]\u001b[A\n",
      " 53%|█████▎    |get-started/data.xml      19.2M/36.1M [00:13<00:03,    5.05MB/s]\u001b[A\n",
      " 55%|█████▍    |get-started/data.xml      19.8M/36.1M [00:13<00:03,    5.16MB/s]\u001b[A\n",
      " 57%|█████▋    |get-started/data.xml      20.4M/36.1M [00:13<00:03,    5.35MB/s]\u001b[A\n",
      " 58%|█████▊    |get-started/data.xml      21.1M/36.1M [00:13<00:02,    5.49MB/s]\u001b[A\n",
      " 60%|██████    |get-started/data.xml      21.8M/36.1M [00:13<00:02,    5.66MB/s]\u001b[A\n",
      " 62%|██████▏   |get-started/data.xml      22.4M/36.1M [00:13<00:02,    5.83MB/s]\u001b[A\n",
      " 64%|██████▍   |get-started/data.xml      23.2M/36.1M [00:14<00:02,    6.05MB/s]\u001b[A\n",
      " 66%|██████▌   |get-started/data.xml      23.9M/36.1M [00:14<00:02,    6.20MB/s]\u001b[A\n",
      " 68%|██████▊   |get-started/data.xml      24.6M/36.1M [00:14<00:01,    6.40MB/s]\u001b[A\n",
      " 70%|███████   |get-started/data.xml      25.4M/36.1M [00:14<00:01,    6.51MB/s]\u001b[A\n",
      " 72%|███████▏  |get-started/data.xml      26.0M/36.1M [00:14<00:01,    5.75MB/s]\u001b[A\n",
      " 74%|███████▎  |get-started/data.xml      26.6M/36.1M [00:14<00:02,    4.26MB/s]\u001b[A\n",
      " 75%|███████▍  |get-started/data.xml      27.1M/36.1M [00:14<00:02,    3.53MB/s]\u001b[A\n",
      " 76%|███████▌  |get-started/data.xml      27.5M/36.1M [00:15<00:02,    3.26MB/s]\u001b[A\n",
      " 77%|███████▋  |get-started/data.xml      27.9M/36.1M [00:15<00:02,    3.00MB/s]\u001b[A\n",
      " 78%|███████▊  |get-started/data.xml      28.2M/36.1M [00:15<00:02,    2.95MB/s]\u001b[A\n",
      " 79%|███████▉  |get-started/data.xml      28.5M/36.1M [00:15<00:02,    2.91MB/s]\u001b[A\n",
      " 80%|███████▉  |get-started/data.xml      28.8M/36.1M [00:15<00:02,    2.88MB/s]\u001b[A\n",
      " 81%|████████  |get-started/data.xml      29.1M/36.1M [00:15<00:02,    2.86MB/s]\u001b[A\n",
      " 81%|████████▏ |get-started/data.xml      29.4M/36.1M [00:15<00:02,    2.84MB/s]\u001b[A\n",
      " 82%|████████▏ |get-started/data.xml      29.8M/36.1M [00:16<00:02,    2.83MB/s]\u001b[A\n",
      " 83%|████████▎ |get-started/data.xml      30.1M/36.1M [00:16<00:02,    2.83MB/s]\u001b[A\n",
      " 84%|████████▍ |get-started/data.xml      30.4M/36.1M [00:16<00:02,    2.83MB/s]\u001b[A\n",
      " 85%|████████▍ |get-started/data.xml      30.7M/36.1M [00:16<00:02,    2.83MB/s]\u001b[A\n",
      " 86%|████████▌ |get-started/data.xml      31.0M/36.1M [00:16<00:01,    2.83MB/s]\u001b[A\n",
      " 87%|████████▋ |get-started/data.xml      31.3M/36.1M [00:16<00:01,    2.83MB/s]\u001b[A\n",
      " 88%|████████▊ |get-started/data.xml      31.6M/36.1M [00:16<00:01,    2.83MB/s]\u001b[A\n",
      " 88%|████████▊ |get-started/data.xml      31.9M/36.1M [00:16<00:01,    2.84MB/s]\u001b[A\n",
      " 89%|████████▉ |get-started/data.xml      32.2M/36.1M [00:16<00:01,    2.85MB/s]\u001b[A\n",
      " 90%|█████████ |get-started/data.xml      32.6M/36.1M [00:17<00:01,    2.85MB/s]\u001b[A\n",
      " 91%|█████████ |get-started/data.xml      32.9M/36.1M [00:17<00:01,    2.86MB/s]\u001b[A\n",
      " 92%|█████████▏|get-started/data.xml      33.2M/36.1M [00:17<00:01,    2.86MB/s]\u001b[A\n",
      " 93%|█████████▎|get-started/data.xml      33.5M/36.1M [00:17<00:00,    2.87MB/s]\u001b[A\n",
      " 94%|█████████▎|get-started/data.xml      33.8M/36.1M [00:17<00:00,    2.87MB/s]\u001b[A\n",
      " 94%|█████████▍|get-started/data.xml      34.1M/36.1M [00:17<00:00,    2.87MB/s]\u001b[A\n",
      " 95%|█████████▌|get-started/data.xml      34.4M/36.1M [00:17<00:00,    2.87MB/s]\u001b[A\n",
      " 96%|█████████▌|get-started/data.xml      34.8M/36.1M [00:17<00:00,    2.87MB/s]\u001b[A\n",
      " 97%|█████████▋|get-started/data.xml      35.1M/36.1M [00:17<00:00,    2.87MB/s]\u001b[A\n",
      " 98%|█████████▊|get-started/data.xml      35.4M/36.1M [00:18<00:00,    2.87MB/s]\u001b[A\n",
      " 99%|█████████▉|get-started/data.xml      35.7M/36.1M [00:18<00:00,    2.88MB/s]\u001b[A\n",
      "100%|█████████▉|get-started/data.xml      36.0M/36.1M [00:18<00:00,    2.87MB/s]\u001b[A\n",
      "                                                                                \u001b[A\n",
      "To track the changes with git, run:\n",
      "\n",
      "\tgit add data/.gitignore data/data.xml.dvc\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc import https://github.com/iterative/dataset-registry \\\n",
    "             get-started/data.xml -o data/data.xml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "be2c1a37",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data and pipelines are up to date.                                              \n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc status"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "3306c5b7",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 37020\r\n",
      "-rw-rw-r-- 1 tomek tomek 37891850 maj 31 11:10 data.xml\r\n",
      "-rw-rw-r-- 1 tomek tomek      284 maj 31 11:10 data.xml.dvc\r\n",
      "-rw-rw-r-- 1 tomek tomek     5072 maj 31 11:01 Iris.csv\r\n",
      "-rw-rw-r-- 1 tomek tomek       76 maj 31 11:01 Iris.csv.dvc\r\n"
     ]
    }
   ],
   "source": [
    "ls -l data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b73c56ea",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# %load data/data.xml.dvc\n",
    "md5: a7cd139231cc35ed63541ce3829b96db\n",
    "frozen: true\n",
    "deps:\n",
    "- path: get-started/data.xml\n",
    "  repo:\n",
    "    url: https://github.com/iterative/dataset-registry\n",
    "    rev_lock: ba014f40e29670421a67cb1c47543f402348aa13\n",
    "outs:\n",
    "- md5: a304afb96060aad90176268345e10355\n",
    "  size: 37891850\n",
    "  path: data.xml\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db1063ac",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## DVC pipelines\n",
    " - wprowadzenie: https://youtu.be/71IGzyH95UY\n",
    " - Getting started: https://dvc.org/doc/start/data-pipelines\n",
    " - dvc pipelines pozwala nam zbudować (za pomocą polecenie `dvc run`) lub zdefiniować (edytując plik `dvc.yaml`) graf zależności między krokami wykonywanymi w naszym projekcie (takimi jak \"przygotowanie danych\", \"trenowanie\", \"ewaluacja\")\n",
    "  - tak zdefiniowany pipeline można potem uruchomić za pomocą polecenia `dvc reproduce`"
   ]
  },
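  {
   "cell_type": "markdown",
   "id": "b41d7c2e",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "A minimal sketch of what such a `dvc.yaml` could look like, assuming a project with two hypothetical scripts, `prepare.py` and `train.py`. The stage names, the `data/prepared` directory and the `model.pkl` file are likewise illustrative and do not come from the sample project. Each stage lists the command to run (`cmd`), the files it depends on (`deps`) and the files it produces (`outs`):\n",
    "\n",
    "```yaml\n",
    "# dvc.yaml (illustrative sketch; script and file names are assumptions)\n",
    "stages:\n",
    "  prepare:\n",
    "    cmd: python prepare.py        # hypothetical data preparation script\n",
    "    deps:\n",
    "      - prepare.py\n",
    "      - data/Iris.csv             # data file already tracked with `dvc add`\n",
    "    outs:\n",
    "      - data/prepared             # hypothetical output directory\n",
    "  train:\n",
    "    cmd: python train.py          # hypothetical training script\n",
    "    deps:\n",
    "      - train.py\n",
    "      - data/prepared             # output of `prepare`, so `train` runs after it\n",
    "    outs:\n",
    "      - model.pkl                 # hypothetical trained model file\n",
    "```\n",
    "\n",
    "Because `train` lists `data/prepared` (an output of `prepare`) among its dependencies, DVC orders the stages accordingly, and `dvc repro` re-runs only the stages whose dependencies have changed."
   ]
  },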
  {
   "cell_type": "markdown",
   "id": "e2939867",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Zadania [15pkt]\n",
    "1. Zainicjalizuj repozytorium DVC wewnątrz Twojego repozytorium z projektem [1pkt]\n",
    "2. Dodaj plik(i) z danymi w Twoim projekcie do DVC [1pkt]\n",
    "3. Skonfiguruj remote (dane do konfiguracji podane poniżej) [3pkt]\n",
    "4. Stwórz/zdefiniuj i dodaj do repozytorium plik `dvc.yaml` opisujący kroki wykonywane w Twoim projekcie. Wydziel przynajmniej 2 kroki (np. przygotowanie danych/trenowanie) powiązane ze sobą za pomocą zależności (skorzystaj z \n",
    "materiałów \"Getting started\", link powyżej) [6pkt]\n",
    "5. Stwórz projekt na Jenkinsie (`s1233456-dvc`), w którym sklonujesz repozytorium, ściągniesz pliki dvc (za pomocą `dvc pull`) i uruchomisz pipeline (za pomocą `dvc reproduce`) [4pkt]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f5a8590",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## SSH remote\n",
    "Jednym z remote obsługiwanych przez DVC jest SFTP/SSH.\n",
    "W celu jego wykorzystania na serwerze tzietkiewicz.vm.wmi.amu.edu.pl utworzony został użytkownik `ium-sftp` i skonfigurowany serwer SFTP.\n",
    "Został też dla niego wygenerowany klucz ssh, który został dodany jako \"Jenkins credential\" (patrz opis konfiguracji na Jenkins poniżej)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82a61107",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Lokalnie\n",
    "Będziemy potrzebować zależności ([szczegóły](https://dvc.org/doc/command-reference/remote/add))\n",
    "  \n",
    "  `conda install dvc-ssh` \n",
    "\n",
    "albo\n",
    "\n",
    "`pip install dvc[ssh] paramiko`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c48c5b8e",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting package metadata (current_repodata.json): done\n",
      "Solving environment: done\n",
      "\n",
      "## Package Plan ##\n",
      "\n",
      "  environment location: /home/tomek/miniconda3\n",
      "\n",
      "  added / updated specs:\n",
      "    - dvc-ssh\n",
      "\n",
      "\n",
      "The following packages will be downloaded:\n",
      "\n",
      "    package                    |            build\n",
      "    ---------------------------|-----------------\n",
      "    bcrypt-3.2.0               |   py39h3811e60_1          44 KB  conda-forge\n",
      "    ca-certificates-2021.5.30  |       ha878542_0         136 KB  conda-forge\n",
      "    certifi-2021.5.30          |   py39hf3d152e_0         141 KB  conda-forge\n",
      "    dvc-2.3.0                  |   py39hf3d152e_0         542 KB  conda-forge\n",
      "    dvc-ssh-2.3.0              |   py39hf3d152e_0           9 KB  conda-forge\n",
      "    fsspec-2021.5.0            |     pyhd8ed1ab_0          77 KB  conda-forge\n",
      "    invoke-1.5.0               |     pyhd3deb0d_0         137 KB  conda-forge\n",
      "    paramiko-2.7.2             |     pyh9f0ad1d_0         135 KB  conda-forge\n",
      "    pynacl-1.4.0               |   py39h3811e60_2         1.3 MB  conda-forge\n",
      "    ------------------------------------------------------------\n",
      "                                           Total:         2.5 MB\n",
      "\n",
      "The following NEW packages will be INSTALLED:\n",
      "\n",
      "  bcrypt             conda-forge/linux-64::bcrypt-3.2.0-py39h3811e60_1\n",
      "  dvc-ssh            conda-forge/linux-64::dvc-ssh-2.3.0-py39hf3d152e_0\n",
      "  invoke             conda-forge/noarch::invoke-1.5.0-pyhd3deb0d_0\n",
      "  paramiko           conda-forge/noarch::paramiko-2.7.2-pyh9f0ad1d_0\n",
      "  pynacl             conda-forge/linux-64::pynacl-1.4.0-py39h3811e60_2\n",
      "\n",
      "The following packages will be UPDATED:\n",
      "\n",
      "  ca-certificates                      2020.12.5-ha878542_0 --> 2021.5.30-ha878542_0\n",
      "  certifi                          2020.12.5-py39hf3d152e_1 --> 2021.5.30-py39hf3d152e_0\n",
      "  dvc                                  2.1.0-py39hf3d152e_0 --> 2.3.0-py39hf3d152e_0\n",
      "  fsspec                                 0.9.0-pyhd8ed1ab_2 --> 2021.5.0-pyhd8ed1ab_0\n",
      "\n",
      "\n",
      "\n",
      "Downloading and Extracting Packages\n",
      "certifi-2021.5.30    | 141 KB    | ##################################### | 100% \n",
      "fsspec-2021.5.0      | 77 KB     | ##################################### | 100% \n",
      "dvc-2.3.0            | 542 KB    | ##################################### | 100% \n",
      "invoke-1.5.0         | 137 KB    | ##################################### | 100% \n",
      "paramiko-2.7.2       | 135 KB    | ##################################### | 100% \n",
      "bcrypt-3.2.0         | 44 KB     | ##################################### | 100% \n",
      "pynacl-1.4.0         | 1.3 MB    | ##################################### | 100% \n",
      "dvc-ssh-2.3.0        | 9 KB      | ##################################### | 100% \n",
      "ca-certificates-2021 | 136 KB    | ##################################### | 100% \n",
      "Preparing transaction: done\n",
      "Verifying transaction: done\n",
      "Executing transaction: done\n",
      "\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "conda install -c conda-forge dvc-ssh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "id": "e9a04876",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Setting 'ium_ssh_remote' as a default remote.\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc remote add -f -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "id": "e3f27bbb",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "my_local_remote\t/dvcstore\n",
      "ium_ssh_remote\tssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl\n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc remote list"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c92edd7b",
   "metadata": {},
   "source": [
    "Zapisujemy hasło:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "id": "5b2fa175",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc remote modify --local ium_ssh_remote password IUM@2021"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "id": "ea6e16fa",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Everything is up to date.                                                       \n",
      "\u001b[0m"
     ]
    }
   ],
   "source": [
    "!dvc push"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1468c44c",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Jenkins\n",
    "\n",
    "W Jenkins można użyć mechanizmu \"Credentials\", żeby w bezpieczny sposób przekazać hasło albo klucz prywatny.\n",
    "\n",
    "Takie dane dla użytkownika ium-sftp zostały stworzone na Jenkinsie:\n",
    "\n",
    " - typu ssh key: https://tzietkiewicz.vm.wmi.amu.edu.pl:8080/credentials/store/system/domain/_/credential/48ac7004-216e-4260-abba-1fe5db753e18/\n",
    " - typu \"secret text\" - zawierający hasło użytkownika ium-shftp: https://tzietkiewicz.vm.wmi.amu.edu.pl:8080/credentials/store/system/domain/_/credential/ium-sftp-password/\n",
    "\n",
    "Opis używania \"Credentials\" w Jenkinsfile: https://www.jenkins.io/doc/book/pipeline/jenkinsfile/#for-other-credential-types\n",
    "\n",
    "Klucza ssh można użyć tak: \n",
    "\n",
    "```Jenkinsfile\n",
    "withCredentials(\n",
    "    [sshUserPrivateKey(credentialsId: '48ac7004-216e-4260-abba-1fe5db753e18', keyFileVariable: 'IUM_SFTP_KEY', passphraseVariable: '', usernameVariable: '')]) {\n",
    "                sh 'dvc remote add -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp'\n",
    "                sh 'dvc remote modify --local ium_ssh_remote keyfile $IUM_SFTP_KEY'\n",
    "                sh 'dvc pull'}\n",
    "```\n",
    "\n",
    "Secret text tak:\n",
    "\n",
    "```Jenkinsfile\n",
    "    withCredentials([string(credentialsId: 'ium-sftp-password', variable: 'IUM_SFTP_PASS')]) {\n",
    "                sh 'dvc remote add -d ium_ssh_remote ssh://ium-sftp@tzietkiewicz.vm.wmi.amu.edu.pl/ium-sftp'\n",
    "                sh 'dvc remote modify --local ium_ssh_remote password $IUM_SFTP_KEY'\n",
    "                sh 'dvc pull'\n",
    "    }\n",
    "```\n",
    "\n",
    "Przykład kongiguracji: \n",
    " - https://tzietkiewicz.vm.wmi.amu.edu.pl:8080/job/docker-test-mount/ \n",
    " - https://git.wmi.amu.edu.pl/tzietkiewicz/ium-helloworld"
   ]
  }
 ],
 "metadata": {
  "author": "Tomasz Ziętkiewicz",
  "celltoolbar": "Slideshow",
  "email": "tomasz.zietkiewicz@amu.edu.pl",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "lang": "pl",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  },
  "slideshow": {
   "slide_type": "slide"
  },
  "subtitle": "10.DVC[laboratoria]",
  "title": "Inżynieria uczenia maszynowego",
  "year": "2021"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}