ium/IUM_11.CML.ipynb
Paweł Skórzewski f8e196a585 Lab. 10
2024-05-22 08:17:56 +02:00


{
"cells": [
{
"cell_type": "markdown",
"id": "9d06fc91",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Machine Learning Engineering\n",
"### 29 May 2024\n",
"# 11. GitHub Actions"
]
},
{
"cell_type": "markdown",
"id": "beeb17b2",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## GitHub Actions\n",
"<img src=\"img/expcontrol/github-actions.jpeg\">"
]
},
{
"cell_type": "markdown",
"id": "752995e1",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
" - https://docs.github.com/en/actions\n",
" - A continuous integration system \"built into\" GitHub\n",
" - Free for public repositories (with stricter [resource limits](https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits) than on paid plans)\n",
" "
]
},
{
"cell_type": "markdown",
"id": "2b06cc01",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"https://youtu.be/cP0I9w2coGU"
]
},
{
"cell_type": "markdown",
"id": "b66dd41f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### GitHub Actions terminology\n",
" - *Workflow* - corresponds to a Jenkins \"Pipeline\".\n",
" - *Event* - something that triggers a workflow, e.g. pushing a change to the repository (\"push\") or opening a pull request. [Full list](https://docs.github.com/en/actions/reference/events-that-trigger-workflows)\n",
" - *Job* - a workflow consists of one or more jobs. Jobs can run in parallel, each on a different machine (see \"runner\")\n",
" - *Step* - roughly corresponds to a Jenkins \"Stage\"; the steps of a job run in order, and each step runs an action or shell commands\n",
" - *Action/command* - corresponds to a Jenkins \"Step\": a single operation to perform, e.g. adding a comment to a pull request or running a system command\n",
" - *Runner* - corresponds to a Jenkins \"Agent\": a server on which jobs can be executed\n",
"   - *GitHub-hosted runner* - a server maintained by GitHub (2-core CPU, 7 GB RAM, 14 GB SSD), running Windows, Linux or macOS\n",
"   - *Self-hosted runner* - our own server with the [GitHub Actions runner](https://github.com/actions/runner) application installed"
]
},
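{
"cell_type": "markdown",
"id": "5c1e7a90",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"As a rough illustration of the terms above, the sketch below annotates a minimal workflow. The file name `.github/workflows/terminology.yml`, the workflow name and the job id `example-job` are hypothetical and chosen only for this example.\n",
"\n",
"```yaml\n",
"# .github/workflows/terminology.yml (hypothetical file name)\n",
"name: terminology-sketch        # workflow\n",
"on: [push]                      # event that triggers the workflow\n",
"jobs:\n",
"  example-job:                  # job (hypothetical id)\n",
"    runs-on: ubuntu-latest      # GitHub-hosted runner\n",
"    steps:\n",
"      - name: Checkout repo     # step that uses a ready-made action\n",
"        uses: actions/checkout@v2\n",
"      - name: Say hello         # step that runs a shell command\n",
"        run: echo 'Hello'\n",
"```\n",
"\n",
"For a self-hosted runner, `runs-on` would typically name the `self-hosted` label instead of `ubuntu-latest`."
]
},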
{
"cell_type": "markdown",
"id": "9f1f6d0a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Workflow definition\n",
" - Workflows are defined in YAML files (with the `*.yml` or `*.yaml` extension) placed in the special `.github/workflows/` directory inside the repository\n",
" - The full syntax is described [here](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions)\n",
" - Basic fields:\n",
"   - `name` [optional] - the name under which the workflow/step is shown in the UI. For a workflow, it defaults to the path of the YAML file\n",
"   - `on` - defines when the workflow is triggered\n",
"   - `jobs` - groups the jobs to execute. Each can run on a different runner. By default jobs run in parallel, but we can define [dependencies between jobs](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idneeds), which makes them run sequentially\n",
"   - `runs-on` - a job parameter that defines which virtual machine the job runs on (e.g. `ubuntu-latest`)\n",
"   - `uses` - lets us use ready-made actions defined by us or by other users, e.g. `- uses: actions/checkout@v2` checks out the files from the repository\n",
"   - `run` - runs any ([available/installed](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#preinstalled-software)) command, e.g. `python3 train.py`\n",
"   - `env` - defines environment variables available to actions; we can also use the [variables set by GitHub](https://docs.github.com/en/actions/reference/environment-variables#default-environment-variables)"
]
},
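{
"cell_type": "markdown",
"id": "7d2f4b13",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The fields above can be combined roughly as in the following sketch. It assumes two hypothetical jobs, `train` and `evaluate`, and a hypothetical variable `EPOCHS`; `needs` makes `evaluate` wait for `train`, so the two jobs run sequentially rather than in parallel.\n",
"\n",
"```yaml\n",
"name: fields-sketch\n",
"on: [push]\n",
"env:\n",
"  EPOCHS: '10'                  # hypothetical variable, visible in every job\n",
"jobs:\n",
"  train:\n",
"    runs-on: ubuntu-latest\n",
"    steps:\n",
"      - uses: actions/checkout@v2\n",
"      - name: Show settings\n",
"        # GITHUB_SHA is one of the default variables set by GitHub\n",
"        run: echo \"Training for $EPOCHS epochs at commit $GITHUB_SHA\"\n",
"  evaluate:\n",
"    needs: train                # start only after `train` has succeeded\n",
"    runs-on: ubuntu-latest\n",
"    steps:\n",
"      - run: echo 'Evaluation would go here'\n",
"```\n",
"\n",
"Each job starts on a fresh runner, so files created in `train` are not automatically visible in `evaluate`."
]
},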
{
"cell_type": "code",
"execution_count": 4,
"id": "f4916c1f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/tomek/repos/aitech-ium/IUM_11/github-actions-hello\n"
]
}
],
"source": [
"!mkdir -p IUM_11/github-actions-hello\n",
"%cd IUM_11/github-actions-hello\n",
"!mkdir -p .github/workflows"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "88ce689f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Reinitialized existing Git repository in /home/tomek/repos/aitech-ium/IUM_11/github-actions-hello/.git/\n",
"Enumerating objects: 6, done.\n",
"Counting objects: 100% (6/6), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (4/4), done.\n",
"Writing objects: 100% (6/6), 780 bytes | 780.00 KiB/s, done.\n",
"Total 6 (delta 0), reused 0 (delta 0), pack-reused 0\n",
"To github.com:TomekZet/ium-ga-hello.git\n",
" * [new branch] main -> main\n",
"Branch 'main' set up to track remote branch 'main' from 'origin'.\n"
]
}
],
"source": [
"!git init\n",
"!git branch -M main\n",
"!git remote add origin git@github.com:TomekZet/ium-ga-hello.git\n",
"!git push -u origin main"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "dde8d432",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/workflow.yml\n"
]
}
],
"source": [
"%%writefile .github/workflows/workflow.yml\n",
"name: github-actions-hello\n",
"on: [push]\n",
"jobs:\n",
"  hello-job:\n",
"    runs-on: ubuntu-latest\n",
"    steps:\n",
"      - name: Checkout repo\n",
"        uses: actions/checkout@v2\n",
"      - name: Setup Python\n",
"        uses: actions/setup-python@v2.2.2\n",
"        with:\n",
"          python-version: '3.7'\n",
"      - run: python3 --version"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "ff1e011e",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"On branch main\n",
"Your branch is up to date with 'origin/main'.\n",
"\n",
"nothing to commit, working tree clean\n",
"Everything up-to-date\n"
]
}
],
"source": [
"!git add .github/workflows/workflow.yml\n",
"!git commit -m \"Github Actions Workflow\"\n",
"!git push"
]
},
{
"cell_type": "markdown",
"id": "3e237076",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### The Actions tab on the repository page:\n",
"https://github.com/TomekZet/ium-ga-hello/actions"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "32701383",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 16\r\n",
"drwxr-sr-x 2 tomek tomek 4096 May 17 11:51 .\r\n",
"drwxr-sr-x 3 tomek tomek 4096 May 17 11:51 ..\r\n",
"-rw-r--r-- 1 tomek tomek 456 May 17 11:51 parametrized.yml\r\n",
"-rw-r--r-- 1 tomek tomek 305 May 17 12:01 workflow.yml\r\n"
]
}
],
"source": [
"!ls -al .github/workflows"
]
},
{
"cell_type": "markdown",
"id": "1c01acb5",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Manual triggering\n",
"A workflow can also be triggered manually, with parameters.\n",
"More information, e.g. here: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "a7250bf7",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/parametrized.yml\n"
]
}
],
"source": [
"%%writefile .github/workflows/parametrized.yml\n",
"name: github-actions-hello-parametrized\n",
"on:\n",
"  workflow_dispatch:\n",
"    inputs:\n",
"      input_text:\n",
"        description: 'Text to display'\n",
"        required: true\n",
"        default: 'Hello World'\n",
"jobs:\n",
"  hello-job:\n",
"    runs-on: ubuntu-latest\n",
"    steps:\n",
"      - name: Checkout repo\n",
"        uses: actions/checkout@v2\n",
"      - name: Install dependencies\n",
"        run:\n",
"          sudo apt update;\n",
"          sudo apt install -y figlet\n",
"      - name: Write\n",
"        run:\n",
"          figlet \"${{ github.event.inputs.input_text }}\""
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "36ddaac0",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[main a98938d] just dispatch\n",
" 1 file changed, 6 deletions(-)\n",
"Enumerating objects: 9, done.\n",
"Counting objects: 100% (9/9), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (3/3), done.\n",
"Writing objects: 100% (5/5), 411 bytes | 411.00 KiB/s, done.\n",
"Total 5 (delta 1), reused 0 (delta 0), pack-reused 0\n",
"remote: Resolving deltas: 100% (1/1), completed with 1 local object.\u001b[K\n",
"To github.com:TomekZet/ium-ga-hello.git\n",
" 6c4a361..a98938d main -> main\n"
]
}
],
"source": [
"!git add -u .github/workflows\n",
"!git commit -m \"just dispatch\"\n",
"!git push"
]
},
{
"cell_type": "markdown",
"id": "ed780dea",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Dependencies\n",
"\n",
"The virtual machines (\"runners\") on which jobs run come with a set of preinstalled tools. Example list for [Ubuntu 20.04](https://github.com/actions/virtual-environments/blob/main/images/linux/Ubuntu2004-README.md)\n",
"\n",
"Missing dependencies can be installed using:\n",
" - actions\n",
" - system commands such as `apt install` or `pip install` executed via `run`. See this [example](https://docs.github.com/en/actions/using-github-hosted-runners/customizing-github-hosted-runners#installing-software-on-ubuntu-runners)"
]
},
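{
"cell_type": "markdown",
"id": "a94e0c57",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A minimal sketch of both approaches as a fragment of one job's `steps`. The package names `figlet` and `numpy` are only illustrative, and the Python version is an assumption.\n",
"\n",
"```yaml\n",
"steps:\n",
"  # Dependency provided by an action\n",
"  - name: Set up Python\n",
"    uses: actions/setup-python@v2\n",
"    with:\n",
"      python-version: '3.9'\n",
"  # Dependencies installed with system commands via `run`\n",
"  - name: Install system and Python packages\n",
"    run: |\n",
"      sudo apt update\n",
"      sudo apt install -y figlet\n",
"      pip install numpy\n",
"```"
]
},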
{
"cell_type": "markdown",
"id": "28b582c4",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Actions\n",
"With the `uses` keyword we can use previously prepared actions. They can come from:\n",
" - the same repository as the workflow ([more](https://docs.github.com/en/actions/learn-github-actions/finding-and-customizing-actions#referencing-an-action-in-the-same-repository-where-a-workflow-file-uses-the-action))\n",
" - any public GitHub repository (e.g. the [iterative/setup-cml repository](https://github.com/iterative/setup-cml), see the example below)\n",
" - [GitHub Marketplace](https://github.com/marketplace?type=actions)"
]
},
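{
"cell_type": "markdown",
"id": "c3b8d6e2",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The fragment below sketches how `uses` references typically look for each kind of source. The local path `./.github/actions/hello` is hypothetical, and the version tags are examples rather than recommendations.\n",
"\n",
"```yaml\n",
"steps:\n",
"  # An action stored in the same repository as the workflow (hypothetical path);\n",
"  # the repository has to be checked out first.\n",
"  - uses: actions/checkout@v2\n",
"  - uses: ./.github/actions/hello\n",
"  # An action from a public GitHub repository, pinned to a tag.\n",
"  - uses: iterative/setup-cml@v1\n",
"  # Marketplace actions are referenced the same way: owner/repository@ref.\n",
"  - uses: actions/setup-python@v2\n",
"```"
]
},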
{
"cell_type": "markdown",
"id": "a764cc0d",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Actions executed in a Docker container\n",
"An action can run in a Docker container (pulled from Docker Hub or built from a Dockerfile).\n",
"To do this, create your own action in an `action.yml` file and then use it in a workflow."
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "ff4dab8c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting action.yml\n"
]
}
],
"source": [
"%%writefile action.yml\n",
"name: 'Hello World'\n",
"description: 'Greet someone and record the time'\n",
"inputs:\n",
"  who-to-greet:  # id of input\n",
"    description: 'Who to greet'\n",
"    required: true\n",
"    default: 'World'\n",
"outputs:\n",
"  time:  # id of output\n",
"    description: 'The time we greeted you'\n",
"runs:\n",
"  using: 'docker'\n",
"  image: 'Dockerfile'\n",
"  args:\n",
"    - ${{ inputs.who-to-greet }}"
]
},
{
"cell_type": "code",
"execution_count": 80,
"id": "f1aaff7c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting Dockerfile\n"
]
}
],
"source": [
"%%writefile Dockerfile\n",
"# Container image that runs your code\n",
"FROM ubuntu:latest\n",
" \n",
"RUN apt update && apt install -y figlet\n",
"\n",
"# Copies your code file from your action repository to the filesystem path `/` of the container\n",
"COPY entrypoint.sh /entrypoint.sh\n",
"\n",
"VOLUME /github/workspace/\n",
"\n",
"WORKDIR /github/workspace/\n",
"\n",
"# Code file to execute when the docker container starts up (`entrypoint.sh`)\n",
"ENTRYPOINT [\"/entrypoint.sh\"]"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "7f778025",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting entrypoint.sh\n"
]
}
],
"source": [
"%%writefile entrypoint.sh\n",
"#!/bin/sh -l\n",
"\n",
"figlet \"Hello $1\" | tee figlet.txt\n",
"echo \"Entrypoint invoked in: $PWD\"\n",
"readlink -f figlet.txt\n",
"time=$(date)\n",
"echo \"time=$time\" >> $GITHUB_OUTPUT"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "911975de",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"!chmod +x entrypoint.sh"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "483e0498",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/docker.yml\n"
]
}
],
"source": [
"%%writefile .github/workflows/docker.yml\n",
"name: github-actions-hello-docker\n",
"on:\n",
"  workflow_dispatch:\n",
"    inputs:\n",
"      input_text:\n",
"        description: 'Who to greet'\n",
"        required: true\n",
"        default: 'World'\n",
"jobs:\n",
"  hello-job:\n",
"    runs-on: ubuntu-latest\n",
"    steps:\n",
"      - name: Checkout repo\n",
"        uses: actions/checkout@v2\n",
"      - name: Use docker action\n",
"        id: hello\n",
"        uses: ./\n",
"        with:\n",
"          who-to-greet: \"${{ github.event.inputs.input_text }}\"\n",
"      # Use the output from the `hello` step\n",
"      - name: Get the output time\n",
"        run: echo \"The time was ${{ steps.hello.outputs.time }}\""
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "bc24dff3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[main 22a5094] Fix path\n",
" 1 file changed, 1 insertion(+)\n",
"Enumerating objects: 9, done.\n",
"Counting objects: 100% (9/9), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (5/5), done.\n",
"Writing objects: 100% (5/5), 570 bytes | 570.00 KiB/s, done.\n",
"Total 5 (delta 1), reused 0 (delta 0), pack-reused 0\n",
"remote: Resolving deltas: 100% (1/1), completed with 1 local object.\u001b[K\n",
"To github.com:TomekZet/ium-ga-hello.git\n",
" 97c7272..22a5094 main -> main\n"
]
}
],
"source": [
"!git add .github entrypoint.sh Dockerfile\n",
"!git commit -m \"Fix path\"\n",
"!git push"
]
},
{
"cell_type": "markdown",
"id": "12af9d1b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Archiving artifacts\n",
"https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts\n",
"\n",
"The \"upload-artifact\" action is used to archive artifacts:\n",
"\n",
"```yaml\n",
"  - name: Archive artifacts\n",
"    uses: actions/upload-artifact@v3\n",
"    with:\n",
"      name: figlet-output\n",
"      path: figlet.txt\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "245f7c8a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/docker-artifact.yml\n"
]
}
],
"source": [
"%%writefile .github/workflows/docker-artifact.yml\n",
"name: github-actions-hello-docker-artifact\n",
"on:\n",
"  workflow_dispatch:\n",
"    inputs:\n",
"      input_text:\n",
"        description: 'Who to greet'\n",
"        required: true\n",
"        default: 'World'\n",
"jobs:\n",
"  hello-job:\n",
"    name: \"Do all the hard stuff\"\n",
"    runs-on: ubuntu-latest\n",
"    steps:\n",
"      - name: Checkout repo\n",
"        uses: actions/checkout@v2\n",
"      - name: Use docker action\n",
"        id: hello\n",
"        uses: ./\n",
"        with:\n",
"          who-to-greet: \"${{ github.event.inputs.input_text }}\"\n",
"      # Use the output from the `hello` step\n",
"      - name: Get the output time\n",
"        run: echo \"The time was ${{ steps.hello.outputs.time }}\" > time.txt\n",
"      - name: Archive artifacts\n",
"        uses: actions/upload-artifact@v3\n",
"        with:\n",
"          name: figlet-output\n",
"          path: |\n",
"            figlet.txt\n",
"            time.txt\n",
"  publish:\n",
"    name: \"Publish as github comment\"\n",
"    runs-on: ubuntu-latest\n",
"    needs: hello-job\n",
"    steps:\n",
"      - uses: actions/checkout@v3\n",
"      # We need to download the artifact first, jobs do not share workflow files\n",
"      - name: get-artifact\n",
"        uses: actions/download-artifact@v3\n",
"        with:\n",
"          name: figlet-output\n",
"      - name: display_artifact_contents\n",
"        run:\n",
"          cat time.txt ; tr ' ' '#' < figlet.txt\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "47e301f9",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[main 5a40228] Archive in one job, use in other\n",
" 1 file changed, 1 insertion(+)\n",
"Enumerating objects: 9, done.\n",
"Counting objects: 100% (9/9), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (5/5), done.\n",
"Writing objects: 100% (5/5), 622 bytes | 622.00 KiB/s, done.\n",
"Total 5 (delta 2), reused 0 (delta 0), pack-reused 0\n",
"remote: Resolving deltas: 100% (2/2), completed with 2 local objects.\u001b[K\n",
"To github.com:TomekZet/ium-ga-hello.git\n",
" 4df6dc0..5a40228 main -> main\n"
]
}
],
"source": [
"!git add -u\n",
"!git commit -m \"Archive in one job, use in other\"\n",
"!git push"
]
},
{
"cell_type": "markdown",
"id": "805622e8",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## CML - Continuous Machine Learning\n",
"<img src=\"img/expcontrol/cml.png\">"
]
},
{
"cell_type": "markdown",
"id": "e0b3acbf",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
" - Developed by [iterative.ai](https://iterative.ai) (like DVC)\n",
" - https://cml.dev/\n",
" - Documentation: https://dvc.org/doc/cml\n",
" - Works with GitHub Actions or GitLab CI (and also [Bitbucket Pipelines](https://github.com/iterative/cml/wiki/CML-with-Bitbucket-Cloud))\n",
" - CML adds several \"actions\" to GitHub Actions:\n",
"   - `iterative/setup-cml` - adds the actions below\n",
"   - `cml-send-comment` - adds the CML report as a comment on the GitHub pull request\n",
"   - `cml-send-github-check` - adds the CML report to the \"Checks\" tab of the GitHub pull request\n",
"   - `cml-publish` - lets us add an image to the report\n",
" "
]
},
{
"cell_type": "markdown",
"id": "cdb54b38",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example CML workflow:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "07b1035a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/tomek/AITech/repo/aitech-ium-private/IUM_11\n",
"Cloning into 'example_cml'...\n",
"remote: Enumerating objects: 25, done.\u001b[K\n",
"remote: Total 25 (delta 0), reused 0 (delta 0), pack-reused 25\u001b[K\n",
"Receiving objects: 100% (25/25), 222.95 KiB | 920.00 KiB/s, done.\n",
"Resolving deltas: 100% (6/6), done.\n"
]
}
],
"source": [
"!git clone git@github.com:TomekZet/example_cml.git"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bf27a2b3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/tomek/AITech/repo/aitech-ium-private/IUM_11/example_cml\n"
]
}
],
"source": [
"%cd example_cml\n",
"!mkdir -p .github/workflows/"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "64f6e21d",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/cml.yaml\n"
]
}
],
"source": [
"%%writefile .github/workflows/cml.yaml\n",
"name: model-training\n",
"on: [push]\n",
"jobs:\n",
"  run:\n",
"    runs-on: [ubuntu-latest]\n",
"    steps:\n",
"      - uses: actions/checkout@v2\n",
"      - uses: actions/setup-python@v2\n",
"      - uses: iterative/setup-cml@v1\n",
"      - name: Train model\n",
"        env:\n",
"          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n",
"        run: |\n",
"          pip install -r requirements.txt\n",
"          python train.py\n",
"\n",
"          cat metrics.txt >> report.md\n",
"          cml-publish confusion_matrix.png --md >> report.md\n",
"          cml-send-comment report.md"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83e49d3b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# %load train.py\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.metrics import ConfusionMatrixDisplay  # replaces plot_confusion_matrix, removed in scikit-learn 1.2\n",
"import matplotlib.pyplot as plt\n",
"import json\n",
"import os\n",
"import numpy as np\n",
"\n",
"# Read in data\n",
"X_train = np.genfromtxt(\"data/train_features.csv\")\n",
"y_train = np.genfromtxt(\"data/train_labels.csv\")\n",
"X_test = np.genfromtxt(\"data/test_features.csv\")\n",
"y_test = np.genfromtxt(\"data/test_labels.csv\")\n",
"\n",
"\n",
"# Fit a model\n",
"depth = 2\n",
"clf = RandomForestClassifier(max_depth=depth)\n",
"clf.fit(X_train, y_train)\n",
"\n",
"# Evaluate on the test set and save the accuracy for the CML report\n",
"acc = clf.score(X_test, y_test)\n",
"print(acc)\n",
"with open(\"metrics.txt\", 'w') as outfile:\n",
"    outfile.write(\"Accuracy: \" + str(acc) + \"\\n\")\n",
"\n",
"\n",
"# Plot the normalized confusion matrix (requires scikit-learn >= 1.0) and save it\n",
"disp = ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test, normalize='true', cmap=plt.cm.Blues)\n",
"plt.savefig('confusion_matrix.png')\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "8dc5748f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's make a change to the file (line 17: `depth = 6`)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "afeaf939",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting train.py\n"
]
}
],
"source": [
"%%writefile train.py\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.metrics import ConfusionMatrixDisplay  # replaces plot_confusion_matrix, removed in scikit-learn 1.2\n",
"import matplotlib.pyplot as plt\n",
"import json\n",
"import os\n",
"import numpy as np\n",
"\n",
"# Read in data\n",
"X_train = np.genfromtxt(\"data/train_features.csv\")\n",
"y_train = np.genfromtxt(\"data/train_labels.csv\")\n",
"X_test = np.genfromtxt(\"data/test_features.csv\")\n",
"y_test = np.genfromtxt(\"data/test_labels.csv\")\n",
"\n",
"\n",
"# Fit a model\n",
"depth = 6\n",
"clf = RandomForestClassifier(max_depth=depth)\n",
"clf.fit(X_train, y_train)\n",
"\n",
"# Evaluate on the test set and save the accuracy for the CML report\n",
"acc = clf.score(X_test, y_test)\n",
"print(acc)\n",
"with open(\"metrics.txt\", 'w') as outfile:\n",
"    outfile.write(\"Accuracy: \" + str(acc) + \"\\n\")\n",
"\n",
"\n",
"# Plot the normalized confusion matrix (requires scikit-learn >= 1.0) and save it\n",
"disp = ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test, normalize='true', cmap=plt.cm.Blues)\n",
"plt.savefig('confusion_matrix.png')"
]
},
{
"cell_type": "markdown",
"id": "3e4a711a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Let's create a new branch \"deep_depth\":"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ab019b0b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Switched to a new branch 'deep_depth'\n",
"[deep_depth 0df0f2c] Changed depth and added cml workflow\n",
" 2 files changed, 19 insertions(+), 2 deletions(-)\n",
" create mode 100644 .github/workflows/cml.yaml\n",
"Enumerating objects: 8, done.\n",
"Counting objects: 100% (8/8), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (4/4), done.\n",
"Writing objects: 100% (6/6), 738 bytes | 738.00 KiB/s, done.\n",
"Total 6 (delta 2), reused 0 (delta 0)\n",
"remote: Resolving deltas: 100% (2/2), completed with 2 local objects.\u001b[K\n",
"remote: \n",
"remote: Create a pull request for 'deep_depth' on GitHub by visiting:\u001b[K\n",
"remote: https://github.com/TomekZet/example_cml/pull/new/deep_depth\u001b[K\n",
"remote: \n",
"To github.com:TomekZet/example_cml.git\n",
" * [new branch] deep_depth -> deep_depth\n"
]
}
],
"source": [
"!git checkout -b deep_depth\n",
"!git add train.py .github/workflows/cml.yaml\n",
"!git commit -m \"Changed depth and added cml workflow\"\n",
"!git push origin deep_depth"
]
},
{
"cell_type": "markdown",
"id": "b50f46a8",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"<img src=\"IUM_11/img/github-pr.png\">"
]
},
{
"cell_type": "markdown",
"id": "c56c8785",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"<img src=\"IUM_11/img/github-checks.png\">"
]
},
{
"cell_type": "markdown",
"id": "fb25c587",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Tasks [20 pts] (deadline: 5 June 2024)\n",
"1. Create a GitHub account (if you don't have one yet)\n",
"2. Create a public repository. Paste the link to it into the \"Link Github\" column of the [\"Zapisy na zbiory\"](https://teams.microsoft.com/l/file/F62B5988-A797-418D-B085-52E0AF8BD55E?tenantId=73689ee1-b42f-4e25-a5f6-66d1f29bc092&fileType=xlsx&objectUrl=https%3A%2F%2Fuam.sharepoint.com%2Fsites%2F2021SL06-DIUMUI0LABInynieriauczeniamaszynowego-Grupa11%2FShared%20Documents%2FGeneral%2FZapisy%20na%20zbiory.xlsx&baseUrl=https%3A%2F%2Fuam.sharepoint.com%2Fsites%2F2021SL06-DIUMUI0LABInynieriauczeniamaszynowego-Grupa11&serviceName=teams&threadId=19:d67b0dc2ee0849eba517a2aa8507df9c@thread.tacv2&groupId=8cd6b30e-edd9-48db-85ab-259fc11d0c5b) spreadsheet [1 pt]\n",
"3. Create a simple GitHub workflow that:\n",
"    - checks out your repository [1 pt]\n",
"    - downloads the training files. Ideally this would be done with DVC, but this time let's simplify the task because of complications that may arise when configuring authentication. The files can simply be added to the repository or downloaded with wget if they are publicly available [2 pts]\n",
"    - can be triggered via \"Workflow dispatch\" with training parameters [2 pts]\n",
"    - consists of at least 3 jobs, which:\n",
"        1. perform the training as a separate action executed in Docker [8 pts]\n",
"        2. evaluate the model [6 pts]\n",
"        3. archive the model file"
]
}
],
"metadata": {
"author": "Tomasz Ziętkiewicz",
"celltoolbar": "Slideshow",
"email": "tomasz.zietkiewicz@amu.edu.pl",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"lang": "pl",
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"slideshow": {
"slide_type": "slide"
},
"subtitle": "11.CML[laboratoria]",
"title": "Inżynieria uczenia maszynowego",
"year": "2021"
},
"nbformat": 4,
"nbformat_minor": 5
}