{
"cells": [
{
"cell_type": "markdown",
"id": "9d06fc91",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Inżynieria uczenia maszynowego\n",
"### 29 maja 2024\n",
"# 11. GitHub Actions"
]
},
{
"cell_type": "markdown",
"id": "beeb17b2",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "752995e1",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
" - https://docs.github.com/en/actions\n",
" - System ciągłej integracji „wbudowany” w GitHub\n",
" - Darmowy dla publicznych repozytoriów (z większymi niż w płatnych planach [ograniczeniami dotyczącymi zasobów](https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration#usage-limits))\n",
" - https://youtu.be/cP0I9w2coGU"
]
},
{
"cell_type": "markdown",
"id": "b66dd41f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Terminologia GitHub Actions\n",
" - ***Workflow*** odpowiada *pipeline*'owi z Jenkinsa.\n",
" - ***Event*** to zdarzenie, które uruchamia/wyzwala (*triggers*) *workflow*. Np. wypchnięcie zmiany do repozytorium (*push*), utworzenie pull requestu ([pełna lista tutaj](https://docs.github.com/en/actions/reference/events-that-trigger-workflows)).\n",
" - ***Job*** - zadanie. Workflow składa się z jednego lub kilku zadań (*jobs*). Każde z nich może być wykonywane równolegle na innej maszynie (patrz *runner*).\n",
" - ***Step*** (krok) - odpowiednik *stage* z Jenkinsa - służy do grupowania *actions*.\n",
" - ***Action/command*** (akcja/polecenie) - odpowiednik *step* z Jenkinsa - pojedyncze polecenie do wykonania, np. dodanie komentarza do pull requestu, wykonanie polecenia systemowego itp.\n",
" - ***Runner*** (wykonawca) - odpowiednik jenkinsowego *agent* - serwer, na którym mogą być wykonywane zadania (*jobs*):\n",
" - *GitHub-hosted runner* - serwer utrzymywany przez GitHub (2-core CPU, 7 GB RAM, 14 GB SSD). Windows, Linux albo macOS.\n",
" - *Self-hosted runner* - własny serwer, z zainstalowaną aplikacją [GitHub Actions Runner](https://github.com/actions/runner)."
]
},
{
"cell_type": "markdown",
"id": "9f1f6d0a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Definicja *workflow*\n",
" - *Workflow* definiuje się w plikach YAML (o rozszerzeniu `*.yml` albo `*.yaml`) umieszczonych w specjalnym folderze `.github/workflows/` wewnątrz repozytorium.\n",
" - Pełna składnia jest opisana [tutaj](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions).\n",
" - Podstawowe pola:\n",
" - `name` (opcjonalne) - nazwa, pod którą *workflow*/*step* będzie widoczny w UI. Domyślnie: ścieżka do pliku YAML.\n",
" - `on` - definiuje, kiedy workflow ma być uruchomiony.\n",
" - `jobs` - grupuje razem zadania (*jobs*) do wykonania. Każde może być wykonane na innym „wykonawcy” (*runner*). Domyślnie wykonywane są równolegle (ale możemy definiować [zależności między jobami](https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idneeds), co powoduje wykonanie ich sekwencyjnie).\n",
" - `runs-on` - parametr zadania (*job*) definiujący, na jakiej maszynie wirtualnej ma być uruchomiony (np. `ubuntu-latest`).\n",
" - `uses` - umożliwia użycie gotowych akcji zdefiniowanych przez nas albo przez innych użytkowników, np. `-uses: actions/checkout@v2` spowoduje *checkout* plików z repozytorium.\n",
" - `run` - pozwala uruchomić dowolne ([dostępne/zainstalowane](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#preinstalled-software)) polecenie, np. `python3 train.py`\n",
" - `env` - pozwala zdefiniować zmienne środowiskowe dostępne dla akcji lub skorzystać ze [zmiennych ustawionych przez Github](https://docs.github.com/en/actions/reference/environment-variables#default-environment-variables)."
]
},
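{
"cell_type": "markdown",
"id": "c3a1e7b4",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The fields listed above can be combined as in the sketch below. It is only an illustration: the job names `train` and `evaluate`, the variable `EPOCHS` and the scripts `train.py` and `evaluate.py` are hypothetical and are not part of the example repository used in this notebook. `needs` makes `evaluate` wait for `train`; without it the two jobs would start in parallel.\n",
"\n",
"```yaml\n",
"name: needs-and-env-sketch\n",
"on: [push]\n",
"jobs:\n",
"  train:\n",
"    runs-on: ubuntu-latest\n",
"    env:\n",
"      EPOCHS: \"5\"   # hypothetical variable, visible to every step of this job\n",
"    steps:\n",
"      - uses: actions/checkout@v4\n",
"      - run: python3 train.py --epochs \"$EPOCHS\"   # hypothetical script and flag\n",
"  evaluate:\n",
"    runs-on: ubuntu-latest\n",
"    needs: train   # start only after `train` has finished successfully\n",
"    steps:\n",
"      - uses: actions/checkout@v4\n",
"      - run: echo \"Commit $GITHUB_SHA on $GITHUB_REF\"   # default variables set by GitHub\n",
"      - run: python3 evaluate.py   # hypothetical script\n",
"```\n",
"\n",
"Each job starts on a fresh runner, so files written by `train` are not visible in `evaluate` unless they are passed on explicitly, for example as artifacts."
]
},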
{
"cell_type": "code",
"execution_count": 9,
"id": "f4916c1f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/pawel/ium/IUM_11/github-actions-hello\n"
]
}
],
"source": [
"!mkdir -p IUM_11/github-actions-hello\n",
"%cd IUM_11/github-actions-hello\n",
"!mkdir -p .github/workflows"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "88ce689f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Initialized empty Git repository in /home/pawel/ium/IUM_11/github-actions-hello/.git/\n",
"Switched to a new branch 'main'\n"
]
}
],
"source": [
"!git init\n",
"!git checkout -b main\n",
"!git remote add origin git@github.com:USERNAME/ium-ga-hello.git\n",
"!git push -u origin main"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "dde8d432",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing .github/workflows/workflow.yml\n"
]
}
],
"source": [
"%%writefile .github/workflows/workflow.yml\n",
"name: github-actions-hello\n",
"on: [push]\n",
"jobs:\n",
" hello-job:\n",
" runs-on: ubuntu-latest\n",
" steps:\n",
" - name: Checkout repo\n",
" uses: actions/checkout@v2\n",
" - name: Setup Python\n",
" uses: actions/setup-python@v2.2.2\n",
" with:\n",
" python-version: '3.10'\n",
" - run: python3 --version"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "ff1e011e",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"On branch main\n",
"Your branch is up to date with 'origin/main'.\n",
"\n",
"nothing to commit, working tree clean\n",
"Everything up-to-date\n"
]
}
],
"source": [
"!git add .github/workflows/workflow.yml\n",
"!git commit -m \"Github Actions Workflow\"\n",
"!git push"
]
},
{
"cell_type": "markdown",
"id": "3e237076",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Zakładka *Actions* na stronie repozytorium:\n",
"https://github.com/skorzewski/ium-ga-hello/actions"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "32701383",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 24\n",
"drwxr-xr-x 2 pawel pawel 4096 May 28 10:10 .\n",
"drwxr-xr-x 3 pawel pawel 4096 May 28 10:10 ..\n",
"-rw-r--r-- 1 pawel pawel 1451 May 28 10:10 docker-artifact.yml\n",
"-rw-r--r-- 1 pawel pawel 882 May 28 10:10 docker.yml\n",
"-rw-r--r-- 1 pawel pawel 603 May 28 10:10 parametrized.yml\n",
"-rw-r--r-- 1 pawel pawel 306 May 28 10:10 workflow.yml\n"
]
}
],
"source": [
"!ls -al .github/workflows"
]
},
{
"cell_type": "markdown",
"id": "1c01acb5",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Ręczne wywoływanie\n",
"Workflow można również wywołać ręcznie, podając parametry.\n",
"Więcej informacji np. tutaj: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/"
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "a7250bf7",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/parametrized.yml\n"
]
}
],
"source": [
"%%writefile .github/workflows/parametrized.yml\n",
"name: github-actions-hello-parametrized\n",
"on: \n",
" workflow_dispatch:\n",
" inputs:\n",
" input_text:\n",
" description: 'Text to display' \n",
" required: true\n",
" default: 'Hello World'\n",
"jobs:\n",
" hello-job:\n",
" runs-on: ubuntu-latest\n",
" steps:\n",
" - name: Checkout repo\n",
" uses: actions/checkout@v2\n",
" - name: Install dependencies\n",
" run:\n",
" sudo apt update;\n",
" sudo apt install -y figlet\n",
" - name: Write\n",
" run:\n",
" figlet \"${{ github.event.inputs.input_text }}\""
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "36ddaac0",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[main a98938d] just dispatch\n",
" 1 file changed, 6 deletions(-)\n",
"Enumerating objects: 9, done.\n",
"Counting objects: 100% (9/9), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (3/3), done.\n",
"Writing objects: 100% (5/5), 411 bytes | 411.00 KiB/s, done.\n",
"Total 5 (delta 1), reused 0 (delta 0), pack-reused 0\n",
"remote: Resolving deltas: 100% (1/1), completed with 1 local object.\u001b[K\n",
"To github.com:TomekZet/ium-ga-hello.git\n",
" 6c4a361..a98938d main -> main\n"
]
}
],
"source": [
"!git add -u .github/workflows\n",
"!git commit -m \"just dispatch\"\n",
"!git push"
]
},
{
"cell_type": "markdown",
"id": "ed780dea",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Zależności\n",
"\n",
"Maszyny wirtualne (*runners*), na których uruchamiane są zadania, mają zainstalowany zbiór narzędzi. Przykładowa lista dla Ubuntu 24.04: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md\n",
"\n",
"Brakujące zależności można zainstalować, korzystając z:\n",
" - akcji\n",
" - poleceń systemowych takich jak `apt install` czy `pip install` uruchomionych poprzez `run`. Patrz [przykład](https://docs.github.com/en/actions/using-github-hosted-runners/customizing-github-hosted-runners#installing-software-on-ubuntu-runners)"
]
},
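{
"cell_type": "markdown",
"id": "5d2f9a60",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"A minimal sketch of both approaches, assuming a job that needs `figlet` (a system package) and the Python packages listed in a `requirements.txt` file at the repository root. The `requirements.txt` file is an assumption made for this illustration.\n",
"\n",
"```yaml\n",
"steps:\n",
"  - uses: actions/checkout@v4\n",
"  # Option 1: an action that provides a tool (here: a chosen Python version)\n",
"  - uses: actions/setup-python@v5\n",
"    with:\n",
"      python-version: '3.10'\n",
"  # Option 2: system commands executed via `run`\n",
"  - name: Install system and Python packages\n",
"    run: |\n",
"      sudo apt-get update\n",
"      sudo apt-get install -y figlet\n",
"      pip install -r requirements.txt   # assumed file\n",
"```\n",
"\n",
"Packages installed this way live only for the duration of the job, so every job that needs them has to install them again."
]
},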
{
"cell_type": "markdown",
"id": "28b582c4",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Akcje\n",
"Za pomocą polecenia `uses` możemy używać przygotowanych wcześniej akcji. Mogą one pochodzić:\n",
" - z tego samego repozytorium co workflow ([więcej](https://docs.github.com/en/actions/learn-github-actions/finding-and-customizing-actions#referencing-an-action-in-the-same-repository-where-a-workflow-file-uses-the-action))\n",
" - z dowolnego publicznego repozytorium Github (np. [repozytorioum iterative/setup-clm](https://github.com/iterative/setup-cml), patrz przykład poniżej\n",
" - z [Github Marketplace](https://github.com/marketplace?type=actions)"
]
},
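{
"cell_type": "markdown",
"id": "e84b17cc",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"The sketch below shows what the reference passed to `uses` typically looks like for an action from the same repository and for one from a public repository. The local path `./.github/actions/my-action` is hypothetical; the other names appear elsewhere in this notebook. Actions listed in the Marketplace are referenced in the same `owner/repo@ref` form as any other public action.\n",
"\n",
"```yaml\n",
"steps:\n",
"  # The repository must be checked out before a local action can be used\n",
"  - uses: actions/checkout@v4\n",
"  # Action from the same repository: path to the directory containing action.yml\n",
"  - uses: ./.github/actions/my-action   # hypothetical path\n",
"  # Action from a public repository: owner/repo@ref (tag, branch or commit SHA)\n",
"  - uses: iterative/setup-cml@v1\n",
"```"
]
},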
{
"cell_type": "markdown",
"id": "a764cc0d",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Akcje wykonywane w kontenerze Docker\n",
"Akcja może być wywołana w kontenerze Docker (pobranym z Docker Hub albo zbudowanym z `Dockerfile`).\n",
"W tym celu należy stworzyć własną akcję w pliku `action.yaml` i potem użyć jej w *workflow*."
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "ff4dab8c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting action.yml\n"
]
}
],
"source": [
"%%writefile action.yml\n",
"name: 'Hello World'\n",
"description: 'Greet someone and record the time'\n",
"inputs:\n",
" who-to-greet: # id of input\n",
" description: 'Who to greet'\n",
" required: true\n",
" default: 'World'\n",
"outputs:\n",
" time: # id of output\n",
" description: 'The time we greeted you'\n",
"runs:\n",
" using: 'docker'\n",
" image: 'Dockerfile'\n",
" args:\n",
" - ${{ inputs.who-to-greet }}"
]
},
{
"cell_type": "code",
"execution_count": 80,
"id": "f1aaff7c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting Dockerfile\n"
]
}
],
"source": [
"%%writefile Dockerfile\n",
"# Container image that runs your code\n",
"FROM ubuntu:latest\n",
" \n",
"RUN apt update && apt install -y figlet\n",
"\n",
"# Copies your code file from your action repository to the filesystem path `/` of the container\n",
"COPY entrypoint.sh /entrypoint.sh\n",
"\n",
"VOLUME /github/workspace/\n",
"\n",
"WORKDIR /github/workspace/\n",
"\n",
"# Code file to execute when the docker container starts up (`entrypoint.sh`)\n",
"ENTRYPOINT [\"/entrypoint.sh\"]"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "7f778025",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting entrypoint.sh\n"
]
}
],
"source": [
"%%writefile entrypoint.sh\n",
"#!/bin/sh -l\n",
"\n",
"figlet \"Hello $1\" | tee figlet.txt\n",
"echo \"Entrypoint invoked in: $PWD\"\n",
"readlink -f figlet.txt\n",
"time=$(date)\n",
"echo \"time=$time\" >> $GITHUB_OUTPUT"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "911975de",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"!chmod +x entrypoint.sh"
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "483e0498",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/docker.yml\n"
]
}
],
"source": [
"%%writefile .github/workflows/docker.yml\n",
"name: github-actions-hello-docker\n",
"on: \n",
" workflow_dispatch:\n",
" inputs:\n",
" input_text:\n",
" description: 'Who to greet' \n",
" required: true\n",
" default: 'World'\n",
"jobs:\n",
" hello-job:\n",
" runs-on: ubuntu-latest\n",
" steps:\n",
" - name: Checkout repo\n",
" uses: actions/checkout@v2\n",
" - name: Use docker action\n",
" id: hello\n",
" uses: ./\n",
" with:\n",
" who-to-greet: \"${{ github.event.inputs.input_text }}\"\n",
" # Use the output from the `hello` step\n",
" - name: Get the output time\n",
" run: echo \"The time was ${{ steps.hello.outputs.time }}\"\n",
" \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 63,
"id": "bc24dff3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[main 22a5094] Fix path\n",
" 1 file changed, 1 insertion(+)\n",
"Enumerating objects: 9, done.\n",
"Counting objects: 100% (9/9), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (5/5), done.\n",
"Writing objects: 100% (5/5), 570 bytes | 570.00 KiB/s, done.\n",
"Total 5 (delta 1), reused 0 (delta 0), pack-reused 0\n",
"remote: Resolving deltas: 100% (1/1), completed with 1 local object.\u001b[K\n",
"To github.com:TomekZet/ium-ga-hello.git\n",
" 97c7272..22a5094 main -> main\n"
]
}
],
"source": [
"!git add .github entrypoint.sh Dockerfile\n",
"!git commit -m \"Fix path\"\n",
"!git push"
]
},
{
"cell_type": "markdown",
"id": "12af9d1b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Archiwizowanie artefaktów\n",
"https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts\n",
"\n",
"Do archiwizowania artefaktów służy akcja \"upload-artifact\":\n",
"\n",
"```yaml\n",
" - name: Archive artifacts\n",
" uses: actions/upload-artifact@v3\n",
" with:\n",
" name: figlet-output\n",
" path: figlet.txt\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "245f7c8a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/docker-artifact.yml\n"
]
}
],
"source": [
"%%writefile .github/workflows/docker-artifact.yml\n",
"name: github-actions-hello-docker-artifact\n",
"on: \n",
" workflow_dispatch:\n",
" inputs:\n",
" input_text:\n",
" description: 'Who to greet' \n",
" required: true\n",
" default: 'World'\n",
"jobs:\n",
" hello-job:\n",
" name: \"Do all the hard stuff\"\n",
" runs-on: ubuntu-latest\n",
" steps:\n",
" - name: Checkout repo\n",
" uses: actions/checkout@v2\n",
" - name: Use docker action\n",
" id: hello\n",
" uses: ./\n",
" with:\n",
" who-to-greet: \"${{ github.event.inputs.input_text }}\"\n",
" # Use the output from the `hello` step\n",
" - name: Get the output time\n",
" run: echo \"The time was ${{ steps.hello.outputs.time }}\" > time.txt\n",
" - name: Archive artifacts\n",
" uses: actions/upload-artifact@v3\n",
" with:\n",
" name: figlet-output\n",
" path: |\n",
" figlet.txt\n",
" time.txt\n",
" publish:\n",
" name: \"Publish as github comment\"\n",
" runs-on: ubuntu-latest\n",
" needs: hello-job\n",
" steps:\n",
" - uses: actions/checkout@v3\n",
" #We need to download the artifact first, jobs do not share workflow files\n",
" - name: get-artifact \n",
" uses: actions/download-artifact@v3\n",
" with:\n",
" name: figlet-output\n",
" - name: display_artifact_contents\n",
" run:\n",
" cat time.txt ; tr ' ' '#' < figlet.txt\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "47e301f9",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[main 5a40228] Archive in one job, use in other\n",
" 1 file changed, 1 insertion(+)\n",
"Enumerating objects: 9, done.\n",
"Counting objects: 100% (9/9), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (5/5), done.\n",
"Writing objects: 100% (5/5), 622 bytes | 622.00 KiB/s, done.\n",
"Total 5 (delta 2), reused 0 (delta 0), pack-reused 0\n",
"remote: Resolving deltas: 100% (2/2), completed with 2 local objects.\u001b[K\n",
"To github.com:TomekZet/ium-ga-hello.git\n",
" 4df6dc0..5a40228 main -> main\n"
]
}
],
"source": [
"!git add -u\n",
"!git commit -m \"Archive in one job, use in other\"\n",
"!git push"
]
},
{
"cell_type": "markdown",
"id": "805622e8",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## CML - Continous Machine Learning"
]
},
{
"cell_type": "markdown",
"id": "e0b3acbf",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
" - Tworzone przez [iterative.ai](iterative.ai) (tak jak DVC)\n",
" - https://cml.dev/\n",
" - Dokumentacja: https://dvc.org/doc/cml\n",
" - Korzysta z Github Actions lub Gitlab CI (a także [Bitbucket Pipelines](https://github.com/iterative/cml/wiki/CML-with-Bitbucket-Cloud))\n",
" - CML dodaje do Github Actions kilka \"akcji\":\n",
" - `iterative/setup-cml` - dodaje poniższe akcje\n",
" - `cml-send-comment` - dodaje raport CML jako komentarz do Pull Requesta na Githubie\n",
" - `cml-send-github-check` - dodaje raport CML do zakładki \"Checks\" Pull Requesta na Githubie\n",
" - `cml-publish` - umożliwia dodanie obrazka do raportu\n",
" "
]
},
{
"cell_type": "markdown",
"id": "cdb54b38",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Przykładowy Workflow CML:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "07b1035a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/tomek/AITech/repo/aitech-ium-private/IUM_11\n",
"Cloning into 'example_cml'...\n",
"remote: Enumerating objects: 25, done.\u001b[K\n",
"remote: Total 25 (delta 0), reused 0 (delta 0), pack-reused 25\u001b[K\n",
"Receiving objects: 100% (25/25), 222.95 KiB | 920.00 KiB/s, done.\n",
"Resolving deltas: 100% (6/6), done.\n"
]
}
],
"source": [
"!git clone git@github.com:TomekZet/example_cml.git"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "bf27a2b3",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/tomek/AITech/repo/aitech-ium-private/IUM_11/example_cml\n"
]
}
],
"source": [
"%cd example_cml\n",
"!mkdir -p .github/workflows/"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "64f6e21d",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting .github/workflows/cml.yaml\n"
]
}
],
"source": [
"%%writefile .github/workflows/cml.yaml\n",
"name: model-training\n",
"on: [push]\n",
"jobs:\n",
" run:\n",
" runs-on: [ubuntu-latest]\n",
" steps:\n",
" - uses: actions/checkout@v2\n",
" - uses: actions/setup-python@v2\n",
" - uses: iterative/setup-cml@v1\n",
" - name: Train model\n",
" env:\n",
" REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n",
" run: |\n",
" pip install -r requirements.txt\n",
" python train.py\n",
"\n",
" cat metrics.txt >> report.md\n",
" cml-publish confusion_matrix.png --md >> report.md\n",
" cml-send-comment report.md"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "83e49d3b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# %load train.py\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.metrics import plot_confusion_matrix\n",
"import matplotlib.pyplot as plt\n",
"import json\n",
"import os\n",
"import numpy as np\n",
"\n",
"# Read in data\n",
"X_train = np.genfromtxt(\"data/train_features.csv\")\n",
"y_train = np.genfromtxt(\"data/train_labels.csv\")\n",
"X_test = np.genfromtxt(\"data/test_features.csv\")\n",
"y_test = np.genfromtxt(\"data/test_labels.csv\")\n",
"\n",
"\n",
"# Fit a model\n",
"depth = 2\n",
"clf = RandomForestClassifier(max_depth=depth)\n",
"clf.fit(X_train,y_train)\n",
"\n",
"acc = clf.score(X_test, y_test)\n",
"print(acc)\n",
"with open(\"metrics.txt\", 'w') as outfile:\n",
" outfile.write(\"Accuracy: \" + str(acc) + \"\\n\")\n",
"\n",
"\n",
"# Plot it\n",
"disp = plot_confusion_matrix(clf, X_test, y_test, normalize='true',cmap=plt.cm.Blues)\n",
"plt.savefig('confusion_matrix.png')\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "8dc5748f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Wprowadźmy zmianę do pliku (linijka 17: `depth= = 6`)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "afeaf939",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting train.py\n"
]
}
],
"source": [
"%%writefile train.py\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.metrics import plot_confusion_matrix\n",
"import matplotlib.pyplot as plt\n",
"import json\n",
"import os\n",
"import numpy as np\n",
"\n",
"# Read in data\n",
"X_train = np.genfromtxt(\"data/train_features.csv\")\n",
"y_train = np.genfromtxt(\"data/train_labels.csv\")\n",
"X_test = np.genfromtxt(\"data/test_features.csv\")\n",
"y_test = np.genfromtxt(\"data/test_labels.csv\")\n",
"\n",
"\n",
"# Fit a model\n",
"depth = 6\n",
"clf = RandomForestClassifier(max_depth=depth)\n",
"clf.fit(X_train,y_train)\n",
"\n",
"acc = clf.score(X_test, y_test)\n",
"print(acc)\n",
"with open(\"metrics.txt\", 'w') as outfile:\n",
" outfile.write(\"Accuracy: \" + str(acc) + \"\\n\")\n",
"\n",
"\n",
"# Plot it\n",
"disp = plot_confusion_matrix(clf, X_test, y_test, normalize='true',cmap=plt.cm.Blues)\n",
"plt.savefig('confusion_matrix.png')"
]
},
{
"cell_type": "markdown",
"id": "3e4a711a",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Stwórzmy nowy branch \"deep_depth\":"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "ab019b0b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Switched to a new branch 'deep_depth'\n",
"[deep_depth 0df0f2c] Changed depth and added cml workflow\n",
" 2 files changed, 19 insertions(+), 2 deletions(-)\n",
" create mode 100644 .github/workflows/cml.yaml\n",
"Enumerating objects: 8, done.\n",
"Counting objects: 100% (8/8), done.\n",
"Delta compression using up to 4 threads\n",
"Compressing objects: 100% (4/4), done.\n",
"Writing objects: 100% (6/6), 738 bytes | 738.00 KiB/s, done.\n",
"Total 6 (delta 2), reused 0 (delta 0)\n",
"remote: Resolving deltas: 100% (2/2), completed with 2 local objects.\u001b[K\n",
"remote: \n",
"remote: Create a pull request for 'deep_depth' on GitHub by visiting:\u001b[K\n",
"remote: https://github.com/TomekZet/example_cml/pull/new/deep_depth\u001b[K\n",
"remote: \n",
"To github.com:TomekZet/example_cml.git\n",
" * [new branch] deep_depth -> deep_depth\n"
]
}
],
"source": [
"!git checkout -b deep_depth\n",
"!git add train.py .github/workflows/cml.yaml\n",
"!git commit -m \"Changed depth and added cml workflow\"\n",
"!git push origin deep_depth"
]
},
{
"cell_type": "markdown",
"id": "b50f46a8",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "c56c8785",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "fb25c587",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Zadania [20 pkt] (termin: 5 czerwca 2024)\n",
"1. Utwórz konto w serwisie GitHub (jeśli jeszcze nie masz)\n",
"2. Stwórz publiczne repozytorium. Link do niego wklej do kolumny *Link GitHub (Actions)* w arkuszu `IUM-2024.xlsx` [1 pkt]\n",
"3. Stwórz prosty *GitHub workflow*, który:\n",
" - zrobi checkout Twojego repozytorium [1 pkt]\n",
" - pobierze pliki z danymi uczącymi (pliki można po prostu dodać do repozytorium albo pobrać przez `wget` jeśli są publicznie dostępne) [2 pkt]\n",
" - będzie wywoływalny przez \"Workflow dispatch\" z parametrami uczenia [2 pkt]\n",
" - będzie się składał z co najmniej 2 zadań (*job*):\n",
" - uczenie modelu jako osobna akcja wykonana w Dockerze [8 pkt]\n",
" - ewaluacja modelu [6 pkt]"
]
}
],
"metadata": {
"author": "Tomasz Ziętkiewicz",
"celltoolbar": "Slideshow",
"email": "tomasz.zietkiewicz@amu.edu.pl",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"lang": "pl",
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"slideshow": {
"slide_type": "slide"
},
"subtitle": "11.CML[laboratoria]",
"title": "Inżynieria uczenia maszynowego",
"year": "2021"
},
"nbformat": 4,
"nbformat_minor": 5
}