{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "80377b3b",
   "metadata": {},
   "source": [
    "![Logo 1](img/aitech-logotyp-1.jpg)\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<h1> Computer Vision </h1>\n",
    "<h2> 09. <i>Deep learning methods (1)</i> [labs]</h2> \n",
    "<h3>Andrzej Wójtowicz (2021)</h3>\n",
    "</div>\n",
    "\n",
    "![Logo 2](img/aitech-logotyp-2.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07159136",
   "metadata": {},
   "source": [
    "In this notebook we will see how to use deep neural network methods in OpenCV.\n",
    "\n",
    "Let's start by loading the required libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2e906f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cv2 as cv\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4348bc5",
   "metadata": {},
   "source": [
    "OpenCV supports [many](https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV) neural network frameworks and models. Models are trained outside OpenCV; the library itself is used only for inference. That said, OpenCV applies quite a few optimizations of its own compared to the source frameworks, so inference can actually be faster here.\n",
    "\n",
    "We will download the model files and auxiliary data from the web and save them in the `dnn` directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42b85f55",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p dnn"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac09b098",
   "metadata": {},
   "source": [
    "# Image classification\n",
    "\n",
    "We will try out an image classification network trained on the [ImageNet](https://www.image-net.org/) dataset. Let's download the file that describes the 1000 possible classes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85b1b68c",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget -q --show-progress -O dnn/classification_classes_ILSVRC2012.txt https://raw.githubusercontent.com/opencv/opencv/master/samples/data/dnn/classification_classes_ILSVRC2012.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd0c577b",
   "metadata": {},
   "source": [
    "Let's look at the first five classes in the file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb0d0546",
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('dnn/classification_classes_ILSVRC2012.txt', 'r') as f_fd:\n",
    "    classes = f_fd.read().splitlines()\n",
    "\n",
    "print(len(classes), classes[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b0ee6ff",
   "metadata": {},
   "source": [
    "For classification we will use the [DenseNet](https://arxiv.org/abs/1608.06993) network. We will download one of the smaller [reimplementations](https://github.com/shicai/DenseNet-Caffe), which is hosted on Google Drive, among other places (we need to install one additional package):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb2bf2a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip3 install --user --disable-pip-version-check gdown"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27996509",
   "metadata": {},
   "outputs": [],
   "source": [
    "import gdown\n",
    "\n",
    "url = 'https://drive.google.com/uc?id=0B7ubpZO7HnlCcHlfNmJkU2VPelE'\n",
    "output = 'dnn/DenseNet_121.caffemodel'\n",
    "gdown.download(url, output, quiet=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "648ec9c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget -q --show-progress -O dnn/DenseNet_121.prototxt https://raw.githubusercontent.com/shicai/DenseNet-Caffe/master/DenseNet_121.prototxt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7294c54",
   "metadata": {},
   "source": [
    "Individual neural network frameworks have dedicated model-loading functions, e.g. [`readNetFromCaffe()`](https://docs.opencv.org/4.5.3/d6/d0f/group__dnn.html#ga29d0ea5e52b1d1a6c2681e3f7d68473a) or [`readNetFromTorch()`](https://docs.opencv.org/4.5.3/d6/d0f/group__dnn.html#ga65a1da76cb7d6852bdf7abbd96f19084), but the generic [`readNet()`](https://docs.opencv.org/4.5.3/d6/d0f/group__dnn.html#ga3b34fe7a29494a6a4295c169a7d32422) can be used as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6fd2d6b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# `model` is the trained weights file and `config` the network definition; readNet() accepts them in either order\n",
    "model = cv.dnn.readNet(model='dnn/DenseNet_121.caffemodel', config='dnn/DenseNet_121.prototxt', framework='Caffe')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe22fd6f",
   "metadata": {},
   "source": [
    "We will try to classify the image below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ace4606",
   "metadata": {},
   "outputs": [],
   "source": [
    "image = cv.imread('img/flamingo.jpg')\n",
    "plt.figure(figsize=[5,5])\n",
    "plt.imshow(image[:,:,::-1]);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e51db3ac",
   "metadata": {},
   "source": [
    "To pass the image through the network, we first have to convert it into the expected representation with [`blobFromImage()`](https://docs.opencv.org/4.5.3/d6/d0f/group__dnn.html#ga29f34df9376379a603acd8df581ac8d7). To get meaningful results we must also set the preprocessing parameters (they are listed on the [model page](https://github.com/shicai/DenseNet-Caffe)):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4e945ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "image_blob = cv.dnn.blobFromImage(image=image, scalefactor=0.017, size=(224, 224), mean=(104, 117, 123),\n",
    "                                  swapRB=False, crop=False)"
   ]
  },
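  {
   "cell_type": "markdown",
   "id": "sketch-blob-md",
   "metadata": {},
   "source": [
    "To make the preprocessing less opaque, the cell below is a minimal NumPy sketch of what `blobFromImage()` computes once the image already has the target size: it subtracts the per-channel mean, multiplies by `scalefactor`, and reorders the array from height x width x channels to batch x channels x height x width. The helper name `blob_from_image_sketch` and the uniform grey test image are illustrative assumptions; resizing, cropping and `swapRB` are deliberately left out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-blob-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def blob_from_image_sketch(image_bgr, scalefactor, mean):\n",
    "    # Hypothetical helper, not part of OpenCV; assumes the image is already resized.\n",
    "    blob = image_bgr.astype(np.float32)\n",
    "    blob -= np.asarray(mean, dtype=np.float32)  # subtract the per-channel mean (B, G, R)\n",
    "    blob *= scalefactor                         # scale after the mean subtraction\n",
    "    blob = blob.transpose(2, 0, 1)              # HWC -> CHW\n",
    "    return blob[np.newaxis, ...]                # add the batch axis -> NCHW\n",
    "\n",
    "\n",
    "dummy_image = np.full((224, 224, 3), 128, dtype=np.uint8)  # uniform grey test image (assumed)\n",
    "dummy_blob = blob_from_image_sketch(dummy_image, scalefactor=0.017, mean=(104, 117, 123))\n",
    "print(dummy_blob.shape, dummy_blob.dtype)"
   ]
  },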
  {
   "cell_type": "markdown",
   "id": "625aebdd",
   "metadata": {},
   "source": [
    "We set the blob as the network input and fetch the computed values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "753333a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.setInput(image_blob)\n",
    "outputs = model.forward()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34316ddb",
   "metadata": {},
   "source": [
    "We determine which class is the most probable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13423a6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "outputs = outputs.reshape(1000, 1)\n",
    "\n",
    "label_id = np.argmax(outputs)\n",
    "\n",
    "probs = np.exp(outputs) / np.sum(np.exp(outputs))"
   ]
  },
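  {
   "cell_type": "markdown",
   "id": "sketch-softmax-md",
   "metadata": {},
   "source": [
    "The softmax above exponentiates the raw scores directly, which can overflow when the scores are large. As a hedged alternative, the sketch below subtracts the maximum score first (mathematically this leaves the result unchanged) and lists the `k` most probable classes. The helpers `stable_softmax` and `top_k` and the toy scores are illustrative assumptions, not part of OpenCV. With the notebook's variables one could call `top_k(stable_softmax(outputs))` and look the returned indices up in `classes`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-softmax-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def stable_softmax(scores):\n",
    "    # Shifting by the maximum leaves the softmax unchanged but avoids overflow in exp().\n",
    "    shifted = np.asarray(scores, dtype=np.float64).ravel()\n",
    "    shifted = shifted - shifted.max()\n",
    "    exps = np.exp(shifted)\n",
    "    return exps / exps.sum()\n",
    "\n",
    "\n",
    "def top_k(probabilities, k=5):\n",
    "    # (index, probability) pairs for the k largest probabilities, most probable first.\n",
    "    order = np.argsort(probabilities)[::-1][:k]\n",
    "    return [(int(i), float(probabilities[i])) for i in order]\n",
    "\n",
    "\n",
    "toy_scores = [1000.0, 1002.0, 999.0]  # assumed toy values, large enough to overflow a naive softmax\n",
    "toy_probs = stable_softmax(toy_scores)\n",
    "print(top_k(toy_probs, k=2))"
   ]
  },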
  {
   "cell_type": "markdown",
   "id": "874c1b1d",
   "metadata": {},
   "source": [
    "Result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec75a3c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(image[:,:,::-1])\n",
    "plt.title(classes[label_id])\n",
    "print(\"{:.2f} %\".format(np.max(probs) * 100.0))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3808c42c",
   "metadata": {},
   "source": [
    "# Face detection\n",
    "\n",
    "For face detection we will use a network based on [SSD](https://github.com/weiliu89/caffe/tree/ssd):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c0df387",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget -q --show-progress -O dnn/res10_300x300_ssd_iter_140000_fp16.caffemodel https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20180205_fp16/res10_300x300_ssd_iter_140000_fp16.caffemodel\n",
    "!wget -q --show-progress -O dnn/res10_300x300_ssd_iter_140000_fp16.prototxt https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6142f6e",
   "metadata": {},
   "source": [
    "We load the model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60d41efb",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = cv.dnn.readNet(model='dnn/res10_300x300_ssd_iter_140000_fp16.caffemodel', config='dnn/res10_300x300_ssd_iter_140000_fp16.prototxt', framework='Caffe')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad612cc6",
   "metadata": {},
   "source": [
    "We want to detect the faces in the image below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b404d8c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "image = cv.imread('img/people.jpg')\n",
    "plt.figure(figsize=[7,7])\n",
    "plt.imshow(image[:,:,::-1]);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a77f8e64",
   "metadata": {},
   "source": [
    "We find the faces and mark them on the image (we use a confidence threshold of 0.5; see the information on [preprocessing](https://github.com/opencv/opencv/tree/master/samples/dnn#face-detection)):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d16f230",
   "metadata": {},
   "outputs": [],
   "source": [
    "height, width, _ = image.shape\n",
    "\n",
    "image_blob = cv.dnn.blobFromImage(image, scalefactor=1.0, size=(300, 300), mean=[104, 177, 123],\n",
    "                                  swapRB=False, crop=False)\n",
    "\n",
    "model.setInput(image_blob)\n",
    "\n",
    "detections = model.forward()\n",
    "\n",
    "image_out = image.copy()\n",
    "\n",
    "for i in range(detections.shape[2]):\n",
    "    confidence = detections[0, 0, i, 2]\n",
    "    if confidence > 0.5:\n",
    "\n",
    "        box = detections[0, 0, i, 3:7] * np.array([width, height, width, height])\n",
    "        (x1, y1, x2, y2) = box.astype('int')\n",
    "\n",
    "        cv.rectangle(image_out, (x1, y1), (x2, y2), (0, 255, 0), 6)\n",
    "        label = '{:.3f}'.format(confidence)\n",
    "        label_size, base_line = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 3.0, 1)\n",
    "        cv.rectangle(image_out, (x1, y1 - label_size[1]), (x1 + label_size[0], y1 + base_line),\n",
    "                     (255, 255, 255), cv.FILLED)\n",
    "        cv.putText(image_out, label, (x1, y1), cv.FONT_HERSHEY_SIMPLEX, 3.0, (0, 0, 0))\n",
    "\n",
    "plt.figure(figsize=[12,12])\n",
    "plt.imshow(image_out[:,:,::-1]);"
   ]
  },
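  {
   "cell_type": "markdown",
   "id": "sketch-clip-box-md",
   "metadata": {},
   "source": [
    "Detections near the image border can fall slightly outside the normalized [0, 1] range, so the scaled corners may end up outside the image. Below is a minimal sketch of converting one normalized box to pixel coordinates and clamping it to the image; the helper `to_pixel_box` and the sample values are assumptions made for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-clip-box-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def to_pixel_box(normalized_box, width, height):\n",
    "    # normalized_box = (x1, y1, x2, y2), the same layout as columns 3:7 of a detection row\n",
    "    x1, y1, x2, y2 = normalized_box\n",
    "    x1 = min(max(int(x1 * width), 0), width - 1)\n",
    "    y1 = min(max(int(y1 * height), 0), height - 1)\n",
    "    x2 = min(max(int(x2 * width), 0), width - 1)\n",
    "    y2 = min(max(int(y2 * height), 0), height - 1)\n",
    "    return x1, y1, x2, y2\n",
    "\n",
    "\n",
    "# Assumed sample box that sticks out on the left and at the bottom of a 200x100 image\n",
    "print(to_pixel_box((-0.02, 0.10, 0.50, 1.03), width=200, height=100))"
   ]
  },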
  {
   "cell_type": "markdown",
   "id": "590841cd",
   "metadata": {},
   "source": [
    "## Facial landmarks\n",
    "\n",
    "OpenCV can also detect facial landmarks. We will use a [model](http://www.jiansun.org/papers/CVPR14_FaceAlignment.pdf) that was implemented during Google Summer of Code, available through [`createFacemarkLBF()`](https://docs.opencv.org/4.5.3/d4/d48/namespacecv_1_1face.html#a0bec73a729ed878430c2feb9ce65bc2a):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8534a399",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget -q --show-progress -O dnn/lbfmodel.yaml https://raw.githubusercontent.com/kurnianggoro/GSOC2017/master/data/lbfmodel.yaml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2971f10",
   "metadata": {},
   "outputs": [],
   "source": [
    "landmark_detector = cv.face.createFacemarkLBF()\n",
    "landmark_detector.loadModel('dnn/lbfmodel.yaml')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "761dbc15",
   "metadata": {},
   "source": [
    "We restrict the landmark search to the detected faces:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39215601",
   "metadata": {},
   "outputs": [],
   "source": [
    "faces = []\n",
    "\n",
    "for detection in detections[0][0]:\n",
    "    if detection[2] >= 0.5:\n",
    "        left = detection[3] * width\n",
    "        top = detection[4] * height\n",
    "        right = detection[5] * width\n",
    "        bottom = detection[6] * height\n",
    "\n",
    "        face_w = right - left\n",
    "        face_h = bottom - top\n",
    "\n",
    "        face_roi = (left, top, face_w, face_h)\n",
    "        faces.append(face_roi)\n",
    "\n",
    "faces = np.array(faces).astype(int)\n",
    "\n",
    "_, landmarks_list = landmark_detector.fit(image, faces)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56aa90c9",
   "metadata": {},
   "source": [
    "The model produces 68 landmarks, which we can visualize:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d3ab726",
   "metadata": {},
   "outputs": [],
   "source": [
    "image_display = image.copy()\n",
    "landmarks = landmarks_list[0][0].astype(int)\n",
    "\n",
    "for idx, landmark in enumerate(landmarks):\n",
    "    cv.circle(image_display, landmark, 2, (0,255,255), -1)\n",
    "    cv.putText(image_display, str(idx), landmark, cv.FONT_HERSHEY_SIMPLEX, 0.35, (0, 255, 0), 1,\n",
    "               cv.LINE_AA)\n",
    "\n",
    "plt.figure(figsize=(10,10))\n",
    "plt.imshow(image_display[700:1050,500:910,::-1]);"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7cee8969",
   "metadata": {},
   "source": [
    "If we don't need the numbering, we can take a simpler approach and use the [`drawFacemarks()`](https://docs.opencv.org/4.5.3/db/d7c/group__face.html#ga318d9669d5ed4dfc6ab9fae2715310f5) function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1039e253",
   "metadata": {},
   "outputs": [],
   "source": [
    "image_display = image.copy()\n",
    "for landmarks_set in landmarks_list:\n",
    "    cv.face.drawFacemarks(image_display, landmarks_set, (0, 255, 0))\n",
    "\n",
    "plt.figure(figsize=(10,10))\n",
    "plt.imshow(image_display[500:1050,500:1610,::-1]);"
   ]
  },
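  {
   "cell_type": "markdown",
   "id": "sketch-ear-md",
   "metadata": {},
   "source": [
    "Landmarks can be turned into simple geometric measures. One example is the eye aspect ratio (EAR), which relates the two vertical eye distances to the horizontal one: EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|). It drops towards zero as the eye closes. The cell below is only a sketch: the helper name `eye_aspect_ratio` and the toy coordinates are assumptions, and so is the idea that indices 36-41 and 42-47 describe the two eyes (the usual 68-point annotation), which should be checked against the numbered plot above. Under that assumption, one eye could be measured with `eye_aspect_ratio(landmarks[36:42])`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-ear-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "\n",
    "def eye_aspect_ratio(eye_points):\n",
    "    # eye_points: six (x, y) landmarks ordered p1..p6 around one eye\n",
    "    p1, p2, p3, p4, p5, p6 = eye_points\n",
    "\n",
    "    def dist(a, b):\n",
    "        return math.hypot(a[0] - b[0], a[1] - b[1])\n",
    "\n",
    "    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))\n",
    "\n",
    "\n",
    "# Toy coordinates (assumed values, not taken from the notebook's images)\n",
    "open_eye = [(0, 0), (2, -2), (4, -2), (6, 0), (4, 2), (2, 2)]\n",
    "closed_eye = [(0, 0), (2, 0), (4, 0), (6, 0), (4, 0), (2, 0)]\n",
    "print(round(eye_aspect_ratio(open_eye), 3), eye_aspect_ratio(closed_eye))"
   ]
  },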
  {
   "cell_type": "markdown",
   "id": "db16a1bf",
   "metadata": {},
   "source": [
    "# Task 1\n",
    "\n",
    "The `vid` directory contains the videos `blinking-*.mp4`. Write a program that detects blinks. Optionally, you can use the *eye aspect ratio* from [this paper](http://vision.fe.uni-lj.si/cvww2016/proceedings/papers/05.pdf) or propose your own solution."
   ]
  }
 ],
 "metadata": {
  "author": "Andrzej Wójtowicz",
  "email": "andre@amu.edu.pl",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "lang": "pl",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "subtitle": "09. Deep learning methods (1) [labs]",
  "title": "Computer Vision",
  "year": "2021"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}