{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "909d3c02",
   "metadata": {},
   "source": [
"![Logo 1](img/aitech-logotyp-1.jpg)\n",
|
|
"<div class=\"alert alert-block alert-info\">\n",
|
|
"<h1> Widzenie komputerowe </h1>\n",
|
|
"<h2> 08. <i>Rozpoznawanie twarzy</i> [laboratoria]</h2> \n",
|
|
"<h3>Andrzej Wójtowicz (2021)</h3>\n",
|
|
"</div>\n",
|
|
"\n",
|
|
"![Logo 2](img/aitech-logotyp-2.jpg)"
|
|
]
|
|
},
|
|
  {
   "cell_type": "markdown",
   "id": "7a9fde6b",
   "metadata": {},
   "source": [
"W poniższych materiałach zaprezentujemy klasyczne metody rozpoznawania twarzy. Opisywane zagadnienia można odnaleźć w *5.2.3 Principal component analysis* R. Szeliski (2022) *Computer Vision: Algorithms and Applications* oraz [dokumentacji](https://docs.opencv.org/4.5.3/da/d60/tutorial_face_main.html).\n",
|
|
"\n",
|
|
"Na początku załadujmy niezbędne biblioteki."
|
|
]
|
|
},
|
|
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d86977a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import cv2 as cv\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "import sklearn.metrics\n",
    "import ipywidgets\n",
    "import os\n",
    "import random"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5a62135",
   "metadata": {},
   "source": [
    "Let's unpack the dataset we will be working on:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e0f1723",
   "metadata": {},
   "outputs": [],
   "source": [
    "!cd datasets && unzip -qo yaleextb.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6a0efb1",
   "metadata": {},
   "source": [
    "Our dataset contains a few dozen images of each of several dozen people, photographed under varying lighting conditions. We split the loaded images into a training set and a test set in a 3/1 ratio and display a few example images:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b775bbf",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_dir = \"datasets/yaleextb\"\n",
    "\n",
    "img_data = []\n",
    "img_labels = []\n",
    "\n",
    "images = os.listdir(dataset_dir)\n",
    "\n",
    "n_examples = 15\n",
    "\n",
    "for i in range(1, 40):\n",
    "    i_str = str(i).zfill(2)\n",
    "    images_p = [img for img in images if img.startswith(f\"yaleB{i_str}\")]\n",
    "\n",
    "    for img in images_p[:n_examples]:\n",
    "        img_data.append(cv.imread(f\"{dataset_dir}/{img}\", cv.IMREAD_GRAYSCALE))\n",
    "        img_labels.append(i)\n",
    "\n",
    "random.seed(1337)\n",
    "# draw one train/test flag per loaded image (True = test) with 3:1 odds\n",
    "selector = random.choices([False, True], k=len(img_data), weights=[3, 1])\n",
    "train_data = [x for x, y in zip(img_data, selector) if not y]\n",
    "train_labels = [x for x, y in zip(img_labels, selector) if not y]\n",
    "test_data = [x for x, y in zip(img_data, selector) if y]\n",
    "test_labels = [x for x, y in zip(img_labels, selector) if y]\n",
    "\n",
    "plt.figure(figsize=(12,5))\n",
    "for i in range(4):\n",
    "    plt.subplot(251 + i)\n",
    "    plt.imshow(train_data[i], cmap='gray');\n",
    "for i in range(4):\n",
    "    plt.subplot(256 + i)\n",
    "    plt.imshow(train_data[-i-20], cmap='gray');"
   ]
  },
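  {
   "cell_type": "markdown",
   "id": "split-check-md",
   "metadata": {},
   "source": [
    "A quick sanity check of the split (the exact counts depend on the dataset contents and the seed, but the ratio should be roughly 3/1):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "split-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"train: {len(train_data)}, test: {len(test_data)}\")\n",
    "print(f\"image shape: {train_data[0].shape}\")"
   ]
  },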
  {
   "cell_type": "markdown",
   "id": "6e315630",
   "metadata": {},
   "source": [
    "The first model is *Eigenfaces*, implemented in [`EigenFaceRecognizer`](https://docs.opencv.org/4.5.3/dd/d7c/classcv_1_1face_1_1EigenFaceRecognizer.html). The main idea is to use PCA to reduce the dimensionality of the data. In our example we keep 60 eigenvectors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0473c8ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = cv.face.EigenFaceRecognizer_create(60)\n",
    "model.train(np.array(train_data), np.array(train_labels))"
   ]
  },
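  {
   "cell_type": "markdown",
   "id": "pca-sketch-md",
   "metadata": {},
   "source": [
    "As a side note, here is a minimal sketch of the PCA step underlying the model; this is an illustration, not the library's exact code. Running `cv.PCACompute` on the flattened training images should recover an equivalent mean face and eigenvector basis (possibly up to sign), with the recognizer storing the same axes as columns rather than rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "pca-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# flatten each training image into a row vector\n",
    "X = np.array([img.flatten() for img in train_data], dtype=np.float64)\n",
    "# PCA: mean face + top-60 principal axes (rows of eig)\n",
    "pca_mean, eig = cv.PCACompute(X, mean=None, maxComponents=60)\n",
    "print(pca_mean.shape, eig.shape)           # (1, H*W), (60, H*W)\n",
    "print(model.getEigenVectors().shape)       # (H*W, 60)"
   ]
  },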
  {
   "cell_type": "markdown",
   "id": "7a753f2d",
   "metadata": {},
   "source": [
    "We can visualize the retained eigenvectors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f797fe86",
   "metadata": {},
   "outputs": [],
   "source": [
    "img_shape = train_data[0].shape\n",
    "plt.figure(figsize=(12,5))\n",
    "for i in range(5):\n",
    "    e_v = model.getEigenVectors()[:,i]\n",
    "    e_v = np.reshape(e_v, img_shape)\n",
    "\n",
    "    plt.subplot(151+i)\n",
    "    plt.imshow(e_v, cmap='gray');"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19545151",
   "metadata": {},
   "source": [
    "We can see what potential faces lie in our space. We add successive eigenvectors $v_i$ with weights $w_i$ to the *mean* face $\\mu$, i.e. $x \\approx \\mu + \\sum_i w_i v_i$. Below is an example using 6 eigenvectors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5265f337",
   "metadata": {},
   "outputs": [],
   "source": [
    "mean = model.getMean()\n",
    "W = model.getEigenVectors()\n",
    "\n",
    "def generate_face(**args):\n",
    "    img = mean.copy()\n",
    "    for i, k in enumerate(args.keys()):\n",
    "        img = np.add(img, W[:,i]*(10*args[k]))\n",
    "\n",
    "    img = np.reshape(img, img_shape)\n",
    "    plt.figure(figsize=(5,5))\n",
    "    plt.imshow(img, cmap='gray')\n",
    "    plt.show()\n",
    "\n",
    "ipywidgets.interactive(generate_face,\n",
    "                       w_0=ipywidgets.IntSlider(min=-128, max=128),\n",
    "                       w_1=ipywidgets.IntSlider(min=-128, max=128),\n",
    "                       w_2=ipywidgets.IntSlider(min=-128, max=128),\n",
    "                       w_3=ipywidgets.IntSlider(min=-128, max=128),\n",
    "                       w_4=ipywidgets.IntSlider(min=-128, max=128),\n",
    "                       w_5=ipywidgets.IntSlider(min=-128, max=128))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd4bdce6",
   "metadata": {},
   "source": [
    "We can now try to reconstruct, for example, the first face from the training set. We retrieve its projection (the weights) from our model and, as above, combine the mean face with the eigenvectors. We can see that using more eigenvectors increases the precision of the reconstruction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2619c6f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "pro = model.getProjections()[0]\n",
    "\n",
    "def reconstruct_face(k):\n",
    "    img = mean.copy()\n",
    "\n",
    "    # add the first k eigenvectors, weighted by the stored projection\n",
    "    for i in range(k):\n",
    "        img = np.add(img, W[:,i]*pro[0,i])\n",
    "\n",
    "    return img\n",
    "\n",
    "plt.figure(figsize=(12,6))\n",
    "for i in range(6):\n",
    "    k = (i+1)*10\n",
    "    r_face = np.reshape(reconstruct_face(k), img_shape)\n",
    "    plt.subplot(251 + i)\n",
    "    plt.imshow(r_face, cmap='gray')\n",
    "    plt.title(f\"k = {k}\")\n",
    "\n",
    "plt.subplot(257)\n",
    "plt.imshow(train_data[0], cmap='gray');\n",
    "plt.title(\"original\");"
   ]
  },
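  {
   "cell_type": "markdown",
   "id": "recon-error-md",
   "metadata": {},
   "source": [
    "To quantify this, a small sketch plotting the reconstruction error against the number of eigenvectors used; for a training sample the error should decrease as $k$ grows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "recon-error-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# L2 distance between the original face and its k-eigenvector reconstruction\n",
    "original = np.reshape(train_data[0], mean.shape).astype(np.float64)\n",
    "ks = range(1, 61)\n",
    "errors = [np.linalg.norm(reconstruct_face(k) - original) for k in ks]\n",
    "plt.figure(figsize=(6,4))\n",
    "plt.plot(list(ks), errors)\n",
    "plt.xlabel(\"k (eigenvectors used)\")\n",
    "plt.ylabel(\"L2 reconstruction error\");"
   ]
  },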
  {
   "cell_type": "markdown",
   "id": "ae87277a",
   "metadata": {},
   "source": [
    "Let's now try to identify the people appearing in two example images from the test set. For an unknown face we compute its projection and then run a nearest-neighbor search over the projections of the training set. Below we have one example where the person is recognized correctly and one where the recognition is incorrect:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "828f3134",
   "metadata": {},
   "outputs": [],
   "source": [
    "def find_face(query_id):\n",
    "    query_face = test_data[query_id]\n",
    "    query_label = test_labels[query_id]\n",
    "\n",
    "    # project the query image onto the eigenface subspace\n",
    "    x = np.reshape(query_face, mean.shape)\n",
    "    x_coeff = np.dot(x - mean, W)\n",
    "\n",
    "    best_face = None\n",
    "    best_label = None\n",
    "    best_dist = float('inf')\n",
    "\n",
    "    # nearest neighbor among the stored training projections\n",
    "    for i, p in enumerate(model.getProjections()):\n",
    "        dist = np.linalg.norm(np.reshape(p, 60) - np.reshape(x_coeff, 60))\n",
    "\n",
    "        if dist < best_dist:\n",
    "            best_face = train_data[i]\n",
    "            best_label = train_labels[i]\n",
    "            best_dist = dist\n",
    "\n",
    "    return query_face, query_label, best_face, best_label\n",
    "\n",
    "qf_1, ql_1, bf_1, bl_1 = find_face(45)\n",
    "qf_2, ql_2, bf_2, bl_2 = find_face(10)\n",
    "\n",
    "plt.figure(figsize=(8,11))\n",
    "plt.subplot(221)\n",
    "plt.imshow(qf_1, cmap='gray')\n",
    "plt.title(f\"Face 1: query label = {ql_1}\")\n",
    "plt.subplot(222)\n",
    "plt.imshow(bf_1, cmap='gray');\n",
    "plt.title(f\"Face 1: best label = {bl_1}\")\n",
    "plt.subplot(223)\n",
    "plt.imshow(qf_2, cmap='gray')\n",
    "plt.title(f\"Face 2: query label = {ql_2}\")\n",
    "plt.subplot(224)\n",
    "plt.imshow(bf_2, cmap='gray');\n",
    "plt.title(f\"Face 2: best label = {bl_2}\");"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43f9a8e5",
   "metadata": {},
   "source": [
    "A more compact way to obtain predictions is the `predict()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf736bdd",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(test_labels[45], model.predict(test_data[45])[0])\n",
    "print(test_labels[10], model.predict(test_data[10])[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eeaf62b5",
   "metadata": {},
   "source": [
    "As we can see below, this method does not achieve particularly satisfying results (in general it copes poorly with changes in lighting):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12c65438",
   "metadata": {},
   "outputs": [],
   "source": [
    "predictions = []\n",
    "for test_img in test_data:\n",
    "    p_label, p_conf = model.predict(test_img)\n",
    "    predictions.append(p_label)\n",
    "\n",
    "print(f\"Accuracy: {sklearn.metrics.accuracy_score(test_labels, predictions) * 100:.2f} %\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea5d879b",
   "metadata": {},
   "source": [
    "Below we briefly present two extensions of this algorithm. The first one is *Fisherfaces*, implemented in [`FisherFaceRecognizer`](https://docs.opencv.org/4.5.3/d2/de9/classcv_1_1face_1_1FisherFaceRecognizer.html). This time, using LDA, we additionally want to take the between-class scatter into account (cf. this [example](https://sthalles.github.io/fisher-linear-discriminant/)). Below we create a model with 40 components:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4eb5b746",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = cv.face.FisherFaceRecognizer_create(40)\n",
    "model.train(np.array(train_data), np.array(train_labels))"
   ]
  },
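  {
   "cell_type": "markdown",
   "id": "fisher-criterion-md",
   "metadata": {},
   "source": [
    "To make the LDA idea concrete, below is a minimal sketch (not the library's code) that evaluates the Fisher criterion $J(w) = \\frac{w^T S_B w}{w^T S_W w}$ on the first Fisherface axis; we assume here that `model.getEigenVectors()` returns the combined PCA+LDA projection matrix. A large $J$ means the axis separates the classes well relative to their internal spread:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fisher-criterion-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "X = np.array([img.flatten() for img in train_data], dtype=np.float64)\n",
    "y = np.array(train_labels)\n",
    "w = model.getEigenVectors()[:, 0]  # first discriminant direction\n",
    "z = X @ w                          # training data projected onto w\n",
    "mu = z.mean()\n",
    "# between-class scatter: class means spread around the global mean\n",
    "s_b = sum((y == c).sum() * (z[y == c].mean() - mu) ** 2 for c in np.unique(y))\n",
    "# within-class scatter: samples spread around their own class mean\n",
    "s_w = sum(((z[y == c] - z[y == c].mean()) ** 2).sum() for c in np.unique(y))\n",
    "print(f\"Fisher criterion J = {s_b / s_w:.2f}\")"
   ]
  },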
  {
   "cell_type": "markdown",
   "id": "e9f334be",
   "metadata": {},
   "source": [
    "Note that we obtain a result more than twice as good here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96faa192",
   "metadata": {},
   "outputs": [],
   "source": [
    "predictions = []\n",
    "for test_img in test_data:\n",
    "    p_label, p_conf = model.predict(test_img)\n",
    "    predictions.append(p_label)\n",
    "\n",
    "print(f\"Accuracy: {sklearn.metrics.accuracy_score(test_labels, predictions) * 100:.2f} %\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02220e5f",
   "metadata": {},
   "source": [
    "A further extension is the *Local Binary Patterns Histograms* (LBPH) model, implemented in [`LBPHFaceRecognizer`](https://docs.opencv.org/4.5.3/df/d25/classcv_1_1face_1_1LBPHFaceRecognizer.html). In this case we want, for example, to account for lighting conditions different from those occurring in our training set. As before, we aim at dimensionality reduction, but this time we achieve it by computing features (thresholding) for individual pixels within given regions; see the sketch below."
   ]
  },
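  {
   "cell_type": "markdown",
   "id": "lbp-sketch-md",
   "metadata": {},
   "source": [
    "A minimal sketch of the basic 3x3 LBP operator (the hypothetical helper `lbp_basic` below is an illustration, not the library's code): each pixel is compared with its 8 neighbors, the comparison results are packed into a byte, and per-region histograms of these codes form the descriptor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lbp-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def lbp_basic(img):\n",
    "    img = img.astype(np.int32)\n",
    "    c = img[1:-1, 1:-1]  # center pixels\n",
    "    code = np.zeros_like(c)\n",
    "    # 8 neighbor offsets, each contributing one bit of the code\n",
    "    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]\n",
    "    for bit, (dy, dx) in enumerate(offsets):\n",
    "        neighbor = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]\n",
    "        code |= ((neighbor >= c).astype(np.int32) << bit)\n",
    "    return code\n",
    "\n",
    "lbp = lbp_basic(train_data[0])\n",
    "plt.figure(figsize=(8,4))\n",
    "plt.subplot(121); plt.imshow(train_data[0], cmap='gray'); plt.title(\"input\")\n",
    "plt.subplot(122); plt.imshow(lbp, cmap='gray'); plt.title(\"LBP codes\");"
   ]
  },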
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61eeffdf",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = cv.face.LBPHFaceRecognizer_create(radius=10, neighbors=10, grid_x=32, grid_y=32)\n",
    "model.train(np.array(train_data), np.array(train_labels))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d64cb5a",
   "metadata": {},
   "source": [
    "The result is a few percentage points better than the previous model's; however, note that changing the default parameters to values that increase accuracy also increases the time needed to compute a prediction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca2e319d",
   "metadata": {},
   "outputs": [],
   "source": [
    "predictions = []\n",
    "for test_img in test_data:\n",
    "    p_label, p_conf = model.predict(test_img)\n",
    "    predictions.append(p_label)\n",
    "\n",
    "print(f\"Accuracy: {sklearn.metrics.accuracy_score(test_labels, predictions) * 100:.2f} %\")"
   ]
  },
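  {
   "cell_type": "markdown",
   "id": "lbph-timing-md",
   "metadata": {},
   "source": [
    "To quantify the runtime cost mentioned above, a minimal timing sketch using `time.perf_counter`; the absolute numbers depend on the machine and the chosen parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lbph-timing-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "# rough timing of a single predict() call for the current LBPH model\n",
    "start = time.perf_counter()\n",
    "model.predict(test_data[0])\n",
    "elapsed = time.perf_counter() - start\n",
    "print(f\"Single prediction took {elapsed:.3f} s\")"
   ]
  },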
  {
   "cell_type": "markdown",
   "id": "00196405",
   "metadata": {},
   "source": [
    "# Exercise 1\n",
    "\n",
    "The `datasets` directory contains the `att_faces` image set. Check what kind of images these are and how the algorithms above perform on this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51b8a256",
   "metadata": {},
   "outputs": [],
   "source": [
    "# space for experiments"
   ]
  }
 ],
 "metadata": {
  "author": "Andrzej Wójtowicz",
  "email": "andre@amu.edu.pl",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "lang": "en",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "subtitle": "08. Face Recognition [lab]",
  "title": "Computer Vision",
  "year": "2021"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}