{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<h1> Dialogue Systems </h1>\n",
    "<h2> 11. <i>Response Generation</i> [lab]</h2>\n",
    "<h3> Marek Kubis (2021)</h3>\n",
    "</div>\n",
    "\n",
    "![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Response Generation\n",
    "===================\n",
    "\n",
    "In a dialogue system, the dialogue policy selects the system acts, that is, it decides **what the system should say** and/or do.\n",
    "The response generation module turns these dialogue acts into natural-language utterances,\n",
    "that is, it decides **how** that content should be said.\n",
    "\n",
    "Template-Based Response Generation\n",
    "----------------------------------\n",
    "Text templates with variable interpolation are the basic tool used in response generation modules.\n",
    "In Python, this mechanism is available through\n",
    "[f-strings](https://docs.python.org/3/reference/lexical_analysis.html#f-strings), the\n",
    "[format](https://docs.python.org/3/library/string.html#formatstrings) method, and external libraries such as [Jinja2](https://jinja.palletsprojects.com/).\n",
    "\n",
    "While Python's built-in mechanisms work well in simple\n",
    "cases..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "def nlg(system_act):\n",
    "    domain, intent, slot, value = system_act\n",
    "\n",
    "    if intent == 'Inform' and slot == 'Phone':\n",
    "        # Polish template: 'The phone number is {value}'\n",
    "        return f'Numer telefonu to {value}'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nlg(['Hotel', 'Inform', 'Phone', '1234567890'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "... one should be aware that, as work on a dialogue agent progresses, it may become necessary\n",
    "to handle, among other things:\n",
    "\n",
    " 1. templates that depend on slot values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "def nlg(system_act):\n",
    "    domain, intent, slot, value = system_act\n",
    "\n",
    "    if domain == 'Restaurant' and intent == 'Inform' and slot == 'Count':\n",
    "        # Polish noun forms depend on the count: 1 takes the singular,\n",
    "        # 2-4 the nominative plural, 5-9 the genitive plural;\n",
    "        # larger counts are reported simply as 'many' (wiele).\n",
    "        if value == 0:\n",
    "            return 'Nie znalazłem restauracji spełniających podane kryteria.'\n",
    "        elif value == 1:\n",
    "            return 'Znalazłem jedną restaurację spełniającą podane kryteria.'\n",
    "        elif value <= 4:\n",
    "            return f'Znalazłem {value} restauracje spełniające podane kryteria.'\n",
    "        elif value <= 9:\n",
    "            return f'Znalazłem {value} restauracji spełniających podane kryteria.'\n",
    "        else:\n",
    "            return 'Znalazłem wiele restauracji spełniających podane kryteria.'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 6])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 100])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    " 2. multiple variants of the same utterance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def nlg(system_act):\n",
    "    domain, intent, slot, value = system_act\n",
    "\n",
    "    if intent == 'Affirm':\n",
    "        # Pick one of three equivalent confirmations uniformly at random.\n",
    "        return random.choice(['Tak', 'Zgadza się', 'Potwierdzam'])\n",
    "\n",
    "nlg(['Hotel', 'Affirm', '', ''])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    " 3. a multilingual user interface"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "def nlg_en(system_act):\n",
    "    domain, intent, slot, value = system_act\n",
    "\n",
    "    if domain == 'Hotel' and intent == 'Request' and slot == 'CreditCardNo':\n",
    "        return 'What is your credit card number?'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nlg_en(['Hotel', 'Request', 'CreditCardNo', '?'])"
   ]
  },
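  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Utterance variants and several languages can also be handled by keeping the templates in a lookup table instead of `if` chains. The following is a minimal sketch under stated assumptions, not part of the original notebook: the `TEMPLATES` table, the `nlg_table` function, the `lang` parameter and the English phone-number wording are all hypothetical. It relies only on `str.format` and `random.choice` from the standard library.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "# Hypothetical table: (language, intent, slot) -> list of equivalent templates.\n",
    "TEMPLATES = {\n",
    "    ('pl', 'Inform', 'Phone'): ['Numer telefonu to {value}'],\n",
    "    ('en', 'Inform', 'Phone'): ['The phone number is {value}'],\n",
    "    ('pl', 'Affirm', ''): ['Tak', 'Zgadza się', 'Potwierdzam'],\n",
    "    ('en', 'Request', 'CreditCardNo'): ['What is your credit card number?'],\n",
    "}\n",
    "\n",
    "\n",
    "def nlg_table(system_act, lang='pl'):\n",
    "    domain, intent, slot, value = system_act\n",
    "    variants = TEMPLATES.get((lang, intent, slot))\n",
    "    if not variants:\n",
    "        return None  # no template for this act in this language\n",
    "    # str.format ignores unused keyword arguments, so templates without\n",
    "    # a {value} placeholder are returned unchanged.\n",
    "    return random.choice(variants).format(value=value)\n",
    "\n",
    "\n",
    "nlg_table(['Hotel', 'Inform', 'Phone', '1234567890'], lang='en')\n",
    "# 'The phone number is 1234567890'\n",
    "```\n",
    "\n",
    "With this layout, adding a language or a variant means adding a table entry rather than another branch, at the cost of a separate mechanism for count-dependent wording."
   ]
  },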
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Response Generation Using Machine Learning\n",
    "------------------------------------------\n",
    "Besides templates, machine learning techniques can also be used\n",
    "to generate responses.\n",
    "This problem was the subject of the\n",
    "[E2E NLG Challenge](http://www.macs.hw.ac.uk/InteractionLab/E2E/) (Novikova et al., 2017).\n",
    "Let us look at the data released by the organizers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p l10\n",
    "!curl -L -C - https://github.com/tuetschek/e2e-dataset/releases/download/v1.0.0/e2e-dataset.zip -o l10/e2e-dataset.zip\n",
    "!unzip l10/e2e-dataset.zip -d l10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "trainset = pd.read_csv('l10/e2e-dataset/trainset.csv')\n",
    "trainset"
   ]
  },
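  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the E2E data, each row pairs a meaning representation with a reference text. As a rough sketch, assuming the meaning representation is stored in a column named `mr` as comma-separated `slot[value]` pairs (an assumption about the file layout that should be checked against `trainset.columns`), it can be split into a dictionary with plain string operations. `parse_mr` is a hypothetical helper, not part of the dataset tooling.\n",
    "\n",
    "```python\n",
    "def parse_mr(mr):\n",
    "    \"\"\"Split a 'slot[value], slot[value]' string into a slot -> value dict.\"\"\"\n",
    "    slots = {}\n",
    "    for part in mr.split(', '):\n",
    "        slot, _, value = part.partition('[')\n",
    "        slots[slot.strip()] = value.rstrip(']')\n",
    "    return slots\n",
    "\n",
    "\n",
    "parse_mr('name[The Vaults], eatType[pub], near[Café Adriatic]')\n",
    "# {'name': 'The Vaults', 'eatType': 'pub', 'near': 'Café Adriatic'}\n",
    "```\n",
    "\n",
    "Under the same assumption, `trainset['mr'].map(parse_mr)` would yield one dictionary per training example. The sketch does not handle values that themselves contain `', '`."
   ]
  },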
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Task\n",
    "----\n",
    "Implement a response generation module that covers the system acts occurring in the collected corpus.\n",
    "\n",
    "Deadline: 1 June 2022, 23:59.\n",
    "\n",
    "References\n",
    "----------\n",
    " 1. Jekaterina Novikova, Ondřej Dušek, Verena Rieser, The E2E Dataset: New Challenges For End-to-End Generation, Proceedings of the SIGDIAL 2017 Conference, pages 201-206, Saarbrücken, Germany https://arxiv.org/pdf/1706.09254.pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Load one annotated dialogue and keep only the system turns.\n",
    "df = pd.read_csv('../data/dialog-17-04-03.tsv', sep='\\t', header=None)\n",
    "df.columns = ['user', 'text', 'data']\n",
    "df = df[df.user == 'system']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.drop(axis=1, labels=['user'], inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user</th>\n",
       "      <th>text</th>\n",
       "      <th>data</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>system</td>\n",
       "      <td>Witamy w internetowym systemie rezerwacji Nach...</td>\n",
       "      <td>welcomemsg()</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>system</td>\n",
       "      <td>System Nachos obsługuje następujące kina: Mult...</td>\n",
       "      <td>select(location)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     user                                               text              data\n",
       "1  system  Witamy w internetowym systemie rezerwacji Nach...      welcomemsg()\n",
       "3  system  System Nachos obsługuje następujące kina: Mult...  select(location)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
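  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `data` column holds the system act annotations, for example `welcomemsg()` and `select(location)`. A minimal sketch of turning such an annotation into `[intent, slot, value]` lists follows. `parse_acts` is a hypothetical helper, and it assumes that acts are joined with ` AND ` and carry at most one `slot=value` argument, as in `inform(quantity=2) AND inform(time=12:00)`.\n",
    "\n",
    "```python\n",
    "def parse_acts(annotation):\n",
    "    \"\"\"Split an act annotation into a list of [intent, slot, value] lists.\"\"\"\n",
    "    acts = []\n",
    "    for act in annotation.split(' AND '):\n",
    "        intent, _, args = act.strip().partition('(')\n",
    "        args = args.rstrip(')')\n",
    "        # A missing '=' leaves the value empty, e.g. select(location).\n",
    "        slot, _, value = args.partition('=')\n",
    "        acts.append([intent, slot, value])\n",
    "    return acts\n",
    "\n",
    "\n",
    "parse_acts('inform(quantity=2) AND inform(time=12:00)')\n",
    "# [['inform', 'quantity', '2'], ['inform', 'time', '12:00']]\n",
    "```\n",
    "\n",
    "Each resulting list can be passed to an `nlg` function that unpacks `intent, slot, value`. Note that values come back as strings, so numeric comparisons need an explicit conversion."
   ]
  },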
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def nlg(system_act):\n",
    "    # Acts in this corpus carry no domain: [intent, slot, value].\n",
    "    intent, slot, value = system_act\n",
    "\n",
    "    if intent == 'welcomemsg':\n",
    "        return 'Witamy w internetowym systemie rezerwacji Nachos, w czym mogę pomóc?'\n",
    "    elif intent == 'inform' and slot == 'Count':\n",
    "        # Placeholder: reuses the restaurant count templates;\n",
    "        # not yet adapted to this corpus.\n",
    "        if value == 0:\n",
    "            return 'Nie znalazłem restauracji spełniających podane kryteria.'\n",
    "        elif value == 1:\n",
    "            return 'Znalazłem jedną restaurację spełniającą podane kryteria.'\n",
    "        elif value <= 4:\n",
    "            return f'Znalazłem {value} restauracje spełniające podane kryteria.'\n",
    "        elif value <= 9:\n",
    "            return f'Znalazłem {value} restauracji spełniających podane kryteria.'\n",
    "        else:\n",
    "            return 'Znalazłem wiele restauracji spełniających podane kryteria.'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Corpus annotation: inform(quantity=2) AND inform(time=12:00)\n",
    "# Its first act as an [intent, slot, value] list:\n",
    "['inform', 'quantity', '2']"
   ]
  }
 ],
 "metadata": {
  "author": "Marek Kubis",
  "email": "mkubis@amu.edu.pl",
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "lang": "pl",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "subtitle": "11.Generowanie odpowiedzi[laboratoria]",
  "title": "Systemy Dialogowe",
  "year": "2021"
 },
 "nbformat": 4,
 "nbformat_minor": 4
}