Merge branch 'nlu' of https://git.wmi.amu.edu.pl/s444417/SystemyDialogowe-ProjektMagisterski into nlu
commit b2ccdf7945
lab/11-generowanie-odpowiedzi.ipynb
@ -0,0 +1,433 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n",
    "<div class=\"alert alert-block alert-info\">\n",
    "<h1> Dialogue Systems </h1>\n",
    "<h2> 11. <i>Response generation</i> [lab]</h2> \n",
    "<h3> Marek Kubis (2021)</h3>\n",
    "</div>\n",
    "\n",
    "![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Response generation\n",
    "===================\n",
    "\n",
    "In a dialogue system, the dialogue policy determines the system acts, that is, it decides **what the system should say** and/or do.\n",
    "The response generation module turns those dialogue acts into natural-language\n",
    "utterances, that is, it decides **how** that content should be\n",
    "expressed.\n",
    "\n",
    "Template-based response generation\n",
    "----------------------------------\n",
    "The basic tool used in response generation modules is the text template\n",
    "with variable interpolation. In Python this mechanism is available through\n",
    "[f-strings](https://docs.python.org/3/reference/lexical_analysis.html#f-strings), the\n",
    "[format](https://docs.python.org/3/library/string.html#formatstrings) method, and external libraries such as [Jinja2](https://jinja.palletsprojects.com/).\n",
    "\n",
    "While an approach based on Python's built-in mechanisms works well in simple\n",
    "cases..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "def nlg(system_act):\n",
    "    # A system act is a (domain, intent, slot, value) sequence.\n",
    "    domain, intent, slot, value = system_act\n",
    "\n",
    "    if intent == 'Inform' and slot == 'Phone':\n",
    "        # Polish template: 'The phone number is {value}'.\n",
    "        return f'Numer telefonu to {value}'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nlg(['Hotel', 'Inform', 'Phone', '1234567890'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "... keep in mind that, as work on a dialogue agent progresses, you may also need\n",
    "to support, among other things:\n",
    "\n",
    " 1. templates that depend on slot values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "def nlg(system_act):\n",
    "    domain, intent, slot, value = system_act\n",
    "\n",
    "    # Polish nouns inflect with the count, so each range needs its own template:\n",
    "    # 0 (none found), 1 (singular), 2-4 ('restauracje'), 5-9 ('restauracji'), 10+ ('many').\n",
    "    if domain == 'Restaurant' and intent == 'Inform' and slot == 'Count':\n",
    "        if value == 0:\n",
    "            return 'Nie znalazłem restauracji spełniających podane kryteria.'\n",
    "        elif value == 1:\n",
    "            return 'Znalazłem jedną restaurację spełniającą podane kryteria.'\n",
    "        elif value <= 4:\n",
    "            return f'Znalazłem {value} restauracje spełniające podane kryteria.'\n",
    "        elif value <= 9:\n",
    "            return f'Znalazłem {value} restauracji spełniających podane kryteria.'\n",
    "        else:\n",
    "            return 'Znalazłem wiele restauracji spełniających podane kryteria.'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 6])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nlg(['Restaurant', 'Inform', 'Count', 100])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    " 2. multiple variants of the same utterance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "def nlg(system_act):\n",
    "    domain, intent, slot, value = system_act\n",
    "\n",
    "    if intent == 'Affirm':\n",
    "        # Pick one of three equivalent confirmations at random\n",
    "        # ('Yes', 'That is right', 'I confirm').\n",
    "        return random.choice(['Tak', 'Zgadza się', 'Potwierdzam'])\n",
    "\n",
    "nlg(['Hotel', 'Affirm', '', ''])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    " 3. a multilingual user interface"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 0
   },
   "outputs": [],
   "source": [
    "def nlg_en(system_act):\n",
    "    # English counterpart of nlg: same act format, English templates.\n",
    "    domain, intent, slot, value = system_act\n",
    "\n",
    "    if domain == 'Hotel' and intent == 'Request' and slot == 'CreditCardNo':\n",
    "        return 'What is your credit card number?'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nlg_en(['Hotel', 'Request', 'CreditCardNo', '?'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "Response generation with machine learning\n",
    "-----------------------------------------\n",
    "Besides templates, machine learning techniques can also be used\n",
    "to generate responses.\n",
    "This problem was the subject of the\n",
    "[E2E NLG Challenge](http://www.macs.hw.ac.uk/InteractionLab/E2E/) (Novikova et al., 2017).\n",
    "Let's take a look at the data released by the organizers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p l10\n",
    "!curl -L -C - https://github.com/tuetschek/e2e-dataset/releases/download/v1.0.0/e2e-dataset.zip -o l10/e2e-dataset.zip\n",
    "!unzip l10/e2e-dataset.zip -d l10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "trainset = pd.read_csv('l10/e2e-dataset/trainset.csv')\n",
    "trainset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Task\n",
    "----\n",
    "Implement a response generation module that covers the system acts occurring in the collected corpus.\n",
    "\n",
    "Deadline: 1 June 2022, 23:59.\n",
    "\n",
    "References\n",
    "----------\n",
    " 1. Jekaterina Novikova, Ondřej Dušek, Verena Rieser, The E2E Dataset: New Challenges For End-to-End Generation, Proceedings of the SIGDIAL 2017 Conference, pages 201-206, Saarbrücken, Germany https://arxiv.org/pdf/1706.09254.pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Load the annotated dialogue (tab-separated, no header row).\n",
    "df = pd.read_csv('../data/dialog-17-04-03.tsv', sep='\\t', header=None)\n",
    "df.columns = ['user', 'text', 'data']\n",
    "# Keep only the system turns.\n",
    "df = df[df.user == 'system']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reassign instead of dropping in place, because df is a filtered slice.\n",
    "df = df.drop(columns=['user'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user</th>\n",
       "      <th>text</th>\n",
       "      <th>data</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>system</td>\n",
       "      <td>Witamy w internetowym systemie rezerwacji Nach...</td>\n",
       "      <td>welcomemsg()</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>system</td>\n",
       "      <td>System Nachos obsługuje następujące kina: Mult...</td>\n",
       "      <td>select(location)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     user                                               text              data\n",
       "1  system  Witamy w internetowym systemie rezerwacji Nach...      welcomemsg()\n",
       "3  system  System Nachos obsługuje następujące kina: Mult...  select(location)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def nlg(system_act):\n",
    "    # Acts in the collected corpus carry no domain: (intent, slot, value).\n",
    "    intent, slot, value = system_act\n",
    "\n",
    "    if intent == 'welcomemsg':\n",
    "        return 'Witamy w internetowym systemie rezerwacji Nachos, w czym mogę pomóc?'\n",
    "    elif intent == 'inform':\n",
    "        # Unfinished: this branch still reuses the restaurant-count templates\n",
    "        # from the earlier example instead of templates for the corpus slots.\n",
    "        if slot == 'Count':\n",
    "            if value == 0:\n",
    "                return 'Nie znalazłem restauracji spełniających podane kryteria.'\n",
    "            elif value == 1:\n",
    "                return 'Znalazłem jedną restaurację spełniającą podane kryteria.'\n",
    "            elif value <= 4:\n",
    "                return f'Znalazłem {value} restauracje spełniające podane kryteria.'\n",
    "            elif value <= 9:\n",
    "                return f'Znalazłem {value} restauracji spełniających podane kryteria.'\n",
    "            else:\n",
    "                return 'Znalazłem wiele restauracji spełniających podane kryteria.'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compound act annotation as it appears in the corpus:\n",
    "#   inform(quantity=2) AND inform(time=12:00)\n",
    "# Its first component written as an [intent, slot, value] list:\n",
    "['inform', 'quantity', '2']"
   ]
  }
 ],
 "metadata": {
  "author": "Marek Kubis",
  "email": "mkubis@amu.edu.pl",
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "lang": "en",
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "subtitle": "11. Response generation [lab]",
  "title": "Dialogue Systems",
  "year": "2021"
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
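
Review note on the last two notebook cells: the annotations in the `data` column (`welcomemsg()`, `select(location)`, `inform(quantity=2) AND inform(time=12:00)`) have to become `[intent, slot, value]` lists before `nlg` can consume them. A minimal sketch of that conversion follows. The ` AND ` separator and the `slot=value` syntax are inferred only from the few examples visible in this diff, and `parse_acts` is a hypothetical helper that is not part of the notebook.

```python
import re

# Assumed shape of a single act, inferred from the corpus examples:
# intent(), intent(slot) or intent(slot=value).
ACT_PATTERN = re.compile(r'^(\w+)\((.*)\)$')


def parse_acts(annotation):
    """Split a compound annotation into [intent, slot, value] lists."""
    acts = []
    for part in annotation.split(' AND '):
        match = ACT_PATTERN.match(part.strip())
        if match is None:
            raise ValueError(f'Unrecognized act: {part!r}')
        intent, argument = match.groups()
        # partition returns empty strings when '=' is absent, so an act
        # without arguments gets an empty slot and an empty value.
        slot, _, value = argument.partition('=')
        acts.append([intent, slot.strip(), value.strip()])
    return acts


print(parse_acts('inform(quantity=2) AND inform(time=12:00)'))
```

Each returned list can be passed straight to the three-element `nlg` above, one call per component act, with the generated sentences joined afterwards.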
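
Review note on the slot-dependent templates: the restaurant example stops at 9 results and falls back to "wiele" (many) for anything larger. If exact counts are wanted for larger values, the Polish agreement rule can be factored into a helper. The sketch below is one possible way to do that; `polish_plural` and `count_message` are hypothetical names, and the templates are the ones already used in the notebook.

```python
def polish_plural(n, one, few, many):
    """Pick the Polish form that agrees with the count n."""
    if n == 1:
        return one
    # 2-4, 22-24, 32-34, ... take the 'few' form, except 12-14.
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return few
    return many


def count_message(n):
    """Build the restaurant-count sentence for any non-negative n."""
    if n == 0:
        return 'Nie znalazłem restauracji spełniających podane kryteria.'
    if n == 1:
        return 'Znalazłem jedną restaurację spełniającą podane kryteria.'
    phrase = polish_plural(
        n,
        'restaurację spełniającą',
        'restauracje spełniające',
        'restauracji spełniających',
    )
    return f'Znalazłem {n} {phrase} podane kryteria.'


for n in (2, 6, 12, 22):
    print(count_message(n))
```

Keeping the agreement rule separate from the sentence template means the same helper can serve other nouns in the corpus without repeating the range checks.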