{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n", "
\n", "

Language Modeling

\n", "

2. Language [labs]

\n", "
\n", "\n", "![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import random\n", "import plotly.express as px\n", "import numpy as np\n", "import pandas as pd\n", "import nltk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://github.com/sdadas/polish-nlp-resources" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "program : program\n", "programs : program\n", "programmer : programm\n", "programming : program\n", "programmers : programm\n" ] } ], "source": [ "ps = nltk.stem.PorterStemmer()\n", "\n", "for w in [\"program\", \"programs\", \"programmer\", \"programming\", \"programmers\"]:\n", " print(w, \" : \", ps.stem(w))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[nltk_data] Downloading package punkt to /home/pawel/nltk_data...\n", "[nltk_data] Package punkt is already up-to-date!\n", "[nltk_data] Downloading package stopwords to /home/pawel/nltk_data...\n", "[nltk_data] Package stopwords is already up-to-date!\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nltk.download('punkt')\n", "nltk.download('stopwords')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Python',\n", " 'is',\n", " 'dynamically-typed',\n", " 'and',\n", " 'garbage-collected',\n", " '.',\n", " 'It',\n", " 'supports',\n", " 'multiple',\n", " 'programming',\n", " 'paradigms',\n", " ',',\n", " 'including',\n", " 'structured',\n", " '(',\n", " 'particularly',\n", " ',',\n", " 'procedural',\n", " ')',\n", " ',',\n", " 'object-oriented',\n", " 'and',\n", " 'functional',\n", " 'programming',\n", " '.',\n", " 'It',\n", " 
'is',\n", " 'often',\n", " 'described',\n", " 'as',\n", " 'a',\n", " '``',\n", " 'batteries',\n", " 'included',\n", " \"''\",\n", " 'language',\n", " 'due',\n", " 'to',\n", " 'its',\n", " 'comprehensive',\n", " 'standard',\n", " 'library',\n", " '.']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = \"\"\"Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. It is often described as a \"batteries included\" language due to its comprehensive standard library.\"\"\"\n", "nltk.tokenize.word_tokenize(text)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['Python is dynamically-typed and garbage-collected.',\n", " 'It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming.',\n", " 'It is often described as a \"batteries included\" language due to its comprehensive standard library.']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nltk.tokenize.sent_tokenize(text)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['aber',\n", " 'alle',\n", " 'allem',\n", " 'allen',\n", " 'aller',\n", " 'alles',\n", " 'als',\n", " 'also',\n", " 'am',\n", " 'an',\n", " 'ander',\n", " 'andere',\n", " 'anderem',\n", " 'anderen',\n", " 'anderer',\n", " 'anderes',\n", " 'anderm',\n", " 'andern',\n", " 'anderr',\n", " 'anders',\n", " 'auch',\n", " 'auf',\n", " 'aus',\n", " 'bei',\n", " 'bin',\n", " 'bis',\n", " 'bist',\n", " 'da',\n", " 'damit',\n", " 'dann',\n", " 'der',\n", " 'den',\n", " 'des',\n", " 'dem',\n", " 'die',\n", " 'das',\n", " 'dass',\n", " 'daß',\n", " 'derselbe',\n", " 'derselben',\n", " 'denselben',\n", " 'desselben',\n", " 'demselben',\n", " 
'dieselbe',\n", " 'dieselben',\n", " 'dasselbe',\n", " 'dazu',\n", " 'dein',\n", " 'deine',\n", " 'deinem',\n", " 'deinen',\n", " 'deiner',\n", " 'deines',\n", " 'denn',\n", " 'derer',\n", " 'dessen',\n", " 'dich',\n", " 'dir',\n", " 'du',\n", " 'dies',\n", " 'diese',\n", " 'diesem',\n", " 'diesen',\n", " 'dieser',\n", " 'dieses',\n", " 'doch',\n", " 'dort',\n", " 'durch',\n", " 'ein',\n", " 'eine',\n", " 'einem',\n", " 'einen',\n", " 'einer',\n", " 'eines',\n", " 'einig',\n", " 'einige',\n", " 'einigem',\n", " 'einigen',\n", " 'einiger',\n", " 'einiges',\n", " 'einmal',\n", " 'er',\n", " 'ihn',\n", " 'ihm',\n", " 'es',\n", " 'etwas',\n", " 'euer',\n", " 'eure',\n", " 'eurem',\n", " 'euren',\n", " 'eurer',\n", " 'eures',\n", " 'für',\n", " 'gegen',\n", " 'gewesen',\n", " 'hab',\n", " 'habe',\n", " 'haben',\n", " 'hat',\n", " 'hatte',\n", " 'hatten',\n", " 'hier',\n", " 'hin',\n", " 'hinter',\n", " 'ich',\n", " 'mich',\n", " 'mir',\n", " 'ihr',\n", " 'ihre',\n", " 'ihrem',\n", " 'ihren',\n", " 'ihrer',\n", " 'ihres',\n", " 'euch',\n", " 'im',\n", " 'in',\n", " 'indem',\n", " 'ins',\n", " 'ist',\n", " 'jede',\n", " 'jedem',\n", " 'jeden',\n", " 'jeder',\n", " 'jedes',\n", " 'jene',\n", " 'jenem',\n", " 'jenen',\n", " 'jener',\n", " 'jenes',\n", " 'jetzt',\n", " 'kann',\n", " 'kein',\n", " 'keine',\n", " 'keinem',\n", " 'keinen',\n", " 'keiner',\n", " 'keines',\n", " 'können',\n", " 'könnte',\n", " 'machen',\n", " 'man',\n", " 'manche',\n", " 'manchem',\n", " 'manchen',\n", " 'mancher',\n", " 'manches',\n", " 'mein',\n", " 'meine',\n", " 'meinem',\n", " 'meinen',\n", " 'meiner',\n", " 'meines',\n", " 'mit',\n", " 'muss',\n", " 'musste',\n", " 'nach',\n", " 'nicht',\n", " 'nichts',\n", " 'noch',\n", " 'nun',\n", " 'nur',\n", " 'ob',\n", " 'oder',\n", " 'ohne',\n", " 'sehr',\n", " 'sein',\n", " 'seine',\n", " 'seinem',\n", " 'seinen',\n", " 'seiner',\n", " 'seines',\n", " 'selbst',\n", " 'sich',\n", " 'sie',\n", " 'ihnen',\n", " 'sind',\n", " 'so',\n", " 'solche',\n", " 
'solchem',\n", " 'solchen',\n", " 'solcher',\n", " 'solches',\n", " 'soll',\n", " 'sollte',\n", " 'sondern',\n", " 'sonst',\n", " 'über',\n", " 'um',\n", " 'und',\n", " 'uns',\n", " 'unsere',\n", " 'unserem',\n", " 'unseren',\n", " 'unser',\n", " 'unseres',\n", " 'unter',\n", " 'viel',\n", " 'vom',\n", " 'von',\n", " 'vor',\n", " 'während',\n", " 'war',\n", " 'waren',\n", " 'warst',\n", " 'was',\n", " 'weg',\n", " 'weil',\n", " 'weiter',\n", " 'welche',\n", " 'welchem',\n", " 'welchen',\n", " 'welcher',\n", " 'welches',\n", " 'wenn',\n", " 'werde',\n", " 'werden',\n", " 'wie',\n", " 'wieder',\n", " 'will',\n", " 'wir',\n", " 'wird',\n", " 'wirst',\n", " 'wo',\n", " 'wollen',\n", " 'wollte',\n", " 'würde',\n", " 'würden',\n", " 'zu',\n", " 'zum',\n", " 'zur',\n", " 'zwar',\n", " 'zwischen']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nltk.corpus.stopwords.words('german')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('Python', 'is'), ('is', 'dynamically-typed'), ('dynamically-typed', 'and'), ('and', 'garbage-collected'), ('garbage-collected', '.'), ('.', 'It'), ('It', 'supports'), ('supports', 'multiple'), ('multiple', 'programming'), ('programming', 'paradigms'), ('paradigms', ','), (',', 'including'), ('including', 'structured'), ('structured', '('), ('(', 'particularly'), ('particularly', ','), (',', 'procedural'), ('procedural', ')'), (')', ','), (',', 'object-oriented'), ('object-oriented', 'and'), ('and', 'functional'), ('functional', 'programming'), ('programming', '.'), ('.', 'It'), ('It', 'is'), ('is', 'often'), ('often', 'described'), ('described', 'as'), ('as', 'a'), ('a', '``'), ('``', 'batteries'), ('batteries', 'included'), ('included', \"''\"), (\"''\", 'language'), ('language', 'due'), ('due', 'to'), ('to', 'its'), ('its', 'comprehensive'), ('comprehensive', 'standard'), ('standard', 'library'), ('library', 
'.')]\n" ] } ], "source": [ "nltk_tokens = nltk.word_tokenize(text)\n", "print(list(nltk.bigrams(nltk_tokens)))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "alignmentgroup": "True", "hovertemplate": "słowo=%{x}
liczba=%{y}", "legendgroup": "", "marker": { "color": "#636efa", "pattern": { "shape": "" } }, "name": "", "offsetgroup": "", "orientation": "v", "showlegend": false, "textposition": "auto", "type": "bar", "x": [ "ma", "ala", "psa", "kota" ], "xaxis": "x", "y": [ 20, 15, 10, 10 ], "yaxis": "y" } ], "layout": { "barmode": "relative", "legend": { "tracegroupgap": 0 }, "margin": { "t": 60 }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, 
"#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, 
"marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, 
"#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": 
"white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ], "title": { "text": "słowo" } }, "yaxis": { "anchor": "x", "domain": [ 0, 1 ], "title": { "text": "liczba" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df = pd.DataFrame([['ma', 20], ['ala', 15], ['psa', 10], ['kota', 10]], columns=['słowo', 'liczba'])\n", "fig = px.bar(df, x=\"słowo\", y=\"liczba\")\n", "fig.show()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "alignmentgroup": "True", "bingroup": "x", "hovertemplate": "język=ang
długość=%{x}
count=%{y}", "legendgroup": "", "marker": { "color": "#636efa", "pattern": { "shape": "" } }, "name": "", "nbinsx": 50, "offsetgroup": "", "orientation": "v", "showlegend": false, "type": "histogram", "x": [ 1, 12, 3, 3, 15, 3, 1, 24, 12, 3, 27, 10, 4, 12, 1, 1, 4, 3, 5, 2, 1, 5, 7, 16, 9, 5, 2, 2, 1, 1, 3, 1, 1, 3, 4, 6, 1, 6, 3, 3, 4, 5, 1, 12, 2, 6, 9, 11, 7, 5, 9, 7, 6, 5, 3, 5, 4, 1, 4, 2, 1, 2, 2, 6, 1, 2, 6, 6, 7, 1, 4, 3, 3, 1, 5, 4, 24, 9, 5, 4, 2, 1, 4, 4, 2, 13, 16, 2, 4, 8, 9, 1, 3, 9, 1, 4, 2, 2, 4, 4, 2, 6, 15, 2, 4, 9, 3, 3, 4, 6, 1, 1, 1, 5, 2, 5, 1, 15, 14, 2, 8, 1, 5, 2, 12, 1, 1, 2, 1, 5, 3, 6, 4, 7, 2, 4, 14, 3, 5, 6, 2, 3, 18, 5, 3, 6, 1, 5, 9, 14, 7, 1, 1, 5, 1, 2, 4, 1, 13, 2, 3, 3, 3, 11, 1, 11, 1, 8, 4, 3, 2, 6, 14, 1, 2, 3, 9, 2, 5, 4, 2, 5, 4, 7, 6, 9, 5, 5, 5, 1, 2, 8, 2, 1, 9, 3, 25, 1, 2, 3, 1, 8, 14, 13, 2, 2, 1, 6, 1, 1, 1, 5, 2, 11, 2, 4, 1, 2, 1, 1, 8, 3, 6, 8, 7, 2, 4, 2, 6, 2, 3, 14, 17, 12, 8, 1, 2, 3, 15, 1, 3, 2, 1, 5, 1, 3, 1, 6, 10, 7, 1, 12, 17, 5, 4, 11, 7, 6, 3, 4, 2, 2, 8, 18, 3, 7, 5, 5, 1, 2, 6, 3, 8, 2, 15, 4, 27, 1, 3, 12, 1, 7, 9, 5, 6, 1, 1, 12, 1, 1, 6, 2, 10, 2, 5, 3, 4, 2, 1, 11, 12, 6, 12, 5, 16, 4, 1, 3, 1, 5, 2, 1, 9, 9, 8, 13, 8, 2, 5, 2, 1, 7, 6, 1, 4, 10, 3, 11, 9, 3, 4, 1, 2, 5, 1, 8, 4, 7, 4, 4, 5, 3, 6, 18, 3, 6, 5, 1, 1, 10, 1, 1, 1, 1, 3, 2, 5, 9, 5, 2, 11, 6, 2, 2, 1, 12, 2, 1, 5, 12, 5, 2, 1, 4, 13, 3, 7, 2, 2, 3, 5, 4, 1, 2, 13, 8, 1, 1, 1, 64, 3, 4, 9, 17, 2, 12, 8, 2, 8, 1, 9, 6, 2, 5, 11, 6, 5, 3, 2, 3, 1, 4, 9, 6, 2, 5, 7, 2, 6, 8, 5, 9, 9, 1, 1, 1, 7, 7, 4, 5, 5, 8, 8, 5, 2, 1, 7, 10, 4, 7, 2, 3, 1, 4, 14, 1, 2, 3, 3, 3, 2, 3, 1, 8, 3, 3, 2, 3, 4, 5, 6, 3, 1, 3, 6, 7, 4, 6, 10, 6, 1, 2, 3, 1, 3, 3, 1, 8, 1, 10, 6, 12, 2, 3, 6, 1, 8, 1, 2, 3, 3, 1, 9, 5, 5, 7, 9, 9, 3, 3, 2, 1, 3, 7, 10, 6, 3, 4, 10, 5, 1, 4, 3, 4, 22, 10, 1, 7, 6, 6, 2, 5, 16, 10, 8, 13, 2, 3, 4, 5, 3, 1, 14, 3, 2, 4, 13, 1, 5, 8, 1, 2, 1, 4, 1, 1, 1, 4, 7, 3, 2, 1, 6, 5, 10, 1, 1, 6, 3, 1, 5, 5, 10, 8, 9, 2, 1, 2, 1, 6, 2, 5, 
3, 12, 1, 1, 3, 2, 1, 1, 6, 2, 2, 1, 3, 3, 5, 1, 7, 2, 3, 1, 8, 1, 3, 2, 8, 8, 1, 3, 12, 15, 1, 5, 5, 13, 4, 6, 6, 10, 10, 6, 9, 5, 3, 1, 9, 6, 1, 7, 7, 4, 8, 8, 5, 3, 1, 1, 1, 5, 1, 2, 2, 6, 3, 3, 1, 18, 10, 8, 2, 1, 15, 1, 2, 3, 8, 5, 11, 1, 3, 2, 1, 3, 6, 5, 7, 3, 2, 7, 6, 5, 19, 4, 11, 5, 6, 4, 8, 1, 13, 1, 1, 2, 8, 1, 2, 1, 14, 3, 17, 1, 3, 2, 8, 5, 5, 2, 3, 4, 5, 3, 8, 1, 1, 1, 3, 13, 3, 2, 8, 2, 4, 1, 7, 9, 2, 10, 4, 2, 3, 3, 3, 1, 1, 5, 2, 1, 2, 12, 1, 8, 1, 1, 1, 17, 2, 3, 3, 5, 3, 1, 5, 3, 1, 1, 10, 2, 1, 12, 2, 9, 11, 14, 3, 2, 1, 1, 14, 5, 5, 9, 1, 1, 1, 4, 15, 5, 4, 1, 11, 5, 2, 3, 14, 1, 4, 2, 16, 6, 8, 1, 1, 6, 3, 3, 6, 10, 1, 4, 12, 3, 9, 9, 3, 1, 10, 5, 1, 4, 4, 3, 1, 3, 1, 4, 4, 1, 5, 4, 9, 1, 1, 4, 1, 1, 5, 1, 7, 1, 4, 2, 1, 3, 3, 4, 1, 1, 2, 2, 2, 16, 14, 4, 4, 36, 25, 10, 2, 5, 1, 4, 4, 2, 12, 12, 7, 5, 18, 11, 6, 1, 10, 10, 1, 5, 10, 6, 1, 2, 13, 6, 3, 3, 8, 7, 5, 4, 12, 6, 7, 2, 2, 9, 1, 9, 1, 8, 2, 2, 2, 7, 2, 6, 3, 3, 13, 2, 1, 16, 6, 5, 1, 1, 2, 2, 3, 28, 3, 8, 1, 11, 9, 14, 2, 4, 6, 1, 1, 3, 4, 3, 2, 1, 3, 3, 1, 1, 2, 1, 3, 1, 7, 12, 5, 1, 1, 4, 3, 1, 1, 4, 2, 1, 1, 2, 8, 6, 2, 4, 4, 1, 1, 7, 2, 4, 2, 6, 15, 3, 3, 2, 2, 1, 1, 5, 1, 1, 3, 2, 5, 3, 1, 3, 2, 1, 2, 3, 2, 4, 3, 3, 1, 1, 3, 3, 6, 5, 16, 4, 1, 1, 9, 8, 8, 6, 1, 4, 2, 1, 6, 18, 5, 10, 5, 3, 14, 6, 3, 2, 1, 1, 13, 2, 7, 4, 1, 3, 4, 20, 1, 1, 2, 5, 6, 7, 5, 3, 3, 2, 1, 2, 1, 16, 6, 2, 7, 2, 3, 7, 2, 3, 4, 5, 5, 5, 10, 15, 11, 2, 4, 1, 8, 2, 8, 5, 2, 5, 5, 6, 1, 2, 15, 2, 2, 6, 1, 1, 1, 6, 6, 7, 8, 9, 1, 1, 1, 11, 2, 2, 9, 1, 1, 11, 4, 4, 3, 8, 6, 2, 5, 2, 2, 12, 1, 8, 1, 1, 2, 4, 13, 4, 1, 20, 11, 3, 2, 4, 5, 5, 4, 1, 10, 6, 2, 1, 10, 1, 3, 1, 3, 10, 3, 5, 2, 2, 6, 1, 1, 10, 28, 6, 6, 5, 3, 1, 8, 7, 3, 18, 12, 5, 1, 3, 4, 2, 7, 6, 6, 4, 9, 1, 2, 8, 7, 1, 1, 1, 15, 5, 9, 1, 3, 2, 9, 2, 2, 11, 1, 3, 2, 21, 2, 13, 2, 1, 1, 21, 3, 1, 6, 2, 11, 2, 1, 12, 1, 3, 1, 11, 3, 3, 1, 3, 3, 9, 3, 4, 4, 12, 4, 6, 2, 1, 3, 1, 3, 1, 4, 1, 10, 2, 10, 1, 11, 1, 4, 7, 18, 4, 13, 11, 2, 2, 2, 9, 4, 23, 
10, 6, 1, 7, 1, 2, 7, 7, 7, 4, 1, 3, 2, 3, 2, 17, 4, 1, 9, 12, 1, 1, 3, 3, 1, 2, 1, 1, 3, 1, 5, 5, 3, 5, 6, 3, 6, 10, 6, 5, 10, 4, 2, 9, 2, 1, 11, 6, 5, 1, 3, 1, 10, 8, 5, 1, 27, 5, 3, 2, 1, 1, 2, 2, 3, 3, 2, 4, 1, 2, 7, 1, 2, 3, 2, 6, 12, 1, 7, 1, 9, 8, 15, 2, 1, 5, 1, 3, 1, 1, 17, 4, 3, 10, 4, 13, 2, 7, 3, 5, 12, 1, 5, 4, 4, 4, 8, 5, 5, 2, 2, 6, 6, 2, 2, 7, 1, 2, 3, 2, 4, 12, 3, 3, 5, 7, 7, 4, 3, 7, 4, 6, 9, 6, 2, 12, 4, 4, 2, 4, 7, 2, 3, 6, 1, 2, 1, 1, 3, 4, 11, 3, 7, 2, 5, 7, 6, 5, 2, 15, 2, 12, 1, 8, 3, 1, 4, 3, 1, 3, 2, 2, 2, 6, 1, 8, 3, 4, 14, 7, 4, 31, 3, 1, 5, 4, 1, 9, 3, 9, 1, 8, 3, 4, 5, 2, 3, 1, 3, 15, 4, 7, 1, 9, 4, 1, 4, 9, 2, 3, 9, 9, 8, 2, 5, 4, 8, 2, 11, 11, 2, 7, 5, 1, 5, 11, 3, 1, 3, 6, 25, 13, 2, 2, 3, 1, 7, 16, 1, 7, 5, 10, 7, 5, 3, 7, 23, 3, 2, 5, 5, 3, 3, 9, 12, 8, 3, 1, 2, 1, 3, 14, 5, 7, 1, 4, 6, 1, 6, 2, 5, 3, 20, 5, 3, 5, 1, 24, 3, 3, 2, 2, 8, 2, 7, 26, 6, 3, 14, 2, 3, 1, 10, 13, 14, 17, 11, 3, 11, 10, 6, 2, 7, 8, 5, 3, 2, 3, 8, 4, 1, 12, 16, 4, 1, 7, 3, 1, 3, 1, 18, 1, 3, 1, 3, 2, 7, 4, 4, 4, 2, 1, 4, 1, 6, 6, 4, 5, 6, 4, 6, 1, 4, 1, 3, 1, 1, 24, 2, 3, 3, 1, 2, 2, 3, 1, 2, 4, 10, 2, 3, 6, 2, 6, 2, 4, 1, 2, 1, 8, 8, 8, 7, 10, 1, 7, 4, 1, 5, 4, 11, 20, 3, 3, 1, 16, 3, 7, 2, 2, 13, 7, 10, 2, 6, 2, 1, 7, 11, 12, 6, 12, 13, 1, 1, 1, 6, 6, 7, 4, 3, 1, 1, 5, 3, 3, 13, 5, 4, 4, 7, 10, 8, 25 ], "xaxis": "x3", "yaxis": "y3" }, { "alignmentgroup": "True", "bingroup": "x", "hovertemplate": "język=polski
długość=%{x}
count=%{y}", "legendgroup": "", "marker": { "color": "#636efa", "pattern": { "shape": "" } }, "name": "", "nbinsx": 50, "offsetgroup": "", "orientation": "v", "showlegend": false, "type": "histogram", "x": [ 2, 5, 1, 3, 10, 8, 2, 10, 8, 10, 1, 1, 4, 5, 1, 2, 1, 4, 5, 1, 5, 8, 1, 6, 10, 1, 2, 4, 2, 8, 5, 1, 1, 11, 1, 9, 2, 4, 5, 2, 5, 1, 3, 11, 2, 1, 1, 14, 10, 1, 2, 1, 5, 18, 1, 7, 3, 13, 4, 11, 2, 6, 1, 7, 3, 12, 4, 15, 3, 9, 7, 4, 1, 2, 11, 18, 1, 20, 3, 12, 9, 8, 3, 15, 2, 28, 18, 7, 1, 8, 3, 2, 3, 5, 8, 3, 7, 3, 2, 2, 5, 2, 9, 2, 2, 2, 3, 10, 2, 3, 1, 5, 1, 5, 2, 1, 5, 3, 3, 3, 12, 1, 2, 3, 7, 1, 11, 7, 2, 7, 16, 6, 10, 4, 4, 9, 1, 6, 4, 8, 2, 1, 15, 2, 1, 11, 1, 1, 5, 2, 1, 3, 4, 2, 3, 9, 8, 1, 11, 1, 1, 5, 4, 5, 4, 1, 1, 7, 4, 12, 4, 17, 5, 1, 1, 14, 1, 3, 14, 8, 2, 4, 2, 5, 1, 5, 7, 2, 1, 9, 4, 8, 7, 10, 5, 10, 2, 2, 6, 2, 4, 6, 15, 3, 3, 8, 2, 2, 3, 25, 6, 6, 2, 2, 9, 5, 2, 3, 2, 1, 1, 5, 1, 1, 4, 5, 1, 7, 2, 2, 3, 4, 5, 1, 7, 2, 2, 3, 3, 1, 13, 1, 4, 5, 12, 5, 6, 9, 3, 2, 3, 2, 10, 1, 6, 15, 5, 3, 2, 25, 2, 2, 5, 2, 20, 4, 1, 1, 5, 1, 11, 1, 1, 3, 5, 7, 4, 8, 12, 2, 2, 8, 1, 14, 15, 2, 5, 7, 3, 6, 3, 4, 4, 1, 5, 2, 1, 7, 5, 1, 1, 3, 5, 11, 1, 9, 13, 1, 9, 4, 9, 1, 1, 1, 2, 6, 4, 2, 1, 1, 15, 7, 5, 1, 4, 1, 6, 18, 1, 3, 2, 10, 12, 8, 1, 11, 4, 6, 1, 1, 1, 3, 7, 6, 11, 23, 21, 6, 3, 6, 1, 1, 2, 1, 3, 2, 12, 1, 5, 4, 2, 5, 3, 9, 4, 4, 6, 3, 8, 1, 18, 2, 13, 5, 6, 6, 2, 2, 6, 2, 3, 5, 3, 3, 7, 13, 4, 10, 2, 3, 8, 5, 3, 7, 3, 3, 2, 2, 1, 5, 1, 12, 1, 3, 6, 1, 8, 1, 7, 4, 4, 2, 2, 2, 2, 9, 7, 7, 2, 1, 5, 11, 1, 3, 9, 6, 3, 2, 3, 3, 6, 9, 20, 1, 4, 3, 20, 1, 2, 5, 4, 3, 2, 1, 15, 5, 4, 1, 1, 5, 6, 7, 8, 1, 2, 11, 12, 4, 2, 8, 5, 7, 8, 2, 7, 5, 1, 4, 6, 5, 9, 6, 2, 5, 5, 4, 10, 11, 3, 2, 8, 3, 6, 3, 10, 6, 1, 1, 3, 6, 15, 4, 4, 9, 2, 6, 2, 1, 1, 14, 6, 5, 10, 5, 3, 1, 6, 7, 3, 5, 3, 10, 12, 3, 8, 5, 3, 1, 2, 7, 8, 2, 1, 4, 1, 5, 3, 2, 4, 4, 1, 1, 3, 1, 1, 3, 5, 13, 4, 2, 13, 1, 9, 2, 7, 11, 2, 2, 1, 5, 9, 3, 3, 2, 5, 1, 2, 8, 3, 5, 9, 1, 1, 9, 3, 3, 15, 2, 1, 3, 
2, 6, 8, 3, 3, 19, 6, 4, 2, 2, 4, 1, 1, 1, 3, 3, 15, 1, 6, 4, 6, 5, 19, 1, 2, 12, 4, 13, 4, 3, 1, 3, 3, 1, 4, 4, 5, 1, 13, 8, 8, 5, 4, 7, 7, 4, 4, 10, 3, 6, 1, 16, 2, 3, 10, 2, 1, 1, 1, 2, 5, 4, 10, 2, 3, 8, 3, 1, 10, 4, 15, 2, 11, 3, 6, 1, 10, 2, 7, 5, 4, 3, 1, 2, 5, 1, 12, 3, 4, 7, 7, 12, 1, 6, 2, 5, 1, 2, 2, 7, 1, 1, 2, 7, 2, 8, 1, 4, 1, 4, 3, 2, 4, 2, 4, 6, 1, 7, 1, 1, 3, 6, 5, 23, 3, 2, 7, 3, 3, 3, 1, 1, 11, 1, 3, 5, 12, 13, 2, 2, 5, 4, 2, 1, 6, 6, 4, 1, 8, 11, 9, 2, 12, 2, 3, 1, 7, 17, 20, 6, 1, 1, 5, 1, 3, 4, 2, 4, 7, 14, 1, 15, 2, 2, 9, 5, 4, 1, 5, 7, 2, 7, 1, 2, 2, 9, 3, 2, 9, 3, 1, 2, 4, 2, 1, 8, 5, 3, 15, 6, 4, 6, 5, 5, 5, 2, 13, 2, 2, 3, 3, 4, 3, 8, 1, 2, 2, 2, 3, 7, 1, 2, 7, 4, 3, 6, 4, 6, 5, 4, 2, 5, 1, 14, 3, 3, 10, 10, 4, 8, 2, 4, 21, 7, 1, 1, 3, 9, 1, 4, 6, 2, 4, 7, 1, 1, 3, 2, 26, 10, 1, 6, 6, 1, 2, 1, 3, 9, 3, 5, 5, 2, 5, 1, 13, 8, 2, 5, 2, 14, 1, 2, 1, 1, 1, 6, 1, 2, 3, 9, 3, 1, 16, 4, 7, 1, 10, 13, 5, 7, 3, 4, 8, 11, 8, 10, 2, 6, 1, 2, 1, 3, 1, 1, 9, 2, 2, 10, 2, 1, 1, 5, 12, 3, 3, 13, 12, 4, 6, 1, 3, 8, 16, 2, 2, 2, 5, 8, 1, 3, 8, 9, 9, 2, 10, 5, 1, 1, 4, 2, 2, 6, 20, 7, 2, 3, 1, 9, 9, 1, 2, 4, 8, 7, 4, 4, 7, 1, 7, 1, 2, 2, 2, 1, 1, 3, 8, 3, 12, 5, 2, 2, 3, 2, 9, 1, 9, 6, 4, 1, 5, 2, 2, 3, 3, 3, 3, 13, 1, 1, 5, 7, 1, 5, 3, 1, 2, 4, 7, 1, 6, 1, 6, 8, 2, 1, 6, 1, 4, 1, 3, 3, 2, 2, 1, 4, 15, 1, 9, 1, 3, 2, 5, 7, 1, 1, 2, 5, 2, 6, 3, 14, 3, 1, 3, 9, 7, 12, 2, 7, 19, 5, 4, 2, 5, 11, 2, 4, 1, 11, 3, 2, 1, 3, 1, 19, 1, 1, 3, 4, 1, 11, 5, 6, 6, 5, 3, 16, 17, 6, 2, 1, 3, 2, 5, 2, 18, 4, 5, 1, 2, 1, 6, 1, 6, 7, 3, 9, 1, 9, 4, 1, 1, 7, 7, 3, 9, 7, 11, 12, 2, 3, 2, 5, 4, 4, 3, 1, 3, 4, 9, 1, 4, 3, 1, 1, 5, 26, 1, 2, 2, 1, 1, 4, 1, 1, 3, 2, 13, 1, 3, 12, 3, 3, 17, 2, 1, 5, 10, 1, 5, 7, 3, 16, 1, 4, 2, 1, 1, 2, 8, 9, 4, 2, 4, 2, 1, 5, 1, 1, 3, 3, 4, 1, 10, 2, 8, 7, 7, 15, 1, 3, 4, 4, 6, 4, 13, 4, 5, 1, 3, 14, 12, 1, 3, 1, 1, 5, 4, 4, 8, 8, 4, 3, 9, 3, 14, 8, 6, 6, 5, 1, 5, 8, 5, 4, 4, 8, 1, 6, 2, 8, 2, 1, 1, 3, 3, 2, 6, 3, 11, 5, 4, 5, 1, 5, 5, 6, 2, 2, 1, 
2, 17, 13, 1, 1, 7, 7, 3, 1, 6, 2, 4, 6, 1, 5, 9, 4, 5, 10, 2, 1, 4, 2, 8, 1, 2, 10, 9, 1, 2, 1, 11, 2, 2, 8, 1, 1, 3, 1, 2, 1, 2, 1, 1, 2, 1, 3, 2, 11, 5, 1, 9, 5, 11, 1, 3, 5, 9, 6, 12, 9, 6, 3, 3, 1, 15, 2, 3, 5, 18, 7, 3, 2, 5, 2, 1, 8, 8, 6, 8, 4, 2, 1, 5, 11, 2, 2, 3, 5, 8, 3, 26, 1, 4, 2, 3, 1, 3, 4, 1, 13, 1, 2, 1, 6, 5, 1, 2, 10, 5, 13, 15, 2, 4, 4, 3, 32, 4, 16, 2, 4, 1, 13, 1, 2, 4, 6, 1, 5, 9, 5, 8, 10, 3, 9, 3, 3, 9, 12, 1, 1, 4, 5, 3, 13, 3, 1, 3, 2, 9, 12, 12, 2, 2, 2, 15, 3, 1, 1, 3, 2, 3, 3, 6, 5, 1, 2, 1, 8, 4, 2, 2, 4, 5, 6, 8, 7, 20, 6, 1, 1, 6, 3, 4, 3, 3, 4, 12, 4, 2, 1, 5, 2, 1, 10, 7, 1, 4, 1, 4, 5, 3, 10, 3, 3, 3, 4, 1, 1, 9, 3, 1, 2, 5, 2, 1, 2, 1, 1, 3, 1, 5, 3, 10, 1, 4, 8, 8, 7, 7, 9, 1, 2, 12, 2, 2, 8, 2, 5, 7, 2, 2, 9, 3, 2, 22, 9, 3, 1, 3, 8, 5, 4, 4, 3, 3, 5, 3, 2, 4, 15, 2, 6, 26, 2, 5, 5, 4, 4, 2, 2, 1, 1, 1, 2, 2, 3, 5, 1, 7, 2, 8, 5, 20, 1, 1, 7, 3, 3, 1, 1, 1, 17, 11, 3, 1, 18, 6, 4, 4, 1, 2, 3, 1, 1, 8, 20, 7, 5, 8, 6, 6, 5, 4, 5, 7, 2, 1, 3, 13, 1, 5, 17, 1, 10, 1, 6, 9, 1, 1, 6, 3, 2, 4, 4, 5, 3, 14, 1, 3, 2, 2, 13, 28, 2, 7, 8, 2, 3, 4, 2, 2, 8, 5, 1, 4, 5, 8, 6, 1, 1, 2, 3, 5, 7, 28, 1, 2, 1, 4, 2, 5, 11, 1, 7, 8, 1, 3, 4, 5, 1, 9, 4, 3, 3, 1, 5, 5, 7, 7, 8, 1, 8, 3, 3, 4, 2, 6, 13, 2, 8, 5, 2, 6, 4, 3, 11, 2, 2, 2, 18, 6, 24, 6, 2, 8, 4, 28, 7, 3, 2, 6, 2, 1, 12, 3, 2, 1, 10, 10, 13, 10, 3, 1, 6, 4, 1, 4, 4, 3, 30, 2, 1 ], "xaxis": "x2", "yaxis": "y2" }, { "alignmentgroup": "True", "bingroup": "x", "hovertemplate": "język=hiszp
długość=%{x}
count=%{y}", "legendgroup": "", "marker": { "color": "#636efa", "pattern": { "shape": "" } }, "name": "", "nbinsx": 50, "offsetgroup": "", "orientation": "v", "showlegend": false, "type": "histogram", "x": [ 6, 7, 1, 2, 9, 16, 2, 16, 5, 8, 2, 6, 1, 8, 6, 1, 1, 2, 6, 1, 1, 6, 1, 5, 8, 7, 6, 4, 2, 3, 3, 3, 17, 12, 2, 7, 2, 5, 3, 8, 2, 9, 3, 2, 1, 7, 14, 1, 2, 19, 23, 8, 3, 1, 1, 2, 24, 1, 8, 9, 7, 3, 4, 13, 7, 6, 5, 5, 6, 5, 2, 3, 2, 1, 1, 8, 7, 2, 2, 14, 1, 4, 2, 4, 6, 12, 9, 14, 2, 10, 6, 1, 17, 13, 4, 12, 3, 5, 3, 13, 11, 1, 10, 1, 1, 2, 1, 1, 2, 1, 1, 1, 5, 1, 1, 7, 6, 16, 3, 1, 2, 1, 6, 6, 4, 1, 5, 3, 2, 5, 1, 11, 4, 3, 7, 7, 3, 2, 8, 13, 3, 2, 1, 2, 2, 10, 1, 1, 2, 5, 11, 1, 3, 4, 1, 18, 13, 3, 4, 17, 2, 2, 8, 9, 7, 4, 8, 11, 1, 12, 3, 5, 1, 4, 2, 2, 7, 2, 9, 1, 1, 5, 3, 1, 5, 3, 2, 1, 2, 1, 2, 5, 5, 2, 1, 2, 2, 11, 13, 14, 23, 8, 1, 5, 1, 1, 5, 5, 1, 3, 8, 1, 5, 4, 5, 7, 1, 5, 4, 6, 12, 8, 4, 1, 1, 1, 10, 1, 3, 30, 8, 2, 4, 3, 1, 4, 3, 9, 7, 4, 1, 8, 3, 2, 2, 2, 1, 3, 8, 4, 2, 5, 6, 3, 12, 3, 1, 1, 1, 4, 6, 1, 30, 5, 22, 5, 3, 6, 3, 2, 8, 11, 2, 8, 2, 2, 1, 16, 31, 1, 2, 12, 1, 3, 1, 1, 2, 5, 2, 7, 4, 1, 11, 7, 4, 8, 11, 6, 5, 1, 7, 1, 1, 15, 6, 2, 2, 2, 5, 9, 3, 5, 4, 9, 2, 4, 9, 4, 2, 2, 3, 2, 1, 2, 5, 3, 6, 5, 8, 2, 5, 1, 4, 2, 1, 5, 3, 4, 12, 2, 4, 7, 1, 1, 2, 6, 1, 3, 5, 4, 4, 2, 11, 2, 13, 2, 3, 5, 1, 4, 2, 1, 2, 4, 1, 19, 10, 7, 9, 7, 5, 4, 12, 3, 3, 6, 1, 1, 4, 2, 4, 1, 3, 1, 2, 2, 8, 1, 1, 2, 5, 1, 7, 12, 7, 1, 2, 3, 3, 2, 10, 4, 7, 1, 4, 3, 6, 4, 1, 1, 1, 6, 4, 2, 3, 19, 3, 2, 2, 2, 3, 1, 1, 2, 2, 9, 9, 3, 1, 3, 2, 17, 5, 2, 3, 4, 4, 9, 6, 1, 7, 1, 2, 1, 2, 1, 11, 13, 17, 1, 7, 8, 5, 3, 3, 1, 1, 8, 1, 10, 2, 3, 2, 3, 7, 1, 17, 1, 1, 4, 5, 7, 2, 15, 1, 5, 15, 1, 15, 5, 7, 2, 10, 3, 18, 5, 3, 25, 3, 2, 1, 7, 11, 9, 12, 1, 2, 1, 8, 6, 9, 2, 2, 13, 1, 1, 3, 14, 12, 14, 1, 4, 4, 15, 9, 2, 7, 4, 7, 3, 7, 3, 2, 2, 3, 1, 3, 1, 2, 5, 1, 2, 8, 2, 1, 1, 3, 9, 4, 4, 13, 3, 7, 12, 11, 6, 5, 7, 2, 2, 2, 4, 7, 3, 1, 5, 2, 1, 9, 1, 2, 3, 10, 8, 4, 6, 6, 4, 6, 2, 6, 
10, 1, 2, 1, 6, 2, 7, 1, 3, 5, 1, 11, 4, 6, 2, 3, 2, 13, 4, 6, 5, 6, 4, 2, 1, 6, 6, 4, 4, 2, 1, 16, 5, 4, 3, 15, 10, 2, 5, 1, 1, 1, 1, 2, 1, 3, 4, 6, 3, 1, 10, 4, 3, 9, 5, 1, 4, 8, 4, 1, 2, 7, 20, 2, 6, 25, 2, 2, 3, 6, 1, 7, 1, 2, 1, 12, 3, 13, 13, 1, 11, 1, 6, 1, 23, 6, 4, 12, 1, 5, 2, 2, 1, 6, 3, 1, 5, 2, 9, 5, 2, 8, 4, 4, 6, 6, 1, 12, 2, 9, 6, 4, 15, 9, 6, 3, 2, 9, 2, 7, 2, 2, 2, 1, 5, 6, 3, 2, 1, 4, 2, 1, 3, 4, 5, 1, 1, 1, 4, 5, 6, 1, 2, 13, 13, 1, 3, 4, 1, 3, 5, 1, 11, 2, 2, 5, 3, 3, 4, 2, 2, 2, 2, 1, 5, 6, 9, 4, 1, 1, 1, 8, 11, 9, 1, 8, 4, 1, 3, 2, 3, 14, 6, 10, 1, 2, 1, 5, 1, 14, 2, 9, 4, 10, 2, 5, 8, 4, 4, 9, 3, 1, 2, 4, 13, 5, 1, 2, 3, 2, 4, 3, 1, 7, 3, 8, 3, 4, 8, 1, 3, 4, 4, 3, 4, 1, 17, 1, 1, 1, 2, 1, 8, 3, 5, 2, 1, 2, 6, 5, 1, 1, 7, 2, 1, 2, 6, 4, 1, 1, 3, 6, 4, 1, 1, 4, 1, 1, 1, 4, 2, 2, 2, 13, 1, 2, 2, 2, 1, 17, 3, 7, 3, 3, 15, 8, 1, 11, 1, 8, 7, 3, 4, 13, 2, 2, 1, 2, 1, 1, 7, 2, 1, 3, 1, 1, 12, 2, 13, 3, 3, 2, 6, 6, 4, 4, 1, 2, 7, 7, 5, 2, 12, 8, 4, 10, 2, 4, 4, 2, 3, 6, 1, 8, 12, 10, 22, 4, 4, 3, 2, 10, 9, 4, 1, 1, 7, 1, 6, 16, 2, 3, 5, 5, 4, 9, 30, 12, 2, 3, 1, 3, 1, 1, 9, 7, 13, 2, 11, 5, 2, 3, 7, 4, 18, 2, 3, 8, 2, 1, 4, 5, 1, 6, 2, 5, 1, 5, 1, 2, 1, 3, 2, 13, 4, 4, 4, 2, 1, 5, 7, 2, 4, 3, 3, 4, 1, 1, 18, 16, 11, 5, 9, 3, 4, 5, 1, 2, 1, 4, 4, 14, 1, 4, 7, 1, 1, 5, 3, 8, 2, 14, 7, 11, 5, 7, 8, 12, 5, 2, 8, 6, 8, 11, 3, 12, 3, 1, 12, 1, 5, 3, 4, 3, 4, 5, 1, 5, 9, 11, 3, 6, 9, 3, 1, 5, 9, 4, 2, 7, 8, 2, 2, 9, 4, 6, 16, 7, 1, 6, 3, 13, 9, 2, 1, 7, 7, 9, 9, 1, 2, 10, 7, 1, 3, 3, 3, 11, 3, 1, 11, 2, 1, 3, 1, 1, 6, 2, 6, 1, 1, 4, 9, 2, 4, 8, 5, 5, 6, 4, 5, 5, 2, 2, 6, 6, 1, 5, 1, 4, 2, 4, 9, 2, 1, 1, 3, 3, 2, 2, 1, 1, 3, 4, 1, 3, 1, 6, 2, 14, 5, 7, 14, 7, 4, 2, 4, 6, 1, 2, 6, 8, 1, 8, 4, 2, 3, 13, 1, 10, 1, 1, 14, 8, 20, 4, 5, 4, 9, 1, 3, 1, 6, 3, 6, 22, 10, 13, 16, 7, 4, 1, 8, 1, 8, 1, 10, 2, 1, 8, 11, 2, 9, 2, 7, 4, 10, 4, 1, 1, 6, 5, 5, 1, 1, 1, 6, 11, 9, 3, 1, 3, 3, 15, 14, 3, 2, 1, 4, 7, 21, 2, 5, 6, 5, 2, 4, 10, 1, 9, 3, 3, 4, 1, 8, 7, 2, 7, 1, 
8, 4, 5, 7, 2, 4, 4, 6, 1, 1, 4, 2, 6, 2, 5, 2, 1, 8, 14, 7, 11, 7, 9, 2, 5, 1, 2, 3, 5, 3, 5, 1, 1, 13, 4, 2, 3, 13, 8, 7, 3, 11, 1, 6, 4, 1, 6, 3, 5, 3, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 7, 2, 8, 1, 3, 7, 1, 4, 4, 1, 3, 6, 4, 1, 3, 7, 3, 7, 2, 19, 8, 3, 13, 3, 11, 2, 1, 6, 4, 7, 8, 1, 7, 4, 3, 1, 1, 1, 10, 1, 8, 1, 3, 2, 1, 5, 8, 6, 2, 14, 9, 2, 5, 1, 11, 4, 3, 8, 2, 4, 1, 15, 1, 3, 2, 7, 2, 5, 1, 5, 1, 2, 1, 11, 4, 2, 2, 8, 15, 9, 5, 7, 3, 1, 3, 2, 1, 11, 1, 1, 3, 1, 6, 2, 1, 1, 4, 6, 3, 2, 5, 19, 4, 4, 13, 11, 5, 8, 5, 1, 6, 1, 5, 2, 2, 15, 4, 17, 1, 2, 13, 3, 20, 3, 1, 4, 6, 3, 12, 13, 19, 1, 2, 13, 2, 4, 1, 6, 5, 5, 12, 1, 1, 3, 1, 1, 1, 14, 2, 5, 2, 3, 7, 1, 4, 1, 2, 7, 1, 7, 2, 7, 3, 7, 4, 2, 1, 5, 9, 1, 4, 8, 6, 4, 13, 2, 1, 5, 5, 7, 30, 10, 7, 3, 1, 1, 2, 2, 10, 17, 4, 7, 3, 11, 2, 6, 12, 5, 2, 2, 1, 5, 2, 8, 3, 6, 18, 4, 3, 15, 3, 8, 9, 6, 3, 2, 3, 1, 5, 6, 5, 2, 11, 2, 2, 3, 1, 5, 4, 1, 10, 12, 13, 8, 4, 6, 17, 9, 4, 3, 8, 3, 4, 22, 10, 1, 1, 1, 10, 4, 3, 2, 3, 2, 5, 1, 9, 11, 8, 10, 5, 13, 6, 4, 9, 4, 12, 2, 6, 18, 1, 5, 2, 12, 5, 2, 7, 1, 5, 13, 6, 4, 2, 5, 2, 10, 9, 12, 1, 3, 1, 30, 2, 9, 5, 2, 3, 2, 2, 3, 1, 4, 3, 7, 5, 2, 14, 2, 6, 9, 13, 4, 8, 4, 2, 5, 4, 2, 8, 3, 1, 7, 5, 4, 6, 1, 1, 4, 4, 3, 11, 7, 3, 3, 8, 10, 1, 1, 1, 1, 1, 15, 15, 2, 4, 3, 11, 1, 1, 2, 1, 1, 2, 1, 4, 3, 3, 4, 2, 16, 9, 13, 3, 1, 1, 6, 2, 1, 3, 2, 2, 1, 4, 6, 1, 1, 3, 10, 6, 4, 6, 3, 9, 3, 7, 5, 7, 3, 5, 8, 10, 5, 3, 2, 3, 1, 1, 4, 8, 7, 15, 4, 2, 12, 3, 1, 6, 8, 1, 1, 7, 5, 8, 1, 1, 3, 22, 7, 3, 2, 2, 6, 2, 3, 1, 6, 9, 8, 5, 2, 3, 6, 6, 2, 2, 19, 11, 2, 3, 12, 2, 5, 3, 4 ], "xaxis": "x", "yaxis": "y" } ], "layout": { "annotations": [ { "font": {}, "showarrow": false, "text": "język=hiszp", "textangle": 90, "x": 0.98, "xanchor": "left", "xref": "paper", "y": 0.15666666666666665, "yanchor": "middle", "yref": "paper" }, { "font": {}, "showarrow": false, "text": "język=polski", "textangle": 90, "x": 0.98, "xanchor": "left", "xref": "paper", "y": 0.4999999999999999, "yanchor": 
"middle", "yref": "paper" }, { "font": {}, "showarrow": false, "text": "język=ang", "textangle": 90, "x": 0.98, "xanchor": "left", "xref": "paper", "y": 0.8433333333333332, "yanchor": "middle", "yref": "paper" } ], "barmode": "relative", "legend": { "tracegroupgap": 0 }, "margin": { "t": 60 }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 
0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } 
], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 
0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, 
"gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "xaxis": { "anchor": "y", "domain": [ 0, 0.98 ], "title": { "text": "długość" } }, "xaxis2": { "anchor": "y2", "domain": [ 0, 0.98 ], "matches": "x", "showticklabels": false }, "xaxis3": { "anchor": "y3", "domain": [ 0, 0.98 ], "matches": "x", "showticklabels": false }, "yaxis": { "anchor": "x", "domain": [ 0, 0.3133333333333333 ], "title": { "text": "count" } }, "yaxis2": { "anchor": "x2", "domain": [ 0.34333333333333327, 0.6566666666666665 ], "matches": "y", "title": { "text": "count" } }, "yaxis3": { "anchor": "x3", "domain": [ 0.6866666666666665, 0.9999999999999998 ], "matches": "y", "title": { "text": "count" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Synthetic data: a random language label (język) and a geometrically distributed length (długość)\n", "df = pd.DataFrame(\n", "    [\n", "        [random.choice([\"ang\", \"polski\", \"hiszp\"]), np.random.geometric(0.2)]\n", "        for _ in range(5000)\n", "    ],\n", "    columns=[\"język\", \"długość\"],\n", ")\n", "fig = px.histogram(df, x=\"długość\", facet_row=\"język\", nbins=50, hover_data=df.columns)\n", "fig.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## TASK 1\n", "\n", "(40 points)\n", "\n", "Find example texts from the same domain (1,000,000 words), or even translations of one another, in:\n", "- English\n", "- Polish\n", "- another language\n", "\n", "Suggested tools:\n", "- nltk\n", "- plotly express\n", "- the collections library\n", "- spacy (optional)\n", "\n", "\n", "For each language:\n", "- count the unique words (after lowercasing), with and without stemming\n", "- count the characters\n", "- count the unique characters\n", "- count the sentences\n", "- count the unique sentences\n", "- report the min, max, mean, and median number of characters per word\n", "- report the min, max, mean, and median number of words per sentence; find the shortest and the longest sentences\n", "- generate a word cloud (with and without stopwords)\n", "- list the 20 most frequent words (with and without stopwords, after lowercasing)\n", "- list the 20 most frequent bigrams (with and without stopwords)\n", "- plot the word frequencies (histogram or line plot) so that the plot is readable; try a logarithmic scale on the x axis (but not on the y axis for now), dropping words below an occurrence threshold, etc.\n", "- same as above, but for bigrams\n", "- same as above, but for characters\n", "- draw a bar plot of parts of speech (PART OF SPEECH TAGS, top level of the tag hierarchy only)\n", "- for a sample of 10,000 sentences, check how often langdetect https://pypi.org/project/langdetect/ is wrong, and in what way\n", "- illustrate Zipf's law (`px.line` with markers)\n", "- write conclusions (10-50 sentences)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### START OF TASK" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "#### END OF TASK" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "https://github.com/sdadas/polish-nlp-resources\n", "\n", "## Gunning fog index\n", "\n", "\n", "The Gunning fog index measures how difficult a text is to read: the higher the value, the harder the text. Because languages differ in their characteristics, the index should not be used to compare texts written in different languages; it is meant for comparing texts within the same language.\n", "\n", "$$\\mathrm{FOG} = 0.4\\left(\\frac{\\text{number of words}}{\\text{number of sentences}} + 100 \\cdot \\frac{\\text{number of complex words}}{\\text{number of words}}\\right)$$\n", "\n", "Complex words can be taken from a dedicated list; if no such list is available, one can assume that they are words with more than 3 syllables (for Polish).\n", "\n", "https://en.wikipedia.org/wiki/Gunning_fog_index\n", "\n", "## Heaps' law\n", "\n", "Heaps' law is an empirical law of linguistics. It states that the number of distinct words grows as a power of the document length, with an exponent smaller than 1 (that is, sublinearly).\n", "\n", "The number of distinct words $V_R$ as a function of the total number of words in the text $n$ can be described by the formula:\n", "$$V_R(n) = Kn^{\\beta},$$\n", "where $K$ and $\\beta$ are empirically determined parameters.\n", "\n", "As with the Gunning fog index, Heaps' law should not be used to compare texts written in different languages.\n", "\n" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## TASK 2\n", "\n", "(50 points)\n", "\n", "Find texts in Polish (each category should consist of 5 separate documents of different lengths):\n", "- a legal text\n", "- a scientific text\n", "- a text from a Polish novel (e.g. Wolne Lektury)\n", "- a text from the Polish internet (Reddit, Wykop, comments)\n", "- a transcript of spoken language\n", "\n", "For the texts you found:\n", "- Plot the *Gunning fog index* (*y* axis) against the mean sentence length (*x* axis) on a single chart for all texts. Mark the domains with colors (`px.scatter`). For Polish, treat words with more than 3 syllables as complex; to count syllables you can use https://pyphen.org/\n", "- Illustrate Heaps' law for all texts on a single chart, marking the domains with colors (`px.scatter`).\n", "- Write conclusions (10-50 sentences).\n", "\n", "\n", "#### START OF TASK\n", "\n", "#### END OF TASK" ] } ],
"metadata": { "author": "Jakub Pokrywka", "email": "kubapok@wmi.amu.edu.pl", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "lang": "pl", "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "subtitle": "0.Informacje na temat przedmiotu[ćwiczenia]", "title": "Ekstrakcja informacji", "year": "2021" }, "nbformat": 4, "nbformat_minor": 4 }