{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Logo 1](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech1.jpg)\n",
"<div class=\"alert alert-block alert-info\">\n",
"<h1> Information Extraction </h1>\n",
"<h2> 0. <i>Text encoding</i> [exercises]</h2> \n",
"<h3> Jakub Pokrywka (2022)</h3>\n",
"</div>\n",
"\n",
"![Logo 2](https://git.wmi.amu.edu.pl/AITech/Szablon/raw/branch/master/Logotyp_AITech2.jpg)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"NR_INDEKSU = 375985"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# UTF-8 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encoding a character into bits"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 |\n",
"| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |\n",
"| U+0000 | U+007F |0xxxxxxx | | | |\n",
"| U+0080 | U+07FF |110xxxxx | 10xxxxxx | | |\n",
"| U+0800 | U+FFFF |1110xxxx | 10xxxxxx | 10xxxxxx | |\n",
"| U+10000 | U+10FFFF |11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |\n"
]
},
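{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table above can be turned into a small encoder. This is a minimal sketch using a hypothetical helper named `utf8_encode_bits` (not part of the course materials): each `10xxxxxx` continuation byte takes the 6 lowest remaining bits of the code point, filled from the last byte backwards, and whatever bits are left go into the lead byte. It does not reject surrogate code points (U+D800 to U+DFFF), which are not allowed in UTF-8. The loop at the end compares the result with Python's built-in `str.encode('utf-8')`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def utf8_encode_bits(ch):\n",
"    # hypothetical helper: encode one character by hand, following the table above\n",
"    cp = ord(ch)\n",
"    if cp <= 0x7F:\n",
"        return [format(cp, '08b')]  # 0xxxxxxx\n",
"    if cp <= 0x7FF:\n",
"        n_bytes, lead = 2, 0b11000000  # 110xxxxx\n",
"    elif cp <= 0xFFFF:\n",
"        n_bytes, lead = 3, 0b11100000  # 1110xxxx\n",
"    elif cp <= 0x10FFFF:\n",
"        n_bytes, lead = 4, 0b11110000  # 11110xxx\n",
"    else:\n",
"        raise ValueError('code point outside the Unicode range')\n",
"    out = []\n",
"    for _ in range(n_bytes - 1):\n",
"        out.append(0b10000000 | (cp & 0b111111))  # 10xxxxxx with the 6 lowest bits\n",
"        cp >>= 6\n",
"    out.append(lead | cp)  # the remaining high bits fill the lead byte\n",
"    return [format(b, '08b') for b in reversed(out)]\n",
"\n",
"\n",
"for ch in ['U', 'Î', '⨃', chr(0x12856)]:\n",
"    by_hand = utf8_encode_bits(ch)\n",
"    builtin = [format(b, '08b') for b in ch.encode('utf-8')]\n",
"    print(f'U+{ord(ch):04X}', ' '.join(by_hand), by_hand == builtin)"
]
},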
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- row 2 (U+0080 to U+07FF): 5 + 6 = 11 payload bits\n",
"- row 3 (U+0800 to U+FFFF): 4 + 6 + 6 = 16 payload bits"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"0x2A03  # U+2A03 as a hexadecimal integer literal"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"c = '⨃'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10755"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ord(c)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'⨃'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chr(10755)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"10755 - 2*16**3 - 10*16**2 - 0 * 16**1 - 3 *16**0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$10755_{10} = 2 \\cdot 16^3 + 10 \\cdot 16^2 + 0 \\cdot 16^1 + 3 \\cdot 16^0 = \\mathrm{2A03}_{16}$, i.e. U+2A03\n",
"\n"
]
},
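{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same decimal-to-hexadecimal conversion can be checked with Python built-ins. A short sketch: the `X` format specifier prints an integer in uppercase hexadecimal, `int(..., 16)` parses hexadecimal digits back into an integer, and `chr` turns the code point back into the character."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"c = '⨃'\n",
"cp = ord(c)                 # code point as an integer\n",
"print(f'U+{cp:04X}')        # hexadecimal in U+ notation\n",
"print(int('2A03', 16))      # hexadecimal digits back to an integer\n",
"print(chr(0x2A03) == c)     # the hex literal gives back the same character"
]
},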
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"515"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"10755 - 1 * 2**13 - 0 * 2 **12 - 1* 2 **11"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"10755 - 1*2**13 - 0*2**12 - 1*2**11 - 0*2**10 - 1*2**9 - 0*2**8 - 0*2**7 - 0*2**6 - 0*2**5 - 0*2**4 - 0*2**3 - 0*2**2 - 1*2**1 - 1*2**0"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2563"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"10755 - 1*2 ** 13 - 0* 2 ** 12"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
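"$10755_{10} = 1 \\cdot 2^{13} + 0 \\cdot 2^{12} + 1 \\cdot 2^{11} + 0 \\cdot 2^{10} + 1 \\cdot 2^{9} + 0 \\cdot 2^{8} + 0 \\cdot 2^{7} + 0 \\cdot 2^{6} + 0 \\cdot 2^{5} + 0 \\cdot 2^{4} + 0 \\cdot 2^{3} + 0 \\cdot 2^{2} + 1 \\cdot 2^{1} + 1 \\cdot 2^{0} = 10101000000011_{2}$"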
]
},
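{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of subtracting powers of two by hand, the binary digits can be obtained with `format(n, 'b')`. A sketch: the `016b` variant pads with zeros to 16 bits, which is exactly the payload size of the 3-byte row of the table (4 + 6 + 6 bits), and the last line re-sums the bits as a sanity check."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n = 10755\n",
"bits = format(n, 'b')        # binary digits without the '0b' prefix\n",
"print(bits, len(bits))       # 14 significant bits\n",
"padded = format(n, '016b')   # zero-padded to the 16 payload bits of a 3-byte sequence\n",
"print(padded)\n",
"# sanity check: summing bit * 2**position reproduces the number\n",
"print(sum(int(bit) * 2**i for i, bit in enumerate(reversed(bits))) == n)"
]
},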
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"14"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len('10101000000011')"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'0010101000000011'"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'10101000000011'.zfill(16)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1110xxxx\t10xxxxxx\t10xxxxxx"
]
},
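{
"cell_type": "markdown",
"metadata": {},
"source": [
"The padded 16 bits can then be split into groups of 4, 6 and 6 bits and placed into the `1110xxxx 10xxxxxx 10xxxxxx` template. A sketch for this one character (the variable names are illustrative), followed by a comparison with Python's own UTF-8 encoder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"padded = format(ord('⨃'), '016b')                 # '0010101000000011'\n",
"groups = [padded[:4], padded[4:10], padded[10:]]  # 4 + 6 + 6 payload bits\n",
"encoded = ['1110' + groups[0], '10' + groups[1], '10' + groups[2]]\n",
"print(' '.join(encoded))\n",
"# compare with Python's own UTF-8 encoder\n",
"print(encoded == [format(b, '08b') for b in '⨃'.encode('utf-8')])"
]
},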
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'11100010 10101000 10000011'"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'11100010 10101000 10000011'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"11100010\t10101000\t10000011"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"!echo '⨃' > '01_materialy/znak.txt'"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"⨃\r\n"
]
}
],
"source": [
"!cat '01_materialy/znak.txt'"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"00000000: 11100010 10101000 10000011 00001010 ....\r\n"
]
}
],
"source": [
"!xxd -b '01_materialy/znak.txt'"
]
},
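{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `xxd -b` dump above can be reproduced in Python. A sketch, assuming the file contains exactly the character followed by the newline that `echo` appends; that newline is the final byte `00001010` (0x0A)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = '⨃\\n'.encode('utf-8')  # the character plus the newline written by echo\n",
"print(' '.join(format(b, '08b') for b in data))\n",
"print(len(data))  # 3 bytes for the character + 1 for the newline"
]
},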
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\x0c'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chr(2*2+4*2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### INDEPENDENT TASK 1 (10 points)\n",
"\n",
"Encode the characters below into bits, showing the necessary calculations."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'U'"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chr(NR_INDEKSU % 100)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"NR_INDEKSU = 426206"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'11001110'"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'11001110'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"110xxxxx 10xxxxxx"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"''"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'11000011 10001110'"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"'11000011 10001110'"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len('11001110')"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"'Î'"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chr(NR_INDEKSU % 1000)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\U00012856'"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chr(NR_INDEKSU % 100000 - 123)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### START ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### KONIEC ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What are the character ranges?"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"!echo 'zażółć gęślą jaźń' > '01_materialy/polski_tekst.txt'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| First code point | Last code point | Byte 1 | Byte 2 | Byte 3 | Byte 4 |\n",
"| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |\n",
"| U+0000 | U+007F |0xxxxxxx | | | |\n",
"| U+0080 | U+07FF |110xxxxx | 10xxxxxx | | |\n",
"| U+0800 | U+FFFF |1110xxxx | 10xxxxxx | 10xxxxxx | |\n",
"| U+10000 | U+10FFFF |11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"00000000: 01111010 01100001 11000101 10111100 11000011 10110011 za....\r\n",
"00000006: 11000101 10000010 11000100 10000111 00100000 01100111 .... g\r\n",
"0000000c: 11000100 10011001 11000101 10011011 01101100 11000100 ....l.\r\n",
"00000012: 10000101 00100000 01101010 01100001 11000101 10111010 . ja..\r\n",
"00000018: 11000101 10000100 00001010 ...\r\n"
]
}
],
"source": [
"!xxd -b '01_materialy/polski_tekst.txt'"
]
},
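{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ranges in the table determine how many bytes each character takes. A sketch using a hypothetical helper `utf8_length` that applies them to the Polish sample sentence: ASCII letters and the space take 1 byte, letters such as `ż` or `ó` take 2. The sentence itself is 26 bytes; the newline added by `echo` accounts for the 27th byte in the `xxd` dump."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def utf8_length(cp):\n",
"    # number of UTF-8 bytes for a code point, following the ranges in the table\n",
"    if cp <= 0x7F:\n",
"        return 1\n",
"    if cp <= 0x7FF:\n",
"        return 2\n",
"    if cp <= 0xFFFF:\n",
"        return 3\n",
"    if cp <= 0x10FFFF:\n",
"        return 4\n",
"    raise ValueError('code point outside the Unicode range')\n",
"\n",
"\n",
"sentence = 'zażółć gęślą jaźń'\n",
"for ch in sentence:\n",
"    print(repr(ch), f'U+{ord(ch):04X}', utf8_length(ord(ch)))\n",
"total = sum(utf8_length(ord(ch)) for ch in sentence)\n",
"print(total, total == len(sentence.encode('utf-8')))"
]
},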
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## INDEPENDENT TASK 2 (10 points)\n",
"\n",
"Convert the binary string below into UTF-8 text. If the string does not start with valid leading bits (i.e. it begins in the middle of a multi-byte character), skip those initial bytes."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"tekst = '01001111 00100000 01110111 01101001 11000100 10011001 01101011 01110011 01111010 01100101 01100111 01101111 00100000 01110100 01110010 01110101 01100100 01101110 01101111 00100000 01111010 01110101 01100011 01101000 01100001 00101100 00001010 01001010 01100001 01101011 00100000 01100010 01111001 11000101 10000010 00100000 01010011 01110100 01100101 01100110 01100101 01101011 00100000 01000010 01110101 01110010 01100011 01111010 01111001 01101101 01110101 01100011 01101000 01100001 11100010 10000000 10100110 00001010 11100010 10000000 10010100 00100000 01001010 01100001 00100000 01101110 01101001 01101011 01101111 01100111 01101111 00100000 01110011 01101001 11000100 10011001 00100000 01101110 01101001 01100101 00100000 01100010 01101111 01101010 11000100 10011001 00100001 00001010 01000011 01101000 01101111 11000100 10000111 01100010 01111001 00100000 01101110 01101001 01100101 01100100 11000101 10111010 01110111 01101001 01100101 01100100 11000101 10111010 11100010 10000000 10100110 00100000 01110100 01101111 00100000 01100100 01101111 01110011 01110100 01101111 01101010 11000100 10011001 00100001 00001010 01010111 01101001 01101100 01101011 01101001 00111111 00101110 00101110 00100000 01001010 01100001 00100000 01101001 01100011 01101000 00100000 01100011 01100001 11000101 10000010 11000100 10000101 00100000 01111010 01100111 01110010 01100001 01101010 11000100 10011001 00001010 01010000 01101111 01111010 01100001 01100010 01101001 01101010 01100001 01101101 00100000 01101001 00100000 01110000 01101111 01101011 01110010 01100001 01101010 11000100 10011001 00100001 00001010 01010100 01100101 00100000 01101000 01101001 01101010 01100101 01101110 01111001 00101100 00100000 01110100 01100101 00100000 01101100 01100001 01101101 01110000 01100001 01110010 01110100 01111001 00101100 00001010 01010100 01101111 00100000 01110011 11000100 10000101 00100000 01100100 01101100 01100001 00100000 01101101 01101110 01101001 01100101 00100000 01100011 01111010 01111001 
01110011 01110100 01100101 00100000 11000101 10111100 01100001 01110010 01110100 01111001 00100001 00001010 01000001 00100000 01110000 01100001 01101110 01110100 01100101 01110010 01111001 00100000 01101001 00100000 01110100 01111001 01100111 01110010 01111001 01110011 01111001 00001010 01001110 01100001 00100000 01110011 01111010 01110100 01111001 01101011 00100000 01110111 01100101 01111010 01101101 11000100 10011001 00100000 01110101 00100000 01110011 01110111 01100101 01101010 00100000 01110011 01110000 01101001 01110011 01111001 00100001 00001010 01001100 01100101 01110111 00100001 11100010 10000000 10100110 00100000 01000011 11000011 10110011 11000101 10111100 00100000 01101100 01100101 01110111 00100000 01101010 01100101 01110011 01110100 00111111 00100000 11100010 10000000 10010100 00100000 01101011 01101111 01100011 01101001 01100001 01101011 00100000 01100100 01110101 11000101 10111100 01111001 00100001 00001010 01001110 01100001 01100011 01111010 01111001 01110100 01100001 11000101 10000010 01100101 01101101 00100000 01110011 01101001 11000100 10011001 00100000 01110000 01101111 01100100 01110010 11000011 10110011 11000101 10111100 01111001 00100001 00001010 01001001 00100000 01111010 01101110 01100001 01101101 00100000 01110100 01100101 01100111 01101111 00100000 01101010 01100101 01100111 01101111 01101101 01101111 11000101 10011011 01100011 01101001 00101100 00001010 01000011 01101111 00100000 01111010 11000101 10000010 01111001 00100000 01110100 01111001 01101100 01101011 01101111 00100000 01101011 01101001 01100101 01100100 01111001 00100000 01110000 01101111 11000101 10011011 01100011 01101001 00101110 00001010 01010011 01111010 01100001 01101011 01100001 01101100 00101100 00100000 01110111 01101001 01101100 01101011 00111111 11100010 10000000 10100110 00100000 01010011 01110100 01110010 01100001 01110011 01111010 01101110 01100001 00100000 01101110 01101111 01110111 01101001 01101110 01100001 00100001 00001010 01010100 01101111 00100000 01101010 
01100101 01110011 01110100 00100000 01110100 01111001 01101100 01101011 01101111 00100000 0111
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"tekst = tekst.split(' ')"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"tekst = ' '.join(tekst[(NR_INDEKSU % 10)*5:(NR_INDEKSU % 10) + 108 ])"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\x0e'"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chr(14)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"'00101100 00001010 01001010 01100001 01101011 00100000 01100010 01111001 11000101 10000010 00100000 01010011 01110100 01100101 01100110 01100101 01101011 00100000 01000010 01110101 01110010 01100011 01111010 01111001 01101101 01110101 01100011 01101000 01100001 11100010 10000000 10100110 00001010 11100010 10000000 10010100 00100000 01001010 01100001 00100000 01101110 01101001 01101011 01101111 01100111 01101111 00100000 01110011 01101001 11000100 10011001 00100000 01101110 01101001 01100101 00100000 01100010 01101111 01101010 11000100 10011001 00100001 00001010 01000011 01101000 01101111 11000100 10000111 01100010 01111001 00100000 01101110 01101001 01100101 01100100 11000101 10111010 01110111 01101001 01100101 01100100 11000101 10111010 11100010 10000000 10100110 00100000 01110100'"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tekst"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### START ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### KONIEC ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## INDEPENDENT TASK 3 (10 points)\n",
"\n",
"Convert the byte string below (written as hexadecimal byte values) into UTF-8 text. If the string does not start with valid leading bits (i.e. it begins in the middle of a multi-byte character), skip those initial bytes."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"tekst = '0x57 0x6f 0x6a 0x6e 0xc4 0x99 0x20 0x70 0x69 0x73 0x61 0xc5 0x82 0x20 0x50 0x6f 0x74 0x6f 0x63 0x6b 0x69 0x20 0x77 0x20 0x72 0x2e 0x20 0x31 0x36 0x37 0x30 0x2c 0x20 0x70 0x72 0x7a 0x65 0x64 0x20 0x75 0x70 0x61 0x64 0x6b 0x69 0x65 0x6d 0x20 0x4b 0x61 0x6d 0x69 0x65 0xc5 0x84 0x63 0x61 0x20 0x69 0x20 0x68 0x61 0x6e 0x69 0x65 0x62 0x6e 0x79 0x6d 0x69 0x20 0x75 0x6b 0xc5 0x82 0x61 0x64 0x61 0x6d 0x69 0x20 0x62 0x75 0x63 0x7a 0x61 0x63 0x6b 0x69 0x6d 0x69 0x2c 0x20 0x6b 0x74 0xc3 0xb3 0x72 0x65 0x20 0x6f 0x62 0x6f 0x77 0x69 0xc4 0x85 0x7a 0x79 0x77 0x61 0xc5 0x82 0x79 0x20 0x50 0x6f 0x6c 0x73 0x6b 0xc4 0x99 0x20 0x64 0x6f 0x20 0x68 0x61 0x72 0x61 0x63 0x7a 0x75 0x20 0x74 0x75 0x72 0x65 0x63 0x6b 0x69 0x65 0x67 0x6f 0x20 0x69 0x20 0x64 0x6f 0x20 0x75 0x74 0x72 0x61 0x74 0x79 0x20 0x50 0x6f 0x64 0x6f 0x6c 0x61 0x2e 0x20 0x50 0x69 0x73 0x61 0xc5 0x82 0x20 0x6a 0xc4 0x85 0x20 0x77 0x6f 0x62 0x65 0x63 0x20 0x63 0x6f 0x72 0x61 0x7a 0x20 0x67 0x72 0x6f 0xc5 0xba 0x6e 0x69 0x65 0x6a 0x73 0x7a 0x65 0x67 0x6f 0x20 0x6e 0x69 0x65 0x62 0x65 0x7a 0x70 0x69 0x65 0x63 0x7a 0x65 0xc5 0x84 0x73 0x74 0x77 0x61 0x20 0x6f 0x64 0x20 0x54 0x75 0x72 0x6b 0xc3 0xb3 0x77 0x2c 0x20 0x67 0x64 0x79 0x20 0x73 0x69 0xc4 0x99 0x20 0x69 0x6d 0x20 0x44 0x6f 0x72 0x6f 0x73 0x7a 0x65 0xc5 0x84 0x6b 0x6f 0x20 0x7a 0x20 0x4b 0x6f 0x7a 0x61 0x6b 0x61 0x6d 0x69 0x20 0x70 0x6f 0x64 0x64 0x61 0xc5 0x82 0x2c 0x20 0x61 0x62 0x79 0x20 0x77 0x6e 0x75 0x6b 0x6f 0x6d 0x20 0x77 0x79 0x73 0x74 0x61 0x77 0x69 0xc4 0x87 0x20 0x6a 0x61 0x6b 0x6f 0x20 0x77 0x7a 0xc3 0xb3 0x72 0x20 0x6d 0xc4 0x99 0x73 0x74 0x77 0x6f 0x20 0x64 0x7a 0x69 0x61 0x64 0xc3 0xb3 0x77 0x2c 0x20 0x6f 0x62 0x75 0x64 0x7a 0x69 0xc4 0x87 0x20 0x69 0x63 0x68 0x20 0x77 0x61 0x6c 0x65 0x63 0x7a 0x6e 0x6f 0xc5 0x9b 0xc4 0x87 0x2c 0x20 0x77 0x79 0x73 0xc5 0x82 0x61 0x77 0x69 0x61 0x6a 0xc4 0x85 0x63 0x20 0x77 0x69 0x65 0x6c 0x6b 0x69 0x65 0x20 0x64 0x7a 0x69 0x65 0xc5 0x82 0x6f 0x20 0x6f 0x73 0x77 0x6f 0x62 0x6f 0x64 0x7a 0x65 0x6e 0x69 0x61 0x20 
0x63 0x68 0x72 0x7a 0x65 0xc5 0x9b 0x63 0x69 0x6a 0x61 0xc5 0x84 0x73 0x74 0x77 0x61 0x20 0x70 0x72 0x7a 0x65 0x7a 0x20 0x6a 0x65 0x67 0x6f 0x20 0x70 0x72 0x7a 0x65 0x64 0x6d 0x75 0x72 0x7a 0x65 0x2c 0x20 0x50 0x6f 0x6c 0x73 0x6b 0xc4 0x99 0x2c 0x20 0x6f 0x64 0x20 0x6e 0x61 0x6a 0x70 0x6f 0x74 0xc4 0x99 0xc5 0xbc 0x6e 0x69 0x65 0x6a 0x73 0x7a 0x65 0x67 0x6f 0x20 0x77 0x72 0x6f 0x67 0x61 0x2c 0x20 0x6b 0x74 0xc3 0xb3 0x72 0x79 0x20 0x77 0x73 0x7a 0x79 0x73 0x74 0x6b 0x69 0x65 0x20 0x73 0x69 0xc5 0x82 0x79 0x20 0x73 0x6b 0x75 0x70 0x69 0xc5 0x82 0x20 0x64 0x6c 0x61 0x20 0x6a 0x65 0x67 0x6f 0x20 0x6f 0x73 0x74 0x61 0x74 0x65 0x63 0x7a 0x6e 0x65 0x67 0x6f 0x20 0x7a 0x68 0x6f 0xc5 0x82 0x64 0x6f 0x77 0x61 0x6e 0x69 0x61 0x2e 0x20 0x44 0x6c 0x61 0x20 0x6e 0x61 0x73 0x20 0x6a 0x65 0x73 0x74 0x20 0x70 0x6f 0x20 0x70 0x72 0x6f 0x73 0x74 0x75 0x20 0x6e 0x69 0x65 0x7a 0x72 0x6f 0x7a 0x6d 0x69 0x61 0xc5 0x82 0x79 0x6d 0x2c 0x20 0x6a 0x61 0x6b 0x20 0x6d 0xc3 0xb3 0x67 0xc5 0x82 0x20 0x70 0x6f 0x65 0x74 0x61 0x20 0x6e 0x61 0x6a 0x62 0x61 0x72 0x64 0x7a 0x69 0x65 0x6a 0x20 0x70 0x61 0x74 0x72 0x69 0x6f 0x74 0x79 0x63 0x7a 0x6e 0x65 0x20 0x69 0x20 0x6e 0x61 0x6a 0x61 0x6b 0x74 0x75 0x61 0x6c 0x6e 0x69 0x65 0x6a 0x73 0x7a 0x65 0x20 0x73 0x77 0x65 0x20 0x64 0x7a 0x69 0x65 0xc5 0x82 0x6f 0x20 0x7a 0x20 0x67 0xc3 0xb3 0x72 0x79 0x20 0x64 0x6f 0x20 0x7a 0x61 0x6c 0x65 0xc5 0xbc 0x65 0x6e 0x69 0x61 0x20 0x77 0x20 0xe2 0x80 0x9e 0x73 0x65 0x70 0x65 0x63 0x69 0x65 0xe2 0x80 0x9d 0x20 0x28 0x62 0x69 0x75 0x72 0x6b 0x75 0x29 0x20 0x70 0x72 0x7a 0x65 0x7a 0x6e 0x61 0x63 0x7a 0x61 0xc4 0x87 0x20 0x69 0x20 0x61 0x6e 0x69 0x20 0x6e 0x61 0x20 0x63 0x68 0x77 0x69 0x6c 0xc4 0x99 0x20 0x6e 0x69 0x65 0x20 0x70 0x6f 0x6d 0x79 0xc5 0x9b 0x6c 0x65 0xc4 0x87 0x20 0x6f 0x20 0x6a 0x65 0x67 0x6f 0x20 0x6f 0x67 0xc5 0x82 0x6f 0x73 0x7a 0x65 0x6e 0x69 0x75 0x2c 0x20 0x63 0x68 0x6f 0x63 0x69 0x61 0xc5 0xbc 0x20 0x74 0x65 0x67 0x6f 0x20 0x77 0xc5 0x82 0x61 0xc5 0x9b 0x6e 0x69 0x65 0x20 0x69 0x20 0x63 0x7a 0x61 
0x73 0x79 0x20 0x69 0x20 0x6c 0x75 0x64 0x7a 0x69 0x65 0x20 0x6b 0x6f 0x6e 0x69 0x65 0x63 0
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"tekst = tekst.split(' ')"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"tekst = ' '.join(tekst[(NR_INDEKSU % 10)*5:(NR_INDEKSU % 10) + 108 ])"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"'0x2e 0x20 0x31 0x36 0x37 0x30 0x2c 0x20 0x70 0x72 0x7a 0x65 0x64 0x20 0x75 0x70 0x61 0x64 0x6b 0x69 0x65 0x6d 0x20 0x4b 0x61 0x6d 0x69 0x65 0xc5 0x84 0x63 0x61 0x20 0x69 0x20 0x68 0x61 0x6e 0x69 0x65 0x62 0x6e 0x79 0x6d 0x69 0x20 0x75 0x6b 0xc5 0x82 0x61 0x64 0x61 0x6d 0x69 0x20 0x62 0x75 0x63 0x7a 0x61 0x63 0x6b 0x69 0x6d 0x69 0x2c 0x20 0x6b 0x74 0xc3 0xb3 0x72 0x65 0x20 0x6f 0x62 0x6f 0x77 0x69 0xc4 0x85 0x7a 0x79 0x77 0x61 0xc5 0x82'"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tekst"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### START ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### KONIEC ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## INDEPENDENT TASK 4 (5 points)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perform the following operations in a single bash/shell pipeline:\n",
"\n",
"- merge a.txt and b.txt, joining them with a space (the first line of a.txt and the first line of b.txt are joined with a space into a new line, the second line of a.txt and the second line of b.txt into the next line, etc.)\n",
"\n",
"- keep only the lines that contain none of the digits of your student ID number\n",
"\n",
"- remove every letter 'a'\n",
"\n",
"- replace every letter 'c' with the letter 'd'\n",
"\n",
"- triple every occurrence of the letter e (both lowercase and uppercase)\n",
"\n",
"- reverse the order of the lines (the last line becomes the first, the second-to-last the second, etc.)\n",
"\n",
"- keep only the lines from the fifth to the sixth from the end (in the new order)\n",
"\n",
"- save the result to the file c.txt\n",
"\n",
"\n",
"Then print the contents of c.txt in this notebook.\n",
"\n",
"\n",
"\n",
"You may use the following: pipe, paste, sed, awk, tr, grep, head, tail, cut, echo, redirect. Do not use Python, Perl, or other similar languages."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### START ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### KONIEC ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## INDEPENDENT TASK 5 (5 points)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Write a sorting function for strings that works as follows:\n",
"\n",
"- words are sorted according to the Polish alphabet\n",
"- uppercase letters come before lowercase letters\n",
"- if word `x` is a prefix of word `y`, word `x` comes first\n",
"\n",
"\n",
"Sort the text below."
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"przykladowa_lista = ['ą', 'a','b','B', 'cef', 'ce', 'A','Ą', 'ż', ]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"this is not what we want:"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['A', 'B', 'a', 'b', 'ce', 'cef', 'Ą', 'ą', 'ż']"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted(przykladowa_lista)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"this is what we want:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['A', 'Ą', 'a', 'ą', 'B', 'b', 'ce', 'cef', 'ż']"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"['A', 'Ą', 'a', 'ą', 'B', 'b', 'ce', 'cef', 'ż']"
]
},
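{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible interpretation of these rules, as a sketch (the names `POLISH_GROUPS`, `RANK` and `polish_sort_key` are hypothetical): each character maps to a tuple of its base letter's position in the alphabet, then its case (uppercase first), then its diacritic variant (for example `a` before `ą`, and `z`, `ź`, `ż` in that order). Case is compared before the diacritic because the desired output above places `Ą` before `a`. Comparing lists of such tuples automatically puts a word before any longer word that starts with it. Slotting q, v and x (not part of the Polish alphabet) into their Latin positions, and sorting non-letters after all letters, are assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# base letters in alphabetical order; letters with diacritics follow their base letter\n",
"# (q, v and x are assumed to sit in their Latin positions)\n",
"POLISH_GROUPS = ['aą', 'b', 'cć', 'd', 'eę', 'f', 'g', 'h', 'i', 'j', 'k', 'lł', 'm',\n",
"                 'nń', 'oó', 'p', 'q', 'r', 'sś', 't', 'u', 'v', 'w', 'x', 'y', 'zźż']\n",
"RANK = {ch: (group, variant)\n",
"        for group, letters in enumerate(POLISH_GROUPS)\n",
"        for variant, ch in enumerate(letters)}\n",
"\n",
"\n",
"def polish_sort_key(word):\n",
"    key = []\n",
"    for ch in word:\n",
"        low = ch.lower()\n",
"        if low in RANK:\n",
"            group, variant = RANK[low]\n",
"            # letters first, then base letter, uppercase before lowercase, then diacritic variant\n",
"            key.append((0, group, 0 if ch.isupper() else 1, variant))\n",
"        else:\n",
"            # assumption: non-letters (digits, punctuation) sort after letters, by code point\n",
"            key.append((1, ord(ch), 0, 0))\n",
"    return key\n",
"\n",
"\n",
"example = ['ą', 'a', 'b', 'B', 'cef', 'ce', 'A', 'Ą', 'ż']\n",
"print(sorted(example, key=polish_sort_key))"
]
},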
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"?sorted"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"tekst = !cat 01_materialy/magiczny-ogrod.txt"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"tekst = [x for x in ' '.join(tekst).split(' ') if x ][NR_INDEKSU % 1000 : NR_INDEKSU % 1000 + 100]"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"['uczuła',\n",
" 'odurzenie,',\n",
" 'wróciła',\n",
" 'do',\n",
" 'swej',\n",
" 'nursery',\n",
" 'i',\n",
" 'znów',\n",
" 'się',\n",
" 'zamknęła,',\n",
" 'przestraszona',\n",
" 'krzykami',\n",
" 'i',\n",
" 'odgłosami',\n",
" 'uciekających',\n",
" 'kroków.',\n",
" 'Uczuła',\n",
" 'ogarniającą',\n",
" 'ją,',\n",
" 'nieprzezwyciężoną',\n",
" 'senność,',\n",
" 'położyła',\n",
" 'się',\n",
" 'na',\n",
" 'łóżeczku',\n",
" 'i',\n",
" 'na',\n",
" 'bardzo',\n",
" 'długo',\n",
" 'straciła',\n",
" 'świadomość',\n",
" 'tego,',\n",
" 'co',\n",
" 'się',\n",
" 'wkoło',\n",
" 'niej',\n",
" 'dzieje.',\n",
" 'Tymczasem',\n",
" 'zaszło',\n",
" 'bardzo',\n",
" 'wiele,',\n",
" 'gdy',\n",
" 'Mary',\n",
" 'spała',\n",
" 'snem',\n",
" 'tak',\n",
" 'twardym,',\n",
" 'lecz',\n",
" 'jej',\n",
" 'już',\n",
" 'nie',\n",
" 'budziły',\n",
" 'ani',\n",
" 'jęki,',\n",
" 'ani',\n",
" 'odgłosy',\n",
" 'wnoszonych',\n",
" 'i',\n",
" 'wynoszonych',\n",
" 'przedmiotów.',\n",
" 'Po',\n",
" 'przebudzeniu',\n",
" 'Mary',\n",
" 'leżała',\n",
" 'jeszcze,',\n",
" 'patrząc',\n",
" 'w',\n",
" 'sufit.',\n",
" 'W',\n",
" 'domu',\n",
" 'była',\n",
" 'cisza',\n",
" 'zupełna.',\n",
" 'Nie',\n",
" 'znała',\n",
" 'ona',\n",
" 'ciszy',\n",
" 'takiej',\n",
" 'nigdy',\n",
" 'przedtem.',\n",
" 'Nie',\n",
" 'słyszała',\n",
" 'ani',\n",
" 'głosów,',\n",
" 'ani',\n",
" 'kroków',\n",
" 'niczyich',\n",
" 'i',\n",
" 'ciekawa',\n",
" 'była,',\n",
" 'czy',\n",
" 'już',\n",
" 'wszyscy',\n",
" 'wyzdrowieli',\n",
" 'i',\n",
" 'czy',\n",
" 'trwoga',\n",
" 'minęła.',\n",
" 'Ciekawa',\n",
" 'też']"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tekst"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### START ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### KONIEC ZADANIA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## COMPLETING THE TASKS\n",
"\n",
"- make a copy of this notebook\n",
"- replace the value of the NR_INDEKSU variable with your own student ID number\n",
"- complete the tasks in this Jupyter notebook; add your own cells only between the START ZADANIA (start of task) and KONIEC ZADANIA (end of task) cells\n",
"- complete the tasks so that after clicking Kernel → Restart & Run All the notebook runs without errors\n",
"- then generate a PDF from the notebook (File → Download As → PDF via LaTeX)\n",
"- upload the notebook with the code and the PDF to the assignments tab in MS Teams"
]
}
],
"metadata": {
"author": "Jakub Pokrywka",
"email": "kubapok@wmi.amu.edu.pl",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"lang": "pl",
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
},
"subtitle": "0.Informacje na temat przedmiotu[ćwiczenia]",
"title": "Ekstrakcja informacji",
"year": "2021"
},
"nbformat": 4,
"nbformat_minor": 4
}