djfz-2020-info/laboratoria2/info_en.md
2020-11-12 13:49:22 +01:00

DJFZ 2020, laboratory 2

Update the repository

First, please update your repo. There are new tasks.

git pull git@git.wmi.amu.edu.pl:filipg/djfz-2020.git

Tasks to do on your own

Tasks from sections B and C (regular expressions). The deadline is the end of the day on November 22nd.

Please do all B tasks. They are the same tasks as in section A, but the solution method differs: this time please use regular expressions. While writing the solutions, pay attention to the difference in complexity and execution time between the regular-expression solutions (B) and the solutions based on basic mechanisms (A).
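One way to observe the difference in execution time is to time both variants of the same check with the standard `timeit` module. The sketch below is only an illustration: the task (deciding whether a line contains an ASCII digit) and the function names `has_digit_basic` and `has_digit_regex` are made up for this example and are not taken from the task list.

```python
import re
import timeit

line = 'Ala ma kota i hamak, oraz 150 bananów.'


def has_digit_basic(s):
    # Basic mechanism: scan the characters one by one.
    return any('0' <= ch <= '9' for ch in s)


# Compile the pattern once so the timing measures matching only.
DIGIT = re.compile('[0-9]')


def has_digit_regex(s):
    # Regular-expression variant of the same check.
    return DIGIT.search(s) is not None


# Both variants should agree before their speed is compared.
assert has_digit_basic(line) == has_digit_regex(line)

for func in (has_digit_basic, has_digit_regex):
    seconds = timeit.timeit(lambda: func(line), number=100_000)
    print(func.__name__, round(seconds, 3))
```

The measured times depend on the machine and on the input, so treat them as a rough comparison rather than a fixed result.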

Each of you is assigned exactly 4 C tasks (together they count as one "easy task" for section C, although you don't have to do them all). Note: which tasks you get depends on your student index number! The tasks are grouped into 4 blocks:

  • TaskC00-TaskC09 - the remainder of your index number divided by 10,
  • TaskC10-TaskC36 - the remainder of your index number divided by 27,
  • TaskC37-TaskC43 - the remainder of your index number divided by 7,
  • TaskC44-TaskC48 - the remainder of your index number divided by 5.

Please check in the repository which task from each block is assigned to you. So each of you has exactly one task from each of these 4 blocks.
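The four remainders can be computed with a few lines of Python. This is a minimal sketch: the function name `block_remainders` and the index number 123456 are made up for illustration, and how a remainder maps to a concrete task is still something to check in the repository.

```python
def block_remainders(index_number):
    """Return the remainder used for each of the four C-task blocks."""
    divisors = {
        'TaskC00-TaskC09': 10,
        'TaskC10-TaskC36': 27,
        'TaskC37-TaskC43': 7,
        'TaskC44-TaskC48': 5,
    }
    return {block: index_number % divisor
            for block, divisor in divisors.items()}


# 123456 is a made-up index number used only as an example.
remainders = block_remainders(123456)
for block, remainder in remainders.items():
    print(block, '-> remainder', remainder)
```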

You can get a total of 16 points for the second laboratory.

Regular expressions

We will work with regular expressions in Python 3. Documentation: https://docs.python.org/3/library/re.html.

Basic functions

search - returns the first match found anywhere in the string

findall - returns a list of all non-overlapping matches

match - returns a match only if the pattern matches at the beginning of the string

These are just the basic functions that we will use. The documentation describes them all.
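The differences between the three functions can be seen on a single word. The snippet below is a small sketch; the variable names are chosen only for this example.

```python
import re

word = 'banana'

# search: the first match anywhere in the string.
first = re.search('an', word)
print(first.group(), first.start())

# findall: matches do not overlap, so 'ana' is found only once.
all_matches = re.findall('ana', word)
print(all_matches)

# match: succeeds only when the pattern fits at the very beginning.
at_start = re.match('ba', word)
print(at_start.group())

# 'an' occurs later in the word, so match returns None here.
not_at_start = re.match('an', word)
print(not_at_start)
```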

Match object

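import re
answer = re.search('na','banana')
print(answer)          # a match object
print(answer.start())  # index where the match starts
print(answer.end())    # index just past the end of the match
print(answer.group())  # the matched text

answer = re.search('na','kabanos')
print(answer)          # None, because 'na' does not occur in 'kabanos'
print(type(answer))

if answer:
    print(answer.group())
else:
    pass  # no match, so there is nothing to print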

Metacharacters

  • [] - set of characters

  • . - any character (except a newline by default)

  • ^ - beginning of a string

  • $ - end of a string

  • ? - the preceding character is optional (zero or one occurrence)

  • * - zero or more occurrences of the preceding character

  • + - one or more occurrences of the preceding character

  • {n} - exactly n occurrences of the preceding character

  • | - or

  • () - group

  • \ - escape character

  • \d - a digit

  • \D - any character except a digit

  • \s - a whitespace character

  • \S - any character except whitespace
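A few of the metacharacters listed above can be tried out together. The sample strings and variable names below are invented for this sketch.

```python
import re

samples = ['kot', 'koty', 'kat', 'k9t', 'skot']

# ^ and $ anchor the pattern, [oa] is a character set, ? makes 'y' optional.
pattern = r'^k[oa]ty?$'
matched = [s for s in samples if re.search(pattern, s)]
print(matched)

# {2} repeats the preceding element exactly twice; \d is a digit.
two_digits = re.findall(r'\d{2}', 'a1 b22 c333')
print(two_digits)

# | chooses between alternatives; () limits the scope of the choice.
alternative = re.search(r'(kot|pies) Ali', 'to jest pies Ali').group()
print(alternative)

# \ escapes a metacharacter, so \. matches a literal dot only.
dots = re.findall(r'\.', 'a.b.c')
print(dots)
```

The patterns are written as raw strings (`r'...'`) so that Python passes the backslashes to the `re` module unchanged.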

Flags

You can pass special flags, for example: re.search('ma', 'AlA Ma KoTa', re.IGNORECASE).
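The effect of the flag is easiest to see by running the same search with and without it. This is a short sketch; `re.I` is the abbreviated name of `re.IGNORECASE`.

```python
import re

text = 'AlA Ma KoTa'

# Without the flag the lowercase pattern 'ma' is not found.
case_sensitive = re.search('ma', text)
print(case_sensitive)

# With re.IGNORECASE the pattern also matches 'Ma'.
case_insensitive = re.search('ma', text, re.IGNORECASE)
print(case_insensitive.group())

# The flag works with the other functions as well.
all_a = re.findall('a', text, re.I)
print(all_a)
```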

Examples (explained during labs)

For experimenting, it is best to use an interactive Python interpreter, preferably IPython.

import re

text = 'Ala ma kota i hamak, oraz 150 bananów.'

re.search('ma',text)
re.match('ma',text)
re.match('Ala ma',text)
re.findall('ma',text)

re.findall('[mn]a',text)
re.findall('[0-9]',text)
re.findall('[0-9abc]',text)
re.findall('[a-z][a-z]ma[a-z]',text)
re.findall('[a-zA-Z][a-zA-Z]ma[a-zA-Z0-9]',text)
re.findall(r'\d',text)  # raw string (r'...') keeps the backslash intact

re.search('[0-9][0-9][0-9]',text)
re.search(r'[\d][\d][\d]',text)

re.search(r'\d{2}',text)
re.search(r'\d{3}',text)

re.search(r'\d+',text)

re.search(r'\d+ bananów',text)
re.search(r'\d* bananów','Ala ma dużo bananów')  # \d* may match zero digits
re.search(r'\d* bananów',text)
re.search(r'ma \d? bananów','Ala ma 5 bananów')
re.search(r'ma ?\d? bananów','Ala ma bananów')
re.search(r'ma( \d)? bananów','Ala ma bananów')

re.search(r'\d+ bananów','Ala ma 10 bananów albo 20 bananów')
re.search(r'\d+ bananów$','Ala ma 10 bananów albo 20 bananów')  # $ forces the match at the end of the string

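text = 'Ala ma kota i hamak, oraz 150\tbananów.'  # a tab, not a space, before 'bananów'

re.search(r'\d+ bananów',text)  # None: a literal space does not match the tab

re.search(r'\d+\sbananów',text)  # \s matches any whitespace, including the tab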

re.search('kota . hamak',text)

re.search('kota . hamak','Ala ma kota z hamakiem')

re.search('kota .* hamak','Ala ma kota lub hamak')

re.search(r'\.',text)

re.search('kota|psa','Ala ma kota lub hamak')

re.findall('kota|psa','Ala ma kota lub psa')

re.search('kota (i|lub) psa','Ala ma kota lub psa')

re.search('mam (kota).*(kota|psa)','Ja mam kota. Ala ma psa.').group(0)

re.search('mam (kota).*(kota|psa)','Ja mam kota. Ala ma psa.').group(1)

re.search('mam (kota).*(kota|psa)','Ja mam kota. Ala ma psa.').group(2)
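The three group calls above can also be made on a single match object, and `groups()` returns all captured groups at once. The sketch below reuses the same pattern and sentence; the variable names are chosen only for illustration.

```python
import re

sentence = 'Ja mam kota. Ala ma psa.'
found = re.search('mam (kota).*(kota|psa)', sentence)

# group(0) is the whole match; .* is greedy, so it reaches as far as 'psa'.
whole = found.group(0)
print(whole)

# groups() returns every captured group as a tuple.
both = found.groups()
print(both)
```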