KWT-2024/lab/lab_09-10.ipynb
2024-04-23 11:15:21 +02:00

Computer-Assisted Translation

9, 10. Web scraping [labs]

Rafał Jaworski (2021)

As we know, linguistic resources play a major role in computer-assisted translation and in natural language processing. They include parallel corpora (translation memories), monolingual corpora, and dictionaries. Sometimes these resources are not available for the language we want to work on.

In that case there is still a way out: we can use resources that are publicly available on the Internet. In today's class we will practice techniques for downloading text from web pages.

The code below downloads the content of a page (as HTML, into a variable) and searches that page for specific elements. Before running it, install the BeautifulSoup module (and requests, if it is not already present): pip3 install requests beautifulsoup4

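import requests
from bs4 import BeautifulSoup

url = 'https://epoznan.pl'

# Download the page; the timeout keeps the call from hanging indefinitely
page = requests.get(url, timeout=10)
soup = BeautifulSoup(page.content, 'html.parser')

# Headlines are <h3> elements with the class "postItem__title"
headers = soup.find_all('h3', {'class': 'postItem__title'})

print('\n'.join(header.get_text() for header in headers))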
Pożar w piwnicy w jednym z budynków. Spłonęły trzy pomieszczenia
Nie czekaj! Wiosenne promocje w gotowych inwestycjach Nickel Development trwają tylko do końca kwietnia
Posadzili pierwszy taki kieszonkowy las metodą Miyawaki. To z okazji "Dnia Ziemi"
Zabawki na podwórku przy śmietniku. "Może komuś się przydadzą?"
Zapowiadają Antykapitalistyczny Marsz Równości w Poznaniu. "Radosna celebracja różnorodności"
Pierwsze w Wielkopolsce Centrum Pomocy Dzieciom bazujące na modelu Barnahus
Z podpoznańskiego marketu tuż przed Wigilią zniknął sprzęt za kilkanaście tysięcy złotych
Mania zdyskwalifikowana! Powód nieznany
Są wyniki rekrutacji do przedszkoli. Część maluchów nigdzie się nie dostała
Pracownicy Poczty Polskiej zapowiadają strajk ostrzegawczy. Chcą podwyżek
Kiedy w końcu zrobi się cieplej? Mamy dobre wiadomości!
ETC Swarzędz świętuje 30. urodziny
Zbigniew Czerwiński gratuluje Jaśkowiakowi i dziękuje poznaniakom. "Jestem dumny"
Mieszkańcy zdecydowali w sprawie wyboru wariantu Spółdzielczej Strefy Parkowania
12-latka prawdziwą bohaterką. Uratowała kolegę, który nieprzytomny leżał przy drodze
Wyszedł ze szpitala, ślad po nim zaginął. Rodzina zaczęła go szukać przez media społecznościowe, jedna z mieszkanek znalazła go na przystanku
Zamenhofa: zderzenie samochodu i tramwaju
Jacek Jaśkowiak podziękował wyborcom. "Jestem przekonany, że to będzie najlepsze 5 lat w historii Poznania"
Współpracując z profesjonalnym odbiorcą odpadów unikniesz negatywnych konsekwencji
Pogryzienie przez amstafa na Nowym Mieście. "Cała okolica boi się psa"
Baza wojskowa z Krzesin wyjaśnia wątpliwości w sprawie pokazów lotniczych. Wiele osób zawiedzionych
Wiadomo, kiedy Jacek Jaśkowiak i nowi radni zostaną zaprzysiężeni
Wybory i decyzje związane z pogrzebem - Universum radzi
Są wyniki w powiecie poznańskim
Poznaniacy wybrali prezydenta. Mamy wyniki ze 100% obwodów
Donald Tusk: "chyba wygraliśmy Poznań"
Koniec ciszy wyborczej. Wyniki dla Poznania poznamy prawdopodobnie około północy
Ogromna inwestycja planowana jest pod Poznaniem. Tu będą szkolić  się załogi czołgów Abrams i K2
Galeria Malta powoli znika z powierzchni ziemi. Tak wygląda dziś
Druga tura wyborów. Podano frekwencję w Poznaniu na godzinę 17:00
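
The same lookup can be tried without network access. The sketch below runs find_all() on a small hand-written HTML fragment. The fragment and the helper name extract_titles are assumptions made up for illustration (only the tag name h3 and the class postItem__title come from the code above), so the structure of the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, written by hand for this illustration
SAMPLE_HTML = """
<div>
  <h3 class="postItem__title">First headline</h3>
  <h3 class="other">Not a headline</h3>
  <h3 class="postItem__title featured">  Second headline </h3>
</div>
"""

def extract_titles(html):
    """Return the stripped text of every <h3> that carries the class postItem__title."""
    soup = BeautifulSoup(html, 'html.parser')
    # class_ matches an element if the given class is among its classes
    tags = soup.find_all('h3', class_='postItem__title')
    return [tag.get_text(strip=True) for tag in tags]

titles = extract_titles(SAMPLE_HTML)
print(titles)
```

Working on a saved or hand-written fragment like this is a convenient way to tune a selector before pointing the code at the live site.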

Exercise 1: Write a function that downloads product names from Ceneo.pl. The product type (e.g. telewizor, pralka, laptop) is a parameter of the function. It is enough to fetch data from the first page of search results.

import requests
from bs4 import BeautifulSoup

def get_names(article_type):
    """Return product names from the first page of Ceneo.pl search results."""
    url = f"https://www.ceneo.pl/;szukaj-{article_type}"

    response = requests.get(url, timeout=10)

    if response.status_code != 200:
        print("Failed to download data")
        return []

    soup = BeautifulSoup(response.text, 'html.parser')

    # Each search result sits in a <div class="cat-prod-row__content">
    produkty = soup.find_all('div', class_='cat-prod-row__content')

    # The product name is the <strong class="cat-prod-row__name"> inside it;
    # rows without that element are skipped
    nazwy = []
    for prod in produkty:
        name_tag = prod.find('strong', class_='cat-prod-row__name')
        if name_tag:
            nazwy.append(name_tag.text.strip())

    return nazwy

get_names('telewizor')
['Telewizor Direct Led KIVI KidsTV 32 cale Full HD',
 'Telewizor LED Samsung UE43CU7172 43 cale 4K UHD',
 'Telewizor LED Philips 43PUS7608 43 cale 4K UHD',
 'Telewizor LED Philips 43PUS8118/12 43 cale 4K UHD',
 'Telewizor QLED Samsung QE43Q60C 43 cale 4K UHD',
 'Telewizor QLED Sharp 70GP6760E 70 cali 4K UHD',
 'Telewizor QLED Samsung QE75Q80C 75 cali 4K UHD',
 'Telewizor LED Samsung UE43AU7092 43 cale 4K UHD',
 'Telewizor LED Samsung UE55CU7172 55 cali 4K UHD',
 'Telewizor LED Blaupunkt 43UBG6000S 43 cale 4K UHD',
 'Telewizor LED United 43DU58 43 cale 4K UHD',
 'Telewizor LED Samsung UE50CU7172 50 cali 4K UHD',
 'Telewizor QLED Toshiba 43QA5D63DG 43 cale 4K UHD',
 'Telewizor LED TCL 43P638 43 cale 4K UHD',
 'Telewizor LED Blaupunkt 43UBG6010S 43 cale 4K UHD',
 'Telewizor LED LG 55UR81003LJ 55 cali 4K UHD',
 'Telewizor LED Philips 55PUS7608 55 cali 4K UHD',
 'Telewizor LED LG 28TN525S 28 cali HD Ready',
 'Telewizor LCD LG 43NANO753QC 43 cale 4K UHD',
 'Telewizor LED Ud 24DW4210 24 cale HD Ready',
 'Telewizor LED Toshiba 43UA2363DG 43 cale 4K UHD',
 'Telewizor LED Samsung UE50AU7092 50 cali 4K UHD',
 'Telewizor LED Samsung UE43CU7192 43 cale 4K UHD',
 'Telewizor LED Samsung UE75CU7172 75 cali 4K UHD',
 'Telewizor QLED Samsung QE50Q67C 50 cali 4K UHD',
 'Telewizor LED TCL 43P631 43 cale 4K UHD',
 'Telewizor LED Philips 43PFS5507/12 43 cale Full HD',
 'Telewizor LED TCL 50P638 50 cali 4K UHD',
 'Telewizor LED TCL 43P635 43 cale 4K UHD',
 'Telewizor LED Sony KD-50X85K 50 cali 4K UHD',
 'Telewizor QLED TCL 43C635 43 cale 4K UHD']

This way we download data from a single page. Nothing stops us, however, from simulating a switch between pages.

Exercise 2: Observe how the page URL changes as you move to subsequent pages of search results on Ceneo.pl. Use this information to run get_names() on more than one page of results.

def get_names(article_type, page_number):
    """Return product names from one page of Ceneo.pl search results."""
    url = f"https://www.ceneo.pl/;szukaj-{article_type}"
    # Pages after the first add a suffix whose last number is page_number - 1
    if page_number > 1:
        url += f";0020-30-0-0-{page_number - 1}.htm"

    response = requests.get(url, timeout=10)

    if response.status_code != 200:
        print("Failed to download data")
        return []

    soup = BeautifulSoup(response.text, 'html.parser')

    produkty = soup.find_all('div', class_='cat-prod-row__content')

    nazwy = []
    for prod in produkty:
        name_tag = prod.find('strong', class_='cat-prod-row__name')
        if name_tag:
            nazwy.append(name_tag.text.strip())

    return nazwy

def scrape_names(article_type, max_pages):
    """Collect names from pages 1..max_pages, stopping at the first empty page."""
    all_names = []
    for page_number in range(1, max_pages + 1):
        names = get_names(article_type, page_number)
        if not names:
            break
        all_names.extend(names)
    return all_names

typ_towaru = "pralka"
max_pages = 3
nazwy_towarow = scrape_names(typ_towaru, max_pages)
for nazwa in nazwy_towarow:
    print(nazwa)
Oferta specjalna
Pralka Whirlpool - Tylko do poniedziałku 500 zł rabatu za każde 2000
Pralka Electrolux SteamCare 700 EW7F348ASP
Pralka MPM MPM-4610-PH-02
Pralka Candy CS4 1062DE/1-S
Pralka Electrolux SensiCare 600 EW6SN506WP
Pralka Sharp ES-HFM6103WD-PL
Pralka Beko WUE7512WPBE
Pralka Whirlpool WRSB 7238 BB EU
Pralka Indesit MTWSA 61294 W PL
Pralka Amica WA1S610CLiSH
Pralka Indesit BTW L50300 PL/N
Pralka Whirlpool FFL 6038 B PL
Pralka Electrolux SensiCare 600 EW6S0506OP
Pralka Beko WUE6512WPBSE
Pralka Candy RapidO RO41274DWMST/1-S
Pralka Candy Smart CST 26LET/1-S
Pralka Candy RO 1284DWMCT/1-S
Pralka Electrolux SensiCare 600 EW6SN0506OP
Pralka Gorenje WNHEI72SAS/PL
Pralka Whirlpool TDLR 65230S PL/N
Pralka Indesit MTWSC 510511 W PL
Pralka Hisense  WFQA9014EVJMT
Pralka Bosch Serie 4 WGG0420GPL
Pralka LG  F2WV3S7N6E
Pralka Amica WA1S610CLISMT
Pralka Indesit BWSA 61294 W EU N
Pralka Beko WUE7512WWE
Pralka Candy CST 06LET/1-S
Pralka Electrolux TimeCare 500 EW2TN5261FP
Pralka Beko B5WFT89408MDC
Pralka Whirlpool TDLR 6040L PL/N
Pralka Samsung WW60A3120BH
Pralka Vivax WFL120615B
Pralka Candy CSO4 1075TE
Pralka Indesit  MTWSA 61051 W PL
Pralka Gorenje WNHB6X2SDS/PL
Polecany
Pralka Whirlpool MEFFD 9469 WSBSV PL
Pralka Whirlpool WRBSS 6249 W EU
Pralka Whirlpool FFS 7259 B EE
Pralka Whirlpool FFB 9258 SV PL
Pralka Candy CSO 1275TBE/1-S
Pralka Samsung WW70TA026AE
Pralka Indesit MTWE 71252 WK PL
Pralka Candy Smart CST262D3/1-S
Pralka Beko B3WFU59415MPBS
Pralka Gorenje W2NEI62SBS/PL
Pralka Whirlpool FFB 8469 BV PL
Pralka Samsung WW60A3120WH
Pralka Electrolux SensiCare 600 EW6TN4261P
Pralka Candy Smart CS4 1061DE/1-S
Pralka Sharp ES-HFM6102WD-PL
Pralka Samsung EcoBubble WW90T534DAE
Pralka Gorenje W1NHPI60SCS/PL
Pralka MPM MPM-4610-PH-03
Pralka Amica EWAS610DL
Pralka Beko WUE6512WWE
Pralka Whirlpool WRSB 7259 BB EU
Pralka Haier HW80-B14959TU1-S
Pralka Candy SmartPRO Slim CO4 1265TWBE/1-S
Pralka Samsung WW60A3120BE
Pralka Candy RapidO RO41274DWME/1-S
Pralka Candy CS 147TXME/1-S
Pralka Bosch Serie 6 WGG242ZGPL
Pralka Bosch Serie 4 WAN2425EPL
Pralka Samsung AddWash Slim WW8NK62E0RW
Pralka Whirlpool TDLR 6241BS PL/N
Pralka Whirlpool FFB 7038 BV PL
Pralka Whirlpool TDLR 5030L PL/N
Pralka Radomet PWR-13A Inox
Pralka Bosch Serie 6 WGG242ZKPL
Pralka Samsung WW70AGAS21AE
Pralka Amica NWAS610DL
Pralka Haier Mini Inverter HW50-BP12307-S
Pralka Bosch Serie 2 WAJ2407KPL
Pralka Electrolux TimeCare 500 EW2TN5061FP
Pralka Gorenje WNHPI60SCS/PL
Pralka Amica DWAC712DL
Pralka Samsung WW60A3120WE
Pralka Whirlpool FFB 8258 BV PL
Pralka Candy RapidO RO1494DWMCE/1-S
Pralka Candy Smart CS4 1172DE/1-S
Pralka Sharp ES-HFA6103WD-PL
Pralka Sharp ES-HFA6102WD-PL
Pralka Samsung WW70TA026TE
Pralka Kernau KFWM I 6501
Pralka Luxpol Lusia PB60-2000E
Pralka Sharp ES-NFA612DW1B-PL
Pralka Candy Mini CW50-BP12307-S
Pralka Beko WFTC9723XW
Pralka Samsung WW90CGC04DAB
Pralka Samsung EcoBubble WW80CGC04DAB
Pralka Candy SmartPro CSO 1295TW4/1-S
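
The paging logic can also be separated from the download step, which makes it easy to inspect the generated addresses before sending any requests. The sketch below is only an illustration: the helper names build_search_url and page_urls are hypothetical, and the URL suffix is copied from the solution above, so it may stop working if Ceneo.pl changes its address scheme.

```python
BASE_URL = "https://www.ceneo.pl/;szukaj-{query}"
# Suffix taken from the solution above; its last number is page_number - 1
PAGE_SUFFIX = ";0020-30-0-0-{index}.htm"

def build_search_url(article_type, page_number):
    """Build the search URL for a 1-based page number (hypothetical helper)."""
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    url = BASE_URL.format(query=article_type)
    # The first page has no suffix; later pages append one
    if page_number > 1:
        url += PAGE_SUFFIX.format(index=page_number - 1)
    return url

def page_urls(article_type, max_pages):
    """Return the URLs of pages 1..max_pages for the given product type."""
    return [build_search_url(article_type, n) for n in range(1, max_pages + 1)]

for url in page_urls("pralka", 3):
    print(url)
```

Keeping URL construction in its own function means the address pattern lives in one place and can be checked without touching the network.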

Downloading content from the Internet is a particularly effective way of obtaining large amounts of text. The code fragment below downloads the entire text of a page.

import re

url = "https://www.yahoo.com"

page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

# remove the script and style elements
for element in soup(["script", "style"]):
    element.extract()    # detach the element from the tree

# get the text
text = soup.get_text()

# collapse runs of whitespace into a single space
text = re.sub(r"\s+", " ", text)

print(text)
Yahoo | Mail, Weather, Search, Politics, News, Finance, Sports & Videos News Today's news US Politics World Tech Reviews and deals Audio Computing Gaming Health Home Phones Science TVs Climate change Health Science 2024 election Originals The 360 Life Health COVID-19 Fall allergies Health news Mental health Relax Sexual health Studies The Unwind Parenting Family health So mini ways Style and beauty It Figures Unapologetically Horoscopes Shopping Buying guides Food Travel Autos Gift ideas Entertainment Celebrity TV Movies Music How to Watch Interviews Videos Shopping Finance My portfolio My watchlist News Stock market Economics Earnings Crypto Politics Biden economy Personal finance Markets Stocks: most active Stocks: gainers Stocks: losers Trending tickers Futures World indices US Treasury bonds Currencies Crypto Top ETFs Top mutual funds Highest open interest Highest implied volatility Currency converter Sectors Basic materials Communication services Consumer cyclical Consumer defensive Energy Financial services Healthcare Industrials Real estate Technology Utilities Screeners Watchlists Equities ETFs Futures Index Mutual funds Analyst rating screener Technical events screener Smart money screener Top holdings screener Personal finance Credit cards Credit card rates Balance transfer credit cards Business credit cards Cash back credit cards Rewards credit cards Travel credit cards CD rates Checking accounts Online checking accounts High-yield savings accounts Money market accounts Personal loans Student loans Car insurance Home buying Taxes Videos ETF report FA corner Options pit Finance Plus Community Investment ideas Research reports Webinars Crypto Industries Sports Fantasy News Fantasy football Best Ball Pro Pick 'Em College Pick 'Em Fantasy baseball Fantasy hockey Fantasy basketball Download the app Daily fantasy NFL News Scores and schedules Standings Stats Teams Players Drafts Injuries Odds Super Bowl GameChannel Videos MLB News Scores and schedules 
Standings Stats Teams Players Odds Videos World Baseball Classic NBA News Draft Scores and schedules Standings Stats Teams Players Injuries Videos Odds Playoffs NHL News Scores and schedules Standings Stats Teams Players Odds Playoffs Soccer News Scores and schedules Premier League MLS NWSL Liga MX CONCACAF League Champions League La Liga Serie A Bundesliga Ligue 1 World Cup College football News Scores and schedules Standings Rankings Stats Teams Show all MMA WNBA Sportsbook NCAAF Tennis Golf NASCAR NCAAB NCAAW Boxing USFL Cycling Motorsports Olympics Horse racing GameChannel Rivals Newsletters Podcasts Videos RSS Jobs Help World Cup More news New on Yahoo Games Tech Terms Privacy Privacy & Cookie Settings Feedback US English Select edition USEnglish US y LATAMEspañol AustraliaEnglish CanadaEnglish CanadaFrançais DeutschlandDeutsch FranceFrançais 香港繁中 MalaysiaEnglish New ZealandEnglish SingaporeEnglish 台灣繁中 UKEnglish © 2024 All rights reserved. About our ads Advertising Careers Yahoo Home Yahoo Home Search query Select an edition USEnglish US y LATAMEspañol AustraliaEnglish CanadaEnglish CanadaFrançais DeutschlandDeutsch FranceFrançais 香港繁中 MalaysiaEnglish New ZealandEnglish SingaporeEnglish 台灣繁中 UKEnglish News Finance Sports More News Today's news US Politics World Weather Climate change Health Science 2024 election Originals Life Health Parenting Style and beauty Horoscopes Shopping Food Travel Autos Gift ideas Entertainment Celebrity TV Movies Music How to watch Interviews Videos Shopping Finance My portfolio Watchlists Markets News Videos Yahoo Finance Plus Screeners Personal finance Crypto Industries Sports Fantasy NFL NBA MLB NHL College football College basketball Soccer MMA Yahoo Sports AM New on Yahoo Games Tech Selected edition USEnglish Mail Sign in Mail News Finance Sports Entertainment Life Yahoo Plus More... 
… Skip navigation linksSkip to main contentSkip to sidebarAdvertisementtop storiesHush money trial: Trump is accused of criminal conspiracy and cover-upDonald Trump's historic trial on 34 felony counts of falsifying business records got underway in Manhattan on Monday.Opening statements and witness testimony »15-year-old golf phenom Miles Russell makes historyPublic health alert issued over E. coli fears in ground beefTesla cuts prices in U.S., China and Germany as competition heats upApparel retailer Express files for bankruptcy protection, to close 100 stores15 million in U.S. face severe weather threat this weekStories for you Entertainment·Yahoo SportsTom Brady will be mercilessly mocked in Netflix's 'Greatest Roast of All Time' comedy specialGet ready for a lot of jokes about Tom Brady never eating strawberries and divorcing one of the most successful supermodels in history.2 min read Thanks for your feedback! Politics·Yahoo NewsTrump hush money trial adjourned after opening statements conclude, 1st witness David Pecker briefly testifies: full coverageOpening statements are set to begin today in the first-ever criminal trial of a former U.S. president following a dramatic week of jury selection.1 min read Thanks for your feedback! News·The YodelWhat to know about Trumps hush money trial witnesses, Columbia holds virtual classes amid protests and new Rock Hall inducteesThe stories you need to start your day: Trumps trial witnesses, the NBA playoffs and more in todays edition of The Yodel newsletter4 min read Thanks for your feedback! Politics·USA TODAYJudge approves safeguards for Donald Trump's $175 million civil business fraud appeal bondA New York judge approved an agreement Monday to strengthen the $175 million bond in Donald Trump's civil fraud case while he appeals.3 min read Thanks for your feedback! 
Lifestyle·HuffPostMelania Trump Resurfaces With Unexpected 'Narcissist' Message As Trial Heats UpThe former first lady broke her silence — but it wasn't with a message of support for her husband.3 min read Thanks for your feedback! Sports·Yahoo SportsScottie Scheffler backs up Masters win with dominant, historic victory at RBC HeritageScottie Scheffler is just the third Masters winner in history to win the following week on the PGA Tour, and the first since 1985.4 min read Thanks for your feedback! US·WPIX New York City, NYA pilots fateful, career-altering flight under Michigans Mackinac BridgeIt'll be 65 years this week since an Air Force pilot pulled off the ultimate stunt — flying under the Mackinac Bridge. But he paid the price for it.5 min read Thanks for your feedback! Business·Yahoo FinanceStock market today: Stocks turn upbeat with Big Tech earnings in viewBig Techs are the highlight as hopes rest on this week's flood of earnings to reassure and reignite the market.1 min read Thanks for your feedback! Entertainment·Country LivingBlake Shelton Said He Would Return to 'The Voice' Under One CircumstanceBlake Shelton said he would return to 'The Voice' under one condition.3 min read Thanks for your feedback! Business·EngadgetEmbracer Group is splitting up its messy gaming empire into three different companiesEmbracer Group has announced plans to split into three separate, publicly listed entities, following an epic losing streak.2 min read Thanks for your feedback! Entertainment·BuzzFeedKelly Clarkson Nearly Walked Off Stage After She Unintentionally Made A Sexual Remark About Meat To Henry Golding, Who Couldn't Help But LaughEggs, toast, and morning meat.2 min read Thanks for your feedback! 
Business·ReutersKroger, Albertsons to sell 166 more stores to gain regulatory approval for $25 billion mergerThe companies have been looking to offload stores to address regulatory concerns that the merger would lead to higher prices, store closures and job losses that have risen since they first announced the merger in October 2022. Under the new agreement, C&S will pay Kroger an all-cash consideration of about $2.9 billion, up from the previous payout of $1.9 billion. Kroger had earlier proposed divesting 413 stores and eight distribution centers to C&S Wholesale Grocers.2 min read Thanks for your feedback! Celebrity·STYLECASTERPrince Harry Makes Official Change That Speaks Volumes About His Royal FutureIt's the latest major step since his royal exit.3 min read Thanks for your feedback! Celebrity·PeopleSalma Hayek Shares Look at Victoria Beckham's Unforgettable 50th Birthday with Spice Girls, Tom Cruise and MoreAll five members of the Spice Girls even performed the choreography for their 1997 hit “Stop” at the party in London3 min read Thanks for your feedback! Lifestyle·allrecipesMcDonald's Has a New Limited-Time Sandwich—And Fans Are Already Lovin' ItIt's a new flavor spin on a Mickey D's classic.2 min read Thanks for your feedback! Lifestyle·CubbyThe Best Frozen Pizzas You Can Buy at the Grocery StoreTwo really stood out as clear winners!4 min read Thanks for your feedback! Business·Yahoo Personal FinanceMortgage rates today, April 21, 2024: Interest costs on the riseThese are today's mortgage rates. Interest costs are on the rise for home shoppers nationwide. Lock in your rate today.4 min read Thanks for your feedback! Entertainment·Entertainment WeeklyRock and Roll Hall of Fame 2024 inductees include Cher, Ozzy Osbourne, Mary J. Blige, moreCher's honor comes after she publicly rebuked the Rock Hall for never nominating her, saying, "I wouldnt be in it now if they gave me a million dollars."2 min read Thanks for your feedback! 
Celebrity·SheKnowsInside Sources Reveal How Kim Kardashian Is Feeling After Taylor Swift Reignited Their Feud AgainNot only has Taylor Swifts new album The Tortured Poets Department refueled theories about her romances with Matt Healy and Joe Alwyn, but it also reignited the long-standing feud between her and Kim Kardashian. They had a feud that lasted years, and one many thought was long dead and gone. However, with one of Swifts …3 min read Thanks for your feedback! Sports·Yahoo SportsNFL mock draft: With one major trade-up, it's a QB party in the top 5Our final 2024 mock draft projects four quarterbacks in the first five picks, but the Cardinals at No. 4 might represent the key pivot point of the entire board.14 min read Thanks for your feedback! World·NBC NewsAfter $1.5 billion was spent, the centerpiece of Paris' Olympic efforts may still be too filthy to useWith less than 100 days to go before the Paris 2024 Olympic games in France, concerns about pollution in Paris' Seine river are being raised.4 min read Thanks for your feedback! Lifestyle·Yahoo LifeHeres when people think old age begins — and why experts think its starting laterPeople's definition of "old age" is older than it used to be, new research suggests.7 min read Thanks for your feedback! Sports·Yahoo SportsNBA announces finalists for season-long awards, including Nikola Jokić for MVP, Victor Wembanyama for ROYJokić is nominated for his third MVP in four seasons. Chet Holmgren joins Wembanyama as a Rookie of the Year finalist.2 min read Thanks for your feedback! Lifestyle·Yahoo Life ShoppingThe 30 best Walmart deals to shop this weekend — save up to 80% on outdoor gear, gardening supplies, tech and moreSome major deals on board: a Mother's Day-ready digital picture frame for $30 off, a cordless 6-in-1 stick vac for just $90, and a Chromebook laptop for under $150.2 min read Thanks for your feedback! 
Lifestyle·Yahoo Life ShoppingThe best Amazon deals to shop this weekend: Save up to 90% on home appliances, gardening essentials and moreA few of our faves? A Shark stick vac for just $100 and a vertical garden on sale for $34, plus sweet gift ideas for Mom.2 min read Thanks for your feedback! Lifestyle·Yahoo Life ShoppingBest prepared meal delivery services for 2024, tested and reviewedWe've picked the best meal kit for every taste and budget.5 min read Thanks for your feedback! Lifestyle·Yahoo Life ShoppingThe 30 most unique gifts you can buy anyoneThese gift ideas from Amazon, Uncommon Goods, Etsy and other retailers are anything but ordinary.2 min read Thanks for your feedback! Lifestyle·Yahoo Life ShoppingThe 25 best gifts for your sister-in-law that'll cement you as her favorite family memberTreat your SIL to these finds from Sephora, Anthropologie and more.1 min read Thanks for your feedback! Sports·EngadgetThe best PS5 games for 2024: Top PlayStation titles to play right nowHere are the best games you can get for the PlayStation 5 right now, as chosen by Engadget editors.1 min read Thanks for your feedback! Lifestyle·Yahoo Life ShoppingThe 10 best deodorants and antiperspirants for women of 2024, according to dermatologists and testersFrom Secret to Kopari, Megababe and Mitchum, these are the best deodorants for women, tested and dermatologist-approved.9 min read Thanks for your feedback! Business·Insider Monkey15 Countries with the Largest Proven Oil Reserves in the WorldIn this article, we are going to discuss the 15 countries with the largest proven oil reserves in the world. You can skip our detailed analysis of the global oil and gas market, the impact of the Russia-Ukraine war on the global oil sector, and the steps taken by major oil companies to achieve net […]12 min read Thanks for your feedback! 
Sports·Yahoo SportsPass or Fail: Broncos release 'Mile High Collection,' first new uniforms in over 25 yearsThe Broncos may have committed the greatest fashion faux pas there is: being boring.2 min read Thanks for your feedback! Sports·Yahoo SportsRyan Garcia's win over Devin Haney raises more questions than answersRyan Garcia showed up for Saturday nights fight against Devin Haney in New York as a 6-1 underdog. He also showed up several pounds heavy, towing behind him what seemed to be some heavy psychological baggage. And then he won.3 min read Thanks for your feedback! Sports·Yahoo SportsOakland University outfielders combine to make spectacular catch vs. Northern KentuckyOakland University outfielders John Lauinger and Reggie Bussey combined on what could be college baseball's best catch of the 2024 season against Northern Kentucky.2 min read Thanks for your feedback! Sports·Yahoo SportsScottie Scheffler holds massive 5-shot lead as RBC Heritage is called for darkness, set for Monday finishScottie Scheffler held a five-shot lead when play was called for the night on Sunday.3 min read Thanks for your feedback! Sports·Yahoo SportsShohei Ohtani breaks Hideki Matsui's MLB record for HRs by Japanese-born playerShohei Ohtani keeps dominating.3 min read Thanks for your feedback! Business·AutoblogThe 10 car brands cheapest to maintain over 10 yearsConsumer Reports found that the cheapest brands to maintain cost just a fraction of those at the other end of the spectrum.2 min read Thanks for your feedback! Sports·Yahoo SportsBulls reportedly offered DeMar DeRozan 2-year deal worth $40 million per seasonThe Chicago Bulls reportedly offered DeMar DeRozan a two-year contract that could be worth up to $40 million per season. DeRozan is set to become an unrestricted free agent.3 min read Thanks for your feedback! 
Sports·Yahoo SportsNBA playoffs: Clippers dismantle Mavericks with Kawhi Leonard in street clothesVintage James Harden and a stifling Clippers defense help lead Los Angeles to a dominant win as Leonard watched from the sideline with a lingering knee injury.4 min read Thanks for your feedback! Sports·Yahoo SportsChicago Bears' futility at the QB position defies probabilityChicago will attempt to end its run of positional failure by selecting Caleb Williams, the Bears next shot at finding the franchise's first great quarterback in the Super Bowl era.5 min read Thanks for your feedback! Entertainment·Yahoo TVJon Bon Jovi corrects record on Richie Sambora leaving band: He 'chose not to come back'The Bon Jovi front man talks to Yahoo about the new docuseries telling the story of the New Jersey rock bands 40-year history — and shares where things stand with Sambora after the guitarists 2013 departure.5 min read Thanks for your feedback! Sports·Yahoo SportsMock Draft Monday with Dane Brugler: Cowboys solve for OL and RB, Colts land a WRDraft week has arrived and with that comes our final installment of 'Mock Draft Mondays'. We go out with a bang as The Athletic's Dane Brugler joins Matt Harmon to share his five favorite picks in his latest seven-round mock draft. Yes, Brugler doesn't just put together 'The Beast' but a seven round mock. Everything you need to get ready for Thursday night.2 min read Thanks for your feedback! Sports·Yahoo SportsWay-too-early fantasy basketball top-12 rankings: Victor Wembanyama climbs to the topWith the fantasy basketball season behind us and the NBA playoffs in full swing, Dan Titus takes what he learned from this past campaign and reveals his first crack at next season's draft rankings.6 min read Thanks for your feedback! 
Sports·Yahoo SportsCommanders release DE Shaka Toney after being reinstated from gambling suspensionDays after being reinstated by the NFL after serving a one-year suspension for gambling, defensive end Shaka Toney was released by the Washington Commanders.1 min read Thanks for your feedback! Sports·Yahoo SportsNelly Korda grabs historic 5th straight win, 2nd major title with victory at Chevron ChampionshipNelly Korda is now just the third player in LPGA Tour history to win in five straight starts, and the first since Annika Sorenstam did so in 2004-05.5 min read Thanks for your feedback! Business·AutoblogThe new Ford Mustang's V8 is available as a crate engineFord offers the new Mustang's updated 5.0-liter Coyote V8 as a crate engine, and it also sells a supercharger kit that unlocks a total of 810 horsepower.1 min read Thanks for your feedback! Business·Yahoo FinanceRe-vote on Elon Musks pay could expose Tesla to even more legal troubleTesla is likely in for some fresh legal entanglements after recommending stockholders vote to reinstate Elon Musks compensation package.5 min read Thanks for your feedback! Sports·The Maize And Blue ReviewMichigan Football: Position-by-position spring game takeaways on offenseMichigan held its spring game at Michigan Stadium on Saturday and there is plenty to discuss, especially on the offensive side of the ball. How does the quarterback battle look? To take sample size into account, was Jayden Denegal's performance an indicator that Alex Orji and Davis Warren are steps ahead in the quarterback race?5 min read Thanks for your feedback! Sports·Yahoo SportsRockies nearly lose after fan interference call takes away walk-off home runThe Colorado Rockies thought they had a walk-off win over the Seattle Mariners, but a home run was taken away by a fan interference call.2 min read Thanks for your feedback! 
Business·Yahoo FinanceWhy the Magnificent 7's 'momentum is collapsing'Six of the largest tech companies are expected to see earnings growth slow over the next year, leaving room for other companies to lead the next leg of the stock market rally, UBS analysts say.3 min read Thanks for your feedback! Lifestyle·Yahoo Canada StyleDeal alert: This Amazon swimsuit is 'very flattering' — get it on sale for under $30Hurry to snag this lightning deal while you can!2 min read Thanks for your feedback! Business·Yahoo FinanceFed's favorite inflation gauge and Big Tech earnings greet a slumping stock market: What to know this weekWith the stock market rally at its most fragile stage in months, big tech earnings, a reading on economic growth and a fresh inflation print are set to greet investors in the week ahead.7 min read Thanks for your feedback! Sports·The Maize And Blue ReviewMichigan Football: Position-by-position spring game takeaways on defenseMichigan's defense returns a handful of familiar names from an elite defensive unit a year ago but will have to replace some key contributors from that squad. What do the Wolverines have? This group will be the catalyst of what could be an elite defense again this season.4 min read Thanks for your feedback! Sports·Yahoo SportsNASCAR: Tyler Reddick wins at Talladega as Michael McDowell triggers massive crashIt's Reddick's first win of the season and it came as McDowell crashed from the lead on the final lap.5 min read Thanks for your feedback! Sports·Yahoo SportsFantasy Baseball: The top starting pitchers to stream in Week 4Week 4 of the fantasy baseball season actually is the opportune time to start streaming. Fred Zinkie breaks down the schedule and the options.5 min read Thanks for your feedback! 
Business·Yahoo FinanceHow Jay Powell and the Fed pivoted back to higher for longerFed Chair Jay Powell and other Fed officials struck a more hawkish stance this past week, setting off a new debate across Wall Street about how the rest of 2024 could play out.6 min read Thanks for your feedback! Health·Yahoo LifeQuiz: How much do you know about marijuana? Test your knowledge now.Test yourself on side effects, which states have legalized marijuana for recreational use and more.1 min read Thanks for your feedback! Sports·Yahoo SportsNBA playoffs: Damian Lillard's 35 powers Giannis-less Bucks in Game 1 drubbing of PacersWelcome back to the playoffs, Damian Lillard.5 min read Thanks for your feedback! Sports·Yahoo SportsEx-Duke All-ACC guard Jeremy Roach commits to BaylorRoach is one of nine players to leave Duke this offseason in the transfer portal or to the NBA draft.2 min read Thanks for your feedback! Sports·Yahoo SportsTyrese Maxey listed as questionable for Game 2 of 76ers-Knicks playoff seriesPhiladelphia 76ers guard Tyrese Maxey is listed as questionable for Monday's Game 2 of their NBA playoff series with the New York Knicks.1 min read Thanks for your feedback! Sports·HawgBeatArkansas basketball to host four-star forward Karter KnoxHead coach John Calipari and the Arkansas basketball team are set to host a four-star forward in the 2024 class on Monday, according to multiple reports. Karter Knox — who was committed to Kentucky but recently reopened his recruitment after Calipari's exodus to Arkansas — will be in Fayetteville just after participating in the Overtime Elite Combine.2 min read Thanks for your feedback! Lifestyle·Idaho StatesmanUp in flames: These household items are setting Boise-area garbage trucks on fireThe problem is on the rise in the Treasure Valley, and internationally  sparking a movement to keep these out of the trash.5 min read Thanks for your feedback! 
Opinion·The New RepublicTrump Suffers a Major Loss Just Minutes into Hush-Money TrialSome of the former presidents actions are coming back to bite him.2 min read Thanks for your feedback! US·CNNA White author calculated just how much racism has benefited her. Heres what she foundJournalist Tracie McMillan traces just how much of her familys modest wealth can be attributed to race in “The White Bonus: Five Families and the Cash Value of Racism in America.”13 min read Thanks for your feedback! Health·INSIDERA woman lost 55 pounds making 2 easy changes to her diet and exercise habits. Wegovy got her to 105.A woman eased herself into exercising and healthy eating and lost 55 pounds. That gave her the foundation she needed to lose another 50lbs on Wegovy healthily.8 min read Thanks for your feedback! US·WJBF AugustaDiver pinned under water by alligator figured he had choice; lose his arm or lose his lifeGOOSE CREEK, S.C. (AP) — Out of air and pinned by an alligator to the bottom of the Cooper River in South Carolina, Will Georgitis decided his only chance to survive might be to lose his arm. The alligator had fixed his jaws around Georgitis arm and after he tried to escape by stabbing it […]2 min read Thanks for your feedback! US·Pioneer Press, St. Paul, Minn.Target store in Woodbury on lockdown; SWAT team dispatched to active sceneA SWAT team was dispatched Monday morning to the Target store in Woodbury Village for an “active scene,” police said. The Minnesota Bureau of Criminal Apprehension, which investigates when law enforcement officers uses force, said the agency is responding to a use-of-force incident at the location in Woodbury on Monday afternoon. The store, located near Valley Creek and Interstate 494, was ...1 min read Thanks for your feedback! 
Entertainment·SheKnowsAnne Hathaway Says She Stopped Being Offered Rom-Com Roles After This Major Life EventAhead of her highly-anticipated rom-com return in The Idea of You, Anne Hathaway is explaining why we havent seen her play a romantic lead in several years after a string of hits including Love & Other Drugs and One Day. Hathaway, 41, is starring in Prime Videos The Idea of You, which premieres on May …2 min read Thanks for your feedback! US·USA TODAYFamily mourns Wisconsin mother of 10 whose body was found in trunkAuthorities havent named the man arrested in connection to Tomitka Jurnett-Stewart's death but her death is related to domestic violence..3 min read Thanks for your feedback! Style·CNNCoachella 2024: The most stand-out celebrity stage looksAs the dust settles on the desert festival for another year, its clear the most memorable fashion moments happened on stage.2 min read Thanks for your feedback! Celebrity·PeoplePregnant Hilary Duff Says She's 'No Longer Responding' to Messages About 'When Baby Is Coming'The actress is currently expecting her fourth baby, her third with husband Matthew Koma2 min read Thanks for your feedback! Celebrity·CNNVictoria Beckham reunites with the Spice Girls for iconic singalong at 50th birthday partyVictoria Beckhams birthday party on Saturday got a little spicy while reuniting with her fellow Spice Girls for an impromptu singalong.1 min read Thanks for your feedback! Celebrity·PeopleAnne Hathaway Recalls 'Gross' Request to Kiss 10 Men for a Costar Chemistry Test: I 'Pretended I Was Excited'"I thought, 'Is there something wrong with me?' because I wasnt excited," she recalled2 min read Thanks for your feedback! Celebrity·PeopleAs Amber Heard Marks Her 38th Birthday, Get to Know Her Quiet Life in Madrid Two Years After Johnny Depp TrialThe actress moved to Spain for a quieter life with daughter Oonagh, who turned 3 earlier in April3 min read Thanks for your feedback! 
Lifestyle·CNNHow moving from the US to Costa Ricas blue zone transformed this familys life foreverAfter a series of bad events, Kema Ward-Hopper and Nicholas Hopper, abandoned life in Texas and moved to Costa Rica. Seven years later theyre feeling the benefits.10 min read Thanks for your feedback! Entertainment·CinemaBlendWhy The Trailer For M. Night Shyamalan's New Serial Killer Movie Has Me Convinced It's A Secret Sequel To One Of His Biggest FilmsThe trailer to M. Night Shyamalan's new serial killer thriller Trap has convinced me it's a secret sequel to one of his most famous flicks.4 min read Thanks for your feedback! World·Associated PressMexico's leading presidential candidate stopped by masked men who ask for help in stemming violenceMasked men stopped a vehicle carrying Mexicos leading presidential candidate while she was traveling between campaign stops Sunday to ask that she address the violence in the southern state of Chiapas if she wins the June 2 election. The border area of Chiapas has been plagued by violence as the rival Sinaloa and Jalisco New Generation cartels battle for territory.2 min read Thanks for your feedback! Politics·HuffPostMary Trump 'Can't Help Laughing' At This 'Schadenfreude' In Uncle's TrialDonald Trump's niece suggested what he's "probably been dreading" for decades.1 min read Thanks for your feedback! Celebrity·SheKnowsTaylor Swift Just Got Her Flowers From This Music Icon & Even Swift Must Be Freaking OutAs the biggest pop star in the world right now, its safe to say most people would be starstruck to meet or just see Taylor Swift in person. And while that may be true, the singer has some idols herself that would leave her speechless. Among them is singer, poet and artist Patti Smith, who …2 min read Thanks for your feedback! 
NextTrending Now1.Donald Trump2.RBC Heritage3.Passover4.NFL Draft5.Russia-Ukraine War6.Columbia University7.Rock And Roll Inductees8.Taylor Swift9.Kim Kardashian10.Dubai FloodingAdvertisementWeatherWroclawView your LocationsDetect my locationView your LocationsPlease enable location service for your browserEnter City or ZipcodeTodayClear. Winds variable at 3 to 6 mph (4.8 to 9.7 kph). The overnight low will be 27 °F (-2.8 °C).43°27°TueMostly cloudy today with a high of 51 °F (10.6 °C) and a low of 31 °F (-0.6 °C). There is a 56% chance of precipitation.51°31°WedRain today with a high of 51 °F (10.6 °C) and a low of 27 °F (-2.8 °C). There is a 75% chance of precipitation.51°27°ThuScattered showers today with a high of 45 °F (7.2 °C) and a low of 34 °F (1.1 °C). There is a 63% chance of precipitation.45°34°See moreScoreboardChange Sports to display different scoresTrending MLB NBA NCAAB NCAAF NFL NHL YesterdayTodayTomorrowOAK 0Bot 5thNYY 0IND 94FinalMIL 109NO 92FinalOKC 94See moreDaily HoroscopeChange your horoscope signAquariusAriesCancerCapricornGeminiLeoLibraPiscesSagittariusScorpioTaurusVirgoTaurusApril 22 -Your love life may see a boost today, or you may uncover a new passion for a hobby or your work. Whatever it is, expect some intensity as you dive as deeply as you can into the new thing. See moreAdvertisement 

Exercise 3: Write a program that downloads text from the website of the Faculty of Mathematics and Computer Science (Wydział Matematyki i Informatyki). Download all the text from the home page, then find all internal links on that page and download the text from the pages they point to. Do not go any deeper.

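import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

# Note: the printed messages are kept in Polish so that they match
# the recorded output below.

def scrape_wmi(url):
    # Fetch a page and return its raw HTML, or "" on any failure.
    try:
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return response.text
        else:
            # "Error while fetching page {url}"
            print(f"Błąd przy pobieraniu strony {url}")
    except requests.RequestException as e:
        # "Could not connect to {url}. Error: {e}"
        print(f"Nie udało się połączyć z {url}. Błąd: {e}")
    return ""

def scrape_wmi_links(url, html):
    # Collect every link that points to the same host as the start page.
    soup = BeautifulSoup(html, 'html.parser')
    links = set()
    for link in soup.find_all('a', href=True):
        # Resolve relative hrefs against the page URL.
        link_url = urljoin(url, link['href'])
        if urlparse(link_url).netloc == urlparse(url).netloc:
            links.add(link_url)
    return links

def get_text_from_links(url):
    # Download the home page, then each internal link (one level deep only).
    tekst_glowny = scrape_wmi(url)
    linki_wewnetrzne = scrape_wmi_links(url, tekst_glowny)

    # 'Strona główna' means 'home page'.
    teksty = {'Strona główna': tekst_glowny}
    for link in linki_wewnetrzne:
        teksty[link] = scrape_wmi(link)

    return teksty

url = 'https://wmi.amu.edu.pl/'

teksty_z_linkow = get_text_from_links(url)

# Print the first 200 characters of every downloaded page.
for adres, tekst in teksty_z_linkow.items():
    print(f"Tekst z {adres}:")
    print(tekst[:200])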
Błąd przy pobieraniu strony https://wmi.amu.edu.pl/intranet-studenta
Błąd przy pobieraniu strony https://wmi.amu.edu.pl/intranet-pracownika
Tekst z Strona główna:
<!DOCTYPE html>
<html lang="pl" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Wydział Matematyki i Informatyki | Wydział Matematyki i Informatyki</title>
        <me
Tekst z https://wmi.amu.edu.pl/zycie-naukowe/konkurs-im.-edyty-szymanskiej:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Konkurs im. Edyty Szymańskiej | Wydział Matematyki i Informatyki</title>
        <meta
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-ii-stopnia/nauczanie-matematyki-i-informatyki:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Nauczanie matematyki i informatyki | Wydział Matematyki i Informatyki</title>
        
Tekst z https://wmi.amu.edu.pl/30-lecie/wydarzenia/wyklad-nr-19-problem-stabilnego-skojarzenia,-czyli-o-trwalych-malzenstwach-i-nie-tylko:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Wykład nr 19: Problem stabilnego skojarzenia, czyli o trwałych małżeństwach i nie tylk
Tekst z https://wmi.amu.edu.pl/dla-szkol/wspolpraca-ze-szkolami:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Współpraca ze szkołami | Wydział Matematyki i Informatyki</title>
        <meta name="
Tekst z https://wmi.amu.edu.pl/dla-kandydata/stypendium-rektora-dla-laureatow-finalistow-olimpiad:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Stypendium rektora dla laureatów i finalistów olimpiad | Wydział Matematyki i Informat
Tekst z https://wmi.amu.edu.pl/dla-kandydata/rekrutacja-krok-po-kroku:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Rekrutacja krok po kroku | Wydział Matematyki i Informatyki</title>
        <meta name
Tekst z https://wmi.amu.edu.pl/#main:
<!DOCTYPE html>
<html lang="pl" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Wydział Matematyki i Informatyki | Wydział Matematyki i Informatyki</title>
        <me
Tekst z https://wmi.amu.edu.pl/dla-pracownika:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Dla Pracownika | Wydział Matematyki i Informatyki</title>
        <meta name="viewport
Tekst z https://wmi.amu.edu.pl/30-lecie:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> 30-LECIE | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" cont
Tekst z https://wmi.amu.edu.pl/dla-kandydata/akademia-cisco:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Akademia CISCO | Wydział Matematyki i Informatyki</title>
        <meta name="viewport
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-ii-stopnia/informatyka:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Informatyka | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" c
Tekst z https://wmi.amu.edu.pl/dla-kandydata/uniwersytet-otwarty:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Uniwersytet Otwarty | Wydział Matematyki i Informatyki</title>
        <meta name="vie
Tekst z https://wmi.amu.edu.pl/wiadomosci:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Wiadomości | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/30-lecie/konferencja-wladz-uczelnianych-matematyki-i-informatyki-2023:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> KWUMI | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" content
Tekst z https://wmi.amu.edu.pl/wydzial/ai-tech:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Akademia Innowacyjnych Zastosowań Technologii Cyfrowych | Wydział Matematyki i Informa
Tekst z https://wmi.amu.edu.pl/wydzial/informator:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Informator | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/zycie-naukowe/awanse-naukowe:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Awanse naukowe | Wydział Matematyki i Informatyki</title>
        <meta name="viewport
Tekst z https://wmi.amu.edu.pl/zapytania-ofertowe:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Zapytania ofertowe | Wydział Matematyki i Informatyki</title>
        <meta name="view
Tekst z https://wmi.amu.edu.pl/wydzial/struktura-wydzialu:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Struktura wydziału | Wydział Matematyki i Informatyki</title>
        <meta name="view
Tekst z https://wmi.amu.edu.pl/zycie-naukowe:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Życie naukowe | Wydział Matematyki i Informatyki</title>
        <meta name="viewport"
Tekst z https://wmi.amu.edu.pl/intranet-studenta:

Tekst z https://wmi.amu.edu.pl/dla-szkol/wmi-emi:
<!DOCTYPE html>
<html lang="pl-PL" >

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <link rel="profile" href="//gmpg.org/xfn/11">
    <link href='//fonts.gstatic
Tekst z https://wmi.amu.edu.pl/30-lecie/wyklady-naukowe-z-okazji-30-lecia-wmi:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Wykłady naukowe z okazji 30-lecia WMI | Wydział Matematyki i Informatyki</title>
     
Tekst z https://wmi.amu.edu.pl/wydzial/baza-wiedzy:
<!DOCTYPE html>

    <!--[if lte IE 10]>
        <p class="chromeframe">Używana wersja przeglądarki jest <strong>przestarzała</strong>. Zaktualizuj lub wymień przeglądarkę na inną.</p>
    <![endif]--
Tekst z https://wmi.amu.edu.pl/dla-kandydata/kola-i-organizacje-studenckie:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Koła i organizacje studenckie | Wydział Matematyki i Informatyki</title>
        <meta
Tekst z https://wmi.amu.edu.pl/30-lecie/zjazd-absolwentow:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Zjazd Absolwentów | Wydział Matematyki i Informatyki</title>
        <meta name="viewp
Tekst z https://wmi.amu.edu.pl/wydzial/projekty:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Projekty | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" cont
Tekst z https://wmi.amu.edu.pl/wydarzenia-wydzialu/ogolnopolska-konferencja-siup-studenckie-i-uczniowskie-pasje-edycja-i:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Ogólnopolska Konferencja SiUP  Studenckie i Uczniowskie Pasje edycja I | Wydział Mate
Tekst z https://wmi.amu.edu.pl/wiadomosci/ogolne/kontakt-do-tlumacza-jezyka-migowego:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Kontakt do tłumacza języka migowego | Wydział Matematyki i Informatyki</title>
       
Tekst z https://wmi.amu.edu.pl/zycie-naukowe/konferencje:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Konferencje | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" c
Tekst z https://wmi.amu.edu.pl/wiadomosci/sukcesy/sukcesy-miedzyszkolnego-kola-olimpijskiego-z-matematyki:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Sukcesy Międzyszkolnego Koła Olimpijskiego z Matematyki | Wydział Matematyki i Informa
Tekst z https://wmi.amu.edu.pl/wiadomosci/sukcesy/ii-miejsce-studentow-informatyki-na-hackathonie-w-krakowie:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> II miejsce studentów Informatyki na Hackathonie w Krakowie | Wydział Matematyki i Info
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-ii-stopnia/analiza-i-przetwarzanie-danych:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Analiza i przetwarzanie danych | Wydział Matematyki i Informatyki</title>
        <met
Tekst z https://wmi.amu.edu.pl/zycie-naukowe/profesorowie-czlonkowie-akademii:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Profesorowie Członkowie Akademii | Wydział Matematyki i Informatyki</title>
        <m
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-podyplomowe:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Studia podyplomowe | Wydział Matematyki i Informatyki</title>
        <meta name="view
Tekst z https://wmi.amu.edu.pl/wydarzenia-wydzialu/absolutorium-2024:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Absolutorium 2024 | Wydział Matematyki i Informatyki</title>
        <meta name="viewp
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-i-stopnia:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Studia I stopnia | Wydział Matematyki i Informatyki</title>
        <meta name="viewpo
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-ii-stopnia:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Studia II stopnia | Wydział Matematyki i Informatyki</title>
        <meta name="viewp
Tekst z https://wmi.amu.edu.pl/wiadomosci/konkursy/konkurs-na-najlepsza-prace-magisterska-dla-absolwentow-kierunku-nauczanie-matematyki-i-informatyki:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Konkurs na najlepszą pracę magisterską dla absolwentów kierunku nauczanie matematyki i
Tekst z https://wmi.amu.edu.pl/wspolpraca/wspolpraca-z-biznesem:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Współpraca z biznesem | Wydział Matematyki i Informatyki</title>
        <meta name="v
Tekst z https://wmi.amu.edu.pl/wydzial/kontakt:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Kontakt | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" conte
Tekst z https://wmi.amu.edu.pl/rss:
<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel><title>Aktualności</title><link>https://wnpid.amu.edu.pl</link><description>Aktualności UAM</description><item><title>Sukcesy Międzysz
Tekst z https://wmi.amu.edu.pl/wydzial/biblioteka:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Biblioteka wydziałowa | Wydział Matematyki i Informatyki</title>
        <meta name="v
Tekst z https://wmi.amu.edu.pl/wydzial:
<!DOCTYPE html>
<html lang="pl" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Wydział | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" conten
Tekst z https://wmi.amu.edu.pl/zycie-naukowe/doktorzy-honoris-causa:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Doktorzy honoris causa | Wydział Matematyki i Informatyki</title>
        <meta name="
Tekst z https://wmi.amu.edu.pl/30-lecie/wydarzenia:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Wydarzenia | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/dla-szkol:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Dla szkół | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" con
Tekst z https://wmi.amu.edu.pl/deklaracja-dostepnosci:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Deklaracja dostępności | Wydział Matematyki i Informatyki</title>
        <meta name="
Tekst z https://wmi.amu.edu.pl/30-lecie/wydarzenia/wyklad-nr-20-metody-analizy-nieliniowej-w-wybranych-zagadnieniach-zagospodarowania-przestrzennego-i-planowania-transportu:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Wykład nr 20: Metody analizy nieliniowej w wybranych zagadnieniach zagospodarowania pr
Tekst z https://wmi.amu.edu.pl/:
<!DOCTYPE html>
<html lang="pl" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Wydział Matematyki i Informatyki | Wydział Matematyki i Informatyki</title>
        <me
Tekst z https://wmi.amu.edu.pl/wydzial/rada-naukowa-dyscyplin:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Rada Naukowa Dyscyplin | Wydział Matematyki i Informatyki</title>
        <meta name="
Tekst z https://wmi.amu.edu.pl/wydzial/rady-programowe:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Rady programowe | Wydział Matematyki i Informatyki</title>
        <meta name="viewpor
Tekst z https://wmi.amu.edu.pl/en:
<!DOCTYPE html>
<html lang="en" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Faculty of Mathematics and Computer Science | Faculty of Mathematics and Computer Scien
Tekst z https://wmi.amu.edu.pl/dla-studenta:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Dla Studenta | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" 
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-doktoranckie:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Studia doktoranckie | Wydział Matematyki i Informatyki</title>
        <meta name="vie
Tekst z https://wmi.amu.edu.pl/wspolpraca/targi-pracy-i-stazy-branzy-it:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Targi pracy i staży branży IT | Wydział Matematyki i Informatyki</title>
        <meta
Tekst z https://wmi.amu.edu.pl/#top:
<!DOCTYPE html>
<html lang="pl" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Wydział Matematyki i Informatyki | Wydział Matematyki i Informatyki</title>
        <me
Tekst z https://wmi.amu.edu.pl/30-lecie/galeria:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Galeria | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" conte
Tekst z https://wmi.amu.edu.pl/30-lecie/harmonogram:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Harmonogram | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" c
Tekst z https://wmi.amu.edu.pl/wydzial/pracownicy:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Pracownicy | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/wiadomosci/ogolne/ankietyzacja-przyjazne-biuro-obslugi-studentow-2024:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Ankietyzacja Przyjazne Biuro Obsługi Studentów 2024 | Wydział Matematyki i Informatyki
Tekst z https://wmi.amu.edu.pl/wydarzenia-wydzialu/wyklad-z-informatyki-im.-mariana-rejewskiego,-jerzego-rozyckiego,-henryka-zygalskiego-2024:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Wykład z informatyki im. Mariana Rejewskiego, Jerzego Różyckiego, Henryka Zygalskiego 
Tekst z https://wmi.amu.edu.pl/zycie-naukowe/cykle-wykladow:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Cykle wykładów | Wydział Matematyki i Informatyki</title>
        <meta name="viewport
Tekst z https://wmi.amu.edu.pl/wydzial/studia-z-przyszloscia:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Kierunek Nauczanie Matematyki i Informatyki z certyfikatem "Studia z Przyszłością" i L
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-i-stopnia/nauczanie-matematyki-i-informatyki:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Nauczanie matematyki i informatyki | Wydział Matematyki i Informatyki</title>
        
Tekst z https://wmi.amu.edu.pl/wydarzenia-wydzialu/publiczna-obrona-rozprawy-doktorskiej-mgra-roberta-kolassy:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Publiczna obrona rozprawy doktorskiej mgra Roberta Kolassy | Wydział Matematyki i Info
Tekst z https://wmi.amu.edu.pl/#main-nav:
<!DOCTYPE html>
<html lang="pl" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Wydział Matematyki i Informatyki | Wydział Matematyki i Informatyki</title>
        <me
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-ii-stopnia/matematyka:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Matematyka | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/#accept-cookie:
<!DOCTYPE html>
<html lang="pl" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Wydział Matematyki i Informatyki | Wydział Matematyki i Informatyki</title>
        <me
Tekst z https://wmi.amu.edu.pl/wydzial/wladze-wydzialu:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Władze wydziału | Wydział Matematyki i Informatyki</title>
        <meta name="viewpor
Tekst z https://wmi.amu.edu.pl/mapa-serwisu2:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Mapa serwisu | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" 
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-i-stopnia/informatyka-kwantowa:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Informatyka kwantowa | Wydział Matematyki i Informatyki</title>
        <meta name="vi
Tekst z https://wmi.amu.edu.pl/oferty-pracy:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Oferty pracy | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" 
Tekst z https://wmi.amu.edu.pl/wiadomosci/sukcesy/stypendium-ministra-nauki-dla-pani-natalii-adamskiej:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Stypendium Ministra Nauki dla pani Natalii Adamskiej | Wydział Matematyki i Informatyk
Tekst z https://wmi.amu.edu.pl/wydarzenia-wydzialu:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Wydarzenia | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/wydarzenia-wydzialu/ogolnopolska-konferencja-studentow-matematyki-c2:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Ogólnopolska Konferencja Studentów Matematyki Θβιcε | Wydział Matematyki i Informaty
Tekst z https://wmi.amu.edu.pl/dostepnosc:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Dostępność | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/wydzial/wmi-w-mediach:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> WMI w mediach | Wydział Matematyki i Informatyki</title>
        <meta name="viewport"
Tekst z https://wmi.amu.edu.pl/wspolpraca:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Współpraca | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/zycie-naukowe/towarzystwa-i-redakcje:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Towarzystwa i redakcje | Wydział Matematyki i Informatyki</title>
        <meta name="
Tekst z https://wmi.amu.edu.pl/wspolpraca/wspolpraca-ze-szkolami:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Współpraca ze szkołami | Wydział Matematyki i Informatyki</title>
        <meta name="
Tekst z https://wmi.amu.edu.pl/intranet-pracownika:

Tekst z https://wmi.amu.edu.pl/wydzial/o-wydziale:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> O wydziale | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/dla-kandydata/samorzad-studencki:
<!DOCTYPE html>
<html lang="pl-PL">
<head>
<meta name="viewport" content="width=device-width, user-scalable=yes, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="IE=edge" /><meta charse
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-i-stopnia/informatyka:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Informatyka | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" c
Tekst z https://wmi.amu.edu.pl/wydzial/historia:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Historia | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" cont
Tekst z https://wmi.amu.edu.pl:
<!DOCTYPE html>
<html lang="pl" class="no-js home ">
    <head>
        <meta charset="utf-8" />
        <title> Wydział Matematyki i Informatyki | Wydział Matematyki i Informatyki</title>
        <me
Tekst z https://wmi.amu.edu.pl/dla-kandydata:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Dla Kandydata | Wydział Matematyki i Informatyki</title>
        <meta name="viewport"
Tekst z https://wmi.amu.edu.pl/wydzial/dih:
<!DOCTYPE html>
<html lang="pl-PL" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://ogp.me/ns/fb#" >
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, i
Tekst z https://wmi.amu.edu.pl/wydzial/nauczyciel-mistrz-innowator:
<!DOCTYPE html>
<!--[if IE 7]>
<html class="ie ie7" lang="pl-PL">
<![endif]-->
<!--[if IE 8]>
<html class="ie ie8" lang="pl-PL">
<![endif]-->
<!--[if !(IE 7) & !(IE 8)]><!-->
<html lang="pl-PL
Tekst z https://wmi.amu.edu.pl/dla-kandydata/studia-i-stopnia/matematyka:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Matematyka | Wydział Matematyki i Informatyki</title>
        <meta name="viewport" co
Tekst z https://wmi.amu.edu.pl/zycie-naukowe/wyklady-i-seminaria:
<!DOCTYPE html>
<html lang="pl" class="no-js inner ">
    <head>
        <meta charset="utf-8" />
        <title> Wykłady i seminaria | Wydział Matematyki i Informatyki</title>
        <meta name="vie
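
The recorded output above shows raw HTML rather than text, and the home page appears several times under fragment-only addresses such as `#main`, `#top` and `#accept-cookie`. The sketch below is one possible way to address both points, not part of the original solution. The helper names `html_to_text` and `internal_links` and the `sample` document are assumptions made for illustration. It relies on `BeautifulSoup.get_text` to extract the text and on `urllib.parse.urldefrag` to strip fragments.

```python
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the visible text of an HTML document, one fragment per line."""
    soup = BeautifulSoup(html, 'html.parser')
    # Drop elements whose content is never shown to the reader.
    for tag in soup(['script', 'style', 'noscript']):
        tag.decompose()
    lines = (line.strip() for line in soup.get_text(separator='\n').splitlines())
    return '\n'.join(line for line in lines if line)


def internal_links(base_url, html):
    """Return same-host links with fragments removed, so '#top' variants collapse."""
    base_host = urlparse(base_url).netloc
    links = set()
    for a in BeautifulSoup(html, 'html.parser').find_all('a', href=True):
        absolute, _fragment = urldefrag(urljoin(base_url, a['href']))
        if urlparse(absolute).netloc == base_host:
            links.add(absolute)
    return links


# Hypothetical sample page used instead of a live request.
sample = """<html><head><title>Demo</title><style>p {color: red}</style></head>
<body><p>Hello <b>world</b></p>
<a href="#top">Top</a> <a href="/wydzial">Wydział</a>
<a href="https://example.org/x">External</a></body></html>"""

print(html_to_text(sample))
print(sorted(internal_links('https://wmi.amu.edu.pl/', sample)))
```

In the exercise, `html_to_text` could be applied to the result of `scrape_wmi`, and `internal_links` could stand in for `scrape_wmi_links`. Addresses that differ only by a trailing slash would still be treated as distinct pages.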

The techniques discussed above also work very well for dictionary resources.
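
As a minimal illustration, the sketch below extracts headword and translation pairs from a small dictionary-like HTML fragment. The markup, the class names `entries`, `headword` and `translation`, and the function name `extract_entries` are all hypothetical. A real dictionary site uses its own structure, which has to be inspected first.

```python
from bs4 import BeautifulSoup

# Hypothetical dictionary markup; real sites use their own tags and classes.
sample = """
<ul class="entries">
  <li><span class="headword">shtëpi</span> <span class="translation">house</span></li>
  <li><span class="headword">ujë</span> <span class="translation">water</span></li>
  <li><span class="headword">libër</span></li>
</ul>
"""


def extract_entries(html):
    """Map each headword to its translation, skipping incomplete entries."""
    soup = BeautifulSoup(html, 'html.parser')
    entries = {}
    for item in soup.select('ul.entries li'):
        headword = item.find('span', class_='headword')
        translation = item.find('span', class_='translation')
        if headword is not None and translation is not None:
            entries[headword.get_text(strip=True)] = translation.get_text(strip=True)
    return entries


print(extract_entries(sample))
```

The entry without a translation is skipped, so the result contains only complete pairs.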

Exercise 4: Download as many Albanian words as possible from glosbe.com.

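import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def scrape_shqip(start_url="https://glosbe.com/sq/en", max_pages=20, link_class=None):
    # Collect Albanian words from glosbe.com by following dictionary links.
    # link_class: optional CSS class of the word links (the second variant of
    # this solution used 'PhMEF'). By default every link whose href contains
    # '/sq/' is treated as a word entry.
    slowa = set()
    to_visit = [start_url]
    visited = set()

    # The page limit guarantees termination; the original loop requested
    # the same URL forever whenever the server answered with status 200.
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)
        if url in visited:
            continue
        visited.add(url)

        try:
            response = requests.get(url, timeout=10)
        except requests.RequestException as e:
            print(f"Could not connect to {url}: {e}")
            continue
        if response.status_code != 200:
            print(f"Failed to fetch {url} (status {response.status_code})")
            continue

        soup = BeautifulSoup(response.text, 'html.parser')
        if link_class:
            items = soup.find_all('a', class_=link_class, href=True)
        else:
            items = [a for a in soup.find_all('a', href=True) if '/sq/' in a['href']]

        for item in items:
            slowo = item.get_text().strip()
            if slowo:
                slowa.add(slowo)
                # Queue the linked page so that further words can be found.
                to_visit.append(urljoin(url, item['href']))

        time.sleep(1)  # pause between requests to avoid overloading the server

    return sorted(slowa)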