From 1d4cb555576c20bc1c737afcdde22477834e3411 Mon Sep 17 00:00:00 2001
From: Dominik
Date: Tue, 30 Mar 2021 22:45:49 +0200
Subject: [PATCH] Updated "README"

---
 README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/README.md b/README.md
index e69de29..5c871a5 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,11 @@
+# Task "robot haskell"
+
+## Sample output:
+```ShadowItem {url = Just "https://aneks.kulturaliberalna.pl/wp-content/uploads/2016/02/51%C3%94%C3%87%C3%B452-With-Watermark.pdf", title = "\nNr 51\8211\&52 1988\n Wy\347wietl ca\322y numer (PDF)", itype = "periodical", originalDate = Just "2016", creator = Nothing, format = Just "pdf", lang = Just "pol", finalUrl = "https://aneks.kulturaliberalna.pl/wp-content/uploads/2016/02/51%C3%94%C3%87%C3%B452-With-Watermark.pdf", description = Nothing}```
+
+## Extracting information about the .pdf files from the page https://aneks.kulturaliberalna.pl/archiwum-aneksu/:
+```haskell
+extractRecords = extractLinksWithText "//a[contains(@title,'Aneks') and contains(text(),'Nr')]"
+                 >>> second (arr $ replace "\r\n " " ")
+                 >>> first (extractLinksWithText "//div/a[contains(@href,'.pdf')]") -- fetch the page at the URL and extract the links on that page that match the XPath expression
+                 -- the end result should be triples? ((link, text: "Wyświetl cały numer"), magazine issue number)```
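
The `extractRecords` pipeline in the README chains arrows with `>>>`, `second`, and `first`. The sketch below shows how those combinators produce the nested `((link, text), issue number)` shape mentioned in the final comment. It uses plain functions from `Control.Arrow` rather than the project's own arrows. `fakeExtract`, `pipeline`, and the sample strings are hypothetical stand-ins, and `unwords . words` only approximates the README's `replace` call. The real `extractLinksWithText` fetches a page and evaluates an XPath query, and its type is not shown in this patch.

```haskell
import Control.Arrow (first, second, (>>>))

-- Hypothetical stand-in for a link extractor: turns a URL into a
-- (pdf link, link text) pair without any network access.
fakeExtract :: String -> (String, String)
fakeExtract url = (url ++ "/issue.pdf", "link text")

-- Mirrors the shape of extractRecords with pure functions:
--   1. produce a (link, label) pair,
--   2. normalize whitespace in the label with `second`,
--   3. expand the link into a nested pair with `first`.
pipeline :: String -> ((String, String), String)
pipeline =
      (\url -> (url, "Nr 1\r\n 1973"))
  >>> second (unwords . words)
  >>> first fakeExtract

main :: IO ()
main = print (pipeline "https://example.org")
```

`second` transforms only the right-hand component of the pair and `first` only the left-hand one, which is why the issue label passes through the last step untouched.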