mass-scraper/temat.md

## Intelligent automatic system for extracting information from WWW sites

### Original Polish title: Inteligentny automatyczny system ekstrakcji informacji z witryn sieci WWW

**Description:**
The goal of this master's project is to build an intelligent, automatic system that crawls the websites of all parishes and religious orders in Poland and extracts the times of Holy Mass from them. The system will expose the extracted data through an online search engine.

The master's thesis will discuss how systems of this kind are built and which algorithms they use for data extraction.
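
To make the extraction goal concrete, here is a minimal sketch of the simplest possible baseline: a keyword filter combined with a regular expression for clock times. It is not the method the thesis proposes. The function name `extract_mass_times`, both patterns, and the sample text are hypothetical and chosen only for illustration.

```python
import re
from typing import List

# Assumed pattern: a 24-hour clock time such as "7:00", "9.30" or "18:00".
TIME_PATTERN = re.compile(r"\b([01]?\d|2[0-3])[:.]([0-5]\d)\b")

# Assumed keyword filter: common inflections of the Polish word for Mass
# ("msza", "msze", "mszę", "mszy").
MASS_KEYWORD = re.compile(r"\bmsz[aeęy]\b", re.IGNORECASE)


def extract_mass_times(text: str) -> List[str]:
    """Return HH:MM times found on lines that mention Mass."""
    times = []
    for line in text.splitlines():
        # Skip lines that do not mention Mass (e.g. office hours).
        if not MASS_KEYWORD.search(line):
            continue
        for hour, minute in TIME_PATTERN.findall(line):
            # Normalize "7:00" and "9.30" to zero-padded "07:00" and "09:30".
            times.append(f"{int(hour):02d}:{minute}")
    return times


# Hypothetical snippet of text taken from a parish page.
sample = (
    "Msze święte w niedziele: 7:00, 9.30, 11:00, 18:00\n"
    "Kancelaria czynna: 16:00-17:00\n"
)
print(extract_mass_times(sample))
```

For the sample above this prints `['07:00', '09:30', '11:00', '18:00']`; the office hours on the second line are skipped because that line does not mention Mass. A baseline like this breaks easily: it would misread a date such as `12.05.2020` as a time, and it misses schedules laid out in tables or split across lines. Those weaknesses are the reason to study the extraction algorithms in the literature below.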

**Literature:**
* __Introduction to Information Retrieval__
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
Cambridge University Press. 2008.
* __Information extraction__
Jim Cowie, Wendy Lehnert
Communications of the ACM, Volume 39 Issue 1, Jan. 1996, Pages 80-91
* __Effective Information Extraction with Semantic Affinity Patterns and Relevant Regions__
Siddharth Patwardhan, Ellen Riloff
School of Computing University of Utah. 2007
* __Automatically Generating Extraction Patterns from Untagged Text__
Ellen Riloff
Department of Computer Science, University of Utah. 1996
* __Information extraction as a basis for high-precision text classification__
Ellen Riloff, Wendy Lehnert
ACM Transactions on Information Systems (TOIS), Volume 12 Issue 3, July 1994, Pages 296-333
* __Learning Information Extraction Rules for Semi-Structured and Free Text__
Stephen Soderland
Machine Learning, 1999, Volume 34, Number 1-3, Page 233