synthetic_errors/tokenizer.py


2022-04-24 20:48:00 +02:00
import spacy


class Tokenizer:
    """Thin wrapper around spaCy's Polish tokenizer."""

    def __init__(self):
        # Load the large Polish pipeline (requires:
        # python -m spacy download pl_core_news_lg).
        self.polish_tokenizer = spacy.load('pl_core_news_lg')

    def tokenize(self, text):
        # Use only the tokenizer component, skipping the rest of the
        # pipeline (tagger, parser, NER) for speed.
        return [tok.text for tok in self.polish_tokenizer.tokenizer(text)]
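

# Usage sketch (hypothetical, not part of the original file; assumes the
# pl_core_news_lg model is installed via
# `python -m spacy download pl_core_news_lg`):
if __name__ == '__main__':
    tokenizer = Tokenizer()
    print(tokenizer.tokenize('Ala ma kota.'))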