synthetic_errors/tokenizer.py
2022-04-24 20:48:00 +02:00

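import spacy


class Tokenizer:
    """Split Polish text into tokens using spaCy's pl_core_news_lg pipeline."""

    def __init__(self):
        self.polish_tokenizer = spacy.load('pl_core_news_lg')

    def tokenize(self, text):
        # Run only the tokenizer component, skipping the rest of the pipeline.
        return [tok.text for tok in self.polish_tokenizer.tokenizer(text)]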
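
A minimal usage sketch of the same tokenization approach, assuming spaCy is installed. The helper name `load_polish_pipeline` and the fallback to `spacy.blank("pl")` are assumptions added for illustration and are not part of the original module; the fallback only lets the example run when the `pl_core_news_lg` package has not been downloaded.

```python
import spacy


def load_polish_pipeline(model_name="pl_core_news_lg"):
    """Load the named spaCy model, or a blank Polish pipeline if it is missing."""
    try:
        return spacy.load(model_name)
    except OSError:
        # spacy.load raises OSError when the model package is not installed.
        # Hypothetical fallback: a blank pipeline still has the Polish tokenizer rules.
        return spacy.blank("pl")


nlp = load_polish_pipeline()
# Call the tokenizer component directly, as Tokenizer.tokenize does.
tokens = [tok.text for tok in nlp.tokenizer("Ala ma kota.")]
print(tokens)
```

Calling `nlp.tokenizer(text)` rather than `nlp(text)` skips tagging, parsing, and entity recognition, which is why the class can afford to load a large model and still tokenize quickly.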