This repository contains pretrained models for Pie (A Framework for Joint Learning of Sequence Labeling Tasks).
More on Pie: https://github.com/emanjavacas/pie.
Models are arranged by language. TODO: add a JSON documentation file per model.
german-ren.model.tar
: Lemmatizer pretrained on a subset of the Referenzkorpus Mittelniederdeutsch/Niederrheinisch: https://www.slm.uni-hamburg.de/ren.html
spanish-AnCora.model.tar
: Lemmatizer pretrained on the AnCora corpus for Spanish (part of Universal Dependencies)
french-Geste.model.tar
: Lemmatizer pretrained on the Geste corpus
fro-poslemmes_cat-lemma-2019_01_22-02_34_11.tar
: Lemmatizer and POS tagger trained on the Geste corpus and other Old French data from the École des chartes.
Target task: lemma.
Accuracy on test data
lemma: 0.9383
pos: 0.9473
fro-poslemmes_cat-lemma-2019_01_23-00_34_12
: same as the previous one, but using pre-trained word embeddings from a large unlabelled corpus.
Target task: lemma.
Accuracy on test data
lemma: 0.9409
pos: 0.9468
fro-poslemmes_cat-lemma-2019_01_24-00_05_57.tar
: same as the previous one, but using convolutions (cnn) for the character embeddings.
Target task: lemma.
Accuracy on test data
lemma: 0.9462
pos: 0.9509
model_fro_poslemmesmorph.tar
: POS-tagger, lemmatizer and morphological analyzer trained on the Geste corpus
capitula.model.tar
: Lemmatizer pretrained on a non-open-source dataset of Medieval Latin
turkish-IMST.model.tar
: Lemmatizer pretrained on the IMST corpus for Turkish (part of Universal Dependencies)
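The three fro-poslemmes_cat models listed above differ by only a few tenths of a percentage point in accuracy, so it can help to compare them by relative error reduction instead. The sketch below is only an illustration: the accuracy values are copied from the list above, while the helper names `error_reduction` and `compare_to_first` are made up for this example and are not part of Pie.

```python
# Test accuracies copied from the model list above.
REPORTED = {
    "fro-poslemmes_cat-lemma-2019_01_22-02_34_11.tar": {"lemma": 0.9383, "pos": 0.9473},
    "fro-poslemmes_cat-lemma-2019_01_23-00_34_12": {"lemma": 0.9409, "pos": 0.9468},
    "fro-poslemmes_cat-lemma-2019_01_24-00_05_57.tar": {"lemma": 0.9462, "pos": 0.9509},
}


def error_reduction(baseline_acc, new_acc):
    """Relative reduction in error rate when moving from baseline_acc to new_acc."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def compare_to_first(reported, task):
    """Map each later model to its error reduction relative to the first model."""
    names = list(reported)  # dicts keep insertion order
    base = reported[names[0]][task]
    return {name: error_reduction(base, reported[name][task]) for name in names[1:]}


if __name__ == "__main__":
    for task in ("lemma", "pos"):
        for name, gain in compare_to_first(REPORTED, task).items():
            print(f"{task:5s} {name}: {gain:+.1%} error reduction vs. first model")
```

A negative value means the later model makes more errors than the first one on that task; with the numbers above this happens for pos with the pre-trained-embeddings model (0.9468 against 0.9473).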
lemma.config.json is an example config file for training a lemmatizer to reasonably good accuracy.
For more information, see the Pie repository linked above; in short:
virtualenv env -p python3.7
source env/bin/activate
pip3 install -r requirements.txt
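Once the environment is active, it may be useful to check what a downloaded model archive contains before handing it to Pie. The sketch below uses only Python's standard `tarfile` module and makes no assumption about the internal layout of the archives, which is not documented here; the function name `list_archive` and the script name `inspect_model.py` are hypothetical.

```python
import sys
import tarfile


def list_archive(path):
    """Return (member name, size in bytes) for every entry in a tar archive."""
    # Mode "r:*" lets tarfile detect plain or compressed archives on its own.
    with tarfile.open(path, "r:*") as archive:
        return [(member.name, member.size) for member in archive.getmembers()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python inspect_model.py german-ren.model.tar
    for name, size in list_archive(sys.argv[1]):
        print(f"{size:>12d}  {name}")
```

This only lists the archive members and does not load the model; loading and tagging are handled by Pie itself.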