Lemmatizing medieval French with Pie & RNNTagger

This repository contains a couple of notebooks for easily generating a language model for Pie and RNNTagger to tag and lemmatize medieval French, with no prior local installation and with faster training. Any training dataset can be used, provided the language-specific parameters are properly configured in each tool's parameters file. The notebooks also make it possible to tag files from our Drive with the generated models.

Short description of the task

We've trained a model for medieval French with RNNTagger and Pie in order to tag a number of texts with the Cattex09 morphosyntactic labels. Two different corpora are used for training:

  • BFMGOLDLEM corpus, fully annotated with parts of speech (UD and Cattex POS tags), including a number of morphological labels. The lemmas in this corpus were standardized in previous work. This corpus consists of 431,144 tokens across 20 texts (36.1MB).
  • BFMGOLD corpus, where only a small number of texts include all lemmas. It contains 1,187,061 tokens across 42 texts (75.4MB).
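
To put the two corpus sizes side by side, here is a minimal Python sketch that derives the average text length and the size ratio from the figures listed above. The `Corpus` class and the `summarize` helper are hypothetical names used only for illustration; they are not part of the notebooks, Pie, or RNNTagger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """Hypothetical container for the corpus figures quoted in this README."""
    name: str
    tokens: int
    texts: int

    @property
    def mean_tokens_per_text(self) -> float:
        # Average number of tokens per text in the corpus.
        return self.tokens / self.texts


# Token and text counts copied from the list above.
BFMGOLDLEM = Corpus("BFMGOLDLEM", tokens=431_144, texts=20)
BFMGOLD = Corpus("BFMGOLD", tokens=1_187_061, texts=42)


def summarize(small: Corpus, large: Corpus) -> dict:
    """Return rounded averages and the token-count ratio large/small."""
    return {
        small.name: round(small.mean_tokens_per_text, 1),
        large.name: round(large.mean_tokens_per_text, 1),
        "token_ratio": round(large.tokens / small.tokens, 2),
    }


if __name__ == "__main__":
    for key, value in summarize(BFMGOLDLEM, BFMGOLD).items():
        print(f"{key}: {value}")
```

By these figures, BFMGOLD holds roughly 2.75 times as many tokens as BFMGOLDLEM, with somewhat longer texts on average.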

A complete description can be found here [French]