Data Sources
TAJSchaaf edited this page Aug 14, 2025 · 5 revisions
This project uses two distinct datasets to test the accuracy of each NLP model. Each dataset provides a gold standard (GS) for lemmatisation and part-of-speech (POS) tagging.
Early medieval prose from the Late Latin Charter Treebank
Data: ~25,000 tokens from the Late Latin Charter Treebank dev file
Early medieval glosses from GAMS Gloss-Vibe
Data: 202 glosses (666 tokens) from the Venerable Bede’s De Temporum Ratione
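As a rough illustration of how a model's output might be scored against a gold standard for lemmatisation and POS tagging, here is a minimal sketch. It assumes the gold and predicted tokens are already aligned one-to-one, and it uses exact string matching. The `Token` type, the `accuracy` function and the four example tokens are hypothetical and made up for illustration; they are not taken from either dataset or from this project's code.

```python
from typing import Dict, NamedTuple, Sequence


class Token(NamedTuple):
    """One annotated token (hypothetical structure, for illustration only)."""
    form: str
    lemma: str
    upos: str


def accuracy(gold: Sequence[Token], predicted: Sequence[Token]) -> Dict[str, float]:
    """Return lemma and POS accuracy, assuming one-to-one token alignment."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    total = len(gold)
    # Count exact matches between the gold annotation and the model output.
    lemma_hits = sum(g.lemma == p.lemma for g, p in zip(gold, predicted))
    upos_hits = sum(g.upos == p.upos for g, p in zip(gold, predicted))
    return {"lemma": lemma_hits / total, "upos": upos_hits / total}


# Toy example (invented tokens, not drawn from the project's data).
gold = [
    Token("in", "in", "ADP"),
    Token("nomine", "nomen", "NOUN"),
    Token("domini", "dominus", "NOUN"),
    Token("dei", "deus", "NOUN"),
]
predicted = [
    Token("in", "in", "ADP"),
    Token("nomine", "nomine", "NOUN"),   # wrong lemma
    Token("domini", "dominus", "NOUN"),
    Token("dei", "deus", "PROPN"),       # wrong POS tag
]
scores = accuracy(gold, predicted)
print(scores)
```

Exact matching is the simplest choice; a real evaluation may need to normalise spelling or case before comparing lemmas, and to handle models that tokenise the text differently from the gold standard.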