- The global corpus of pre-modern texts is small compared with modern corpora and, notwithstanding the occasional discovery, does not grow.
- The state of the art in natural-language processing consists of data-hungry machine-learning techniques.
- Hence, if research in low-resource languages like Ancient Greek, Latin, Old English, Pali, Sanskrit and Classical Chinese is to be able to leverage these tools long-term, a strategy for maximizing the data inherent in the small corpus must be adopted.
- Like any corpus, the pre-modern corpus can be subjected to data augmentation methods like "sliding window" and shuffling, but data augmentation only gets you so far.
- The real superpower of the corpus lies in the wealth of alternative readings present in critical editions. Every single
- Alternative readings can be either editorial conjectures or readings attested in differing manuscripts.
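
The points above can be illustrated with a minimal Python sketch. It is not code from this repository: the function names (`sliding_windows`, `shuffled_copies`, `expand_readings`) and the segment format are assumptions made for illustration, and the variant tokens are synthetic placeholders rather than attested readings. The sketch shows generic sliding-window and shuffling augmentation, and then how a passage with alternative readings could be expanded into one text per combination of readings.

```python
import itertools
import random


def sliding_windows(tokens, size, stride=1):
    """Return every contiguous window of `size` tokens, stepping by `stride`."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, stride)]


def shuffled_copies(sentences, n_copies, seed=0):
    """Return `n_copies` reordered copies of a list of sentences."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    copies = []
    for _ in range(n_copies):
        copy = list(sentences)
        rng.shuffle(copy)
        copies.append(copy)
    return copies


def expand_readings(segments):
    """Expand a passage into one text per combination of alternative readings.

    Hypothetical input format: each item is either a plain string (no
    variants at that point) or a list of strings (the competing readings).
    """
    options = [seg if isinstance(seg, list) else [seg] for seg in segments]
    return [" ".join(choice) for choice in itertools.product(*options)]


# Sliding window over the opening words of the Aeneid.
tokens = "arma virumque cano Troiae qui primus ab oris".split()
windows = sliding_windows(tokens, size=4, stride=2)

# Synthetic placeholder passage: two points of variation, two readings each.
passage = ["w1", ["w2a", "w2b"], "w3", ["w4a", "w4b"]]
variants = expand_readings(passage)

for text in variants:
    print(text)
```

Note that the number of expanded texts is the product of the number of readings at each point of variation, so a real pipeline would probably need to sample combinations rather than enumerate them all.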
Urdatorn/kritikos
About
Leveraging AI to streamline the OCR of critical editions of pre-modern texts.