Here is a comprehensive list of code, data, and models created during the Ajax Multi-Commentary project (AjMC), a four-year research project funded by the Swiss National Science Foundation (grant PZ00P1_186033). The project team included M. Romanello (principal investigator), C. Pletcher (research software engineer), and S. Najem-Meyer (PhD student).
- ajmc-kodon: Minimal computing implementation of the Ajax multi-commentary platform based on Kōdōn
- Kōdōn: Minimal-computing JavaScript library built on Svelte for publishing classical commentaries
- ajmc-elixir: Elixir implementation of the Ajax multi-commentary platform (discontinued)
- ajmc-pipeline: Pipeline to process digitised classical commentaries, implemented in Python
- ajmc_annotation_utils: Small Python package for dealing with documents annotated in INCEpTION
- inception-recommender: An external recommender for INCEpTION with support for Classics NER
TBA
- ajmc-public-commentaries: Image data of public domain commentaries
- OCR_artificial_data_sample: Synthetic dataset for multi-script text line recognition (sample)
- ReadableAjax: Basic HTML rendering of critical editions of Sophocles' Ajax encoded in TEI/XML
- ajmc-tei: TEI/XML exports of public domain commentaries from the Ajax Multi-Commentary project
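The TEI/XML exports above can be processed with standard XML tooling. The following is a minimal sketch using Python's standard library; the embedded fragment is a hypothetical simplification, as the actual ajmc-tei exports follow the full TEI schema and are considerably richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal TEI fragment (the real exports are richer).
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Commentary on Sophocles' Ajax</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Example commentary paragraph.</p>
    </body>
  </text>
</TEI>"""

# TEI elements live in the TEI namespace, so queries must be qualified.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def extract_title_and_paragraphs(tei_xml: str):
    """Return the document title and the text of all <p> elements."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS).text
    paragraphs = [p.text for p in root.findall(".//tei:body//tei:p", TEI_NS)]
    return title, paragraphs

title, paras = extract_title_and_paragraphs(TEI_SAMPLE)
print(title)
print(paras)
```

Note the namespace map: `find`/`findall` on TEI documents silently return nothing if the `http://www.tei-c.org/ns/1.0` namespace is not supplied.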
- OCR groundtruth: Ground-truth data for OCR of digitised classical commentaries
- PLA groundtruth: Ground-truth data for page layout analysis (PLA) of digitised classical commentaries
- NE-annotated corpus: Annotated corpus to support named entity recognition, entity linking, and citation mining on classical commentaries.
- Lemma Linkage corpus (coming soon!): Annotated corpus to support the recognition and linking of commentary lemmas.
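Named-entity corpora such as the NE-annotated corpus above are commonly distributed in CoNLL-style formats. The sketch below assumes a simple tab-separated token/IOB-tag layout and invented labels (`PERS`, `WORK`); the actual release format and tagset of the AjMC corpus may differ.

```python
def read_conll(lines):
    """Parse CoNLL-style 'token<TAB>tag' lines into sentences.

    Blank lines separate sentences (a common CoNLL convention).
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
        else:
            token, tag = line.split("\t")
            current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences

def extract_entities(sentence):
    """Collect (surface, label) spans from IOB-tagged tokens."""
    entities, tokens, label = [], [], None
    for token, tag in sentence + [("", "O")]:  # sentinel flushes last span
        if tag.startswith("B-"):
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens:
            tokens.append(token)
        else:
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    return entities

# Toy input with made-up labels, for illustration only.
sample = [
    "Sophocles\tB-PERS",
    "wrote\tO",
    "the\tO",
    "Ajax\tB-WORK",
    "",
]
sents = read_conll(sample)
print(extract_entities(sents[0]))  # [('Sophocles', 'PERS'), ('Ajax', 'WORK')]
```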
- OCR (optical character recognition)
  - OCR kraken models: Pre-trained Kraken OCR models for historical classical commentaries
  - OCR transformer model: Transformer-based text-line recognition model, implemented in PyTorch.
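OCR models like those above are typically evaluated against ground-truth transcriptions using character error rate (CER), i.e. the Levenshtein edit distance divided by the reference length. The source does not specify the project's evaluation setup; this is a generic, dependency-free sketch of the metric.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(m, 1)

# Two substitutions (missing diacritics) over 15 reference characters.
print(round(cer("Σοφοκλέους Αἴας", "Σοφοκλεους Αιας"), 4))  # 0.1333
```

Because the metric operates on Unicode code points, polytonic Greek diacritics count as errors when dropped, which matters for commentaries mixing Greek and Latin scripts.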
- PLA (page layout analysis)
  - Layout YOLO models: YOLOv5m models for page layout analysis of historical commentaries.
- Language Models
  - XLM-R-for-classics: XLM-RoBERTa-base model further pre-trained on a 1.4B-token multilingual corpus of classical texts.
- Task-specific models
  - XLM-R-for-classics-EvaLatinPOS: model fine-tuned for Latin part-of-speech tagging using the EvaLatin corpus.
  - XLM-R-for-classics-AjMCNER: model fine-tuned for historical named-entity recognition using the AjMC-NE-Corpus.
  - XLM-R-for-classics-AjMCLR: model fine-tuned for lemma recognition on commentary texts.