Here is a comprehensive list of code, data, and models created during the Ajax Multi-Commentary project (AjMC), a four-year research project funded by the Swiss National Science Foundation (grant PZ00P1_186033). The project team included M. Romanello (principal investigator), C. Pletcher (research software engineer), and S. Najem-Meyer (PhD student).
- ajmc-kodon: Minimal computing implementation of the Ajax multi-commentary platform based on Kōdōn
- Kōdōn: Minimal-computing JavaScript library built on Svelte for publishing classical commentaries
- ajmc-elixir: Elixir implementation of the Ajax multi-commentary platform (discontinued)
- ajmc-pipeline: Pipeline to process digitised classical commentaries, implemented in Python
- ajmc_annotation_utils: Small Python package for dealing with documents annotated in INCEpTION
- inception-recommender: An external recommender for INCEpTION with support for Classics NER
TBA
- ajmc-public-commentaries: Image data of public domain commentaries
- OCR_artificial_data_sample: Synthetic dataset for multi-script text line recognition (sample)
- ReadableAjax: Basic HTML rendering of critical editions of Sophocles' Ajax encoded in TEI/XML
- ajmc-tei: TEI/XML exports of public domain commentaries from the Ajax Multi-Commentary project
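The TEI/XML exports above can be processed with standard XML tooling. The following is a minimal sketch using Python's standard library; the embedded fragment is a hypothetical simplification, as the actual ajmc-tei exports follow the full TEI schema and are considerably richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal TEI fragment (the real exports are richer).
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Commentary on Sophocles' Ajax</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Example commentary paragraph.</p>
    </body>
  </text>
</TEI>"""

# TEI elements live in the TEI namespace, so queries must be qualified.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def extract_title_and_paragraphs(tei_xml: str):
    """Return the document title and the text of all <p> elements."""
    root = ET.fromstring(tei_xml)
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS).text
    paragraphs = [p.text for p in root.findall(".//tei:body//tei:p", TEI_NS)]
    return title, paragraphs

title, paras = extract_title_and_paragraphs(TEI_SAMPLE)
print(title)
print(paras)
```

Note the namespace map: `find`/`findall` on TEI documents silently return nothing if the `http://www.tei-c.org/ns/1.0` namespace is not supplied.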
- OCR groundtruth: Ground-truth data for OCR of digitised classical commentaries
- PLA groundtruth: Ground-truth data for page layout analysis (PLA) of digitised classical commentaries
- NE-annotated corpus: Annotated corpus to support named entity recognition, entity linking, and citation mining on classical commentaries.
- Lemma Linkage corpus (coming soon!): Annotated corpus to support the recognition and linking of commentary lemmas.
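Named-entity corpora such as the NE-annotated corpus above are commonly distributed in CoNLL-style formats. The sketch below assumes a simple tab-separated token/IOB-tag layout and invented labels (`PERS`, `WORK`); the actual release format and tagset of the AjMC corpus may differ.

```python
def read_conll(lines):
    """Parse CoNLL-style 'token<TAB>tag' lines into sentences.

    Blank lines separate sentences (a common CoNLL convention).
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
        else:
            token, tag = line.split("\t")
            current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences

def extract_entities(sentence):
    """Collect (surface, label) spans from IOB-tagged tokens."""
    entities, tokens, label = [], [], None
    for token, tag in sentence + [("", "O")]:  # sentinel flushes last span
        if tag.startswith("B-"):
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens:
            tokens.append(token)
        else:
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    return entities

# Toy input with made-up labels, for illustration only.
sample = [
    "Sophocles\tB-PERS",
    "wrote\tO",
    "the\tO",
    "Ajax\tB-WORK",
    "",
]
sents = read_conll(sample)
print(extract_entities(sents[0]))  # [('Sophocles', 'PERS'), ('Ajax', 'WORK')]
```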
- OCR (optical character recognition)
  - OCR kraken models: Pre-trained Kraken OCR models for historical classical commentaries
  - OCR transformer model: Transformer-based text-line recognition model, implemented in PyTorch.
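OCR models like those above are typically evaluated against ground-truth transcriptions using character error rate (CER), i.e. the Levenshtein edit distance divided by the reference length. The source does not specify the project's evaluation setup; this is a generic, dependency-free sketch of the metric.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution
        prev = curr
    return prev[n] / max(m, 1)

# Two substitutions (missing diacritics) over 15 reference characters.
print(round(cer("Σοφοκλέους Αἴας", "Σοφοκλεους Αιας"), 4))  # 0.1333
```

Because the metric operates on Unicode code points, polytonic Greek diacritics count as errors when dropped, which matters for commentaries mixing Greek and Latin scripts.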
- PLA (page layout analysis)
  - Layout YOLO models: YOLOv5m models for page layout analysis of historical commentaries.
- Language Models
  - XLM-R-for-classics: XLM-RoBERTa-base model further pre-trained on a 1.4B-token multilingual corpus of classical texts.
- Task-specific models
  - XLM-R-for-classics-EvaLatinPOS: model fine-tuned for Latin part-of-speech tagging using the EvaLatin corpus.
  - XLM-R-for-classics-AjMCNER: model fine-tuned for historical named-entity recognition using the AjMC-NE-Corpus.
  - XLM-R-for-classics-AjMCLR: model fine-tuned for lemma recognition on commentary texts.