Lemmas are taken from Henry George Liddell and Robert Scott, A Greek-English Lexicon.
Models are available in releases.
task | Accuracy | Accuracy Ambiguous |
---|---|---|
case | 0.9612 | 0.8854 |
degree | 0.9926 | 0.9596 |
gender | 0.9436 | 0.8296 |
lemma | 0.954 | 0.9097 |
mood | 0.9913 | 0.957 |
num | 0.9841 | 0.9589 |
pers | 0.9864 | 0.9219 |
pos | 0.9287 | 0.8805 |
tense | 0.9917 | 0.9588 |
voice | 0.9915 | 0.9606 |
- Run `build.py` to get the "simple" training data.
  - Warning: default output is NFKD.
- Run `build-normalized.py` to get NFD and NFC data.
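The difference between the three normalization forms can be illustrated with Python's standard `unicodedata` module. This is a minimal sketch and not the code of `build.py` or `build-normalized.py`; the helper name `normalization_forms` and the sample strings are assumptions made for the example.

```python
import unicodedata

def normalization_forms(text):
    """Return the NFC, NFD and NFKD forms of `text` (hypothetical helper)."""
    return {form: unicodedata.normalize(form, text) for form in ("NFC", "NFD", "NFKD")}

# "μῆνιν" written with a precomposed eta + perispomeni (U+1FC6)
word = "\u03bc\u1fc6\u03bd\u03b9\u03bd"
forms = normalization_forms(word)
for name, value in forms.items():
    # Show the length and the code points of each form
    print(name, len(value), [f"U+{ord(c):04X}" for c in value])

# U+03D0 (GREEK BETA SYMBOL) is left alone by NFD but mapped to a plain beta by NFKD
beta_symbol = "\u03d0"
print(normalization_forms(beta_symbol)["NFD"] == beta_symbol)
print(normalization_forms(beta_symbol)["NFKD"] == "\u03b2")
```

NFC keeps accented letters as single precomposed code points, while NFD and NFKD split them into a base letter followed by combining marks. NFKD additionally replaces compatibility characters, so it can change more than accents. Choose the form that matches what the downstream tokenizer or model expects.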
- "Gorman Trees", Vanessa Gorman, University of Nebraska-Lincoln, https://github.com/perseids-publications/gorman-trees, https://doi.org/10.5281/zenodo.3596009
- "Daphne Trees", Francesco Mambrini, https://github.com/perseids-publications/daphne-trees
- "Pedalion Trees", Toon Van Hal et al., https://github.com/perseids-publications/cst-trees
- "Perseus Treebank Data", G. Celano et al., https://github.com/PerseusDL/treebank_data
- "Harrington Trees", J. Matthew Harrington, https://github.com/perseids-publications/harrington-trees.git
The annotation quality of the following sources is unknown to me (gold? silver? bronze? wood?):
- https://github.com/ezhenrik/sematia-tb
- https://github.com/DigitalHill/treebank-data
- https://github.com/danielrruf/AristarchusTreebank-Lit
- https://github.com/Drewlatimer/student-data
- https://github.com/polinayordanova/Treebank-of-Aphtonius-Progymnasmata
Licences are those of the original repositories. Converted data inherits the
Mozilla Public Licence.
- 1,068,131 tokens, including 115,412 punctuation signs
- 56,133 different sentences
91 distinct characters found:
Char | Count |
---|---|
(space) | 7743 |
" | 4219 |
% | 4 |
' | 6745 |
( | 704 |
) | 702 |
, | 142218 |
- | 7085 |
. | 66860 |
0 | 1 |
1 | 5727 |
2 | 3197 |
3 | 1616 |
4 | 2 |
: | 7638 |
; | 7268 |
< | 72 |
> | 74 |
? | 137 |
[ | 577 |
] | 571 |
j | 3 |
{ | 1 |
~ | 38 |
· | 31204 |
ʽ | 17 |
̀ | 230277 |
́ | 1123673 |
̄ | 25 |
̆ | 8 |
̈ | 3682 |
̓ | 584276 |
̔ | 287290 |
͂ | 249187 |
ͅ | 38177 |
Α | 24953 |
Β | 1412 |
Γ | 1957 |
Δ | 4253 |
Ε | 7741 |
Ζ | 2358 |
Η | 2125 |
Θ | 2724 |
Ι | 4642 |
Κ | 9669 |
Λ | 5939 |
Μ | 6123 |
Ν | 1777 |
Ξ | 728 |
Ο | 3754 |
Π | 9063 |
Ρ | 2739 |
Σ | 6155 |
Τ | 5237 |
Υ | 586 |
Φ | 3391 |
Χ | 903 |
Ψ | 34 |
Ω | 346 |
α | 957329 |
β | 53775 |
γ | 152992 |
δ | 248067 |
ε | 880724 |
ζ | 23108 |
η | 294280 |
θ | 112297 |
ι | 845411 |
κ | 294851 |
λ | 281371 |
μ | 315232 |
ν | 617318 |
ξ | 30632 |
ο | 968199 |
π | 330404 |
ρ | 379429 |
ς | 479697 |
σ | 271423 |
τ | 541687 |
υ | 398026 |
φ | 81370 |
χ | 95052 |
ψ | 8992 |
ω | 340318 |
ϝ | 13 |
— | 388 |
‘ | 2 |
’ | 5404 |
“ | 4 |
† | 74 |
⏑ | 4 |
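A character inventory like the table above can be produced with a few lines of Python. This is a minimal sketch under assumptions: the function name `char_counts` and the two sample tokens are hypothetical, and the use of NFKD is inferred from the combining marks appearing as separate rows, not taken from the repository's scripts.

```python
from collections import Counter
import unicodedata

def char_counts(tokens):
    """Count characters over all tokens after NFKD decomposition (assumed form)."""
    counts = Counter()
    for token in tokens:
        # Decomposition splits accented letters into base letter + combining marks,
        # so each diacritic is counted as its own character.
        counts.update(unicodedata.normalize("NFKD", token))
    return counts

# Hypothetical sample; in practice the tokens would come from the built training data.
counts = char_counts(["μῆνιν", "ἄειδε"])
print(f"{len(counts)} chars found")
for char, n in sorted(counts.items()):
    print(f"{char} | {n} |")
```

Sorting by code point reproduces the ordering of the table: ASCII punctuation first, then combining diacritics, then the Greek capitals and lowercase letters.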