Releases: LanguageMachines/uctodata

v0.11

26 Apr 10:20
  • allow NON_SPACING_MARKERs inside NUMBERs in all languages
  • README.md: removed LaMachine reference

v0.10.1

02 Apr 11:37

Small addition:

  • allow NON_SPACING_MARKERs inside NUMBERs

v0.10

02 Apr 07:10
  • modernized configuration step
  • the French list of known abbreviations was not used at all;
    it is used now
  • in English, "for" is no longer handled as an abbreviation unless
    it has a trailing dot ("for.")

v0.9.1

21 Jul 09:48

The new English Twitter config was not installed properly; this is fixed now.

v0.9

21 Jul 09:44

[Ko van der Sloot]

  • fix for PREFIX rules in French and Italian
  • small fix to prevent losing a character in the PREFIX rule (see LanguageMachines/ucto#87). This doesn't fix the unwanted splits, though.
  • added SYMBOL, PICTOGRAM and EMOTICON to setdefinitions
  • relaxed the e-mail rule a bit.

[Piroska Lendvai]

  • Suggestions for German abbreviations

[Antal van den Bosch]

  • New config file for English Twitter data. Recognizes and retains #hashtags and @mentions.

v0.8

29 Nov 15:05

[Ko van der Sloot]

  • separated .abr files from their main files for all languages
  • updated Italian data (thanks to @texttheater)

[Iris Hendricks]

v0.7.1

17 May 12:24

Bug fix release:

  • install some datafiles originally provided by 'ucto'

v0.7

16 May 14:25

[Ko vd Sloot]

  • tokconfig-nld-historical: typo in rule
  • updated all languages with new ABBREVIATION and NUMBER-ORDINAL rules:
    = accommodate ABBREVIATIONS within brackets.
    = avoid needless backtracking in NUMBER-ORDINAL

[Maarten van Gompel]

  • fixed an apparent bug in the Italian config

v0.6

04 Apr 20:08

Several fixes for problems raised in LanguageMachines/ucto#46.

Notes:

  • the suffix problems were already addressed in 0.5
  • the colon problem is not addressed. Do we need REVERSE-SMILEY?

v0.5

18 Oct 10:03
  • added a slightly adapted tokenizer configuration for historical Dutch
    (a bit more conservative on splitting).
    See INL/nederlab-linguistic-enrichment/#7