add DATA_VERSION 2 to froggen.settings #7

Open
wants to merge 1 commit into
base: master

@tcbrouwer tcbrouwer commented Apr 12, 2022

I tried to run frog-dum and frog-nld-vnn from docker.

docker run proycon/frog -c /usr/share/frog/dum/frog.cfg

This results in

/usr/share/frog/dum/froggen.settings has a PROBLEM!
No DATA_VERSION setting is found but this version of MbT expects at least 2
Cannot read settingsfile /usr/share/frog/dum/froggen.settings
ucto: textcat configured from: /usr/share/ucto/textcat.cfg
frog-:Initialization failed for: [tagger] 
frog-:fatal error: Frog init failed

For now I work around this by extending the Docker image and copying in custom settings files with the DATA_VERSION setting prepended.

COPY froggen.settings /usr/share/frog/dum/froggen.settings
COPY froggen.settings /usr/share/frog/nld-vnn/froggen.settings
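For reference, a minimal sketch of how the patched settings files could be produced before building the image. The helper name `add_data_version` is hypothetical, and the exact syntax of the setting (a single line reading `DATA_VERSION 2`) is an assumption based on the error message and the title of this PR, so check it against an up-to-date settings file first.

```shell
#!/bin/sh
# Prepend a DATA_VERSION line to an Mbt settings file, unless one is already there.
# ASSUMPTION: the setting is written as "DATA_VERSION 2" on a line of its own.
# add_data_version is a hypothetical helper, not part of Frog or Mbt.
add_data_version() {
    settings_file="$1"
    version="${2:-2}"

    # Leave the file untouched if it already declares a data version.
    if grep -q '^DATA_VERSION' "$settings_file"; then
        return 0
    fi

    tmp_file="$(mktemp)" || return 1
    # Write the new first line followed by the original content, then copy the
    # result back in place so the original file permissions are preserved.
    { printf 'DATA_VERSION %s\n' "$version"; cat "$settings_file"; } > "$tmp_file" &&
        cat "$tmp_file" > "$settings_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

Running `add_data_version froggen.settings` on a local copy of each settings file leaves files that already contain the setting unchanged, so it is safe to run more than once.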

However, I do not know whether the files actually are DATA_VERSION 2. If they are, this PR seems to fix the problem. Otherwise I need another solution.

@kosloot
Contributor

kosloot commented Apr 12, 2022

Well, although this trick works, it would be much better to regenerate the data files for use with the newest Mbt (the main improvement being that it handles Unicode issues better).
I did find the data file for dum, but cannot access it (@proycon could you change the rights on the server @ru ?)
I didn't yet find the original data for nld-vnn. I'm not sure where it stems from. @proycon any idea?

@tcbrouwer For the time being, your hack is sufficient. A regenerated Mbt file will most likely give only slightly different results (for example in the confidence values).

@proycon
Member

proycon commented Apr 12, 2022

@tcbrouwer Thanks for the pull request. I didn't realize that this was now broken out of the box; we definitely need to release a fixed frogdata then. Your fix looks like a quick and acceptable one, unless we indeed want to regenerate the data properly, as @kosloot suggests.

(@proycon could you change the rights on the server @ru ?)

Done!

The best source however is https://github.com/INL/nederlab-linguistic-enrichment/tree/master/resources (it's a private repo at INT but you probably still have access there from back then). This was all done in the scope of the Nederlab project in 2018.

I didn't yet find the original data for nld-vnn. I'm not sure where it stems from. @proycon any idea?

That's this one: frog-bab-cgn.

I think the source material is non-free unfortunately, hence the private repo.

kosloot added a commit that referenced this pull request Apr 13, 2022
@kosloot
Contributor

kosloot commented Apr 13, 2022

the 'dum' data is now updated.
nld-vnn is still to be done

@proycon
Member

proycon commented Jul 22, 2022

This is now released in frogdata v0.21, but nld-vnn remains to be done, so the current state is not ideal.
