Handling of language tags in KGCL #60
Actually, even if we decided that there can only be one label (and so, that we can ignore all cases where there is more than one label as being invalid and “not-our-problem”), that wouldn’t solve the general issue: KGCL supports modifying other annotation properties than just
Given a case like this:
What should be the behaviour of How about:
This way the decision to ignore the language tags when matching would be an explicit decision. (Of course we could also do the opposite: no language tag means “ignore the language tags when matching”, and a Then at the level of the Ontobot, individual ontologies can configure the bot to pass to the KGCL engine a
Thanks, I think this makes sense, but I'd like to reverse it a little
Here there is a slight impedance mismatch with semantic web standards, where there is always a literal type commitment (people get caught by this all the time with SPARQL queries: a string match "works" on one ontology but not another unless the values are inefficiently coerced to strings). However, there is less of an impedance mismatch with user expectations.
I wouldn't hinge anything on You'll see this behavior when RDF passes through Jena, although confusingly the OWL spec was not updated to keep up.
OK. Regardless of whether ignoring the language tags is the default behaviour (your proposition) or must be explicitly asked (mine,
If the command is
Because:
This is also, I believe, what most users would expect, so this is fine. But what if the command is The “logical” (but not necessarily sensible) output would be:
Because:
Here I don’t think this is a desirable behaviour. Even worse, let’s imagine a term that has labels in several languages, and that in two languages the labels are actually the same string (this won’t be frequent but it may happen; words that are identical across languages are not unheard of). For example, say we have:
A command like I propose something like:
Admittedly this is a bit complicated, but I think that should cover all cases reasonably. For example, given the command
Overall this should work just fine for ontologies that have a mixture of untagged and tagged labels. Aside:
Note that with recent versions of the OWL API, a literal without an explicit datatype (as in
As discussed in INCATools/kgcl#60, when trying to find an annotation value (e.g. to rename a class, we need to find the annotation corresponding to the old label), language tags should be compared in a relaxed fashion. We should not fail to find an annotation just because the annotation has a language tag and the KGCL command did not specify any language tag at all. This necessitates some fairly significant refactoring because, among other things, more than one annotation value may now match (if several annotations have the same literal value but different language tags, or one has a language tag and another does not). This is still a work in progress. For now, this is implemented specifically for the NodeRename operation. After more testing, this will be generalized to all other operations that involve finding an existing annotation value (e.g. RemoveDefinition, ChangeDefinition, SynonymReplacement, etc.).
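To make the relaxed comparison concrete, here is a minimal Python sketch. It is not the code from the commit described above: the Annotation class and the find_matching function are hypothetical names introduced only for illustration. The no-tag branch follows the rule stated above (a command without a language tag should still find a tagged annotation). The explicit-tag branch (match only annotations carrying that same tag) is an assumption, since the exact rule agreed in this thread is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical model of a literal annotation value with an optional language tag."""
    value: str
    lang: Optional[str] = None


def find_matching(annotations: List[Annotation], value: str,
                  lang: Optional[str] = None) -> List[Annotation]:
    """Return every annotation whose text equals `value`.

    No tag in the command: language tags are ignored, so tagged and
    untagged annotations both match (the relaxed rule described above).
    Tag in the command: only annotations with that tag match, compared
    case-insensitively (an assumption made for this sketch).
    """
    matches = []
    for ann in annotations:
        if ann.value != value:
            continue
        if lang is None:
            matches.append(ann)
        elif ann.lang is not None and ann.lang.lower() == lang.lower():
            matches.append(ann)
    return matches


labels = [
    Annotation("heart", "en"),
    Annotation("heart"),        # same text, no language tag
    Annotation("coeur", "fr"),
]

# A command without a tag can match more than one annotation.
untagged_hits = find_matching(labels, "heart")
# A command with an explicit tag narrows the match.
english_hits = find_matching(labels, "heart", "en")
```

Because the function returns a list rather than a single value, the caller has to decide what to do when several annotations match, which is the refactoring concern raised above.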
Sorry for the delay. Thank you for the analysis. I agree with your proposed solution.
Currently handling of language tags is under-specified in KGCL, both in terms of
Recall also that most OBO ontologies use a mixture of uncommitted literals, xsd:string, and @en to denote English language labels.
As a general principle, the KGCL DSL is intended to be user-friendly. The user shouldn't have to know detailed implementation knowledge about each ontology. In fact it is very hard for them to know these details. As a case in point, for the following two terms in ENVO it's impossible to know from OLS that the first uses an explicit @en and the second does not:
At the most recent OMO meeting there was heated discussion about whether we should expect cardinality=1 of rdfs:label given that some ontologies may want to be international. It's not up to KGCL to adjudicate here. However, we can make things easy for users:
This does place more of a burden on implementors, as there needs to be some configuration mechanism, but having this default to untyped literals will work for pretty much all OBO ontologies for now.