How to deal with proposed obsoletions (or proposed changes) overall #49
Trying to figure out what that would look like… Do you already have a clear idea in mind? Would it take the form of annotations on the class concerned by the change? For example:
would yield an annotation like this:
? |
Syntax-wise: if we want to allow changes to be “proposed” (rather than acted upon directly) on a case-by-case basis (e.g. we want to allow people to describe both changes that must be implemented directly and changes that are merely “proposed”), we could have a For example:
? |
Probably not! The two options are
I was thinking more along the lines of 2, but there are all kinds of issues here. We are "polluting" the ontology with individuals. We have an implicit mapping of RDF predicates to annotation properties (APs), with all the dangers that entails. It's quite unnatural from the point of view of what we normally do in OBO ontologies. But it's very generic and can be used for any change type...
I think this makes sense, but often the "maybe" will be implicit. E.g. the way I think the Mondo editors want to work is that it's always a maybe, and there is an explicit state transition from maybe to actual. |
It would be very easy to add information about proposed changes in that form. However, if the information is intended to be queryable (e.g. someone wants to know which terms are slated for obsoletion), that may not be the most practical approach. One would first need to extract the annotation (e.g. with SPARQL) and then parse the KGCL to figure out what the change is (is it an obsoletion or some other kind of change?). If we are already set on storing the information directly in the ontology/KG, I’d be inclined to go all the way and store it in a “native” form that can be manipulated and queried like any other content in the ontology.
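To make the cost of the string-annotation approach concrete, here is a minimal Python sketch of the second step. It assumes the annotations have already been extracted (e.g. with SPARQL) as `(subject, KGCL string)` pairs. It only recognises a hypothetical, simplified subset of KGCL commands by their leading keyword. `classify` and `terms_slated_for_obsoletion` are illustrative names, not part of any KGCL library.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical, simplified subset of KGCL DSL keywords and the change types
# they denote; a real consumer would need a complete KGCL parser.
KEYWORD_TO_CHANGE_TYPE = {
    "obsolete": "NodeObsoletion",
    "rename": "NodeRename",
    "create": "NodeCreation",
}


def classify(kgcl_command: str) -> str:
    """Guess the change type from the first word of a KGCL command string."""
    match = re.match(r"\s*(\w+)", kgcl_command)
    if match is None:
        return "Unknown"
    return KEYWORD_TO_CHANGE_TYPE.get(match.group(1), "Unknown")


def terms_slated_for_obsoletion(annotations: Iterable[Tuple[str, str]]) -> List[str]:
    """Second step: parse every extracted annotation to find the obsoletions."""
    return sorted(
        subject
        for subject, command in annotations
        if classify(command) == "NodeObsoletion"
    )


# First step (not shown): extract (subject, annotation value) pairs, e.g. with
# SPARQL. These pairs are made-up sample data.
extracted = [
    ("UBERON:1111111", "obsolete UBERON:1111111"),
    ("UBERON:3333333", "rename UBERON:3333333 from 'foo' to 'bar'"),
]
print(terms_slated_for_obsoletion(extracted))
```

A consumer that only wants to know which terms are pending obsoletion still ends up carrying a KGCL parser like this. That is the practical argument for a more "native" representation.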
So a change about an existing class is rendered as an annotation on the class. Likewise, a change about a relationship (e.g. kgcl:PredicateChange) can be rendered as an annotation on the axiom representing the relationship to be modified. But I am unsure about a change that proposes to create a new class. Where would we store that (what would we be annotating)? Ideas:
OK. I imagine something as follows:
|
I have a (very) rough, (very) experimental PoC in my wip/provisional-changes branch. It only supports This adds two new commands to my KGCL plugin for ROBOT:
|
Or, we could have a dedicated annotation property (e.g.
I think this would make slightly more sense, and it would also make it slightly easier to later extract the annotations to apply the pending changes – just extract all the |
This is quite ingenious.
Of course it’s not ideal having two mappings to RDF/OWL, but the lack of quoting in RDF (sigh) makes the direct form impractical, as you point out. We might want to have a different namespace for the AP translation, just in case anyone ever wants to combine the two forms?
The lack of uniformity between NewX and other changes on X is bothering me a little. In some ways, having all proposals be ontology annotations is more balanced. But having them on the "about" entity is more direct and visible. Then again, maybe having them on the ontology makes it easier to see all pending changes in one place, like the TODOs at the top of a file?
Curious what others think!
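Here is a minimal Python sketch of the separate-namespace idea. The annotation-property (AP) form gets its own IRIs, so AP-form assertions and direct-form triples could coexist in one graph without sharing predicates. `https://w3id.org/kgcl/` is the KGCL namespace used in this thread. The `ap/` sub-namespace and both function names are hypothetical.

```python
KGCL_NS = "https://w3id.org/kgcl/"
# Hypothetical namespace reserved for the annotation-property (AP) translation.
KGCL_AP_NS = "https://w3id.org/kgcl/ap/"


def to_annotation_property(kgcl_iri: str) -> str:
    """Map a direct-form KGCL IRI to its counterpart in the AP namespace."""
    if not kgcl_iri.startswith(KGCL_NS) or kgcl_iri.startswith(KGCL_AP_NS):
        raise ValueError(f"not a direct-form KGCL IRI: {kgcl_iri}")
    return KGCL_AP_NS + kgcl_iri[len(KGCL_NS):]


def from_annotation_property(ap_iri: str) -> str:
    """Inverse mapping, from the AP namespace back to the direct form."""
    if not ap_iri.startswith(KGCL_AP_NS):
        raise ValueError(f"not an AP-form KGCL IRI: {ap_iri}")
    return KGCL_NS + ap_iri[len(KGCL_AP_NS):]


ap = to_annotation_property("https://w3id.org/kgcl/NodeObsoletion")
print(ap)
print(from_annotation_property(ap))
```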
On Fri, Feb 9, 2024 at 4:02 AM Damien Goutte-Gattat wrote:
If we do end up storing the provisional changes as annotations, I’d suggest that we avoid using anonymous individuals like in my example above. They cause at least two issues:
- They are not rendered very nicely in Protégé. [Screenshot: https://github.com/INCATools/kgcl/assets/53821801/55dcdf49-8433-4bb3-ba31-00384db1315c]
- More importantly, they cannot be stored in an OBO file (not even, surprisingly, in the owl-axioms header tag). That is obviously a problem for ontologies that use this format as their edit format. Sure, this can be worked around by storing the provisional changes in a dedicated component in another format, but that’s not great.
Instead, I’d suggest something like this:
AnnotationAssertion(
Annotation(kgcl:has_nondirect_replacement UBERON:2222222)
kgcl:NodeObsoletion UBERON:1111111 "Proposed for obsoletion"^^xsd:string)
where the annotation value is a small, human-readable string.
This would make the provisional change nicer to visualise in Protégé [screenshot: https://github.com/INCATools/kgcl/assets/53821801/f553aac3-1ab8-4d64-a6b3-7b4a7d16b03b]
and make it perfectly serialisable in the OBO format:
property_value: https://w3id.org/kgcl/NodeObsoletion "Proposed for obsoletion" xsd:string {https://w3id.org/kgcl/has_nondirect_replacement="UBERON:2222222"}
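The OBO serialisation above is mechanical enough to generate. Here is a minimal Python sketch. `render_obo_property_value` is a hypothetical helper, not part of KGCL-Java or ROBOT. It reproduces the `property_value` line for this example and does not escape quotes inside values.

```python
from typing import Dict


def render_obo_property_value(change_type_iri: str, value: str,
                              qualifiers: Dict[str, str]) -> str:
    """Render a provisional change as an OBO property_value line.

    The value is a short human-readable xsd:string, and the parameters of the
    change (axiom annotations in OWL) become trailing {key="value"} qualifiers.
    Simplification: quotes or backslashes inside values are not escaped.
    """
    line = f'property_value: {change_type_iri} "{value}" xsd:string'
    if qualifiers:
        rendered = ", ".join(f'{key}="{val}"' for key, val in qualifiers.items())
        line += " {" + rendered + "}"
    return line


obo_line = render_obo_property_value(
    "https://w3id.org/kgcl/NodeObsoletion",
    "Proposed for obsoletion",
    {"https://w3id.org/kgcl/has_nondirect_replacement": "UBERON:2222222"},
)
print(obo_line)
```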
|
Me too. One particular problem with class creation changes is that such changes would almost never occur in isolation. At the very least, a I am reluctant to use ontology-level annotations to represent all the changes that pertain to a single (new) class. I am not even sure I have a clear idea of what that would look like. For now, I think the best option is to actually create the new class and then annotate it with all the other changes relevant to that class. But I am also curious about what other folks have to say. An obvious problem with creating the new class immediately is the risk that people start using the class, perhaps because they don’t realise it is provisional. But that’s something that can be checked in CI. |
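The CI check mentioned here could be fairly small. Here is a minimal Python sketch. It assumes the ontology has been flattened into `(subject, predicate, object)` tuples. It also assumes a not-yet-accepted class is recognised by a `kgcl:PendingChange` annotation whose value is `kgcl:NodeCreation`. Both the encoding and the `EX:` identifiers are assumptions for illustration.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed encoding: a class whose creation is still pending carries this
# annotation with the NodeCreation change type as its value.
PENDING_PREDICATE = "kgcl:PendingChange"
CREATION = "kgcl:NodeCreation"


def provisional_classes(triples: Iterable[Triple]) -> Set[str]:
    """Classes whose creation is itself still a pending change."""
    return {s for s, p, o in triples if p == PENDING_PREDICATE and o == CREATION}


def premature_uses(triples: List[Triple]) -> List[Triple]:
    """Flag triples in which a non-provisional entity refers to a provisional class."""
    provisional = provisional_classes(triples)
    return [t for t in triples if t[2] in provisional and t[0] not in provisional]


# Placeholder identifiers; EX:0000009 is the new, still-provisional class.
triples = [
    ("EX:0000009", PENDING_PREDICATE, CREATION),
    ("EX:0000009", "rdfs:subClassOf", "EX:0000001"),  # fine: the new class itself
    ("EX:0000002", "rdfs:subClassOf", "EX:0000009"),  # premature use
]
for violation in premature_uses(triples):
    print("Provisional class used by another entity:", violation)
```

Existing classes that merely have a pending obsoletion are deliberately not treated as provisional. They are already in use, so references to them should not fail the check.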
For clarity, the proposed storage methods envisioned so far are (assuming, for the example, that we want to store the change represented by
A. String annotation using the KGCL DSL serialisation format
Simple, but not great, as it requires whoever wants to know what the change is to re-parse the KGCL change.
B. Parameters stored in annotations; the value is an individual
The use of an anonymous individual is annoying (it is not displayed in a user-friendly manner in Protégé and cannot be serialised properly in the OBO flat file format).
C. Similar, but the value is a human-readable string
Intended to address the issues with B (displayed nicely in Protégé, serialisable in OBO).
D. The value is the type of KGCL change
Also reasonably user-friendly, and slightly more developer-friendly (just one annotation property to look for when querying for pending changes in an ontology). That’s what is currently implemented in KGCL-Java. (Note that |
I finally got around to reading this proposal. This is extensive, and I think I like the general
I would advocate for a system built purely around annotation properties, one that avoids individuals, anonymous or otherwise. Some of our tools, like SLME, are pretty unreasonable when it comes to individuals (even signature-disjoint individuals). I don't know about anonymous individuals, but my guess is we will have some OBO format issues with them. FWIW, my favourite is D (assuming that kgcl:PendingChange and kgcl:NodeObsoletion are both APs):
This allows me to SPARQL for all pending changes at once. |
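The following Python sketch does roughly what `SELECT ?entity ?type WHERE { ?entity kgcl:PendingChange ?type }` would do, working over `(subject, predicate, object)` tuples. With option D, a single predicate finds every pending change, and its value already says which type of change it is. The tuple encoding and sample data are assumptions. The axiom annotations carrying each change's parameters are left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

PENDING = "kgcl:PendingChange"


def pending_changes_by_type(triples: List[Triple]) -> Dict[str, List[str]]:
    """Group the subjects of all pending changes by their KGCL change type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for subject, predicate, change_type in triples:
        if predicate == PENDING:
            grouped[change_type].append(subject)
    return dict(grouped)


# Made-up sample data.
triples = [
    ("UBERON:1111111", PENDING, "kgcl:NodeObsoletion"),
    ("UBERON:1111111", "rdfs:label", "some structure"),
    ("UBERON:3333333", PENDING, "kgcl:NodeRename"),
]
print(pending_changes_by_type(triples))
```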
Definitely. Though to be honest I don’t care that much about that. People who want to be able to use new features such as this one should be ready and willing to let go of old stuff like the OBO format (or be willing to invest the time to make the format evolve – and at the moment you say that, suddenly there’s nobody around anymore). If we can store “provisional changes“ in such a way that they can be preserved in the OBO format, that’s good, but if we can’t, I’ll just shrug. “Yeah, it won’t work in OBO. What did you expect? We can’t keep retrofitting new stuff into an old format.”
Good, because that’s what we currently have in KGCL-Java. :) Modulo two things:
Yep, that was the idea behind this form. I also think it is shown nicely in Protégé, for example
is shown as: and
is shown as: And it’s not too ugly in the OBO format either:
|
Of course we could also have a KGCL plugin for Protégé that recognises these annotations and displays the pending changes in an even nicer way and… no, I’ll stop this line of thought right now. |
I don't think OBO format limitations are relevant for this. Regardless of expressivity, we can keep using OBO format (sorry) and just store these in a separate imported .owl file; we have this pattern for a lot of ontologies.
|
Right. I can add a |
It seems we are tending towards (D), and it is quite a practical solution. However, I want to fully articulate my original idea of a direct representation. Every instance in KGCL can be represented either in the KGCL DSL or as a standard LinkML serialization of the underlying data.
Class:
Command:
YAML:
Turtle:
I think the most elegant and long-term maintainable approach is to use this direct RDF form, augmented with standard vocabularies to represent things such as pending status (see for example https://www.w3.org/TR/vocab-dcat-3/#life-cycle). E.g.:
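Here is a minimal Python sketch of what emitting the direct form could look like for one change object, reusing the `kgcl:` slots from the triples in this comment. `ex:status` and `ex:Pending` are placeholders for whatever life-cycle term would be chosen from a vocabulary such as DCAT 3. `change_to_turtle` and `literal` are hypothetical helpers, and prefix declarations are omitted.

```python
import json
from typing import List, Tuple


def literal(text: str) -> str:
    """Quote a string as a Turtle literal.

    JSON's string escapes are also valid escapes in a double-quoted Turtle
    literal, so json.dumps does the quoting and escaping.
    """
    return json.dumps(text, ensure_ascii=False)


def change_to_turtle(change_id: str, change_type: str,
                     slots: List[Tuple[str, str]],
                     status: str = "ex:Pending") -> str:
    """Serialise one KGCL change object in the direct RDF form.

    `slots` holds (predicate, already-rendered Turtle object) pairs.
    ex:status / ex:Pending are placeholders for a life-cycle term taken from
    a standard vocabulary. Prefix declarations are omitted.
    """
    pairs = [("rdf:type", change_type), *slots, ("ex:status", status)]
    body = " ;\n    ".join(f"{predicate} {obj}" for predicate, obj in pairs)
    return f"{change_id} {body} ."


ttl = change_to_turtle(
    "CHANGE:001",
    "kgcl:NodeRename",
    [
        ("kgcl:about_node", "GO:0005635"),
        ("kgcl:about_node_representation", literal("curie")),
        ("kgcl:old_value", literal("nuclear envelope")),
        ("kgcl:new_value", literal("foo bar")),
    ],
)
print(ttl)
```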
It involves no new mappings and no new annotation properties. Semantically and entailment-wise, it behaves entirely as expected. I can query for NodeChanges and get all asserted instances of subclasses of NodeChange. There is no need to develop new SPARQL queries to check for things such as accidental use of annotation properties. For example, if I accidentally make triples such as:
CHANGE:001 kgcl:about_node GO:0005635 ;
    kgcl:about_node_representation "curie" ;
    kgcl:new_value "foo bar" ;
    kgcl:new_value "foo baz" ;
    kgcl:old_value "nuclear envelope" ;
    rdf:type kgcl:NodeRename .
then existing mechanisms will flag this (note the two kgcl:new_value values). The existing triples could be loaded into a massive triplestore of all changes in all OBO ontologies, with powerful querying over the direct representation, using the standard KGCL vocabulary. I think this is elegant, clean and the correct way to do it. I also think it is the approach with the best long-term maintainability and the least cognitive overhead. Still, I reluctantly accept that it has some short-term downsides. Our OWL stacks make various assumptions about individuals, and people get confused by punning in OWL. While I think these problems are solvable, I don't have an immediate answer to how to resource fixing them. So I am likely to accept mapping all triples (including rdf:type) to annotation properties. It's just one more mapping and piece of tacit knowledge. But I wanted to make sure the full proposal was given due consideration. |
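In a real RDF stack, the "existing mechanisms" would presumably be schema-driven validation, for instance SHACL or ShEx shapes, which LinkML can generate from a schema. As a self-contained stand-in, this Python sketch flags slots asserted more than once on the same change. Which slots count as single-valued is an assumption written out by hand here, not read from the KGCL schema.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Assumption: these KGCL slots are single-valued. A real check would derive
# this from the KGCL schema instead of a hand-written set.
SINGLE_VALUED = {
    "kgcl:about_node",
    "kgcl:about_node_representation",
    "kgcl:new_value",
    "kgcl:old_value",
}


def cardinality_violations(triples: List[Triple]) -> Dict[str, List[str]]:
    """Return, per subject, the single-valued slots that have several values."""
    # An RDF graph is a set, so exact duplicate triples are collapsed first.
    counts = Counter((s, p) for s, p, _ in set(triples) if p in SINGLE_VALUED)
    violations: Dict[str, List[str]] = {}
    for (subject, predicate), n in sorted(counts.items()):
        if n > 1:
            violations.setdefault(subject, []).append(predicate)
    return violations


change = [
    ("CHANGE:001", "kgcl:about_node", "GO:0005635"),
    ("CHANGE:001", "kgcl:about_node_representation", '"curie"'),
    ("CHANGE:001", "kgcl:new_value", '"foo bar"'),
    ("CHANGE:001", "kgcl:new_value", '"foo baz"'),
    ("CHANGE:001", "kgcl:old_value", '"nuclear envelope"'),
    ("CHANGE:001", "rdf:type", "kgcl:NodeRename"),
]
print(cardinality_violations(change))
```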
We are not tending towards anything. I implemented (D) only for a small subset of changes (2 or 3, I don’t even remember), so that we can test how it works. It’s not set in stone. For now, my main concern with this whole idea is that it seems to be nothing more than a discussion between the two of us. I have yet to see any hint that other people are interested, which gives me very little motivation to go any further, in any direction. |
Just an aside for @gouttegd:
KGCL is a complex issue, and this particular feature even more so; I would not expect any specific input until people notice how your proposal will affect their files and tools. |
Well then, don’t expect any work on that proposal from me until that happens. (And if it does not happen, so be it.) Motivation matters. I do most of my work on KGCL in my own free time (look at the history of KGCL-Java: 112 out of 136 commits are associated with my personal email address and not my Cambridge email address, which means I made those commits from my personal machine outside of my work hours), because it’s far too removed from the work I am actually paid to do for me to be comfortable working on it during my work hours. As far as I am concerned, KGCL-Java is merely one of the several free software projects that I either develop or contribute to. It has nothing to do with my work. Which means, among other things, that I work on it if and when I want, and that it is in “competition” with those other free software projects for my free time. If I am not motivated to work on it, I don’t. I myself have almost zero use for KGCL. And certainly zero use at all for the “storing provisional changes in the ontology” feature. So without any hint that the feature is going to be useful for someone, I am unlikely to do anything more than what I have done so far. I’d much rather spend my free time working on SSSOM, which I do use and for which I have (too many) ideas of things to improve. So, again, motivation matters. You can’t rely on people inventing completely new features in isolation, only for you to come and say “oh cool, I’ll use that, thanks!”. Well, you can, but you’re gonna wait for a long time. You want new features, you have to participate at some point, if only to say what you would like. |
@gouttegd I know I speak for many when I express how grateful we are that you do so much work for KGCL, Uberon and other projects on your own time. I totally get wanting to get confirmation that something will be useful before pouring time into it. After all, if you're not paid for your work on a project, the main reward you get is when people use (and, even better, build on) what you've written. |
@gouttegd I can't argue with that :) I see it the same, and I personally divide my time across things by the same measure! Too many construction sites, too few people to stem the tide. |
The work I do for Uberon and CL is on my paid time. :) Part of my remit is basically “anything that can make cross-species scRNAseq studies easier to do”, so contributing to cross-species ontologies fits without a doubt. (And I suppose I could argue that, because KGCL is being used on Uberon and CL, it could also be shoehorned into that remit, but I believe this is too far-fetched.) Anyway, sorry for going off-topic. Back to KGCL provisional changes! |
Currently KGCL has a simple data model where each type represents a change or set of changes. A change object can be thought of as a proposition, and can have metadata added to it.
In some cases, ontologies may want to represent the proposition directly in the ontology without fully enacting the change. This is most prominent in the case of obsoletion. There, we may want metadata about a proposed obsoletion to live in the ontology, where it is queryable, for a period such as one or two months. However, we can imagine this for any change type.
An additional challenge here is that the mechanism for representing propositions in an ontology is not as standardized as, for example, obsoletion (which is itself not as standardized as it could be). E.g. in Mondo, things may go into a Mondo-specific "obsoletion_subset".
Some options here:
One is to discourage the notion of storing propositions in the ontology. If you want to query for propositions (such as proposed obsoletions), then query the GitHub issue and PR repository. There is a clear separation of concerns here: the ontology represents the current state of things, and we use infrastructure intended for propositions to store propositions. However, this is not an ideal solution; e.g. do we expect all ontology browsers to implement some complex ingest mechanism?
Another is to create a collection of shadow classes, e.g. ProposedObsoletion, ProposedNodeMove, ... This is fairly awkward though.
Another option is to add a flag to all classes such as "partial: bool". The actual changes applied by an apply agent would vary depending on the setting of this flag. We can even imagine having maturity levels, etc.
The simplest option might be that if something is a proposition we simply insert the change object as KGCL triples into the ontology. The ontology simply stores its own change. This may encounter resistance as people might like continuing to use familiar mechanisms such as oio:subset, IAO IDs etc.
What will probably sit best with existing ontologies is a way to customize how an apply command works on an ontology-specific basis, perhaps making "partial" a parameter of the application function.
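Here is a minimal Python sketch combining the last two options. `partial` is a parameter of the apply function, and how a proposition gets recorded is delegated to a per-ontology strategy. Every name here (`Change`, `apply_change`, the two strategies, the `obsoletion_subset` value) is hypothetical and not the API of any existing KGCL implementation. The enacted obsoletion is reduced to a single `owl:deprecated` triple.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Change:
    """Hypothetical, minimal stand-in for a KGCL change object."""
    change_type: str  # e.g. "NodeObsoletion"
    about_node: str


# A proposal strategy decides how a proposed change is written into an ontology.
ProposalStrategy = Callable[[Change], List[Triple]]


def generic_proposal(change: Change) -> List[Triple]:
    """Default: record the proposition as a pending-change annotation."""
    return [(change.about_node, "kgcl:PendingChange", f"kgcl:{change.change_type}")]


def subset_proposal(change: Change) -> List[Triple]:
    """Ontology-specific variant: proposed obsoletions go into a subset."""
    if change.change_type == "NodeObsoletion":
        return [(change.about_node, "oboInOwl:inSubset", "obsoletion_subset")]
    return generic_proposal(change)


def apply_change(ontology: Set[Triple], change: Change, partial: bool = False,
                 propose: ProposalStrategy = generic_proposal) -> None:
    """Enact the change, or merely record it as a proposition if partial=True."""
    if partial:
        ontology.update(propose(change))
    elif change.change_type == "NodeObsoletion":
        # Greatly simplified enactment: real obsoletion also touches labels, etc.
        ontology.add((change.about_node, "owl:deprecated", "true"))
    else:
        raise NotImplementedError(change.change_type)


# Placeholder identifier; an ontology using subsets records only the proposal.
onto: Set[Triple] = set()
apply_change(onto, Change("NodeObsoletion", "EX:0000001"), partial=True,
             propose=subset_proposal)
print(onto)
```

Keeping the enactment logic in one place and passing the proposal strategy in lets each ontology keep its familiar mechanism (a subset, an annotation, and so on) without forking the apply code.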