Specification of file format for GraphML export and import #314

Closed
kdahlquist opened this issue Aug 3, 2016 · 15 comments

@kdahlquist
Collaborator

I'm creating this issue in parallel to #309 for discussion of GraphML. I'm going to close #287 and #288 because the basic functionality there has been implemented and we are now fine-tuning. It makes more sense now to take on the two formats separately.

I have now read through the GraphML Primer and have done a visual inspection of the GRNsight GraphML exports of an unweighted and weighted graph.

I can match up the XML elements in the primer with what GRNsight has exported, and just have one question so far:

  • Why did you settle on "edge-value-id" as the name of the data key for the weight value? What's throwing me off is the use of the "id" in this term. I have a hard time wrapping my head around a data value being an id. Shouldn't this be called something like "edge-weight-value" or "edge-value" without the "id"? Maybe I don't know about a convention you are using?

I am now proceeding with further testing of *.graphml files moving between programs.

@dondi
Owner

dondi commented Aug 3, 2016

Re: "edge-value-id," I just added "-id" because I wanted to explicitly indicate that this expression functions as an identifier. However, in the end what is important is that the identifier is consistently used, so "edge-value" works just as well.

@kdahlquist
Collaborator Author

I'm OK with "id" still being there; it was more of a question. I don't want my misunderstanding to trump clarity.

@kdahlquist
Collaborator Author

We might consider setting the graph id. When Cytoscape exports a graph to GraphML format, it uses the name of the file as the graph id. I think this is a good thing. See the relevant line below.

<graph edgedefault="directed" id="21-genes_31-edges_Schade-data_input.graphml">

@dondi
Owner

dondi commented Aug 5, 2016

The following items have been implemented (acknowledging that there may be more to come):

  • edge-value is now the key name for the weight attribute, instead of edge-value-id
  • The filename is used as the graph id attribute
  • Although not explicitly discussed, I figured that we will also want the option to export a weighted network as either unweighted or weighted GraphML. Doing this aligned well with the implementation for SIF, so I went ahead and did it for GraphML also. Issue #283 ("Update documentation page with new sections and descriptions of new features") has a comment describing the user interface for this, and the feature is also available on the beta site.
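
For readers following along, here is a minimal Python sketch of what an export with these three properties could look like. It is not GRNsight's actual code: the function name `build_graphml`, the `attr.name` value `weight`, and the `double` type are assumptions made for illustration. Only the key id `edge-value`, the filename-as-graph-id convention, and the weighted/unweighted option come from the list above.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
# Serialize GraphML elements without a prefix (default namespace).
ET.register_namespace("", GRAPHML_NS)


def q(tag):
    """Return the namespace-qualified form of a GraphML tag name."""
    return f"{{{GRAPHML_NS}}}{tag}"


def build_graphml(filename, edges, weighted=True):
    """Build a GraphML string from (source, target, weight) tuples.

    Hypothetical helper: the graph id is set to the filename, and edge
    weights are written under the key id "edge-value" when weighted is True.
    """
    root = ET.Element(q("graphml"))
    if weighted:
        # attr.name and attr.type are assumed values for this sketch.
        ET.SubElement(root, q("key"), {
            "id": "edge-value",
            "for": "edge",
            "attr.name": "weight",
            "attr.type": "double",
        })
    graph = ET.SubElement(root, q("graph"),
                          {"id": filename, "edgedefault": "directed"})
    # Declare every node that appears as a source or a target.
    for name in sorted({n for s, t, _ in edges for n in (s, t)}):
        ET.SubElement(graph, q("node"), {"id": name})
    for source, target, weight in edges:
        edge = ET.SubElement(graph, q("edge"),
                             {"source": source, "target": target})
        if weighted:
            data = ET.SubElement(edge, q("data"), {"key": "edge-value"})
            data.text = str(weight)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_graphml("demo.graphml", [("A", "B", 0.5)]))
    print(build_graphml("demo.graphml", [("A", "B", 0.5)], weighted=False))
```

Passing `weighted=False` drops both the key declaration and the per-edge data elements, which is one simple way to get an unweighted export from a weighted network.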

@dondi
Owner

dondi commented Aug 8, 2016

Just noting here that #311 and #312 have been addressed, with a suggestion that #310 be de-prioritized due to the effort involved. I think those are all of the offshoots from this issue.

@kdahlquist
Collaborator Author

I think we are close to closing this, pending my testing results tomorrow.

@dondi
Owner

dondi commented Aug 9, 2016

Cool, looking forward to it :)

@kdahlquist
Collaborator Author

Please take a look at #310 and #321. I think those are fairly minor things we can do to increase our interoperability with yED. I'd like to try to implement those before closing the book on this.

@kdahlquist
Collaborator Author

Question:
I'm looking into the FAIR standards for writing the discussion and have a couple of questions about the GraphML format.

  1. Is there any way to encode a data usage license into the GraphML (like specifying CC-BY or something)?
  2. Is there any way to encode the provenance of the exported data? For example, "Data exported with GRNsight v1.16" and a URL to the release or something?
  3. By virtue of being XML, GraphML self-defines its vocabulary (at least as far as I can tell from my limited work with it over the last week to define and debug our export); i.e., part of the issue we are having is that different programs define things in slightly different ways.

These questions pertain to the I (interoperable) and R (reusable) parts of the standard.

@dondi
Owner

dondi commented Aug 10, 2016

I’ll have to look deeper into the GraphML spec for a definitive answer. If there is no direct support (such as metadata elements or attributes), we can fall back on a comment block.

Also, the variability in vocabulary that we are seeing is outside the XML definition. There is a specific GraphML schema available. The issue is that the schema itself defines this generic key mechanism which turns out to be too generic, as we have seen. So the discrepancy between Cytoscape and yED doesn’t come from the XML aspect; it comes from the way the XML schema is defined that permits this wide-open use of generic keys.

But yes, this needs more research, and in the absence of something more structured, we can go to a comment.
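
To illustrate the generic key mechanism, here is a small Python sketch of one way an importer could cope with it: resolve each `<data key="...">` through the matching `<key>` declaration and index values by `attr.name` instead of by the exporter-chosen id. The function name `edges_by_attr_name`, the sample documents, and the key id `d5` are hypothetical; they are not taken from GRNsight, Cytoscape, or yED output.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def edges_by_attr_name(graphml_text):
    """Return (source, target, data) tuples with data keyed by attr.name.

    Each exporter is free to pick its own key ids, so the ids are resolved
    through the <key> declarations instead of being matched literally.
    """
    root = ET.fromstring(graphml_text)
    # Map key id -> declared attr.name, falling back to the id itself.
    names = {k.get("id"): k.get("attr.name", k.get("id"))
             for k in root.findall("g:key", NS)}
    edges = []
    for edge in root.iterfind(".//g:edge", NS):
        data = {names.get(d.get("key"), d.get("key")): (d.text or "").strip()
                for d in edge.findall("g:data", NS)}
        edges.append((edge.get("source"), edge.get("target"), data))
    return edges


# Two hypothetical exports that differ only in the key id they chose.
SAMPLE_A = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<key id="edge-value" for="edge" attr.name="weight" attr.type="double"/>'
    '<graph id="g" edgedefault="directed">'
    '<node id="A"/><node id="B"/>'
    '<edge source="A" target="B"><data key="edge-value">1.5</data></edge>'
    '</graph></graphml>'
)
SAMPLE_B = SAMPLE_A.replace("edge-value", "d5")

if __name__ == "__main__":
    print(edges_by_attr_name(SAMPLE_A))
    print(edges_by_attr_name(SAMPLE_B))
```

Both samples yield the same result because the lookup goes through `attr.name`. This only helps when two programs agree on the attribute name; if they also disagree there, a mapping table would still be needed.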

@kdahlquist
Collaborator Author

In the interests of FAIR, we want to add a comment to the GRNsight-exported GraphML:

  • Exported by GRNsight v1.16.
  • http://dondi.github.io/GRNsight/
  • link to the release
  • time/date stamp

@dondi
Owner

dondi commented Aug 11, 2016

This has been implemented and placed on beta. I leapfrogged to v1.18 though, since that is the version we anticipate releasing (import-export + displaying weights). Once we are back to a long-term development track, we can keep the release branch at v1.18 but bump the beta branch to v1.19 when exporting (unless you had other thoughts on how to proceed).
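
As a rough illustration of the comment-based fallback, the Python sketch below inserts a provenance comment as the first child of the root element. It is not the GRNsight implementation: the function name `add_provenance_comment` and the exact comment wording are assumptions, and any release URL passed in would be supplied by the caller.

```python
import xml.etree.ElementTree as ET


def add_provenance_comment(root, version, urls):
    """Insert an XML comment with the exporter version and URLs under root.

    Hypothetical helper: the wording of the comment is an assumption.
    """
    text = f" Exported by GRNsight v{version}. {' '.join(urls)} "
    # XML forbids "--" inside a comment body, so reject it up front.
    if "--" in text:
        raise ValueError("comment text must not contain '--'")
    root.insert(0, ET.Comment(text))
    return root


if __name__ == "__main__":
    root = ET.fromstring("<graphml><graph id='demo.graphml'/></graphml>")
    add_provenance_comment(root, "1.18", ["http://dondi.github.io/GRNsight/"])
    print(ET.tostring(root, encoding="unicode"))
```

A comment is invisible to the GraphML schema, so other programs will ignore it on import; that is the trade-off for not relying on a generic key that each program might interpret differently.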

@kdahlquist
Collaborator Author

I'll check the functionality in the morning when I can pay more attention to detail. I assume that you are somehow detecting versions automatically when exporting. It seems like a lot of work to manually adjust that at release time (it seems like it would be easy to forget, like the last modified date on the web pages.) Otherwise, yeah, we were going to release v1.18 next.

@kdahlquist
Collaborator Author

I've verified that the comment with the version and two URLs has come through in the GraphML export from both Firefox and Chrome.

I note that there was no date/timestamp, however.

In the interests of time, I'm willing to let go of that; we need to move on to the release tasks (#323).

I did one final round of checking the following for all the demo files and options:
GRNsight-->SIF-->BioTapestry, Cytoscape
GRNsight-->GraphML-->Cytoscape, yED
Cytoscape-->SIF-->GRNsight
Cytoscape-->GraphML-->GRNsight
yED-->GraphML-->GRNsight

and everything checks out.

I'm going to close this now.

@dondi
Owner

dondi commented Aug 11, 2016

Ah yes, I forgot to mention: the date/timestamp was tricky because it's hard to unit test. Additional code will need to be written to assert that the comment is present but either ignore a discrepancy in the date/timestamp or somehow anticipate what the date/timestamp should be (reading the date/timestamp first and then invoking the export is not guaranteed to match, because the test might cross a time unit boundary). We can get back to that when there is more elbow room.
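
The two options described above can be sketched in Python as follows. This is only an illustration under assumptions: the function names `provenance_comment` and `mask_timestamp`, the comment wording, and the ISO-style timestamp format are all hypothetical. The first option lets the test pass in a fixed clock value so the expected string is known in advance; the second masks the timestamp before comparing, so the discrepancy is ignored.

```python
import re
from datetime import datetime, timezone

# Matches the timestamp format produced by provenance_comment below.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def provenance_comment(version, now=None):
    """Build the export comment; tests may pass a fixed `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"<!-- Exported by GRNsight v{version} on {stamp} -->"


def mask_timestamp(text):
    """Replace any timestamp with a placeholder so outputs compare equal."""
    return TIMESTAMP.sub("<TIMESTAMP>", text)


if __name__ == "__main__":
    # Option 1: anticipate the timestamp by injecting a fixed clock value.
    fixed = datetime(2016, 8, 11, 12, 0, 0, tzinfo=timezone.utc)
    print(provenance_comment("1.18", now=fixed))
    # Option 2: ignore the timestamp by masking it before comparing.
    print(mask_timestamp(provenance_comment("1.18")))
```

Injecting the time avoids the boundary-crossing problem entirely, since the test never reads the real clock.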
