Export data to cytoscape sif and graphml format #287

Closed
kdahlquist opened this issue Jul 20, 2016 · 17 comments

@kdahlquist
Collaborator

This has been separated from issue #59 because image export is a different issue than data export.

Reviewer 1 comment (#278):

"As it does not accept a standard input file type, the output of any other network analysis package requires conversion in to the matrix format required here. Similarly, the tool provides no export function (the option in the File menu remained stubbornly greyed out) and so I can’t take a network from GRNsight and utilise it elsewhere. I also can’t use GRNsight to convert the GRNmap format to something I might like to use elsewhere.

The authors refer to future features coming in version 2 (lines 323-329). I encourage them to consider implementing at least one standard filetype for displaying graph data within their tool. Be it sif, graphml or even gml, it would significantly increase the utility of the tool as it currently exists."

We need to look into import/export of adjacency matrix in sif, graphml, or gml format.

@kdahlquist
Collaborator Author

kdahlquist commented Jul 20, 2016

Another format to investigate that was mentioned by a BOSC reviewer is MIMIx.

http://www.nature.com/nbt/journal/v25/n8/full/nbt1324.html

@kdahlquist
Collaborator Author

Notes on BioTapestry: http://www.biotapestry.org/

  • written in Java
  • latest version 7.0.0, released 9/22/14
  • locally installable executable
  • BioTapestry Viewer runs in a web browser as of version 7.0.0
  • It supports import of
    • root network from SIF
      • SIF is "simple interaction file", see http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats
      • space-delimited file (except when you want to use space characters, then you should use tab-delimited)
      • Lines in the SIF file specify a source node, a relationship type (or edge type), and one or more target nodes:
      • nodeA <relationship type> nodeB
      • nodeD <relationship type> nodeE nodeF nodeB
      • where <relationship type> is a two-letter code, of which we would use "pd" for protein -> DNA
    • full model hierarchy from CSV
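
The SIF line layout noted above can be sketched as a small parser. This is only an illustration, not code from GRNsight or BioTapestry; the function name `parse_sif` is hypothetical, and the delimiter rule (any tab in the file switches the whole file to tab-delimited) is my reading of the Cytoscape manual, so treat it as an assumption.

```python
def parse_sif(text):
    """Parse SIF text into (source, relationship, target) triples.

    Assumed delimiter rule: if the file contains any tab character,
    fields are tab-delimited and spaces belong to the names;
    otherwise any whitespace separates fields.
    """
    use_tabs = "\t" in text
    edges = []
    for line in text.splitlines():
        if use_tabs:
            fields = [f.strip() for f in line.split("\t") if f.strip()]
        else:
            fields = line.split()
        if len(fields) < 3:
            # Blank line, or a node listed on its own with no edges.
            continue
        source, relationship, targets = fields[0], fields[1], fields[2:]
        # One line may list several targets for the same source.
        edges.extend((source, relationship, target) for target in targets)
    return edges


if __name__ == "__main__":
    sample = "nodeA pd nodeB\nnodeD pd nodeE nodeF nodeB\n"
    for edge in parse_sif(sample):
        print(edge)
```

A line with several targets expands into one edge per target, which is why the second sample line yields three triples.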

@kdahlquist
Collaborator Author

I have created a .sif version of the 21-genes_31-edges_Schade-data_input.xlsx demo file that loads in BioTapestry
21-genes_31-edges_Schade-data_input.sif.zip
biotapestry_21-gene_31-edges_schade-data_input sif

@kdahlquist
Collaborator Author

kdahlquist commented Jul 21, 2016

  • SIF format can only be used to export unweighted networks (it does not support attributes like numerical weights)
    • although it looks like when BioTapestry exports SIF, instead of "pd", it gives the types of "PROMOTES", "REPRESSES", or "REGULATES" as in
    • nodeID1 [tab] [PROMOTES | REPRESSES | REGULATES] [tab] nodeID2
    • http://www.biotapestry.org/faq/FAQ-ExportOptions.html#sifExportFormat
    • so we could preserve the information of activation or repression even if we do not preserve the thicknesses.
      • The main question would be if "PROMOTES", "REPRESSES", or "REGULATES" is recognized by other programs besides BioTapestry (like Cytoscape) or whether those types are only supported by BioTapestry.
    • So I tried to create a SIF file that used PROMOTES for positive weights and REPRESSES for negative weights and import it into BioTapestry, and it turns out that it won't load. When importing, BioTapestry expects the types "pos", "neg", or "neu", although it does export as described above. I redid the file using these types and the import worked (you can see that blunt arrowheads are used for the neg edges). But again, the question remains whether these types are recognized by other programs.

21-genes_31-edges_Schade-data_estimation_output.sif.zip
biotapestry_21-gene_31-edges_schade-data_estimation_output sif

  • The CSV version that can be imported to BioTapestry is very specific to that program and how it models/visualizes networks. I think we can safely ignore it if we have SIF.
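
As a rough sketch of the sign-preserving SIF described above: the snippet maps each weight to the "pos", "neg", or "neu" types that BioTapestry accepted on import in my test. The function names and the `(regulator, weight, target)` tuple layout are hypothetical, and mapping a zero weight to "neu" is an assumption.

```python
def biotapestry_type(weight):
    """Map a numeric weight to a BioTapestry import relationship type."""
    if weight > 0:
        return "pos"
    if weight < 0:
        return "neg"
    return "neu"  # assumption: zero weight is treated as neutral


def to_signed_sif(edges):
    """Build tab-delimited SIF text from (regulator, weight, target) tuples.

    The weight magnitude is dropped; only the sign survives.
    """
    return "\n".join(
        f"{regulator}\t{biotapestry_type(weight)}\t{target}"
        for regulator, weight, target in edges
    )


if __name__ == "__main__":
    demo = [("geneA", 0.5, "geneB"), ("geneB", -1.2, "geneC")]
    print(to_signed_sif(demo))
```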

@kdahlquist
Collaborator Author

kdahlquist commented Jul 21, 2016

Notes on yEd: http://www.yworks.com/products/yed

  • Stand-alone Java Swing program, requires Java 1.8
  • current version is 3.16
  • not specific to GRNs or biology (e.g., can be used for UML diagrams)
  • File formats that can be imported:
    • ?, an XML format which supports arbitrary, user-defined element properties (this was blank on the web site, but might be GraphML)
    • Excel® XLS spreadsheets, which can be easily imported using our wizard for comfortably specifying matrix and list-like annotated diagram data. Custom properties can be imported, too.
    • GEDCOM, which contains genealogical information
    • GML, a popular text-based diagram file format
    • arbitrary XML run through an XSLT stylesheet that transforms the input into valid GraphML.
    • Predefined stylesheets for Ant build scripts, the OWL Web Ontology Language, and others are included.
  • free, but not open source
  • So I was able to import the demo file 21-genes_31-edges_Schade-data_input.xlsx directly into yEd using their Excel import wizard. See below:
    yed_21-gene_31-edges_schade-data_input-xlsx
  • However, the regulators/targets relationship is reversed from what we default to, so I have to transpose the "network" sheet to get the directed edges to have the correct directionality.
  • Here is the transposed input file in yEd.
    yed_21-gene_31-edges_schade-data_input-transposed-xlsx
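
Flipping the regulator/target orientation amounts to transposing the adjacency matrix. A minimal sketch, assuming the "network" sheet has already been read into a list of gene labels plus a square list-of-lists matrix (reading the .xlsx file itself is left out, and `transpose_network` is a hypothetical name):

```python
def transpose_network(labels, matrix):
    """Return the transpose of a square adjacency matrix.

    Swapping rows and columns reverses the direction of every edge,
    which converts between the two regulator/target conventions.
    """
    size = len(labels)
    if len(matrix) != size or any(len(row) != size for row in matrix):
        raise ValueError("expected a square matrix matching the label list")
    return [list(column) for column in zip(*matrix)]


if __name__ == "__main__":
    labels = ["geneA", "geneB", "geneC"]
    matrix = [
        [0, 1, 0],
        [0, 0, 1],
        [0, 0, 0],
    ]
    for label, row in zip(labels, transpose_network(labels, matrix)):
        print(label, row)
```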

@kdahlquist
Collaborator Author

@dondi dondi mentioned this issue Jul 27, 2016
@dondi
Owner

dondi commented Jul 27, 2016

Initial design decisions have been made for this functionality, with a SIF export function implemented for non-weighted graphs. Key design decisions are:

  • Perform all import/export work on the server side (facilitates unit tests; decouples import/export functionality from web client)
  • Define routes that accept POST requests with the GRNsight-formatted JSON network to export in the body; the routes then call the import/export function and return the converted result
  • Invoke the routes from the web client
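
To illustrate why the server-side split helps with unit testing, here is a language-neutral sketch in Python (the actual server is not written in Python). It is not the GRNsight implementation: the JSON shape (`genes` with `name`, `links` with index-based `source` and `target`), both function names, and the choice to list unconnected genes on their own lines are all assumptions made for the example.

```python
import json


def network_to_sif(network):
    """Pure conversion: hypothetical network dict -> tab-delimited SIF text."""
    names = [gene["name"] for gene in network["genes"]]
    lines = []
    connected = set()
    for link in network["links"]:
        source, target = names[link["source"]], names[link["target"]]
        lines.append(f"{source}\tpd\t{target}")
        connected.update((source, target))
    # Genes without any edge get a line of their own so they are not lost.
    lines.extend(name for name in names if name not in connected)
    return "\n".join(lines)


def handle_export_sif(request_body):
    """Thin route handler: parse the POST body, delegate, report the result.

    Returns (status code, content type, response body).
    """
    try:
        network = json.loads(request_body)
        return 200, "text/plain", network_to_sif(network)
    except (ValueError, KeyError, IndexError, TypeError):
        return 400, "text/plain", "Could not export the submitted network."


if __name__ == "__main__":
    body = json.dumps({
        "genes": [{"name": "geneA"}, {"name": "geneB"}],
        "links": [{"source": 0, "target": 1}],
    })
    print(handle_export_sif(body))
```

Because `network_to_sif` never touches the request or response, it can be tested directly with plain dictionaries, and the handler stays small enough to test with a string body.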

@kdahlquist
Collaborator Author

http://pynetconv.sourceforge.net/ PyNetConv is a Python program that can convert Cytoscape SIF to GML.

I've been searching for additional documentation about SIF format and am puzzled by two things:

  • the default is to use a space character as a delimiter, but of course this is a problem if there is a space character in the gene IDs/labels. So, in this case one uses a tab delimiter instead. Why not just use tab to begin with and not have two options? Which did you implement?
  • Cytoscape seems to define a certain number of relationship types (http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats), while Pathway Commons (http://www.pathwaycommons.org/pc2/) uses a different set (and based on my experiments, BioTapestry uses a third set). However, I can only find documentation for SIF through Cytoscape or Pathway Commons.

@kdahlquist kdahlquist changed the title Export data to cytoscape, graphml, sif, gml format Export data to cytoscape sif, graphml, gml format Jul 27, 2016
@kdahlquist kdahlquist self-assigned this Jul 27, 2016
@kdahlquist
Collaborator Author

Here is my exploration of Cytoscape for an unweighted network.

Version 3.4.0 (Windows 64-bit)

  • Launch Cytoscape
  • In the "Session" start-up window, click the "From Network File" button and select the 21-genes_31-edges_Schade-data_input.sif file.
    21-gene_31-edge_schade-data_input_cytoscape-default-view
  • to see directed edges, go to "style" tab
  • click default dropdown
  • choose one of three styles that has directed edges: curved, directed, or marquee
    • directed is the best choice, marquee is animated
      21-gene_31-edge_schade-data_input_cytoscape-directed-view
  • On the node and edge tabs, you can set properties.
    21-gene_31-edge_schade-data_input_cytoscape-directed-view-black
  • Played with layouts; Cytoscape includes yFiles layouts, which I'm pretty sure come from the same company as yEd. The best was yFiles Hierarchic.
    21-gene_31-edge_schade-data_input_cytoscape-directed-view-black-hierarchic

@kdahlquist
Collaborator Author

OK, here is my exploration of a weighted network in Cytoscape.

I made a .sif file where the relationship type was the weight itself. This then labeled the edges with the weights.
21-gene_31-edges_schade-data_estimation_output_cytoscape-edge-weight

I should be able to set edge colors and widths, but I need a little more time to figure it out.

@kdahlquist
Collaborator Author

Latest Cytoscape user manual v3.4.0: http://manual.cytoscape.org/en/3.4.0/index.html

@kdahlquist
Collaborator Author

So, it turns out that the stand-alone Cytoscape v3.4.0 can directly import JSON from cytoscape.js (and export JSON, too).
21-genes_31-edges_Schade-data_estimation_output_weights_json.zip

I need a break from this right now. From what I've seen so far we can leave the relationship type for the unweighted network as "pd". For the weighted network, if we use the actual weight values as the relationship type, they will automatically display on the edges in Cytoscape. I just don't know whether they need to be specified differently so they can be used as ways to change the color or width of the edges.
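
One possible alternative to encoding the weight in the SIF relationship type is to carry it as a numeric edge attribute in cytoscape.js-style JSON, where each element has a `data` object. The sketch below only builds that structure; the function name and tuple layout are hypothetical, and I have not verified whether the desktop Cytoscape import needs additional fields or how it maps `weight` to color or width.

```python
import json


def to_cytoscape_js(edges):
    """Build a cytoscape.js-style elements dict from (source, weight, target) tuples."""
    node_ids = sorted({name for source, _, target in edges for name in (source, target)})
    return {
        "elements": {
            "nodes": [{"data": {"id": name}} for name in node_ids],
            "edges": [
                {
                    "data": {
                        "id": f"e{index}",
                        "source": source,
                        "target": target,
                        # Kept numeric so a style mapping could use it.
                        "weight": weight,
                    }
                }
                for index, (source, weight, target) in enumerate(edges)
            ],
        }
    }


if __name__ == "__main__":
    demo = [("geneA", 0.5, "geneB"), ("geneB", -1.2, "geneC")]
    print(json.dumps(to_cytoscape_js(demo), indent=2))
```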

@dondi
Owner

dondi commented Jul 30, 2016

The first export variant, export to SIF, has been implemented and has replaced the graph-statistics branch on the Beta page. This beta version is set to 1.15. I will log my proposed version roadmap (which is a touch more complicated than usual due to the concurrent development threads that are going on) in #293 right after I finish this comment.

Placeholder menus have been applied for import, export data, and export image, but for now only Export Data > To SIF will work. Once a GRN has been loaded into the application, this menu item will activate. When selected (and if functional), the menu item will then trigger a download that deposits the SIF export to the user's Downloads folder.

While this full-cycle functionality is under review, I will continue with implementation of import/export for the remaining intended formats.

@dondi
Owner

dondi commented Aug 1, 2016

Export to GraphML is now available on the beta 1.15 deployment. Workflow is identical to SIF export, except that the resulting file ends in .graphml. I used http://graphml.graphdrawing.org/primer/graphml-primer.html as my main reference for putting together the exported data.
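
For readers unfamiliar with GraphML, the following sketch shows the general shape of such an export, following the structure in the primer: a `key` declaration for the edge weight, then `node` and `edge` elements inside a directed `graph`. It is an illustration only, not the code deployed on the beta; the function name, the `(source, weight, target)` tuple layout, and the key id `weight` are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(edges):
    """Serialize (source, weight, target) tuples as a directed GraphML graph."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace

    def tag(name):
        return f"{{{GRAPHML_NS}}}{name}"

    root = ET.Element(tag("graphml"))
    # Declare the edge attribute before it is used in <data> elements.
    ET.SubElement(root, tag("key"), {
        "id": "weight", "for": "edge",
        "attr.name": "weight", "attr.type": "double",
    })
    graph = ET.SubElement(root, tag("graph"), {"id": "G", "edgedefault": "directed"})
    node_ids = sorted({name for source, _, target in edges for name in (source, target)})
    for name in node_ids:
        ET.SubElement(graph, tag("node"), {"id": name})
    for source, weight, target in edges:
        edge = ET.SubElement(graph, tag("edge"), {"source": source, "target": target})
        data = ET.SubElement(edge, tag("data"), {"key": "weight"})
        data.text = str(weight)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_graphml([("geneA", 0.5, "geneB"), ("geneB", -1.2, "geneC")]))
```

Unlike SIF, this keeps the numeric weight as a typed attribute rather than folding it into the edge type.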

@kdahlquist
Collaborator Author

We are not going to export data in GML format so that we can focus on other issues, although @dondi notes that the export would be easier than the import if we ever wanted to do it.

@kdahlquist kdahlquist changed the title Export data to cytoscape sif, graphml, gml format Export data to cytoscape sif and graphml format Aug 2, 2016
@dondi
Owner

dondi commented Aug 3, 2016

GML menu items have been removed from the deployed beta v1.15 (I guess this note applies to #288 too).

@kdahlquist
Collaborator Author

I am closing this now because basic functionality has been implemented and discussion has moved to #309 and #314.
