User tests of SIF export: problems with demo file 1 and 2 #301
Comments
Yes, Notepad is known to handle only one type of line break. There are three in use these days, and because tool chains vary (some adjust line breaks, some don't), the tendency is to update the tools themselves to recognize all three kinds. Wordpad, Notepad++, Sublime Text, and Atom are updated but Notepad has not been. The export code is using the OK, we can omit target-less nodes for SIF export. As for filenames, the demos are actually the special case because we inject code that changes the label at the top to something different. The current code just derives the filename unconditionally from that label. I’ll look into finding a way to store both. Regularly-imported files should preserve the filename.
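For readers following along, the three line-break conventions mentioned in this comment are LF (`\n`), CRLF (`\r\n`), and bare CR (`\r`). Below is a minimal Python sketch of a reader that accepts all three. `split_lines` is a hypothetical helper for illustration, not GRNsight code, and the sample rows use `pd` only as a placeholder relationship type.

```python
import re

def split_lines(text):
    """Split text on CRLF, bare CR, or LF (hypothetical helper, not GRNsight code)."""
    # CRLF is listed first so it is treated as one break, not two.
    lines = re.split(r"\r\n|\r|\n", text)
    # Drop the empty string left behind by a trailing line break.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

# The same two rows written with each line-break convention.
lf_text = "A\tpd\tB\nB\tpd\tC\n"
crlf_text = "A\tpd\tB\r\nB\tpd\tC\r\n"
cr_text = "A\tpd\tB\rB\tpd\tC\r"
```

A tool that only looks for one of these sequences will show the other two variants as a single long line, which matches the Notepad symptom reported in this issue.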
I agree that regularly-imported files should preserve the filename. Maybe we should let the filename shown for the demo files just be their regular filename, too. We would not need to change the descriptions in the Demo menu. I don't know if that makes it more or less confusing. Broken out as issue #308.
I was able to verify this bug, which might be a bug in both GRNsight and Cytoscape. For clarity, I will describe again what happens. First issue.
So, the conclusion from this is that the combination of the first gene having no target with some other formatting that GRNsight does causes the file to fail in Cytoscape. It could be that Excel's line-break format fixes this. I think this is the case because of how the labels on the disconnected nodes look in Cytoscape, with the labels smushed together. Second issue with GRNsight.
So, I go back on what I said previously. It's not an issue with targetless nodes per se, but somehow the combination with the line break (I think). GRNsight needs to be able to handle SIF files whether or not the targetless nodes are listed, as Cytoscape can. For ease of reporting, I only tested this with the unweighted network for the moment. I'm attaching the relevant test files. As we discussed at the meeting today, @dondi can wait on pursuing these fixes until @kdahlquist has completed more testing.
I investigated this bug today and based on my tests, it looks like the issue was not the line breaks but the expectation that even for non-targeted genes, every line is supposed to have a tab for the relationship and target gene columns, even if there is nothing between the tabs. I reached the conclusion this way:
Given this, I finalized the code to export SIFs in this way, and also updated the unit tests. Further, it turned out that changing the export in this way was easier to code if we went strictly to binary lines (i.e., only one targeted gene per line, with genes repeating over multiple lines if they have multiple targets). Thus, the change dovetails nicely with one of the finalized specs for SIF export in #309. This export change has been uploaded to the beta v1.15 site. Let me know if you see the same results that I do.
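To make the export format described above concrete, here is a rough Python sketch, not the actual GRNsight export code. It writes strictly binary lines (one target per line) and gives target-less genes both tab separators with empty relationship and target columns. The function name `export_sif`, the `(source, target)` edge representation, and the `pd` relationship label are all assumptions made for illustration.

```python
def export_sif(genes, edges, relationship="pd"):
    """Build SIF text from gene names and (source, target) pairs.

    Illustrative sketch only: the name, the arguments, and the default
    relationship label "pd" are assumptions, not GRNsight's API.
    """
    lines = []
    for gene in genes:
        targets = [target for source, target in edges if source == gene]
        if targets:
            # Binary lines: the source gene repeats once per target.
            for target in targets:
                lines.append(f"{gene}\t{relationship}\t{target}")
        else:
            # Target-less gene: keep both tabs, leave the columns empty.
            lines.append(f"{gene}\t\t")
    return "\n".join(lines)

sample = export_sif(["A", "B", "C"], [("A", "B"), ("A", "C")])
```

With this sketch, gene `A` appears on two lines (one per target), while `B` and `C` each get a line consisting of the gene name followed by two tabs.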
Fresh GRNsight-to-SIF exports of my test files are now read correctly in GRNsight, Cytoscape, and BioTapestry (including Demos 1 and 2, which were not read correctly in Cytoscape before). However, the second half of the bug has not been resolved. Right now, if a gene has no target, it is required by GRNsight to be in the "source" column (even if there are now tabs). If a targetless gene only appears in the "target" column, the graph is not parsed correctly and the node for the targetless gene is missing, as are all the edges to it from other nodes that do exist. I've attached two test files. They both should return the same graph in GRNsight, but currently they do not (YOX1 and 2 edges are missing when I remove YOX1 from the source column because it has no targets). Both of these files are read and return identical graphs in both Cytoscape and BioTapestry.
Ah OK, I understand that better now. Yes, I can see from the code that this will happen. I'll factor this into the revision of SIF import.
Support for target-less genes that are mentioned only in the edges has been implemented, and is available in the current beta v1.15. I tested this with the two 7-gene/10-edge sample files above and got 7 genes and 10 edges for both files.
Confirmed that this is fixed and closing.
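The import behavior described here can be sketched roughly as follows, assuming tab-separated SIF lines. This is a hypothetical Python illustration, not the GRNsight importer: `parse_sif` registers a node whenever a name shows up in either the source or the target column, so a target-less gene does not need its own line. The gene names `GENE_A` and `GENE_B` and the `pd` label are placeholders, and the two sample strings are made up for illustration, not the attached test files.

```python
def parse_sif(text):
    """Return (nodes, edges) from tab-separated SIF text (hypothetical sketch)."""
    nodes = []  # node names in order of first appearance
    edges = []  # (source, target) pairs

    def add_node(name):
        if name and name not in nodes:
            nodes.append(name)

    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        source = fields[0].strip()
        add_node(source)
        # Field 1 is the relationship type; fields 2 onward are targets.
        for target in fields[2:]:
            target = target.strip()
            if target:
                add_node(target)  # a gene seen only as a target still counts
                edges.append((source, target))
    return nodes, edges

# YOX1 has no targets; it is listed on its own line in one file only.
listed = "GENE_A\tpd\tYOX1\nGENE_B\tpd\tYOX1\nYOX1\t\t\n"
unlisted = "GENE_A\tpd\tYOX1\nGENE_B\tpd\tYOX1\n"
```

Both sample strings should produce the same nodes and edges under this sketch, which is the property the two attached test files were checking.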
I am breaking this off from #287 because the issues are likely to be different depending on the file type.
I tested the SIF export with each of the demo files and a sample unweighted and weighted network from @GraceJohnson that I had left over from last semester.
When I open the exported SIF files in Notepad, there are no line breaks. The files open in Wordpad, Word, and Excel with line breaks, so there might be some issue with Notepad itself. I don't know if there is something we want to do about this, but I wanted to mention it, in case it is an issue.
I then attempted to open them in both BioTapestry and Cytoscape.
As for BioTapestry, they all seemed to open OK (as we expected, since we are not conforming to BioTapestry's convention for pos, neg relationship types, they all opened as "unweighted" networks).
However, there were hiccups with Cytoscape.
Minor observation/feature request (now broken out as issue #308):