Single page explaining origin/history of CElegansNeuronTables.xls & NeuronConnectFormatted.xlsx, which one we will use & why... #152
Comments
Right. And there is also NeuronConnect.xls being used in PyOpenWorm (though judging by the file changes in this commit they may have the same contents; I'm not sure). We can begin constructing the document @pgleeson and @slarson outline here. As for handling the discrepancies between these data sources, we can detail them in the same document, but how will we actually handle them? I will begin the list of sources so we will be able to judge their relative authority, but this is something we will need to discuss.
@travs NeuronConnectFormatted.xlsx in CElegansNeuroML and NeuronConnect.xls, which has been taken from here, are the same file. The only difference is that the names of some neurons have been changed to match those in CElegansNeuronTables.xls by removing an extra zero (e.g. VA01 has been made VA1) so that c302 can use them correctly to generate .nml files. The commit you are referring to adds a comparison script that compares NeuronConnectFormatted.xlsx and CElegansNeuronTables.xls (which is the primary data source in c302) and writes an error log. A similar comparison is also done in the pull request mentioned by @pgleeson.
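For reference, the renaming described above (VA01 becomes VA1) could be sketched roughly as follows. This is only an illustration, not the code used in c302 or PyOpenWorm: the function name `normalize_neuron_name` and the regular expression are assumptions, and the rule implemented is simply "drop zeros that sit between the letter prefix and the number".

```python
import re

# Hypothetical helper (not taken from c302 or PyOpenWorm): strips the
# zero padding between a neuron's letter prefix and its number.
_PADDED = re.compile(r"^([A-Za-z]+)0+(?=\d)")


def normalize_neuron_name(name):
    """Return the name without zero padding, e.g. 'VA01' -> 'VA1'."""
    return _PADDED.sub(r"\1", name.strip())


if __name__ == "__main__":
    for raw in ("VA01", "VA10", "AVAL"):
        print(raw, "->", normalize_neuron_name(raw))
```

Names without padding (VA10, AVAL) pass through unchanged, so the same function could be applied to every name in either spreadsheet before comparing them.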
@ahrasheed thank you for the information. This discrepancy (zeros in the neuron names) also caused us problems, so we had to normalize the names on insertion into the database. As long as we know these two sources are the same, we are now only dealing with two data sources (CElegansNeuronTables.xls and NeuronConnect.xls). The work done by @aribrich and the logging script mentioned by @ahrasheed are informative for deciding which source to go with/how to handle discrepancies. @pgleeson @slarson
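To make the idea of logging discrepancies between the two remaining sources concrete, here is a minimal sketch. It is not the comparison script mentioned above; it assumes each spreadsheet has already been read into a dict keyed by a hypothetical `(origin, target, kind)` tuple with the number of connections as the value, and the sample entries are placeholders rather than real connectome data.

```python
def diff_connections(primary, other):
    """Compare two {(origin, target, kind): count} mappings.

    Returns three sorted lists: keys only in `primary`, keys only in
    `other`, and (key, primary_count, other_count) for shared keys whose
    counts disagree.
    """
    primary_keys, other_keys = set(primary), set(other)
    only_primary = sorted(primary_keys - other_keys)
    only_other = sorted(other_keys - primary_keys)
    mismatched = sorted(
        (key, primary[key], other[key])
        for key in primary_keys & other_keys
        if primary[key] != other[key]
    )
    return only_primary, only_other, mismatched


if __name__ == "__main__":
    # Placeholder entries for illustration only, not real data.
    source_a = {("N1", "N2", "chemical"): 3, ("N2", "N3", "electrical"): 1}
    source_b = {("N1", "N2", "chemical"): 2, ("N3", "N1", "chemical"): 4}
    only_a, only_b, differing = diff_connections(source_a, source_b)
    print("only in A:", only_a)
    print("only in B:", only_b)
    print("count differs:", differing)
```

Keeping the three kinds of discrepancy separate makes it easier to decide case by case which source to trust.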
Most of the history of CElegansNeuronTables.xls is described in this post on the forum. NeuronConnect.xls should be the authoritative reference going forward, as it is published on a reference website (i.e. it hasn't been processed by us). This should be the case with work done by @ahrasheed and @aribrich. Let's get a draft of this write-up in MD in proper documentation format.
I agree that NeuronConnect.xls should be the authoritative reference (it would also be good to write a test checking that PyOpenWorm's copy is identical to the one downloaded from WormAtlas, in case it gets updated). The steps along the way from there to CElegansNeuronTables.xls should be documented, and ideally encoded in evidence() calls in PyOpenWorm. When it's all there, PyOpenWorm should match the data generated from CElegansNeuronTables.xls in SpreadsheetDataReader.py. Sorry I can't be of more help with this data checking process, but it's pretty crucial for the integrity of data in PyOpenWorm.
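One way such an identity check might look is a byte-level comparison via checksums. This is a sketch under assumptions: the helper names `sha256_of` and `copies_match` are made up, and it presumes a fresh copy from WormAtlas has already been saved to a local path (the download step and its URL are deliberately left out).

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large spreadsheets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(local_copy, reference_copy):
    """True if the two files have byte-identical contents."""
    return sha256_of(local_copy) == sha256_of(reference_copy)
```

A test could then assert `copies_match(...)` on the repository copy and the freshly downloaded one. A byte-level check will also fail on cosmetic re-saves of the spreadsheet, so a failure means "look again", not necessarily "the data changed".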
Hi @aribrich @pgleeson @travs @ahrasheed, I've amended @travs's good start on this page here: https://github.com/openworm/PyOpenWorm/blob/dev/docs/data_sources.rst My proposal is that we consolidate all .xls or .csv files into the aux_data folder under PyOpenWorm. This means the following spreadsheet files in other places should go away:
This page should close issue #10 as well.
With the advent and revision of the data sources file this seems safe to close. Please reply/reopen if you disagree with this one being closed.
Sorry, I think there's more to go on this... That data description page should be bulletproof from a scientific perspective: it should be as detailed as would be required for a publication, citing the original references, discussing the labs involved, saying why these choices were made, etc. Again, sorry to be annoying about this, but PyOpenWorm is going to form the basis for a lot of OpenWorm work, and many routes for how we justify the parameters that go into the model will lead back to this page. It should be detailed enough to convince any C. elegans physiologist that we know the state of the art in the field and have made informed choices.
@pgleeson Good points you're raising here. I know I said to reopen this, but it is more fitting to open a separate issue, as this is more general than just the connectome data source. #177 has been opened in this regard, and is surely missing some of the relevant details.
These spreadsheets contain essential information on the "connectome" we use, but they seem to contain conflicting information and there are possibly multiple different versions.
Somewhere there needs to be a record of where they come from, which one will be used, and how discrepancies are handled. This is related to a number of other issues:
#10 (data sources)
openworm/CElegansNeuroML#28 (which to use in c302)
openworm/CElegansNeuroML#27 (update c302)
openworm/CElegansNeuroML#23 (pull request comparing xls files)