Single page explaining origin/history of CElegansNeuronTables.xls & NeuronConnectFormatted.xlsx, which one we will use & why... #152

Closed
pgleeson opened this issue Jun 17, 2015 · 10 comments

Comments

@pgleeson
Member

These spreadsheets contain essential information on the "connectome" we use but seem to have conflicting information and there are possibly multiple different versions.

Somewhere there needs to be a record of where they come from, which one will be used and how discrepancies are handled. This is related to a number of other issues:

#10 (data sources)
openworm/CElegansNeuroML#28 (which to use in c302)
openworm/CElegansNeuroML#27 (update c302)
openworm/CElegansNeuroML#23 (pull request comparing xls files)

@travs

travs commented Jun 23, 2015

@pgleeson

Right. And there is also NeuronConnect.xls being used in PyOpenWorm (though judging by the file changes in this commit they may have the same contents; I'm not sure).

We can begin constructing the document @pgleeson and @slarson outline here.

As for handling the discrepancies between these data sources, we can detail them in the same document, but how will we actually handle them? I will begin the list of sources so we will be able to judge their relative authority, but this is something we will need to discuss.

@aleph-ra
Contributor

@travs NeuronConnectFormatted.xlsx in CElegansNeuroML and NeuronConnect.xls, which was taken from here, are the same file. The only difference is that some neuron names have been changed to match those in CElegansNeuronTables.xls by removing an extra zero (e.g. VA01 became VA1), so that c302 can use them correctly to generate .nml files.
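
To illustrate the renaming described above, here is a minimal sketch of how a zero-padded name such as VA01 could be mapped to VA1. The function name `normalize_neuron_name` and the regular expression are assumptions made for illustration; this is not the code used in c302 or PyOpenWorm.

```python
import re

# Hypothetical pattern: a run of capital letters, one or more padding
# zeros, then a number that does not itself start with zero (e.g. "VA01").
_PADDED = re.compile(r"^([A-Z]+)0+([1-9]\d*)$")


def normalize_neuron_name(name):
    """Strip zero padding from a neuron's numeric suffix (VA01 -> VA1).

    Names without a zero-padded suffix are returned unchanged.
    """
    name = name.strip()
    match = _PADDED.match(name)
    if match:
        return match.group(1) + match.group(2)
    return name


print(normalize_neuron_name("VA01"))  # VA1
print(normalize_neuron_name("VA10"))  # VA10 (no padding, left as-is)
```

Anchoring the pattern at both ends keeps names like VA10, where the zero is part of the number, from being altered.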

The commit you are referring to adds a script that compares NeuronConnectFormatted.xlsx with CElegansNeuronTables.xls (the primary data source in c302) and writes an error log. A similar comparison is done in the pull request mentioned by @pgleeson.
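
As a rough sketch of what such a comparison could look like (not the actual script from the commit), the example below assumes each spreadsheet has already been read into a list of `(pre, post, type)` tuples. The function name `compare_connections`, the tuple layout, and the sample rows are all hypothetical.

```python
def compare_connections(primary, reference):
    """Return log lines for connections present in only one source.

    Both arguments are iterables of (pre, post, type) tuples; reading
    them from the spreadsheets is outside the scope of this sketch.
    """
    primary_set, reference_set = set(primary), set(reference)
    log = []
    for conn in sorted(primary_set - reference_set):
        log.append("only in primary: %s -> %s (%s)" % conn)
    for conn in sorted(reference_set - primary_set):
        log.append("only in reference: %s -> %s (%s)" % conn)
    return log


# Made-up rows, purely for illustration.
primary = [("AVAL", "VA1", "GapJunction"), ("AVAL", "VA2", "Send")]
reference = [("AVAL", "VA1", "GapJunction"), ("AVBL", "VB1", "Send")]

for line in compare_connections(primary, reference):
    print(line)
```

Using set differences means the order of rows in each spreadsheet does not matter; only rows that are missing from one side are logged.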

@travs

travs commented Jun 23, 2015

@ahrasheed thank you for the information. This discrepancy (zeros in the neuron names) also caused us problems, so we had to normalize the names on insertion into the database. As long as we know these two sources are the same, we are now only dealing with two data sources (CElegansNeuronTables.xls and NeuronConnect.xls).

The work done by @aribrich and the logging script mentioned by @ahrasheed are informative for deciding which source to go with/how to handle discrepancies.

@pgleeson @slarson
We should decide what to do about these differing data sources and begin to incorporate our decision into PyOpenWorm.

@slarson
Member

slarson commented Jun 24, 2015

Most of the history of CElegansNeuronTables.xls is described in this post on the forum. NeuronConnect.xls should be the authoritative reference going forward as it is published on a reference website (i.e. hasn't been processed by us). This should be the case with work done by @ahrasheed and @aribrich. Let's get a draft of this write-up in MD in proper documentation format.

@travs

travs commented Jun 25, 2015

@slarson

I started something here, but I can't access the docs project on ReadTheDocs to add the dev branch.

@pgleeson
Member Author

I agree that NeuronConnect.xls should be the authoritative reference (it would also be good to add a test that checks PyOpenWorm's copy is identical to the one downloaded from WormAtlas, in case it gets updated). The steps along the way from there to CElegansNeuronTables.xls should be documented, and ideally encoded in evidence() calls in PyOpenWorm. When it's all there, PyOpenWorm should match the data generated from CElegansNeuronTables.xls in SpreadsheetDataReader.py.
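
One way such an identity check might be written is sketched below. It assumes the WormAtlas file has already been downloaded to a local path by some other step; the helper names `file_digest` and `copies_match` are hypothetical and do not exist in PyOpenWorm.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def copies_match(local_path, downloaded_path):
    """Return True when the two files are byte-for-byte identical."""
    return file_digest(local_path) == file_digest(downloaded_path)
```

A test could then assert `copies_match(...)` on the repository copy and the freshly downloaded file, so that an upstream update shows up as a test failure rather than a silent divergence.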

Sorry I can't be of more help with this data checking process, but it's pretty crucial for the integrity of data in PyOpenWorm.

@slarson
Member

slarson commented Jun 29, 2015

Hi @aribrich @pgleeson @travs @ahrasheed -- I've amended @travs's good start on this page here:

https://github.com/openworm/PyOpenWorm/blob/dev/docs/data_sources.rst

My proposal is that we consolidate all .xls or .csv into the aux_data folder under PyOpenWorm. This means the following spreadsheet files in other places should go away:

  • CElegansNeuronTables.xls
  • NeuronConnectFormatted.xlsx

This page should close issue #10 as well.

@travs

travs commented Jul 2, 2015

With the advent and revision of the data sources file this seems safe to close.

Please reply/reopen if you disagree with this one being closed.

@travs travs closed this as completed Jul 2, 2015
@pgleeson
Member Author

pgleeson commented Jul 3, 2015

Sorry, I think there's more to do on this... That data description page should be bulletproof from a scientific perspective: it should be as detailed as would be required for a publication, citing the original references, discussing the labs involved, saying why these choices were made, etc.

Again, sorry to be annoying about this, but PyOpenWorm is going to form the basis for a lot of OpenWorm work, and many routes for how we justify the parameters that go into the model will lead back to this page. It should be detailed enough to convince any C. elegans physiologist that we know the state of the art in the field and have made informed choices.

@travs

travs commented Jul 6, 2015

@pgleeson Good points you're raising here. I know I said to reopen this, but it seems more fitting to open a separate issue, as this is more general than just the connectome data source. #177 has been opened for this, and is surely missing some of the relevant details.
Padraig, if you're able to give that list a scan it would be greatly appreciated.
