Add JSON schema #8

Open
emiliorighi opened this issue Feb 25, 2022 · 11 comments

@emiliorighi

Hi, I have been working on a JSON schema that combines the 'jsonized' ToL checklist of the ENA (with regexes to test values against, field types, etc.) and this sample manifest.

The format is something like:

{
    'model': 'SEX',
    'description': 'sex of the organism from which the sample was obtained',
    'type': 'text_choice_field',
    'mandatory': 'mandatory',
    'multiplicity': 'single',
    'options': ['FEMALE', 'MALE', 'HERMAPHRODITE_MONOECIOUS', 'NOT_COLLECTED', 'NOT_APPLICABLE', 'NOT_PROVIDED', 'ASEXUAL_MORPH', 'SEXUAL_MORPH']
},
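
A minimal sketch of how a client could check one value against a field definition like the one above. The keys `model`, `type`, `mandatory` and `options` come from the example; the optional `regex` key and the function name `validate_field` are assumptions made for illustration, not part of any existing schema.

```python
import re

# Field definition copied from the example above.
SEX_FIELD = {
    "model": "SEX",
    "description": "sex of the organism from which the sample was obtained",
    "type": "text_choice_field",
    "mandatory": "mandatory",
    "multiplicity": "single",
    "options": [
        "FEMALE", "MALE", "HERMAPHRODITE_MONOECIOUS", "NOT_COLLECTED",
        "NOT_APPLICABLE", "NOT_PROVIDED", "ASEXUAL_MORPH", "SEXUAL_MORPH",
    ],
}


def validate_field(field, value):
    """Return a list of error messages for one value (empty list means valid)."""
    errors = []
    # An empty value is only an error when the field is mandatory.
    if value is None or str(value).strip() == "":
        if field.get("mandatory") == "mandatory":
            errors.append(f"{field['model']}: value is mandatory")
        return errors
    # Choice fields must match one of the listed options exactly.
    if field.get("type") == "text_choice_field" and value not in field.get("options", []):
        errors.append(f"{field['model']}: '{value}' is not an allowed option")
    # Hypothetical 'regex' key: the whole value must match the pattern.
    if "regex" in field and not re.fullmatch(field["regex"], str(value)):
        errors.append(f"{field['model']}: '{value}' does not match the expected pattern")
    return errors
```

With this sketch, `validate_field(SEX_FIELD, "FEMALE")` returns an empty list, while a lowercase `"female"` is reported as not allowed because the comparison is case-sensitive.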

I was wondering if I can create a pull request with this JSON schema or, if you already have something like that, if you can share it in this repo.

@aliceminotto
Contributor

@emiliorighi
Author

@aliceminotto would it be then possible to join them in a single schema (in a certain sense it will be the JSON version of the manifest), like the one that I mentioned above? Or share the schemas you shared here in this repository as well?

Thanks in advance

@aliceminotto
Contributor

@emiliorighi I don't think we can reasonably merge them without breaking our code. I'm happy to copy the separate files here if that's useful, or link them in the readme (though the format is def not the best for the scope), but that can only happen after we receive the list of changes and make them ourselves, so it's effectively delayed from the ERGA consortium's sign-off. If you have a suitable JSON ready that may make more sense, but it will require maintenance to be kept up to date. @gf777 do you have any thoughts on this?

@gf777
Contributor

gf777 commented Mar 3, 2022

Hey, here is just my two cents.

I would avoid duplicating the same file in multiple repositories, so a link would be best. However, I am missing the point here. @emiliorighi what are you developing the json for?

@emiliorighi
Author

emiliorighi commented Mar 3, 2022

@aliceminotto, sorry, maybe I didn't explain myself: I am referring to adding the JSON schema to this repository.
The schema would merge the various constants you have in your lookups file. You can then keep your code as it is, or map the schema to the corresponding constants in your lookup file. To stick even closer to your code, we can start from the UI mapping JSON you have and use it as a template to formalize the schema; in this way we only add things without modifying what exists (no breaking changes).

@gf777, I am using the JSON schema to import samples locally and export them to an Excel file (this manifest) for submission to COPO, see: https://github.com/guigolab/biogenome-portal/blob/37454c7da460d63cd12d6d14076529e214c5ba06/client/src/utils/static-config.js#L3 (this is an informal version, happy to formalize it if necessary)

Here are the pros (in my opinion) of having a JSON schema for the ERGA manifest:

  • enhance version control: you can see the diffs between two commits, whereas with an Excel file you have to download both versions and look for the differences manually. The version would be an attribute of the schema instead of part of the file name.
  • automated Excel generation: generate the Excel file automatically (e.g. with XlsxWriter, https://xlsxwriter.readthedocs.io/) on each update of the schema (each push to main).
  • easier to maintain: one file against many
  • avoid human errors: human errors are limited to the schema, easier to detect them
  • enhance visibility: the open source community can build services on top of it (clients) to facilitate/enforce the excel submission to COPO.
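
The automated Excel generation mentioned in the list above could look roughly like the sketch below. It assumes the schema is a list of field definitions in the format proposed earlier in this thread; `EXAMPLE_FREE_TEXT`, its `text_field` type, the shortened option list, the sheet name and the function names are all hypothetical. Writing the file requires the third-party XlsxWriter package, which is only imported when `write_manifest` is called.

```python
# Illustrative schema: SEX is shortened, EXAMPLE_FREE_TEXT is a made-up field.
SCHEMA = [
    {"model": "SEX", "type": "text_choice_field", "options": ["FEMALE", "MALE"]},
    {"model": "EXAMPLE_FREE_TEXT", "type": "text_field"},
]


def header_and_dropdowns(schema):
    """Derive the header row and the per-column dropdown lists from the schema."""
    header = [field["model"] for field in schema]
    dropdowns = {
        col: field["options"]
        for col, field in enumerate(schema)
        if field.get("options")
    }
    return header, dropdowns


def write_manifest(schema, path, max_rows=1000):
    """Write an empty manifest template with dropdowns for choice fields."""
    import xlsxwriter  # third-party; imported here so the helper above works without it

    header, dropdowns = header_and_dropdowns(schema)
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet("manifest")
    worksheet.write_row(0, 0, header)  # column names in the first row
    for col, options in dropdowns.items():
        # Restrict rows 1..max_rows of this column to the allowed options.
        # Excel limits an inline list to 255 characters, so long option
        # lists would need to reference a cell range instead.
        worksheet.data_validation(
            1, col, max_rows, col, {"validate": "list", "source": options}
        )
    workbook.close()
```

Calling `write_manifest(SCHEMA, "manifest.xlsx")` from a CI job on each push to main would be one way to keep the spreadsheet in step with the schema.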

Happy to further discuss it or to collaborate in its implementation.

@gf777
Contributor

gf777 commented Mar 3, 2022

Hi @emiliorighi many thanks for the explanation. I still miss lots of technical details but it is much clearer now (note that I don't see your schema example - broken link - but I think I get the point). I am in principle very much in favor of having a text file that stores the information and is then converted to the Excel file. @aliceminotto do you see any drawbacks?

@emiliorighi
Author

Hi Giulio, sorry for the link, I have updated the comment with the correct one: https://github.com/guigolab/biogenome-portal/blob/37454c7da460d63cd12d6d14076529e214c5ba06/client/src/utils/static-config.js#L3

@aliceminotto
Contributor

@gf777 From a technical perspective I think it's a very good idea to have a public JSON, and it would be beneficial to the development of other systems. From a project management perspective we need to ensure it is kept up to date with rolling changes, or I'm afraid it will have the opposite effect.

Mainly as a note to others that may read the thread, there are extra logical validations in COPO that would not be caught by the schema only, so it is possible there are failures.
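
To illustrate the kind of gap described above: a per-field schema check can pass on every column while a rule that spans two columns still fails. The sketch below is purely hypothetical. The field names `EXAMPLE_START_DATE` and `EXAMPLE_END_DATE` and the rule itself are invented for illustration and are not taken from COPO.

```python
import re
from datetime import date

ISO_DATE = r"\d{4}-\d{2}-\d{2}"  # per-field pattern, as a schema regex might express it


def passes_schema(row):
    """Per-field check: each value only has to look like an ISO date."""
    return all(
        re.fullmatch(ISO_DATE, row[name]) is not None
        for name in ("EXAMPLE_START_DATE", "EXAMPLE_END_DATE")
    )


def passes_cross_field_rule(row):
    """Cross-field check (hypothetical): the start must not be after the end."""
    start = date.fromisoformat(row["EXAMPLE_START_DATE"])
    end = date.fromisoformat(row["EXAMPLE_END_DATE"])
    return start <= end


# Both values are well-formed dates, but they are in the wrong order.
row = {"EXAMPLE_START_DATE": "2022-03-03", "EXAMPLE_END_DATE": "2022-02-25"}
```

Here `passes_schema(row)` is true while `passes_cross_field_rule(row)` is false, so a manifest that is valid against the schema could still be rejected at submission.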

@emiliorighi
Author

@aliceminotto I can understand it will require more effort at the beginning, however I don't understand how the JSON schema will impact the project management, as it would be the same as the current excel but in JSON format.

People will still use the Excel file to submit to your service, and you will only have to keep the JSON up to date (as you are currently doing with the Excel file, with the advantage that the JSON format gives you explicit version control).
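
As a rough illustration of that version-control advantage, two versions of the schema can be compared field by field. The sketch assumes a top-level layout with a `version` attribute and a `fields` list; that layout, the version numbers and the function name `diff_schemas` are assumptions, not an agreed format.

```python
def diff_schemas(old, new):
    """List human-readable differences between two schema versions."""
    old_by_model = {field["model"]: field for field in old["fields"]}
    new_by_model = {field["model"]: field for field in new["fields"]}
    changes = []
    for model in sorted(old_by_model.keys() - new_by_model.keys()):
        changes.append(f"removed field {model}")
    for model in sorted(new_by_model.keys() - old_by_model.keys()):
        changes.append(f"added field {model}")
    # For fields present in both versions, compare every attribute.
    for model in sorted(old_by_model.keys() & new_by_model.keys()):
        before, after = old_by_model[model], new_by_model[model]
        for key in sorted(before.keys() | after.keys()):
            if before.get(key) != after.get(key):
                changes.append(f"{model}.{key}: {before.get(key)!r} -> {after.get(key)!r}")
    return changes


# Hypothetical versions: the second one adds an option to SEX.
V1 = {"version": "1.0", "fields": [{"model": "SEX", "options": ["FEMALE", "MALE"]}]}
V2 = {"version": "1.1", "fields": [{"model": "SEX", "options": ["FEMALE", "MALE", "NOT_PROVIDED"]}]}
```

In practice a plain `git diff` on the JSON file already shows the same information; a helper like this would mainly be useful for generating a changelog.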

How is it that the extra logical validations would not be caught by the JSON but are caught by the Excel file, if the content is the same and only the format differs?

@aliceminotto
Contributor

@emiliorighi oh yes, the spreadsheet is the same, we would need someone to be looking at these.
We are not doing the spreadsheet (sorry I realise now that this is the cause of the confusion).

In fact, we take the spreadsheet, and together with the SOP we use it as a baseline for COPO -since the SOP does not have the controlled vocabulary in it-. So it would definitely be beneficial for us too to have a JSON instead of tracking SOP and spreadsheet.

For the validation, that was an extra comment: no difference from the spreadsheet, which also does not catch everything.

@emiliorighi
Author

@aliceminotto thank you for your response, and sorry if I didn't explain it correctly.
I am more than happy to collaborate in its implementation.
