Add JSON schema #8

emiliorighi · 2022-02-25T12:24:58Z

Hi, I have been working on a JSON schema which is a mix between the 'jsonized' ToL checklist of the ENA (with regexes to test values against, field types, etc.) and this sample manifest.

The format is something like:

{
'model': 'SEX',
'description': 'sex of the organism from which the sample was obtained',
'type': 'text_choice_field',
'mandatory': 'mandatory',
'multiplicity': 'single',
'options': ['FEMALE','MALE','HERMAPHRODITE_MONOECIOUS','NOT_COLLECTED','NOT_APPLICABLE','NOT_PROVIDED','ASEXUAL_MORPH','SEXUAL_MORPH']
},
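To illustrate how a client could use a field definition in this format, here is a minimal Python sketch. The `validate_value` helper and the optional `regex` key are assumptions made for illustration; only the `SEX` definition itself comes from the example above.

```python
import re

# Field definition copied from the example above.
SEX_FIELD = {
    "model": "SEX",
    "description": "sex of the organism from which the sample was obtained",
    "type": "text_choice_field",
    "mandatory": "mandatory",
    "multiplicity": "single",
    "options": [
        "FEMALE", "MALE", "HERMAPHRODITE_MONOECIOUS", "NOT_COLLECTED",
        "NOT_APPLICABLE", "NOT_PROVIDED", "ASEXUAL_MORPH", "SEXUAL_MORPH",
    ],
}


def validate_value(field, value):
    """Return a list of error messages for one value (empty list means valid)."""
    errors = []
    if value is None or value == "":
        # An empty value is only an error for mandatory fields.
        if field.get("mandatory") == "mandatory":
            errors.append(f"{field['model']}: value is mandatory")
        return errors
    if field.get("type") == "text_choice_field" and value not in field.get("options", []):
        errors.append(f"{field['model']}: '{value}' is not an allowed option")
    # Hypothetical 'regex' key: not part of the example above.
    pattern = field.get("regex")
    if pattern and not re.fullmatch(pattern, value):
        errors.append(f"{field['model']}: '{value}' does not match {pattern}")
    return errors


print(validate_value(SEX_FIELD, "FEMALE"))  # []
print(validate_value(SEX_FIELD, "female"))  # ["SEX: 'female' is not an allowed option"]
print(validate_value(SEX_FIELD, ""))        # ['SEX: value is mandatory']
```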

I was wondering if I can create a pull request with this JSON schema or, if you already have something like that, if you can share it in this repo.

aliceminotto · 2022-02-25T12:32:47Z

@emiliorighi we don't have a json schema in that format, for reference see here: https://github.com/collaborative-open-plant-omics/COPO/tree/development/web/apps/web_copo/schemas/copo/uimodels/mappings/isa_mappings/additional_attributes (both the dtol and erga files as erga is an extension), and https://github.com/collaborative-open-plant-omics/COPO/blob/development/web/apps/web_copo/lookup/dtol_lookups.py

emiliorighi · 2022-02-25T12:59:54Z

@aliceminotto would it then be possible to join them in a single schema (in a certain sense it would be the JSON version of the manifest), like the one that I mentioned above? Or to share the schemas you linked here in this repository as well?

Thanks in advance

aliceminotto · 2022-03-02T19:00:37Z

@emiliorighi I don't think we can reasonably merge them without breaking our code. I'm happy to copy the separate files here if that's useful, or link them in the readme (though the format is def not the best for the scope), but that can only happen after we receive the list of changes and make them ourselves, so it's effectively delayed from the ERGA consortium's sign off. If you have a suitable JSON ready that may make more sense, but it will require maintenance to be kept up to date. @gf777 do you have any thoughts on this?

gf777 · 2022-03-03T04:19:51Z

Hey, here is just my two cents.

I would avoid duplicating the same file in multiple repositories, so a link would be best. However, I am missing the point here. @emiliorighi what are you developing the json for?

emiliorighi · 2022-03-03T09:28:03Z

@aliceminotto, sorry, maybe I didn't explain myself well: I am referring to adding the JSON schema in this repository.
The schema would be the merge of the various constants you have in your lookups file. You can then keep your code as it is, or you can map the schema to its corresponding constants in your lookup file. To stick even closer to your code, we can start from the UI mapping JSON you have and use it as a template to formalize the schema; in this way we will only add things without modifying what exists (no breaking changes).
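As a rough sketch of the mapping idea, the snippet below derives lookup-style constants from a schema. The `lookups_from_schema` helper, the shortened `SEX` option list, and the `text_field` type of the second entry are all assumptions for illustration; they are not taken from the COPO lookups file.

```python
# Illustrative schema: SEX uses a shortened option list, and the
# 'text_field' type of SPECIMEN_ID is a made-up placeholder.
SCHEMA = [
    {"model": "SEX", "type": "text_choice_field", "options": ["FEMALE", "MALE"]},
    {"model": "SPECIMEN_ID", "type": "text_field"},
]


def lookups_from_schema(schema):
    """Map each choice field's model name to a copy of its allowed options."""
    return {
        field["model"]: list(field["options"])
        for field in schema
        if "options" in field
    }


print(lookups_from_schema(SCHEMA))  # {'SEX': ['FEMALE', 'MALE']}
```

Existing code could keep its own constants and only compare them against this derived mapping, so nothing has to change on the consumer side.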

@gf777, I am using the JSON schema to import samples locally and export them to an Excel file (this manifest) to submit to COPO, see: https://github.com/guigolab/biogenome-portal/blob/37454c7da460d63cd12d6d14076529e214c5ba06/client/src/utils/static-config.js#L3 (this is an informal version, happy to formalize it if necessary)

Here are the pros (in my opinion) of having a JSON schema for the ERGA manifest:

enhance version control: you can see the diffs between two commits, whereas with an Excel file you have to download both versions and look for the differences manually. The version would be an attribute of the schema instead of part of the file name.
automated Excel generation: automatically generate the Excel file (e.g. with https://xlsxwriter.readthedocs.io/) on each update of the schema (each push to main).
easier to maintain: one file instead of many
avoid human errors: human errors are confined to the schema, where they are easier to detect
enhance visibility: the open source community can build services (clients) on top of it to facilitate/enforce the Excel submission to COPO.
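A minimal sketch of the automated Excel generation point, assuming the schema is a list of field definitions in the format shown earlier in the thread. The helper names, the sheet name, and the shortened example schema are illustrative; the XlsxWriter calls used are `Workbook`, `add_worksheet`, `write`, `data_validation` and `close`.

```python
# Illustrative schema: shortened option list, placeholder 'text_field' type.
SCHEMA = [
    {"model": "SEX", "type": "text_choice_field", "options": ["FEMALE", "MALE"]},
    {"model": "SPECIMEN_ID", "type": "text_field"},
]


def column_specs(schema):
    """Turn schema fields into (header, validation-or-None) pairs."""
    specs = []
    for field in schema:
        validation = None
        if field.get("options"):
            # Dropdown restricted to the allowed options.
            validation = {"validate": "list", "source": list(field["options"])}
        specs.append((field["model"], validation))
    return specs


def write_manifest(schema, path, max_rows=1000):
    """Write an empty manifest workbook with one column per schema field."""
    import xlsxwriter  # third-party: pip install XlsxWriter

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet("manifest")
    for col, (header, validation) in enumerate(column_specs(schema)):
        worksheet.write(0, col, header)  # header row
        if validation:
            # Apply the dropdown to rows 1..max_rows of this column.
            worksheet.data_validation(1, col, max_rows, col, validation)
    workbook.close()


print(column_specs(SCHEMA))
```

Calling `write_manifest(SCHEMA, "manifest.xlsx")` from a CI job on each push to main would regenerate the spreadsheet from the schema; the file name here is only an example.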

Happy to further discuss it or to collaborate in its implementation.

gf777 · 2022-03-03T16:30:53Z

Hi @emiliorighi many thanks for the explanation, I still miss lots of technical details but it is much clearer now (note that I don't see your schema example, the link is broken, but I think I get the point). I am in principle very much in favor of having a text file that stores the information and is then converted to the Excel file. @aliceminotto do you see any drawbacks?

emiliorighi · 2022-03-03T16:45:41Z

Hi Giulio, sorry for the link, I have updated the comment with the correct one: https://github.com/guigolab/biogenome-portal/blob/37454c7da460d63cd12d6d14076529e214c5ba06/client/src/utils/static-config.js#L3

aliceminotto · 2022-03-04T13:49:44Z

@gf777 From a technical perspective I think it's a very good idea to have a public JSON, and beneficial to the development of other systems. From a project management perspective we need to ensure it is kept up to date with rolling changes, or I'm afraid it will have the opposite effect.

Mainly as a note to others who may read the thread: there are extra logical validations in COPO that would not be caught by the schema alone, so a manifest that passes the schema may still fail on submission.

emiliorighi · 2022-03-04T15:40:50Z

@aliceminotto I can understand it will require more effort at the beginning. However, I don't understand how the JSON schema will impact the project management, as it would be the same as the current Excel file but in JSON format.

People will still use the Excel file to submit to your service, and you will only have to keep the JSON updated (as you are currently doing with the Excel file, with the advantage that the JSON format gives you explicit version control).
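To make the version control point concrete, here is a small sketch that reports the differences between two versions of a schema. The `diff_schemas` helper and the shortened option lists are illustrative assumptions; with the JSON file in git, a plain `git diff` gives a similar view without any extra code.

```python
def diff_schemas(old, new):
    """Return human-readable changes between two lists of field definitions."""
    old_by_model = {field["model"]: field for field in old}
    new_by_model = {field["model"]: field for field in new}
    changes = []
    for model in sorted(old_by_model.keys() - new_by_model.keys()):
        changes.append(f"removed field {model}")
    for model in sorted(new_by_model.keys() - old_by_model.keys()):
        changes.append(f"added field {model}")
    for model in sorted(old_by_model.keys() & new_by_model.keys()):
        before, after = old_by_model[model], new_by_model[model]
        # Compare every attribute present in either version of the field.
        for key in sorted(set(before) | set(after)):
            if before.get(key) != after.get(key):
                changes.append(f"{model}.{key}: {before.get(key)!r} -> {after.get(key)!r}")
    return changes


# Shortened, illustrative option lists.
OLD = [{"model": "SEX", "options": ["FEMALE", "MALE"]}]
NEW = [{"model": "SEX", "options": ["FEMALE", "MALE", "NOT_COLLECTED"]}]

print(diff_schemas(OLD, NEW))
# ["SEX.options: ['FEMALE', 'MALE'] -> ['FEMALE', 'MALE', 'NOT_COLLECTED']"]
```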

How would the extra logical validations be missed by the JSON but caught by the Excel file, if the content is the same and only the format differs?

aliceminotto · 2022-03-04T17:00:26Z

@emiliorighi oh yes, the spreadsheet is the same; we would need someone to be looking at these.
We do not produce the spreadsheet (sorry, I realise now that this is the cause of the confusion).

In fact, we take the spreadsheet and, together with the SOP, use it as a baseline for COPO (since the SOP does not contain the controlled vocabulary). So it would definitely be beneficial for us too to have a JSON instead of tracking both the SOP and the spreadsheet.

For the validation, that was an extra comment: there is no difference from the spreadsheet, which also does not catch everything.

emiliorighi · 2022-03-04T17:04:23Z

@aliceminotto thank you for your response, and sorry if I didn't explain it correctly.
I am more than happy to collaborate in its implementation.
