submission prep script
This script has been replaced by a new and improved CLI tool. See the updated Submission Guide and the new tool's documentation for more information about the new tooling.
prepare_C2M2_submission.py (previously build_term_tables.py) is a Python script that automatically builds controlled-vocabulary (CV) term usage tables for C2M2 datapackage preparation and performs some pre-submission data integrity checks.
The following files are built automatically by this script and should not be hand-created or edited; submit them along with the other required TSVs as part of your datapackage.
- analysis_type.tsv
- anatomy.tsv
- assay_type.tsv
- biofluid.tsv
- compound.tsv
- data_type.tsv
- disease.tsv
- file_format.tsv
- gene.tsv
- ncbi_taxonomy.tsv
- phenotype.tsv
- phenotype_disease.tsv
- phenotype_gene.tsv
- protein.tsv
- protein_gene.tsv
- sample_prep_method.tsv
- substance.tsv
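The idea of a term usage table can be illustrated with a minimal sketch: collect the distinct CV term IDs that appear in one column of a core table, then keep only the matching rows from a reference vocabulary. This is not the script's actual code. The function names, the `file_format` column, and the sample term IDs and names below are assumptions chosen for illustration.

```python
import csv
import io


def used_terms(tsv_text, column):
    """Return the set of distinct non-empty values found in one TSV column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def build_usage_table(used, reference):
    """Keep only the reference entries whose ID is actually used, sorted by ID."""
    return [(term_id, reference[term_id])
            for term_id in sorted(used)
            if term_id in reference]


# Illustrative data only: a tiny stand-in for file.tsv and a CV reference file.
file_tsv = (
    "local_id\tfile_format\n"
    "f1\tformat:1930\n"
    "f2\t\n"
    "f3\tformat:1930\n"
    "f4\tformat:3475\n"
)
reference = {
    "format:1930": "FASTQ",
    "format:3475": "TSV",
    "format:2572": "BAM",
}

usage = build_usage_table(used_terms(file_tsv, "file_format"), reference)
for term_id, name in usage:
    print(term_id, name, sep="\t")
```

Unused reference terms (here `format:2572`) are left out, which is why the resulting tables depend on the contents of your core tables and should be regenerated rather than edited by hand.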
The following pre-submission validation checks are currently performed:
- Ensure that for any file with a non-null persistent ID, a checksum is also provided.
- Ensure that all (non-null) persistent IDs are unique (both within and across tables).
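The two checks above can be sketched as follows. This is a hedged illustration, not the script's implementation: the function names are hypothetical, and it assumes rows are dictionaries with `local_id`, `persistent_id`, `sha256`, and `md5` keys, with either checksum column satisfying the first check.

```python
from collections import Counter


def files_missing_checksum(file_rows):
    """Return local IDs of files that have a persistent ID but no checksum."""
    return [
        row["local_id"]
        for row in file_rows
        if row.get("persistent_id")
        and not (row.get("sha256") or row.get("md5"))
    ]


def duplicate_persistent_ids(tables):
    """Return persistent IDs used more than once, within or across tables."""
    counts = Counter(
        row["persistent_id"]
        for rows in tables.values()
        for row in rows
        if row.get("persistent_id")  # null/empty IDs are ignored
    )
    return sorted(pid for pid, n in counts.items() if n > 1)


# Illustrative rows only.
tables = {
    "file.tsv": [
        {"local_id": "f1", "persistent_id": "pid:1", "sha256": "abc", "md5": ""},
        {"local_id": "f2", "persistent_id": "pid:2", "sha256": "", "md5": ""},
        {"local_id": "f3", "persistent_id": "", "sha256": "", "md5": ""},
    ],
    "biosample.tsv": [
        {"local_id": "b1", "persistent_id": "pid:1"},
        {"local_id": "b2", "persistent_id": ""},
    ],
}

missing = files_missing_checksum(tables["file.tsv"])
duplicates = duplicate_persistent_ids(tables)
print("files missing a checksum:", missing)
print("duplicated persistent IDs:", duplicates)
```

Here `f2` fails the first check, and `pid:1` fails the second because it is reused across two tables.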
- First build your dcc.tsv, id_namespace.tsv, project.tsv, project_in_project.tsv, file.tsv, file_describes_biosample.tsv, file_describes_collection.tsv, file_describes_subject.tsv, file_in_collection.tsv, biosample.tsv, biosample_disease.tsv, biosample_from_subject.tsv, biosample_gene.tsv, biosample_in_collection.tsv, biosample_substance.tsv, subject.tsv, subject_disease.tsv, subject_in_collection.tsv, subject_phenotype.tsv, subject_race.tsv, subject_role_taxonomy.tsv, subject_substance.tsv, collection.tsv, collection_anatomy.tsv, collection_biofluid.tsv, collection_compound.tsv, collection_defined_by_project.tsv, collection_disease.tsv, collection_gene.tsv, collection_in_collection.tsv, collection_phenotype.tsv, collection_protein.tsv, collection_ptm.tsv, collection_substance.tsv, collection_taxonomy.tsv, biosample_ptm.tsv, ptm.tsv, ptm_type.tsv, ptm_subtype.tsv and domain_location.tsv tables. Some of these can be left empty (as header-only TSVs) if desired; see the C2M2 table wiki for requirements. A zipped folder containing empty core (and core-associated) tables can be downloaded from OSF.
- Download the CV reference files [Last updated 27 Nov 2024] at OSF (select external_CV_reference_files and then 'Download as zip').
- Unzip the external_CV_reference_files folder.
- Put external_CV_reference_files and prepare_C2M2_submission.py into the same folder.
- Create a subdirectory containing your pre-built file.tsv, biosample.tsv, etc., then edit line 44 of prepare_C2M2_submission.py to match.
- Use the command line to run the script:
  python prepare_C2M2_submission.py
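Before running the script, it can help to confirm that the folder layout described in the steps above is in place. The following is a minimal sketch of such a pre-flight check. The `missing_inputs` helper and the `my_datapackage` subdirectory name are hypothetical, and the demo builds a throwaway directory rather than touching real files.

```python
import tempfile
from pathlib import Path


def missing_inputs(base_dir, data_subdir, required_tables):
    """Return relative paths of expected inputs that do not exist yet."""
    base = Path(base_dir)
    expected = [
        base / "external_CV_reference_files",
        base / "prepare_C2M2_submission.py",
    ]
    expected += [base / data_subdir / name for name in required_tables]
    return [p.relative_to(base).as_posix() for p in expected if not p.exists()]


# Demo in a temporary directory: biosample.tsv is deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "external_CV_reference_files").mkdir()
    (root / "prepare_C2M2_submission.py").touch()
    (root / "my_datapackage").mkdir()
    (root / "my_datapackage" / "file.tsv").touch()
    missing = missing_inputs(root, "my_datapackage", ["file.tsv", "biosample.tsv"])

print("missing inputs:", missing)
```

An empty result means the reference files, the script, and the listed tables are all where the steps expect them. The check says nothing about whether the subdirectory name matches the one set on line 44 of the script.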
This script is under active development: please contact us with any questions by emailing the helpdesk at [email protected] or posting to Discussions