
submission prep script

Mano Maurya edited this page Feb 26, 2025 · 15 revisions

Overview

prepare_C2M2_submission.py (previously build_term_tables.py) is a Python script that automatically builds controlled-vocabulary (CV) term usage tables for C2M2 datapackage preparation and performs some pre-submission data integrity checks.

The following files are built automatically by this script and should not be hand-created or edited; submit them along with the other required TSVs as part of your datapackage.

  • analysis_type.tsv
  • anatomy.tsv
  • assay_type.tsv
  • biofluid.tsv
  • compound.tsv
  • data_type.tsv
  • disease.tsv
  • file_format.tsv
  • gene.tsv
  • ncbi_taxonomy.tsv
  • phenotype.tsv
  • phenotype_disease.tsv
  • phenotype_gene.tsv
  • protein.tsv
  • protein_gene.tsv
  • sample_prep_method.tsv
  • substance.tsv
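To give a rough sense of what a "term usage table" is, here is a minimal sketch that collects the CV term IDs actually used in one column of a core table and pairs each with a name from a reference lookup. It is an illustration only, not the logic of prepare_C2M2_submission.py: the helper names, the `anatomy` column in the sample biosample table, and the in-memory `reference` dictionary (standing in for the CV reference files) are all assumptions.

```python
import csv
import io


def collect_used_terms(tsv_text, column):
    """Return the sorted, de-duplicated non-empty values of one TSV column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sorted({row[column] for row in reader if row.get(column)})


def build_term_table(used_ids, reference):
    """Pair each used term ID with its name from a reference lookup.

    Terms missing from the reference get an empty name in this sketch.
    """
    return [(term_id, reference.get(term_id, "")) for term_id in used_ids]


# Hypothetical sample table; the 'anatomy' column name is an assumption.
biosample_tsv = (
    "local_id\tanatomy\n"
    "B1\tUBERON:0002107\n"
    "B2\t\n"
    "B3\tUBERON:0002107\n"
)
# Hypothetical stand-in for the downloaded CV reference files.
reference = {"UBERON:0002107": "liver"}

used = collect_used_terms(biosample_tsv, "anatomy")
term_table = build_term_table(used, reference)
```

In this sketch each used term appears once in the output, however many rows reference it, and rows with an empty value contribute nothing.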

The following pre-submission validation checks are currently performed:

  • Ensure that for any file with a non-null persistent ID, a checksum is also provided.
  • Ensure that all (non-null) persistent IDs are unique (both within and across tables).
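The two checks above can be sketched roughly as follows. This is a hedged illustration rather than the script's own code: the function names are hypothetical, and the column names (`local_id`, `persistent_id`, `sha256`, `md5`) are assumed to match your tables. Here "a checksum is provided" is interpreted as at least one of `sha256` or `md5` being non-empty.

```python
import csv
import io
from collections import Counter


def read_tsv(text):
    """Parse TSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def files_missing_checksum(file_rows):
    """Return local IDs of files that have a persistent ID but no checksum."""
    return [
        row["local_id"]
        for row in file_rows
        if row.get("persistent_id")
        and not (row.get("sha256") or row.get("md5"))
    ]


def duplicate_persistent_ids(tables):
    """Return persistent IDs that occur more than once, within or across tables."""
    counts = Counter(
        row["persistent_id"]
        for rows in tables.values()
        for row in rows
        if row.get("persistent_id")
    )
    return sorted(pid for pid, n in counts.items() if n > 1)


# Hypothetical sample data; IDs and checksum values are made up.
file_rows = read_tsv(
    "local_id\tpersistent_id\tsha256\tmd5\n"
    "F1\texample:pid-1\tabc123\t\n"
    "F2\texample:pid-2\t\t\n"
    "F3\t\t\t\n"
)
biosample_rows = read_tsv(
    "local_id\tpersistent_id\n"
    "B1\texample:pid-1\n"
)

missing = files_missing_checksum(file_rows)
duplicates = duplicate_persistent_ids(
    {"file": file_rows, "biosample": biosample_rows}
)
```

With this sample data, F2 is flagged because it has a persistent ID but no checksum, and `example:pid-1` is flagged because it appears in both tables.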

Usage

  1. First build your dcc.tsv, id_namespace.tsv, project.tsv, project_in_project.tsv, file.tsv, file_describes_biosample.tsv, file_describes_collection.tsv, file_describes_subject.tsv, file_in_collection.tsv, biosample.tsv, biosample_disease.tsv, biosample_from_subject.tsv, biosample_gene.tsv, biosample_in_collection.tsv, biosample_substance.tsv, subject.tsv, subject_disease.tsv, subject_in_collection.tsv, subject_phenotype.tsv, subject_race.tsv, subject_role_taxonomy.tsv, subject_substance.tsv, collection.tsv, collection_anatomy.tsv, collection_biofluid.tsv, collection_compound.tsv, collection_defined_by_project.tsv, collection_disease.tsv, collection_gene.tsv, collection_in_collection.tsv, collection_phenotype.tsv, collection_protein.tsv, collection_substance.tsv, and collection_taxonomy.tsv tables. Some of these can be left empty (as header-only TSVs) if desired; see the C2M2 table wiki for requirements. A zipped folder containing empty core (and core-associated) tables can be downloaded from OSF.

  2. Download the script [Last updated 25 Feb 2025] at OSF

  3. Download the CV reference files [Last updated 27 Nov 2024] at OSF (select external_CV_reference_files and then 'Download as zip'.)

  4. Unzip the external_CV_reference_files folder

  5. Put external_CV_reference_files and prepare_C2M2_submission.py into the same folder

  6. Create a subdirectory containing your pre-built file.tsv, biosample.tsv, etc., then edit line 44 of prepare_C2M2_submission.py so that it refers to that subdirectory.

  7. Use the command line to run the script: python prepare_C2M2_submission.py
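Before running the script, it can help to confirm that the folder layout from steps 4 to 6 is in place. The following is a minimal sketch of such a check; `missing_items` and the subdirectory name `my_tables` are hypothetical, and the demo builds a throwaway layout in a temporary directory rather than touching real files.

```python
from pathlib import Path
import tempfile


def missing_items(base_dir, data_subdir):
    """Return the names of expected items that are absent from base_dir."""
    base = Path(base_dir)
    expected = {
        "prepare_C2M2_submission.py": (base / "prepare_C2M2_submission.py").is_file(),
        "external_CV_reference_files": (base / "external_CV_reference_files").is_dir(),
        data_subdir: (base / data_subdir).is_dir(),
    }
    return [name for name, present in expected.items() if not present]


# Demo: a temporary folder holding the script and the CV reference folder,
# but no table subdirectory yet ('my_tables' is a made-up name).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "external_CV_reference_files").mkdir()
    (Path(tmp) / "prepare_C2M2_submission.py").touch()
    demo_missing = missing_items(tmp, "my_tables")
```

For a real run, you would call `missing_items(".", "<your subdirectory>")` from the folder containing the script and proceed only if the returned list is empty.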

This script is under active development. Please contact us with any questions by emailing the helpdesk at [email protected] or by posting to Discussions.
