Building a datapackage
Amanda Charbonneau edited this page Dec 6, 2021 · 17 revisions
There are many valid ways to build a datapackage, depending on the underlying structure of your DCC's data. Although you must submit all tables to have a valid package, most tables, and most columns within most tables, can be left blank. This allows you to use only the pieces of the C2M2 that best fit your data. This page does not discuss modeling per se; it is designed to give a quick overview of how to finalize your datapackage. If you need help turning your internal data model into something compatible with the C2M2, please contact the helpdesk directly for individualized support.
- Build the tables that your DCC has data for, according to the technical documentation. Supplemental table information is also available in this wiki.
- Download the C2M2 JSON file from the CFDE OSF space and save it in the same directory as your tables.
- Follow the helper script instructions to generate controlled vocabulary tables from your data.
- If you do not have 33 tables, download blank headers for the missing tables and save them to the same directory.
- OPTIONAL: Run the frictionless validator script.
- Submit your datapackage using the cfde-submit tool. NOTE: This will only work if you have been onboarded as a data submitter.
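The "missing tables" step above can be sketched in Python. This is a minimal illustration, not the official procedure: it assumes the C2M2 JSON file follows the Frictionless Data Package layout (a `resources` list in which each entry has a `path` and a `schema.fields` list of column names), and the function name `write_missing_tables` and the two-table toy schema are hypothetical stand-ins. The blank headers supplied by the CFDE remain the authoritative source.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_missing_tables(schema_path, table_dir):
    """Create header-only TSVs for tables named in the schema but absent on disk.

    Assumes a Frictionless-style schema: resources[].path and
    resources[].schema.fields[].name. Returns the names of the files created.
    """
    schema = json.loads(Path(schema_path).read_text(encoding="utf-8"))
    table_dir = Path(table_dir)
    created = []
    for resource in schema.get("resources", []):
        target = table_dir / resource["path"]
        if target.exists():
            continue  # never overwrite a table that has already been built
        columns = [field["name"] for field in resource["schema"]["fields"]]
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerow(columns)  # header row only; the table stays blank
        created.append(target.name)
    return sorted(created)


# Toy stand-in for the real C2M2 JSON file (hypothetical two-table schema).
toy_schema = {
    "resources": [
        {"path": "file.tsv",
         "schema": {"fields": [{"name": "id_namespace"}, {"name": "local_id"}]}},
        {"path": "anatomy.tsv",
         "schema": {"fields": [{"name": "id"}, {"name": "name"}]}},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "schema.json").write_text(json.dumps(toy_schema), encoding="utf-8")
    # Pretend the DCC has already built file.tsv with one data row.
    (tmp / "file.tsv").write_text("id_namespace\tlocal_id\nns\tf1\n", encoding="utf-8")
    created = write_missing_tables(tmp / "schema.json", tmp)
    anatomy_header = (tmp / "anatomy.tsv").read_text(encoding="utf-8")
    file_contents = (tmp / "file.tsv").read_text(encoding="utf-8")

print(created)
```

Existing tables are skipped rather than rewritten, so the sketch can be rerun safely as more tables are built.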
C2M2 Table Guide
Table Summary
- analysis_type.tsv
- anatomy.tsv
- assay_type.tsv
- biofluid.tsv
- biosample.tsv
- biosample_disease.tsv
- biosample_from_subject.tsv
- biosample_gene.tsv
- biosample_in_collection.tsv
- biosample_ptm.tsv
- biosample_substance.tsv
- collection.tsv
- collection_anatomy.tsv
- collection_biofluid.tsv
- collection_compound.tsv
- collection_defined_by_project.tsv
- collection_disease.tsv
- collection_gene.tsv
- collection_in_collection.tsv
- collection_phenotype.tsv
- collection_protein.tsv
- collection_ptm.tsv
- collection_substance.tsv
- collection_taxonomy.tsv
- compound.tsv
- data_type.tsv
- dcc.tsv (formerly primary_dcc_contact.tsv)
- disease.tsv
- domain_location.tsv
- file.tsv
- file_describes_biosample.tsv
- file_describes_collection.tsv
- file_describes_subject.tsv
- file_format.tsv
- file_in_collection.tsv
- gene.tsv
- id_namespace.tsv
- ncbi_taxonomy.tsv
- phenotype.tsv
- phenotype_disease.tsv
- phenotype_gene.tsv
- project.tsv
- project_in_project.tsv
- protein.tsv
- protein_gene.tsv
- ptm.tsv
- ptm_type.tsv
- ptm_subtype.tsv
- sample_prep_method.tsv
- subject.tsv
- subject_disease.tsv
- subject_in_collection.tsv
- subject_phenotype.tsv
- subject_race.tsv
- subject_role_taxonomy.tsv
- subject_substance.tsv
- substance.tsv