The aim of this document is to centralize answers to key questions about the CTTV TD-association JSON format. It is intended to be read by individuals working on preparing their data to fit this model.
Table of Contents
- 'Evidence code' questions
- 'Creating TD-association JSONs' questions
- 1. Which fields are mandatory in the JSON schema?
- 2. Should I use ‘null’ when a field is not mandatory?
- 3. Where can I see an example of the type of data I am trying to fit?
- 4. Do I provide my data as a JSON data service or as one large array?
- 5. How do I validate the data I am providing?
- 6. Do you have a package I can use to write out CTTV-JSON files?
- 7. What is the 'unique_association_fields' codeblock in the JSON?
- 8. When do I use an 'evidence chain'?
- 9. {evidence}{association_score} is a required field. But what if my dataset doesn't estimate this?
- Miriam registry questions
- 1. What is identifiers.org/Miriam registry and what is its relevance to the CTTV platform?
- 2. When do I use the “http:// identifiers.org” URI prefix in the JSON?
- 3. http:// identifiers.org/cttv.target and http:// identifiers.org/cttv.activity URIs don't exist in MIRIAM yet. Which specific CTTV URIs can I use?
- Experimental factors ontology (EFO) questions
- We are using the Evidence Codes Ontology.
- Please provide evidence codes using this URI syntax: "http://identifiers.org/eco/ECO:nnnnnnn".
- To see which evidence codes are assigned to your project, please click here
- If you need evidence codes beyond those already assigned to your project, please make sure you update this github markdown document with your new codes
- To request new evidence codes, please contact jamesmalone, cc samiulxhasan
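As a minimal sketch of the URI syntax above, the following Python helper checks that a string matches the "http://identifiers.org/eco/ECO:nnnnnnn" pattern. It assumes that "nnnnnnn" stands for exactly seven digits; the name `is_eco_uri` is hypothetical and not part of any CTTV tooling. It checks the syntax only and does not confirm that the code exists in the Evidence Codes Ontology.

```python
import re

# Assumption: "nnnnnnn" in the syntax above means exactly seven digits.
ECO_URI_PATTERN = re.compile(r"^http://identifiers\.org/eco/ECO:\d{7}$")


def is_eco_uri(uri):
    """Return True if `uri` follows the evidence code URI syntax."""
    return ECO_URI_PATTERN.match(uri) is not None


if __name__ == "__main__":
    print(is_eco_uri("http://identifiers.org/eco/ECO:0000360"))
    print(is_eco_uri("ECO:0000360"))
```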
Please look through the CTTV JSON schema for fields marked as "required" : true. The only field that is not set as "required" : true is 'unique_association_fields'.
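One way to list the mandatory fields rather than reading the schema by eye is sketched below. It assumes the schema nests fields under "properties" and flags them with a boolean "required" : true, as quoted above. The function name `required_paths` and the toy schema are made up for illustration; the toy schema is not the CTTV schema.

```python
def required_paths(schema, prefix=""):
    """Collect dotted paths of properties marked "required": true."""
    paths = []
    for name, subschema in schema.get("properties", {}).items():
        if not isinstance(subschema, dict):
            continue
        path = prefix + "." + name if prefix else name
        if subschema.get("required") is True:
            paths.append(path)
        # Recurse so that required fields of nested objects are found too.
        paths.extend(required_paths(subschema, path))
    return paths


# Toy schema for illustration only; this is NOT the CTTV schema.
toy_schema = {
    "properties": {
        "evidence": {
            "required": True,
            "properties": {
                "association_score": {"required": True},
            },
        },
        "unique_association_fields": {"required": False},
    }
}

if __name__ == "__main__":
    for path in required_paths(toy_schema):
        print(path)
```

To run it against the real schema, load the schema file with `json.load` and pass the resulting dictionary to `required_paths`.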
No. You don't need to include fields that are not mandatory, so there is no need to set them to 'null'.
A list of examples can be browsed here
For now, please create a JSON array (one large file) and provide us with the URL to download this. Please update this github markdown document with your download URL. Once a CTTV production environment is in place, we will aim to establish a common fileshare for all groups to upload their JSON files.
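A minimal sketch of writing all records into one JSON array file, and reading it back to confirm that the top-level value is an array, might look like this. The function names, the file name and the placeholder records are hypothetical; real records must follow the CTTV JSON schema.

```python
import json
import os
import tempfile


def write_json_array(records, path):
    """Write all records to `path` as a single JSON array."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(list(records), handle, indent=2)


def read_json_array(path):
    """Load the file and confirm that the top-level value is an array."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array")
    return data


# Placeholder records for illustration only.
records = [{"id": 1}, {"id": 2}]

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "associations.json")
    write_json_array(records, path)
    print(len(read_json_array(path)))
```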
- You can validate your data with an online validator such as the JSON schema lint validator:
  - Open the JSON schema lint validator in your web browser
  - Copy the CTTV JSON schema into the 'JSON Schema' box
  - Copy the example ChEMBL JSON instance into the 'JSON' box
  - Both boxes should turn green if the instance validates against the schema
  - Try changing some values in the 'JSON' box to see whether the schema still validates it
- You can also validate locally using our Python script
- The command-line JSON processor jq is also useful for inspecting your JSON files.
A package for writing CTTV-JSON files is in development. You can pull the latest version from here
We need to know whether a given "target-disease" association can be uniquely identified in your database. This is an important requirement because we need to track whether the properties (e.g. p-values) of a unique "target-disease" association have changed between release cycles. The 'unique_association_fields' block is therefore implemented in the schema to capture the key:value pairs in your data that identify the association. If a key has several values, please provide them as a single comma-separated string. The keys will differ between data providers, so please use a set of keys that is specific to your database and consistent between release cycles. Here are some examples:
ChEMBL:
"unique_association_fields": {
"chembl_molecules": "http://identifiers.org/chembl.compound/CHEMBL2",
"chembl_targets": "http://identifiers.org/uniprot/P25100,http://identifiers.org/uniprot/P35348,http://identifiers.org/uniprot/P35368",
"atc_classification": "http://identifiers.org/atc/C"
}
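The comma-separated value used for "chembl_targets" above could be produced along these lines. This is a sketch only: sorting the values (so that the string is identical between release cycles) is an assumption, and `association_fingerprint` is a hypothetical helper for comparing associations across releases, not something the CTTV platform is known to compute.

```python
import hashlib
import json


def build_unique_association_fields(fields):
    """Flatten list values into comma-separated strings."""
    flat = {}
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            # Assumption: sort so the string is stable between releases.
            flat[key] = ",".join(sorted(value))
        else:
            flat[key] = value
    return flat


def association_fingerprint(unique_fields):
    """Hypothetical helper: a stable digest of the key:value pairs."""
    canonical = json.dumps(unique_fields, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


example = build_unique_association_fields({
    "chembl_molecules": "http://identifiers.org/chembl.compound/CHEMBL2",
    "chembl_targets": [
        "http://identifiers.org/uniprot/P35348",
        "http://identifiers.org/uniprot/P25100",
    ],
})

if __name__ == "__main__":
    print(json.dumps({"unique_association_fields": example}, indent=2))
    print(association_fingerprint(example))
```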
ArrayAtlas:
"unique_association_fields": {
"geneID": "http://identifiers.org/ensembl/ENSG00000127720",
"study_id": "http://identifiers.org/gxa.expt/E-MEXP-3628",
"comparison_name": "'osteosarcoma' vs 'normal'"
}
Reactome:
"unique_association_fields": {
"biological_subjects": "http://identifiers.org/uniprot/P50443",
"reactome_id": "http://identifiers.org/reactome/REACT_267687.1",
"biological_objects": "http://identifiers.org/orphanet/Orphanet_93298"
}
Use an evidence chain when more than one independent analytical step is used to associate a target with a disease. There are two examples you can look at:
- biological target to disease association via drug - Two independent analyses form the chain: 1) an experimental analysis that associates a protein or protein complex target with a drug, 2) a clinical analysis that associates the drug with its effect in disease
- gene to disease association via snp - Two independent analyses form the chain: 1) a computational analysis that associates a gene target with its nearest nucleotide polymorphism, 2) a genetics analysis that associates the nucleotide polymorphism with its effect in disease
Probability: This is an estimate of the confidence of an assertion (as a probability) in the range 0-1. It is expected that this number will be specific to pipeline projects. If you do provide this value, please also indicate the method used to calculate it using the {probability}{method} field (currently free text). The 'probability' itself is a relative value within a data type and is not directly comparable between data types. The following is an example scenario where one could calculate this:
- Curator uses their own numerical scores (e.g. in the range 1-10) to indicate strength of evidence: Please normalize the values to the range 0-1. Please check with your curation group whether this is possible.
- If the above does not apply, please provide a 'null' value to indicate to us that such a calculation does not apply to your data.
The methods for scoring will inevitably evolve.
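A minimal sketch of the normalization described above, assuming a linear rescaling of a 1-10 curator scale onto 0-1 (no particular method is prescribed here, so this is one option among several). The helper names `normalize_score` and `probability_block` are hypothetical; the "value" and "method" keys follow the {probability}{method} field mentioned above, and Python's `None` is serialized as JSON 'null'.

```python
import json


def normalize_score(score, low=1.0, high=10.0):
    """Linearly rescale a curator score from [low, high] to [0, 1]."""
    if not low <= score <= high:
        raise ValueError("score lies outside the curator scale")
    return (score - low) / (high - low)


def probability_block(score=None, method=None):
    """Build the 'probability' block; use nulls when no score applies."""
    if score is None:
        return {"value": None, "method": None}
    return {"value": normalize_score(score), "method": method}


if __name__ == "__main__":
    scored = probability_block(7, "curator score (1-10) rescaled to 0-1")
    print(json.dumps({"probability": scored}, indent=2))
    print(json.dumps({"probability": probability_block()}, indent=2))
```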
Pvalue:
Same rules as for probability above.
Examples:
GWAS study:
"association_score": {
"probability": {
"value": 0.0202556065553751,
"method": "Please describe the method"
},
"pvalue": {
"value": 1.316e-44,
"method": "calculated from GWAS study"
}
}
ArrayAtlas study:
"association_score": {
"probability": {
"value": null,
"method": null
},
"pvalue": {
"value": 0.0000418,
"method": "pvalue from expression array comparison study"
}
}
Link to MIRIAM registry: This is maintained by the Biomodels group at the EBI.
It is a registry that provides a URI to URL mapping service (through identifiers.org). For example:
example URI = "http://identifiers.org/eco/ECO:0001113"
URLs resolved by identifiers.org =
"http://www.ebi.ac.uk/ontology-lookup/?termId=ECO%3A0001113",
"http://purl.bioontology.org/ontology/ECO/ECO%3A0001113"
By using identifiers.org URIs, we can point the CTTV web application to pre-resolved URLs.
- Please use the 'http://identifiers.org/namespace' URI prefix whenever you are referring to an identifiable biological entity.
- Examples: http://identifiers.org/uniprot/P25100, http://identifiers.org/eco/ECO:0000360, http://identifiers.org/chembl.compound/CHEMBL2
- Check the MIRIAM registry to confirm that the namespace you want to use exists.
- These are typically used as biological_subject or biological_object resources but can be used elsewhere too (e.g. within the 'experiment_specific' codeblock).
- When referring to CTTV-specific resources, please use the 'http://identifiers.org' URI prefix as well, for example 'http://identifiers.org/cttv.target/gene'. We will request allocation of the cttv URI namespace in the MIRIAM registry.
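The 'http://identifiers.org/namespace' convention above can be sketched as a pair of small helpers that build a URI from a namespace and an identifier, and split it again. The names `make_uri` and `split_uri` are hypothetical. The code only handles the string format; it does not check whether the namespace is actually registered in MIRIAM.

```python
IDENTIFIERS_PREFIX = "http://identifiers.org/"


def make_uri(namespace, identifier):
    """Build an identifiers.org URI from a namespace and an identifier."""
    return IDENTIFIERS_PREFIX + namespace + "/" + identifier


def split_uri(uri):
    """Split an identifiers.org URI into (namespace, identifier)."""
    if not uri.startswith(IDENTIFIERS_PREFIX):
        raise ValueError("not an identifiers.org URI: " + uri)
    namespace, _, identifier = uri[len(IDENTIFIERS_PREFIX):].partition("/")
    if not namespace or not identifier:
        raise ValueError("missing namespace or identifier: " + uri)
    return namespace, identifier


if __name__ == "__main__":
    print(make_uri("uniprot", "P25100"))
    print(split_uri("http://identifiers.org/eco/ECO:0000360"))
```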
3. http:// identifiers.org/cttv.target and http:// identifiers.org/cttv.activity URIs don't exist in MIRIAM yet. Which specific CTTV URIs can I use?
- You can browse the list of the URIs here. These correspond to the terms in the CTTV core ontology.
- These URIs are used in the following 'biological_subject' fields in the JSON (full example):
"properties": {
"target_type": "http://identifiers.org/cttv.target/protein_complex_heteropolymer",
"activity": "http://identifiers.org/cttv.activity/drug_negative_modulator"
}
If you have a disease term which isn't yet mapped to EFO, please:
- Use the efo URI 'http://identifiers.org/efo/0000000' in the JSON {biological_object}{about}[0] field.
- Provide your unmapped term under '{biological_object}{properties}{experiment_specific}{unmapped_disease_term}'
"biological_object": {
"about": [
"http://identifiers.org/efo/0000000"
],
"properties": {
"experiment_specific": {
"unmapped_disease_term" : "disease X which is not in EFO"
}
}
}
See jamesmalone and tonyburdett's Confluence WIKI (only for those with CTTV Confluence accounts)