Purpose of this document

The aim of this document is to centralize answers to key questions about the CTTV TD-association JSON format. It is intended to be read by individuals working on preparing their data to fit this model.

Table of Contents

'Evidence code' questions

1. Which evidence codes should I use?

'Creating TD-association JSONs' questions

1. Which fields are mandatory in the JSON schema?

Look through the CTTV JSON schema for fields marked "required" : true; these are mandatory. The only field not marked "required" : true is 'unique_association_fields'.
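As a rough sketch of how the mandatory fields could be listed programmatically, the snippet below walks a schema in which each property carries its own "required" flag, as described above, and collects the dotted paths of the flagged properties. The function name `required_paths` and the `toy_schema` are hypothetical and only illustrate the idea; the real CTTV schema would be loaded in its place.

```python
def required_paths(schema, prefix=""):
    """Return dotted paths of properties flagged with "required": true."""
    paths = []
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        if sub.get("required") is True:
            paths.append(path)
        # Recurse into nested objects, if any.
        paths.extend(required_paths(sub, path))
    return paths


# Hypothetical miniature schema for illustration, NOT the real CTTV schema.
toy_schema = {
    "properties": {
        "evidence": {
            "required": True,
            "properties": {
                "association_score": {"required": True},
            },
        },
        "unique_association_fields": {"required": False},
    }
}

print(required_paths(toy_schema))
# ['evidence', 'evidence.association_score']
```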

2. Should I use ‘null’ when a field is not mandatory?

No. If a field is not mandatory and you have no value for it, simply leave it out of the JSON.

3. Where can I see an example of the type of data I am trying to fit?

A list of examples can be browsed here

4. Do I provide my data as a JSON data service or as one large array?

For now, please create a JSON array (one large file) and provide us with the URL from which it can be downloaded. Please add your download URL to this GitHub Markdown document. Once a CTTV production environment is in place, we aim to establish a common file share where all groups can upload their JSON files.
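A minimal sketch of producing such a file in Python, assuming your evidence records are already available as dictionaries. The function name `write_json_array`, the file name and the two abbreviated records are hypothetical; real records must contain all fields required by the schema.

```python
import json
import os
import tempfile


def write_json_array(records, path):
    """Write all evidence records to `path` as one JSON array."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(list(records), handle, indent=2)


# Abbreviated, hypothetical records (not schema-complete).
records = [
    {"unique_association_fields": {"study_id": "http://identifiers.org/gxa.expt/E-MEXP-3628"}},
    {"unique_association_fields": {"reactome_id": "http://identifiers.org/reactome/REACT_267687.1"}},
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "cttv_evidence.json")
    write_json_array(records, out_path)
    # Read the file back to confirm it parses as a single array.
    with open(out_path, encoding="utf-8") as handle:
        loaded = json.load(handle)
```

Reading the file back with `json.load` is a cheap check that the output is well-formed JSON; it does not validate the records against the CTTV schema.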

5. How do I validate the data I am providing?

6. Do you have a package I can use to write out CTTV-JSON files?

This is in development. You can pull the latest version from here

7. What is the 'unique_association_fields' codeblock in the JSON?

We need to know whether a given target-disease association can be uniquely identified in your database. This is important because we need to track whether the properties (e.g. p-values) of a unique target-disease association have changed between release cycles. The 'unique_association_fields' block is therefore included in the schema to capture a set of key:value pairs from your data that together identify the association. If a key has several values, please provide them as a single comma-separated string (see 'chembl_targets' in the ChEMBL example). The keys will differ between data providers, so please use a set of keys that is specific to your database and consistent between release cycles. Here are some examples:

ChEMBL:

    "unique_association_fields": {
        "chembl_molecules": "http://identifiers.org/chembl.compound/CHEMBL2",
        "chembl_targets": "http://identifiers.org/uniprot/P25100,http://identifiers.org/uniprot/P35348,http://identifiers.org/uniprot/P35368",
        "atc_classification": "http://identifiers.org/atc/C"
    }

ArrayAtlas:

    "unique_association_fields": {
        "geneID": "http://identifiers.org/ensembl/ENSG00000127720",
        "study_id": "http://identifiers.org/gxa.expt/E-MEXP-3628",
        "comparison_name": "'osteosarcoma' vs 'normal'"
    }

Reactome:

    "unique_association_fields": {
        "biological_subjects": "http://identifiers.org/uniprot/P50443",
        "reactome_id": "http://identifiers.org/reactome/REACT_267687.1",
        "biological_objects": "http://identifiers.org/orphanet/Orphanet_93298"
    }
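The following sketch shows one possible way to build such a block in Python, joining multi-valued entries into a comma-separated string as in the ChEMBL example. Both function names are hypothetical, and the `association_key` helper is only an assumption about how a provider might compare associations between releases; it is not part of the CTTV schema.

```python
def make_unique_association_fields(fields):
    """Flatten list or tuple values into comma-separated strings."""
    flat = {}
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            flat[key] = ",".join(str(v) for v in value)
        else:
            flat[key] = str(value)
    return flat


def association_key(unique_fields):
    """Order-independent key for matching an association across releases."""
    return tuple(sorted(unique_fields.items()))


chembl = make_unique_association_fields({
    "chembl_molecules": "http://identifiers.org/chembl.compound/CHEMBL2",
    "chembl_targets": [
        "http://identifiers.org/uniprot/P25100",
        "http://identifiers.org/uniprot/P35348",
        "http://identifiers.org/uniprot/P35368",
    ],
    "atc_classification": "http://identifiers.org/atc/C",
})
print(chembl["chembl_targets"])
```

Because the key is built from sorted key:value pairs, two releases that list the same fields in a different order still produce the same key.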

8. When do I use an 'evidence chain'?

Use an evidence chain when more than one independent analytical step was used to associate a target with a disease. There are 2 examples you can look at:

  • biological target to disease association via drug - Two independent analyses form the chain: 1) an experimental analysis that associates a protein/protein complex target with a drug, 2) a clinical analysis that associates the drug with its effect in disease
  • gene to disease association via snp - Two independent analyses form the chain: 1) a computational analysis that associates a gene target with its nearest nucleotide polymorphism, 2) a genetics analysis that associates the nucleotide polymorphism with its effect in disease

9. {evidence}{association_score} is a required field. But what if my dataset doesn't estimate this?

Probability: This is an estimate of the confidence in an assertion, expressed as a probability in the range 0-1. This number is expected to be specific to pipeline projects. If you provide this value, please also state the method used to calculate it in the {probability}{method} field (currently free text). The 'probability' is a relative value within a data type and is not directly comparable between data types. The following scenarios illustrate what to provide:

  • A curator uses their own numerical scores (e.g. in the range 1-10) to indicate strength of evidence: please normalize the values to the range 0-1, and check with your curation group whether this is possible.
  • If the above does not apply, please provide a 'null' value to tell us that such a calculation does not apply to your data.
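The two scenarios above can be sketched as follows. Linear min-max scaling is only one possible way to normalize a 1-10 curator score and is an assumption here, as are the function names `normalize_score` and `probability_block`; your curation group may prefer a different mapping.

```python
import json


def normalize_score(score, low=1.0, high=10.0):
    """Linearly rescale a curator score from [low, high] to [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= score <= high:
        raise ValueError("score lies outside the curator scale")
    return (score - low) / (high - low)


def probability_block(score, method, low=1.0, high=10.0):
    """Build the 'probability' part of association_score.

    A score of None means no such calculation applies, so both
    fields are emitted as null.
    """
    if score is None:
        return {"value": None, "method": None}
    return {"value": normalize_score(score, low, high), "method": method}


print(json.dumps(probability_block(5.5, "curator score (1-10), min-max normalized")))
print(json.dumps(probability_block(None, None)))
```

Python's `None` is serialized as JSON `null` by `json.dumps`, which matches the ArrayAtlas example in this section.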

The methods for scoring will inevitably evolve.

Pvalue:

Same rules as for probability above.

Examples:

GWAS study:

    "association_score": {
        "probability": {
            "value": 0.0202556065553751,
            "method": "Please describe the method"
        },
        "pvalue": {
            "value": 1.316e-44,
            "method": "calculated from GWAS study"
        }
    }
    
ArrayAtlas study:

    "association_score": {
        "probability": {
            "value": null,
            "method": null
        },
        "pvalue": {
            "value": 0.0000418,
            "method": "pvalue from expression array comparison study"
        }
    }

Miriam registry questions

1. What is identifiers.org/Miriam registry and what is its relevance to the CTTV platform?

Link to MIRIAM registry: The registry is maintained by the BioModels group at the EBI.

It is a registry that provides a URI to URL mapping service (through identifiers.org). For example:

example URI = "http://identifiers.org/eco/ECO:0001113"

URLs resolved by identifiers.org = 
"http://www.ebi.ac.uk/ontology-lookup/?termId=ECO%3A0001113",
"http://purl.bioontology.org/ontology/ECO/ECO%3A0001113"

By using identifiers.org URIs, we can point the CTTV web application to pre-resolved URLs.
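The URIs used throughout this document follow the pattern prefix/namespace/identifier. A small helper along these lines, with the hypothetical name `identifiers_uri`, can keep them consistent when generating the JSON. It only builds the string and does not check that the namespace is registered in MIRIAM.

```python
IDENTIFIERS_ORG = "http://identifiers.org"


def identifiers_uri(namespace, identifier):
    """Build an identifiers.org URI from a namespace and a local identifier."""
    return f"{IDENTIFIERS_ORG}/{namespace}/{identifier}"


print(identifiers_uri("eco", "ECO:0001113"))
# http://identifiers.org/eco/ECO:0001113
```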

2. When do I use the “http:// identifiers.org” URI prefix in the JSON?

3. http:// identifiers.org/cttv.target and http:// identifiers.org/cttv.activity URIs don't exist in MIRIAM yet. Which specific CTTV URIs can I use?

  • You can browse the list of the URIs here. These correspond to the terms in the CTTV core ontology.
  • These URIs are used in the following 'biological_subject' fields of the JSON (full example):
        "properties": {
            "target_type": "http://identifiers.org/cttv.target/protein_complex_heteropolymer",
            "activity": "http://identifiers.org/cttv.activity/drug_negative_modulator"
        }

Experimental Factor Ontology (EFO) questions

1. What if my disease term does not currently map to EFO?

If you have a disease term which isn't yet mapped to EFO, please:

  1. Use the efo URI 'http://identifiers.org/efo/0000000' in the JSON {biological_object}{about}[0] field.
  2. Provide your unmapped term under '{biological_object}{properties}{experiment_specific}{unmapped_disease_term}'
    "biological_object": {
        "about": [
            "http://identifiers.org/efo/0000000"
        ],
        "properties": {
            "experiment_specific": {
                "unmapped_disease_term" : "disease X which is not in EFO"
            }
        }
    }
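A minimal sketch of applying this rule when generating records, assuming the EFO mapping (if any) has already been looked up elsewhere. The function name `biological_object` is hypothetical, and the mapped branch shows only the 'about' field; the real schema may require further properties.

```python
UNMAPPED_EFO_URI = "http://identifiers.org/efo/0000000"


def biological_object(disease_term, efo_uri=None):
    """Build a biological_object, falling back to the placeholder EFO URI."""
    if efo_uri is not None:
        # Mapped term: point 'about' at the EFO URI.
        return {"about": [efo_uri]}
    # Unmapped term: use the placeholder URI and keep the original term.
    return {
        "about": [UNMAPPED_EFO_URI],
        "properties": {
            "experiment_specific": {
                "unmapped_disease_term": disease_term,
            }
        },
    }


unmapped = biological_object("disease X which is not in EFO")
```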

2. Where can I get more information about how to map terms to EFO?

See jamesmalone and tonyburdett's Confluence wiki (only for those with CTTV Confluence accounts).