Description
Two API endpoints that return variant data in CSV format (`/score-sets/{urn}/scores` and `/score-sets/{urn}/counts`) use column names specified by the original data uploader. These include

- Some nonempty set of HGVS columns (`hgvs_nt`, `hgvs_pro`, and `hgvs_splice`)
- `score`, for the scores endpoint only
- Score-set-specific custom column names, for both endpoints. There are separate sets of custom columns for counts and scores, and their names may overlap.
- And an `accession` column that gives each variant's MaveDB URN.
Ignoring column order, the download's content is identical to the raw CSV data that was originally uploaded, except for the MaveDB-supplied `accession` column.
It may be useful to provide a namespaced version of the CSV export, which would have

- `accession`: The variant's MaveDB URN
- `hgvs_nt`, `hgvs_pro`, and/or `hgvs_splice`
- `scores.score`: The main score column
- `scores.<custom column>` for each additional column originally uploaded in the "scores" CSV file
- `counts.<custom column>` for each column originally uploaded in the "counts" CSV file
In other words, we would namespace all columns except for `accession`, `hgvs_nt`, `hgvs_pro`, and `hgvs_splice`.
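A minimal Python sketch of the proposed renaming rule, assuming only what is described above: the four shared columns keep their names and every other column gets a `scores.` or `counts.` prefix. The helper `namespace_columns` and its parameters are hypothetical, not part of MaveDB's codebase.

```python
# Columns that stay un-namespaced in the proposed export.
UNNAMESPACED = {"accession", "hgvs_nt", "hgvs_pro", "hgvs_splice"}


def namespace_columns(columns: list[str], namespace: str) -> dict[str, str]:
    """Map each original column name to its name in the namespaced export.

    `namespace` is a hypothetical parameter, e.g. "scores" or "counts".
    Shared columns keep their original names; all others are prefixed.
    """
    return {
        col: col if col in UNNAMESPACED else f"{namespace}.{col}"
        for col in columns
    }


# Example: a scores file with one custom column, "se".
print(namespace_columns(["accession", "hgvs_nt", "score", "se"], "scores"))
```

With this rule, a custom `se` column uploaded in both the scores and counts files would become `scores.se` and `counts.se`, so the two can no longer clash.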
This would allow us to add columns computed by MaveDB or obtained from other data sources, such as

- The ClinGen allele ID;
- Mapped HGVS strings, such as `mavedb.mapped_hgvs_nt_g`, `mavedb.mapped_hgvs_nt_c`;
- And information from ClinVar, gnomAD, or other data sources, suitably namespaced.
It will also allow score, count, and other data to be obtained in a single CSV file without concern for name collision between score and count data or between these and MaveDB-provided columns.
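As an illustration of the single-file export, here is a rough Python sketch that joins scores and counts CSV text on `accession` and namespaces the non-shared columns. It is not MaveDB's implementation. The function name, the choice to take HGVS values from the scores file, and the sample URNs and values are all assumptions made for this example.

```python
import csv
import io

# Columns left un-namespaced in the proposed export.
UNNAMESPACED = {"accession", "hgvs_nt", "hgvs_pro", "hgvs_splice"}


def merge_namespaced(scores_csv: str, counts_csv: str) -> str:
    """Join scores and counts CSV text on `accession` into one namespaced CSV.

    Hypothetical helper. Assumes both inputs have an `accession` column.
    Shared (HGVS) values come from the scores file; a variant missing from
    the counts file gets empty counts.* cells.
    """
    scores = csv.DictReader(io.StringIO(scores_csv))
    counts = csv.DictReader(io.StringIO(counts_csv))
    counts_by_urn = {row["accession"]: row for row in counts}

    # Header: scores columns (prefixed unless shared), then counts columns.
    fieldnames = [c if c in UNNAMESPACED else f"scores.{c}" for c in scores.fieldnames]
    fieldnames += [f"counts.{c}" for c in counts.fieldnames if c not in UNNAMESPACED]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    for s_row in scores:
        row = {(c if c in UNNAMESPACED else f"scores.{c}"): v for c, v in s_row.items()}
        c_row = counts_by_urn.get(s_row["accession"], {})
        row.update({f"counts.{c}": v for c, v in c_row.items() if c not in UNNAMESPACED})
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data: both files have a custom column named "se".
scores_csv = "accession,hgvs_pro,score,se\nurn:mavedb:00000001-a-1#1,p.Ala2Val,0.5,0.1\n"
counts_csv = "accession,hgvs_pro,se\nurn:mavedb:00000001-a-1#1,p.Ala2Val,12\n"
print(merge_namespaced(scores_csv, counts_csv))
```

The overlapping `se` columns come out as `scores.se` and `counts.se`. MaveDB-provided columns such as `mavedb.mapped_hgvs_nt_g` could be appended to each row in the same way without colliding with either set.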
While we do not intend MaveDB as a repository for variant data from other sources, the MaveDB UI will increasingly rely on having efficient access to variant data from ClinVar, gnomAD, etc.