Namespaced variant CSV export #446

@jstone-dev

Two API endpoints that return variant data in CSV format (/score-sets/{urn}/scores and /score-sets/{urn}/counts) use column names specified by the original data uploader. These include

  • A nonempty subset of the HGVS columns (hgvs_nt, hgvs_pro, and hgvs_splice)
  • score, for the scores endpoint only
  • Score-set-specific custom column names, for both endpoints. There are separate sets of custom columns for counts and scores, and their names may overlap.
  • An accession column that gives each variant's MaveDB URN

Ignoring column order, the download's content is identical to the raw CSV data that was originally uploaded, except for the MaveDB-supplied accession column.

It may be useful to provide a namespaced version of the CSV export, which would have

  • accession: The variant's MaveDB URN
  • hgvs_nt, hgvs_pro, and/or hgvs_splice
  • scores.score: The main score column
  • scores.<custom column> for each additional column originally uploaded in the "scores" CSV file
  • counts.<custom column> for each column originally uploaded in the "counts" CSV file

In other words, we would namespace all columns except for accession, hgvs_nt, hgvs_pro, and hgvs_splice.
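The renaming rule above can be sketched in a few lines of Python. This is only an illustration of the proposal, not existing MaveDB code: the function name namespace_columns and the constant SHARED_COLUMNS are hypothetical.

```python
# Columns the proposal leaves unprefixed (taken from the list above).
SHARED_COLUMNS = ("accession", "hgvs_nt", "hgvs_pro", "hgvs_splice")


def namespace_columns(columns, namespace):
    """Prefix every column except the shared ones with '<namespace>.'.

    Hypothetical helper: 'namespace' would be 'scores' or 'counts'.
    """
    return [
        name if name in SHARED_COLUMNS else f"{namespace}.{name}"
        for name in columns
    ]
```

For example, a scores file with columns accession, hgvs_nt, score, and a custom column se would come out as accession, hgvs_nt, scores.score, and scores.se.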

This would allow us to add columns computed by MaveDB or obtained from other data sources, such as

  • The ClinGen allele ID;
  • Mapped HGVS strings, such as mavedb.mapped_hgvs_nt_g and mavedb.mapped_hgvs_nt_c;
  • And information from ClinVar, gnomAD, or other data sources, suitably namespaced.

It will also allow score, count, and other data to be obtained in a single CSV file without concern for name collision between score and count data or between these and MaveDB-provided columns.
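As a rough sketch of the single-file case, the snippet below joins a scores CSV and a counts CSV on accession and prefixes the remaining columns, so a custom column that appears in both files no longer collides. It assumes both inputs carry an accession column, as the current exports do. The function name merge_namespaced is hypothetical, and leaving missing cells empty is an assumption rather than a decided convention.

```python
import csv
import io

# Columns shared by both exports and left unprefixed in the proposal.
SHARED = ("accession", "hgvs_nt", "hgvs_pro", "hgvs_splice")


def merge_namespaced(scores_csv, counts_csv):
    """Join two CSV texts on 'accession', namespacing non-shared columns."""
    merged = {}  # accession -> combined row
    header = []  # output column order, in order of first appearance
    for namespace, text in (("scores", scores_csv), ("counts", counts_csv)):
        for row in csv.DictReader(io.StringIO(text)):
            out = merged.setdefault(row["accession"], {})
            for name, value in row.items():
                key = name if name in SHARED else f"{namespace}.{name}"
                out[key] = value
                if key not in header:
                    header.append(key)
    buf = io.StringIO()
    # Assumption: variants missing from one file get empty cells.
    writer = csv.DictWriter(buf, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged.values())
    return buf.getvalue()
```

If both files had a custom column named depth, the merged header would contain scores.depth and counts.depth side by side instead of one overwriting the other.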

While we do not intend MaveDB as a repository for variant data from other sources, the MaveDB UI will increasingly rely on having efficient access to variant data from ClinVar, gnomAD, etc.

Metadata

Labels

  • app: backend (Task implementation touches the backend)
  • app: frontend (Task implementation touches the frontend)
  • sprint: carried over (Task was carried over from a previous sprint)
  • type: enhancement (Enhancement to an existing feature)
  • workstream: clinical (Task relates to clinical features)
