Namespaced variant CSV export #446

Two API endpoints that return variant data in CSV format (`/score-sets/{urn}/scores` and `/score-sets/{urn}/counts`) use column names specified by the original data uploader. These include:

- A nonempty subset of the HGVS columns (`hgvs_nt`, `hgvs_pro`, and `hgvs_splice`)
- `score`, for the scores endpoint only
- Score-set-specific custom column names, for both endpoints. There are separate sets of custom columns for counts and scores, and their names may overlap.
- And an `accession` column that gives each variant's MaveDB URN.

Ignoring column order, the download's content is identical to the raw CSV data that was originally uploaded, except for the MaveDB-supplied `accession` column.

It may be useful to provide a namespaced version of the CSV export, which would have:

- `accession`: The variant's MaveDB URN
- `hgvs_nt`, `hgvs_pro`, and/or `hgvs_splice`
- `scores.score`: The main score column
- `scores.<custom column>` for each additional column originally uploaded in the "scores" CSV file
- `counts.<custom column>` for each column originally uploaded in the "counts" CSV file

In other words, we would namespace all columns except for `accession`, `hgvs_nt`, `hgvs_pro`, and `hgvs_splice`.
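As a rough sketch of that renaming rule (not the actual MaveDB implementation), the header mapping could look like the following. The function name `namespace_columns` and the custom column `se` are hypothetical and used only for illustration.

```python
# Columns that the proposal leaves un-namespaced in every export.
SHARED_COLUMNS = ("accession", "hgvs_nt", "hgvs_pro", "hgvs_splice")


def namespace_columns(columns, namespace):
    """Prefix each column with `<namespace>.` unless it is a shared column."""
    return [
        name if name in SHARED_COLUMNS else f"{namespace}.{name}"
        for name in columns
    ]


# `se` is a made-up custom column that appears in both uploads.
scores_header = namespace_columns(
    ["accession", "hgvs_nt", "hgvs_pro", "score", "se"], "scores"
)
counts_header = namespace_columns(
    ["accession", "hgvs_nt", "hgvs_pro", "se"], "counts"
)
```

Here the overlapping `se` column becomes `scores.se` in one header and `counts.se` in the other, so the two no longer collide.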

This would allow us to add columns computed by MaveDB or obtained from other data sources, such as:

- The ClinGen allele ID;
- Mapped HGVS strings, such as `mavedb.mapped_hgvs_nt_g` and `mavedb.mapped_hgvs_nt_c`;
- And information from ClinVar, gnomAD, or other data sources, suitably namespaced.

It would also allow score, count, and other data to be obtained in a single CSV file without concern for name collisions between score and count data, or between these and MaveDB-provided columns.
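To illustrate the single-file case, here is a minimal client-side sketch that joins a scores CSV and a counts CSV on `accession` and namespaces everything else. It assumes both files carry an `accession` column as described above; the function name `merge_namespaced`, the sample URN, and the custom columns `sd` and `c_0` are placeholders, and a server-side export would presumably build the rows differently.

```python
import csv
import io

SHARED = ("accession", "hgvs_nt", "hgvs_pro", "hgvs_splice")


def merge_namespaced(scores_csv: str, counts_csv: str) -> str:
    """Join two CSV texts on `accession`, namespacing non-shared columns."""
    rows = {}    # accession -> merged row
    header = []  # output columns in first-seen order
    for namespace, text in (("scores", scores_csv), ("counts", counts_csv)):
        for record in csv.DictReader(io.StringIO(text)):
            merged = rows.setdefault(record["accession"], {})
            for name, value in record.items():
                key = name if name in SHARED else f"{namespace}.{name}"
                if key not in header:
                    header.append(key)
                merged[key] = value
    out = io.StringIO()
    # Variants missing from one file get empty cells (DictWriter's default).
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows.values())
    return out.getvalue()


# Placeholder data: both uploads define a custom column named `sd`.
scores_csv = "accession,hgvs_nt,score,sd\nurn:mavedb:00000001-a-1#1,c.1A>G,0.5,0.1\n"
counts_csv = "accession,hgvs_nt,sd,c_0\nurn:mavedb:00000001-a-1#1,c.1A>G,2.0,10\n"
merged = merge_namespaced(scores_csv, counts_csv)
```

Both `sd` columns survive in the merged file, as `scores.sd` and `counts.sd`.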

While we do not intend MaveDB as a repository for variant data from other sources, the MaveDB UI will increasingly rely on having efficient access to variant data from ClinVar, gnomAD, etc.
