Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validator update: only allow SOMATIC or GERMLINE values for SV_Status column #64

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

oplantalech
Copy link
Contributor

Following the documentation, SV_Status column of structural variant files can only contain SOMATIC or GERMLINE values. The validator assumes if SV_Status is empty, that the value will be SOMATIC. In practice, the loader loads the entry with an empty value and causes some parts to break (e.g. Structural Variant table in the Study View is empty).

I'm proposing to raise an error in the validator when the value is empty, and still allow to load studies with empty SV_Status by only using the loader. However, if that is really something we want to avoid, I can also change the PR and adjust the loader to follow the logic described by the validator (@rmadupuri and team let me know).

@rmadupuri
Copy link
Contributor

Hi @oplantalech, from a quick check, it seems the loader is filtering out the rows with null SV_Status during import? Need to double check on that but we think it may be better to report an error rather than defaulting to somatic. This PR is good to go once the unit test is fixed.

@oplantalech
Copy link
Contributor Author

oplantalech commented Oct 29, 2024

@rmadupuri In this line you can see how it adds null when the SV_Status has a NA value. That's the problem. The validator suggests that this null is replaced by "SOMATIC".

I will fix the tests.

@oplantalech
Copy link
Contributor Author

@averyniceday Just had a call with the data team and we think the loader might have to be fixed for structural variants regarding SV_Status. This is a required column, with only two possible values (SOMATIC and GERMLINE), but it looks like the loader allows to load empty values as null (see this line). That is a different behavior that what we see in the mutation data, where required fields should be filled or otherwise the loader breaks. Do you know maybe why this was introduced, or if we should avoid touching the loader?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants