Ported from Classic-GATK: Unclipping of high quality soft-clips results in difficult to explain GVCF non-variant likelihoods #269

Open
vdauwera opened this issue Mar 7, 2015 · 21 comments

@vdauwera
Contributor

vdauwera commented Mar 7, 2015

Originally from @vruano

Follow up from story https://www.pivotaltracker.com/story/show/68368324

Currently, HC unclips high-quality soft clips in order to aid the discovery of variation through local de novo assembly and the calculation of read-vs-haplotype likelihoods.

As a side effect, processes that default to the original alignment (e.g. to calculate HOM-REF confidence based on the pileup) treat the unclipped ends as if they were bona fide alignments to that section of the reference.

On occasion this seems to be beneficial, as it casts genuine doubt on overlapping HOM-REF calls. At other times, however, it has the opposite effect of casting doubt on genuine HOM-REF calls just because they are close to a medium-to-large insertion that explains the soft clips well.

The task is to tackle the issue by making soft clips count against HOM-REF only when they are not well explained by real variation. This may require making the RCM fully use the graph to calculate the HOM-REF likelihoods, as initially intended (this fell through due to a change in priorities), instead of defaulting to the original alignment (with unclipped soft clips).
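
To make the proposed behavior concrete, here is a minimal Python sketch. It is not GATK code: the function names, the `(name, start, cigar)` read tuples, and the idea of passing in a set of read names whose clips are "explained" by a called variant are all assumptions made for illustration. It only shows which reference positions unclipped soft-clip bases would land on, and how they could be excluded from the evidence against HOM-REF when a nearby variant accounts for them.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def unclipped_softclip_positions(aligned_start, cigar):
    """Reference positions soft-clipped bases would cover if unclipped.

    aligned_start is the 0-based reference position of the first
    aligned (non-clipped) base. Hypothetical helper, not a GATK API.
    """
    ops = parse_cigar(cigar)
    positions = []

    # Leading soft clip (skipping any hard clips before it).
    i = 0
    while i < len(ops) and ops[i][1] == "H":
        i += 1
    if i < len(ops) and ops[i][1] == "S":
        n = ops[i][0]
        positions.extend(range(max(0, aligned_start - n), aligned_start))

    # Operations that consume reference bases define the aligned span.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    aligned_end = aligned_start + ref_len

    # Trailing soft clip (skipping any hard clips after it).
    j = len(ops) - 1
    while j >= 0 and ops[j][1] == "H":
        j -= 1
    if j > i and ops[j][1] == "S":
        positions.extend(range(aligned_end, aligned_end + ops[j][0]))

    return positions


def softclip_evidence_against_homref(reads, explained_read_names):
    """Per-position count of unclipped soft-clip bases from reads whose
    clips are NOT explained by a called variant (assumed to be given)."""
    counts = Counter()
    for name, start, cigar in reads:
        if name in explained_read_names:
            continue
        counts.update(unclipped_softclip_positions(start, cigar))
    return counts


# Two reads clipped at the same breakpoint (e.g. next to an insertion)
# and one unrelated read with a leading clip.
reads = [("r1", 100, "20M5S"), ("r2", 105, "15M5S"), ("r3", 118, "3S10M")]

current = softclip_evidence_against_homref(reads, set())
proposed = softclip_evidence_against_homref(reads, {"r1", "r2"})
```

In this toy example, `current` counts the clipped tails of `r1` and `r2` at positions 120 to 124, while `proposed` drops them because those reads are assumed to be explained by a variant, leaving only the unexplained clip from `r3`. How "explained" would actually be decided (presumably from the assembly graph and haplotype likelihoods) is the open part of this issue.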

@akiezun
Contributor

akiezun commented Jun 22, 2015

@vdauwera is this a bug or enhancement?

@vdauwera
Contributor Author

Bug. It creates inaccurate ref calls. Test case is available in https://github.com/broadinstitute/gsa-unstable/issues/1271

@vdauwera vdauwera added bug and removed enhancement labels Feb 11, 2016
@pgrosu

pgrosu commented Feb 11, 2016

Hi Geraldine,

Sorry to bother you, but when I try to follow this link, I get a 404 error:

https://github.com/broadinstitute/gsa-unstable/issues/1271

Thanks,
Paul

@vdauwera
Contributor Author

Hi Paul, that means you don't have access to our internal repositories. Let me see if I can get you access.

@pgrosu

pgrosu commented Feb 11, 2016

Thank you :)

@vdauwera
Contributor Author

I added you as a collaborator, so you should have access now.

@pgrosu

pgrosu commented Feb 11, 2016

Thank you Geraldine.

@vdauwera
Contributor Author

Hey all, @vruano / @davidbenjamin, is this on your radar at all? I heard rumblings about a rewrite of the assembly machinery; would that address this?

@davidbenjamin
Contributor

@vdauwera not on my radar.

@vdauwera vdauwera removed their assignment Feb 10, 2017
@vruano
Contributor

vruano commented Feb 10, 2017

@vdauwera not on my radar either. The rewrite of the assembly may fix some of these cases, namely those where there is actually some variation that we fail to detect (a false negative) that would explain those soft clips. However, I don't think that would fix all the cases.

@vdauwera
Contributor Author

Ok, thanks for letting me know. We've been getting user complaints about this, FYI.

@chlangley

chlangley commented Feb 10, 2017 via email

@vdauwera
Contributor Author

Hi Chuck, as always it's a prioritization problem. Our internal stakeholders haven't indicated that this is a significant problem for them (right @ldgauthier ?) so it's difficult for us to justify putting resources into it ahead of other work. But if we start getting a lot of demand from external users to address this (especially if there is a well-documented impact on a research use case), then we could potentially reevaluate its priority.

@chlangley

Thanks for getting this cleared up.
OK, what next? I'll check with colleagues who may be aware of this 'feature'. Perhaps the case can be made more clearly by a group of users, including visible labs working on human evolutionary genomics.

I don't know the CA genomics community well, but my shallow polling suggests most are happily unaware that SNPs near indels will often be assigned lower quality than they might otherwise be.

@ldgauthier
Contributor

ldgauthier commented Feb 12, 2017 via email

@ldgauthier
Contributor

ldgauthier commented Feb 13, 2017 via email

@vdauwera
Contributor Author

vdauwera commented Mar 2, 2017

@chlangley One thing we could potentially do to attract attention to this issue and solicit feedback from the community would be to feature it on the GATK blog. If you were to write a concise case study detailing the impact of the problem on your results, others may be motivated to look at their own results, and if it causes problems there, add their voices to yours. We're willing to bring this to public attention, we just don't have the bandwidth to do the legwork.

@chlangley

Yes. That has potential. Let me consider your suggestion.

@chlangley

chlangley commented Mar 18, 2017 via email

@vdauwera
Contributor Author

Hi Chuck, the GATK blog is set up to only accept posts from admins or moderators on the forum (or my team). If you're willing to write something up, we would do it as a guest post, where I would post the text on your behalf (with clear attribution to you). If you'd like to share a draft with us, the easiest way to do it is through a Google Doc.

@chlangley

chlangley commented Mar 18, 2017 via email

@droazen droazen added this to the 4.0 release milestone Mar 20, 2017
@droazen droazen modified the milestones: Engine-4.0, Engine-4.1 Oct 17, 2017
8 participants