Ported from Classic-GATK: Unclipping of high quality soft-clips results in difficult to explain GVCF non-variant likelihoods #269

vdauwera · 2015-03-07T08:47:58Z

Originally from @vruano

Follow up from story https://www.pivotaltracker.com/story/show/68368324

Currently HC unclips high-quality soft clips in order to aid the discovery of variation through local de novo assembly and the calculation of read vs. haplotype likelihoods.

As a side effect, processes that default to the original alignment (e.g. to calculate HOM-REF confidence based on the pileup) treat the unclipped ends as if they were bona fide alignments to that section of the reference.

On occasion this seems to be beneficial, as it casts genuine doubt on overlapping HOM-REF calls… however, at other times it has the opposite effect, casting doubt on genuine hom-ref calls just because they are close to a medium-to-large insertion that explains the soft clips well.

The task is to tackle the issue by making the soft clips count against hom-ref only when they are not well explained by real variation. This may require making the RCM fully use the graph to calculate the HOM-REF likelihoods as initially intended (this fell through due to a change in priorities) instead of defaulting to the original alignment (with unclipped soft clips).
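
To make the effect concrete, here is a toy sketch of how unclipped soft-clip bases can drag down a pileup-based hom-ref confidence. It is not GATK code: the `PileupBase` type, the `explained_by_variant` flag, the flat error rate, and the simplified biallelic het model (every base has probability 0.5) are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class PileupBase:
    matches_ref: bool
    from_soft_clip: bool = False
    # Hypothetical flag: True if a called variant (e.g. a nearby insertion)
    # accounts for this soft-clipped base.
    explained_by_variant: bool = False


def hom_ref_log10_odds(pileup, error_rate=0.01, skip_explained_clips=False):
    """Return log10 L(hom-ref) - log10 L(het) for one site (toy model)."""
    log_hom_ref = 0.0
    log_het = 0.0
    for base in pileup:
        if skip_explained_clips and base.from_soft_clip and base.explained_by_variant:
            # Soft-clip base already explained by real variation: ignore it.
            continue
        p_hom_ref = (1.0 - error_rate) if base.matches_ref else error_rate
        log_hom_ref += math.log10(p_hom_ref)
        # Simplified het model: either allele is seen with probability 0.5.
        log_het += math.log10(0.5)
    return log_hom_ref - log_het


# 20 reads supporting the reference, plus 5 unclipped soft-clip bases that
# mismatch the reference because the reads really carry a nearby insertion.
pileup = [PileupBase(True) for _ in range(20)] + [
    PileupBase(False, from_soft_clip=True, explained_by_variant=True)
    for _ in range(5)
]

naive = hom_ref_log10_odds(pileup)
adjusted = hom_ref_log10_odds(pileup, skip_explained_clips=True)
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In this toy setup the naive score turns against hom-ref, while skipping the explained soft-clip bases restores support for it. The real fix would presumably derive the "explained" status from the assembly graph rather than from a precomputed flag.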

akiezun · 2015-06-22T20:55:59Z

@vdauwera is this a bug or enhancement?

vdauwera · 2016-02-11T18:33:48Z

Bug. It creates inaccurate ref calls. Test case is available in https://github.com/broadinstitute/gsa-unstable/issues/1271

pgrosu · 2016-02-11T19:07:50Z

Hi Geraldine,

Sorry to bother you, but when I try to follow this link, I get a 404 error:

https://github.com/broadinstitute/gsa-unstable/issues/1271

Thanks,
Paul

vdauwera · 2016-02-11T19:13:00Z

Hi Paul, that means you don't have access to our internal repositories. Let me see if I can get you access.

pgrosu · 2016-02-11T19:13:59Z

Thank you :)

vdauwera · 2016-02-11T19:36:21Z

I added you as collaborator, you should have access now.

pgrosu · 2016-02-11T19:40:12Z

Thank you Geraldine.

vdauwera · 2017-02-10T14:32:24Z

Hey all, @vruano / @davidbenjamin, is this on your radar at all? I heard rumblings about a rewrite of the assembly machinery; would that address this?

davidbenjamin · 2017-02-10T14:42:22Z

@vdauwera not on my radar.

vruano · 2017-02-10T17:45:47Z

@vdauwera not on my radar either... The rewrite of the assembly may fix some of these cases, where there is actually some variation that we fail to detect (false negative) that would explain those soft clips. However, I don't think that would fix all the cases.

vdauwera · 2017-02-10T17:50:14Z

Ok, thanks for letting me know. We've been getting user complaints about this, FYI.

chlangley · 2017-02-10T18:11:47Z

Hello: Thanks for the info. I don’t suppose it is my role to militate/plead for a solid fix of this omission. But I must say that it would be appreciated and would, in its own way, advance science. Thanks for any consideration. Cheers, Chuck


vdauwera · 2017-02-10T18:18:14Z

Hi Chuck, as always it's a prioritization problem. Our internal stakeholders haven't indicated that this is a significant problem for them (right @ldgauthier ?) so it's difficult for us to justify putting resources into it ahead of other work. But if we start getting a lot of demand from external users to address this (especially if there is a well-documented impact on a research use case), then we could potentially reevaluate its priority.

chlangley · 2017-02-11T05:21:41Z

Thanks for getting this cleared up.
OK, what next? I'll check with colleagues who may be aware of this 'feature'. Perhaps the case can be made more clearly by a group of users, including visible labs working on human evolutionary genomics.

I don't know the CA genomics community well, but my shallow polling suggests most are happily unaware that SNPs near indels will often be assigned lower quality than they might.

ldgauthier · 2017-02-12T21:17:48Z

This is probably affecting some of the GWAS studies but in subtle ways that haven't popped up yet. I'm cc'ing Andrea in the hopes that he has some time to think about the issue. I'd need some uninterrupted time to work out the details and that's hard to come by at the moment.


ldgauthier · 2017-02-13T01:29:51Z

Hi Laura, hope you are enjoying your maternity leave! Unfortunately I will not have time to look into this, since I’m writing up a paper. Cheers,


vdauwera · 2017-03-02T03:56:03Z

@chlangley One thing we could potentially do to attract attention to this issue and solicit feedback from the community would be to feature it on the GATK blog. If you were to write a concise case study detailing the impact of the problem on your results, others may be motivated to look at their own results, and if it causes problems there, add their voices to yours. We're willing to bring this to public attention, we just don't have the bandwidth to do the legwork.

chlangley · 2017-03-02T05:49:53Z

Yes. That has potential. Let me consider your suggestion.

chlangley · 2017-03-18T12:15:30Z

Hello Geraldine:


I started to work on this a bit and found myself blocked. At this point I have a simple question: The GATK blog is separate from the forum (?). When I am on the blog page I can’t seem to find a button to submit a new post. Am I missing something, or is the route to blog posting only via the forum? Sorry to bother you with such a mundane question. Cheers, Chuck

vdauwera · 2017-03-18T13:33:48Z

Hi Chuck, the GATK blog is set up to only accept posts from admins or moderators on the forum (or my team). If you're willing to write something up, we would do it as a guest post, where I would post the text on your behalf (with clear attribution to you). If you'd like to share a draft with us the easiest way to do it is through a google doc.

chlangley · 2017-03-18T14:54:26Z

Sounds good. Thanks, Chuck


akiezun added enhancement HaplotypeCaller labels Apr 15, 2015

akiezun assigned vdauwera Jun 22, 2015

vdauwera added bug and removed enhancement labels Feb 11, 2016

vdauwera removed their assignment Feb 10, 2017

droazen added this to the 4.0 release milestone Mar 20, 2017

droazen modified the milestones: Engine-4.0, Engine-4.1 Oct 17, 2017

droazen removed this from the Engine-2Q2018 milestone Oct 4, 2018

mmorgantaylor mentioned this issue Apr 9, 2021

Allow users to specify VQSLOD sensitivity and apply threshold in ExtractCohort #7194

Merged
