Ported from Classic-GATK: Unclipping of high quality soft-clips results in difficult to explain GVCF non-variant likelihoods #269
Comments
@vdauwera is this a bug or enhancement? |
Bug. It creates inaccurate ref calls. Test case is available in https://github.com/broadinstitute/gsa-unstable/issues/1271 |
Hi Geraldine, Sorry to bother you, but when I try to follow this link, I get a 404 error: https://github.com/broadinstitute/gsa-unstable/issues/1271 Thanks, |
Hi Paul, that means you don't have access to our internal repositories. Let me see if I can get you access. |
Thank you :) |
I added you as collaborator, you should have access now. |
Thank you Geraldine. |
Hey all, @vruano / @davidbenjamin, is this on your radar at all? I heard rumblings about a rewrite of the assembly machinery; would that address this? |
@vdauwera not on my radar. |
@vdauwera not on my radar either. The rewrite of the assembly may fix some of these cases, namely those where there is actually some variation that we fail to detect (a false negative) that would explain those soft clips. However, I don't think that would fix all the cases. |
Ok, thanks for letting me know. We've been getting user complaints about this, FYI. |
Hello:
Thanks for info.
I don’t suppose it is my role to militate/plead for a solid fix of this omission.
But I must say that it would be appreciated and in its own way advance science.
Thanks for any consideration.
Cheers,
Chuck
|
Hi Chuck, as always it's a prioritization problem. Our internal stakeholders haven't indicated that this is a significant problem for them (right @ldgauthier ?) so it's difficult for us to justify putting resources into it ahead of other work. But if we start getting a lot of demand from external users to address this (especially if there is a well-documented impact on a research use case), then we could potentially reevaluate its priority. |
Thanks for getting this cleared up. I don't know the CA genomics community well, but my shallow polling suggests most are happily unaware that SNPs near indels will often be assigned lower quality than they might. |
This is probably affecting some of the GWAS studies but in subtle ways that
haven't popped up yet. I'm cc'ing Andrea in the hopes that he has some time
to think about the issue. I'd need some uninterrupted time to work out the
details and that's hard to come by at the moment.
On Feb 11, 2017 12:21 AM, "chlangley" wrote:
Thanks for getting this cleared up.
OK, what next? I'll check with colleagues who may be aware of this 'feature'.
Perhaps the case can be made more clearly by a group of users, including
visible labs working on human evolutionary genomics.
I don't know the CA genomics community well, but my shallow polling
suggests most are happily unaware that SNPs near indels will often be
assigned lower quality than they might.
|
Hi Laura, hope you are enjoying your maternity leave!
Unfortunately I will not have time to look into this, since I’m writing up a paper.
cheers,
|
@chlangley One thing we could potentially do to attract attention to this issue and solicit feedback from the community would be to feature it on the GATK blog. If you were to write a concise case study detailing the impact of the problem on your results, others may be motivated to look at their own results, and if it causes problems there, add their voices to yours. We're willing to bring this to public attention, we just don't have the bandwidth to do the legwork. |
Yes. That has potential. Let me consider your suggestion. |
Hello Geraldine:
I started to work on this a bit and found myself blocked.
At this point I have a simple question: is the GATK blog separate from the forum? When I am on the blog page I can’t seem to find a button to submit a new post. Am I missing something, or is the only route to blog posting via the forum?
Sorry to bother you with such a mundane question.
Cheers,
Chuck
|
Hi Chuck, the GATK blog is set up to only accept posts from admins or moderators on the forum (or my team). If you're willing to write something up, we would do it as a guest post, where I would post the text on your behalf (with clear attribution to you). If you'd like to share a draft with us the easiest way to do it is through a google doc. |
Sounds good.
Thanks,
Chuck
|
Originally from @vruano
Follow up from story https://www.pivotaltracker.com/story/show/68368324
Currently HC unclips high-quality soft clips in order to aid the discovery of variation through the local de novo assembly and the calculation of read vs. haplotype likelihoods.
As a side effect, processes that default to the original alignment (e.g. to calculate HOM-REF confidence based on the pileup) treat the unclipped ends as if they were bona fide alignments to that section of the reference.
On occasion this seems to be beneficial, as it casts genuine doubt on overlapping HOM-REF calls. At other times, however, it has the opposite effect: it casts doubt on genuine hom-ref calls just because they are close to a medium-to-large insertion that explains the soft clips well.
The task is to tackle this by making soft clips count against hom-ref only when they are not well explained by real variation. This may require making the RCM fully use the graph to calculate the HOM-REF likelihoods, as initially intended (this fell through due to a change in priorities), instead of defaulting to the original alignment (with unclipped soft clips).
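To make the proposed rule concrete, here is a minimal Python sketch. It is not GATK code and does not reflect how HC or the RCM is implemented. It assumes each read is described by a 1-based alignment start and a CIGAR string, and that candidate insertion sites have already been called. The function names, the `insertion_sites` input and the `max_distance` window are all hypothetical, chosen only for illustration.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_boundaries(start: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (reference_position, clip_length) for each soft-clipped read end.

    `start` is the 1-based reference position of the first aligned base.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips sit outside soft clips, so drop them before inspecting the ends.
    ops = [(n, op) for n, op in ops if op != "H"]
    boundaries = []
    if ops and ops[0][1] == "S":
        # A leading clip abuts the first aligned base.
        boundaries.append((start, ops[0][0]))
    # Only reference-consuming operations advance the reference coordinate.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    if len(ops) > 1 and ops[-1][1] == "S":
        # A trailing clip abuts the last aligned base.
        boundaries.append((start + ref_len - 1, ops[-1][0]))
    return boundaries


def unexplained_soft_clips(
    start: int,
    cigar: str,
    insertion_sites: List[int],
    max_distance: int = 5,  # hypothetical window, not a GATK parameter
) -> List[int]:
    """Return the lengths of soft clips with no called insertion nearby.

    Under the rule proposed above, only these clips would count as
    evidence against a hom-ref call.
    """
    unexplained = []
    for pos, length in soft_clip_boundaries(start, cigar):
        explained = any(abs(pos - site) <= max_distance for site in insertion_sites)
        if not explained:
            unexplained.append(length)
    return unexplained
```

For a read starting at position 100 with CIGAR `40M8S`, the trailing clip abuts position 139. With an insertion called at 140 the clip is treated as explained and nothing counts against hom-ref; with no insertion nearby, the 8 clipped bases do. A real fix would presumably judge "explained" from the assembly graph and haplotype likelihoods rather than from a fixed distance window.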