fix wrong spanning deletion length with NON_REF of HaplotypeCaller #8292


TomofumiSaka

Dear GATK team,

I found and fixed a bug in HaplotypeCaller that, in gVCF mode, incorrectly lengthens a recorded deletion by one nucleotide.
The following VCF output shows the problem (generated from the open-data HG001/NA12878 sample).

chr21 10452597 . G <NON_REF> . . END=10452602 GT:DP:GQ:MIN_DP:PL 0/0:147:99:146:0,120,1800
chr21 10452603 . T C,<NON_REF> 255.64 . BaseQRankSum=-0.276;DP=156;ExcessHet=0.0000;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-7.712;RAW_MQandDP=487874,156;ReadPosRankSum=1.061 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:136,16,0:152:99:0|1:10452603_T_C:263,0,5564,675,5628,6356:10452603:59,77,0,16
chr21 10452604 . G *,A,<NON_REF> 255.64 . BaseQRankSum=-1.417;DP=158;ExcessHet=0.0000;MLEAC=0,1,0;MLEAF=0.00,0.500,0.00;MQRankSum=-7.712;RAW_MQandDP=495074,158;ReadPosRankSum=1.037 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|2:136,0,16,0:152:99:0|1:10452603_T_C:263,681,6518,0,5638,5564,681,6462,5634,6446:10452603:59,77,0,16
chr21 10452605 . C *,T,<NON_REF> 191.64 . BaseQRankSum=-0.601;DP=161;ExcessHet=0.0000;MLEAC=0,1,0;MLEAF=0.00,0.500,0.00;MQRankSum=1.656;RAW_MQandDP=505874,161;ReadPosRankSum=-0.404 GT:AD:DP:GQ:PL:SB 0/2:131,0,19,0:150:99:199,665,6662,0,4918,4665,652,6264,4868,6100:50,81,9,10
chr21 10452606 . C <NON_REF> . . END=10452611 GT:DP:GQ:MIN_DP:PL 0/0:154:99:150:0,120,1800
chr21 10452612 . G A,<NON_REF> 0 . BaseQRankSum=4.613;DP=166;ExcessHet=0.0000;MLEAC=0,0;MLEAF=0.00,0.00;MQRankSum=-7.647;RAW_MQandDP=530720,166;ReadPosRankSum=1.518 GT:AD:DP:GQ:PL:SB 0/0:146,13,0:159:12:0,12,4420,474,4719,5968:64,82,0,13

The records chr21 10452604 . G *,A,<NON_REF> and chr21 10452605 . C *,T,<NON_REF> are clearly not covered by any deletion, yet both are treated as spanning deletions (they carry the * allele).

The command was as follows.
$java -jar $gatk HaplotypeCaller \
    --reference Homo_sapiens_assembly38.fasta \
    --input CNR0028194.gatk_best_practice.GRCh38.chr21_10451605-10453605.bam \
    --output out.vcf \
    --pcr-indel-model NONE \
    -ERC GVCF \
    -L chr21:10451605-10453605
The reference is from the GATK Bundle (hg38).
The input BAM is available at https://pezycomputing-my.sharepoint.com/:f:/g/personal/sakai_pezy_co_jp/EuiUh7J-eCpOmA_Xkf3cOEwByf3lqKpm4N4FdYy7B5FCJA?e=dlZwDl

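In the recordDeletion method, the NON_REF allele is included in the deletion size calculation as a zero-length allele, so a site whose reference allele is a single base (such as the SNP at 10452603) is recorded as a one-base deletion.
I modified the method to skip the NON_REF allele.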
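
To illustrate the idea, here is a minimal, self-contained sketch of the calculation. It is not the GATK code: the class name, method names, and the use of plain strings for alleles are assumptions made for illustration only. It assumes the deletion size is taken as the largest difference between the reference allele length and an alternate allele length, and that symbolic alleles such as <NON_REF> count as zero-length, as described above.

```java
import java.util.List;

/** Hypothetical, simplified model of the calculation described above; not GATK code. */
final class SpanningDeletionSketch {
    static final String NON_REF = "<NON_REF>";

    /** Symbolic alleles such as <NON_REF> carry no bases, so they are modelled with length 0. */
    static int alleleLength(String allele) {
        return allele.startsWith("<") ? 0 : allele.length();
    }

    /** Naive version: every alt allele, including <NON_REF>, contributes to the deletion size. */
    static int naiveDeletionSize(String ref, List<String> alts) {
        int max = 0;
        for (String alt : alts) {
            max = Math.max(max, ref.length() - alleleLength(alt));
        }
        return max;
    }

    /** Same calculation, but <NON_REF> is skipped so it cannot look like a deletion. */
    static int deletionSizeSkippingNonRef(String ref, List<String> alts) {
        int max = 0;
        for (String alt : alts) {
            if (alt.equals(NON_REF)) {
                continue; // symbolic placeholder, not a real deletion
            }
            max = Math.max(max, ref.length() - alleleLength(alt));
        }
        return max;
    }

    public static void main(String[] args) {
        // Alleles of the SNP record at chr21:10452603 (T -> C,<NON_REF>)
        List<String> alts = List.of("C", NON_REF);
        System.out.println(naiveDeletionSize("T", alts));
        System.out.println(deletionSizeSkippingNonRef("T", alts));
    }
}
```

With the alleles from the 10452603 record, the naive version reports a deletion size of 1 and the version that skips <NON_REF> reports 0. A real deletion (for example a hypothetical TGC -> T) gives 2 in both versions, so skipping <NON_REF> only changes sites where it was the sole source of a nonzero size.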
