fix insufficient MaxProbPropagationDistance #8294
Open
Dear GATK Team,
I may have found code with unintended behavior in the HaplotypeCaller.
The field MaxProbPropagationDistance of the ActivityProfile class can be set with the --max-prob-propagation-distance option and defaults to 50.
The field MAX_FILTER_SIZE of the BandPassActivityProfile class is 50.
I assume the default value of --max-prob-propagation-distance was chosen to be large enough relative to the MAX_FILTER_SIZE of the BandPassActivityProfile.
The findEndOfRegion method of the ActivityProfile class checks that a sufficient length of the activity profile has already been calculated and added. In that check, the MaxProbPropagationDistance value should be increased by one to account for the center value of the Gaussian filter (please see the processState method of the BandPassActivityProfile).
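To illustrate the "center value" point, here is a minimal Java sketch. It is not GATK code: the class KernelSpanSketch and its helpers kernelLength and touchedRange are hypothetical, and it assumes the Gaussian kernel is symmetric with a half-width equal to the filter size. It only shows that such a kernel has one more tap than twice its half-width; how that interacts with the check in findEndOfRegion is the claim made in this pull request, not something the sketch proves.

```java
// Illustrative only; names here are hypothetical and do not exist in GATK.
class KernelSpanSketch {

    // Number of taps in a symmetric kernel: left side + center + right side.
    static int kernelLength(final int filterSize) {
        return 2 * filterSize + 1;
    }

    // Inclusive [first, last] positions touched when the state at `loc`
    // is spread over a symmetric kernel with half-width `filterSize`.
    static int[] touchedRange(final int loc, final int filterSize) {
        return new int[] { loc - filterSize, loc + filterSize };
    }

    public static void main(final String[] args) {
        final int filterSize = 50; // MAX_FILTER_SIZE as stated in this PR
        final int[] range = touchedRange(1000, filterSize);
        System.out.println("kernel taps: " + kernelLength(filterSize));
        System.out.println("touched: " + range[0] + ".." + range[1]);
    }
}
```

With a half-width of 50, the kernel has 101 taps, so any bookkeeping that counts only 50 positions per side has to account for the center position separately.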
You can confirm this with the BAM uploaded at https://pezycomputing-my.sharepoint.com/:f:/g/personal/sakai_pezy_co_jp/EkjbywcfooxJgtlT9z_vOVUBTZAfHTSvH_s1Bl1dzIH3tw?e=Y8UlB4 by running the following command either with the --max-prob-propagation-distance 51 option added or with this pull request's change applied. The uploaded BAM was made from the open-data HG001/NA12878 sample.
$java -jar $gatk HaplotypeCaller --reference Homo_sapiens_assembly38.fasta --input CNR0028194.gatk_best_practice.GRCh38.chr10_41899453-41901453.bam --output out.vcf --pcr-indel-model NONE -L chr10:41899453-41901453
This fix has a very small effect on variant calls. In WGS, the change affects only 10 or fewer variants.
Alternatively, changing the default value of --max-prob-propagation-distance from 50 to 51 would also be fine.