Update vcf file processing #124
@@ -8,7 +8,7 @@

 def vcf_file_geno_lines(path, mode="genotyped", variant_mapping=None, whitelist=None, skip_palindromic=False, liftover_conversion=None):
     logging.log(9, "Processing vcf %s", path)
-    vcf_reader = VCF(path)
+    vcf_reader = VCF(path, gts012=True)
What does gts012 represent?
From cyvcf2.VCF documentation:
gts012 (bool) – if True, then gt_types will be 0=HOM_REF, 1=HET, 2=HOM_ALT, 3=UNKNOWN. If False, 3, 2 are flipped.
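To make the two encodings concrete, here is a minimal sketch in plain Python that does not call cyvcf2 itself. The helper names `gt_type_labels` and `decode` are hypothetical; only the code-to-label mapping is taken from the documentation quoted above.

```python
def gt_type_labels(gts012):
    """Return the label for each gt_types code under the given gts012 setting.

    Hypothetical helper for illustration only; not part of cyvcf2.
    """
    if gts012:
        return {0: "HOM_REF", 1: "HET", 2: "HOM_ALT", 3: "UNKNOWN"}
    # With gts012=False, the meanings of codes 2 and 3 are swapped.
    return {0: "HOM_REF", 1: "HET", 2: "UNKNOWN", 3: "HOM_ALT"}


def decode(codes, gts012):
    """Translate a sequence of gt_types codes into readable labels."""
    labels = gt_type_labels(gts012)
    return [labels[code] for code in codes]
```

With `gts012=True`, codes 0 to 2 coincide with the number of ALT alleles, so they can be read as a hard-call dosage, while 3 still has to be treated as missing.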
        yield (variant_id, chr, pos, ref, alt, f) + tuple(d)

    elif mode == "imputed":
        if len(alts) > 1:
            logging.log("VCF imputed mode doesn't support multiple ALTs, skipping %s", variant_id)
        if (len(ref)) | (len(alts[0])) > 1:
The genotype contains indels too?
Yes, and they could be coded as either REF or ALT.
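A note on the condition under discussion: in Python `|` binds tighter than `>`, so the line evaluates as `(len(ref) | len(alts[0])) > 1`. For allele strings of length 1 or more, that is true exactly when either allele is longer than one base. The sketch below spells the same test out with `or` and keeps the bitwise form next to it for comparison; both function names are hypothetical and not part of the PR.

```python
def has_multibase_allele(ref, alt):
    """True when REF or ALT is longer than one base (e.g. an indel)."""
    return len(ref) > 1 or len(alt) > 1


def has_multibase_allele_bitwise(ref, alt):
    """The same test written the way the changed line evaluates it."""
    # For positive integers, a | b is 1 only when both a and b are 1.
    return (len(ref) | len(alt)) > 1
```

The two forms agree for any non-empty alleles; the `or` version just makes the intent easier to read during review.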
I made some edits to get Predict.py running on UK Biobank data converted to VCF format. Note that I used plink2 to convert BGEN to VCF with the modifier 'vcf-dosage=DS-force'. With 'vcf-dosage=DS', dosages were missing from genotyped variants.
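For readers unfamiliar with the DS field, the sketch below pulls per-sample dosages out of a single VCF data line using plain Python. The sample line is made up for illustration, and `parse_ds` is a hypothetical helper; it is not part of Predict.py, cyvcf2, or plink2.

```python
def parse_ds(vcf_line):
    """Return per-sample DS values from one tab-separated VCF data line.

    Returns None when the FORMAT column has no DS key; a sample whose DS
    value is "." (or absent) is reported as None.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")  # 9th column is FORMAT
    if "DS" not in format_keys:
        return None
    ds_index = format_keys.index("DS")
    dosages = []
    for sample in fields[9:]:  # sample columns follow FORMAT
        values = sample.split(":")
        raw = values[ds_index] if ds_index < len(values) else "."
        dosages.append(None if raw == "." else float(raw))
    return dosages


# Made-up record with three samples, for illustration only.
example_line = "1\t1000\trs1\tA\tG\t.\t.\t.\tGT:DS\t0/1:1\t1/1:1.98\t./.:."
example_dosages = parse_ds(example_line)
```

A record without usable DS values gives `None` entries (or `None` overall), which is the kind of gap a dosage-based reader would hit if the converter does not write DS for every variant.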