Malformed severus vcf #61

@mdiaz09

When I run the Wakhan CNA caller (wakhan cna), it fails with an error saying the Severus VCF file is malformed. Other programs (bcftools) read the file without any issues, and it does not look off to me. I tried rerunning Severus on the same samples and got the same error.

/home/diazm6/miniconda3/envs/wakhan/lib/python3.10/site-packages/vcf_parser/__init__.py:3: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  from pkg_resources import require
[2026-04-29 12:41:18] INFO: Starting Wakhan 0.4.2
[2026-04-29 12:41:18] INFO: Cmd: /home/diazm6/miniconda3/envs/wakhan/bin/wakhan cna --threads 16 --reference /home/diazm6/BWA_references/GRCh38_primary_assembly_plus_ebv_alt_decoy_hla.fasta --target-bam /common/bermanblab/data/private_data/MOC/bam_from_ONT/4096_FT_L2.sorted.haplotagged.bam --normal-phased-vcf /common/bermanblab/data/private_data/MOC/copy/rephased/4096_FT_L2/phasing_output/rephased.vcf.gz --genome-name 4096_FT_L2 --out-dir-plots /common/bermanblab/data/private_data/MOC/copy/rephased/4096_FT_L2 --breakpoints /common/bermanblab/data/private_data/MOC/vcf_files/severus_rephased/4096_FT_L2/somatic_SVs/severus_somatic.vcf --use-sv-haplotypes
[2026-04-29 12:41:18] INFO: Python version: 3.10.20 | packaged by conda-forge | (main, Mar 5 2026, 16:42:22) [GCC 14.3.0]
[2026-04-29 12:41:37] INFO: Starting cna() module...
Traceback (most recent call last):
  File "/home/diazm6/miniconda3/envs/wakhan/bin/wakhan", line 10, in <module>
    sys.exit(main())
  File "/home/diazm6/miniconda3/envs/wakhan/lib/python3.10/site-packages/src/main.py", line 448, in main
    cna_process(args) # cna()
  File "/home/diazm6/miniconda3/envs/wakhan/lib/python3.10/site-packages/src/main.py", line 521, in cna_process
    breakpoints_additional = extract_breakpoints_additional(args)
  File "/home/diazm6/miniconda3/envs/wakhan/lib/python3.10/site-packages/src/main.py", line 742, in extract_breakpoints_additional
    df_var_bins, df_var_bins_1 = get_chromosomes_bins(args.target_bam[0], args.bin_size, args)
  File "/home/diazm6/miniconda3/envs/wakhan/lib/python3.10/site-packages/src/coverage/binning.py", line 34, in get_chromosomes_bins
    _,_,_,_, bps, bps_bnd = sv_vcf_bps_cn_check(args.breakpoints, args)
  File "/home/diazm6/miniconda3/envs/wakhan/lib/python3.10/site-packages/src/breakpoint/breakpoints.py", line 97, in sv_vcf_bps_cn_check
    for variant in my_parser:
  File "/home/diazm6/miniconda3/envs/wakhan/lib/python3.10/site-packages/vcf_parser/parser.py", line 215, in __iter__
    first_variant = format_variant(
  File "/home/diazm6/miniconda3/envs/wakhan/lib/python3.10/site-packages/vcf_parser/utils/format_variant.py", line 40, in format_variant
    raise SyntaxError("One of the variant lines is malformed: {0}".format(
SyntaxError: One of the variant lines is malformed: chr1 823057 severus_INS8462 N GCCAGTAGATCCACGCTATCTACACTACCTGCCTGGCCGGCAGATCCACCCTGCTCACACTGCGTGCTTGTCCAGCAGGTCCACCCTGTCTACACTACCTGCCTGCCCAGCAGATCCACCCTGTCTACATTACCTGCCTCTACACTACCTTCTTGTCCATGACGTCCACCCTGTCTACACTACTGCCTTCCCAGGGATATGCACCGTGTCTAGCAGATCCACCCCGTCTACACTACCTGCCTGTCCAGCAGATCTACCCTGTCTACACAACCTGCCTGGCTAGTAGATCCACGCTATCTACACTACCTGCCTGGCCAGCAGATCCACGCTATGTACACTAGGTGCCTGGTCCAGTATATCTACCCTGTCTACCTGCCTCCCCAGCAACTCCACCCTCCTGCCTCACTACCTGCCCTGTCGAGACACTCACCCTTCCACACCAACCTCACCCATCCAAGCAGCTCCCACTGTCACTCCCTGCCAGTGATCCACCCTGTCTCACTACTTGCCCTGCTAAGCCAGCACCCACCCATCACACCACCTGCTGCCAGGATTCACCCCTGTCACCACCTGCCTGGCCAGTAGTTCCATGCTATCTCCACTACCTCCCTGTCCAGCAGACCTGCCCTGTCTATACTACTTGCCTCACATCCGCCCTGTCTTTGCTACCTGCCTGAATAGTAGATCCACGCAATCTACACTACCGGCCTGGCCAGCAGATCCTCAAGTTTGCTCACACTACTTGCTTCCCCAGCAGGTCCACCCTGTCTACACTACCTGCCTGCCCAGCAGATCCACCCTCTCTACACTACCTGCTTTTCCAGCAGGTCCACCCTGTATACACTACCTGCCTTCCCTGCAGATCCACCCTGTCTACCCTACCTGCCTGGGCAGTAGTTCCACGCTATCTCCCCTACCTGCCTGTCCAGCAGACCCGCTCTGTCTACACTACCTGCCTGTCCAGTAGATCCACGGTATCTACACTACCTGCCTGTCCAGCAGATCCGCCCTGTCTATACTACCTGCCCCTCCAGCACATCCACCCTGTGTATACTACCTGCCTCTCCAGCAGATCCGCCCTGTCTACACTACCTGCCTGGCCAGGAGATCCACCCTATCTACATTACCTACCTGACCACCTGCCTGGCCATTATCTCGACGCTATCTACACTACCTGCCTGGCCAGCAGATACACCCTGTCTATACTGTCTGATTGTCCAGCAGATACACCCTGTCTATACTACCTGCCTTGCCAGCAGATCCACCGTGTCTATACTACCTGCCTGTCCAGCAGACCAACCTGTCTACACTTGTCCAGCATATCCGCCCTGTCTACACTACCTGCCTGTCCAACAGATCCGCCCTGTCTATACTACCTGCCTCTCCAGCAGATCCGCCCTGTCTATACTACCGGCCTGTCCAGCACATCCGCGTGCTGTCTACACAACCTGCCTGTCCAGCAGATCCGCCCTGTCTACACTACCTGCCTAGCCAGTACATCCGCCCTATCTACACTGCCTGCGTGGCTAGCAGATCCGCCCTGTCTACACTACCTACCTGCCCAGCAGATCCGCCCTGTCTACACTACCTGCCTGGCCAGTAGATCCACACTATCTACACTACCTGCCTGGCCAGGAGATGCACCCTGTCTACACTACCTGTTTGTCCAGCAGGACCACCCTGTCTACACTACCTGTTTATCCATCAGGTCCACCCTGTCTACACTAGCTGCCCGTCCAGCAGGTCCACCCTATCTACACTACCTACCCGTCCAGCAGATCCACCCTGTCTACACTACCTGCCTGTCCAGCAGATCCACCCTGTCTATACTACCTGCCTATCCAGCACATCTCACCCGTCTACACTACCTGCCTGCCCAGCAGATCCACCCTGTCTATACTACCTGCCTGTCCAGCAG
ATCCATCCTGTCGATACTACCTGCCTATCCAGCAGATCTACCCTGTCTACACTACCTGCCTGCCCAGCATATCCCCCGTCTATACTACCTGCCTGGCCAGTAGATCCACACTATCTACACTGCCTGCCTGTCCAGCAGATCCACCCTGTCTACACTACCTGCTTGTCCAGCAGGTCCACCCTGTCTACACTACCTACCTGCAAAGCAGATCCCACCCTGTCTACACTACCTGCTTGTCAAGCAGGTCCACCCTGTATACACTACCTGCCTTCCCAGCAGATCCACCCTGTCTACACTACCTGCCTGTCAAGCAGGTCCACCCTGTATACACTACCTGCCTTCCCAGCAGATCCACCCTGTCTACACTACCTGCCTGGCCAGTAGTTCCACGCTATCTCCACTACCTGCCTGTCCAGCAGACCCGCCCTGTATATACTACTTGCCTGTCCAGCAGATCCATTCTGTTCTCACTACCTGGCCTTTCAGCACTCGCCTGTCCAGCACCCTGCGTTCTAGCACTGACCCTGTCTATACTACCTGCCTGTCCAGCAGTTCTGTCCTGTCTACACTACCTGCCTGGCCAGTAGATCCACCCTATCTTCACTACATGCCTGGCCAGCAGATCCGCCCTGTCTACACTACCTACCTGATCAGATCTGCCCTGTCTACACAACCTACTTGTCCAGCAGATCCACCCTGTCTACACTACCTGCCTGCTCAGCAGATCCACCCTGTCTTTGCTACCTGCCTGGATAGTAGATCCATGCAATCTACACTACCGGCCTGGCCAGCAGATCCGCACTGTCTACACTACTTGCTCGCTATCTATACTACCTGCCTGGCCAGCAGATCCACCCTGTCTAAACTATCTGCCTGGCCATTATCTCCAGGTTATCTACACTACCTGCCTGGCCAGCAGACACACCCTGTCTATACTATCTGATTGTCCAGCAGATACACCCTATCTATACTACCTGCCTTGCCAGCAGATCCACGGTGTCTATACTACCTGCATGTCCAGCAGACCAACCTGTCTACACTACCTGCCTGCCCAGCAGGTCCGCCCTTTCTACACTACCTGCCTGCCCAGCAGATCCGCCCTGCCTACACTACCTGGCTGGCCAGTAGACCCACGCTATCTATACTACCTTCCTGTCCAGCAGATCCAACCTGTCTCACTACCTGCCTGCCCAGCAGGTCCGCCCTGTCTACACTAACTGCCTGCCCAGCAGATCCGCCCTGTCTATACTACCTGCCTGTCCAGCATATCCACCCTGTCTACACTACCTGCCTGTCCAACAGATCCGCCCTGTCTATACTACCTGCCTCTCCAGCAGATCCGCACTGTCAATAGTACCTACCTGTCCAGCAGGTCCATGCTGTCTACACTACCTGCCTGTCCAGCAGATCCGCCCTATCTACACTGCCTGCGTGACCTGCAGATCCGCCCTGTCTACACTACCTACCTGCCCAGCAGATCAGCCCTGTCTACACTACCTGCCTGGCCAGTAGATCCACGCTATGTACACTACATGCCTGGTCAGCAGATCCACCCTGTCTACAATACCTGCTTGTCCGAGCAGGAACCACCCTGTATACACTACCTGCCTGTCCAGCATGTCCGCCCTGTCTATACTACCTCCCTGGCCAGCAGATCCCCTATGGGTATACTCCCTGCCTGCCCAGCAGATCCATCTACACTACCTACATGGCCAGCAGATCCACCATGTCTACACCACCTACTTTTCCAGCAGATCCACACTGTCTACACTACCTGCCTGTCCAGCAGATCCACCCTGTCTACACTGCCTGCCTGGCCAGCATATCGACCCTGTCTACACTACCTGCTTTTCCAGTAGATCCGTCTTGTCTACAATACCTGCCTCTCCAGCACATCCACCCAGACTATACTACCTGCCTGTCCAGCAGATCCACCCTGTCTACACTACCTGCCTGTCAAGCAGATCCACCCTATCTACACTACCTGCCTGTCCAGCAGA
TCTGCCCTGTCTATACTACTTCCCAGTCCAGCAGATCGACCCTGTCTACACTACCGGCCTGGCCAGCAGATCCGCCCTGTCTATAATACCTGCCGCTCCAGCAGATCCACCCTGTCTATACCACCTACCTGTCCAGCAGATCCGCCCTGTCTACACTACCTGCCTGACCAGTATATCCACCCTGATCCACACTGCCTGCCTGGCCAGCAGGTCCGCCCTGTCTACACTACCTGCCTGGCCAGCAGATCCACCCTGTCTACACTACCTGCTTGTCCAGCAGGTCCACCCTTTCTACACTACCTGCCTGTCCAGCAGGTGCACTCTATCTACACTACCTGCCTGTCCAGAGATCTACCTGTATACACCACCTACCTGTCCAGCAGATCCACTCTGTCTATACTACCTGTCTATCCAGCAGATTTACCCTGTCTACACTACTTGCCTGGCCAGCCAAGATCCATGCTATCTACACTACCTGCCTGTCCAGCAGATCCACCCTGTCTATACTACCTGCATGTCAAGCAGATCCACCCTGTCTATACTACCTGTCTATCCAGCAGGTCCACCCTGTCTACACTACCTGCCTGAACAGCAGATCCACCCTGTCTACACTACCTCCCTGGCCAGTACATCCACGCTATCTACACTACCTGCCTGTCCAGCAGATCCACCCTGTCGATACTACCTGCCTGTCGAGCAGATCCACCCTGTCTATACTACCTGCCTATCCAGCAGGTCCACCTGTTCTCTATACTACCTGCCTGGCCAGTAGATCCACGCTATCTACACTACCTGCCTGTCCAGCAGGTCCACCCTGTCTACACTACCTGCCTATCCAGCAGATCCACCCTGTCTACACTGGCTGCCTGACCAGCAGACCCGCCCTGTCTATACTACTTGCCTGTCCAGCAGATCCACCCTATCTACACTACCTGACTGGCCAGCAGATCTGCCCTGTCTACACTACCTGCCTGGCCGGTAGACCACGCTATCTACACTACCTGCCTGGCCAGCAGATCAACACTGTCTACACTACCTGCTTGTCCAGCAGGTCCTCCCTGGCTACACTACCTGCCTGTCCAGTAGATCCACCTTGTCTACACTACCTACCTGTCCAGCTGATCCACCCTGTCTATACTACCTGCCTATCCAGGAAATCTACTCTGTCTACACTACCTGCCTGTCCAGCAGATCCATGCTATCTACACTAGCTGCCTGTCCAGCAGATCCACCCTGTCTATACCACCTGCCTGTCGAGCAGATACACCCTGCCTACACTGCCTGCCTATCAAGCAGATCCACCCTATCTACACTACCTGCCTGCCCAGCAGATCCACCCAGTCTACACTACCTGCCTGGCCAGTAGATCCACGCTATCTACACTACCTGTCTGGAAAGCAGATCCACCCTGTCTACAGTACCTGCTTTTCCAGCAGGTCCACCCTGTCTACACTACCTGCCCGCCCAGCAGATCCACCCTGTCTACACTACCTGCCTGGACAGTAGATCCACGTTATCTCCACTACCTGCCTGTCCAGCATATCCGCCCTATCGATACTACTTGATTGTCCAGCAGATCCACACTGTCTACACTACCTGCCTGGCCAGTAGATCCACGCTATCTACACTGCCTGCCTGGCCAGCAGATCCACCCTGTCTACACTACCTGCTTGTCCAGCAGGTCCACCCTGTCTACACTACCTGCCCGTCCAGCAGGTCCACCCTATCTACACTACCTGACCGTCCAGCAGGTCCACCCTGTCTACACTACCTGCCCGTCCAGCAGATCCACCCTGTCTATACTACCTGCTTATCCAGCAGATCCACCCTGTCTATACTACCTGCCTGTCGAGCAGATCCACCCTGTCTATACTACCTGCCTATCCAGCAGATCCAACCTGTCTACACTACCTGCCTGGCCAGTAGATCCACGCTATCTACACTACCTGCCTGGCCAGCAGATCCACCATGTCTACACTACCTGCTTGTCCAGCAGGTCC
ACCCTGTCCACACTACCTGCCTACCCAGCAGATCCACCCTGTCCACACTAGCTGCCTGACCAGCAGACCCGCCCTGTCTATACTACTTGCCTGTCCAGCAGATCCACACTGTCTCACACTACCTGCTCATTCCAGCAGATCCGCCCTGTCTACACTACCTGCCTGACCAGTAGATCCACCCTATCTACACTACCTGACTGGCCAGCAGATCCGCCCTGTCTACACTTCCTGCCTGGCCGGTAGATCCACGCTATCTACACTACCTGCCTGGCCAGCAGATCAACCCTGTCTACACTACCTGCTTGTCCAGCAGGTCCTCCCTGGCTACACTACCTGCCTGTCCAGCAGGGGCACCCTATGTACACTACCTGCCTGTCCAGTAGATCCACCCTGTCTACACTACCTACCTGTCCAGCAGATCCACCCTGTCTATACTACCTGCCTATCCAGGAAATCTACCCTGTCTACACTACCTGCCTGTCCAGCAGATCCATGCTATCTACACTAGCTGCCTGTCCAGCAGATCCACCCTGTCTATACGACCTGCCTGTCGAGCAGATAAACCCTGCCTACACTGCCTGCCTATCCAGCAGGTCCACCCTATCTACACTACCTGCCTGCCCAGCAGATCCACCCAGTCTACACTACCTGCCTGGCCAGTAGATCCACGTTATCTCCACTACCTCACTGTCCAGCATATCCGCCCTATCGATACTACTTGATTGTCCAGCAGATCCACACTGTCTACACTACCTGCCTGGCCAGTAGATCCACGCTATCTACACTACCTGCCTGGCCAGCAGATCCACCCTGTCTACACTATCTGCTTGTTCAGCAGGTCCACCCTGTCTACACTACCTGCTCGTCCAGCAGGTCCACTCTATCAACACTACCTGCCCGTCCAGCAGATCCACCCTGTCTATACTACCTGCCTATCCAGCGTATCCACCCTGTCTATACTACCTGCCTATCCAGCAGAACTACCCTGTCTACAGTACCTGCCTGCCCAGCAGATCCACCCTGTCTATTCTACCTGCCTGGCCAGTAGATCCACGCTATCTACACTACGTGCCTGGCCAGCTGATCCACCCTGTCTACACTACCTGCTTGTCCAGCAGGTCCAACGTGTATACACTACGTGCCTGTCCAGCAGATCCGCCCTGTCTACACTACCTCCCTGGCCAGCAGATCCGCCCTGTCTATACTACCTGCCTGTCCAGCAGATCCACCCAGTCCATACTACCTGCCTGTCCAGCAGATCCACCCTGTCTACACTACCTCCCTGCCCAGCAGATCCACCGTTTCTCACACACCACCTACCTGCCCAGTAGATCCACGCTATCTAAACTATCTGCCTGCCCAGCAGATCCACCATGTCTACACTACCTGCCAGGCCATTAGCTCCACGCTATCTACACTACCTGCCTGGCCAGCAGATACACCCTGTCTCTATACTACCTGCCTGTCCAGCAGGCCACCCTGTCTACACTACCTGCCTGCCCAGCAGATCCGCCCTGTCTATACTACCTGCCTACACAGCAGATCCGCCCTGTCTACACAACCTGCTTGTCCAGCAGATCCACCCTGTCTACACTACCTGCCTGCTCAGCAGATACACCCTGTCTATACTACCTGCCTGGCCAGTAGATCCACGCAATCTACACTACCTGCCTGGCCAGCAGATCCGCACTGTCTACACTACTTGCTTGTCCAGCAGGTCCACCCTGTCTACACTACCTGCTTGCCCAGCAGATCCACCCTCTCTACGCTACCTGCTTGTCCAGGAGGTCCACCCTGTATACAATACCTGCCTTCCCAGCAGATCCACCCTGTCTACACTACCTGCCTGGGCAGTAGTTCCACGCTATCTCCCCTACCTGCCTGTCCAGCACACCCGCCCTGTCTACACTACCTGCTTGCCCAGCAGATCCACTCTGTCTACGCTACCTGCCTGTCCAGCAGATCCGCCCTGTCTATGCTACCTGCCTCTCCCACATCCA
CCCTGTCTATACGACCTGCCTGTCCAGCAGATCCCACCTGTCTACACTACCTGCCTGGCCAGTAGATCCACCCTATCTACACTACCTGCCTACCCAGCAGATCCACCCTGTCTATACTACCTGCCTGACCAGTAGATTCACGCTA 60.0 PASS IMPRECISE;SVTYPE=INS;SVLEN=8075;INSIDE_VNTR=TRUE;MAPQ=60.0;PHASESETID=1058418|1058418;HP=2|2;SUPP_READS=1:0:4:1:0:4;REF_READS=6:28:34:6:28:34 GT:VAF:hVAF:DR:DV 0/1:0.10:0.00,0.00,0.30:60:7 0/1:0.07:0.00,0.00,0.15:67:5
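
One possible way to narrow this down is to compare each record's column count with the header. The sketch below assumes, without having confirmed it against the vcf_parser source, that the parser rejects a record when its number of tab-separated fields differs from the #CHROM header line. The rejected record above ends with two sample columns after FORMAT, so a mismatch with the header would be a cheap thing to rule out. The function name find_column_mismatches and its limit parameter are hypothetical, not part of Wakhan, Severus, or vcf_parser.

```python
import gzip
from pathlib import Path


def find_column_mismatches(vcf_path, limit=10):
    """Return (line_number, n_fields, expected) for records whose
    tab-separated field count differs from the #CHROM header line.

    Assumption: a field-count mismatch is what triggers the
    "malformed" error; this is a diagnostic guess, not a confirmed cause.
    """
    path = Path(vcf_path)
    # Handle both plain-text and gzip/bgzip-compressed VCF files.
    opener = gzip.open if path.suffix == ".gz" else open
    expected = None
    mismatches = []
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line or line.startswith("##"):
                continue  # skip blank lines and meta-information lines
            if line.startswith("#CHROM"):
                expected = len(line.split("\t"))
                continue
            n_fields = len(line.split("\t"))
            if expected is not None and n_fields != expected:
                mismatches.append((line_number, n_fields, expected))
                if len(mismatches) >= limit:
                    break
    return mismatches
```

Calling find_column_mismatches on the severus_somatic.vcf passed to --breakpoints returns an empty list if every record matches the header. A non-empty result points at the first few records worth inspecting, for example a record with an extra sample column or a sequence that was broken across lines.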
