
Error in get_snps_frquncies_coverage function during hapcorrect module #45

@pabloangulo7

Hi,

Thanks for the tool. I'm trying to run the hapcorrect module on my sample in tumor-only mode, using the BAM file, the reference genome FASTA and the phased SNP VCF file as input, with this command:

python ${wakhan} hapcorrect --threads ${threads} --reference ${ref_genome_fasta} --target-bam ${bam_file} --tumor-phased-vcf ${snp_phased_vcf} --genome-name ${sample} --out-dir-plots ${HAP_correction}

It seems to work well for all chromosomes except chr14 and chr15, where I encounter the error below. What could be the reason?

[2025-11-16 23:56:53] INFO: Starting Wakhan 0.2.0
[2025-11-16 23:56:53] INFO: Cmd: /home/pangulo/Software/Wakhan/wakhan.py hapcorrect --threads 20 --reference /Test/GRCh38.primary_assembly.genome.fa --target-bam /Test/Sample.bam --tumor-phased-vcf /Test/Sample_snp_phased.vcf.gz --genome-name Sample --out-dir-plots /projects/CGS_shared/pangulo/BC/DNA/analysis/SNP/Hap_correction2/
[2025-11-16 23:56:53] INFO: Python version: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38) 
[GCC 7.3.0]
[2025-11-16 23:56:55] INFO: Starting hapcorrect() module...
[2025-11-16 23:56:55] INFO: Parsing reads from /Test/Sample.bam
[2025-11-17 00:07:56] INFO: Parsed 40225099 segments
[2025-11-17 00:07:56] INFO: Computing coverage histogram
[2025-11-17 00:14:20] INFO: Writing tumor coverage for bins
[2025-11-17 00:14:20] INFO: Parsing phaseblocks information
[2025-11-17 00:14:20] INFO: bcftools -> Query for phasesets and GT, DP, VAF feilds by creating a CSV file
[2025-11-17 00:14:36] INFO: Computing coverage for bins
[2025-11-17 00:14:37] INFO: bcftools -> Query for het SNPs and creating a /Test/Sample_snp_phased.vcf_het_snps.csv CSV file
[2025-11-17 00:14:44] INFO: SNPs frequency -> CSV to dataframe conversion for heterozygous SNPs
[2025-11-17 00:14:49] INFO: SNPs frequency -> Computing SNPs frequency from tumor BAM
[2025-11-17 00:25:16] INFO: SNPs frequency -> Computing ACGTs frequencies for heterozygous SNPs
[2025-11-17 00:26:52] INFO: bcftools -> Query for phasesets and GT, DP, VAF feilds by creating a CSV file
/home/pangulo/Software/Wakhan/src/hapcorrect/src/utils.py:62: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.
  dataframe = pd.read_csv(path, sep=sept, names=names)
[2025-11-17 00:27:15] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr1
[2025-11-17 00:28:33] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr2
[2025-11-17 00:29:12] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr3
[2025-11-17 00:29:38] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr4
[2025-11-17 00:30:44] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr5
[2025-11-17 00:31:30] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr6
[2025-11-17 00:31:53] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr7
[2025-11-17 00:32:37] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr8
[2025-11-17 00:33:01] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr9
[2025-11-17 00:33:20] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr10
[2025-11-17 00:33:37] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr11
[2025-11-17 00:34:03] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr12
[2025-11-17 00:34:32] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr13
[2025-11-17 00:34:41] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr14
Traceback (most recent call last):
  File "/home/pangulo/Software/Wakhan/wakhan.py", line 28, in <module>
    main()
  File "/home/pangulo/Software/Wakhan/wakhan.py", line 24, in main
    sys.exit(main())
  File "/home/pangulo/Software/Wakhan/src/main.py", line 353, in main
    main_process(args)  # hapcorrect()
  File "/home/pangulo/Software/Wakhan/src/hapcorrect/src/main_hapcorrect.py", line 234, in main_process
    get_snps_frquncies_coverage(df_snps_in_csv_loh, chrom, ref_start_values, args.bin_size_snps,
  File "/home/pangulo/Software/Wakhan/src/hapcorrect/src/process_vcf.py", line 155, in get_snps_frquncies_coverage
    snps_df_vaf = [eval(i) for i in snps_df.vaf.str.split(',').str[0].values.tolist()]
  File "/home/pangulo/Software/Wakhan/src/hapcorrect/src/process_vcf.py", line 155, in <listcomp>
    snps_df_vaf = [eval(i) for i in snps_df.vaf.str.split(',').str[0].values.tolist()]
TypeError: eval() arg 1 must be a string, bytes or code object 
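One possible explanation, not confirmed against the Wakhan source: the DtypeWarning above reports that column 5 of the CSV read in utils.py has mixed types. With pandas' default chunked parsing (`low_memory=True`), a column can end up holding strings from some chunks and floats from others, and empty fields become NaN. If that column is the VAF field used at process_vcf.py line 155, `.str.split(',')` turns every non-string cell into NaN, and `eval(nan)` raises exactly this TypeError. That could also explain why only some chromosomes fail. The sketch below reproduces the error with a made-up Series standing in for `snps_df.vaf` and shows a more tolerant parse. The sample values and the helper `first_vaf` are hypothetical and are not part of Wakhan.

```python
import math

import pandas as pd

# Hypothetical stand-in for snps_df.vaf: an object column mixing strings,
# a float (as if from a chunk pandas parsed as numeric) and NaN (an empty field).
vaf = pd.Series(["0.45", 0.52, float("nan"), "0.30,0.10"], dtype=object)

# Same expression as process_vcf.py line 155: non-string cells become NaN
# after .str.split(), and eval(nan) raises the TypeError from the report.
try:
    [eval(i) for i in vaf.str.split(",").str[0].values.tolist()]
except TypeError as exc:
    print("reproduced:", exc)


def first_vaf(value):
    """Hypothetical helper: first comma-separated VAF as a float, NaN if absent."""
    if isinstance(value, str):
        head = value.split(",")[0].strip()
        # Treat empty strings and the VCF missing marker "." as missing.
        return math.nan if head in ("", ".") else float(head)
    if value is None:
        return math.nan
    return float(value)  # numeric cells (including NaN) pass through as floats


vafs = [first_vaf(v) for v in vaf]
print(vafs)
```

If this is the cause, another option is the one the warning itself suggests: passing `low_memory=False`, or a string `dtype` for that column, to `pd.read_csv` in utils.py. Empty fields would still be read as NaN in that case, so the per-value parse would still need to tolerate missing values.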

Thank you very much in advance,
Pablo
