Hi,
Thanks for the tool. I'm trying to run the hapcorrect module on my sample in tumor-only mode, using the BAM file, the reference genome FASTA, and the phased SNP VCF file as input, with this command:
python ${wakhan} hapcorrect --threads ${threads} --reference ${ref_genome_fasta} --target-bam ${bam_file} --tumor-phased-vcf ${snp_phased_vcf} --genome-name ${sample} --out-dir-plots ${HAP_correction}
It seems to work well for all chromosomes except chr14 and chr15, where I encounter the error below. What could be the reason?
[2025-11-16 23:56:53] INFO: Starting Wakhan 0.2.0
[2025-11-16 23:56:53] INFO: Cmd: /home/pangulo/Software/Wakhan/wakhan.py hapcorrect --threads 20 --reference /Test/GRCh38.primary_assembly.genome.fa --target-bam /Test/Sample.bam --tumor-phased-vcf /Test/Sample_snp_phased.vcf.gz --genome-name Sample --out-dir-plots /projects/CGS_shared/pangulo/BC/DNA/analysis/SNP/Hap_correction2/
[2025-11-16 23:56:53] INFO: Python version: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:38)
[GCC 7.3.0]
[2025-11-16 23:56:55] INFO: Starting hapcorrect() module...
[2025-11-16 23:56:55] INFO: Parsing reads from /Test/Sample.bam
[2025-11-17 00:07:56] INFO: Parsed 40225099 segments
[2025-11-17 00:07:56] INFO: Computing coverage histogram
[2025-11-17 00:14:20] INFO: Writing tumor coverage for bins
[2025-11-17 00:14:20] INFO: Parsing phaseblocks information
[2025-11-17 00:14:20] INFO: bcftools -> Query for phasesets and GT, DP, VAF feilds by creating a CSV file
[2025-11-17 00:14:36] INFO: Computing coverage for bins
[2025-11-17 00:14:37] INFO: bcftools -> Query for het SNPs and creating a /Test/Sample_snp_phased.vcf_het_snps.csv CSV file
[2025-11-17 00:14:44] INFO: SNPs frequency -> CSV to dataframe conversion for heterozygous SNPs
[2025-11-17 00:14:49] INFO: SNPs frequency -> Computing SNPs frequency from tumor BAM
[2025-11-17 00:25:16] INFO: SNPs frequency -> Computing ACGTs frequencies for heterozygous SNPs
[2025-11-17 00:26:52] INFO: bcftools -> Query for phasesets and GT, DP, VAF feilds by creating a CSV file
/home/pangulo/Software/Wakhan/src/hapcorrect/src/utils.py:62: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.
dataframe = pd.read_csv(path, sep=sept, names=names)
[2025-11-17 00:27:15] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr1
[2025-11-17 00:28:33] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr2
[2025-11-17 00:29:12] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr3
[2025-11-17 00:29:38] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr4
[2025-11-17 00:30:44] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr5
[2025-11-17 00:31:30] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr6
[2025-11-17 00:31:53] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr7
[2025-11-17 00:32:37] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr8
[2025-11-17 00:33:01] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr9
[2025-11-17 00:33:20] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr10
[2025-11-17 00:33:37] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr11
[2025-11-17 00:34:03] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr12
[2025-11-17 00:34:32] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr13
[2025-11-17 00:34:41] INFO: Loading coverage (bins) and coverage (phaseblocks) datasets for chr14
Traceback (most recent call last):
File "/home/pangulo/Software/Wakhan/wakhan.py", line 28, in <module>
main()
File "/home/pangulo/Software/Wakhan/wakhan.py", line 24, in main
sys.exit(main())
File "/home/pangulo/Software/Wakhan/src/main.py", line 353, in main
main_process(args) # hapcorrect()
File "/home/pangulo/Software/Wakhan/src/hapcorrect/src/main_hapcorrect.py", line 234, in main_process
get_snps_frquncies_coverage(df_snps_in_csv_loh, chrom, ref_start_values, args.bin_size_snps,
File "/home/pangulo/Software/Wakhan/src/hapcorrect/src/process_vcf.py", line 155, in get_snps_frquncies_coverage
snps_df_vaf = [eval(i) for i in snps_df.vaf.str.split(',').str[0].values.tolist()]
File "/home/pangulo/Software/Wakhan/src/hapcorrect/src/process_vcf.py", line 155, in <listcomp>
snps_df_vaf = [eval(i) for i in snps_df.vaf.str.split(',').str[0].values.tolist()]
TypeError: eval() arg 1 must be a string, bytes or code object
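In case it helps with diagnosis, here is a minimal sketch of what may be happening. It is only a guess based on the DtypeWarning above, not a confirmed cause: it assumes the `vaf` column for these chromosomes contains a mix of strings and numeric values. The data below is hypothetical and not taken from my VCF. With mixed types, pandas' `.str` accessor returns NaN for non-string entries, and `eval()` then raises the same TypeError as in the traceback.

```python
import pandas as pd

# Hypothetical stand-in for the "vaf" column: strings mixed with a bare float,
# which is one way a "mixed types" DtypeWarning can arise.
snps_df = pd.DataFrame({"vaf": pd.Series(["0.45,0.55", 0.5, "0.30"], dtype=object)})

# Same expression shape as process_vcf.py line 155, without the eval() step.
first_fields = snps_df.vaf.str.split(",").str[0].values.tolist()
# The non-string entry becomes NaN (a float), not a string.

error_message = ""
try:
    [eval(i) for i in first_fields]
except TypeError as exc:
    error_message = str(exc)

# One possible workaround (an assumption, not the tool's own fix):
# cast to string first, then parse numerically instead of using eval().
vaf_values = pd.to_numeric(
    snps_df.vaf.astype(str).str.split(",").str[0], errors="coerce"
).tolist()
```

If this is indeed the cause, it might mean that some records on chr14 and chr15 have a VAF field that is parsed as a number or is missing, rather than as a comma-separated string.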
Thank you very much in advance,
Pablo