
Does hifiasm support the assembly of haploid genomes larger than 10G using HiC data? #752

Open
wjx121 opened this issue Jan 2, 2025 · 3 comments


@wjx121

wjx121 commented Jan 2, 2025

Does hifiasm support assembling haploid genomes larger than 10 Gb with Hi-C data? I have been assembling my own genome (~15 Gb), and the genotyping step has been running for over a month without producing any results. My server cluster has 2 TB of memory, so I do not believe hardware is the limitation.

@chhylp123
Owner

If it is a haploid genome, just run it without Hi-C data, using -l0 to disable duplication purging. If the coverage is not too high, 2 TB of RAM might work.
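A run along those lines might look like the sketch below. The input file name, output prefix, and thread count are placeholders, not values from this thread; `-l0` turns off purging as suggested above.

```shell
# Haploid assembly without Hi-C; file names and thread count are placeholders.
hifiasm -o asm -t 50 -l0 hifi_reads.fq.gz

# Without Hi-C, primary contigs are written as GFA (asm.bp.p_ctg.gfa by default);
# a common one-liner to convert GFA segment records to FASTA:
awk '/^S/{print ">"$2"\n"$3}' asm.bp.p_ctg.gfa > asm.p_ctg.fa
```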

@HuiyangYu

> Does hifiasm support the assembly of haploid genomes larger than 10G using HiC data? I have been using my own data to assemble a genome (~15G), and the genotyping process has taken a very long time (over a month) without any results. My server cluster has 2T of memory, and I believe this should not be a limitation caused by hardware conditions.

For a 15 Gb genome, without UL ONT reads, assembly can be completed in roughly two days using 50 threads. If your sequencing depth is excessively high, consider downsampling the HiFi data to 50-60×. You may also want to examine hifiasm's log files for further clues.
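One way to act on the downsampling suggestion is to compute what fraction of reads to keep so that the expected depth lands near a target, then feed that fraction to a subsampling tool. The genome size and base counts below are illustrative, not from this thread; a minimal sketch:

```python
def subsample_fraction(total_bases: float, genome_size: float, target_coverage: float) -> float:
    """Fraction of reads to keep so expected depth is ~target_coverage.

    Assumes read lengths are roughly uniform across the dataset, so
    sampling reads at this rate scales total bases by the same factor.
    """
    current_coverage = total_bases / genome_size
    # Never ask for more data than exists.
    return min(1.0, target_coverage / current_coverage)

# Hypothetical numbers: 1.8 Tb of HiFi data over a 15 Gb genome is 120x depth,
# so reaching ~60x means keeping half the reads.
frac = subsample_fraction(total_bases=1.8e12, genome_size=15e9, target_coverage=60)
print(round(frac, 2))  # 0.5
```

The resulting fraction can be passed to, for example, `seqkit sample -p` to draw that proportion of reads from the FASTQ file.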

@wjx121
Author

wjx121 commented Jan 6, 2025

Thank you for sharing your experience. I have also tested assembling the HiFi data without Hi-C, and it took about two days in total (in line with your experience). But when I add Hi-C data, it behaves exactly as described in my initial question. Below is the end of the log file from that run.

[M::ha_ct_shrink::170062.664*54.61] ==> counted 633026861 distinct minimizer k-mers
[M::ha_pt_gen::] counting in normal mode
[M::yak_count] collected 23893930611 minimizers
[M::ha_pt_gen::173273.287*54.39] ==> indexed 23889178613 positions, counted 633026861 distinct minimizer k-mers
[M::ha_print_ovlp_stat_0] # overlaps: 2919674196
[M::ha_print_ovlp_stat_0] # strong overlaps: 1097287130
[M::ha_print_ovlp_stat_0] # weak overlaps: 1822387066
[M::ha_print_ovlp_stat_0] # exact overlaps: 2650184321
[M::ha_print_ovlp_stat_0] # inexact overlaps: 269489875
[M::ha_print_ovlp_stat_0] # overlaps without large indels: 2899192894
[M::ha_print_ovlp_stat_0] # reverse overlaps: 816242957
[M::ha_print_ovlp_stat_0] # running time: 11443.071
[M::ha_assemble::184755.180*[email protected]] ==> found overlaps for the final round
[M::ha_opt_update_cov_min] updated max_n_chain to 345
Writing reads to disk...
Reads has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
Writing ma_hit_ts to disk...
ma_hit_ts has been written.
bin files have been written.
[M::gen_telo_end_t::] ==> # 5'-telomeres::0, # 3'-telomeres::2, # tot::54843938, motif::TTTAGGG, motif_len::7
ERROR6
ERROR6
ERROR6
ERROR6
[M::ug_ext_gfa::] # tips::1800
Writing raw unitig GFA to disk...
ERROR6
ERROR6
Writing processed unitig GFA to disk...
ERROR6
ERROR6
[M::purge_dups] homozygous read coverage threshold: 56
[M::purge_dups] purge duplication coverage threshold: 70
ERROR-purge
[M::mc_solve:: # edges: 20450]
[M::mc_solve_core_adv::2.853] ==> Partition
[M::adjust_utg_by_primary] primary contig coverage range: [47, infinity]
Writing MD3.hic.p_ctg.gfa to disk...
[M::ha_opt_update_cov] updated max_n_chain to 345
ERROR6
ERROR6
