
Segmentation error with --ont and --ul #777

Open
VanessaUg opened this issue Feb 11, 2025 · 7 comments

Comments

@VanessaUg

Hello,
I'm trying to run hifiasm v0.23.0 on multiple species with --ont and --ul, with expected genome sizes ranging from 350 Mb to 1.3 Gb. The run completed successfully for one of my species and produced a T2T assembly, which had not been possible before, but for the others I encountered segmentation faults during the final steps. For --ont I use ONT reads of 15-40 kb, and for --ul I use ONT data filtered to reads >40 kb. Here is the error message:

Writing Ca.hifiasm_5.2.bp.hap1.p_ctg.gfa to disk...
Writing Ca.hifiasm_5.2.bp.hap2.p_ctg.gfa to disk...
Inconsistency threshold for low-quality regions in BED files: 70%
/mnt/cluster/gridEngine/default/spool/aran/job_scripts/15733: line 30: 719505 Segmentation fault      (core dumped) /mnt/bin/hifiasm/hifiasm-v0.23.0/hifiasm -t $NSLOTS --ont -o Ca.hifiasm_5.2 --dual-scaf -l 3 --telo-m TTTAGGG ${path1} --ul ${path2} --ul-cut 40000

The same error occurs using hifiasm v0.24.0.
I investigated the error further and found that it is not caused by a specific read length used in --ul: it did not occur when only longer reads (e.g. only >120 kb reads) or only shorter reads (e.g. 40-120 kb reads) were used for --ul within the same species with the same settings. The issue only occurs when I combine them.
Since the maximum number of --ul sequences accepted before a segmentation fault varies by species (e.g. species 1 completed successfully with 62,381 reads as --ul input, while species 2 failed with only 49,992 reads), the error also seems to be influenced by the read length distribution rather than by a fixed sequence count.
I therefore investigated the read length distribution of the input used for --ul.
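For reference, the length-bin experiments described above can be reproduced with standard POSIX tools; this is only a sketch (file names are placeholders, and a tool like `seqkit seq -m/-M` would work equally well). Real thresholds would be 40000 and 120000; a tiny synthetic FASTQ and small stand-in thresholds are used here so the sketch is self-contained and runnable:

```shell
#!/bin/sh
# Split a 4-line-record FASTQ into the two --ul length bins tested above.
# Demo FASTQ with read lengths 5, 15, and 30 (stand-ins for real ONT reads).
printf '@r1\nAAAAA\n+\nIIIII\n@r2\nAAAAAAAAAAAAAAA\n+\nIIIIIIIIIIIIIII\n@r3\n%s\n+\n%s\n' \
  "$(printf 'A%.0s' $(seq 30))" "$(printf 'I%.0s' $(seq 30))" > demo.fastq

LO=10   # stand-in for 40000 (40 kb)
HI=20   # stand-in for 120000 (120 kb)

# Reads with LO <= length < HI -> "shorter UL" bin
awk -v lo="$LO" -v hi="$HI" \
  'NR%4==1{h=$0} NR%4==2{s=$0} NR%4==3{p=$0}
   NR%4==0 && length(s)>=lo && length(s)<hi {printf "%s\n%s\n%s\n%s\n", h, s, p, $0}' \
  demo.fastq > ul_short.fastq

# Reads with length >= HI -> "longer UL" bin
awk -v hi="$HI" \
  'NR%4==1{h=$0} NR%4==2{s=$0} NR%4==3{p=$0}
   NR%4==0 && length(s)>=hi {printf "%s\n%s\n%s\n%s\n", h, s, p, $0}' \
  demo.fastq > ul_long.fastq
```

With the demo data, r2 (length 15) lands in ul_short.fastq and r3 (length 30) in ul_long.fastq.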

For the species that did not trigger the error, the read length composition of the data used as input for --ul looked as follows:

[Image: read length distribution of the --ul input (successful run)]

For a species which did trigger the segmentation fault, the read length composition of the input for --ul looked as follows:

[Image: read length distribution of the --ul input (run with segmentation fault)]

I also tried using only 80% of the data for --ul to lower the coverage, as shown in the table below, but I still encountered the same error.

| Run | num_seqs | sum_len | min_len | avg_len | max_len | Q1 | Q2 | Q3 | sum_gap | N50 | N50_num | Q20(%) | Q30(%) | AvgQual | GC(%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Successful run | 710,303 | 43,957,543,071 | 40,000 | 61,885.6 | 603,649 | 46,177 | 54,882 | 69,913 | 0 | 61,487 | 68,441 | 92.72 | 83.61 | 21.26 | 36.73 |
| Segmentation error | 664,351 | 42,753,693,181 | 40,000 | 64,354.1 | 500,826 | 47,281 | 57,409 | 74,329 | 0 | 65,258 | 67,037 | 88.71 | 76.57 | 18.98 | 36.45 |
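For reproducibility, the 80% subsampling step above would typically be done with something like `seqkit sample -p 0.8`; as a tool-free sketch using only POSIX utilities (demo data and the seed are illustrative, not the actual input):

```shell
#!/bin/sh
# Sketch: keep ~80% of 4-line FASTQ records at random, seeded for reproducibility.
# A tiny synthetic file stands in for the real --ul input.
: > demo.fastq
for i in $(seq 10); do printf '@r%s\nA\n+\nI\n' "$i" >> demo.fastq; done

awk -v seed=42 -v frac=0.8 \
  'BEGIN{srand(seed)}
   NR%4==1{keep=(rand()<frac)}  # decide once per record, at its header line
   keep' demo.fastq > sub.fastq

# The output always contains whole records (a multiple of 4 lines),
# with roughly frac * 10 of the demo reads surviving.
wc -l < sub.fastq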

I was wondering what could cause this issue.
Thank you very much for your help!

@chhylp123
Owner

Hi @VanessaUg, thanks for letting me know. Could you please share the entire log file? It seems that hifiasm crashed after nearly completing everything, which I haven’t encountered before.

For each sample, the read file doesn’t seem too large. Would you be able to share one of the read files with me for debugging? That would help me quickly identify the issue. I can provide Globus endpoints if that works for you.

I'm also curious: have you tried assembling all reads directly using the --ont option, without --ul?

@VanessaUg
Author

VanessaUg commented Feb 11, 2025

Thank you @chhylp123! Yes, I have also tried assembling with only --ont using all reads, which ran successfully.
I can share the read files; the Globus endpoints would work.

This is the log file for a failed --ont --ul run:
cs_ont_ul.txt

And the log file for a --ont run:
cs_ont.txt

@chhylp123
Owner

Can you use Globus? Globus is easier and faster.

@VanessaUg
Author

@chhylp123 Yes, Globus will work. Could you please send me your Globus endpoints?

@chhylp123
Owner

Which email are you using? I could add your email to our endpoints.

@VanessaUg
Author

Could I send you the details privately?

@VanessaUg
Author

I have sent you our Globus ID via email.
