order in the trio files impacts assembly results #81
Comments
Could you please rerun with v0.14 and check the results? v0.14 has an updated trio mode, so the results should be better. Also, the difference between the two haplotypes is too large, so I worry that something is wrong. Is your parental data reliable?
With version 0.14 we do not see the same differences in genome sizes when inverting the parental yak files:
-1 mat.yak -2 pat.yak
-2 mat.yak -1 pat.yak
Does this indicate that our parental data is reliable?
Have you evaluated the phasing results with yak trioeval? I'm not entirely sure, since the paternal assembly is much larger than the maternal assembly. Is this a feature of your sample? Usually, if yak trioeval reports a low hamming error rate and a low switch error rate, the phasing should be right.
I just ran trioeval. Here are the beginning and end of the result file:
S h1tg000001l 7571 41463 5624 1946 1946 39517
S h1tg005546l 56 2 54 1 1 1
How do I interpret these figures?
is the switch error rate.
is the hamming error rate. The hamming error rate is too high. How did you get the parental data?
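To make the per-contig S lines quoted above easier to read, here is a minimal Python sketch. It assumes the column layout described in the yak README for S lines (contig name, paternal k-mer count, maternal k-mer count, then counts of adjacent pat-pat, pat-mat, mat-pat and mat-mat k-mer pairs); that layout is an assumption here, not something stated in this thread. The names `ContigStats`, `parse_s_line`, `minority_fraction` and `switch_fraction` are hypothetical helpers, and the two fractions are only per-contig analogues of the hamming and switch error rates, not yak's own summary values.

```python
from typing import NamedTuple


class ContigStats(NamedTuple):
    """One 'S' line, under the ASSUMED column layout described above."""
    name: str
    n_pat: int     # paternal-specific k-mers found on the contig
    n_mat: int     # maternal-specific k-mers found on the contig
    pat_pat: int   # adjacent k-mer pairs: paternal followed by paternal
    pat_mat: int   # adjacent k-mer pairs: paternal followed by maternal
    mat_pat: int   # adjacent k-mer pairs: maternal followed by paternal
    mat_mat: int   # adjacent k-mer pairs: maternal followed by maternal

    @property
    def minority_fraction(self) -> float:
        """Share of parental k-mers that come from the less common parent."""
        total = self.n_pat + self.n_mat
        return min(self.n_pat, self.n_mat) / total if total else 0.0

    @property
    def switch_fraction(self) -> float:
        """Share of adjacent k-mer pairs whose parent of origin changes."""
        pairs = self.pat_pat + self.pat_mat + self.mat_pat + self.mat_mat
        return (self.pat_mat + self.mat_pat) / pairs if pairs else 0.0


def parse_s_line(line: str) -> ContigStats:
    """Parse a whitespace-separated 'S' line into ContigStats."""
    fields = line.split()
    if len(fields) < 8 or fields[0] != "S":
        raise ValueError(f"not an S line with six counts: {line!r}")
    return ContigStats(fields[1], *map(int, fields[2:8]))


if __name__ == "__main__":
    # The two lines quoted in this thread.
    quoted = [
        "S h1tg000001l 7571 41463 5624 1946 1946 39517",
        "S h1tg005546l 56 2 54 1 1 1",
    ]
    for line in quoted:
        s = parse_s_line(line)
        print(f"{s.name}\tminority={s.minority_fraction:.3f}"
              f"\tswitch={s.switch_fraction:.3f}")
```

A well-phased contig would be expected to show both fractions close to zero; a contig that mixes k-mers from both parents shows a larger minority fraction.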
From short reads. What is the expected range and the possible meanings of this value?
Just one more thing.
The phasing error rate is fairly high. Reiterating Haoyu's question:
Are you sure the parental data are correct?
From parental short reads.
Sorry for the late reply. That does not seem right. What is the size of the primary assembly generated by hifiasm? Usually the hamming error rate is not this high.
Here are the metrics of the primary assembly
I have no idea. But I still think it is more likely that the parental data has some issues. Based on the primary assembly size of your sample, the size of each haplotype should be around 3.2-3.3Gb. However, both the hamming error rate and the haplotype-resolved genome size indicate that the trio-binning phasing failed.
Wait.... Could you please rerun hifiasm with the current GitHub HEAD? It fixes a relatively serious bug in trio-binning mode. That might help to get two haplotypes of similar size, but it probably still cannot fix the high hamming error rate.
We tested both orders for parental yak files for a 4Gb genome and got different results (hifiasm-0.12)
-1 mat.yak -2 pat.yak
3,3G 19 févr. 10:23 hifiasm_h1_h2.hap1.p_ctg.gfa
4,2G 19 févr. 10:25 hifiasm_h1_h2.hap2.p_ctg.gfa
-2 mat.yak -1 pat.yak
3,7G 6 mars 17:02 hifiasm_h2_h1.hap1.p_ctg.gfa
3,0G 6 mars 17:03 hifiasm_h2_h1.hap2.p_ctg.gfa
The total genome size differs, as do the sizes of the haplotype assemblies.
We expected the two runs to give the same results with the haplotypes swapped.
Which order should we choose?
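The expectation above can be written out as arithmetic. The Python sketch below uses the GFA file sizes from the two listings (in G, as printed by ls, so file sizes rather than assembled base counts) and assumes that hap1 corresponds to the yak file passed with -1 and hap2 to the one passed with -2. The function name `compare_swapped_runs` is a hypothetical helper.

```python
def compare_swapped_runs(run_a: dict, run_b: dict) -> dict:
    """Compare two runs that differ only in the order of the parental yak files.

    run_a: hap sizes for '-1 mat.yak -2 pat.yak'
    run_b: hap sizes for '-2 mat.yak -1 pat.yak'
    ASSUMPTION: hap1 is binned with the -1 file and hap2 with the -2 file,
    so run_a['hap1'] and run_b['hap2'] are both the maternal haplotype.
    """
    return {
        "maternal_diff": round(run_a["hap1"] - run_b["hap2"], 2),
        "paternal_diff": round(run_a["hap2"] - run_b["hap1"], 2),
        "total_diff": round(sum(run_a.values()) - sum(run_b.values()), 2),
    }


if __name__ == "__main__":
    # GFA file sizes (G) copied from the listings in this post.
    mat_first = {"hap1": 3.3, "hap2": 4.2}   # -1 mat.yak -2 pat.yak
    pat_first = {"hap1": 3.7, "hap2": 3.0}   # -2 mat.yak -1 pat.yak
    for key, value in compare_swapped_runs(mat_first, pat_first).items():
        print(f"{key}: {value:+.1f}G")
```

If swapping the inputs only swapped the outputs, all three differences would be close to zero.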