Is a partial contig with half-reduced CCS read coverage misassembled? No, by contrast, it has assembled very long and novel tandem repeat sequences #50
Comments
Could you please zoom in on the utg graph around this 20-Mb region? I'd like to see what the subgraph looks like. Also, could you please show the following numbers from the hifiasm log?
Based on the mapping of genetic markers, can you assign this 20-Mb region to other chromosomes?
Thank you, Dr. Li! Very strange: this 20-Mb region did not have any genetic markers.
A few more things to try:
As someone was referring to this issue, I have reread the thread. I am seeing:
If this description is right, this is not a contig misassembly. You have an inbred diploid genome. One possibility is that this region has diverged between the two haplotypes although the rest of the genome is nearly homozygous. The solution is to remove the diverged copy from the primary assembly. By the way, when you scaffolded the contigs, did you discard prefix.a_ctg.gfa?
Maybe you are right: this half-coverage repeat region may have diverged rapidly between the two haplotypes. Yes, I only used prefix.p_ctg.gfa for further assembly.
Hi chhylp123,
I have sequenced a diploid genome (repeat content >70%) with 25X coverage of HiFi reads. Luckily, I got excellent contigs with an N50 of 44 Mb using hifiasm 0.12.
Then I anchored the contigs to chromosomes with ALLMAPS using ~1500 high-quality genetic markers. Finally, I obtained 10 pseudomolecules. However, I found a 20-Mb region in chr7 that is supported neither by genetic markers nor by synteny with other homologous species.


Furthermore, I mapped the CCS reads to the final assembly and found that read coverage across this 20-Mb region is about half that of the rest of the genome.
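A half-coverage check of this kind can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in this thread: it assumes per-base depth in the three-column, tab-separated format written by `samtools depth` (contig, 1-based position, depth), and the function names, the 100-kb window size, and the 0.35 to 0.65 thresholds are all hypothetical choices.

```python
from statistics import median


def parse_depth_lines(lines):
    """Yield (contig, pos, depth) from tab-separated lines: contig, 1-based pos, depth."""
    for line in lines:
        contig, pos, depth = line.rstrip("\n").split("\t")
        yield contig, int(pos), int(depth)


def windowed_depth(rows, window=100_000):
    """Return {(contig, window_index): mean depth} over fixed-size windows."""
    sums = {}
    for contig, pos, depth in rows:
        key = (contig, (pos - 1) // window)
        total, n = sums.get(key, (0, 0))
        sums[key] = (total + depth, n + 1)
    return {key: total / n for key, (total, n) in sums.items()}


def half_coverage_windows(means, low=0.35, high=0.65):
    """Return windows whose mean depth is between low and high times the median window depth.

    The thresholds are assumed values for illustration, not calibrated cutoffs.
    """
    ref = median(means.values())
    return sorted(key for key, m in means.items() if low * ref <= m <= high * ref)
```

With real data, the lines would come from a depth file, and consecutive flagged windows on one contig would delimit a candidate half-coverage region.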

Meanwhile, I also mapped the RNA-seq reads to the genome, and no reads covered this region. So, I think this 20-Mb region may be misassembled.
However, this 20-Mb region lies within a single contig (108 Mb) that was constructed from several utgs (the two terminal utgs, utg000064l and utg000017l, are 29 Mb and 47 Mb long, respectively), and there is no obvious evidence to support breaking this contig.
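For reference, unitig lengths like those quoted above can be read from the segment (`S`) lines of a GFA file. The sketch below is an assumption-laden illustration: it relies only on the GFA 1 convention that an `S` line holds a name, a sequence (or `*`), and optional tags such as `LN:i:`, and the function name `segment_lengths` is hypothetical.

```python
def segment_lengths(gfa_lines):
    """Return {segment_name: length} from GFA 1 'S' lines.

    Prefers the LN:i: tag; falls back to the sequence length, or None
    when the sequence is '*' and no LN tag is present.
    """
    lengths = {}
    for line in gfa_lines:
        if not line.startswith("S\t"):
            continue  # skip links, header lines, and other record types
        fields = line.rstrip("\n").split("\t")
        name, seq = fields[1], fields[2]
        ln = next((int(f[5:]) for f in fields[3:] if f.startswith("LN:i:")), None)
        if ln is None and seq != "*":
            ln = len(seq)
        lengths[name] = ln
    return lengths
```

Sorting the resulting dictionary by length gives a quick view of which unitigs make up most of a contig.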

Therefore, I am wondering whether there are other possible explanations for this assembly. Have you ever seen assembled regions covered at half depth before? Could it be high heterozygosity across the 20-Mb region?
Thanks!
Dong An