Assembly result has some 0-depth point of TGS(hifi) data #757

Open
WangZhSi opened this issue Jan 13, 2025 · 4 comments

Comments

@WangZhSi

Hi:

I'm testing the difference between assemblies produced by versions 0.24.0 and 0.19.9. After mapping the TGS data back to p_ctg.fa, I found some 0-depth positions in the 0.24.0 result. Here are the details:

  • Software version: 0.24.0
  • Data: SRR26555721 (watermelon, HiFi)
  • Assembly parameters: defaults: hifiasm -o out_prefix -t 32 SRR26555721.fq.gz
  • Mapping parameters: minimap2 -x map-hifi -x asm20 --MD -a -t 24 asm0.24.0.bp.p_ctg.fa SRR26555721.fq.gz > out.sam; out.sam was then converted into sort.bam
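For anyone who wants to list the 0-depth regions without scrolling through IGV, here is a minimal Python sketch. It assumes per-base depth has already been exported as tab-separated text (contig, 1-based position, depth), for example with `samtools depth -a sort.bam`; the function name `zero_depth_intervals` is hypothetical and not part of any tool in this thread.

```python
from typing import Iterable, List, Tuple


def zero_depth_intervals(depth_lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Collapse consecutive zero-depth positions into (contig, start, end).

    Assumes each line is "contig<TAB>position<TAB>depth" with 1-based
    positions, sorted by contig and position. Intervals are inclusive.
    """
    intervals: List[Tuple[str, int, int]] = []
    current = None  # open run as [contig, start, end]
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos_s, depth_s = line.split("\t")[:3]
        pos, depth = int(pos_s), int(depth_s)
        if depth == 0:
            # Extend the open run only if this position is adjacent to it.
            if current and current[0] == contig and current[2] + 1 == pos:
                current[2] = pos
            else:
                if current:
                    intervals.append((current[0], current[1], current[2]))
                current = [contig, pos, pos]
        elif current:
            # A covered position closes the open run.
            intervals.append((current[0], current[1], current[2]))
            current = None
    if current:
        intervals.append((current[0], current[1], current[2]))
    return intervals
```

Passing an open file handle of the depth output works as well as a list of lines, since the function only iterates over its input.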

Visualizing the BAM in IGV gives this result:
image

This did not happen with version 0.19.9:
image

I've checked this 0-depth area in the 0.24.0 assembly and found that it consists of the telomeric repeat unit (CCCTAAA).

Do you have any idea how this happened, or any suggestions?

Thanks!

@chhylp123
Owner

Do you think it is a misassembly?

@WangZhSi
Author

@chhylp123 Thank you for your response!

I conducted further checks on ctg0001, and here is the feedback:

Assembly size:

  • v0.24.0: 35,325,984 bp
  • v0.19.9: 35,015,304 bp

It can be observed that the v0.24.0 assembly is approximately 300 kb larger. Initially, I thought this difference might be caused by low-depth regions, but after checking the number of telomeric repeat units (CCCTAAA), I found:

  • v0.24.0: 2,684 times
  • v0.19.9: 1,529 times
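A rough way to reproduce such counts, and to see how many bases they account for, is sketched below in Python. It is only an illustration: the helper names are hypothetical, the count is a simple non-overlapping match on each strand, and it may not be the method used to obtain the numbers above.

```python
from typing import Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def count_motif(seq: str, motif: str = "CCCTAAA") -> Tuple[int, int]:
    """Count non-overlapping motif copies on the forward and reverse strand."""
    seq = seq.upper()
    forward = seq.count(motif)
    reverse = seq.count(reverse_complement(motif))
    return forward, reverse


def motif_bp(n_units: int, motif: str = "CCCTAAA") -> int:
    """Number of bases covered by n_units copies of the motif."""
    return n_units * len(motif)


# Extra copies reported for v0.24.0 versus v0.19.9, converted to bases.
extra_bp = motif_bp(2684 - 1529)
```

With the counts reported above, `extra_bp` is 8,085 bp, which is small compared with the contig size difference of 35,325,984 - 35,015,304 = 310,680 bp.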

At 7 bp per unit, the 1,155 additional copies account for only about 8 kb, so this difference explains only a small part of the 300 kb size increase. I then used MUMmer4 to compare ctg0001 between the two versions, and the results are as follows:

Image

It appears that the assembly results are largely consistent between the two versions. However, I am still uncertain how to interpret the uncovered (0-depth) regions, and I wonder whether this could be an alignment issue.

Do you have any suggestions?

@chhylp123
Owner

Could you check which reads hifiasm is using for this region by looking at the corresponding A-lines?

@WangZhSi
Author

@chhylp123 Do you mean the A-lines in the .gfa file? Sorry, I'm not very familiar with this file.

For the 0.24.0 assembly, the read used is SRR26555721.2353299
Image
But in the mapping PAF file, this read aligns to ctg0006
Image

I also checked the 0.19.9 assembly: the reads used for the start of ctg0001 map back correctly.
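In case it helps others repeat this check, here is a small Python sketch of the lookup. It assumes the hifiasm GFA A-line layout (contig name, start position on the contig, strand, read name, read start, read end, then tags) and the standard 12-column PAF layout with the target name in column 6. The function names and the region bounds passed to them are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def reads_for_region(
    gfa_lines: Iterable[str], contig: str, start: int, end: int
) -> List[Tuple[int, str, str]]:
    """List (contig_start, strand, read_name) for A-lines on `contig`
    whose start position on the contig falls in [start, end)."""
    hits: List[Tuple[int, str, str]] = []
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip S/L lines, truncated lines, and A-lines of other contigs.
        if len(fields) < 7 or fields[0] != "A" or fields[1] != contig:
            continue
        pos = int(fields[2])
        if start <= pos < end:
            hits.append((pos, fields[3], fields[4]))
    return sorted(hits)


def paf_targets(paf_lines: Iterable[str], read_name: str) -> Set[str]:
    """Return the set of target (contig) names a read aligns to in a PAF file."""
    targets: Set[str] = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 12 and fields[0] == read_name:
            targets.add(fields[5])
    return targets
```

Running `reads_for_region` on the p_ctg.gfa for the first few kb of the contig gives the read names, and `paf_targets` then shows where each of those reads was placed by the mapper. If contigs were renamed after assembly, the names in the GFA and in the PAF need to be reconciled first.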

Thanks for your reply!
