
Larger size of genome assembly than what expected #102

Open
Karimi-81 opened this issue Apr 18, 2021 · 1 comment

@Karimi-81

Hi there,
I used HiFi reads (~20x coverage) to assemble the genome of an animal species. The resulting assembly is of high quality, with an N50 of 30 Mb. However, the assembly size (2.6 Gb) is somewhat larger than our estimate from short reads (2.4 Gb). The previous assembly of this species was also 2.4 Gb, so I wonder whether the size of my assembly is correct and whether I need any additional steps to correct it.
I also used Hi-C data together with the CCS reads to generate two separate haplotype assemblies, but the results look strange: each haplotype is ~3.2 Gb. Do you have any idea what could cause this?
Finally, despite the high quality of the HiFi assembly (N50 of 30 Mb), when I used the Hi-C data (with 3D-DNA) to build a chromosome-level assembly, the final output was more fragmented: the number of scaffolds increased and the scaffolds were shorter. Is this related to long-read technology? Do you have any suggestions for scaffolding assemblies generated from long reads?

Karimi-81 changed the title from "Larger size of genome assembly that what expected" to "Larger size of genome assembly than what expected" on Apr 18, 2021
@lh3
Collaborator

lh3 commented Apr 18, 2021

v0.15 was released yesterday. Try that; its Hi-C module is much improved. As for the assembly size, hifiasm tends to be more accurate. The real genome size of a human female is 3.05 Gb. Hifiasm gets this number, but older assemblers often reach only 2.8-2.9 Gb. Also, don't trust GenomeScope; it is not reliable.
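For context on the short-read estimate discussed in this thread: k-mer based tools rest on the idea that, for reads with roughly uniform coverage, the haploid genome size can be approximated from a k-mer count histogram. GenomeScope itself fits a more elaborate model (heterozygosity, sequencing errors, duplications), so the relation below is only the basic idea, not its exact method:

$$
G \approx \frac{N_k}{c_k}, \qquad c_k = c \cdot \frac{L - k + 1}{L}
$$

Here $N_k$ is the total number of k-mers counted in the reads after excluding low-frequency error k-mers, $c_k$ is the k-mer coverage at the homozygous peak, $c$ is the base coverage, and $L$ is the read length.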
