HiC mode reports larger haplotypes assembly sizes #96

Open
dabitz opened this issue Apr 11, 2021 · 9 comments

@dabitz

dabitz commented Apr 11, 2021

Hi Hifiasm developers,

I've tried the latest version of hifiasm (r315) to assemble haplotype-phased contigs of a heterozygous plant using the Hi-C mode, but the results look strange. Both the hap1 (51Mb) and hap2 (460Mb) files are much larger than the expected haploid genome size (400Mb). I would expect some divergence between the two haplotypes, since the coverage profile of some regions suggests large (Mb-size) indels between them, but the much higher assembly size in both cases is very strange. I also found that both hap1 and hap2 contain many duplicated contigs that are easily removed by purging haplotigs; after doing this, I ended up with hap1 at 425Mb and hap2 at 400Mb. However, even after purging each hap individually, the genome does not look nicely phased, since hifiasm seems to be duplicating indel regions that should be absent from one of the haplotypes... I wonder whether the new version to be released soon could solve this, or whether you could recommend some adjustments to the basic command-line settings...

Thanks,
André
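
Editor's sketch: the size comparison described above (per-haplotype assembly size against the expected haploid size) can be scripted directly on GFA output. The Python below is a minimal sketch, not part of hifiasm. It assumes each haplotype is written as a GFA file whose segment (`S`) lines carry either the sequence or an `LN:i:` length tag; the function names `gfa_total_bp` and `size_ratio` are hypothetical.

```python
from typing import Iterable


def gfa_total_bp(lines: Iterable[str]) -> int:
    """Sum the lengths of all segment (S) records in GFA text."""
    total = 0
    for line in lines:
        if not line.startswith("S\t"):
            continue  # skip headers, links, and any other record type
        fields = line.rstrip("\n").split("\t")
        seq = fields[2] if len(fields) > 2 else "*"
        # Prefer an explicit LN:i: tag; fall back to the sequence length.
        ln_tag = next((f for f in fields[3:] if f.startswith("LN:i:")), None)
        if ln_tag is not None:
            total += int(ln_tag[len("LN:i:"):])
        elif seq != "*":
            total += len(seq)
    return total


def size_ratio(total_bp: int, expected_haploid_bp: int) -> float:
    """Assembly size as a multiple of the expected haploid genome size."""
    return total_bp / expected_haploid_bp
```

For example, `with open(path) as fh: size_ratio(gfa_total_bp(fh), 400_000_000)` would give the inflation factor of one haplotype file relative to a 400Mb expectation; a value well above 1 for both haplotypes is the symptom reported here.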

@chhylp123
Owner

I will update hifiasm today with a new Hi-C model. Please give me a moment.

@dabitz
Author

dabitz commented Apr 17, 2021

Thanks a lot for the release of the new version!

Before I run this new version, could you please give me some recommendations for the command to run on a highly heterozygous plant? I also expect to observe a few large (Mb-scale) indels between the haplotypes, so one haplotype can be slightly bigger than the other... Looking forward to testing it!

Cheers,
André

@chhylp123
Owner

Probably just run with the defaults. If the results do not look good enough, please try larger values of '--n-perturb' and '--f-perturb'. Looking forward to your results.

@dabitz
Author

dabitz commented Apr 23, 2021

Dear Hifiasm developers,

First of all, thank you very much for your efforts to improve the software and the Hi-C partition module.

I have now finished running the new version on my plant genome (heterozygosity 1.5%, 420Mb) and the results are much better than with the previous versions.
The command was as follows:
hifiasm -t 40 -o Asm.hifiasm.phased --primary --n-perturb 75000 --f-perturb 0.15 --seed 11 -l3 --h1 omnic_reads_1.fq.gz --h2 omnic_reads_2.fq.gz CCS.fastq

Now hap1 (420Mb) and hap2 (400Mb) seem quite accurate and the false duplications are gone. The coverage profile of the diploid assembly is very good and mostly uniform. Former versions either made one hap much bigger than the other or inflated both haps with false duplications. Also, with this new version, increasing both n-perturb (75000) and f-perturb (0.15) helped balance the haps but did not change the total assembly size, so I guess these options improved my haps a bit compared to the defaults.
[image: cov]
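
Editor's sketch: the advice in this thread (start from the command above, then try larger '--n-perturb' and '--f-perturb') amounts to a small parameter sweep. The Python below is a rough sketch that only builds and prints command lines; it does not run hifiasm. It reuses only the flags and file names that appear in the command above. The helper names (`hifiasm_hic_cmd`, `sweep`), the per-run output prefix scheme, and any sweep values other than 75000 and 0.15 are assumptions for illustration, not recommendations from the developers.

```python
import shlex
from itertools import product
from typing import List, Sequence


def hifiasm_hic_cmd(prefix: str, n_perturb: int, f_perturb: float) -> List[str]:
    """Build one command line, mirroring the flags used in this thread."""
    return [
        "hifiasm", "-t", "40", "-o", prefix, "--primary",
        "--n-perturb", str(n_perturb), "--f-perturb", str(f_perturb),
        "--seed", "11", "-l3",
        "--h1", "omnic_reads_1.fq.gz", "--h2", "omnic_reads_2.fq.gz",
        "CCS.fastq",
    ]


def sweep(n_values: Sequence[int], f_values: Sequence[float]) -> List[List[str]]:
    """One command per (n-perturb, f-perturb) pair, each with its own prefix."""
    cmds = []
    for n, f in product(n_values, f_values):
        # Hypothetical naming scheme so runs do not overwrite each other.
        prefix = f"Asm.hifiasm.phased.n{n}.f{f}"
        cmds.append(hifiasm_hic_cmd(prefix, n, f))
    return cmds


# Illustrative values only; 75000 and 0.15 are the ones reported above.
for cmd in sweep([75000], [0.15]):
    print(shlex.join(cmd))
```

Giving each run its own output prefix keeps the resulting hap1/hap2 files side by side, so their sizes and coverage profiles can be compared across settings.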

@chhylp123
Owner

Thank you so much for testing. We just cut a new release which fixed several bugs and incorporated a new option '--n-weight'. I guess with higher '--n-weight', the results should be slightly better.

@dabitz
Author

dabitz commented Apr 26, 2021

Great! I will give it a try!

@chhylp123
Owner

The current GitHub HEAD is probably better; at least it is faster...

@lh3
Collaborator

lh3 commented Apr 26, 2021

> hap1 (420Mb) and hap2 (400Mb)

Just curious: does your sample have sex chromosomes? Is it XY or XX? What is the estimated size of X and Y? I am trying to understand if hifiasm can separate X and Y.

@dabitz
Author

dabitz commented Apr 26, 2021

No, it does not have sex chromosomes. It seems that the homologs are not 100% balanced: each has some regions that are not present in the other. I think this explains why selfing is deleterious in this case :)
