Request for dataset to test --ont functionality #786

Open
SaimMomin12 opened this issue Feb 26, 2025 · 7 comments

Comments

@SaimMomin12

Hi, I would like to test the new functionality of --ont and subsequently integrate it into Galaxy for our users.

Assemble genomes with ONT R10 reads rather than PacBio HiFi reads using the latest release of hifiasm (>0.21.0-r686)
hifiasm -o HG002.asm --ont -t32 HG002-ont.fq.gz

Could you please point me to a dataset that can be used to test this option?

Thank you!

@chhylp123
Owner

Please try 0.24.0 for now; we will also release a new version soon. The latest version usually gives better results.

You can find three R10 datasets (HG002_R10, HG00733_R10, and HG02723_R10) at: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=publications/Napu_paper_ONT_Coriell_SingleFC_2023/. Data come from https://github.com/nanoporegenomics/napu_wf.

@SaimMomin12
Author

@chhylp123 Thanks for the super prompt reply and for sharing the link!

I looked into the HG002_R10 directory but couldn't find the FASTQ files required for running hifiasm --ont mode. Am I missing something, or are the FASTQ files located elsewhere?

Thanks in advance for your help!

@chhylp123
Owner

The reads are in BAM format. For example, the HG002 reads can be found at /HG002_R10/reads/GM24385_R10_638.bam. You will need samtools fastq to convert them into FASTQ format for hifiasm.
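
The conversion step described above could look roughly like the following sketch. The function name `bam_to_fastq`, the default thread count, and the output file name are assumptions made for illustration; only `samtools fastq` and the BAM file name come from this thread.

```shell
# Minimal sketch: convert an ONT BAM into gzipped FASTQ for hifiasm.
# bam_to_fastq is a hypothetical helper name, not part of samtools or hifiasm.
# Usage: bam_to_fastq <in.bam> <out.fq.gz> [threads]
bam_to_fastq() {
    local bam="$1" out="$2" threads="${3:-4}"
    # With no output option given, samtools fastq writes the reads to stdout,
    # so they can be piped straight into gzip.
    samtools fastq -@ "$threads" "$bam" | gzip > "$out"
}
```

For example, `bam_to_fastq GM24385_R10_638.bam HG002-ont.fq.gz 8` should produce a file that can be passed to the command from the first comment, `hifiasm -o HG002.asm --ont -t32 HG002-ont.fq.gz`.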

@SaimMomin12
Author

Thanks for the clarification. However, the dataset seems quite large in terms of storage.

Would it be possible to provide a smaller toy dataset for testing the --ont functionality? That would help in quickly validating the option and then adding it to Galaxy.

@chhylp123
Owner

You can align the HG002 reads back to GRCh38 and then extract the reads within a small region using samtools. This subset of reads will work for hifiasm (ONT).

@chhylp123
Owner

Sorry, I forgot to mention the coordinates. We often use chr11:10M-20M for testing.
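
One way to build such a subset is sketched below. The thread names only samtools, so the choice of minimap2 with the `map-ont` preset as the aligner is an assumption, as are the function name `extract_region_reads`, the intermediate BAM name, and the file names in the usage example. The region string assumes the reference names the chromosome `chr11`.

```shell
# Minimal sketch: align ONT reads to a reference and keep only the reads
# that map inside one region. extract_region_reads is a hypothetical helper.
# Assumption: minimap2 is the aligner; the thread does not name one.
# Usage: extract_region_reads <ref.fa> <reads.fq.gz> <region> <out.fq.gz> [threads]
extract_region_reads() {
    local ref="$1" reads="$2" region="$3" out="$4" threads="${5:-8}"
    local bam="${out%.fq.gz}.sorted.bam"

    # Align and coordinate-sort; sorting is required before indexing.
    minimap2 -ax map-ont -t "$threads" "$ref" "$reads" \
        | samtools sort -@ "$threads" -o "$bam" -
    samtools index "$bam"

    # Keep primary alignments in the region (-F 0x900 drops secondary and
    # supplementary records) and convert them back to gzipped FASTQ.
    samtools view -b -F 0x900 "$bam" "$region" \
        | samtools fastq - \
        | gzip > "$out"
}
```

With the coordinates given above written out in full, a call might be `extract_region_reads GRCh38.fa HG002-ont.fq.gz chr11:10000000-20000000 HG002.chr11.fq.gz`, and the resulting file can then be assembled with `hifiasm --ont` as in the first comment.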

@SaimMomin12
Author

Thanks for the suggestion, I will try it out!
