Question regarding subsampled reads and uneven coverage #349

Not sure if this is the right place to ask, but here goes!
I am trying to create a small read set by subsampling real sequencing data (Klebsiella pneumoniae genomes). I want to use it to test a pipeline I have built that includes Unicycler, among other tools. The goal is to check that the pipeline runs end to end, not to tune parameters, so the resulting assembly doesn't have to "make sense". I have both Illumina and Nanopore reads, and I have been using Rasusa to subsample them.

I have tried different coverage and genome size settings, but I keep running into uneven coverage errors in the SPAdes assembly step within Unicycler. It worked when I set the coverage to 30x and the genome size to 1 Mbp in Rasusa, but then the assembly takes too long (this data will also be used in GitHub Actions, so it needs to run relatively fast). If I subsample with lower values, the uneven coverage error comes back.
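For context on the settings above: a coverage-based subsampler needs a genome size because the target is really a number of bases, roughly coverage × genome size, and reads are picked at random until that total is reached. The sketch below is hypothetical Python illustrating that idea; it is not Rasusa's code, and the function names are made up for illustration. It also assumes a K. pneumoniae genome of about 5.5 Mbp (an approximate figure) to show that declaring 1 Mbp at 30x gives only around 5.5x actual depth over the real genome.

```python
import random
from typing import List, Sequence


def target_bases(coverage: float, genome_size: int) -> int:
    """Total bases needed to reach `coverage` over a genome of `genome_size` bp."""
    return int(coverage * genome_size)


def effective_coverage(coverage: float, declared_size: int, true_size: int) -> float:
    """Depth actually obtained when the declared genome size differs from the true one."""
    return target_bases(coverage, declared_size) / true_size


def subsample_to_coverage(read_lengths: Sequence[int], coverage: float,
                          genome_size: int, seed: int = 42) -> List[int]:
    """Pick random read indices until their summed length reaches the target.

    Hypothetical helper for illustration only. Returns every read if the
    input does not contain enough bases to reach the target.
    """
    target = target_bases(coverage, genome_size)
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # seeded, so the subset is reproducible
    chosen: List[int] = []
    total = 0
    for i in order:
        if total >= target:
            break
        chosen.append(i)
        total += read_lengths[i]
    return chosen


if __name__ == "__main__":
    # Assumption: ~5.5 Mbp for a K. pneumoniae genome (approximate).
    true_size = 5_500_000
    # 30x declared over 1 Mbp = 30 Mbp of reads spread over ~5.5 Mbp.
    print(f"{effective_coverage(30, 1_000_000, true_size):.1f}x")  # prints 5.5x
```

Since the random selection is seeded, the same subset comes out on every run, which can help keep CI results comparable between runs.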

Do you have any suggestions on how to solve this? I have also looked into tools for generating synthetic reads, but none of them have worked so far. Any help is greatly appreciated!
