Not sure if this is appropriate to ask for here, but here goes!
I am trying to create a small sample read set from real sequence data (Klebsiella pneumoniae genomes) to test a pipeline I have made that contains Unicycler, among other tools. I want these read sets to test the pipeline's ability to run, not to serve as a parameter-testing dataset, so the resulting assembly doesn't have to "make sense". I have both Illumina and Nanopore reads, and I have been using Rasusa to generate subsets of these. I have tried different coverage and genome size values, but I keep encountering uneven coverage errors in the SPAdes assembly step within Unicycler. I managed to make it work when I set the coverage to 30x and the genome size to 1 Mbp in Rasusa, but then the assembly takes too long (this data will also be used by GitHub Actions, so it needs to be relatively fast). If I subset the reads with lower values, the uneven coverage error pops up.
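For reference, here is a rough sketch of what those Rasusa settings amount to, as far as I understand it: the subsampler keeps roughly coverage × genome size bases. The function names are just for illustration, and the 5.5 Mbp figure is my assumption for a typical K. pneumoniae genome, not something measured from my data.

```python
def target_bases(coverage: float, genome_size: int) -> int:
    """Approximate number of bases kept when subsampling to a given coverage."""
    return int(coverage * genome_size)


def effective_coverage(coverage: float, declared_genome_size: int,
                       actual_genome_size: int) -> float:
    """Depth over the real genome when a smaller genome size is declared."""
    return target_bases(coverage, declared_genome_size) / actual_genome_size


# Assumption: a K. pneumoniae genome of about 5.5 Mbp (not measured here).
ASSUMED_ACTUAL_GENOME_SIZE = 5_500_000

kept = target_bases(30, 1_000_000)
depth = effective_coverage(30, 1_000_000, ASSUMED_ACTUAL_GENOME_SIZE)
print(f"{kept:,} bases kept, roughly {depth:.1f}x over the assumed genome")
```

So the setting that works keeps about 30 Mbp of reads, which would be well under 30x over the whole genome if the size assumption holds.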
I was wondering if you had any suggestions on how to solve this? I have been looking into different solutions for generating synthetic reads, but none of these have worked so far. Any help with this is greatly appreciated!