GitHub - downingtim/LSDV_hybrid_assembly: We used a combination of long ONT reads and short Illumina reads to create a new reference genome sequence for the Oman 2009 sample.

Construction & analysis of the LSDV Oman 2009 hybrid genome assembly.

We used a combination of long ONT reads and short Illumina reads to create a new reference genome sequence for the Oman 2009 sample. The optimised genome produced de novo was a single contig spanning 151,091 bp with a GC content of 25.9%. This came from Flye v2.9.1-b1784 (Kolmogorov et al 2019) output followed by polishing with Medaka v2.0.1 (https://github.com/nanoporetech/medaka) that made 15 changes, Pilon v1.24 (Walker et al 2014) that fixed two SNPs, and correction with Polypolish v0.6.0 (Bouras et al 2024) that fixed nine indel and two substitution errors. In total, the Illumina data only affected 28 bp (0.019%) of the 151,091 bp genome. Alternative approaches produced diffuse assemblies with large numbers of contigs and inaccurate total assembly length.

--Technical details--

Sequencing data:

The Illumina FASTQ files are in FASTQ_ILLUMINA The ONT FASTQ file links and merged file are in FASTQ_ONT

QC:

The script "1_runqc.illumina.sh" was used to run Fastqc, Fastp, and fastq_quality_trimmer on Illumina data. The script "1_runqc.ont.sh" was used to run Fastqc, Fastp, and nanoq on ONT data.

Viral read extraction:

The scripts "3_kraken.illumina.sh" and "4_kraken.ont.sh" were used to get the Kraken2 taxonomic assignment and report files. The scripts "5_kraken.illumina.extract.sh" and "6_kraken.ont.extract.sh" were used to extract the viral reads from the above Kraken2 analyses.

Read counts:

The script "7_get_read_stats.py" was used to measure read number and rates.

Assembly & optimisation:

Flye v2.9.1-b1784 using a subset of the longest reads for initial disjointig assembly, 10 polishing iterations, an expected genome size of 151 Kb, and high-quality ONT reads. See "8_flye.sh". SPAdes v3.13.1 was run in careful mode using a range of input: the ONT reads, the Illumina reads, and the Flye assembly. See "9_spades.sh". Medaka was run with comamnds in 10_medaka.sh. Polypolish and pypolca were run with commands in 11_polypolish_pypolca.sh. Quast was used to assess assembly quality with 12_quast.sh. Pilon: The script "13_run_pilon.sh" implemented the Pilon iterations. GC cotent plot: The script "14_gc.R" was used to visualise the GC content in R (Figure S1).

Read mapping to the reference & visualisation:

The scripts "15_run.sh" and "16_run_ont.sh" were used to map Illumin and ONT reads (respectively) to the Oman FASTA reference. Variations on these were used for mapping to the alternative FASTA versions. The script "17_cov.R" was used to visualise read depth and coverage levels. The script "18_plot_cov_combined.R" was used to make Figure 1.

Notes: These scripts are GAIT-assisted.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
10_medaka.sh		10_medaka.sh
11_polypolish_pypolca.sh		11_polypolish_pypolca.sh
12_quast.sh		12_quast.sh
13_run_pilon.sh		13_run_pilon.sh
14_gc.R		14_gc.R
15_run.sh		15_run.sh
16_run_ont.sh		16_run_ont.sh
17_cov.R		17_cov.R
18_plot_cov_genome.R		18_plot_cov_genome.R
1_runqc.illumina.sh		1_runqc.illumina.sh
2_runqc.ont.sh		2_runqc.ont.sh
3_kraken.illumina.sh		3_kraken.illumina.sh
4_kraken.ont.sh		4_kraken.ont.sh
5_kraken.illumina.extract.sh		5_kraken.illumina.extract.sh
6_kraken.ont.extract.sh		6_kraken.ont.extract.sh
7_get_read_stats.py		7_get_read_stats.py
8_flye.sh		8_flye.sh
9_spades.sh		9_spades.sh
Figure_1_depth.pdf		Figure_1_depth.pdf
Figure_S1_GC.pdf		Figure_S1_GC.pdf
Oman.fasta		Oman.fasta
Oman.fasta.fai		Oman.fasta.fai
Oman.gb		Oman.gb
Oman.gff		Oman.gff
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages