Analogous to hyattpd/Prodigal#115, here's the pyrodigal version of the report.
In doing an analysis of gene presence/absence, I found some false positives that I traced to Prodigal not consistently calling an ORF, even though the sequence was present and identical in the other strains. I'll show an example with two genomes here: Mycobacterium tuberculosis isolates N0157 and N1216.
Pyrodigal Output
I ran pyrodigal 3.6.3 (build py312h0fa9677_1) from bioconda as follows:
to match as closely as possible the way Prokka invokes it, although I suppose the defaults should be fine since I'm using complete assemblies with no gaps. By the way, Prokka has Prodigal write its output in the simple coordinate output format (-f sco), which pyrodigal doesn't appear to support. That rules out simply symlinking pyrodigal as prodigal and using it as a drop-in replacement for those of us who haven't yet migrated to Bakta.
N1216
Pyrodigal calls three consecutive ORFs: the first is Rv0348, the second is a new prediction, and the third is Rv0349.
N0157
Pyrodigal does not call any ORF between Rv0348 and Rv0349.
Sequence Check
The sequence of the 204 bp middle ORF called in N1216 is identical to the corresponding sequence in N0157, which also lies between Rv0348 and Rv0349:
$ blastn -query N1216.fasta -query_loc 421940-422143 -subject N0157.fasta -outfmt 7
# BLASTN 2.15.0+
# Query: 1 4393757
# Database: User specified sequence set (Input: /grp/valafar/data/genomes/N0157.fasta)
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
1 1 100.000 204 0 0 421940 422143 422059 422262 1.47e-105 377
# BLAST processed 1 queries
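For a presence/absence table, the check above amounts to projecting the N1216 ORF onto N0157 through the BLAST hit and asking whether N0157 has any call covering that window. Below is a minimal sketch of that logic. `map_interval`, `has_matching_call`, the `Gene` stand-in and the 90% overlap threshold are all hypothetical choices for illustration, not part of pyrodigal. The sketch assumes an ungapped plus/plus hit (0 gap opens, s. start < s. end), as in the output above. pyrodigal's own gene objects expose `begin` and `end` attributes, so they should also fit here, but the stand-in keeps the sketch runnable without pyrodigal.

```python
from typing import NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    """Hypothetical stand-in for a predicted gene (1-based, inclusive coordinates)."""
    begin: int
    end: int


def map_interval(begin: int, end: int,
                 hit_q: Tuple[int, int], hit_s: Tuple[int, int]) -> Tuple[int, int]:
    """Project [begin, end] from query to subject coordinates through one BLAST HSP.

    Assumes an ungapped plus/plus HSP, like the single hit reported above.
    """
    q_start, q_end = hit_q
    s_start, s_end = hit_s
    if s_end < s_start or (q_end - q_start) != (s_end - s_start):
        raise ValueError("only ungapped plus/plus HSPs are handled")
    if not q_start <= begin <= end <= q_end:
        raise ValueError("interval is not inside the HSP")
    offset = s_start - q_start
    return begin + offset, end + offset


def has_matching_call(genes: Sequence[Gene], begin: int, end: int,
                      min_overlap: float = 0.9) -> bool:
    """True if any gene covers at least `min_overlap` of [begin, end]."""
    target_len = end - begin + 1
    for gene in genes:
        # Negative or zero when the gene does not touch the window.
        overlap = min(gene.end, end) - max(gene.begin, begin) + 1
        if overlap >= min_overlap * target_len:
            return True
    return False


# Middle ORF in N1216 and the HSP coordinates, taken from the BLAST output above.
orf_n1216 = (421940, 422143)
region_n0157 = map_interval(*orf_n1216, hit_q=(421940, 422143), hit_s=(422059, 422262))
print(region_n0157)  # (422059, 422262)

# Hypothetical call lists restricted to the Rv0348-Rv0349 gap.
calls_n1216 = [Gene(*orf_n1216)]
calls_n0157 = []  # no ORF called between Rv0348 and Rv0349, as reported above
print(has_matching_call(calls_n1216, *orf_n1216))     # True
print(has_matching_call(calls_n0157, *region_n0157))  # False
```

The overlap threshold is an arbitrary choice; requiring an exact coordinate match would be stricter but would also flag calls that differ only in their chosen start codon.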
Epilogue and Data
I haven't been able to test pyrodigal on all my data yet, but if it behaves as Prodigal did, the ORF in question will be called in 49 genomes and not called in 72. I'm including the genome sequences for the two strains examined above.
data.tar.gz
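On the `-f sco` point from the Pyrodigal Output section: until pyrodigal supports it, one workaround might be a small wrapper that writes SCO-style text from pyrodigal's Python API. Below is a minimal sketch under stated assumptions. `to_sco` and the `Gene` stand-in are hypothetical, and the layout (one `# Sequence Data:` line per record, then one `>index_begin_end_strand` line per gene, numbered per sequence) is my understanding of Prodigal's SCO output, not a verified spec. It omits Prodigal's `# Model Data:` line and assumes a consumer such as Prokka only reads the header and gene lines. Compare against a real `prodigal -f sco` file before relying on it.

```python
from typing import Iterable, NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    """Hypothetical stand-in for a pyrodigal gene: 1-based inclusive coords, strand +1/-1."""
    begin: int
    end: int
    strand: int


def to_sco(records: Iterable[Tuple[str, int, Sequence[Gene]]]) -> str:
    """Render gene calls as Prodigal-style simple coordinate output (SCO).

    Each record is (header, sequence length, genes). Only the
    '# Sequence Data' line and the '>index_begin_end_strand' gene lines
    are written; Prodigal's '# Model Data' line is omitted.
    """
    lines = []
    for seqnum, (seqhdr, seqlen, genes) in enumerate(records, start=1):
        lines.append(f'# Sequence Data: seqnum={seqnum};seqlen={seqlen};seqhdr="{seqhdr}"')
        # Gene numbering restarts for each sequence.
        for index, gene in enumerate(genes, start=1):
            sign = "+" if gene.strand > 0 else "-"
            lines.append(f">{index}_{gene.begin}_{gene.end}_{sign}")
    return "\n".join(lines) + "\n"


# Made-up coordinates, for illustration only. With pyrodigal installed, the
# genes would instead come from something like:
#     finder = pyrodigal.GeneFinder()
#     finder.train(sequence)
#     genes = finder.find_genes(sequence)
demo = [("contig_1", 1000, [Gene(100, 399, 1), Gene(500, 802, -1)])]
print(to_sco(demo), end="")
```

Running it prints:

```
# Sequence Data: seqnum=1;seqlen=1000;seqhdr="contig_1"
>1_100_399_+
>2_500_802_-
```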