Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

corresponding ORF not called in different isolates despite identical sequence #68

Open
0xaf1f opened this issue Dec 28, 2024 · 0 comments
Labels
external Issue comes from a dependency or some external code.

Comments

@0xaf1f
Copy link

0xaf1f commented Dec 28, 2024

Analogous to hyattpd/Prodigal#115, here's the pyrodigal version of the report.

In doing an analysis of gene presence/absence, I had some false positives that I tracked down to Prodigal not consistently calling the ORF, even though the sequence was present and identical in the other strains. I'll show an example of two genomes here: Mycobacterium tuberculosis isolates N0157 and N1216.

Pyrodigal Output

I ran pyrodigal 3.6.3 (build py312h0fa9677_1) from bioconda as follows:

pyrodigal -i $genome.fasta -c -m -g 11 -p single -f gff

to match as much as possible the way that Prokka invokes it, although I suppose defaults should be fine since I'm using complete assemblies with no gaps. By the way, Prokka has Prodigal output in the simple coordinate output format (-f sco), which it appears pyrodigal doesn't support. That would preclude being able to just symlink pyrodigal as prodigal to use it there as a drop-in replacement for those of us that haven't yet migrated to Bakta.

N1216

Pyrodigal calls three consecutive ORFs: The first is Rv0348, the second is a new prediction, and the third is Rv0349.

1       pyrodigal_v3.6.3        CDS     421332  421985  70.5    +       0       ID=1_366;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.610;conf=100.00;score=70.47;cscore=48.90;sscore=21.57;rscore=14.24;uscore=1.95;tscore=4.12;
1       pyrodigal_v3.6.3        CDS     421940  422143  6.9     -       0       ID=1_367;partial=00;start_type=GTG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.627;conf=83.03;score=6.91;cscore=-0.31;sscore=7.22;rscore=9.20;uscore=0.73;tscore=-2.21;
1       pyrodigal_v3.6.3        CDS     422231  422647  11.2    +       0       ID=1_368;partial=00;start_type=ATG;rbs_motif=3Base/5BMM;rbs_spacer=13-15bp;gc_cont=0.624;conf=92.92;score=11.20;cscore=8.74;sscore=2.46;rscore=-2.24;uscore=0.57;tscore=4.12

N0157

Pyrodigal does not call any ORF between Rv0348 and Rv0349.

1       pyrodigal_v3.6.3        CDS     421451  422104  70.8    +       0       ID=1_370;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.609;conf=100.00;score=70.84;cscore=49.41;sscore=21.43;rscore=14.22;uscore=1.85;tscore=4.11;
1       pyrodigal_v3.6.3        CDS     422107  422766  21.5    +       0       ID=1_371;partial=00;start_type=GTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.629;conf=99.30;score=21.55;cscore=23.33;sscore=-1.78;rscore=-3.59;uscore=2.32;tscore=-1.76;

Sequence Check

The sequence of the 204bp middle ORF that was called in N1216 is identical to a corresponding sequence in N0157 that is also between Rv0348-9:

$ blastn -query N1216.fasta -query_loc 421940-422143 -subject N0157.fasta -outfmt 7
# BLASTN 2.15.0+
# Query: 1 4393757
# Database: User specified sequence set (Input: /grp/valafar/data/genomes/N0157.fasta)
# Fields: query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 1 hits found
1       1       100.000 204     0       0       421940  422143  422059  422262  1.47e-105       377
# BLAST processed 1 queries

Epilogue and Data

I haven't been able to test pyrodigal on all my data, but if the pattern continues as it did with prodigal, the ORF in question here is called in 49 genomes and not called in 72 genomes. I'm including the genome sequences for the two strains I examined above.

data.tar.gz

@althonos althonos added the external Issue comes from a dependency or some external code. label Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external Issue comes from a dependency or some external code.
Projects
None yet
Development

No branches or pull requests

2 participants