Skip to content

Commit

Permalink
Document output formats in Sphinx documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
althonos committed Oct 22, 2022
1 parent cbfbc64 commit 751c404
Show file tree
Hide file tree
Showing 2 changed files with 106 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,7 @@ Library
:maxdepth: 2

Installation <install>
Output Formats <outputs>
Contributing <contributing>
Publications <publications>
Benchmarks <benchmarks>
Expand Down
105 changes: 105 additions & 0 deletions docs/outputs.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
Output Formats
==============

.. currentmodule:: pyrodigal

Pyrodigal was created among other things to skip parsing the results
from Prodigal while building a larger pipeline. Therefore, it's advised
to manipulate the genes predicted by Pyrodigal through the object layer
instead of writing them to a file only to parse it later.

Nevertheless, Pyrodigal provides convenience methods to write the genes
from a `~pyrodigal.Genes` collection to a variety of formats, as found in the
Prodigal command line. Note that all these functions need to know the
name of the sequence they were obtained from.

.. note::

The methods shown before are implemented using only the Python
interface, so it should be possible to replicate them using the
attributes of the individual `Gene` objects.


Genes
-----

Use `Genes.write_genes` to write the nucleotide sequences of all the genes
found by Pyrodigal to a file or a file-like object:

.. code:: python
genes = orf_finder.find_genes(sequence)
with open("genes.fna", "w") as dst:
genes.write_genes(dst, sequence_id="seqXYZ")
It's also possible to change the width of the FASTA columns,
if needed (use `math.inf` as the width to write 2-line FASTA):

.. code:: python
genes = orf_finder.find_genes(sequence)
with open("genes.fna", "w") as dst:
genes.write_genes(dst, sequence_id="seqXYZ", width=80)
Translations
------------

Use `Genes.write_translations` to write the protein sequences of all the
genes found by Pyrodigal to a file or file-object:

.. code:: python
genes = orf_finder.find_genes(sequence)
with open("proteins.faa", "w") as dst:
genes.write_translations(dst, sequence_id="seqXYZ")
It's also possible to change the translation table and to control the
width of the FASTA columns, if needed:

.. code:: python
genes = orf_finder.find_genes(sequence)
with open("proteins.faa", "w") as dst:
genes.write_translations(dst, sequence_id="seqXYZ", width=80, translation_table=11)
GFF
---

Use `Genes.write_gff` to write the genes to a file or file-like object in GFF
format:

.. code:: python
genes = orf_finder.find_genes(sequence)
with open("genes.gff", "w") as dst:
genes.write_gff(dst, sequence_id="seqXYZ")
Use ``header=False`` to skip writing the GFF3 header. This can be useful when
writing genes from different sequences to the same result file:

.. code:: python
with open("genes.gff", "w") as dst:
for i, record in enumerate(Bio.SeqIO.parse("contig.fna")):
genes = orf_finder.find_genes(str(record.seq))
genes.write_gff(dst, sequence_id=record.id, header=(i == 0))
Scores
------

Use `Genes.write_scores` to write the scores of all potential genes in a format
proprietary to Prodigal:

.. code:: python
genes = orf_finder.find_genes(sequence)
with open("scores.txt", "w") as dst:
genes.write_scores(dst, sequence_id="seqXYZ")

0 comments on commit 751c404

Please sign in to comment.