Documentation / software bug for building custom database #11

@taltman

Hi LMAT team!

I have been trying to follow the documentation in lmat-doc.txt to build a custom database for use with LMAT, but I have run into problems. I'll document one specific case here.

One step in the process of building a custom database is constructing a mapping file between NCBI Taxonomy Database identifiers and the full deflines from the multi-FASTA formatted file containing the reference sequences. I've followed the documentation below in that regard:

 The mapping is specified as a tab delimited file with the first column containing the tax id and the second
 column should contain the header associated with sequence stored in the input fasta file (WORK/test.fa below)
 For example:
 418127   >ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|Staphylococcus aureus subsp. aureus Mu3, complete genome
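For reference, here is a minimal sketch of a check for the layout the documentation describes: two tab-separated columns, a numeric tax id followed by the full FASTA header. The function name `check_mapping_line` is hypothetical and not part of LMAT, and the sketch assumes the whitespace in the example above is a single tab and that headers keep their leading `>`.

```python
def check_mapping_line(line):
    """Return (tax_id, header) if the line matches the documented two-column layout.

    Assumption: columns are separated by a single tab, as lmat-doc.txt states.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 2:
        raise ValueError(f"expected 2 tab-separated columns, found {len(fields)}")
    tax_id, header = fields
    if not tax_id.strip().isdigit():
        raise ValueError(f"tax id is not numeric: {tax_id!r}")
    if not header.startswith(">"):
        raise ValueError(f"header does not start with '>': {header!r}")
    return int(tax_id), header


# The example line from the documentation, with a tab between the columns.
example = (
    "418127\t>ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|"
    "Staphylococcus aureus subsp. aureus Mu3, complete genome\n"
)
tax_id, header = check_mapping_line(example)
```

Running a check like this over every line of GenomeToTaxID.txt would at least confirm that the file matches what the documentation asks for before it is handed to build_header_table.py.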

When I provide my constructed GenomeToTaxID.txt file to build_header_table.py, it breaks:

reading: /media/ephemeral/taltman/lmat/GenomeToTaxID.txt
Traceback (most recent call last):
  File "./build_header_table.py", line 44, in <module>
    gi_to_tid[t[4]] = t[0]
IndexError: list index out of range

Poking into the Python script, I found that it seems to expect a file with at least five columns, not two. Changing t[4] to t[1] seems to fix it.
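To illustrate the mismatch, here is a small sketch. It is not the actual code from build_header_table.py (only line 44 is visible in the traceback); it assumes the script splits each line on tabs, and `parse_mapping_line` and `header_col` are hypothetical names used only for this example.

```python
def parse_mapping_line(line, header_col=1):
    """Split one mapping line and return (header, tax_id).

    Assumption: fields are tab-separated. header_col=1 matches the
    documented two-column file; header_col=4 mimics the t[4] lookup.
    """
    t = line.rstrip("\r\n").split("\t")
    if len(t) <= header_col:
        raise ValueError(
            f"line has {len(t)} column(s), but column index {header_col} was requested"
        )
    return t[header_col], t[0]


line = (
    "418127\t>ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|"
    "Staphylococcus aureus subsp. aureus Mu3, complete genome\n"
)

# A two-column line only has indices 0 and 1, so t[4] cannot exist.
columns = line.rstrip("\r\n").split("\t")

# With the header taken from column 1, the lookup table can be built.
gi_to_tid = {}
header, tid = parse_mapping_line(line)
gi_to_tid[header] = tid
```

Under that assumption, a file in the documented format can never satisfy `t[4]`, which is consistent with the IndexError above.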

So, either there is a documentation bug, or there is a software bug.

Any feedback would be greatly appreciated. Thanks!
