Documentation / software bug for building custom database

Hi LMAT team!

I have been following the documentation in lmat-doc.txt to build a custom database for use with LMAT, but I've run into problems. Here is one specific case.

One step in the process of building a custom database is constructing a mapping file between NCBI Taxonomy Database identifiers and the full deflines from the multi-FASTA formatted file containing the reference sequences. I've followed the documentation below in that regard:

```
Individual genome sequences must be mapped to a taxonomy identifier
 The mapping is specified as a tab delimited file with the first column containing the tax id and the second
 column should contain the header associated with sequence stored in the input fasta file (WORK/test.fa below)
 For example:
 418127   >ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|Staphylococcus aureus subsp. aureus Mu3, complete genome
```
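For reference, this is how I read that format: a minimal sketch that checks a single line of the mapping file against the two-column layout described above. The function name `check_mapping_line` is my own and is not part of LMAT. It splits on tabs only, since the defline itself contains spaces.

```python
def check_mapping_line(line):
    """Return (tax_id, header) for one line of the mapping file.

    Assumes the documented layout: tax id, a single tab, then the full
    FASTA header. This is a hypothetical helper, not LMAT code.
    """
    # Split on tabs only: the header contains spaces and must stay whole.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 2:
        raise ValueError("expected 2 tab-separated columns, found %d" % len(fields))
    tax_id, header = fields[0].strip(), fields[1]
    if not tax_id.isdigit():
        raise ValueError("tax id is not numeric: %r" % tax_id)
    return tax_id, header
```

Running this over every line of my GenomeToTaxID.txt is a quick way to confirm the file matches the documented format before passing it to `build_header_table.py`.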

When I provide my constructed GenomeToTaxID.txt file to `build_header_table.py`, it breaks:

```
reading: /media/ephemeral/taltman/lmat/GenomeToTaxID.txt
Traceback (most recent call last):
  File "./build_header_table.py", line 44, in <module>
    gi_to_tid[t[4]] = t[0]
IndexError: list index out of range
```

Poking into the Python script, I found that it seems to expect a file with at least five columns, not the two that the documentation describes. Changing `t[4]` to `t[1]` seems to fix it.
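To illustrate what I think is happening, here is a small sketch of the indexing problem. It is not the actual code from `build_header_table.py`: the function `load_tax_map` and its `header_col` parameter are hypothetical, and I am assuming the script splits each line on tabs. Only the names `t` and `gi_to_tid` come from the traceback.

```python
def load_tax_map(lines, header_col=1):
    """Build a header -> tax id dictionary from mapping-file lines.

    Hypothetical reconstruction: assumes each line is split on tabs and
    that column 0 holds the tax id.
    """
    gi_to_tid = {}
    for line in lines:
        t = line.rstrip("\r\n").split("\t")
        # With a two-column file, t has indices 0 and 1 only, so
        # header_col=4 raises IndexError while header_col=1 works.
        gi_to_tid[t[header_col]] = t[0]
    return gi_to_tid


row = "418127\t>ref|NC_009782.1|gnl|NCBI_GENOMES|21340|gi|156978331|Staphylococcus aureus subsp. aureus Mu3, complete genome\n"

tax_map = load_tax_map([row])  # two-column indexing succeeds

try:
    load_tax_map([row], header_col=4)  # mirrors the t[4] lookup
except IndexError as err:
    print("IndexError:", err)
```

With the documented two-column input, indexing column 4 fails in the same way as the traceback above, while indexing column 1 maps the full header to its tax id.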

So, either there is a documentation bug, or there is a software bug.

Any feedback would be greatly appreciated. Thanks!
