Enrich backbone tree with 16S #15

@Ge94

Hello,

I am trying to reproduce an approach similar to what GreenGenes2 performed through uDance. I believe I am either missing a step or have not understood the paper correctly. Could you kindly guide me through this?

I generated a bacterial phylogenomic tree of MAGs and complete isolates with phylophlan, to which I would like to add new 16S sequences. I therefore generated an MSA of 16S sequences, plus 400 MSAs of the genomes based on the 400 phylophlan markers. I then launched uDance under the "backbone: tree" configuration (not "de-novo") and I am currently stuck at this error:

Traceback (most recent call last):
  File "/hps/nobackup/rdf/metagenomics/service-team/users/germanab/uDance/run_udance.py", line 10, in <module>
    options.func(options)
  File "/hps/nobackup/rdf/metagenomics/service-team/users/germanab/uDance/uDance/decompose.py", line 148, in decompose
    aggregate_placements(index_to_node_map, jp['placements'])
  File "/hps/nobackup/rdf/metagenomics/service-team/users/germanab/uDance/uDance/decompose.py", line 27, in aggregate_placements
    index_to_node_map[index].placements += [seqname]
KeyError: -1

My understanding is that this comes from apples2, as it wasn't able to place the first of my 16S sequences:

Taxon AB025012.1 cannot be placed. At least three non-infinity distances should be observed to place a taxon. Consequently, this taxon is ignored (no output).

I have a couple of concerns about this:

  • apples2 stopped reading query sequences after the first one (which failed), even though its input query.fa contains 460. Shouldn't it ignore a query sequence that cannot be placed and proceed with the following ones?
  • The apples2 error about needing at least three non-infinity distances to place a taxon absolutely makes sense, but how can I compute distances differently if the complete 16S sequences are aligned only to each other (i.e. they of course don't match any of the phylophlan markers)? Should I generate other MSAs with those sequences? Since the 16S MSA contains both the new 16S query sequences and the 16S sequences extracted from the genomes in the backbone tree, I thought the alignment itself would be enough to compute multiple distances.
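
To make the second point concrete, here is a minimal sketch of how one could check, for each 16S query, how many backbone sequences in the 16S alignment it can actually be compared with. It is not part of uDance or apples2. The function names (`read_fasta`, `shared_columns`, `comparable_backbone_counts`) and the toy sequences are hypothetical, and it assumes that a distance is usable only when the two aligned sequences share at least one ungapped column; the real criterion inside apples2 may differ.

```python
from typing import Dict, Iterable, List

# Characters treated as "no data" in an aligned sequence (an assumption).
GAP_CHARS = set("-.?")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {sequence_name: aligned_sequence}."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the first token of the header
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def shared_columns(a: str, b: str) -> int:
    """Count alignment columns where both sequences have a non-gap character."""
    return sum(1 for x, y in zip(a, b) if x not in GAP_CHARS and y not in GAP_CHARS)


def comparable_backbone_counts(
    alignment: Dict[str, str],
    backbone_names: Iterable[str],
    min_shared: int = 1,
) -> Dict[str, int]:
    """For every non-backbone sequence, count the backbone sequences
    it overlaps with in at least `min_shared` ungapped columns."""
    backbone = set(backbone_names)
    counts: Dict[str, int] = {}
    for query, qseq in alignment.items():
        if query in backbone:
            continue
        counts[query] = sum(
            1
            for b in backbone
            if b in alignment and shared_columns(qseq, alignment[b]) >= min_shared
        )
    return counts


# Toy alignment: G1-G3 stand in for 16S copies from backbone genomes,
# Q1 and Q2 for new 16S queries. G4 is in the tree but has no 16S sequence.
toy = """>G1
ACGT--
>G2
--GTAC
>G3
ACGTAC
>Q1
ACG---
>Q2
----AC
"""
aln = read_fasta(toy)
counts = comparable_backbone_counts(aln, ["G1", "G2", "G3", "G4"])
too_few = sorted(q for q, n in counts.items() if n < 3)
print(counts, too_few)
```

In this toy case `Q2` overlaps with only two backbone sequences, so under the stated assumption it would fall below the three-distance threshold. Running the same check on the real 16S MSA with the backbone tree's leaf names would show whether the queries have enough comparable backbone sequences in the first place.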

Thank you in advance for your help. I'd be happy to provide any further details.

Best
Germana
