CompletenessScore filtering keeps genes from excluded HOGs #11

@nromashchenko

This snippet:

filename = "tests/test-data/case_filtering.orthoxml"
otree = OrthoXMLTree.from_file(filename,
                               score_threshold=0.95, 
                               score_id="CompletenessScore", 
                               high_child_as_rhogs=True)
otree.to_orthoxml("out.orthoxml")

With the test orthoXML file, this produces valid groups, but the species section still keeps the genes that were excluded:

<?xml version='1.0' encoding='utf-8'?>
<orthoXML xmlns="http://orthoXML.org/2011/" version="0.5" origin="orthoXML.org" originVersion="1.0">

...
  <species name="Chelonia mydas" NCBITaxId="0">
    <database name="someDB" version="42">
      <genes>
        <gene id="cm1" protId="C00001"/>
        <gene id="cm2" protId="C00002"/>
      </genes>
    </database>
  </species>

  <species name="Arabidopsis thaliana" NCBITaxId="0">
    <database name="someDB" version="42">
      <genes>
        <gene id="at1" protId="A00001"/>
        <gene id="at2" protId="A00002"/>
      </genes>
    </database>
  </species>

...
  <groups>
    <orthologGroup id="HOG_Cryptodira">
      <score id="CompletenessScore" value="1.0"/>
      <geneRef id="cm1"/>
      <geneRef id="cm2"/>
    </orthologGroup>
  </groups>

</orthoXML>

For example, the Arabidopsis genes are no longer referenced by any group, but they are still declared in the species section (the header).
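
The symptom can be checked independently of the library. Below is a minimal sketch that uses only the standard library's `xml.etree.ElementTree` to list gene ids declared under `<species>` that no `<geneRef>` under `<groups>` points to. The helper name `unreferenced_gene_ids` and the `SAMPLE` string (a condensed copy of the output above, with the elided parts left out) are illustrative assumptions and not part of the project's API.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the output shown above.
NS = {"ox": "http://orthoXML.org/2011/"}


def unreferenced_gene_ids(xml_text):
    """Return ids of <gene> elements that no <geneRef> in <groups> refers to.

    Hypothetical helper for illustration; not part of the library.
    """
    root = ET.fromstring(xml_text)
    declared = {g.get("id") for g in root.iterfind(".//ox:species//ox:gene", NS)}
    referenced = {r.get("id") for r in root.iterfind(".//ox:groups//ox:geneRef", NS)}
    return declared - referenced


# Condensed copy of the filtered output from this report.
SAMPLE = """<orthoXML xmlns="http://orthoXML.org/2011/" version="0.5" origin="orthoXML.org" originVersion="1.0">
  <species name="Chelonia mydas" NCBITaxId="0">
    <database name="someDB" version="42">
      <genes>
        <gene id="cm1" protId="C00001"/>
        <gene id="cm2" protId="C00002"/>
      </genes>
    </database>
  </species>
  <species name="Arabidopsis thaliana" NCBITaxId="0">
    <database name="someDB" version="42">
      <genes>
        <gene id="at1" protId="A00001"/>
        <gene id="at2" protId="A00002"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="HOG_Cryptodira">
      <score id="CompletenessScore" value="1.0"/>
      <geneRef id="cm1"/>
      <geneRef id="cm2"/>
    </orthologGroup>
  </groups>
</orthoXML>"""

print(sorted(unreferenced_gene_ids(SAMPLE)))  # ['at1', 'at2']
```

A check like this could serve as a regression test for the filtering: after writing `out.orthoxml`, the set of unreferenced gene ids would be expected to be empty if excluded genes were also dropped from the species section.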
