This snippet:
filename = "tests/test-data/case_filtering.orthoxml"
otree = OrthoXMLTree.from_file(filename,
                               score_threshold=0.95,
                               score_id="CompletenessScore",
                               high_child_as_rhogs=True)
otree.to_orthoxml("out.orthoxml")
for the test orthoXML produces valid groups, but the species section still lists the genes that were excluded:
<?xml version='1.0' encoding='utf-8'?>
<orthoXML xmlns="http://orthoXML.org/2011/" version="0.5" origin="orthoXML.org" originVersion="1.0">
  ...
  <species name="Chelonia mydas" NCBITaxId="0">
    <database name="someDB" version="42">
      <genes>
        <gene id="cm1" protId="C00001"/>
        <gene id="cm2" protId="C00002"/>
      </genes>
    </database>
  </species>
  <species name="Arabidopsis thaliana" NCBITaxId="0">
    <database name="someDB" version="42">
      <genes>
        <gene id="at1" protId="A00001"/>
        <gene id="at2" protId="A00002"/>
      </genes>
    </database>
  </species>
  ...
  <groups>
    <orthologGroup id="HOG_Cryptodira">
      <score id="CompletenessScore" value="1.0"/>
      <geneRef id="cm1"/>
      <geneRef id="cm2"/>
    </orthologGroup>
  </groups>
</orthoXML>
For example, the Arabidopsis genes are no longer referenced by any group, yet they are still declared in the header (the species section).
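Until this is handled in the library, a possible workaround is to post-process the written file. The sketch below is not part of OrthoXMLTree; it uses only the standard library, and the function name `prune_unreferenced_genes` is hypothetical. It assumes the orthoXML 2011 namespace shown above and that every gene to keep is referenced by a `geneRef` inside `<groups>`.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the orthoXML root element in the output above.
NS = "http://orthoXML.org/2011/"


def prune_unreferenced_genes(xml_text: str) -> str:
    """Drop <gene> entries that no <geneRef> in <groups> points to.

    Hypothetical helper, not an OrthoXMLTree API.
    """
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text)

    # Collect the ids of all genes that still appear in some group.
    referenced = set()
    groups = root.find(f"{{{NS}}}groups")
    if groups is not None:
        referenced = {ref.get("id") for ref in groups.iter(f"{{{NS}}}geneRef")}

    # Remove every gene declaration whose id is not referenced.
    for genes in list(root.iter(f"{{{NS}}}genes")):
        for gene in list(genes.findall(f"{{{NS}}}gene")):
            if gene.get("id") not in referenced:
                genes.remove(gene)

    return ET.tostring(root, encoding="unicode")
```

Applied to the content of out.orthoxml, this would remove at1 and at2 from the species section. Note that the sketch leaves the now-empty `<species>` element for Arabidopsis thaliana in place; whether such species should also be dropped is a separate decision.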