Version: 1.2.0
Command: orthoxml-tools filter --infile tests/test-data/case_filtering.orthoxml --threshold 0.3 --strategy extract --out debug.orthoxml
This extracts HOG_Mammals_1 and HOG_Mammals_2 as rootHOGs. At the same time, they are still kept inside HOG_Tetrapoda as a paralog group, because Tetrapoda also passes the check.
Original issue:
With --strategy extract, the output file is bigger than the input file (3GB vs 2.3GB). For example, gene id 1053013941 is referenced by two geneRef elements in the output orthoXML:
$ grep 1053013941 FastOMA_HOGs_Feb2026_orthoXMLtools_extract_0.3.orthoxml
<gene id="1053013941" protId="CALSQU_R06387"/>
<geneRef id="1053013941"/>
<geneRef id="1053013941"/>
I ran this:
orthoxml-tools filter --infile FastOMA_HOGs_Feb2026_raw.orthoxml --strategy extract --threshold 0.3 --outfile FastOMA_HOGs_Feb2026_orthoXMLtools_extract_0.3.orthoxml
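To check how widespread the duplication is beyond this one gene, a small script along these lines might help. It is only a sketch: it assumes that a gene should be referenced by at most one geneRef in the output, and the function name `duplicate_gene_refs` is made up for illustration (it is not part of orthoxml-tools). It streams the file with the standard library, so it should cope with multi-GB inputs, although it has not been tried on the files above.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_gene_refs(source):
    """Return {gene_id: count} for ids referenced by more than one <geneRef>.

    `source` is a file path or a binary file-like object.
    Hypothetical helper, not part of orthoxml-tools.
    """
    counts = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        # Drop the XML namespace prefix, e.g. "{http://orthoXML.org/2011/}geneRef"
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "geneRef":
            counts[elem.get("id")] += 1
        # Release attributes and children of finished elements to limit memory use
        elem.clear()
    return {gene_id: n for gene_id, n in counts.items() if n > 1}


# Tiny in-memory example: gene "1" is referenced from two groups.
sample = io.BytesIO(
    b'<orthoXML xmlns="http://orthoXML.org/2011/"><groups>'
    b'<orthologGroup id="a"><geneRef id="1"/><geneRef id="2"/></orthologGroup>'
    b'<orthologGroup id="b"><geneRef id="1"/></orthologGroup>'
    b"</groups></orthoXML>"
)
print(duplicate_gene_refs(sample))  # {'1': 2}
```

Running it with the path of the extract output in place of `sample` would list every gene id that is referenced more than once, which could confirm whether the size increase comes from the extracted rootHOGs also being kept in their parent group.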
Originally posted by @sinamajidian in #42