This repository contains all code and datasets needed to reproduce the results described in the paper "Compositionally constrained sites drive long branch attraction" and to run the CAT-PMSF pipeline introduced there on arbitrary datasets.
Structure of the repository:
- datasets: empirical datasets analyzed in the paper
- datasets/simulation: simulation dataset
- scripts: scripts to run the CAT-PMSF pipeline on a dataset
- step1_iqtree_lg: results of the 1st step of the CAT-PMSF pipeline applied to the empirical datasets
- step1_iqtree_lg/simulation: correct (good) and incorrect (bad) topologies of the simulations
- step2_pb: results of the 2nd step of the CAT-PMSF pipeline applied to the empirical datasets
- step2_pb/simulation: results of the 2nd step of the CAT-PMSF pipeline applied to the simulated trees
- step3_iqtree: results of the 3rd step of the CAT-PMSF pipeline applied to the empirical datasets
- step3_iqtree/simulation: results of the 3rd step of the CAT-PMSF pipeline applied to the simulated trees
- homo: results of the tests for compositional heterogeneity across lineages with Homo.v2.1.
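
After cloning, the layout listed above can be checked with a short script. This is a minimal sketch and not part of the repository: the helper name `missing_dirs` and the constant `EXPECTED_DIRS` are hypothetical, and the directory names are simply copied from the list above.

```python
from pathlib import Path

# Directory names copied from the list in this README (relative to the
# repository root). The constant and function names are illustrative only.
EXPECTED_DIRS = [
    "datasets",
    "datasets/simulation",
    "scripts",
    "step1_iqtree_lg",
    "step1_iqtree_lg/simulation",
    "step2_pb",
    "step2_pb/simulation",
    "step3_iqtree",
    "step3_iqtree/simulation",
    "homo",
]


def missing_dirs(repo_root):
    """Return the expected directories that are absent under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    import sys

    # Use the path given on the command line, or the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_dirs(root)
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected directories are present.")
```

Run it from the repository root, or pass the path to the clone as the first argument.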