|
| 1 | +# Discrete Trait Evolution |
| 2 | + |
| 3 | +With a phylogenetic network structure inferred, we can now estimate how quickly traits |
| 4 | +have evolved over time using a likelihood model. These traits should be discrete |
| 5 | +characteristics of a species such as feather color, diet type, or dna in aligned |
| 6 | +genetic sequences. |
| 7 | + |
| 8 | +## Discrete Trait Data |
| 9 | + |
| 10 | +As with continuous trait evolution, we assume a fixed network, correctly rooted, |
| 11 | +with branch lengths proportional to calendar time. We start with a network, then |
| 12 | +add data about the tips of this network. We allow data of two types. |
| 13 | +1. A vector of species names with a data frame of traits: |
| 14 | + |
| 15 | +```@setup fitDiscrete |
| 16 | +using PhyloNetworks, DataFrames |
| 17 | +mkpath("../assets/figures") |
| 18 | +``` |
| 19 | + |
| 20 | +```@example fitDiscrete |
| 21 | +#read in network |
| 22 | +simple_net = readTopology("(A:3.0,(B:2.0,(C:1.0,D:1.0):1.0):1.0);"); |
| 23 | +
|
| 24 | +#read in trait data |
| 25 | +simple_species = ["C","A","B","D"] |
| 26 | +simple_dat = DataFrame(trait=["hi","lo","lo","hi"]) |
| 27 | +``` |
| 28 | + |
| 29 | +If your species names and trait data are in the same data frame, read in your data |
| 30 | +frame then subset the data like this: |
| 31 | +```@example fitDiscrete |
| 32 | +dat = DataFrame(species=["C","A","B","D"], trait=["hi","lo","lo","hi"]) |
| 33 | +simple_species = dat[:species] |
| 34 | +simple_dat = DataFrame(trait = dat[:trait]) |
| 35 | +``` |
| 36 | + |
| 37 | +2. To use dna data, read in the network structure then start with a fasta |
| 38 | +file. Reading the data from this file using the `readfastatodna` function. This |
| 39 | +creates a data frame of dna data and a vector of dna pattern weights. |
| 40 | + |
| 41 | +```@example fitDiscrete |
| 42 | +#read in network |
| 43 | +dna_net = readTopology("((((((((((((((Ae_caudata_Tr275:1.0,Ae_caudata_Tr276:1.0):1.0,Ae_caudata_Tr139:1.0):1.0)#H1:1.0::0.6,((((((Ae_longissima_Tr241:1.0,Ae_longissima_Tr242:1.0):1.0,Ae_longissima_Tr355:1.0):1.0,(Ae_sharonensis_Tr265:1.0,Ae_sharonensis_Tr264:1.0):1.0):1.0,((Ae_bicornis_Tr408:1.0,Ae_bicornis_Tr407:1.0):1.0,Ae_bicornis_Tr406:1.0):1.0):1.0,((Ae_searsii_Tr164:1.0,Ae_searsii_Tr165:1.0):1.0,Ae_searsii_Tr161:1.0):1.0):1.0)#H2:1.0::0.6):1.0,(((Ae_umbellulata_Tr266:1.0,Ae_umbellulata_Tr257:1.0):1.0,Ae_umbellulata_Tr268:1.0):1.0,#H1:1.0::0.4):1.0):1.0,((Ae_comosa_Tr271:1.0,Ae_comosa_Tr272:1.0):1.0,(((Ae_uniaristata_Tr403:1.0,Ae_uniaristata_Tr357:1.0):1.0,Ae_uniaristata_Tr402:1.0):1.0,Ae_uniaristata_Tr404:1.0):1.0):1.0):1.0,(((Ae_tauschii_Tr352:1.0,Ae_tauschii_Tr351:1.0):1.0,(Ae_tauschii_Tr180:1.0,Ae_tauschii_Tr125:1.0):1.0):1.0,(#H2:1.0::0.4,((((Ae_mutica_Tr237:1.0,Ae_mutica_Tr329:1.0):1.0,Ae_mutica_Tr244:1.0):1.0,Ae_mutica_Tr332:1.0):1.0)#H4:1.0::0.6):1.0):1.0):1.0,(((T_boeoticum_TS8:1.0,(T_boeoticum_TS10:1.0,T_boeoticum_TS3:1.0):1.0):1.0,T_boeoticum_TS4:1.0):1.0,((T_urartu_Tr315:1.0,T_urartu_Tr232:1.0):1.0,(T_urartu_Tr317:1.0,T_urartu_Tr309:1.0):1.0):1.0):1.0):1.0,(((((Ae_speltoides_Tr320:1.0,Ae_speltoides_Tr323:1.0):1.0,Ae_speltoides_Tr223:1.0):1.0,Ae_speltoides_Tr251:1.0):1.0):1.0,#H4:1.0::0.4):1.0):1.0):1.0,Ta_caputMedusae_TB2:1.0):1.0,S_vavilovii_Tr279:1.0):1.0,Er_bonaepartis_TB1:1.0):1.0,H_vulgare_HVens23:1.0);"); |
| 44 | +
|
| 45 | +#read in dna data |
| 46 | +fastafile = joinpath(dirname(pathof(PhyloNetworks)), "..","examples", |
| 47 | +"Ae_bicornis_Tr406_Contig10132.aln") |
| 48 | +dna_dat, dna_weights = readfastatodna(fastafile, true); |
| 49 | +``` |
| 50 | + |
| 51 | +## Choosing a Substitution Model |
| 52 | + |
| 53 | +After reading in your data, choose a model to describe how evolutionary changes |
| 54 | +(or substitutions, in the case of DNA) happened over time. We offer a selection |
| 55 | +of Markov substitution models to describe the evolutionary process. |
| 56 | + |
| 57 | +### Generic Trait Models |
| 58 | + |
| 59 | +These models works well for any type of trait we may want to model. For general |
| 60 | +trait types, use one of these three models: |
| 61 | +`:ERSM` Equal Rates Substitution Model |
| 62 | +`:BTSM` Binary Trait Substitution Model |
| 63 | +`:TBTSM` Two Binary Trait Substituion Model |
| 64 | + |
| 65 | +### DNA-Specific Models |
| 66 | + |
| 67 | +The DNA-specific models are optimized for aligned sequence data. We offer JC69 |
| 68 | +and HKY85 models in both relative and absolute versions. The JC69 model was |
| 69 | +developed by Jukes and Cantor in 1969 and uses one rate for all type of substitutions. |
| 70 | +The HKY85 model was developed in 1985 by Hasegawa, Kishino, & Yano. It treats |
| 71 | +transitions differently from transversions. |
| 72 | +`:JC69` Jukes Cantor 69 Model |
| 73 | +`:HKY85` Hasegawa, Kishino and Yano 1985 |
| 74 | + |
| 75 | +## Running FitDiscrete |
| 76 | + |
| 77 | +To infer evolutionary rates, run the `fitDiscrete` function on the network and data. |
| 78 | +It will calculate the maximum likelihood score of a network given one or more |
| 79 | +discrete trait characters at the tips. Along each edge, evolutionary changes |
| 80 | +are modeled with a continous time Markov model, with parameters estimated by |
| 81 | +maximizing the likelihood. At each hybrid node, the trait is assumed to be |
| 82 | +inherited from the immediate parent (or parents, in the case of a hybrid edge). |
| 83 | +If there is a hybrid edge, the trait is modeled according to the parents' weighted |
| 84 | +average genetic contributions, as measured by inheritance gamma γ. The model |
| 85 | +currently ignores incomplete lineage sorting. |
| 86 | + |
| 87 | +### General Trait Data |
| 88 | + |
| 89 | +```@example fitDiscrete |
| 90 | +s1 = fitDiscrete(simple_net, :ERSM, simple_species, simple_dat; optimizeQ=false, |
| 91 | +optimizeRVAS=false) |
| 92 | +s2 = fitDiscrete(simple_net, :BTSM, simple_species, simple_dat; optimizeQ=false, |
| 93 | +optimizeRVAS=false) |
| 94 | +``` |
| 95 | +In this `fitDiscrete` call, we do not optimize rates or allow for rate variation |
| 96 | +across sites. |
| 97 | + |
| 98 | +If `optimizeQ = true`, the `fitDiscrete` function estimates the evolutionary rate |
| 99 | +or rates. Because we didn't allow for rate variation across sites in these models, |
| 100 | +we do not optimize the way rates may vary across trait types. |
| 101 | + |
| 102 | +```@example fitDiscrete |
| 103 | +s3 = fitDiscrete(simple_net, :ERSM, simple_species, simple_dat; optimizeQ=true, |
| 104 | +optimizeRVAS=true) |
| 105 | +s4 = fitDiscrete(simple_net, :BTSM, simple_species, simple_dat; optimizeQ=true, |
| 106 | +optimizeRVAS=true) |
| 107 | +``` |
| 108 | + |
| 109 | +### DNA Data |
| 110 | + |
| 111 | +For DNA data, use `:JC69` or `:HKY85` models. |
| 112 | +```@example fitDiscrete |
| 113 | +d1 = fitDiscrete(dna_net, :JC69, dna_dat, dna_weights, :RV; optimizeQ=false, |
| 114 | +optimizeRVAS=false) |
| 115 | +d2 = fitDiscrete(dna_net, :HKY85, dna_dat, dna_weights, :RV; optimizeQ=false, |
| 116 | +optimizeRVAS=false) |
| 117 | +``` |
| 118 | +In these `fitDiscrete` models, we do not optimize rates (`optimizeQ=false`), but |
| 119 | +we do allow for rate variation across sites. |
| 120 | + |
| 121 | +### Rate Variation Across Sites |
| 122 | + |
| 123 | +In its default version, `fitDiscrete` does not allow for rate variation across sites. |
| 124 | +To allow for rate variation across sites in your estimate of evolutionary rates |
| 125 | +(or rate variation across trait types, in the case of general trait types), |
| 126 | +include `:RV`. If you include `:RV` and `optimizeRVAS = true`, the model will |
| 127 | +not only allow for rate variation, but it will also optimize how rates vary across |
| 128 | +sites. |
| 129 | + |
| 130 | +We optimize the evolutionary rates and the way rates vary across sites for the |
| 131 | +DNA data here: |
| 132 | +```@example fitDiscrete |
| 133 | +d3 = fitDiscrete(dna_net, :JC69, dna_dat, dna_weights, :RV; optimizeQ=true, |
| 134 | +optimizeRVAS=false) |
| 135 | +d4 = fitDiscrete(dna_net, :HKY85, dna_dat, dna_weights, :RV; optimizeQ=true, |
| 136 | +optimizeRVAS=false) |
| 137 | +``` |
0 commit comments