Ancestral States
This short example describes how to use Treeline to perform ancestral state reconstruction. This process results in a tree with estimated states at each internal node. Maximum likelihood trees infer ancestral states based on likelihood, while the other tree-building methods use parsimony.
Instructions
First, install DECIPHER and load the library in R. After optimizing a tree with reconstruct set to TRUE, the estimated ancestral states can be inspected directly, or MapCharacters can be used to tabulate state changes.
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # load the target sequences from a file
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
> seqs <- readDNAStringSet(fas) # use AA, DNA, or RNA
> seqs
DNAStringSet object of length 317:
      width seq               names
  [1]   819 ATGGCTT...AAGAAAA Rickettsia prowaz...
  [2]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [3]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [4]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [5]   819 ATGGCTA...TGGTAAA Pasteurella multo...
  ...   ... ...
[313]   819 ATGGCAA...TACTAAA Pectobacterium at...
[314]   822 ATGCCTA...CGTCAAG Acinetobacter sp....
[315]   864 ATGGGCA...TCAGTCT Thermosynechococc...
[316]   831 ATGGCAC...GAAGAAG Bradyrhizobium ja...
[317]   840 ATGGGCA...GCGAGGT Gloeobacter viola...
>
> # align coding sequences
> seqs <- AlignTranslation(seqs,
+ type="DNAStringSet") # choose AA or DNA
Determining distance matrix based on shared 5-mers:
|========================================| 100%
Time difference of 0.31 secs
Clustering into groups by similarity:
|========================================| 100%
Time difference of 0.02 secs
Aligning Sequences:
|========================================| 100%
Time difference of 0.46 secs
Iteration 1 of 2:
Determining distance matrix based on alignment:
|========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity:
|========================================| 100%
Time difference of 0.03 secs
Realigning Sequences:
|========================================| 100%
Time difference of 0.21 secs
Iteration 2 of 2:
Determining distance matrix based on alignment:
|========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity:
|========================================| 100%
Time difference of 0.02 secs
Realigning Sequences:
|========================================| 100%
Time difference of 0.03 secs
>
> # optimize the tree
> tree <- Treeline(seqs,
+ method="ME",
+ model="TN93+F", # choose a model
+ reconstruct=TRUE, # ancestral state reconstruction
+ processors=NULL) # use all CPUs
Optimizing up to 400 candidate trees:
Tree #68. length = 19.918 (0.000%), 18 Climbs, 0 Grafts of 3
Finalizing the best tree (#8):
length = 19.918 (0.000%), 0 Climbs
Time difference of 3.65 secs
>
> # examine one ancestral state
> attr(tree, "state") # estimated root state
[1] "GTGAGCTTTTCCCAGAGAAGCTTAGAACAAGGAGGTAAAAAAATGGCAATTAAAAAATTAAAGCCAACTACAAACCCAGGGAGAAGACACAAGACTATTTCCGATTTTGAAGAAATAGAAAAAATCACAAAAACAGACAAAAAATAGGGGAAAAAAAAAAAAAACAAAGTAACACCAGAAAAGTCCCTGCTAGTGCCGATGGTAGCAAAGAAAACAGGAGGACGCAACAGAAATAACGGTAAAATTACCACTCGTCACAAAGGTGGCGGACACAAAAAAAAATACCGAATAATAGATTTCAATAAGAGATACAACCACAAAGACGAAGTTCCTGCAAAAGTAACGGCAATCGAGTACGACCCGAACAGAACTGCAAGGATTGCTCTGCTTCATTACGCTGAAGATGGAGAAAAGAGTTATATTCTCGCTCCCAAAGGTTTGAAAGTGGGCGACACAGTCATAACAGGTGAAAAAGAAGACGCGGAAGCCGGAAAGCCCAAAGCCGAAATCAAACCAGGAAATGCCCTGCCTCTGGAAAACATACCGGTCGGTACCATTATCCACAACATTGAGTTGAATCCTGGAAAGGGTGGACAGATAGCAAGATCTGCCGGAACATATGCCCAGCTTACGGCTAATGACAAAGAAGGAAAATACGCTATGATCAGAATGCCTTCAGGTGAAGTGAGAAAGATACACAACAAGTGCAAGGCCACCATCGGTGAAGTTGGAAACGCAGATCACGAAAACGTAAATCTAGGTAAGGCTGGACGCTCGCGATGGCTAGGTAGCCGACCGCACATCCGTGGTATGGCAATGAACCCGGTTGATCACCCGCTCGGTGGTGGTGAAGGTAGAACGAAATCTGCTAGAGGTCAAAAGCACCCAAAAACTCCTTGGGGACAGCCGACTAAGGGTTACAAGACTAGAAATAATAAGAAACCTTCCAATAAGTTCATCATCAAGAGAAGAAAAAAAAAGAAAAAAAAACAATTGAAACTCCGAAAGCGCGGAGGACGTGAGTCT"
>
> # tabulate state changes at each site
> head(MapCharacters(tree, type="table"), n=100)
A306G T567C C258T A573G A174G  A48T A612T A639G
   29    25    24    22    19    19    19    19
A892T A935G C228T C273T C285T T420C T792C A616G
   19    19    19    19    19    19    19    18
A798G C591T A292G A313G C357T C366T C515T T735C
   18    18    17    17    17    17    17    17
A276G A513G  A57G A582G A637C A684G A694G A858T
   16    16    16    16    16    16    16    16
A961G C267T C697T C840T T327C A183G A243C A270T
   16    16    16    16    16    15    15    15
A326G A370T A399T A456T A471T A603T A738T A974G
   15    15    15    15    15    15    15    15
C297T C327T C351T C396T C405T C609T C657T C708T
   15    15    15    15    15    15    15    15
C720T C723T C960T T522C T768A A174T A186T A250G
   15    15    15    15    15    14    14    14
A289T A321G A442C A454C A457G A462T A470C A501T
   14    14    14    14    14    14    14    14
A556G A597G A667C A717T A759T A774T A780T A942G
   14    14    14    14    14    14    14    14
C360T  C47G C567T C735T C795T C815T C877A G397A
   14    14    14    14    14    14    14    14
T264C T357C A110C A180G A189G A192T A213G A443C
   14    14    13    13    13    13    13    13
A454G A499G A553T A568G
   13    13    13    13
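
Each label in the table has the form original state, alignment position, new state (for example, A306G), and the number beneath it is how many times that change was inferred on the tree. The following is a minimal base R sketch of one way to split such labels into columns for further analysis. It uses a handful of entries copied from the output above, stored as a named integer vector. The helper parse_changes is hypothetical and not part of DECIPHER, and the sketch assumes DNA states and that the tabulated result can be treated as a named vector of counts.

```r
# A few label/count pairs copied from the MapCharacters output above.
# Assumption: in a live session the full table could be used in place of
# this vector if it behaves like a named vector of counts.
changes <- c(A306G = 29L, T567C = 25L, C258T = 24L, A573G = 22L,
             A174G = 19L, A48T = 19L, T768A = 15L, A174T = 14L)

# parse_changes() is a hypothetical helper, not a DECIPHER function.
# It splits labels of the form <from><position><to> into separate columns
# and silently skips any label that does not match that pattern.
parse_changes <- function(x) {
  labels <- names(x)
  m <- regmatches(labels, regexec("^([A-Z])([0-9]+)([A-Z])$", labels))
  ok <- lengths(m) == 4L            # full match plus three capture groups
  parts <- do.call(rbind, m[ok])    # one row per parsed label
  data.frame(from = parts[, 2],
             position = as.integer(parts[, 3]),
             to = parts[, 4],
             count = as.integer(x[ok]),
             stringsAsFactors = FALSE)
}

tab <- parse_changes(changes)

# Flag transitions (purine <-> purine or pyrimidine <-> pyrimidine);
# everything else is a transversion. Assumes DNA letters only.
purines <- c("A", "G")
tab$transition <- (tab$from %in% purines) == (tab$to %in% purines)

# Total inferred changes per alignment position, most variable first.
per_site <- vapply(split(tab$count, tab$position), sum, integer(1))
per_site <- sort(per_site, decreasing = TRUE)

print(tab)
print(per_site)
```

Summing by position combines different changes at the same site, such as A174G and A174T, which can help highlight the sites that changed most often across the tree.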