Ancestral States

This short example describes how to use Treeline to perform ancestral state reconstruction. The result is a tree with an estimated state at each internal node. When the tree is built by maximum likelihood, ancestral states are inferred by likelihood; the other tree-building methods infer them by parsimony.

For an in-depth tutorial on phylogenetics, see the "Growing Phylogenetic Trees with Treeline" vignette, available from the Documentation page.

How do I perform ancestral state reconstruction?

First, install DECIPHER and load the library in R. After optimizing a tree with reconstruct set to TRUE, you can inspect the estimated ancestral states or use MapCharacters to tabulate state changes.
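
The example session on this page prints only the root state. As a rough sketch of how the states at the other internal nodes might be gathered, the base R code below walks a dendrogram and collects a "state" attribute from every internal node. It assumes that Treeline returns a dendrogram and stores each reconstructed state in the same "state" attribute used at the root. The toy tree, the made-up "ACGT" states, and the helper names add_states and collect_states are hypothetical stand-ins, not part of DECIPHER.

```r
# Toy stand-in for a Treeline result: a four-leaf dendrogram from base R.
toy <- as.dendrogram(hclust(dist(c(a = 1, b = 2, c = 4, d = 7))))

# Hypothetical helper: attach a made-up "state" to every internal node
# so the toy tree resembles a tree built with reconstruct=TRUE.
add_states <- function(node) {
  if (is.leaf(node)) return(node)
  for (i in seq_along(node)) node[[i]] <- add_states(node[[i]])
  attr(node, "state") <- "ACGT"
  node
}
toy <- add_states(toy)

# Hypothetical helper: collect the "state" attribute of each internal node,
# visiting a node before its children.
collect_states <- function(node) {
  if (is.leaf(node)) return(character())
  below <- unlist(lapply(seq_along(node),
                         function(i) collect_states(node[[i]])))
  c(attr(node, "state"), below)
}

states <- collect_states(toy)
length(states) # one entry per internal node
```

If the assumption holds, calling collect_states on a real tree would return one sequence per internal node, starting with the root.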

> # load the DECIPHER library in R
> library(DECIPHER)
> 
> # load the target sequences from a file
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
> seqs <- readDNAStringSet(fas) # use AA, DNA, or RNA
> seqs
DNAStringSet object of length 317:
      width seq               names               
  [1]   819 ATGGCTT...AAGAAAA Rickettsia prowaz...
  [2]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [3]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [4]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [5]   819 ATGGCTA...TGGTAAA Pasteurella multo...
  ...   ... ...
[313]   819 ATGGCAA...TACTAAA Pectobacterium at...
[314]   822 ATGCCTA...CGTCAAG Acinetobacter sp....
[315]   864 ATGGGCA...TCAGTCT Thermosynechococc...
[316]   831 ATGGCAC...GAAGAAG Bradyrhizobium ja...
[317]   840 ATGGGCA...GCGAGGT Gloeobacter viola...
> 
> # align coding sequences
> seqs <- AlignTranslation(seqs,
+ type="DNAStringSet") # choose AA or DNA
Determining distance matrix based on shared 5-mers:
  |========================================| 100%

Time difference of 0.31 secs
Clustering into groups by similarity: |========================================| 100%
Time difference of 0.02 secs
Aligning Sequences: |========================================| 100%
Time difference of 0.46 secs
Iteration 1 of 2:
Determining distance matrix based on alignment: |========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity: |========================================| 100%
Time difference of 0.03 secs
Realigning Sequences: |========================================| 100%
Time difference of 0.21 secs
Iteration 2 of 2:
Determining distance matrix based on alignment: |========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity: |========================================| 100%
Time difference of 0.02 secs
Realigning Sequences: |========================================| 100%
Time difference of 0.03 secs
> 
> # optimize the tree
> tree <- Treeline(seqs,
+ method="ME",
+ model="TN93+F", # choose a model
+ reconstruct=TRUE, # ancestral state reconstruction
+ processors=NULL) # use all CPUs
Optimizing up to 400 candidate trees:
Tree #68. length = 19.918 (0.000%), 18 Climbs, 0 Grafts of 3
Finalizing the best tree (#8): length = 19.918 (0.000%), 0 Climbs
Time difference of 3.65 secs
> 
> # examine one ancestral state
> attr(tree, "state") # estimated root state
[1] "GTGAGCTTTTCCCAGAGAAGCTTAGAACAAGGAGGTAAAAAAATGGCAATTAAAAAATTAAAGCCAACTACAAACCCAGGGAGAAGACACAAGACTATTTCCGATTTTGAAGAAATAGAAAAAATCACAAAAACAGACAAAAAATAGGGGAAAAAAAAAAAAAACAAAGTAACACCAGAAAAGTCCCTGCTAGTGCCGATGGTAGCAAAGAAAACAGGAGGACGCAACAGAAATAACGGTAAAATTACCACTCGTCACAAAGGTGGCGGACACAAAAAAAAATACCGAATAATAGATTTCAATAAGAGATACAACCACAAAGACGAAGTTCCTGCAAAAGTAACGGCAATCGAGTACGACCCGAACAGAACTGCAAGGATTGCTCTGCTTCATTACGCTGAAGATGGAGAAAAGAGTTATATTCTCGCTCCCAAAGGTTTGAAAGTGGGCGACACAGTCATAACAGGTGAAAAAGAAGACGCGGAAGCCGGAAAGCCCAAAGCCGAAATCAAACCAGGAAATGCCCTGCCTCTGGAAAACATACCGGTCGGTACCATTATCCACAACATTGAGTTGAATCCTGGAAAGGGTGGACAGATAGCAAGATCTGCCGGAACATATGCCCAGCTTACGGCTAATGACAAAGAAGGAAAATACGCTATGATCAGAATGCCTTCAGGTGAAGTGAGAAAGATACACAACAAGTGCAAGGCCACCATCGGTGAAGTTGGAAACGCAGATCACGAAAACGTAAATCTAGGTAAGGCTGGACGCTCGCGATGGCTAGGTAGCCGACCGCACATCCGTGGTATGGCAATGAACCCGGTTGATCACCCGCTCGGTGGTGGTGAAGGTAGAACGAAATCTGCTAGAGGTCAAAAGCACCCAAAAACTCCTTGGGGACAGCCGACTAAGGGTTACAAGACTAGAAATAATAAGAAACCTTCCAATAAGTTCATCATCAAGAGAAGAAAAAAAAAGAAAAAAAAACAATTGAAACTCCGAAAGCGCGGAGGACGTGAGTCT"
> 
> # tabulate state changes at each site
> head(MapCharacters(tree, type="table"), n=100)
A306G T567C C258T A573G A174G  A48T A612T A639G
   29    25    24    22    19    19    19    19
A892T A935G C228T C273T C285T T420C T792C A616G
   19    19    19    19    19    19    19    18
A798G C591T A292G A313G C357T C366T C515T T735C
   18    18    17    17    17    17    17    17
A276G A513G  A57G A582G A637C A684G A694G A858T
   16    16    16    16    16    16    16    16
A961G C267T C697T C840T T327C A183G A243C A270T
   16    16    16    16    16    15    15    15
A326G A370T A399T A456T A471T A603T A738T A974G
   15    15    15    15    15    15    15    15
C297T C327T C351T C396T C405T C609T C657T C708T
   15    15    15    15    15    15    15    15
C720T C723T C960T T522C T768A A174T A186T A250G
   15    15    15    15    15    14    14    14
A289T A321G A442C A454C A457G A462T A470C A501T
   14    14    14    14    14    14    14    14
A556G A597G A667C A717T A759T A774T A780T A942G
   14    14    14    14    14    14    14    14
C360T  C47G C567T C735T C795T C815T C877A G397A
   14    14    14    14    14    14    14    14
T264C T357C A110C A180G A189G A192T A213G A443C
   14    14    13    13    13    13    13    13
A454G A499G A553T A568G
   13    13    13    13
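
The labels in this table appear to follow the pattern of a starting state, an alignment position, and a resulting state, so "A306G" would be an A to G change at position 306; that reading is an assumption here. Under that assumption, the base R sketch below splits such labels into columns and sums the changes observed at each position. The helper name parse_changes is hypothetical, and the input is a handful of label and count pairs copied from the output above rather than a real MapCharacters result.

```r
# A few label/count pairs copied from the table above.
changes <- c(A306G = 29, T567C = 25, C258T = 24, A174G = 19, A48T = 19,
             A174T = 14)

# Hypothetical helper: split labels such as "A306G" into
# from-state, position, and to-state (assumed label format).
parse_changes <- function(x) {
  labels <- names(x)
  m <- regmatches(labels, regexec("^([^0-9]+)([0-9]+)([^0-9]+)$", labels))
  stopifnot(all(lengths(m) == 4)) # every label must match the pattern
  parts <- do.call(rbind, m)
  data.frame(from = parts[, 2],
             position = as.integer(parts[, 3]),
             to = parts[, 4],
             count = as.integer(x),
             stringsAsFactors = FALSE)
}

df <- parse_changes(changes)

# Total number of tabulated changes at each alignment position.
per_site <- aggregate(count ~ position, data = df, FUN = sum)
per_site[order(-per_site$count), ]
```

With a real tree, the same helper could be applied to the output of MapCharacters(tree, type="table"), since it relies only on the names and counts.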