DECIPHER - Maximum Likelihood

Maximum Likelihood

This short example describes how to use Treeline to optimize maximum likelihood (ML) trees. The ML optimality criterion is grounded in the theory of molecular evolution and popular, although it is relatively slow.

For an in-depth tutorial on phylogenetics, see the "Growing Phylogenetic Trees with Treeline" vignette, available from the Documentation page.

How do I build a maximum likelihood phylogenetic tree?

First it is necessary to install DECIPHER and load the library in R. Next, provide Treeline with a sequence alignment and model of evolution that will be used to optimize the tree.

Hide output

# load the DECIPHER library in R
> library(DECIPHER)
> 
> # load the target sequences from a file
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
> seqs <- readDNAStringSet(fas) # use AA, DNA, or RNA
> seqs
DNAStringSet object of length 317:
      width seq               names               
  [1]   819 ATGGCTT...AAGAAAA Rickettsia prowaz...
  [2]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [3]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [4]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [5]   819 ATGGCTA...TGGTAAA Pasteurella multo...
  ...   ... ...
[313]   819 ATGGCAA...TACTAAA Pectobacterium at...
[314]   822 ATGCCTA...CGTCAAG Acinetobacter sp....
[315]   864 ATGGGCA...TCAGTCT Thermosynechococc...
[316]   831 ATGGCAC...GAAGAAG Bradyrhizobium ja...
[317]   840 ATGGGCA...GCGAGGT Gloeobacter viola...
> 
> # align coding sequences
> seqs <- AlignTranslation(seqs,
+ type="DNAStringSet") # choose AA or DNA
Determining distance matrix based on shared 5-mers:
  |========================================| 100%


Time difference of 0.32 secs


Clustering into groups by similarity:
  |========================================| 100%


Time difference of 0.02 secs


Aligning Sequences:
  |========================================| 100%


Time difference of 0.49 secs


Iteration 1 of 2:


Determining distance matrix based on alignment:
  |========================================| 100%


Time difference of 0.05 secs


Reclustering into groups by similarity:
  |========================================| 100%


Time difference of 0.03 secs


Realigning Sequences:
  |========================================| 100%


Time difference of 0.22 secs


Iteration 2 of 2:


Determining distance matrix based on alignment:
  |========================================| 100%


Time difference of 0.05 secs


Reclustering into groups by similarity:
  |========================================| 100%


Time difference of 0.02 secs


Realigning Sequences:
  |========================================| 100%


Time difference of 0.03 secs


> 
> # optimize the tree
> tree <- Treeline(seqs,
+ method="ML",
+ model=MODELS, # choose a model or test all
+ showPlot=TRUE,
+ processors=NULL) # use all CPUs
Fitting initial tree to models:
JC69  -ln(L) = 71307, AICc = 147914, BIC = 146854
K80   -ln(L) = 70693, AICc = 146721, BIC = 145633
F81   -ln(L) = 71270, AICc = 147947, BIC = 146800
HKY85 -ln(L) = 70650, AICc = 146745, BIC = 145568
T92   -ln(L) = 70682, AICc = 146736, BIC = 145619
TN93  -ln(L) = 70308, AICc = 146096, BIC = 144890
SYM   -ln(L) = 69984, AICc = 145449, BIC = 144242
GTR   -ln(L) = 69699, AICc = 144991, BIC = 143692


The selected model was:  GTR


Optimizing up to 400 candidate trees:
Tree #133. -ln(L) = 69081.540 (0.000%), 13 Climbs, 0 Grafts of 12  


Finalizing the best tree (#90):
-ln(L) = 69081.539 (0.000%), 0 Climbs            


Model parameters:
Frequency(A) = 0.253
Frequency(C) = 0.227
Frequency(G) = 0.369
Frequency(T) = 0.151
Rate A <-> C = 1.760
Rate A <-> G = 2.133
Rate A <-> T = 3.914
Rate C <-> G = 1.107
Rate C <-> T = 5.616
Rate G <-> T = 1.000


Time difference of 307.1 secs


> 
> # optionally, output a Newick file
> WriteDendrogram(tree, file="")
((((('Thermoanaerobacter tengcongensis MB4':0.2144063,('Clostridium acetobutylicum ATCC 824':0.1143762,...