DECIPHER logo

  • Alignment▸
  • Classification▸
  • Homology▸
  • Oligo Design▸
  • Phylogenetics▾
  • Minimum Evolution
  • Maximum Likelihood
  • Maximum Parsimony
  • Ancestral States
  • Tutorials▸
  • Home
  • News
  • Downloads
  • Contact
  • Citations

Maximum Likelihood

This short example describes how to use Treeline to optimize maximum likelihood (ML) trees. The ML optimality criterion is grounded in the theory of molecular evolution and popular, although it is relatively slow.

For an in-depth tutorial on phylogenetics, see the "Growing Phylogenetic Trees with Treeline" vignette, available from the Documentation page.

How do I build a maximum likelihood phylogenetic tree?

First it is necessary to install DECIPHER and load the library in R. Next, provide Treeline with a sequence alignment and model of evolution that will be used to optimize the tree.

Hide output
# load the DECIPHER library in R
> library(DECIPHER)
> 
> # load the target sequences from a file
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
> seqs <- readDNAStringSet(fas) # use AA, DNA, or RNA
> seqs
DNAStringSet object of length 317:
      width seq               names               
  [1]   819 ATGGCTT...AAGAAAA Rickettsia prowaz...
  [2]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [3]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [4]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [5]   819 ATGGCTA...TGGTAAA Pasteurella multo...
  ...   ... ...
[313]   819 ATGGCAA...TACTAAA Pectobacterium at...
[314]   822 ATGCCTA...CGTCAAG Acinetobacter sp....
[315]   864 ATGGGCA...TCAGTCT Thermosynechococc...
[316]   831 ATGGCAC...GAAGAAG Bradyrhizobium ja...
[317]   840 ATGGGCA...GCGAGGT Gloeobacter viola...
> 
> # align coding sequences
> seqs <- AlignTranslation(seqs,
+ type="DNAStringSet") # choose AA or DNA
Determining distance matrix based on shared 5-mers:
  |========================================| 100%

Time difference of 0.32 secs
Clustering into groups by similarity: |========================================| 100%
Time difference of 0.02 secs
Aligning Sequences: |========================================| 100%
Time difference of 0.49 secs
Iteration 1 of 2:
Determining distance matrix based on alignment: |========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity: |========================================| 100%
Time difference of 0.03 secs
Realigning Sequences: |========================================| 100%
Time difference of 0.22 secs
Iteration 2 of 2:
Determining distance matrix based on alignment: |========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity: |========================================| 100%
Time difference of 0.02 secs
Realigning Sequences: |========================================| 100%
Time difference of 0.03 secs
> > # optimize the tree > tree <- Treeline(seqs, + method="ML", + model=MODELS, # choose a model or test all + showPlot=TRUE, + processors=NULL) # use all CPUs Fitting initial tree to models: JC69 -ln(L) = 71307, AICc = 147914, BIC = 146854 K80 -ln(L) = 70693, AICc = 146721, BIC = 145633 F81 -ln(L) = 71270, AICc = 147947, BIC = 146800 HKY85 -ln(L) = 70650, AICc = 146745, BIC = 145568 T92 -ln(L) = 70682, AICc = 146736, BIC = 145619 TN93 -ln(L) = 70308, AICc = 146096, BIC = 144890 SYM -ln(L) = 69984, AICc = 145449, BIC = 144242 GTR -ln(L) = 69699, AICc = 144991, BIC = 143692
The selected model was: GTR
Optimizing up to 400 candidate trees: Tree #133. -ln(L) = 69081.540 (0.000%), 13 Climbs, 0 Grafts of 12
Finalizing the best tree (#90): -ln(L) = 69081.539 (0.000%), 0 Climbs
Model parameters: Frequency(A) = 0.253 Frequency(C) = 0.227 Frequency(G) = 0.369 Frequency(T) = 0.151 Rate A <-> C = 1.760 Rate A <-> G = 2.133 Rate A <-> T = 3.914 Rate C <-> G = 1.107 Rate C <-> T = 5.616 Rate G <-> T = 1.000
Time difference of 307.1 secs
> > # optionally, output a Newick file > WriteDendrogram(tree, file="") ((((('Thermoanaerobacter tengcongensis MB4':0.2144063,('Clostridium acetobutylicum ATCC 824':0.1143762,...