Maximum Parsimony

This short example describes how to use Treeline to optimize maximum parsimony (MP) trees. The MP optimality criterion is fast and easy to interpret, and it relies on a cost matrix that assigns a penalty to each state change.
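To make the role of the cost matrix concrete, the sketch below scores a single alignment column on two hypothetical four-taxon trees using the Sankoff recursion in base R. It illustrates the MP criterion under stated assumptions and is not a description of how Treeline computes scores internally; the `sankoff` function and the nested-list tree encoding are made up for this example. Transitions (A<->G, C<->T) cost 1 and transversions cost 2.

```r
# Cost matrix: transitions cost 1, transversions cost 2 (assumed values)
bases <- c("A", "C", "G", "T")
costMatrix <- 2*(1 - diag(4))
dimnames(costMatrix) <- list(bases, bases)
costMatrix["A", "G"] <- costMatrix["G", "A"] <- 1
costMatrix["C", "T"] <- costMatrix["T", "C"] <- 1

# Hypothetical helper: Sankoff recursion for one alignment column.
# A tip is a single character; an internal node is a list of two children.
# Returns, for each possible state at the node, the minimum cost of the
# subtree below it.
sankoff <- function(node, cost) {
  states <- rownames(cost)
  if (is.character(node)) {
    # tip: the observed state costs 0, all other states are impossible
    return(ifelse(states == node, 0, Inf))
  }
  total <- numeric(length(states))
  for (child in node) {
    below <- sankoff(child, cost)
    # for each parent state (row), pick the cheapest child state (column)
    total <- total + apply(cost, 1, function(row) min(row + below))
  }
  total
}

# Two candidate topologies for tips with states A, G, C, and T
tree1 <- list(list("A", "G"), list("C", "T")) # groups transition pairs
tree2 <- list(list("A", "C"), list("G", "T")) # groups transversion pairs

score1 <- min(sankoff(tree1, costMatrix))
score2 <- min(sankoff(tree2, costMatrix))
print(c(tree1=score1, tree2=score2)) # tree1 = 4, tree2 = 5
```

Under this cost matrix the first tree is preferred because it explains the column with two transitions and one transversion, whereas the second needs two transversions and one transition. With unit costs both trees would score 3. A tree's total MP score is the sum of such per-column scores.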

For an in-depth tutorial on phylogenetics, see the "Growing Phylogenetic Trees with Treeline" vignette, available from the Documentation page.

How do I build a maximum parsimony phylogenetic tree?

First, install DECIPHER and load the library in R. Next, provide Treeline with a sequence alignment and a cost matrix that will be used to optimize the tree.
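If DECIPHER is not yet installed, a minimal sketch of the usual Bioconductor route is shown here. It assumes a working internet connection and a reasonably recent version of R; it is one common way to install the package rather than the only one.

```r
# install Bioconductor's package manager if it is missing
if (!requireNamespace("BiocManager", quietly=TRUE))
  install.packages("BiocManager")

# install DECIPHER and its dependencies from Bioconductor
BiocManager::install("DECIPHER")

# confirm that the package loads and report its version
library(DECIPHER)
packageVersion("DECIPHER")
```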

> # load the DECIPHER library in R
> library(DECIPHER)
> 
> # load the target sequences from a file
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
> seqs <- readDNAStringSet(fas) # use AA, DNA, or RNA
> seqs
DNAStringSet object of length 317:
      width seq               names               
  [1]   819 ATGGCTT...AAGAAAA Rickettsia prowaz...
  [2]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [3]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [4]   822 ATGGGAA...GAAAAAG Porphyromonas gin...
  [5]   819 ATGGCTA...TGGTAAA Pasteurella multo...
  ...   ... ...
[313]   819 ATGGCAA...TACTAAA Pectobacterium at...
[314]   822 ATGCCTA...CGTCAAG Acinetobacter sp....
[315]   864 ATGGGCA...TCAGTCT Thermosynechococc...
[316]   831 ATGGCAC...GAAGAAG Bradyrhizobium ja...
[317]   840 ATGGGCA...GCGAGGT Gloeobacter viola...
> 
> # align coding sequences
> seqs <- AlignTranslation(seqs,
+ type="DNAStringSet") # choose AA or DNA
Determining distance matrix based on shared 5-mers:
  |========================================| 100%

Time difference of 0.33 secs
Clustering into groups by similarity: |========================================| 100%
Time difference of 0.02 secs
Aligning Sequences: |========================================| 100%
Time difference of 0.47 secs
Iteration 1 of 2:
Determining distance matrix based on alignment: |========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity: |========================================| 100%
Time difference of 0.03 secs
Realigning Sequences: |========================================| 100%
Time difference of 0.22 secs
Iteration 2 of 2:
Determining distance matrix based on alignment: |========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity: |========================================| 100%
Time difference of 0.02 secs
Realigning Sequences: |========================================| 100%
Time difference of 0.03 secs
> 
> # construct a cost matrix
> costMatrix <- 2*(1 - diag(4))
> colnames(costMatrix) <- DNA_BASES
> rownames(costMatrix) <- DNA_BASES
> costMatrix["A", "G"] <- 1
> costMatrix["G", "A"] <- 1
> costMatrix["C", "T"] <- 1
> costMatrix["T", "C"] <- 1
> 
> # optimize the tree
> tree <- Treeline(seqs,
+ method="MP",
+ costMatrix=costMatrix, # penalties for state changes
+ showPlot=TRUE,
+ processors=NULL) # use all CPUs
Optimizing up to 400 candidate trees:
Tree #145. score = 16263.000 (0.000%), 8 Climbs, 0 Grafts of 14
Finalizing the best tree (#79): score = 16263.000 (0.000%), 0 Climbs
Time difference of 20.47 secs
> 
> # optionally, output a Newick file
> WriteDendrogram(tree, file="")
((('Chlorobium tepidum TLS':0.1808308,('Geobacter sulfurreducens PCA':0.1699809,...
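Passing file="" prints the Newick string to the console. To keep the tree for later use, it can instead be written to disk and read back. The sketch below assumes DECIPHER is installed and uses a toy dendrogram built with base R in place of the Treeline result so that it runs on its own; the temporary file path is a placeholder.

```r
# a toy dendrogram stands in for the tree returned by Treeline
toy <- as.dendrogram(hclust(dist(c(a=1, b=2, c=4, d=8))))

if (requireNamespace("DECIPHER", quietly=TRUE)) {
  newick <- tempfile(fileext=".nwk") # placeholder output location

  # write the tree in Newick format to the file
  DECIPHER::WriteDendrogram(toy, file=newick)

  # inspect the saved Newick string
  cat(readLines(newick), sep="\n")

  # read the file back into a dendrogram object
  tree2 <- DECIPHER::ReadDendrogram(newick)
  print(labels(tree2))
}
```

With a real analysis, replace `toy` with the `tree` object and `newick` with a permanent file path.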