Minimum Evolution
This short example describes how to use Treeline to optimize balanced minimum evolution (ME) trees. The ME optimality criterion is fast, model-based, statistically consistent, and accurate on empirical benchmarks. These features make it the default tree-building method in Treeline().
Instructions
First, install DECIPHER and load the library in R. Next, convert a sequence alignment into the distance matrix that will be used to optimize the tree.
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # load the target sequences from a file
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
> seqs <- readDNAStringSet(fas) # use AA, DNA, or RNA
> seqs
DNAStringSet object of length 317:
width seq names
[1] 819 ATGGCTT...AAGAAAA Rickettsia prowaz...
[2] 822 ATGGGAA...GAAAAAG Porphyromonas gin...
[3] 822 ATGGGAA...GAAAAAG Porphyromonas gin...
[4] 822 ATGGGAA...GAAAAAG Porphyromonas gin...
[5] 819 ATGGCTA...TGGTAAA Pasteurella multo...
... ... ...
[313] 819 ATGGCAA...TACTAAA Pectobacterium at...
[314] 822 ATGCCTA...CGTCAAG Acinetobacter sp....
[315] 864 ATGGGCA...TCAGTCT Thermosynechococc...
[316] 831 ATGGCAC...GAAGAAG Bradyrhizobium ja...
[317] 840 ATGGGCA...GCGAGGT Gloeobacter viola...
>
> # align coding sequences
> seqs <- AlignTranslation(seqs,
+ type="DNAStringSet") # choose AA or DNA
Determining distance matrix based on shared 5-mers:
|========================================| 100%
Time difference of 0.32 secs
Clustering into groups by similarity:
|========================================| 100%
Time difference of 0.02 secs
Aligning Sequences:
|========================================| 100%
Time difference of 0.46 secs
Iteration 1 of 2:
Determining distance matrix based on alignment:
|========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity:
|========================================| 100%
Time difference of 0.03 secs
Realigning Sequences:
|========================================| 100%
Time difference of 0.21 secs
Iteration 2 of 2:
Determining distance matrix based on alignment:
|========================================| 100%
Time difference of 0.05 secs
Reclustering into groups by similarity:
|========================================| 100%
Time difference of 0.02 secs
Realigning Sequences:
|========================================| 100%
Time difference of 0.03 secs
>
> # construct a distance matrix
> D <- DistanceMatrix(seqs,
+ corr="F81+F", # choose a model
+ type="dist",
+ processors=NULL) # use all CPUs
|========================================| 100%
Time difference of 0.02 secs
>
> # optimize the tree
> tree <- Treeline(myDistMatrix=D,
+ method="ME",
+ showPlot=TRUE,
+ processors=NULL) # use all CPUs
Optimizing up to 400 candidate trees:
Tree #125. length = 19.631 (0.000%), 16 Climbs, 0 Grafts of 1
Finalizing the best tree (#66):
length = 19.631 (0.000%), 0 Climbs
Time difference of 5.57 secs
>
> # optionally, output a Newick file
> WriteDendrogram(tree, file="")
(((((('Treponema denticola ATCC 35405':0.001110335,'Treponema denticola F0402'...;
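
To make the distance-matrix step above more concrete, here is a minimal base-R sketch of what a model-corrected distance matrix contains. It is not DECIPHER's implementation. The toy sequences, the helper `jc69_distance`, and the object `D_toy` are hypothetical names introduced for illustration. The Jukes-Cantor (JC69) correction is swapped in for the "F81+F" model used above only because its formula is short. DistanceMatrix may also treat gaps and ambiguity codes differently than this sketch does.

```r
# Illustrative only: pairwise p-distances with a Jukes-Cantor (JC69) correction.
# The aligned sequences below are made up for this example.
aln <- c(
  seqA = "ATGGCTTACGAT",
  seqB = "ATGGCATACGAT",
  seqC = "ATGACATACGTT",
  seqD = "ATG-CATTCGTT"
)

# split each sequence into characters (rows = sequences, columns = sites)
chars <- do.call(rbind, strsplit(aln, ""))

# hypothetical helper: JC69-corrected distance between two aligned sequences
jc69_distance <- function(x, y) {
  keep <- x != "-" & y != "-"      # assumption: skip sites with a gap in either sequence
  p <- mean(x[keep] != y[keep])    # proportion of differing sites (p-distance)
  -3 / 4 * log(1 - 4 * p / 3)      # JC69 correction; only defined for p < 0.75
}

# fill a symmetric matrix with all pairwise distances
n <- nrow(chars)
D_toy <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    D_toy[i, j] <- D_toy[j, i] <- jc69_distance(chars[i, ], chars[j, ])
  }
}

# convert to a "dist" object, the class requested with type="dist" above
D_toy <- as.dist(D_toy)
print(D_toy)
```

The correction inflates the raw proportion of differences to account for multiple substitutions at the same site, which is why the tutorial passes a substitution model to DistanceMatrix instead of using uncorrected distances. For real data, rely on DistanceMatrix as shown above.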