Align Translation
This short example shows how to use DECIPHER to align a set of protein-coding DNA sequences by aligning their translations, as described in: ES Wright (2015) "DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment." BMC Bioinformatics, doi:10.1186/s12859-015-0749-z.
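Aligning coding DNA through its translation relies on a one-to-one correspondence: each aligned amino acid stands for one codon. A protein alignment can therefore be mapped back onto the nucleotides by replacing every residue with its codon and every gap with three gap characters, which keeps gaps between codons. The base-R sketch below illustrates only that back-mapping step. `codonAlign` is a hypothetical helper, not DECIPHER's implementation, and it assumes the DNA starts in reading frame 1.

```r
# Hypothetical helper, not DECIPHER's code: thread one aligned protein
# sequence back onto its unaligned coding DNA (assumed to start in frame 1).
codonAlign <- function(aaAligned, dna) {
  aa <- strsplit(aaAligned, "")[[1]]   # aligned residues and "-" gaps
  if (nchar(dna) < 3 * sum(aa != "-"))
    stop("dna is too short for the number of residues")
  out <- character(length(aa))
  pos <- 1                             # first unused nucleotide in dna
  for (i in seq_along(aa)) {
    if (aa[i] == "-") {
      out[i] <- "---"                  # a protein gap becomes a codon gap
    } else {
      out[i] <- substr(dna, pos, pos + 2)  # the codon that encodes aa[i]
      pos <- pos + 3
    }
  }
  paste(out, collapse = "")  # nucleotides past the last codon are dropped
}

# Two toy sequences whose translations are aligned as "M-KL" and "MAKL"
codonAlign("M-KL", "ATGAAACTG")     # "ATG---AAACTG"
codonAlign("MAKL", "ATGGCTAAACTG")  # "ATGGCTAAACTG"
```

Both outputs have three times the aligned protein length, so the DNA columns line up codon by codon.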
Instructions
First, install DECIPHER and load the library in R. Next, set the "fas" variable to the path of the FASTA file containing the unaligned sequences (e.g., "~/mySeqs.fas").
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
>
> # load the sequences from the file
> seqs <- readDNAStringSet(fas)
>
> # look at some of the sequences (optional)
> seqs
A DNAStringSet instance of length 317
width seq names
[1] 819 ATGGCTTTA...AAAAGAAAA 1
[2] 822 ATGGGAATA...AGGAAAAAG 2
[3] 822 ATGGGAATA...AGGAAAAAG 3
[4] 822 ATGGGAATA...AGGAAAAAG 4
[5] 819 ATGGCTATC...CGTGGTAAA 5
... ... ...
[313] 819 ATGGCAATT...CGTACTAAA 313
[314] 822 ATGCCTATT...CGCGTCAAG 314
[315] 864 ATGGGCATT...CGTCAGTCT 315
[316] 831 ATGGCACTG...CGGAAGAAG 316
[317] 840 ATGGGCATT...GGGCGAGGT 317
>
> # for help, see the AlignTranslation help page (optional)
> ?AlignTranslation
>
> # perform the alignment via the translations
> # change NA to 1, 2 or 3 if the readingFrame is known
> aligned <- AlignTranslation(seqs,
+ readingFrame=NA,
+ type="AAStringSet") # or "DNAStringSet" to return aligned DNA
Determining distance matrix based on shared 4-mers:
|============================================| 100%
Time difference of 1.88 secs
Clustering into groups by similarity:
|============================================| 100%
Time difference of 0.62 secs
Aligning Sequences:
|============================================| 100%
Time difference of 4.63 secs
Determining distance matrix based on alignment:
|============================================| 100%
Time difference of 0.34 secs
Reclustering into groups by similarity:
|============================================| 100%
Time difference of 0.44 secs
Realigning Sequences:
|============================================| 100%
Time difference of 5.16 secs
Refining the alignment:
|============================================| 100%
Time difference of 0.01 secs
>
> # view the alignment in a browser (optional)
> BrowseSeqs(aligned, highlight=0)
>
> # write the alignment to a new FASTA file
> writeXStringSet(aligned,
+ file="<<REPLACE WITH PATH TO OUTPUT FASTA FILE>>")
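The call above leaves readingFrame=NA, so the reading frame has to be guessed. DECIPHER's exact rule is described on the ?AlignTranslation help page and is not reproduced here. As an assumed, simplified heuristic, the base-R sketch below translates nothing at all: it just counts stop codons (TAA, TAG, TGA in the standard genetic code) in each of the three forward frames and picks the frame with the fewest internal stops. `countStops` and `guessFrame` are hypothetical names.

```r
# Hypothetical heuristic, not DECIPHER's code: guess the forward reading
# frame (1, 2 or 3) as the one with the fewest internal stop codons.
countStops <- function(dna, frame) {
  if (nchar(dna) - 2 < frame) return(0L)          # no complete codon
  starts <- seq(frame, nchar(dna) - 2, by = 3)    # first base of each codon
  codons <- substring(dna, starts, starts + 2)
  stops <- codons %in% c("TAA", "TAG", "TGA")     # standard genetic code
  stops[length(stops)] <- FALSE                   # a terminal stop is expected
  sum(stops)
}

guessFrame <- function(dna) {
  dna <- toupper(dna)
  counts <- vapply(1:3, function(f) countStops(dna, f), integer(1))
  which.min(counts)                               # ties go to the lower frame
}

guessFrame("ATGCTTAAACTAGGCTAA")   # 1
guessFrame("CATGCTTAAACTAGGCTAA")  # 2 (same ORF after one extra base)
```

This sketch ignores the reverse strand and alternative genetic codes. When the frame is already known, setting readingFrame explicitly, as the comment in the transcript suggests, avoids guessing altogether.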
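The first step reported by AlignTranslation is "Determining distance matrix based on shared 4-mers", which is then used to cluster the sequences into groups by similarity. The exact measure DECIPHER computes is not shown in this example. As an assumed, simplified stand-in, the base-R sketch below defines the distance between two strings as one minus the fraction of distinct k-mers they share, relative to the string with fewer distinct k-mers. `kmers`, `kmerDist` and `kmerDistMatrix` are hypothetical names.

```r
# Hypothetical illustration, not DECIPHER's exact measure: a k-mer distance
# equal to 1 minus the fraction of distinct k-mers shared by two strings.
kmers <- function(s, k = 4) {
  n <- nchar(s) - k + 1                 # number of overlapping k-mers
  if (n < 1) return(character(0))
  unique(substring(s, 1:n, (1:n) + k - 1))
}

kmerDist <- function(a, b, k = 4) {
  ka <- kmers(a, k)
  kb <- kmers(b, k)
  denom <- min(length(ka), length(kb))  # the string with fewer distinct k-mers
  if (denom == 0) return(NA_real_)      # too short to compare
  1 - length(intersect(ka, kb)) / denom
}

# Symmetric pairwise distance matrix for a character vector of sequences
kmerDistMatrix <- function(seqs, k = 4) {
  n <- length(seqs)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    if (i < j) d[i, j] <- d[j, i] <- kmerDist(seqs[i], seqs[j], k)
  }
  d
}

kmerDist("ACGTAC", "ACGTTT")  # one shared 4-mer ("ACGT") out of three
```

Counting shared k-mers needs no alignment, which is why a measure of this kind is cheap enough to compute for every pair before the first alignment is built.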