Align Sequences
This short example describes how to use DECIPHER to align sets of homologous DNA, RNA, or amino acid sequences, as described in:
ES Wright (2015) "DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment." BMC Bioinformatics, doi:10.1186/s12859-015-0749-z.
ES Wright (2020) "RNAconTest: Comparing Tools for Noncoding RNA Multiple Sequence Alignment Based on Structural Consistency." RNA, doi:10.1261/rna.073015.119.
Instructions
First, install DECIPHER and load the library in R. Next, set the "fas" variable to the path of the FASTA file containing the unaligned sequences (e.g., "~/mySeqs.fas"). Then load the sequences according to their type (DNA, RNA, or amino acids, abbreviated AA) and align them.
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
>
> # load the sequences from the file
> # for RNA or amino acid sequences, use readRNAStringSet or readAAStringSet instead
> seqs <- readDNAStringSet(fas)
>
> # look at some of the sequences (optional)
> seqs
A DNAStringSet instance of length 4
width seq names
[1] 1359 ATGGCCGGCT...CAGGCAGTAG 1
[2] 1359 ATGGCCGGCT...CAGGCAGTAG 2
[3] 1359 ATGGCCGGCT...CAGGCAGTAG 3
[4] 1359 ATGGCCGGCT...CAGGCAGTAG 4
>
> # nucleotide sequences need to be in the same orientation; if they are not,
> # they can be reoriented (optional; skip this step for amino acid sequences)
> seqs <- OrientNucleotides(seqs)
|============================================| 100%
Time difference of 0.08 secs
>
> # perform the alignment
> aligned <- AlignSeqs(seqs)
Determining distance matrix based on shared 8-mers:
|============================================| 100%
Time difference of 0.01 secs
Clustering into groups by similarity:
|============================================| 100%
Time difference of 0.1 secs
Aligning Sequences:
|============================================| 100%
Time difference of 0.07 secs
Determining distance matrix based on alignment:
|============================================| 100%
Time difference of 0.01 secs
Reclustering into groups by similarity:
|============================================| 100%
Time difference of 0.1 secs
Realigning Sequences:
|============================================| 100%
Time difference of 0.09 secs
Refining the alignment:
|============================================| 100%
Time difference of 0.01 secs
>
> # view the alignment in a browser (optional)
> BrowseSeqs(aligned, highlight=0)
>
> # write the alignment to a new FASTA file
> writeXStringSet(aligned,
+ file="<<REPLACE WITH PATH TO OUTPUT FASTA FILE>>")
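The transcript reads DNA with `readDNAStringSet`. RNA and protein files need `readRNAStringSet` or `readAAStringSet` from Biostrings, the package DECIPHER builds on. Below is a minimal sketch that picks the reader from a type label. `read_unaligned` is a hypothetical helper name and is not part of DECIPHER. The two-sequence FASTA file is made up for illustration.

```r
# Biostrings provides the FASTA readers used in the transcript
library(Biostrings)

# read_unaligned(): hypothetical helper (not part of DECIPHER) that
# chooses the Biostrings reader matching the sequence type
read_unaligned <- function(path, type = c("DNA", "RNA", "AA")) {
  type <- match.arg(type)
  reader <- switch(type,
                   DNA = readDNAStringSet,
                   RNA = readRNAStringSet,
                   AA  = readAAStringSet)
  reader(path)
}

# toy example: write two made-up protein sequences to a temporary FASTA file
fas <- tempfile(fileext = ".fas")
writeLines(c(">p1", "MKTAYIAKQR", ">p2", "MKTAYLAKQR"), fas)

seqs <- read_unaligned(fas, type = "AA")
seqs
```

Keeping the type as an explicit argument makes a wrong choice fail loudly through `match.arg`. Reading a protein file as DNA would not be caught that way.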
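In the transcript, `OrientNucleotides` puts the nucleotide sequences into a common orientation before alignment. The base-R sketch below shows one simple way shared k-mers can reveal orientation. It keeps a sequence as-is or replaces it with its reverse complement, whichever shares more distinct k-mers with a reference. This is a simplification for illustration only and is not DECIPHER's implementation. It considers only the reverse complement and handles only uppercase A/C/G/T. The helpers `revcomp`, `kmers`, and `orient_to_reference` are hypothetical names.

```r
# revcomp(): reverse complement of an uppercase DNA string (A, C, G, T only)
revcomp <- function(s) {
  comp <- chartr("ACGT", "TGCA", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# kmers(): distinct overlapping substrings of length k in one sequence
kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  unique(substring(s, 1:(n - k + 1), k:n))
}

# orient_to_reference(): hypothetical helper; returns s or its reverse
# complement, whichever shares more distinct k-mers with the reference
# (ties keep s unchanged)
orient_to_reference <- function(s, reference, k = 8) {
  ref_kmers <- kmers(reference, k)
  n_fwd <- length(intersect(kmers(s, k), ref_kmers))
  n_rc  <- length(intersect(kmers(revcomp(s), k), ref_kmers))
  if (n_rc > n_fwd) revcomp(s) else s
}

# toy example: a made-up reference and a reverse-complemented copy of it
# (short sequences, so k = 4 instead of 8)
ref <- "ATGAAACCCTTGAGG"
flipped <- revcomp(ref)
oriented <- orient_to_reference(flipped, ref, k = 4)
print(oriented)
```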
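The first stage logged by `AlignSeqs` builds a distance matrix from shared 8-mers. The sequences are then clustered into groups by similarity. The base-R sketch below shows one simple way to turn shared k-mers into a pairwise distance: one minus the fraction of distinct k-mers shared, measured against the sequence with fewer distinct k-mers. That formula is an assumption chosen for illustration, and DECIPHER's exact calculation may differ. `kmers`, `kmer_distance`, and `kmer_dist_matrix` are hypothetical names.

```r
# kmers(): distinct overlapping substrings of length k in one sequence
kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  unique(substring(s, 1:(n - k + 1), k:n))
}

# kmer_distance(): 1 - (shared distinct k-mers / distinct k-mers in the
# sequence that has fewer of them); an illustrative definition only
kmer_distance <- function(a, b, k = 8) {
  ka <- kmers(a, k)
  kb <- kmers(b, k)
  denom <- min(length(ka), length(kb))
  if (denom == 0) return(1)  # too short to share any k-mer
  1 - length(intersect(ka, kb)) / denom
}

# kmer_dist_matrix(): symmetric pairwise distances for a named character vector
kmer_dist_matrix <- function(seqs, k = 8) {
  n <- length(seqs)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(max(n - 1, 0))) {
    for (j in (i + 1):n) {
      d[i, j] <- d[j, i] <- kmer_distance(seqs[[i]], seqs[[j]], k)
    }
  }
  d
}

# toy example with short made-up sequences, so k = 4 instead of 8
seqs <- c(s1 = "ACGTACGTAC", s2 = "ACGTACGTAC", s3 = "TTTTGGGGCC")
d <- kmer_dist_matrix(seqs, k = 4)
print(d)
```

Counting distinct k-mers avoids any alignment, which is why this kind of measure is cheap enough to compute for every pair before aligning. The transcript also shows a second distance matrix computed from the alignment itself, followed by reclustering and realignment.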