Align Sequences

This short example describes how to use DECIPHER to align sets of homologous DNA, RNA, or amino acid sequences, as described in:

ES Wright (2015) "DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment." BMC Bioinformatics, doi:10.1186/s12859-015-0749-z.

ES Wright (2020) "RNAconTest: Comparing Tools for Noncoding RNA Multiple Sequence Alignment Based on Structural Consistency." RNA, doi:10.1261/rna.073015.119.

For an in-depth tutorial on sequence alignment, see "The Art of Multiple Sequence Alignment in R", available from the Documentation page.

Instructions

First, install DECIPHER and load the library in R. Next, set the "fas" variable to the path of the FASTA file containing the unaligned sequences (e.g., "~/mySeqs.fas"). Then load the sequences according to their type (DNA, RNA, or amino acids (AA)) and proceed with the alignment.
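The installation step is not part of the example session. As a minimal sketch, assuming a standard Bioconductor setup managed through the BiocManager package (DECIPHER is distributed via Bioconductor), installation might look like this:

```r
# install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

# install DECIPHER from Bioconductor (only needed once per R installation)
BiocManager::install("DECIPHER")
```

This only needs to be run once; later sessions can start directly with library(DECIPHER).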

> # load the DECIPHER library in R
> library(DECIPHER)
> 
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
> 
> # load the sequences from the file
> # change "DNA" to "RNA" or "AA" if necessary
> seqs <- readDNAStringSet(fas)
> 
> # look at some of the sequences (optional)
> seqs
  A DNAStringSet instance of length 4
    width seq                     names               
[1]  1359 ATGGCCGGCT...CAGGCAGTAG 1
[2]  1359 ATGGCCGGCT...CAGGCAGTAG 2
[3]  1359 ATGGCCGGCT...CAGGCAGTAG 3
[4]  1359 ATGGCCGGCT...CAGGCAGTAG 4
> 
> # nucleotide sequences need to be in the same orientation
> # if they are not, then they can be reoriented (optional)
> seqs <- OrientNucleotides(seqs)
  |============================================| 100%

Time difference of 0.08 secs
> 
> # perform the alignment
> aligned <- AlignSeqs(seqs)
Determining distance matrix based on shared 8-mers:
  |============================================| 100%
Time difference of 0.01 secs
Clustering into groups by similarity:
  |============================================| 100%
Time difference of 0.1 secs
Aligning Sequences:
  |============================================| 100%
Time difference of 0.07 secs
Determining distance matrix based on alignment:
  |============================================| 100%
Time difference of 0.01 secs
Reclustering into groups by similarity:
  |============================================| 100%
Time difference of 0.1 secs
Realigning Sequences:
  |============================================| 100%
Time difference of 0.09 secs
Refining the alignment:
  |============================================| 100%
Time difference of 0.01 secs
> 
> # view the alignment in a browser (optional)
> BrowseSeqs(aligned, highlight=0)
> 
> # write the alignment to a new FASTA file
> writeXStringSet(aligned,
+    file="<<REPLACE WITH PATH TO OUTPUT FASTA FILE>>")
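The session above uses DNA input. For amino acid sequences, the same workflow applies with a different reader function. The following is a sketch of that variant, not output from a real run; the input and output paths ("~/myProteins.fas" and "~/myProteins_aligned.fas") are hypothetical placeholders. The orientation step is left out because it applies only to nucleotide sequences.

```r
library(DECIPHER)

# hypothetical path to a FASTA file of unaligned protein sequences
fas <- "~/myProteins.fas"

# read amino acid sequences instead of DNA
seqs <- readAAStringSet(fas)

# OrientNucleotides() is skipped here: it is meant for DNA/RNA input only

# align the protein sequences
aligned <- AlignSeqs(seqs)

# view the alignment in a browser (optional)
BrowseSeqs(aligned)

# write the alignment to a new FASTA file (hypothetical output path)
writeXStringSet(aligned, filepath = "~/myProteins_aligned.fas")
```

For RNA input, readRNAStringSet(fas) would take the place of readAAStringSet(fas), and the orientation step can be kept.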