Align Profiles
This short example describes how to use DECIPHER to merge two alignments, as described in:
ES Wright (2015) "DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment." BMC Bioinformatics, doi:10.1186/s12859-015-0749-z.
ES Wright (2020) "RNAconTest: Comparing Tools for Noncoding RNA Multiple Sequence Alignment Based on Structural Consistency." RNA, doi:10.1261/rna.073015.119.
Instructions
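There are two options for merging alignments: AlignProfiles merges two alignments held in memory, and AlignDB merges two alignments stored in a sequence database. The first option uses AlignProfiles:
> # load the DECIPHER library in R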
> library(DECIPHER)
>
> # specify the path to both FASTA files (in quotes)
> fas1 <- "<<REPLACE WITH PATH TO FASTA FILE1>>"
> fas2 <- "<<REPLACE WITH PATH TO FASTA FILE2>>"
>
> # load the aligned sequences from each file
> # change "DNA" to "RNA" or "AA" if necessary
> seqs1 <- readDNAStringSet(fas1)
> seqs2 <- readDNAStringSet(fas2)
>
> # perform the alignment
> aligned <- AlignProfiles(seqs1, seqs2)
>
> # view the alignment in a browser (optional)
> BrowseSeqs(aligned, highlight=0)
>
> # write the alignment to a new FASTA file
> writeXStringSet(aligned,
+ file="<<REPLACE WITH PATH TO OUTPUT FASTA FILE>>")
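The AlignProfiles call above can be tried without any FASTA files. The following is a minimal sketch that assumes DECIPHER is installed; the four short sequences and their names (seqA to seqD) are made up for illustration. It also checks that each input is already an alignment, meaning every sequence within a set has the same width, which AlignProfiles expects of both inputs.

```r
library(DECIPHER)

# Two toy alignments (hypothetical data). Within each set every
# sequence has the same width, with "-" marking gap positions.
set1 <- DNAStringSet(c(seqA = "ACGTTGCAAGCT-GGATCCATGCA",
                       seqB = "ACGTTGCAAGCTAGGATCCATGCA"))
set2 <- DNAStringSet(c(seqC = "ACGTTGCAGCTAGGATCATGCA",
                       seqD = "ACGTTGCAGCTAGGAT-ATGCA"))

# Each input must already be aligned: one unique width per set.
stopifnot(length(unique(width(set1))) == 1)
stopifnot(length(unique(width(set2))) == 1)

# Merge the two alignments into a single alignment.
merged <- AlignProfiles(set1, set2)

# Print the merged alignment to the console.
print(merged)
```

If either width check fails, the corresponding file holds unaligned sequences and would need to be aligned before the two sets are merged.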
The second option merges the alignments through a sequence database with AlignDB:
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to both FASTA files (in quotes)
> fas1 <- "<<REPLACE WITH PATH TO FASTA FILE1>>"
> fas2 <- "<<REPLACE WITH PATH TO FASTA FILE2>>"
>
> # specify where to create the new sequence database
> db <- "<<REPLACE WITH PATH TO SEQUENCE DATABASE>>"
>
> # import each alignment into the database under its own identifier
> Seqs2DB(fas1, "FASTA", db, "Alignment1")
Reading FASTA file from line 1 to 1e+05
175 total sequences in table DNA.
Time difference of 0.11 secs
> Seqs2DB(fas2, "FASTA", db, "Alignment2")
Reading FASTA file from line 1 to 1e+05
Added 175 new sequences to table DNA.
350 total sequences in table DNA.
Time difference of 0.11 secs
>
> # perform the alignment
> AlignDB(db,
+ identifier=c("Alignment1", "Alignment2"),
+ add2tbl="OutputAlignment")
|============================================| 100%
Added 350 aligned sequences to table OutputAlignment
with identifier 'Alignment1_Alignment2'.
>
> # efficiently write the alignment to a new FASTA file
> DB2Seqs("<<REPLACE WITH PATH TO OUTPUT FASTA.gz FILE>>",
+ db,
+ tblName="OutputAlignment",
+ compress=TRUE)
|============================================| 100%
Wrote 350 sequences.
Time difference of 0.26 secs
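The database workflow above can also be run end to end on toy data, with the merged alignment read back into memory instead of written to a file. This is a sketch under stated assumptions: DECIPHER and the RSQLite package are installed, the four sequences and their names are made up for illustration, and the database is a temporary file that is deleted at the end. SearchDB is used here in place of DB2Seqs because it returns the sequences as an object in the R session.

```r
library(DECIPHER)

# Two toy alignments (hypothetical data), each with a uniform width.
set1 <- DNAStringSet(c(seqA = "ACGTTGCAAGCT-GGATCCATGCA",
                       seqB = "ACGTTGCAAGCTAGGATCCATGCA"))
set2 <- DNAStringSet(c(seqC = "ACGTTGCAGCTAGGATCATGCA",
                       seqD = "ACGTTGCAGCTAGGAT-ATGCA"))

# Temporary database file used only for this sketch.
db <- tempfile(fileext = ".sqlite")

# Import each alignment under its own identifier.
Seqs2DB(set1, "XStringSet", db, "Alignment1", verbose = FALSE)
Seqs2DB(set2, "XStringSet", db, "Alignment2", verbose = FALSE)

# Merge the two alignments into a new table.
AlignDB(db,
        identifier = c("Alignment1", "Alignment2"),
        add2tbl = "OutputAlignment",
        verbose = FALSE)

# Read the merged alignment back into memory.
merged <- SearchDB(db,
                   tblName = "OutputAlignment",
                   type = "DNAStringSet",
                   verbose = FALSE)
print(merged)

# Remove the temporary database.
unlink(db)
```

For real data, replacing the toy sets with the fas1 and fas2 paths and "XStringSet" with "FASTA" gives the same Seqs2DB calls as in the transcript above.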