Align Pairs
This short example shows how to use DECIPHER for pairwise alignment, as described in:ES Wright (2024) "Fast and Flexible Search for Homologous Biological Sequences with DECIPHER v3." The R Journal, doi:in Press.
Instructions
The example below shows how to use DECIPHER to perform pairwise alignment. The AlignPairs() function takes as input pattern and subject sequences, as well as an optional data frame with indices of the sequence pairs to be aligned. To begin, it is necessary to install DECIPHER and load the library in R.# load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
>
> # load the sequences from the file
> # change "DNA" to "RNA" or "AA" if necessary
> seqs <- readDNAStringSet(fas)
>
> # look at some of the sequences (optional)
> seqs
DNAStringSet object of length 317:
width seq names
[1] 819 ATGGCTTTAAAA...AAAAAAAGAAAA Rickettsia prowaz...
[2] 822 ATGGGAATACGT...AGAAGGAAAAAG Porphyromonas gin...
[3] 822 ATGGGAATACGT...AGAAGGAAAAAG Porphyromonas gin...
[4] 822 ATGGGAATACGT...AGAAGGAAAAAG Porphyromonas gin...
[5] 819 ATGGCTATCGTT...CGTCGTGGTAAA Pasteurella multo...
... ... ...
[313] 819 ATGGCAATTGTT...CGCCGTACTAAA Pectobacterium at...
[314] 822 ATGCCTATTCAA...CGTCGCGTCAAG Acinetobacter sp....
[315] 864 ATGGGCATTCGC...GGTCGTCAGTCT Thermosynechococc...
[316] 831 ATGGCACTGAAG...AAGCGGAAGAAG Bradyrhizobium ja...
[317] 840 ATGGGCATTCGC...TCCGGGCGAGGT Gloeobacter viola...
>
> # specify which sequence pairs to align
> Pattern <- 1 # align first to the rest
> Subject <- seq_along(seqs)
> pairs <- data.frame(Pattern, Subject)
>
> # view the first few pairs to be aligned
> head(pairs)
Pattern Subject
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
>
> # use the same set of sequences for pattern and subject
> aligned <- AlignPairs(pattern=seqs,
+ subject=seqs,
+ pairs=pairs,
+ type="both",
+ processors=NULL)
|==================================================| 100%
Time difference of 0.09 secs
>
> # view the output metadata
> head(aligned[[1]])
Pattern PatternStart PatternEnd Subject SubjectStart
1 1 1 819 1 1
2 1 1 819 2 1
3 1 1 819 3 1
4 1 1 819 4 1
5 1 1 819 5 1
6 1 1 819 6 1
SubjectEnd Matches Mismatches AlignmentLength Score
1 819 819 0 819 1638.0
2 822 442 374 822 473.2
3 822 443 373 822 476.2
4 822 442 374 822 473.2
5 819 487 329 819 626.6
6 819 487 329 819 626.6
PatternGapPosition PatternGapLength SubjectGapPosition
1
2 777, 820 3, 3 728
3 777, 820 3, 3 728
4 777, 820 3, 3 728
5 820 3 723
6 820 3 723
SubjectGapLength
1
2 3
3 3
4 3
5 3
6 3
>
> # view the aligned patterns
> head(aligned[[2]])
DNAStringSet object of length 6:
width seq names
[1] 819 ATGGCTTTAAAAA...AAAAAAAAGAAAA Rickettsia prowaz...
[2] 825 ATGGCTTTAAAAA...AAAAAGAAAA--- Rickettsia prowaz...
[3] 825 ATGGCTTTAAAAA...AAAAAGAAAA--- Rickettsia prowaz...
[4] 825 ATGGCTTTAAAAA...AAAAAGAAAA--- Rickettsia prowaz...
[5] 822 ATGGCTTTAAAAA...AAAAAGAAAA--- Rickettsia prowaz...
[6] 822 ATGGCTTTAAAAA...AAAAAGAAAA--- Rickettsia prowaz...
>
> # view the aligned subjects
> head(aligned[[3]])
DNAStringSet object of length 6:
width seq names
[1] 819 ATGGCTTTAAAAA...AAAAAAAAGAAAA Rickettsia prowaz...
[2] 825 ATGGGAATACGTA...GAGAAGGAAAAAG Porphyromonas gin...
[3] 825 ATGGGAATACGTA...GAGAAGGAAAAAG Porphyromonas gin...
[4] 825 ATGGGAATACGTA...GAGAAGGAAAAAG Porphyromonas gin...
[5] 822 ATGGCTATCGTTA...TCGTCGTGGTAAA Pasteurella multo...
[6] 822 ATGGCTATCGTTA...TCGTCGTGGTAAA Pasteurella multo...
>
> # view the first aligned pair in a browser (optional)
> BrowseSeqs(c(aligned[[2]][1], aligned[[3]][1]))