Align Synteny
This short example describes how to use DECIPHER to align genomes. For an in-depth tutorial on sequence alignment, see "The Art of Multiple Sequence Alignment in R", available from the Documentation page.Instructions
First install DECIPHER and load the library in R. Next, it is necessary to build a sequence database using DECIPHER. A sequence database is simply another type of file, but it contains all of the genomes organized in such a way that they can be efficiently accessed by their unique name ('identifier').# load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to each FASTA file (in quotes)
> # each genome must be given a unique identifier here
> # for example: Genome1, Genome2, etc.
> fas <- c(Genome1="<<REPLACE WITH PATH TO FASTA FILE 1>>",
+ Genome2="<<REPLACE WITH PATH TO FASTA FILE 2>>",
+ Genome3="<<REPLACE WITH PATH TO FASTA FILE 3>>")
>
> # specify where to create the new sequence database
> db <- "<<REPLACE WITH PATH TO SEQUENCE DATABASE>>"
>
> # load the sequences from the file in a loop
> for (i in seq_along(fas)) {
+ Seqs2DB(fas[i], "FASTA", db, names(fas[i]))
+ }
Reading FASTA file from line 1 to 1e+05
Added 1 new sequence to table DNA.
1 total sequence in table DNA.
Time difference of 1 secs
Reading FASTA file from line 1 to 1e+05
Added 1 new sequence to table DNA.
2 total sequences in table DNA.
Time difference of 1 secs
Reading FASTA file from line 1 to 1e+05
Added 1 new sequence to table DNA.
3 total sequences in table DNA.
Time difference of 1 secs
>
> # map the syntenic regions between each genome pair
> synteny <- FindSynteny(db)
|============================================| 100%
Time difference of 26.3 secs
>
> # print a summary of the syntenic map (optional)
> synteny
Genome1 Genome2 Genome3
Genome1 1 seq 49% hits 62% hits
Genome2 39 blocks 1 seq 50% hits
Genome3 19 blocks 25 blocks 1 seq
>
> # view the syntenic regions (optional)
> pairs(synteny) # displays a dot plot of all pairs
> plot(synteny) # displays a bar plot of adjacent pairs
>
> # perform alignments of all pairs of genomes
> DNA <- AlignSynteny(synteny, db)
|============================================| 100%
Time difference of 62.18 secs
>
> # view the aligned syntenic blocks for each pair
> unlist(DNA[[1]]) # Genomes 1 and 2
A DNAStringSet instance of length 78
width seq names
[1] 345520 AATTTCGAA...AACTGCATC 1.Genome1
[2] 345520 AATTTCGAA...AACTGCCTC 1.Genome2
[3] 332175 TATTTCTTT...AGTTATGAT 2.Genome1
[4] 332175 TATTTCTTC...AGCTATGAT 2.Genome2
[5] 152776 AAGTACACA...GTAGCGGAT 3.Genome1
... ... ...
[74] 2570 ACTGTCACA...GATAAAAAA 37.Genome2
[75] 5479 TACTGCGTT...TAATAACAA 38.Genome1
[76] 5479 TAGTAGGTA...TCAAAATAA 38.Genome2
[77] 1219 AGCAAAAAA...TGCAAAAAA 39.Genome1
[78] 1219 AGAAAAAAA...TGCAAAAAA 39.Genome2
> unlist(DNA[[2]]) # Genomes 1 and 3
A DNAStringSet instance of length 38
width seq names
[1] 420181 AAGTACACA...TATAGCAAA 1.Genome1
[2] 420181 AGGTACACA...TGTAGCAAA 1.Genome3
[3] 413750 ATCATTATT...TTTTCACAT 2.Genome1
[4] 413750 ATAATTATT...TTTTTACAT 2.Genome3
[5] 282526 ATGCCCTCG...TATTTTAAA 3.Genome1
... ... ...
[34] 42624 TCACTATAA...ATATAAAAC 17.Genome3
[35] 25576 GGACGTAAA...AAAAAAACA 18.Genome1
[36] 25576 GGACGCAAA...AAAAAATCA 18.Genome3
[37] 15118 TTAATTAAG...TACTGAAAT 19.Genome1
[38] 15118 TTAAATAAG...TACTAAAAT 19.Genome3
> unlist(DNA[[3]]) # Genomes 2 and 3
A DNAStringSet instance of length 50
width seq names
[1] 676200 TTTTCATTT...TTAATAGAA 1.Genome2
[2] 676200 TTTTTATTT...TTAAGAAAA 1.Genome3
[3] 422433 TTGTCCATT...AACTGCCTC 2.Genome2
[4] 422433 TCGTTTTTT...AACTGCATC 2.Genome3
[5] 238313 ACTGTCACA...TTAAATGCT 3.Genome2
... ... ...
[46] 10690 TAGAAAATA...AGAAAATGG 23.Genome3
[47] 1400 CATATCCTC...AGCAAAAAA 24.Genome2
[48] 1400 CATATCATT...AGCCTAAAA 24.Genome3
[49] 1637 CGAATTAAA...AAAATAAAA 25.Genome2
[50] 1637 CAAAAAAAA...AAAATAAAA 25.Genome3