Align Synteny
This short example describes how to use DECIPHER to align genomes. For an in-depth tutorial on sequence alignment, see "The Art of Multiple Sequence Alignment in R", available from the Documentation page.How do I align genomes?
First install DECIPHER and load the library in R. Next, it is necessary to build a sequence database using DECIPHER. A sequence database is simply another type of file, but it contains all of the genomes organized in such a way that they can be efficiently accessed by their unique name ('identifier').# load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to each FASTA file (in quotes)
> # each genome must be given a unique identifier here
> # for example: Genome1, Genome2, etc.
> fas <- c(Genome1="<<REPLACE WITH PATH TO FASTA FILE 1>>",
+ Genome2="<<REPLACE WITH PATH TO FASTA FILE 2>>",
+ Genome3="<<REPLACE WITH PATH TO FASTA FILE 3>>")
>
> # specify where to create the new sequence database
> db <- "<<REPLACE WITH PATH TO SEQUENCE DATABASE>>"
>
> # load the sequences from the file in a loop
> for (i in seq_along(fas)) {
+ Seqs2DB(fas[i], "FASTA", db, names(fas[i]))
+ }
Reading FASTA file from line 1 to 1e+05
Added 1 new sequence to table DNA.
1 total sequence in table DNA.
Time difference of 1 secs
Reading FASTA file from line 1 to 1e+05
Added 1 new sequence to table DNA.
2 total sequences in table DNA.
Time difference of 1 secs
Reading FASTA file from line 1 to 1e+05
Added 1 new sequence to table DNA.
3 total sequences in table DNA.
Time difference of 1 secs
>
> # map the syntenic regions between each genome pair
> synteny <- FindSynteny(db)
|============================================| 100%
Time difference of 26.3 secs
>
> # print a summary of the syntenic map (optional)
> synteny
Genome1 Genome2 Genome3
Genome1 1 seq 49% hits 62% hits
Genome2 39 blocks 1 seq 50% hits
Genome3 19 blocks 25 blocks 1 seq
>
> # view the syntenic regions (optional)
> pairs(synteny) # displays a dot plot of all pairs
> plot(synteny) # displays a bar plot of adjacent pairs
>
> # perform alignments of all pairs of genomes
> DNA <- AlignSynteny(synteny, db)
|============================================| 100%
Time difference of 62.18 secs
>
> # view the aligned syntenic blocks for each pair
> unlist(DNA[[1]]) # Genomes 1 and 2
A DNAStringSet instance of length 78
width seq names
[1] 345520 AATTTCGAA...AACTGCATC 1.Genome1
[2] 345520 AATTTCGAA...AACTGCCTC 1.Genome2
[3] 332175 TATTTCTTT...AGTTATGAT 2.Genome1
[4] 332175 TATTTCTTC...AGCTATGAT 2.Genome2
[5] 152776 AAGTACACA...GTAGCGGAT 3.Genome1
... ... ...
[74] 2570 ACTGTCACA...GATAAAAAA 37.Genome2
[75] 5479 TACTGCGTT...TAATAACAA 38.Genome1
[76] 5479 TAGTAGGTA...TCAAAATAA 38.Genome2
[77] 1219 AGCAAAAAA...TGCAAAAAA 39.Genome1
[78] 1219 AGAAAAAAA...TGCAAAAAA 39.Genome2
> unlist(DNA[[2]]) # Genomes 1 and 3
A DNAStringSet instance of length 38
width seq names
[1] 420181 AAGTACACA...TATAGCAAA 1.Genome1
[2] 420181 AGGTACACA...TGTAGCAAA 1.Genome3
[3] 413750 ATCATTATT...TTTTCACAT 2.Genome1
[4] 413750 ATAATTATT...TTTTTACAT 2.Genome3
[5] 282526 ATGCCCTCG...TATTTTAAA 3.Genome1
... ... ...
[34] 42624 TCACTATAA...ATATAAAAC 17.Genome3
[35] 25576 GGACGTAAA...AAAAAAACA 18.Genome1
[36] 25576 GGACGCAAA...AAAAAATCA 18.Genome3
[37] 15118 TTAATTAAG...TACTGAAAT 19.Genome1
[38] 15118 TTAAATAAG...TACTAAAAT 19.Genome3
> unlist(DNA[[3]]) # Genomes 2 and 3
A DNAStringSet instance of length 50
width seq names
[1] 676200 TTTTCATTT...TTAATAGAA 1.Genome2
[2] 676200 TTTTTATTT...TTAAGAAAA 1.Genome3
[3] 422433 TTGTCCATT...AACTGCCTC 2.Genome2
[4] 422433 TCGTTTTTT...AACTGCATC 2.Genome3
[5] 238313 ACTGTCACA...TTAAATGCT 3.Genome2
... ... ...
[46] 10690 TAGAAAATA...AGAAAATGG 23.Genome3
[47] 1400 CATATCCTC...AGCAAAAAA 24.Genome2
[48] 1400 CATATCATT...AGCCTAAAA 24.Genome3
[49] 1637 CGAATTAAA...AAAATAAAA 25.Genome2
[50] 1637 CAAAAAAAA...AAAATAAAA 25.Genome3