Design Signatures
The DesignSignatures function assists in designing primers for sequencing, high resolution melt (HRM), or fragment length polymorphism (FLP) analysis, with the goal of obtaining amplicons whose signatures differ across groups of sequences. The method is described in: Wright, E. S., & Vetsigian, K. H. (2016) "DesignSignatures: a tool for designing primers that yields amplicons with distinct signatures." Bioinformatics, doi:10.1093/bioinformatics/btw047.
How do I design primers for amplicon sequencing?
First, install the DECIPHER package (available from Bioconductor) and load it in R along with RSQLite. Next, set the "fas" variable to the path of a FASTA file of unaligned sequences (e.g., "~/mySeqs.fas").
> library(DECIPHER)
> library(RSQLite)
>
> # load sequences into a database
> fas <- "~/mySeqs.fas"
> dbConn <- dbConnect(SQLite(),
+ ":memory:")
> Seqs2DB(fas,
+ type="FASTA",
+ dbFile=dbConn,
+ identifier="")
Reading FASTA file chunk 1
118 total sequences in table Seqs.
Time difference of 0.06 secs
>
> # identify the sequences based on their description
> x <- dbGetQuery(dbConn,
+ "select description from Seqs")$description
> ns <- unlist(lapply(strsplit(x,
+ split=" "),
+ FUN=`[`,
+ 1L))
> Add2DB(myData=data.frame(identifier=ns),
+ dbFile=dbConn)
Expression:
update `Seqs` set `identifier` = (select
`temp`.`identifier` from `temp` where
`temp`.`row_names` = `Seqs`.`row_names`) where
exists (select `temp`.`identifier` from `temp`
where `temp`.`row_names` = `Seqs`.`row_names`)
Added to table Seqs: "identifier".
Time difference of 0.01 secs
>
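The identifier step above keeps only the first word of each FASTA description (here, the genus name), so sequences that share that word are treated as one group. A minimal base R sketch of the same parsing, applied to hypothetical description strings (not taken from mySeqs.fas):

```r
# Hypothetical FASTA descriptions; only the first space-delimited word is kept
descriptions <- c("Streptococcus pneumoniae 16S rRNA",
                  "Streptococcus mutans 16S rRNA",
                  "Escherichia coli 16S rRNA")

# Split each description on spaces and take the first element of each piece
ns <- vapply(strsplit(descriptions, split = " "), `[`, character(1), 1L)

# Sequences with the same first word end up in the same group
table(ns)
```

Applied to the 118 sequences in the example, this grouping is what DesignSignatures reports as "19 groups" in its output.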
> # Primers for high-throughput sequencing:
> primers <- DesignSignatures(dbConn,
+ type="sequence", # sequencing signatures
+ minProductSize=300, # base pairs
+ maxProductSize=600,
+ resolution=5, # 5-mers
+ levels=5) # counts of each k-mer
Tallying 8-mers for 19 groups:
|========================================| 100%
Time difference of 0.19 secs
Designing primer sequences based on the group 'Streptococcus':
|========================================| 100%
Time difference of 17.13 secs
Selecting the most common primer sequences:
|========================================| 100%
Time difference of 6.03 secs
Determining PCR products from each group:
|========================================| 100%
Time difference of 18.05 secs
Scoring primer pair combinations:
|========================================| 100%
Time difference of 1.38 secs
Choosing optimal forward and reverse pairs:
|========================================| 100%
Time difference of 3.93 secs
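With type="sequence", each amplicon's signature is based on its k-mer content: resolution=5 means 5-mers, and levels=5 sets how many count levels are distinguished. The base R sketch below only illustrates that idea and is not DECIPHER's internal code. kmer_signature() is a hypothetical helper, and capping counts at levels - 1 is an assumption made here for illustration.

```r
# Hypothetical helper: k-mer count signature of one amplicon.
# Assumption (illustration only): counts are capped at levels - 1,
# so with levels = 5 the possible values are 0, 1, 2, 3, and 4 (meaning 4+).
kmer_signature <- function(amplicon, k = 5L, levels = 5L) {
  len <- nchar(amplicon)
  stopifnot(len >= k)
  n <- len - k + 1L
  # all overlapping substrings of length k
  kmers <- substring(amplicon, 1:n, k:len)
  counts <- table(kmers)
  setNames(pmin(as.integer(counts), levels - 1L), names(counts))
}

kmer_signature("ACGTACGTAC")
kmer_signature("AAAAAAAAA")  # five copies of AAAAA, capped at 4
```

Groups whose amplicons produce the same (or nearly the same) signature would be hard to tell apart; these are the kinds of groups listed in the similar_signatures column.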
> head(primers) # top scoring primers
forward_primer reverse_primer
1 CGGCTAACTMYGTGCCAGCA TCACRRCACGAGCTGACGA
2 CGGCTAACTMYGTGCCAGCA CTCACRRCACGAGCTGACG
3 CGGCTAACTMYGTGCCAGCA ACRRCACGAGCTGACGACA
4 CCAGCAGCCGCGGTAATAC TCACRRCACGAGCTGACGA
5 CCAGCAGCCGCGGTAAT TCACRRCACGAGCTGACGA
6 CGGCTAACTMYGTGCCAGCA CACRRCACGAGCTGACGAC
score coverage products
1 0.169827.... 1 118
2 0.169564.... 1 118
3 0.169325.... 1 118
4 0.168742.... 1 118
5 0.168682.... 1 118
6 0.168608.... 1 118
similar_signatures missing_signatures
1
2
3
4
5
6
>
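The designed primers contain IUPAC ambiguity codes (for example, M = A/C, Y = C/T, R = A/G), so each primer stands for a small set of concrete oligonucleotides. The hypothetical helper expand_primer() below enumerates them in base R; it is only a convenience for inspecting primers and is not part of DECIPHER.

```r
# Standard IUPAC nucleotide ambiguity codes
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"),
              W = c("A", "T"), K = c("G", "T"), M = c("A", "C"),
              B = c("C", "G", "T"), D = c("A", "G", "T"),
              H = c("A", "C", "T"), V = c("A", "C", "G"),
              N = c("A", "C", "G", "T"))

# Hypothetical helper: every concrete sequence a degenerate primer represents
expand_primer <- function(primer) {
  chars <- strsplit(toupper(primer), "")[[1]]
  if (!all(chars %in% names(iupac)))
    stop("primer contains non-IUPAC characters")
  # one vector of allowed bases per position
  choices <- lapply(chars, function(ch) iupac[[ch]])
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  unname(apply(grid, 1, paste, collapse = ""))
}

expand_primer("CGGCTAACTMYGTGCCAGCA")  # M and Y give 2 x 2 = 4 variants
```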
> # High Resolution Melt (HRM) assay:
> primers <- DesignSignatures(dbConn,
+ type="melt", # melt curve signatures
+ resolution=seq(75, 100, 0.25), # degrees Celsius
+ minProductSize=55, # base pairs
+ maxProductSize=400)
Tallying 8-mers for 19 groups:
|========================================| 100%
Time difference of 0.18 secs
Designing primer sequences based on the group 'Streptococcus':
|========================================| 100%
Time difference of 17.11 secs
Selecting the most common primer sequences:
|========================================| 100%
Time difference of 5.77 secs
Determining PCR products from each group:
|========================================| 100%
Time difference of 113.6 secs
Scoring primer pair combinations:
|========================================| 100%
Time difference of 0.26 secs
Choosing optimal forward and reverse pairs:
|========================================| 100%
Time difference of 0.67 secs
> head(primers) # top scoring primers
forward_primer reverse_primer
1 MRACTCCTACGGGAGGCAG CTGCTGGCACRKAGTTAGCC
2 CCTACGGGAGGCAGCAGT CTGCTGGCACRKAGTTAGCC
3 YCCTACGGGAGGCAGCA CTGCTGGCACRKAGTTAGCC
4 YCCTACGGGAGGCAGCAG CTGCTGGCACRKAGTTAGCC
5 CMRACTCCTACGGGAGGCA CTGCTGGCACRKAGTTAGCC
6 MRACTCCTACGGGAGGCAG GCTGGCACRKAGTTAGCCG
score coverage products
1 0.076803.... 0.947368.... 114
2 0.075581.... 0.947368.... 114
3 0.075165.... 0.947368.... 114
4 0.075165.... 0.947368.... 114
5 0.074834.... 0.947368.... 114
6 0.074710.... 0.947368.... 114
similar_signatures
1 (Clostridium, Deinococcus, Escherichia, Neisseria); (Acinetobacter, Bacteroides, Helicobacter, Lactobacillus, Porphyromonas, Staphylococcus, Streptococcus); (Bacillus, Rhodobacter); (Bifidobacterium, Propionibacterium)
2 (Bacillus, Rhodobacter); (Clostridium, Escherichia, Neisseria); (Acinetobacter, Bacteroides, Enterococcus, Helicobacter, Lactobacillus, Porphyromonas, Staphylococcus); (Bifidobacterium, Propionibacterium)
3 (Bacillus, Rhodobacter); (Clostridium, Escherichia, Neisseria); (Acinetobacter, Bacteroides, Enterococcus, Helicobacter, Lactobacillus, Porphyromonas, Staphylococcus, Streptococcus)
4 (Bacillus, Rhodobacter); (Clostridium, Escherichia, Neisseria); (Acinetobacter, Bacteroides, Enterococcus, Helicobacter, Lactobacillus, Porphyromonas, Staphylococcus, Streptococcus)
5 (Clostridium, Deinococcus, Escherichia, Neisseria); (Acinetobacter, Bacteroides, Enterococcus, Helicobacter, Lactobacillus, Porphyromonas, Staphylococcus, Streptococcus); (Bacillus, Rhodobacter); (Bifidobacterium, Propionibacterium)
6 (Acinetobacter, Bacteroides, Helicobacter, Lactobacillus, Staphylococcus); (Bacillus, Rhodobacter); (Clostridium, Deinococcus, Escherichia, Neisseria); (Bifidobacterium, Propionibacterium)
missing_signatures
1 Pseudomonas
2 Pseudomonas
3 Pseudomonas
4 Pseudomonas
5 Pseudomonas
6 Pseudomonas
>
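Two numbers in the HRM example can be checked by hand. The melt resolution seq(75, 100, 0.25) is a grid of 101 temperatures in 0.25 degree steps, and the reported coverage of 0.947368... is consistent with 18 of the 19 groups being amplified, since only Pseudomonas appears under missing_signatures. Reading coverage as the fraction of groups amplified is an inference from this output, not a documented definition.

```r
# Temperature grid used as the melt-curve resolution (degrees Celsius)
temps <- seq(75, 100, 0.25)
length(temps)  # (100 - 75) / 0.25 + 1 = 101 points

# Coverage if 18 of 19 groups yield a product (Pseudomonas missing)
groups <- 19
missing <- 1
coverage <- (groups - missing) / groups
round(coverage, 6)  # 0.947368
```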
> # Primers for community fingerprinting:
> primers <- DesignSignatures(dbConn,
+ type="length", # amplicon length signatures
+ levels=2, # presence/absence
+ minProductSize=200, # base pairs
+ maxProductSize=1400,
+ resolution=c(seq(200, 700, 3), # length bins
+ seq(705, 1000, 5),
+ seq(1010, 1400, 10)))
Tallying 8-mers for 19 groups:
|========================================| 100%
Time difference of 0.17 secs
Designing primer sequences based on the group 'Streptococcus':
|========================================| 100%
Time difference of 17.8 secs
Selecting the most common primer sequences:
|========================================| 100%
Time difference of 5.98 secs
Determining PCR products from each group:
|========================================| 100%
Time difference of 5.86 secs
Scoring primer pair combinations:
|========================================| 100%
Time difference of 0.46 secs
Choosing optimal forward and reverse pairs:
|========================================| 100%
Time difference of 0.5 secs
> head(primers) # top scoring primers
forward_primer reverse_primer
1 YCCTACGGGAGGCAGCA CACRRCACGAGCTGACGAC
2 CCTACGGGAGGCAGCAGT TCACRRCACGAGCTGACGA
3 MRACTCCTACGGGAGGCAG TCACRRCACGAGCTGACGA
4 YCCTACGGGAGGCAGCAG CACRRCACGAGCTGACGAC
5 CMRACTCCTACGGGAGGCA CACRRCACGAGCTGACGAC
6 YGAGTGGCGRACGGGTGAGT ACRRCACGAGCTGACGACA
score coverage products
1 0.007167.... 1 118
2 0.007167.... 1 118
3 0.007167.... 1 118
4 0.007167.... 1 118
5 0.007167.... 1 118
6 0.006771.... 0.947368.... 114
similar_signatures
1 (Acinetobacter, Bacillus, Lactobacillus, Listeria, Staphylococcus, Streptococcus); (Escherichia, Neisseria, Pseudomonas); (Actinomyces, Porphyromonas); (Bacteroides, Deinococcus)
2 (Acinetobacter, Bacillus, Lactobacillus, Listeria, Staphylococcus, Streptococcus); (Escherichia, Neisseria, Pseudomonas); (Actinomyces, Porphyromonas); (Bacteroides, Deinococcus)
3 (Acinetobacter, Bacillus, Lactobacillus, Listeria, Staphylococcus, Streptococcus); (Escherichia, Neisseria, Pseudomonas); (Actinomyces, Porphyromonas); (Bacteroides, Deinococcus)
4 (Acinetobacter, Bacillus, Lactobacillus, Listeria, Staphylococcus, Streptococcus); (Escherichia, Neisseria, Pseudomonas); (Actinomyces, Porphyromonas); (Bacteroides, Deinococcus)
5 (Acinetobacter, Bacillus, Lactobacillus, Listeria, Staphylococcus, Streptococcus); (Escherichia, Neisseria, Pseudomonas); (Actinomyces, Porphyromonas); (Bacteroides, Deinococcus)
6 (Lactobacillus, Streptococcus); (Bacillus, Listeria, Staphylococcus); (Actinomyces, Escherichia, Neisseria, Pseudomonas); (Deinococcus, Propionibacterium)
missing_signatures
1
2
3
4
5
6 Porphyromonas
>
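For type="length", the resolution vector gives bin boundaries: 3 bp wide from 200 to about 700 bp, 5 bp wide up to 1000 bp, and 10 bp wide up to 1400 bp. The base R sketch below uses findInterval() to show how amplicon lengths might map to these bins; amplicons landing in the same bin would have indistinguishable length signatures. Treating the boundaries as left-closed intervals via findInterval() is an assumption for illustration.

```r
# Length bin boundaries from the fingerprinting example (base pairs)
bins <- c(seq(200, 700, 3), seq(705, 1000, 5), seq(1010, 1400, 10))
length(bins)  # 267 boundaries

# Assumed mapping: index of the last boundary that is <= each length
findInterval(c(350, 352, 353, 800, 1200), bins)
```

In this sketch, 350 and 352 bp amplicons share bin 51, while a 353 bp amplicon falls into the next bin.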
> # Primers for restriction fragment length polymorphism (RFLP):
> data(RESTRICTION_ENZYMES)
> myEnzymes <- RESTRICTION_ENZYMES[c("EcoRI", "HinfI")]
> primers <- DesignSignatures(dbConn,
+ type="length", # amplicon length signatures
+ levels=2, # presence/absence
+ minProductSize=200, # base pairs
+ maxProductSize=600,
+ resolution=c(seq(50, 100, 3), # length bins
+ seq(105, 200, 5),
+ seq(210, 600, 10)),
+ enzymes=myEnzymes)
Tallying 8-mers for 19 groups:
|========================================| 100%
Time difference of 0.18 secs
Designing primer sequences based on the group 'Streptococcus':
|========================================| 100%
Time difference of 17.84 secs
Selecting the most common primer sequences:
|========================================| 100%
Time difference of 6 secs
Determining PCR products from each group:
|========================================| 100%
Time difference of 3.46 secs
Scoring primer pair combinations:
|========================================| 100%
Time difference of 0.21 secs
Choosing optimal forward and reverse pairs:
|========================================| 100%
Time difference of 0.45 secs
Finding the best restriction enzyme:
|========================================| 100%
Time difference of 18.04 secs
> head(primers) # top scoring primers
forward_primer reverse_primer
1 YGAGTGGCGRACGGGTGAGT YGTATYACCGCGGCTGCT
2 YGAGTGGCGRACGGGTGAGT YGTATYACCGCGGCTGCTG
3 YGAGTGGCGRACGGGTGAGTA YGTATYACCGCGGCTGCT
4 YGAGTGGCGRACGGGTGAGTA YGTATYACCGCGGCTGCTG
5 YGAGTGGCGRACGGGTGAGTAA YGTATYACCGCGGCTGCT
6 YGAGTGGCGRACGGGTGAGTAA YGTATYACCGCGGCTGCTG
score coverage products
1 0.019621.... 0.947368.... 114
2 0.019621.... 0.947368.... 114
3 0.019621.... 0.947368.... 114
4 0.019621.... 0.947368.... 114
5 0.019621.... 0.947368.... 114
6 0.019621.... 0.947368.... 114
similar_signatures
1 (Bacillus, Streptococcus); (Clostridium, Listeria, Staphylococcus); (Escherichia, Neisseria)
2 (Bacillus, Streptococcus); (Clostridium, Listeria, Staphylococcus); (Escherichia, Neisseria)
3 (Bacillus, Streptococcus); (Clostridium, Listeria, Staphylococcus); (Escherichia, Neisseria)
4 (Bacillus, Streptococcus); (Clostridium, Listeria, Staphylococcus); (Escherichia, Neisseria)
5 (Bacillus, Streptococcus); (Clostridium, Listeria, Staphylococcus); (Escherichia, Neisseria)
6 (Bacillus, Streptococcus); (Clostridium, Listeria, Staphylococcus); (Escherichia, Neisseria)
missing_signatures enzyme digest_score
1 Porphyromonas HinfI 0.04524469
2 Porphyromonas HinfI 0.04524469
3 Porphyromonas HinfI 0.04524469
4 Porphyromonas HinfI 0.04524469
5 Porphyromonas HinfI 0.04524469
6 Porphyromonas HinfI 0.04524469
fragments
1 299
2 299
3 299
4 299
5 299
6 299
>
> dbDisconnect(dbConn)
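In the RFLP example, DesignSignatures also chose a restriction enzyme (HinfI) from the supplied candidates. As a rough illustration of in silico digestion, the base R sketch below finds HinfI sites (G^ANTC, cutting after the G) in a made-up sequence and reports fragment lengths. digest_lengths() is a hypothetical helper: it scans only the top strand (sufficient here because GANTC is palindromic) and ignores the single-stranded overhang.

```r
# Hypothetical helper: fragment lengths after cutting at every match of a
# recognition-site regex; `cut` is the number of site bases left of the cut
# (1 for HinfI G^ANTC and EcoRI G^AATTC)
digest_lengths <- function(dna, site, cut = 1L) {
  starts <- gregexpr(site, dna)[[1]]
  if (starts[1] == -1L) return(nchar(dna))  # no site: one uncut fragment
  cuts <- as.integer(starts) + cut - 1L      # last base before each cut
  diff(c(0L, cuts, nchar(dna)))
}

dna <- "AAAAGAATCAAAAAAGACTCAAAA"  # made-up 24 bp sequence
digest_lengths(dna, "GA[ACGT]TC")   # HinfI (GANTC): fragments of 5, 11, 8 bp
digest_lengths(dna, "GAATTC")       # EcoRI: no site, one 24 bp fragment
```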