DECIPHER - Classify Functions

IDTAXA – Classify Functions

This short example describes how to use IDTAXA to annotate the functions of protein sequences, as described in:

NP Cooley & ES Wright (2021) "Accurate annotation of protein coding sequences with IDTAXA." NAR Genomics and Bioinformatics, doi:10.1093/nargab/lqab080.

For an in-depth tutorial on sequence classification, see the "Classify Sequences" vignette, available from the Documentation page.

How do I assign functions to sequences?

First, install DECIPHER and load the library in R. Next, set the "fas" variable to the path of the FASTA file containing the protein sequences (e.g., "~/mySeqs.fas"). Finally, load a trained classifier for protein sequences, which can be found on the Downloads page, and use it to classify the sequences with IdTaxa.
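If DECIPHER is not yet installed, the installation step can be sketched as follows. DECIPHER is distributed through Bioconductor, so this sketch goes through the BiocManager package; the helper name install_if_missing is hypothetical and used only for illustration, and network access is assumed.

```r
# Hypothetical helper (not part of DECIPHER): install a Bioconductor
# package only when it is not already available in the R library.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # BiocManager is the standard installer for Bioconductor packages
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (run once per R installation; requires network access):
# install_if_missing("DECIPHER")
```

Once the package is installed, library(DECIPHER) loads it as shown in the session below.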


> # load the DECIPHER library in R
> library(DECIPHER)
> 
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
> 
> # load the sequences from the file
> seqs <- readAAStringSet(fas)
> 
> # remove any gaps (if needed)
> seqs <- RemoveGaps(seqs)
> 
> # for help, see the IdTaxa help page (optional)
> ?IdTaxa
> 
> # load a training set object (trainingSet)
> # see http://DECIPHER.codes/Downloads.html
> load("<<REPLACE WITH PATH TO RData file>>")
> 
> # classify the sequences
> ids <- IdTaxa(seqs,
+    trainingSet,
+    threshold=50, # 60 (cautious) or 50 (sensible)
+    processors=NULL) # use all available processors
  |============================================| 100%


Time difference of 135.83 secs


> 
> # look at the results
> print(ids)
  A test set of class 'Taxa' with length 1000
       confidence name                 taxon
   [1]      78.0% ENA|OBRS01158965|... Root; Bacter...
   [2]      44.7% ENA|OBRS01551965|... Root; unclas...
   [3]      74.8% ENA|OBRS01920881|... Root; Bacter...
   [4]      15.9% ENA|OBRS01851995|... Root; unclas...
   [5]      19.7% ENA|OBRS01760119|... Root; unclas...
   ...        ... ...                  ...
 [996]      54.0% ENA|OBRS01119407|... Root; unclas...
 [997]      56.0% ENA|OBRS01447422|... Root; unclas...
 [998]      51.5% ENA|OBRS01883532|... Root; unclas...
 [999]      64.7% ENA|OBRS01350537|... Root; Bacter...
[1000]      47.5% ENA|OBRS01488581|... Root; unclas...
> plot(ids)
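
The printed results can also be collected into a table for further work. The following is a minimal sketch, assuming that each element of the object returned by IdTaxa is a list holding a "taxon" path and a matching "confidence" vector, and that unclassified assignments carry the "unclassified" prefix seen in the output above. The object ids_demo, its sequence names, the label example_function_1, and the helper summarize_ids are hypothetical stand-ins so that the snippet runs on its own; with real results, pass ids instead.

```r
# Stand-in for the 'ids' object (assumed layout: one list per sequence,
# with a 'taxon' path and a 'confidence' value for each level of the path).
ids_demo <- list(
  seqA = list(taxon = c("Root", "Bacteria", "example_function_1"),
              confidence = c(78.0, 78.0, 78.0)),
  seqB = list(taxon = c("Root", "unclassified_Root"),
              confidence = c(44.7, 44.7))
)

# Collapse each assignment into one row: sequence name, full path,
# and the confidence reported at the lowest level of the path.
summarize_ids <- function(x) {
  data.frame(
    name = names(x),
    assignment = vapply(x, function(e) paste(e$taxon, collapse = "; "),
                        character(1)),
    confidence = vapply(x, function(e) e$confidence[length(e$confidence)],
                        numeric(1)),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

res <- summarize_ids(ids_demo)

# Flag sequences whose path does not end in an "unclassified" placeholder
res$classified <- !grepl("unclassified", res$assignment, fixed = TRUE)
print(res)
```

A table of this kind can be written to disk with write.csv or filtered on the confidence column, which may be more convenient than reading the truncated printout.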