IDTAXA – Classify Functions
This short example shows how to use IDTAXA to annotate the functions of protein sequences, as described in:
NP Cooley & ES Wright (2021) "Accurate annotation of protein coding sequences with IDTAXA." NAR Genomics and Bioinformatics, doi:10.1093/nargab/lqab080.
Instructions
First, install DECIPHER from Bioconductor (e.g., with BiocManager::install("DECIPHER")) and load the library in R. Next, set the "fas" variable to the path of the FASTA file of protein sequences (e.g., "~/mySeqs.fas"). Finally, download a trained classifier for protein sequences from the Downloads page and load it before classifying.
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
>
> # load the sequences from the file
> seqs <- readAAStringSet(fas)
>
> # remove any gaps (if needed)
> seqs <- RemoveGaps(seqs)
>
> # for help, see the IdTaxa help page (optional)
> ?IdTaxa
>
> # load a training set object (trainingSet)
> # see http://DECIPHER.codes/Downloads.html
> load("<<REPLACE WITH PATH TO RData file>>")
>
> # classify the sequences
> ids <- IdTaxa(seqs,
+ trainingSet,
+ threshold=50, # 60 (cautious) or 50 (sensible)
+ processors=NULL) # use all available processors
|============================================| 100%
Time difference of 135.83 secs
>
> # look at the results
> print(ids)
A test set of class 'Taxa' with length 1000
confidence name taxon
[1] 78.0% ENA|OBRS01158965|... Root; Bacter...
[2] 44.7% ENA|OBRS01551965|... Root; unclas...
[3] 74.8% ENA|OBRS01920881|... Root; Bacter...
[4] 15.9% ENA|OBRS01851995|... Root; unclas...
[5] 19.7% ENA|OBRS01760119|... Root; unclas...
... ... ... ...
[996] 54.0% ENA|OBRS01119407|... Root; unclas...
[997] 56.0% ENA|OBRS01447422|... Root; unclas...
[998] 51.5% ENA|OBRS01883532|... Root; unclas...
[999] 64.7% ENA|OBRS01350537|... Root; Bacter...
[1000] 47.5% ENA|OBRS01488581|... Root; unclas...
> plot(ids)
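IdTaxa reports a confidence for each rank of an assignment and truncates the classification at the deepest rank whose confidence still meets threshold. The minimal sketch below illustrates that rule and shows one way to flatten the results into a table. It is not part of DECIPHER: it assumes each element of the result is a list with a taxon vector (root first) and a matching confidence vector in percent. The object ids_mock and the helpers truncate_at and summarize_taxa are hypothetical, and every value in ids_mock is invented for illustration.

```r
# Hypothetical stand-in for an IdTaxa result: each element is a list with a
# 'taxon' vector (root first) and a matching 'confidence' vector in percent.
# The structure is an assumption and all values are invented for illustration.
ids_mock <- list(
  seq1 = list(taxon = c("Root", "Bacteria", "Group_A", "Function_X"),
              confidence = c(100, 92.5, 81.0, 78.0)),
  seq2 = list(taxon = c("Root", "Bacteria", "Group_B"),
              confidence = c(100, 71.2, 55.4)),
  seq3 = list(taxon = "Root",
              confidence = 100)
)

# Keep only the leading ranks whose confidence meets the threshold; the first
# rank that falls below it ends the classification (hypothetical helper).
truncate_at <- function(x, threshold) {
  keep <- cumprod(x$confidence >= threshold) == 1
  list(taxon = x$taxon[keep], confidence = x$confidence[keep])
}

# Flatten classifications into a data frame: one row per sequence with the
# lineage joined by "; " and the confidence at the deepest rank kept
# (hypothetical helper; threshold = 0 keeps every rank).
summarize_taxa <- function(ids, threshold = 0) {
  kept <- lapply(ids, truncate_at, threshold = threshold)
  lineage <- vapply(kept, function(x) paste(x$taxon, collapse = "; "), "")
  deepest <- vapply(kept, function(x) {
    if (length(x$confidence) > 0) tail(x$confidence, 1) else NA_real_
  }, 0)
  data.frame(name = names(ids), taxon = unname(lineage),
             confidence = unname(deepest), stringsAsFactors = FALSE)
}

print(summarize_taxa(ids_mock))
print(summarize_taxa(ids_mock, threshold = 60))
```

With the stricter cutoff of 60, seq2 stops at "Root; Bacteria" because its third rank (55.4%) falls below the threshold. Applied to the real ids object, summarize_taxa(ids, threshold = 60) should give a more cautious view of the same run, assuming it exposes the fields described above. Entries for unassigned ranks, such as the "unclas..." labels printed above, are not treated specially. Because IdTaxa has already dropped ranks below the threshold used in the run (50 here), a lower value cannot recover them.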